Tuesday, March 27, 2018

Custom Model Scripts for Oracle Analytics



In this blog post we will discuss how to use custom models in OAC. We will walk through the process of developing Python scripts, compatible with OAC, that train and apply a model using your own Machine Learning algorithm.

At the time of this blog, Oracle Analytics Cloud (OAC) ships with more than 10 Machine Learning algorithms, which fall under Supervised (Classification, Regression) and Unsupervised (Clustering) Learning. The inbuilt algorithms in OAC include CART, Logistic Regression, KNN, KMeans, Linear Regression, Support Vector Machine and Neural Networks (for an exhaustive list of inbuilt algorithms in OAC please refer to our earlier blog post). These inbuilt algorithms cover the majority of real-world business use cases. However, users sometimes face cases or datasets where they need to use a different algorithm to achieve their goal.

In such cases, OAC lets users write their own scripts to train and apply a model using an algorithm of their choice. Working with custom models requires two scripts: the first trains your model, and the second scores/applies the model you just trained. Like other custom Advanced Analytics (Python or R) scripts, the Train and Apply model scripts need to be embedded in XML format.


The Oracle Analytics-Library has an example of a custom Train/Apply Model using Support Vector Regression (SVR). Please refer to that sample to learn more about the required XML structure before proceeding further. We'll use that sample to walk through the parts of these scripts.



Model-Train-Script:

1) Capturing Parameters:

    Data is typically prepared and pre-processed before it is sent for training a model. Pre-processing
    involves filling missing values, converting categorical values to numerical values (if needed), and
    standardizing the data. The way this pre-processing is done can significantly influence the
    accuracy of a model. In OAC, the Train Model UI provides parameters that let users choose the
    methods used for pre-processing. The same UI also provides a number of options for tuning the
    model. All the parameters sent from this user interface need to be captured before we start
    processing the data. For example, in the Train Model script for SVR, the following snippet reads
    the parameters:

         ## Read the optional parameters into variables
         target = args['target']
         max_null_value_percent = float(args['maximumNullValuePercent'])
         numerical_impute_method = args['numericalColumnsImputationMethod']
         ...
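The sample converts each value it needs from args by hand. As a minimal sketch, assuming every value in args arrives as a string, a hypothetical helper named read_train_params can gather the type conversions and defaults in one place. The split key, its reading as the fraction of rows held out for testing, and all default values below are assumptions for illustration, not documented OAC behavior.

```python
def read_train_params(args):
    """Convert raw string parameters into typed values.

    Hypothetical helper: key names other than those used in the SVR sample
    ('split') and every default value are assumptions.
    """
    params = {
        "target": args["target"],  # required: no default
        "max_null_value_percent": float(args.get("maximumNullValuePercent", "80")),
        "numerical_impute_method": args.get("numericalColumnsImputationMethod", "mean"),
        # Assumed: 'split' is the fraction of rows to hold out for testing
        "test_size": float(args.get("split", "0.2")),
    }
    if not 0.0 < params["test_size"] < 1.0:
        raise ValueError("split must be strictly between 0 and 1")
    return params


params = read_train_params({"target": "Donation", "maximumNullValuePercent": "50"})
```

Failing fast on an out-of-range split keeps a bad UI value from surfacing later as a confusing error inside the splitting code.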

2) Data Pre-processing (optional):

     Before we train a model, the data may need to be cleansed and normalized to get better
     predictions. In some cases the training data is already cleansed, processed and ready for
     training. If it is not, users can either define their own functions to perform the cleansing or
     use inbuilt methods in OAC. A separate blog post, Prepare data using inbuilt functions in OAC,
     discusses in detail how to use the inbuilt methods to cleanse and prepare the data for training
     a model.
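For the "define your own functions" route, here is a minimal sketch of a user-defined cleansing step built on pandas and scikit-learn. It is not OAC's inbuilt method: the function name preprocess, the mean/mode imputation choices and the one-hot encoding are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df, target, max_null_value_percent=80.0):
    """Hypothetical user-defined cleansing (not OAC's inbuilt methods).

    Returns standardized features, the target column and the fitted scaler.
    """
    # Drop feature columns whose share of missing values exceeds the threshold
    null_pct = df.isnull().mean() * 100
    keep = [c for c in df.columns if c == target or null_pct[c] <= max_null_value_percent]
    df = df[keep].copy()

    features = df.drop(columns=[target])
    num_cols = features.select_dtypes(include="number").columns
    cat_cols = features.columns.difference(num_cols)

    # Impute numeric columns with the column mean, categorical ones with the mode
    features[num_cols] = features[num_cols].fillna(features[num_cols].mean())
    for c in cat_cols:
        features[c] = features[c].fillna(features[c].mode().iloc[0])

    # One-hot encode categorical columns as 0.0/1.0 floats
    features = pd.get_dummies(features, columns=list(cat_cols), dtype=float)

    # Standardize every feature to zero mean and unit variance
    scaler = StandardScaler()
    scaled = pd.DataFrame(
        scaler.fit_transform(features), columns=features.columns, index=features.index
    )
    return scaled, df[target], scaler
```

The fitted scaler is returned on purpose: the Apply script has to standardize scoring data with the same means and variances, so it is a natural candidate for saving alongside the model.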

3) Train/Create Model:
     Now we are ready to actually train the model. The Train Model process can be subdivided into two
     steps: first, splitting the data so that part of it can be held out for testing the model; second,
     training the model, after which the held-out data provides the performance/accuracy details.

    Train-Test split: It is a good strategy to keep aside a random portion of the training data for
    testing. This portion of the data is used to evaluate the model's performance in terms of accuracy.
    The amount of data used for training and testing is controlled by a user parameter called split.
    The sample uses the train_test_split method (from scikit-learn), which splits the data in a
    randomized fashion to avoid bias from the row ordering; for classification, its stratify argument
    can also preserve the class proportions. The following snippet performs this train-test split:
 
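        from sklearn.model_selection import train_test_split

        # split data into train and test sets; random_state=0 makes the shuffle reproducible
        train_X, test_X, train_y, test_y = train_test_split(
            features_df, target_col, test_size=test_size, random_state=0)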
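To see what the split does, here is a small standalone sketch on toy arrays. It assumes, as in the earlier parameter sketch, that the split value is the fraction of rows held out for testing; the second call shows the stratify option mentioned above for classification targets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 rows, 2 features (illustrative only)
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20, dtype=float)
test_size = 0.25  # assumed meaning: hold out 25% of rows for testing

train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=test_size, random_state=0
)

# For a classification target, stratify keeps class proportions equal in both parts
labels = np.array([0] * 10 + [1] * 10)
_, _, train_lbl, test_lbl = train_test_split(
    X, labels, test_size=0.5, stratify=labels, random_state=0
)
```

With 20 rows and test_size=0.25, 5 rows go to the test set and 15 to the training set; fixing random_state means rerunning the script reproduces the same split.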


    Train Model: Now we have datasets ready for training and testing the model. It's time to train the
    model using the training method of that particular algorithm. In scikit-learn, and in many other
    Python libraries, this is the fit() method. The following snippet does that for the SVR algorithm:

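        from sklearn.svm import SVR

        # build the SVR estimator (kernel is a user parameter; gamma and C are fixed here)
        svr = SVR(kernel=kernel, gamma=0.001, C=10)
        # fit() trains on the training split and returns the fitted estimator
        SVR_Model = svr.fit(train_X, train_y)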
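The following self-contained sketch runs the split, standardize and fit steps end to end on synthetic data. Two choices differ from the sample and are assumptions for this toy case: kernel="linear" (the synthetic data is linear) and a StandardScaler fitted on the training rows only. The score() call returns R² on the held-out rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data: y is roughly 2*x plus noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, size=200)

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on training rows only, then apply it to both splits
scaler = StandardScaler().fit(train_X)
train_Xs, test_Xs = scaler.transform(train_X), scaler.transform(test_X)

# kernel="linear" suits this toy data; the sample takes the kernel from a user parameter
SVR_Model = SVR(kernel="linear", C=10).fit(train_Xs, train_y)

# Coefficient of determination (R^2) on the held-out rows
r2 = SVR_Model.score(test_Xs, test_y)
```

Fitting the scaler on the training rows only keeps information about the test rows out of the model, so the held-out score stays an honest estimate.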

4) Save Model:

    The model that we created in the previous step needs to be saved/persisted so that it can be
    accessed during the Apply/Scoring Model phase.
 
   Save Model as pickle Object: Created models are saved/stored as pickle objects, and they are
   re-accessed during the Apply Model phase using a reference name. The snippet below uses Python's
   pickle module to serialize the model and base64 to encode it as a string:

        import base64
        import pickle

        # Save the model as a pickle object keyed by a reference name, then base64-encode it as text
        pickleobj = {'SVRegression': SVR_Model}
        d = base64.b64encode(pickle.dumps(pickleobj)).decode('utf-8')
 
   In this case SVRegression is the reference name for the model created. The pickle doesn't have to
   contain just the model; other information and objects can also be saved in it. For example, if you
   wish to save additional flags or standardizer/indexer objects along with the model, you can create
   a dictionary that contains the model and these objects and save the entire dictionary as a pickle
   object.
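A minimal sketch of that idea: the model is bundled with its fitted scaler and the training feature names, encoded the same way as in the sample, and then decoded again as the Apply script would. The extra keys "scaler" and "feature_columns", and the column names, are hypothetical.

```python
import base64
import pickle

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy training data (illustrative only)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y = np.array([1.5, 2.5, 3.5, 4.5])

scaler = StandardScaler().fit(X)
model = SVR(kernel="rbf", gamma=0.001, C=10).fit(scaler.transform(X), y)

# Bundle the model with the objects the Apply script will need.
# 'scaler' and 'feature_columns' are hypothetical key names.
pickleobj = {
    "SVRegression": model,
    "scaler": scaler,
    "feature_columns": ["amount", "visits"],
}
encoded = base64.b64encode(pickle.dumps(pickleobj)).decode("utf-8")

# Apply phase: decode, then look each object up by its reference name
restored = pickle.loads(base64.b64decode(encoded.encode("utf-8")))
restored_model = restored["SVRegression"]
restored_scaler = restored["scaler"]
preds = restored_model.predict(restored_scaler.transform(X))
```

Because the whole dictionary travels as one string, the Apply script receives the model and its pre-processing state together and cannot pair a model with the wrong scaler.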

5) Add Related Datasets (optional): Now that we have the model, let us see how well it performs.
   In step 3 we set aside part of the data for testing the model. Using that test dataset, let us
   calculate some accuracy metrics and store them in Related datasets. This step is optional, and
   users who are confident about the model's accuracy can skip it. However, if users wish to view the
   accuracy metrics or populate them in the Quality tab, they can use inbuilt methods in OAC to create
   the related datasets. More information on how to add these Related datasets can be found in this
   blog: How to create Related datasets
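As a sketch of the metric calculation itself, the function below summarizes hold-out accuracy for a regression model with scikit-learn's metric functions. The layout (one row per metric, with Metric and Value columns) is a hypothetical choice, not OAC's related-dataset schema, which the linked blog covers. In a real script the predictions would come from something like SVR_Model.predict(test_X).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(test_y, pred_y):
    """Summarize hold-out accuracy as one row per metric.

    The column names are hypothetical, not OAC's related-dataset schema.
    """
    mse = mean_squared_error(test_y, pred_y)
    rows = [
        ("Mean Absolute Error", mean_absolute_error(test_y, pred_y)),
        ("Root Mean Squared Error", float(np.sqrt(mse))),
        ("R-Squared", r2_score(test_y, pred_y)),
    ]
    return pd.DataFrame(rows, columns=["Metric", "Value"])


# Example with made-up hold-out values
metrics_df = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
```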

6) Populate Quality Tab (optional): In the model Inspect pane, there is a tab called Quality. This
    tab visualizes the model accuracy details, so users can evaluate the model's accuracy and decide
    whether to tune it further or use it for prediction. This step is also optional. However, if users
    wish to view the model quality details in the Quality tab, they can use inbuilt functions in OAC.
    More details on the inbuilt functions that populate the Quality tab can be found in this blog: How
    to Populate Quality Tab.

Now the model, along with the related datasets, is prepared and saved in a pickle object. This marks the end of the Train Model phase. The script returns the model to the framework, which stores it in dataset storage. If the train model script executes successfully, you should find the model in the Machine Learning > Models tab. For example, I created a model called SVR Donations using the SV Regression scripts uploaded to the analytics-library.

                                    

Model-Apply-Script:

The Apply script should have the same name as the Train script, with "apply" in place of "train"; i.e. it should follow the nomenclature OAC.ML.Algo_Name.apply.xml. The Apply script accepts the model name, the data pre-processing parameters, and other user parameters as input. Most of the pre-processing steps are the same as in the Train Model script:

1) Capturing Parameters
2) Data Pre-processing: The same inbuilt methods can be used for cleansing (filling missing values), encoding and standardizing the scoring data.

After the data is cleansed and standardized, it can be used for prediction/scoring.
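One detail is easy to miss when pre-processing scoring data: it must be encoded exactly like the training data. Categories seen in only one of the two datasets change the one-hot columns, and re-fitting the scaler would change the standardization. Here is a minimal sketch, assuming the training feature names and a StandardScaler fitted on a DataFrame with those columns were saved in the pickle (as in the bundling idea above). The helper name prepare_scoring_data is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_scoring_data(score_df: pd.DataFrame, feature_columns: list,
                         scaler: StandardScaler):
    """Encode scoring rows like the training rows (hypothetical helper).

    feature_columns: training column names after one-hot encoding.
    scaler: the StandardScaler fitted on a DataFrame with those columns.
    """
    encoded = pd.get_dummies(score_df, dtype=float)
    # Add dummy columns missing from this batch (as 0) and drop ones unseen in training
    encoded = encoded.reindex(columns=feature_columns, fill_value=0.0)
    # Fill remaining gaps with each column's training mean (stored on the scaler)
    encoded = encoded.fillna(pd.Series(scaler.mean_, index=feature_columns))
    # transform(), not fit_transform(): reuse the training mean and variance
    return scaler.transform(encoded)
```

Filling gaps with the training mean means a missing value becomes 0 after scaling, the same treatment mean imputation gives during training.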

Load Model and Predict:
    Using the reference name we gave to the model in the Train script, retrieve the model from the pickle object and predict the results for the cleansed and standardized scoring data. The following code in the SVR Apply model script does that:

       import base64
       import pickle
       import pandas as pd

       ## Load the pickle object that we saved during the Train Model phase. The model is stored
       ## as an element of a dictionary; fetch it using the reference name given at train time.
       pickleobj = pickle.loads(base64.b64decode(bytes(model.data, 'utf-8')))
       SVR_Model = pickleobj['SVRegression']

       ## Predict values and wrap them in a single-column DataFrame.
       y_pred = SVR_Model.predict(data)
       y_pred_df = pd.DataFrame(y_pred.reshape(-1, 1), columns=['PredictedValue'])

If the includeInputColumns option is set to True, the framework appends the predicted results to the input columns and returns the complete dataframe.
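The framework performs this step itself. The sketch below only illustrates what the combined output looks like and one pitfall to keep in mind if you ever join predictions to inputs yourself: pd.concat aligns rows by index, so an input frame with a non-default index must be reset first. The function name append_predictions is hypothetical and this is not OAC's code.

```python
import pandas as pd


def append_predictions(input_df, y_pred_df):
    """Roughly what includeInputColumns=True produces (illustrative, not OAC's code)."""
    # Reset the input index so rows line up with the 0..n-1 index of y_pred_df
    return pd.concat([input_df.reset_index(drop=True), y_pred_df], axis=1)


inputs = pd.DataFrame({"amount": [10.0, 20.0]}, index=[7, 9])
preds = pd.DataFrame({"PredictedValue": [1.1, 2.2]})
result = append_predictions(inputs, preds)
```

Without the reset_index call, the index labels 7 and 9 would not match 0 and 1, and the result would contain four half-empty rows instead of two complete ones.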

This concludes the process of developing scripts for Train and Apply of Custom models.


Related Blogs: Prepare data using inbuilt functions in OAC, How to add Related Datasets, How to Populate Quality Tab

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Comments:

ttt said...

Hi, thanks for the post!

Can you please explain how I can upload my own custom training model script?
I have tried to upload it in DVD via Create > Script, uploading the one from the Oracle Analytics Examples Library (OAC.ML.SupportVectorRegression.train), but when I click OK it doesn't appear in the list of scripts. I can see it only after I upload the .dva file. I'm asking because I want to know how I can upload other custom train scripts.
