Oracle Underground BI & Dataviz: How to Create Related Datasets for ML Models in Oracle Analytics Cloud

Tuesday, March 27, 2018


In this blog post we will discuss how to add related datasets to custom Machine Learning models in Oracle Analytics Cloud (OAC).

After a Machine Learning model is created, it is important to evaluate how well it performs before we go ahead and apply it. Common accuracy metrics include Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute Error (RAE) and residuals for numeric prediction (regression) algorithms, and False Positive Rate (FPR), False Negative Rate (FNR) and the confusion matrix for classification algorithms. The Machine Learning feature in Oracle Analytics Cloud has inbuilt methods to compute most of these metrics and store them in related datasets. Related datasets are tables/datasets that contain information about the model, such as its accuracy metrics and prediction rules. Our previous blog post, Understanding the Performance of a Oracle DV Machine Learning models using related datasets feature, covers related datasets in depth.
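For reference, the standard textbook definitions of several of these metrics are sketched below in Python with NumPy. This is not the OAC inbuilt implementation; the function names regression_metrics and binary_rates are hypothetical, and the classification part assumes a binary problem in which the label 1 marks the positive class.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: return MAE, RMSE and RAE for numeric predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred  # per-row residuals
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    # RAE: total absolute error relative to always predicting the mean
    rae = np.sum(np.abs(residuals)) / np.sum(np.abs(y_true - y_true.mean()))
    return mae, rmse, rae

def binary_rates(y_true, y_pred, positive=1):
    """Hypothetical helper: return [[TN, FP], [FN, TP]], FPR and FNR."""
    actual = np.asarray(y_true) == positive
    predicted = np.asarray(y_pred) == positive
    tp = int(np.sum(actual & predicted))
    tn = int(np.sum(~actual & ~predicted))
    fp = int(np.sum(~actual & predicted))
    fn = int(np.sum(actual & ~predicted))
    fpr = fp / (fp + tn)  # share of actual negatives flagged as positive
    fnr = fn / (fn + tp)  # share of actual positives that were missed
    return [[tn, fp], [fn, tp]], fpr, fnr

mae, rmse, rae = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
cm, fpr, fnr = binary_rates([1, 0, 1, 0, 1], [1, 1, 0, 0, 1])
```

Here the regression example gives MAE = 0.5 and RAE = 0.375, and the classification example has one false positive out of two actual negatives (FPR = 0.5) and one missed positive out of three (FNR = 1/3).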

In this blog post we will talk about how to add such related datasets in custom Train Model code. Oracle Analytics Cloud provides inbuilt methods for this. The user has to define the structure of each dataset, i.e., the columns it should contain, the data type of each column and, for numeric columns, the aggregation rule. Once the required related datasets are added, they can be found under the Related tab in the Model Inspect pane.
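The three pieces of structure (column names, datatypes and aggregation rules) can also be derived from a pandas dataframe. The helper below, infer_mappings, is hypothetical and not part of OAC; it assumes that numeric columns should be stored as "double" and averaged, and that every other column is a "varchar" attribute with no aggregation, mirroring the datatype strings used in the Predicted Results example in this post.

```python
import pandas as pd

def infer_mappings(df, varchar_len=100, numeric_aggr="avg"):
    """Hypothetical helper: build a name/datatype/aggr_rule frame for df.

    Numeric (non-boolean) columns become "double" with numeric_aggr as their
    aggregation rule; every other column becomes varchar(varchar_len) with "none".
    """
    names, datatypes, rules = [], [], []
    for col in df.columns:
        dtype = df[col].dtype
        is_number = (pd.api.types.is_numeric_dtype(dtype)
                     and not pd.api.types.is_bool_dtype(dtype))
        names.append(col)
        datatypes.append("double" if is_number else f"varchar({varchar_len})")
        rules.append(numeric_aggr if is_number else "none")
    return pd.DataFrame({"name": names, "datatype": datatypes, "aggr_rule": rules})

sample = pd.DataFrame({"Species": ["setosa", "virginica"],
                       "PredVal": ["setosa", "setosa"],
                       "PredProb": [0.97, 0.61]})
mappings = infer_mappings(sample)
```

Rows for extra columns such as Target and Model Name, which appear in the mappings of the Predicted Results example but not in its dataframe, would still have to be appended by hand.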

Let us discuss in detail how to add related datasets for a model:

The ModelDataset() class, implemented in the Model module, is a generic class that represents a related dataset. Pass it the name of the dataset you are creating, along with the dataframe that holds the data and a mappings dataframe that describes its columns; it returns a related dataset. The generic Model class has an inbuilt method called add_output_dataset() that adds the passed dataset as a related dataset for that model. The following lines of code show how to add a sample related dataset called "Predicted Results" using the df1 dataframe.

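import pandas as pd

# target, y_test, y_pred1, y_pred_prob1 and clf1 come from the earlier training steps
# PredProb: probability the classifier assigned to the class it predicted for each row
class_list = list(clf1.classes_)
pred_prob1 = [probs[class_list.index(pred)] for probs, pred in zip(y_pred_prob1, y_pred1)]

df1 = pd.DataFrame({target: y_test, "PredVal": y_pred1, "PredProb": pred_prob1})

# One entry per column: column name, datatype and aggregation rule
df1_mappings = pd.DataFrame({
    'name': [target, 'PredVal', 'PredProb', 'Target', 'Model Name'],
    'datatype': ["varchar(100)", "varchar(100)", "double", "varchar(100)", "varchar(2000)"],
    'aggr_rule': ["none", "none", "avg", "none", "none"]})

# ModelDataset and model are provided by the OAC custom-script framework (Model module)
model.add_output_dataset(ModelDataset("Predicted Results", df1, df1_mappings))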
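The PredProb list comprehension picks, for each row, the probability of the class the model actually predicted. The same lookup can be written with NumPy fancy indexing, as sketched below; the small arrays are made-up stand-ins for clf1.classes_, the predict_proba() output and the predict() output, and the loop form from the snippet above is kept for comparison.

```python
import numpy as np

# Made-up stand-ins for clf1.classes_, clf1.predict_proba(X) and clf1.predict(X)
classes = np.array(["no", "yes"])
y_pred_prob = np.array([[0.80, 0.20],
                        [0.30, 0.70],
                        [0.45, 0.55]])
y_pred = np.array(["no", "yes", "yes"])

# Column position of each predicted label within the classes array
index_of = {label: i for i, label in enumerate(classes)}
class_idx = np.array([index_of[label] for label in y_pred])

# Row i, column class_idx[i]: probability of the predicted class
pred_prob = y_pred_prob[np.arange(len(y_pred)), class_idx]

# The list-comprehension form from the snippet above, for comparison
loop_version = [y_pred_prob[i][list(classes).index(y_pred[i])] for i in range(len(y_pred))]
```

For many scikit-learn classifiers predict() returns the class with the highest probability, in which case pred_prob equals y_pred_prob.max(axis=1); the indexed form does not rely on that assumption.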

The mappings dataframe maps each column name to its datatype and aggregation rule. The framework uses some of these related datasets to populate the Quality tab on the Model Inspect page. For details, see the blog post How to Populate Quality Tab.
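To illustrate what an aggregation rule means for a column, the sketch below rolls a small results table up the way a BI tool might: columns whose rule is "none" become grouping attributes and the remaining columns are aggregated. The helper aggregate_by_rules and the rule-to-pandas translation table are assumptions for illustration, not how OAC applies the rules internally.

```python
import pandas as pd

# Assumed translation from mapping rules to pandas aggregation names
RULE_TO_PANDAS = {"avg": "mean", "sum": "sum", "min": "min", "max": "max", "count": "count"}

def aggregate_by_rules(df, mappings):
    """Hypothetical helper: roll df up the way its mappings describe.

    Columns with aggr_rule "none" act as grouping attributes (at least one is
    assumed); every other column is aggregated with the matching pandas
    function. Mapping rows for columns absent from df are ignored.
    """
    rules = dict(zip(mappings["name"], mappings["aggr_rule"]))
    present = [c for c in df.columns if c in rules]
    keys = [c for c in present if rules[c] == "none"]
    measures = {c: RULE_TO_PANDAS[rules[c]] for c in present if rules[c] != "none"}
    return df.groupby(keys, as_index=False).agg(measures)

results = pd.DataFrame({"Species": ["setosa", "setosa", "virginica"],
                        "PredVal": ["setosa", "setosa", "setosa"],
                        "PredProb": [0.9, 0.7, 0.6]})
results_mappings = pd.DataFrame({
    "name": ["Species", "PredVal", "PredProb", "Target", "Model Name"],
    "datatype": ["varchar(100)", "varchar(100)", "double", "varchar(100)", "varchar(2000)"],
    "aggr_rule": ["none", "none", "avg", "none", "none"]})
summary = aggregate_by_rules(results, results_mappings)
```

With the "avg" rule, the two setosa rows collapse into one row whose PredProb is their mean (0.8), while the attribute columns are kept as grouping keys.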

Related Blogs: How to build Train/Apply custom model scripts in OAC, How to Populate Quality Tab, How to use inbuilt methods in OAC to Prepare data for Training/Applying ML Model
