Tuesday, March 27, 2018

How to Populate the Quality Tab in the ML Model Inspect Page in Oracle Analytics Cloud

In this blog post we will discuss how to populate the Quality tab of a Machine Learning model's Inspect page in OAC.

Assessing the quality of a Machine Learning model is an important step in evaluating its performance. Metrics such as RSE, RAE, residuals, and adjusted R-squared help assess the quality of the model's predictions. If the error metrics are unsatisfactory or do not meet the user's goals, the user can tune the model further until the required level of accuracy is reached. It is therefore important to present a model's quality information in an intuitive and comprehensive way, so that users can decide on the next course of action. The Quality tab in the Inspect page of an ML model in Oracle Analytics Cloud visualizes these accuracy details. Here is a snapshot of the Quality tab for a Linear Regression model that predicts Bike Rental Count:

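To make the metrics above more concrete, here is a minimal standard-library Python sketch of how they are commonly computed. It assumes that RSE and RAE stand for relative squared error and relative absolute error (errors normalized by the spread of the actual values around their mean), and that `num_features` is the number of predictors used for adjusted R-squared. The function name `regression_quality` and the dictionary keys are illustrative only, not part of the OAC API.

```python
from math import sqrt


def regression_quality(actual, predicted, num_features):
    """Compute common quality metrics for a numeric prediction model.

    actual, predicted: sequences of numbers of equal length.
    num_features: number of predictors, used only for adjusted R-squared.
    """
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n <= num_features + 1:
        raise ValueError("adjusted R-squared needs more rows than num_features + 1")
    mean_actual = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    sse = sum(r * r for r in residuals)                # sum of squared errors
    sae = sum(abs(r) for r in residuals)               # sum of absolute errors
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    sad = sum(abs(a - mean_actual) for a in actual)    # total absolute deviation
    if sst == 0:
        raise ValueError("actual values are constant; relative errors are undefined")
    rse = sse / sst  # relative squared error
    rae = sae / sad  # relative absolute error
    r2 = 1 - rse     # R-squared
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - num_features - 1)
    return {
        "MAE": sae / n,
        "RMSE": sqrt(sse / n),
        "RSE": rse,
        "RAE": rae,
        "R-Squared": r2,
        "R-Squared Adjusted": adj_r2,
    }
```

For example, `regression_quality([3, 5, 7, 9], [2.5, 5.5, 7, 9.5], num_features=1)` gives an RSE of 0.0375 and an R-Squared of 0.9625, so roughly 96% of the variance in the actual values is explained by the model.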
This post explains how to populate the Quality tab for a custom model. The tab is populated from Related datasets that the Train Model script adds; details on adding a Related dataset are in the related post How to create Related Datasets. The Quality tab is populated automatically when the script adds the following Related datasets:

For Numeric Prediction: If Residuals and Statistics datasets are added as Related datasets in the Train Model script, the ML framework in OAC reads them and populates the Quality tab. The graph is drawn from the Residuals dataset, and all the error metrics come from the Statistics dataset. The snapshot above shows what the Quality tab of a numeric prediction model looks like. Here is sample code that shows how to create the Residuals and Statistics Related datasets:

        # Residuals dataset: used to draw the graph in the Quality tab
        residuals_mappings = None
        residuals_ds = ModelDataset("Residuals", residuals_df, residuals_mappings)

        # Statistics dataset: supplies the error metrics in the Quality tab
        statistics_mappings = None
        statistics_ds = ModelDataset("Statistics", statistics_df, statistics_mappings)
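The snippet above assumes that `residuals_df` and `statistics_df` were built earlier in the script. Below is a minimal sketch of what they might contain. The helper names and the column names (`Actual`, `Predicted`, `Residual`, `Metric`, `Value`) are hypothetical, since the exact schema OAC expects is not covered in this post. The helpers return plain column dictionaries so the sketch runs without extra libraries; in a Train Model script you would typically wrap each one with `pandas.DataFrame(...)`, which accepts a dict of equal-length lists.

```python
def build_residuals_table(actual, predicted):
    """Column data for a Residuals dataset (hypothetical column names)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return {
        "Actual": list(actual),
        "Predicted": list(predicted),
        # residual = actual value minus predicted value
        "Residual": [a - p for a, p in zip(actual, predicted)],
    }


def build_statistics_table(metrics):
    """Turn a {metric name: value} dict into two parallel columns, keeping insertion order."""
    names = list(metrics)
    return {"Metric": names, "Value": [metrics[name] for name in names]}
```

Usage: `build_residuals_table([10, 20, 30], [12, 18, 30])["Residual"]` is `[-2, 2, 0]`, and `build_statistics_table({"RSE": 0.1, "RAE": 0.2})` yields one row per metric.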

For Classification: For binary and multi-class classification models, the Confusion Matrix and Statistics Related datasets are used to populate the Quality tab. Here is a sample snapshot of the Quality tab for classification models:

The Confusion Matrix Related dataset populates the confusion matrix table shown in the image, and the Statistics Related dataset populates the metrics. Here is sample code that shows how to add the Confusion Matrix and Statistics Related datasets:

        # Collect the Related datasets in a dict keyed by dataset name
        metrics = dict()
        metrics['Statistics'] = stats               # metrics shown in the Quality tab
        metrics['Confusion Matrix'] = confMatrix    # confusion matrix table
        self.add_datasets(metrics=metrics, model=model)
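Similarly, `stats` and `confMatrix` in the snippet above are assumed to be prepared earlier in the script. The following minimal standard-library sketch computes a long-form confusion matrix (one actual/predicted/count row per label pair) and a few common classification metrics. The function names, metric labels, and row layout are illustrative assumptions, not the format OAC requires.

```python
from collections import Counter


def confusion_matrix_rows(actual, predicted):
    """Long-form confusion matrix: one (actual, predicted, count) row per label pair."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return [(a, p, counts[(a, p)]) for a in labels for p in labels]


def classification_statistics(actual, predicted):
    """Overall accuracy plus per-class precision and recall."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need non-empty actual and predicted of equal length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    correct = sum(counts[(lbl, lbl)] for lbl in labels)
    stats = {"Accuracy": correct / len(actual)}
    for lbl in labels:
        tp = counts[(lbl, lbl)]                               # predicted lbl and actually lbl
        predicted_as = sum(counts[(a, lbl)] for a in labels)  # all rows predicted as lbl
        actually_is = sum(counts[(lbl, p)] for p in labels)   # all rows whose actual is lbl
        stats[f"Precision ({lbl})"] = tp / predicted_as if predicted_as else 0.0
        stats[f"Recall ({lbl})"] = tp / actually_is if actually_is else 0.0
    return stats
```

In a Train Model script, the rows could be turned into a DataFrame with `pandas.DataFrame(rows, columns=["Actual", "Predicted", "Count"])` before being stored under the `'Confusion Matrix'` key; the column names there are again hypothetical.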

The Quality tab is updated automatically when the user changes the values of user or model tuning parameters and retrains the model.

Related Blogs: How to build Train/Apply Model Custom Scripts in OAC, How to create Related Datasets, How to use inbuilt methods in OAC to Prepare data for Training/Applying ML Model
