Wednesday, April 22, 2020

How to retrieve valuable information related to your ML models in Oracle Database - Part I


Oracle Database offers powerful Machine Learning algorithms that enable data analysts to discover hidden patterns and insights. With Oracle Machine Learning (OML), you can build and apply predictive models inside the database that help predict customer behavior, identify cross-selling opportunities, build customer profiles and detect anomalies. These algorithms are implemented as SQL functions that can be directly invoked inside the database without moving the data and thereby leveraging the inherent strengths of the database.

OML offers a broad range of in-database algorithms to solve different kinds of business problems. For classification type of problem, there are algorithms like Naïve Bayes, Decision Tree, Support Vector Machine etc. For clustering , there is Enhanced K Means, O Clusters and so on.
For every ML model that is created in the database, different kinds of information related to the model are stored in several specific DB Tables and Views within the database.

In this blog, let’s create a simple OML model and explore the various mining model views that are created with the model in the database and how these can be leveraged to get valuable information about the model itself.

Sample Problem 

Let’s assume we are given a demographic data set about a set of customers and we would like to predict the customer response to an affinity card program using classification function based on Decision Tree algorithm. This dataset contains 3000 records with customer attributes like cust_gender, cust_marital_status, education, occupation, household_size and so on. Grain of this dataset is cust_id so every record has a unique cust_id.


Create an ML Model

Let’s create a classification model using Decision Tree (DT) to address this problem. The database offers several options to enhance the overall accuracy of a model like how to handle missing value for predictors, outlier treatment for predictors, method for binning high cardinality data and so on. For the purpose of this blog,  let us use the basic minimum steps to build a DT model.
Default mining algorithm used to build a classification model is Naïve Bayes. In order to build a Decision Tree classification model, we need to first create and populate a settings table and use this settings table as input to the model building procedure.
- Sample Syntax to create/populate model settings table
CREATE TABLE dt_sh_sample_settings
(setting_name  VARCHAR2(30),
setting_value VARCHAR2(4000));

                  INSERT INTO dt_sh_sample_settings VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);

- Sample Syntax to create a Classification model using Decision Tree algorithm.
                BEGIN
                DBMS_DATA_MINING.CREATE_MODEL(
model_name          => 'DT_TEST,
mining_function     => dbms_data_mining.classification,
data_table_name     => 'mining_data_build_v',
case_id_column_name => 'cust_id',
target_column_name  => 'affinity_card',
settings_table_name => 'dt_sh_sample_settings');
END;

Query model related information

Every time an ML model is built, there are several details about the model that are stored in a few data mining related views. Following section describes the important mining views in the database that can be leveraged to understand the model details. The information from these views can be easily visualized using Oracle Analytics Cloud.

a)       ALL_MINING_MODELS – This view in the database describes all the mining models accessible to the current user. It has details about the model owner, model_name, mining_function used in the model (Classification, Regression, Clustering and so on) , algorithm used to build the model (Naïve Bayes, Decision Tree and so on) , model creation date , model size and so on.  This view has one record for every OML model created.
There are two related views: 
             i) DBA_MINING_MODELS that describes all the mining models in the database.
             ii) USER_MINING_MODELS  that describes the models owned by the current user.

Information in this view can be visualized within Oracle Analytics Cloud by creating a dataset sourcing from this view.
Important Note: When you navigate to the schema in the data set creation screen, these data mining views are not listed as objects. However, you can enter a custom SQL similar to ‘Select * from ALL_MINING_MODELS’ to retrieve the model details.


Here's an example of the output of the ALL_MINING_MODELS view related to the model called DT_TEST.


b)       ALL_MINING_MODEL_ATTRIBUTES – This is an important view and contains information about the columns in the training data that were used to build the model. For each attribute, details about the attribute name, datatype, whether attribute is Target etc is available in this view. When this model is applied on a data set for scoring, columns with the same names and data types should be available in the scoring data set. In this case as well, there are two related views DBA_MINING_MODEL_ATTRIBUTES and USER_MINING_MODEL_ATTRIBUTES. 

Here's a sample output of this view for a model called DT_TEST as visualized within OAC. 




c)     ALL_MINING_MODEL_SETTINGS – This view describes the settings of the mining model. Here's a example of the model settings for a model called DT_TEST visualized using OAC.




d)       ALL_MINING_MODEL_VIEWS - This is another important view that provides slightly different kind of information. Every time an ML model is created in the database, there are several underlying model views that get created relevant to this model. The number and kind of these model views created differ from algorithm to another. An ML model built using Naive Bayes algorithm will have a set of related model views. An ML model built using Decision Tree will have another set of related model views created and so on. For example, consider a Decision Tree model called DT_TEST. Following are the model views created :              


Note that the view names have a fixed format. It begins with DM$V followed by an alphabet that is specific to the model view followed by the model name. DM$V<Alphabet><Model Name>. 
We will refer to the details on these specific model views for various mining algorithms in an upcoming Part II of blog entry on this topic. 

Conclusion

For every ML model that is created in the Oracle Database, there are a lot of metadata about the model available as part of the dictionary views. If you are interested in knowing more about this topic, kindly refer to Oracle Documentation, or stay tuned for the second part of this blog entry coming soon.

No comments:

Post a Comment