Oracle Database offers powerful Machine Learning algorithms that enable data analysts to discover hidden patterns and insights. With
Oracle Machine Learning (OML), you can build and apply predictive models inside the
database that help predict customer behavior, identify cross-selling opportunities,
build customer profiles and detect anomalies. These algorithms are implemented
as SQL functions that can be directly invoked inside the database without moving the data and thereby
leveraging the inherent strengths of the database.
OML offers a broad range of in-database
algorithms to solve different kinds of business problems. For classification type
of problem, there are algorithms like Naïve Bayes, Decision Tree, Support Vector
Machine etc. For clustering , there is Enhanced K Means, O Clusters and so on.
For every ML model that is created in the database, different
kinds of information related to the model are stored in several specific DB Tables and Views within the database.
In this blog, let’s create a simple OML model and explore the various mining model views that are created with the model in the database and how these can be leveraged to get valuable information about the model itself.
In this blog, let’s create a simple OML model and explore the various mining model views that are created with the model in the database and how these can be leveraged to get valuable information about the model itself.
Sample Problem
Let’s
assume we are given a demographic data set about a set of customers and we
would like to predict the customer response to an affinity card program using classification function based on Decision Tree algorithm. This dataset contains 3000
records with customer attributes like cust_gender, cust_marital_status,
education, occupation, household_size and so on. Grain of this dataset is
cust_id so every record has a unique cust_id.
Create an ML Model
Let’s create a classification model using Decision
Tree (DT) to address this problem. The database offers several options to enhance
the overall accuracy of a model like how to handle missing value for
predictors, outlier treatment for predictors, method for binning high
cardinality data and so on. For the purpose of this blog, let us use the basic minimum steps to
build a DT model.
Default mining algorithm used to build a classification
model is Naïve Bayes. In order to build a Decision Tree classification model,
we need to first create and populate a settings table and use this settings
table as input to the model building procedure.
- Sample Syntax to create/populate model settings table
CREATE TABLE
dt_sh_sample_settings
(setting_name VARCHAR2(30),
setting_value
VARCHAR2(4000));
INSERT INTO dt_sh_sample_settings VALUES
(dbms_data_mining.algo_name,
dbms_data_mining.algo_decision_tree);
- Sample Syntax to create a Classification model using
Decision Tree algorithm.
BEGIN
DBMS_DATA_MINING.CREATE_MODEL(
model_name => 'DT_TEST,
mining_function => dbms_data_mining.classification,
data_table_name => 'mining_data_build_v',
case_id_column_name
=> 'cust_id',
target_column_name => 'affinity_card',
settings_table_name
=> 'dt_sh_sample_settings');
END;
Query model related information
Every time an ML model is built, there are several
details about the model that are stored in a few data mining related views. Following
section describes the important mining views in the database that can be
leveraged to understand the model details. The information from these views can be easily visualized using Oracle Analytics Cloud.
a) ALL_MINING_MODELS
– This view in the database describes all the mining models accessible to the current user. It has
details about the model owner, model_name, mining_function used in the model
(Classification, Regression, Clustering and so on) , algorithm used to build
the model (Naïve Bayes, Decision Tree and so on) , model creation date , model
size and so on. This view has one record
for every OML model created.
There are two related views:
i) DBA_MINING_MODELS that describes all the mining models in the database.
ii) USER_MINING_MODELS that describes the models owned by the current user.
There are two related views:
i) DBA_MINING_MODELS that describes all the mining models in the database.
ii) USER_MINING_MODELS that describes the models owned by the current user.
Information in this view can be
visualized within Oracle Analytics Cloud by creating a dataset sourcing from
this view.
Important Note: When you navigate to the schema in the data set creation
screen, these data mining views are not listed as objects. However, you can enter a
custom SQL similar to ‘Select * from ALL_MINING_MODELS’ to retrieve the model
details.
Here's an example of the output of the ALL_MINING_MODELS view related to the model called DT_TEST.
b) ALL_MINING_MODEL_ATTRIBUTES
– This is an important view and contains information about the columns in the training data that
were used to build the model. For each attribute, details about the attribute name, datatype, whether attribute is Target etc is available in this view. When this model is applied on a data set for scoring, columns with the same names and data types should be available in the scoring data set. In this case as well, there are two related views DBA_MINING_MODEL_ATTRIBUTES and USER_MINING_MODEL_ATTRIBUTES.
c) ALL_MINING_MODEL_SETTINGS
– This view describes the settings of the mining model. Here's a example of the model settings for a model called DT_TEST visualized using OAC.
d) ALL_MINING_MODEL_VIEWS - This is another important view that provides slightly different kind of information. Every time an ML model is created in the database, there are several underlying model views that get created relevant to this model. The number and kind of these model views created differ from algorithm to another. An ML model built using Naive Bayes algorithm will have a set of related model views. An ML model built using Decision Tree will have another set of related model views created and so on. For example, consider a Decision Tree model called DT_TEST. Following are the model views created :
Note that the view names have a fixed format. It begins with DM$V followed by an alphabet that is specific to the model view followed by the model name. DM$V<Alphabet><Model Name>.
We will refer to the details on these specific model views for various mining algorithms in an upcoming Part II of blog entry on this topic.