Monday, May 25, 2020

How to retrieve valuable information related to your ML models in Oracle Database - Part II


In part I of the blog on this topic (http://oracledataviz.blogspot.com/2020/04/how-to-retrieve-valuable-information.html), we talked about the various data mining views that store information about every ML model that gets created in the Oracle database. These views have information about ML models size, models settings, attributes used in the models etc. This information is available for each ML model created in the database irrespective of the mining function or the algorithm used.

We also talked about one particular view called ALL_MINING_MODEL_ViEWS which contains the names of all the underlying model views that gets created behind the scenes every time we create an ML model. The number and kind of views that get created differ from one algorithm to another. An ML model built using Naive Bayes algorithm will have a set of related model views. An ML model built using Decision Tree will have another set of related model views created and so on.

In this blog, let’s look at these specific model Views that get created for different algorithms.
Let us consider the same Sample Problem discussed in Part I of the blog which is to determine a customer’s response to an affinity card program. Let’s build an ML model called DT_TEST using Decision Tree algorithm. (Refer to Part I for Model training syntax details).

Once the model is created, if you query ALL_MINING_MODEL_ViEWS for the model_name as DT_TEST, you will find that the following underlying views have been created.




These view names have a fixed format. It begins with DM$V followed by an alphabet that is specific to the model view followed by the model name. DM$V<Alphabet><Model Name>. 
Let’s look at some of these views, see what information they contain and how to visualize them using Oracle Analytics.


a) DM$VCDT_TEST – This is the scoring cost view and describes the scoring matrix for classification models. The view has actual_target_value, predicted_target_value and cost.
Conclusion
x





     b) DM$VGDT_TEST - This view describes global statistics related to the model build like number of rows used in the model build, convergence status and so on.

     c)  DM$VIDT_TEST - This is the node statistics view and describes the statistics associated with individual nodes in the decision tree. The statistics include a target histogram for the data in the node. For every node in the tree, this view has information about predicted_target_value,  actual_target_value and node support.
Here's a sample visualization of this information for the DT_TEST model that resulted in a tree with 19 nodes.




     d) DM$VMDT_TEST - This view describes the cost matrix used by the decision tree build.

     e) DM$VODT_TEST - This view contains the high level node description. It contains the Node and Parent information that allows us to understand the hierarchy of nodes. For every node, there is information about the attribute associated with the node, operator (IN, > , <> etc) and the attribute value. The attribute value is stored as an XMLTYPE in the view. In order to visualize it within Oracle Analytics, extract the string value from the XML in the data set creation similar to what's shown below.




Once the data set is available in Oracle Analytics, a tree diagram helps in understanding the hierarchy of nodes in the decision tree. Attribute details about each node is also available in this view providing more insight about attributes that are relevant in the input dataset.



     f) DM$VPDT_TEST - This is the split information view and describes the decision tree hierarchy and the split information for each level in the Decision Tree. For each node in the tree, this view has details about the attribute associated with the node and the split type (Main/Surrogate).



     g) DM$VSDT_TEST - This is the computed settings view and lists all the algorithm computed settings as a name-value pair.

     h) DM$VTDT_TEST - This is the target map view and describes the target distribution. The view contains target_value, target_count and target_weight information.

     i) DM$VWDT_TEST - This is the alert view and lists all the alerts issued during the model building process.

While a Decision Tree model results in a specific set of views , a Naive Bayes model called NB_TEST will result in a slightly different set of views as shown below.


Conclusion
Oracle Database maintains a lot of valuable information in several dictionary views when a mining model is built. The information in these views helps in understanding several aspects like what attributes are important in the input data set, what is the distribution of the target value, details about the model that can help in iterative-ly improving the overall accuracy of the model predictions and so on. Oracle Analytics helps in visualizing all this information in an intuitive and interactive manner.

Thursday, May 21, 2020

Using of Session and Repository variables in Oracle Analytics (OAC and OAS)

In this blog, let's explore the different types of variables existing within OAC 5.6 infrastructure with a specific focus on repository variables and session variables. We will be looking at how we can leverage these two types of variables in Oracle Self Service Analtyics.

Repository Variables:

A repository variable has a single value at any point in time. Repository variables are substitutes for literals and constants and the Oracle OAC server substitutes this value in any logical expression, either in the semantic layer metadata or in a project/analysis calculation or filter expression. There are 2 different types of repository variables :

1) Static Repository variable

The value of a static repository variable is initialized inside the variable dialog interface of OAC Admintool. The value then persists and does not change until an administrator decides to change it.
For example, you can define a static variable called Max_Age and assign a value to it.

The syntax to access a static repository variable from any logical calculation or filter is as follows :
VALUEOF("<name_of_static_variable>"). In the case of the example it will be something like :


2) Dynamic Repository Variable

Dynamic repository variables in the same way as static variables, but the values are refreshed by data returned from pre-defined queries in the semantic layer (Repository) metadata. When defining a dynamic repository variable, one first creates an 'initialization block' that contains a specific SQL query. You also set up a schedule that the Oracle BI Server will follow to execute the query and periodically refresh the value of the variable.
For example, assume you have a table that contains information on various trips done by a cab service.

Now you have to keep track of the trip which has the traveled the longest distance.
Distance is a metric that will be calculated using some spatial functions on the columns TRIP_START and TRIP_END.
Let us say that there are a few thousand records in your data, and so far the maximum distance is, say 1000 miles. Without using dynamic repository variables, you would describe the trip containing the longest distance with an expression such as: Trips.distance >= 1000
This content statement becomes invalid as new data is added to the recent source which might actually change the value of the maximum distance. Instead of modifying the variable definition manually, dynamic repository variables can be set up to do it automatically.

To create dynamic repository variables, you create both the dynamic variables and an initialization block, then match both together.

To create an initialization block, go to Action->New->Repository->Initialization block.
Give it a name and schedule the time & refresh interval. Then click on the Edit Data Source and point it to the correct connection pool. Type in the sql after choosing the correct connection pool.

In our example : 
select trip_id, sdo_geom.sdo_distance(trip_start, trip_end, 0.005) distance
from trips
where  sdo_geom.sdo_distance(trip_start, trip_end, 0.005)=
(select max(sdo_geom.sdo_distance(trip_start, trip_end, 0.005)) from trips)

The first column returned by the above sql is the trip_id and the second column is the max distance. In our example, let's assign these values to two distinct dynamic repository variables :


Click on Edit Data Target and add two variables and name it Longest Trip and Maximum_Distance to associate with the query result.


Simply save the RPD, upload it on to an OAC instance and bring the subject area into a project or Analysis. The syntax for accessing the dynamic repository variables is the same as for a static repository variables.
VALUEOF("<name_of_dynamic_repo_variable>")

Create 2 custom calculations for the 2 dynamic repo variables with the following syntax:
VALUEOF("Longest Trip")
VALUEOF("Maximum_Distance")
and display it on a DV canvas.


A common use of these variables is to set filters for use in Oracle Analytics.

Session variables :

Session variables are similar to dynamic repository variables in that they obtain their values from initialization blocks. Unlike dynamic repository variables, however, the initialization of session variables is not scheduled. When a user logs-in and begins a session, the Oracle Analytics server creates new instances of session variables and initializes them.
Unlike a repository variable, there are as many instances of a session variable as there are active sessions (users) on the Oracle BI Server. Each instance of a session variable could be initialized to a different value. Here again, there are 2 types of Session variables :

1) System Session Variables:

System session variables are session variables that the Oracle Analytics server use for specific purposes. They have reserved names and cannot be used for other kinds of variables. There exists many system session variables, some of them are listed below :

USER
This variable holds the value the user enters with login name. This variable is typically populated from the LDAP profile of the user.
USERGUID
This variable contains the Global Unique Identifier (GUID) of the user and it is populated from the LDAP profile of the user.
GROUP
It contains the groups to which the user belongs. When a user belongs to multiple groups, group names are concatenated separated by semicolons (Example - GroupA;GroupB;GroupC). 
ROLES
This variable contains the application roles to which the user belongs. When a user belongs to multiple roles,  the role names are concatenated separated by semicolons (Example - RoleA;RoleB;RoleC). 
ROLEGUIDS
It contains the GUIDs for the application roles to which the user belongs. GUIDs for application roles are the same as the application role names.
Let us consider the system session variable USER which holds the value the user enters as his or her logon name. The syntax for accessing the System session variable in DV is as follow.
VALUEOF("NQ_SESSION.<name_of_system_session_variable>")


This is similar to syntax for repository variables but with prefacing their names with NQ_SESSION.
This syntax can be used as an argument for in any logical calculations, or custom filters. When dragging the custom calculation on to a DV canvas, the value of system session variable is seen.

The Oracle Analytics documentation will help you get an exhaustive list on all the system session variables along with their description. 

2) Non-System Session Variables:

These are system variables that an administrator can create using the Oracle BI Administration tool. They are part of the semantic layer of Oracle Analytics. The use of Session variable is wide, they are basically holding session based value that refresh everytime a user connects, using any particular predefined SQL. An example of non-system session variables could be to set user filters for example.

To define Non-System Session Variables, simply go to the variable manager in the admin tool and click on Action -> New -> Session -> Initialization block and replicate the one shown in the below diagram.

Now a non system session variable called Currency is created and the user can now access them in DV using the same syntax of system session variables. VALUEOF("NQ_SESSION.<name_of_non-system_session_variable>")

Drag and drop this calculation on a canvas to see the value of the currency variable set.