Friday, December 30, 2016

Advanced Analytics: Calculate Attribute Importance using Custom R-Scripts on OracleDV

Resources: OracleBI Public Store, Boruta Documentation, Boruta in Action

Attribute Importance is a method that identifies and ranks the attributes that matter most in predicting a target attribute, or in understanding how strongly each attribute influences the target. For example, in a typical customer satisfaction study, customers are asked to rate individual attributes and then to give an overall Satisfaction rating. Customers do not give equal weight to all attributes; some factors influence the overall rating more than others. Marketers therefore need to identify which of these individual attributes weigh most with customers, so that they can focus their limited resources on improving customer satisfaction for those attributes. In such scenarios the Attribute Importance method comes to the rescue of marketers.

In this blog we will discuss a way to calculate attribute importance on OracleDV using a custom R-Script. The R-Script, along with a sample DV project, can be downloaded from the OracleBI Public Store. For demonstration we use a dataset containing factors, with their metric values, that contribute to diabetes, and we identify the importance of each of these attributes in causing diabetes. The R-Script is easy to deploy and can be used with many other datasets. It uses the Boruta R package, which can be downloaded from the CRAN repository. Boruta follows an all-relevant feature selection method: it captures all features that are relevant to the outcome variable in at least some circumstances.

How does this script work: This script calculates the importance of the Attribute columns (numerical or categorical) in determining the values of the Target column. Boruta uses an all-relevant feature selection method, which runs over multiple iterations. The script returns a summary of the scores each column obtained across these iterations, along with a Decision on whether the column should be considered important in determining the values of the Target column ("Confirmed") or not ("Rejected"). For more information on the Boruta package, please refer to the Boruta Documentation. For a detailed, example-based explanation of how to use the Boruta R package, please refer to this link: Boruta in Action.
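As a rough illustration of the iteration-and-summary flow described above, here is a minimal sketch that calls the Boruta package directly. It is not the script shipped on the OracleBI Public Store: the synthetic data, the column names x1, x2 and noise, and the maxRuns value are all assumptions made for the example. Note that Boruta itself can also label a column "Tentative" when the runs end before it reaches a decision.

```r
# Minimal sketch: run Boruta on a small synthetic dataset (hypothetical data).
library(Boruta)

set.seed(42)
n <- 200
df <- data.frame(
  RecId = seq_len(n),   # record identifier, not a predictor
  x1    = rnorm(n),
  x2    = rnorm(n),
  noise = rnorm(n)      # unrelated to the target by construction
)
# The target depends on x1 and x2 only.
df$Target <- factor(ifelse(df$x1 + df$x2 + rnorm(n, sd = 0.3) > 0, "pos", "neg"))

# Exclude the identifier so it is not scored as an attribute.
features <- df[, setdiff(names(df), "RecId")]

# Boruta compares each attribute against permuted "shadow" copies over many runs.
result <- Boruta(Target ~ ., data = features, doTrace = 0, maxRuns = 50)

# attStats() summarises the per-run importance scores for each attribute.
stats <- attStats(result)
print(stats)
```

The data frame returned by attStats() has one row per attribute, with the columns meanImp, medianImp, minImp, maxImp, normHits and decision.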

Inputs: This script needs a RecId column, a Target column for which the importance of the Attribute columns is computed, and the Attribute columns themselves. Attribute columns can be numerical or categorical.
Optional Input: ColNameList: By default the script assigns the names Column1, Column2, Column3, etc. to the input columns, so the "ColumnName" column of the output shows those same names. To see the actual column names in the output, pass them as an optional input parameter to the script in a comma-separated format, for example: "pregnant,diabetes,age.."
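The renaming behaviour described for ColNameList could look roughly like the sketch below. The helper apply_col_names and the sample data frame are hypothetical and written only for illustration; how the published script actually parses this parameter is not shown in this post.

```r
# Hypothetical helper: apply a comma-separated ColNameList to generic column names.
apply_col_names <- function(df, col_name_list = NULL) {
  if (is.null(col_name_list) || !nzchar(trimws(col_name_list))) {
    return(df)  # keep the default Column1, Column2, ... names
  }
  # Split on commas and drop surrounding whitespace from each name.
  new_names <- trimws(strsplit(col_name_list, ",", fixed = TRUE)[[1]])
  if (length(new_names) != ncol(df)) {
    stop("ColNameList must supply one name per input column")
  }
  names(df) <- new_names
  df
}

# Made-up input with the default generic names.
inputs <- data.frame(Column1 = c(1, 0), Column2 = c("pos", "neg"), Column3 = c(50, 31))
renamed <- apply_col_names(inputs, "pregnant, diabetes, age")
print(names(renamed))
```

Checking that the number of names matches the number of columns avoids silently mislabelling attributes in the output.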

Output: This script returns the Attribute Importance score of each column in determining the values of the Target column, along with the column names passed as optional input.
   ColumnName: Name of the column.
   MeanImp   : Mean of the importance score computed over multiple iterations.
   MedianImp : Median of the importance score computed over multiple iterations.
   MinImp    : Minimum of the importance score computed over multiple iterations.
   MaxImp    : Maximum of the importance score computed over multiple iterations.
   NormHits  : Number of hits normalised to the number of importance source runs.
   Decision  : "Confirmed": the column can be considered important; "Rejected": the column has a very low importance score and can be neglected.
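To make the summary columns concrete, here is a small base-R sketch of how such statistics can be derived from a per-run importance history. The numbers are made up, and the function summarise_importance is hypothetical. It follows my understanding of Boruta's attStats(), where a "hit" is a run in which the attribute scores higher than the best shadow (permuted) attribute, and it leaves out details such as the handling of non-finite scores.

```r
# Toy importance history: rows are runs, columns are attributes (made-up numbers).
imp_history <- cbind(
  age      = c(5.1, 4.8, 5.5, 6.0),
  pregnant = c(1.2, 0.4, 0.9, 1.1)
)
# Importance of the best shadow (permuted) attribute in each run.
shadow_max <- c(1.0, 1.5, 1.0, 1.0)

summarise_importance <- function(imp, shadow_max) {
  data.frame(
    ColumnName = colnames(imp),
    MeanImp    = colMeans(imp),
    MedianImp  = apply(imp, 2, median),
    MinImp     = apply(imp, 2, min),
    MaxImp     = apply(imp, 2, max),
    # Share of runs in which the attribute beat the best shadow attribute.
    NormHits   = colSums(imp > shadow_max) / length(shadow_max),
    row.names  = NULL
  )
}

stats <- summarise_importance(imp_history, shadow_max)
print(stats)
```

In this toy data, age beats the shadow attribute in every run while pregnant does so in only half of them, which is the kind of difference the Decision column reflects.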

Following are the steps to deploy this R-Script in your local OracleDV:

1) Install the Advanced Analytics feature in Oracle DV by clicking on the icon below. This installs the Oracle R deployment. Alternatively, you can install Advanced Analytics by running install_advanced_analytics.cmd, located in <DV_INSTALL_DIRECTORY>

2) If the Boruta R package is not already installed, install it using the following instructions:
    Open the R console (double-click Rgui.exe in <Advanced_Analytics_Install_Dir>\bin\x64) and
    install the Boruta package.
    The R commands to install are:
     Set Proxy:
        > Sys.setenv(http_proxy="http://<your_proxy_host>:<port>")
           Set the proxy appropriate to your network settings.
     Install Package:
        > install.packages("Boruta")
3) Download the R-Script and sample DV project from OracleBI Public Store and unzip the archive.
4) Copy R.AttributeImportance.xml to <DV_INSTALL_DIRECTORY>\OracleBI1\bifoundation\advanced_analytics\script_repository
5) Import the .dva project into Oracle DV. The password for the .dva is Admin123.
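
Before importing the project, it can be worth confirming from the R console that step 2 succeeded. The sketch below is one way to do that; the helper ensure_package is hypothetical and only wraps standard base-R calls. It installs nothing unless install = TRUE is passed explicitly.

```r
# Hypothetical helper: report whether a package is available, optionally installing it.
ensure_package <- function(pkg, install = FALSE) {
  available <- requireNamespace(pkg, quietly = TRUE)
  if (!available && install) {
    install.packages(pkg)  # uses the proxy and repository settings of the session
    available <- requireNamespace(pkg, quietly = TRUE)
  }
  available
}

if (ensure_package("Boruta")) {
  message("Boruta version: ", as.character(packageVersion("Boruta")))
} else {
  message("Boruta is not installed; call ensure_package(\"Boruta\", install = TRUE)")
}
```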

Here is a snapshot:


Deepen Kshetri said...

May I know how MeanImp, MedianImp, MinImp, MaxImp and NormHits are calculated? What are the formulas for all of these?

