Thursday, September 20, 2018

Dynamic Heatmaps in Oracle Analytics

As the Operations head of a chain of multi-brand retail stores, if you are trying to visualize and identify areas with the highest population density or the highest economic activity in order to open new stores, heatmaps are extremely useful. What are heatmaps? A heatmap is a graphical representation of data that uses color-coding to represent different values. In this blog we will discuss the new heatmap visualization on maps in the latest version of Oracle Analytics.

Heatmaps have various applications and can take different forms. For example, they can be used to visualize variance across, and correlation between, variables in a dataset; to give a visual representation of user clicks on a website; or to depict the intensity of data at geographical points on a map visualization. Here is a quick snapshot of a heatmap visualization in Oracle Analytics that shows sales by city:

How to create heatmap-based map visualizations in Oracle Analytics Cloud?
Heatmaps can be created from point geometry based map layers. After visualizing data using a point geometry based map layer, set the Layer type to Heatmap.


Heatmaps in Oracle Analytics can be density based or metric based. In a density based heatmap, color intensity at a point depicts the density, i.e., the number of points, in a particular region of the map visualization. Density heatmaps are generated by populating the Category (Geography) grammar field with the geographic column and leaving the Color grammar field empty. In a metric based heatmap, color intensity at a point is driven by the aggregated value of a metric in that region. To visualize a metric based heatmap, populate the Color grammar field with a metric.

Users are provided multiple options to customize how data is visualized on a heatmap: the radius of each point, the color spectrum (i.e., whether points/areas with the highest density or metric value are displayed in orange, blue, or any other custom color), layer transparency, interpolation, etc. Here is a quick snapshot showing all the options:

The Interpolation option is specific to heatmap visualizations and influences the way points are rendered. For example, if Interpolation is set to Maximum, points with the maximum value are highlighted more prominently; similarly, with Minimum, points with the lowest value are emphasized to draw attention. Here is a heatmap visualization with Interpolation set to Maximum:

In addition to Cumulative, Maximum, and Minimum, the other options for Interpolation style are Average and Average (Constant).

Depending on the aggregation rule set for a metric, Oracle Analytics chooses an interpolation style automatically. For example, for columns whose aggregation rule is Sum, Cumulative is chosen as the interpolation style; similarly, for metrics whose aggregation rule is Maximum, Minimum, or Average, interpolation is automatically set to Maximum, Minimum, or Average respectively. Users can change these auto-assigned interpolation styles later. Another important aspect to note is that a heatmap does not treat individual points as discrete, so brushing, selection, and tooltips do not work for points on a heatmap.
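The rule-to-style defaults described above amount to a simple lookup. As a purely illustrative sketch (these names are hypothetical, not OAC internals):

```python
# Hypothetical mapping of a metric's aggregation rule to the interpolation
# style OAC pre-selects, per the behavior described above.
AGGREGATION_TO_INTERPOLATION = {
    "Sum": "Cumulative",
    "Maximum": "Maximum",
    "Minimum": "Minimum",
    "Average": "Average",
}

def default_interpolation(aggregation_rule: str) -> str:
    """Return the interpolation style pre-selected for a given aggregation rule."""
    # fall back to Cumulative for rules not covered by the blog's examples
    return AGGREGATION_TO_INTERPOLATION.get(aggregation_rule, "Cumulative")
```

Users can still override the pre-selected style afterwards, as noted above.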

To summarize, Oracle Analytics lets users visualize their data on a map using a heatmap, with a great many options to influence the way the data is rendered. This video demonstrates the various features of heatmaps in Oracle Analytics:

Friday, April 27, 2018

Multi-Layered Maps in Oracle Analytics

Visualizing multiple layers of business data on a map and spatially interacting with them has been a forte of specialized GIS tools, but a challenge for many market-leading BI and data visualization vendors.
The recent release of Oracle Analytics pushes its map data visualization capabilities to new heights to address such spatial analysis needs. In this blog we will take a brief tour of the multi-layered mapping capabilities included in the recent release of Oracle Analytics Cloud & Data Visualization.

Multi-layered maps architecture allows users to visualize data using multiple map layers in a single map visualization. It does that by overlaying location data on layers represented by different types of geometries (points, lines, and polygons) or layers with specialized rendering of those geometries (heatmap or clusters). 

One example need for multi-layered maps: the operations manager of a railway company wants to analyze traffic by geography, which can be city, zip code, county, state, or rail line. He/she could use multi-layered maps to visualize traffic by city using a point geometry map layer; by zip code, county, or state using a polygon geometry map layer; and by line using a line geometry map layer. Here is a quick preview of how such a visualization looks in OAC:

This visualization shows the (fictitious) passenger traffic data of the Washington DC metro by station (point geometry), by line (line geometry), and by zip code (polygon geometry).

The following video walks through these multi-layer maps:

Adding Data Layers on a Map

To render a map visualization, simply select a geography column (and optionally its associated metrics) and choose the Map visualization. The Map viz in DV looks up the most suitable map layer and displays the data on a map. For example, select zip name and passenger count and display them as a Map.
To add more layers of business data to the map visualization, click the Menu option above the Category (Geography) grammar field and select Add Layer. This adds a new map layer to the map visualization. A new layer can also be added from the map properties pane.


This menu also lets users manage map layers, i.e., Hide, Delete, or Order a layer. We will discuss ordering map layers later in this blog. Any number of layers can be added and managed using these options.

Once a layer is added, drag and drop the geography column to be visualized into the Category (Geography) field. The map layer corresponding to that geography field is auto-selected and visualized on the map.

Once all the required map layers are added to the map visualization, users can view the list of layers by clicking the drop-down list next to the Menu option. We have three map layers for DC Metro in the example shown above: 1) DC Metro Lines (line geometries), 2) DC Metro Stations (point geometries), 3) DC Zip Code (polygon geometries).

In the latest OAC/DV, the following layer types are supported in a map visualization:
1) Point Geometry layers
2) Line Geometry layers
3) Polygon Geometry layers
4) Heat maps

In this blog we will look at features related to point, line, and polygon geometry layers using the DC Metro example above. We will discuss heatmaps in detail in a separate blog.

Layer Properties

The Data Layers tab under map properties lets users manage the properties of each map layer, such as the name of the layer, the map layer, the layer type, the layer's transparency, and whether to show the layer. The Layers icon highlighted in red, sitting right next to the Globe icon, is the properties tab for multi-layered maps. Map layers added to this map visualization are listed here, and clicking the + icon adds more map layers.

Each map layer has a set of properties: 

      1) Name: By default OAC names the layer after the column names used in that layer. Users can change it to any custom name they want.
      2) Map Layer: The map layer is chosen automatically based on the match between the contents of the column in the Category (Geography) field and the key columns of the custom map layer. Users can also change the map layer and manually assign another layer that is a better match. Here is a quick snapshot that shows how to change a map layer:

      3) Layer type: Depending on the type of geometries defined in a map layer, users can change the layer type, which changes the way the layer is rendered on the map. For example, map layers with polygon geometries can be displayed either as polygons or as points, and map layers with point geometries can be displayed as either points or a heatmap.

      4) Transparency: Lets users choose the degree of transparency of this layer so that other co-existing layers are visible through it.

      5) Show Layer: Toggling Show Layer on/off shows/hides the layer on the map visualization. Another way to toggle layer display is to click the map layer icon in the legend. Here is a snapshot that shows this behavior:

      6) Visualization Grammar by Layer type: The fields available in the visualization grammar differ by map layer type.
a.   Category (Geography): This field represents the geographic columns that are matched against the key columns of the custom map layer. It can take multiple columns, and not all of them have to match key columns in the layer. The values of all these columns are shown in the tooltip displayed when hovering over a geometry.
b.   Color: This field takes either metric or attribute columns and assigns either gradient or distinct colors depending on the type of column.
c.   Tooltip: This field takes metric columns and displays their values in the tooltip shown when those data points are hovered over in the map visualization.
d.   Filters: Use this field to apply filters to your dataset.
e.   Size: This field takes only metric columns and, in the case of point geometry, adjusts the size of the bubble in proportion to the metric value.

      7) Legends: Each map layer has its own legend, and all legends are displayed on the map visualization with appropriate icons to indicate which map layer each legend is associated with.

Users can display these legends at the Top, Bottom, Left, or Right, or choose None or Auto. Selecting a layer's icon in the legend toggles the display of the corresponding layer.

Layer Reordering and Display Controls

Layer Ordering: The 'Order Layer' option in the Menu lets users reorder a layer. Here is a quick snapshot showing the options available under Order Layer:


Users can bring a layer all the way to the front (Bring to Front) or move it one step up, just above the layer currently on top of it (Bring Forward). Similarly, users can send a layer one step down, beneath the layer currently below it (Send Backward), or send it all the way to the bottom (Send to Back).
Users can also visually reorder map layers with a simple hold-and-slide in the Properties pane; this reorders the map layers in the map visualization.


Layer Display Controls: In addition to the toggle options described above, the Hide Layer option in the Menu lets users hide a map layer. Functionally this is the same as toggling the Show Layer button in the layer properties tab or in the legend. Users can also delete a map layer using the Delete Layer option; this permanently removes the map layer.


Tuesday, March 27, 2018

Custom Model Scripts for Oracle Analytics

In this blog post we will discuss how to use custom models in OAC. We will walk through the process of developing Python scripts, compatible with OAC, to train and apply a model using your own machine learning algorithm.

At the time of this blog, Oracle Analytics Cloud (OAC) ships with more than 10 machine learning algorithms, which fall under supervised (classification, regression) and unsupervised (clustering) learning. The inbuilt algorithms in OAC include CART, Logistic Regression, KNN, K-Means, Linear Regression, Support Vector Machine, and Neural Networks (for an exhaustive list of inbuilt algorithms in OAC, please refer to our earlier blog post). These inbuilt algorithms cover the majority of real-world business use cases. However, sometimes users face cases/datasets where they need to use a different algorithm to achieve their goal.

In such cases OAC lets users write their own scripts to train and apply a model using the algorithm of their choice. We need to develop two scripts for working with custom models: the first script trains the model, and the second scores/applies the model just trained. Like other custom advanced analytics (Python or R) scripts, the Train and Apply model scripts need to be embedded in XML format.

The Oracle Analytics Library has an example of a custom Train/Apply model using Support Vector Regression (SVR). Please refer to that sample to learn the required XML structure before proceeding further. We'll use that sample to walk through these script parts.


1) Capturing Parameters:

    Data is typically prepared and pre-processed before it is used to train a model. Pre-processing
    involves filling missing values, converting categorical values to numerical values (if needed),
    and standardizing the data. The way this pre-processing is done can influence the accuracy of a
    model to a good extent. In OAC, users are provided with parameters in the Train Model UI to
    choose/influence the methods used for pre-processing, and through the same UI users are given a
    number of options to tune their model. All the parameters sent from this user interface need to
    be captured before we start processing the data. For example, in the Train Model script for SVR
    the following snippet of code reads the parameters:

         ## Read the optional parameters into variables
         target = args['target']
         max_null_value_percent = float(args['maximumNullValuePercent'])
         numerical_impute_method = args['numericalColumnsImputationMethod']

2) Data Pre-processing (optional):

     Before we train a model, the data needs to be cleansed and normalized if necessary to get
     better predictions. In some cases the training data may already be cleansed, processed, and
     ready for training; if not, users can define their own functions to perform the cleansing or
     use the inbuilt methods in OAC. The following blog discusses in detail how to use the inbuilt
     methods in OAC to cleanse/prepare the data for training a model.

3) Train/Create Model:
     Now we are ready to actually train the model. The Train Model process can be sub-divided into
     two steps: 1) split the data to set aside a portion for testing the model, and 2) train the
     model and capture its performance/accuracy details.

    Train-test split: It is a good strategy to keep aside a randomized portion of the training data
    for testing. This portion of the data is used to evaluate the model's performance in terms of
    accuracy. The amount of data used for training versus testing is controlled by a user parameter
    called split, and there is an inbuilt method for performing this split in a randomized fashion
    so as to avoid bias or class-imbalance problems. The following snippet of code performs this
    train-test split:

        # split the data into train and test sets
        train_X, test_X, train_y, test_y = train_test_split(features_df, target_col, test_size=test_size)

    Train model: Now we have datasets ready for training and testing the model. It's time to train
    the model using the inbuilt train method for that particular algorithm; fit() is the inbuilt
    training method for most algorithms implemented in Python. The following snippet of code does
    that for the SVR algorithm:

        # construct the model and fit it to the training data
        svr = SVR(kernel=kernel, gamma=0.001, C=10)
        SVR_Model =, train_y)

4) Save Model:

    The model created in the previous step needs to be saved/persisted so that it can be accessed
    during the Apply/Scoring phase.
   Save model as a pickle object: Created models are saved/stored as pickle objects, and they are
   re-accessed during the Apply model phase using a reference name. There are inbuilt methods in
   OAC to save the model as a pickle object. The following snippet of code does this:

        # Save the model as a pickle object and create a reference name for it.
        d = base64.b64encode(pickle.dumps(pickleobj)).decode('utf-8')
   In this case SVRRegression is the reference name for the model created. The pickle does not
   have to contain just the model: other information and objects can also be saved in the pickle
   file. For example, if you wish to save additional flags or standardizer indexes along with the
   model, you can create a dictionary object containing the model and the flags/indexers and save
   the entire dictionary as a pickle object.
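A minimal sketch of that dictionary-in-a-pickle idea (the keys and contents here are hypothetical; OAC's own save/load helpers wrap the same base64/pickle round trip):

```python
import base64
import pickle

# bundle the model with any extra artifacts in one dictionary
# (the "model" here is a stand-in dict; in practice it is the fitted estimator)
artifacts = {"model": {"kernel": "rbf", "C": 10}, "standardizer_means": [0.5, 1.2]}

# serialize: pickle bytes -> base64 text, safe to store as a string
encoded = base64.b64encode(pickle.dumps(artifacts)).decode("utf-8")

# later, in the Apply script: base64 text -> pickle bytes -> original dictionary
restored = pickle.loads(base64.b64decode(encoded.encode("utf-8")))
```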

5) Add Related Datasets (optional): Now that we have the model, let us see how well it performs.
   In the previous step we set aside some part of the data for testing the model. Using that test
   dataset, we calculate some accuracy metrics and store them in related datasets. This is an
   optional step: in cases where users are confident about the model's accuracy, they can skip it.
   However, if users wish to view the accuracy metrics or populate them in the Quality tab, they
   can use the inbuilt methods in OAC to create the related datasets. More information on how to
   add these related datasets can be found in this blog: How to create Related datasets

6) Populate Quality Tab (optional): In the model's Inspect pane there is a tab called Quality,
    which visualizes the model's accuracy details. Users can view the Quality tab, evaluate the
    model's accuracy, and decide whether to tune the model further or use it for prediction. Note
    that this step is also optional. If users wish to view the model quality details in the Quality
    tab, they can use the inbuilt functions in OAC; more details on the inbuilt functions that
    populate the Quality tab can be found in this blog: How to Populate Quality Tab.

Now the model, along with its related datasets, is prepared and saved in a pickle object. This marks the end of the Train Model phase. The script returns the model to the framework, and it is stored in dataset storage. If the Train Model script executes successfully, you should find the model under the Machine Learning > Models tab. For example, I created a model called SVR Donations using the SV Regression scripts uploaded in the analytics-library:



The Apply script should have the same name as the Train script except for the 'train' part, i.e., it should follow the nomenclature OAC.ML.Algo_Name.apply.xml. The Apply script accepts the model name, other data pre-processing parameters, and user parameters as input. Most of the pre-processing steps are the same as in the Train Model script:

1) Capturing Parameters
2) Data Pre-processing: The same inbuilt methods can be used for cleansing (filling missing values), encoding, and standardizing the scoring data.

After the data is cleansed and standardized it can be used for Prediction/Scoring. 

Load Model and Predict:
    Using the reference name we gave the model in the Train script, retrieve the model's pickle object and predict the results for the cleansed and standardized scoring data. The following code in the SVR Apply model script does that:

       ## Load the pickle object saved during the Train Model phase. It is stored as an element
       ## in a dictionary; fetch it using the reference name given.
       pickleobj = pickle.loads(base64.b64decode(bytes(, 'utf-8')))

       ## Predict values and wrap them in a dataframe.
       y_pred = pickleobj.predict(features_df)
       y_pred_df = pd.DataFrame(y_pred.reshape(-1, 1), columns=['PredictedValue'])

If the includeInputColumns option is set to True, the framework appends the predicted results to the input columns and returns the complete dataframe.

This concludes the process of developing scripts for Train and Apply of Custom models.

Related Blogs: Prepare data using inbuilt functions in OAC, How to add Related Datasets, How to Populate Quality Tab

How to Populate Quality Tab in ML Model Inspect page in Oracle Analytics Cloud

In this blog post we will discuss how to populate the Quality tab of a machine learning model's Inspect page in OAC.

Assessing the quality of a machine learning model is an important step in evaluating its performance. Metrics like RSE, RAE, residuals, adjusted R-squared, etc. help assess the quality of the model's predictions. If the error metrics are not satisfactory or do not meet the user's goals, he/she can tune the model further until the required level of accuracy is reached. So it is important to expose this quality information in an intuitive and comprehensive fashion so that users can take the next course of action as necessary. The Quality tab in the Inspect page of an ML model in Oracle Analytics Cloud aims to visualize and give complete information on model accuracy. Here is a snapshot of the Quality tab for a linear regression model that predicts bike rental count:

In this blog we will talk about how to populate this Quality tab for a custom model. The Quality tab is populated by adding the required related datasets in the Train Model script; more details on how to add a related dataset can be found in this blog. The Quality tab is auto-populated if the user adds the following related datasets:

For numeric prediction: If Residuals and Statistics datasets are added as related datasets in the Train Model script, the ML framework in OAC takes information from them and populates the Quality tab. The graph is populated using the Residuals dataset, and all the error metrics are populated using the Statistics dataset. The image above shows what the Quality tab of a numeric prediction model looks like. Here is sample code that adds the Statistics and Residuals related datasets:

        #residuals dataset
        residuals_mappings = None
        residuals_ds = ModelDataset("Residuals", residuals_df, residuals_mappings)

        # statistics dataset
        statistics_mappings = None
        statistics_ds = ModelDataset("Statistics", statistics_df, statistics_mappings)
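The residuals_df and statistics_df dataframes referenced above can be built with plain pandas/NumPy; the columns shown here are illustrative, not a mandated OAC schema:

```python
import numpy as np
import pandas as pd

# toy predictions on a held-out test set
y_test = np.array([10.0, 12.0, 9.0, 14.0])
y_pred = np.array([9.5, 12.5, 9.0, 13.0])

# Residuals dataset: one row per test record
residuals_df = pd.DataFrame({"Predicted": y_pred, "Residual": y_test - y_pred})

# Statistics dataset: one row per error metric
mae = np.mean(np.abs(y_test - y_pred))
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
statistics_df = pd.DataFrame({"Metric": ["MAE", "RMSE"], "Value": [mae, rmse]})
```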

For classification: For binary and multi-class classification models, the Confusion Matrix and Statistics related datasets are used to populate the Quality tab. Here is a sample snapshot of the Quality tab for classification models:

The Confusion Matrix related dataset populates the confusion matrix table seen in the image, and the Statistics related dataset populates the metrics. Here is sample code that adds the Confusion Matrix and Statistics related datasets:
            metrics['Statistics'] = stats
            metrics['Confusion Matrix'] = confMatrix
            self.add_datasets(metrics=metrics, model=model)
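The stats and confMatrix objects passed above can be computed with scikit-learn; the DataFrame layout here is illustrative, not the exact OAC schema:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

# toy predictions from a binary classifier on test data
y_true = ["yes", "no", "yes", "no", "yes"]
y_pred = ["yes", "no", "no", "no", "yes"]

# confusion matrix table: rows = actual class, columns = predicted class
cm = confusion_matrix(y_true, y_pred, labels=["yes", "no"])
confMatrix = pd.DataFrame(cm, index=["actual_yes", "actual_no"],
                          columns=["pred_yes", "pred_no"])

# statistics table with one row per metric
stats = pd.DataFrame({"Metric": ["Accuracy"],
                      "Value": [accuracy_score(y_true, y_pred)]})
```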

The Quality tab is updated automatically if the user changes the values of user or model tuning parameters and retrains the model.

Related Blogs: How to build Train/Apply Model Custom Scripts in OAC, How to create Related Datasets, How to use inbuilt methods in OAC to Prepare data for Training/Applying ML Model

Pre-Processing and Preparing Data for ML Predictions in OAC

In this blog post we will talk about how to use the inbuilt methods in OAC to cleanse and prepare the data used for training a machine learning model.

One of the important steps in Training a Machine Learning model is to cleanse and prepare the data that we are going to use to train the model. What exactly do we mean by "cleanse and prepare the data":

  "It is the process of detecting and correcting (or removing) corrupt or inaccurate records from a
    record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or
    irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data."

It is important to cleanse and prepare data because, if we train a model with missing column values or anomalies/outliers in the data (which can be garbage data or genuine outliers), the model's prediction accuracy can go awry. To solve this problem, the machine learning feature in Oracle Analytics Cloud provides inbuilt methods for data cleansing/preparation, which can be invoked in Train Model scripts.

    In this blog post we will learn how to cleanse/prepare training data in a custom Train Model script using the inbuilt methods in machine learning in Oracle Analytics Cloud. The data cleansing/preparation process can be broken down into three steps: data imputation (filling missing values), encoding (converting categorical to numerical values if necessary), and standardization (normalization). All the functions needed to perform these operations are implemented in a Python module called datasetutils within OAC. Users can develop their own functions or use the existing module. Here is a snapshot of the parameters accepted from the UI for each of these operations while training a model:


Here is a brief description of each of the data preparation functions:

 1) Data Imputation: Data imputation is the process of filling missing values. There are multiple
     inbuilt imputation methods, and users are given the option to choose the imputation method for
     both numerical (Mean, Median, Min, Max) and categorical (Most Frequent, Least Frequent)
     variables. The datasetutils Python module in OAC contains a function called fill_na() that
     performs data imputation. It accepts the imputation methods as parameters and returns a
     dataframe with data imputed for categorical and numerical columns. The following snippet shows
     a sample usage of the fill_na() function:

     # fill nan columns with mean/max/min/median values for numeric columns and with the
     # most frequent/least frequent value for categorical columns
     df = datasetutils.fill_na(df, max_null_percent=max_null_value_percent,
                 numerical_impute_method=numerical_impute_method,
                 categorical_impute_method=categorical_impute_method)

 2) Encoding: Encoding is the process of converting categorical variables to numerical values. This
      is usually required in cases where regression needs to be performed. There are two inbuilt
      encoding methods: Onehot and Indexer. The features_encoding() function in the datasetutils
      module performs encoding. The following snippet of code performs encoding:

      # encode categorical features
      data, input_features, categorical_mappings = datasetutils.features_encoding(
          df, target, encoding_method=encoding_method, type="regression")

  3) Standardization: Standardization is the process of normalizing the data to reduce the effect
      of skews introduced by outliers. The standardize_clean_data() function in the datasetutils
      module is the inbuilt OAC method for standardization. The following sample snippet performs
      standardization and returns a dataframe with normalized data:

      # Standardize the data so as to reduce the influence of outliers
      target_col = data[[target]]
      if standardization_flag:
            features_df = datasetutils.standardize_clean_data(data, input_features)
      else:
            features_df = data[input_features]
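Taken together, the three steps can be sketched end to end in plain pandas/scikit-learn (datasetutils is internal to OAC; this equivalent, with made-up column names, is purely illustrative):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# toy dataset with one numeric and one categorical column, both with gaps
df = pd.DataFrame({"age": [25.0, None, 40.0], "city": ["NY", "LA", None]})

# 1) Imputation: mean for the numeric column, most frequent value for the categorical one
df["age"] = df["age"].fillna(df["age"].mean())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# 2) Encoding: one-hot encode the categorical column
df = pd.get_dummies(df, columns=["city"])

# 3) Standardization: rescale the numeric column to zero mean and unit variance
df[["age"]] = StandardScaler().fit_transform(df[["age"]])
```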

All these functions can be invoked directly from custom Train Model scripts by importing the datasetutils module at the beginning of the script.

Related Blogs: How to build Train/Apply custom model scripts in OAC, How to Create Related Datasets, How to Populate Quality Tab

How to Create Related Datasets for ML Models in Oracle Analytics Cloud

In this blog post we will discuss how to add related datasets for custom machine learning models in OAC.

After a machine learning model is created, it is important to evaluate how well it performs before we go ahead and apply it. Various accuracy metrics exist for this purpose: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute Error (RAE), residuals, etc. for numeric prediction/regression algorithms, and False Positive Rate (FPR), False Negative Rate, the confusion matrix, etc. for classification algorithms. The machine learning feature in Oracle Analytics Cloud has inbuilt methods to compute most of these accuracy metrics and store them in related datasets. Related datasets are the tables/datasets which contain information about the model, such as accuracy metrics and prediction rules. Our previous blog post, Understanding the Performance of Oracle DV Machine Learning models using the related datasets feature, covers related datasets in depth.
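The regression metrics named above follow directly from their definitions. For example, with toy values and NumPy only:

```python
import numpy as np

# toy actuals and predictions from a regression model
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 8.5])

# Mean Absolute Error: average magnitude of the errors
mae = np.mean(np.abs(y_true - y_pred))

# Root Mean Squared Error: penalizes large errors more heavily
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Relative Absolute Error: total absolute error relative to a
# predict-the-mean baseline (lower is better; < 1 beats the baseline)
rae = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true - y_true.mean()))
```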

     In this blog post we will talk about how to add such related datasets in custom Train Model code. There are inbuilt methods in Oracle Analytics Cloud to add related datasets. The user has to define the structure of these datasets, i.e., the columns they should contain, the data type of each column, and the aggregation rules for numeric columns. Once the required related datasets are added, they can be found under the Related tab in the model's Inspect pane:

Let us discuss in detail how to add Related datasets for a model:

The ModelDataset() class implemented in the Model module is a generic class that represents related datasets. Pass the name of the dataset you are creating as an argument to ModelDataset(), along with the dataframe and column mappings; this returns a related dataset. The generic Model class has an inbuilt method called add_output_dataset() which registers the passed dataset/dataframe as a related dataset for that model. The following lines of code show how to add a sample related dataset called "Predicted Results" using the df1 dataframe:

   df1=pd.DataFrame({target:y_test, \
                     "PredVal":y_pred1, \
                     "PredProb":[y_pred_prob1[i][list(clf1.classes_).index(y_pred1[i])] for i in range(len(y_pred1))]})

   df1_mappings = pd.DataFrame({
               'name':[target,'PredVal','PredProb','Target','Model Name'],
               'datatype':["varchar(100)","varchar(100)","double","varchar(100)","varchar(2000)"],
               'aggr_rule':["none","none","avg","none","none"]})

 model.add_output_dataset(ModelDataset("Predicted Results", df1, df1_mappings))

The mappings dataframe maps column names to their corresponding data types and aggregation rules. Some of these related datasets are used by the framework to populate the Quality tab in the model's Inspect page. More details on how to populate the Quality tab can be found in this blog: How to populate Quality tab.

Related Blogs: How to build Train/Apply custom model scripts in OAC, How to Populate Quality Tab, How to use inbuilt methods in OAC to Prepare data for Training/Applying ML Model

Friday, November 24, 2017

Measure effectiveness of your Marketing Campaign using Oracle DV ML

In this blog we will talk about the cumulative gains chart and lift chart created in Oracle Data Visualization for binary classification ML models, and how these charts are useful for evaluating the performance of a classification model.

What are Cumulative Gains & Lift charts and what are they used for?
Suppose a company wants to run a direct marketing campaign to get a response (such as a subscription or a purchase) from users. It plans to target around 10,000 users, of which only 1,000 are expected to respond, but it does not have the budget to reach all 10,000 customers. To minimize cost, the company wants to contact as few customers as possible while still reaching most of the customers who are likely to respond. The company can build ML models to predict which users are likely to respond and with what probability. Then the question becomes: which model should I choose? Which ML model is likely to find the most respondents from the smallest selection of the population? Cumulative gains and lift charts answer these questions.

Cumulative gains and lift charts measure the effectiveness of a binary classification predictive model, calculated as the ratio between the results obtained with and without the model. They are visual aids for measuring model performance and contain a lift curve and a baseline. The effectiveness of a model is measured by the area between the lift curve and the baseline: the greater the area, the better the model. One academic reference on how to construct these charts can be found here. Gains & lift charts are popular techniques in direct marketing.
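The construction can be sketched in pandas: sort the scored records by prediction confidence, then accumulate the share of actual responders at each population depth (column names here are illustrative):

```python
import pandas as pd

# toy scored records: actual outcome plus model confidence for "responds"
scored = pd.DataFrame({
    "actual":     [1, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    "confidence": [0.9, 0.8, 0.75, 0.7, 0.65, 0.5, 0.45, 0.4, 0.3, 0.2],
})

# sort by confidence so the "best" prospects come first
scored = scored.sort_values("confidence", ascending=False).reset_index(drop=True)

# cumulative gains: share of all responders captured at each population depth
scored["pct_population"] = (scored.index + 1) / len(scored)
scored["cumulative_gain"] = scored["actual"].cumsum() / scored["actual"].sum()

# lift: how many times better than a random sample at the same depth
scored["lift"] = scored["cumulative_gain"] / scored["pct_population"]
```

With these toy numbers, contacting the top 10% of prospects already captures a quarter of all responders, which is exactly what the lift value at that depth expresses.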

Sample Project for Cumulative Gains and Lift chart computation
The Oracle Analytics Store has an example project for this, built using the marketing campaign data of a bank. This is how the charts look:

Scenario: This marketing campaign aims to identify users who are likely to subscribe to one of the bank's financial services. The campaign targets close to 50,000 individuals, of whom only about 5,000, i.e., ~10%, are likely to subscribe to the service. The campaign data is split into training and testing data. Using the training data, we created a binary classification ML model using Naive Bayes to identify the likely subscribers along with a prediction confidence (note that the actual values, i.e., whether a customer actually subscribed or not, are also available in the dataset). Now we want to find out how good the model is at identifying the largest number of likely subscribers while selecting a relatively small portion of the campaign base (i.e., of the 50,000).

The ML model is applied to the test data to get the predicted value and prediction confidence for each record. This prediction data and the actual outcome data are used in a data flow to compute the cumulative gain and lift values.

How to interpret these charts and measure the effectiveness of a model:
The cumulative gains chart plots the cumulative percentage of actual subscribers (Cumulative Actuals) on the Y-axis against the total population (50,000) on the X-axis, in comparison with a random prediction (Gains Chart Baseline) and an ideal prediction (Gains Chart Ideal Model Line), in which all 5,000 likely subscribers are identified by selecting the first 5,000 customers sorted by the prediction confidence for Yes. What the Cumulative Actuals line says is that by the time we have covered 40% of the population we have already identified 80% of the subscribers, and by reaching close to 70% of the population we have 90% of the subscribers. When comparing one model with another using the cumulative gains chart, the model with the greater area between its Cumulative Actuals line and the baseline is more effective at identifying a larger portion of subscribers from a relatively smaller portion of the total population.

The lift chart depicts how much more likely we are to reach respondents than if we contacted a random sample of customers. For example, by contacting only 10% of customers based on the predictive model, we reach 3.20 times as many respondents as with no model.

Max Gain shows the point at which the difference between cumulative gains and the baseline is at its maximum. For the Naive Bayes model this occurs at a population percentage of 41%, with a maximum gain of 83.88%.

How to compare two models using cumulative gains and lift charts in Oracle DV:
To compare how well two ML models have performed, use the Lift Calculation data flow (included in the .dva project) as a template and plug the output of the Apply Model data flow in as the data source/input to the flow. Add the output dataset of the Lift Calculation flow to the same project and add its columns to the same charts shown above to compare. Note that the data flow expects the dataset to contain these columns: ID, ActualValue, PredictedValue, PredictionConfidence. This is how it looks when we compare two models using the same visualizations: