Wednesday, March 25, 2020

How effective is my Marketing Campaign? Find out with Oracle DV ML

In this blog we will look at a technique for comparing the performance of two machine learning binary classification models using Cumulative Gains and Lift charts. In an earlier blog, Which ML model is right for me, we saw how to compare the performance of two machine learning models. This blog goes a little further and deeper to show how Oracle DV Machine Learning can be used for more advanced model comparison techniques.

What are Cumulative Gains and Lift charts, and what are they used for?
Suppose a company wants to run a direct marketing campaign to get a response (a subscription, a purchase, etc.) from users. It plans to target around 10,000 users, of which only about 1,000 are expected to respond. But the company doesn't have the budget to reach out to all 10,000 customers. To minimize cost, it wants to contact as few customers as possible while still reaching most of the customers who are likely to respond. The company can build ML models that predict which users are likely to respond and with what probability. The question then becomes: which model should I choose? Which model captures the largest number of respondents while contacting the smallest possible share of the customer base? Cumulative Gains and Lift charts answer these questions.

Cumulative Gains and Lift charts measure the effectiveness of a binary classification model, calculated as the ratio between the results obtained with and without the predictive model. They are visual aids for assessing model performance and consist of a lift curve and a baseline. A model's effectiveness is measured by the area between the lift curve and the baseline: the greater the area, the better the model. One academic reference on how to construct these charts can be found here. Gains and Lift charts are popular techniques in direct marketing.
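To make the mechanics concrete, here is a minimal Python sketch of how gains and lift fall out of a scored dataset: sort customers by prediction confidence, then at each cut-off compare the share of respondents captured against the share of the population contacted. This is not taken from the example project; the function name, bucket count, and synthetic data are all illustrative.

import numpy as np

def gains_and_lift(y_actual, y_score, n_buckets=10):
    """Return (population %, cumulative gain %, lift) per bucket."""
    order = np.argsort(-np.asarray(y_score))            # highest confidence first
    actual_sorted = np.asarray(y_actual)[order]
    total_positives = actual_sorted.sum()
    cut_offs = np.linspace(0, len(actual_sorted), n_buckets + 1).astype(int)[1:]
    rows = []
    for top_n in cut_offs:
        captured = actual_sorted[:top_n].sum()
        pop_pct = top_n / len(actual_sorted)             # share of population contacted
        gain_pct = captured / total_positives            # share of respondents captured
        rows.append((round(pop_pct * 100), round(gain_pct * 100, 1), round(gain_pct / pop_pct, 2)))
    return rows

# Synthetic example: 10,000 customers, ~10% respondents, a noisy score that favours them
rng = np.random.default_rng(0)
actual = rng.binomial(1, 0.1, size=10_000)
score = 0.3 * actual + 0.7 * rng.random(10_000)
for pop, gain, lift in gains_and_lift(actual, score):
    print(f"top {pop:>3}% of population -> {gain:>5}% of respondents, lift {lift}")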

Sample Project for Cumulative Gains and Lift chart computation
The Oracle Analytics Store has an example project for this, built using the marketing campaign data of a bank. This is how the charts look:



Scenario: This marketing campaign aims to identify users who are likely to subscribe to one of the bank's financial services. The campaign targets close to 50,000 individuals, of which only about 5,000 people, i.e. ~10%, are likely to subscribe to the service. The campaign data is split into training and testing sets. Using the training data we created two ML models, Naive Bayes and Logistic Regression, to identify likely subscribers along with a prediction confidence (note that the actual values, i.e. whether a customer actually subscribed or not, are also available in the dataset). Now we want to find out which model is better at identifying the largest number of likely subscribers while selecting a relatively small portion of the campaign base (the 50,000 individuals).

The ML models are applied to the test data to obtain a predicted value and prediction confidence for each record. Using these predictions together with the actual outcomes, we created data flows that compute the cumulative gains and lift.
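The data flow logic itself is specific to Oracle Analytics, but the computation it performs can be sketched in a few lines of pandas. The column names below (PredictionConfidence, Actual), the variable names, and the decile bucketing are assumptions for illustration, not the exact fields produced by the data flows:

import numpy as np
import pandas as pd

def cumulative_gains_table(scored_df, score_col, actual_col, model_name, n_deciles=10):
    """Build a decile-level cumulative gains table for one model's scored test set."""
    d = scored_df.sort_values(score_col, ascending=False).reset_index(drop=True)
    d["decile"] = np.arange(len(d)) * n_deciles // len(d) + 1
    table = (d.groupby("decile")[actual_col].sum()
               .cumsum()
               .rename("cumulative_actuals")
               .reset_index())
    table["population_pct"] = 100 * table["decile"] / n_deciles
    table["cumulative_gain_pct"] = 100 * table["cumulative_actuals"] / d[actual_col].sum()
    table["baseline_pct"] = table["population_pct"]      # the random-selection line
    table["model"] = model_name
    return table

# Hypothetical scored test sets, one per model:
# gains = pd.concat([
#     cumulative_gains_table(nb_scored, "PredictionConfidence", "Actual", "Naive Bayes"),
#     cumulative_gains_table(lr_scored, "PredictionConfidence", "Actual", "Logistic Regression"),
# ])

Plotting cumulative_gain_pct against population_pct for each model, together with baseline_pct, reproduces the Cumulative Gains chart; dividing the gain by the population percentage gives the Lift curve.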
 

How to interpret these charts and measure the effectiveness of a model:
The Cumulative Gains chart plots the cumulative percentage of actual subscribers (Cumulative Actuals) on the Y-axis against the total population (50,000) on the X-axis, alongside a random-selection line (Gains Chart Baseline) and an ideal prediction line (Gains Chart Ideal Model Line), which shows all 5,000 likely subscribers being identified within the first 5,000 customers when sorted by prediction confidence for Yes. The model with the greater area between its Cumulative Actuals line and the baseline is more effective at identifying a larger portion of subscribers while selecting a relatively small portion of the total population.

The Lift chart shows how many more respondents we are likely to reach than if we contacted a random sample of customers. Lift at a given population percentage is simply the cumulative gain divided by that percentage: if the top 10% of customers ranked by a model's confidence captures 32% of all respondents, the lift at 10% is 32 / 10 = 3.2. In this project, by contacting only 10% of customers based on the predictive models, we reach 2.09 times as many respondents with Logistic Regression and 3.20 times as many with Naive Bayes as we would with no model.

Max Gain shows the point at which the difference between cumulative gains and the baseline is greatest. For Logistic Regression this occurs at a population percentage of 23%, where the maximum gain recorded is 41.84%; for Naive Bayes it occurs at a population percentage of 41%, where the maximum gain is 83.88%.

By simple visual inspection we can see that the Naive Bayes model has the larger area between its cumulative gains curve and the baseline, making it the better of the two models for this prediction.
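If you want a single number instead of a visual comparison, the same idea can be quantified by integrating the gap between each model's cumulative gains curve and the baseline. The decile values below are placeholders for illustration, not the figures from the example project:

import numpy as np

def area_above_baseline(population_pct, cumulative_gain_pct):
    """Trapezoidal area between a cumulative gains curve and the 45-degree baseline."""
    pop = np.asarray(population_pct, dtype=float)
    gain = np.asarray(cumulative_gain_pct, dtype=float)
    return np.trapz(gain - pop, pop)

pop = np.arange(10, 101, 10)
model_a = np.array([32, 55, 70, 80, 87, 92, 95, 97, 99, 100])   # placeholder decile gains
model_b = np.array([21, 38, 52, 63, 72, 80, 87, 92, 97, 100])   # placeholder decile gains
print("model A area:", area_above_baseline(pop, model_a))       # larger area -> better model
print("model B area:", area_above_baseline(pop, model_b))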

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Monday, March 9, 2020

Where is the log for my Data Flow?


Have you ever wondered exactly what happens when you execute your Oracle Analytics data flow? How can you see the exact log of actions triggered by the data flow you just executed?

Most of us rely only on the system confirmation that an OAC data flow ran successfully, along with the creation of a refreshed dataset. But what if it fails? What if you want to know exactly what happened in the underlying database source, and what was retrieved by the OAC engine?

In our example scenario we need to track the logs of a data flow we just built. The logs exist on the OAC server, but we don't know how to easily find them. A simple trick to help identify the right log is to mark our data flow with a searchable marker. This can be done by adding a new column with a unique name in the data flow. In the screenshot below, we are adding a column named 'TIP33', for example:

Now let's see what happened…



A data flow fires a set of 'logical SQL' commands to the Oracle Analytics server, just like any other data visualization query does, and so it generates various logs behind the scenes that can be inspected to track exactly what happened when it ran. But how do we find the right log? Here is a simple, pragmatic trick that gives users a peek behind the scenes, quickly and easily.

Note that this addition to the data flow has no adverse effect on the resulting data. Let's just save the data flow and re-run it.


Let's open the OAC Console and navigate to Session and Query Cache.


Here we can see the server output of any activity within the OAC server. To find the most recent queries pertaining to our data flow run, let's just search the page for TIP33.


This highlights the entries in the log that pertain to our data flow. By looking at the logs, the physical SQL, the number of rows returned, and any error messages, users can now see exactly what executed when the data flow ran, and can rapidly fine-tune their data flow designs to optimize them!


Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!