Friday, November 24, 2017

Measure effectiveness of your Marketing Campaign using Oracle DV ML

In this blog we will talk about the Cumulative Gains chart and the Lift chart created in Oracle Data Visualization for Binary Classification ML models, and how these charts are useful in evaluating the performance of a classification model.

What are Cumulative Gains & Lift charts and what are they used for?
Suppose a company wants to run a direct marketing campaign to get a response (such as a subscription or a purchase) from users. It wants to run the campaign for around 10,000 users, of whom only about 1,000 are expected to respond. However, the company doesn't have the budget to reach out to all 10,000 customers. To minimize cost, the company wants to contact as few customers as possible while still reaching most (a user-defined share) of the customers who are likely to respond. The company can create ML models to predict which users are likely to respond and with what probability. Then the question becomes: which model should I choose? Which ML model is likely to give me the largest number of respondents while selecting as small a portion of the original customer base as possible? Cumulative Gains and Lift charts answer these questions.

Cumulative Gains and Lift charts are measures of the effectiveness of a binary classification predictive model, calculated as the ratio between the results obtained with and without the predictive model. They are visual aids for measuring model performance and contain a lift curve and a baseline. The effectiveness of a model is measured by the area between the lift curve and the baseline: the greater the area between the lift curve and the baseline, the better the model. One academic reference on how to construct these charts can be found here. Gains and Lift charts are popular techniques in direct marketing.

Sample Project for Cumulative Gains and Lift chart computation
Oracle Analytics Store has an example project for this that was built using the Marketing Campaign data of a bank. This is how the charts look:

Scenario: This Marketing Campaign aims to identify users who are likely to subscribe to one of the bank's financial services. The bank plans to run this campaign for close to 50,000 individuals, of whom only about 5,000 (i.e., ~10%) are likely to subscribe to the service. The Marketing Campaign data is split into Training and Testing data. Using the training data, we created a Binary Classification ML model using Naive Bayes to identify the likely subscribers along with the prediction confidence (note that the Actual values, i.e., whether a customer actually subscribed or not, are also available in the dataset). Now the bank wants to find out how good the model is at identifying the largest number of likely subscribers while selecting a relatively small portion of the campaign base (i.e., 50,000).

The ML model is applied to the Test data to obtain the Predicted Value and Prediction Confidence for each prediction. This prediction data and the Actual outcome data are used in a dataflow to compute cumulative gain and lift values.
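To make the computation concrete, here is a minimal sketch of how cumulative gain points could be derived from the four columns the Lift Calculation dataflow expects (ID, ActualValue, PredictedValue, PredictionConfidence). It assumes PredictionConfidence is a fraction between 0 and 1 that refers to the predicted class, so the score for "Yes" is 1 minus the confidence whenever the prediction is "No". This illustrates the general technique; it is not the dataflow's actual logic, and the sample rows are made up.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def score_for_positive(row: Row, positive: str = "Yes") -> float:
    # Assumption: PredictionConfidence (0..1) is the confidence of PredictedValue,
    # so for a binary model the positive-class score is its complement
    # whenever the model predicted the other class.
    confidence = float(row["PredictionConfidence"])
    return confidence if row["PredictedValue"] == positive else 1.0 - confidence


def cumulative_gains(rows: List[Row], positive: str = "Yes") -> List[Tuple[float, float]]:
    """Return (population %, cumulative % of actual positives) after each ranked row."""
    ranked = sorted(rows, key=lambda r: score_for_positive(r, positive), reverse=True)
    total_rows = len(ranked)
    total_positives = sum(1 for r in ranked if r["ActualValue"] == positive)
    if total_rows == 0 or total_positives == 0:
        raise ValueError("need at least one row and one actual positive")

    points = []
    hits = 0
    for i, row in enumerate(ranked, start=1):
        if row["ActualValue"] == positive:
            hits += 1
        points.append((100.0 * i / total_rows, 100.0 * hits / total_positives))
    return points


sample = [  # made-up rows for illustration only
    {"ID": 1, "ActualValue": "Yes", "PredictedValue": "Yes", "PredictionConfidence": 0.9},
    {"ID": 2, "ActualValue": "No", "PredictedValue": "Yes", "PredictionConfidence": 0.7},
    {"ID": 3, "ActualValue": "Yes", "PredictedValue": "No", "PredictionConfidence": 0.6},
    {"ID": 4, "ActualValue": "No", "PredictedValue": "No", "PredictionConfidence": 0.8},
]
print(cumulative_gains(sample))
```

The random baseline is simply the diagonal where the population percentage equals the gain percentage, so the area between the two curves can be read directly off these points.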

How to interpret these charts and measure the effectiveness of a model:
The Cumulative Gains chart plots the cumulative percentage of actual subscribers (Cumulative Actuals) on the Y-axis against the percentage of the total population (50,000) on the X-axis, in comparison with random prediction (Gains Chart Baseline) and ideal prediction (Gains Chart Ideal Model Line). The ideal model line depicts the case where all 5,000 likely subscribers are identified by selecting the first 5,000 customers sorted by PredictionConfidence for Yes. The Cumulative Actuals line shows that by the time we have covered 40% of the population we have already identified 80% of the subscribers, and by reaching close to 70% of the population we have 90% of the subscribers. When comparing one model with another using the cumulative gains chart, the model with the greater area between its Cumulative Actuals line and the Baseline is more effective at identifying a larger portion of subscribers by selecting a relatively smaller portion of the total population.

The Lift chart depicts how much more likely we are to reach respondents than if we contacted a random sample of customers. For example, by contacting only 10% of customers based on the predictive model, we will reach 3.20 times as many respondents as we would without a model.
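Lift at any point can be derived from the gains curve: divide the cumulative percentage of respondents captured by the percentage of the population contacted. A minimal sketch; the function names are illustrative and not part of Oracle DV.

```python
from typing import List, Tuple


def lift(cumulative_gain_pct: float, population_pct: float) -> float:
    """Lift = % of all respondents captured / % of population contacted."""
    if population_pct <= 0:
        raise ValueError("population_pct must be positive")
    return cumulative_gain_pct / population_pct


def lift_curve(gains: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Convert (population %, cumulative gain %) points into (population %, lift)."""
    return [(pop, lift(gain, pop)) for pop, gain in gains]


# A lift of 3.20 at 10% means the top-ranked 10% hold about 32% of all respondents.
print(lift(32.0, 10.0))
# The 40% population / 80% subscribers point on the gains chart is a lift of 2.0.
print(lift(80.0, 40.0))
```

Read this way, the lift curve always falls towards 1.0 as the contacted share approaches 100%, because contacting everyone captures every respondent regardless of the model.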

Max Gain shows the point at which the difference between the cumulative gains and the baseline is maximum. For the Naive Bayes model this occurs when the population percentage is 41%, where the maximum gain is 83.88%.
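The Max Gain point can be located by scanning the gains curve for the largest vertical gap above the baseline, where the baseline at x% of the population is x% of the respondents. A sketch under that assumption; the input points are illustrative, not output from the sample project.

```python
from typing import List, Tuple


def max_gain(gains: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (population %, cumulative gain %, gap) where the gap between the
    gains curve and the random baseline (gain % == population %) is largest."""
    if not gains:
        raise ValueError("gains must not be empty")
    pop, gain = max(gains, key=lambda point: point[1] - point[0])
    return pop, gain, gain - pop


# Illustrative (population %, cumulative gain %) points, not real model output.
points = [(10.0, 32.0), (40.0, 80.0), (70.0, 90.0), (100.0, 100.0)]
print(max_gain(points))
```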

How to compare two models using Cumulative Gains and Lift charts in Oracle DV:
To compare how well two ML models have performed, we can use the Lift Calculation dataflow (included in the .dva project) as a template and plug in the output of the Apply Model dataflow as the data source/input to the flow. Add the output dataset of the Lift Calculation to the same project and add its columns to the same charts shown above to compare. Please note that the dataflow expects the dataset to contain these columns: ID, ActualValue, PredictedValue, PredictionConfidence. This is how it looks when we compare two models using the same visualizations:


Wednesday, November 22, 2017

Which ML model is the right one for me?

In the world of Machine Learning, we quite often want to create multiple prediction models, compare them, and choose the one that is most likely to give results that satisfy our criteria and requirements.

These criteria can vary: sometimes models with better overall accuracy are chosen, sometimes models with the fewest Type I and Type II errors (False Positive and False Negative Rates) are chosen, and in some cases models that return results faster with an acceptable (even if not ideal) level of accuracy are chosen. There are more such criteria.

Oracle DV has multiple Machine Learning algorithms implemented out of the box for each kind of prediction/classification. So users have the luxury of creating more than one model, using different algorithms, different fine-tuned parameters for those algorithms, or different input training datasets, and then choosing the best model among them. But to choose the best model, we need to compare the models and weigh them against our own criteria.
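The error-rate criteria mentioned above reduce to simple ratios over confusion-matrix counts: False Positive Rate = FP / (FP + TN) for Type I errors, and False Negative Rate = FN / (FN + TP) for Type II errors. A minimal sketch of picking the model with the lowest False Negative Rate; the class, model names and counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelCounts:
    name: str
    tp: int  # true positives
    fp: int  # false positives (Type I errors)
    tn: int  # true negatives
    fn: int  # false negatives (Type II errors)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        return self.fn / (self.fn + self.tp)


def lowest_false_negative_rate(models: List[ModelCounts]) -> ModelCounts:
    return min(models, key=lambda m: m.false_negative_rate())


candidates = [  # hypothetical counts, e.g. read from two models' confusion matrices
    ModelCounts("model_a", tp=80, fp=15, tn=885, fn=20),
    ModelCounts("model_b", tp=90, fp=40, tn=860, fn=10),
]
best = lowest_false_negative_rate(candidates)
print(best.name, best.false_negative_rate())
```

Note that the model with fewer Type II errors here has more Type I errors; which trade-off is acceptable depends on the criterion you choose.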

So how do we compare these models? Where can we find the data in Oracle Data Visualization to do this comparison? In our previous blog we talked about related datasets and the model quality details they contain. Here is an example of how to use these related datasets to compare two models based on one criterion: choose the model with the fewest Type II errors (lowest False Negative Rate). This video explains the process of using these related datasets to compare two models:


Thursday, November 16, 2017

Predicting Sales using Oracle Data Visualization

The new Machine Learning feature in Oracle Data Visualization lets users train/build their own Machine Learning models, which can perform various prediction and classification operations like Numeric Prediction, Classification and Clustering. To learn more about the Machine Learning feature, download Oracle Data Visualization Desktop from here and play around with it.

The video below demonstrates an example of using Machine Learning algorithms in Oracle Data Visualization to predict expected Bike Rentals for a bike rental company that wants to prepare for upcoming demand.

The example seen in the video can be downloaded from Oracle Analytics Store. The name of the project is Example DV Project: Bike Rental Prediction:

To predict demand, we will use one of the most commonly used ML techniques: Numeric Prediction. Numeric Prediction is a common requirement in the business world; classic examples include sales forecasting, demand prediction and stock price prediction.
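Linear regression is one of the most common Numeric Prediction techniques: it fits a straight line to historical data and uses that line to predict new values. The sketch below is a plain single-feature ordinary least squares fit, not Oracle DV's implementation; the function names and the temperature and rental figures are made up for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model: Tuple[float, float], x: float) -> float:
    intercept, slope = model
    return intercept + slope * x


# Hypothetical history: daily temperature (deg C) vs bike rentals.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
rentals = [200.0, 260.0, 330.0, 390.0, 450.0]
model = fit_line(temperature, rentals)
print(predict(model, 22.0))
```

Real rental data would have many more drivers (season, weekday, weather), which is where multi-feature algorithms such as Elastic Net or CART come in.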

Oracle DV comes loaded with multiple Numeric Prediction algorithms, and users can choose any one of them based on their needs. The list includes Linear Regression, Elastic Net Linear Regression, and Classification and Regression Tree (CART) for Numeric Prediction. Here is a snapshot showing the list of algorithms in Oracle DV:

Users can also develop their own custom Python/R scripts that perform Numeric Prediction and upload them to Oracle Data Visualization. Uploaded scripts can be invoked from dataflows in Oracle DV. In case you are interested, here is a short video showing how to format and upload custom Python scripts.

How strong are our hearts? Oracle DV ML helps with the answer.

In this demonstration video, Oracle DV machine learning algorithms are applied to patient health data to predict the likelihood of heart disease. A multi-classification Machine Learning technique is used in this demonstration. The process shown in the video can be summarized as follows:

1) Get data on patients known to have heart disease. This dataset contains information related to heart disease, like blood sugar, cholesterol and other medical information about each individual.
2) Create a multi-classification neural net model using that data.
3) Use that model to predict the likelihood of heart disease in other individuals whose medical history/information we know.

The example seen in the video can be downloaded from Oracle Analytics Store. The name of the project is Example DV project: Heart Disease Prediction:

More often than not, most of us (individual users as well as businesses) have access to historical data that records whether a particular event happened or not, under what conditions it happened, and the values of the other factors involved in the event. Wouldn't you want to use this historical data to predict whether that event is likely to happen (Likely? Less likely? More likely? Definitely?)

The method of training a model using the actual known values of a column to predict that column's value for unknown cases comes under the domain of Supervised Machine Learning. Oracle Data Visualization comes equipped with inbuilt algorithms to perform supervised multi-classification, among other tasks. Users can choose any one of these algorithms based on their needs. Here is a snapshot showing the list of inbuilt algorithms in Oracle DV that can perform multi-classification:


Predicting Attrition using Oracle DV Machine Learning (Binary Classification)

The latest release of Oracle Data Visualization has inbuilt Machine Learning features. This means users can now build their own models from training data and use these trained models for prediction and classification. The good news is that Oracle DV comes equipped with a host of ML algorithms that can perform Numeric Prediction, Multi- and Binary Classification, and Clustering, in addition to allowing your own custom model scripts for training and scoring.

In this blog we are going to focus on Binary Classification algorithms and show how to use those inbuilt algorithms to address a real-life question common to any organization: predicting Employee Attrition, i.e., finding which employees are likely to quit.

Before we venture any further, let us briefly understand what binary classification is. Binary classification is a technique of classifying the records/elements of a given dataset into two groups on the basis of classification rules. For example, Employee Attrition Prediction predicts whether an employee is expected to Leave or Not Leave (Leave and Not Leave are the two groups).

These classification rules are generated when we train a model using a training dataset that contains information about the employees and whether each employee has left the company or not. Oracle DV is shipped with multiple algorithms that can perform binary classification. Here is a snapshot showing the list of inbuilt algorithms in Oracle DV that can perform binary classification:

Users can also upload their own Python/R scripts (with appropriate tags) that perform binary classification; these custom algorithms will show up in the list and can be used for prediction.
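To make "classification rules learned from training data" concrete, here is a deliberately tiny stand-in: a one-rule classifier (a decision stump) that learns a single threshold on one numeric column by minimizing training errors. It is not one of Oracle DV's algorithms, and the column (years since last promotion), the function names and the values are hypothetical.

```python
from typing import Sequence, Tuple

Rule = Tuple[float, str, str]  # (threshold, label if value >= threshold, label otherwise)


def fit_stump(values: Sequence[float], labels: Sequence[str],
              classes: Tuple[str, str] = ("Leave", "Not Leave")) -> Rule:
    """Pick the threshold and label assignment with the fewest training errors."""
    best = None
    for threshold in sorted(set(values)):
        # Try both ways of assigning the two classes to each side of the threshold.
        for above, below in (classes, classes[::-1]):
            errors = sum(
                1 for v, y in zip(values, labels)
                if (above if v >= threshold else below) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above, below)
    if best is None:
        raise ValueError("no training data")
    return best[1], best[2], best[3]


def predict(rule: Rule, value: float) -> str:
    threshold, above, below = rule
    return above if value >= threshold else below


# Hypothetical training data: years since last promotion and the known outcome.
years = [0, 1, 5, 7]
outcome = ["Not Leave", "Not Leave", "Leave", "Leave"]
rule = fit_stump(years, outcome)
print(rule, predict(rule, 6), predict(rule, 2))
```

Real algorithms such as CART or Logistic Regression combine many columns, but the idea is the same: rules are fitted to rows whose outcome is known and then applied to rows whose outcome is not.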

Now let us see how one of these inbuilt algorithms can be used to predict Employee Attrition, i.e., whether the employee will leave or not (Yes or No). This video explains the model creation process as well as the prediction process (i.e., scoring using the created model).

The example seen in the video can be downloaded from Oracle Analytics Store. The name of the project is Example DV project: Attrition Prediction:


Wednesday, November 8, 2017

Understand Performance of Oracle DV Machine Learning models using Related Datasets feature

In this blog we discuss the Related datasets produced by Machine Learning algorithms in Oracle Data Visualization.

Related datasets are generated when we train/create a Machine Learning model in Oracle DV (present from Oracle DV V4 onwards). These datasets contain details about the model such as prediction rules, accuracy metrics, the confusion matrix and key drivers for prediction, depending on the type of algorithm. Related datasets can be found in the Inspect Model menu: Inspect Model -> Related tab.

These datasets are useful in more ways than one. They let users examine/understand the rules the model uses for prediction/classification, which in turn helps in fine-tuning the model to get better results. Related datasets are also useful for comparing models and determining which one is better at solving the same problem.

Here is a pictorial representation of the Related datasets generated by the different out-of-the-box Machine Learning algorithms in Oracle Data Visualization V4:


Different ML algorithms generate similar Related datasets, and all of them can be grouped into 8 datasets. Individual parameters and column names in a dataset may change depending on the type of algorithm, but the functionality of the dataset remains the same. For example, the columns in the Statistics dataset differ between Linear Regression and Logistic Regression, but in both cases the Statistics dataset contains the accuracy metrics of the model. Here is a brief description of each of these datasets:

1) Drivers: This dataset gives information on the columns that are key determinants/drivers of the target column value. Train/Create model performs linear regression and identifies the columns that take part in predicting the values of the target column. Each identified column is assigned a coefficient and a correlation value. The coefficient indicates the weight given to that column in determining the target column value, and the correlation indicates the direction of the relationship with the target column, i.e., whether the target value increases or decreases with a corresponding change in the independent column.

2) Residuals: This dataset also gives information on the quality of the model's predictions, the residuals in particular. A residual is the difference between the measured value and the predicted value of a regression model. This dataset gives an aggregated (sum) value of the absolute difference between Actual and Predicted values for all the columns in the dataset. It is visualized as a bar graph in the Quality tab of the Linear Regression model's Inspect menu.

3) CARTree: This dataset is a tabular representation of the Decision Tree computed to predict the target column values. It contains columns that represent the conditions and the criteria for conditions in the decision tree, the prediction for each group, and the prediction confidence. The inbuilt Tree Diagram visualization can be used to visualize this decision tree.

4) Confusion.Matrix: The Confusion Matrix, also known as an error matrix, is a specific table (pivot) layout that allows visualization of the performance of an algorithm. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. This table reports the number of false positives, false negatives, true positives, and true negatives, from which the precision, recall, F1 and accuracy metrics are computed.

5) Hitmap: This dataset contains information on the leaf nodes of the decision tree. Each row in the table represents a leaf node and contains information on the criteria/branch segment that the leaf node represents, Segment Size, Confidence and Expected # of rows, i.e., the expected number of correct predictions = Segment Size * Confidence.

6) ClassificationReport: This dataset is a tabular representation of accuracy metrics for each distinct value of the target column. For example, if the target column can have two distinct values, 'Yes' and 'No', this dataset shows accuracy metrics like F1, Precision, Recall and Support (the number of rows in the Training dataset with this value) for each distinct value of the Target column.

7) Summary: This dataset contains a summary of input and optional parameters to the model specified during model creation and contains details like Target name and Model name.

8) Statistics: This dataset contains metrics that quantify model accuracy. The metrics present in the dataset vary depending on the algorithm/model that generates it. Here is a list of metrics by model:

  • Linear Regression, CART numeric, Elastic Net Linear:
    • R-Square, R-Square Adjusted, Mean Absolute Error (MAE), Mean Squared Error (MSE), Relative Absolute Error (RAE), Relative Squared Error (RSE), Root Mean Squared Error (RMSE)
  • CART (Classification And Regression Trees), Naive Bayes Classification, Neural Network, Support Vector Machine (SVM), Random Forest, Logistic Regression:
    • Accuracy, Total F1
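For the regression metrics listed above, here are the textbook definitions as a sketch. Oracle DV's exact formulas (for example, which degrees of freedom R-Square Adjusted uses) are not documented in this post, so treat these as assumptions; the function name and sample values are illustrative. A sum of absolute residuals, like the one the Residuals dataset aggregates, also appears here as the numerator of RAE.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float],
                       num_predictors: int = 1) -> Dict[str, float]:
    n = len(actual)
    if n != len(predicted) or n <= num_predictors + 1:
        raise ValueError("need matching sequences with n > num_predictors + 1")
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))   # sum of |residual|
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # sum of residual^2
    # Reference model for the "relative" metrics: always predict the mean.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    if sq_dev == 0:
        raise ValueError("actual values must not all be equal")
    mse = sq_err / n
    r2 = 1.0 - sq_err / sq_dev
    return {
        "MAE": abs_err / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "RAE": abs_err / abs_dev,
        "RSE": sq_err / sq_dev,
        "R-Square": r2,
        "R-Square Adjusted": 1.0 - (1.0 - r2) * (n - 1) / (n - num_predictors - 1),
    }


print(regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.5]))
```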

Now you know what the Related datasets are and how they can be useful for fine tuning your Machine Learning model or for comparing two different models.
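On the classification side, the Confusion.Matrix and ClassificationReport datasets rest on a handful of counts. This sketch builds a confusion matrix with predicted classes as rows and actual classes as columns, as described above, and derives per-class precision, recall, F1 and support plus overall accuracy using the standard definitions. The function names and labels are hypothetical, and support here is simply the number of actual rows per class in the data passed in.

```python
from typing import Dict, Sequence

Matrix = Dict[str, Dict[str, int]]


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> Matrix:
    """matrix[predicted_class][actual_class] = count (rows = predicted, columns = actual)."""
    labels = sorted(set(actual) | set(predicted))
    matrix = {p: {a: 0 for a in labels} for p in labels}
    for a, p in zip(actual, predicted):
        matrix[p][a] += 1
    return matrix


def classification_report(matrix: Matrix) -> Dict[str, Dict[str, float]]:
    report = {}
    for label in matrix:
        tp = matrix[label][label]
        predicted_count = sum(matrix[label].values())          # row total
        support = sum(row[label] for row in matrix.values())   # column total
        precision = tp / predicted_count if predicted_count else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": support}
    return report


def accuracy(matrix: Matrix) -> float:
    correct = sum(matrix[label][label] for label in matrix)   # diagonal
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


actual = ["Yes", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes"]
cm = confusion_matrix(actual, predicted)
print(cm)
print(classification_report(cm))
print(accuracy(cm))
```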


Maps - How to extract a geoJSON from Oracle DB map theme for use in OracleDV

In this blog we will discuss how to create a GeoJSON map layer from an existing Oracle DB map theme. This helps Oracle customers who have their maps/spatial data in Oracle Database and want to leverage that investment in Oracle Analytics - Data Visualization.

What is an Oracle Map Theme? Oracle Map Themes are also called Geometry Themes. A theme is a visual representation of a particular data layer. Using Oracle Map Builder you can extract a GeoJSON file from this Geometry Theme. This GeoJSON can be directly uploaded into Oracle Data Visualization as a custom map layer.

 Oracle DB Map Theme for a sample of congressional districts (preview on Oracle Map Builder):

Extracted map layer rendered in Oracle Data Visualization Map:

Detailed steps on how to do this conversion can be found in this document.

High level steps:

1) Install Oracle Map Builder (if not installed already).
2) Connect to the database schema where the maps/spatial data is present.
3) Select the table that contains the Geometry Theme data.
4) Use the Map Builder tools to extract this geometry table into GeoJSON.


Maps - How to convert a Map Shapefile to geoJSON for use in Oracle DV

Have geographic map layer data sitting in a shapefile format and would like to visualize it in Oracle Data Visualization? In this blog we will discuss how to use Oracle tools to convert a shapefile into geoJSON format for use in Oracle Data Visualization.

The shapefile format is a digital vector storage format for storing geometric location and associated attribute information. It can spatially describe vector features like points, lines, and polygons representing different kinds of geographies. The file name extension of shapefiles is .shp. More information on shapefiles can be found here.

GeoJSON is a format for encoding a variety of geographic data structures, like maps of cities, states and countries. Oracle DV supports custom map layers defined in GeoJSON format.
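To show what such a file contains, here is a minimal GeoJSON FeatureCollection with one polygon, built with Python's standard json module. The "name" property and the helper function are assumptions for illustration: use whichever property you intend to match against your data when you upload the layer. Coordinates are longitude/latitude pairs, and each polygon ring is closed by repeating its first point, as the GeoJSON specification requires.

```python
import json
from typing import Dict, List


def polygon_feature(name: str, ring: List[List[float]]) -> Dict:
    """Build a GeoJSON Feature for one polygon; ring is a list of [lon, lat] pairs."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "properties": {"name": name},  # hypothetical key property for joining data
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


layer = {
    "type": "FeatureCollection",
    "features": [
        # Counterclockwise exterior ring, following the GeoJSON right-hand rule.
        polygon_feature("Zone A", [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]),
    ],
}
print(json.dumps(layer, indent=2))
```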

Using Oracle Map Builder you can convert shapefiles to GeoJSON files. GeoJSON can be directly uploaded into Oracle DV as a custom map layer, and the data can be visualized directly on top of the map layer. See detailed instructions here.

Overview of steps involved:

1) Install Oracle Map Builder.
2) Use the Export to JSON feature in Map Builder, with ShapefileSDP as the source type, and choose the shapefile to convert.
3) Choose appropriate key columns and SRID to do the conversion.

Maps - How to convert an Image to geoJSON for use in Oracle DV

Many a time we encounter images of a geographic layout (like the floor plan of a shopping mall, museum, airport or demo hall) and wonder how great it would be to visualize data with this layout as a map layer in Oracle DV. The great news is that this is possible! In this blog we will discuss how to convert an image to the GeoJSON file format. Using a combination of Oracle tools, users can convert the image of a layout into GeoJSON.

Here is a snapshot that shows what a floor plan map layer extracted from an image looks like:
Floor Plan Image

Floor Plan Custom Map Layer extracted from Image

One way to achieve this is using Oracle Map Builder and Oracle Map Editor tools. 

Step by step instructions on how to convert image to a map layer: Image to geoJSON Map Layer
Here is a high level view of steps involved in this process: 

1) Create a Base Map from the received image file using the Oracle Map Builder tool.
2) Create a GeoRaster theme from the Base Map using the Map Builder tool.
3) Create a Base Map based on the GeoRaster theme using the Map Builder tool.
4) Create a Geometry layer to show different regions on the map using the Map Editor tool.
5) Create a Theme (a database table) based on the Geometry layer using the Map Builder tool.
6) Export the Theme to GeoJSON using Oracle Map Builder.


Tuesday, May 2, 2017

Collapsible Tree Plugin

In this blog we will talk about the Collapsible Tree custom visualization plugin. It belongs to D3's family of hierarchical layouts.

It is designed to produce a 'node-link' diagram that lays out the connections between nodes in a way that displays the relationship of one node to another in a parent-child fashion.
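D3's hierarchical layouts typically consume a nested structure in which each node has a name and an optional list of children. Here is a sketch of converting flat parent-child pairs into that shape. The exact input format the Collapsible Tree plugin expects is not described in this post, so the "name"/"children" field names and the helper are assumptions, and the region names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def build_tree(pairs: Sequence[Tuple[str, Optional[str]]]) -> Dict:
    """pairs are (node, parent); the single node whose parent is None is the root."""
    children: Dict[str, List[str]] = defaultdict(list)
    roots = []
    for node, parent in pairs:
        if parent is None:
            roots.append(node)
        else:
            children[parent].append(node)
    if len(roots) != 1:
        raise ValueError("expected exactly one root node")

    def make(name: str) -> Dict:
        # Leaf nodes omit "children" so a collapsible layout can tell them apart.
        kids = [make(child) for child in children.get(name, [])]
        return {"name": name, "children": kids} if kids else {"name": name}

    return make(roots[0])


tree = build_tree([("All Regions", None), ("Americas", "All Regions"),
                   ("EMEA", "All Regions"), ("USA", "Americas")])
print(tree)
```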

The collapsible tree plugin can be downloaded from the Oracle BI Public Store.

Monday, April 3, 2017

Auto Refresh Plugin

Ever wanted to analyze changing or streaming data on Oracle Data Visualization? Wanted to perform analytics on sliding windows of increasing time series data? A plugin can help.

In this blog, we will talk about an exciting custom plugin for Oracle DV that automatically refreshes the data and data sources used in your DV projects. This is done through the Auto Refresh Custom Visualization plugin.

This plugin has the following capabilities:
  • An option to refresh either the data or the data sources
  • Refresh Now - This is a one-time refresh that refreshes the data/data sources when you press the Refresh Now button
  • Periodic Auto Refresh - Clicking the timer refresh button (the button with a timer symbol inside the refresh icon) starts a timer that fetches the data periodically, at the time interval specified in the number box. This auto refresh can be stopped by clicking the stop button.
The Auto Refresh Custom Plugin can be downloaded from the Oracle BI Public Store.

Here is a brief demo to show how the plugin works:

Tuesday, March 28, 2017

Make your Oracle DV visualizations sing the tunes of Motion charts using Dim Player Plugin

In this blog we will talk about the Dim Player Custom Visualization plugin. This plugin lets you explore several measures/attributes in all the visualizations in your canvas over any dimension column, like time or geography. It makes all the visualizations in the DV canvas behave like dynamic charts. This custom visualization can be downloaded from Oracle BI Public Store.

How does it work: The DIM Player plugin plays through the values of a dimension column, like time or region, and automatically updates all the visualizations in the canvas with the values of that dimension, one at a time. There are two modes in which you can use the DIM Player plugin:
1) Use as Filter: If the DIM Player plugin is used as a filter (via the "Use as Filter" option in DV), it simulates motion charts for all the other visualizations in the canvas.
2) Brushing: If the plugin is not used as a filter, it acts as a brushing sequencer and brushes/highlights the charts in the canvas based on the dimension value.
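The two modes can be pictured as two ways of producing one frame per dimension value. This sketch only models the data each frame would show; the plugin itself runs in the browser, and the function and field names (including "highlighted") are hypothetical. In filter mode each frame keeps only the rows for the current value; in brushing mode every row is kept and the rows for the current value are flagged for highlighting.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]


def play(rows: List[Row], dimension: str,
         use_as_filter: bool = True) -> Iterator[Tuple[object, List[Row]]]:
    """Yield (dimension value, rows to show) for each distinct value in sorted order."""
    for value in sorted({row[dimension] for row in rows}):
        if use_as_filter:
            # Filter mode: other visualizations see only the current slice.
            yield value, [row for row in rows if row[dimension] == value]
        else:
            # Brushing mode: everything stays visible; the current slice is flagged.
            yield value, [dict(row, highlighted=(row[dimension] == value)) for row in rows]


sales = [{"year": 2016, "revenue": 120}, {"year": 2015, "revenue": 100},
         {"year": 2016, "revenue": 80}]
for year, frame in play(sales, "year"):
    print(year, frame)
```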

The DIM Player also allows you to play, pause and stop while playing through the values.

Here is a brief demo to show how DIM player works:

Friday, February 24, 2017

OracleDV: Calculate correlation between numerical and categorical variables.

In this blog we will talk about two custom R-scripts that respectively calculate and plot the correlation not just between two numerical variables, but between numerical and/or categorical variables. Before we jump into the details of these scripts, let us understand what correlation is. Correlation refers to the extent to which two variables have a linear relationship with each other. Some of the best-known measures of correlation between variables include Pearson's Product Moment coefficient and rank correlation coefficients such as Kendall's and Spearman's. But these coefficients work well only with numeric variables. To measure the association between two categorical variables, or between a numerical and a categorical variable, a chi-squared test (categorical vs categorical) or ANOVA (numerical vs categorical) is typically used.

In these R-scripts we tried to address the need for a script that can compute correlation not only between two numeric variables but also between numeric and/or categorical variables (numeric vs categorical and categorical vs categorical). As mentioned earlier, there are two custom R-scripts: the first script only computes the correlation and returns the results in tabular format, and the second script computes the correlation, plots the correlation coefficients using the corrplot R-package and returns these R visualizations. These scripts use the Goodman Kruskal algorithm (more information here) to compute correlation between numeric vs categorical variables and categorical vs categorical variables. To compute correlation between two numeric variables, the script can use various methods (Pearson, Kendall and Spearman) depending on the user's preference. To demonstrate these scripts we have attached a sample .dva project that shows how the R-Scripts can be invoked in Oracle DV. You can download this script from Oracle BI Public Store. This is how your Oracle DV should look after you import the .dva project:

Please note that you have to deploy the R Viz (Base64Image) custom plugin before you import the .dva project.

How do these scripts work: The script computes the correlation between two variables and generates plots using the corrplot R-package. The variables can be both numeric, one numeric and one categorical, or both categorical. The script uses one of two methods to calculate the correlation coefficient, depending on the type of the input variables:
1) If the variables are all numeric, the script uses one of the Pearson, Kendall and Spearman methods, depending on the user's preference.
2) For computing correlation between categorical and numerical, or between categorical and categorical variables, the script uses the Goodman Kruskal algorithm.
The script scans the datatypes of the input data frame: if all the columns are numeric it chooses method 1, otherwise it chooses method 2. The script returns the correlation coefficient for each pairwise combination of the input columns.
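The method choice described above can be sketched in Python; the original scripts are R, so this is an illustration of the logic, not their code. It implements Pearson for the all-numeric case and the Goodman-Kruskal tau for mixed or categorical columns. Two simplifications are assumptions: Kendall and Spearman are left out, and in the mixed case numeric values are treated as categories as-is, whereas the scripts' dependency on classInt suggests they may bin numeric columns first. Tau is asymmetric, so both orderings of each pair are returned, loosely mirroring the scripts' corr_col1/corr_col2 output columns. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations
from numbers import Number
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a column has zero variance; Pearson is undefined")
    return sxy / math.sqrt(sxx * syy)


def goodman_kruskal_tau(x: Sequence, y: Sequence) -> float:
    """Share of the variation in y explained by knowing x (asymmetric, 0..1)."""
    n = len(x)
    y_counts = Counter(y)
    var_y = 1.0 - sum((c / n) ** 2 for c in y_counts.values())  # Gini variation of y
    if var_y == 0:
        raise ValueError("y has a single category; tau is undefined")
    joint = Counter(zip(x, y))
    expected_cond_var = 0.0
    for xi, cx in Counter(x).items():
        purity = sum((joint[(xi, yj)] / cx) ** 2 for yj in y_counts)
        expected_cond_var += (cx / n) * (1.0 - purity)  # variation of y within x == xi
    return (var_y - expected_cond_var) / var_y


def pairwise_correlation(columns: Dict[str, Sequence]) -> List[Tuple[str, str, float]]:
    """Pearson if every column is numeric, otherwise Goodman-Kruskal tau."""
    all_numeric = all(isinstance(v, Number) and not isinstance(v, bool)
                      for values in columns.values() for v in values)
    results = []
    for a, b in permutations(columns, 2):
        if all_numeric:
            value = pearson(columns[a], columns[b])
        else:
            value = goodman_kruskal_tau(columns[a], columns[b])
        results.append((a, b, value))
    return results


data = {"region": ["N", "N", "S", "S"], "churned": ["no", "no", "yes", "yes"]}
for col1, col2, value in pairwise_correlation(data):
    print(col1, col2, round(value, 3))
```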

Inputs:
1) id: ID to uniquely identify each row and to avoid auto aggregation.
2) column1 ... column12: The list of columns for which correlation needs to be computed between each possible pair. If the user needs to compute correlation between more columns, more inputs can be added to this script in exactly the same format as the existing input columns.

Optional Inputs:
1) column_names: Names of the columns sent as input to the R-Script, excluding ID column. This is needed to name the columns appropriately in the output returned by R.
2) corr_method: Applicable only if all the columns are numeric. In that case, the script lets the user choose any one correlation method from Pearson, Kendall and Spearman.
3) plot_width: Width of the plot generated by the R-Script. Default is 400
4) plot_height: height of the plot generated by the R-Script. Default is 400

Outputs:
1) corr_col1: Name of the first column in the pair of columns for which we are computing the correlation.
2) corr_col2: Name of second column in the pair of columns for which we are trying to compute correlation.
3) img* columns: Return the R plots as base64-encoded images. The R Viz (Base64Image) custom viz plugin parses these base64-encoded image strings and displays the images on the DV Desktop canvas.

Package Dependency: corrplot, reshape, data.table, classInt, base64enc

This package contains two R-scripts:
1) R.Correlation.xml: This R-Script computes correlation between the variables.
2) R.CorrelationPlot.xml: In addition to computing the correlation coefficients, this R-Script displays the correlation plot, converts the images to base64-encoded strings and sends them to DV. The Base64Image custom visualization converts these strings back to images.

Steps to deploy this plugin in your local Oracle DV:

1) Install the Advanced Analytics feature in Oracle DV by clicking the icon below. This installs an Oracle R deployment. Alternatively, you can install Advanced Analytics by running install_advanced_analytics.cmd, located in <DV_INSTALL_DIRECTORY>.

2) Install the required R packages:
    Open the R console (double-click Rgui.exe in <Advanced_Analytics_Install_Dir>\bin\x64) and install the packages listed under Package Dependency. Following are the R commands to install them:
     Set the proxy appropriate to your network configuration (if you are behind a proxy):
        Sys.setenv(http_proxy="<your_proxy_host>:<port_number>")
     Install the packages:
        install.packages("corrplot")
        install.packages("reshape")
        install.packages("data.table")
        install.packages("classInt")
        install.packages("base64enc")
3) Download the package from the OracleBI Public Store and unzip it.
4) Copy R.Correlation.xml and R.CorrelationPlot.xml to <DV_INSTALL_DIRECTORY>\OracleBI1\bifoundation\advanced_analytics\script_repository
5) Deploy the R Viz (Base64Image) custom visualization.
6) Import the .dva project into Oracle DV. The password for the .dva is Admin123.

Thursday, February 23, 2017

Customize look & feel of Oracle DV using skin plugins

In this blog post we will discuss customizing the appearance of your Oracle DV Desktop using skin plugins. Companies and/or users may want to change the appearance of DV for reasons such as house style, professionalism or simply for fun.

Oracle DV Desktop's UI is generated using scripts and is therefore highly customizable. The look-and-feel aspects are controlled by skins and styles. Customization can be achieved by editing the following CSS (Cascading Style Sheet) files, which can be packaged and deployed as a skin plugin.

Check out the skin plugin example on the Oracle BI Public Store. The main CSS files and the key UI elements they drive are listed below. You may launch DV in SDK mode and use the property inspector in the browser to explore this yourself.

  • applicationstyles.css - responsible for the global level styles including the logo, progress pane, menus, context-menus, font-icons, dialogs, gadgets, tooltips, etc
  • dataenrichstyles.css - responsible for the Advanced Analytics styles including the Analytics tab in the gadget dialog.
  • homepagestyles.css - responsible for the styles of the home page and data source page.
  • ojetstyles.css - responsible for the JET styles of data visualizations, tabs, buttons, menus, dialogs, trees, text input, etc.
  • reportstyles.css - responsible for the project-level styles. The majority of the non-visualization styling is handled by this CSS file, including Insights, search, color management, the fingerpane, the gadget/properties dialog, filter bar, data sources, expressions, toolbar, save dialog, etc.
In case you want to explore further, there are other CSS files:
  • filterstyles.css - responsible for the filter styles including the date range, expression, list and number range filters.
  • stagestyles.css - responsible for the styles of the stage and data source diagrammer.
  • thirdpartystyles.css - responsible for the styling of the 3rd party components including:
    • jQuery UI - utilized by gadget sliders, drop target tooltips, and resizable components like the image visualization, floating panels, and layouts
    • CodeMirror - utilized by the expression text editor
    • Spectrum - utilized by the color picker
  • vizstyles.css - responsible for the visualization level styles of the visualization placeholders, drop targets, image visualization, tile visualization, textbox visualization, legend, etc.


To apply the sample customization, perform the following steps.
  • Access the sample plugin here
  • Copy the sample plugin to your plugins directory %LOCALAPPDATA%\DVDesktop\plugins
  • Restart the server
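The copy step can also be scripted. Below is a minimal Python sketch, assuming the plugin has already been unzipped to a local folder and that DV Desktop picks up plugins from %LOCALAPPDATA%\DVDesktop\plugins as described above. The helper names are our own illustration, not part of DV:

```python
import os
import shutil
from pathlib import Path


def dv_plugins_dir(local_app_data=None):
    """Return the DV Desktop plugins directory under %LOCALAPPDATA%."""
    base = local_app_data or os.environ.get("LOCALAPPDATA")
    if not base:
        raise RuntimeError("LOCALAPPDATA is not set; this layout assumes Windows")
    return Path(base) / "DVDesktop" / "plugins"


def install_skin_plugin(plugin_src, local_app_data=None):
    """Copy an unzipped skin plugin folder into the plugins directory.

    Any existing copy with the same folder name is replaced.
    Returns the destination path.
    """
    src = Path(plugin_src)
    dest = dv_plugins_dir(local_app_data) / src.name
    if dest.exists():
        shutil.rmtree(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest)
    return dest
```

For example, install_skin_plugin(r"C:\temp\my-skin-plugin") (a hypothetical path) copies that folder into the plugins directory; DV still has to be restarted afterwards.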
You should see something like this.

Sample Customization

Notice that the logo and the background color of the header have changed, and that Oracle DV now uses a green color theme.

This was achieved by making the following changes:
  • In applicationstyles.css, the Oracle logo was replaced with a new logo:
-   content:"\e666"; 
+  content: url("star_logo.png"); 
CAVEAT: The logo must be 130 px wide and 25 px high. A logo of any other size needs further CSS corrections to fit within that frame. Also make sure that the file name in url() matches the name of your logo file.
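Because a wrongly sized logo needs extra CSS fixes, it can help to check its dimensions before packaging the plugin. A small Python sketch, assuming the logo is a PNG (as star_logo.png is), that reads the width and height from the PNG header using only the standard library:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
REQUIRED_SIZE = (130, 25)  # width x height in px, per the caveat


def png_size(data: bytes):
    """Return (width, height) from the IHDR chunk at the start of a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8-11 hold the chunk length, 12-15 the chunk type,
    # 16-23 the big-endian width and height.
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    return struct.unpack(">II", data[16:24])


def check_logo(path):
    """Return (ok, (width, height)) for the logo file at path."""
    with open(path, "rb") as f:
        size = png_size(f.read(24))
    return size == REQUIRED_SIZE, size
```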

  • The dark green background color was applied to the header by making the following change to homepagestyles.css:
.bitech-global-header > div:first-child{
background-color: green;
}

  • The light green background color was applied to the explore panel by making the following change in reportstyles.css:
+ background-color: #C0D9AF;
Many more CSS changes are needed to achieve the full sample customization. However, all of them follow the same pattern as the changes described above.

Tuesday, February 21, 2017

Build your own Recommendation engine (Collaborative Filtering) on Oracle DV using Custom R-Scripts

In this blog we will discuss a custom R script that builds a recommendation engine by performing collaborative filtering. Before we get into the details of this R script, let us understand what collaborative filtering and a recommendation system/engine are. Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preference or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on one issue, then A is more likely to share B's opinion on a different issue or object than a randomly chosen person is. So a recommendation engine that suggests items for user A to purchase, based on A's past purchases, can perform collaborative filtering: it checks who else bought the same products as user A, finds which additional items those users bought, and recommends those additional items to user A based on their ratings. In addition to the recommendation, collaborative filtering can also predict the rating that user A is likely to give the recommended product. This custom R script can be downloaded from the Oracle BI Public Store. This is the R script to download:


In addition to the R script, we have provided a sample .dva project that demonstrates how to use it. This is how the project looks after importing the .dva file into DV Desktop:

How does this script work: This script performs collaborative filtering on data about purchases/subscriptions/movies watched, along with their ratings, and returns the top N recommendations for each user together with the rating the user is expected (predicted) to give each recommended item. Depending on the user's input, the script performs one of two kinds of collaborative filtering, which work as follows:
1) User Based Collaborative Filtering (UBCF): First, look for users who share the same rating patterns as the active user (the user the prediction is for). Then use the ratings from those like-minded users to calculate a prediction for the active user.
2) Item Based Collaborative Filtering (IBCF), i.e. "users who bought x also bought y": Build an item-item matrix that captures the relationships between pairs of items. Then infer the tastes of the current user by examining the matrix and matching it against that user's data.
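To make the two methods concrete, here is a minimal pure-Python sketch of both, using cosine similarity on a small in-memory ratings dictionary. It only illustrates the idea; it is not the recommenderlab implementation the R script relies on, and recommenderlab's defaults (normalization, neighborhood size, similarity measure) may differ:

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (dicts key -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_average(pairs, k):
    """Similarity-weighted mean rating over the k most similar (sim, rating) pairs."""
    top = [(s, r) for s, r in sorted(pairs, reverse=True)[:k] if s > 0]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total else None


def predict_ubcf(ratings, user, item, k=2):
    """UBCF: use the ratings of `item` by the k users most similar to `user`."""
    pairs = [
        (cosine(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    return weighted_average(pairs, k)


def predict_ibcf(ratings, user, item, k=2):
    """IBCF: use the user's own ratings of the k items most similar to `item`."""
    # Transpose user -> item ratings into item -> user ratings.
    by_item = {}
    for u, row in ratings.items():
        for i, r in row.items():
            by_item.setdefault(i, {})[u] = r
    if item not in by_item:
        return None
    pairs = [
        (cosine(by_item[item], by_item[other]), r)
        for other, r in ratings[user].items()
        if other != item
    ]
    return weighted_average(pairs, k)
```

In practice the item-item similarities are computed once for all item pairs and stored, which is the expensive part of IBCF.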
Please note that IBCF is a resource-intensive process, so if you are using IBCF we recommend saving and reusing the recommender model. You can do this by setting the optional parameter reuse_savedmodel to "YES". If you reuse the model, make sure you are reusing it on identical data, i.e. the user and item names/IDs must be the same as those stored in the model.

This script also provides the option to save the prediction model and reuse it later. When a saved model is reused, the data from which the model was created acts as the training data, and the current data acts as the test data. The script is not limited to movie/television datasets; it can also be applied to other product segments, such as books, or to products from different categories.

Inputs to the Script:
1) userid: Name/ID of the user.
2) itemid: ID of the item.
3) rating: Rating given by the user for this item.
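These three inputs arrive as one row per (user, item, rating). Before collaborative filtering can run, the rows have to be pivoted into a user-by-item rating matrix; presumably this is the kind of reshaping reshape2 handles in the R script, though that is our assumption. A Python sketch of the pivot, where averaging duplicate ratings for the same user and item is also an assumption of this sketch, not documented behavior of the script:

```python
from collections import defaultdict


def to_rating_matrix(rows):
    """Pivot long-format (userid, itemid, rating) rows into {user: {item: rating}}.

    Duplicate (userid, itemid) pairs are averaged.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user, item, rating in rows:
        sums[user][item] += float(rating)
        counts[user][item] += 1
    return {
        user: {item: sums[user][item] / counts[user][item] for item in items}
        for user, items in sums.items()
    }
```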

Optional Inputs: 
1) topn: Top N recommendations to be returned for each user.
2) method: The collaborative filtering method to use. Options are UBCF and IBCF.
3) reuse_savedmodel: Whether to reuse the already saved model for prediction or to create a new model. If reuse_savedmodel is set to "YES", the currently saved model is reused; if no model exists yet, a new model is created. If reuse_savedmodel is "NO", a new model is created even if one already exists.
4) model_directory: Directory where the created model should be saved. Even if you choose not to reuse the saved model, select a valid directory, because the script requires the model to be saved on disk. I am choosing the temp directory so that I need not clean it up manually every time. Make sure you have the correct privileges on the directory.
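The reuse_savedmodel and model_directory options combine into simple load-or-build logic. A Python sketch under stated assumptions: pickle stands in for however the R script actually serializes its recommender, and the build callback and the file name cf_model.pkl are hypothetical:

```python
import os
import pickle


def load_or_build_model(ratings, build, model_directory,
                        reuse_savedmodel="NO", filename="cf_model.pkl"):
    """Mirror the reuse_savedmodel switch.

    "YES": load the saved model if one exists, otherwise build and save one.
    "NO":  always build a new model and overwrite any saved one.
    """
    path = os.path.join(model_directory, filename)
    if reuse_savedmodel.upper() == "YES" and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    model = build(ratings)
    # The model is always written to disk, matching the note that the
    # script needs a valid model_directory even when not reusing.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model
```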

Outputs of the Script:
1) userid: Name/ID of the user
2) recommended_item: ID/name of the item recommended.
3) predicted_rating: Predicted rating for the recommended item.
4) dummy: Dummy output.
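These output columns map onto one row per recommendation. A Python sketch of turning predicted ratings into such rows, assuming predictions are already available (for example from a UBCF or IBCF step) and that items a user has already rated should not be recommended; the dummy column is left out because its purpose is not described:

```python
def top_n_rows(ratings, predicted, topn=5):
    """Build (userid, recommended_item, predicted_rating) rows.

    ratings[user][item]   -> ratings the user already gave
    predicted[user][item] -> predicted rating for a candidate item
    Highest predictions come first; ties are broken by item name.
    """
    rows = []
    for user, preds in predicted.items():
        seen = ratings.get(user, {})
        candidates = [(item, p) for item, p in preds.items() if item not in seen]
        candidates.sort(key=lambda ip: (-ip[1], ip[0]))
        rows.extend((user, item, p) for item, p in candidates[:topn])
    return rows
```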

R Packages needed:
1) reshape2
2) recommenderlab

Steps to deploy this plugin in your local Oracle DV:

1) Install the Advanced Analytics feature in Oracle DV by clicking the icon below. This installs the Oracle R deployment. Alternatively, you can install Advanced Analytics by running install_advanced_analytics.cmd, located in <DV_INSTALL_DIRECTORY>.

2) If the reshape2 and recommenderlab packages are not already installed, install them as follows:
    Open the R console (double-click Rgui.exe in <Advanced_Analytics_Install_Dir>\bin\x64) and
    install the reshape2 and recommenderlab packages. Following are the R commands to install them:
     Set Proxy:
        $ Sys.setenv(http_proxy="<your_proxy_host>:<port_number>")
           set proxy appropriate to your network config.
     Install Package(updated instructions):
        $ install.packages("reshape2")
        $ install.packages("recommenderlab")
3) Download the plugin from the Oracle BI Public Store and unzip it.
4) Copy R.CollaborativeFiltering.xml to <DV_INSTALL_DIRECTORY>\OracleBI1\bifoundation\advanced_analytics\script_repository
5) Create a directory named Model_dir on the D drive to hold the model files. If you want to save the model files in a different directory, change the value of the model_directory parameter in the inputs to the EVALUATE_SCRIPT function in DV.
6) Import the .dva project to Oracle DV. Password for the .dva is Admin123

Monday, February 20, 2017

Oracle DV: Calculating distance using latitude/longitude

In this blog we will talk about how to compute the distance between two points from their latitudes and longitudes using built-in functions in Oracle DV. In geospatial analysis, the need to compute the distance between two points from their latitudes and longitudes is quite common. The Haversine formula is frequently used for this. It computes the great-circle distance, i.e. the distance measured along the surface of the earth (sphere) rather than through it. The formula is based on a more general formula in spherical trigonometry called the law of haversines. Following is the formula:

* snapshot taken from Movable Type Script site
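Since the snapshot itself is not reproduced here, this is the Haversine formula in its commonly published form, the form used on the Movable Type Scripts page:

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\Delta\lambda}{2}\right)\\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\,\sqrt{1-a}\right)\\
d &= R\,c
\end{aligned}
```

Here φ1 and φ2 are the two latitudes, Δφ = φ2 − φ1, Δλ is the difference in longitude (all in radians), and R is the earth's radius (mean radius about 6371 km, or about 3959 miles).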

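Following is the calculation in Oracle DV to compute the distance between two lat/long pairs. As written, it uses the spherical law of cosines, which yields the same great-circle distance as the Haversine formula: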

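CASE
WHEN Source_Lat=Dest_Lat AND Source_Long=Dest_Long THEN 0
ELSE Earth_Radius * ACOS( COS(RADIANS(90-Source_Lat))
* COS(RADIANS(90-Dest_Lat))
+ SIN(RADIANS(90-Source_Lat))
* SIN(RADIANS(90-Dest_Lat))
* COS(RADIANS(Source_Long-Dest_Long))  )
END

In this formula:
Source_Lat refers to Source Latitude
Source_Long refers to Source Longitude
Dest_Lat refers to Destination Latitude
Dest_Long refers to Destination Longitude
Earth_Radius stands for the radius of the earth in the unit you want the distance in (about 3959 for miles or 6371 for kilometers); replace it with that number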

Please note that "source" and "destination" are used only for naming convenience; the two points can be used interchangeably. The distance computed from latitude and longitude may differ from the actual driving distance between the two points, depending on factors such as road connectivity and the presence of geographic obstacles. Here is a snapshot of the project in Oracle DV Desktop.

More information on Haversine formula can be found here.
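To sanity-check the DV expression outside Oracle DV, the same calculation can be sketched in Python. The first function mirrors the DV expression term by term (law of cosines on colatitudes); the second is the Haversine form. Both give the same great-circle distance. The radius constant of 3959 miles is our choice for illustration:

```python
from math import acos, asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3959.0  # mean radius; use 6371.0 for kilometers


def distance_law_of_cosines(src_lat, src_long, dest_lat, dest_long,
                            radius=EARTH_RADIUS_MILES):
    """Same terms as the Oracle DV CASE expression."""
    if src_lat == dest_lat and src_long == dest_long:
        return 0.0
    x = (cos(radians(90 - src_lat)) * cos(radians(90 - dest_lat))
         + sin(radians(90 - src_lat)) * sin(radians(90 - dest_lat))
         * cos(radians(src_long - dest_long)))
    # Clamp tiny floating-point overshoots so acos never sees |x| > 1.
    return radius * acos(max(-1.0, min(1.0, x)))


def distance_haversine(src_lat, src_long, dest_lat, dest_long,
                       radius=EARTH_RADIUS_MILES):
    """Haversine form; 2*asin(sqrt(a)) equals 2*atan2(sqrt(a), sqrt(1-a))."""
    phi1, phi2 = radians(src_lat), radians(dest_lat)
    dphi = phi2 - phi1
    dlmb = radians(dest_long - src_long)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius * asin(min(1.0, sqrt(a)))
```

The THEN 0 guard in the DV expression plays the same role as the clamp here: for identical points, rounding can push the ACOS argument just above 1.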

Applications: Because Oracle DV can handle distance formulas like this one, it can perform spatial analytic operations such as counting the stores or customers within a certain radius. To demonstrate this capability, we have implemented a sample project that uses this formula to find the establishments within a 2-mile radius of WESTERN STATE BANK. Here is a snapshot:
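The "within a 2-mile radius" filter from the sample project can be sketched in Python as well. This assumes a list of (name, latitude, longitude) tuples and a center point you supply; the bank's actual coordinates are not reproduced here:

```python
from math import asin, cos, radians, sin, sqrt


def haversine_miles(lat1, lon1, lat2, lon2, radius=3959.0):
    """Great-circle distance in miles (Haversine form)."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * asin(min(1.0, sqrt(a)))


def within_radius(center, places, miles=2.0):
    """Return (name, distance) for places within `miles` of center, nearest first."""
    lat0, lon0 = center
    hits = []
    for name, lat, lon in places:
        d = haversine_miles(lat0, lon0, lat, lon)
        if d <= miles:
            hits.append((name, d))
    return sorted(hits, key=lambda h: h[1])
```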

We have attached the .dva project as well. You can download it from here and play with it.

Friday, February 17, 2017

Geolocate IP Address on OracleDV using Custom R-Script

Prerequisite - Internet Connection
In this blog we will discuss how to geolocate an IP address in Oracle DV using a custom R script. Geolocation is the process of identifying or estimating the real-world geographic location of an object such as a mobile phone or a computer. Real-world location details include city, region, country, postal code and, most importantly, latitude and longitude. This script uses a web service to geolocate the IP address. It takes an IP address and an ID column that uniquely identifies this IP address, invokes the web service, and returns geographic details of the location where the machine is used. These details include latitude, longitude, city, country, region and postal code, among other information. This script can be downloaded from the Oracle BI Public Store, and here is how your DV Desktop will look after deploying this plugin:

Please note:
1) You have to deploy the Heat Map custom visualization plugin from the Oracle BI Public Store to get the rendering above.
2) restricts the number of requests from a particular IP to 15,000 per hour. If the number of requests exceeds 15,000 in an hour, it returns an HTTP 403 Forbidden error until your quota is replenished.
3) As of now, the script invokes this service separately for every IP, so it can be a little slow. We will try to improve the script as soon as provides an interface/API for bulk requests.
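Since exceeding the quota produces HTTP 403 errors, a caller that processes many addresses may want to throttle itself. A Python sketch of a client-side sliding-window limiter set to the 15,000 requests per hour mentioned above. It is our illustration, not part of the R script; a sliding window is a conservative choice because we do not know how the service counts its hour:

```python
import time
from collections import deque


class HourlyLimiter:
    """Allow at most `limit` calls in any `window`-second span."""

    def __init__(self, limit=15000, window=3600.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.calls = deque()  # timestamps of calls inside the window

    def try_acquire(self) -> bool:
        """Record a call and return True if allowed, else return False."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```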

How does this script work: This script performs geotagging of IP addresses. It takes an IP address (both IPv4 and IPv6 formats are supported) as input and returns the latitude and longitude where the machine with this IP address is operating. This script uses a web service called In this script we invoke this service by calling its HTTP API with the IP address. In addition to latitude and longitude, it returns other geographic information such as country name, region/state name, zip code and time zone. For invalid IP addresses, the script populates all the result columns with "Invalid IP Address". Since the script invokes a web service, an internet connection is mandatory. This script uses the rjson R package.
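The per-address flow (validate the address, call the service, fill the result columns) can be sketched in Python. This is not the R script's code: the web service is not named here, so the HTTP call is abstracted as a lookup function you supply, and the result field names are hypothetical:

```python
import ipaddress
from typing import Callable, Dict

INVALID = "Invalid IP Address"
# Hypothetical result column names.
FIELDS = ("latitude", "longitude", "city", "region", "country", "zip_code", "time_zone")


def geolocate(ipaddr: str, lookup: Callable[[str], Dict[str, object]]) -> Dict[str, object]:
    """Validate ipaddr (IPv4 or IPv6), then call lookup() for valid addresses.

    lookup stands in for the web-service call and should return a dict
    keyed by FIELDS; missing keys come back as None.
    """
    try:
        ip = ipaddress.ip_address(ipaddr.strip())
    except ValueError:
        # As in the script: every result column says "Invalid IP Address".
        return {field: INVALID for field in FIELDS}
    details = lookup(str(ip))
    return {field: details.get(field) for field in FIELDS}
```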
NOTE: This script is not guaranteed to return latitude and longitude details for every valid IP address, nor is it guaranteed to return these geographic details completely or accurately.

Inputs to the Script:
1) ID: A column that uniquely identifies the IP address. This column is needed to join the result set returned by R back to the dataset in DV.
2) ipaddr: IP address. Both IPv4 and IPv6 formats are accepted.

Optional Inputs:
proxy_url: If your network requires a proxy (for example, because of a firewall or VPN), specify the proxy URL including the port number.
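What proxy_url does is route the web-service calls through a proxy. As an illustration only, here is how the same idea looks with Python's standard urllib; the proxy host in the test is a placeholder, and this is not how the R script itself is implemented:

```python
import urllib.request
from typing import Optional


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP(S) requests through proxy_url if given.

    proxy_url should include the port, e.g. "http://proxy.example.com:8080".
    Without it, urllib's default behavior (environment proxies) applies.
    """
    if not proxy_url:
        return urllib.request.build_opener()
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```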

Outputs of the Script:
1) Geographic details: This script returns the geographic details of the location of this IP address. Geographic details include city, state/county/region, country, zip code, time zone and latitude/longitude.
2) Located: If the IP Address couldn't be located, the script returns the column "located" with value "N". If the IP Address is successfully located it returns "Y".
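Because the ID column is what lets DV join R's result set back to the original dataset, every output row should carry it. A Python sketch of assembling output rows with the located flag; the dictionary layout and the latitude/longitude key names are assumptions of this sketch:

```python
def build_output(inputs, geolocated):
    """Attach geographic details and a located flag to each (ID, ipaddr) row.

    inputs:     list of (id, ipaddr) pairs, as passed to the script
    geolocated: dict id -> details dict; latitude/longitude are numbers
                when the address was found
    """
    out = []
    for row_id, ipaddr in inputs:
        details = dict(geolocated.get(row_id, {}))
        lat, lon = details.get("latitude"), details.get("longitude")
        # Only numeric coordinates count as located; "Invalid IP Address"
        # strings or missing values give "N".
        found = isinstance(lat, (int, float)) and isinstance(lon, (int, float))
        details.update({"ID": row_id, "ipaddr": ipaddr, "located": "Y" if found else "N"})
        out.append(details)
    return out
```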

Steps to deploy this R-Script plugin in your local OracleDV:
1) Install the Advanced Analytics feature in Oracle DV by clicking the icon below. This installs the Oracle R deployment. Alternatively, you can install Advanced Analytics by running install_advanced_analytics.cmd, located in <DV_INSTALL_DIRECTORY>.

2) If the rjson R package is not already installed, install it as follows:
    Open the R console (double-click Rgui.exe in <Advanced_Analytics_Install_Dir>\bin\x64) and
    install the rjson package.
    Following are the R commands to install:
     Set Proxy:
        $ Sys.setenv(http_proxy="http://<your_proxy_host>:<port>")
           set proxy appropriate to your network settings.
     Install Package:
        $ install.packages("rjson")
3) Download the plugin from the Oracle BI Public Store and unzip it.
4) Copy R.GeoLocateIPAddress.xml to <DV_INSTALL_DIRECTORY>\OracleBI1\bifoundation\advanced_analytics\script_repository
5) Download and deploy Heat Map custom visualization plugin from Oracle BI Public Store. Instructions to deploy this Custom Viz plugin are described in the Public store.
6) Import the .dva project to Oracle DV. Password for the .dva file is Admin123
Use the optional proxy_url parameter in EVALUATE_SCRIPT only if your network requires a proxy.