Wednesday, January 18, 2017

Advanced Analytics: Fraud Detection Example Using Benford's Law

In this post we walk through an R-based example that performs fraud detection using Benford's law on Oracle DV Desktop. The example also highlights Oracle DV's ability to consume multiple distinct tabular results and visual charts returned from a single invocation of an R script. The example can be downloaded from the Oracle BI Public Store.

What does this script do: The script takes financial data that includes financial amounts along with one or two identifiers, applies Benford's law, and returns suspicious transactions. Any other data that satisfies the characteristics of Benford's law, described below, can be used as well.

It also returns metrics and plots that compare the distribution expected under Benford's law with the distribution actually observed. These plots are displayed in DV Desktop using the R Viz(base64Image) plugin, which can be downloaded from the Oracle BI Public Store.

The metrics contain the expected probability, actual probability, distribution frequency and difference details, along with data summation details. For more details on these metrics, please refer to the documentation of the benford.analysis package. This example uses the benford.analysis R package, which can be downloaded from the CRAN repository.

Here is what your DV Desktop will look like after deploying this example:

What is Benford's Law?

Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets which obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit less than 5% of the time. By contrast, if the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes (different) predictions about the distribution of second digits, third digits, digit combinations, and so on.

Benford's law usually holds for data with the following characteristics:
  • Data with values that are formed through a mathematical combination of numbers from several distributions. 
  • Data that has a wide variety in the number of figures e.g. data with plenty of values in the hundreds, thousands, tens of thousands etc.
  • The data set is fairly large.
  • Asymmetric distribution of data around the mean/median, with a large right skew
  • No predefined Maximum/Minimum except for 0 as minimum
Benford's law applies irrespective of the scale of the data. More information and experiments on its applicability across multiple scales can be found on the DataGenetics blog.
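
The percentages quoted above follow from the standard statement of Benford's law, P(d) = log10(1 + 1/d) for a leading digit d. As a minimal sketch in base R (no packages assumed; the variable names are illustrative), the expected probabilities for one and two leading digits can be computed like this:

```r
# Expected Benford probabilities for the first digit (1..9)
first_digit <- 1:9
expected <- log10(1 + 1 / first_digit)
names(expected) <- first_digit

# Digit 1 comes out at about 0.301 and digit 9 at about 0.046
round(expected, 3)

# The same formula covers the first two digits taken together (10..99)
first_two <- 10:99
expected_two <- log10(1 + 1 / first_two)
names(expected_two) <- first_two

# Each set of probabilities sums to 1 (up to floating-point error)
sum(expected)
sum(expected_two)
```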

Accounting Fraud detection
Benford's law can be used to analyze financial data and spot possible red flags. If the observed digit distribution looks nothing like the distribution predicted by Benford's law, the data may have been manipulated. Financial data here includes accounts receivable, accounts payable, sales and expenses data.
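
To make the idea concrete, here is a small base R sketch that compares observed first-digit frequencies with Benford's prediction. The data is synthetic and entirely made up for illustration (a right-skewed sample plus a cluster of amounts just under a hypothetical 5,000 approval limit); it is not the data set used in this example, and `leading_digit` is an illustrative helper rather than part of any package.

```r
set.seed(42)  # reproducible synthetic data, for illustration only

# Leading significant digit of each positive number, read from scientific notation
leading_digit <- function(x) {
  as.integer(substr(sprintf("%.8e", x), 1, 1))
}

# Hypothetical payments: a wide-ranging, right-skewed sample ...
natural <- rlnorm(5000, meanlog = 6, sdlog = 2)
# ... plus 300 made-up entries clustered just under a 5,000 approval limit
padded <- runif(300, min = 4900, max = 4999)
amounts <- c(natural, padded)

# Observed share of each first digit versus the Benford expectation
digits <- factor(leading_digit(amounts), levels = 1:9)
observed <- as.numeric(prop.table(table(digits)))
expected <- log10(1 + 1 / (1:9))

comparison <- data.frame(
  digit = 1:9,
  expected = round(expected, 3),
  observed = round(observed, 3),
  difference = round(observed - expected, 3)
)
print(comparison)

# The digit with the largest excess over the prediction is the red flag;
# with this synthetic data it is the padded digit, 4
flagged <- comparison$digit[which.max(comparison$difference)]
flagged
```

A large excess on one digit is only a prompt for a closer look at the underlying transactions, not proof of manipulation.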

How does the script work on Oracle DV Desktop

Inputs: This script takes a payment amount (in dollars) along with one or more identifiers/details. In this example we pass in the Vendor Number (Identifier1), Invoice Number (Identifier2) and the corporate payment amount as inputs to the script. The script can also be used to perform fraud detection on other statistical data, such as census and other survey data, that has the characteristics discussed above.

Optional Inputs: num_of_digits: The number of leading digits to analyze.
                        TopPercent    : The top N percent of suspicious entries to return.

Output: This R script returns 3 sets of results/information. They are:
1) Columns Identifier1, Identifier2 and Suspicious Amounts return the top N% of suspicious transactions.
2) image* columns return the R plots as base64-encoded images. The R Viz(base64image) custom visualization plugin parses these base64-encoded strings and displays the images on the DV Desktop canvas.
3) Columns from Digits to Metrics return metrics such as distribution frequency for each leading digit (or digit combination).
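
The contents of the plugin's R script are not reproduced in this post, so the following is only a sketch of how the documented functions of the benford.analysis package can produce comparable results. It uses the corporate.payment sample data bundled with the package. Mapping num_of_digits onto the package's number.of.digits argument is an assumption, and TopPercent has no direct equivalent here: getSuspects selects rows by the number of most-deviating digit groups (how.many) rather than by a percentage.

```r
library(benford.analysis)

data(corporate.payment)   # sample data shipped with the package

num_of_digits <- 2        # assumed to correspond to the plugin's optional input

# Fit the Benford model on the payment amounts
bfd <- benford(corporate.payment$Amount, number.of.digits = num_of_digits)

# 1) Suspicious transactions: rows whose leading digits deviate the most
suspects <- getSuspects(bfd, corporate.payment,
                        by = "absolute.diff", how.many = 2)
head(suspects)

# 2) Diagnostic plots, including expected vs. observed digit distribution
plot(bfd)

# 3) Per-digit metrics: observed and expected distributions, frequencies, differences
metrics <- getBfd(bfd)
head(metrics)
```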

Please note that this R script returns all three sets of results/information in a single data frame, and Oracle DV simultaneously displays the distinct tabular results and image results returned by that single R script.
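
How the script lays out these results inside one data frame is not described in this post. One plausible approach, shown here purely as an assumption, is to pad each result set with NA to a common row count and bind the columns side by side. All values, the `pad` helper and the placeholder image string below are hypothetical.

```r
# Hypothetical result sets of different lengths
suspects <- data.frame(
  Identifier1 = c("V001", "V002", "V003"),
  Identifier2 = c("INV-10", "INV-11", "INV-12"),
  SuspiciousAmount = c(4950.00, 4980.50, 4999.99),
  stringsAsFactors = FALSE
)
metrics <- data.frame(
  Digits = 1:9,
  ExpectedProb = round(log10(1 + 1 / (1:9)), 3)
)
image1 <- "<base64-encoded PNG string>"  # placeholder, not a real image

# Pad a vector with NA up to n elements
pad <- function(x, n) c(x, rep(NA, n - length(x)))

# Common row count is the longest of the three result sets
n <- max(nrow(suspects), nrow(metrics), length(image1))

# Bind everything column-wise into a single data frame
combined <- data.frame(
  lapply(suspects, pad, n = n),
  image1 = pad(image1, n),
  lapply(metrics, pad, n = n),
  stringsAsFactors = FALSE
)
str(combined)
```

Under this layout, each visualization would read only the columns it needs and ignore the NA padding.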

Steps to deploy this R-Script plugin in your local Oracle DV:

1) Install the Advanced Analytics feature in Oracle DV by clicking on the icon below. This installs the Oracle R deployment. Alternatively, you can install Advanced Analytics by running install_advanced_analytics.cmd, located in <DV_INSTALL_DIRECTORY>.

2) If benford.analysis is not already installed, install it using the following instructions:
    Open the R console (double-click Rgui.exe in <Advanced_Analytics_Install_Dir>\bin\x64) and
    install the benford.analysis package.
    The R commands to run are:
     Set Proxy (only if needed; use values appropriate to your network settings):
        > Sys.setenv(http_proxy="http://<your_proxy_host>:<port>")
     Install Package:
        > install.packages("benford.analysis")
3) Download the example from the Oracle BI Public Store and unzip it.
4) Copy R.BenfordFraudDetection.xml to <DV_INSTALL_DIRECTORY>\OracleBI1\bifoundation\advanced_analytics\script_repository
5) Download R Viz(Base64Image) custom visualization plugin from Oracle BI Public Store. Instructions to deploy this Custom Viz plugin are described in the Public store.
6) Import the .dva project to Oracle DV. Password for the .dva file is Admin123
