Monday, December 19, 2016

Advanced Analytics: R Term Frequency Analysis on OracleDV

In this blog we discuss how to perform Term Frequency Analysis on Oracle DV using R. What is Term Frequency Analysis (TFA)? TFA is a technique that takes textual data as input and counts how many times each word occurs in it. Does it count the frequency of every distinct word in the text? Yes, it does. But it also gives users the option to filter out common words that add little meaning (such as "and" and "like"); these are called stop words. Filtering out stop words makes the analysis more meaningful. TFA has many applications; the most common are analysing the quality of web pages, identifying key highlights in online reviews and posts, and gauging the popularity of a particular brand or product in social media posts.
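To make the idea concrete, here is a minimal sketch of term counting with stop-word filtering. It uses base R only, so it runs without extra packages; it is not the script shipped in the cartridge. The sample text, the short stop-word list and the function name term_frequency are all assumptions made for illustration.

```r
# Hypothetical sample text; any character vector works.
reviews <- c(
  "The battery life is great and the screen is great",
  "Battery drains fast, but the screen is sharp"
)

# Small illustrative stop-word list (an assumption, not the cartridge's list).
stop_words <- c("the", "is", "and", "but", "a", "of")

term_frequency <- function(text, stop_words = character(0)) {
  # Lower-case, turn everything that is not a letter or space into a space,
  # then split on runs of whitespace.
  cleaned <- gsub("[^a-z ]", " ", tolower(text))
  words <- unlist(strsplit(cleaned, "\\s+"))
  # Drop empty strings and stop words.
  words <- words[nzchar(words) & !(words %in% stop_words)]
  # Count occurrences and order by descending frequency.
  counts <- sort(table(words), decreasing = TRUE)
  data.frame(term = as.character(names(counts)),
             frequency = as.integer(counts),
             stringsAsFactors = FALSE)
}

tf <- term_frequency(reviews, stop_words)
print(tf)
```

The result is a two-column table of terms and frequencies, the same shape of output the post describes. Calling term_frequency(reviews) without a stop-word list shows how much words such as "the" and "is" would otherwise dominate the counts.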

Oracle DV supports R integration and allows users to run custom R scripts. Term Frequency Analysis is implemented as a custom R script, and the Term Frequency Analysis R cartridge is easy to deploy on your DV: download it from the OracleBI Public store, deploy it, and start your analysis. You can also use the R script in Dataflows to analyse textual data as part of the flow. The R script produces tabular output containing each word and its frequency. This output can be saved as a dataset, exported to Excel, or used as part of a Dataflow.

Below are the steps to deploy:

1) Install the Advanced Analytics feature in Oracle DV by clicking the icon below. This installs Oracle R. Alternatively, you can install Advanced Analytics by running install_advanced_analytics.cmd, which is located in <DV_INSTALL_DIRECTORY>



2) If the "tm" R package is not already installed, install it as follows:
     Open the R console (double-click Rgui.exe in <Advanced_Analytics_Install_Dir>\bin\x64)
     and install the tm package.
     The R commands are:
     Set the proxy (use the value appropriate to your network configuration):
        > Sys.setenv(http_proxy="<your_proxy_host>:<port_number>")
     Install the package:
        > install.packages("tm")
3) Download Term_Frequency_Analysis_V1.zip from the OracleBI Public store and unzip it.
4) Copy R.TermFrequency.xml to <DV_INSTALL_DIRECTORY>\OracleBI1\bifoundation\advanced_analytics\script_repository
5) Import the .dva project into Oracle DV. The password for the .dva file is Admin123
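Before running the project, it can help to confirm that the "tm" package from step 2 is visible to the R installation that Oracle DV uses. The following is a small sketch to run in the same R console; it only checks for the package and does not install anything.

```r
# Check whether the "tm" package can be loaded by this R installation.
pkg <- "tm"
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  message(pkg, " ", as.character(packageVersion(pkg)), " is available")
} else {
  message(pkg, " is not installed; run install.packages(\"tm\") as in step 2")
}
```

If the message reports that the package is missing even after installation, the package may have been installed into a different R library than the one this console uses; .libPaths() lists the library locations being searched.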

Here is a snapshot:


6 comments:

mig said...

can you add url to download "Advanced Analytics feature in Oracle DV" from step 1? -Thanks

Daan Bakboord said...

You can install this feature from the Windows Start Menu. It's in the DVD startup folder.

Cheers,

Daan Bakboord
http://www.daanalytics.nl

oraclebitechdemo said...

Alternatively you can install Advanced Analytics by running install_advanced_analytics.cmd present in

updated the same in the blog. Thank you for raising the question.

Sulakshana said...

I tried to add a filter to the terms so that I could exclude some common words like "well", etc. But the filter does not work on this calculated item. Any suggestions, please?

rmouniak said...

So awesome blog Thanks for sharing
Oracle SOA Online Training

Unknown said...

I've tried with latest DVD Version 12.2.5.0, installed DVML and ran install_advanced_analytics.cmd
But the data flow doesn't run. I'm getting this error:
Step |j| Execution failed. Status: FAILED. Message: Error Message: 'DataFrame' object has no attribute 'sort'
Traceback (most recent call last):
File "C:\Users\<>\AppData\Local\Temp\DVDesktop\servers\obis1\tmp\nQS_PAF_19808_3_48893272.TMP", line 86, in
return_dat = obi_execute_script(dat, columnMetadata, args)
File "C:\Users\<>\AppData\Local\Temp\DVDesktop\servers\obis1\tmp\nQS_PAF_19808_3_48893272.TMP", line 57, in obi_execute_script
df = df.sort('frequency', ascending=False)
File "C:\Program Files\DVMLRuntime\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'sort'
[nQSError: 43224] The Dataflow "Claims Top Terms" failed during the execution.
[nQSError: 43204] Asynchronous Job Manager failed to execute the asynchronous job.

Do you have a clue why?
