Friday, September 27, 2019

Calculating Period to Period Growth in OAC Without RPD Modelling

Being able to calculate Period to Period growth and growth rates is a common need in most analytics insights today. OAC provides a robust semantic layer capability to do this with the OAC Metadata Repository where time hierarchies and rich time series calculations can be defined and modeled. But most often in self-service-analytics context, we quickly upload a file-based dataset and don't take time to model a metadata layer on it. We directly run visualizations on the data, and we soon need to start defining calculations like time series, growth rates etc... to extract deeper insights.

How to calculate Period to Period with an Oracle DV datasource and no repository ?


This blog highlights a simple technique to build any period to period calculations without building an RPD model with OAC. There are several ways this can be achieved, this entry explains a direct way of running live calculations using a data-source join preparation. Other upcoming blog entries will detail other alternative techniques for similar calculations.

This 6 minute video goes through the process of defining this, below are some illustrated steps as well.



Our data is an excel dataset with a list of 10K orders over several years. Each order has a date and several attributes and metrics. One of the many attribute is the Product Category (3 distinct values), and another is Customer Segment (4 distinct values). We do need to calculate Y/Y growth of monthly sales by Customer Segment by Product Category.
The calculation needed here must be able to properly aggregate Sales it by these attributes (Month, Segment, Category), call the same value for last year, and then compile the growth rate. As we have no repository on our xls upload, we have no semantic Time Hierarchy availalbe. So we cannot leverage the Time Series specific calculations in OAC.

To achieve this dynamic calculation in OAC DV, we can use a simple technique : duplicating the initial dataset in Oracle DV, and joining the two duplicate sources with a Period <---> (Period -1) join. If we do this, the initial dataset will still show actual period in the project, and the duplicated - joined dataset will provide the data for the previous period, at the right aggregated level.

First, let's start by duplicating the original dataset in DV. Note that this does not duplicate the source file or source table data, it only creates another entry in OAC pointing to the same source data.

Once dataset is duplicated, we need to create a custom 'join-key' column in each dataset which we will use to join the two datasets. That happens in the data preparation tab of DV. The column we create in our case is a 'month-key' column. in the original dataset it's simply a month sequence showing YEAR*100+MONTH_NUMBER. For example a date like 21-Jan-2012 will show a month key of 201201

So we create this column in both datasets, but the trick is that in one of the datasets we add or substract 1 to the year number (depending on which dataset). In my case, I will use the duplicated dataset to show year ago data, so I create a column in this dataset with a '+1' in the year number. I could just as well have created a 'Year-1' column in the other dataset.

That way, when we join both datasets via this month-key, records with for example year 2013 in the original dataset will join to the records with year 2012 in the Yago dataset : 2013 = 2012 + 1. So now we join the datasets in DV data prepare :

Note that we join on the month key column, but we must also join on every other column that we want our analysis to potentially aggregate on : Customer Segment and Product Category in our case. If we don't join on these, our reports by Customer Segment and Product Category will not give detailed results.

So once this is done, we just have to build visualizations on the joined datasets. One more important point : by default visualizations will show a join type (Data Blending icon, on each viz properties tab) of All Rows joining with All Rows (full outer joins). In our case, we need to set this to 'Only Matching Rows' for the Yago dataset. This viz config will be needed for each viz we build using the two datasets:


With this, we can now build all sorts of visuals and calculations using the Yago Sales metric. Because we specifically joined on month-key,Customer Segment & Product Category, all these computation will aggregate correctly at the levels of Month & above (Quarter, Year), Customer Segment & above, and Product Category & above.
This technique can be used in a similar way in DV Data Flows, but in this case, datasets do not need to be duplicated. Also, default joining in Data Flows there is not of type full outer. That simplifies the various Vizs configuration process. An upcoming blog post will describe how to optimally use this technique in a Data Flow.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

2 comments:

snkr said...

Hello Team,

Great Article!!

I was trying to download Sentiment_Analysis_V1.zip, Term_Frequency_Analysis_V1.zip from OracleBI Public Store but it seems its is not available and moved to different location.

Could you please share the link where I can download these 2 projects...

Thank you very much in advance.

sailaja N said...
This comment has been removed by a blog administrator.

Post a Comment