Tuesday, February 4, 2020

Where to build my OAC custom calculations, in Projects or under Data-prep ?

Oracle Analytics allows any user to easily create custom calculations in data-visualization projects. It's a simple process to extend your analysis or data-sets with any calculations and aggregations you may need right while you are building your Vizs, irrespective of the type of underlying data. One can build a custom calculation on a DV Project directly, or also, build it directly in the data-set using the Data-prepare tab.

But precisely, since there are multiple places where custom calculations can be built, which one is the right one ? What is different between a custom calculation directly built in a DV project versus a custom calculation built in the Data Prep tab of OAC ? This blog entry is a short helper in understanding the main difference between these two. It should help users make the best choice on where to build their next custom calculation.

The flow of this blog entry is also illustrated in this short voiced video :





To illustrate the point, let's start with a simple example whose results may seem confusing. My data-set has order lines details, with sales value and quantity sold for each order line.

I built a custom calculation defined as Sales / Quantity twice : once in my project using Add Calculation menu, and once in my Data Prepare tab using the Create Column menu. Both these calculations are built with exactly identical formula shown on the picture.

Yet when add both calcs in the same visualization and compare the results, I am getting different values for each calculation...

See the two last columns in my result table below, once for each custom calculation :


Why is that ?????


At first sight, this is puzzling. How can the exact same calculation return different results based on where it's built in Oracle Analytics ? Which one is right and which one is wrong ?

The question behind this confusion really is : how is the sequence of aggregations & calculations happening in each case. Think about it, just like if in a spreadsheet file you computed the sum in a total column at the far right of your table, and then the divisions in the bottom line. If you did it vice versa, you may also get different results with the same formula...

  • In the case of the Project based calculation (first case here),

Oracle Analytics will first run the initial query to fetch raw data needed in the view, and return the aggregated results set. Then it will run the calculation. So for this example, it will first return sales and quantity by Product Category (3 rows) and only then it will calculate the division of Sales by Quantity ordered, for each of the 3 Product Category lines. That will give us what we could call a 'weighted Unit Price Average'.

  • In the case of the Data-prepare based calculation (second case here),

In this case, Oracle Analytics will do it the other way around : it will first execute the calculation at the data-set detail level. So, first, since our data-set is at Order-line level (each row in the data-set represents an order-line), Oracle Analytics will compute the division of sales by quantity distinctly for each Order-line.
Only once that is done, it will run the query to fulfill the data-viz and will then aggregate the result-set data for each Product Category. So, since in our example we set an aggregation rule of avg for our new Data-prep calculated Unit Price column, it will take all the granular unit prices for all order-lines in a given Product Category and return average of these. It will do so for each Product Category (3 rows). We could call this an 'un-weighted Unit Price Average' in this case.


The video above shows how these calculations can easily be manually verified in a spreadsheet.
Note that in the case of a Data-prep calculation, the result is exactly what we would have gotten if a Unit Price column had existed directly in the source data-set, for each order-line row (database table, csv file, etc). So in that sense, extending your  data-set with Data-prep new calculated columns is like extending it directly in your source file.

So, which calculation is correct then ?


... it depends.
It depends on what your analysis requires.

In most of the cases data visualizations require a level of calc-aggregation on initial data : by country, by employee, by year, etc.... The most representative value here is a weighted average, so that's produced using a Project-based custom calculation. This was our example. So in most of the cases, Project based calculations are the safest way to go, particularly when you are not sure of what you need. You can't be really wrong with it. Build your calculations in the Project custom calcs directly and be aware that if you want to persist them in the data-set for others to use, they may behave slightly differently if the calculations are later on aggregated in some vizs.

But in some cases, like binning for example, or running Explain, you need to keep a granular data in your calculation. And that is a data-prep level calculation. In our example, let's say that we also need to show sales by 'ranges' of unit prices. That is, bins of unit prices. Sum all sales where unit price is between 0 and x, then x and y, etc. This will work easily with the Data-prep calculation we have, we just need to drag it in the category axis of any chart and the binning will happen automatically. It will not work with the Project based calculation at all.

In conclusion, building custom calculations in a project or in a data-prep (data-set) are different things, potentially leading to different results. They each have a purpose within OAC and should be leveraged for appropriate needs. Use project build custom calculations in general as this will cover the most frequent need. Keep in mind Data-prep based calculation will help you achieve more advanced aggregation needs.

No comments:

Post a Comment