Monday, December 7, 2020

Easily Get a Unique ID Column in Your OA Dataset

It often happens to me that, while analyzing data, I end-up needing a unique identifier for every row in my dataset, but... it's just not present in there.... Doesn't that happen to you too sometimes ? Whether the underlying data is self uploaded , or is an enterprise semantic layer, sometimes simply we don't have the choice to go back to the physical data and add a unique row id for each record in the data. Yet there are cases where we badly need that info : for some specific visualizations (Box-plots, Scatter-plots, etc...), or to set an aggregation level for a calculation for example. 

How do you resolve this easily if you don't have access to the underlying physical data ? Here is a very simple way to address this with a small Oracle Analytics calculation, which is not intrusive to the underlying data-source : the 'Rowid()' logical function.

The video just below gives a quick tour of how to use Rowid()


There is an easy way to add a calculated row number in Oracle Analytics, just creating a project level custom calculation with a simple function : rowid(), as shown in the picture below. This function will return a sequential number assigned to every distinct row in the result-set.

This calculated column can then be used in any visualization, but it's only calculating a row number for the result-set distinct rows. For example, my dataset has 9000 rows, one for each order line, but when I query for total revenue by customer segment on this dataset, I only get 4 rows returned as the result of my query. The result set in my case has 4 distinct rows and the logical calculation we built here is only assigning a distinct number to each row in the result set, that's what we see here. 
But if I need to have a unique row id for every row in my dataset, no matter what attributes I am showing in my visualization, then I simply need to use Rowid() in the data prep pane :

The exact same function will work, but in this case, it will always be set at the level of the dataset detail and hence will return a unique ID for every row in the dataset. For instance, if I now build a viz with  the same attributes, but with the Data Prep level Rowid() calculation (table on the right) instead of the initial project calc (table on the left), I can see that every of the 9000 row ids from the dataset are returned when using the Data Prep level calc.

This is returning a row id exactly as if it was in the underlying physical dataset. So I can now use this prepared attribute in calculations or 'by clauses' aggregations. I can combine it into any other calculations just as if it was a real column. For example, I can use it as argument for the Detail grammar object in specific viz, like boxplot : 


In this boxplot viz, you can see on the left pane (viz grammar) that the Prep Row ID calculation is used in the Detail field. That means that every dot on the chart is a unique row from the dataset. I could not have achieve this specific viz at this level without a unique ID per column.

Thanks for your time reading this blog !




1 comment:

Michal Zima said...

I have tested rowid() function on OAS (not OAC). OAS is "aware" of this function, but when defined as project-level calculation, it does not have the "granularity" of final result set in visualization, but it is calculated at the granularity of data set rows (ID assigned to each row of data set, not assigned at the granularity of resulting data set for visualization). Is this working as expected or is it just something different in OAS than in OAC ?
Thanks and regards
Michal

Post a Comment