Thursday, February 25, 2021

How to Calculate Top N Versus Rest of Population in Oracle Analytics

Oracle Analytics enables users to easily filter visualizations or whole canvases by using Top N or Bottom N filters. These filters have several nice options such as starting from a Metric, or starting from an Attribute, or even setting the 'implicit grain' that will control what level of aggregation the Top N calculation will start from. These filters come extremely handy in most situations.

But Oracle Analytics also supports many other powerful calculation in this context. One of them, for example, allows to simply address : what is the total (or average) value of all the other individuals that are not in Top N... For example, how do sales of each of our top 10 offices in the world, compare with average offices sales of all the non top 10 ? 

This blog entry explains how to design this calculation in Oracle Analytics, and highlights techniques used for this, that will also be useful for other calculations. In particular, the ability to force an aggregation at a given 'implicit' level, or also the ability to set a second aggregation, at the visualization, after the Oracle Analytics engine first aggregation to provide initial query results. 

We will take the example here of Sales data in various cities across the world (133 cities). Our objective is to show a bar chart with 11 bars, with 10 bars showing respective sales total for top 10 cities in sales, and 11th bar showing the average of all remaining non top 10 cities (aka 123 remaining cities). For now, the data we have on our starting chart looks like this, I have simply high-lighted the top cities on the chart with my mouse



We can apply a filter on our viz or canvas, or if we build a calculation using TopN function like the one below : TOPN(Sales,10)

but as we do this, the chart will show only be filtered top 10 cities only :


What we need is the same as the chart below, but with an extra 11th bar showing the average sales value by city, for all the non top 10 cities. To achieve this, we will use a combination of Rank(), Attribute() and case when function in OAC.

First, let's calculate the plain rank of each city, based on it Sales value. We use Rank(Sales) for this: Rank(Sales). This simple ranking will compute the rank of every row we have in the results of a report, and will return a number (a metric). So, we can try to use this formula now : 

case   when Rank(Sales)<11 then City    else 'All Other Cities'    end

This calculation works, when we add it to our chart, we can see proper coloring of top 10 cities vs all the rest of the cities


But now, we need to aggregate all the blue cities into a single bar, the 11th bar. 

  • If we simply drag the new calculated attribute into Category field, and remove the city column from the viz, it will not render the chart we expect. In this case, the expression will evaluate Rank(Sales) as rank of total sales - that is, 1 total value, not by city, so it will return a rank value of 1. The condition will be fulfilled : 1<10 so the expression will return all the cities. Not what we want.
  • if we drag the calculated attribute in other viz grammar fields, the chart will error out (logical explanation for this, but too long to expose here)
What we need to do, is to force the Rank calculation to happen at the city level : aggregate sales by city, then rank() each, then apply the case when formula. This must happen, even if city column is not part of the viz grammar. The Attribute function in Oracle Analytics allows to force this leveling in the calculation :

case  when 

Attribute(  Rank(Sales)   by City)   <11 then City    

else 'All Other Cities'    

end



When using this formula in the chart, it works, we are getting closer to what we want. Now we only see 11 bars, with the 11th blue one aggregating all the cities that are not in the Top 10 sales :

But one problem remains, since there are 123 cities in the non top 10, the sum of all the sales for these is bigger than the individual top-ers. What we need is to show the average value by city, for the 'All Other Cities' bar. That is simple to achieve using the Viz properties in Oracle Analytics, and does not require any calculation.

Click on the bar chart itself to select that viz, then go to the bottom left of the screen in the Viz Properties fort that Viz, and there, click on the Metrics Tab (# sign). This tab offers various options to tune for the metrics used in the viz. In our case, Sales is the only metric used here. At the bottom of the option list for Sales Metric, there is an option for Aggregation Method. That is a very important option, as it defines the aggregation that will happen at the viz level, after the Oracle Analytics engine has already returned computed results from the query. So that's a second level of 'browser aggregation' that we can make happen here for that viz. Let's click it and select Average instead of Auto.

But that's only half of it, as we click that agg rule, the chart does not yet change. Oracle Analytics is applying this aggregation only to the results set returned by the engine. There are 11 rows returned in the results set here, all we have done is to switch, for each row, from sum of a single value in the row to average of a single value in the row. That's making no difference for now. What we want, is to force the viz average aggregation to happen at the city level, not just the result set level (11 rows). Even if city is not part of the query here. 

Easy to do : there is a By property just below the aggregation one we used, we simply need to click it, and navigate the wizard to select the city column in our dataset. That will force the Viz Average aggregation to happen 'By City' which is what we want.


As we click this, now the chart changes and really shows the insight we needed, with average value of sales by city, for the 123 cities in the 11th bar. We can see how this compares visually


This technique would apply the same as well if we had picked a different object than City for the aggregation by. If we had picked customer for example, the chart would have shown us : for the top 10 cities and for the rest of all grouped the non top 10, what is the respective average revenue by customer
As you can see in the picture below, the cities shown are the same (still the top 10 revenue cities, plus the 'All Others'), but the values show and hence the sorting are very different. It's a different, powerful analysis shown :


This technique applies to all vizs where metrics are used. The two important take overs in this process were 
  • the fact that we built a calculation which was forced at a level of detail by using the Attribute() function in Oracle Analytics. 
  • we then also cumulated a second sequential aggregation rule, at the viz level, that allowed us to also be set to happen implicitly at the level of our choice (city, customer, etc).
These two are very powerful capabilities for easily designing all sorts of level-based analysis in Oracle Analytics. This video walks through a live example of the process (& more) what we just followed in this blog, if you are interested.


Thanks for reading this blog

No comments:

Post a Comment