Saturday, February 27, 2021

Multiple Top N Filters – Part 1 of 2

Top N analysis is a useful technique and helps in focusing on a small ordered set of data items from a large dataset.  They help in answering queries like 'What are the Top 5 products that have had maximum Sales' or ' Who are the top 3 agents who sold maximum policies' and so on. Oracle Analytics offers Top Bottom N filters that allows you to easily perform Top N or Bottom N analysis.

Please take a look at this brief video about Top Bottom N feature in Oracle Analytics.

In Oracle Analytics, Top N filters can be applied on metrics and attributes and help in restricting the number of rows retrieved in the dataset. While it is easy to understand the results when we apply a single Top N filter, what exactly happens when we apply more than one Top N filter? When we apply two Top N filters, does filter 1 get applied first and filter 2 gets applied on the results of filter 1 or vice versa? Does the sequence of filtering depend on the position of filters in the canvas? These are some common questions we encounter when we think of multiple filters. In this blog, let’s understand what exactly happens when you apply multiple Top/Bottom N filters.

Let’s use a Student Analytics dataset containing academic details of students like degree, major, institution, course, grades, faculties etc.

Let’s say we want to apply two different Top N filters on this dataset

  • Filter 1: Top 10 Major by Student Count
  • Filter 2: Top 5 Degree by Student Count

Let’s consider these two filters as two circles in the following Venn diagram.

Circle A represents the result of Filter 1 and the circle B represents result of Filter 2. When Filter 1 alone is applied, circle A is shaded representing the output of the filter.

Similarly, when Filter 2 alone is applied, circle B represents the output.

When both the filters are applied, the output is represented as the intersection of A and B.

Let's use this explanation and understand how multiple filters work in Oracle Analytics. First we create a project using this dataset and add three tables as follows:

  1. Table 1: Student Count by Major
  2. Table 2: Student Count by Degree
  3. Table 3: Student Count by Major and Degree
    

Unfiltered Output


In the above picture, Table 1 shows Student Count by Major, Table 2 shows Student Count by Degree and Table 3 at the bottom shows Student Count by Major and Degree with Major as rows and Degree as columns. On the status bar at the bottom, we see that there are 119 rows and 27 columns in Table 3 before applying any filters. This indicates that there are 119 Majors and 27 Degrees in our compelte dataset.

Apply Filter 1 

Now, let’s first add Filter 1 to the canvas on the attribute column Major. Drag/drop Major to the filters section, make it a Top Bottom N filter type and specify Top 10 by Student Count. Following is how the filter definition looks.

Once this filter is applied to the canvas on all the three visualizations, we see the following results

Filter 1 Applied


Table 1 is filtered and only shows Top 10 Majors by Student Count. In Table 2, the Student Count value has reduced as a result of the filter. Table 3 at the bottom is also filtered and shows 10 rows and 14 columns. 

Apply Filter 2

Let’s disable Filter 1 and add a new Filter 2 as follows.

Now in Table 1, Student Count has reduced, Table 2 has Top 5 Degrees by Student Count and Table 3 at the bottom shows 65 rows and 5 columns.

Filter 2 Applied


Apply Filter 1 and Filter 2

Next, let’s enable both the Top N filters. When both the filters are applied, Table 1 shows only 7 rows, which is the result of Filter 1 and Filter 2 applied together. Table 2 shows only 4 rows instead of 5 as both the filters are applied and Table 3 shows 7 rows and 4 columns. This output represents the intersection in the Venn diagram that we saw above. In this case, filters are applied on all the three tables. 

Filter 1 and Filter 2 Applied


This implies that when multiple Top N filters are applied, they do not get applied on the dataset sequentially. Instead, the two filters get applied independently on the dataset and the intersection of the resulting two filtered datasets is what we see as the final output. The same logic can be applied for more than two Top Bottom N filters. Also note that changing the order of the Top Bottom N filters will not have any impact on the results. We will end up with the same result.

Thank you for reading this blog !

No comments:

Post a Comment