Wednesday, April 24, 2019

How to perform incremental data loads in Oracle Analytics Cloud

Oracle Analytics Cloud offers the ability to create data flows and perform incremental loads into a target table. A data flow can be set to operate only on the incremental data that becomes available in the source between the previous run and the current one.

In this blog, let's see how to perform an incremental load on a database table. This video gives a sense of the whole experience; the blog shares a few more details as well.


Prerequisites for performing incremental loads

1) Incremental loads can be performed only on data sets where both the source and target tables are database tables. 
2) If there are multiple data sets in a data flow, then only one of the sources can be set for an incremental update.

Defining a source and new data identifier for the source

The first step is to have a source table and identify a column that can be used to detect new data in that table. In this example, I have a revenue fact table with the month key as the new data identifier.


The next step is to create the Oracle Analytics data source pointing to this table. While creating the data source, make sure you mark the new data identifier column by clicking the third node in the process flow. This is an important step, as this column defines how the system will identify new rows in the data set.



Define a data flow

Now that our data source is created, let's define a data flow by importing the revenue fact table from the source connection. The key here is to check the "Add new data only" box to ensure that the source table is marked for incremental load.


To make my data flow a bit more representative of a real business case, I will add an example transformation: converting currency values. Let's bring in a spreadsheet that has exchange rates for every month, join it on the month key column, and add a new calculation to convert the revenue. 
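To picture what this step does to the data, here is a minimal pandas sketch of the join-and-convert logic. The column names (MONTH_KEY, EUR_TO_USD, REVENUE_USD) and the conversion direction are made-up assumptions for illustration; the actual columns depend on your spreadsheet.

```python
import pandas as pd

# Hypothetical revenue rows and monthly exchange rates, joined on the month key.
revenue = pd.DataFrame({"MONTH_KEY": [201811, 201812], "REVENUE": [1500.0, 900.0]})
rates   = pd.DataFrame({"MONTH_KEY": [201811, 201812], "EUR_TO_USD": [1.14, 1.13]})

converted = revenue.merge(rates, on="MONTH_KEY")
converted["REVENUE_USD"] = converted["REVENUE"] * converted["EUR_TO_USD"]
print(converted)
```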

Finally, let's select the 'Save Data' step and specify a name for the resulting data set. Make sure you choose a database connection as the target and specify the table name where the result set needs to be saved. There are two options available in the "When Run" drop-down: 
  1. "Replace existing data": the data flow truncates the target table and reloads all the records. 
  2. "Add new data to existing data": the data flow keeps the existing records intact and loads only the new records from the source table. New records are identified by the column we defined in the source data set above.
Let's set the When Run option to "Add new data to existing data" and save the data flow (a conceptual sketch of the difference between the two options follows below).
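Here is a minimal Python sketch contrasting the two options. It is purely conceptual: the function names, the MONTH_KEY column, and the idea of a remembered "last loaded" marker are illustrative assumptions, not OAC internals.

```python
import pandas as pd

def replace_existing_data(source: pd.DataFrame) -> pd.DataFrame:
    # "Replace existing data": drop whatever is in the target and reload every source row.
    return source.copy()

def add_new_data_to_existing(source: pd.DataFrame, target: pd.DataFrame,
                             last_loaded_key: int, id_col: str = "MONTH_KEY") -> pd.DataFrame:
    # "Add new data to existing data": keep the target rows and append only the
    # source rows whose identifier is newer than what was loaded before.
    new_rows = source[source[id_col] > last_loaded_key]
    return pd.concat([target, new_rows], ignore_index=True)

# Hypothetical source: 12 months of 2018 revenue.
source = pd.DataFrame({"MONTH_KEY": list(range(201801, 201813)), "REVENUE": [100.0] * 12})

# First run: nothing has been loaded yet, so the incremental option inserts all 12 months.
empty_target = pd.DataFrame(columns=source.columns)
first_run = add_new_data_to_existing(source, empty_target, last_loaded_key=0)
print(len(first_run))  # 12
```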

Now, let's run the data flow for the first time. As it completes, we can see in the underlying database that the target table has been created. Since this was the first run of the data flow, all the records in the source table were inserted into the target table. 



Now, for the purpose of this blog, let's go and delete a month of data (201812) from our newly created target table. After doing this, our source table still has its 12 months of data (Jan to Dec), but our target table now has only 11 months; it is missing December. Notice that we did not change any data in our source table, so there are no new records there since our last run of the data flow.

So, when we run the data flow a second time, the target table does not get incremented at all. The data flow was set to bring across only new data, and since there is no new data in the source, nothing changes in the target table: the deleted month's data is still missing. If the data flow had been set to a full load, all the data would be back in the target.
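This behavior is consistent with the data flow remembering which identifier values it has already processed, rather than re-inspecting the target table. The hypothetical sketch below mimics the observed behavior with the same made-up MONTH_KEY data as above; it makes no claim about how OAC actually tracks incremental state internally.

```python
import pandas as pd

# Hypothetical state after the first run: all 12 months are in the target,
# and the flow remembers that it has processed keys up to 201812.
source = pd.DataFrame({"MONTH_KEY": list(range(201801, 201813)), "REVENUE": [100.0] * 12})
target = source.copy()
last_loaded_key = int(source["MONTH_KEY"].max())  # 201812

# Manually delete December (201812) from the target, as done in the blog example.
target = target[target["MONTH_KEY"] != 201812]

# Second incremental run: no source row is newer than the remembered marker,
# so nothing is appended and December stays missing from the target.
new_rows = source[source["MONTH_KEY"] > last_loaded_key]
target = pd.concat([target, new_rows], ignore_index=True)
print(sorted(target["MONTH_KEY"]))  # 201801 .. 201811, no 201812
```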



Now, to complete the test, let's manually load our source table with 3 more months of data. This represents some incremental data. Then let's re-run the data flow once again. We find that the target table has been incremented with the 3 new months of data coming from the source table. But notice that the target table is still missing data for the month whose records were deleted:
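Continuing the same hypothetical model, only the three new months qualify as new data, while the deleted month stays missing:

```python
import pandas as pd

# Hypothetical state: the target is missing December, and three new months
# (201901-201903) have just been loaded into the source.
source = pd.DataFrame({
    "MONTH_KEY": list(range(201801, 201813)) + [201901, 201902, 201903],
    "REVENUE":   [100.0] * 15,
})
target = source[source["MONTH_KEY"] <= 201811].copy()  # December was deleted earlier
last_loaded_key = 201812                               # marker from the previous runs

# Incremental run: only the three new months are newer than the marker.
new_rows = source[source["MONTH_KEY"] > last_loaded_key]
target = pd.concat([target, new_rows], ignore_index=True)
print(sorted(target["MONTH_KEY"]))  # 201801..201811 plus 201901..201903; 201812 still missing
```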

Just to confirm, for the purpose of this blog: if we go back to the data flow definition and set the When Run option to "Replace existing data" for the target table, then when we run the data flow all the data gets loaded, including the deleted month's data.
 



Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Tuesday, April 9, 2019

DV Aggregation: Metric Level vs Visualization Level



Oracle DV supports different aggregation rules for any metric, such as Sum, Average, Minimum, and Maximum. The rule defines how the metric is aggregated when queries return data at an aggregated level (for example, sum of sales by customer when the source data is at the sales-order level).
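As a tiny illustration with made-up data, here is what that example looks like when rolled up in pandas; swapping the aggregation rule changes the result of the same query:

```python
import pandas as pd

# Source rows are at the sales-order level; the aggregation rule decides how they roll up.
orders = pd.DataFrame({
    "CUSTOMER": ["Acme", "Acme", "Globex"],
    "SALES":    [120.0,  80.0,   200.0],
})

print(orders.groupby("CUSTOMER")["SALES"].sum())   # Sum rule:     Acme 200.0, Globex 200.0
print(orders.groupby("CUSTOMER")["SALES"].mean())  # Average rule: Acme 100.0, Globex 200.0
```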

A metric's aggregation rule can be edited in several places while building an analysis in OAC. It can be changed at the metric property level, via different dialog boxes, or it can be changed in a specific visualization, for that Viz only. If you have ever wondered about the difference in behavior between these two main scenarios, let's dig deeper and find out. 

1) Setting aggregation at a Metric level

This can be done in two ways: in the Prepare tab or in the Visualize tab when building a DV project. The resulting behavior is the same in both cases: it sets the default aggregation rule for the metric in this specific dataset. This establishes the default aggregation rule for any project using this dataset column, not just for the project we are working on.

Setting it directly from the Prepare Tab:

To set a metric aggregation in the Prepare tab, click the metric and, in the properties pane, go to the General tab and change the aggregation to the desired option.


Once you apply this change, the new aggregation method applies to all the visualizations that use this metric across different DV projects. For example, if we set the aggregation of the Sales metric to Sum, every time this metric is used in a viz, the value is computed using Sum aggregation. Let's look at Sales by Product Category just below:


So now that the property is set for this metric at the dataset level, every project or visualization using this metric will apply the new aggregation rule. To check this, we can inspect the dataset from the dataset pane and look at its elements: the metric aggregation is now set to Sum.


Setting the aggregation rule from the DV Canvas (Visualize tab):

Another option to set the same level of aggregation is to change it directly from a metric column in the DV Canvas (the Visualize tab, as opposed to the Prepare tab). To set a metric aggregation from there, simply click on the metric among the data elements and change the aggregation rule in the properties pane (bottom left pane).


The scope here is the same as setting it in the Prepare tab: it becomes the default aggregation for that metric and is used every time the metric value is computed, in any project.

2) Setting aggregation at a Viz Level

Metric aggregation can also be set for a single visualization in a given DV project only. In this case, the setting is specific to the visualization and overrides the default aggregation rule set at the dataset level. This can be done by clicking on any given Viz that exists on your canvas, going to the Values sub-tab within the Properties pane (the # sign) and setting the Aggregation method there. If a Viz has multiple metrics, each metric can have a different aggregation method. Changing the rule there only applies to the Viz that you edited and does not impact the dataset. So any other viz using the same metric, in this project or any other project, will keep the original aggregation rule.

At first sight, changing the aggregation method at the Viz level may seem of limited use: if the Viz has a Total in it, the aggregation used to arrive at the Total value will be the one specified in the Viz property. For example, let's look at the table of Sales by Product Category with the default Sum aggregation for this column, and let's add a Total to the Viz. By default, the Total value is calculated as a Sum of Sales. If we change the aggregation method to Average for this Viz only, the Total of the report now shows an Average of the Product Category Sales.


Note that this only changed the Total line; the values for the individual rows did not change. Why? The calculation done by OAC here involves two passes. First, it retrieves Sales by each dimension requested in the Viz, in this case Product Category; this retrieval still uses the default dataset aggregation rule (Sum in our example). Then, in the second pass, at the Viz level, it aggregates the retrieved data using the Viz-specific rule, for each row in the Viz and for the Total. In this case it simply shows the Avg of Sales for each Product Category and also for the Total of the report. For a single Product Category, the Avg of its Sales is the same as its Total Sales, but for the Total over all Product Categories, we get the average of the per-category Sales values.
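Here is a small pandas sketch of that two-pass logic with made-up Sales data; it only mimics the behavior described above, not how OAC actually executes the query:

```python
import pandas as pd

# Order-level source data (made up).
sales_orders = pd.DataFrame({
    "CATEGORY": ["Office", "Office", "Furniture", "Technology"],
    "SALES":    [100.0,    150.0,    400.0,       250.0],
})

# Pass 1: the rows of the Viz still use the default dataset rule (Sum) per Product Category.
by_category = sales_orders.groupby("CATEGORY")["SALES"].sum()
print(by_category)         # Furniture 400.0, Office 250.0, Technology 250.0

# Pass 2: the Total line applies the Viz-level rule (Average) to those row values.
print(by_category.mean())  # 300.0 -- the average of the per-category sums, not 900.0
```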

But if you look closer at the options, there is a very powerful subtlety that can be leveraged when using Viz-level aggregation rules: the By clause option.
Let's keep using our table of Sales by Product Category, and let's now say we want to see average monthly Sales for each Product Category. For each Product Category, we want the average of its Sales by Month... That's what the By clause lets us achieve. 
Once we set the Viz-level aggregation method to Average, we can click on the By field and set it to any set of attributes (one or many). Let's just pick Order Month in our example. The view now shows the monthly average Sales for each Product Category, and this aggregation is specific to the Viz.



In this case, OAC actually retrieved Sales summed by Product Category and Order Month in the first pass, and then, for each Product Category, calculated the average of all the monthly Sales values. This applies to any other aggregation, with the 'by' columns of your choice, and for any Viz. This is a pretty powerful analytical calculation achieved with a single, user-friendly click.
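For completeness, here is the same kind of sketch for the By clause case, again with made-up data: pass one sums Sales by Product Category and Order Month, pass two averages those monthly sums within each Product Category.

```python
import pandas as pd

# Order-level source data (made up).
sales_orders = pd.DataFrame({
    "CATEGORY":    ["Office", "Office", "Office", "Furniture", "Furniture"],
    "ORDER_MONTH": [201901,   201901,   201902,   201901,      201902],
    "SALES":       [100.0,    50.0,     200.0,    400.0,       600.0],
})

# Pass 1: sum Sales by Product Category and Order Month (the "by" column).
monthly_sums = sales_orders.groupby(["CATEGORY", "ORDER_MONTH"])["SALES"].sum()

# Pass 2: for each Product Category, average the monthly sums (the Viz-level rule).
print(monthly_sums.groupby("CATEGORY").mean())
# Furniture 500.0 (avg of 400 and 600), Office 175.0 (avg of 150 and 200)
```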

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!