Thursday, July 4, 2019

Unleash your Database power within OAC using Evaluate function


Database systems like Oracle have tremendous analytical and data manipulation capabilities. These help in performing powerful transformations and gaining valuable insights out of massive amounts of data. Databases typically provide a wide range of inbuilt functions that can perform different kinds of deep analyses in a very scalable manner. Often, users are also able to define their own custom functions to meet specific query needs.

As users of Oracle Analytics (OA Cloud or Server), it is extremely valuable to be able to leverage these DBs functions directly from within the OA framework. 

The Oracle Analytics (OA) Evaluate function helps in doing just that. It enables users to create powerful analyses by directly invoking database functions. Evaluate provides a syntax to pass the specified database function with optional OA referenced columns as parameters. It is particularly useful when users want to call a specialized database function to process a heavy calculation directly in the underlying database. Evaluate can be used to call functions from any underlying database, Oracle, Essbase cube, DB2, MSSQL, Apache Hive and so on. It is a Scalar function, that is, which takes one or more values but returns a single value for every row.

The following video guides you through the process of leveraging Evaluate with Oracle Analytics, the rest of the blog below also gives the details needed.


Syntax

The syntax of Evaluate wraps around the syntax of the underlying DB function : EVALUATE('db_function(%1...%N)' AS data_typecolumn1columnN) where 
  • db_function() syntax is a valid SQL function available in the underlying database.
  • Column1 to columnN are the optional input parameters to the DB function. %1 is substituted with the value of column1, %2 with column2 and so on, for as many arguments as needed in the db function.
  • As data_type is an optional parameter that specifies the datatype of the return value (an OA datatype). Specifying data_type is not meant for type casting. Use it only when the return type cannot be easily predicted by the input columns. If there is a need to cast the return value, then add an explicit cast function.

Prerequisites

In order to be able to use Evaluate function in DV, following two conditions need to be met
  1. First, set Evaluate Support Level parameter to 2 in OAC system configuration. By default, Evaluate support level is set to 0, which prevents OAC from executing evaluate functions. To enable it, go to System Settings on the  OAC environment and set Evaluate Support Level to 1 or 2. Setting it to 1 means Evaluate is supported to users with ManageRepositories permission. Setting it to 2 means Evaluate is supported for all users.
  1. Make sure Data Access is set to Live For the datasets that will be referenced by the Evaluate function. By setting it to Live, we are ensuring that OAC Server always sends the query to the underlying database and does not try to simply retrieve results from cache.


How to use Evaluate




Let’s consider the Customers table from the sample data SH schema in Oracle Database. 
This table contains information about Customer attributes like name, gender, address, phone number etc.  

Querying the customers table in OA DataViz, here's how customer phone numbers look, with mixed format : some of the customer records have the phone number in the format xxx-xxx-xxxx while some are in the format (xxx) xxx-xxxx. Let’s say we want to change all the phone numbers to a common format (xxx) xxx-xxxx. 

Let's take the example of running some re-gex based replacement rule to achieve this. The database we are using (Oracle DB) has a REGEXP function that allows to do this, let us invoke it just for illustration of Evaluate syntax here. 




To begin with, let's first go to our OA System Settings and make sure Evaluate Support Level is set to 1 or 2. This is done only once for the OA environment.

Next, let's create an OA DV data-set sourcing from the Customers table in SH schema. Let's make sure we set the Data Access to Live for this dataset in order to be able to use Evaluate function.  


In the Oracle Database, the SQL syntax of the REGEXP_REPLACE function we want to use to change the Phone Number formats would be the following :

REGEXP_REPLACE(cust_main_phone_number, '([[:digit:]]{3})\-([[:digit:]]{3})\-([[:digit:]]{4})', '(\1) \2-\3')

To use this in DV, let's create a custom calculation directly from a project connected to the datasource and use the Evaluate function with the following syntax.

In the Evaluate Expression, we have the syntax for REGEXP_REPLACE as the db function that needs to be called, %1 is the column value being passed to the function which will be substituted by the first column, CUST_MAIN_PHONE_NUMBER (in blue). There is only one parameter in our example.
Notice that quotes is the 'escape character' : so for every single quote used within the original SQL function, we use the escape character which means we precede it by another single quote. That's why you see several places with twice single quotes in the Evaluate formula.

Bringing the new calculation into a visualization looks like this. Phone number format has changed by invoking the database REGEXP_REPLACE function. 


If we look at the underlying log with physical SQL sent by OA to the database (use the Session Monitor screen for accessing this), we see that Evaluate has been removed and only REGEXP_REPLACE function is being sent to the underlying database. 



Evaluate function can also be used within an OA dataflow. The same REGEXP_REPLACE can be called using an Evaluate function in the Add Columns step from any OA dataflow. Evaluate will properly functionship if all parameters of the dataflow are present in source tables from the same underlying database.


EVALUATE vs EVALUATE_AGGR
Evaluate formula will return the result of the database function as a column of type Attribute in OA. This result cannot be used as a metric or be broken down by other dimensions/attribute in the Datasets.
Evaluate_Aggr is a variation of the Evaluate syntax for OA that allows to invoke aggregate functions in the underlying database and return them as aggregated Metrics in OA. This function is intended for aggregate functions with a GROUP BY clause. 

Oracle database, for instance, supports a wide range of aggregate functions. APPROX_COUNT_DISTINCT is one such function that returns the approximate number of distinct values of an expression. This function is particularly useful when used on large amounts of data wherein APPROX_COUNT_DISTINCT is significantly faster than an exact COUNT(DISTINCT) with negligible deviation from the exact result. 
Let's see how to invoke this function from within OA using Evaluate_Aggr. Let us use the SH schema that has Sales, Customers and Products table. Let's approximate the number of distinct customers who purchased the various product categories in our fact table. We could run a COUNT(DISTINCT cust_id) on the Sales table, grouped by Product Categories. But this could be process time consuming if we had large amounts of data. 

Instead of this, we can invoke APPROX_COUNT_DISTINCT function using Evaluate_Aggr in OAC. The OA DV calculation would look like the following.

The result of this calculation is a metric in OA because we are using Evaluate_Aggr syntax. So in our DV or Answers visualization, we can simply add some attributes to break down the result further. For example we can bring PROD_CATEGORY column and show a bar chart. The bar chart below compares the result of Evaluate_Aggr(APPROX_COUNT_DISTINCT) with the results of actual count distinct calculation. We can see that the approximation of number of customers is very close to the actual values.


So both Evaluate and Evaluate_Aggr OA functions offer a powerful way to leverage any database capabilities directly within the OA framework, both for Cloud (OAC), Server (OAS), or even DV Desktop.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Thursday, June 13, 2019

How to configure Usage Tracking in OAC?

Oracle Analytics Cloud supports the accumulation of Usage Tracking (UT) statistics that helps in monitoring system usage and performance. 

These statistics can be used in a variety of ways such as in system or database optimization, aggregation strategies, or internal billing of users/departments based on the resources that they consume. Usage tracking is particularly useful in determining user queries that are creating performance bottlenecks, based on query frequency and response time. 
Oracle BI Server tracks usage at a detailed query level. When Usage Tracking is enabled, BI Server collects data records for every query that is executed and writes it all to database tables. Both logical and physical SAL queries are tracked and logged in separate tables, along with various performance measures : 



The following video gives a high level overview of how to set up Usage Tracking on your OAC environment. This blog content below gives more details about this capability.




Prerequisites to enable Usage Tracking

Following conditions need to be met in order to enable Usage Tracking on your environment:
1) Usage Tracking requires a full RPD (Metadata Repository) to be enabled. It will track usage on any queries, even outside the RPD, but it only requires an RPD for its configuration. For example, UT cannot be enabled when only a TCM (Thin Client Modeler, ie web based admin-tool) model is active on your OAC.

2) UT requires an access to a Database to write data back into tables. So a user login with Create table privileges on the database schema will be needed during the configuration.

Steps to configure Usage Tracking

UT configuration broadly involves the following two steps: 
1) Defining a Database connection in RPD: using Admintool client (editor for RPD repository), define a database connection in the physical layer of your RPD. This connection should point to a database where Usage Tracking tables will be created and maintained. This database can be anywhere on the Cloud as long as it is accessible for write accss by OAC. 

2) System Settings in OAC : Configure Usage Tracking parameters like connection pool name, physical query table name and logical query table name.

Step 1: Define a database connection in rpd

Open the RPD which is uploaded in OAC and create a new database in the physical layer. Provide an appropriate name (eg: UsageTracking) and choose the database type as Oracle 12c.



Under this database, create a new Connection Pool with an appropriate name (Eg: UTConnectionPool). Provide the connection details to the database where Usage Tracking is to be configured and login credentials to the schema. 
Note:- This database user needs to have create table privileges on the schema. 



Next, create a physical schema in the RPD with the same name as the Database schema to be used by UT (Eg:- UT_demo). 


Once these definitions are completed, the physical layer should look like this. 


Save the RPD and upload it onto OAC using the Replace Data Model option within the Service Console.


Step 2:System Configuration

On OAC, open the System settings screen from the Console Tab. 



Scroll down and modify the following properties.

1) Toggle on the Enable Usage Tracking button

2) Usage Tracking Connection Pool: 
Enter the connection pool name in the format : <Database>.<ConnectionPool>. 
Eg:- UsageTracking.UTConnectionPool

3) Usage Tracking Physical Query Logging Table: 
This the table where details about the physical queries executed by BI Server against the underlying database are tracked. 
Enter the name in the format :  <Database>.<Schema>.<Table>
Eg:- UsageTracking.UT_demo.PhysicalQueries

4) Usage Tracking Logical Query Logging Table
This is the table where details about the logical SQLs executed by BI Server are tracked. Enter the name in the format : <Database>.<Schema>.<Table>
Eg:- UsageTracking.UT_demo.LogicalQueries

5) Usage Tracking Max Rows : Specifies the number of rows after which the UT tables will be purged. The default value is 0 which implies there is no purging. You can set it to a any desired value and the records would be purged once the threshold is reached.




After entering these details, restart Oracle BI Server. Once the server is restarted, Usage Tracking is enabled and the two UT tables are created in the database.

UT Tables Explained

Open SQL Developer, login to the UT database and observe that the 2 tables are created.



To generate some usage tracking data, login to OAC and click around some DV projects: both XSA as well as subject area based. Also open the BI classic home and open few dashboards to generate some queries. Now observe that the tables for logical and physical queries are populated with usage tracking information.




Important Columns in Logical Queries table

END_TS End time of query execution
ERROR_TEXT Error message if the query has errored out
ID Primary key
QUERY_TEXT Actual Query Text
RESP_TIME_SEC Total response time of query in seconds
ROW_COUNT No of rows returned by the query
SAW_DASHBOARD Dashboard name where query is getting generated
SESSION_ID Session ID from the user firing the query
START_TS Start time of the query execution
SUBJECT_AREA_NAME   Subject area used for the query
USER_NAME User ID who has executed the query

Important Columns in Physical Queries table

ID Primary Key
LOGICAL_QUERY_ID  Foreign Key from the logical queries table
QUERY_TEXT Query Text
TIME_SEC Time taken by query to complete
ROW_COUNT No of rows returned by Query
START_TS Start time of the query
END_TS End time of query

The join between the 2 tables can be performed using the join condition 
   LogicalQueries.ID  =  PhysicalQueries.Logical_Query_ID. 
Note that not all logical queries would generate a corresponding physical query. Some of the logical queries may hit cache and return results from cache and not generate a physical query.

Analyzing UT Data

Once UT is enabled, the system usage can be analyzed from DV. In order to do this, create a DV connection to the UT database, create datasets for the PhysicalQueries and LogicalQueries table and analyze them within a DV project. 

Following is a sample analysis built on the UT tables that shows # of sessions, # of queries,  most frequently used subject areas, most frequently used dashboards etc. 




Conclusion

Usage Tracking provides a mechanism for administrators to keep track of the usage of the OAC system. These statistics can be leveraged to take decisions to scale up, scale down, restrict access during certain time periods, pause/resume the system and so on.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Thursday, May 2, 2019

To Bat first or Bowl first? Strategy for the upcoming ICC Cricket World Cup 2019


2019 is finally here and along with it, the biggest extravaganza in the world of cricket - the ICC Cricket World Cup 2019!! The tournament is one of the world's most viewed sporting events and is considered the "flagship event of the international cricket calendar" by the International Cricket Council. This is the 12th edition of the Cricket World Cup scheduled to be hosted by England and Wales.

Have you ever thought about the innumerable factors influencing a cricket match outcome? 

  1. The weather on the day of the match dictates whether it would end up in a high scoring or a low scoring affair. 
  2. The condition of the pitch can also play an important role in the outcome. 
  3. The toss... who doesn't dread losing the toss? 

While there are several factors influencing the outcome, one of the most difficult decisions that captains have to make is whether to bat first or bowl first. This is a crucial decision and a strongly debated topic. One key aspect to this decision making would be to look at history. See how teams fared batting first vs chasing, how this pattern varies across venues, how some teams have done better batting first vs chasing etc ... 

Can any Analytics help in making this decision ?

Following are very simple OAC (Oracle Analytics Cloud) visualizations that were built on all the world cup matches played from 1975 to 2015. Using OAC, it only took a few minutes to gain interesting insights on strategy to adopt for teams competing at the 2019 World Cup :



Eleven World Cups have been played so far and England hosted four (4) of them. Australia won the World Cup 5 times, India and West Indies come a distant second by winning 2 each, Sri Lanka and Pakistan won it only once.

A Marker in Finalists Strategy

Teams batting first won 63% of the time while teams chasing first only won 37% of the time. So if a team is skilled enough to enter the finals, batting first doubles its chances of holding that cup!!




Team wise Strategies that worked

Digging a bit deeper, this pattern varies by teams: for Australia and India batting first or second didn't seem to matter when they won the World Cup. For West Indies however, they only won by batting first. So if West Indies makes it to final at Lords this time around, their bet should definitely be on batting first.




Team Strategy when England is hosting

Since England is the host nation in the upcoming event, let's shift our focus to only those World Cups which were played in England. Interestingly, when hosted in England, finals where won most of the time (75%) by teams batting first. 
When it comes to semi finals, it's the opposite: teams batting second have won most semi finals !!
These numbers contrasts vigorously with semi-finals and finals hosted outside of England:
In all the other countries, teams batting first have won most finals and semi finals.




Venue-wise Strategy

Could venue even play a part in this pattern ? Let's consider the premier venues of England where the majority of the games in 2019 world cup are about to be played: Birmingham, Leeds and Lords. Teams playing there have donw equally well batting first or second. 
But if you are playing in Manchester (where the 2019 semifinals would be played), history suggests its better to chase, while at The Oval (London), teams batting first have fared way better.

Let's look at the venues where teams have won with the highest average runs margin. Birmingham, Taunton and Chester Le Street are the preferred grounds to bat first where the winning margin is pretty high

Lets now look at those venues where teams bowling first have won with the highest wickets margin. Canterbury, Lords and Leeds are the venues where teams chasing first have fared well. 





Looking at overall batting win rate across premier venues in England, Nottingham and The Oval are the most preferred venues to bat first. 



Team strategy across World Cups games

Some individual teams historically fare better when batting first vs chasing at World Cup matches, irrespective of venues and location. Australia, the top performer has mostly won by batting first (60%). In contrast, arch rivals New Zealand are way better chasers (60%) !! 



In the early days of World Cups or even other cricket matches for that matter, teams always preferred to bat first. Bat first, score big and put the opposition under pressure. But noticing the trend from 2007 World Cup onwards, stats show us that there is not much of a difference in batting first or chasing. Does this indicate that teams are getting better playing under pressure while chasing? Perhaps they are !


Conclusion

Most of the cricket enthusiasts would argue that batting first has been the best winning strategy so far, but data does not always support that position : captains winning the toss would benefit from looking at historical stats to make an informed decision. 
For the upcoming World Cup in England, insights from historical data suggests a few distinct approaches : 

Semi Final 1 (in Manchester) : data shows that Manchester has been a ground where teams have dominated while chasing. It also shows that, historically in England, semi finals were won by chasing. So the team winning the toss is more likely to win the match if they let the opposition bat first.

Semi Final 2 (in Birmingham) : if you win the toss in Birmingham, though the venue doesn't favor teams batting first or bowing, data also suggests that there has been big wins with runs margin (ie by batting first). The team winning the toss is more likely to win the match if they bat first.

Finals (Lord's) : Historically teams have won in finals in England by batting first. The team winning the toss is more likely to have their chance of lifting the World Cup if they bat first.

May the best team win the World Cup!!

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Wednesday, April 24, 2019

How to perform incremental data loads in Oracle Analytics Cloud

Oracle Analytics Cloud offers the ability to create data flows and perform incremental loads on the target table. Data flows have the ability to operate only on the incremental data which becomes available in the source in between the current run and the previous run. 

In this blog, let's see how to perform incremental load on a database table. This video gives a sense of the whole experience. The blog shares a few more details as well


Prerequisites for performing incremental loads

1) Incremental loads can be performed only on those data sets where the source and target tables are database based. 
2) If there are multiple data sets in a data flow, then only one of the sources can be set for an incremental update.

Defining a source and new data identifier for the source

The first step is to have a source table and identify a column using which new data can be identified in the table. In this example, I have a revenue fact table with month key as a new data identifier


Next step is to create the Oracle Analytics datasource pointing to this table. In the process of creating the datasource, make sure you mark the new identifier column by clicking on the 3rd node in the process flow. This is an important step as this column defines how the system will be able to identify new rows in the dataset.



Define a data flow

Now that our datasource is created, let's define a data flow by importing the revenue tact table from the source connection. The key here is to check the "Add new data only" box to ensure that the source table is marked for incremental load.


To make my dataflow a bit more functionally representative, I will add some business example : converting currencies values. Let's bring in a spreadsheet which has exchange rates for every month to convert and let's join it based on the month key column. Let's add a new calculation to convert revenue. 

Finally lets select step 'Save Data' and specify a name for the resulting data set. Make sure you choose the target connection as a database and specify the table name where the result set needs to be saved. There are 2 options available to select in "When Run" drop down. 
  1. "Replace existing data" : that will make the data flow truncate the target table and reload all the records. 
  2. "Add new data to existing data" : keep the existing records intact and load only the new records in the source table. New records are identified by the column we defined in the source dataset above.
Lets set the When Run option to "Add new data to existing data" and save the data flow.

Now, let's run the data flow for the first time. As it completes, we can see in our underlying data base that the target table has been created. Since this was the first run of the data flow, all the records in the source table are inserted into the target table. 



Now, for the example of this blog, let's go and delete a month of data (201812) from our newly created target table. After doing this, our source table still has its 12 months of data (Jan to Dec) but our target table now only has 11 months, it is missing December. Notice that we did not change data in our source table, so there are no new records there since our last run of the data flow.

So, as we run the data flow for the second time now, the target table does not get incremented at all. The data flow was set to only bring across new data, but there is no new data in the source so nothing is changed in the target table. We can check that the target table is still not loaded with the deleted month's data. If the data flow had been set to full load, all the data would be in the target.



Now, to complete the test, let's manually load our source table with 3 more months of data. This will represent some incremental data. Then let's re-run the data flow once again. We can find that the target table has been incremented with the 3 new months of data coming from the source table. But notice that data in target table is still missing for the month where the records were deleted :

Just to confirm for the purpose of this blog, if we go back to the data flow definition and set Run option to "Replace Existing Data" in the target table, then when we run the data flow all the data gets loaded, including the the deleted month's data.
 



Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Tuesday, April 9, 2019

DV Aggregation : Metric Level vs Visualization Level



Oracle DV supports different aggregation rules for any metric like Sum, Average, Minimum, Maximum etc. The rule defines how the metric will aggregate when queries return data at an aggregated level (exp : sum of sales by customer when source data is at the sales order level)...

A metric's aggregation rule can be edited in several places while building analysis in OAC. It can be changed as a Metric property level, via different dialog boxes, or it can also be changed in a specific Visualization, only for that Viz. If you ever wondered about difference in behavior between these two main scenarios, let's dig deeper and find out. 

1) Setting aggregation at a Metric level

This can be done in two ways: In the Prepare tab or in the Visualize tab when building a DV project. The resulting behavior is the same in both cases: it sets the default rule for aggregation of the metric in this specific dataset. So this will establish the default aggregation rule for any project using this dataset column, not just for this project we are working on.

Setting it directly from the Prepare Tab:

To set a metric aggregation in the Prepare Tab, click the Metric and in the properties pane, go to General tab and change the aggregation to a desired option.


Once you apply this change, the new aggregation method applies to all the visualizations that use this metric across different DV projects. For example, if we set the aggregation of Sales metric to Sum, every time this metric is used in a viz, the value is computed using Sum aggregation. Let's look at Sales by Product Categories just below :


So now that the property is set for this metric the dataset level every project or visualization using this metric will compute the new aggregation rule. To check that, when we inspect the dataset from the dataset pane and look at its elements, we see the metric aggregation is set to Sum.


Setting the aggregation rule from the DV Canvas (Visualize tab) :

Another option to set the same level of aggregation is to change it directly from a Metric column in the DV Canvas (Visualize tab as opposed to Prepare tab). To set a Metric aggregation from there, simply click on the metric among the data elements and change the aggregation rule in the in the properties pane (bottom left pane).


The scope here is the same as setting it in the Prepare tab : it will set it as the default aggregation for that metric and will be used every time the metric value is computed, in any project.

2) Setting aggregation at a Viz Level

Metric aggregation could also be set for a single visualization in a given DV project only. In this case, the setting is specific to the visualization and overrides the default aggregation rule set at the dataset level. This can be done by clicking on any given Viz that exists on your canvas, going to the Values sub-tab within the Properties pane (the # sign) and setting the Aggregation method here. If a Viz has multiple metrics, each metric can have a different aggregation method. Changing the rule there will only apply to the Viz that you edited and will not impact the dataset. So any other viz using the same metric, in this project or any other project, will keep the original aggregation rule.

At first sight, changing the aggregation method at the Viz level may seem of trivial use : if the Viz has a Total invoked in it, the aggregation to arrive at the Total value will be the one specified in the Viz property. For example, looking at the table of Sales by Products Category with default aggregation Sum for this column, and let's add a Total to the Viz. By default, the Total value is calculated as a Sum of Sales. If we change the aggregation method only for the Viz to Average, we will now see the Total of the report showing an Average of Product Categories Sales.


Note that this only changed the Total aggregation line, the value for lines did not change. Why ? The calculation done by OAC here includes two passes : first retrieve Sales by each dimension requested in the Viz, in this case Product Category. This retrieving of data is still done using the default dataset aggregation rule (Sum in our example). Then, the second pass, at the Viz level, aggregates the retrieved data using the specific Viz rule for each row in the Viz. In this case, simply show the Avg of Sales for each Product Category and also for the Total of the report. For each single Product Category, Avg of Total Sales is the same as Total Sales, but for the sum of all Product Categories, we get the average value of Product Category Sales.

But if you look closer in the option, there is a very powerful subtlety that can be leverage when using Viz level aggregation rules : the by clause option.
Let's keep using our table of Sales by Product Categories, and let's now say we want to see average monthly Sales for each Product Category. For each Product Category, we want average of sales by Month... That's what the By clause will let us achieve. 
As we set the Viz level aggregation method to Average we can click on the By field and set it to any set of attributes (one or many). Let's just pick Order Month in our example. The view now shows Monthly average Sales for each Product Category and this aggregation is specific to the viz.



In this case, OAC actually retrieved information of Sales, summed by Product Category and Order Month in the first pass, and then, for each Product Category, calculated the average of all the monthly sales value. This applies to any other aggregation, with the 'by' columns of your choice, and for any Viz. This is pretty powerful analytics calculation done by a single user friendly click.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!