Monday, February 17, 2020

How to secure your projects in OAC?

Sharing a data analysis & visualizations with the other team members in a safe and secure manner is a key requirements for Analytics solutions. Oracle Analytics offers the capability for designer to completely control the way their projects are shared and leveraged with other users.

This blog explores various scenarios in which a user can share a project with other users. The brief video below illustrates how a user can share projects with other users and how various rules define the privileges for other user. 

Scenario 1 : I don't want to share my project with anyone

There could be various instances where the data used for a project could be confidential and cannot be shared with other users. In this scenario the project should not be accessible to other users on the system. In order to achieve this, the owner of the project should save the project under My Folders instead of Shared Folders. All the objects under My Folders are accessible only to the owner, no one else can see them.

Scenario 2 : I want selected other users to have Read-Only access to my project

The first pre-requisite for sharing a project with other users is that to have it saved in Shared Folders. Projects that reside under my folders directory will not be visible to anyone else than their owner.

Once a project is under the Shared Folders directory, all users with only 'DV Consumer' role in OAC, will be granted Read-Only access to it by default. The purpose  of 'DV Consumer' role is to limit the access to users, to only consume content, and not edit it. Irrespective of any permission configuration for underlying projects.
Now, if I want also limit other users with higher roles (like 'DV Content Author') to also have Read-Only access to my project, I need to edit the security configuration of the project using the Share Tab and Access Tab of a project. Share and Access Tabs are accessed by clicking on the project properties and then on inspect menu.

  • Share tab is used to specify how the project can be consumed : only viewing the results, or being able to edit the queries and viz definitions.
  • Access tab is used to specify if a project can be opened in Read-Only mode or Read-Write mode. ie, can a user save over your project, or is he requested to save-as something else.

In order to make a project read only, we need to specify these permissions Share : View and Access : Read Only. 

With that, when a user with DV Content Author role opens the project, the project will open in read only mode : no Save or Save As option displayed.

Two important notes :

  • permissions can be granted to roles (groups of users), or to individual users. If a user belongs to a role with access rights, he will benefit permissions assigned to this role.
  • If a user is not listed in the permissioning tabs, nor belongs to any role that is listed in these tabs, he will simply not see the project.

Scenario 3 : I want user to be able to Save As, but not Save over

The access and share rules have to be set to Share tab : Edit, Access tab : Read-Only for the project to be opened in edit mode but without Save being enabled. 
When user opens the project he will have full experience mode but only with Save-As option enabled. He can therefore not modify your project and will have to save edits under a different name.

Scenario 4: I let user have full control on my project.

The access and share rules have to be set to Share tab : Edit, Access tab : Read-Write for the project to be opened in full experience mode with both Save and Save As options enabled. 

One important point 

Even if a project is shared with other users, they will only be able to see results if the data-sources of this project are objects they have access to as well. For example, if the project I just shared is using a data I uploaded from an XSLX file, I may need to visit the inspect tab of this OAC dataset and grant access to respective roles or users.

Securing entire folders

Catalog folders (under shared folders) containing multiple projects and sub-folders can be secured by configuring Access rules for that folder. Read-Only or Read-Write are the privileges available which can be set against a folder. Read-Only will render all the projects under the folder as read-only and Read-Write would provide edit access to the projects under the folder. By default, setting the privilege at the folder level trickles down the permission to all the sub folders and its objects.

The default behavior can be altered by checking the box "Apply access permissions to this folder only". If this check box is checked, then only those projects directly under this folder are affected with the permission settings, the sub folders and its projects are spared.

Controlling the access to projects for other consumers of the analytics system is an important aspect which ensures smooth collaboration between the teams and eliminate any unnecessary tampering of projects by other users. OAC's security mechanism helps the users safely share the projects with other users on the system and collaborate in an secure manner.

Friday, February 14, 2020

Using Pareto Chart OAC Custom Viz Plugin

Pareto principle, developed by Vilfredo Pareto says - "80 percent of the losses come from 20 percent of the causes". Usually, Pareto principle or the 80/20 rule is applied on the significant few vs the trivial many. Graphically, a Pareto diagram simply displays a bar chart distribution of a metric by a dimension sorted in descending order of the value, overlaid with a line chart plotting the growing cumulative percentage. It helps to identify the significant few. Pareto chart can help in identifying top performers for a metric and their overall contribution across the categories.

This blog entry describes how to use the Pareto Custom Viz plugin for Oracle Analytics. That Custom Viz plugin can be downloaded for free from the Oracle Analytics LibraryThis brief video describes how to use this plugin within OAC :

Using the Pareto Custom Viz Plugin

Pareto Chart requires two mandatory input parameters : a Category dimension (X) and a Values object (Y axis). Color is an optional parameter. Values(Y-Axis) accepts a single measure. Category(X-Axis) accepts one or more attributes. 

The plugin automatically sorts the metric value in descending order, by dimension members, and plots it as a bar chart for each member. The metric value is computed as a cumulative percentage and plotted as a line chart. The axis of cumulative percentage is placed on the right hand side. Here's an example of a pareto depicting Sales by City:

Adding a color parameter changes the bar chart to a stacked bar chart with each bar divided into different color categories. However, the line chart is not altered.

As you hover over the bars, you see a tool-tip with X-Axis and Y-Axis values. When you hover the mouse on the line chart, along with the tool-tip, you see two dotted lines each pointing towards X-Axis and Cumulative %. These lines are helpful in dividing the category with Cumulative %. You can see the stacked bar and the dividing lines in the pic below.

Pareto Chart Plugin can plot negative values as well. Here's an example of a Profit metric with positive and negative values.

The plugin also supports typical DV mouse right-click interactions like Keep/Remove Selected, Use as Filter, Brushing and so on.

Hope this write-up was informative! Thanks for reading.

Tuesday, February 4, 2020

Where to build my OAC custom calculations, in Projects or under Data-prep ?

Oracle Analytics allows any user to easily create custom calculations in data-visualization projects. It's a simple process to extend your analysis or data-sets with any calculations and aggregations you may need right while you are building your Vizs, irrespective of the type of underlying data. One can build a custom calculation on a DV Project directly, or also, build it directly in the data-set using the Data-prepare tab.

But precisely, since there are multiple places where custom calculations can be built, which one is the right one ? What is different between a custom calculation directly built in a DV project versus a custom calculation built in the Data Prep tab of OAC ? This blog entry is a short helper in understanding the main difference between these two. It should help users make the best choice on where to build their next custom calculation.

The flow of this blog entry is also illustrated in this short voiced video :

To illustrate the point, let's start with a simple example whose results may seem confusing. My data-set has order lines details, with sales value and quantity sold for each order line.

I built a custom calculation defined as Sales / Quantity twice : once in my project using Add Calculation menu, and once in my Data Prepare tab using the Create Column menu. Both these calculations are built with exactly identical formula shown on the picture.

Yet when add both calcs in the same visualization and compare the results, I am getting different values for each calculation...

See the two last columns in my result table below, once for each custom calculation :

Why is that ?????

At first sight, this is puzzling. How can the exact same calculation return different results based on where it's built in Oracle Analytics ? Which one is right and which one is wrong ?

The question behind this confusion really is : how is the sequence of aggregations & calculations happening in each case. Think about it, just like if in a spreadsheet file you computed the sum in a total column at the far right of your table, and then the divisions in the bottom line. If you did it vice versa, you may also get different results with the same formula...

  • In the case of the Project based calculation (first case here),

Oracle Analytics will first run the initial query to fetch raw data needed in the view, and return the aggregated results set. Then it will run the calculation. So for this example, it will first return sales and quantity by Product Category (3 rows) and only then it will calculate the division of Sales by Quantity ordered, for each of the 3 Product Category lines. That will give us what we could call a 'weighted Unit Price Average'.

  • In the case of the Data-prepare based calculation (second case here),

In this case, Oracle Analytics will do it the other way around : it will first execute the calculation at the data-set detail level. So, first, since our data-set is at Order-line level (each row in the data-set represents an order-line), Oracle Analytics will compute the division of sales by quantity distinctly for each Order-line.
Only once that is done, it will run the query to fulfill the data-viz and will then aggregate the result-set data for each Product Category. So, since in our example we set an aggregation rule of avg for our new Data-prep calculated Unit Price column, it will take all the granular unit prices for all order-lines in a given Product Category and return average of these. It will do so for each Product Category (3 rows). We could call this an 'un-weighted Unit Price Average' in this case.

The video above shows how these calculations can easily be manually verified in a spreadsheet.
Note that in the case of a Data-prep calculation, the result is exactly what we would have gotten if a Unit Price column had existed directly in the source data-set, for each order-line row (database table, csv file, etc). So in that sense, extending your  data-set with Data-prep new calculated columns is like extending it directly in your source file.

So, which calculation is correct then ?

... it depends.
It depends on what your analysis requires.

In most of the cases data visualizations require a level of calc-aggregation on initial data : by country, by employee, by year, etc.... The most representative value here is a weighted average, so that's produced using a Project-based custom calculation. This was our example. So in most of the cases, Project based calculations are the safest way to go, particularly when you are not sure of what you need. You can't be really wrong with it. Build your calculations in the Project custom calcs directly and be aware that if you want to persist them in the data-set for others to use, they may behave slightly differently if the calculations are later on aggregated in some vizs.

But in some cases, like binning for example, or running Explain, you need to keep a granular data in your calculation. And that is a data-prep level calculation. In our example, let's say that we also need to show sales by 'ranges' of unit prices. That is, bins of unit prices. Sum all sales where unit price is between 0 and x, then x and y, etc. This will work easily with the Data-prep calculation we have, we just need to drag it in the category axis of any chart and the binning will happen automatically. It will not work with the Project based calculation at all.

In conclusion, building custom calculations in a project or in a data-prep (data-set) are different things, potentially leading to different results. They each have a purpose within OAC and should be leveraged for appropriate needs. Use project build custom calculations in general as this will cover the most frequent need. Keep in mind Data-prep based calculation will help you achieve more advanced aggregation needs.

Monday, February 3, 2020

Maps - Using the Location Match Dialog

What should an Oracle Analytics Map visualization show when some of the cities names it supposed to plot on a map do exist with same names in various countries ? 

For example, let's take a dataset that includes entries lie Barcelona and Liverpool for cities. We would expect Barcelona to plotted in Spain and Liverpool to be plotted in England. But it may not always be the case since there is a city named Barcelona in Venezuela and Liverpool in Canada and you might end up seeing these points plotted over Venezuela and Canada.

A short, simple fix for correcting this is to add the country name as part of the Map Category grammar. See the picture below, the left viz shows a Map grammar with only City in the Category, the right viz shows it with both Country and City. 

This easy addition removes the ambiguity and corrects the problem. The question is how can a user detect if there are any ambiguities or even mismatches in his data, compared to what the map layer expects ? Oracle Analytics Cloud includes a Location Match Dialog feature that directly answers the question for users of Maps Visualizations. 

On any map visualization, a user can perform a right click and choose the option "Location Matches" to pop up the Location Match Dialog. This helps to inspect how well the data from the dataset has been matched the map layer's data.
Location Match Dialog for Cities column
In the case of a multiple map layers on a map visualization scenario, there is a tab for each of the map layers with the open tab corresponding to the current map layer the user is on. There are various columns meant to help users understand and improve the match quality of the data :

Map Layer drop down:

When the user opens the Location Match Dialog for the first time, the data from the column is matched against the map layer used in the map. This is the layer selected in the drop down of the dialog box, it is a visual indication of the map layer that is matched to. The user can also select a different layer from the list of existing map layers and the dialog will show how well the data from the dataset can match to the newly selected map layer.

Summary Section:

The summary section shows the total number of rows in the user's dataset and also gives the precise number of rows that it had difficulty in matching with the map layer's data. In the above diagram, there were a total of 144 rows of City data matched with world cities map layer and there were issues with 22 among them.

Your Data Column:
The first 'Your Data' column shows the original data directly coming from the user's dataset. It only represents the rows for which an ambiguity or issue was detected.

Match Column:
Match actually shows the data matched to in the map layer (the data that resides geojson file for your map layer, wether it's a Vanilla OAC Layer or a custom Layer that was uploaded by the user).

If the data from the dataset didn't match with any data in the map layer, then a red warning indicator is displayed for that row. If there was a match, but not exactly a perfect match, then a yellow warning indicator is displayed. This does not necessarily mean that it was a wrong match, just that there were other potential matches and the system is not 100% sure which one to pick. So maybe the user would want to review these use cases to see if he can make the match better. For perfect matches, there are no indicators.

Match Quality:
The Match Quality column quantifies how good the match was. When the user opens the Location Match dialog, the rows are sorted from the worst matches to best matches.

  • The rows with no matching data with map layers data has the value "No Match" for the Match Quality column.
  • Multiple matches are the ambiguous cases.  In case the data from the dataset matches with multiple distinct entries of map layer's data, then the Match Quality column indicates how many such matches were found. Common way to resolve these type of issues is to add more information (columns) in the Location edge so that the ambiguities can be resolved. For example, let us revisit our original use case of plotting cities Barcelona, Liverpool on the map visualization. By default when the user bring these cities into the map visualization, they get plotted in Venezuela and Canada respectively instead of Spain and England. Location Match dialog for the same can be found below

Both the entries have ambiguous matches. To give a better picture of the mapping consider the following tables
Your Data
Map Layer Data
Barcelona, Venezuela

Barcelona, Spain

It found 2 matches for Barcelona entry in the map layer data and gives the Match Quality value as 2 matches. Similarly for Liverpool

Your Data
Map Layer Data
Liverpool, Canada

Liverpool, England

Now this is telling the user that maybe if he can add more data to the Location Edge, then it can resolve this ambiguity and plot the data point exactly where it is intended to. In this case, if the user adds the Country column to the Location edge, then it can resolve this ambiguity.

After adding the country column to the Location edge, the Location Match Dialog looks like this.

Now you can see that the Your Data column has the country value appended to the city name and with the help of this it is able to resolve the ambiguities for both Liverpool and Barcelona and plot it exactly in the map.

Barcelona and Liverpool plotted in Europe after adding Country column to the Location Edge
  • The next category of matches are the Partial matches. The partial match could be that a part of the word in the user's data got matched with that of map layer's data. In this case, the match quality value is the percentage of how close the two strings are.

  • The last category of matches are the Good matches where the data matches exactly with the map layer's data. In this case, the match quality value is 100% Confidence.

Finally we have the Remove column. this one allows the user to exclude rows of data from the viz, and also gives the option to set the scope for which the rows of data needs to be excluded. This includes Project scope, Canvas scope and the Visual scope. Once the user selects the rows and sets the scope, appropriate filters are created to exclude the rows.

Another interesting use case to ponder is what will happen to the Location Match dialog if a user decides to bring Lat/Long columns instead of semantic column types like city, state, country etc.

In this case, the map layer matched to is the Latitude/Longitude layer. There is one less column in the Location Match Dialog because we know what coordinates we are trying to match to. So the Match column just indicates whether the row entry is valid or not.

So the Location Match dialog primarily helps users understand the extent of ambiguous matches or mis matches in their data when they are plotting it against a map layer. It won't automatically fix the data, so the user still has to either bring more data in his viz to remove the ambiguity, or fix the map layer. But the Location Match dialog gives a detailed list of what needs to be fixed to fully match map layers. This is big step in overcoming the barrier of the geospatial data mismatch.

Friday, December 13, 2019

Merge/Union rows in OAC Data Flows

Merging 2 data sets into a single one is a common need for analysts manipulating diverse data sources. Oracle Analytics Cloud's Data Flow enables an friendly way to do this within the context of preparing data, just before building visualizations and deriving insights out of it. 

This blog highlights the different scenarios in which union rows step of the data flow can be leveraged. 

Prerequisites for using the Union Rows step in Data Flows 

The Union-Rows step in Data Flows can merge 2 data sets into a single one, provided the two input sources have the same structure. It operates in a similar way to the SQL Union operation in the database, where result set of 2 queries are merged into a single output :

  1. Both the data sets should have same number of columns,
  2. The data types of the corresponding columns in data sets should match. For example column1 of data set 1 must have the same data type as column1 of data set 2 and so on.
In other words, you can easily merge datasets that are of similar format. For example sales data for different months are in different datasources, or HR data for various subsidiaries are in different files, etc.

So, for our example, lets consider 2 data sets with sales order information for 2 separate months, Jun-2019 and Jul-2019. This data has columns like Order Priority, Customer Name, Product Category, Product Name and Sales as a metric. There are some customers in these data sets who have purchased the same product across the two months.

Data Set - June 2019  (100 rows) 

Data Set - July 2019 (100 rows)

We have 10 rows of customers who have purchased the same products across the two months. We will see how these 10 rows of duplicate data across the two months are handled in the various Union operations discussed below.

How to create a Union Row step in Data Flow?

Lets add both the data sets to an Oracle Analytics Data Flow, and further down in our Data Flow click on the "+" icon and choose Union Rows step. The Union Rows step gets enabled only when there is more than one data set added in the Data Flow.

As the step adds to the Flow, click on the suggested 'second' data node to complete the Union Step. that is just to indicate which two data-sets we want to union. 
The details of this node presents various options in which an Union operation can be performed are presented along with a visual representation of the chosen option. The visual representation clearly specifies what would be the resulting data set when the options are chosen. The Options available are

  • All rows from input1 and input2 (Union All)
  • Unique rows from input1 and input2 (Union)
  • Rows common to input1 and input2 (Interesct)
  • Rows unique to input1 (Except)
  • Rows unique to input2 (Except)

Use Case 1: Merge both the data sets

Lets say we want to merge both sets into a single data set for reporting purposes and keep all the rows. In this scenario, all the rows from data set 1 has to be merged with all the rows from data set 2 (equivalent to a Union All operation in SQL). This operation will not eliminate any duplicate records.The visual representation of this operation is shown on the right of the screen, explaining how the resulting data set would look like.

So let's complete our Data Flow with the two input data sets, use the Unions Rows node with the first option selected and execute the flow.

When the data flow is executed, a Data Set is created with the merged output, and its total number of rows (which can be calculated using a formula like sum(1)) is 200. This shows that the union operation merged both the data sets without removing any duplicates.

Use Case 2: Merge without duplicates

Lets now say we want to merge both the data sets but only want to keep the distinct combinations of customers and products they purchased over the two months. In effect, we want to remove duplicates across the two data sets with these same combinations. This is similar to a Union operation in SQL. A Union operation merges the rows from both the data sets by eliminating the duplicates and retaining a single copy of the duplicate rows

In order to achieve this, lets trim down the data set by retaining only customer and product related columns. So we achieve this by using a 'Select Columns' node just before our Union Rows node, and we keep only the following columns : Customer Name, Product Category, Product Sub Category, Product Container and Product Name. Let's now perform a Union Rows on these two data-sets and select the second option which is 'Unique rows from Input 1 and Input 2.'

When the data flow is executed and creates the merged output, we notice that row count in resulting data set is 190. This shows that only one copy of the duplicate records (10) were retained in the output.

Use Case 3: Rows common from input1 and input2

Now lets say we want to find the list of customers who have ordered the same products twice, once in June and once in July. In this scenario, we have to find the common records with same customer and same product between the two data sets. This is similar to an intersect operation in SQL. 

Let's define the data flow similar to the one described in Use Case 2 with just the customer and product related columns selected and apply a Union rows node with the third option 'Rows common to Input 1 and Input 2'.

When the data flow is executed and creates the merged output, we notice that row count in resulting data set is 10. Only one copy of the duplicate records were retained in the output. The detailed tabular reports shows the exact intersecting records between the data sets

Use Case 4 and 5 : Rows unique to input1 (Except)

Finally, lets say we are interested in those customers and products that were ordered only in June but were explicitly not ordered in July. Both the data sets have to be merged by retaining only the unique records in the first data set. This is equivalent to an except operation in SQL.

Let's again define the data flow and choose the option 'Rows unique to Input 1'

When the data flow is executed and creates the merged output, we notice that record count in resulting data set is 90, meaning only those records in the first data set except the records that are common to the second data set are saved in the output.

Note that if a user needs to merge more than two data sets, it can be done in multiple steps in the same Data Flow : first merge two data sets, then merge the output to the third data set and so on.

Union-Rows node allows a friendly way to merge two data sets in an Oracle Analytics Data Flow. It will work agnostic to the type of data source, as long as the respective structure of the data is identical.

Wednesday, December 11, 2019

Creating a Custom infograhic with OAC 5.5

This blog shows a preview of an upcoming OAC 5.5 feature. OAC 5.5 is expected to be GA very soon by Oracle (Jan 2020), the feature will become available then.

In the 5.5 release of OAC (Oracle Analytics Cloud) creating custom Infographics visualizations is made simple through the use of custom dynamic canvas backgrounds and the new spacer visualization. Infographics created on OAC can then be used from any device (browser or mobile) and dynamically resize per the screen size.  

In this example we are building a infographic showing stats about NFL fans attendance. We selected a custom picture of a stadium that we want to use as a custom background to our infographics. 

1. Setting custom DV background with an image.

To configure that image as a custom background, we are using Canvas Properties menu by right-clicking on the Canvas tab itself and selecting the option. Once there, we are setting the value Cutom for Background property.
There, pick the image location, adjust transparency and make sure you set size parameters to Auto-Fit, this will ensure your DV sample scales regardless of your screen size and resolution.

All we have to do next is to start building our infographic by positioning visualizations on top of that picture.

2. Laying out visualizations

In our example, we will use a combination of vertical bar charts and horizontal bar charts, along with the tag cloud extension. As we build our visualizationsm we can use the spacer visualization (OAC5.5), to customize our layout exactly as we need it…

we can drag the spacer extension to create separations between different visualizations. It can be resized as needed, just like other vizs, flipped either horizontal or vertically and set background color or as an image. While we are organizing our vizs, the canvas properties keep to the default autofit configuration, which means the whole page will properly resize accoding to screen size of the consuming devices.

Once our visualizations content and layout completed we can customize additional display properties such as font color and size for all report components, i.e, title, values, x/y axis

  Blog Collateral

  • Watch this short 3 minute video - Spacer Viz and Custom Background that comes OOB with OAC 5.5.