Monday, February 17, 2020

How to secure your projects in OAC?

Sharing a data analysis & visualizations with the other team members in a safe and secure manner is a key requirements for Analytics solutions. Oracle Analytics offers the capability for designer to completely control the way their projects are shared and leveraged with other users.

This blog explores various scenarios in which a user can share a project with other users. The brief video below illustrates how a user can share projects with other users and how various rules define the privileges for other user. 

Scenario 1 : I don't want to share my project with anyone

There could be various instances where the data used for a project could be confidential and cannot be shared with other users. In this scenario the project should not be accessible to other users on the system. In order to achieve this, the owner of the project should save the project under My Folders instead of Shared Folders. All the objects under My Folders are accessible only to the owner, no one else can see them.

Scenario 2 : I want selected other users to have Read-Only access to my project

The first pre-requisite for sharing a project with other users is that to have it saved in Shared Folders. Projects that reside under my folders directory will not be visible to anyone else than their owner.

Once a project is under the Shared Folders directory, all users with only 'DV Consumer' role in OAC, will be granted Read-Only access to it by default. The purpose  of 'DV Consumer' role is to limit the access to users, to only consume content, and not edit it. Irrespective of any permission configuration for underlying projects.
Now, if I want also limit other users with higher roles (like 'DV Content Author') to also have Read-Only access to my project, I need to edit the security configuration of the project using the Share Tab and Access Tab of a project. Share and Access Tabs are accessed by clicking on the project properties and then on inspect menu.

  • Share tab is used to specify how the project can be consumed : only viewing the results, or being able to edit the queries and viz definitions.
  • Access tab is used to specify if a project can be opened in Read-Only mode or Read-Write mode. ie, can a user save over your project, or is he requested to save-as something else.




In order to make a project read only, we need to specify these permissions Share : View and Access : Read Only. 







With that, when a user with DV Content Author role opens the project, the project will open in read only mode : no Save or Save As option displayed.



Two important notes :

  • permissions can be granted to roles (groups of users), or to individual users. If a user belongs to a role with access rights, he will benefit permissions assigned to this role.
  • If a user is not listed in the permissioning tabs, nor belongs to any role that is listed in these tabs, he will simply not see the project.

Scenario 3 : I want user to be able to Save As, but not Save over

The access and share rules have to be set to Share tab : Edit, Access tab : Read-Only for the project to be opened in edit mode but without Save being enabled. 
When user opens the project he will have full experience mode but only with Save-As option enabled. He can therefore not modify your project and will have to save edits under a different name.

Scenario 4: I let user have full control on my project.

The access and share rules have to be set to Share tab : Edit, Access tab : Read-Write for the project to be opened in full experience mode with both Save and Save As options enabled. 




One important point 

Even if a project is shared with other users, they will only be able to see results if the data-sources of this project are objects they have access to as well. For example, if the project I just shared is using a data I uploaded from an XSLX file, I may need to visit the inspect tab of this OAC dataset and grant access to respective roles or users.

Securing entire folders

Catalog folders (under shared folders) containing multiple projects and sub-folders can be secured by configuring Access rules for that folder. Read-Only or Read-Write are the privileges available which can be set against a folder. Read-Only will render all the projects under the folder as read-only and Read-Write would provide edit access to the projects under the folder. By default, setting the privilege at the folder level trickles down the permission to all the sub folders and its objects.

The default behavior can be altered by checking the box "Apply access permissions to this folder only". If this check box is checked, then only those projects directly under this folder are affected with the permission settings, the sub folders and its projects are spared.


Controlling the access to projects for other consumers of the analytics system is an important aspect which ensures smooth collaboration between the teams and eliminate any unnecessary tampering of projects by other users. OAC's security mechanism helps the users safely share the projects with other users on the system and collaborate in an secure manner.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Friday, February 14, 2020

Using Pareto Chart OAC Custom Viz Plugin

Pareto principle, developed by Vilfredo Pareto says - "80 percent of the losses come from 20 percent of the causes". Usually, Pareto principle or the 80/20 rule is applied on the significant few vs the trivial many. Graphically, a Pareto diagram simply displays a bar chart distribution of a metric by a dimension sorted in descending order of the value, overlaid with a line chart plotting the growing cumulative percentage. It helps to identify the significant few. Pareto chart can help in identifying top performers for a metric and their overall contribution across the categories.

This blog entry describes how to use the Pareto Custom Viz plugin for Oracle Analytics. That Custom Viz plugin can be downloaded for free from the Oracle Analytics LibraryThis brief video describes how to use this plugin within OAC :



Using the Pareto Custom Viz Plugin


Pareto Chart requires two mandatory input parameters : a Category dimension (X) and a Values object (Y axis). Color is an optional parameter. Values(Y-Axis) accepts a single measure. Category(X-Axis) accepts one or more attributes. 



The plugin automatically sorts the metric value in descending order, by dimension members, and plots it as a bar chart for each member. The metric value is computed as a cumulative percentage and plotted as a line chart. The axis of cumulative percentage is placed on the right hand side. Here's an example of a pareto depicting Sales by City:


Adding a color parameter changes the bar chart to a stacked bar chart with each bar divided into different color categories. However, the line chart is not altered.

As you hover over the bars, you see a tool-tip with X-Axis and Y-Axis values. When you hover the mouse on the line chart, along with the tool-tip, you see two dotted lines each pointing towards X-Axis and Cumulative %. These lines are helpful in dividing the category with Cumulative %. You can see the stacked bar and the dividing lines in the pic below.



Pareto Chart Plugin can plot negative values as well. Here's an example of a Profit metric with positive and negative values.



The plugin also supports typical DV mouse right-click interactions like Keep/Remove Selected, Use as Filter, Brushing and so on.

Hope this write-up was informative! Thanks for reading.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Tuesday, February 4, 2020

Where to build my OAC custom calculations, in Projects or under Data-prep ?

Oracle Analytics allows any user to easily create custom calculations in data-visualization projects. It's a simple process to extend your analysis or data-sets with any calculations and aggregations you may need right while you are building your Vizs, irrespective of the type of underlying data. One can build a custom calculation on a DV Project directly, or also, build it directly in the data-set using the Data-prepare tab.

But precisely, since there are multiple places where custom calculations can be built, which one is the right one ? What is different between a custom calculation directly built in a DV project versus a custom calculation built in the Data Prep tab of OAC ? This blog entry is a short helper in understanding the main difference between these two. It should help users make the best choice on where to build their next custom calculation.

The flow of this blog entry is also illustrated in this short voiced video :





To illustrate the point, let's start with a simple example whose results may seem confusing. My data-set has order lines details, with sales value and quantity sold for each order line.

I built a custom calculation defined as Sales / Quantity twice : once in my project using Add Calculation menu, and once in my Data Prepare tab using the Create Column menu. Both these calculations are built with exactly identical formula shown on the picture.

Yet when add both calcs in the same visualization and compare the results, I am getting different values for each calculation...

See the two last columns in my result table below, once for each custom calculation :

Why is that ?????


At first sight, this is puzzling. How can the exact same calculation return different results based on where it's built in Oracle Analytics ? Which one is right and which one is wrong ?

The question behind this confusion really is : how is the sequence of aggregations & calculations happening in each case. Think about it, just like if in a spreadsheet file you computed the sum in a total column at the far right of your table, and then the divisions in the bottom line. If you did it vice versa, you may also get different results with the same formula...

  • In the case of the Project based calculation (first case here),

Oracle Analytics will first run the initial query to fetch raw data needed in the view, and return the aggregated results set. Then it will run the calculation. So for this example, it will first return sales and quantity by Product Category (3 rows) and only then it will calculate the division of Sales by Quantity ordered, for each of the 3 Product Category lines. That will give us what we could call a 'weighted Unit Price Average'.

  • In the case of the Data-prepare based calculation (second case here),

In this case, Oracle Analytics will do it the other way around : it will first execute the calculation at the data-set detail level. So, first, since our data-set is at Order-line level (each row in the data-set represents an order-line), Oracle Analytics will compute the division of sales by quantity distinctly for each Order-line.
Only once that is done, it will run the query to fulfill the data-viz and will then aggregate the result-set data for each Product Category. So, since in our example we set an aggregation rule of avg for our new Data-prep calculated Unit Price column, it will take all the granular unit prices for all order-lines in a given Product Category and return average of these. It will do so for each Product Category (3 rows). We could call this an 'un-weighted Unit Price Average' in this case.


The video above shows how these calculations can easily be manually verified in a spreadsheet.
Note that in the case of a Data-prep calculation, the result is exactly what we would have gotten if a Unit Price column had existed directly in the source data-set, for each order-line row (database table, csv file, etc). So in that sense, extending your  data-set with Data-prep new calculated columns is like extending it directly in your source file.

So, which calculation is correct then ?


... it depends.
It depends on what your analysis requires.

In most of the cases data visualizations require a level of calc-aggregation on initial data : by country, by employee, by year, etc.... The most representative value here is a weighted average, so that's produced using a Project-based custom calculation. This was our example. So in most of the cases, Project based calculations are the safest way to go, particularly when you are not sure of what you need. You can't be really wrong with it. Build your calculations in the Project custom calcs directly and be aware that if you want to persist them in the data-set for others to use, they may behave slightly differently if the calculations are later on aggregated in some vizs.

But in some cases, like binning for example, or running Explain, you need to keep a granular data in your calculation. And that is a data-prep level calculation. In our example, let's say that we also need to show sales by 'ranges' of unit prices. That is, bins of unit prices. Sum all sales where unit price is between 0 and x, then x and y, etc. This will work easily with the Data-prep calculation we have, we just need to drag it in the category axis of any chart and the binning will happen automatically. It will not work with the Project based calculation at all.

In conclusion, building custom calculations in a project or in a data-prep (data-set) are different things, potentially leading to different results. They each have a purpose within OAC and should be leveraged for appropriate needs. Use project build custom calculations in general as this will cover the most frequent need. Keep in mind Data-prep based calculation will help you achieve more advanced aggregation needs.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!
 

Monday, February 3, 2020

Maps - Using the Location Match Dialog

What should an Oracle Analytics Map visualization show when some of the cities names it supposed to plot on a map do exist with same names in various countries ? 

For example, let's take a dataset that includes entries lie Barcelona and Liverpool for cities. We would expect Barcelona to plotted in Spain and Liverpool to be plotted in England. But it may not always be the case since there is a city named Barcelona in Venezuela and Liverpool in Canada and you might end up seeing these points plotted over Venezuela and Canada.

A short, simple fix for correcting this is to add the country name as part of the Map Category grammar. See the picture below, the left viz shows a Map grammar with only City in the Category, the right viz shows it with both Country and City. 

This easy addition removes the ambiguity and corrects the problem. The question is how can a user detect if there are any ambiguities or even mismatches in his data, compared to what the map layer expects ? Oracle Analytics Cloud includes a Location Match Dialog feature that directly answers the question for users of Maps Visualizations. 

On any map visualization, a user can perform a right click and choose the option "Location Matches" to pop up the Location Match Dialog. This helps to inspect how well the data from the dataset has been matched the map layer's data.
Location Match Dialog for Cities column
In the case of a multiple map layers on a map visualization scenario, there is a tab for each of the map layers with the open tab corresponding to the current map layer the user is on. There are various columns meant to help users understand and improve the match quality of the data :

Map Layer drop down:


When the user opens the Location Match Dialog for the first time, the data from the column is matched against the map layer used in the map. This is the layer selected in the drop down of the dialog box, it is a visual indication of the map layer that is matched to. The user can also select a different layer from the list of existing map layers and the dialog will show how well the data from the dataset can match to the newly selected map layer.

Summary Section:

The summary section shows the total number of rows in the user's dataset and also gives the precise number of rows that it had difficulty in matching with the map layer's data. In the above diagram, there were a total of 144 rows of City data matched with world cities map layer and there were issues with 22 among them.


Your Data Column:
The first 'Your Data' column shows the original data directly coming from the user's dataset. It only represents the rows for which an ambiguity or issue was detected.

Match Column:
Match actually shows the data matched to in the map layer (the data that resides geojson file for your map layer, wether it's a Vanilla OAC Layer or a custom Layer that was uploaded by the user).

If the data from the dataset didn't match with any data in the map layer, then a red warning indicator is displayed for that row. If there was a match, but not exactly a perfect match, then a yellow warning indicator is displayed. This does not necessarily mean that it was a wrong match, just that there were other potential matches and the system is not 100% sure which one to pick. So maybe the user would want to review these use cases to see if he can make the match better. For perfect matches, there are no indicators.

Match Quality:
The Match Quality column quantifies how good the match was. When the user opens the Location Match dialog, the rows are sorted from the worst matches to best matches.

  • The rows with no matching data with map layers data has the value "No Match" for the Match Quality column.
  • Multiple matches are the ambiguous cases.  In case the data from the dataset matches with multiple distinct entries of map layer's data, then the Match Quality column indicates how many such matches were found. Common way to resolve these type of issues is to add more information (columns) in the Location edge so that the ambiguities can be resolved. For example, let us revisit our original use case of plotting cities Barcelona, Liverpool on the map visualization. By default when the user bring these cities into the map visualization, they get plotted in Venezuela and Canada respectively instead of Spain and England. Location Match dialog for the same can be found below

Both the entries have ambiguous matches. To give a better picture of the mapping consider the following tables
Your Data
Map Layer Data
Barcelona
Barcelona, Venezuela

Barcelona, Spain

It found 2 matches for Barcelona entry in the map layer data and gives the Match Quality value as 2 matches. Similarly for Liverpool


Your Data
Map Layer Data
Liverpool
Liverpool, Canada

Liverpool, England

Now this is telling the user that maybe if he can add more data to the Location Edge, then it can resolve this ambiguity and plot the data point exactly where it is intended to. In this case, if the user adds the Country column to the Location edge, then it can resolve this ambiguity.

After adding the country column to the Location edge, the Location Match Dialog looks like this.

Now you can see that the Your Data column has the country value appended to the city name and with the help of this it is able to resolve the ambiguities for both Liverpool and Barcelona and plot it exactly in the map.


Barcelona and Liverpool plotted in Europe after adding Country column to the Location Edge
  • The next category of matches are the Partial matches. The partial match could be that a part of the word in the user's data got matched with that of map layer's data. In this case, the match quality value is the percentage of how close the two strings are.



  • The last category of matches are the Good matches where the data matches exactly with the map layer's data. In this case, the match quality value is 100% Confidence.


Finally we have the Remove column. this one allows the user to exclude rows of data from the viz, and also gives the option to set the scope for which the rows of data needs to be excluded. This includes Project scope, Canvas scope and the Visual scope. Once the user selects the rows and sets the scope, appropriate filters are created to exclude the rows.

Another interesting use case to ponder is what will happen to the Location Match dialog if a user decides to bring Lat/Long columns instead of semantic column types like city, state, country etc.



In this case, the map layer matched to is the Latitude/Longitude layer. There is one less column in the Location Match Dialog because we know what coordinates we are trying to match to. So the Match column just indicates whether the row entry is valid or not.

So the Location Match dialog primarily helps users understand the extent of ambiguous matches or mis matches in their data when they are plotting it against a map layer. It won't automatically fix the data, so the user still has to either bring more data in his viz to remove the ambiguity, or fix the map layer. But the Location Match dialog gives a detailed list of what needs to be fixed to fully match map layers. This is big step in overcoming the barrier of the geospatial data mismatch.

Are you an Oracle Analytics customer or user?

We want to hear your story!

Please voice your experience and provide feedback with a quick product review for Oracle Analytics Cloud!