Monday, June 8, 2020

Maps - Custom Image backgrounds

Images are everywhere, and a picture is worth more than a thousand words, a thousand tweets, or any other non-visual content. Whether it's a meme, an artsy photo, a selfie or article click-bait, a good portion of websites are increasingly filled with more images and less text. Images are more impactful than text, more engaging, more insightful and definitely quicker for the human brain to interpret.

Oracle Analytics Cloud (5.6.0 onwards) allows users to pick any local custom image and rapidly turn it into a custom visualization. Here is a business example we recently came across: the process of manufacturing phosphoric acid. The organization needed to present analytics over a diagram depicting the entire manufacturing process, for which they had associated data.


Let's use this Oracle Analytics feature to turn the diagram into a powerful analytics visualization. The image on which the user wants to perform analytics will be used as a custom map background, and the various portions of the image will be used as map layers. The problem can be broken down into 3 simple tasks.

  1. Upload an image into the OAC platform.
  2. Identify, mark and designate various portions on the image to represent metrics/attributes
  3. Use the image to visualize data  

Upload an image into the OAC platform

To address #1, the user can now do the following with Oracle Analytics : go to Console -> Maps -> Backgrounds and then click on the Image Background section.


The user can click on Add Image and choose any image stored locally. All image types are supported and there is no hard limit on the size of the image.

Designate various portions on the image : build a custom layer

A new tool now helps users draw and mark various portions directly on top of an image. To navigate to the tool, the user finds the image on which they want to draw layers and chooses the option "Create Map Layer".


This takes the user to the digitization tool, where they simply draw the areas they want on the image.


Users can draw 3 types of layers on top of the image :

  1. Points 
  2. Lines 
  3. Polygons
A layer can only be of a single type, so this choice is made only once per layer. However, multiple layers can be built for a single image, and multiple layers can be displayed on top of an image in Oracle Analytics as well.
Since the phosphoric acid process mainly involves rectangles, we decided to go with a polygon layer on top of this image. Also note that under polygons, the user can draw rectangles, polygons or circles of any size.


We use the Rectangles option to draw rectangles over the various stages in the process of manufacturing acid.


Once the drawing is completed, the user can go to the Edit tab and still adjust the position of the objects on the layer, resize them or restructure them. There are options to duplicate or delete objects, and also to undo changes made to a layer.
Whenever a user finishes drawing an object on an image, a default name is assigned to it. The name appears at the top of the screen and is added to the list of objects on the left of the screen.

Users should be aware of this, since these names are the IDs against which data will be matched to find the corresponding object on the map layer. It is therefore up to the user to rename the objects on the layer so that they match the values in the data.
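Because the object names act as the matching keys, it can help to compare the dataset against the layer before visualizing. The sketch below is a hypothetical check written in Python, not an Oracle Analytics feature: the stage names are invented for illustration, and it assumes an exact, case-sensitive comparison, which may be stricter than the matching Oracle Analytics actually performs.

```python
def unmatched_names(data_names, layer_names):
    """Return dataset names that have no identically named object on the layer."""
    known = set(layer_names)
    return sorted(set(data_names) - known)


# Hypothetical object names given to the rectangles drawn on the map layer.
layer_names = ["Reactor", "Filter", "Evaporator"]

# Hypothetical values of the NAME column in the dataset.
data_names = ["Reactor", "Filter", "evaporator", "Storage"]

# Names listed here would have no shape to land on (under the exact-match assumption).
missing = unmatched_names(data_names, layer_names)
print(missing)
```

Any name reported as missing would either need an object drawn for it or a rename on one side.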

We have manually renamed the objects on our layer, and it now looks like the image shown below.

Once the user is done creating objects on top of the image, the map layer can be saved. 

Behind the scenes, a GeoJSON file is automatically generated from the objects drawn on top of the image. The user can go to the Custom Map Layers tab in the Maps Console and see the map layer that was generated.
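The generated file itself is not shown in this post, but a GeoJSON polygon layer generally takes the shape sketched below in Python. The stage names, the coordinates and the "name" property key are assumptions made for illustration; the file Oracle Analytics produces may use different keys and coordinate units.

```python
import json


def rectangle_feature(name, x_min, y_min, x_max, y_max):
    """Build a GeoJSON Feature for an axis-aligned rectangle."""
    ring = [
        [x_min, y_min],
        [x_max, y_min],
        [x_max, y_max],
        [x_min, y_max],
        [x_min, y_min],  # a GeoJSON ring repeats its first position to close the shape
    ]
    return {
        "type": "Feature",
        "properties": {"name": name},  # hypothetical key holding the object name
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Hypothetical stages and image coordinates.
layer = {
    "type": "FeatureCollection",
    "features": [
        rectangle_feature("Reactor", 100, 200, 260, 320),
        rectangle_feature("Filter", 300, 200, 460, 320),
    ],
}

print(json.dumps(layer, indent=2))
```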

Use the image to visualize data

The final step is to bring in the dataset. In our case, it consists of details on the various stages in the acid generation process, including power consumption and volume capacity.


We will use the option to assign a map layer to a column, manually assigning the custom map layer generated in the previous step to the NAME column. This step is optional, as map backgrounds and map layers can also be configured directly in the Map visualization properties. However, assigning the layer during data preparation (as we do here) helps Oracle Analytics default to the right map layer for this column.

Once in the Visualize tab, the user can bring the NAME column into a canvas and should see a map visualization render with the acid process image as the map background and the custom map layer generated in the previous step on top of it.


We can bring in a power consumption metric to the color edge to show how much power is consumed in various stages.


Other business charts can also be brought into the canvas, and the normal DV interactions such as brushing and filtering can be used with the image background.


Another example of an image background with a polygon layer is the picture of a motorbike shown below

An example of image background with point layers is shown below

This is a very powerful feature, and it can be treated like a custom visualization for different images.
Thanks for reading our blog!

How can Regex enhance Data Preparation?

Is simple find-and-replace not sufficient for your needs in Oracle Analytics Data Preparation? The Regular Expression Replace functionality brings powerful new pattern-matching abilities to Oracle Analytics Data Preparation. In this blog, we will see what it has to offer.
First, take a quick look at the video, which briefly explains this feature.


Oracle Analytics now allows users to utilize the full potential of Regular Expressions aka Regex in pattern matching of column text in OAC Data Preparation or Data Set editor. Before we jump into the feature, let's recollect the basics of regular expressions. You can skip this part if you’re already familiar with Regular Expressions.

Regular Expressions in a nutshell

Regular Expressions are a powerful tool for matching patterns. A pattern can be a word, a specific character or a sequence of characters, occurring once or multiple times. Regex can also help you find words occurring in a specific format, such as an email address. We will only address some of the most useful Regex patterns in this blog entry, which should be enough to kick-start Regex Replace in OAC Data Preparation.

For starters, let's look at some commonly used Regex patterns.
Let's start with a pattern to match one or more digits. ‘\d+’ can be used to find runs of the digits 0-9 anywhere in the string. The backslash ‘\’ is the escape character, ‘d’ is the shorthand code for a digit and the ‘+’ sign denotes one or more occurrences. Other useful patterns are:

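  • \w           Any alphanumeric character or underscore
  • \W           Any character that is not alphanumeric or underscore
  • \d           Any digit
  • \D           Any non-digit character
  • .            Any character
  • *            Zero or more repetitions of the preceding element
  • +            One or more repetitions of the preceding element
  • ?            Zero or one occurrence of the preceding element (makes it optional)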
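To try these building blocks outside OAC, a few lines of Python are enough. This is only a sketch: it uses Python's re module as a stand-in for the OAC regex engine, the sample strings are made up, and the two engines may differ in edge cases.

```python
import re

text = "Order 66 shipped to Bay-7"  # made-up sample string

digits = re.findall(r"\d+", text)      # runs of one or more digits
words = re.findall(r"\w+", text)       # runs of letters, digits or underscores
separators = re.findall(r"\W", text)   # single characters that are not word characters

# '?' makes the preceding character optional; '.' stands for any one character.
optional_u = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colr")]
any_char = bool(re.fullmatch(r"B.y", "Bay"))

# '*' accepts zero repetitions, while '+' needs at least one.
star = bool(re.fullmatch(r"ab*", "a"))
plus = bool(re.fullmatch(r"ab+", "a"))

print(digits, words, separators, optional_u, any_char, star, plus)
```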

Character Classes or Character Sets
When you want to match only one out of several characters, character classes (also called character sets) can be used. For example, to match an a or an e, we can use [ae]. The pattern ‘gr[ae]y’ matches gray and grey but not Grey or graay.
It is also possible to specify a range inside the [] brackets. For example, [0-9] matches a single digit from 0 to 9, and [a-z] matches a single letter from a to z. Multiple ranges can be combined in one class: [A-Z0-9] matches a single character that is either an uppercase letter or a digit. To match an uppercase letter followed by a digit, use two classes in sequence: [A-Z][0-9]. Remember, the patterns are case-sensitive!
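The behaviour of character classes can be checked with a short Python sketch (Python's re module is used here as a stand-in for the OAC engine, which is an assumption). It shows that a class, however many ranges it contains, always consumes exactly one character.

```python
import re

grey = re.compile(r"gr[ae]y")
grey_results = {w: bool(grey.fullmatch(w)) for w in ("gray", "grey", "Grey", "graay")}

one_char = re.compile(r"[A-Z0-9]")              # ONE character: uppercase letter or digit
letter_then_digit = re.compile(r"[A-Z][0-9]")   # two characters: a letter, then a digit

single = [bool(one_char.fullmatch(s)) for s in ("A", "7", "a", "A7")]
pair = bool(letter_then_digit.fullmatch("A7"))

print(grey_results, single, pair)
```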

Grouping Patterns
In Regex, groups of characters can be defined for capturing and processing the data. Let's look at an example. Consider the text ‘ABSD0028’, which has two parts: letters (ABSD) and digits (0028). To write a pattern that splits the text into these two parts, we group the characters with parentheses, and each group can then be captured using back-references. Here is our pattern:
([A-Z]+)(\d+)
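As a quick check of what each group captures, here is the same pattern run through Python's re module (used as a stand-in for the OAC engine, which is an assumption):

```python
import re

match = re.fullmatch(r"([A-Z]+)(\d+)", "ABSD0028")

# groups() returns the text captured by each pair of parentheses, in order.
letters, numbers = match.groups()
print(letters, numbers)
```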

Back-references
Now that we have learnt about grouping, let's see how useful groups are in a real-world scenario. The main purpose of grouping in regular expressions is to use one or more groups in the output. Consider the example above, where we grouped the text into two parts. We can refer to these parts using back-references and construct a new text. The naming of back-reference parameters may vary depending on the platform or programming language on which the Regex engine runs. In OAC, back-reference parameters are denoted by a ‘$’ sign followed by the order of occurrence of the group. For example, in the pattern ([A-Z]+)(\d+), ‘([A-Z]+)’ is referred to as $1 and ‘(\d+)’ is referred to as $2.
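The sketch below shows back-references in a replacement using Python's re module. Note the syntax difference: Python writes \1 and \2 where OAC writes $1 and $2. The hyphen in the replacement is an arbitrary choice for illustration.

```python
import re

pattern = r"([A-Z]+)(\d+)"

# Rebuild the text with the two captured parts swapped and a hyphen in between.
swapped = re.sub(pattern, r"\2-\1", "ABSD0028")

# Keep only the first group and drop the digits.
letters_only = re.sub(pattern, r"\1", "ABSD0028")

print(swapped, letters_only)
```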

Alternation
Alternation is the ‘OR’ operator in regular expressions. To match red or green or blue, use red|green|blue. To match ‘red color’ or ‘blue color’, use grouping to limit the scope of the alternation, for example (red|blue) color.
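A short Python sketch (again using the re module as an assumed stand-in for the OAC engine, with made-up sample text) shows why the grouping matters: without parentheses, the alternation splits the whole pattern into ‘red’ on one side and ‘blue color’ on the other.

```python
import re

colour = re.compile(r"red|green|blue")
found = colour.findall("a red door, a blue car and a green tree")

grouped = re.compile(r"(red|blue) color")
grouped_results = [bool(grouped.fullmatch(s)) for s in ("red color", "blue color", "green color")]

# Without the group, the alternatives are 'red' and 'blue color'.
ungrouped = bool(re.fullmatch(r"red|blue color", "red color"))

print(found, grouped_results, ungrouped)
```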

Constructing a pattern
Now, let’s construct a pattern which we could use in Regex Replace. Consider extracting the email handle from an email address. Email IDs are in the following format: <some_text>@<org_name>.<com>. Let’s split this format into three parts: the first part is the email handle, the second part is the ‘@’ symbol and the third part is the domain. The first part can include alphanumeric characters along with special characters like ‘_’ and ‘.’, so the pattern for this part would be ‘[A-z0-9_\.]+’. Note that the range A-z, as written, covers the uppercase and lowercase letters but also the few punctuation characters that sit between Z and a in ASCII; [A-Za-z] is the stricter form. We can use the second part as it is. The third part consists of letters and ‘.’, so its pattern would be ‘[a-z\.]+’. Combining the three parts, we get the following pattern:

([A-z0-9_\.]+)@([a-z\.]+)
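Here is the pattern applied to a made-up address in Python (the re module stands in for the OAC engine, which is an assumption). The last two lines illustrate how the A-z range differs from A-Za-z.

```python
import re

EMAIL = re.compile(r"([A-z0-9_\.]+)@([a-z\.]+)")

address = "john.doe_42@example.com"  # hypothetical address
match = EMAIL.fullmatch(address)
handle, domain = match.groups()

# Replacing the whole match with the first group leaves only the handle.
handle_only = EMAIL.sub(r"\1", address)

# In ASCII, '^' sits between 'Z' and 'a', so the A-z range accepts it.
caret_in_wide_range = bool(re.fullmatch(r"[A-z]", "^"))
caret_in_strict_range = bool(re.fullmatch(r"[A-Za-z]", "^"))

print(handle, domain, handle_only, caret_in_wide_range, caret_in_strict_range)
```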

So, what’s new with Replace in OAC Data Preparation?

Let’s see what’s new in the Replace functionality. For demo purposes, I have created a project with a data set called Customer Profile, which has customer details like JOB_LEVEL, Email address, Phone number, Occupation, Education etc.


Earlier, Data Preparation offered only a plain search-and-replace feature in the context menu of the columns. There was no option to specify a partial or entire match. Now, three options are available as radio buttons on the Replace page.

       a.   Match partial values
       b.   Match entire values only
       c.   Use regular expression


When option a is selected, matches are found in both partial and entire values of the text. Let’s see how ‘Match partial values’ works on the column Occupation. In the Prepare tab of the project, right-click on the column Occupation and select Replace.


You can see the following values in Occupation: 


Enter ‘Manual’ in String to replace and ‘Worker’ in New String. Select ‘Match partial values’. You will notice that both ‘Manual’ and ‘Skilled Manual’ are modified to ‘Worker’ and ‘Skilled Worker’ respectively.


Now, select ‘Match entire values only’ with the same input. You will notice that only ‘Manual’ is modified to ‘Worker’, and ‘Skilled Manual’ remains the same. This option comes in handy when only values that match in their entirety should be replaced.
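The difference between the first two options can be sketched in plain Python. This illustrates the behaviour described above and is not OAC's implementation: the function names are hypothetical, ‘Professional’ is an invented third value, and the comparison is assumed to be case-sensitive.

```python
def replace_partial(values, old, new):
    """Replace 'old' wherever it occurs inside a value."""
    return [v.replace(old, new) for v in values]


def replace_entire(values, old, new):
    """Replace a value only when the whole value equals 'old'."""
    return [new if v == old else v for v in values]


# 'Manual' and 'Skilled Manual' come from the example; 'Professional' is invented.
occupation = ["Manual", "Skilled Manual", "Professional"]

partial = replace_partial(occupation, "Manual", "Worker")
entire = replace_entire(occupation, "Manual", "Worker")

print(partial)
print(entire)
```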


We will explore option c, ‘Use regular expression’, in the rest of the examples.

Use of Alternation or ‘|’
In the column ‘ENGLISH_EDUCATION’ let’s remove the words ‘Partial’ and ‘Degree’ from the values.


Enter ‘Partial|Degree’ in String to replace, leave New String blank and select ‘Use regular expression’. You will now see that ‘Partial’ and ‘Degree’ are removed from values such as ‘Partial High School’, ‘Partial College’ and ‘Graduate Degree’.


You can use the ‘|’ symbol as the ‘OR’ operator to include more than one value to match. As you might have noticed, this eliminates multiple steps when more than one match needs to be replaced with a single value.
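The same replacement can be reproduced in Python (using the re module as an assumed stand-in for the OAC engine). In Python, removing only the word leaves the neighbouring space behind, so the sketch also trims the result; whether OAC shows or trims that whitespace is not confirmed here.

```python
import re

values = ["Partial High School", "Partial College", "Graduate Degree"]

# An empty replacement string removes every match of either alternative.
removed = [re.sub(r"Partial|Degree", "", v) for v in values]

# Strip the space that was next to the removed word.
trimmed = [v.strip() for v in removed]

print(removed)
print(trimmed)
```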

Obfuscation, using literal matches
Obfuscation is a significant step in Data Preparation, as it improves the privacy and security of sensitive content. Popular candidates for obfuscation are social security numbers, phone numbers, account numbers etc. With the powerful features of regular expressions, obfuscation becomes easy in OAC. Let’s see how to do that with an example by obfuscating the column Phone in our data set. If you look at the values of this column, each is divided into three parts: the first three digits within parentheses, then the next three digits, separated by a dot (.) from the last four.


Let’s replace the digits with ‘x’ characters. Enter ‘\d+’ in String to replace and enter ‘xxx’ in New String (‘[0-9]+’ and ‘\d+’ can be used interchangeably). Select Use regular expression and you will see that every run of digits is replaced with ‘xxx’. Other patterns can be used depending on the requirement.
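The Python sketch below reproduces this replacement and adds two variations. It assumes a phone format of ‘(ddd) ddd.dddd’ with a space after the closing parenthesis, uses a made-up number, and relies on Python's re module as a stand-in for the OAC engine (Python writes \3 where OAC writes $3).

```python
import re

phone = "(425) 555.0123"  # made-up number in the assumed format

# Every run of digits becomes 'xxx', whatever its length.
masked = re.sub(r"\d+", "xxx", phone)

# Matching one digit at a time keeps the original length of each part.
masked_same_length = re.sub(r"\d", "x", phone)

# Literal characters plus groups allow masking everything except the last part.
keep_last_part = re.sub(r"\((\d+)\) (\d+)\.(\d+)", r"(xxx) xxx.\3", phone)

print(masked, masked_same_length, keep_last_part)
```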


Extraction with Back-reference
Let’s see how to extract a specific part of the text using a regular expression. Consider the column Email_address from our data set. Let’s extract the email handle from the email address.


As you might recall, the pattern we constructed in the Regex example for the email address, ([A-z0-9_\.]+)@([a-z\.]+), can be divided into three parts. We’re interested in the first part, which is the email handle. Enter the pattern ‘([A-z0-9_\.]+)@([a-z\.]+)’ in String to replace, enter $1 in New String and select Use regular expression. You will now see just the first part of the email address in the column. You may rename the column to email_handle.


So, once we group and divide a text according to our needs, we can use the groups in the output as back-references. This is helpful where the data has multiple fields combined into one column, for example IL-Chicago, where State and City are combined. Columns with log entries, timestamps etc. could be interesting candidates for extraction using back-references.
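As an illustration of the IL-Chicago case, the Python sketch below pulls the state and the city out of the combined value. The pattern is an assumption about the format (uppercase letters, a hyphen, then the rest), and Python's \1 and \2 correspond to $1 and $2 in OAC.

```python
import re

pattern = r"([A-Z]+)-(.+)"  # assumed format: state code, hyphen, city name
value = "IL-Chicago"

state = re.sub(pattern, r"\1", value)
city = re.sub(pattern, r"\2", value)

# A value that does not fit the pattern is left untouched by the replacement.
unchanged = re.sub(pattern, r"\1", "Unknown")

print(state, city, unchanged)
```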

Enriching data with Regular Expressions
When we want to enrich the existing data in a highly customized format, we can use grouping and back-references together to make a powerful combination to transform data in one shot. Let’s enrich the column JOB_LEVEL by inserting some text in between.


Similar to the example we saw in Grouping Patterns, JOB_LEVEL has two parts: letters first and digits next. The pattern ‘([A-Z]+)([0-9]+)’ can be used to match the value and split it into two parts. Let's insert the text ‘_000’ between the two parts. Enter ‘([A-Z]+)([0-9]+)’ in String to replace, ‘$1_000$2’ in New String and select Use regular expression. You will see that JOB_LEVEL is transformed into an enriched column. This is helpful when the data is expected to be in a specific format.
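The same enrichment looks like this in Python. The JOB_LEVEL values are hypothetical (the first is borrowed from the grouping example), and the replacement uses Python's \g<1> form, the equivalent of OAC's $1, so that the group number is clearly separated from the inserted text.

```python
import re

job_levels = ["ABSD0028", "MGR12"]  # hypothetical JOB_LEVEL values

# Insert '_000' between the letters (group 1) and the digits (group 2).
enriched = [re.sub(r"([A-Z]+)([0-9]+)", r"\g<1>_000\g<2>", v) for v in job_levels]

print(enriched)
```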


What we have seen in this blog is just the tip of the iceberg. A lot more can be done with Regular Expression Replace, and it makes a powerful addition to OAC Data Preparation.

Thanks for reading this blog!