Thursday, December 10, 2020

OAC querying JSON Data with AJD (Autonomous JSON Database)

JSON or JavaScript Object Notation is an open data format and the preferred data interchange format for web applications. JSON format is highly used in developing web applications where fast, compact and convenient serialization of data is required.

While Oracle Analytics Cloud (OAC) does not inherently support JSON format data sources, overcoming this limitation is simply done by integrating with Oracle Autonomous Data Warehouse (ADW) or Oracle Autonomous JSON Database (AJD). Oracle AJD stores JSON documents in a native tree-oriented binary format and provides native, open-source document store API called SODA (Simple Oracle Document Access) that allows users to store and query JSON documents.

This short playlist gives a quick video tour of the steps involved to query JSON from OAC : OAC querying JSON data with AJD (Autonomous JSON Database)

In this blog, we will cover the steps involved in starting from a simple JSON document and enabling to visualize it directly in Oracle Analytics. An important point to note is that this should all be achieved without the user having to know any of the data architecture in the JSON... That is, I bring a JSON file which I am not familiar with, and I need to visualize it in OAC without having to understand it's structure beforehand.

Sample JSON document

Let us consider a sample JSON file with around 2000 Purchase Orders data rows. Each Purchase Order object contains several attributes like PONumber, Reference, Requestor, User, CostCenter and so on. It also has a nested array of LineItem objects. Each LineItem has attributes like ItemNumber, Part, Quantity and so on. Let's pretend we are not familiar with any of these details. For us, this is a PO Dataset :


In order to store this document in AJD (ADW), we need to make sure that the JSON format adheres to the following 'classic' specifications:

  1. Each JSON object appears as a single line in the file. Attributes and nested array of objects within an object should not be on separate lines
  2. Each JSON object be enclosed within curly braces 
  3. No comma separator between JSON objects
  4. No square brackets at the start and end of the JSON document

Here’s an example of a JSON file with the right format.






Once our format is checked, we need to go through a sequence of steps to make this JSON exist in the database, and query it with OAC.

While the list seems long, it's a series of basic steps, most of which need to be initiated only once for initial loading/analysis of the file. Subsequent refreshes of data for the same file require less steps to complete. Let’s look at each of these steps in detail :

Step 1 – Upload JSON file on OCI storage

First in the process is to upload our purchase order JSON file into an OCI object storage so it can be accessed by AJD. In order to do that, we log into to our cloud console, select Object Storage and choose an appropriate Bucket.  

Select the Purchase_order.json file from your local machine and upload it to the object store. Once the file is uploaded, choose the menu option on the uploaded file and View Object Details. This will display the URL path to access the file. Copy this path and save it for later use.

Step 2 – Create a Collection object in the Database

Next step is to connect to the AJD instance (via a SQL interface) and use SODA APIs command to create a 'collection' object that will store the JSON document. For example, connect to the database instance using SQL Developer, ensure that the user account used to make the connection is granted the database role SODA_APP, and run the following command :

DECLARE
    collection  SODA_Collection_T;
BEGIN
    collection := DBMS_SODA.create_collection('DEMO_JSON_PO');  
END;

This creates a collection called DEMO_JSON_PO. Once the collection is created, a corresponding table with the collection name is automatically created as well in the database.

Step 3 – Store JSON in the collection

Now,  you can to store the JSON document from object storage into the collection we just created. Make sure you first have a DB credential that can access the file in the object storage, to confirm that you can use the following syntax :

SET DEFINE OFF
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'DEMO_CRED',
    username => 'oracleidentitycloudservice/demo.user@oracle.com',
    password => '************'
  );
END;

The username/password used in the credential creation should be one that has access to the JSON file on object storage. Once the credential is created, we use it to copy the JSON document into the collection using the following syntax

BEGIN 
 DBMS_CLOUD.COPY_COLLECTION(   
    collection_name => 'DEMO_JSON_PO',
    credential_name => 'DEMO_CRED',
    file_uri_list   =>
      'https://objectstorage.us-ashburn-1.oraclecloud.com/n/xxxxxxxxxxx/b/oacdemostorage/o/purchase_order.json',     
    format          =>
      JSON_OBJECT('recorddelimiter' value '''\n''')  );
END;

In the syntax, use the collection and credential name created in the above steps. file_uri_list is the URL to the JSON file on object storage (stored in step 1) . Recorddelimiter is a newline character in our JSON file.

Once the copy_collection is executed, we can confirm things by query the DEMO_JSON_PO table in the database and we find that each JSON line from the source file is loaded as a separate row in this table. The JSON contents are loaded into a blob column called json_document. 

Step 4- Create JSON index

Now that we’ve stored the JSON data in the collection, next step is to flatten this JSON structure into rows and columns so we can query it from OAC. Keep in mind we don't know the architecture of our JSON data hierarchy. So in order to do this flatting without this information, we first create a search index on the json column using the following syntax

Create search index DEMO_JSON_PO_IDX on DEMO_JSON_PO (JSON_DOCUMENT) for JSON parameters ('DATAGUIDE ON') ;

Here DEMO_JSON_PO_IDX is the index name, DEMO_JSON_PO is the collection name and JSON_DOCUMENT is the blob column name.  Specifying ‘DATAGUIDE ON’ is important in order to be able to extract the JSON schema as columns. This is actually what will parse the JSON file architecture into a series of flattened columns for us. Creating this search index will result in the creation of a set of related DR$ tables.

Step 5- Flatten JSON into relational format

Finally, we create a Database view with expanded columns for the tree structure of the JSON using the following syntax

exec dbms_json.create_view_on_path ('DEMO_JSON_PO_FLAT','DEMO_JSON_PO','json_document','$');

where DEMO_JSON_PO_FLAT is view-name, DEMO_JSON_PO is the collection name, json_document is the column with blob and $ indicates JSON path from the root. On executing this procedure a Database view is created with the JSON schema extracted as distinct columns. And each JSON object (line from our original file) is extracted as one or more rows in this view.


Step 6 – Query flattened data in OAC

That's it, we are done with flattening this JSON Data ! We can now simply query it via SQL commands. So to make it available in OAC we just need to create it as a dataset in OAC. One caveat, as the column names in the view are of the format JSON_DOCUMENT$xxx, previewing a dataset that is directly sourced from this view in OAC will likely result in “Invalid identifier” error. To get around this we just need to click the 'Enter SQL' option during dataset creation, just after we select all the columns of the view. Here, for each column specify an alias column name. 










At this point, we are able to query this flattened view directly from OAC retrieve data and visualize like any other source ! This datasource can be used in connection with any other OAC source, and for any of the OAC capabilities (dataviz, data flows, ML, Mobile, Answers...)

Incremental Data Refresh Scenario

Previous section described the various steps required for an initial load of JSON document into an AJD collection. For subsequent refreshes only a subset of these steps  are involved : 

- For simple JSON data refresh with no changes to JSON structure, following are the steps to follow







- For JSON data refresh with changes to JSON structure, following are the steps to follow






Thanks for your time reading this blog !

No comments:

Post a Comment