Wednesday, March 31, 2021

Database privileges required for Text Tokenization in Database Analytics

Oracle Analytics leverages several advanced analytics capabilities of Oracle Database and makes them available across various components within the product. Database Analytics step is one such option within the Dataflows that exposes several database functions like dynamic predictions, sampling, unpivoting, frequent itemset detection, text tokenization and so on. All these operations are directly sent (function-shipped) to the underlying database where certain DDL and DML statements are executed to process the data and return appropriate results back to Oracle Analytics. 

As this heavy lifting processing happens in the underlying database, Administrators need to make sure certain privileges and permissions are granted once and for all in order to execute the various SQLs statements required. The privileges needed depend on the type of Analytics function under consideration. Text Tokenization, for instance, is one such function that leverages Oracle Text capabilities to extract tokens from a column containing descriptive textual data. Refer to this video to understand more about Text tokenization within Oracle Analytics and its benefits.


In this blog, let’s look at what are the various steps involved when a dataflow with Text tokenization is executed and what are the database privileges required to execute them.

Following is a simple example of a dataflow with Text Tokenization step in the middle. In case of dataflows with Database Analytics step, the OAC connection used in the input dataset is automatically set as the connection in the Save Data step as well. This is done to ensure that there is no data transfer and the entire tokenization happens in the database.

Following are the privileges required to perform Text Tokenization: 

1) Access to CTX_DDL package 

        CTX_DDL is a PL/SQL package used to create and manage the preferences, section groups, and stoplists required for Text indexes. This package contains several stored procedures and functions. Text tokenization uses some of these stored procedures like CREATE_STOPLIST, ADD_STOPWORD, ADD_STOPCLASS, ADD_STOPTHEME etc to identify and manage words that are not to be indexed.  

Privileges – The user used in the OAC connection where the input dataset is sourcing from needs to have execute privileges on CTX_DDL package.  For example, let’s say the following is a screenshot of the OAC connection used to create the input dataset. Demouser is the database user used in this connection. demouser needs to have execute privileges on CTX_DDL package.      

Following is the SQL to grant the privilege.

Grant execute on CTXSYS.CTX_DDL to demouser;

Note:- Grant privilege needs to be executed explicitly to demouser. Doing something similar to grant execute on CTXSYS.CTX_DDL to public will not work.

2)   Create Table/Index/View 

As part of the dataflow execution, there are create table, create index and create view statements execution in the underlying database. Ensure that the user of the OAC connection has privileges to create table, create view and create index. If one of these is missing, you will encounter the following error during execution 

[nQSError: 17001] Oracle Error code: 1031, message: ORA-01031: insufficient privileges

Note: All the objects created as part of the dataflow like the intermediate view, stoplist, index etc will be deleted at the end of the tokenization process. Only the output dataset containing the tokens will be retained. 

Since the dataflow execution for tokenization involves executing several SQL statements, recommended approach is to create an OAC connection with username being the same as the owner of the input dataset table in order to avoid running into privilege issues during execution. However, if they are different, let’s see what happens during dataflow execution.

1) What happens when database table owner is different from OAC connection user ?

Let’s assume that a table called COMPLAINTS with a complaint_description column that needs to be tokenized and this table exists in a database schema called sourceuser. In OAC, a connection to this database is created with a different username called demouser.  In this scenario, let’s see if it is possible to use this OAC connection to tokenize the table and what privileges are required.

a) Select privilege on the table – To begin with, OAC Connection username demouser needs to have select privilege on the table in the underlying database. This is required for the table to even show up during create dataset step.  

b) Once the user has select privileges on the table in a different schema, tokenization should work as expected. All the database objects like the resultant table with the extracted tokens, intermediate steps involving view creation, index creation etc will all happen on the OAC connection username which is demouser. This dataflow execution will not create any objects in the source schema i.e sourceuser 

2) What happens when database table already has a text index on the column ? 

The tokenization process first checks if the input table already has an existing token. If it exists, then the dataflow will use this index and retrieve the tokens from the column using this index. Since this index has not been created by the dataflow process, it will remain untouched after the dataflow execution as well.  

Thanks for reading this blog !

4 comments:

sanket said...

Gsim is the best digital marketing institute in Gurdaspur

Jiguru G said...

Looking for Best Digital marketing course in Bhandup? I recommend you JiguruG as here we specifying the name of Best Digital marketing course in Bhandup. For More Detail Visit Here - https://jigurug.com/best-digital-marketing-courses-institutes-in-bhandup/

Aaron jhonson said...

I truly appreciate the time and work you put into sharing your knowledge. I found this topic to be quite effective and beneficial to me. Thank you very much for sharing. Continue to blog.

Data Engineering Services 

AI & ML Solutions

Data Analytics Services

Data Modernization Services

The prayas India said...

This is an awesome post. Really very informative and creative contents. Visit my website to get best Information About Best IAS Coaching Institute in Thane.
Best IAS Coaching Institute in Thane
Top IAS Coaching Institute in Thane

Post a Comment