Channel: Infosys-Oracle Blog

Enrich Oracle Database Data with Data from Distributed File System


In today's era of advanced analytics, data from relational databases alone is not enough to gain proper insight into a business. There is a growing need to integrate data from various sources and analyze it together to produce more meaningful and innovative ideas for better decision-making.

Below are the steps to enrich Oracle Database data with data from a distributed file system (e.g., HDFS), accessed through Hadoop jobs, for integrated reporting:

1. A table function launched via DBMS_SCHEDULER manages the Hadoop MapReduce job.

2. This enables Hadoop's result to be redirected into an Oracle query for blending.

3. DBMS_SCHEDULER launches a bash script asynchronously, which in turn registers the Hadoop job.

4. The Oracle table function and the Hadoop mappers collaborate using Oracle's Advanced Queuing feature:

• A common queue is used: the Hadoop mappers en-queue data and the Oracle table function de-queues it.

• Oracle Advanced Queuing supplies the data into the Oracle Database for consumption by various BI tools.

5. At any one time, one of the slave nodes submits the external job.
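Steps 1-3 can be sketched in PL/SQL. The type names, procedure name, and script path below are illustrative assumptions, not part of the original design:

```sql
-- Hypothetical object and collection types for the rows coming back from Hadoop.
CREATE TYPE hdfs_row_t AS OBJECT (cust_id NUMBER, metric NUMBER);
/
CREATE TYPE hdfs_tab_t AS TABLE OF hdfs_row_t;
/

-- The table function's invocation schedules an asynchronous external job;
-- the bash script it runs registers the Hadoop MapReduce job on the cluster.
CREATE OR REPLACE PROCEDURE launch_hadoop_job AS
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name   => 'LAUNCH_HADOOP_MAPPERS',
    job_type   => 'EXECUTABLE',
    job_action => '/u01/scripts/launch_hadoop.sh',  -- assumed script location
    enabled    => TRUE,    -- start immediately, asynchronously
    auto_drop  => TRUE);   -- drop the job definition once it has run
END;
/
```

Hadoop's result can then be blended into an ordinary Oracle query, e.g. `SELECT c.cust_name, h.metric FROM customers c JOIN TABLE(hadoop_results()) h ON h.cust_id = c.cust_id;`, where `hadoop_results` is a table function reading from the common queue.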

 

[Figure: Enrich Oracle Data with HDFS Data]


The following points provide more insight into the integration and into the procedures for optimizing performance while fetching data from the Hadoop process:


  • Create a table function that acts as a query coordinator (QC) and also performs a processing role.
  • The table function's DBMS_SCHEDULER invocation starts an asynchronous job that runs a bash script; the script, in turn, starts a synchronous launcher in the Hadoop cluster for the mapper processes.
  • Each mapper process processes the data and writes it into a queue. To improve performance, rather than having each mapper write sequentially, we can choose to write in batch mode.
  • For the de-queue step, Oracle Database supports parallel invocation of the table function to fetch data from the Advanced Queue. The table function leverages both the Oracle data and the data from the queue, integrating them into a single dataset for the user.
  • Once the Hadoop mapper processes are started, a job monitor process keeps an eye on the launcher bash script; the script finishes execution once the mapper processes have finished.
  • The job monitor also keeps track of the remaining data elements in the queue while the table function keeps processing the data.
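The de-queue side described above might look like the following pipelined sketch. The queue name and payload type are illustrative assumptions, and the timeout-based exit is one simple way to detect a drained queue:

```sql
-- A pipelined, parallel-enabled table function that drains the common queue
-- filled by the Hadoop mappers and streams rows straight into the SQL query.
CREATE OR REPLACE FUNCTION hadoop_results RETURN hdfs_tab_t
  PIPELINED PARALLEL_ENABLE
AS
  dq_opts DBMS_AQ.DEQUEUE_OPTIONS_T;
  props   DBMS_AQ.MESSAGE_PROPERTIES_T;
  msg_id  RAW(16);
  payload hdfs_row_t;                  -- assumed payload type (defined earlier)
BEGIN
  dq_opts.wait := 10;                  -- seconds to wait for the next message
  LOOP
    BEGIN
      DBMS_AQ.DEQUEUE(
        queue_name         => 'hadoop_result_q',   -- assumed common queue
        dequeue_options    => dq_opts,
        message_properties => props,
        payload            => payload,
        msgid              => msg_id);
      PIPE ROW (payload);              -- hand the row to the SQL engine
    EXCEPTION
      WHEN OTHERS THEN
        EXIT;                          -- dequeue timed out: queue drained
    END;
  END LOOP;
  RETURN;
END;
/
```

Because the function is PARALLEL_ENABLE, several slave processes can each run this loop against the same queue, which is what allows the database-side and Hadoop-side degrees of parallelism to differ.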

  Pros of Using Oracle Table Functions and Advanced Queuing:

  • Advanced Queuing provides load balancing: the table function executes in parallel while the Hadoop mapper jobs also run in parallel, with a different degree of parallelism, without any intervention by Oracle's query coordinator.
  • This method eliminates the need for external data storage while accessing Hadoop's result.
  • Data from both the Oracle table and the Oracle queue (Hadoop's result) are used by the table function to produce a combined result for the user.

  Cons of Using Oracle Table Functions and Advanced Queuing:

  • This approach is appropriate only when there is a need to synchronize the timing of the data to be processed between the Hadoop mappers and the table function.

        Recommendations:

  1. For better queue performance, insert data in batch mode from each individual mapper job.
  2. The delivery mechanism can be changed to a pipeline or a relational table for optimized performance.
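Recommendation 1 can be sketched with DBMS_AQ.ENQUEUE_ARRAY, which en-queues a whole collection in one round trip instead of one enqueue per row. The queue name and payload types are the same illustrative assumptions used earlier; a Hadoop mapper would invoke equivalent logic over JDBC:

```sql
DECLARE
  eq_opts   DBMS_AQ.ENQUEUE_OPTIONS_T;
  one_prop  DBMS_AQ.MESSAGE_PROPERTIES_T;          -- default message properties
  props     DBMS_AQ.MESSAGE_PROPERTIES_ARRAY_T
              := DBMS_AQ.MESSAGE_PROPERTIES_ARRAY_T();
  msg_ids   DBMS_AQ.MSGID_ARRAY_T;
  batch     hdfs_tab_t := hdfs_tab_t();            -- assumed payload collection
  enq_count PLS_INTEGER;
BEGIN
  -- Accumulate one mapper's output locally (dummy rows here for illustration).
  FOR i IN 1 .. 100 LOOP
    batch.EXTEND;
    batch(batch.COUNT) := hdfs_row_t(i, i * 10);
  END LOOP;

  props.EXTEND(batch.COUNT);
  FOR i IN 1 .. batch.COUNT LOOP
    props(i) := one_prop;                          -- one properties record per message
  END LOOP;

  -- A single call en-queues the entire batch.
  enq_count := DBMS_AQ.ENQUEUE_ARRAY(
    'hadoop_result_q', eq_opts, batch.COUNT, props, batch, msg_ids);
  COMMIT;
END;
/
```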

 

 

