Big Data & ETL's Evolution
The need for Extract, Transform, Load (ETL) tools is ever-present as long as data is being consumed. ETL tools have traditionally been used to batch-process and transform data into the format required by the data warehouse. Transformations have grown more complex due to the enormous growth in unstructured data.
At a high level, the Hadoop Big Data ecosystem consists of:
- Structured data : Highly organized; typically stored in table structures.
- Unstructured data : Not stored in any organized form. E.g. data from social media, smartphones, sensors, images, emails, etc.
- Hadoop / HDFS (Hadoop Distributed File System) : Framework for the storage and processing of extremely large data sets; HDFS breaks data into blocks and stores them across the participating node servers.
- MapReduce : Software framework for processing vast data sets in parallel across multiple cluster nodes, in two phases: Map tasks, which filter and sort the input, and Reduce tasks, which aggregate the intermediate results.
- Spark : Data analytics engine that operates on distributed data sources such as HDFS.
- Pig & Hive : Both reduce the complexity of writing MapReduce programs by offering higher-level scripting (Pig Latin) and SQL-like (HiveQL) languages.
- Sqoop : Migrates data between Hadoop and relational databases.
(Note: some of the above components are optional.)
Fig 1. Hadoop Eco System
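The Map and Reduce phases described above can be sketched in pure Python. This is an illustrative simulation, not Hadoop code: the function names are invented, and a real job would run the same logic distributed across cluster nodes (e.g. via Hadoop Streaming).

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map task: emit a (word, 1) pair for every word in the input split."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group intermediate pairs by key, as the framework would."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [count for _, count in group]

def reduce_phase(grouped):
    """Reduce task: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped}

# Each input line stands in for a split processed by a separate Map task.
lines = ["big data needs big tools", "etl tools transform data"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'big': 2, 'data': 2, 'etl': 1, 'needs': 1, 'tools': 2, 'transform': 1}
```

In a real cluster the shuffle step is what moves data over the network between the Map and Reduce nodes; here it is a local sort-and-group.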
Given the growth and significance of unstructured data, there has been an increasing need for the major ETL players to provide solution options for transforming unstructured data for use in analytics. Most ETL tools in the market are marching steadily down that path. Here are some of the ETL tool offerings with respect to Big Data.
Oracle - ODI:
Oracle's Big Data approach is to let clients incorporate Big Data into their current data architecture, delivering more value to the business and to prospective analytical reporting while supporting other Big Data needs. ODI is a key tool for Oracle in this pursuit. The new Big Data wizard in ODI 12.2.1.1.0 supports many new Hadoop technologies.
Fig 2. Oracle Data Integrator
ODI's ELT approach does not require a middle-tier engine to support Big Data components, whereas typical ETL tools require intermediate servers to convert mappings into programming languages such as C++ for execution. ODI instead leverages its hallmark feature of pushing processing down to the underlying database, and its ability to generate native code yields tremendous processing efficiency.
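The essence of ELT pushdown is that the transformation is expressed as set-based SQL and executed inside the target engine, so no rows pass through a middle-tier server. A minimal sketch of the idea, using Python's built-in sqlite3 as a stand-in for the target database (table and column names are invented for illustration):

```python
import sqlite3

# In-memory SQLite database standing in for the target engine (a warehouse or
# Hadoop SQL layer in the ODI case).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (region TEXT, amount REAL);
    INSERT INTO src_orders VALUES ('east', 10.0), ('east', 5.0), ('west', 7.5);
    CREATE TABLE tgt_sales (region TEXT, total REAL);
""")

# ELT style: the whole transformation is one SQL statement pushed down to the
# engine that already holds the data; nothing is streamed out for processing.
pushdown_sql = """
    INSERT INTO tgt_sales (region, total)
    SELECT region, SUM(amount) FROM src_orders GROUP BY region
"""
conn.execute(pushdown_sql)
conn.commit()

print(dict(conn.execute("SELECT region, total FROM tgt_sales")))
# {'east': 15.0, 'west': 7.5}
```

A tool like ODI generates statements of this shape in the dialect of the underlying engine, rather than executing the aggregation itself.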
Cognos:
IBM has introduced a new suite, BigInsights, for Big Data and analytical reporting. Big SQL enables Cognos to configure Hadoop as a data source: it can access Hive, HBase and Spark concurrently through a single database connection.
Business analysts and executives get visually enhanced Big Data reports from the Cognos presentation service, a valuable aid for understanding Big Data. With BigInsights and Big SQL, IBM provides tools that enable Hadoop operations, including the ability to exchange components with the existing infrastructure and functionality of Cognos.
DataStage:
IBM's DataStage platform has engineered easy integration of heterogeneous data, including Big Data at rest (data that is stored and then analyzed, e.g. conventional data warehousing) and Big Data in motion (dynamic data in real-time or operational-intelligence architectures, e.g. trading, fraud detection, etc.).
Newer versions of DataStage include components such as Big Data File stages to read and write files in HDFS, Hive stages, and stages that automatically generate MapReduce programs.
Talend Studio for Data Integration:
The Talend Data Fabric solution delivers high-scale, in-memory fast data processing. It leverages Hadoop's parallel environment to generate native Spark and MapReduce code.
Since Talend Open Studio is an open source solution, it can be downloaded at no cost, but support is provided only for the subscription products, which add functionality such as a shared repository, versioning and dashboards.
PowerCenter Informatica:
Informatica Corp. launched Informatica Big Data Edition (BDE), which can be used for ETL in a Hadoop environment alongside an RDBMS. BDE is available in versions 9.6 and later.
BDE runs in two modes: Native mode for normal PowerCenter ETL, and Hive mode to additionally support Big Data. Mappings pushed to Hive are executed in the Hadoop cluster using Hadoop's parallelism (via its MapReduce capability).
SQL Server Integration Services (SSIS):
Microsoft's Visual Studio 2015 tooling contains new SQL Server Integration Services (SSIS) tasks. These provide ETL options on Apache Hadoop: Sqoop for data import/export, Hive for SQL queries, the MapReduce distributed programming infrastructure, and ODBC drivers to connect to your data in HDFS from tools like Excel and SQL Server.
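The Sqoop import/export piece mentioned above is driven entirely by command-line arguments. The sketch below only assembles such a command as a string; the JDBC URL, table and HDFS directory are placeholder values, and no live cluster is invoked.

```python
import shlex

def sqoop_import_cmd(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a 'sqoop import' command that copies an RDBMS table into HDFS.

    Sqoop splits the table so that each of the num_mappers map tasks imports
    a slice of the rows in parallel.
    """
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC URL of the source database
        "--table", table,            # source table to import
        "--target-dir", target_dir,  # HDFS directory for the output files
        "--num-mappers", str(num_mappers),
    ]
    return " ".join(shlex.quote(a) for a in args)

# Placeholder connection details, for illustration only.
cmd = sqoop_import_cmd("jdbc:mysql://db.example.com/sales", "orders", "/user/etl/orders")
print(cmd)
```

The same flags work whether Sqoop is launched from a shell, an Oozie workflow, or an SSIS Hadoop task.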
JaspersoftETL:
Jaspersoft amended its OEM agreement with Talend to use native connectors to Apache Hadoop Big Data environments in Jaspersoft ETL. The integration of Talend into the Jaspersoft BI Suite also supports all Big Data use cases.
Talend supports major Big Data platforms including Amazon EMR, Apache Hadoop (HBase, HDFS, and Hive), Cassandra, Cloudera, etc. For robust performance and reliability, the Big Data Edition adds high-availability and load-balancing features for critical reporting and analysis requirements.
List of ETL Big Data Solutions Vendor-wise:

| Tool | Big Data | Big Data in Cloud |
|------|----------|-------------------|
| ODI | ODI for Big Data | Oracle Data Integrator Cloud Service |
| Cognos | BigInsights Suite | IBM BigInsights on Cloud |
| DataStage | Native Big Data file stages | IBM Bluemix - IBM InfoSphere DataStage on Cloud |
| Informatica | Informatica Big Data Edition (BDE) | Informatica Big Data Edition (BDE) |
| SSIS | SQL Server Data Tools for Visual Studio 2015 | Azure Data Factory |
| Talend | Talend Big Data Integration platform | Talend Integration Cloud |
| Jaspersoft | Talend native connectors | Amazon Redshift |
- Xavier Philip