Oracle Exadata, MPP Databases or Hadoop for Analytics

INTRODUCTION

There is a plethora of databases today: SQL databases, NoSQL databases, open source databases, columnar databases, MPP databases and so on. Oracle, a leader in the relational database space, is often compared to these. So let us look at some basic differences between Oracle and other players such as MPP databases (Teradata, Vertica, Greenplum, Redshift, Netezza, etc.) and Hadoop for various analytics workflows.

ARCHITECTURE

Oracle, Teradata, Vertica, Greenplum, PostgreSQL, Redshift and Netezza are all relational databases. However, Teradata, Vertica, Greenplum, Redshift and Netezza are massively parallel processing (MPP) databases, which have parallelism built into each component of their architecture. (PostgreSQL on its own is not an MPP database, although Greenplum and Redshift are derived from it.) MPP databases have a shared-nothing architecture: each node owns its own slice of the data, and the design aims to avoid a single point of failure. Oracle database, on the other hand, has a shared-everything architecture. Even Exadata (Oracle's engineered system, or database appliance, positioned for analytics or OLAP) is based on the existing Oracle engine, which means any machine can access any data. This is fundamentally different from Teradata, as shown in the diagram below. Because the data is partitioned across nodes, an MPP database can break a query into a number of database operations that run in parallel on every node, which improves query performance.
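
To make the shared-nothing idea concrete, here is a minimal Python sketch of how a query such as "total sales per region" could be split across nodes. It is an illustration only, not how any particular MPP product is implemented: the node count, the `sales` rows and the function names `partition`, `local_sum` and `gather` are all hypothetical, and the "nodes" are plain lists that run one after another rather than separate machines.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def partition(rows, num_nodes):
    """Place each row on exactly one node by hashing its distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        # Simple deterministic hash of the key (assumption for this sketch).
        node_id = sum(region.encode()) % num_nodes
        nodes[node_id].append((region, amount))
    return nodes


def local_sum(node_rows):
    """Work done by one node on its own slice, without touching other nodes."""
    totals = defaultdict(int)
    for region, amount in node_rows:
        totals[region] += amount
    return dict(totals)


def gather(partials):
    """Coordinator step: merge the partial results returned by the nodes."""
    merged = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
nodes = partition(sales, NUM_NODES)
# On a real cluster the local_sum calls would run at the same time on different machines.
result = gather(local_sum(node_rows) for node_rows in nodes)
print(result)
```

The point of the sketch is that `local_sum` only ever sees the rows stored on its own node, so adding nodes spreads both the data and the work.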

This brings us to the next logical question: how are these MPP databases different from Hadoop? Hadoop is also a massively parallel platform. The more obvious answer is that MPP databases are used for structured data, while Hadoop can be used for structured or unstructured data stored in HDFS, a distributed file system. Also, while MPP databases introduce parallelism mainly in the storage and access of data, Hadoop with its MapReduce framework is used for batch processing of large amounts of structured and unstructured data, more like an ETL tool. So it is a data platform rather than a database.
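
The MapReduce model mentioned above can be sketched in a few lines of Python. This is a single-process imitation of the three phases (map, shuffle, reduce) applied to unstructured log lines. It does not use the Hadoop API, and the sample `lines` and the function names are assumptions made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


lines = ["error disk full", "warning disk slow", "error network down"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In Hadoop the map and reduce functions run on many machines over blocks of files in HDFS, and the framework performs the shuffle between them.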

[Diagram: Oracle shared-everything architecture compared with Teradata shared-nothing architecture]

USER INTERFACE

Oracle and most MPP databases use a SQL interface, while Hadoop workloads are typically written as MapReduce programs in Java or as Spark jobs, both of which run on the JVM. The Apache Hive project, however, provides a SQL-like interface (HiveQL) on top of MapReduce.
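
The difference between the two styles of interface can be shown with a small Python sketch. The standard-library `sqlite3` module stands in here for any SQL engine (it is neither Oracle nor an MPP database), and the hand-written loop stands in for a program a developer would otherwise have to code. The `sales` table and its rows are hypothetical.

```python
import sqlite3

sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]

# Declarative style: one SQL statement says what is wanted; the engine decides how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_result = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# Programmatic style: the developer spells out how to group and sum.
totals = {}
for region, amount in sales:
    totals[region] = totals.get(region, 0) + amount
code_result = sorted(totals.items())

print(sql_result)
print(code_result)
```

Both approaches produce the same totals. The SQL version is what analysts already know, which is the gap a SQL layer such as Hive is meant to close.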

INFRASTRUCTURE

The other difference between these systems is that many MPP databases such as Teradata, as well as Oracle Exadata, run on proprietary hardware or appliances, while Hadoop runs on commodity hardware.

SCALABILITY

Oracle Exadata and most MPP appliances scale on proprietary hardware, so adding capacity means buying larger or additional vendor units, while Hadoop scales horizontally by adding commodity nodes. This results in a very cost-effective model, especially for large data storage.

STORAGE

Many MPP databases (Vertica and Redshift, for example) use columnar storage, while Oracle uses row-wise storage, which for analytic queries is less efficient than columnar storage in both disk space and performance. However, Oracle Exadata offers Hybrid Columnar Compression (HCC), which groups a set of rows into a larger logical block (a compression unit) and stores the data column by column within it. Compression is achieved by storing repeating values only once. Thus, for analytic workloads, Oracle Exadata with HCC performs considerably better than a plain row-wise Oracle database. Hadoop, on the other hand, relies on HDFS, which is distributed file storage.
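
The following Python sketch illustrates why storing data by column helps compression and analytic scans. It pivots a few rows into columns and then applies run-length encoding so that a repeated value is stored once together with a count. Run-length encoding is used purely as an easy-to-read stand-in: this is not Oracle's HCC algorithm or any vendor's actual format, and the sample rows are made up.

```python
rows = [
    ("2024-01-01", "east", 10),
    ("2024-01-01", "east", 7),
    ("2024-01-01", "west", 5),
    ("2024-01-02", "west", 1),
]

# Columnar layout: one list per column instead of one tuple per row.
columns = [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Store each run of identical values once, as [value, run_length]."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


date_col = run_length_encode(columns[0])
region_col = run_length_encode(columns[1])

# An aggregate over one column only needs to read that column.
total = sum(columns[2])

print(date_col)
print(region_col)
print(total)
```

Values in the same column tend to repeat, so they compress well when stored together, and a query that sums one column does not have to read the others.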

USE CASES

Oracle is often the database of choice for analytics where Oracle ERP systems are deployed. Oracle Exadata can meet OLAP workflow and DSS (decision support system) requirements and has many Advanced Analytics options. More details can be found in Oracle's Machine Learning and Advanced Analytics 12.2c and Oracle Data Miner 4.2 New Features.

Teradata is the database of choice for pure OLAP workflows, thanks to its massively parallel processing capabilities, especially when data volumes are high. Teradata is also the preferred choice for low-latency analytics requirements where an RDBMS is still required. However, it is losing market share, as migrating off Teradata is a priority for most cost-conscious CEOs because of its prohibitive year-on-year expense. Another reason for migrating off Teradata is the adoption of new-generation data analytics architectures with support for unstructured data.

The above sets the stage for Hadoop, with its support for big data that can be structured or unstructured. It provides a platform for data streaming and analytics over large amounts of data coming from IoT sensors, social platforms, weather feeds or spatial sources. It is based on open source technologies and uses commodity hardware, which is another attraction for many companies moving from a data warehouse to a data lake ecosystem.

CONCLUSION

Thus, it is important to consider the use case a database is designed to serve before deciding on the best fit for your Big Data ecosystem. Making a decision solely on the amount of data (petabytes or terabytes) that needs to be stored might not be accurate. Other factors that can influence the decision include your overall IT landscape and preferred infrastructure platform, developer skills, cost and future requirements, all of which are specific to each organization. So although new-age databases are opening new opportunities for data storage and usage, the traditional RDBMS will most likely not go away in the near future.

