Comparative study between Oracle BDCS and Oracle Big
Data Cloud Compute Engine.
1. Oracle Big Data Cloud Service: Gives us access to the resources of a
preinstalled Oracle Big Data environment. It also comes with a complete
installation of the Cloudera Distribution Including Apache Hadoop (CDH), with
open source Apache Hadoop and Apache Spark. It can be used to analyze data
generated from social media feeds, e-mail, smart meters, etc.
OBDCS contains:
· 3-60 node cluster: 3 is the minimum number of cluster nodes (OCPUs) to start with; the processing power and secondary memory of the cluster can be extended by adding cluster compute nodes ("bursting").
· Linux operating system provided by Oracle
· Cloudera Distribution Including Apache Hadoop (CDH):
- File system: HDFS, to store different types of files
- MapReduce engine (YARN is the default for resource management)
- Administrative framework: Cloudera Manager is the default
- Apache projects, e.g. Zookeeper, Oozie, Pig, Hive, Ambari
- Cloudera applications: Cloudera Enterprise Data Hub Edition, Impala, Search, and Navigator
· Built-in utilities for managing data and resources
· Oracle Big Data Spatial and Graph
· Oracle Big Data Connectors:
- Oracle SQL Connector for HDFS
- Oracle Loader for Hadoop
- Oracle XQuery for Hadoop
- Oracle R Advanced Analytics for Hadoop
- Oracle Data Integrator (ODI) Enterprise Edition
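
The cluster sizing rule in the list above (3 to 60 nodes, extended by bursting) can be sketched as a small check. This is only an illustration: the function name `plan_cluster` is hypothetical, and counting temporary (burst) nodes toward the 60-node ceiling is an assumption, not something the list states.

```python
# Bounds taken from the list above; everything else is a hypothetical sketch.
MIN_NODES = 3
MAX_NODES = 60


def plan_cluster(permanent_nodes: int, burst_nodes: int = 0) -> int:
    """Return the total node count, or raise if the plan breaks the 3-60 rule.

    Assumption: burst (temporary) nodes count toward the 60-node ceiling.
    """
    if permanent_nodes < MIN_NODES:
        raise ValueError(f"a cluster needs at least {MIN_NODES} nodes to start")
    if burst_nodes < 0:
        raise ValueError("burst_nodes cannot be negative")
    total = permanent_nodes + burst_nodes
    if total > MAX_NODES:
        raise ValueError(f"a cluster cannot exceed {MAX_NODES} nodes")
    return total


if __name__ == "__main__":
    print(plan_cluster(3))      # smallest allowed cluster
    print(plan_cluster(3, 5))   # same cluster with 5 burst nodes added
```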
Typical workflow of OBDCS: Purchase a subscription -> Create and manage users
and their roles -> Create a service instance -> Create an SSH key pair ->
Create a cluster -> Control network access to services -> Access and work
with your cluster -> Add permanent nodes to a cluster -> Add temporary
compute nodes to a cluster (bursting) -> Patch a cluster -> Manage storage
providers and copy data
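
The workflow above can be kept as an ordered checklist. The sketch below is only a way of tracking progress through the steps; it does not call any Oracle API, and the helper `next_step` is a hypothetical name. It also treats the steps as strictly sequential, which is a simplification, since the later steps (adding nodes, patching, managing storage) are ongoing operations.

```python
from typing import Optional, Set

# Steps in the order given in the text above.
WORKFLOW = [
    "Purchase a subscription",
    "Create and manage users and their roles",
    "Create a service instance",
    "Create an SSH key pair",
    "Create a cluster",
    "Control network access to services",
    "Access and work with your cluster",
    "Add permanent nodes to a cluster",
    "Add temporary compute nodes to a cluster (bursting)",
    "Patch a cluster",
    "Manage storage providers and copy data",
]


def next_step(completed: Set[str]) -> Optional[str]:
    """Return the first workflow step not yet completed, or None if all are done."""
    for step in WORKFLOW:
        if step not in completed:
            return step
    return None


if __name__ == "__main__":
    done = {"Purchase a subscription", "Create and manage users and their roles"}
    print(next_step(done))
```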
odiff (Oracle Distributed Diff) is an innovative tool developed by Oracle to
compare huge, sparsely stored data sets using a Spark application. It is
compatible with CDH 5.7.x. The maximum file/directory size that can be
compared is 2 GB.
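
To give a feel for how a distributed diff can work in general, here is a minimal pure-Python sketch. It is not odiff's implementation or interface, which this text does not describe; the function names and the bucket count are hypothetical. The idea is that each record is hashed into a bucket, so identical records from both data sets always land in the same bucket, and each bucket pair can then be compared independently (on a cluster, in parallel).

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Tuple


def bucket_of(record: str, num_buckets: int) -> int:
    """Map a record to a bucket with a stable hash (same record, same bucket)."""
    digest = hashlib.md5(record.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def partition(records: Iterable[str], num_buckets: int) -> List[Counter]:
    """Split records into buckets, counting duplicates inside each bucket."""
    buckets = [Counter() for _ in range(num_buckets)]
    for record in records:
        buckets[bucket_of(record, num_buckets)][record] += 1
    return buckets


def bucketed_diff(
    left: Iterable[str], right: Iterable[str], num_buckets: int = 4
) -> Tuple[Counter, Counter]:
    """Return (records only in left, records only in right), with counts."""
    only_left, only_right = Counter(), Counter()
    # Each bucket pair is independent, so a cluster could compare them in parallel.
    for lb, rb in zip(partition(left, num_buckets), partition(right, num_buckets)):
        only_left += lb - rb    # Counter subtraction keeps positive counts only
        only_right += rb - lb
    return only_left, only_right


if __name__ == "__main__":
    missing, extra = bucketed_diff(["a", "b", "b", "c"], ["b", "c", "d"])
    print(dict(missing), dict(extra))
```

Because the comparison is done per bucket, no single worker ever needs to hold both full data sets, which is the property that makes this style of diff suitable for large inputs.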
2.