Oracle Data Lake
Evolution of data
The volume of data produced each day is growing exponentially; an estimated 2.5 quintillion bytes are generated every day. This explosive growth is driven by several sources: the Internet (information at our fingertips, web searches), social media (Facebook, Instagram, Twitter, Snapchat), communication (texts, email, GIFs, emojis, Skype calls), digital media (photos, YouTube, voice search), and services such as the Weather Channel, Uber rides and online transactions. As data keeps growing, data handling becomes a critical concern for most enterprises. The value of this data depends on how it is stored and how effectively value can be extracted from it. Traditional storage methods, such as relational databases and data warehouses, have limitations: restricted storage capacity, poor support for unstructured and semi-structured data, high storage cost and limited scalability.
Why Data Lake?
To overcome the limitations of traditional storage methods, many providers, including Amazon, Snowflake, Microsoft and Oracle, offer data lakes: large-scale storage for structured, semi-structured, unstructured and binary data. A data lake is a single store for all enterprise data, including raw data from source systems and transformed data used for activities such as visualization, reporting, prediction, advanced analytics and machine learning.
Here is how a data lake differs from a traditional data warehouse.
| Data Lake | Data Warehouse |
| --- | --- |
| No fixed data model; retains all data regardless of any model | Highly structured data model holding specific data that answers predefined questions |
| Stores all data types, including web server logs and sensor data | Does not support data types such as web server logs, social network activity and sensor data |
| Stores raw data, which stays available so you can go back in time and analyze it | Stores processed data; significant time is spent analyzing the various data sources up front |
| Schema is defined after data is stored (effort at the end of the process) | Schema is defined before data is stored (effort at the start of the process) |
| Can store very large volumes of data indefinitely | Expensive to store large amounts of data |
| Adaptive, highly accessible and quick to update | More complicated and costly to change |
| Used by data scientists for predictive analysis, machine learning and in-depth analysis | Used by business professionals for structured and operational views of data |
| Uses ELT (Extract, Load, Transform), which lets users access data before it is transformed and structured | Uses ETL (Extract, Transform, Load), which provides insights into predefined questions |
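The ETL and ELT rows of the comparison can be illustrated with a minimal Python sketch. The event fields (`user`, `amount`, `channel`, `coupon`) and the function names are hypothetical and chosen only for illustration; this does not represent any vendor's pipeline.

```python
import json

# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "channel": "web"}',
    '{"user": "b", "amount": "7.00", "channel": "mobile", "coupon": "X1"}',
]


def transform(record):
    """Keep only the fields a predefined report needs, with typed values."""
    return {"user": record["user"], "amount": float(record["amount"])}


def etl(raw_lines):
    """Warehouse style: transform first, then load only the shaped rows."""
    return [transform(json.loads(line)) for line in raw_lines]


def elt(raw_lines):
    """Lake style: load the raw lines untouched; transform later, on demand."""
    return list(raw_lines)


warehouse = etl(RAW_EVENTS)
lake = elt(RAW_EVENTS)

# A question nobody planned for: which coupons were used?
# The raw copy in the lake can still answer it; the shaped rows cannot.
coupons = [json.loads(line).get("coupon") for line in lake]
coupons = [c for c in coupons if c is not None]
```

In this sketch the warehouse rows answer the predefined question (amount per user) directly, while the lake keeps the `coupon` field that the transform step discarded.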
Data Lake - Oracle Cloud
Architecture:
A data lake mainly consists of:
- Sources
- Landing zone
- Standardization zone
- Analytics sandbox
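As a rough illustration of how data might flow through these zones, here is a minimal in-memory Python sketch. The article only names the zones, so the responsibilities assigned to each one, the `DataLake` class, and the field names (`id`, `cust`, `amount`) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Hypothetical in-memory stand-in for the zones listed above."""
    landing: list = field(default_factory=list)
    standardized: list = field(default_factory=list)
    sandbox: list = field(default_factory=list)

    def ingest(self, source, payload):
        """Landing zone: keep the payload exactly as the source sent it."""
        self.landing.append({"source": source, "payload": payload})

    def standardize(self):
        """Standardization zone: apply common field names and types."""
        self.standardized = [
            {
                "source": item["source"],
                # Sources name the customer key differently (assumed).
                "customer_id": str(
                    item["payload"].get("id", item["payload"].get("cust"))
                ),
                "amount": float(item["payload"].get("amount", 0)),
            }
            for item in self.landing
        ]

    def publish_to_sandbox(self, min_amount):
        """Analytics sandbox: a working subset analysts can experiment on."""
        self.sandbox = [
            row for row in self.standardized if row["amount"] >= min_amount
        ]


lake = DataLake()
lake.ingest("crm", {"id": 17, "amount": "250.0"})
lake.ingest("pos", {"cust": "42", "amount": 30})
lake.standardize()
lake.publish_to_sandbox(min_amount=100)
```

The landing zone still holds both payloads in their original shape after the later steps run, which is what allows reprocessing with different rules.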
Key Components:
· Oracle Data Integration Platform Cloud (ODI)
Oracle Data Integration Platform Cloud is a unified platform for real-time data replication, data transformation, data quality and data governance, used to cleanse, integrate and analyze data. ODI capabilities include:
- Migrate data without any downtime
- Integrate big data
- Monitor data health
- Automate data mart generation
- Profile and validate data
- Synchronize data
- Support redundancy
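To make "profile and validate data" concrete, the following Python sketch shows what such checks compute in general terms. It does not use or represent the Oracle product's interface; `profile_column`, `validate` and the sample rows are hypothetical.

```python
def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


def validate(rows, column, rule):
    """Return the rows whose value in `column` fails `rule`."""
    return [row for row in rows if not rule(row.get(column))]


# Hypothetical sample data.
rows = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "bob-at-example.com"},
    {"id": 4, "email": "ann@example.com"},
]

email_profile = profile_column(rows, "email")
# Deliberately simple rule: a non-empty value that contains "@".
bad_emails = validate(rows, "email", lambda v: bool(v) and "@" in v)
```

Profiling describes the data as it is; validation compares it against a rule. A real tool would typically run both on a schedule and report the trend.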
· Oracle Autonomous Data Warehouse
Oracle Autonomous Data Warehouse provides a fully autonomous database that requires no database administration, scales elastically and delivers fast query performance. Deployment options are either dedicated infrastructure (a private cloud within the public cloud) or a simple, elastic shared infrastructure. The database patches, tunes and upgrades itself. The key features are:
- Elasticity
- Autonomous
- Database migration utility
- Cloud-based data loading
- Enterprise grade security
- Concurrent workloads
- High performance
· Oracle Stream Analytics
Oracle Stream Analytics (OSA) is a tool for real-time analytic computing on streaming big data. OSA executes in a scalable and highly available clustered Big Data environment. It enables users to explore real-time data, such as sensor, social media and banking data, through live charts, maps and visualizations. Oracle Stream Analytics includes 30+ visualization charts, based on Apache Superset, with a user-friendly interface. It is designed to be usable without a technical background.
Key features:
- Location-based analytics using built-in spatial patterns
- Machine learning to predict upcoming events
- Ad hoc queries on processed data
- Real-time fraud detection
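Real-time fraud detection on a stream is often built on a sliding time window. The Python sketch below shows that general technique; it is not the Oracle Stream Analytics interface, and the function name, thresholds and sample events are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def detect_bursts(events, window_seconds=60, max_events=3):
    """Yield (timestamp, card) when a card exceeds max_events in the window.

    events: iterable of (timestamp_seconds, card_id), assumed in time order.
    """
    recent = defaultdict(deque)  # card_id -> timestamps inside the window
    for ts, card in events:
        window = recent[card]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_events:
            yield ts, card


# Hypothetical transaction stream: card "A" makes four payments in 30 seconds.
stream = [
    (0, "A"), (10, "B"), (15, "A"), (20, "A"),
    (30, "A"), (95, "A"), (100, "B"),
]
alerts = list(detect_bursts(stream))
```

Because the function is a generator, it processes one event at a time and keeps only the timestamps inside the window, which is the property that makes this pattern suitable for unbounded streams.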
· Oracle Cloud Infrastructure
Oracle Cloud Infrastructure is a cloud service that enables you to build and run a wide range of applications in a highly available environment. It combines the control of on-premises data centers with the cost savings and elasticity of the public cloud. Oracle provides technologies that empower enterprises to solve critical business problems, and Oracle Cloud Infrastructure is purpose-built to run business-critical production workloads. Key features include:
- High availability - deployment across multiple regions, availability domains (ADs) and fault domains
- Scalability - resources scale up and down automatically with changing business needs, so you pay only for what you use
- Performance - high-performance computing (HPC) instances
- Price - low cost and better price-performance compared with other cloud services
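The scalability point can be sketched with a generic target-tracking rule: size the pool so that average CPU utilization moves toward a target. This is a common autoscaling heuristic, not Oracle's actual policy engine; the function name, the 60% target and the instance limits are assumptions.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60,
                      minimum=1, maximum=10):
    """Return the instance count that would bring average CPU near the target.

    current: number of instances running now.
    cpu_percent: observed average CPU utilization across those instances.
    """
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Clamp to the configured pool limits.
    return max(minimum, min(maximum, wanted))


scale_out = desired_instances(current=4, cpu_percent=90)   # load is high
scale_in = desired_instances(current=4, cpu_percent=15)    # load is low
capped = desired_instances(current=8, cpu_percent=95)      # hits the maximum
```

Scaling in when load drops is what delivers the "pay only for what you use" benefit; the maximum guards against runaway cost.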
· Oracle Identity Cloud Service (OICS)
Oracle Identity Cloud Service provides single sign-on (SSO), identity management and identity governance for mobile, cloud and on-premises applications. It is a fully integrated service that delivers core identity and access management capabilities through a multi-tenant cloud platform. Users can access applications securely at any time, from anywhere, on any device. Oracle Identity Cloud Service integrates directly with existing directories and identity management systems, which makes it easier for users to access applications. The benefits include:
- Better user productivity and experience
- Reduced cost
- Improved business responsiveness
- Hybrid identity
· OAC - BI Reporting & Visualization
BI supports data-driven decision-making. It covers the generation and analysis of data and, ultimately, its visualization, so that business analysts and business leaders can make key decisions about products, strategies, market timing and other mission-critical factors.
- Oracle Analytics Cloud (OAC) lets you take data from any source, then explore and collaborate on it in real time
- OAC lets you ask any question of your data through its mobile-friendly features
- OAC includes self-service visualization, data preparation, advanced analytics and enterprise reporting
- OAC is a cloud-based analytics solution within the Oracle Analytics and Business Intelligence space
Advantages of Data Lake
- A data lake stores data in its original form. Advanced analytics depends on this raw data, which data scientists and analysts use to experiment and to support advanced analysis
- A data lake handles structured, semi-structured and unstructured data, such as streaming data, logs, equipment readings and telemetry, and can derive value regardless of data type
- For high-speed, high-volume streaming, a data lake uses tools such as Kafka, Flume, Scribe and Chukwa to acquire high-velocity data, for example tweets, WhatsApp messages, Instagram posts or machine sensor data
- It offers cost-effective scalability and flexibility: all types of data can be stored inexpensively and kept for future analysis, so value can be extracted whenever needed
- It collects and stores huge data sets, which can be used to visualize telemetry and customer data, detect anomalies and ensure security
- A data lake can serve as the data source for a front-end application
- In a data lake, the structure of the data (the schema) and its transformations can be defined at the time of use, which is called schema on read; unlike a traditional data warehouse, it also allows schema-free storage
- A data lake supports languages other than SQL: for example, Pig can be used to analyze data flows and Spark MLlib for machine learning. Tools like Hive allow multiple SQL queries to run in parallel, reducing query time
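Schema on read, mentioned in the list above, can be shown with a small Python sketch: the raw records are stored once, and each consumer supplies its own schema when reading. The sensor fields (`device`, `temp_c`, `battery`) and the `read_with_schema` helper are hypothetical.

```python
import json

# Hypothetical raw sensor records, stored exactly as received.
RAW = [
    '{"device": "t-1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00"}',
    '{"device": "t-2", "temp_c": "19.0", "ts": "2024-01-01T00:05:00", "battery": "87"}',
]


def read_with_schema(raw_lines, schema):
    """Apply {field: caster} at read time; fields absent in a record become None."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two consumers read the same raw data with two different schemas.
temperature_view = list(read_with_schema(RAW, {"device": str, "temp_c": float}))
battery_view = list(read_with_schema(RAW, {"device": str, "battery": int}))
```

Nothing about the stored data had to change to support the second view, whereas a schema-on-write store would need the `battery` column to be defined before loading.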
Industrial Applications of Data Lake