What is a Data Stream:
Speed is one characteristic that drives the world nowadays, whether it is
downloading a big file or a movie, or working from home. Merely increasing
speed is not sufficient, because the demand for storage is also rising
continuously. Services like Netflix and Spotify offered a solution: content is
consumed directly on handheld devices at high speed, without being downloaded
first. These services made it possible to send and receive billions of bytes
of data. Because the data flows continuously, like a jet of water, they are
called streaming services. Today data streaming exists in many forms; audio
and video media streaming is just one part of it. Enormous growth in data and
advances in engineering processes have led to different ways of gathering,
analyzing and processing data, which in turn made it possible to provide
near-instantaneous analysis of streamed data.
Why Stream Analytics:
Streaming analytics, or real-time analytics, is an emerging type of analytics
that sources data in real time and performs simple operations or calculations
on it as it arrives, in order to provide business insight into fast-moving
data. It is quite different from traditional warehousing ETL techniques. In
traditional techniques, business calculations are performed on a batch of data
overnight; in real-time analytics, operations like filtering, aggregation and
grouping are performed on a continuously flowing stream of data. Huge amounts
of data flow from one system to another every minute, and organizations that
can act on that stream are able to improve their operational efficiency. A
wide range of industries can benefit from issuing real-time alerts with the
help of data stream analytics. These alerts can be of different types,
including promotional alerts, fraud detection alerts and informative alerts.
Data stream analytics is a highly scalable, high-throughput and reliable
solution. It is primarily offered as a cloud service by multiple vendors such
as Microsoft, Oracle and Amazon, which makes it a low-cost solution in which
organizations pay according to usage.
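
To make the contrast with overnight batch processing concrete, here is a
minimal Python sketch of filtering, grouping and aggregating over a stream in
fixed one-minute windows. It is not tied to any vendor's product; the event
shape, the function name windowed_totals and the sample values are all
assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp_seconds, store_id, amount).
Event = Tuple[int, str, float]


def windowed_totals(events: Iterable[Event], window_seconds: int = 60,
                    min_amount: float = 0.0) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Yield (window_start, totals_per_store) each time a window closes.

    Assumes events arrive in timestamp order. Only the running totals of
    the current window are kept in memory, never the raw events.
    """
    current_window = None
    totals: Dict[str, float] = defaultdict(float)
    for timestamp, store_id, amount in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_window is not None and window_start != current_window:
            yield current_window, dict(totals)  # emit the finished window
            totals = defaultdict(float)
        current_window = window_start
        if amount >= min_amount:                # filtering
            totals[store_id] += amount          # grouping and aggregation
    if current_window is not None:
        yield current_window, dict(totals)      # flush the last open window


sample_events = [
    (0, "store-a", 20.0),
    (15, "store-b", 5.0),
    (42, "store-a", 30.0),
    (61, "store-b", 12.5),
    (75, "store-a", 1.0),
]

results = list(windowed_totals(sample_events, window_seconds=60, min_amount=2.0))
for window_start, per_store in results:
    print(window_start, per_store)
```

Because the function is a generator, each window's result becomes available as
soon as the first event of the next window arrives, rather than after a full
batch has been collected.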
Oracle Stream Analytics (OSA):
Oracle Stream Analytics is a big-data-based, real-time tool that uses
in-memory engine technologies for real-time analytics on streaming data. Data
streams can source data from applications in many areas, such as sensing
equipment, banking point-of-sale terminals, ATMs, Twitter or other social
media, and traditional databases or data warehouses. OSA offers a web-based,
user-friendly streaming analytics interface for business users. Users can
dynamically design, develop and implement analytical solutions that give
instant insight into real-time streaming data. One of the main advantages of
the tool is that it allows users to explore the data with advanced
visualizations such as charts, maps and geographic markings. OSA uses Apache
Kafka and Apache Spark Streaming, integrated with Oracle's engine, to address
the real-time requirements and analytical challenges of its users.
Components:
Stream: As the name suggests, a stream specifies the source of flowing data
(not static, continuously changing). The data can be sourced from the stock
market, a JMS server, REST APIs, Twitter etc. This stream of data changes with
every passing second and is fed to Oracle Stream Analytics for processing.
Reference: A reference is a source of data that is looked up to fetch
additional information about the event data. It provides contextual
information about the data flowing in the stream. In general it can be a
static database table or a static Excel or CSV file; in this release of OSA,
only Oracle tables are supported as references.
Exploration: An exploration is a set of business rules or criteria defined for
exploring and managing the event data. It applies filters to the data, groups
the data, provides summaries of the events etc. Already configured data
sources can be attached to an exploration.
Topology Viewer: The topology viewer provides a graphical representation that
shows the dependencies among different entities. Immediate Family and
Extended Family are the two contexts supported by OSA topologies. Immediate
Family shows the dependencies between a parent and its children, whereas
Extended Family shows the dependencies in the full context.
Pattern: A pattern is a simple way to explore event streams based on common
business scenarios.
Map: A map is a collection of geofences. It is used to locate geographic
coordinates supplied by different sources, such as GPS.
Shape: A shape is a representation of event data in different visual forms,
such as charts and pie graphs.
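
The way a stream, a reference and an exploration fit together can be sketched
in a few lines of Python. This is only an illustration of the concepts, not
OSA's API: the ATM_REFERENCE dictionary stands in for a reference table, and
the field names, city names and the enrich and explore helpers are
hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical static reference data, standing in for a database table
# keyed by ATM id.
ATM_REFERENCE: Dict[str, Dict[str, str]] = {
    "atm-1": {"city": "Pune", "branch": "Central"},
    "atm-2": {"city": "Mumbai", "branch": "Harbour"},
}


def enrich(stream: Iterable[dict],
           reference: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Attach contextual reference fields to each event as it flows past."""
    for event in stream:
        context = reference.get(event["atm_id"], {})
        yield {**event, **context}


def explore(stream: Iterable[dict], min_amount: float) -> Dict[str, dict]:
    """Filter events, group them by city and keep a running summary."""
    summary: Dict[str, dict] = {}
    for event in stream:
        if event["amount"] < min_amount:
            continue                                  # filter
        city = event.get("city", "unknown")           # group key
        entry = summary.setdefault(city, {"count": 0, "total": 0.0})
        entry["count"] += 1                           # summary
        entry["total"] += event["amount"]
    return summary


# Hypothetical stream of withdrawal events.
withdrawals = [
    {"atm_id": "atm-1", "amount": 100.0},
    {"atm_id": "atm-2", "amount": 40.0},
    {"atm_id": "atm-1", "amount": 250.0},
    {"atm_id": "atm-9", "amount": 500.0},  # no matching reference row
]

summary = explore(enrich(withdrawals, ATM_REFERENCE), min_amount=50.0)
print(summary)
```

Events whose key is missing from the reference are kept and grouped under
"unknown" here; dropping them instead would be an equally valid design choice.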
OSA Architecture:
The first step in OSA is to ingest data, for example from applications or
from Oracle GoldenGate change data capture, delivered through Kafka. The
sourced stream is then examined and analyzed using data pipelines. In a data
pipeline the data is queried, business or conditional logic is applied, and
patterns are identified in the data streams. All of these operations are
performed while the data is flowing, without storing it anywhere. Continuous
Query Language (CQL) is used for querying the data. CQL is similar to SQL but
contains additional constructs for pattern matching and recognition. OSA
generates the query and the Spark Streaming application automatically. Once
the pipeline has analyzed the data, the results can be fed into a data lake
for deeper analysis, or triggers and alerts can be sent immediately to other
integrated systems. A high-level architecture is shown below:
[Figure: OSA high-level architecture]
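
As a rough illustration of the pipeline stage described in this section (it is
not a reproduction of the architecture diagram, and it is plain Python rather
than CQL or anything OSA generates), the sketch below detects a simple pattern
on flowing data: a card that makes several transactions within a short time
window. The event shape, the thresholds and the name detect_bursts are
assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp_seconds, card_id, amount).
Event = Tuple[int, str, float]


def detect_bursts(events: Iterable[Event], max_events: int = 3,
                  within_seconds: int = 60) -> Iterator[Tuple[str, int]]:
    """Yield (card_id, timestamp) when a card reaches `max_events`
    transactions inside a sliding window of `within_seconds`.

    Assumes events arrive in timestamp order. Only recent timestamps per
    card are kept as state; the events themselves are not stored.
    """
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    for timestamp, card_id, _amount in events:
        window = recent[card_id]
        window.append(timestamp)
        # Drop timestamps that have slid out of the time window.
        while window and timestamp - window[0] > within_seconds:
            window.popleft()
        if len(window) >= max_events:
            yield card_id, timestamp
            window.clear()  # start fresh so one burst raises one alert


sample_events = [
    (0, "card-1", 20.0),
    (10, "card-2", 5.0),
    (20, "card-1", 30.0),
    (45, "card-1", 10.0),
    (200, "card-2", 7.0),
    (210, "card-2", 9.0),
]

alerts = list(detect_bursts(sample_events, max_events=3, within_seconds=60))
print(alerts)
```

Each event is examined once, as it arrives, and an alert can be emitted
immediately; this mirrors the idea of acting on data while it flows instead of
querying it after it has been stored.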
A few reasons to choose OSA over its competitors are:
Simplicity: It is a simple-to-use, web-based tool that doesn't require much
technical skill. It can also generate and validate some of the most powerful
data pipelines automatically.
Apache Spark: OSA is built on Apache Spark, which gives it the flexibility to
attach itself to any compliant YARN cluster. It is the first tool on the
market to introduce event-by-event processing on Spark Streaming.
Enterprise Grade: OSA can scale out horizontally and is highly available
(24x7) for critical workload pipelines, which makes it an enterprise-level
tool. Built-in governance ensures no data loss at any point in time.
Industry Advantages:
Risk and Fraud Management - The financial industry uses stream analytics to
detect fraud at the PoS or online by analyzing data streams.
Transportation and Logistics - OSA can help in managing fleets, tracking
assets and improving supply chain efficiency.
Customer Experience and Consumer Analytics - Knowing customer sentiment is key
to releasing offers, analyzing trends etc. OSA can play a crucial role in
analyzing customer trends.
Telecommunications - OSA can help in proactively monitoring networks. It can
also predict network failures and help achieve high availability.
Retail - Instant shopping trends, better shelf arrangements and responses to
customer cart usage can be achieved with OSA to increase sales in the retail
industry.