In today's world, businesses need to make decisions instantly, based on the data provided by business analysts, to stay on top. Business analysts must now process and analyze all types of data (structured, semi-structured, and unstructured) in a short span of time, which is not possible with traditional data warehouse concepts alone; to achieve this, we need to move to big data. Once we have decided to move to big data, which architecture is best to implement?
The two widely used big data processing architectures are:
Lambda architecture
Kappa architecture
Lambda Architecture:
The Lambda architecture ensures that both batch and real-time data are taken into consideration for analysis. Under this architecture, the full history of the data is maintained for any future analysis. The Lambda architecture is designed around the following properties:
Latency time
Throughput
Fault tolerance
Latency time:
Latency is the time between the generation of a piece of data and its availability in the reporting layer for analytics.
Throughput:
The large volume of data is broken into small blocks that are processed in parallel, increasing the throughput.
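As a rough local sketch of this idea (all names here are hypothetical, and a real batch system such as MapReduce distributes the blocks across cluster nodes rather than local threads), splitting a dataset into blocks and processing them in parallel might look like:

```python
from concurrent.futures import ThreadPoolExecutor

def process_block(block):
    # Hypothetical per-block work: here, just count the records.
    return len(block)

def process_in_blocks(records, block_size=1000):
    # Split the full dataset into fixed-size blocks and hand the
    # blocks to a pool of workers so they are processed in parallel.
    blocks = [records[i:i + block_size]
              for i in range(0, len(records), block_size)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(process_block, blocks))
    # Combine the partial results into the final answer.
    return sum(results)
```

Because each block is independent, adding more workers (or more nodes, in a cluster) raises the overall throughput without changing the per-record logic.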
Fault tolerance:
Fault tolerance is the ability of the system to continue processing even when some component fails.
Architecture diagram:
The three major components of the Lambda architecture are:
Batch layer
Speed layer
Serving layer
Batch layer:
In the batch layer, the complete dataset is stored in HDFS as an immutable store, which means data can only be appended, never updated or deleted. Versioning of the data under this append-only logic is achieved using timestamps. From this master dataset, MapReduce jobs precompute batch views as per the business requirement; ad-hoc querying is also possible on the batch views. The batch views are precomputed so that queries can be answered with low latency.
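A toy sketch of the append-only, timestamp-versioned storage idea follows (the class and method names are illustrative, not an HDFS API): nothing is ever overwritten, and the "current" value of a record is simply its newest version.

```python
import time

class AppendOnlyStore:
    """Minimal sketch of an immutable, append-only dataset: records are
    never updated or deleted; every write appends a new timestamped
    version, so the full history remains available for reprocessing."""

    def __init__(self):
        self._log = []

    def append(self, key, value, ts=None):
        # Append a new version of the record; older versions stay in the log.
        self._log.append((ts if ts is not None else time.time(), key, value))

    def latest(self, key):
        # The newest timestamp for a key wins, emulating an "update"
        # without ever mutating the underlying data.
        versions = [(ts, v) for ts, k, v in self._log if k == key]
        return max(versions)[1] if versions else None
```

An "update" is therefore just another append, and a bad code deploy can always be corrected later by recomputing from the retained history.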
Apache Hadoop is used in the batch layer processing.
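The precompute step can be sketched as an in-process MapReduce, assuming hypothetical click-stream records with an `event` field; a real Hadoop job would run the map and reduce phases across a cluster rather than in one process:

```python
from collections import defaultdict

def map_phase(record):
    # Hypothetical mapper: emit (event_type, 1) for each raw event.
    yield record["event"], 1

def reduce_phase(key, values):
    # Reducer: sum the counts emitted for one key.
    return key, sum(values)

def build_batch_view(records):
    # Shuffle: group the mapper output by key, then reduce each group
    # to precompute the batch view queried by the serving layer.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())
```

Because the view is precomputed over the whole master dataset, serving-layer queries read a small aggregate instead of scanning all history.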
Speed layer:
The records that arrive during the latency window of batch processing are captured and stored in the speed layer dataset. The speed layer dataset is delete-and-create: once a batch layer run completes, the speed layer dataset is deleted and recreated with only the newest data. From the speed layer dataset, real-time views are created using incremental algorithms.
Apache Storm or SQLstream can be used for speed layer processing.
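A minimal sketch of the speed layer's incremental update and its delete-and-create lifecycle, using a plain dictionary as a stand-in for the real-time view store (both function names are hypothetical):

```python
def update_realtime_view(view, record):
    # Incrementally fold one new record into the real-time view;
    # no history is reprocessed, which keeps latency low.
    view[record["event"]] = view.get(record["event"], 0) + 1
    return view

def reset_realtime_view():
    # Delete-and-create: once a batch run has absorbed the records
    # covered by this view, the view is dropped and rebuilt empty.
    return {}
```

The real-time view only ever covers the records the batch layer has not yet processed, which is why it can stay small and cheap to update.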
Serving layer:
The serving layer is where users query for output. Once a query is fired, the batch view and real-time view outputs are combined, and a near real-time result is returned. Druid can be used in the serving layer to handle both batch and speed layer views.
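The merge the serving layer performs at query time can be sketched as follows, assuming both views are simple key-to-count maps (a simplification of what a store such as Druid would hold):

```python
def serve_query(batch_view, realtime_view):
    # Combine the precomputed batch view with the incremental
    # real-time view to answer a query with near real-time results.
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged
```

Because the two views cover disjoint time ranges (history vs. the latest latency window), adding the counts gives a complete, up-to-date answer.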
Kappa Architecture:
Kappa architecture is essentially a Lambda architecture with the batch layer removed. It is designed so that the speed layer alone is capable of handling both real-time and historical (batch) data.
Architecture diagram:
Apache Kafka's replayable, log-based streaming makes it a natural fit for the Kappa architecture. In Kappa architecture there is only one codebase to maintain for stream processing. A full reprocessing job must be run at the first deployment as well as after any code change: the complete retained log is replayed through the stream-processing code. Once reprocessing of the full data is done and the result is presented in the serving layer, the streaming job resumes, fetching the latest records and presenting them in the serving layer. The streaming job can be built using Apache Storm, Apache Samza, or Spark Streaming. The serving layer can use a NoSQL database, Apache Drill, etc.
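The replay-then-resume flow can be sketched like this (function names are hypothetical, and a plain list stands in for a retained Kafka topic); the key point is that the same processing function handles both the replayed history and the live records:

```python
def process(view, record):
    # Hypothetical stream-processing step: count events by type.
    # This single code path serves both replay and live traffic.
    view[record] = view.get(record, 0) + 1

def reprocess(log, step):
    # On first deployment, or after any code change, replay the full
    # retained log (e.g. a Kafka topic, from offset 0) through the
    # stream-processing code to rebuild the serving-layer view.
    view = {}
    for record in log:
        step(view, record)
    return view
```

After `reprocess` finishes, new records are folded into the same view with the same `process` function, which is the point of Kappa: one codebase for both historical and real-time data.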
Which architecture to go for?
If the processing logic for the batch and real-time paths is the same, we can go for the Kappa architecture.
If the batch layer and speed layer require different processing logic, we can go for the Lambda architecture.