This blog covers the architecture of Hadoop and the advantages and disadvantages of Hadoop.
------------------------------------------------------------------------------------------------------------------------------------------
Let's first understand: what is Hadoop?
Hadoop is an open-source framework for storing and processing large data sets across clusters of computers, which may be located in different geographical locations.
Now let's understand: why Hadoop?
The problem with traditional database management systems is that they can process only structured data and can handle only small amounts of data (gigabytes). Hadoop can handle structured, unstructured, and semi-structured data, and it can process large amounts of data at high speed through parallel processing.
The architecture of Hadoop has two main components:
1. Hadoop Distributed File System (HDFS) - for storing data
2. MapReduce - for processing data
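To get a feel for the MapReduce side, below is the classic word-count job written against the Hadoop Java MapReduce API. It is a minimal sketch: the class names are our own, and the input and output HDFS paths are supplied as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Paths are placeholders; pass real HDFS paths as arguments
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

You would package this into a jar and run it with something like `hadoop jar wordcount.jar WordCount /input /output` (the jar name and paths are placeholders).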
The Name Node is the master node, which performs tasks like memory management and process management. It is the single point of failure in a Hadoop cluster. The Secondary Name Node takes a backup of the Name Node's namespace and periodically merges the edits log into the FSImage file. Data Nodes are the slave nodes, which store the data blocks and perform the computations.
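Every namespace operation a client performs is served by the Name Node from its in-memory metadata. As a small illustration, the directory listing below is answered entirely by the Name Node and never touches a Data Node; the path is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDirectory {
    public static void main(String[] args) throws Exception {
        // The client asks the Name Node for namespace metadata;
        // no Data Node is contacted for a pure metadata operation
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/user/data"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
    }
}
```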
When a client submits a job to the Name Node, it divides the files into chunks and distributes the chunks to Data Nodes for processing. Each chunk is replicated 3 times and stored on three different Data Nodes. If one node goes down, the Name Node identifies a Data Node which holds a replica of the chunk and continues execution there. This process makes Hadoop a fault-tolerant system.
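The replication factor can be inspected (or overridden for new files) through the same Java client. A minimal sketch, assuming a reachable cluster; the file path is a placeholder, and 3 is the usual default for dfs.replication:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3"); // default is already 3 on most clusters
        FileSystem fs = FileSystem.get(conf);
        // The Name Node tracks which Data Nodes hold each replica;
        // getReplication() reports the factor recorded for this file
        short rep = fs.getFileStatus(new Path("/user/data/input.txt"))
                      .getReplication();
        System.out.println("Replication factor: " + rep);
    }
}
```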
Now let's discuss the Limitations of Hadoop
1. Handling small files: If you want to process a large number of small files, the Name Node needs to keep the HDFS metadata of each file in memory. This becomes an overhead for the Name Node, which is why Hadoop is not recommended for handling large numbers of small files; the rough estimate below shows how quickly this adds up.
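A back-of-the-envelope estimate, using the commonly cited rule of thumb of roughly 150 bytes of Name Node heap per namespace object (the exact figure varies by Hadoop version):

```java
public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        // Rule of thumb: ~150 bytes of Name Node heap per namespace
        // object (file, directory, or block); figure is approximate
        final long BYTES_PER_OBJECT = 150;
        long files = 10_000_000;   // ten million small files
        long objectsPerFile = 2;   // one file entry plus one block each
        long heapBytes = files * objectsPerFile * BYTES_PER_OBJECT;
        System.out.printf("Estimated Name Node heap: %.1f GB%n",
                heapBytes / 1e9);  // ~3.0 GB for metadata alone
    }
}
```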
2. Processing speed: To process large datasets, MapReduce follows the map-and-reduce mechanism. During this process, the intermediate results of the map phase are written to disk and the output of each job is written back to HDFS, which increases the number of I/O operations. Thus, the processing speed decreases; the sketch below shows where the extra I/O comes from in a multi-stage pipeline.
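To make the extra I/O concrete, here is a hedged sketch of a two-stage pipeline that reuses the WordCount classes from earlier (the stages are stand-ins for any two dependent jobs, and the paths are placeholders). Plain MapReduce has to materialize the intermediate result in HDFS between the two jobs.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStagePipeline {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path("/data/input");          // placeholder
    Path intermediate = new Path("/data/stage1");  // placeholder
    Path output = new Path("/data/output");        // placeholder

    // Stage 1: reuses the WordCount classes shown earlier
    Job stage1 = Job.getInstance(conf, "stage 1");
    stage1.setJarByClass(TwoStagePipeline.class);
    stage1.setMapperClass(WordCount.TokenizerMapper.class);
    stage1.setReducerClass(WordCount.IntSumReducer.class);
    stage1.setOutputKeyClass(Text.class);
    stage1.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(stage1, input);
    FileOutputFormat.setOutputPath(stage1, intermediate); // written to HDFS
    if (!stage1.waitForCompletion(true)) System.exit(1);

    // Stage 2: must re-read the intermediate result from HDFS,
    // which is the extra disk and network I/O this limitation refers to
    Job stage2 = Job.getInstance(conf, "stage 2");
    stage2.setJarByClass(TwoStagePipeline.class);
    stage2.setMapperClass(WordCount.TokenizerMapper.class);
    stage2.setReducerClass(WordCount.IntSumReducer.class);
    stage2.setOutputKeyClass(Text.class);
    stage2.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(stage2, intermediate);   // re-read from HDFS
    FileOutputFormat.setOutputPath(stage2, output);
    System.exit(stage2.waitForCompletion(true) ? 0 : 1);
  }
}
```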
3. Not able to handle real-time stream data: Hadoop can process large batches of files very efficiently, but when it comes to real-time stream processing, Hadoop fails to handle the data as it arrives.
4. Not easy to code: Developers need to write code for each operation they want to perform on the data (even the simple word count shown earlier takes dozens of lines of Java), which makes it very difficult for them to work with.
5. Security: Hadoop does not provide proper authentication for accessing the cluster, and it does not record any information about who has accessed the cluster or what data a user has viewed. Security is the biggest drawback when it comes to Hadoop.
6. Easy to hack: Since Hadoop is written in Java, one of the most widely known languages, it is easier for cyber criminals to find ways to exploit the system.
7. Caching: There is no caching mechanism in Hadoop for keeping intermediate results in memory for further use. As a result, performance is diminished.
8. Lines of code: Hadoop has around 120,000 lines of code; the larger the codebase, the harder it is to debug and the longer it takes to execute.
9. Unpredictability: In Hadoop, we can't guarantee the time of completion of a job.