MapReduce vs Pig vs Hive

Apache Hadoop is an open-source framework intended to make working with big data easier. It has found a place in industries and companies that work with large, sensitive data sets that need efficient handling.

The Hadoop ecosystem has several components that together handle this huge volume of data. MapReduce, Pig and Hive are among the key ones.

MapReduce is a software framework and programming model for processing huge amounts of data. A MapReduce program works in two phases, Map and Reduce. Map tasks split the input and map it to key-value pairs, while Reduce tasks take the shuffled output (the pairs grouped by key) and reduce it to the final result.
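The two phases can be sketched in plain Python with the classic word count. This is only a single-machine illustration of the programming model, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for this example, and a real Hadoop job would typically be written in Java and run across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: split each input line into words and emit (word, 1) pairs."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the list of values for each key into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three steps: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, two "lines" of text.
    sample = ["big data big ideas", "data flows"]
    print(word_count(sample))
```

Even in this toy form, the developer has to write the map, shuffle and reduce logic by hand, which is the effort that Pig and Hive aim to hide.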

Apache Pig is an abstraction over MapReduce. It is a tool/platform for analyzing large data sets by representing them as data flows. Pig is generally used with Hadoop, and the usual data manipulation operations in Hadoop can all be performed with it.
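To give a feel for what "data flow" means, here is a rough Python sketch in which each step takes the output of the previous one. It is not Pig Latin and does not use Pig. The comments only point to the Pig Latin operators (`FILTER`, `GROUP ... BY`, `FOREACH ... GENERATE`) that play a similar role, and the `visits` records and the `run_flow` function are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records: (visitor, page, seconds_on_page)
visits = [
    ("ann", "home", 12),
    ("bob", "home", 3),
    ("ann", "docs", 40),
    ("bob", "docs", 25),
    ("cat", "home", 7),
]


def run_flow(records, min_seconds=5):
    """Run a three-step data flow and return total seconds per page."""
    # Step 1 (similar in spirit to FILTER): drop very short visits.
    long_visits = [r for r in records if r[2] >= min_seconds]

    # Step 2 (similar in spirit to GROUP ... BY): collect visits by page.
    # groupby needs its input sorted on the grouping key.
    ordered = sorted(long_visits, key=itemgetter(1))
    by_page = groupby(ordered, key=itemgetter(1))

    # Step 3 (similar in spirit to FOREACH ... GENERATE): one row per page.
    return {page: sum(r[2] for r in rows) for page, rows in by_page}


if __name__ == "__main__":
    print(run_flow(visits))
```

The script reads as a sequence of transformations rather than as separate map and reduce functions, which is the style a data flow language encourages.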

Hive is an open-source data warehouse system that processes structured data in Hadoop. It sits on top of Hadoop to summarize big data and to make analysis and querying easier. Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL, and structure can be projected onto data already in storage.
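The SQL style of working can be illustrated with a small declarative query. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere: it is not Hive, and the query is ordinary SQL rather than anything HiveQL-specific. The `visits` table, its columns and the `total_seconds_per_page` function are hypothetical.

```python
import sqlite3


def total_seconds_per_page(rows, min_seconds=5):
    """Load rows into an in-memory table and aggregate them with SQL."""
    # sqlite3 is only a local stand-in here; Hive would query data that
    # lives in distributed storage.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (visitor TEXT, page TEXT, seconds INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

    # The query states WHAT result is wanted, not HOW to compute it.
    query = """
        SELECT page, SUM(seconds) AS total_seconds
        FROM visits
        WHERE seconds >= ?
        GROUP BY page
        ORDER BY page
    """
    result = conn.execute(query, (min_seconds,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Hypothetical sample rows: (visitor, page, seconds_on_page)
    sample = [
        ("ann", "home", 12),
        ("bob", "home", 3),
        ("ann", "docs", 40),
        ("bob", "docs", 25),
        ("cat", "home", 7),
    ]
    print(total_seconds_per_page(sample))
```

Anyone who already knows SQL can read the query without knowing how the engine underneath executes it, which is the main appeal of Hive for analysts.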

The three components differ in several ways, summarized in the table below.

MapReduce	Pig	Hive
MapReduce is Hadoop's native processing framework.	Pig is an open-source platform that runs on top of MapReduce.	Hive is an open-source data warehouse system that runs on top of Hadoop.
It is a data processing paradigm.	It uses a data flow language called Pig Latin.	It uses a query language called HiveQL.
MapReduce is a low-level programming model.	Pig Latin is a high-level language.	HiveQL is a high-level, SQL-like query language.
MapReduce programs must be compiled and packaged before they run.	Pig scripts need no separate compilation step; Pig translates them into MapReduce jobs.	Hive queries need no separate compilation step; the Hive compiler parses each query and builds an execution plan.
Exposure to Java is a must to work with MapReduce.	Basic knowledge of SQL helps when working with Apache Pig, although Pig Latin is a language of its own.	Basic knowledge of SQL is enough to work with Hive.
The MapReduce model was introduced by Google.	Pig was originally created at Yahoo.	Hive was originally created at Facebook.
More lines of code.	Comparatively fewer lines of code than MapReduce.	Comparatively fewer lines of code than MapReduce and Pig.
More development effort.	Less development effort, usually at some cost in code efficiency.	Less development effort, usually at some cost in code efficiency.
MapReduce can handle structured and unstructured data.	Apache Pig can handle structured, unstructured, and semi-structured data.	Hive mainly handles structured data.
