Apache Hadoop is an open-source framework intended to make working with big data easier. Hadoop has established itself in industries and companies that work with large, sensitive data sets that need efficient handling.
The Hadoop ecosystem has several components that together handle this large volume of data. MapReduce, Pig, and Hive are among the key components.
MapReduce is a software framework and programming model used for processing huge amounts of data. A MapReduce program works in two phases, namely Map and Reduce. Map tasks split the input and map it to intermediate key-value pairs, while Reduce tasks shuffle and reduce that intermediate data.
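The two phases can be sketched in plain Python. This is only an in-memory illustration of the programming model, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and a real job would run these steps in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(splits):
    """Run map, shuffle, and reduce in sequence over the input splits."""
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    # Each string stands in for one input split (assumed sample data).
    print(word_count(["big data is big", "data is everywhere"]))
```

Even for a word count, the map, shuffle, and reduce steps each have to be written out explicitly, which is the kind of effort the higher-level tools described next aim to reduce.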
Apache Pig is an abstraction over MapReduce. It is a tool/platform used to analyze large data sets by representing them as data flows. Pig is generally used with Hadoop, and the data manipulation operations available in Hadoop can be performed using Pig.
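To show what "representing data as data flows" means, here is a rough Python analogy in which each step takes a data set and hands a new one to the next step. It is not Pig Latin and does not use any Pig API; the records, field layout, and helper names are assumptions made up for illustration, and the comments only point to the kind of Pig operator each step resembles.

```python
from collections import defaultdict

# Hypothetical records standing in for a loaded data set: (user, page, seconds).
visits = [
    ("ann", "home", 12),
    ("bob", "home", 3),
    ("ann", "docs", 45),
    ("cat", "docs", 30),
    ("bob", "docs", 8),
]

def filter_rows(rows, min_seconds):
    """Keep visits lasting at least min_seconds (comparable to a FILTER step)."""
    return [row for row in rows if row[2] >= min_seconds]

def group_by_page(rows):
    """Collect rows under their page name (comparable to a GROUP step)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return groups

def total_seconds(groups):
    """Sum the seconds in each group (comparable to a FOREACH ... GENERATE step)."""
    return {page: sum(row[2] for row in rows) for page, rows in groups.items()}

# The flow: load -> filter -> group -> aggregate, each step feeding the next.
long_visits = filter_rows(visits, 10)
by_page = group_by_page(long_visits)
totals = total_seconds(by_page)
print(totals)
```

The point of the sketch is the shape of the program: a chain of named intermediate data sets rather than hand-written map and reduce functions.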
Hive is an open-source data warehouse system that sits on top of Hadoop and processes structured data, summarizing big data and making analysis and queries easier. It facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.
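The idea of projecting structure onto data already in storage, and then querying it with SQL, can be sketched as follows. This is an analogy only: Python's built-in `sqlite3` module stands in for a SQL engine, HiveQL syntax differs from SQLite's, and the raw lines, the `SCHEMA` list, and the `orders` table are hypothetical.

```python
import sqlite3

# Hypothetical raw file contents already sitting in storage (tab-separated text).
RAW_LINES = [
    "1\tbooks\t12.50",
    "2\tgames\t30.00",
    "3\tbooks\t7.25",
]

# Schema projected onto the raw text at read time: (column name, converter).
SCHEMA = [("order_id", int), ("category", str), ("amount", float)]

def read_with_schema(lines, schema):
    """Parse each raw line into a typed tuple according to the schema."""
    rows = []
    for line in lines:
        fields = line.split("\t")
        rows.append(tuple(cast(field) for (_, cast), field in zip(schema, fields)))
    return rows

def total_by_category(lines):
    """Load the typed rows into an in-memory table and summarize them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, category TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", read_with_schema(lines, SCHEMA))
    result = conn.execute(
        "SELECT category, SUM(amount) FROM orders GROUP BY category ORDER BY category"
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(total_by_category(RAW_LINES))
```

The stored text itself never changes; only the schema applied when reading it decides how each field is interpreted, and the summary is a single declarative query.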
The three components differ in several ways.
| MapReduce | Pig | Hive |
| --- | --- | --- |
| Hadoop's native processing framework. | Open source. | Open source. |
| It is a data processing paradigm. | It is a data flow language (Pig Latin). | It uses a SQL-like language called HiveQL. |
| MapReduce is a low-level programming model. | Pig Latin is a high-level language. | HiveQL is a query language. |
| MapReduce jobs have a long compilation process. | No separate compilation step is needed; Pig translates scripts into MapReduce jobs. | The Hive compiler parses the query. |
| Exposure to Java is a must to work with MapReduce. | Basic knowledge of SQL is helpful, although scripts are written in Pig Latin. | Basic knowledge of SQL is enough to work with Hive. |
| The MapReduce model was introduced by Google. | It was originally created at Yahoo. | It was originally created at Facebook. |
| More lines of code. | Fewer lines of code than MapReduce. | Fewer lines of code than MapReduce and Pig. |
| More development effort is involved. | Less development effort; lower code efficiency. | Less development effort; lower code efficiency. |
| Can handle structured and unstructured data. | Can handle structured, unstructured, and semi-structured data. | Mainly handles structured data. |