
MapReduce vs Pig vs Hive


Apache Hadoop is an open-source framework intended to make working with big data easier. Hadoop has found its place in industries and companies that work with large, sensitive data sets that need efficient handling.

The Hadoop ecosystem has several components that together handle this huge volume of data. MapReduce, Pig and Hive are among the key components.



MapReduce is a software framework and programming model for processing huge amounts of data. A MapReduce program works in two phases, Map and Reduce. Map tasks split the input and map each record to key-value pairs. The framework then shuffles those pairs so that all values for a key reach the same Reduce task, which reduces them to a result.
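To make the two phases concrete, here is a minimal single-machine sketch of a word count in plain Python. It only imitates the model: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration and are not part of the Hadoop API, and a real job would run these steps in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values of one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in order over an in-memory list of lines.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

In Hadoop the shuffle is done by the framework between the Map and Reduce tasks; it is written out here only so the grouping step is visible.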

Apache Pig is an abstraction over MapReduce. It is a platform for analyzing large data sets by representing the processing as data flows, written in a language called Pig Latin. Pig is generally used with Hadoop, and we can perform the usual data manipulation operations in Hadoop using Pig.
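The idea of a data flow can be sketched in plain Python as a chain of small steps, each producing a new intermediate result from the previous one. This is not Pig Latin and does not use any Pig API; the records and the run_flow function are hypothetical, and the comments only point to the kind of Pig Latin operator each step resembles.

```python
from collections import defaultdict

# Hypothetical input records: (user, page, seconds_spent).
visits = [
    ("alice", "home", 12),
    ("bob", "home", 3),
    ("alice", "docs", 40),
    ("carol", "docs", 25),
]


def run_flow(records, min_seconds=10):
    # Step 1 (similar in spirit to FILTER): keep only the longer visits.
    long_visits = [r for r in records if r[2] >= min_seconds]

    # Step 2 (similar in spirit to GROUP ... BY): collect visits per page.
    by_page = defaultdict(list)
    for _user, page, seconds in long_visits:
        by_page[page].append(seconds)

    # Step 3 (similar in spirit to FOREACH ... GENERATE): one row per group.
    return {page: sum(seconds) for page, seconds in by_page.items()}


if __name__ == "__main__":
    print(run_flow(visits))
```

Each step reads like one line of a script, which is the main reason a data flow language needs far less code than a hand-written MapReduce job.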

Hive is an open-source data warehouse system that processes structured data in Hadoop. It sits on top of Hadoop to summarize big data and to make analysis and querying easier. Hive lets you read, write, and manage large datasets residing in distributed storage using SQL, and a structure can be projected onto data already in storage.
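Projecting a structure onto data already in storage means the raw file stays as it is and a schema is applied only when the data is read. The following is a rough Python sketch of that idea, not of Hive itself: the RAW text, the SCHEMA list and both functions are hypothetical names chosen for illustration.

```python
import csv
import io

# Hypothetical raw file contents that already sit in storage.
RAW = "1,alice,34.5\n2,bob,12.0\n"

# Hypothetical schema: a column name and a type converter per field.
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    # Apply the schema while reading; the stored text is never modified.
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


def total_amount(rows):
    # A simple aggregate, comparable to summing a column in a query.
    return sum(row["amount"] for row in rows)


if __name__ == "__main__":
    table = read_with_schema(RAW, SCHEMA)
    print(table)
    print(total_amount(table))
```

Because the schema lives apart from the file, the same stored data could be read again later with a different schema without rewriting it.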

The three components differ in several ways.

 

| MapReduce | Pig | Hive |
|---|---|---|
| Part of the Hadoop core; open source. | Open source. | Open source. |
| It is a data processing paradigm. | It is a data flow language (Pig Latin). | It uses a SQL-like language called HiveQL. |
| MapReduce is a low-level programming model. | Pig Latin is a high-level language. | HiveQL is a query language. |
| Jobs must be compiled and packaged before they run, which takes time. | Scripts need no separate compilation step; Pig translates them into MapReduce jobs. | The Hive compiler parses the query and turns it into jobs. |
| Exposure to Java is a must. | Pig Latin has to be learned, but basic knowledge of SQL helps. | Basic knowledge of SQL is enough. |
| The model was introduced by Google; Hadoop MapReduce is an open-source implementation. | It was originally created at Yahoo. | It was originally created at Facebook. |
| More lines of code. | Fewer lines of code than MapReduce. | Fewer lines of code than MapReduce and Pig. |
| More development effort. | Less development effort, typically at some cost in code efficiency. | Less development effort, typically at some cost in code efficiency. |
| Can handle structured and unstructured data. | Can handle structured, semi-structured, and unstructured data. | Mainly handles structured data. |

 

 
