Apache Pig and Apache Hive are two components of the Hadoop ecosystem. Both were created to spare developers from writing Java code for MapReduce programs, so those without much Java knowledge can choose either Pig or Hive. In diagrams of the Hadoop ecosystem, Pig and Hive sit in the same layer. In terms of job performance, both Hive and Pig are typically slower than a hand-written MapReduce job, because a Hive query or Pig script first has to be converted into a series of MapReduce jobs. In return, both Hive and Pig can join, order, and sort data dynamically.
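To make the "converted into MapReduce jobs" point concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce steps that a simple grouped count (for example a `GROUP BY` with `COUNT` in Hive, or `GROUP` followed by `COUNT` in Pig) roughly corresponds to. The `clicks` data and the function names are hypothetical, and this is only a toy model of the idea, not how Hadoop, Pig, or Hive are actually implemented.

```python
from collections import defaultdict

# Hypothetical input rows: (user, page) click records.
clicks = [
    ("alice", "/home"),
    ("bob", "/cart"),
    ("alice", "/cart"),
    ("alice", "/home"),
]

def map_phase(rows):
    # Emit one (key, 1) pair per record, as a mapper would.
    for user, _page in rows:
        yield user, 1

def shuffle(pairs):
    # Collect all values that share a key; in Hadoop the framework
    # does this between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the values for each key, as a reducer would.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase(clicks)))
print(counts)
```

In Pig or Hive the same result takes a line or two of script. The price is the translation step and the generated jobs, which is where the performance gap against hand-written MapReduce comes from.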
Pig | Hive |
It is best suited to semi-structured data | It is best suited to structured data |
It was developed for programming | It is meant for reporting |
It uses a procedural language (Pig Latin) | It uses a declarative, SQL-like language (HiveQL) |
It has no built-in support for partitions | It supports partitions |
It cannot start a Thrift-based server | It can start an optional Thrift-based server |
It does not have a dedicated metadata database | It defines tables (schemas) beforehand and stores the schema information in a database |
It supports the Avro file format | It supports Avro through the AvroSerDe; only early releases lacked it |
It supports the additional COGROUP operator, which can be used to perform outer joins | It does not support COGROUP |
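The COGROUP row is easier to follow with a small example. The Python sketch below imitates what COGROUP does conceptually: it groups two relations by key and keeps each side's matching records in a separate bag, so an outer join can be derived by pairing the bags and padding an empty side with `None`. The `cogroup` and `full_outer_join` functions and the sample data are hypothetical illustrations, not Pig's implementation.

```python
from collections import defaultdict

def cogroup(left, right):
    """Group two lists of (key, value) pairs by key, one bag per side."""
    bags = defaultdict(lambda: ([], []))
    for key, value in left:
        bags[key][0].append(value)
    for key, value in right:
        bags[key][1].append(value)
    return dict(bags)

def full_outer_join(left, right):
    """Derive a full outer join from the cogrouped bags."""
    rows = []
    for key, (left_bag, right_bag) in sorted(cogroup(left, right).items()):
        # An empty bag is replaced by [None] so the key still produces a row.
        for l in left_bag or [None]:
            for r in right_bag or [None]:
                rows.append((key, l, r))
    return rows

# Hypothetical sample relations keyed by an integer id.
owners = [(1, "alice"), (2, "bob")]
pets = [(1, "cat"), (1, "dog"), (3, "fish")]

grouped = cogroup(owners, pets)
joined = full_outer_join(owners, pets)
print(grouped)
print(joined)
```

Keys that appear on only one side (2 and 3 here) still show up in the cogrouped result with an empty bag on the other side, which is what makes the outer-join behaviour possible.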