MapReduce is the heart of Hadoop's processing layer and is designed for processing huge volumes of data. In this post we will walk through the process flow and life cycle of a MapReduce job.
Phase | Input | Output
---|---|---
Mapper | (Key, Value) | (Key, Value)
Sort & Shuffle | (Key, Value) | (Key, List(Values))
Reducer | (Key, List(Values)) | (Key, Value)
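To make the key/value contract in the table concrete, here is a minimal sketch using the classic word-count example and Hadoop's org.apache.hadoop.mapreduce API. The class names (WordCountMapper, WordCountReducer) are illustrative, not taken from the text above: the mapper emits one (word, 1) pair per token, and after Sort & Shuffle the reducer receives each word together with the list of its counts.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: (Key, Value) in -> (Key, Value) out
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE); // emit (word, 1) for every token
        }
    }
}

// Reducer: (Key, List(Values)) in -> (Key, Value) out
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get(); // sum the values grouped by Sort & Shuffle
        }
        context.write(word, new IntWritable(sum)); // emit (word, total)
    }
}
```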
MapReduce Life Cycle:
Job Tracker and Task Tracker are the two daemons primarily responsible for MapReduce job execution. Below are the steps that take place during a MapReduce job.
- The client submits the job as a .jar file (containing the driver code, mapper code, and reducer code), and it is always the Job Tracker that receives it (a minimal driver sketch follows this list).
- Taking the mapper business logic from the jar file, the Job Tracker initiates the map phase on all available Task Trackers.
- Once the assigned Task Trackers have fully completed the map phase (100%), they report their status to the Job Tracker.
- Upon completion of all map tasks (100%), the Job Tracker initiates the Sort & Shuffle phase on the mapper outputs.
- Once Sort & Shuffle is fully complete (100%), the Job Tracker initiates the reduce phase on the Task Trackers, again taking the business logic from the jar file.
- Once all the assigned Task Trackers have finished reduce processing (100%), they respond to the Job Tracker with their outputs. The Job Tracker consolidates the final output and reports the result back to the client.
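For reference, below is a hypothetical driver class of the kind packaged into the submitted jar; the class name, job name, and input/output paths are illustrative. It wires the mapper and reducer together, submits the job, and blocks until the framework reports completion.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // newer-API idiom

        job.setJarByClass(WordCountDriver.class);    // the jar that gets distributed
        job.setMapperClass(WordCountMapper.class);   // mapper business logic
        job.setReducerClass(WordCountReducer.class); // reducer business logic
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory

        // Submits the job and waits until the framework reports completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```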