The following is the life cycle of a typical MapReduce job and the roles of its primary actors. The full life cycle is more complex, so here we will concentrate on the primary components.
Hadoop can be configured in different ways, but the basic configuration consists of the following:
- A single master node running the Job Tracker
- Multiple worker nodes, each running a Task Tracker
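In the classic (MRv1) configuration, the worker nodes find the Job Tracker through the `mapred.job.tracker` property. A minimal sketch of a `mapred-site.xml` on a worker might look like the following; the hostname and the per-node task limits are illustrative values, not recommendations:

```xml
<configuration>
  <!-- Host and port of the Job Tracker (master node). -->
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:8021</value>
  </property>
  <!-- Illustrative cap on concurrent map tasks per Task Tracker. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```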
The following are the life-cycle components of a MapReduce job:
- Job Client: The local Job Client prepares the job for submission and hands it off to the Job Tracker.
- Job Tracker: The Job Tracker schedules the job and distributes the map work among the Task Trackers for parallel processing.
- Task Tracker: Each Task Tracker spawns a Map Task. The Job Tracker receives progress information from the Task Trackers.
Once map results are available, the Job Tracker distributes the reduce work among the Task Trackers for parallel processing.
Each Task Tracker spawns a Reduce Task to perform the work. The Job Tracker receives progress information from the Task Trackers.
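The flow described above can be sketched in-process with a word count, the canonical MapReduce example. This is not the Hadoop API: the thread pool stands in for the Job Tracker distributing work, each `map_task` call stands in for a Map Task on a Task Tracker, and the grouping step stands in for the shuffle between the two phases.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Input splits: in Hadoop, each split would go to a separate Task Tracker.
splits = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog",
]

def map_task(split):
    # Map Task: emit (word, 1) pairs for one input split.
    return [(word, 1) for word in split.split()]

def reduce_task(word, counts):
    # Reduce Task: sum all the counts collected for one word.
    return word, sum(counts)

# "Job Tracker" distributes the map work for parallel processing.
with ThreadPoolExecutor(max_workers=3) as pool:
    map_outputs = list(pool.map(map_task, splits))

# Shuffle: group the map output by key before the reduce phase.
grouped = defaultdict(list)
for output in map_outputs:
    for word, count in output:
        grouped[word].append(count)

# "Job Tracker" distributes the reduce work the same way.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(lambda kv: reduce_task(*kv), grouped.items()))

print(results)  # {'the': 3, 'quick': 2, ...}
```

The key structural point is that the reduce distribution only starts once the grouped map output exists, mirroring the two scheduling rounds the Job Tracker performs.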
Not all map tasks have to complete before reduce tasks begin running. A reduce task can start its copy (shuffle) phase as soon as individual map tasks finish, fetching their intermediate output while other maps are still running; the reduce function itself, however, runs only after every map task has completed. Thus, the map and reduce steps often overlap.
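This overlap can be sketched in-process as well. Again this is not the Hadoop API: `as_completed` stands in for the reduce-side copy that fetches each map's output as soon as that map finishes, while the final summation stands in for the reduce function, which still waits for the whole map phase.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

splits = ["a b a", "b c", "a c c"]

def map_task(split):
    # Map Task: emit (word, 1) pairs for one input split.
    return [(word, 1) for word in split.split()]

grouped = defaultdict(list)

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(map_task, s) for s in splits]
    # Copy/shuffle phase: consume each map's output as soon as it is
    # ready, instead of waiting for the whole map phase to finish.
    for fut in as_completed(futures):
        for word, count in fut.result():
            grouped[word].append(count)

# The reduce function runs only after all map output has been copied.
totals = {word: sum(counts) for word, counts in grouped.items()}
print(totals)  # {'a': 3, 'b': 2, 'c': 3} (key order may vary)
```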