Functionality of different components in MapReduce job

Job Client: Job client performs the following tasks

  • Validates the job configuration
  • Generates the input splits. This is basically splitting the input job into chunks
  • Copies the job resources (configuration, job JAR file, input splits) to a shared location, such as an HDFS directory, where it is accessible to the Job Tracker and Task Trackers
  • Submits the job to the Job Tracker

Job Tracker: Job Tracker performs the following tasks

  • Fetches input splits from the shared location where the Job Client placed the information
  • Creates a map task for each split
  • Assigns each map task to a Task Tracker (worker node)

After the map task is complete, Job Tracker does the following tasks

  • Creates reduce tasks up to the maximum enabled by the job configuration.
  • Assigns each map result partition to a reduce task.
  • Assigns each reduce task to a Task Tracker.

Task Tracker: A Task Tracker manages the tasks of one worker node and reports status to the Job Tracker.

Task Tracker does the following tasks when map or reduce task is assigned to it

  • Fetches job resources locally
  • Spawns a child JVM on the worker node to execute the map or reduce task
  • Reports status to the Job Tracker