6. How MapReduce Works
Jari-Pekka Voutilainen
MapReduce Implementations
Apache Hadoop has two implementations of MapReduce:
- Classic MapReduce (MapReduce 1)
- YARN (MapReduce 2)
Classic MapReduce
Entities:
- The Client
- JobTracker
- TaskTrackers
- Distributed filesystem (usually HDFS)
Job Submission
- Ask the JobTracker for a new job ID
- Validate the output specification of the job
- Compute the input splits for the job
- Copy the resources for the job (job JAR file, configuration, and input splits) to the shared filesystem
- Tell the JobTracker that the job is ready to run
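The submission steps above can be sketched as a toy handshake. The `JobTracker` class, `submit_job` function, and dict-based shared filesystem below are hypothetical stand-ins for illustration, not Hadoop APIs.

```python
# Toy model of the Classic MapReduce submission handshake.
# JobTracker and the dict-based shared filesystem are hypothetical
# stand-ins, not Hadoop classes.
class JobTracker:
    def __init__(self):
        self.next_id = 0
        self.ready = set()

    def new_job_id(self):
        self.next_id += 1
        return "job_%04d" % self.next_id

    def mark_ready(self, job_id):
        self.ready.add(job_id)

def submit_job(conf, tracker, shared_fs):
    job_id = tracker.new_job_id()              # 1. ask for a job ID
    if not conf.get("output"):                 # 2. validate output spec
        raise ValueError("no output path set")
    splits = conf["input"].split(",")          # 3. compute input splits (toy)
    shared_fs[job_id] = {"jar": "job.jar",     # 4. copy job resources
                         "conf": conf, "splits": splits}
    tracker.mark_ready(job_id)                 # 5. tell JobTracker it's ready
    return job_id
```

Once `submit_job` returns, the JobTracker has everything it needs to initialize the job from the shared filesystem.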
Job Initialization
- A Job object encapsulates the tasks and the bookkeeping of their progress
- The task list consists of one map task for every input split, the number of reduce tasks configured for the job, and job setup and job cleanup tasks
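A minimal sketch of how that task list comes together; `build_task_list` is illustrative, not a Hadoop API:

```python
# One map task per input split, the configured number of reduce
# tasks, plus one setup and one cleanup task.
def build_task_list(input_splits, num_reduces):
    tasks = ["setup"]
    tasks += ["map-%d" % i for i in range(len(input_splits))]
    tasks += ["reduce-%d" % i for i in range(num_reduces)]
    tasks.append("cleanup")
    return tasks
```

For example, three input splits and two reduce tasks yield seven tasks in total.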
Task Assignment
- TaskTrackers send heartbeat messages to the JobTracker: a heartbeat signals that the TaskTracker is still alive, and its payload carries information used for task assignment
- The JobTracker chooses a job according to its scheduling algorithm
Task Assignment
- TaskTrackers have a fixed number of map and reduce slots
- If there is a free map slot, a map task is chosen; otherwise a reduce task is chosen
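The slot preference can be sketched as a small decision function; `choose_task` and its parameters are hypothetical names, not Hadoop API:

```python
# Illustrative slot-based assignment: a free map slot is filled
# before a reduce task is considered.
def choose_task(free_map_slots, free_reduce_slots,
                pending_maps, pending_reduces):
    if free_map_slots > 0 and pending_maps:
        return pending_maps.pop(0)
    if free_reduce_slots > 0 and pending_reduces:
        return pending_reduces.pop(0)
    return None  # nothing to assign in this heartbeat response
```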
Task Execution
- TaskRunner launches a new JVM for each task (it is possible to reuse JVMs for later tasks)
- Task progress is reported every few seconds
Job Completion
- When the JobTracker is notified by a TaskTracker that the last task of a job is complete, the status of the job is changed to successful
- The client polls the job and eventually notices that the job has finished
YARN
- Classic MapReduce hits scalability limits at around 4,000 nodes and above
- YARN splits the responsibilities of the JobTracker, which handles both job scheduling and task progress monitoring, into separate entities
YARN
- The ResourceManager manages resources across the cluster
- An application master manages the lifecycle of an application; each MapReduce job has a dedicated application master, which runs for the duration of the application
YARN
- YARN is more general than Classic MapReduce: Classic MapReduce is just one type of YARN application
- The same cluster can run different YARN applications
YARN Entities
- The Client
- The ResourceManager
- Node managers, which launch and monitor containers
- Application masters, which coordinate tasks
- Distributed filesystem
Job Submission
- Similar to Classic MapReduce
- A job ID is retrieved from the ResourceManager
- The job is submitted to the ResourceManager
Job Initialization
- The ResourceManager allocates a container and launches the application master inside it
- The application master initializes the job as in Classic MapReduce
Job Initialization
- The application master decides how to run the tasks of the job
- If the job is small, the application master may choose to run the tasks in the same JVM as itself; larger tasks are executed in their own containers and JVMs
- The choice is made by judging the overhead of creating new JVMs
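In Hadoop this small-job decision is known as running an "uber" job. A sketch of the check is below; the thresholds mirror Hadoop's defaults (at most 9 maps, at most 1 reduce, input smaller than one HDFS block), but treat the exact numbers and the function name as assumptions:

```python
# Sketch of the small-job ("uber") decision: small jobs run in the
# application master's own JVM to avoid container and JVM startup
# overhead. Threshold values are assumptions, not Hadoop API.
def run_in_appmaster_jvm(num_maps, num_reduces, input_bytes,
                         max_maps=9, max_reduces=1,
                         block_size=128 * 1024 * 1024):
    return (num_maps <= max_maps
            and num_reduces <= max_reduces
            and input_bytes < block_size)
```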
Task Execution
- Once a task is assigned to a container by the ResourceManager's scheduler, the application master starts the container
- Progress is reported to the application master, which is polled by the client
Failures
Failures in Classic MapReduce:
- Failure of a task
- Failure of a TaskTracker
- Failure of the JobTracker
Task Failure
- If a map or reduce task throws an exception, the TaskTracker marks the task as failed
- If the JVM suddenly exits, the TaskTracker marks the task as failed
- Hanging tasks stop sending progress updates: the TaskTracker kills the JVM and the task is marked as failed
Task Failure
- The JobTracker reschedules a failed task, on a different TaskTracker if possible
- If the task has failed 4 times, it is not retried again and the whole job fails
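The retry policy can be sketched as a simple loop; `run_with_retries` is an illustrative stand-in, not how the JobTracker is actually implemented:

```python
# Sketch of the retry policy: a failed task is rescheduled until it
# has failed max_attempts times, after which the whole job fails.
def run_with_retries(task, max_attempts=4):
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)  # rescheduled, ideally on another node
        except Exception:
            continue
    raise RuntimeError("task failed %d times; job fails" % max_attempts)
```

A task that succeeds on its third attempt completes normally; one that fails all four attempts fails the job.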
TaskTracker Failure
- If a TaskTracker crashes or runs very slowly, the JobTracker notices this from the missing heartbeats
- Successful map tasks are rescheduled to a different TaskTracker if they belong to an incomplete job
- All tasks in progress are also rescheduled
TaskTracker Failure
- A TaskTracker may be blacklisted by the JobTracker
- If 4 or more tasks from the same job have failed on a particular TaskTracker, the JobTracker records this as a fault
- When the minimum threshold of faults is exceeded, the TaskTracker is blacklisted
- Faults expire over time (one per day), so TaskTrackers get a chance to run jobs again
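A minimal sketch of the fault bookkeeping behind blacklisting; the `FaultTracker` class and its default threshold are assumptions for illustration, with faults expiring one at a time as in the text:

```python
# Illustrative per-TaskTracker fault bookkeeping (not Hadoop API).
class FaultTracker:
    def __init__(self, threshold=4):
        self.threshold = threshold
        self.faults = 0

    def record_fault(self):
        # 4 or more failed tasks of one job on this TaskTracker
        # count as one fault
        self.faults += 1

    def expire_fault(self):
        # called once per day: faults expire over time
        self.faults = max(0, self.faults - 1)

    def is_blacklisted(self):
        return self.faults >= self.threshold
```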
JobTracker Failure
- The JobTracker is a single point of failure in Classic MapReduce
- All running jobs fail
- After a restart, all jobs must be resubmitted
Failures in YARN
- Task failure, handled the same as in Classic MapReduce
- Application master failure
- ResourceManager failure
Application Master Failure
- The ResourceManager notices the failed application master
- The ResourceManager starts a new instance of the application master in a new container
- The client experiences a timeout and gets the new address of the application master from the ResourceManager
ResourceManager Failure
- The ResourceManager has a checkpointing mechanism that saves its state to persistent storage
- After a crash, an administrator brings up a new ResourceManager, which recovers the saved state
Speculative Execution
- If Hadoop detects that a task is running slower than normal, another equivalent backup task is launched
- Whichever attempt completes first is used; the other is killed immediately
- This is an optimization, not a feature to make jobs run more reliably
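The race between the original attempt and its backup can be sketched with two thread-pool futures. This is a toy model, not Hadoop code: Python cannot kill a running thread, so `cancel()` only stands in for Hadoop killing the slower attempt.

```python
import concurrent.futures
import time

# Sketch of speculative execution: run two equivalent attempts and
# take whichever finishes first, cancelling the slower one.
def run_speculatively(attempt_a, attempt_b):
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(attempt_a), pool.submit(attempt_b)]
        done, not_done = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        for f in not_done:
            f.cancel()  # stands in for killing the slower attempt
        return next(iter(done)).result()
```

Given a slow worker and a fast equivalent one, the fast result wins the race.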