If a TaskTracker is not performing properly, it can be blacklisted so that no new tasks are scheduled to run on it. There are two types of TaskTracker blacklisting:
- Per-job blacklisting, which prevents scheduling new tasks from a particular job
- Cluster-wide blacklisting, which prevents scheduling new tasks from all jobs
Per-Job Blacklisting
The configuration value mapred.max.tracker.failures in mapred-site.xml specifies a number of task failures in a specific job after which the TaskTracker is blacklisted for that job. The TaskTracker can still accept tasks from other jobs, as long as it is not blacklisted cluster-wide (see below).
A single job can blacklist at most 25% of the TaskTrackers in the cluster.
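The per-job rule can be pictured with a small Python model. This is only an illustrative sketch, not the JobTracker's actual code: the class and method names are hypothetical, and it assumes a TaskTracker is blacklisted once its failure count for the job reaches the mapred.max.tracker.failures value and that blacklisting simply stops once the 25% limit is reached.

```python
from collections import defaultdict


class JobBlacklist:
    """Toy model of per-job TaskTracker blacklisting for one job (hypothetical)."""

    def __init__(self, cluster_size, max_tracker_failures, max_fraction=0.25):
        self.cluster_size = cluster_size
        # Stands in for mapred.max.tracker.failures.
        self.max_tracker_failures = max_tracker_failures
        # A job can blacklist at most this fraction of the cluster.
        self.max_fraction = max_fraction
        self.failures = defaultdict(int)
        self.blacklisted = set()

    def record_failure(self, tracker):
        """Count one task failure of this job on the given TaskTracker."""
        self.failures[tracker] += 1
        # Assumption: blacklist when the count reaches the threshold.
        reached = self.failures[tracker] >= self.max_tracker_failures
        # Assumption: trackers beyond the 25% limit are simply not blacklisted.
        has_room = len(self.blacklisted) + 1 <= self.cluster_size * self.max_fraction
        if reached and has_room:
            self.blacklisted.add(tracker)

    def can_schedule_on(self, tracker):
        """True if this job may still place new tasks on the TaskTracker."""
        return tracker not in self.blacklisted
```

With eight TaskTrackers and an illustrative threshold of 4, the model blacklists at most two TaskTrackers for the job, however many others keep failing.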
Cluster-Wide Blacklisting
A TaskTracker can be blacklisted cluster-wide for any of the following reasons:
- The number of blacklists from successful jobs (the fault count) exceeds mapred.max.tracker.blacklists
- The TaskTracker has been manually blacklisted using hadoop job -blacklist-tracker <host>
- The status of the TaskTracker (as reported by a user-provided health-check script) is not healthy
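The three conditions above can be summarized as a short Python check. This is a sketch for illustration only: the function and parameter names are hypothetical, and the inputs (fault count, manual flag, health status) are assumed to be supplied by the caller rather than read from a real cluster.

```python
def cluster_blacklist_reasons(fault_count, max_tracker_blacklists,
                              manually_blacklisted=False, healthy=True):
    """Return the reasons (possibly none) a TaskTracker is blacklisted cluster-wide."""
    reasons = []
    # The fault count must exceed mapred.max.tracker.blacklists, not just equal it.
    if fault_count > max_tracker_blacklists:
        reasons.append("fault count")
    # Set by an administrator with hadoop job -blacklist-tracker.
    if manually_blacklisted:
        reasons.append("manual")
    # Reported by the user-provided health-check script.
    if not healthy:
        reasons.append("unhealthy")
    return reasons
```

An empty list means the TaskTracker is not blacklisted cluster-wide. More than one reason can apply at the same time.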
If a TaskTracker is blacklisted, any currently running tasks are allowed to finish, but no further tasks are scheduled. If a TaskTracker has been blacklisted due to mapred.max.tracker.blacklists or using the hadoop job -blacklist-tracker <host> command, un-blacklisting requires a TaskTracker restart.
Only 50% of the TaskTrackers in a cluster can be blacklisted at any one time.
After 24 hours, the TaskTracker is automatically removed from the blacklist and can accept tasks again.
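The 50% limit and the 24-hour expiry amount to simple arithmetic, sketched here in Python. The names are hypothetical, and the sketch assumes the limit is checked before each new blacklisting and that the 24 hours are measured from the moment the TaskTracker was blacklisted. It does not model which blacklist reasons the expiry applies to.

```python
from datetime import datetime, timedelta

BLACKLIST_LIFETIME = timedelta(hours=24)   # automatic removal after 24 hours
MAX_BLACKLISTED_FRACTION = 0.5             # at most 50% of the cluster


def can_blacklist_another(currently_blacklisted, cluster_size):
    """True if one more TaskTracker fits under the 50% limit."""
    return currently_blacklisted + 1 <= cluster_size * MAX_BLACKLISTED_FRACTION


def blacklist_expired(blacklisted_at, now):
    """True once 24 hours have passed since the TaskTracker was blacklisted."""
    return now - blacklisted_at >= BLACKLIST_LIFETIME
```

For example, in a 10-node cluster with 4 TaskTrackers already blacklisted, a fifth can be added, but a sixth cannot.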
Blacklisting a TaskTracker Manually
To blacklist a TaskTracker manually, run the following command as the administrative user:
hadoop job -blacklist-tracker <hostname>
Manually blacklisting a TaskTracker prevents additional tasks from being scheduled on the TaskTracker. Any currently running tasks are allowed to finish.
Un-blacklisting a TaskTracker Manually
If a TaskTracker is blacklisted for a specific job, you can un-blacklist it for that job by running the following command as the administrative user:
hadoop job -unblacklist <jobid> <hostname>
If a TaskTracker has been blacklisted cluster-wide due to mapred.max.tracker.blacklists or using the hadoop job -blacklist-tracker <host> command, un-blacklisting requires a TaskTracker restart. If a TaskTracker has been blacklisted cluster-wide because of an unhealthy status, correct the reported problem and run the health-check script again. When the script reports a healthy status, the TaskTracker is un-blacklisted.
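As an illustration of what a user-provided health-check script might look like, here is a minimal Python sketch that checks free disk space. It assumes the Apache Hadoop node health script convention, in which a line on standard output beginning with ERROR marks the node as unhealthy. Verify that your distribution follows the same convention. The function name, the 10% threshold, and the checked path are all hypothetical.

```python
import shutil


def health_line(free_fraction, min_free_fraction=0.10):
    """Return the status line for a given fraction of free disk space."""
    if free_fraction < min_free_fraction:
        # Assumption: a line starting with ERROR marks the node unhealthy.
        return "ERROR low disk space: {:.0%} free".format(free_fraction)
    return "OK"


if __name__ == "__main__":
    # Hypothetical check: free space on the root filesystem.
    usage = shutil.disk_usage("/")
    print(health_line(usage.free / usage.total))
```

Once the underlying problem is fixed, the next run prints OK instead of an ERROR line, which is the healthy status described above.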