Label-Based Scheduling for Hadoop Clusters

Hadoop and distributed data processing are being increasingly adopted. Everybody is in the race to process more data faster and at the same time allow multiple applications running within the same cluster. Not surprisingly, new challenges and growing pains are emerging, especially in multi-tenant production environments.

More and more cluster administrators are getting hit with a few distinct problems:

  1. My hardware is heterogeneous (availability of different CPU, memory and disk resources)
  2. My network is heterogeneous
  3. I have different types of applications running with different workloads during different times of the day

How do I solve those issues?

A couple of years ago, MapR came up with a new feature that can help solve a lot of those issues. That feature is called “Label-based scheduling.”

Using this feature, the administrator of a cluster can label all the nodes (can use regex or glob to name the nodes) in the cluster with multiple labels. By specifying label expressions with simple logical semantics, users could then submit jobs to run only on the nodes that satisfy the expression.

Format of the expressions
logical operators : ||, &&, ! normal expression like (a || b || !c)

Let’s look at the concrete example. Here is a Hadoop Cluster “A”:

Cluster administrator labels all the nodes in the cluster according to their specifications:

  • nodes with large memory - “red”
  • nodes with high CPU - “blue”
  • nodes with high network bandwidth - “green”
  • nodes on rackA - “rackA”
  • nodes on rackB - “rackB”

Cluster administrator has also defined multiple queues to divide resources among different organizations, so each org has its share of resources to use.

In addition, cluster is heterogeneous in different aspects:

  • it has different types of hardware (some nodes have more memory, some more CPU, some better network bandwidth)
  • data in distributed file system is broken into topologies that are tied to particular set of nodes/racks
  • other differences

If we want all the applications submitted to Queue 1 to run on “rackA_red” or “rackA_blue” nodes, cluster admin will just define label expression while configuring Queue 1 as: “(red || blue) && rackA”.

In addition, cluster admin configures a Queue Policy that defines scheduling behavior in case submitted application defines its own label expression. Admin can define such that:

  • Queue Label Expression always wins (job runs on the nodes that satisfy Queue Label Expression)
  • Job Label Expression always wins (job runs only on nodes that satisfy job label expression)
  • There is an AND condition between the above (job runs on the nodes that satisfy Queue AND Job Label Expressions)
  • There is an OR condition between the above (job runs on the nodes that satisfy either Queue or Job Label Expressions)

PREFER_QUEUE("PREFER_QUEUE"), // Queue label expression always wins
PREFER_APP("PREFER_APP"), // App label expression always wins
AND("AND"),  // Use && on Queue and App label expressions  ← default
OR("OR");  // Use || on Queue and App label expressions

All this can be setup in few simple steps:

  1. Create a file with node to labels mapping perfnode.* red, blue, green scale.* rackA , red perfnode30 rackB, blue

  2. Copy this file to maprfs. hadoop fs -copyFromLocal <local_file> <path on maprfs>

  3. Add following properties to mapred-site.xml <property> <name>mapreduce.jobtracker.node.labels.file</name> <value>/labels.file</value> <description> Location of the file that contain node labels on DFS </description> </property>

    If there is no need to set any labels on the queues, it would conclude the setup.

  4. If setting labels expressions on the queues - few more properties to set up per queue that will have label expression associated with them (this is old style queue setup - new style is shown below in “YARN section”)

    For default queue
      <value>red || blue</value>

To submit a job, you would specify label expression either on command line:
Example for wordcount job:
hadoop jar hadoop-mapreduce-examples*.jar wordcount -Dmapreduce.job.queuename=Marketing.CustomerDataAnalysis.FastLanes -Dmapreduce.job.label=”green && rackA” input output

Or in mapred-site.xml with the same property: mapreduce.job.label

There are two new commands to show and refresh labels:

hadoop job -showlabels (dumps label information of all active nodes)
hadoop job -refreshlabels (Tells JobTracker to update label information - reread the file with labels)

How does Label Based Scheduling Fit into YARN

Since the MapR 4.0.1 release, label-based scheduling is supported for virtually any YARN application, as long as that application has a way to set labels on resource requests from Application Master to ResourceManager.

*MRAppMaster is fully supported out of the box.

Let’s see what properties and where you need to be setup to work with YARN:

1. Required - if using label-based scheduling:

   <description>Location of the file that contain node labels on FS

2. Optional (default to 2 min.)

   <name>node.labels.monitor.interval </name>
   <description> Interval to check labels file for updates 

Labels setup for queues depends on the scheduler

Capacity Scheduler

Capacity Scheduler configuration is located in capacity-scheduler.xml

Following are new properties that can be defined for a particular queue to support Label Expression and/or Label Policy (Example is given to set Label Expression and Queue Label Policy for Queue “alpha”)

     Queue Label Policy

      Queue Label Expression

In case Queue “alpha” has children queues that do not have labels set on them, label and label policy from parent queue will apply. In case children queues have their own labels and/or label policies, they will be used.

Below is an example for “alpha.a1” label configuration:

      Queue Label Policy

      Queue Label Experession

Fair Share Scheduler

Fair Share Scheduler can be configured in any xml file as long as file name is defined in yarn-site.xml
To configure label expressions and/or label policies additional properties - label, labelPolicy - are used. Below is an example of a Queues configuration in FairScheduler where label and labelPolicy can be defined on any level - the closest will be used.

<queue name="Marketing">

    <queue name="WebsiteLogsETL">

    <queue name="CustomerDataAnalysis">
      <queue name="FastLanes">
      <queue name="Regular">


Note: While making changes to capacity-scheduler.xml or fair share scheduler configuration xml file, use xml encoding for following special characters.

Admin commands to show and refresh labels for any YARN applications are: yarn rmadmin -showlabels yarn rmadmin -refreshlabels

Community adoption

The Hadoop community has certainly felt that Labels is a really useful feature, and YARN-796 was open in April to support the use of labels.

We have submitted our Engineering Design Proposal as well as code patch based on MapR experience with labels during the last two years.

We are glad that the community took our proposal and the majority of the code patch into consideration while completing work on YARN-796.

It proves once again the usefulness of label-based scheduling , the MapR foresight in building this feature two years ago, and the flexibility in making it available for YARN now.


Streaming Data Architecture:

New Designs Using Apache Kafka and MapR Streams




Download for free