Editor's note: This post was a result of the combined efforts of Swapnil Daingade, Sarjeet Singh, and Mitra Kaseebhotla.
We recently had the pleasure of participating in Docker Global Hack #3. We are happy to announce that we won first place in the “freestyle” category, which meant that our solution had to use features from the latest Docker releases, including Engine and other Docker OSS projects. In this blog post, we’ll go into detail about our solution for providing YARN clusters on shared infrastructure which have the same security and performance.
Businesses constantly strive for higher utilization and better returns from their data center investments. This need is causing businesses to explore consolidating their infrastructure into one giant cluster instead of several siloed ones.
Businesses currently have several Hadoop/YARN clusters that need to be separate for various reasons (security, performance, etc.). Is it possible to provide YARN clusters on shared infrastructure which have the same security and performance?
A recent project called Apache Myriad (incubating) attempts to solve this problem. It uses Apache Mesos as the underlying resource scheduler. It currently supports running a single YARN cluster on top of a Mesos cluster. Businesses can share resources between YARN and other applications such as web servers, Spark, etc. As active contributors to Myriad, we are exploring options to provide multi-tenancy and tenant isolation in Myriad. We have tried to use the experimental networking plugin for creating overlay networks for network isolation. With containers, compute isolation is free. For storage, we looked at multiple options: 1) Providing each cluster their own storage (loopback or raw devices attached to containers, and 2) Providing isolation via file system permissions to a single shared instance of DFS.
YARN consists of a central daemon called ResourceManager (RM). Applications running on YARN can request resources from the RM in the form of YARN containers (blocks of CPU, mem). Each node in the cluster has a daemon that manages resources on that node called the NodeManager (NM).
Mesos is a lightweight resource scheduler. Applications running on Mesos are called frameworks. Unlike YARN, it is Mesos that makes resource (CPU, mem) offers to frameworks. If a framework can utilize an offer, it accepts the offer and launches a Mesos task (a process) on the node on which the resource was offered. The Mesos task is not complete until the process terminates.
In Myriad, the RM is launched using a meta framework like Marathon (open source). Marathon accepts an offer from Mesos and launches RM as a Mesos task. Myriad has code running inside the RM process that registers with Mesos as another framework. We call this code Myriad Scheduler. Note that since RM has now registered as a Mesos framework, it can launch NMs after accepting offers from Mesos.
Myriad currently does not support Docker containers for RM and NM. It also does not support multi-tenancy or isolation from other non-YARN applications such as web servers. Following are details of what we tried as part of the hackathon. Mesos is deployed on the physical cluster. RMs and NMs are in containers. RMs and NMs belonging to the same YARN cluster can talk to each other. Those that belong to different YARN clusters cannot talk to each other.
Network Configuration and Isolation
The diagram above shows how we configured networking on our physical cluster. For demonstration purposes, we assume that there are three YARN clusters running on our physical cluster. We create an overlay network per YARN cluster that spans all nodes of the physical cluster. We use Docker’s overlay network plugin. However, membership of an overlay network does not allow a container to talk to the underlay network. This is a problem for us. If RM is to run inside a Docker container, it has to have the ability to register with Mesos as a Mesos framework in order to launch NMs (The Mesos infrastructure is running on the underlay network).
Also, we would like to support a configuration where there is only one instance of Distributed File System (HDFS, MapR-FS, etc.) across multiple YARN clusters. The YARN clusters using this single DFS instance are isolated by file system permissions. Hence, in order to allow Docker containers to access the DFS, we created a bridge per cluster on every host (just like the docker0 bridge that is present after Docker is installed.) Just like on the docker0 bridge, NAT is configured so containers can access the underlay network easily, but cannot access containers belonging to another cluster.
This configuration allows RM (which also acts as a Mesos framework) to register with Mesos running on the underlay network. Figure 2 (below) shows what this network configuration looks when seen on a three-node physical cluster. See scripts
OnDemandYARNClusters/scripts/docker-create-worker.sh to see how this is configured.
When RM (acting as a Mesos framework) decides to launch a NM container, it accepts an offer from Mesos and asks Mesos to run the
OnDemandYARNClusters/scripts/docker_deploy.sh script (present on each physical server).
This script takes three parameters:
- The Mesos task id of the NM container being launched
- The overlay network to connect to. (Remember that all overlay networks are available on all physical hosts).
- The Docker image to run for the NM.
This script configures storage and networking (connecting the container to the right bridge so it can talk to the underlay network in addition to the overlay network).
We also modified the Myriad code so RM (acting as a Mesos framework) could generate the command for NM Docker container creation. A patch for myriad (phase1 branch) is provided here: OnDemandYARNClusters/myriad-docker-hack-day-3.patch/p>
One can also build Myriad with this patch by following these steps:
git clone https://github.com/sdaingade/myriad.git
git checkout docker-hack-day
./gradlew clean; ./gradlew build
Finally, Docker files for the MapR Distribution including Hadoop are also provided under:
Storage: We tried to use loopback devices exposed to Docker containers for running the DFS (i.e., one DFS instance per YARN cluster). In the future, we plan to try have a single instance of the DFS shared across multiple YARN clusters, with isolation provided by using file system permissions.
Status: We were able to execute all the pieces in isolation (e.g., networking, RM and NM creation, storage configuration, etc.). We did run out of time during integration, but we are very confident that with a little more time, we’ll be able to get this working end to end.
In this blog post, you learned about the solution we came up with at Docker Global Hack Day for providing YARN clusters on shared infrastructure which have the same security and performance.
If you have any questions about our solution, please ask them in the comments section below.