Project Myriad - No Hadoop is an Island

At MapR, we live and breathe Apache Hadoop, and we use it to help customers solve difficult business problems that would otherwise be intractable.

Last year, about six months after shipping our first version of Hadoop 2.x with YARN, multiple customers asked us to consider working with Apache Mesos. Our initial response was curiosity. Why were multiple customers asking us to work with Mesos when we had just released YARN? YARN is, after all, a Hadoop cluster resource manager. At that time we were also putting effort into making YARN better: adding label-based scheduling, treating disk as a resource, and generally improving YARN high availability.

Upon further inquiry, it became clear that the need was not really about Hadoop, but about what lies beyond Hadoop. There is a need for a global resource manager that allocates resources across entire data centers (or even beyond that).

This is when Santosh Marella (a MapR engineer) got involved. He started conversations with Adam Bordelon from Mesosphere, and hiked up to the Mesosphere San Francisco office for a hackathon. Within a week, they had a working prototype to make Mesos and YARN co-operate and integrate with each other. The idea was to use Mesos as a data center-wide resource manager and continue to use YARN for Hadoop resource management.

Independently, about a week earlier, Mohit Soni (an eBay engineer) had presented a prototype solution for solving the same problem at MesosCon in Chicago.

It was quickly apparent that we were all working on the same problem. Adam connected Santosh and Mohit to explore the possibility of a collaboration. After a detailed technical discussion, they decided it was better to combine ideas from both solutions and unite the efforts.

Myriad was born out of this (with naming credits going to Mohit).

Benefits of Myriad

Once you allow Mesos and YARN to talk to each other, a new set of possibilities opens up.

Let’s focus on the benefits to the Hadoop ecosystem first:

  • Run Hadoop elastically across your data center resources. This allows Hadoop to opportunistically use other data center resources that may be lightly loaded for long periods of time.
  • Run multiple YARN sub-clusters (virtual Hadoop compute clusters) on the same DFS, sharing data services among them. This is extremely useful for multi-tenant shared services clusters, and it lets you run dev, staging, and production compute clusters on the same physical infrastructure.
  • Scale up and scale down YARN sub-clusters.
    • Increase or decrease the resources allocated to YARN sub-clusters on demand.
  • No rewrite of YARN apps is required to run with Myriad.
  • Possibly even run different versions of YARN on the same system.

Beyond Hadoop, there are several benefits as well:

  • Run many types of compute frameworks side by side. This includes frameworks that are supported on YARN as well as those that are already supported on Mesos.
  • Logically split a physical cluster into multiple virtual clusters.
    • Use containers for isolation.
  • Move resources between virtual clusters as needs evolve.
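To make the last few points concrete, here is a toy sketch of splitting one physical pool into virtual clusters and moving capacity between them. It is purely illustrative: the `PhysicalCluster` class, its methods, and the CPU counts are hypothetical names and numbers invented for this example, and none of it reflects the actual Myriad, Mesos, or YARN APIs.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PhysicalCluster:
    """Hypothetical model of a shared pool of CPUs (not a real Myriad/Mesos API)."""

    total_cpus: int
    # Maps virtual cluster name -> CPUs currently assigned to it.
    allocations: Dict[str, int] = field(default_factory=dict)

    def free_cpus(self) -> int:
        """CPUs not yet assigned to any virtual cluster."""
        return self.total_cpus - sum(self.allocations.values())

    def create_virtual_cluster(self, name: str, cpus: int) -> None:
        """Carve a new virtual cluster out of the unassigned capacity."""
        if name in self.allocations:
            raise ValueError(f"virtual cluster {name!r} already exists")
        if cpus > self.free_cpus():
            raise ValueError("not enough free CPUs in the physical cluster")
        self.allocations[name] = cpus

    def move(self, source: str, target: str, cpus: int) -> None:
        """Shift capacity from one virtual cluster to another as needs evolve."""
        if cpus > self.allocations[source]:
            raise ValueError(f"{source!r} does not hold {cpus} CPUs")
        self.allocations[source] -= cpus
        self.allocations[target] += cpus


pool = PhysicalCluster(total_cpus=100)
pool.create_virtual_cluster("dev", 20)
pool.create_virtual_cluster("prod", 60)
pool.move("dev", "prod", 10)  # production needs more capacity for a while
print(pool.allocations, pool.free_cpus())
```

In a real deployment the bookkeeping would be done by the resource managers themselves, and the unit of allocation would include memory and other resources rather than CPUs alone.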

Current Status

A diverse team of engineers has collectively decided to push this effort even further. We welcome any and all input, as well as participation. The early code base is available on GitHub, but it is not production-ready.

To Learn More

To learn more about YARN and Mesos:

1) Read/watch our YARN and Mesos Whiteboard Walkthrough

2) Attend one of the Strata presentations or visit our MapR booth:

YARN vs. MESOS: Can’t We All Just Get Along?

  • 2:20pm–3:00pm Friday, 02/20/2015
  • Hadoop & Beyond
  • Location: 230 C
  • Speaker: Ted Dunning

Maintaining Low Latency While Maximizing Throughput on a Single Cluster

  • 4:50pm–5:30pm Thursday, 02/19/2015
  • Hadoop Platform
  • Location: 210 B/F
  • Speaker: Yuliya Feldman

3) Stop by the MapR Myriad Demo Pod at Strata to meet the technical brains behind the project and discuss your use cases.

