Apache Myriad™

Apache Myriad is an open-source Hadoop project that lets YARN applications run side by side with Apache Mesos frameworks. It does this by registering YARN as a Mesos framework and requesting Mesos resources on which to launch YARN applications. This allows YARN applications to run on top of a Mesos cluster without any modification.
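The resource-request flow described above can be pictured with a small toy model. This is only a sketch under stated assumptions: it is not the Myriad or Mesos API, and the names Offer, ToyYarnFramework, and on_offers are hypothetical. It assumes the usual Mesos pattern in which free resources are offered to a registered framework, and it treats the YARN NodeManager as the unit that gets launched on an accepted offer.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one node (hypothetical, simplified offer)."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class ToyYarnFramework:
    """Stands in for YARN registered as a framework; not the Myriad API."""
    nm_cpus: float      # CPUs assumed for one NodeManager
    nm_mem_mb: int      # memory assumed for one NodeManager
    wanted: int         # NodeManagers still to be launched
    launched: list = field(default_factory=list)

    def on_offers(self, offers):
        """Accept offers large enough for a NodeManager; return the rest."""
        declined = []
        for offer in offers:
            fits = offer.cpus >= self.nm_cpus and offer.mem_mb >= self.nm_mem_mb
            if self.wanted > 0 and fits:
                self.launched.append(offer.node)  # NodeManager would start here
                self.wanted -= 1
            else:
                declined.append(offer)  # left for other frameworks to use
        return declined


# Illustrative cluster state; node names and sizes are made up.
offers = [
    Offer("node-a", 8, 32768),
    Offer("node-b", 2, 4096),
    Offer("node-c", 16, 65536),
]
yarn = ToyYarnFramework(nm_cpus=4, nm_mem_mb=8192, wanted=2)
declined = yarn.on_offers(offers)

print("NodeManagers on:", yarn.launched)
print("Declined offers:", [o.node for o in declined])
```

In this model, declined offers remain available to other frameworks on the same cluster, which is the sense in which YARN and other workloads run side by side.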

Why Use Myriad?

Myriad is useful for organizations that use Hadoop with Docker and/or Apache Mesos and want a converged environment for their enterprise applications and analytics. Because Hadoop YARN applications run on top of Apache Mesos clusters, all resources, including data, can be shared across different workloads, which improves time-to-value.

YARN, Docker, and Mesos are key components of the Zeta Architecture, a modern enterprise data architecture that simplifies business processes and defines a fast and scalable way to integrate data into your business.

Benefits

With Myriad, you get the benefits of a big data virtualization environment, including:

  • Lower TCO through elasticity. Analytics tasks can take more cluster resources opportunistically when other applications don’t need them, and vice versa. This level of virtualization leads to significantly improved resource utilization.
  • Reduced data movement. Because tasks share a common storage layer across your data center, you avoid the extra processes that move data between independent, task-specific clusters. This simplifies your environment and reduces errors.

The advantages of the MapR Platform for big data virtualization go beyond integration with Hadoop YARN. A containerized environment that uses Docker also benefits from the MapR Platform in the following ways:

  • Persistent storage for your Docker containers. The MapR file system (MapR-FS) lets you run any type of application in a container, including those that need to save state to disk in a read-write manner.
  • Unlimited redeployments. Containers are frequently redeployed across many different nodes in your data center. With MapR, your containers can write an unlimited number of files that redeployed containers can quickly and easily access again, thanks to the scalable, distributed architecture of MapR-FS. This significantly reduces administrative overhead compared to using a non-distributed file system, or a file system like HDFS that has relatively small file count limits.