Large clusters that store enterprise big data for the long run, while exposing that data to a variety of workloads at the same time, are turning out to be the preferred deployment option for Hadoop. This model makes it easy for businesses to avoid data silos and progressively build a full suite of big data applications over time.
Apache YARN is a great enabler of such a model. As it progresses towards enterprise readiness, YARN promises dynamic resource utilization on Hadoop, giving users the choice to run a variety of applications without worrying whether the system can handle the workloads. The YARN advantage becomes much more apparent with real-time operational applications on Hadoop, one of the fastest-growing customer use cases at MapR.
Hadoop ecosystem components that enable real-time applications are growing rapidly. For instance, real-time stream processing on Hadoop is being adopted widely and can be deployed today through several options, including Apache Spark Streaming and Apache Storm. These real-time technologies are now enabled on YARN on MapR. For users who are not yet ready for YARN, we continue to support the pre-YARN model and provide the necessary assistance to port their applications to YARN.
Furthermore, DataTorrent, a commercial partner of ours that specializes in real-time stream processing for big data, recently certified its DataTorrent RTS offering on the YARN package on MapR.
Beyond the real-time components, MapR supports the rest of the Hadoop stack on YARN, including commonly deployed applications such as MapReduce and HBase. In the near future, as customers continue to deploy centralized data hubs and lakes, we fully expect a good mix of YARN and non-YARN applications to run on Hadoop, and we look forward to supporting users in that mode.
Do share your thoughts and ideas about YARN. If you would like to share a YARN app or utility that you have built, please visit our new App Gallery for Hadoop.