MapR Embraces Ecosystem Projects

Apache Hadoop has inspired a rich ecosystem of projects and products that benefit everyone interested in Big Data. With so many options available, I am frequently asked whether MapR supports a specific project (Impala, Knox, Storm and Falcon are common examples), so I thought it would make sense to provide an overview of how this works for MapR.

There are hundreds of open source and commercial projects that relate to Hadoop in one way or another. These projects can be divided into two categories:

1. Included in the MapR distribution. As a Hadoop distribution, our product includes over a dozen projects, such as Hive, Pig, Oozie, Flume and Sqoop. These projects are part of our product. We test, harden and support these projects. If there's a bug in Hive, for example, we'll make the necessary code changes to Hive to address that bug. Over time, based on customer demand and the dynamics of the Hadoop ecosystem, we add projects to our distribution. For example, we are adding Hue, Impala and YARN to the distribution.

2. Run on (or work with) our distribution. There are many projects that run on or connect to Hadoop in one way or another. For example, customers can use Spark, Shark, Storm, Pentaho, Datameer, Platfora and hundreds of other projects and products with Hadoop. Because MapR includes Apache Hadoop, any project that works with Hadoop works with the MapR distribution. It's also worth noting that any application that can read and write files will work with MapR thanks to our POSIX interface, even if that application wasn't designed for Hadoop.

Note that the projects in both categories run better with MapR. They run faster due to the performance advantages of our data platform. Their data is protected thanks to our data protection and disaster recovery capabilities. And in some cases, we make it possible to use these projects in ways that are not otherwise possible. For example, Storm can feed directly from our data platform because we support writing to and reading from a file at the same time, whereas HDFS requires that a file be closed before it can be read.
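To make the "write and read at the same time" idea concrete, here is a minimal sketch in Python using nothing but ordinary file I/O. It is not Storm code and it does not use any MapR API: the temporary directory simply stands in for a path on a POSIX-mounted cluster (an assumption for illustration), and the function names `write_events`, `follow` and `demo` are hypothetical. One thread keeps a file open and appends lines to it while another reads those lines as they arrive.

```python
import os
import tempfile
import threading
import time


def write_events(path, count, delay=0.01):
    """Append `count` lines to `path`, keeping the file open the whole time."""
    with open(path, "a") as f:
        for i in range(count):
            f.write(f"event-{i}\n")
            f.flush()  # make each line visible to readers right away
            time.sleep(delay)


def follow(path, expected, timeout=5.0):
    """Read complete lines from `path` as they appear, without waiting for a close."""
    lines = []
    buffer = ""
    deadline = time.monotonic() + timeout
    with open(path, "r") as f:
        while len(lines) < expected and time.monotonic() < deadline:
            chunk = f.readline()
            if not chunk:
                time.sleep(0.01)  # nothing new yet; poll again
                continue
            buffer += chunk
            if buffer.endswith("\n"):  # only emit fully written lines
                lines.append(buffer.rstrip("\n"))
                buffer = ""
    return lines


def demo():
    # The temporary directory is a stand-in for a directory on a POSIX mount.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "events.log")
        open(path, "w").close()  # create the file so the reader can open it
        writer = threading.Thread(target=write_events, args=(path, 5))
        writer.start()
        received = follow(path, expected=5)
        writer.join()
    return received


if __name__ == "__main__":
    print(demo())
```

A consumer such as a Storm spout could, in principle, follow the same read-as-you-go pattern against a file that a producer is still writing, which is the behavior described above.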

If you’re interested in more information about the Hadoop ecosystem projects that are included in our distribution, check out the release notes for our latest version.
