Insights from Strata Big Data Conference

This was an exciting week at the Strata Big Data conference. Our CEO, John Schroeder, delivered a short keynote, and Ted Dunning presented on moving Beyond Hadoop, including a glimpse of real-time streaming with MapR and Storm integration. I also presented an overview of Apache Drill in a standing-room-only session.

There have also been interesting news announcements, including the MinuteSort world record set with MapR on the Google Compute Engine and Hadoop distribution announcements from EMC and Intel. We’re happy to see the new Hadoop announcements because they further validate the importance of the Hadoop market and put the focus on differentiation across the various distributions. I think it’s interesting that Cloudera and Hortonworks quickly turned to their blogs to discredit these announcements. In their blog posts, both talked about the history of Hadoop and how many Apache committers they employ, but neither talked about differentiated functionality or providing value to customers.

One could claim that neither Intel nor EMC provided any significant value add for Hadoop. Intel announced a distribution inclusive of Apache Hadoop with plans to enhance security in the future, while Greenplum announced yet another SQL-on-Hadoop project (really just the ability to run its legacy Greenplum database on HDFS instead of XFS) on top of their existing Greenplum HD distribution. However, you could also look at these and claim that there is very little gap between these announced distributions and the distributions from Hortonworks and Cloudera.

Packaging and supporting an open source project and providing a management suite is not rocket science, so I would not be surprised to see even more companies announce their own Hadoop distributions. Developing the technology to address the platform's limitations and provide a true enterprise-grade solution is much harder, and MapR is currently uniquely positioned as the only company that has been able to do so.

MapR’s relentless focus on addressing the core issues and limitations of Hadoop and providing the best support to customers is what truly distinguishes MapR from all other distributions. For example, disaster recovery requires more than an audit trail showing that you’ve lost data, or a GUI wrapper for the MapReduce-based distcp command. You need snapshots and mirroring to support recovery point and recovery time objectives. The announcements from Intel and EMC and the subsequent blog posts from Cloudera and Hortonworks make MapR’s differentiated features stand out much more prominently. Whether there are three distributions on the market or eight distributions, MapR is still the only distribution that provides random read/write support, NFS access, multi-tenancy, a no-NameNode architecture, JobTracker HA, snapshots, mirroring and best-in-class management and performance.

To summarize, there’s a reason MapR has more production deployments than any other Hadoop distribution.

Welcome to the new entrants.
