Our View on Open Data Platform

MapR was invited to participate in the Open Data Platform initiative and declined after carefully considering its value to the marketplace. The announced Open Data Platform benefits Hortonworks marketing and provides a graceful market exit for Greenplum Pivotal. Should the Open Data Platform charter change in the future, we might consider participation. For now, here are our concerns:

Open Data Platform is redundant with Apache Software Foundation Governance

The Apache Software Foundation has done a wonderful job governing Hadoop, resulting in the Hadoop standard in which applications are interoperable among Hadoop distributions. Apache governance is based on a meritocracy that doesn’t require payment to participate or for voting rights. The Apache community is vibrant and has resulted in Hadoop becoming ubiquitous in the market in only a few short years.

Open Data Platform is “solving” problems that don’t need solving

Companies implementing Hadoop applications do not need to be concerned about vendor lock-in or interoperability issues. Gartner analysts Merv Adrian and Nick Heudecker reported in a recent blog post that fewer than 1% of companies surveyed considered vendor lock-in or interoperability an issue, placing it dead last on the list of customer concerns. Project and sub-project interoperability is very good and is guaranteed by both free and paid-for distributions. Applications built on one distribution can be migrated to the other distributions with virtually zero switching costs.

Open Data Platform “core” is misdefined

The Open Data Platform “core” definition is vendor-biased. The “core” is defined as MapReduce, YARN, Ambari and HDFS. MapReduce is broadly used, but many resource managers and computing frameworks, including YARN, Spark and Mesos, are gaining market share. Ambari is used by less than 25% of the market. Hadoop was architected to support plug-and-play alternative technologies to HDFS. HDFS was built to serve as secondary storage for batch Hadoop processing. Many production use cases requiring POSIX-compliant storage replace HDFS with MapR, IBM GPFS, EMC Isilon, or NetApp.

Open Data Platform lacks participation by the Hadoop leaders

Roughly 75% of Hadoop implementations run on MapR and Cloudera, and both have chosen not to participate. The Open Data Platform without MapR and Cloudera is a bit like one of the Big Three automakers pushing for a standards initiative without the involvement of the other two.

The Open Data Platform is not open unless equal voting rights are provided to the leading Hadoop distributions. The Open Data Platform has not disclosed how its governance works, but it is a different model from the preferred and fair meritocracy used by the Apache Software Foundation.

Questions to be answered include:

  • Is it pay-to-play? What are the dues and fees?
  • How do fees affect voting rights?
  • Where do the funds go?
  • Why was the Open Data Platform announced just as Pivotal exited the Hadoop market and reportedly laid off many engineers?
  • Have all of the Open Data Platform members really committed? What dues did they pay? Do subsequent members pay the same dues and gain the same voting rights?

We are committed to community involvement and cooperation to advance technology that delivers customer value. Until it is clear that the structure, focus, and participation of the Open Data Platform can effectively address key customer concerns, we will continue to focus our efforts on the Apache Software Foundation.
