MapR and MapReduce 2.0 – The Integrated Next Generation Distribution

MapReduce 2.0 is the codename for a new execution engine for Hadoop, developed primarily by Yahoo! engineers who are now at Hortonworks. MapReduce 2.0 is expected to become available in the next major release of Hadoop (0.23). The source code directory structure can be accessed at

The fundamental idea in MapReduce 2.0 is to split the existing JobTracker’s two roles, resource management and job lifecycle management, into separate components. MapReduce 2.0 provides many benefits over the existing MapReduce framework, such as better scalability (through distributed job lifecycle management) and support for multiple Hadoop MapReduce API versions in a single cluster. These benefits are complementary to MapR’s existing advantages in the MapReduce layer, including MapR’s direct shuffle (which makes the shuffle 4-5x faster) and MapR’s ability to keep all running tasks alive if the JobTracker (or, in MapReduce 2.0 terminology, the MapReduce ApplicationMaster) fails.
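To make the split of roles concrete, here is a toy Python sketch. It is not Hadoop code and does not use any Hadoop API: the class names `ResourceManager` and `ApplicationMaster` borrow MapReduce 2.0 terminology, while the slot-counting model, the method names, and the `run_cluster` helper are assumptions made purely for illustration. One object owns cluster capacity and knows nothing about tasks; a separate object per job tracks that job's lifecycle.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Cluster-wide role (toy model): hands out slots, knows nothing about tasks."""
    free_slots: int

    def allocate(self, wanted: int) -> int:
        # Grant as many slots as are free, up to the number requested.
        granted = min(wanted, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, count: int) -> None:
        self.free_slots += count


@dataclass
class ApplicationMaster:
    """Per-job role (toy model): tracks the lifecycle of one job's tasks."""
    job_id: str
    pending: int
    running: int = 0
    done: int = 0

    def request_slots(self, rm: ResourceManager) -> None:
        # Ask the resource manager for capacity and start that many tasks.
        granted = rm.allocate(self.pending)
        self.pending -= granted
        self.running += granted

    def finish_running(self, rm: ResourceManager) -> None:
        # Mark all running tasks as done and hand their slots back.
        rm.release(self.running)
        self.done += self.running
        self.running = 0

    @property
    def complete(self) -> bool:
        return self.pending == 0 and self.running == 0


def run_cluster(total_slots: int, jobs: dict) -> tuple:
    """Hypothetical driver: one ApplicationMaster per job, one shared ResourceManager."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    rm = ResourceManager(free_slots=total_slots)
    masters = [ApplicationMaster(job_id=name, pending=n) for name, n in jobs.items()]
    rounds = 0
    while not all(m.complete for m in masters):
        for m in masters:
            m.request_slots(rm)
        for m in masters:
            m.finish_running(rm)
        rounds += 1
    return rounds, {m.job_id: m.done for m in masters}
```

Because each job carries its own lifecycle state, adding jobs adds `ApplicationMaster` instances rather than more bookkeeping in a single central tracker, which is the scalability argument in miniature.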

We are currently integrating the MapReduce 2.0 framework with our next-generation distribution, and will release the integration once the MapReduce 2.0 framework is stable. The combination of our distribution’s speed, business continuity (HA, snapshots, mirroring) and NFS access, with the advantages of the MapReduce 2.0 framework, will take Hadoop to the next level and make the platform more appealing to both existing and new Hadoop users.
