The fundamental idea in MapReduce 2.0 is to split the existing JobTracker’s two roles, resource management and job lifecycle management, into separate components. MapReduce 2.0 provides many benefits over the existing MapReduce framework, such as better scalability (through distributed job lifecycle management) and support for multiple Hadoop MapReduce API versions in a single cluster. These benefits complement MapR’s existing advantages in the MapReduce layer, including MapR’s direct shuffle (which makes the shuffle 4-5x faster) and MapR’s ability to keep all running tasks alive in the event of a JobTracker failure (or, in MapReduce 2.0 terminology, a MapReduce ApplicationMaster failure).
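To make the split of roles concrete, here is a toy Python sketch. It is only an illustration under stated assumptions, not Hadoop or MapR code: the names `SlotManager`, `JobMaster`, and `run_cluster` are hypothetical, and tasks are modeled as simple counters that finish within one scheduling round. The point it tries to show is that one cluster-wide component only hands out capacity, while each job gets its own component that tracks that job's progress.

```python
from dataclasses import dataclass


@dataclass
class SlotManager:
    """Cluster-wide role (hypothetical): hands out task slots, holds no job state."""
    free_slots: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are available, up to the request.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, slots: int) -> None:
        self.free_slots += slots


@dataclass
class JobMaster:
    """Per-job role (hypothetical): tracks a single job's lifecycle."""
    job_id: str
    pending_tasks: int
    completed_tasks: int = 0

    def complete(self, count: int) -> None:
        # Mark `count` pending tasks as finished.
        self.pending_tasks -= count
        self.completed_tasks += count

    @property
    def done(self) -> bool:
        return self.pending_tasks == 0


def run_cluster(slots: int, jobs: dict) -> tuple:
    """Run all jobs to completion; return (rounds, completed tasks per job)."""
    if slots <= 0:
        raise ValueError("cluster needs at least one slot")
    manager = SlotManager(free_slots=slots)
    masters = [JobMaster(job_id, tasks) for job_id, tasks in jobs.items()]
    rounds = 0
    while any(not m.done for m in masters):
        # Each job master asks the shared manager for capacity...
        grants = [(m, manager.allocate(m.pending_tasks)) for m in masters]
        # ...then runs its own tasks and returns the slots it was given.
        for master, granted in grants:
            master.complete(granted)
            manager.release(granted)
        rounds += 1
    return rounds, {m.job_id: m.completed_tasks for m in masters}
```

Calling `run_cluster(4, {"a": 3, "b": 5})` runs two jobs against four shared slots. Because job state lives in each `JobMaster` rather than in the shared manager, adding more jobs adds more independent lifecycle trackers instead of loading a single central component, which is the intuition behind the scalability benefit mentioned above.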
We are currently integrating the MapReduce 2.0 framework with our next-generation distribution, and will release it once the MapReduce 2.0 framework is stable. Combining our distribution’s speed, business continuity features (HA, snapshots, mirroring), and NFS access with the advantages of the MapReduce 2.0 framework will make the platform more appealing to both existing and new Hadoop users.