Mike Ferguson discussed Hadoop as an enterprise data hub in depth in his May 2013 paper “Offloading and Accelerating Data Warehouse ETL Processing Using Hadoop.” Mike was a principal and co-founder of Codd and Date Europe Limited (the firm associated with the inventors of the Relational Model) and was also a Chief Architect at Teradata. In his paper, Mike delineates the requirements for an enterprise data hub and explains why the MapR Distribution for Hadoop is best suited to serve this purpose. The platform capabilities he discusses include full data protection, business continuity and availability features, which together form the foundation for cleansing, transforming and integrating structured and multi-structured data from multiple sources.
Mike notes that “MapR’s data protection and disaster recovery capabilities make MapR Hadoop distributions suitable for long-term storage of Big Data and data warehouse archived data, which can then be selectively re-processed in specific analyses.”
MapR invested several years of engineering effort to re-architect a data platform for Hadoop so that it could support these enterprise-grade capabilities. Other distributions claim enterprise functionality without the right platform to support it, and the resulting false expectations set users up for failure at an enterprise scale. The facts are:
• Only MapR provides automated stateful failover, disaster recovery through snapshots and mirrors, and full data protection against user and application errors.
• MapR eliminates the downtime associated with HBase applications through instant recovery, and provides consistent low-latency support with no compactions and no Java garbage collection.
• Even with multiple hardware or software outages and errors, applications continue running without requiring any administrator action.
• MapR’s distributed No-NameNode HA architecture provides fast recovery. On a large cluster, MapR can recover from a 1,000-node outage within three minutes, whereas the same recovery would take more than 24 hours on any other Hadoop distribution.
The MapR Distribution for Hadoop is best suited to meet the requirements of an enterprise data hub. Its full data protection, business continuity and disaster recovery features make it the best choice for companies that are moving toward an enterprise data hub in order to maintain their competitive advantage.