What does it mean to be “Lights Out Data Center Ready”? It means that failures, whether hardware faults, software faults, or user errors, do not require immediate administrator action. Instead, administrators can visit the data center on a schedule and perform maintenance that is now routine rather than an emergency. Picture an administrator with a shopping cart full of disk drives casually moving through the aisle.
In discussions with customers, it is immediately clear that they are confused by the various descriptions of Hadoop High Availability. My personal favorite is another vendor’s description of their HA as providing “Hot Manual Failover”. Huh? How is a manual failover process “hot”? What generates the heat, exactly? The flames from business users when the cluster is unavailable? This has to be the biggest oxymoron to hit business continuity since “Highly Available – Not”. At least it’s clear from the latter that it isn’t really Highly Available.
In contrast, MapR has been designed specifically for High Availability and is the only Hadoop distribution with no single points of failure. Other distributions use a single NameNode, and when that NameNode goes down, the entire cluster becomes unavailable and you can lose data. With MapR, the NameNode function is distributed across the cluster. In a sense, MapR has a “No NameNode” architecture, so there is no data loss or downtime, even in the face of multiple disk or node failures.
When we talk about high availability, we’re talking about automated, stateful failover for all software and hardware errors. Automated re-replication of data means that the system keeps working through failures without waiting for an administrator to step in. MapR’s rolling upgrades keep the cluster highly available during routine hardware and software maintenance.
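To make the idea of automated re-replication concrete, here is a minimal toy sketch in Python. It is not MapR's implementation; the function name `re_replicate`, the node names, and the replication factor of 3 are all assumptions made for illustration. It shows only the general principle: when a node disappears, every block that lost a copy is copied to another live node until the target replica count is restored.

```python
def re_replicate(placement, live_nodes, factor=3):
    """Toy model of re-replication (hypothetical, not MapR code).

    placement: dict mapping block id -> set of node names holding a copy.
    live_nodes: set of node names that are currently healthy.
    factor: assumed target number of copies per block.
    """
    for block, holders in placement.items():
        # Forget copies that lived on failed nodes.
        holders &= live_nodes
        # Healthy nodes that do not yet hold this block, in a stable order.
        candidates = sorted(live_nodes - holders)
        # Copy the block to new nodes until the target is met again.
        while len(holders) < factor and candidates:
            holders.add(candidates.pop(0))
    return placement


# Node "n3" has failed; both blocks had a copy on it.
placement = {
    "b1": {"n1", "n2", "n3"},
    "b2": {"n2", "n3", "n4"},
}
live = {"n1", "n2", "n4", "n5"}
re_replicate(placement, live)
```

After the call, each block is back to three copies on healthy nodes, with no one paged in the middle of the night. The failed drive can wait for the next scheduled visit.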
MapR is also built to give full data protection with Mirroring and Snapshots, features designed to efficiently maintain data integrity and business continuity across clusters and sites. This is significant because the replication that other Hadoop distributions use does not protect against user or application errors: a mistaken delete or overwrite is simply replicated across the cluster. With MapR you are protected against these errors as well, and the protection is built in and easy to use. Furthermore, writes to the original data see no performance loss while a snapshot is taken, and a petabyte snapshot can be performed in seconds.
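The difference between replication and snapshots is easy to see in a toy model. The Python sketch below is purely illustrative: the `ToyVolume` class, its methods, and the file path are hypothetical and bear no relation to MapR's actual API, and the "snapshot" here is a naive full copy, which a real system would avoid in order to be fast.

```python
import copy


class ToyVolume:
    """Toy model (hypothetical): every change is applied to all replicas."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]
        self.snapshots = {}

    def write(self, path, data):
        for replica in self.replicas:
            replica[path] = data

    def delete(self, path):
        # A user error is faithfully replicated, just like a valid change.
        for replica in self.replicas:
            replica.pop(path, None)

    def snapshot(self, name):
        # Naive point-in-time copy, for illustration only.
        self.snapshots[name] = copy.deepcopy(self.replicas[0])

    def restore(self, name, path):
        self.write(path, self.snapshots[name][path])

    def read(self, path):
        return self.replicas[0].get(path)


vol = ToyVolume()
vol.write("/sales/q1.csv", "region,total\nwest,100\n")
vol.snapshot("nightly")

# Someone deletes the file by mistake: all three replicas lose it.
vol.delete("/sales/q1.csv")
survivors = [r for r in vol.replicas if "/sales/q1.csv" in r]

# The snapshot still has it, so the file can be brought back.
vol.restore("nightly", "/sales/q1.csv")
```

In this model `survivors` ends up empty, because three replicas of a deleted file are three copies of nothing. Only the point-in-time snapshot lets the file come back.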
So when considering High Availability for Hadoop, make sure to get the complete picture. Then you can safely turn off the lights.