High availability (HA) is the ability of a system to remain up and running despite unforeseen failures, avoiding unplanned downtime or service disruption. HA is a critical feature that businesses rely on to support customer-facing applications and service level agreements.
HA Benefits in the MapR Distribution for Hadoop
Advanced HA features in the MapR Distribution for Hadoop provide numerous benefits to organizations trying to harness big data.
No Data Loss
The MapR Distribution for Hadoop ensures critical data is never lost via configurable levels of replication. Automatic failover ensures the cluster is always available so big data applications can run on a 24x7 basis, helping organizations meet stringent business SLAs.
Jobs started on the MapR Distribution run to completion despite failures of associated job trackers or resource managers. This tremendously improves Hadoop cluster efficiency and resource utilization by avoiding restarts of jobs, especially the long-running MapReduce analytics jobs.
24x7 NoSQL Applications
MapR helps organizations graduate quickly from batch-oriented analytics to operational NoSQL applications on Hadoop by providing instant recovery capabilities and eliminating the downtime associated with NoSQL housekeeping.
Continuous Access to Data
MapR provides unprecedented application and user access to Hadoop via the NFS interface. To ensure continuous, uninterrupted operations, MapR makes the NFS access resilient.
Maintaining Availability during Planned Downtime
Upgrading large clusters often requires service disruptions. MapR provides options to keep clusters available even during planned downtime for maintenance tasks such as software upgrades.
MapR HA Implementation
The MapR Distribution for Hadoop is the only distribution designed for 24x7 environments, providing HA across several critical elements of the Hadoop cluster. MapR provides HA not only for data and job completion, but also for access points and ancillary services running on Hadoop.
Cluster metadata includes critical information about the location of application data and the associated replicas. Metadata HA is therefore critical for long-running Hadoop operations.
MapR provides self-healing from multiple, simultaneous failures, keeping the cluster available at all times. MapR automatically shards and replicates its metadata along with application data, making HA part of the core architecture. This also makes HA extremely easy to implement: it works right out of the box, with no need to deploy specialized nodes on specialized hardware and with minimal configuration to set up and monitor. As an added advantage, the distributed metadata architecture allows for extreme scalability, with no practical limit on the number of files that can be stored on Hadoop.
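The idea of sharding metadata and replicating each shard across nodes can be illustrated with a small toy model. The sketch below is not MapR code: the function names, the round-robin placement, and the replication factor of 3 are all assumptions chosen for illustration. It only shows why spreading replicated shards over many nodes lets a cluster survive multiple simultaneous node failures and then restore full replication.

```python
def place_shards(num_shards, nodes, replication=3):
    """Assign each shard to `replication` distinct nodes, round-robin (hypothetical policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    return {
        shard: [nodes[(shard + r) % len(nodes)] for r in range(replication)]
        for shard in range(num_shards)
    }


def heal(placement, failed, nodes, replication=3):
    """Drop replicas on failed nodes, then re-replicate onto surviving nodes."""
    survivors = [n for n in nodes if n not in failed]
    healed = {}
    for shard, replicas in placement.items():
        live = [n for n in replicas if n not in failed]
        if not live:
            # Every copy of this shard was on a failed node: data is lost.
            raise RuntimeError(f"shard {shard} lost all replicas")
        # Survivors that do not already hold a copy of this shard.
        candidates = [n for n in survivors if n not in live]
        needed = min(replication, len(survivors)) - len(live)
        healed[shard] = live + candidates[:needed]
    return healed


nodes = [f"n{i}" for i in range(6)]
placement = place_shards(12, nodes)
healed = heal(placement, failed={"n0", "n1"}, nodes=nodes)
```

In this toy run, two of six nodes fail at once, yet every shard keeps at least one live replica and is brought back to three copies. A real system would also balance load when choosing new replica locations, which this sketch deliberately ignores.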
MapR is the only distribution that supports fully functional MapReduce HA. Job execution proceeds to completion even if the associated trackers and resource managers go down. In other distributions, hardware failures result in failed jobs that must be completely restarted. This functionality applies to both MapReduce v1 and MapReduce v2 (YARN) jobs.
MapR uniquely provides network-attached storage (NAS) style access to Hadoop through the standard NFS (Network File System) interface. MapR allows you to mount the cluster via NFS and ensures that the NFS mount point is also HA enabled. This ensures continuous, uninterrupted access for incoming streaming data and for applications requiring random read/write.
Instant Recovery for NoSQL Applications
MapR ensures that data from a failed node is automatically and instantly available to the NoSQL application. The automatic and instant failover means there is no reassignment lag time, ensuring uninterrupted availability.
Zero NoSQL Maintenance
In the broader objective of minimizing service disruptions, MapR requires zero NoSQL maintenance to further improve availability. Automatic, workload-aware scaling maintains high performance as the data load grows. The simplified architecture means there are no NoSQL servers to administer, thus reducing the number of failure points. And the optimized, compaction-less design prevents disruptive I/O storms and eliminates downtime from performing housekeeping tasks.
Rolling upgrades also help with minimizing disruptions. Users can eliminate planned downtime by performing maintenance or software upgrades on the cluster, a few nodes at a time, while the system continues to run.
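A rolling upgrade of this kind can be sketched as a loop over small batches of nodes. The code below is a minimal illustration, not a MapR tool: `rolling_upgrade`, the `upgrade` callback, and the batch size are hypothetical names and parameters. It tracks how many nodes remain in service while each batch is offline.

```python
def rolling_upgrade(nodes, batch_size, upgrade):
    """Upgrade nodes a few at a time; return the minimum number of nodes serving at any point."""
    if batch_size < 1 or batch_size >= len(nodes):
        raise ValueError("batch_size must be at least 1 and smaller than the cluster")
    min_serving = len(nodes)
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        # Nodes outside the current batch keep serving requests.
        min_serving = min(min_serving, len(nodes) - len(batch))
        for node in batch:
            upgrade(node)  # hypothetical callback: stop, upgrade, and restart one node
    return min_serving


upgraded = []
serving_floor = rolling_upgrade(["a", "b", "c", "d", "e"], 2, upgraded.append)
```

With five nodes and batches of two, at least three nodes stay online throughout the upgrade. The batch size is the knob that trades upgrade speed against remaining capacity.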
The MapR model of distributing metadata can be easily extended to services running on Hadoop. One can implement HA for any service running on the MapR cluster by configuring the service to store its state information as part of the cluster metadata and by registering the service with ZooKeeper. If the service goes down, the ZooKeeper and Warden services automatically restart it on a different node.
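The pattern described above, where state is kept outside the service's node and a coordinator restarts the service elsewhere, can be sketched with an in-memory stand-in. This is a toy model under stated assumptions: `ToyCoordinator` and its methods are hypothetical and are not the ZooKeeper or Warden APIs. It only shows why externalized state lets a restarted service resume on another node.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (hypothetical, not the ZooKeeper API)."""

    def __init__(self, nodes):
        self.alive = {node: True for node in nodes}
        self.location = {}  # service name -> node currently running it
        self.state = {}     # service name -> state stored outside the service's node

    def register(self, service, node, state):
        self.location[service] = node
        self.state[service] = state

    def mark_down(self, node):
        self.alive[node] = False

    def reconcile(self):
        """Move every service whose node is down to the first healthy node."""
        moved = {}
        for service, node in self.location.items():
            if not self.alive[node]:
                healthy = [n for n, up in self.alive.items() if up]
                if not healthy:
                    raise RuntimeError("no healthy node available")
                self.location[service] = healthy[0]
                moved[service] = healthy[0]
        return moved


coordinator = ToyCoordinator(["n1", "n2", "n3"])
coordinator.register("ingest", "n1", {"offset": 42})
coordinator.mark_down("n1")
moved = coordinator.reconcile()
```

After `n1` fails, the service is reassigned to `n2`, and its stored state is unchanged because it never lived on the failed node.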
HDFS-Based Distributions and HA
HDFS-based distributions provide minimal HA capabilities. All HDFS-based distributions rely on a single server known as the NameNode to store and process metadata. This single-server approach creates performance and scalability bottlenecks, forcing a federated model of data storage that further increases SLA risks by creating multiple points of failure across the system.
More importantly, from an HA standpoint, this model requires an Active-Standby implementation that ends up protecting against just one failure. This means that if another NameNode-related failure occurs before the failed node is replaced or repaired, you will lose or corrupt data.
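The single-failure limit of an Active-Standby pair follows directly from counting copies. The snippet below is a simplified model, not a description of any real NameNode implementation; `metadata_available` and the node names are hypothetical.

```python
def metadata_available(copies, failed):
    """Metadata stays reachable while at least one node holding a copy is still up."""
    return any(node not in failed for node in copies)


pair = ["active", "standby"]
one_failure = metadata_available(pair, {"active"})
two_failures = metadata_available(pair, {"active", "standby"})
```

With only two copies, the first failure is absorbed by the standby, but a second failure before repair leaves no node holding the metadata.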
Furthermore, setup and configuration become more complex. Administrators have additional tasks associated with configuring specialized hardware to accommodate the NameNode, which also increases the total cost of ownership. The setup must also ensure continuous sharing of metadata across Active and Standby nodes, and enable every node in the cluster to maintain a heartbeat connection to both Active and Standby nodes at all times.
The figure below delineates the differences between the HDFS model and the MapR model of storing metadata.
As for jobs, HDFS-based distributions do not currently store job-related metadata, so jobs have to be restarted whenever there is a failure or whenever the resource manager or job trackers go down.
Furthermore, for NoSQL applications, HDFS-based distributions do not provide any HA capabilities because of complex architectural issues associated with working with an append-only file system. Prolonged downtime is a common issue with these HDFS-based NoSQL applications.
MapR architectural innovations deliver 24x7 big data applications, ensuring high availability for all the critical components of Hadoop, including Hadoop 2.0 features such as YARN. The MapR Distribution for Hadoop provides high availability across nodes, jobs, access methods, and services for both file-based and NoSQL applications in a uniform fashion across the cluster.