High Availability on MapR


Introduction

High availability (HA) is the ability of a system to remain up and running despite unforeseen failures, avoiding unplanned downtime or service disruption. HA is a critical feature that businesses rely on to support customer-facing applications and service level agreements (SLAs).

HA Benefits in the MapR Converged Data Platform

Advanced HA features in the MapR Converged Data Platform provide numerous benefits to organizations trying to harness big data.

No Data Loss
The MapR Platform ensures critical data is never lost via configurable levels of replication. Automatic failover ensures the cluster is always available so big data applications can run on a 24x7 basis, helping organizations meet stringent business SLAs.

Dependable Jobs
Jobs started on the MapR Platform run to completion despite failures of associated job trackers or resource managers. This tremendously improves cluster efficiency and resource utilization by avoiding restarts of jobs, especially the long-running MapReduce analytics jobs.

24x7 NoSQL Applications
MapR helps organizations move quickly from batch-oriented analytics to operational NoSQL applications on their data lakes by providing instant recovery capabilities and eliminating the downtime associated with NoSQL housekeeping.

Continuous Access to Data
MapR provides unprecedented application and user access to the data lake via the NFS interface. To ensure continuous, uninterrupted operations, MapR makes the NFS access resilient.

Maintaining Availability during Planned Downtime
Upgrading large clusters often requires service disruptions. MapR provides options to ensure clusters are available even during planned downtime for maintenance tasks such as software upgrades.

MapR HA Implementation

The MapR Platform is the only data platform that is designed for 24x7 environments providing HA across several critical elements of the big data cluster. MapR provides HA not only for data and job completion, but also for access points and ancillary services running on the MapR Platform.

Metadata HA
Cluster metadata includes critical information about the location of application data and the associated replicas. Metadata HA is therefore critical for long-running cluster operations.

MapR provides self-healing from multiple, simultaneous failures, allowing cluster availability at all times. MapR automatically shards and replicates its metadata along with application data, making HA part of the core architecture. This also makes HA extremely easy to implement: it works right out of the box, does not require specialized nodes or specialized hardware, and needs minimal configuration to set up and monitor. As an added advantage, the distributed metadata architecture allows for extreme scalability with no practical limit on the number of files that can be stored on the cluster.

MapReduce HA
MapR is the only data platform that supports fully functional MapReduce HA. Job execution proceeds to completion even if the associated trackers and resource managers go down. In other big data platforms, hardware failures result in failed jobs that must be completely restarted. This functionality applies to both MapReduce v1 and MapReduce v2 (YARN) jobs.

NFS HA
MapR uniquely provides network-attached storage (NAS) style access to big data through the standard NFS (Network File System) interface. MapR allows you to mount the cluster via NFS and ensures that the NFS mount point is also HA enabled. This ensures continuous, uninterrupted access for incoming streaming data and for applications requiring random reads and writes.

Instant Recovery for NoSQL Applications
MapR ensures that data from a failed node is automatically and instantly available to the NoSQL application. The automatic and instant failover means there is no reassignment lag time, ensuring uninterrupted availability.

Instant Recovery for Streaming Applications
MapR Streams ensures that event data from a failed node is automatically and instantly available to the streaming application. Automatic failover is available for both producer and consumer applications. The automatic and instant failover means there is no reassignment lag time, ensuring uninterrupted availability.

Zero NoSQL Maintenance
As part of the broader objective of minimizing service disruptions, MapR requires zero NoSQL maintenance, which further improves availability. Automatic, workload-aware scaling maintains high performance as the data load grows. The simplified architecture means there are no NoSQL servers to administer, thus reducing the number of failure points. And the optimized, compaction-less design prevents disruptive I/O storms and eliminates downtime from performing housekeeping tasks.

Rolling Upgrades
Rolling upgrades also help with minimizing disruptions. Users can eliminate planned downtime by performing maintenance or software upgrades on the cluster, a few nodes at a time, while the system continues to run.
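
The batch-at-a-time idea can be illustrated with a minimal Python sketch. It is not a MapR tool or procedure: the node names, the batch size, and the upgrade_node callback are hypothetical placeholders, and the sketch only records how many nodes remain in service while each batch is being upgraded.

```python
from typing import Callable, Iterator, List


def batches(nodes: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size nodes."""
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def rolling_upgrade(nodes: List[str], batch_size: int,
                    upgrade_node: Callable[[str], None]) -> List[int]:
    """Upgrade nodes one batch at a time.

    Returns, for each batch, the number of nodes that stayed online
    while that batch was out of service.
    """
    if batch_size < 1 or batch_size >= len(nodes):
        raise ValueError("batch_size must leave at least one node online")
    online_during_batch = []
    for batch in batches(nodes, batch_size):
        # Only the nodes in the current batch are out of service.
        online_during_batch.append(len(nodes) - len(batch))
        for node in batch:
            upgrade_node(node)  # hypothetical per-node upgrade step
    return online_during_batch


# Hypothetical 7-node cluster upgraded 3 nodes at a time.
upgraded: List[str] = []
cluster = [f"node{i}" for i in range(1, 8)]
online = rolling_upgrade(cluster, batch_size=3, upgrade_node=upgraded.append)
```

In this toy run the cluster never drops below four nodes in service, and every node has been upgraded once the loop finishes.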

Services HA
The MapR model of distributing the metadata can be easily extended to services running on the MapR Platform. One can implement HA for any service running on the MapR cluster by configuring the service to store its state information as part of the cluster metadata and by registering the service with ZooKeeper. If the service goes down, the ZooKeeper and Warden services automatically restart it on a different node.
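
The pattern described above (state kept off the node, a registry of where each service runs, and a restart on another node after a failure) can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not the ZooKeeper or Warden API: ToyCluster, its methods, the "indexer" service, and its state are all hypothetical names chosen for illustration.

```python
from typing import Dict, List


class ToyCluster:
    """In-memory stand-in for a cluster; not a MapR, ZooKeeper, or Warden API."""

    def __init__(self, nodes: List[str]) -> None:
        self.order = list(nodes)            # fixed order used to pick a new host
        self.healthy = set(nodes)
        self.registry: Dict[str, str] = {}  # service name -> node it runs on
        self.state: Dict[str, dict] = {}    # service name -> state kept off-node

    def register(self, service: str, node: str, state: dict) -> None:
        """Record where a service runs and keep its state with the cluster."""
        self.registry[service] = node
        self.state[service] = state

    def fail_node(self, node: str) -> None:
        """Mark a node as failed and move its services to another healthy node."""
        self.healthy.discard(node)
        for service, host in list(self.registry.items()):
            if host == node:
                self.registry[service] = self._pick_healthy_node()

    def _pick_healthy_node(self) -> str:
        for candidate in self.order:
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy node left to host the service")


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.register("indexer", "node1", {"last_offset": 42})
cluster.fail_node("node1")
```

Because the service state lives with the cluster rather than on node1, the restarted instance on node2 can resume from the same state instead of starting from scratch.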

HDFS-Based Data Platforms and HA

HDFS-based data platforms provide minimal HA capabilities. All HDFS-based data platforms rely on a separate server known as the NameNode to store and process metadata. This approach creates performance and scalability bottlenecks, forcing a federated model of data storage that further increases SLA risks by creating multiple points of failure across the system.

More importantly, from an HA standpoint, this model requires an Active-Standby implementation that protects against only one failure. If another NameNode-related failure occurs before the failed node is replaced or repaired, you will lose or corrupt data.
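
A small Python sketch can make the single-failure limit concrete. It is a deliberately simplified model, not a description of any real NameNode or MapR mechanism: it assumes metadata stays available as long as at least one copy is live, and the copy counts and event sequences are hypothetical.

```python
from typing import Iterable


def metadata_survives(copies: int, events: Iterable[str]) -> bool:
    """Return True if at least one metadata copy stays live throughout.

    events is a sequence of "fail" and "repair" strings, applied in order.
    """
    live = copies
    for event in events:
        if event == "fail":
            live -= 1
            if live == 0:
                return False  # last copy lost before any repair completed
        elif event == "repair":
            live = min(copies, live + 1)
        else:
            raise ValueError(f"unknown event: {event}")
    return True


# Active-Standby (two copies): a second failure before repair is fatal.
two_copies_back_to_back = metadata_survives(2, ["fail", "fail"])
# The same pair survives if the failed node is repaired in between.
two_copies_with_repair = metadata_survives(2, ["fail", "repair", "fail"])
# A hypothetical three-copy setup rides out two back-to-back failures.
three_copies_back_to_back = metadata_survives(3, ["fail", "fail"])
```

Under this model, tolerance for simultaneous failures grows with the number of live copies, which is why a two-node Active-Standby pair is exposed during the repair window.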

Furthermore, setup and configuration become more complex. Administrators have additional tasks associated with configuring specialized hardware to accommodate the NameNode, which also increases the total cost of ownership. The setup must also ensure continuous sharing of metadata across Active and Standby nodes, and enable every node in the cluster to maintain a heartbeat connection to both Active and Standby nodes at all times.

As for jobs, because job-related metadata is not stored in HDFS-based data platforms today, jobs have to be restarted whenever there is a failure or the resource manager or job trackers go down.

Finally, for NoSQL applications, HDFS-based data platforms do not provide any HA capabilities because of complex architectural issues associated with working with an append-only file system. Prolonged downtime is one of the common issues associated with these HDFS-based NoSQL applications.

Conclusion

MapR architectural innovations deliver 24x7 big data applications, ensuring high availability for all the critical components of your deployment. The MapR Converged Data Platform provides high availability across nodes, jobs, access methods, and services for both file-based and NoSQL applications in a uniform fashion across the cluster.
