Google Compute Engine is a proven platform for running MapR. Consistent, high performance virtual machines coupled with a high bandwidth, low latency network linking them together and to the rest of the Google Cloud Platform services deliver a solid foundation for cloud-based data processing architectures. The ability to quickly instantiate numerous virtual machines on demand and the availability of per-minute pricing make Compute Engine well-suited and cost-effective for spinning up and turning off ad hoc clusters. Further, Compute Engine’s advanced routing and data encryption features together with Google’s global network performance enable the construction of secure, compelling hybrid solutions.
This paper presents several techniques for those who wish to manage their own MapR installations on Google Compute Engine, and select scenarios (migration across zones, disaster recovery and high availability) that arise when dealing with long-lived clusters and operating across multiple zones. This list is neither exhaustive nor authoritative as other solutions certainly exist and might even be more applicable to specific situations. This paper, however, illustrates how the same features that make Compute Engine a powerful platform upon which to run MapR can provide the basis for MapR cluster management solutions.
The scenarios presented in this section are all based on the premise that the cluster under consideration ought to be preserved to the greatest extent possible when faced with an adverse environmental event (for example, a zone is shut down for maintenance or experiences an unanticipated outage). This is a common situation for long-lived clusters used in support of job pipelines and stream processing. On-demand clusters require little or no management and are, generally speaking, outside the scope of this document. Further, the scenarios presented assume that data consumed and generated by the cluster needs to reside durably in a self-managed MapR-FS (MapR File System).
Administrators can plan accordingly for maintenance events and manage cluster availability. However, should an unexpected situation occur, jobs currently being processed can be impacted. While specific outcomes vary, in some cases these jobs are restarted rather than resumed, and in others they must be resubmitted.
At some point, it may become desirable or necessary to move or clone a cluster from one zone to another. This section presents two different zone migration scenarios:
- Minimize the unavailability of the cluster to accept and run MapReduce jobs, at the expense of operational complexity.
- Trade additional unavailability in favor of significantly simpler management.
Zone Migration Scenario 1: Minimizing Unavailability
Many MapR clusters are integral parts of a business’s daily operations. Any time users cannot submit work can be costly. This first approach attempts to reduce this duration as much as possible in a traditional deployment. First, new FileServers and TaskTrackers are added to the cluster in the destination zone, and then the Container Location Database (CLDB) and JobTracker master nodes are switched over as well. This method bounds the cluster’s unavailability to roughly the interval between the shutdown of the master services in the source zone and their startup in the destination zone.
FileServers may use scratch disks, persistent disks, or both for MapR-FS. Persistent disks offer more storage space as well as durability beyond the lifetime of an instance. With persistent disks, administrators have the option of reducing the MapR-FS replication factor while still protecting data in the event that a zone becomes unavailable.
Using persistent root disks enables any node to survive an unexpected reboot.
This scenario makes use of MapR rack topologies to enable the cluster to continue running while nodes are decommissioned. Topology describes the locations of nodes and racks in a cluster; MapR uses node topology to determine where to place replicated copies of data. An optimally defined cluster topology results in data being replicated to separate racks, providing continued data availability in the event of rack or node failure. In this scenario, a rack is a logical collection of nodes configured with FileServers and TaskTrackers running in the same zone, and data is copied across zones when the Container Location Database (CLDB) distributes replicated data containers across separate racks.
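As a sketch, nodes in each zone can be assigned to distinct rack paths with the maprcli node move command so that the CLDB replicates containers across zones (the topology paths and server ID placeholders below are illustrative):

```shell
# Assign source-zone nodes to one rack path (placeholders are illustrative)
maprcli node move -serverids <ids-in-source-zone> -topology /data/us-central2-a/rack1

# Assign destination-zone nodes to a different rack path
maprcli node move -serverids <ids-in-destination-zone> -topology /data/us-central1-a/rack1
```

With each zone mapped to its own rack, at least one replica of each container should land in the other zone.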
- Run the configure.sh script to configure new nodes. The nodes are automatically added to the MapR cluster when Warden (the service that starts and stops other services in a cluster) starts the services on the new node. You can create new instances for these nodes programmatically using the gcutil utility, the REST API, or the Google client libraries, or manually via the Google Cloud Console.
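A minimal sketch of this step, assuming gcutil instance creation and MapR's standard configure.sh; the instance name, machine type, image, and CLDB/ZooKeeper hostnames are placeholders:

```shell
# Create a new worker instance in the destination zone (names and flags illustrative)
gcutil --project=<project> addinstance mapr-maprw-010 \
  --zone=us-central1-a \
  --machine_type=n1-standard-4 \
  --image=<image>

# On the new node, point MapR at the cluster's CLDB and ZooKeeper nodes;
# Warden then starts the configured services and the node joins the cluster.
sudo /opt/mapr/server/configure.sh -C <cldb-host> -Z <zk-host1>,<zk-host2>,<zk-host3>
```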
- Migrate CLDB and JobTracker. There are several ways to implement this step. The automatic approach is recommended if persistent root and data disks have been used as this greatly simplifies the process. The manual approach may be necessary when performing custom configurations and/or moving data from scratch disks.
Automatic approach:
- a. Shut down the cluster.
- b. Use the gcutil move instances command to migrate the nodes into the destination zone. This command copies instance configurations, takes snapshots of the persistent disks, deletes the existing instances, and then recreates the instances in the destination zone. Once the new instances are up and services are running, the cluster is available to accept jobs.
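Assuming hypothetical master node names mapr-maprm1 (CLDB) and mapr-maprm2 (JobTracker), the automatic migration might look like the following sketch:

```shell
# Sketch only: moves the master nodes (names are illustrative) between zones,
# snapshotting their persistent disks and recreating them in the destination.
gcutil --project=<project> moveinstances \
  --source_zone=us-central2-a \
  --destination_zone=us-central1-a \
  "mapr-maprm1|mapr-maprm2"
```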
Manual approach:
- a. Shut down the cluster.
- b. If persistent root and/or data disks have been used, create snapshots for later use.
- c. If any data residing on scratch disk must be preserved, take either of the following actions:
- Create and attach a new persistent disk, copy the data from the scratch disk, detach the persistent disk and create a snapshot.
- Copy the data to Google Cloud Storage using gsutil, streaming through tar:
tar zcf - <src> | gsutil cp - gs://<bucket>/src.tar.gz
- d. Terminate the nodes in the source zone.
- e. If applicable, create new persistent root and data disks from snapshots created previously.
- f. Use the new persistent root and data disks created in the previous step to create instances for the nodes in the destination zone.
- g. If data has been temporarily stored in Google Cloud Storage, copy the data from Google Cloud Storage to the new instances.
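If the data was staged in Google Cloud Storage with the tar/gsutil pipeline shown earlier, it can be restored on a new instance by reversing that pipeline (the bucket name and destination path are placeholders):

```shell
# Stream the archive from Google Cloud Storage and unpack it locally
gsutil cp gs://<bucket>/src.tar.gz - | tar zxf - -C <dest>
```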
- h. Start ZooKeeper and Warden. Warden will start the master services on the new nodes. Once the new instances are up and services are running, the cluster can accept jobs.
Decommission the nodes in the source zone:
- c. Blacklist TaskTrackers and wait for the blacklist to finish.
- d. Move the node to the /offline or /decommissioned topology.
- e. When the node no longer has non-local data (the data has been “drained”), shut down the FileServer service.
- f. Remove the node from the cluster using the maprcli node remove command.
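The topology move and removal steps above correspond roughly to the following maprcli invocations (the server ID and hostname are placeholders):

```shell
# Move the node into a decommissioned topology so the CLDB re-replicates
# its data onto the remaining racks
maprcli node move -serverids <server-id> -topology /decommissioned

# Once the data has drained and the FileServer service is shut down,
# remove the node from the cluster
maprcli node remove -nodes <hostname>
```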
Zone Migration Scenario 2: Simplifying Management
This scenario shows how to perform a migration with a single command, easing migration operations at the possible expense of longer cluster downtime. The process is nearly identical to the automatic migration of the Container Location Database (CLDB) and JobTracker instances described in Scenario 1; here, gcutil moveinstances migrates the entire cluster between zones.
Use persistent disks when adding data disks to nodes for MapR-FS (MapR File System).
Migrate the cluster. Use the gcutil moveinstances command to clone all of the nodes in the cluster and their persistent disks into the destination zone. For example, consider a small cluster with a single master node (mapr-maprm) and ten worker nodes (mapr-maprw-000, …, mapr-maprw-009). A single call can move the cluster from the us-central2-a zone and restart it in us-central1-a.
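Assuming gcutil accepts a name regular expression matching the master and all ten workers, that single call might look like the following sketch (the project ID is a placeholder):

```shell
# Move the whole cluster: one master and ten workers matched by name
gcutil --project=<project> moveinstances \
  --source_zone=us-central2-a \
  --destination_zone=us-central1-a \
  "mapr-maprm|mapr-maprw-\d{3}"
```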
Zone-to-zone migration is an effective way to gracefully plan for and manage anticipated zone maintenance windows and other drivers of cluster relocation. However, it is always a good practice to account for the unexpected; catastrophic zone-wide failures are extremely rare but can occur. You can deploy a cluster across multiple zones for disaster recovery.
When you deploy a cluster across multiple zones, commission another set of FileServers and TaskTrackers in a second zone. Add a standby Container Location Database (CLDB) and JobTracker to the second zone for failover.
This pattern requires twice as many FileServers and TaskTrackers as would otherwise be deployed. It also requires substantial cross-zone communication. While this pattern results in twice the processing capability, operational (instance, storage, and network) expenses will increase, and overall performance could degrade, especially if data is accessed across zones.
MapR provides high availability for the Container Location Database (CLDB) and JobTracker.
- Distribute the cluster. FileServers and TaskTrackers are spread equally across both zones and fully commissioned. The FileServers within a zone should identify with the same rack and each zone should use a different rack identifier, ensuring that at least one copy of each data container exists in the second zone.
- Failover the CLDB. The CLDB can be restored in the second zone automatically.
- Failover the JobTracker. The JobTracker can be restored automatically in the second zone.
MapR is the only distribution with high availability at the cluster and job levels to eliminate any single points of failure across the system. MapR distributes metadata across the cluster to avoid bottlenecks and improve cluster performance.
The JobTracker in MapR has high availability. If the service crashes, a new JobTracker automatically picks up where the original JobTracker stopped, and the MapReduce job can continue without any restart or intervention. If the node itself crashes, a new node automatically takes over and continues the process.
Google Compute Engine Network DNS
One of the benefits of Google Compute Engine is the ability to address an instance on the network in any zone in any region by its user-provided hostname. Additionally, these names can be reused when instances are recycled (turning down one instance and spinning up another with the same name), which is convenient for deployments that want to rely on name-based addressing. It is also worth noting that there are no guarantees around internal IP address assignment. Instance hostnames can be reused dynamically because DNS entries are cleaned up almost immediately after an instance is terminated.
The MapR cluster uses Apache ZooKeeper to coordinate services, enabling high availability (HA) and fault tolerance for MapR clusters. Warden will not start any services unless ZooKeeper is reachable and more than half of the configured ZooKeeper nodes (a quorum) are live. Special considerations apply to DNS when deploying ZooKeeper for MapR on Google Compute Engine.
Currently, ZooKeeper resolves DNS names once at startup, which prevents replacing a member instance without restarting the remaining members of the ensemble. Given the behavior of Google Compute Engine DNS described above, any time a node running ZooKeeper is rebooted or replaced, all other ZooKeeper services must be restarted. Failure to do so leaves the existing ensemble with one less member, and the new instance will be running what amounts to an orphaned ZooKeeper instance.
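A minimal sketch of the required rolling restart, assuming the standard MapR init scripts, to be run on each surviving ZooKeeper node after the replacement instance is up:

```shell
# Restart ZooKeeper so it re-resolves the replaced member's DNS name
sudo service mapr-zookeeper restart

# Confirm the node has rejoined the ensemble (reports leader/follower status)
sudo service mapr-zookeeper qstatus
```

Restart the nodes one at a time so the ensemble never loses its quorum.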
Straddling Multiple Zones
High availability can also be considered in terms of resilience of the cluster to zone failure. This might be applicable in scenarios where any interruption of long running jobs or pipelines might be detrimental and/or incur substantial costs. Consider the scenario where zone A is scheduled for maintenance earlier than zone B. Given this knowledge, it is possible to construct a cluster spanning both zones such that it will continue to run in the event that either goes down.
- Distribute the cluster. As previously addressed, FileServers and TaskTrackers are split equally across both zones and commissioned as part of the MapR-FS cluster; multiple MapR-FS racks are used to ensure data is fully replicated.
- Distribute the ZooKeeper ensemble. The majority of ZooKeeper nodes must be deployed in zone B along with the standby CLDB and the active JobTracker.
- Manage the CLDB. If zone A becomes unavailable, the ZooKeeper ensemble still maintains a quorum and can automatically facilitate the failover and promotion of the standby CLDB to active. If instead zone B experiences an outage, the active CLDB will continue to run. The cluster, however, will not be able to sustain an active node failure because the ensemble has lost its quorum; replacement ZooKeeper nodes must be added and the ensemble restarted.
- Manage the JobTracker. The active JobTracker is deployed to zone B, so that if zone A becomes unavailable (as anticipated), any active jobs can continue running until completion.
Google Cloud Platform not only offers a high performance platform upon which to run MapR, but also provides features and tools that can assist in the maintenance of clusters across zones to keep business-critical jobs running in zone migration, disaster recovery, and high availability scenarios. Persistent disks, Google Cloud Storage, and the network infrastructure enable efficient data and instance migration; and gcutil provides a rich set of commands to help accomplish cluster management tasks.
MapR, Hive, and Pig on Google Compute Engine – a Google Cloud Platform solution covering how to take advantage of Google Compute Engine, with support from Google Cloud Storage, to run a self-managed MapR cluster with Apache Hive and Apache Pig as part of a Big Data processing solution.
Google-compute-engine-cluster-for-hadoop – a sample application to assist in setting up Hadoop compute clusters and executing MapReduce tasks. Please note that this application does not perform any of the cluster management outlined in this paper.