The MapR Distribution for Apache™ Hadoop® adds enterprise-grade features to the Hadoop platform that make Hadoop easier to use and more dependable. The MapR Distribution for Hadoop is fully integrated with the Google Compute Engine (GCE) framework, allowing customers to deploy a MapR cluster with ready access to Google’s cloud infrastructure. MapR provides network file system (NFS) and open database connectivity (ODBC) interfaces, a comprehensive management suite, and automatic compression. MapR provides high availability with a No NameNode architecture and data protection with snapshots, disaster recovery, and cross-cluster mirroring.
Before You Start: Prerequisites
These instructions assume you meet the following prerequisites:
- You have an active Google Cloud Platform account.
- You have a client machine with the gcutil client installed and in your $PATH environment variable.
- You have access to a GCE project where you can add instances.
Deploying a MapR cluster within GCE relies on the following scripts:
- launch-mapr-cluster.sh
- prepare-mapr-image.sh
- configure-mapr-instance.sh
You can download these scripts from https://github.com/mapr/gce
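Launching a MapR Cluster on GCE
To launch a cluster, run the launch-mapr-cluster.sh script. The script takes the following parameters: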
|The GCE project ID of the project where you want the cluster to be deployed. Note that the GCE project ID, the GCE project’s name, and the cluster’s name are all distinct.|
|The name of the new cluster. This is a MapR-specific property.|
|The version of the MapR distribution for Hadoop to install. The default version is 3.0.1. Other supported versions are 2.1.2, 2.1.3, and 22.214.171.124.|
|This parameter specifies the location of a configuration file that determines the allocation of cluster roles to the nodes in the cluster. See The GCE Configuration File for more information.|
|The OS image to use on the nodes. Legal values can be found through your GCE console.|
|Defines the hardware resources of the nodes in the cluster. Legal values can be constructed as n1-
|Optional: Use persistent disks if you’re not using a “-d” machine type. Specifies the number and size of persistent disks for each node in the format mxn, where m is the number of disks and n is the size of each disk in GB. For example, the value 4x128 specifies four 128 GB disks. You can specify any number and size of disks within the limits of your quota, but more than 8 disks does not provide a significant advantage in the GCE environment.|
|The GCE zone for the virtual instances. Zones include us-central1-a, us-central1-b, us-central2-a, europe-west1-a, and europe-west1-b.|
|Optional: This provides a path to a trial MapR license file.|
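The persistent-disk parameter packs two numbers into one mxn string. As an illustration only, the following Python sketch validates such a value and computes the raw per-node capacity before you pass it to launch-mapr-cluster.sh. The names parse_disk_spec, raw_capacity_gb, DiskSpec, and RECOMMENDED_MAX_DISKS are hypothetical helpers, not part of the MapR scripts, and the sketch assumes (unverified) that the script accepts an upper-case X as well.

```python
import re
from typing import NamedTuple

# Hypothetical threshold taken from the guidance that more than 8 disks
# gives no significant advantage on GCE.
RECOMMENDED_MAX_DISKS = 8


class DiskSpec(NamedTuple):
    count: int    # m: number of persistent disks per node
    size_gb: int  # n: size of each disk in GB


def parse_disk_spec(spec: str) -> DiskSpec:
    """Parse an 'mxn' string such as '4x128' into a DiskSpec."""
    match = re.fullmatch(r"(\d+)x(\d+)", spec.strip().lower())
    if match is None:
        raise ValueError(f"expected format mxn (e.g. 4x128), got {spec!r}")
    count, size_gb = int(match.group(1)), int(match.group(2))
    if count == 0 or size_gb == 0:
        raise ValueError("disk count and size must both be positive")
    return DiskSpec(count, size_gb)


def raw_capacity_gb(spec: DiskSpec) -> int:
    """Raw persistent-disk capacity of one node, before any replication."""
    return spec.count * spec.size_gb


if __name__ == "__main__":
    disks = parse_disk_spec("4x128")
    print(disks)                   # DiskSpec(count=4, size_gb=128)
    print(raw_capacity_gb(disks))  # 512
    if disks.count > RECOMMENDED_MAX_DISKS:
        print("warning: more than 8 disks is unlikely to help on GCE")
```

The check against RECOMMENDED_MAX_DISKS only warns, because larger values are still legal within your quota.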
About Ephemeral Disks: Ephemeral disks do not maintain data after the instances have been shut down for an extended period of time.
Here is an example of a fully defined launch operation:
The GCE Configuration File
The configuration file that you pass to the launch-mapr-cluster.sh script describes the allocation of cluster roles to the nodes in the cluster. The configuration file uses the following format:
Within an entry, elements are separated by a space. Each entry consists of the following elements:
- Indexed identifier for the node in the cluster
- A comma-delimited list of packages to be installed on that node
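To make the entry format concrete, here is a minimal Python sketch that reads a configuration file in this format into a mapping from node identifier to package list. It is not how launch-mapr-cluster.sh reads the file. Treat the following as assumptions: the parse_layout function, the skipping of blank lines, the rejection of duplicate node identifiers, and the node and package names in EXAMPLE, which are illustrative only and not taken from the MapR sample files.

```python
from typing import Dict, List


def parse_layout(text: str) -> Dict[str, List[str]]:
    """Map each node identifier to the list of packages assigned to it."""
    layout: Dict[str, List[str]] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:  # assumption: blank lines carry no entry
            continue
        # The format uses a space between elements; split() also tolerates
        # repeated spaces or tabs.
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {line_no}: expected '<node> <pkg,pkg,...>', got {raw!r}")
        node, packages = parts
        if node in layout:
            raise ValueError(f"line {line_no}: node {node!r} is listed twice")
        # Second element: comma-delimited package list for this node.
        layout[node] = [pkg for pkg in packages.split(",") if pkg]
    return layout


# Hypothetical two-node layout; node and package names are illustrative only.
EXAMPLE = """
node0 zookeeper,cldb,fileserver,jobtracker
node1 fileserver,tasktracker,nfs
"""

if __name__ == "__main__":
    for node, packages in parse_layout(EXAMPLE).items():
        print(node, packages)
```

Because the package list is a single comma-delimited element, a stray space after a comma would split it into extra elements; the sketch reports that as an error rather than guessing.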
Nodes in a MapR cluster can assume the following roles:
For more information about roles, see the main MapR documentation regarding planning service layout on a cluster.
Sample M3 Configuration File
This sample configuration file sets up a typical M3-licensed three-node cluster.
Sample M5 Configuration File
This sample configuration file sets up a typical M5-licensed five-node cluster to illustrate MapR’s high-availability features, such as redundant CLDB nodes, redundant JobTracker nodes, and redundant NFS servers.
Licensing: Install the M5 trial license after installing the cluster to enable the High Availability features.
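The high-availability layout depends on each critical role appearing on at least two nodes. The following Python sketch checks a node-to-packages layout for CLDB, JobTracker, and NFS redundancy. It is a hypothetical helper, not a MapR tool. The package names in HA_ROLES are assumptions, and FIVE_NODES is an invented layout, not the MapR M5 sample file. A layout that passes this check still needs the M5 license described above before the high-availability features take effect.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Assumed package names for the roles that the M5 layout makes redundant.
HA_ROLES = ("cldb", "jobtracker", "nfs")


def nodes_by_role(layout: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a node -> packages layout into a role -> nodes mapping."""
    roles: Dict[str, List[str]] = defaultdict(list)
    for node, packages in layout.items():
        for pkg in packages:
            roles[pkg].append(node)
    return dict(roles)


def roles_without_redundancy(layout: Dict[str, List[str]],
                             ha_roles: Sequence[str] = HA_ROLES) -> List[str]:
    """Return the HA roles that are assigned to fewer than two nodes."""
    roles = nodes_by_role(layout)
    return [role for role in ha_roles if len(roles.get(role, [])) < 2]


# Hypothetical five-node layout; this is NOT the MapR sample file.
FIVE_NODES = {
    "node0": ["zookeeper", "cldb", "fileserver", "nfs"],
    "node1": ["zookeeper", "cldb", "fileserver", "nfs"],
    "node2": ["zookeeper", "jobtracker", "fileserver", "tasktracker"],
    "node3": ["jobtracker", "fileserver", "tasktracker"],
    "node4": ["fileserver", "tasktracker"],
}

if __name__ == "__main__":
    print(roles_without_redundancy(FIVE_NODES))  # []
    # A single node running every HA role has no redundancy for any of them.
    print(roles_without_redundancy({"node0": ["cldb", "jobtracker", "nfs"]}))
    # ['cldb', 'jobtracker', 'nfs']
```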
For more examples of cluster designs, see the MapR documentation at: http://doc.mapr.com/display/MapR/Planning%2Bthe%2BCluster%23PlanningtheCluster-ExampleClusterDesigns
Using SSH to Access Nodes
You can use the gcutil ssh command to log in to the nodes in your cluster. Use the following command to access the node launched above.