Planning Roles

In most clusters, a small number of nodes runs a set of control services devoted to cluster management and Hadoop infrastructure:

  • CLDB
  • JobTracker
  • WebServer
  • ZooKeeper

The remainder of the nodes are devoted to services related to data processing and storage:

  • FileServer
  • TaskTracker

Supplementary services can run on many or few nodes, depending on how the cluster is to be used. Examples:

  • NFS
  • HBase

The following table provides general guidelines for the number of instances of each service to run in a cluster:

Service              Package                    How Many
CLDB                 mapr-cldb                  1-3
FileServer           mapr-fileserver            Most or all nodes
HBase Master         mapr-hbase-master          1-3
HBase RegionServer   mapr-hbase-regionserver    Varies
JobTracker           mapr-jobtracker            1-3
NFS                  mapr-nfs                   Varies
TaskTracker          mapr-tasktracker           Most or all nodes
WebServer            mapr-webserver             One or more
ZooKeeper            mapr-zookeeper             1, 3, 5, or a higher odd number
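
The guidelines in the table can be turned into a quick sanity check on a planned layout. The following Python sketch is illustrative only and not part of MapR: the check_plan function and the shape of the plan dictionary (node name mapped to a set of package names) are hypothetical. It assumes that HBase Master may be absent because HBase is a supplementary service, and it does not check the non-numeric guidelines ("Most or all nodes", "Varies").

```python
# Numeric guidelines from the table, as (minimum, maximum) instance counts.
# None means no fixed upper bound.
GUIDELINES = {
    "mapr-cldb": (1, 3),
    "mapr-hbase-master": (0, 3),   # assumption: zero is fine if HBase is unused
    "mapr-jobtracker": (1, 3),
    "mapr-webserver": (1, None),
    "mapr-zookeeper": (1, None),
}


def check_plan(plan):
    """Return a list of warnings for a plan of {node name: set of packages}."""
    # Count how many nodes run each package.
    counts = {}
    for packages in plan.values():
        for pkg in packages:
            counts[pkg] = counts.get(pkg, 0) + 1

    warnings = []
    for pkg, (low, high) in GUIDELINES.items():
        n = counts.get(pkg, 0)
        if n < low or (high is not None and n > high):
            warnings.append(f"{pkg}: {n} instance(s), outside guideline")

    # ZooKeeper should run on an odd number of nodes.
    zk = counts.get("mapr-zookeeper", 0)
    if zk and zk % 2 == 0:
        warnings.append(f"mapr-zookeeper: {zk} instances, expected an odd number")
    return warnings
```

An empty list means the plan falls within the numeric guidelines; it says nothing about whether the layout suits a particular workload.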

Sample Configurations

The following sections describe a few typical ways to deploy a MapR cluster.

Small M3 Cluster

A small M3 cluster runs most control services on only one node (except for ZooKeeper, which runs on three) and data services on the remaining nodes. The M3 license does not permit failover or high availability, and only allows one running CLDB.
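
As a rough illustration, the layout described above could be generated as follows. This is a sketch under stated assumptions, not a MapR tool: the small_m3_layout function is hypothetical, the first node in the list is arbitrarily chosen as the control node, and the two extra ZooKeeper nodes are assumed to run data services as well.

```python
CONTROL = ["mapr-cldb", "mapr-jobtracker", "mapr-webserver"]
DATA = ["mapr-fileserver", "mapr-tasktracker"]


def small_m3_layout(nodes):
    """Map each node name to the list of packages planned for it."""
    if len(nodes) < 3:
        raise ValueError("need at least three nodes for three ZooKeeper instances")

    layout = {node: [] for node in nodes}

    # One node carries the control services; M3 allows only one CLDB.
    layout[nodes[0]].extend(CONTROL)

    # ZooKeeper is the exception: it runs on three nodes.
    for node in nodes[:3]:
        layout[node].append("mapr-zookeeper")

    # Assumption: every node except the control node runs data services.
    for node in nodes[1:]:
        layout[node].extend(DATA)
    return layout
```

For example, small_m3_layout(["node1", "node2", "node3", "node4", "node5"]) places CLDB, JobTracker, and WebServer on node1 only, ZooKeeper on node1 through node3, and FileServer and TaskTracker on node2 through node5.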

Small M5 Cluster

A small M5 cluster runs control services on three nodes and data services on the remaining nodes, providing failover and high availability for all critical services.
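
One way to check a planned M5 layout for failover coverage is to look for control services that run on only one node. The sketch below is hypothetical: the function name is invented for illustration, and treating the four control services as the critical ones, with two instances as the minimum for failover, is an assumption rather than a MapR rule.

```python
# Assumption: these are the services that need more than one instance.
CONTROL_SERVICES = ["mapr-cldb", "mapr-jobtracker", "mapr-webserver", "mapr-zookeeper"]


def single_points_of_failure(layout):
    """Return the control services that run on fewer than two nodes.

    layout maps node name -> collection of package names.
    """
    found = []
    for service in CONTROL_SERVICES:
        hosts = [node for node, packages in layout.items() if service in packages]
        if len(hosts) < 2:
            found.append(service)
    return found
```

A layout that puts all four control services on each of three nodes returns an empty list, while a layout with a single CLDB returns ["mapr-cldb"].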

Larger M5 Cluster

In a large cluster (more than 100 nodes), run CLDB on nodes that do not also run TaskTracker or NFS.

In large clusters, you should not run TaskTracker and ZooKeeper together on any nodes.
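
The two placement rules for large clusters can be checked mechanically. The following Python sketch is illustrative only; the isolation_violations function and the layout format (node name mapped to a collection of package names) are hypothetical.

```python
def isolation_violations(layout):
    """Return (node, reason) pairs for nodes that break the large-cluster rules."""
    violations = []
    for node, packages in sorted(layout.items()):
        packages = set(packages)

        # CLDB nodes should be kept apart from TaskTracker and NFS.
        if "mapr-cldb" in packages and packages & {"mapr-tasktracker", "mapr-nfs"}:
            violations.append((node, "CLDB shares a node with TaskTracker or NFS"))

        # TaskTracker and ZooKeeper should not run on the same node.
        if {"mapr-tasktracker", "mapr-zookeeper"} <= packages:
            violations.append((node, "TaskTracker and ZooKeeper share a node"))
    return violations
```

An empty result means no node in the plan combines the services that should be kept separate.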

Example

[Figure not available: RackWorksheetLarger.png]
