Requirements

Before setting up a MapR cluster, ensure that every node satisfies the following hardware and software requirements, and consider which MapR license will provide the features you need.

If you are setting up a large cluster, it is a good idea to use a configuration management tool such as Puppet or Chef, or a parallel ssh tool, to facilitate the installation of MapR packages across all the nodes in the cluster. The following sections provide details about the prerequisites for setting up the cluster.

Node Hardware

Minimum requirements:

  • 64-bit processor
  • 4G DRAM
  • 1 network interface
  • At least one free unmounted drive or partition, 100 GB or more
  • At least 10 GB of free space on the operating system partition
  • Twice as much swap space as RAM (if this is not possible, see Memory Overcommit)

Recommended:

  • 64-bit processor with 8-12 cores
  • 32G DRAM or more
  • 2 GigE network interfaces
  • 3-12 disks of 1-3 TB each
  • At least 20 GB of free space on the operating system partition
  • 32 GB swap space or more (see also: Memory Overcommit)
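
Some of the minimums above can be checked from a shell on each node. The following is a minimal sketch, not a MapR tool: swap_ok and os_space_ok are hypothetical helper names, the operating system partition is assumed to be mounted at /, and 1 GB is assumed to mean 1024*1024 kB.

```shell
#!/bin/sh
# Sketch only: swap_ok and os_space_ok are hypothetical helpers, not MapR tools.

# Succeeds when swap (kB, second argument) is at least twice RAM (kB, first argument).
swap_ok() {
    [ "$2" -ge $(( $1 * 2 )) ]
}

# Succeeds when free space (kB) is at least 10 GB, counting 1 GB as 1024*1024 kB.
os_space_ok() {
    [ "$1" -ge $(( 10 * 1024 * 1024 )) ]
}

# Read the live values where /proc/meminfo is available (Linux).
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    # Assumption: the operating system partition is mounted at /.
    root_free_kb=$(df -Pk / | awk 'NR==2 {print $4}')

    swap_ok "$mem_kb" "$swap_kb" || echo "swap is less than twice RAM"
    os_space_ok "$root_free_kb" || echo "less than 10 GB free on /"
fi
```

MemTotal is slightly lower than the installed RAM because the kernel reserves some memory, so the swap comparison errs on the lenient side.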

In practice, it is useful to have 12 or more disks per node, not only for greater total storage but also to provide more storage pools. If you anticipate many large reduce operations, you will need more network bandwidth relative to disk I/O speed. MapR can detect multiple NICs with multiple IP addresses on each node and manage network throughput accordingly to maximize bandwidth. In general, the more network bandwidth you can provide, the faster jobs will run on the cluster. When designing a cluster for heavy CPU workloads, the processor on each node is more important than network bandwidth and available disk space.

Disks

Set up at least three unmounted drives or partitions, separate from the operating system drives or partitions, for use by MapR-FS. For information on setting up disks for MapR-FS, see Setting Up Disks for MapR. If you do not have disks available for MapR, or to test with a small installation, you can use a flat file instead.

It is not necessary to set up RAID on disks used by MapR-FS. MapR uses a script called disksetup to set up storage pools. In most cases, you should let MapR calculate storage pools using the default stripe width of two or three disks. If you anticipate a high volume of random-access I/O, you can use the -W option with disksetup to specify larger storage pools of up to 8 disks each.

You can set up RAID on the operating system partition(s) or drive(s) at installation time to provide higher operating system performance (RAID 0), disk mirroring for failover (RAID 1), or both (RAID 10). For instructions, see the documentation for your operating system.

Software

Install a compatible 64-bit operating system on all nodes. MapR currently supports the following operating systems:

  • 64-bit CentOS 5.4 or greater
  • 64-bit Red Hat 5.4 or greater
  • 64-bit Ubuntu 9.04 or greater

Each node must also have the following software installed:

  • Java JDK version 1.6.0_24 (not JRE)

    If Java is already installed, check which versions of Java are installed: java -version
    If JDK 6 is installed, the output will include a version number starting with 1.6, and then below that the text Java(TM). Example:

    java version "1.6.0_24"
    Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
    

    Use update-alternatives to make sure JDK 6 is the default Java: sudo update-alternatives --config java
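
The Java check above can be scripted so that it runs the same way on every node. This is a minimal sketch; is_java6 is a hypothetical helper name, and the javac test is only a heuristic for telling a JDK from a JRE.

```shell
#!/bin/sh
# Sketch only: is_java6 is a hypothetical helper name.

# Succeeds when the given "java -version" output reports a 1.6 release.
is_java6() {
    printf '%s\n' "$1" | grep -q 'version "1\.6\.'
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so merge it into stdout before capturing.
    out=$(java -version 2>&1 || true)
    if is_java6 "$out"; then
        echo "default Java is 1.6"
    else
        echo "default Java is not 1.6"
    fi
else
    echo "java not found on PATH"
fi

# javac ships with the JDK but not the JRE, so its absence suggests a JRE-only install.
command -v javac >/dev/null 2>&1 || echo "javac not found; this may be a JRE, not a JDK"
```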

Configuration

Each node must be configured as follows:

  • Unique hostname
  • SELinux disabled
  • Able to perform forward and reverse host name resolution with every other node in the cluster
  • Administrative user - a Linux user chosen to have administrative privileges on the cluster
    • Make sure the user has a password (using sudo passwd <user> for example)
  • Make sure the limit on the number of processes (NPROC_RLIMIT) is not set too low for the root user; the value should be at least 32768. In Red Hat or CentOS, the default may be very low (1024, for example). In Ubuntu, there may be no default; set this value only if you see errors related to an inability to create new threads.
    To set the value, add the following line to the appropriate configuration for your version of Linux:
    * soft nproc 32768
    • In Red Hat or CentOS, set the value in /etc/security/limits.d/90-nproc.conf
    • In Ubuntu, if needed, set the value in /etc/security/limits.conf
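
To see whether the process limit already meets the value given above, you can compare the current limit against 32768. The following sketch assumes bash (ulimit -u is a bash option), and nproc_ok is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: nproc_ok is a hypothetical helper name.

# Succeeds when the given limit is "unlimited" or a number of at least 32768.
nproc_ok() {
    case "$1" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input
        *) [ "$1" -ge 32768 ] ;;
    esac
}

# ulimit -u reports the soft per-user process limit in bash; run this as
# the user whose limit you want to inspect (root, per the text above).
current=$(ulimit -u 2>/dev/null || echo unknown)
if nproc_ok "$current"; then
    echo "nproc limit is sufficient: $current"
else
    echo "nproc limit is too low or could not be read: $current"
fi
```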

In VM environments like EC2, VMware, and Xen, when running Ubuntu 10.10, problems can occur due to an Ubuntu bug unless the IRQ balancer is turned off. On all nodes, edit the file /etc/default/irqbalance and set ENABLED=0 to turn off the IRQ balancer (requires reboot to take effect).
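
The edit to /etc/default/irqbalance can be scripted when many nodes are affected. This is a minimal sketch; disable_irqbalance is a hypothetical helper name, and it assumes the file uses a line that starts with ENABLED=.

```shell
#!/bin/sh
# Sketch only: disable_irqbalance is a hypothetical helper name.

# Rewrite the ENABLED= line of the given file to ENABLED=0, keeping a .bak copy.
disable_irqbalance() {
    conf=$1
    cp "$conf" "$conf.bak"
    sed 's/^ENABLED=.*/ENABLED=0/' "$conf.bak" > "$conf"
    # If the file had no ENABLED= line at all, append one.
    grep -q '^ENABLED=' "$conf" || echo 'ENABLED=0' >> "$conf"
}

# The function is not called here because it modifies a system file.
# As root, on each affected node:
#   disable_irqbalance /etc/default/irqbalance
```

As stated above, the change takes effect only after a reboot.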

Keyless SSH

Keyless (passwordless) SSH should be set up between all nodes in the cluster; MapR uses keyless SSH for centralized management of disks (via the disk commands), support utilities, and rolling upgrades.
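
One common way to set up keyless SSH uses the standard OpenSSH tools ssh-keygen and ssh-copy-id. The sketch below only prints the commands so that you can review them before running anything. The user (root) and the node names are hypothetical placeholders, and ssh_copy_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the helper name, the user, and the node names are hypothetical.

# Print one ssh-copy-id command per host for the given user.
ssh_copy_commands() {
    user=$1
    shift
    for host in "$@"; do
        echo "ssh-copy-id $user@$host"
    done
}

# Step 1 (on each node): create a key pair with no passphrase.
echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"

# Step 2 (on each node): copy the public key to every other node.
ssh_copy_commands root node1 node2 node3

# Step 3: confirm that login works without a password prompt.
echo "ssh -o BatchMode=yes root@node1 true"
```

BatchMode=yes makes ssh fail instead of prompting for a password, which makes the final check suitable for scripts.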

If you choose not to provide keyless SSH, everything in the cluster will run fine. The only inconvenience is that you will be unable to use the above features remotely; however, you can accomplish the same tasks locally on each node as follows:

  • Use the disk commands on each node to manage its own disks.
  • Use the support utility mapr-support-collect.sh with the -O or --online option, to use the warden instead of SSH for support dump collection from nodes.
  • Upgrade the cluster manually instead of performing a rolling upgrade.

NTP

To keep all cluster nodes time-synchronized, MapR requires NTP to be configured and running on every node. If server clocks in the cluster drift out of sync, serious problems will occur with HBase and other MapR services. MapR raises a Time Skew alarm on any out-of-sync nodes. See http://www.ntp.org/ for more information about obtaining and installing NTP. In the event that a large adjustment must be made to the time on a particular node, you should stop ZooKeeper on the node, then adjust the time, then restart ZooKeeper.
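
One way to confirm that NTP is running and synchronized on a node is to look for a selected system peer in the output of ntpq -pn, where that peer's line starts with an asterisk. This is a minimal sketch that assumes the reference ntpd and its ntpq utility are installed; has_sync_peer is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: has_sync_peer is a hypothetical helper name.

# Succeeds when the given ntpq peer listing contains a line that starts
# with "*", which marks the peer the daemon is currently synchronized to.
has_sync_peer() {
    printf '%s\n' "$1" | grep -q '^\*'
}

if command -v ntpq >/dev/null 2>&1; then
    peers=$(ntpq -pn 2>/dev/null || true)
    if has_sync_peer "$peers"; then
        echo "NTP is synchronized to a peer"
    else
        echo "NTP has no selected peer (not running, or not yet synchronized)"
    fi
else
    echo "ntpq not found; NTP may not be installed"
fi
```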

DNS Resolution

For MapR to work properly, all nodes on the cluster must be able to communicate with each other. Each node must have a unique hostname, and must be able to resolve all other hosts with both normal and reverse DNS name lookup.

You can use the hostname command on each node to check the hostname. Example:

$ hostname -f
swarm

If the command returns a hostname, you can use the getent command to check whether the hostname exists in the hosts database. The getent command should return a valid IP address on the local network, associated with a fully-qualified domain name for the host. Example:

$ getent hosts `hostname`
10.250.1.53     swarm.corp.example.com

If you do not get the expected output from the hostname command or the getent command, correct the host and DNS settings on the node. A common problem is an incorrect loopback entry (127.0.x.x), which prevents the correct IP address from being assigned to the hostname.


Pay special attention to the format of /etc/hosts. For more information, see the hosts(5) man page. Example:

127.0.0.1       localhost
10.10.5.10     mapr-hadoopn.maprtech.prv mapr-hadoopn
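
The loopback problem described above can be detected by scanning /etc/hosts for the node's hostname on a 127.x.x.x line. The following is a minimal sketch; loopback_conflict is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: loopback_conflict is a hypothetical helper name.

# Succeeds when hostname $2 appears on a 127.x.x.x line of hosts file $1.
loopback_conflict() {
    awk -v h="$2" '
        { sub(/#.*/, "") }      # drop comments before matching
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) if ($i == h) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

name=$(hostname 2>/dev/null || echo "")
if [ -n "$name" ] && [ -r /etc/hosts ]; then
    if loopback_conflict /etc/hosts "$name"; then
        echo "warning: $name is mapped to a loopback address in /etc/hosts"
    else
        echo "no loopback entry found for $name"
    fi
fi
```

The check looks only at /etc/hosts; it does not verify forward or reverse DNS lookups, which still need the getent check shown earlier.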

Users and Groups

MapR uses each node's native operating system configuration to authenticate users and groups for access to the cluster. If you are deploying a large cluster, you should consider configuring all nodes to use LDAP or another user management system. You can use the MapR Control System to give specific permissions to particular users and groups. For more information, see Managing Permissions. Each user can be restricted to a specific amount of disk usage. For more information, see Managing Quotas.

All nodes in the cluster must have the same set of users and groups, with the same uid and gid numbers on all nodes:

  • When adding a user to a cluster node, specify the --uid option with the useradd command to guarantee that the user has the same uid on all machines.
  • When adding a group to a cluster node, specify the --gid option with the groupadd command to guarantee that the group has the same gid on all machines.
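
The two commands above can be wrapped so that every node receives identical IDs. This is a minimal sketch: the user name, group name, and ID values are hypothetical, add_cluster_user is a hypothetical helper name, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: names and ID values below are hypothetical placeholders.

# Dry run by default: commands are printed, not executed.
# Set RUN=sudo (or RUN= when already root) to execute them.
RUN=${RUN-echo}

# add_cluster_user NAME UID GROUP GID
# Creates the group with a fixed gid, then the user with a fixed uid.
add_cluster_user() {
    $RUN groupadd --gid "$4" "$3"
    $RUN useradd --uid "$2" --gid "$4" "$1"
}

add_cluster_user clusteradmin 5000 clusteradmin 5000
```

Running the same script with the same arguments on every node gives the user and group the same uid and gid everywhere, provided those IDs are not already taken on any node.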

Choose a specific user to be the administrative user for the cluster. By default, MapR gives the user root full administrative permissions. If the nodes do not have an explicit root login (as is sometimes the case with Ubuntu, for example), you can give full permissions to the chosen administrative user after deployment. See Cluster Configuration.

On the node where you plan to run the mapr-webserver (the MapR Control System), install Pluggable Authentication Modules (PAM). See PAM Configuration.

Network Ports

The following table lists the network ports that must be open for use by MapR.

Service                 Port
SSH                     22
NFS                     2049
MFS server              5660
ZooKeeper               5181
CLDB web port           7221
CLDB                    7222
Web UI HTTP             8080 (set by user)
Web UI HTTPS            8443 (set by user)
JobTracker              9001
NFS monitor (for HA)    9997
NFS management          9998
JobTracker web          50030
TaskTracker web         50060
HBase Master            60000
LDAP                    Set by user
SMTP                    Set by user
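
Once services are running, you can probe the ports in the table from another node. The sketch below assumes a netcat (nc) that supports the -z and -w options; check_ports is a hypothetical helper name, and the list uses the default values for the two ports marked "set by user". A port reported as closed may simply have no service listening yet, so the result is most meaningful after installation.

```shell
#!/bin/sh
# Sketch only: check_ports is a hypothetical helper name.

# Fixed ports from the table above (8080 and 8443 are the defaults).
MAPR_PORTS="22 2049 5660 5181 7221 7222 8080 8443 9001 9997 9998 50030 50060 60000"

# Print one line per port saying whether a TCP connection to HOST succeeds.
check_ports() {
    host=$1
    for port in $MAPR_PORTS; do
        if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
            echo "$port open"
        else
            echo "$port closed or filtered"
        fi
    done
}

# Example, with a hypothetical hostname:
#   check_ports node1.example.com
```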

The MapR UI runs on Apache. By default, installation does not close port 80 (even though the MapR Control System is available over HTTPS on port 8443). If this would present a security risk to your datacenter, you should close port 80 manually on any nodes running the MapR Control System.

Licensing

Before installing MapR, consider the capabilities you will need and make sure you have obtained the corresponding license. If you need NFS, data protection with snapshots and mirroring, or plan to set up a cluster with high availability (HA), you will need an M5 license. You can obtain and install a license through the License Manager after installation. For more information about which features are included in each license type, see MapR Editions.

If you are installing a new cluster, make sure to install the latest version of MapR software. If you are applying a new license to an existing MapR cluster, upgrade to the latest version of MapR first. If you are not sure which version is installed, check the contents of the file MapRBuildVersion in the /opt/mapr directory. If the version is 1.0.0 and includes GA, you must upgrade before applying a license. Example:
# cat /opt/mapr/MapRBuildVersion 
1.0.0.10178GA-0v
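
The version rule above (1.0.0 and GA means upgrade first) can be expressed as a small check. This is a minimal sketch; must_upgrade is a hypothetical helper name, and the pattern assumes the version string has the same shape as the example.

```shell
#!/bin/sh
# Sketch only: must_upgrade is a hypothetical helper name.

# Succeeds when the version string starts with 1.0.0 and contains GA.
must_upgrade() {
    case "$1" in
        1.0.0*GA*) return 0 ;;
        *) return 1 ;;
    esac
}

version_file=/opt/mapr/MapRBuildVersion
if [ -r "$version_file" ]; then
    version=$(cat "$version_file")
    if must_upgrade "$version"; then
        echo "version $version: upgrade before applying a license"
    else
        echo "version $version: no upgrade required by this rule"
    fi
else
    echo "$version_file not found; MapR may not be installed on this node"
fi
```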

For information about upgrading the cluster, see Cluster Upgrade.
