M3 - RHEL or CentOS


Use the following steps to install a simple MapR cluster of up to 100 nodes with a basic set of services. To build a larger cluster, or to build a cluster that includes additional services (such as Hive, Pig, Flume, or Oozie), see the Installation Guide. To add services to nodes on a running cluster, see Reconfiguring a Node. To get the most out of this tutorial, and to enable NFS, be sure to register for your free M3 license after installing the MapR software.

Setup

Follow these instructions to install a small MapR cluster (3-100 nodes) on machines that meet the following requirements:

  • 64-bit Red Hat 5.4 or greater, or 64-bit CentOS 5.4 or greater
  • RAM: 4 GB or more
  • At least one free unmounted drive or partition, 50 GB or more
  • At least 10 GB of free space on the operating system partition
  • Java JDK version 1.6.0_24 (not JRE)
  • The root password, or sudo privileges
  • A Linux user chosen to have administrative privileges on the cluster
    • Make sure the user has a password (using sudo passwd <user> for example)
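To spot-check a node against these requirements, you could run a small script like the following. This is a minimal sketch, not a MapR tool. The function names `at_least_gb` and `check_node` are hypothetical. It assumes the operating system partition is the one mounted at `/`, and it treats the presence of `javac` as evidence that a JDK (not only a JRE) is installed. It does not check the exact Java version or the free data disk.

```shell
#!/bin/bash
# Hypothetical pre-install check; run it on each node.

# Succeeds if a size in kilobytes is at least the given number of gigabytes.
at_least_gb() {
  local kb=$1 gb=$2
  [ "$kb" -ge $(( gb * 1024 * 1024 )) ]
}

check_node() {
  local arch mem_kb root_free_kb
  arch=$(uname -m)
  # MemTotal is reported in kB and is slightly below the installed RAM,
  # so a machine with exactly 4 GB may show a borderline FAIL here.
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # Available space (kB) on the filesystem mounted at / (assumed OS partition).
  root_free_kb=$(df -Pk / | awk 'NR == 2 {print $4}')

  if [ "$arch" = "x86_64" ]; then
    echo "PASS architecture: $arch"
  else
    echo "FAIL architecture: $arch (64-bit x86_64 required)"
  fi

  if at_least_gb "$mem_kb" 4; then
    echo "PASS RAM: $mem_kb kB"
  else
    echo "FAIL RAM: $mem_kb kB (4 GB or more required)"
  fi

  if at_least_gb "$root_free_kb" 10; then
    echo "PASS free space on /: $root_free_kb kB"
  else
    echo "FAIL free space on /: $root_free_kb kB (10 GB or more required)"
  fi

  # javac ships with the JDK but not with the JRE.
  if command -v javac >/dev/null 2>&1; then
    echo "PASS JDK: $(javac -version 2>&1)"
  else
    echo "FAIL JDK: javac not found (a JRE alone is not enough)"
  fi
}

check_node
```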

Each node must have a unique hostname, and keyless (key-based, passwordless) SSH access to all other nodes.
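One way to verify both conditions is to ask every node for its hostname over SSH. The sketch below is a hypothetical helper: the function names are invented. It relies on the OpenSSH option `BatchMode=yes`, which makes `ssh` fail rather than prompt for a password, so any failure points to missing keys.

```shell
#!/bin/bash
# Hypothetical helper: print each hostname that appears more than once.
find_duplicate_hostnames() {
  printf '%s\n' "$@" | sort | uniq -d
}

# Hypothetical helper: check keyless SSH to each node given as an argument
# and make sure the hostnames the nodes report are unique.
check_ssh_and_hostnames() {
  local node name dups rc=0
  local names=()
  for node in "$@"; do
    # BatchMode=yes disables password prompts, so this fails without keys.
    if name=$(ssh -o BatchMode=yes "$node" hostname); then
      names+=("$name")
    else
      echo "FAIL keyless SSH to $node" >&2
      rc=1
    fi
  done
  dups=$(find_duplicate_hostnames "${names[@]}")
  if [ -n "$dups" ]; then
    echo "FAIL duplicate hostnames: $dups" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, run `check_ssh_and_hostnames <node 1> <node 2> <node 3>` from each node in turn, passing the addresses of all nodes.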

This procedure assumes you have free, unmounted physical partitions or hard disks for use by MapR. If you are not sure, please read Setting Up Disks for MapR.

  • Create a text file /tmp/disks.txt listing disks and partitions for use by MapR. Each line lists a single disk, or partitions on a single disk. Example:
    /dev/sdb
    /dev/sdc1 /dev/sdc2 /dev/sdc4
    /dev/sdd
    

    Later, when you run disksetup to format the disks, specify the disks and partitions file. Example:

    disksetup -F /tmp/disks.txt
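If you prefer to create the file from the shell, a here-document works. The check that follows is a hypothetical safeguard, not part of MapR. It warns about listed devices that are missing on this node, or that appear in `/proc/mounts` either directly or through a numbered partition. It does not look at swap or LVM, so still review the list by hand before running disksetup, which formats the listed devices.

```shell
#!/bin/bash
# Write the example disk list from above (adjust device names to your hardware).
cat > /tmp/disks.txt <<'EOF'
/dev/sdb
/dev/sdc1 /dev/sdc2 /dev/sdc4
/dev/sdd
EOF

# Warn about any listed device that does not exist or looks mounted.
while read -r line; do
  for dev in $line; do
    if [ ! -b "$dev" ]; then
      echo "WARN $dev is not a block device on this node"
    elif grep -Eq "^${dev}[0-9]* " /proc/mounts; then
      echo "WARN $dev (or a partition of it) is mounted"
    fi
  done
done < /tmp/disks.txt
```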

For the steps that follow, make the following substitutions:

  • <user> - the chosen administrative username
  • <node 1>, <node 2>, <node 3>... - the IP addresses of nodes 1, 2, 3 ...
  • <proxy user>, <proxy password>, <host>, <port> - proxy server credentials and settings
If you are installing a MapR cluster on nodes that are not connected to the Internet, contact MapR for assistance. If you are installing a cluster larger than 100 nodes, see the Installation Guide. In particular, CLDB nodes on large clusters should not run any other service (see Isolating CLDB Nodes).

Deployment

  1. Change to the root user (or use sudo for the following commands).
  2. On all nodes, create a text file called maprtech.repo in the directory /etc/yum.repos.d/ with the following contents:
    [maprtech]
    name=MapR Technologies
    baseurl=http://package.mapr.com/releases/v1.2.3/redhat/
    enabled=1
    gpgcheck=0
    protect=1
    To install a previous release, see the Release Notes for the correct path to use in the baseurl parameter.
  3. If your connection to the Internet is through a proxy server, you must set the http_proxy environment variable before installation:
    http_proxy=http://<host>:<port>
    export http_proxy
    

    If you don't have Internet connectivity, do one of the following:

  4. On node 1, execute the following command:
    yum install mapr-cldb mapr-fileserver mapr-jobtracker mapr-nfs mapr-tasktracker mapr-webserver mapr-zookeeper
  5. On nodes 2 and 3, execute the following command:
    yum install mapr-fileserver mapr-tasktracker mapr-zookeeper
  6. On all other nodes (nodes 4...n), execute the following command:
    yum install mapr-fileserver mapr-tasktracker
  7. On all nodes, execute the following commands:
    /opt/mapr/server/configure.sh -C <node 1> -Z <node 1>,<node 2>,<node 3>
    /opt/mapr/server/disksetup -F /tmp/disks.txt
    
  8. On nodes 1, 2, and 3, execute the following command:
    /etc/init.d/mapr-zookeeper start
  9. On node 1, execute the following command:
    /etc/init.d/mapr-warden start

  10. On node 1, give full permission to the chosen administrative user using the following command:
    /opt/mapr/bin/maprcli acl edit -type cluster -user <user>:fc

  11. On a machine that is connected to the cluster and to the Internet, perform the following steps to install the license:
    • In a browser, view the MapR Control System by navigating to the node that is running the WebServer:
      https://<node 1>:8443
      The web server does not have a trusted HTTPS certificate yet, so the browser will warn you that the connection is not trustworthy. You can ignore the warning this time.
    • The first time MapR starts, you must accept the agreement and choose whether to enable the MapR Dial Home service.
    • Log in to the MapR Control System as the administrative user you designated earlier.
    • In the navigation pane of the MapR Control System, expand the System Settings group and click MapR Licenses to display the MapR License Management dialog.
    • Click Add Licenses via Web.
    • If the cluster is already registered, the license is applied automatically. Otherwise, click OK to register the cluster on MapR.com and follow the instructions there.
      • If the cluster is not yet registered, the message "Cluster not found" appears and the browser is redirected to a registration page.
      • On the registration page, create an account and log in.
      • On the Register Cluster page, choose M3 and click Register.
      • When the message "Cluster Registered" appears, click Return to your MapR Cluster UI.
  12. On node 1, execute the following command:
    /opt/mapr/bin/maprcli node services -nodes <node 1> -nfs start
  13. On all other nodes (nodes 2...n), execute the following command:
    /etc/init.d/mapr-warden start
  14. Log in to the MapR Control System.
  15. Under the Cluster group in the left pane, click Dashboard.
  16. Check the Services pane and make sure each service is running the correct number of instances:
    • One instance each of FileServer and TaskTracker on every node
    • 3 instances of ZooKeeper
    • 1 instance of the CLDB, JobTracker, NFS, and WebServer
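When repeating steps 2 and 4 through 7 on many nodes, it can help to script the per-node differences. The helpers below are a hypothetical sketch. The names `write_mapr_repo`, `mapr_packages_for_node`, and `mapr_configure_cmd` are invented, and the helpers only write the repository file or print the package list and configure.sh command that the steps above specify for a given node number.

```shell
#!/bin/bash
# Step 2: write maprtech.repo into a directory (normally /etc/yum.repos.d).
write_mapr_repo() {
  local dir=${1:-/etc/yum.repos.d}
  cat > "$dir/maprtech.repo" <<'EOF'
[maprtech]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/v1.2.3/redhat/
enabled=1
gpgcheck=0
protect=1
EOF
}

# Steps 4-6: print the packages for node number N (1-based).
mapr_packages_for_node() {
  local n=$1
  if [ "$n" -eq 1 ]; then
    echo "mapr-cldb mapr-fileserver mapr-jobtracker mapr-nfs mapr-tasktracker mapr-webserver mapr-zookeeper"
  elif [ "$n" -le 3 ]; then
    echo "mapr-fileserver mapr-tasktracker mapr-zookeeper"
  else
    echo "mapr-fileserver mapr-tasktracker"
  fi
}

# Step 7: print the configure.sh command, given the IP addresses of nodes
# 1, 2 and 3 (node 1 runs the CLDB; nodes 1-3 run ZooKeeper).
mapr_configure_cmd() {
  echo "/opt/mapr/server/configure.sh -C $1 -Z $1,$2,$3"
}
```

For example, on node 2 you could run `yum install $(mapr_packages_for_node 2)` as root.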

Next Steps

Start Working with Volumes

MapR provides volumes as a way to organize data into groups, so that you can manage your data and apply policy all at once instead of file by file. Think of a volume as being similar to a huge hard drive: it can be mounted or unmounted, belong to a specific department or user, and have permissions set as a whole or on any directory or file within it. In this section, you will create a volume that you can use in later parts of the tutorial.

Create a volume:

  1. In the Navigation pane, click Volumes in the MapR-FS group.
  2. Click the New Volume button to display the New Volume dialog.
  3. For the Volume Type, select Standard Volume.
  4. Type the name MyVolume in the Volume Name field.
  5. Type the path /myvolume in the Mount Path field.
  6. Scroll to the bottom and click OK to create the volume.
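The same volume can probably also be created from the command line on a cluster node. This sketch assumes the `maprcli volume create` command with its `-name` and `-path` options. It also assumes that the default volume type matches the dialog's Standard Volume. Confirm both against the maprcli reference for your release. The `create_volume` wrapper and the `MAPRCLI` override are hypothetical, and exist only so the command can be printed instead of run.

```shell
#!/bin/bash
# Hypothetical wrapper for the New Volume dialog steps above.
# Set MAPRCLI=echo to print the command instead of running it.
create_volume() {
  local name=$1 path=$2
  "${MAPRCLI:-/opt/mapr/bin/maprcli}" volume create -name "$name" -path "$path"
}
```

For example, `MAPRCLI=echo create_volume MyVolume /myvolume` prints the command without running it. Drop the `MAPRCLI=echo` prefix to run it.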

Mount the Cluster via NFS

With MapR, you can export and mount the Hadoop cluster as a read/write volume via NFS, either from node 1 (where you started the NFS service) or from a different machine.

  • If you are mounting from node 1, replace <host> in the steps below with localhost.
  • If you are mounting from a different machine, make sure node 1 is reachable over the network and replace <host> in the steps below with the hostname or IP address of node 1.

Try the following steps to see how it works:

  1. Change to the root user (or use sudo for the following commands).
  2. See what is exported from node 1:
    showmount -e <host>
  3. Set up a mount point for the NFS share:
     mkdir /mapr
  4. Mount the cluster via NFS:
     mount <host>:/mapr /mapr
  5. To see the cluster, list the /mapr directory:
    # ls /mapr
    my.cluster.com
    
  6. List the cluster itself, and notice that the volume you created is there:
    # ls -l /mapr/my.cluster.com
    Found 4 items
    drwxrwxrwx   - root root                   0 2011-11-22 12:44 /myvolume
    drwxr-xr-x   - mapr mapr                   0 2011-01-03 13:50 /tmp
    drwxr-xr-x   - mapr mapr                   0 2011-01-04 13:57 /user
    drwxr-xr-x   - root root                   0 2010-11-25 09:41 /var
    
  7. Try creating a directory in your new volume via NFS:
    mkdir /mapr/my.cluster.com/myvolume/foo
  8. List the contents of /myvolume:
    hadoop fs -ls /myvolume
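Steps 2 through 4 can be collected into a small script. This is a hypothetical sketch: `run`, `mount_mapr_nfs`, and the `DRY_RUN` variable are invented for illustration. It uses only `showmount`, `mkdir`, `mountpoint`, and `mount`, and it skips the mount if something is already mounted at the mount point.

```shell
#!/bin/bash
# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper for steps 2-4; run as root.
mount_mapr_nfs() {
  local host=${1:-localhost} mnt=${2:-/mapr}
  run showmount -e "$host"          # step 2: list the exports
  run mkdir -p "$mnt"               # step 3: create the mount point
  if mountpoint -q "$mnt"; then     # something is already mounted there
    echo "$mnt is already mounted"
  else
    run mount "$host:/mapr" "$mnt"  # step 4: mount the cluster
  fi
}
```

For example, `DRY_RUN=1 mount_mapr_nfs localhost /mapr` prints the commands it would run without changing anything.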

Notice that Hadoop can see the directory you just created with NFS. Try navigating to the cluster using the computer's file browser: you can drag and drop files directly to your new volume, and see them immediately in Hadoop!

If you are already running an NFS server, MapR will not run its own NFS gateway. In that case, you will not be able to mount the cluster via NFS, but your previous NFS exports will remain available.

Try MapReduce

In this section, you will run the well-known Word Count MapReduce example. You'll need one or more text files. The Word Count program reads files from an input directory, counts the words, and writes the results of the job to files in an output directory. For this exercise we will use /myvolume/in for the input, and /myvolume/out for the output. The input directory must exist and must contain the input files before running the job; the output directory must not exist, as the Word Count example creates it.
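Because the job cannot run if its input is missing, and the output directory must not exist yet, a quick check through the NFS mount can save a failed run. The function below is a hypothetical helper. It assumes the cluster is mounted at `/mapr` as in the previous section.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the Word Count job.
check_wordcount_dirs() {
  local in_dir=$1 out_dir=$2
  # The input directory must exist and contain at least one file.
  if [ ! -d "$in_dir" ] || [ -z "$(ls -A "$in_dir")" ]; then
    echo "input directory $in_dir is missing or empty" >&2
    return 1
  fi
  # The output directory must not exist; the job creates it.
  if [ -e "$out_dir" ]; then
    echo "output directory $out_dir already exists; remove it or choose another" >&2
    return 1
  fi
}
```

For example: `check_wordcount_dirs /mapr/my.cluster.com/myvolume/in /mapr/my.cluster.com/myvolume/out`.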

  1. Open a terminal (select Applications > Accessories > Terminal).
  2. Copy a couple of text files into the cluster, either using the file browser or the command line. Create the directory /myvolume/in and put the files there. Example:
    mkdir /mapr/my.cluster.com/myvolume/in
    cp <some files> /mapr/my.cluster.com/myvolume/in
  3. Type the following line to run the Word Count job:
    hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount /myvolume/in /myvolume/out
    
  4. Look in the newly-created /myvolume/out for a file called part-r-00000 containing the results.
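To sanity-check part-r-00000 on a small input, you can compute a similar count locally. This is a rough approximation, not part of MapR. It assumes the example splits words on whitespace (Hadoop's WordCount example tokenizes with Java's `StringTokenizer`). It also assumes the job writes one `word<TAB>count` line per word in byte order and ran with a single reducer. Minor differences from the job output are possible.

```shell
#!/bin/bash
# Hypothetical local approximation of the Word Count output.
local_wordcount() {
  cat "$@" \
    | tr -s ' \t\r\f\n' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, to compare with the job output through the NFS mount: `diff <(local_wordcount /mapr/my.cluster.com/myvolume/in/*) /mapr/my.cluster.com/myvolume/out/part-r-00000`.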

Next Steps

MapR works with the leaders in the Hadoop ecosystem to provide the most powerful data analysis solutions. For more information about our partners, take a look at the following pages:
