Tutorial: How to Set Up MapR on a Private Cloud Using Sahara on DevStack

In this tutorial, you’ll learn how to deploy MapR clusters with just one click on your private cloud infrastructure. To set this up, you will use OpenStack, open source software for creating clouds, and its Sahara plugin to spin up MapR clusters on demand.

Understanding OpenStack

OpenStack is open source software for creating private/public clouds. It provides a complete technology stack similar to the one provided by major public cloud services, including management of virtual machines, storage, and networking.

For our demonstration, we will use DevStack. DevStack is a quick way to set up a development environment for OpenStack; it provides and maintains the tools used to install OpenStack services from source.

Sahara Plugin

Sahara is the Hadoop data processing module/plugin within OpenStack. It provides a solution for users who want to deploy Hadoop clusters or run big data applications in a cloud environment.

Setup Details:

  • Ubuntu 14.04  
  • OpenStack Release: Juno
  • 1 physical UCS box (128 GB RAM, 24 CPUs)

1) First, let’s install git:

sudo apt-get install git

2) Clone the repo into the home directory of your default user (for example, the mapr user) to get the initial setup scripts:

cd ~

git clone https://github.com/wochanda/devstack.git -b stable/juno

3) Create the DevStack user:

sudo devstack/tools/create-stack-user.sh

4) Switch to the stack user and go to its home directory:

sudo su stack

cd ~

5) Re-clone the repo, this time as the stack user:

git clone https://github.com/wochanda/devstack.git -b stable/juno

Then set the environment variables we will need to run the CLI:

source /opt/stack/devstack/openrc admin admin

6) Clone the MapR plugin repository from GitHub to a local directory in your OpenStack environment:

sudo git clone https://github.com/mapr/sahara.git /opt/stack/sahara

7) Verify that the MapR plugin is registered with Sahara in the setup.cfg file (/opt/stack/sahara/setup.cfg):

[entry_points]

...

sahara.cluster.plugins =

vanilla = sahara.plugins.vanilla.plugin:VanillaProvider

hdp = sahara.plugins.hdp.ambariplugin:AmbariPlugin

cdh = sahara.plugins.cdh.plugin:CDHPluginProvider

mapr = sahara.plugins.mapr.plugin:MapRPlugin

fake = sahara.plugins.fake.plugin:FakePluginProvider

spark = sahara.plugins.spark.plugin:SparkProvider
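If you prefer to do this check from the shell rather than by eye, here is a minimal sketch. The helper name `has_plugin_entry` is my own (it is not part of Sahara or DevStack), and it only looks for a line of the form `name = module:Class`; it does not confirm that the class can actually be imported.

```shell
# Hypothetical helper: succeeds if the file has a line "<plugin> = <module>:<Class>".
has_plugin_entry() {
    hpe_file=$1
    hpe_plugin=$2
    grep -Eq "^[[:space:]]*${hpe_plugin}[[:space:]]*=[[:space:]]*[A-Za-z0-9_.]+:[A-Za-z0-9_]+[[:space:]]*$" "$hpe_file"
}

# Usage against the path from this tutorial (skipped if the file is absent).
cfg=/opt/stack/sahara/setup.cfg
if [ -f "$cfg" ]; then
    if has_plugin_entry "$cfg" mapr; then
        echo "mapr plugin entry found in $cfg"
    else
        echo "mapr plugin entry missing from $cfg" >&2
    fi
fi
```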

8) Verify that the MapR plugin entry exists in the sahara.conf file (/etc/sahara/sahara.conf):

[DEFAULT]

use_floating_ips = false

...

plugins = vanilla,mapr,hdp

debug = True

verbose = True
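The same kind of check can be scripted for the plugins line. This is only a sketch: `plugin_enabled` is a hypothetical helper name, and it looks at any `plugins = ...` line in the file without checking which section it sits in.

```shell
# Hypothetical helper: succeeds if the "plugins = a,b,c" line lists the given plugin.
# It does not parse INI sections; it uses the last "plugins =" line in the file.
plugin_enabled() {
    pe_file=$1
    pe_plugin=$2
    grep -E '^[[:space:]]*plugins[[:space:]]*=' "$pe_file" \
        | tail -n 1 \
        | sed -e 's/^[^=]*=//' -e 's/[[:space:]]//g' \
        | tr ',' '\n' \
        | grep -qxF "$pe_plugin"
}

# Usage against the path from this tutorial (skipped if the file is absent).
conf=/etc/sahara/sahara.conf
if [ -f "$conf" ]; then
    plugin_enabled "$conf" mapr || echo "mapr is not listed in plugins in $conf" >&2
fi
```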

9) In /opt/stack/devstack/local.conf, add HOST_IP and the active network interface that DevStack should use when spinning up VMs on the cloud:

HOST_IP=10.10.xx.xxx

FLAT_INTERFACE=eth0
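If you are not sure which address to use for HOST_IP, one option is to read it from the interface itself. The sketch below assumes the iproute2 `ip` command is available on the host and that eth0 is the active interface, as above; `first_ipv4` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first IPv4 address found in
# `ip -o -4 addr show` output read from stdin.
first_ipv4() {
    # In the one-line (-o) format, field 3 is "inet" and field 4 is ADDRESS/PREFIX.
    awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}

# Usage on the DevStack host:
#   ip -o -4 addr show dev eth0 | first_ipv4
```

Reading from stdin keeps the parsing separate from the `ip` call, so you can try it on saved output first.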

10) Add “SAHARA_REPO” and “SAHARA_ENABLED_PLUGINS”, and verify that “SAHARA_BRANCH” is set correctly, under the Sahara configs:

# Sahara configs

enable_service sahara

SAHARA_REPO=https://github.com/mapr/sahara.git

SAHARA_BRANCH=juno-release

SAHARA_ENABLED_PLUGINS=mapr,vanilla,hdp
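Before running the installer, it can help to confirm that none of the settings from steps 9 and 10 were missed. Here is a minimal sketch; `missing_conf_keys` is a hypothetical helper name, and it only checks that each key has a non-empty `KEY=value` line, not that the value is correct.

```shell
# Hypothetical helper: print each required key that has no "KEY=value" line
# in the file. Returns non-zero if any key is missing.
missing_conf_keys() {
    mck_file=$1
    shift
    mck_status=0
    for mck_key in "$@"; do
        if ! grep -Eq "^[[:space:]]*${mck_key}=.+" "$mck_file"; then
            echo "$mck_key"
            mck_status=1
        fi
    done
    return "$mck_status"
}

# Usage against the path from this tutorial (skipped if the file is absent).
conf=/opt/stack/devstack/local.conf
if [ -f "$conf" ]; then
    missing_conf_keys "$conf" HOST_IP FLAT_INTERFACE SAHARA_REPO SAHARA_BRANCH SAHARA_ENABLED_PLUGINS \
        || echo "add the keys listed above to $conf" >&2
fi
```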

11) Add the line below to the file “/opt/stack/devstack/lib/infra”:

   echo "oslo.log" >> $REQUIREMENTS_DIR/global-requirements.txt

12) Now run the script to set up Sahara on DevStack/OpenStack:

cd /opt/stack/devstack

./stack.sh    # This will take a while (~400s)

devstack - success

Note: If you would like to see logs, or start or stop a process, join the screen session:

/opt/stack/devstack/rejoin-stack.sh

The Horizon UI is now available at http://<Host-IP>/

The default user is: admin

The password is: mapr

Under Admin > Hypervisors, you can see the host resources available to DevStack for spinning up VMs on the cloud.

hypervisors devstack

Now we have to set up different templates that we can later use to spin up MapR clusters on demand.

Step 1: Adding MapR Images to OpenStack

  1. On the Dashboard, select Project > Compute > Images.
  2. Click the Create Image button in the top-right corner of the screen.
  3. Fill out the Create an Image screen with the necessary details and then click Create.

create image

Note: I used a pre-built MapR Distribution image.

https://s3-us-west-2.amazonaws.com/sahara-images/ubuntu_trusty_mapr_plain_latest.qcow2

There are a few pre-built MapR distribution images for Ubuntu and CentOS, which can be found at the following location:

http://doc.mapr.com/display/MapR40x/Setting+Up+Images+and+Templates+for+the+MapR+Plugin

Once completed, you should be able to see the image you created in an active state.

Image active

Step 2: Creating a Flavor

In this step, you will create flavors, which define the node sizes you can use at a later stage for MapR deployments.

  1. On the Dashboard, select Project > Compute > Flavors
  2. Click the Create Flavor button.
  3. Fill out the Create Flavor screen with the necessary details and then click Create. In our case, since I only plan to spin up a two-node cluster, I will create two flavors: one which will suit control nodes, and the other which will suit data nodes.

Create flavor          Flavor small

Step 3: Registering Images for the MapR Plugin

Sahara users who want to provision clusters have to specify additional properties for images that were previously added in Step 1.

Use the Image Registry to register images for use with the MapR Sahara plugin.

  1. Go to Project > Data Processing > Image Registry > Register Image.
  2. Select the image from the Image list.

   Enter ubuntu in the User Name field (for an Ubuntu image).

   Select the MapR plugin and 3.1.1 version tags, then click the Add plugin tags button.

Register image

 3. Finally, click Done to get this image registered with the Sahara Plugin.

Image registry

Step 4: Creating MapR Node Group Templates

In this step, we will create node group templates. These templates describe the type of workload for a node in a cluster. For instance, I will create a control node and data node template for my two-node cluster.

  1. Go to Project > Data Processing > Node Group Templates > Create Template.
  2. Select MapR Distribution and the 3.1.1 version, then click Create, which will bring up the template you can fill out with services you need.
  3. I created two templates, one for the control node and the other for the data node, each with the services it needs. Below is what you should see.

Node group templates

Step 5: Creating MapR Cluster Templates

The last step is to create templates for MapR clusters so that users can launch clusters with just one click.

1) Define cluster templates by referencing existing node group templates, depending on the number of nodes needed in the cluster.

2) Go to Project > Data Processing > Cluster Templates > Create Template.

Select the MapR plugin name and 3.1.1 Hadoop version, then click Create.

3) Enter the template name when the Create Cluster Template box opens. On the Node Groups tab, select node group templates (click the + sign) and specify the number of nodes per group in the Count column. Select one control node and one data node, and click Create.

create cluster template

Finally, you can click Launch Cluster and specify a cluster name to kick off the launch whenever you need a cluster.

data processing - cluster template

Once the cluster is launched, its status should show as Active, which indicates that the cluster is up and ready for you to run jobs against it.

active cluster

 

active cluster overview

We can also create multiple cluster templates by reusing the same node templates and spinning up different clusters for various use cases.

  • 1 node - single-node cluster - control and data node
  • 2 node - very small cluster - 1 control and 1 data node
  • 10 node - medium cluster - 3 control and 7 data nodes

In this tutorial, you learned how to set up MapR on a private cloud using Sahara on DevStack. Let us know if you have any feedback on the tutorial, or if you are running into any issues.
