In this tutorial, you’ll learn how to deploy MapR clusters with just one click on your private cloud infrastructure. To set this up, you will use open source cloud software and a plugin to spin up MapR clusters on demand.
OpenStack is open source software for creating private/public clouds. It provides a complete technology stack similar to the one provided by major public cloud services, including management of virtual machines, storage, and networking.
For our demonstration, we will use DevStack. DevStack is a quick way to set up an OpenStack development environment; it provides and maintains scripts for installing OpenStack services from source.
Sahara is the Hadoop data processing module/plugin within OpenStack. It provides a solution for users who want to deploy Hadoop clusters or run big data applications in a cloud environment.
- Ubuntu 14.04
- OpenStack Release: Juno
- 1 physical UCS box (128 GB RAM, 24 CPUs)
1) First, let’s install git:
sudo apt-get install git
2) Clone the repo into the default user’s home directory (say, the mapr user) to get the initial setup scripts:
git clone https://github.com/wochanda/devstack.git -b stable/juno
3) Create the DevStack user:
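The command for this step is DevStack’s own helper script, which creates a stack user with the sudo rights DevStack expects (the path below assumes the clone location from step 2):

```
# Creates the "stack" user for DevStack; run from the directory
# containing the cloned devstack repo.
sudo ./devstack/tools/create-stack-user.sh
```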
4) Switch to the stack user and change to its home directory:
sudo su stack
cd ~
5) Re-clone the repo, this time as stack user:
sudo git clone https://github.com/wochanda/devstack.git -b stable/juno
- Set the environment variables we will need to execute CLI commands:
source /opt/stack/devstack/openrc admin admin
6) Clone the MapR plugin repository from GitHub to a local directory in your OpenStack environment:
sudo git clone https://github.com/mapr/sahara.git /opt/stack/sahara
7) Verify that the MapR plugin is registered with Sahara in the setup.cfg file (/opt/stack/sahara/setup.cfg):
vanilla = sahara.plugins.vanilla.plugin:VanillaProvider
hdp = sahara.plugins.hdp.ambariplugin:AmbariPlugin
cdh = sahara.plugins.cdh.plugin:CDHPluginProvider
mapr = sahara.plugins.mapr.plugin:MapRPlugin
fake = sahara.plugins.fake.plugin:FakePluginProvider
spark = sahara.plugins.spark.plugin:SparkProvider
8) Verify that the MapR plugin entry exists in the sahara.conf file (/etc/sahara/sahara.conf):
use_floating_ips = false
plugins = vanilla,mapr,hdp
debug = True
verbose = True
9) In /opt/stack/devstack/local.conf, add the HOST_IP and active-interface details that DevStack can use when spinning up VMs on the cloud:
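A minimal sketch of what those lines might look like in local.conf (the IP address and interface name here are placeholders for your own environment, not values from this setup):

```
[[local|localrc]]
HOST_IP=192.168.1.10      # illustrative: IP of this physical host
FLAT_INTERFACE=eth0       # illustrative: the active NIC for VM traffic
```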
10) Add "SAHARA_REPO" and "SAHARA_ENABLED_PLUGINS", and verify that "SAHARA_BRANCH" is set correctly under the Sahara configs:
# Sahara configs
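A sketch of how that section of local.conf might look, pointing SAHARA_REPO at the MapR fork cloned in step 6 (the branch name and plugin list are assumptions to verify against your setup):

```
# Sahara configs (illustrative values)
enable_service sahara
SAHARA_REPO=https://github.com/mapr/sahara.git
SAHARA_BRANCH=stable/juno
SAHARA_ENABLED_PLUGINS=vanilla,mapr,hdp
```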
11) Add the line below to the file /opt/stack/devstack/lib/infra:
echo "oslo.log" >> $REQUIREMENTS_DIR/global-requirements.txt
12) Now run the script to set up Sahara on DevStack:
./stack.sh (This will take a while, ~400s)
Note: If you would like to see logs, or to start or stop a process, join the screen session.
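DevStack runs its services in a shared GNU screen session (named stack by default); a quick sketch of attaching and detaching:

```
screen -x stack    # attach to the running DevStack screen session
# Ctrl-a n / Ctrl-a p to cycle between service windows; Ctrl-a d to detach
```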
You now have the Horizon UI available at http://<Host-IP>/
The default user is: admin
The password is: mapr
Under Admin → Hypervisors, you can see your host configuration, which is available for DevStack to spin up VMs on the cloud.
Now we have to set up different templates that we can later use to spin up MapR clusters on demand.
Step 1: Adding MapR Images to OpenStack
- On the Dashboard, select Project > Compute > Images.
- Click the Create Image button in the top-right corner of the screen.
- Fill out the Create an Image screen with the necessary details and then click Create.
Note: I used a pre-built MapR Distribution image.
There are a few pre-built MapR Distribution images for Ubuntu and CentOS, which can be found at the following locations:
Once completed, you should be able to see the image you created in an active state.
Step 2: Creating a Flavor
In this step, you will create node templates (flavors) that you can use at a later stage for MapR deployments.
- On the Dashboard, select Project > Compute > Flavors
- Click the Create Flavor button.
- Fill out the Create Flavor screen with the necessary details and then click Create. In this case, since I only plan to spin up a two-node cluster, I will create two flavors: one suited to control nodes, and the other suited to data nodes.
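If you prefer the CLI, Juno-era clouds expose the same operation through nova flavor-create; the flavor names and sizes below are illustrative assumptions, not values from this article:

```
# nova flavor-create <name> <id|auto> <ram_MB> <disk_GB> <vcpus>
nova flavor-create mapr.control auto 16384 100 8   # illustrative control-node flavor
nova flavor-create mapr.data    auto 32768 200 8   # illustrative data-node flavor
```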
Step 3: Registering Images for the MapR Plugin
Sahara users who want to provision clusters have to specify additional properties for images that were previously added in Step 1.
Use the Image Registry to register images for use with the MapR Sahara plugin.
- Go to Project > Data Processing > Image Registry > Register Image.
- Select the image from the Image list.
- Enter ubuntu (the default user for Ubuntu images) in the User Name field.
- Select the MapR Plugin and 3.1.1 Version tags, then click the Add plugin tags button.
- Finally, click Done to register the image with the Sahara plugin.
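The same registration can be sketched with the Sahara CLI; the command names follow the Juno-era python-saharaclient, but treat the flags, tags, and the <image-id> placeholder as assumptions to verify with sahara help:

```
sahara image-register --id <image-id> --username ubuntu   # <image-id> from Step 1
sahara image-add-tag  --id <image-id> --tag mapr
sahara image-add-tag  --id <image-id> --tag 3.1.1
```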
Step 4: Creating MapR Node Group Templates
In this step, we will create node group templates. These templates describe the type of workload for a node in a cluster. For instance, I will create a control node and data node template for my two-node cluster.
- Go to Project > Data Processing > Node Group Templates > Create Template.
- Select MapR Distribution and the 3.1.1 version, then click Create, which will bring up the template you can fill out with services you need.
- I created two templates, one for the control node and the other for the data node (each with the services it needs); below is what you should see.
Step 5: Creating MapR Cluster Templates
The last step is to create templates for MapR clusters so that users can launch clusters with just one click.
1) Define cluster templates by referencing existing node group templates, depending on the number of nodes needed in the cluster.
2) Go to Project > Data Processing > Cluster Templates > Create Template. Select the MapR plugin name and the 3.1.1 Hadoop version, then click Create.
3) Enter the template name when the Create Cluster Template box opens. On the Node Groups tab, select node group templates (click the + sign) and specify the number of nodes per group in the Count column. Select one control node and one data node, and click Create.
Finally, you can now click Launch Cluster, specify a cluster name, and kick off the cluster launch.
Once the cluster is launched and ready, you should see the cluster status change to Active, which indicates your cluster is up and ready for you to run jobs against it.
We can also create multiple cluster templates by reusing the same node templates, spinning up different clusters for various use cases:
- 1-node cluster template: control and data services on one node
- 2-node, very small cluster: 1 control and 1 data node
- 10-node, medium cluster: 3 control and 7 data nodes
In this tutorial, you learned how to set up MapR on a private cloud using Sahara on DevStack. Let us know if you have any feedback on the tutorial, or if you are running into any issues.