Here at MapR, developer productivity is critical. To keep our pace of innovation high and give customers more choice and flexibility in Apache Hadoop and the other open source projects we ship with the MapR Distribution for Hadoop, we apply DevOps methodologies as widely as we can. One critical piece of this is testing our builds rapidly to ensure quality in the codebase. Automation is key here: it is what allows us to integrate the latest community innovations across multiple releases into our Hadoop distribution. For example, we test and support Hadoop 2.7 with Drill 1.1 and Hive 1.0, Hadoop 2.6 with Drill 1.2 and Spark 1.3.1, and so on. Customers who support 50 or more applications on a single MapR cluster can choose among many such combinations within the MapR Distribution, which lets them upgrade applications incrementally and save time and money.
To deliver this fast pace of innovation, we’ve been using Docker extensively. Rather than using physical servers or VMs to provision this multitude of test clusters, we build and maintain Docker images of MapR that can be provisioned on demand. This has reduced the deployment time of a test cluster from hours to seconds!
In this post, we will share the tools and methodology we use to create these Dockerized MapR clusters. We expect that you’ll find these useful as well, both to learn MapR and to test out new applications.
Goals:
- Create a multi-node MapR cluster.
- Make the cluster nodes accessible from outside the host running the containers.
- Launch clusters of different sizes.
- Use real disks to achieve realistic performance.
Prerequisites:
- Server running CentOS/RHEL 7.x with 16GB+ RAM
- Docker 1.6.0+
- sshpass installed
- Free, unmounted physical disks to be attached to the MapR node containers
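The prerequisites above can be checked before launching anything. The following is a minimal sketch, not part of the MapR tooling: the helper names `version_ge` and `is_unmounted` are hypothetical, and the version comparison assumes GNU `sort -V` is available (as it is on CentOS/RHEL 7).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of the MapR tooling).

# Succeeds when version $1 >= version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when device $1 (or one of its numbered partitions) is absent
# from the mount table $2, which defaults to /proc/mounts.
is_unmounted() {
    local dev="$1" table="${2:-/proc/mounts}"
    ! awk -v d="$dev" '$1 ~ ("^" d "[0-9]*$") { found = 1 } END { exit !found }' "$table"
}

# Example checks, commented out so the snippet has no side effects:
#   version_ge "1.6.2" "1.6.0"    || echo "Docker 1.6.0 or newer is required"
#   command -v sshpass >/dev/null || echo "sshpass is not installed"
#   is_unmounted /dev/sdb         || echo "/dev/sdb is mounted"
```

The version string passed to `version_ge` would come from your Docker installation; how you extract it is left out here because the exact output format depends on the Docker release.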
Network Set-up:
While working towards these goals, networking turned out to be one of the critical pieces. The containers (the cluster nodes) need to be routable, that is, accessible from outside the host, and we wanted to avoid a complex network setup.
Step 1: Set up a routable bridge interface (e.g., br0).
Here is an example configuration on a CentOS 7.0 server:
    # cat /etc/sysconfig/network-scripts/ifcfg-br0
    DEVICE="br0"
    ONBOOT=yes
    IPV6INIT=no
    BOOTPROTO=static
    TYPE=Bridge
    NAME="br0"
    IPADDR=10.10.101.135
    NETMASK=255.255.255.0
    GATEWAY=10.10.101.1

    # cat /etc/sysconfig/network-scripts/ifcfg-enp4s0
    DEVICE="enp4s0"
    ONBOOT=yes
    IPV6INIT=no
    BOOTPROTO=none
    HWADDR="0c:c4:7a:58:7d:19"
    TYPE=Ethernet
    NAME="enp4s0"
    BRIDGE=br0
Step 2: Ask your network admin for a free range of routable IP addresses for the containers, in the same VLAN as the bridge IP address.
For example, we got 10.10.101.16/29, which gives the containers the IPs 10.10.101.17 to 10.10.101.22.
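To see where that range comes from: a /29 block holds 8 addresses, and the first (network) and last (broadcast) are not assigned to hosts, which leaves 6. The sketch below works this out in bash; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative helpers for IPv4 block arithmetic (names are hypothetical).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    local n="$1"
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print "first last" usable host addresses of a block such as 10.10.101.16/29,
# skipping the network address and the broadcast address.
usable_range() {
    local base="${1%/*}" prefix="${1#*/}"
    local size=$(( 1 << (32 - prefix) ))
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local network=$(( $(ip_to_int "$base") & mask ))
    echo "$(int_to_ip $(( network + 1 ))) $(int_to_ip $(( network + size - 2 )))"
}

usable_range 10.10.101.16/29   # prints: 10.10.101.17 10.10.101.22
```

Six usable addresses also means this particular block can back at most six containers at a time.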
Configure Docker with the following options:
    -b=bridge-inf --fixed-cidr=x.x.x.x/mask
For example:
    -b=br0 --fixed-cidr=10.10.101.16/29
This gives the containers routable IP addresses from the range above.
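Where these options go depends on how Docker was installed. As a hedged sketch: with the distribution-packaged Docker on CentOS/RHEL 7, daemon options are commonly set through the `OPTIONS` variable in `/etc/sysconfig/docker`; treat that path as an assumption and check your own service configuration. The helper `docker_bridge_opts` is hypothetical and only assembles the line, here using the 10.10.101.16/29 block from Step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the daemon options for a given bridge
# interface ($1) and container address block ($2).
docker_bridge_opts() {
    printf -- '-b=%s --fixed-cidr=%s\n' "$1" "$2"
}

# Print the line you would place in /etc/sysconfig/docker
# (assumed location; it varies with the Docker packaging).
echo "OPTIONS='$(docker_bridge_opts br0 10.10.101.16/29)'"
# prints: OPTIONS='-b=br0 --fixed-cidr=10.10.101.16/29'

# After editing the file, the daemon has to be restarted, for example:
#   systemctl restart docker
```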
Disks for the Containers:
Each container requires at least one disk drive or partition for MapR to use.
List the disks in a text file, one per line.
For example:
    # cat /tmp/disklist.txt
    /dev/sdb
    /dev/sdc
    /dev/sdd
    /dev/sde
    /dev/sdf
If the file lists more disks than the number of containers requested, the remaining disks are added to the first container.
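The allocation rule can be sketched as follows. This is an illustration only, not the code of launch-cluster.sh: the function name `assign_disks` and the `node<N>` labels are hypothetical, and it assumes each container takes one disk in file order, with the leftover disks at the end of the file going to the first container.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the disk allocation rule; the real script may differ.
# Usage: assign_disks <disklist-file> <number-of-nodes>
# Prints one line per container: "node<N>: disk [disk ...]".
assign_disks() {
    local disklist="$1" nodes="$2"
    local -a disks
    # Read the disk list, ignoring blank lines.
    mapfile -t disks < <(grep -v '^[[:space:]]*$' "$disklist")
    local total=${#disks[@]}
    if (( total < nodes )); then
        echo "need at least $nodes disks, found $total" >&2
        return 1
    fi
    local i line
    for (( i = 0; i < nodes; i++ )); do
        line="node$(( i + 1 )): ${disks[i]}"
        # Leftover disks (beyond one per node) go to the first container.
        if (( i == 0 && total > nodes )); then
            line+=" ${disks[*]:nodes}"
        fi
        echo "$line"
    done
}

# Example (commented out; requires an existing disk list file):
#   assign_disks /tmp/disklist.txt 4
```

With the five-disk list shown earlier and four nodes, this sketch would give the first container /dev/sdb plus /dev/sdf, and one disk each to the other three.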
Usage:
    ./launch-cluster.sh ClusterName NumberOfNodes MemSize-in-kB Path-to-DisklistFile
For example:
    # ./launch-cluster.sh demo 4 16384000 /tmp/disklist.txt
    Control Node IP : 10.10.101.21
    Starting the cluster: https://10.10.101.21:8443/ login:mapr password:mapr
    Data Nodes : 10.10.101.22,10.10.101.17,10.10.101.18
Open the MapR management console at the control node IP: https://10.10.101.21:8443 (taken from the example output above).
In this blog post, you’ve learned how to create instant MapR clusters with Docker. If you have any further questions, please ask them in the comments section below.
Are you interested in reading more about working with Docker and MapR? Read the blog post My Experience with Running Docker Containers on Mesos.