My Experience with Running Docker Containers on Mesos

The following is a guest blog post from John Omernik, Data Enthusiast and VP of Big Data Analytics and Manager of the Fraud Center of Excellence at Zions Bank, one of the nation's premier financial services companies. In this blog post, he shares how he is using the MapR file system with new technologies like Mesos and Docker, and he has included a script he wrote to help with the process.

My Technology Stack
In this blog post, I’d like to share with you how I’m able to run analytics workloads alongside Docker containers in a single cluster. The stack that we’re researching at Zions (and that I’m running at home) is Apache Mesos running on top of the MapR platform and MapR-FS. My goal is to try to make this a ubiquitous computing platform. For analytics, I’m running Spark and Myriad (where there’s a lot of development work happening by MapR and others in the field). Myriad is how I’m running my MapReduce jobs. I also have Kafka and Storm running on Mesos, working with the MapR file system, and working in concert with the environment here.

MapR helps out immensely when it comes to running Docker containers on Mesos. One example of a service I’m running in Docker is the Hive metastore service. Since the Hive metastore requires a relational database to persist table metadata, I also need to deploy a MySQL server instance. Instead of deploying MySQL on a separate server outside the cluster, I’m running it in a Docker container on Mesos, launched through Marathon. Because the data MySQL stores is extremely important, I want to make sure that if the container crashes or the node it’s hosted on dies, Marathon will be able to create a new container that takes over where the old one left off, with all data intact. The NFS feature of MapR-FS made this extremely easy, both because of its random read/write capabilities and because of the high performance needed to sustain the load of a database.
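
To make the setup above concrete, here is a minimal sketch of what a Marathon app definition for such a MySQL container might look like. Every specific value in it is an assumption for illustration rather than something taken from my cluster: the app id, the image tag, the resource sizes, the Marathon host, and the /mapr/my.cluster.com path for the MapR NFS mount. The script only writes the JSON and prints the command that would submit it.

```shell
#!/bin/bash
# Sketch only: the id, image, sizes, and paths below are hypothetical placeholders.
APP_JSON=$(mktemp)
HOST_DATA_DIR="/mapr/my.cluster.com/apps/mysql/data"   # assumed MapR NFS mount path

# A Docker app with one instance and one read/write host volume on MapR-FS.
cat > "$APP_JSON" <<EOF
{
  "id": "/mysql-metastore",
  "cpus": 1,
  "mem": 1024,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mysql:5.6",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 3306, "hostPort": 0 }]
    },
    "volumes": [
      { "containerPath": "/var/lib/mysql", "hostPath": "$HOST_DATA_DIR", "mode": "RW" }
    ]
  }
}
EOF

# Print, rather than run, the submission command (the Marathon host is made up).
echo "Wrote $APP_JSON; submit it with something like:"
echo "curl -X POST -H 'Content-Type: application/json' http://marathon.example.com:8080/v2/apps -d @$APP_JSON"
```

Because the volume points at a path that every node sees through NFS, Marathon can restart the container on any node and it finds the same database files.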

Taking Advantage of the MapR File System
One problem that I needed to solve was that when a MySQL database starts, it needs exclusive access to its database files. I wanted to prevent the accidental start of another Docker container running on those same files, because two instances of MySQL accessing the same data files would put the integrity of the database at risk. So I looked into it, and started working with Ted Dunning and Keys Botzum at MapR. I asked them, “How do I do a lock?” Although MapR NFS does not support locking in the traditional Unix sense, MapR does support the standard file system semantics that make locking possible through creating directories and creating files.

Using their suggestions, I wrote a script that implements this locking scheme, which allows for reliable persistent data storage. This seems like something others would benefit from, so I’m sharing it here.

There are two separate sides to it. The first is, “I want to take a lock on this file and I want it to be exclusive.” That’s not supported, but on the other side, MapR does support the semantics of creating a directory and being the only one to be able to create that directory, and that’s what I utilized in this script. I wanted to be able to create something that my Docker container could detect and say, “Oh, someone else is using this data. I need to shut down.” My script prevents having two different instances of MySQL or Hive Metastore running on my cluster, but I still have the ability to run MySQL on any node in my cluster. There are no constraints on where it runs. One of the ways that the Mesos community is looking to solve this problem is to persist data to different frameworks—so you get to use this data block—and that’s coming in a future release. But MapR has this performant file system that’s available on all my nodes, and I wanted to take advantage of that.
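
The directory-creation semantics can be shown in a few lines. This is a minimal sketch, not my full script: try_lock is a hypothetical helper name, and a temporary local directory stands in for a path on MapR-FS. The point is that mkdir either creates the directory or fails because it already exists, so only one caller can ever succeed.

```shell
#!/bin/bash
# try_lock DIR: returns 0 only for the caller that actually creates DIR.
try_lock() {
    mkdir "$1" 2>/dev/null
}

# Throwaway directory standing in for a MapR-FS NFS path (assumption for the demo).
DEMO_ROOT=$(mktemp -d)
DEMO_LOCK="$DEMO_ROOT/lock"

# Two attempts on the same path: the first creates the directory, the second is refused.
if try_lock "$DEMO_LOCK"; then FIRST="acquired"; else FIRST="refused"; fi
if try_lock "$DEMO_LOCK"; then SECOND="acquired"; else SECOND="refused"; fi

echo "first attempt:  $FIRST"
echo "second attempt: $SECOND"

# Releasing the lock is just removing the directory.
rmdir "$DEMO_LOCK"
```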

My Code for Handling File System Locking for Docker Containers
Basically, the code acts as a shim. Instead of starting whatever process I’m looking to start directly within the Mesos world, I call this code, and it checks the specific directory that I set. Whether it’s the MySQL or the Minecraft Docker container, it checks a single location per container. My Minecraft server, for example, has one location in MapR-FS; that’s what the script checks to determine whether it can take an exclusive lock on that directory and therefore run. If it can’t, because something else already has that directory locked, it knows it can’t run and shuts the container down. This ensures that I never have more than one Docker container of the same type. I don’t want two Minecraft servers running, because they’d be working off the same data, and that could cause file corruption.

Here is the code that I wrote for handling file system locking for Docker containers:

#!/bin/bash

#The location the lock will be attempted in 
LOCKROOT="/minecraft/lock"
LOCKDIRNAME="lock"
LOCKFILENAME="mylock.lck"

#This is the command to run if we get the lock. 
RUNCMD="./start.sh"

#Number of seconds to consider the Lock stale, this could be application dependent. 
LOCKTIMEOUT=60
SLEEPLOOP=30

LOCKDIR=${LOCKROOT}/${LOCKDIRNAME}
LOCKFILE=${LOCKDIR}/${LOCKFILENAME}

if mkdir "${LOCKDIR}" &>/dev/null; then
    echo "No Lockdir. Our lock"
    # This means we created the dir!
    # The lock is ours
    # Run a sleep loop that puts the file in the directory
    while true; do date +%s > "$LOCKFILE"; sleep "$SLEEPLOOP"; done &
    # Now run the real shell script
    $RUNCMD
else
    #Pause to allow another lock to start
    sleep 1
    if [ -e "$LOCKFILE" ]; then
        echo "lock dir and lock file Checking Stats"
        CURTIME=$(date +%s)
        FILETIME=$(cat "$LOCKFILE")
        DIFFTIME=$(($CURTIME-$FILETIME))
        echo "Filetime $FILETIME"
        echo "Curtime $CURTIME"
        echo "Difftime $DIFFTIME"

        if [ "$DIFFTIME" -gt "$LOCKTIMEOUT" ]; then
            echo "Time is greater than Timeout. We are taking Lock"
            # The lock is stale, so we take it. First we remove the current directory so that the mkdir below stays the atomic step
            rm -rf "$LOCKDIR"
            if mkdir "${LOCKDIR}" &>/dev/null; then
                while true; do date +%s > "$LOCKFILE"; sleep "$SLEEPLOOP"; done &
                $RUNCMD
            else
                echo "Cannot Establish Lock file"
                exit 1
            fi
        else
            # The lock is not ours.
            echo "Cannot Establish Lock file - Active"
            exit 1
        fi
    else
        # We get to be the locker. However, we need to delete the directory and recreate it so we can be all atomic about it
        rm -rf "$LOCKDIR"
        if mkdir "${LOCKDIR}" &>/dev/null; then
            while true; do date +%s > "$LOCKFILE"; sleep "$SLEEPLOOP"; done &
            $RUNCMD
        else
            echo "Cannot Establish Lock file - Issue"
            exit 1
        fi
    fi
fi
# End
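
Inside a container, the heartbeat loop in the script above dies when the container stops, and the lock then ages out after LOCKTIMEOUT seconds. If you run the shim outside a container, or want the lock released as soon as the command exits normally, one possible variation is sketched below. The run_with_lock helper is hypothetical and not part of my script; it stops the heartbeat and removes the lock directory once the wrapped command returns. It does not handle the stale-lock takeover or being killed by a signal.

```shell
#!/bin/bash
# run_with_lock LOCKDIR CMD [ARGS...]
# Hypothetical helper: take the lock, keep a heartbeat while CMD runs, then clean up.
run_with_lock() {
    local lockdir=$1; shift
    local lockfile="$lockdir/mylock.lck"
    local heartbeat_pid rc

    # Atomic step: only one caller can create the directory.
    mkdir "$lockdir" 2>/dev/null || return 1

    # Heartbeat: refresh the timestamp so other contenders see the lock as live.
    while true; do date +%s > "$lockfile"; sleep "${SLEEPLOOP:-30}"; done &
    heartbeat_pid=$!

    # Run the real command and remember its exit status.
    "$@"
    rc=$?

    # Stop the heartbeat and release the lock.
    kill "$heartbeat_pid" 2>/dev/null
    wait "$heartbeat_pid" 2>/dev/null
    rm -rf "$lockdir"
    return "$rc"
}

# Demo in a throwaway directory, with a short heartbeat interval.
DEMO_ROOT=$(mktemp -d)
SLEEPLOOP=1
run_with_lock "$DEMO_ROOT/lock" sh -c 'exit 3'
RESULT=$?
echo "command exit status: $RESULT"
```

The helper passes the wrapped command's exit status through, so Marathon still sees whether the service itself failed.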

Running Open Source Software on MapR: Support Has Been Fantastic
Some people might be afraid of using a “hybrid” like MapR. And by hybrid, I mean that a lot of tools that you want to run are going to be open source, yet the file system is not. This is a challenge for some people in the open source community, because some people may think, “I want to run Spark; I want to run something like Mesos. If I try to pair that with something like MapR, who is going to support me? Who is going to help me make this work? If I’m running it on standard Apache HDFS, a lot of people are available to help, from a community perspective.” That’s one of the fears that people have when combining open source with closed source.

But what I’ve found to be the case is that MapR has been fantastic in terms of working with their community through resources like answers.mapr.com, as well as through direct interaction. If there’s something that I can’t solve because the code that I need is not available, MapR is always willing to work with me and help me understand what’s going on.

Advice for Those Wanting to Run Mesos and Docker on MapR
Start with identifying the amount of resources to give to MapR, and then give the rest of the resources to Mesos. Currently, I kind of have it “half together” because I don’t have an official installation package. I just installed MapR and Mesos and said, “All right, play nice together.” Things have been working well, but I could definitely see conflicts because of how I have resources allocated. And that’s something that MapR is working on addressing, as they look at dynamically carving resources between MapR and Mesos in the near future.
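
As a rough illustration of that split, the arithmetic can be written down as a small helper. This is only a sketch: mesos_resources is a hypothetical function, the node size and the MapR share are made-up numbers, and it assumes you pass the result to the Mesos agent through its --resources flag (the agent binary was called mesos-slave at the time of writing).

```shell
#!/bin/bash
# mesos_resources TOTAL_CPUS TOTAL_MEM_MB MAPR_CPUS MAPR_MEM_MB
# Prints what is left for Mesos in the "cpus:N;mem:M" form used by --resources.
mesos_resources() {
    local cpus=$(( $1 - $3 ))
    local mem=$(( $2 - $4 ))
    if [ "$cpus" -lt 1 ] || [ "$mem" -lt 1 ]; then
        echo "nothing left for Mesos" >&2
        return 1
    fi
    echo "cpus:${cpus};mem:${mem}"
}

# Example node: 16 cores and 64 GB, with 4 cores and 24 GB held back for MapR (made-up numbers).
RESOURCES=$(mesos_resources 16 65536 4 24576)
echo "mesos-slave --resources=\"$RESOURCES\" ..."
```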

Other Fun Projects Using Mesos on MapR
I could easily go off topic on a number of subjects here! The stuff that I’m doing right now with this is just awesome: I can do everything from running the MySQL database to running my kids’ Minecraft server on this cluster. That’s one of the things that I find so fascinating. This really can do anything! My kids are enjoying it; there are no issues whether the Minecraft server runs on a VM or in Docker on my cluster. All the Minecraft world data is persisted to MapR-FS via the NFS service. For me, it’s about solving a problem, and MapR was able to do things that other technologies couldn’t. I don’t know how to do a random read/write in a file on HDFS, and I don’t know how to get Minecraft running on HDFS, but I can do this using MapR-FS.

As I mentioned, I’ve been working on making my home network run on MapR with Mesos, because there are a lot of interesting ways to use it. Of course, very few people are going to be doing that level of integration. I do it so that I can understand how MapR with Mesos works. Also, I use an open source DVR called MythTV that’s Linux-based and lets you record TV. I’m running it in a VM now, and my goal is to try to get it running in Docker on my cluster, just to see if I can.

I’ve really enjoyed using technologies such as Mesos and Docker on MapR, and I hope that you’ll find the code that I wrote for persistent Docker storage useful.

Do you have any questions, or want to share how you’re using Mesos and Docker on MapR? Tell us in the comments section below.
