Cisco and MapR have been long-time partners on the Big Data journey. The recently published MapR Cisco Validated Design (CVD) further strengthens the integration of our products, pairing UCS's superb management capabilities with the award-winning, enterprise-grade big data platform, the MapR Distribution including Apache Hadoop.
With the advent of container technology like Docker and application resource-management platforms such as Apache Mesos, enterprise customers are looking at these technologies very seriously, as they promise much shorter development cycles and highly scalable deployments.
A common use case for Mesos deployments is dynamically scaling Apache web server services. Without utility-grade persistent storage such as that backed by MapR-FS, the storage allocated to Docker containers is ephemeral and is lost if a container crashes or is killed. With UCS and MapR, the web content stays consistent across the Docker containers, and logs are persisted to MapR-FS for later analysis with Hadoop.
Dockerizing the web server
With Cisco UCS servers at the foundation, we effortlessly spun up a 10-node MapR cluster with Mesos installed. Using Docker, we created a web server container and then launched it with Marathon, a Mesos framework for launching and scaling long-running applications. We simply typed an arbitrary ID and the following string into the command section of the New Application form:
"docker run -d -v /mapr:/www my/webserver" (my/webserver is the Docker image name) — and off it went. The web server spun up almost instantaneously. We then used the "Scale" button to spin up 5 more web containers in a matter of seconds. See figure below.
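The same application can also be submitted to Marathon's REST API instead of the web form. A minimal sketch of such an app definition follows; the id, instance count, and resource figures here are illustrative assumptions, not values from the deployment above:

```json
{
  "id": "mapr-webserver",
  "cmd": "docker run -d -v /mapr:/www my/webserver",
  "instances": 1,
  "cpus": 0.5,
  "mem": 256
}
```

POSTing this JSON to Marathon's /v2/apps endpoint launches the application; scaling up later is just a matter of updating the instances count, which is what the "Scale" button does behind the scenes.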
Sharing persistent storage among containers with MapR-FS
A container has its own storage, which is limited in space and cannot be shared with other containers. The MapR POSIX-compliant NFS gateway is a perfect solution that lets containers tap into the robust, HA/DR-ready MapR-FS for big data analytics. Note that we had already NFS-mounted MapR-FS on the cluster nodes under /mapr. When we spun up the container, the -v option mapped the /mapr mount point on the host node to the /www mount point in the container. We then modified the DocumentRoot directive in httpd.conf to point to /www, which makes managing the web content much easier, with real-time synchronization across all the web containers. Additionally, we modified the CustomLog and ErrorLog directives to point to a log directory under /www, where each container keeps its own set of log files named after a unique host ID. With the MapR NFS gateway, we can verify these log files simply by running the Unix ls command against the NFS mount point:
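The httpd.conf changes described above might look like the following sketch. The ${HOSTNAME} interpolation is an assumption on our part: it relies on Apache 2.4+, where shell environment variables can be referenced in the configuration, and on Docker's default of setting each container's hostname to its short container ID:

```
# Serve content from the MapR-FS-backed bind mount
DocumentRoot "/www"

# Per-container log files in the shared log directory under /www,
# named by the container's hostname (its short container ID)
CustomLog "/www/logs/${HOSTNAME}_access.log" combined
ErrorLog  "/www/logs/${HOSTNAME}_error.log"
```

Because /www is the same MapR-FS volume in every container, each container writes its own pair of log files into one shared directory without any extra coordination.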
# ls /mapr/logs
2c4b95924357_access.log  2c4b95924357_error.log
64cc248c438b_access.log  64cc248c438b_error.log
6b33974a3848_access.log  6b33974a3848_error.log
869fff95a17a_access.log  869fff95a17a_error.log
871e141fc8e7_access.log  871e141fc8e7_error.log
9631b85d9dd2_access.log  9631b85d9dd2_error.log
This setup gives us a central log repository protected by MapR-FS with scheduled snapshots and mirrored volumes, and the logs can later be processed with SQL-on-Hadoop tools like Apache Drill for web clickstream analysis. Of course, this is just a quick demonstration of what the combined power of MapR-FS, Docker and Mesos is capable of; the sky is the limit when it comes to other big data applications.
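As a sketch of the kind of query this enables — assuming a Drill dfs storage plugin that can read these access logs (for example, via Drill's httpd log-format reader) and using the host-side path to the shared log directory, both of which are assumptions beyond the setup above — a per-container request count might look like:

```sql
SELECT filename, COUNT(*) AS requests
FROM dfs.`/mapr/logs`
GROUP BY filename
ORDER BY requests DESC;
```

Here filename is one of Drill's implicit columns, so the counts come back grouped per container log file with no schema definition or ETL step up front.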
Project Myriad and beyond
As you know, YARN manages Hadoop cluster resources, and Mesos manages cluster resources for applications. Unfortunately, they do not communicate with each other, although each does a fairly good job in its own realm. Project Myriad was created to break down the wall between Mesos and YARN. We believe that with Cisco UCS as the hardware foundation, delivering rock-solid performance with top-quality compute, network, and storage resources, the MapR Distribution with Myriad enables the aggregation of the resource pools for YARN and Mesos. This combination holds great promise for solving many operations and development challenges by achieving much better resource allocation while retaining agility at scale.