How to Store the Job Metrics Database on a MapR Cluster

The MapR Job Metrics database is a powerful analytic tool for MapR deployments. One limitation, however, is that the MySQL instance hosting the data is potentially locked to a single node with only the local boot disk for storage. Storing MySQL data on the MapR Data Platform itself offers a simple solution to this problem, with the added advantage of preserving the metrics data in the event the database server goes off-line.

MySQL can use any file system to store its data. Using the Direct Access NFS features of MapR to provide the storage for MySQL delivers several advantages over the basic local file system configuration:
  1. The metrics database can utilize the capacity of the MapR cluster, not just the local Linux file systems. Of course, the administrative capabilities of MapR still enable rational limits to be placed on the size of the metrics database.
  2. The database will be automatically replicated, providing a more reliable storage solution in the event of an individual disk drive failure.
  3. In the event of node failure, the MySQL data is still available.  Another node can be easily configured to re-host the metrics service.
The remainder of this post presents a process for deploying the metrics database on the MapR cluster. This post assumes a basic knowledge of the MapR installation and configuration process, along with an understanding of NFS administration.

Assumptions and Prerequisites

If the MySQL server is not yet installed, you’ll need to identify the server you wish to use and install the proper packages (mysql-server and mysql-client on Ubuntu systems, mysql-server and mysql on RedHat/CentOS systems). MapR software does not have to be installed on the server running MySQL, although having the basic mapr-client packages will simplify some of the following configuration steps.  You must have a copy of the /opt/mapr/bin/setup.sql script (whether from the mapr-* packages installed on the MySQL server node or copied from another node in the cluster).
NOTE: RedHat installations require the soci-mysql package from the EPEL Repository as well; you’ll need to configure the EPEL repository specification on your server before attempting the mysql-server installation.
The MySQL server should also be configured to NFS mount the MapR cluster file system to a local directory. Details can be found at Accessing Data with NFS. We’ll assume that the cluster is mounted at /mapr.
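Before going further, it can help to confirm that the cluster really is NFS-mounted at /mapr. The following is a minimal sketch, not part of the MapR tooling: the function name is_nfs_mounted is hypothetical, and it assumes a Linux host where /proc/mounts lists the active mounts.

```shell
# is_nfs_mounted MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeeds (exit 0) if MOUNTPOINT appears as an NFS
# mount in MOUNTS_FILE, which defaults to /proc/mounts (Linux assumption).
is_nfs_mounted() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk -v mp="$mountpoint" \
        '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Example use:
#   is_nfs_mounted /mapr || echo "cluster is not NFS-mounted at /mapr" >&2
```

The optional second argument exists only so the check can be exercised against a sample file instead of the live mount table.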

Setting Up the Metrics Database

Now we’re ready to set up MySQL.  The basic details of setting up the metrics database can be found in Setting up the MapR Metrics Database in the MapR Documentation.

We will make the following changes to the setup procedure in order to store MySQL data on the MapR cluster.
  1. Create a new volume in the MapR cluster to store the metrics data
  2. Ensure that the volume is mounted to the MySQL server system via NFS. (This step has been completed above, and we assume the cluster is mounted at /mapr.)
  3. Configure the MySQL instance to use that location for its data.
The instructions below show how to perform these steps from a command line, but you can accomplish the same result using the MapR Control System GUI.

Creating a volume is simple using the following maprcli commands. Ideally, the following commands should be executed on the server that has the MySQL software installed so as to properly identify the user mysql to the MapR cluster.
maprcli volume create -name mapr.mysql -user mysql:fc \
   -path /var/mapr/mysql -createparent true -topology /
maprcli acl edit -type volume -name mapr.mysql -user mysql:fc
If you have a running MySQL instance, you’ll need to shut it down to make the necessary configuration changes. Any metrics data for MapR jobs that run during this brief shutdown will not be logged.

Edit the my.cnf file (/etc/mysql/my.cnf on Ubuntu or /etc/my.cnf on RedHat/CentOS) and update the datadir specification to match the path specified for the MySQL data volume above. Replace <cluster> with the name of your cluster.
datadir = /mapr/<cluster>/var/mapr/mysql
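The datadir edit can also be scripted rather than made by hand. The sketch below assumes that my.cnf already contains a datadir line and that the new path contains no | or & characters; set_datadir is a hypothetical helper name, not a MySQL or MapR command.

```shell
# set_datadir CNF_FILE NEW_DATADIR
# Hypothetical helper: keeps a copy of CNF_FILE as CNF_FILE.bak, then rewrites
# the value of the existing "datadir" line. If the file has no datadir line,
# its contents are left unchanged.
set_datadir() {
    cnf_file=$1
    new_datadir=$2
    cp -p "$cnf_file" "$cnf_file.bak" || return 1
    # '|' is the sed delimiter because the path itself contains '/'.
    # Writing back through a redirect preserves the file's owner and mode.
    sed "s|^[[:space:]]*datadir[[:space:]]*=.*|datadir = $new_datadir|" \
        "$cnf_file.bak" > "$cnf_file"
}

# Example use (run as root; substitute your cluster name for <cluster>):
#   set_datadir /etc/my.cnf "/mapr/<cluster>/var/mapr/mysql"
```

Keeping the .bak copy makes it easy to revert if the database does not start with the new location.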
Set the ownership of that directory to match the default data location. If the ownership is wrong, MySQL will fail to start.
chown --reference=/var/lib/mysql /mapr/<cluster>/var/mapr/mysql
If your server had already instantiated a MySQL database, you’ll want to move those data files from the local file system to the new /mapr location. A simple cp -rp command will do. For example:
cp -rp /var/lib/mysql/* /mapr/<cluster>/var/mapr/mysql
Otherwise, you’ll need to initialize the new directory for the database instance with the command
mysql_install_db
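If you copied existing data files, it may be worth confirming that the copy is complete before starting the database. A minimal sketch, assuming the standard diff utility is available and that MySQL is still shut down (so the files are not changing); verify_copy is a hypothetical name.

```shell
# verify_copy SRC_DIR DEST_DIR
# Hypothetical helper: recursively compares the two directory trees.
# Prints any files that differ or are missing; exit status is 0 only
# when the trees have identical contents.
verify_copy() {
    diff -rq "$1" "$2"
}

# Example use (substitute your cluster name for <cluster>):
#   verify_copy /var/lib/mysql "/mapr/<cluster>/var/mapr/mysql" \
#       && echo "copy verified"
```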
The database can be started with the command
  • RedHat/CentOS: service mysqld start
  • Ubuntu: service mysql start
If the database was initialized from scratch on the MapR cluster, you’ll need to perform the initialization steps described in Setting up the MapR Metrics Database.  The steps include creating a user account within the MySQL instance for the MapR user, enabling the proper permissions, and initializing the metrics schema with the $MAPR_HOME/bin/setup.sql script. Those steps are the same, whether the data is stored on a local file system or stored on the cluster.

And you’re done! To verify the setup, confirm that the MapR Control System GUI can access the Metrics database and that any MySQL data you copied onto the cluster is recognized.

Improving High Availability of the Metrics Database

So what happens if the MySQL server fails? How will you recover so that MapR Metrics can continue logging data? As a best practice, you can save the MySQL configuration file my.cnf to the cluster as well. In the event that the server node running MySQL fails, you can recover the metrics functionality by doing the following.
  1. Configure a new server with the MySQL server software.
  2. Copy the archived my.cnf file into place on the new server.
  3. NFS-mount the cluster file system on that new server.
  4. Start up the database instance.
  5. Reconfigure the cluster nodes with the new metrics specification and restart the hoststats process on those nodes.
    $MAPR_HOME/server/configure.sh -R -d <new_MySQL_srvr>:3306
    cd $MAPR_HOME/initscripts
    maprcli node services -name hoststats -action restart -nodes `hostname`
    
On clusters with the centralized configuration feature enabled, step 5 above can be simplified by editing db.conf and hibernate.cfg.xml in /mapr/<cluster>/var/mapr/configuration/default/conf and updating the metrics-host field. Changes to the metrics-host field will be pulled down to the cluster nodes automatically within a few minutes.   You will still need to restart the hoststats daemon once the configuration files have been downloaded to the cluster nodes.
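The practice of saving my.cnf to the cluster, mentioned at the start of this section, could be scripted along these lines. This is only a sketch: backup_mycnf is a hypothetical helper, and the destination directory in the usage line is an assumed location, chosen outside the MySQL datadir so that MySQL does not see it as a database directory.

```shell
# backup_mycnf SRC_FILE DEST_DIR
# Hypothetical helper: copies the MySQL configuration file into DEST_DIR
# (created if missing), preserving its mode and timestamps.
backup_mycnf() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest_dir/my.cnf"
}

# Example use (the mysql-conf directory name is an assumption):
#   backup_mycnf /etc/my.cnf "/mapr/<cluster>/var/mapr/mysql-conf"
```

On a replacement server, copying that file back to /etc/my.cnf (or /etc/mysql/my.cnf on Ubuntu) corresponds to step 2 of the recovery procedure.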

David Tucker is a Cloud Solutions Architect for MapR Technologies, Inc.
