Central Configuration – the Simple Way to Apply Customized Configuration Files

Central configuration has been around since the 2.0 release, but many people are not using this time-saving feature. This post explains briefly how to use it and how it can simplify the way you run MapR.

Customized Configuration Files – Why You Use Them

Each MapR service has a set of configuration files associated with it. Each configuration file has a set of default values that can be customized for your purposes.

For example, the TaskTracker service has a configuration file, hadoop/hadoop-0.20.2/conf/mapred-site.xml, that contains the parameter mapred.tasktracker.map.tasks.maximum. The default value is -1, which means that the number of map task slots is calculated by a formula. To override the default, you assign a new value to this parameter and copy the mapred-site.xml file to each node where you want the new value to apply. Without central configuration, this can be very time-consuming, especially on a large cluster.

Central Configuration to the Rescue

Customized configuration files are stored in a volume, mapr.configuration (mounted at /var/mapr/configuration), that is created just for central configuration. The directory structure looks like this:
  • Files that apply to all nodes with a particular service: /var/mapr/configuration/default
  • Files that apply to specific nodes with a particular service: /var/mapr/configuration/nodes/
These files are polled at regular intervals (every five minutes, by default) to check if they are more recent than the version stored locally in the /opt/mapr directory. If a more recent version of a configuration file is found, it is copied to the /opt/mapr directory.
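
To make the layout concrete, here is a minimal sketch of how a local file under /opt/mapr lines up with its two possible central locations. It assumes that the path below /opt/mapr is mirrored under default/ and under nodes/<host>/, as in the mapred-site.xml example later in this post. The function name central_paths is hypothetical and is not part of MapR.

```shell
# Sketch only: map a local config file to its candidate central locations.
# Assumption: the relative path under /opt/mapr is mirrored below
# default/ and nodes/<host>/ in the mapr.configuration volume.
CENTRAL_ROOT=/var/mapr/configuration
LOCAL_ROOT=/opt/mapr

central_paths() {
  local host=$1 local_path=$2
  # Strip the /opt/mapr/ prefix to get the relative path.
  local rel=${local_path#"$LOCAL_ROOT"/}
  # Per-node copy: applies to one specific node.
  printf '%s\n' "$CENTRAL_ROOT/nodes/$host/$rel"
  # Default copy: applies to all nodes running the service.
  printf '%s\n' "$CENTRAL_ROOT/default/$rel"
}

central_paths host91 /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml
```

For host91, this prints the per-node path under /var/mapr/configuration/nodes/host91 followed by the shared path under /var/mapr/configuration/default.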

The pullcentralconfig Script

At the heart of the central configuration feature is the pullcentralconfig script. Here’s how it works:
  1. The script picks up the names of the services assigned to a node from /opt/mapr/roles.
  2. For each service (or role), the script reads the file /opt/mapr/servicesconf/<role> to figure out the configuration files that need to be pulled.
  3. For each configuration file, the script checks if a per-node copy is present in MapR-FS and if it is more recent than the local copy. If so, it copies the later version over to local.
  4. For each configuration file, if the per-node configuration file is not present, the script checks if a central copy is present. If it is present and if it is more recent than the local copy, it copies the central copy of the file over to local.
  5. The script performs copy-to-local this way:
    1. Take a backup of the local config file.
    2. Copy the file from MapR-FS to a temp file on local.
    3. Rename the temp file to local config file.
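
Steps 3 through 5 can be sketched in a few lines of shell. This is not the actual pullcentralconfig source, only an illustration of the decision logic it is described as following. To keep it runnable without a cluster, it uses ordinary local directories as stand-ins for the per-node and default directories in MapR-FS. The function name pull_one, the .bak and .tmp suffixes, and the choice to treat a missing local file as out of date are all assumptions.

```shell
# Sketch only: not the real pullcentralconfig script.
# Usage: pull_one NODE_DIR DEFAULT_DIR LOCAL_DIR RELATIVE_PATH
# NODE_DIR and DEFAULT_DIR stand in for nodes/<host> and default in MapR-FS.
pull_one() {
  local node_dir=$1 default_dir=$2 local_dir=$3 rel=$4
  local src dst
  if [ -f "$node_dir/$rel" ]; then
    src=$node_dir/$rel        # step 3: a per-node copy exists, so use it
  elif [ -f "$default_dir/$rel" ]; then
    src=$default_dir/$rel     # step 4: no per-node copy, use the central copy
  else
    return 0                  # nothing stored centrally for this file
  fi
  dst=$local_dir/$rel

  # Copy only if the central file is more recent than the local one.
  # Assumption: a missing local file counts as out of date.
  if [ -e "$dst" ] && ! [ "$src" -nt "$dst" ]; then
    return 0
  fi

  mkdir -p "$(dirname "$dst")"
  if [ -e "$dst" ]; then
    cp -p "$dst" "$dst.bak"   # step 5.1: back up the local config file
  fi
  cp "$src" "$dst.tmp"        # step 5.2: copy to a temp file on local
  mv "$dst.tmp" "$dst"        # step 5.3: rename the temp file into place
}
```

Copying to a temp file and then renaming means a service reading the configuration file never sees a half-written copy, which is presumably why the script works this way.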

Example

Central configuration saves time in large-cluster scenarios like this:

Suppose you have a cluster with 120 nodes, and 100 of them are running the TaskTracker service. Now suppose that 90 of these TaskTracker nodes (named host1 – host90 in this example) need to use the same customized version of mapred-site.xml. Instead of loading the customized file to each node individually, you can create the file once and store it in the /var/mapr/configuration/default directory. The pullcentralconfig script does the rest.

Step-by-step instructions:
  1. Make a copy of the existing default version of the mapred-site.xml file (so you can use it as a template), and store it in the /tmp directory. For example:
    $ cp /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml \
        /tmp/mapred-site.xml
  2. Edit the copy and put in the changes you want for host1 through host90.
  3. Store the new configuration file in the /var/mapr/configuration/default directory. For example:
    $ hadoop fs -put /tmp/mapred-site.xml \
        /var/mapr/configuration/default/hadoop/hadoop-0.20.2/conf/mapred-site.xml
Now suppose that the remaining 10 TaskTracker nodes (host91 – host100) each use a different version of the mapred-site.xml file. These node-specific configuration files are stored under /var/mapr/configuration/nodes, each in a node-specific sub-directory.

To assign each customized configuration file to its corresponding node, follow these steps:
  1. Copy the default version of mapred-site.xml into the /tmp directory.
  2. Edit that file to create the node-specific configuration file for each node.
  3. Create a sub-directory under /var/mapr/configuration/nodes for each node. For example, the sub-directory for host91 is /host91:
    $ hadoop fs -mkdir /var/mapr/configuration/nodes/host91
  4. Store the customized configuration file in the directory you just created. For example:
    $ hadoop fs -put /tmp/mapred-site.xml \
        /var/mapr/configuration/nodes/host91/hadoop/hadoop-0.20.2/conf/mapred-site.xml
  5. Follow steps 3 and 4 for the remaining nodes (host92 – host100).
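
Repeating steps 3 and 4 by hand for ten nodes is tedious, so a small loop can help. The sketch below only prints the hadoop fs commands for a range of hosts so that you can review them before running anything. It assumes that the edited file for each node was saved as /tmp/mapred-site.xml.<host>, which is a hypothetical naming scheme and not something MapR requires. The function name node_config_commands is also made up for this example.

```shell
# Sketch only: print (do not run) the commands for hostFIRST through hostLAST.
# Assumption: each node's edited file is at /tmp/mapred-site.xml.<host>
# (hypothetical naming chosen for this example).
node_config_commands() {
  local first=$1 last=$2 n host dir
  n=$first
  while [ "$n" -le "$last" ]; do
    host=host$n
    dir=/var/mapr/configuration/nodes/$host
    # Step 3: create the node-specific sub-directory.
    echo "hadoop fs -mkdir $dir"
    # Step 4: store that node's customized file under it.
    echo "hadoop fs -put /tmp/mapred-site.xml.$host $dir/hadoop/hadoop-0.20.2/conf/mapred-site.xml"
    n=$((n + 1))
  done
}

node_config_commands 91 100
```

Once the printed commands look right, you could pipe the output to sh on a node where the hadoop command is available.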
Now that you have the customized configuration files stored in /var/mapr/configuration/default and /var/mapr/configuration/nodes, the pullcentralconfig script does the rest. Just restart the service (TaskTracker in this case) for the changes to take effect. No more commands are necessary to load customized configuration files to each node!

For a full description of central configuration, go to doc.mapr.com/display/MapR/Central+Configuration.
