License and Topology and Volumes, Oh My!

Three Powerful Things You Can Do to Ensure the Success of Your Cluster

In a previous blog post, From Zero to Cluster-Ready, we talked about getting your hardware ready for installing MapR, which we followed with a blog post, Planning and Installing Services, that shared some tips and tricks for installation. Now that your cluster is up and running, there are a few things you should do right away to help ensure that things run smoothly as your cluster grows:
 
  • Adding a license
  • Setting up topology
  • Creating volumes
In this blog post, we’ll cover a few high-level tips about each of these topics.
 

Add a License

The first step in configuring a new cluster is to apply a license. Choose a license level that meets your needs:
 
  • M3—Free edition, with unlimited scale, full Hadoop capability and single-point NFS
  • M5—Supported edition, offering high availability, multiple NFS nodes, and data management
  • M7—All of the above plus native MapR Tables
 
If you do not apply even the free M3 license, you cannot take advantage of features such as NFS. You can see a complete listing of the features available with each license on the MapR Editions page. To add a license, click on the Manage Licenses link at the top right of the MapR Control System screen.
 
 
You can add an M3 license or a trial M5 license yourself, but if you need an M7 license, please contact your MapR sales engineer.

Set up Your Topology

Topology is a description of the physical layout of the cluster hardware, so that MapR knows which nodes are on different racks. When your data is replicated, the copies go to separate racks. That way, if an entire rack goes down, you don’t lose access to your data. Topology is expressed as a tree, and in that way looks very similar to a directory tree—but it’s not! It’s really just a description of the locations of racks and nodes.
 
 
The diagram above shows a cluster with three racks, labeled “rack1,” “rack2,” and “rack3.” Each rack has eight nodes, labeled “node1,” “node2,” “node3,” and so on. To describe the cluster in terms of physical topology, we use the following guidelines:
 
  • All the racks are inside the enclosing topology /data
  • Each node is inside its rack
  • Any given node can only be in one topology
 
So, in the above example, the node “node2” on rack “rack1” would have the following topology:
/data/rack1/node2
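
To make the path convention concrete, here is a minimal Python sketch that builds topology paths for the example cluster and checks whether a set of replicas lands on separate racks. It only models the naming scheme described above; the function names are hypothetical, and it assumes node numbering restarts on each rack. It does not talk to a real cluster.

```python
ROOT = "/data"  # enclosing topology from the example above


def topology_path(rack: str, node: str, root: str = ROOT) -> str:
    """Build the topology path for a node, e.g. /data/rack1/node2."""
    return f"{root}/{rack}/{node}"


def rack_of(path: str) -> str:
    """Return the rack component of a /root/rack/node topology path."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /root/rack/node, got {path!r}")
    return parts[1]


def on_separate_racks(replica_paths: list[str]) -> bool:
    """True if no two replicas share a rack."""
    racks = [rack_of(p) for p in replica_paths]
    return len(set(racks)) == len(racks)


# Three racks of eight nodes each, as in the example cluster.
cluster = [
    topology_path(f"rack{r}", f"node{n}")
    for r in range(1, 4)
    for n in range(1, 9)
]
```

For example, `on_separate_racks(["/data/rack1/node2", "/data/rack2/node5", "/data/rack3/node1"])` is true, while two replicas under `/data/rack1` would fail the check.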
 
As soon as your cluster is running, you should use the MapR Control System to set up the physical topology. From the dashboard, click Nodes to see all the nodes.
 
 
Check the checkboxes for all the nodes on a single rack, and click Change Topology to set up a label to describe the rack.
Setting up the cluster’s physical topology is an important step—not only does it help protect your data in case of a rack failure, it also enables data placement and job placement features in M5-licensed clusters. For more information, see Node Topology.
 

Create Volumes for Your Data

Volumes give you a way to manage data—you should set up volumes for different users, projects, or departments. It is very important to set up volumes as soon as you can. Putting all your data in the cluster without organizing it into volumes can lead to headaches later.
 
A volume is like a flexible bag for data. It takes up only as much space as it needs to. You can apply various policies to a volume as a whole, including permissions, size quota, and data protection policy (with snapshots and mirrors). If you have worked with Linux volumes, then MapR volumes will seem familiar to you; a volume has a name, which is not to be confused with its mount point (the path to the volume, which is often different from the name).
 
Volumes are the foundation for the data management features that MapR provides:
 
  • Volume topology lets you specify a subset of cluster nodes that a volume is allowed to use, for data placement (see Setting Volume Topology)
  • Snapshots let you preserve the state of a volume at a particular point in time (see Snapshots)
  • Mirrors let you create read-only copies of a volume for load-balancing, separation of development from production, or backup (see Mirror Volumes)
You’ll need an M5 or M7 license to take advantage of these features.
 
A MapR cluster comes with certain system volumes out of the box. The following diagram shows the system volumes (blue) along with recommended volumes that you should add to your new cluster.
 
 
The root volume (mapr.cluster.root, mounted at /) contains the mount points for the other volumes. MapR provides a volume for HBase (if installed) and a /var/mapr volume containing information about cluster configuration. There is also a local volume for each node—limited by its topology to reside only on its own node.
 
As shown in the example above, you should add a hierarchy of volumes for users, projects and departments, to enable you to manage data for these different entities separately.
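
If you prefer to script the layout rather than click through it, the hierarchy can be planned as data first. The sketch below turns a small plan into `maprcli volume create` argument lists without running them. The user and project names and the `group.member` volume naming scheme are hypothetical, and the `-name` and `-path` flags should be checked against the maprcli reference for your MapR version.

```python
def volume_create_command(name: str, mount_path: str) -> list[str]:
    """Argument list for creating and mounting one volume (not executed here)."""
    return ["maprcli", "volume", "create", "-name", name, "-path", mount_path]


# Hypothetical plan: one volume per user and per project.
plan = {
    "users": ["jsmith", "mjones"],
    "projects": ["clickstream"],
}

# Volume names use a "group.member" convention (an assumption, not a MapR rule);
# mount paths follow the /group/member hierarchy.
commands = [
    volume_create_command(f"{group}.{member}", f"/{group}/{member}")
    for group, members in plan.items()
    for member in members
]

for cmd in commands:
    print(" ".join(cmd))
```

Printing the commands before running them gives you a chance to review names and mount paths, which are awkward to change once data has been written.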
 
 
To create a volume in the MapR Control System, click Volumes and then click New Volume. This brings up a dialog box that allows you to specify settings for the volume. Here’s a quick overview:
 
  • Volume Setup: Set the name and mount path of the volume. The mount path determines where the volume will be mounted. For example, if you follow the above volume layout diagram, you might create a volume called johnsmith with a mount path of /users/jsmith. You can also set the volume topology here (the default is /data, which uses all racks), and choose whether to create a normal read/write volume or a mirror volume.
  • Permissions: Set the permissions, for each user, for volume operations such as backing up or deleting the volume.
  • Usage Tracking: Set a quota, if desired, to limit the maximum size of the volume. The hard quota is a limit above which writes to the volume are disabled; the advisory quota is a limit above which a warning is sent to the volume’s owner.
  • Replication: Set the desired replication and the replication method for the volume.
For more information on what these settings mean, see Managing Data with Volumes.
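
The difference between the two quota types can be sketched in a few lines of Python. This is only a model of the behavior described above, not MapR code; the class, field names, and megabyte units are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeQuota:
    """Quota settings for one volume; None means no limit is set."""
    hard_mb: Optional[int] = None
    advisory_mb: Optional[int] = None


def check_usage(quota: VolumeQuota, used_mb: int) -> str:
    """Classify current usage against the volume's quotas."""
    # Above the hard quota, writes to the volume are disabled.
    if quota.hard_mb is not None and used_mb > quota.hard_mb:
        return "writes_disabled"
    # Above the advisory quota, the owner gets a warning but writes continue.
    if quota.advisory_mb is not None and used_mb > quota.advisory_mb:
        return "warn_owner"
    return "ok"


quota = VolumeQuota(hard_mb=100, advisory_mb=80)
```

Setting the advisory quota somewhat below the hard quota, as in this example, gives the volume owner time to clean up before writes stop.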
 
Setting up the license, topology, and plenty of volumes right after installing your cluster will help ensure a successful deployment, and will provide the maximum benefit of the MapR data management features. There are a few other things that are helpful as well—setting up central configuration, configuring multiple NICs for high network bandwidth, setting up users and authentication, and more. For further reading about how to fully customize your new cluster, see Next Steps after Installation.
