Welcome to the MapR Whiteboard Walkthrough. My name is John and I'm the author of the cluster administration course that you'll find at training.mapr.com. I'm here to talk to you about the CLDB, or the container location database. Over the next few minutes, I'll give you a quick definition of the CLDB, an overview, and talk a little bit more about what's inside the CLDB.
At the end of this video you should be able to define the function of the CLDB in a MapR cluster and also describe how it differs from the namenode in standard Hadoop.
In a nutshell, the CLDB is a service. It can be installed on one or more of the nodes of a MapR cluster. It maintains the locations of the containers in the cluster and also holds a lot of other information about the cluster. Being a service, it can be installed on more than one node for high availability. If you choose to install it on more than one node, it will be set up in a master/slave configuration with the slaves on active standby.
This allows for a very quick failover, with a recovery time of just a few seconds. While those slaves are in standby, however, they are not just idle. They also serve read traffic, taking a lot of the load off the master. The CLDB is very lightweight when compared to the namenode of standard Hadoop.
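The master/slave arrangement described above can be pictured with a toy model. This is only a sketch for intuition, not how the CLDB is actually implemented: the class `ToyCLDBGroup`, its method names, the node names, and the round-robin read policy are all hypothetical. It shows writes going to the master, reads being spread over every instance, and a standby being promoted when the master fails.

```python
class ToyCLDBGroup:
    """Hypothetical toy model of one master plus standbys that serve reads."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self.master = nodes[0]
        self.standbys = list(nodes[1:])
        self._next_read = 0  # round-robin counter (an assumed policy)

    def route_write(self):
        # In this model, updates always go to the master.
        return self.master

    def route_read(self):
        # Reads are spread over the master and all standbys,
        # which is what takes load off the master.
        members = [self.master] + self.standbys
        node = members[self._next_read % len(members)]
        self._next_read += 1
        return node

    def fail_master(self):
        # Promote the first standby; it is already running, so the
        # switch is quick in this model.
        if not self.standbys:
            raise RuntimeError("no standby available to promote")
        failed = self.master
        self.master = self.standbys.pop(0)
        return failed


group = ToyCLDBGroup(["node1", "node2", "node3"])
reads_before = [group.route_read() for _ in range(3)]
failed_node = group.fail_master()
new_master = group.route_write()
reads_after = {group.route_read() for _ in range(4)}
```

After the simulated failure, the surviving nodes keep answering reads and one of them takes over writes, which is the behavior the standby setup is meant to provide.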
Containers in a MapR cluster are very large, with a default size of 32 gigabytes. This is roughly 500 times larger than the 64 megabyte size of a block in HDFS. Because of this, the CLDB has far fewer locations to track and therefore has a much faster lookup. Even though the CLDB replaces the functions of the namenode in a MapR cluster, it doesn't just serve lookup data. The CLDB is directly integrated into the MapR file system and therefore contains much more information.
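To make the size comparison concrete, here is a small worked calculation. It is a back-of-the-envelope sketch, assuming binary units (1 gigabyte = 1024 megabytes) and ignoring replication; the 1 PiB data set is a made-up figure chosen only for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

CONTAINER_SIZE = 32 * GIB   # default container size quoted in the talk
HDFS_BLOCK_SIZE = 64 * MIB  # HDFS block size quoted in the talk


def units_needed(data_bytes, unit_bytes):
    """Number of fixed-size units needed to hold data_bytes (ceiling division)."""
    return -(-data_bytes // unit_bytes)


# How many HDFS blocks fit in one container.
ratio = CONTAINER_SIZE // HDFS_BLOCK_SIZE

# Hypothetical data set of 1 PiB, before replication.
data_bytes = 1024 ** 5
containers = units_needed(data_bytes, CONTAINER_SIZE)
blocks = units_needed(data_bytes, HDFS_BLOCK_SIZE)

print(f"blocks per container: {ratio}")
print(f"containers to track:  {containers}")
print(f"blocks to track:      {blocks}")
```

Under these assumptions one container covers 512 blocks, so the same data needs tens of thousands of container locations rather than millions of block locations.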
If you look at the MapR Control System GUI in a cluster, you'll find the CLDB view. On the CLDB view, you'll find all sorts of information about the CLDB and the cluster. For example, you can see details about the CLDB itself such as the current mode, version and status. You can also find cluster and node level information such as how much storage is available, and how many task slots are still available. You can find the node health, the service configuration and also information about the disk balancer and role balancer.
In terms of container level information, you can find information about the volumes, snapshots and mirrors of the containers as well as the replication type and level. What all this means then is that the integration between the CLDB and MapR-FS allows for a much faster lookup time and a much faster failover recovery than you'll get from standard HDFS. It also allows for some of the more advanced features that you'll find in MapR Hadoop, such as NFS ingestion, snapshots and HDFS API compatibility.
If you want to learn more about the CLDB, go check out doc.mapr.com, do a search for CLDB view and also look for API reference. I would also recommend that you go to answers.mapr.com and join any of the CLDB discussions there.
If you have any comments that you'd like to make about this video or about the whiteboard walkthrough, please leave them below. Go check out Twitter and look us up @MapR #WhiteboardWalkthrough.
Thanks for watching and feel free to suggest new topics for us to cover in the comments below!