Getting Started with MapR Security

February 8, 2014

Earlier this month, we released MapR version 3.1. The primary focus of this release was enhancing Hadoop security. We built on the operating-system-based authentication and authorization from previous releases to add features in these four key areas:

  • Network-safe authentication
  • Network encryption
  • Enhanced authorization using Access Control Expressions (ACEs) in MapR tables
  • Tighter permissions on files on the local file system

What distinguishes MapR's security implementation is our heavy focus on ease of use and enterprise integration. This led to several key elements of our enterprise-grade solution that go beyond the basic Apache functionality. They are:

  • Security with or without the use of Kerberos - while Kerberos is an excellent security technology, it is often difficult to deploy and manage. We've found that many customers interested in securing Hadoop have been blocked by the complexity of Kerberos. In our implementation, Kerberos is rarely needed. But don't worry: if you are a big fan of Kerberos, we've ensured that Kerberos authentication works easily with MapR.
  • Standard Linux account integration - we use PAM and the standard nsswitch functionality to authenticate users and look up their user/group information. This is the same mechanism used whenever a user logs into a Linux box, so MapR should work with almost any enterprise user registry with minimal configuration.
  • Ease of use - as with the rest of MapR, we are always focused on ease of use. We worked hard to make securing a MapR Hadoop cluster dramatically easier than anything else out there today.
  • Performance - a highly secure system that performs terribly is difficult to accept. As part of our implementation, performance was a key concern and you'll see that reflected in several key decisions.
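Because MapR resolves users and groups through nsswitch, you can sanity-check the identity information a cluster node will see with standard Linux tools before you even install. A minimal check (the 'root' account is used here only because it exists on every Linux system; substitute any user from your enterprise registry):

```shell
# getent consults the sources configured in /etc/nsswitch.conf
# (files, LDAP, SSSD, etc.) -- the same lookup path MapR relies on.
getent passwd root     # the user entry as the cluster would resolve it
id -u root             # the numeric uid for that user
```

If getent can resolve a user from your LDAP or Active Directory registry on a node, MapR should be able to as well.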

This is the first in a series of articles discussing various aspects of the MapR security feature set. In this first installment, I'm going to take you through the steps to install and configure a secure cluster and then demo a few things.

Installation

MapR 3.1 includes a new installer to simplify installation.


Follow the installation steps for the first control node, but after the summary page is displayed, select 'm' to modify the configuration options and then select 's' to modify the security options. After enabling security, continue with the installation process. While certainly not required, I recommend installing a one-node cluster for this quick demo, just to keep things simple.

After a while, the cluster configuration process completes.

If you are coming from an earlier release of MapR, you are already familiar with our installation process prior to the new installer. If you prefer to use the old way with security, no problem; there are just a few minor differences. They are:

  • on the first node (the one running the CLDB), run configure.sh with two new options, -genkeys and -secure. For example:

    # cd /opt/mapr/server
    # ./configure.sh -C host1 -Z host1 -secure -genkeys

  • on every additional node, copy the maprserverticket, ssl_keystore, and ssl_truststore files from the first node's conf directory to the new node's conf directory. If the new node is a CLDB or ZooKeeper node, you also need to copy cldb.key. Here's an example using scp on the new node (run as root or the mapr user) to pull the files from the first node:

    # scp "mapr@host1:/opt/mapr/conf/{cldb.key,maprserverticket,ssl_keystore,ssl_truststore}" /opt/mapr/conf

  • on every additional node, add -secure as an option when you run configure.sh. Run this way, configure.sh uses the keys you copied from the first node; it is critical that all nodes share the same keys. For example, if I'm configuring a node that is not a CLDB or ZooKeeper node (so no need for cldb.key), I'd use these commands:

    # scp "mapr@host1:/opt/mapr/conf/{maprserverticket,ssl_keystore,ssl_truststore}" /opt/mapr/conf
    # cd /opt/mapr/server
    # ./configure.sh -C host1 -Z host1 -secure


Using a Secure Cluster

Now that the cluster is up and running, you'll want to use it. This is where things get a little bit different from an insecure cluster. You'll notice that commands like 'hadoop fs -ls /' fail with security errors. You also cannot connect to the Job Tracker web page on port 50030 without authenticating. Try it and see. I'll wait.

This is because the cluster is now secure and unauthenticated access is not allowed. To use most of the command line tools (hadoop, maprcli, etc.) you need a MapR ticket. To get a ticket, run 'maprlogin password' as shown in this example:

% maprlogin password
[Password for user 'fred' at cluster 'my.cluster.com': ]
MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'

% maprlogin print
Opening keyfile /tmp/maprticket_1001
my.cluster.com: user = fred, created = 'Tue Jan 21 13:54:15 PST 2014', expires = 'Tue Feb 04 13:54:15 PST 2014', RenewalTill = 'Thu Feb 20 13:54:15 PST 2014', uid = 1001, gids = 4

You've now used maprlogin to do two things: first you authenticated to MapR with a password, and then you printed out your MapR ticket information. Now try 'hadoop fs -ls /' and it will work.
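As the output above hints, the ticket file's name is derived from your numeric Unix uid: by default, maprlogin writes to /tmp/maprticket_<uid>. A quick way to compute where your own ticket will land (this runs on any Linux box, no cluster needed):

```shell
# Default MapR ticket location is /tmp/maprticket_<uid of the Unix user>.
# (Setting the MAPR_TICKETFILE_LOCATION environment variable overrides this.)
TICKET_FILE="/tmp/maprticket_$(id -u)"
echo "$TICKET_FILE"
```

This is why the example output shows '/tmp/maprticket_1001' for a user whose uid is 1001.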

It may not be obvious, but your MapR identity is now independent of your Unix identity. This is particularly handy if you are accessing MapR from a remote client system, or even from Windows. Try this if you can: log in as one Unix user and then use maprlogin to become another user to MapR. Here's an example; notice that I specify '-user mapr':

% id
uid=1001(fred) gid=4(adm) groups=4(adm)
% maprlogin password -user mapr
[Password for user 'mapr' at cluster 'my.cluster.com': ]
MapR credentials of user 'mapr' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'
% maprlogin print
Opening keyfile /tmp/maprticket_1001
my.cluster.com: user = mapr, created = 'Tue Jan 21 13:57:42 PST 2014', expires = 'Tue Feb 04 13:57:42 PST 2014', RenewalTill = 'Thu Feb 20 13:57:42 PST 2014', uid = 2147483632, gids = 2000, 42

Notice that your MapR identity and your Unix identity are completely different. Now try to create a file on the cluster and notice who owns it, as in this example:

$ hadoop fs -mkdir /test
$ hadoop fs -ls /
Found 4 items
drwxr-xr-x    - mapr mapr 0          2014-01-21 13:45 /hbase
drwxr-xr-x    - mapr mapr 0          2014-01-21 13:58 /test
drwxr-xr-x    - mapr mapr 1          2014-01-21 13:46 /user
drwxr-xr-x    - mapr mapr 1          2014-01-21 13:45 /var

This independence from the operating system identity is both how we ensure a secure cluster and what gives you great flexibility.

One final thing to try. Connect to the Job Tracker page using HTTPS (secure by default) and enter a valid userid and password. Take note that you can now see the page, but only because you authenticated. We won't show it here, but if you have time, try running some jobs as one user and then looking at the job details as another, non-administrative user. You'll see that you can't: users can see only their own jobs, while administrators can see everything.
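If you prefer the command line, you can exercise the same authenticated HTTPS endpoint with curl. The host name below is a placeholder for your Job Tracker node, and the snippet only prints the command rather than contacting a cluster (no live cluster is assumed here):

```shell
# -u prompts for the user's password; -k accepts the cluster's
# self-signed certificate, which configure.sh -genkeys generates.
# Echoed rather than executed, since host1 is a hypothetical node.
echo curl -k -u fred "https://host1:50030/jobtracker.jsp"
```

Without -u (or with bad credentials) the same request is rejected, just as the browser prompt refuses unauthenticated access.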

Notice by the way that we didn't have to perform any Kerberos configuration. If you do want to use Kerberos, we'll discuss that in a later article.

In this article, we've shown you the basics of MapR security. In future articles we'll take you into more depth on a variety of topics. If you have suggestions for future articles, let us know. For now, we suggest you take a look at the security guide.