As a Hue developer at MapR, I have realized that ease of use wins customers at the end of the day and accelerates the adoption of Hadoop. Hue, an open source UI that makes Apache Hadoop easier to use, offers a window into analyzing and visualizing big data for monetary value.
While big data security analytics promises to deliver great insights in the battle against cyber threats, the concept and the tools are still maturing. In this blog post, I’ll simplify the topic of adopting security in Hadoop by showing you how to encrypt traffic between Hue and Hive.
Hue can communicate with Hive over a channel encrypted with SSL. Let’s take a look at the interface and the handshake mechanism first before trying to secure it.
The high-level idea behind the SSL handshake is shown in the diagram below, where Hue is the SSL client and Hive is the SSL server.
- The SSL client (Hue) opens a socket connection to Hive. The socket is then wrapped in a layer that encrypts and decrypts the data passing over it with SSL.
- Once Hive receives an incoming connection, it presents a certificate to Hue. The certificate carries Hive's public key along with a claim that the key can be trusted.
- Hue can then verify the authenticity of this certificate with a trusted certificate-issuing authority, or skip verification for self-signed certificates.
- Hue encrypts messages using this public key and sends them to Hive.
- Hive decrypts the messages with its private key. In practice, the key pair is used during the handshake to agree on a shared session key, and that session key encrypts the rest of the traffic.
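To make the steps above concrete, here is a minimal Python sketch of the client side, roughly what an SSL client does when it wraps a socket. It is not Hue's actual code: the helper names `build_client_context` and `open_secure_socket` are assumptions made up for illustration, and it uses only Python's standard `ssl` and `socket` modules.

```python
import socket
import ssl


def build_client_context(validate=True, ca_file=None):
    """Return an SSLContext for an SSL client (hypothetical helper)."""
    # Defaults: verify the server certificate and its host name.
    context = ssl.create_default_context(cafile=ca_file)
    if not validate:
        # Self-signed certificates: skip verification.
        # check_hostname must be disabled before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def open_secure_socket(host, port, context, timeout=10.0):
    """Open a TCP socket and wrap it so all traffic is encrypted."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # The handshake (certificate exchange, key agreement) happens here.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

With a self-signed certificate you would call `build_client_context(validate=False)` and pass the result to `open_secure_socket("hive-host.example.com", 10000, context)`. The host name is a placeholder, and 10000 is HiveServer2's default port; adjust both for your cluster.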
The public/private keys always come in pairs and are used to encrypt/decrypt messages. They can be generated with keytool, the command-line utility that ships with the JDK and produces keystores understood by the Java keystore library, or with the OpenSSL command-line utility, whose output is understood directly by the Python SSL library.
The Hive side uses Java keystore certificates and public/private keys, while Hue’s Python code calls the SSL library implemented in C. Much of the complication arises from the lack of one uniform format understood by all three languages: Python, Java and C. For example, the SSL C library on the client side expects a private key from the SSL server, which is not a requirement in a pure Java SSL client implementation. Using the Java keytool command, you cannot export a private key directly into the PEM format understood by Python; you need the intermediate PKCS12 format.
Let’s step through the procedure to create certificates and keys:
If you're using mapr-release 3.1.1 or later with secure mode enabled, the keystore (generated in step 1 below) already exists at /opt/mapr/conf/ssl_keystore, so you can skip step 1 and use ssl_keystore wherever keystore.jks appears below.
If you're using ssl_keystore, the srcstorepass is contained in /opt/mapr/conf/ssl-server.xml.
- Generate keystore.jks containing the private key (used by Hive to decrypt messages received from Hue over SSL) and the public certificate (used by Hue to encrypt messages over SSL)
keytool -genkeypair -alias certificatekey -keyalg RSA -validity 7 -keystore keystore.jks
- Generate certificate from keystore
keytool -export -alias certificatekey -keystore keystore.jks -rfc -file cert.pem
- Export the private key and certificate with OpenSSL for Hue's SSL library to ingest
Exporting the private key from a JKS file (Java keystore) needs an intermediate PKCS12 format:
- Import the keystore from JKS to PKCS12
keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.p12 -srcstoretype JKS -deststoretype PKCS12 -srcstorepass mysecret -deststorepass mysecret -srcalias certificatekey -destalias certificatekey -srckeypass mykeypass -destkeypass mykeypass -noprompt
- Convert pkcs12 to pem using OpenSSL
openssl pkcs12 -in keystore.p12 -out keystore.pem -passin pass:mysecret -passout pass:mysecret
- Strip the pass phrase so Python doesn't prompt for a password while connecting to Hive
openssl rsa -in keystore.pem -out hue_private_keystore.pem
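A quick way to confirm that the pass phrase was really stripped is to look at the PEM block labels in the output file. The sketch below is an illustrative check, not part of Hue: `pem_labels` and `has_unencrypted_key` are hypothetical helper names, and the code only inspects the text markers that OpenSSL writes. It does not parse or validate the key itself.

```python
import re

# Matches the opening line of any PEM block, e.g. "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_labels(pem_text):
    """Return the labels of all PEM blocks, in order of appearance."""
    return PEM_BEGIN.findall(pem_text)


def has_unencrypted_key(pem_text):
    """True if the text holds a private key that needs no pass phrase."""
    if "Proc-Type: 4,ENCRYPTED" in pem_text:
        return False  # traditional OpenSSL format with an encrypted body
    return any(
        label.endswith("PRIVATE KEY") and not label.startswith("ENCRYPTED")
        for label in pem_labels(pem_text)
    )
```

Reading hue_private_keystore.pem and passing its contents to `has_unencrypted_key` would be expected to return true after the last step, whereas keystore.pem from the previous step would typically return false because `-passout` keeps the key encrypted.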
The following needs to be set up in Hue’s configuration file, hue.ini, under the beeswax section:
[[ssl]]
# SSL communication enabled for this server.
enabled=true
# Path to the private key file.
key=/path/to/hue_private_keystore.pem
# Path to the public certificate file.
cert=/path/to/cert.pem
# Choose whether Hue should validate certificates received from the server.
validate=false
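If you want to double-check what ended up in the file, the settings can be read back with a few lines of Python. This is a small hand-rolled reader written for this post, not Hue's own configuration loader; the function name `read_ini_section` is hypothetical, and the code assumes that nesting depth is given by the number of brackets, as in the `[[ssl]]` block above.

```python
def read_ini_section(text, path):
    """Collect key=value pairs under a nested section such as beeswax/ssl.

    Nesting depth is taken from the number of opening brackets:
    [beeswax] is depth 1, [[ssl]] is depth 2.
    """
    current = []  # section names from the top level down
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("["):
            depth = len(line) - len(line.lstrip("["))
            name = line.strip("[]").strip()
            # Replace everything at this depth and below with the new name.
            current = current[: depth - 1] + [name]
        elif "=" in line and current == list(path):
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values
```

Calling `read_ini_section(open("hue.ini").read(), ["beeswax", "ssl"])` should return a dictionary with the `enabled`, `key`, `cert` and `validate` entries, which makes it easy to check that the two paths point at the files generated earlier.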
Now configure your hive-site.xml with the following properties on Hive 0.13.
Make sure no custom authentication mechanism is turned on.
<property>
  <name>hive.server2.use.SSL</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.keystore.path</name>
  <value>/path/to/keystore.jks</value>
</property>
<property>
  <name>hive.server2.keystore.password</name>
  <value>mysecret</value>
</property>
On Hive 0.12, the property is hive.server2.enable.SSL instead of hive.server2.use.SSL.
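A typo in a property name is easy to miss, so it can help to read the file back before restarting Hive. The following Python sketch is one way to do that, assuming the properties sit inside the usual `<configuration>` root element of hive-site.xml. The helper names `read_hive_properties` and `ssl_problems` are hypothetical, and the check accepts either spelling of the SSL flag mentioned above.

```python
import xml.etree.ElementTree as ET

# Either name enables SSL, depending on the Hive version.
SSL_FLAGS = ("hive.server2.use.SSL", "hive.server2.enable.SSL")
REQUIRED = ("hive.server2.keystore.path", "hive.server2.keystore.password")


def read_hive_properties(xml_text):
    """Map every <name> to its <value> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def ssl_problems(props):
    """Return a list of readable problems; an empty list means it looks fine."""
    problems = []
    if not any(props.get(flag, "").lower() == "true" for flag in SSL_FLAGS):
        problems.append("SSL flag is not set to true")
    for required in REQUIRED:
        if not props.get(required):
            problems.append("missing " + required)
    return problems
```

Running `ssl_problems(read_hive_properties(open("hive-site.xml").read()))` returns an empty list when the SSL flag and both keystore properties are present. It does not verify that the keystore path exists or that the password is correct.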
That’s it—you’re done! I hope this saved you hours of painstaking effort and frustration. You can also enable authentication by setting up security in Hadoop core and in Hive. If you have any questions, please let me know by leaving a comment below!