Key Tips for Managing Passwords in Sqoop

Sqoop is a popular data transfer tool for Hadoop. Sqoop allows easy import and export of data from structured data stores like relational databases, enterprise data warehouses, and NoSQL datastores. Sqoop also integrates with Hadoop-based systems such as Hive, HBase, and Oozie.

In this blog post, I will cover the different options available for managing passwords in Sqoop. Sqoop is a very stable and easy to use tool for transferring data into and out of Hadoop. However, if you want to automate the import/export process, the password has to be in plan text, and this can be abused. Here are some solutions to this challenge.

1. Use a Protected File in the Filesystem

Create separate files for Prod and Development (it is always a good pratice to use separate accounts for Prod and Development).

Set File permissions based on the user

ProdFile, Only Prod users have access

-rw-------  1 vgunnuProd  staff   1.1M Oct  6  2014 password.txt

Dev File, Giving group read access - To share the file between team members

-rw-r-----  1 vgunnuDev  staff   1.1M Oct  6  2014 password.txt

sqoop import --connect jdbc:mysql://mapr.com/sqoop \
	 		--username sqoop \
			--password-file /mapr/democluster/opt/passwords/prodpass.txt --table vgunnu

2. Read Passwords from the Database

Maintain Dev and PROD database tables, and leverage MySQL grant premissions to lock down PROD password tables in order to give access to certain users from Prod edgenodes.

# Read password from database
# Select Password from mapr.Prod where applicationdb='mapr.vgunnu'
echo -n $Password > /mapr/democluster/opt/passwords/prodpass.txt
sqoop import --connect jdbc:mysql://mapr.com/sqoop \
	 		--username vgunnu \
			--password-file /mapr/democluster/opt/passwords/prodpass.txt --table vgunnu
rm /mapr/democluster/opt/passwords/prodpass.txt

3. Use the Database and Expect Script

In the above option, instead of saving the password to file, we can use a linux spawn and expect script.

#!/bin/bash
# Read password from database
# Select Password from mapr.Prod where applicationdb='mapr.vgunnu'
# Read password to variable pwd
/usr/bin/expect «EOF
    spawn sqoop import --connect jdbc:mysql://mapr.com/sqoop --username vgunnu -P --table vgunnu
    expect "Enter password:"
    send "$pwd\r\n"
    set timeout -1  # Wait for Sqoop to finish
    expect "~~~~~~~~~~~~"
    wait
EOF

4. Use the Hadoop CredentialProvider API

In Hadoop 2.6, a fundamental feature was introduced – the Credential API. The CredentialProvider API in Hadoop allows for the separation of applications and how they store their required passwords/secrets. With Sqoop 1.4.5, the credentail API keystore is supported by Sqoop.

To generate an encrypted credentail keystore:

[mapr@maprdemo ~]$ hadoop credential create msql.vgunnudb -provider jceks://maprfs/user/mapr/mysql.password.m.jceks
Enter password: 
Enter password again: 
msql.vgunnudb has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

[mapr@maprdemo ~]$ hadoop credential list -provider jceks://maprfs/user/mapr/mysql.password.m.jceks
Listing aliases for CredentialProvider: jceks://maprfs/user/mapr/mysql.password.m.jceks
msql.vgunnudb

sqoop import -Dhadoop.security.credential.provider.path=jceks://maprfs/user/mapr/mysql.password.m.jceks \
			--connect jdbc:mysql://mapr.com/sqoop \
	 		--username vgunnu \
			--password-file /mapr/democluster/opt/passwords/prodpass.txt --table vgunnu \
             -–password-alias msql.vgunnudb

In this blog post, you learned about the different options available for managing passwords in Sqoop. If you have any further questions, please ask them in the comments section below.

no

CTA_Inside

Ebook: Getting Started with Apache Spark
Interested in Apache Spark? Experience our interactive ebook with real code, running in real time, to learn more about Spark.

Streaming Data Architecture:

New Designs Using Apache Kafka and MapR Streams

 

 

 

Download for free