Developing M7 applications with Maven

MapR M7 offers an HBase API-compliant implementation of tables that addresses many of the architectural limitations of HBase. Because the API is HBase compliant, developing and migrating applications is fairly straightforward, but there are some gotchas to be aware of when building applications with Maven. This post explains how to create a simple M7 application with Maven and deploy it on an M7 cluster. The same steps apply to a MapReduce application as well.

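The application below writes two rows of data into a table with two column families. The table itself is created beforehand with hbase shell, and the rows are read back with a shell scan at the end of this post.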

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import java.io.IOException;

/* Create table schema using following command. The path for M7 need not be absolute if table mapping is defined
 * echo "create '/user/root/students','address','account'" | hbase shell
 */

public class M7Demo {

	public static void main(String[] args) throws IOException {
		// Loads the HBase configuration files found on the classpath
		Configuration conf = HBaseConfiguration.create();
		// An M7 table is addressed by its path
		HTable table = new HTable(conf,"/user/root/students");
		// First row, keyed by "student1"
		Put p1 = new Put("student1".getBytes());

		// Column families; both must already exist in the table
		byte[] account = "account".getBytes();
		byte[] address = "address".getBytes();

		p1.add(account,"name".getBytes(),"Alice".getBytes());
		p1.add(address,"street".getBytes(),"123 Ballmer Av".getBytes());
		p1.add(address,"zipcode".getBytes(),"12345".getBytes());
		p1.add(address,"state".getBytes(),"CA".getBytes());

		// Second row, keyed by "student2"
		Put p2 = new Put("student2".getBytes());
		p2.add(account,"name".getBytes(),"Bob".getBytes());
		p2.add(address,"street".getBytes(),"1 Infinite Loop".getBytes());
		p2.add(address,"zipcode".getBytes(),"12345".getBytes());
		p2.add(address,"state".getBytes(),"CA".getBytes());

		// Write both rows and release the table
		table.put(p1);
		table.put(p2);
		table.close();
	}
}

MapR’s Maven repository is available at repository.mapr.com/maven/, and the list of available artifacts can be found at doc.mapr.com/display/MapR/Maven+Repository+and+Artifacts+for+MapR.

For an M7 application, we will use the org.apache.hbase/hbase artifact, which automatically pulls in the HBase and M7 specific dependencies, including the client libraries. Add the following under the repositories section of pom.xml.

	
	<repository>
		<id>mapr-maven</id>
		<url>http://repository.mapr.com/maven</url>
		<releases><enabled>true</enabled></releases>
		<snapshots><enabled>false</enabled></snapshots>
	</repository>

The only artifact needed is org.apache.hbase/hbase, which can be added under the dependencies section as follows.

  	<dependency>
  		<groupId>org.apache.hbase</groupId>
  		<artifactId>hbase</artifactId>
  		<version>0.94.13-mapr-1401-m7-3.1.0</version>
  	</dependency>

To make the jar executable, specify a manifest that names the class containing the main method, as follows.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>M7Demo</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
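
Once the jar has been built, it can be useful to confirm that the manifest really names the main class. The following is a minimal sketch using only the JDK's java.util.jar classes; ManifestCheck and mainClassOf are hypothetical names introduced here for illustration and are not part of the original project.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the m7example project.
class ManifestCheck {

    // Returns the Main-Class attribute of the given jar,
    // or null if the jar has no manifest or no such attribute.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the jar to inspect as the first argument
        System.out.println("Main-Class: " + mainClassOf(args[0]));
    }
}
```

Run against the jar built from the pom.xml in this post, the reported Main-Class should be M7Demo if the plugin configuration above was picked up.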

The final pom.xml for the project is listed below.

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mapr.support</groupId>
  <artifactId>m7example</artifactId>
  <version>0.0.1</version>
  
  <repositories>
	<repository>
		<id>mapr-maven</id>
		<url>http://repository.mapr.com/maven</url>
		<releases><enabled>true</enabled></releases>
		<snapshots><enabled>false</enabled></snapshots>
	</repository>
  </repositories>
   
  <dependencies>
  	<dependency>
  		<groupId>org.apache.hbase</groupId>
  		<artifactId>hbase</artifactId>
  		<version>0.94.13-mapr-1401-m7-3.1.0</version>
  	</dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>M7Demo</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Once the project is ready, create the jar file by right-clicking the project under Package Explorer and choosing Run As > Maven install. The final jar is placed in the target directory under the project’s home directory. On my system the project has the following layout.

The path of the application jar is printed in the Console.

[INFO] 
[INFO] --- maven-jar-plugin:2.3.2:jar (default-jar) @ m7example ---
[INFO] Building jar: C:\Users\Abhinav Chawade\workspace\m7example\target\m7example-0.0.1.jar
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ m7example ---
[INFO] Installing C:\Users\Abhinav Chawade\workspace\m7example\target\m7example-0.0.1.jar to C:\Users\Abhinav Chawade\.m2\repository\com\mapr\support\m7example\0.0.1\m7example-0.0.1.jar
[INFO] Installing C:\Users\Abhinav Chawade\workspace\m7example\pom.xml to C:\Users\Abhinav Chawade\.m2\repository\com\mapr\support\m7example\0.0.1\m7example-0.0.1.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.363s
[INFO] Finished at: Fri Mar 28 13:21:23 PDT 2014
[INFO] Final Memory: 12M/304M
[INFO] ------------------------------------------------------------------------

The next step is to copy m7example-0.0.1.jar to any of the cluster nodes and launch it. Before launching the application, we need to create a table named “students” with the “account” and “address” column families by using hbase shell. The next section walks through these steps on the cluster.

The application can be built either as an uber jar that includes all the dependencies needed to run, or as a “thin” jar that includes only application-specific classes and configuration files. An uber jar is not recommended because it ties the application to a specific version of the product or project and makes the application less portable. In addition, if the version of the underlying product changes, the application launch can fail with NoSuchMethodError, UnsatisfiedLinkError or ClassNotFoundException. The next post in this series will elaborate on ClassLoaders and their significance in Hadoop.
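
When a launch fails with one of these errors, a first step is often to find out which jar a class is actually loaded from. The sketch below uses only standard JDK calls (Class.forName and the class's code source); ClassOrigin and describe are hypothetical names chosen for this example.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of the m7example project.
class ClassOrigin {

    // Describes where the named class is loaded from on the current classpath.
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = c.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes that ship with the JDK usually have no code source
                return "no code source (JDK runtime class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found: " + e;
        }
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name
        for (String name : args) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

For example, assuming ClassOrigin.class is in the current directory on a cluster node, it could be launched with the hbase classpath plus the current directory and given org.apache.hadoop.hbase.client.HTable as an argument. If the location printed is the application jar rather than the cluster's HBase installation, the class was bundled into an uber jar.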

Running the application on a MapR cluster

Running the application on the cluster entails setting up the classpath, creating an empty table and running the application. The node that runs the application needs to be part of a running cluster with an M7 license and have the mapr-hbase and mapr-hbase-internal packages installed. Verify that the appropriate packages are installed by using the rpm command on RHEL/CentOS or the dpkg command on Ubuntu. The output of the rpm -qa command on my system is as follows.

rpm -qa 'mapr-*'
mapr-fileserver-3.0.2.22510.GA-1.x86_64
mapr-patch-3.0.2.22510.GA-24829.x86_64
mapr-core-3.0.2.22510.GA-1.x86_64
mapr-cldb-3.0.2.22510.GA-1.x86_64
mapr-sqoop-1.4.4.23554-1.noarch
mapr-nfs-3.0.2.22510.GA-1.x86_64
mapr-hbase-0.94.17.24867.GA-1.noarch
mapr-webserver-3.0.2.22510.GA-1.x86_64
mapr-zk-internal-3.0.2.22510.GA.v3.3.6-1.x86_64
mapr-hive-0.12.24707-1.noarch
mapr-hbase-internal-0.94.17.24867.GA-1.noarch

Ensure that the HBase jars are in the classpath by running the ‘hbase classpath’ command.

hbase classpath | egrep --color=auto hbase | tr ':' '\n' | less
/opt/mapr/hbase/hbase-0.94.17/bin/../hbase-0.94.17-mapr-1403-SNAPSHOT.jar
/opt/mapr/hbase/hbase-0.94.17/bin/../hbase-0.94.17-mapr-1403-SNAPSHOT-tests.jar
…
/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/maprfs-1.0.3-mapr-3.0.2.jar
/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/maprfs-diagnostic-tools-1.0.3-mapr-3.0.2.jar
/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/mapr-hbase-1.0.3-mapr-3.0.2.jar
/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/mapr-hbase-1.0.3-mapr-3.0.2-tests.jar
…
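
The same check can be made from inside the JVM, which shows what the application itself sees at launch. This is a minimal sketch that filters the standard java.class.path system property; ClasspathFilter and entriesContaining are hypothetical names used only for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the m7example project.
class ClasspathFilter {

    // Splits a classpath string on the platform path separator
    // and returns the entries that contain the given text.
    static List<String> entriesContaining(String classpath, String needle) {
        List<String> matches = new ArrayList<String>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.contains(needle)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // java.class.path holds the classpath the JVM was started with
        String classpath = System.getProperty("java.class.path");
        for (String entry : entriesContaining(classpath, "hbase")) {
            System.out.println(entry);
        }
    }
}
```

If no entries are printed when the class is started with the same -cp value as the application, the HBase jars are not reaching the JVM.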

The HBase classpath also includes the libraries that ship with Hadoop. Create a blank table using hbase shell.

echo "create '/user/root/students','address','account'" | hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.17-mapr-1403-SNAPSHOT, r3495abbbbeff92ade2e0f38eb46f3946c39a59d8, Thu Mar 27 15:00:38 PDT 2014

Not all HBase shell commands are applicable to MapR tables. Consult MapR documentation for the list of supported commands.

create '/user/root/students','address','account'
0 row(s) in 0.3670 seconds

Launch the application using the java command as follows.

java -cp `hbase classpath`:m7example-0.0.1.jar M7Demo
14/03/28 15:34:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/28 15:34:25 INFO security.JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
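
Once the application finishes, scan the table from hbase shell to verify that both rows were written.

echo "scan '/user/root/students'" | hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.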
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.17-mapr-1403-SNAPSHOT, r3495abbbbeff92ade2e0f38eb46f3946c39a59d8, Thu Mar 27 15:00:38 PDT 2014

Not all HBase shell commands are applicable to MapR tables. Consult MapR documentation for the list of supported commands.

scan '/user/root/students'
ROW                                     COLUMN+CELL
 student1                               column=account:name, timestamp=1396046065516, value=Alice
 student1                               column=address:state, timestamp=1396046065516, value=CA
 student1                               column=address:street, timestamp=1396046065516, value=123 Ballmer Av
 student1                               column=address:zipcode, timestamp=1396046065516, value=12345
 student2                               column=account:name, timestamp=1396046065524, value=Bob
 student2                               column=address:state, timestamp=1396046065524, value=CA
 student2                               column=address:street, timestamp=1396046065524, value=1 Infinite Loop
 student2                               column=address:zipcode, timestamp=1396046065524, value=12345
2 row(s) in 0.4280 seconds