How to Build Applications on a NoSQL Document Database and Perform Analytics in Place

In this article, we explore what it means to have a converged data platform for building and delivering business applications. The sample application lets you create blog articles for a personal website.

The following picture illustrates the flow of data and how the MapR-DB REST application utilises the OJAI API to interact with the MapR-DB JSON tables. We then explore how to run ANSI SQL queries with Apache Drill 1.6 on the same MapR-DB JSON tables, without having to transform the data or move it elsewhere.

Ingest anything in its natural form and use standard tools for in-place analytics.

The MapR-DB REST Application contains the following components:

  • A Main class that starts a Jetty server and configures a REST interface using JAX-RS
  • Swagger, to make the HTTP API easy to access
  • An AngularJS application that consumes this API

To get started, you need a MapR 5.1 installation or sandbox with Apache Drill 1.6 and at least 8GB of RAM. If you already have this, skip to the “Configuring the build environment” section.

Getting a MapR 5.1 Sandbox (if required)

Download the MapR 5.1 sandbox by following the instructions here:

http://maprdocs.mapr.com/51/index.html#SandboxHadoop/t_install_sandbox_vmware.html

The sandbox needs Internet access to complete the rest of the installation.

Before starting the VM, increase its RAM from 6GB to 8GB, because Apache Drill requires an additional 2GB of RAM.

After installing and starting the sandbox, you should see the following screen. (I use VMware Fusion on Mac OS X.)

Log on to the cluster as the mapr user (password mapr). The IP address of your sandbox will probably differ from the one in my example.

$ ssh mapr@192.168.185.248
The authenticity of host '192.168.185.248 (192.168.185.248)' can't be established.
RSA key fingerprint is 6a:36:ea:47:74:e6:57:92:e0:12:c4:8f:ee:64:09:20.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.185.248' (RSA) to the list of known hosts.
Password: mapr
Welcome to your Mapr Demo virtual machine.
[mapr@maprdemo ~]$

Configuring the build environment

Before we can build the MapR-DB REST Application, we need to install and configure Maven.

The following instructions install Maven 3 on CentOS so that we can compile the OJAI application. Adjust them accordingly for Ubuntu-based distributions.

$ su - root
Password: mapr
# yum install git
# wget http://mirror.cc.columbia.edu/pub/software/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
# tar xvf apache-maven-3.3.9-bin.tar.gz -C /usr/local
# cd /usr/local
# ln -s apache-maven-3.3.9/ maven
# vi /etc/profile.d/maven.sh
Add the following lines:
export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}

Log out of the sandbox and back in to pick up these changes. You should then see something like the following:

$ ssh mapr@192.168.185.248
Password: mapr
Last login: Thu Apr  7 23:58:05 2016 from 192.168.185.1
Welcome to your Mapr Demo virtual machine.
[mapr@maprdemo ~]$ mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T08:41:47-08:00)
Maven home: /usr/local/maven
Java version: 1.7.0_79, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-573.el6.x86_64", arch: "amd64", family: "unix"

Downloading and compiling the MapR-DB REST Application

The following instructions first set up the directory structure in MapR-FS, then download, compile, and launch the MapR-DB REST Application.

Run the following commands as the mapr user:

$ cd /mapr/demo.mapr.com/
$ chmod 777 apps
$ mkdir apps/blog
$ chmod 777 apps/blog
$ cd /home/mapr
$ git clone https://github.com/mapr-demos/maprdb-ojai-rest-sample.git
$ cd maprdb-ojai-rest-sample/
$ mvn clean package

Finally, start the program with the following command. It runs in the foreground and holds the console, so open a second SSH session to the cluster if you need to run other commands.

$ mvn exec:java -Dexec.mainClass="com.mapr.db.samples.rest.Main"

Port 8080 is sometimes already in use. If so, change the port in the following file and recompile the application:

/home/mapr/maprdb-ojai-rest-sample/src/main/java/com/mapr/db/samples/rest/Main.java
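If you are not sure whether something is already listening on port 8080, a quick TCP probe can tell you before you start the application. This is a minimal sketch, not part of the sample project; it assumes Python 3 on the machine you run it from, and `SANDBOX_IP` is a placeholder for your own sandbox's address.

```python
import socket

SANDBOX_IP = "192.168.185.248"  # placeholder: replace with your sandbox's IP


def port_in_use(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable:
        # nothing reachable from here is listening on that port.
        return False
```

For example, `port_in_use(SANDBOX_IP, 8080)` returning True means you should pick another port. A probe from your workstation only sees services bound to an address it can reach; a process listening only on the sandbox's localhost interface will not show up.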

Explore the MapR-DB REST Application

Feel free to explore the MapR-DB REST Application and create as many blog articles as you need for demonstration or testing purposes.

Once the server has started, you can access the Swagger UI at the following URI:

  • http://<ip_address>:8080/swagger

or the web application at:

  • http://<ip_address>:8080/app/#/
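Because `mvn exec:java` can take a while to bring the server up, it can help to poll these URIs until they respond. The sketch below assumes Python 3 on your workstation; `SANDBOX_IP`, `app_urls` and `wait_until_up` are hypothetical names used only for illustration, and "ready" here simply means the URI answered with HTTP 200.

```python
import http.client
import time
import urllib.request

SANDBOX_IP = "192.168.185.248"  # placeholder: replace with your sandbox's IP


def app_urls(ip, port=8080):
    """Build the Swagger UI and web application URIs for a given sandbox IP."""
    base = "http://%s:%d" % (ip, port)
    return {"swagger": base + "/swagger", "app": base + "/app/#/"}


def wait_until_up(url, timeout=60.0, interval=2.0):
    """Poll url until it answers HTTP 200; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError and socket timeouts are OSError subclasses:
            # the server is not listening yet or returned an error.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_up(app_urls(SANDBOX_IP)["swagger"])` blocks for up to a minute and returns True once the Swagger UI is being served.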

You can use the Web application to create a user and a blog post.

In the Swagger UI, you can then try out some interesting features:

  • Create a user from a predefined JSON object
{"_id":"2","first_name":"Leon","last_name":"Clayton","age":"46"}
  • Create an article from a JSON string
{"title":"Article1","content":"Here is the content","author":{"name":"leon clayton","id":"1"}}
  • List users using a simple projection
  • Update a user to add a nickname and interests, then remove these attributes. This shows the flexible schema in action.
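The same JSON documents can also be sent to the REST API from a script rather than through Swagger. This is a sketch assuming Python 3; the resource paths `/api/users` and `/api/articles` are hypothetical placeholders, so check the Swagger UI for the real paths before using it. The function only builds the requests, so nothing is sent until you call `urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://192.168.185.248:8080"  # placeholder: your sandbox's IP
# Hypothetical resource paths: look up the real ones in the Swagger UI.
USERS_PATH = "/api/users"
ARTICLES_PATH = "/api/articles"


def json_post(path, document, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying a JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The two documents used in the Swagger examples above.
user = {"_id": "2", "first_name": "Leon", "last_name": "Clayton", "age": "46"}
article = {
    "title": "Article1",
    "content": "Here is the content",
    "author": {"name": "leon clayton", "id": "1"},
}

user_request = json_post(USERS_PATH, user)
article_request = json_post(ARTICLES_PATH, article)
```

Once the server is running and the paths are correct, `urllib.request.urlopen(user_request)` sends the request.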

Similar features are exposed in the Articles REST API.
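To make projection and the flexible schema more concrete, here is a plain-Python illustration using in-memory dictionaries. It is not the OJAI API and not how MapR-DB stores data; it only mirrors what the operations return. The helper names are hypothetical, and the nickname and interests values are made up for the example.

```python
import copy


def project(document, fields):
    """Return only the requested top-level fields that the document has."""
    return {f: document[f] for f in fields if f in document}


def set_fields(document, **fields):
    """Return a copy of the document with fields added or overwritten."""
    updated = copy.deepcopy(document)
    updated.update(fields)
    return updated


def remove_fields(document, *names):
    """Return a copy of the document without the named fields."""
    return {k: copy.deepcopy(v) for k, v in document.items() if k not in names}


user = {"_id": "2", "first_name": "Leon", "last_name": "Clayton", "age": "46"}

# Projection: like listing users with only _id and first_name selected.
summary = project(user, ["_id", "first_name"])

# No schema change is needed to add new attributes to a single document...
with_extras = set_fields(user, nickname="Leo", interests=["hiking", "jazz"])

# ...or to remove them again.
restored = remove_fields(with_extras, "nickname", "interests")
```

The key point is that only the one document changes shape; other users in the table keep whatever fields they already had.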

Once you have created some users and posts, you can use Apache Drill to query the same MapR-DB JSON tables.

Installing Apache Drill

If you are using the MapR Sandbox, Apache Drill should already be installed. Otherwise, follow the instructions below.

As root, install Apache Drill 1.6 or later. First, clean and rebuild the yum cache to make sure you get the latest package:

# su - root
# yum clean all
# yum makecache

Before continuing, check that yum resolves mapr-drill 1.6 or later, as in the output below. The download is about 300 MB, so it may take some time.

# yum install mapr-drill
Loaded plugins: fastestmirror, security
Setting up Install Process
Loading mirror speeds from cached hostfile
 * base: mirror.vorboss.net
 * epel: www.mirrorservice.org
 * extras: mirror.vorboss.net
 * updates: mirror.vorboss.net
Resolving Dependencies
--> Running transaction check
---> Package mapr-drill.noarch 0:1.6.0.201603302146-1 will be installed
--> Finished Dependency Resolution

Rerun configure.sh so that the node picks up Apache Drill:

# /opt/mapr/server/configure.sh -R
Configuring Hadoop-2.7.0 at /opt/mapr/hadoop/hadoop-2.7.0
Done configuring Hadoop
Node setup configuration:  cldb drill-bits fileserver hbasethrift hbinternal historyserver hivemetastore hiveserver2 hue nfs nodemanager oozie resourcemanager spark-historyserver webserver zookeeper
Log can be found at:  /opt/mapr/logs/configure.log
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
New Oozie WAR file with added 'Hadoop JARs, ExtJS library, JARs' at /opt/mapr/oozie/oozie-4.2.0/oozie-hadoop1.war
New Oozie WAR file with added 'Hadoop JARs, ExtJS library, JARs' at /opt/mapr/oozie/oozie-4.2.0/oozie-hadoop2.war
INFO: Oozie is ready to be started->

Analyzing the MapR-DB JSON tables using Apache Drill

Let's use the Apache Drill web UI to look at the tables the OJAI application has just created. Apache Drill can be driven by any BI tool over ODBC or JDBC; in the following example I use Drill's web interface, but you could use any SQL client you prefer.
Browse to port 8047 on the sandbox (http://<ip_address>:8047) and open the Query page. Type the following query and click Submit:

select * from dfs.`default`.`/apps/blog/articles`

You will see the following results.

As a second test, run the following query:

select * from dfs.`default`.`/apps/blog/users`
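The same queries can be submitted programmatically through Drill's REST API, which accepts a POST to `/query.json` on port 8047 with a JSON body containing `queryType` and `query`, and returns the result rows in a `rows` list. This sketch assumes Python 3 and that authentication is not enabled on the Drill web server; the IP address is a placeholder. The nested query shows Drill reaching into the `author` map of each article with dot notation on a table alias.

```python
import json
import urllib.request

DRILL_URL = "http://192.168.185.248:8047/query.json"  # placeholder IP

# Nested JSON maps are addressed with dot notation on a table alias.
NESTED_QUERY = (
    "SELECT a.title, a.author.name AS author_name "
    "FROM dfs.`default`.`/apps/blog/articles` a"
)


def drill_request(sql, url=DRILL_URL):
    """Build a POST request for Drill's REST query endpoint."""
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL, timeout=30):
    """Send the query to Drill and return the result rows as a list of dicts."""
    with urllib.request.urlopen(drill_request(sql, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["rows"]
```

For example, `for row in run_query(NESTED_QUERY): print(row)` lists each article's title next to its author's name, read directly from the MapR-DB JSON table without any export step.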

In this article, we explored how to set up an application on top of MapR-DB JSON tables and how to query its data without moving it or transforming it from its original storage format. Because JSON is self-describing, others can easily understand and query the data structures the application uses. This is a powerful capability that can streamline both application development and analytics. I hope you found this useful.
