How to Use SQL, Hadoop, Drill, REST, JSON, NoSQL, and HBase in a Simple REST Client

SQL will become one of the most prolific use cases in the Hadoop ecosystem, according to Forrester Research. Apache Drill is an open source SQL query engine for big data exploration. REST services and clients have emerged as popular technologies on the Internet. Apache HBase is a hugely popular Hadoop NoSQL database. In this blog post, I will discuss combining all of these technologies: SQL, Hadoop, Drill, REST with JSON, NoSQL, and HBase, by showing how to use the Drill REST API to query HBase and Hive. I will also share a simple jQuery client that uses the Drill REST API, with JSON as the data exchange, to provide a basic user interface.

Apache Drill Tutorial and MapR Sandbox
Apache Drill provides direct queries on self-describing and semi-structured data in files (such as JSON and Parquet) and HBase tables without needing to define and maintain schemas in a centralized store such as Hive metastore. To learn more and to download Drill, go to This example is built using the MapR Sandbox with Drill and the Drill tutorial sample data. The MapR Sandbox with Apache Drill is a fully functional, single-node cluster that can be used to get an overview on Apache Drill in a Hadoop environment. To learn more about using the Drill tutorial on the MapR Sandbox, go to:

Exploring the Use Case Data
The Drill tutorial use case is an online retail business that accepts purchases from its customers through a web-based interface and a new mobile application. The master customer profile information and product catalog are managed in MapR-DB, which is a NoSQL HBase database. Clickstream data from web and mobile applications is stored in Hadoop as JSON files, and order data is in Hive. With Drill, we can dynamically query these different data sources and even combine the different data sources in queries. The diagram below shows the “relations” between the different data sources in the tutorial.

Exploring the Log Data
With Drill, you can perform SQL operations directly on files and directories without the need for up-front schema definitions or schema management for any model changes. The schema is discovered on the fly based on the query. Here is an example of using Drill Explorer to explore JSON data without defining and managing any centralized schema definitions:

Here is an example of performing an SQL query on a JSON file to get the transaction id, device name, and keywords, where the keyword starts with cr:

Exploring the Hive Orders Data
Here is an example of using Drill Explorer to explore the Hive orders table. Drill can use the Hive Metastore without using the Hive execution engine.

Exploring the Use Case HBase Products Table
is a NoSQL database. MapR-DB is an enterprise in-Hadoop NoSQL database which implements the HBase APIs. It exposes the HBase API to support application development. You use MapR-DB just like HBase, but underneath the implementation is more efficient, and it is integrated with the MapR file system. HBase is a column family oriented database, and it is “schema-less”, meaning you only define the column families when you create a table. Columns are defined dynamically when you write data, and data is not typed, as it is all stored as byte arrays.

The HBase products table has two column families:

In order to query HBase with Hive or other SQL-on-Hadoop options, you have to overlay a schema definition in order to know how to convert the binary values. Unlike other SQL-on-Hadoop options, Drill requires no overlay schema definitions in Hive to work with HBase data. You can use the Drill explorer to explore the HBase product table as shown below:

Here is an example of querying the HBase products table using the Drill sqlline shell interface.

0: jdbc:drill:> select * from maprdb.products limit 3;
|  row_key   |  details   |  pricing   |
| [B@1babffd6 | {"category":"bGFwdG9w","name":"IlNvbnkgbm90ZWJvb2si"} | {"price":"OTU5"} |
| [B@456a24be | {"category":"RW52ZWxvcGVz","name":"IzEwLTQgMS84IHggOSAxLzIgUHJlbWl1bSBEaWFnb25hbCBTZWFtIEVudmVsb3Blcw=="} | {"price":"MTY="} |
| [B@3b92598c | {"category":"U3RvcmFnZSAmIE9yZ2FuaXphdGlvbg==","name":"MjQgQ2FwYWNpdHkgTWF4aSBEYXRhIEJpbmRlciBSYWNrc1BlYXJs"} | {"price":"MjEx"} |

Given that everything in HBase is stored as byte arrays, and Drill requires no upfront schema definitions indicating data types, the query returns the raw byte arrays for column values, just as they are stored. With Drill, you can use the convert_from to convert data values to their corresponding data types. Also, the columns can be aliased as seen in this example:

To avoid writing long and complex queries for HBase over and over again, a common workflow is to first create a view containing the useful information and querying off the view instead. A view can also make data available in a more usable “table” format to use in ODBC tools, JDBC clients or in this case, for a REST client. The REST client uses a view of the HBase data converted as shown above and joined with the Hive Table. Here is the query used to create a view on the HBase data:

first change to a writable workspace:

0: jdbc:drill:> use dfs.mydata;
|     ok     |  summary   |
| true       | Default schema changed to 'dfs.mydata' |
1 row selected (0.078 seconds)

Create a view on the Product HBase table:

0: jdbc:drill:> create or replace view prodview as SELECT CAST(row_key AS INTEGER) AS prod_id, CAST(t.details.category AS VARCHAR(40)) AS category, CAST( AS VARCHAR(129)) AS name FROM maprdb.products t;
|     ok     |  summary   |
| true       | View 'prodview' replaced successfully in 'dfs.mydata' schema |

Unlike a traditional database where views typically are DBA/developer-driven operations, file system-based views in Drill are very lightweight. A Drill view is really a virtual dataset that is defined in a simple JSON file. Here is a look at the JSON file that was created:

# cat /mapr/
  "name" : "prodview",
  "sql" : "SELECT CAST(`row_key` AS INTEGER) AS `prod_id`, CAST(`t`.`details`['category'] AS VARCHAR(40)) AS `category`, CAST(`t`.`details`['name'] AS VARCHAR(129)) AS `name`\nFROM `maprdb`.`products` AS `t`",
  "fields" : [ {
    "name" : "prod_id",
    "type" : "INTEGER"
  }, {
    "name" : "category",
    "type" : "VARCHAR",
    "precision" : 40
  }, {
    "name" : "name",
    "type" : "VARCHAR",
    "precision" : 129
  } ],
  "workspaceSchemaPath" : [ "dfs", "mydata" ]

You can use Drill Explorer to explore the prodview as shown below:

Joining the Hive Orders Table with the Products View in a Query
With Drill, we can dynamically combine Hive, files and HBase tables in queries. For example, here is a query to get the device and cust_id from the json log file, and the order id from hive orders where the cust_id in the json file is equal to the cust_id in the Hive orders.

Create a View Joining the Product HBase Table with Hive Orders:

0: jdbc:drill:> create or replace view orderprodview as select o.order_id, o.`month`, o.cust_id, o.state, o.prod_id, o.order_total, p.category, from hive.orders o, dfs.mydata.prodview p where o.prod_id=p.prod_id limit 100;
|     ok     |  summary   |
| true       | View 'orderprodview' replaced successfully in 'dfs.mydata' schema |
1 row selected (0.127 seconds)

Here is an example of querying the orderprodview:

A Drill view can be used:

  • To simplify complex queries
  • To pull together data from multiple data sources
  • Like a table
  • With BI tools like Tableau
  • Give flexibility to data management

Next, let’s take a look at using this view with REST.

Drill REST Interface
Drill provides a Simple REST Interface which you can read more about on the Drill wiki Here is the REST API to submit a query and receive results. You can use HTTP Post to the Drill URL <drill_node_ip_address>:8047/query.json with a JSON Request body in the query as shown. You will receive the response as a list of JSON objects :

POST <drill_node_ip_address>:8047/query.json
Request body:
 "queryType" : "SQL",
 "query" : "select * from  dfs.mydata.orderprodview limit 5”
Response body (list of JSON objects):
   "order_total" : 13,
   "category" : "Binders and Binder Accessories",
   "prod_id" : 909,
   "name" : "Wilson Jones Ledger-SizePiano-Hinge Binder2Blue",
   "state" : "ca", }, …

REST interfaces are really easy to test using cURL or a browser plugin. Here is an example using cURL from the Linux command line to send a query on the orderprodview:

# curl  \
  --header "Content-type: application/json" \
  --request POST \
  --data '{
    "queryType" : "SQL",
    "query" : "select * from  dfs.mydata.orderprodview limit 1"
 }' \
[ {
  "order_total" : 13,
  "category" : "Binders and Binder Accessories",
  "prod_id" : 909,
  "name" : "Wilson Jones Ledger-SizePiano-Hinge Binder2Blue",
  "state" : "ca",
  "month" : "June",
  "order_id" : 67212,
  "cust_id" : 10001

Here is an example of using a Google Chrome Extension REST Client to query the orderprodview:

JQuery Client for Querying the Order Product View Using the Drill REST Interface
Now, let’s look at a simple JQuery client for the REST query we just looked at. Here is a screenshot of the client. It posts a query to Drill and displays the results in a table.

Using the Google Chrome Developer tools, you can see the HTTP request shown below:

And the HTTP response shown below:

A jQuery Client
Here is the jQuery code involved in calling the services. The jQuery client uses jquery.ajax to perform an HTTP POST request to the drill query URL, passing the query data in JSON format.

var rootURL = "http://host:8047/query.json";
var query = "select * from dfs.mydata.orderprodview"

function doQuery(query) {
        type: 'POST',
        contentType: 'application/json',
        url: rootURL,
        dataType: "json",
        data: queryToJSON(query),
        success: function(data) {
            console.log(' success: ' + data);
        error: function(jqXHR, textStatus, errorThrown) {
            alert('error: ' + textStatus);
function queryToJSON(query) {
    return JSON.stringify({
        "queryType": "SQL",
        "query": query

If the request succeeds renderList(data) is called, it is shown below. The response data is an array of JSON objects, and renderList appends the order objects to the HTML table in the index.html page.

function renderList(data) {
    var list = data == null ? [] : (data instanceof Array ? data : [data])
   $.each(list, function(order) {
        $('#orderList').append('<tr><td>' + order.order_id + '</td><td>' 
+ order.cust_id + '</td><td>'
            + order.state + '</td><td>' + order.month + '</td><td>' 
+ order.order_total + '</td><td>'
            + order.prod_id + '</td><td>' + order.category + '</td><td>' 
+ + '</td></tr>');

        <table id="list" class='tablesorter-blue'>
                <th>< b>Order Id</b></th>
                <th>< b>Customer Id</b></th>
                <th>< b>State</b></th>
                <th>< b>Month</b></th>
                <th>< b>Order Total</b></th>
                <th>< b>Product Id</b></th>
                <th>< b>Product Category</b></th>
                <th>< b>Product Name</b></th>
           <tbody id="orderList" ></tbody>

The client also provides the feature to search on the Product Category shown below.

The code for the search functionality simply adds where category like ‘searchKey’ to the query:

function search(searchKey) {
    if (searchKey != '')
        query = query + ' where category like \'' + searchKey + '%\'';

Here is the Request JSON data shown in the Chrome Developer Tools window:

This concludes the simple REST client example, which uses the Drill REST API to dynamically query HBase and Hive.

For more examples on how to use Drill, download the MapR sandbox and try out the Drill sandbox tutorial. Refer to Apache Drill web site for additional information.


Streaming Data Architecture:

New Designs Using Apache Kafka and MapR Streams




Download for free