Insights from Apache Drill Beta – Part 1: Initial Drill Use Cases

This is a two-part series that covers what we have learned so far in our ongoing Apache Drill beta program at MapR. Part one covers the use cases we are uncovering from our beta customer usage and interactions, and the second part will cover the new product features we have implemented thus far, based on customer feedback.

The Apache Drill beta program has been a great opportunity for our team to validate the power of Drill as the first schema-less SQL query engine, one that lets enterprise SQL users (BI developers, business analysts and others) start harnessing Hadoop data without having to climb a learning curve. Our beta customers include both large enterprises (telco, high tech, etc.) that have deployed Hadoop and Web 2.0 companies (ad tech and retail analytics) that are well versed in Hadoop and are looking to create new Hadoop-based services.

Here is a high-level view of Apache Drill’s first set of use cases from the beta program:

Enterprise Data Hub Augmentation: This is the classic cost-savings use case where customers (mostly large enterprise companies) had deployed Hadoop to optimize data warehouse workloads. Apache Drill augments this use case by giving users the ability to interactively explore data stored in Hadoop and run ad hoc reports where necessary. The appeal in this case was how easily customers could explore data without first refining it or running any ETL. Once users understood the business value they could get from the data, they would switch to traditional data warehousing techniques for deeper analytics and reports.

Data Exploration on HBase/MapR-DB: Another prominent use case that emerged across the board was the ability to instantly query HBase tables using SQL. Users were interested in running short range scans on HBase (and MapR-DB) to improve existing analyses and reports, but wanted to use SQL instead of the HBase APIs. The real benefit is access: until now, HBase was not readily available to the traditional SQL analysts in these companies unless a schema was built up front and maintained on an ongoing basis, so analysts had to rely on Java developers to retrieve any information from these NoSQL data sources. With Apache Drill, SQL analysts can now independently access and consume a completely new and important data source for deeper analysis.
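To make the short range scan concrete, here is a minimal sketch of the kind of SQL an analyst might send to Drill for an HBase table. It is only an illustration: the table name `customers`, the column family `profile`, the qualifier `email`, the row-key values, and the helper `hbase_range_scan_sql` are all hypothetical, and the sketch assumes the HBase storage plugin is registered under the name `hbase` and that the stored bytes are UTF-8 strings.

```python
def hbase_range_scan_sql(table, start_key, stop_key, family, qualifier):
    """Build a Drill SQL statement for a short row-key range scan on HBase.

    All arguments are illustrative; nothing here is taken from the beta program.
    """
    def lit(value):
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    # HBase stores row keys and cell values as bytes, so the query decodes
    # them with CONVERT_FROM (assumed here to be UTF-8 text).
    return (
        "SELECT CONVERT_FROM(t.row_key, 'UTF8') AS row_key, "
        f"CONVERT_FROM(t.`{family}`.`{qualifier}`, 'UTF8') AS `{qualifier}` "
        f"FROM hbase.`{table}` t "
        f"WHERE CONVERT_FROM(t.row_key, 'UTF8') >= {lit(start_key)} "
        f"AND CONVERT_FROM(t.row_key, 'UTF8') < {lit(stop_key)}"
    )


# Hypothetical table, keys, and column names.
query = hbase_range_scan_sql("customers", "cust_1000", "cust_2000", "profile", "email")
print(query)
```

The point of the sketch is that the analyst writes ordinary SQL with a WHERE clause on the row key, rather than a Java program against the HBase client API.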

Exploring Raw Data with Ease: JSON is everywhere now, and so is semi-structured log data. A large majority of beta users immediately started using Apache Drill to explore these newer data formats. These users (in IT departments) had nailed the process of getting RDBMS data into the hands of business analysts as quickly as possible, but were struggling to do the same with newer, less structured formats. Hive developers were spending enormous amounts of time building and maintaining schemas for these constantly evolving data structures and formats, and it was a painful experience across the board. With Apache Drill, the IT teams intend to cut down on data preparation and dramatically simplify the process of getting JSON and log data into the hands of analysts. They see remarkable time-to-market value in using Apache Drill.
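As a rough illustration of querying raw JSON in place, the sketch below builds a Drill query over a JSON file and wraps it in a request body of the shape used by Drill's REST query interface (a JSON object with `queryType` and `query`). The file path `/logs/events.json`, the field names, and the helper `drill_json_query_payload` are hypothetical; the sketch also assumes a file system storage plugin registered as `dfs` and a Drill version that exposes the REST interface.

```python
import json


def drill_json_query_payload(path, fields, limit=10):
    """Build a request body for a Drill SQL query over a raw JSON file.

    `fields` may contain dotted names such as "client.browser" to reach
    into nested JSON objects. All names here are illustrative.
    """
    def column(field):
        # Backtick-quote every path segment so reserved words are safe,
        # and prefix with the table alias that nested access requires.
        return "t." + ".".join(f"`{part}`" for part in field.split("."))

    columns = ", ".join(column(f) for f in fields)
    # The file is queried directly; no table definition or ETL step precedes this.
    sql = f"SELECT {columns} FROM dfs.`{path}` t LIMIT {int(limit)}"
    return json.dumps({"queryType": "SQL", "query": sql})


# Hypothetical log file and fields.
body = drill_json_query_payload("/logs/events.json", ["event_type", "client.browser"], limit=5)
print(body)
```

The body could then be sent to a Drill node over HTTP by whatever client the team already uses; the sketch stops short of the network call so that it runs without a Drill installation.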

We continue to see immense interest in Apache Drill's capabilities. As we move toward GA, we encourage you to download Apache Drill or the MapR Sandbox for Apache Drill, try it out (it takes only a few minutes to get started), and provide feedback to the Apache Drill community.

See Part 2 of this blog series here!

Please post any questions you may have in the comments section below! 

