Apache Drill Approaches New Milestone

Following the alpha milestone release in November 2013, the open source incubator project Apache Drill is well on its way toward its next big milestone, the 1.0 beta release. In fact, you can be part of the project by joining the hackathon on April 24th. Today, we’d like to share the great progress the Drill community has been making on the project and outline the next steps for reaching the 1.0 beta milestone.

For those of you who are unfamiliar with the project, Drill is an Apache open source SQL query engine for big data exploration. You can read more in the detailed article Apache Drill: Interactive Ad-Hoc Analysis at Scale.

Interactive SQL has been evolving as a key use case for Hadoop, as more and more organizations are looking to make Hadoop data broadly available to business and technical users with SQL skillsets, and to integrate Hadoop with their existing BI/analytics toolsets for improved decision making and reduced costs.

Apache Drill is designed from the ground up to support high-performance analysis on semi-structured/nested and rapidly evolving data coming from modern big data applications – while still providing the familiarity and ecosystem of SQL, the industry-standard query language. Additionally, Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.  

What’s the Current Status of Apache Drill?

Apache Drill is in very active development, with contributions from a wide community of participants. Thanks to their continued hard work, several major features have been completed over the last few months. These features include:

  • Ability to perform Drill queries on tables/views defined in the Hive metastore
  • Hive SerDe integration to query data from all Hive file formats
  • Ability to use Hive UDFs in Drill queries
  • Ability to query directly from HBase tables
  • Support for JSON and text file formats
  • SQL data types and functions
  • Hash aggregation
  • Distributed query execution
  • A variety of performance optimizations

At this point, we do not have an official milestone release for users who want to try out Drill with these features. However, the beta version is coming soon, so stay tuned.

Anyone who would like to get started with or contribute to the project can download, compile and experiment with Drill very quickly. Please note that the Drill code is changing rapidly with new features and bug fixes, so do not expect it to be stable or ready for production use before the beta release.

We have also started working on some key pieces of the documentation, which will be available on the Apache Drill Wiki. To help you get started with Drill, read Apache Drill in 10 Minutes.

If you’d like to learn more about the progress the community has been making on Drill, check out the videos (listed in the comments section) from our recent Bay Area Apache Drill User Group meetup.

What Are the Next Steps?

The next milestone for Drill is 1.0 beta. The Apache Drill community is geared up to hit this milestone within the next couple of months; more details on the beta release will be announced soon.

Here are the features that we are working on in the 1.0 beta timeframe:

  • Low latency SQL queries
  • Dynamic queries on self-describing/schema-less data in files and HBase, without requiring metadata definitions in Hive
  • Nested data support
  • Integration with Apache Hive (queries on Hive tables/views, support for all Hive file formats and Hive UDFs)
  • BI/SQL tool integration using standard JDBC/ODBC drivers

How Can You Join?

There are several ways to contribute to the development of Apache Drill, including writing code, fixing JIRAs, testing and contributing to documentation. Since we are approaching beta, you can also contribute by providing sample queries from your environment, so that Drill can be validated against actual customer use cases.

Here are some useful links for those of you who are interested in using or contributing to Drill:

Last but not least, we will be holding an Apache Drill Hackathon at MapR on April 24th, where you can join other “Drillers” for a half-day of building new features that will be part of the beta release.

We look forward to the continued collaboration and development of our strong Apache Drill community.

– The Drill team
