Apache Drill

Apache Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google’s Dremel, with the additional flexibility needed to support a broader range of query languages, data sources and data formats, including nested, self-describing data. Drill offers the following benefits:

  • Flexibility: Drill can read many kinds of data, including nested and schema-less data. It supports queries against many schema-less data sources, including HBase, Cassandra and MongoDB. Naturally flat records are included as a special case of nested data.
  • Speed: Drill is optimized for interactive applications and is designed to process petabytes of data and trillions of records in seconds.
  • Compatibility: Unlike other SQL-like interfaces to Hadoop, such as Hive, Impala, and Shark, Drill does not expose a HiveQL interface to users and applications. In order to achieve the highest level of compatibility with traditional databases, Drill exposes an ANSI-compliant SQL interface.
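The remark that flat records are a special case of nested data can be illustrated with a small sketch. This is not Drill code and does not use any Drill API; the function name `get_path`, the dotted-path notation, and the sample records are assumptions made purely for illustration. It shows that a single lookup routine written for nested records handles flat records unchanged, because a flat record is simply a nested record of depth one.

```python
from typing import Any


def get_path(record: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as 'address.city' against a nested record.

    Hypothetical helper for illustration only; it is not part of Drill.
    """
    current: Any = record
    for key in path.split("."):
        # Stop as soon as the path leaves the structure of the record.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# A flat record: every field sits at the top level (nesting depth one).
flat_record = {"name": "Ada", "city": "London"}

# A nested, self-describing record: no schema is declared up front.
nested_record = {
    "name": "Ada",
    "address": {"city": "London", "geo": {"lat": 51.5}},
}

flat_city = get_path(flat_record, "city")               # one-segment path
nested_city = get_path(nested_record, "address.city")   # two-segment path
missing = get_path(nested_record, "address.zip", default="unknown")
```

The same routine serves both record shapes; the only difference is the number of segments in the path.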

Why is MapR involved in the Drill Project?

MapR is recognized as the leading Hadoop innovator and is dedicated to providing the best big data processing capabilities. MapR is committed to a highly transparent, open source project so that the best architecture can be put in place to ensure a high-quality and flexible solution. This includes developing and defining open APIs to ensure a robust ecosystem. Apache Drill represents a huge leap forward for organizations looking to augment their big data processing with interactive queries across massive data sets, with a focus on schema-less and nested data, which is an unmet need in the SQL-on-Hadoop market today. Driving Drill as an open source project reduces the barriers to adopting a new set of big data APIs.

How is Apache Drill different from Apache HBase™?

Drill provides a distributed execution engine for interactive queries. HBase represents a supported data source for Drill.

How is Apache Drill different from Apache Hive, Pig and Cascading?

Today these systems compile higher-level languages (e.g., HiveQL, Pig Latin) into MapReduce jobs. Once Drill is available, these systems may support Drill as an underlying low-latency execution engine, enabling interactive queries across billions of records. Chris Wensel, the author of Cascading, is collaborating with MapR on this project and is one of the initial committers.