Apache Drill Blog Posts

Posted on September 22, 2016 by Raphaël Velfre

A very common use case for the MapR Converged Data Platform is collecting and analyzing data from a variety of sources, including traditional relational databases. Until recently, data engineers would build an ETL pipeline that periodically walks the relational database and loads the data into files on the MapR cluster, then perform batch analytics on that data.

Posted on September 20, 2016 by Robin Moffatt

Apache Drill is an engine that can connect to many different data sources, and provide a SQL interface to them. It's not just a wanna-be SQL interface that trips over at anything complex - it's a hugely functional one including support for many built in functions as well as windowing functions. Whilst it can connect to standard data sources that you'd be able to query with SQL anyway, like Oracle or MySQL, it can also work with flat files such as CSV or JSON, as well as Avro and Parquet formats.

Posted on September 14, 2016 by Neeraja Rentachintala

Today we are excited to announce the availability of Drill 1.8 on the MapR Converged Data Platform. As part of the Apache Drill community, we continue to deliver iterative releases of Drill, providing significant feature enhancements along with enterprise readiness improvements based on feedback from a variety of customer deployments.

Posted on September 2, 2016 by Robin Moffatt

Apache Drill enables querying with SQL against a multitude of data sources, including JSON files, Parquet and Avro, Hive tables, RDBMS, and more. MapR has released an ODBC driver for it, and I thought it'd be neat to get it to work with OBIEE. It evidently does work for OBIEE running on Windows, but I wanted to be able to use it on my standard environment, Linux.

Posted on August 17, 2016 by Vinay Bhat

In this week’s Whiteboard Walkthrough, Vinay Bhat, Solution Architect at MapR Technologies, takes you step-by-step through a widespread big data use case: data warehouse offload and building an interactive analytics application using Apache Spark and Apache Drill. Vinay explains how the MapR Converged Data Platform provides unique capabilities to make this process easy and efficient, including support for multi-tenancy.

Posted on August 3, 2016 by Neeraja Rentachintala

In this week’s Whiteboard Walkthrough, Neeraja Rentachintala, Senior Director of Product Management at MapR Technologies, gives an overview of how open source Apache Drill achieves low latency for interactive SQL queries carried out on large datasets. With Drill, you can use familiar ANSI SQL BI tools, such as Tableau or MicroStrategy, plus do exploration directly on big data.

Posted on July 20, 2016 by Sameer Nori

A common customer question is how to determine the degree of parallelism used for the various operators in a query. We’ll address this question, and the best practice that emerged from it, in the rest of this blog post.

Posted on July 8, 2016 by Craig Warman

In this blog post, I’ll describe how to install Apache Drill on the MapR Sandbox for Hadoop, resulting in a "super" sandbox environment that essentially provides the best of both worlds—a fully-functional, single-node MapR/Hadoop/Spark deployment with Apache Drill.

Posted on June 13, 2016 by Ellen Friedman

The power of SQL for business analytics is a given, but the challenge in big data settings is that SQL is normally a static language that assumes pre-defined, fixed and well-known schema. SQL also needs flat data structures. It has been assumed that you need fixed schema for performance.

Posted on June 10, 2016 by Vince Gonzalez

Drill is a fantastic tool for querying JSON data. But Drill isn’t magical, and sometimes it runs into some data that it can’t quite handle (yet). This post walks through an example of such a scenario, and how you might work through the issue using a little bit of Python code.

Posted on May 25, 2016 by Magnus Pierre

A few months ago, I created the first XML plugin for Apache Drill. The idea behind the plugin is simple: Since Apache Drill already has great support for JSON, why not convert the XML documents to JSON, and feed the information into the JSON driver for further processing and presentation in Apache Drill?

Posted on April 21, 2016 by Leon Clayton

In this article we will explore what it means to have a converged data platform for building and delivering business applications. This sample application will be to create blog articles for a personal website.

Posted on April 6, 2016 by Neeraja Rentachintala

Today we are very excited to announce the release of Apache Drill 1.6 on the MapR Converged Data Platform. Drill has been on the path of rapid iterative releases for one and a half years now, gathering amazing traction with customers and OSS community users on the way.

Posted on February 17, 2016 by Parth Chandra

During the early days of developing Apache Drill, the Drill team realized the need for an efficient way to represent complex, columnar data in memory. Projects like Protobuf provided an efficient way to represent data that had a predefined schema for transmission over the network, and the Apache Parquet project had implemented an efficient way to represent complex columnar data on disk.

Posted on January 13, 2016 by Neeraja Rentachintala

Today we are excited to announce that Apache Drill 1.4 is now available on the MapR Distribution. Drill 1.4 is a production-ready and supported version on MapR; it can be downloaded from here, and you can find the 1.4 release notes here.

Posted on December 22, 2015 by Tugdual Grall

Apache Drill has a hidden gem: an easy-to-use REST interface. This API can be used to query, profile, and configure the Drill engine.
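
As a rough illustration of the query side of that REST interface, the Python sketch below builds a POST request for a Drillbit's `/query.json` endpoint and prints it without sending it. The base URL (a local Drillbit on web port 8047) and the sample `cp.`employee.json`` table are assumptions for illustration, and the helper names are hypothetical; adjust them for your own cluster.

```python
import json
import urllib.request


def build_drill_query(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for Drill's REST query endpoint.

    base_url is an assumption: a local Drillbit on its default web port.
    """
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, base_url="http://localhost:8047"):
    """Send the query and return the decoded JSON response.

    This call needs a running Drillbit, so it is not invoked below.
    """
    with urllib.request.urlopen(build_drill_query(sql, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Build a request against the sample data on Drill's classpath (assumed table).
request = build_drill_query("SELECT * FROM cp.`employee.json` LIMIT 3")
print(request.get_method(), request.full_url)
```

With a Drillbit running, `run_drill_query(...)` would return the parsed JSON response; building the request separately makes it easy to inspect what is sent.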

Posted on November 4, 2015 by Mitsutoshi Kiuchi

SQL engines for Hadoop differ in their approach and functionality. My focus for this blog post is to compare and contrast the functions and performance of Apache Spark and Apache Drill and discuss their expected use cases.

Posted on October 21, 2015 by Neeraja Rentachintala

In this blog post, I would like to briefly introduce the new analytics capabilities added to Drill, namely ANSI SQL-compliant analytic and window functions, and show how to get started with them.

Posted on October 1, 2015 by Joseph Blue

It’s difficult to describe what a real breach looks like, but you will know it when you see it. To identify a potential breach, we assess the amount of activity of accounts later experiencing fraud at each merchant and then visualize the results.

Posted on August 31, 2015 by Carol McDonald

In this blog post, we’ll take a look at the inner workings of Apache Drill, learn what services are involved, and find out what happens in Apache Drill when we submit a query.

Posted on August 18, 2015 by Tugdual Grall

A very common use case when working with Hadoop is to store and query simple files (such as CSV or TSV), and then to convert these files into a more efficient format such as Apache Parquet in order to achieve better performance and efficient storage.
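
A minimal sketch of how such a conversion might be scripted: the Python helper below only generates the two Drill SQL statements involved, one that switches the session's output format to Parquet and one CREATE TABLE AS statement that casts the CSV columns. The file path, the `dfs.tmp` target table, and the column names and types are assumptions for illustration, and the helper name is hypothetical.

```python
def csv_to_parquet_statements(csv_path, target_table, columns):
    """Return the Drill SQL statements for a CSV-to-Parquet conversion.

    columns is a list of (name, sql_type) pairs in CSV column order.
    Drill exposes the fields of a headerless CSV file as the `columns`
    array, so each field is cast and given a name.
    """
    select_list = ",\n  ".join(
        f"CAST(columns[{index}] AS {sql_type}) AS `{name}`"
        for index, (name, sql_type) in enumerate(columns)
    )
    return [
        # Make CREATE TABLE AS write Parquet files for this session.
        "ALTER SESSION SET `store.format` = 'parquet'",
        f"CREATE TABLE {target_table} AS\nSELECT\n  {select_list}\nFROM dfs.`{csv_path}`",
    ]


# Hypothetical input file and target table, for illustration only.
statements = csv_to_parquet_statements(
    "/data/airports.csv",
    "dfs.tmp.`airports_parquet`",
    [("id", "INT"), ("name", "VARCHAR(100)")],
)
for statement in statements:
    print(statement + ";")
```

The generated statements can be pasted into the Drill shell or submitted through any Drill client.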

Posted on July 28, 2015 by Tugdual Grall

Apache Drill allows users to explore any type of data using ANSI SQL. This is great, but Drill goes even further than that and allows you to create custom functions to extend the query engine. These custom functions have all the performance of any of the Drill primitive operations, but allowing that performance makes writing these functions a little trickier than you might expect.

Posted on July 7, 2015 by David Tucker

I’m very pleased to announce the release of a custom EMR bootstrap action to deploy Apache Drill on a MapR cluster. MapR is the only commercial Hadoop distribution available for Amazon’s Elastic MapReduce service (EMR), and this addition allows EMR users to easily deploy and evaluate the powerful Drill query engine.

Posted on July 6, 2015 by Jim Scott

In part one of this series, Drilling into Healthy Choices, we explored using Drill to create Parquet tables as well as configuring Drill to read data formats that are not very standard. In part two of this series, we are going to utilize this same database to think beyond traditional database design.

Posted on July 2, 2015 by Jim Scott

Drill is a SQL engine for (almost) everything, from simple tabular data to semi-structured data to even the most complex structured JSON data. In this two-part series we will explore what Apache Drill can do and how it enables us to rethink database design to make everyone's life easier.

Posted on June 29, 2015 by Nick Amato

Drill offers life-changing ways to simplify connecting to Hadoop-scale data in an application or script. OK, maybe not life-changing, but still pretty cool. In this post we will look at how to do it in your language of choice.

Posted on June 19, 2015 by Uli Bethke

Did you know you can run Apache Drill on your laptop? This is great news for business analysts who need to explore complex and semi-structured data. Let's look at a particular example.

Posted on June 4, 2015 by Dean Yao

JReport is an embeddable BI solution that empowers users to create reports, dashboards, and data analysis. JReport accesses data from Hadoop, such as the MapR Distribution through Apache Drill, as well as other big data and transactional data sources. By visualizing data through Drill, users can perform their own reporting and data discovery for agile, on-the-fly decision-making.

Posted on May 26, 2015 by Nitin Bandugula

This is the third and final entry in our three-part series focused on building basic skill sets for use in data analysis. The series is aimed at those who have some familiarity with using SQL to query data but limited or no experience with Apache Drill.

Posted on May 19, 2015 by Neeraja Rentachintala

Today, we are extremely excited and proud to announce the general availability (GA) of Apache Drill 1.0, as part of the MapR Distribution. Congratulations to the Drill community on this significant milestone and achievement!

Posted on May 18, 2015 by Nitin Bandugula

This is the second in our three-part series focused on building basic skill sets for use in data analysis. The material is intended for those who have no prior, or very limited, experience with Apache Drill, but do have some familiarity with running SQL queries.

Posted on May 13, 2015 by Tomer Shiran

In this week's Whiteboard Walkthrough, Tomer Shiran, PMC member and Apache Drill committer, walks you through the history of the non-relational datastore and why Apache Drill is so important for this type of technology.

Posted on May 11, 2015 by Nick Amato

In this post, I’ll show you how to build a simple real-time dashboard using Spark on MapR.

Posted on May 4, 2015 by Neeraja Rentachintala

Today, the Apache Drill community announced the release of Drill 0.9, and MapR is very excited to package this release as part of the MapR Distribution including Hadoop.

Posted on April 14, 2015 by Kirk Borne

Data across the enterprise are typically stored in silos belonging to different business divisions and even to different projects within the same division. These silos may be further segmented by services/products and functions. Silos (which stifle data-sharing and innovation) are often identified as a primary impediment (both practically and culturally) to business progress and thus they may be the cause of numerous difficulties.

Posted on April 13, 2015 by Andries Engelbrecht

Twitter, as we all know, is a powerful social media platform that can be used to harness incredibly useful information about products, brands and customer experience. This blog will explain how to: 1) quickly configure an environment to stream Twitter data (filtered on keywords and languages) using Apache Flume, 2) analyze the data in native JSON format with SQL using Apache Drill, and 3) run interactive reports and analysis using MicroStrategy.

Posted on April 2, 2015 by Neeraja Rentachintala

Since its Beta release in September '14, Apache Drill, the most flexible SQL-on-Hadoop technology, has been making great strides in terms of both product progress and community adoption. With four significant iterative releases (0.5, 0.6, 0.7, 0.8) in less than six months, thousands of downloads from the MapR website, nearly 1500 message threads in the Apache Drill user email alias, and an active open source community, Drill is well on its way to becoming generally available in the Q2 '15 time frame.

Posted on March 31, 2015 by Nitin Bandugula

We recently wrapped up a webinar series for a global audience on the topic of “Apache Drill: Introduction, Differentiation and Use Cases” that proved to be highly interactive and engaging. The webinar provided a quick introduction to Drill, covered key Drill differentiators for SQL specialists and business analysts, and provided an overview of new Hadoop use cases that were uncovered during the Drill Beta at MapR.

Posted on March 23, 2015 by Andries Engelbrecht

The value of Apache Drill becomes apparent when integrated with powerful analytics and BI platforms. Today, MicroStrategy announced that Apache Drill is certified with the MicroStrategy Analytics Enterprise Platform™. MicroStrategy Analytics Enterprise connected to Apache Drill allows users to explore multiple data formats instantly on Hadoop enabling direct access to semi-structured data, without having to rely on IT teams for schema creation.

Posted on February 13, 2015 by Nitin Bandugula

This is part two of the MapR - Apache Drill beta blog. You can read part one of the series here that talks about the different use cases we uncovered during the Drill Beta program at MapR. This blog delves into the Drill features that our beta customers felt were exciting and important for them, and also discusses some noteworthy features that the Drill community implemented based on some of our feedback. Features that our beta customers loved about Drill include: Getting Started with Drill is Extremely Easy, Improving Data Pipelining Processes, Seamless Connectivity to Existing BI Tools.

Posted on February 2, 2015 by Neeraja Rentachintala

Today’s data is dynamic and application-driven. The growth of a new era of business applications driven by industry trends such as web/social/mobile/IOT is generating datasets with new data types and new data models. These applications are iterative, and the associated data models typically are semi-structured, schema-less and constantly evolving: semi-structured in that an element can be complex/nested, schema-less in that fields can vary in every single row, and constantly evolving in that fields get added and removed frequently to meet business requirements. In other words, the modern datasets are not only about volume and velocity, but also about variety and variability.

Posted on January 28, 2015 by Nitin Bandugula

This is a two-part series that covers what we have learned so far in our ongoing Apache Drill beta program at MapR. Part one covers the use cases we are uncovering from our beta customer usage and interactions, and the second part will cover the new product features we have implemented thus far, based on customer feedback. The Apache Drill beta program was a great opportunity for our team to validate the power of Drill as the first schema-less SQL query engine that allows enterprise SQL users (BI developers, business analysts and others) to start harnessing Hadoop data without undergoing any learning curve.

Posted on January 6, 2015 by Carol McDonald

SQL will become one of the most prolific use cases in the Hadoop ecosystem, according to Forrester Research. Apache Drill is an open source SQL query engine for big data exploration. REST services and clients have emerged as popular technologies on the Internet. Apache HBase is a hugely popular Hadoop NoSQL database. In this blog post, I will discuss combining all of these technologies: SQL, Hadoop, Drill, REST with JSON, NoSQL, and HBase, by showing how to use the Drill REST API to query HBase and Hive. I will also share a simple jQuery client that uses the Drill REST API, with JSON as the data exchange, to provide a basic user interface.

Posted on December 30, 2014 by Karen Whipple

After being promoted to a top-level project earlier this month, Apache Drill has reached yet another milestone. Jacques Nadeau, Apache Drill PMC Chair, recently announced on the Drill blog that the community has released Drill 0.7. This release contains 228 resolved JIRAs and numerous enhancements, including more freedom - Drill will now work on EC2, since there is no more dependency on UDP/Multicast.

Posted on December 1, 2014 by Nick Amato

At the recent SAP TechED && d-code event, we were excited to see what SAP is doing in terms of their major initiatives and how SAP (and MapR) will be able to help organizations around the world achieve simplicity while embracing the new trends shaping our industry: cloud, mobility, big data, and the Internet of Things. Apache Hadoop is a key part of SAP’s overall big data strategy, and we believe we’re very much aligned, both in terms of technology and strategy, with SAP’s key initiatives. How about an example you can put to use right away? This new demo shows the integration of Apache Drill and SAP Lumira, a self-service, data visualization application for business users.

Posted on November 13, 2014 by Neeraja Rentachintala

Apache Drill is one of the fastest growing open source projects, with the community making rapid progress with monthly releases. The latest release of Drill 0.6 is another important milestone for the project and builds on the product with key enhancements, including the ability to do SQL queries directly on MongoDB (along with file system, HBase, and Hive sources that are already supported today), as well as a number of performance and SQL improvements.

Posted on October 27, 2014 by Alex Rodrigues

Customer feedback is a valuable tool for every business, and one of the primary ways to get quality feedback is through surveys. However, asking customers to fill out lengthy surveys with 15+ questions will often result in a very low response rate. Most customers are not willing to take a long survey, and the ones who do often regret it after the first couple of questions.

Posted on September 28, 2014 by Karen Whipple

The recent MapR webinar titled “The Future of Hadoop Analytics: Total Data Warehouses and Self-Service Data Exploration” proved to be a highly informative, in-depth look at the future of data warehouses and how SQL-on-Hadoop technologies will play a pivotal role in those settings. Matt Aslett, Research Director for 451 Research, along with Apache Drill architect Jacques Nadeau, discussed what lies ahead for enterprise data warehouse architects and BI users in 2015 and beyond.

Posted on September 16, 2014 by Neeraja Rentachintala

Since Apache Drill 0.4 was released in August for experimentation on the MapR Distribution, there has been tremendous interest in the customer and partner community on the promise and potential of Drill to unlock the new types of data in their Hadoop/NoSQL systems for interactive analysis throughout the organization. Today we're excited to announce Apache Drill 0.5.

Posted on September 16, 2014 by Nitin Bandugula

The September release of the Apache open source packages in MapR is now available for customers. The September updates to the Apache Open Source packages in the MapR Distribution are part of the MapR 4.0.1 major release. Details about the MapR 4.0.1 release can be found here.

Here are the top highlights of this month’s release:

Posted on September 15, 2014 by Dale Kim

At the Big Data Everywhere conference held in Israel, Atzmon Hen-Tov, Vice President of R&D of Pontis, and Lior Schachter, Director of Cloud Technology and Platform Group Manager of Pontis, gave an informative talk titled “Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform.” The five-phase, two-year migration of their operational and analytical functions to MapR resulted in a true, real-time operational analytics environment on Hadoop.

Posted on August 15, 2014 by Michele Nemschoff

Getting back to basics, MapR CTO and co-Founder M.C. Srivas provides a brief introduction to Hadoop, and explains where it fits on the “dumb data” to “very smart data” spectrum. After watching this video, you’ll have a better understanding of Hadoop, and how MapR has taken the best innovations from both ends of the data spectrum to develop the leading Hadoop technology for big data deployments. 

A few key points made in the video include:

Posted on August 12, 2014 by Neeraja Rentachintala

Congratulations to the Apache Drill community on reaching a big milestone. Apache Drill 0.4.0—a developer preview—has just been released. This is the first in a series of monthly builds the project team will deliver as it drives towards Beta and GA milestones.

Let’s take a brief look at why Apache Drill matters and its key features.

Posted on July 6, 2014 by Michele Nemschoff

M.C. Srivas, CTO and Co-Founder of MapR Technologies, recently spoke at the Munich Hadoop User Group about the Apache Drill project. The following is a blog post from HUG Muenchen originally published on the comSysto blog.


A deep dive into Apache Drill - fast interactive SQL on Hadoop

Posted on June 20, 2014 by Nitin Bandugula

The latest monthly release of the Apache open source packages in MapR is now available for customers. The release includes updates to several OSS packages including Hive, HBase, Oozie, Hue and Sqoop. Here are some of the highlights of the release:

Posted on May 14, 2014 by Patrick Toole

With our recent announcement of HP Vertica’s deployment onto MapR, we have already been flooded with questions about the integration.

Use Cases

Posted on May 12, 2014 by Neeraja Rentachintala

This was originally posted on The HIVE on May 12, 2014.

Recently I happened to observe martial arts agility training at my son’s Taekwondo school. The ability to move quickly, change direction and still be coordinated enough to throw an effective strike or kick is the key to many martial arts, including Taekwondo.

Posted on May 7, 2014 by Jon Posnik

SQL-on-Hadoop just got easier this morning.  Working together with the HP Vertica team, we are excited to announce general availability of the HP Vertica Analytics Platform running on the MapR Distribution for Apache Hadoop.

Posted on April 29, 2014 by Jacques Nadeau

MapR recently hosted the first Apache Drill hackathon, with nearly forty people in attendance who helped push Drill toward its first beta release. It was great to see people from companies such as Visa, Cisco, LinkedIn and Hortonworks come together to harden and enhance the Apache Drill project. 

The hackathon participants worked on many different aspects of Apache Drill. Over the next few weeks, these features will be incorporated into mainline. Here’s a preview of what we worked on, coming soon to a master near you:

Posted on February 11, 2014 by Anoop Dawar

It gives me immense pleasure to write this blog on behalf of all of us here at MapR to announce the release of Hadoop 2.x, including YARN, on MapR. Much has been written about Hadoop 2.x and YARN and how it promises to expand Hadoop beyond MapReduce. I will give a quick summary before highlighting some of the unique benefits of Hadoop 2.x and YARN in the MapR Distribution for Hadoop.


Posted on February 11, 2014 by Neeraja Rentachintala

Today we are very excited to announce early access of the new HP Vertica Analytics Platform on MapR at the O’Reilly Strata Conference: Making Data Work. This solution tightly integrates HP Vertica’s high-performance analytic platform directly on the MapR Enterprise-Grade Distribution for Hadoop with no “connectors” required. We wanted to provide some additional details on this integration and why this is important for customers.

Posted on January 6, 2014 by Michael Hausenblas

At the end of last year my colleague Steve Wooledge discussed options you have at your disposal for querying both schema-based and self-describing structured datasources with the MapR Big Data platform. Around that time I also reviewed Open Source SQL-in-Hadoop Solutions over at InfoQ.

Posted on November 8, 2013 by Ted Dunning

The open source incubator project Apache Drill has just made its first release, a significant milestone on the road to graduating to a top-level Apache Software Foundation project. This is a big step that represents a lot of work by the Drill engineering contributors who built the software, by core Drill committers, and by the Apache Drill community who participated in the code review and voting process for the release.

Posted on August 6, 2013 by Ellen Friedman

Over 40 developers gathered recently at OSCON for an Apache Drill hands-on workshop in Portland, OR to learn what Drill is, how it can be used and to jump in and try it out. Jacques Nadeau, Drill committer and MapR engineer, and Ted Dunning, Drill project champion and MapR Chief Application Architect, guided the workshop participants. We thought that if you couldn’t make it, we’d share what the participants experienced.

Posted on June 20, 2013 by Karen Whipple

The Big Data Journal has recently published an article titled “Apache Drill: Interactive Ad-Hoc Analysis at Scale” by MapR Chief Data Engineer Michael Hausenblas and Development Lead Jacques Nadeau.

Posted on February 6, 2013 by Ellen Friedman

It’s been almost six months since the Apache Drill project launched in August 2012. The project is making great strides both in terms of community participation and code writing. In fact, we’re getting to the point where we hope people are starting to think about use-cases.

Drill will be used by analysts and developers who are doing interactive analysis of large-scale datasets. It is intended for ad hoc, fast query where there are multiple data sources and formats. Previously that required writing Java programs, which is neither ad hoc nor fast. Drill plans to change that.

Posted on February 6, 2013 by Ellen Friedman

We are very excited to report the first public demonstration of Apache Drill at the recent Portland Java User's Group. The gathering of about fifty had a front row seat for this major milestone for the project. The reference interpreter was shown executing the internal query format against nested data.

Posted on October 19, 2012 by Ted Dunning

Apache Drill is coming together rapidly. Lots of progress is being made on multiple fronts as different groups start digging in and as the Apache infrastructure is fleshed out. The progress falls into several categories including community building, coding and logistics.

Posted on September 18, 2012 by Jack Norris

It’s been a couple of days since the kickoff meeting of the Apache Drill User Group. We are very encouraged and excited by the initial response and it bodes well for the future success of the Drill project.
