Editor's Note: In this week's Whiteboard Walkthrough, Tomer Shiran, PMC member and Apache Drill committer, walks you through the history of the non-relational data store and why Apache Drill is so important for this type of technology.
Here's the transcription:
The reason we started Apache Drill, basically, comes down to the rise of the non-relational data store. If you look back a few decades, when the relational database was invented, it was exactly the right tool for the job, for the volumes of data that people had and the way in which they were developing applications. That monopoly lasted for around 30 or 40 years, but around 2010 we started to see this explosion in data. Today, data is doubling in size every two years. Also, the rate at which applications are being developed is much, much faster than it used to be. Instead of a two-year development cycle, many organizations are now releasing new versions of their applications maybe every day or every week.
What happened then is that, in order to deal with this larger volume of data and the need for more agility from a development standpoint, we started to see these non-relational data stores becoming increasingly popular: things like NoSQL databases, Apache Hadoop, and cloud storage. NoSQL databases are things like HBase, MongoDB, and Cassandra; cloud storage is technologies like Amazon S3, Azure Blob Storage, and Google Cloud Storage. All these systems started to capture that growing amount of data.
The problem is that you now have these emerging as the new standard data stores, but they don’t have that same interface that's accessible to everybody, so they really can only be accessed by developers. There is no great interactive query engine for these systems that maintains the agility that they offer. That's really why we invented Apache Drill: to provide that kind of standard, high-performance SQL service for all of these types of data stores.