Earlier this year, I published a series of posts on the deployment of Apache Drill to Azure. While the steps covered in those posts work, I’d like to speed up the process significantly. With the MapR Converged Data Platform available in the Azure Marketplace, I can have a Drill-enabled MapR cluster up and running much faster and with much less effort.
Apache Drill Blog Posts
In this week's Whiteboard Walkthrough video, Neeraja Rentachintala, Senior Director of Product Management at MapR Technologies, explains how Apache Drill optimization achieves interactive performance for low-latency SQL queries on very large data sets when working with familiar BI tools such as Tableau, MicroStrategy, or QlikView, and covers techniques used for successful optimization with Drill in production. Neeraja describes Drill optimization capabilities based on Apache Calcite, including projection pruning, filter pushdown, partition pruning, cost-based optimization, and metadata caching.
MapR has worked closely with Azure to develop sandboxes that enable users to do a proof of concept with the MapR Converged Data Platform. These sandboxes, which are pre-loaded and preconfigured with the MapR software and the required supporting operating system, can be launched on the Azure Marketplace portal.
In this Whiteboard Walkthrough, Parth Chandra, Chair of the PMC for the Apache Drill project and member of the MapR engineering team, describes how the Apache Drill SQL query engine reads data in Parquet format and shares some best practices for getting maximum performance from Parquet.
It's the 21st century, and who doesn’t want to be independent? Everyone wants to enjoy some freedom in whatever work they do. As a business user, you can experience this kind of freedom with Apache Drill, which gives you the flexibility to explore data in new and powerful ways.
It’s hard to believe it’s been a year since Apache Drill first became generally available on the MapR Converged Data Platform—yes, a full 365 days! This is just the beginning of the impact Apache Drill will have on big data analytics. Explore the infographic to see how Drill has been leveraged over the last year:
Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently in order to find the key descriptive, predictive and explanatory features of the collection is a fundamental required capability for coping with the Big Data avalanche.
Today we at MapR would like to congratulate Apache Arrow, a cross-system data layer that speeds up big data analytics and a brand-new addition to the Apache open source software community, on its announcement as a Top Level Project.
For the past 25 years, applications have been built using an RDBMS with a predefined schema that forces data to conform to that schema on write. Many people still think that they must use an RDBMS for applications, even though records in their datasets have no relation to one another.
Someone once said “if you can’t measure something, you can’t understand it.” Another version of this belief says: “If you can’t measure it, it doesn’t exist.” This is a false way of thinking – a fallacy – in fact it is sometimes called the McNamara fallacy.
It’s the start of a new year, and we’re on the threshold of something new, so let’s look forward to what you’re likely to be doing in 2016.
In this week's Whiteboard Walkthrough, Sameer Nori, Business Intelligence Expert at MapR, explains how BI has evolved over the last three decades from being IT-driven to analyst-driven with self-service tools.
In order to truly appreciate Apache Drill, it is important to understand the history of the projects in this space, as well as the design principles and the goals of its implementation.
As technology advances at breakneck speed, our lives are becoming increasingly digitized. From Twitter feeds to sensor data to medical devices, companies are drowning in big data yet starving for actionable information.
In this blog post, I will briefly summarize some of the key capabilities that customers are finding immensely valuable in Drill. I’ll also cover common use cases where Drill is deployed, as well as resources for getting started with Drill.
You are probably all somewhere on the Spark journey to production scale—you're either at Spark Summit to learn, to start doing something with Spark, or perhaps you have mission-critical applications already running in your enterprise. On this journey, there's a lot to think about—mostly about your application—but you also need to figure out how to actually get Spark into production scale as more and more groups will want the power of the results and the value of using Spark in mission-critical, operational deployments.
In the past few decades, the standard for working with and managing data has been SQL. SQL largely dominates the enterprise, and is used for everything from operational workloads and reporting to analytics. This standard will continue on Hadoop.
The MapR Distribution including Hadoop is now available in a private IT sandbox environment on the Amazon Web Services (AWS) Test Drive. We’ve partnered with AWS to create this lab environment so that you can gain hands-on experience with Hadoop.
In this week's Whiteboard Walkthrough, Tomer Shiran, PMC member and Apache Drill committer, walks you through the deployment of Apache Drill with different storage systems and the connection with BI tools.
Big Data analysis is intimidating to some people. They assume you need a background in statistics, deep technical knowledge, and other complex skills. But you don’t need to be a data scientist to extract insights and value from Big Data with Hadoop and Apache Drill.
We are pleased to announce the availability of the Apache Drill Essentials course as part of MapR Hadoop On-Demand Training.
MapR announced today that our SQL-on-Hadoop solution earned the highest score for Hadoop/data warehouse interoperability. MapR was among six vendors invited to participate in Gigaom Research’s January 2015 report, “Sector Roadmap: Hadoop/Data Warehouse Interoperability.” One of the key factors for our top placement in this competitive evaluation was the integration powers of Apache Drill’s technology included in the MapR Distribution. This report validates Apache Drill as a major advancement in data exploration given its schema flexibility, which makes it possible for you to immediately query complex data in native formats, such as schema-less data, nested data, and data with rapidly-evolving schemas, with minimal IT involvement.
As we close out the year, here is a look back at our 10 most popular blogs of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.
During a recent trip to the Asia Pacific region, I was astonished at how much excitement there already is over Apache Drill; equally amazing is how quickly adoption of Drill is growing. The groundswell of support we are seeing within the community is validated today by the Apache Software Foundation announcement that Drill is now a Top Level Project. You’ll find plenty of documentation on the Apache Drill project site, but let me tell you about the glowing feedback I received from customers and system integrators, and why these customers are so excited about this new tool.
The recent MapR webinar titled “The Future of Hadoop Analytics: Total Data Warehouses and Self-Service Data Exploration” proved to be a highly informative, in-depth look at the future of data warehouses and how SQL-on-Hadoop technologies will play a pivotal role in those settings. Matt Aslett, Research Director for 451 Research, along with Apache Drill architect Jacques Nadeau, discussed what lies ahead for enterprise data warehouse architects and BI users in 2015 and beyond.
The first day of the 2014 Hadoop Summit was filled with announcements and interviews. MapR announced our first Apache Hadoop App Gallery, as well as our exciting partnership with Syncsort. Jack Norris, MapR CMO, had a chance to talk about this news on theCUBE with Wikibon’s Jeffrey Kelly and SiliconANGLE’s John Furrier.
This quarter is shaping up to be a great one for MapR! After recently announcing record growth in the first quarter, we’ve got some great momentum going into the second.
SQL-on-Hadoop just got easier this morning. Working together with the HP Vertica team, we are excited to announce general availability of the HP Vertica Analytics Platform running on the MapR Distribution for Apache Hadoop.
M.C. Srivas, CTO, will participate on the Big Data Ecosystem partner panel at Splunk .conf2013.
SQL has become really hot. Why? Customers are looking for interactive performance in big data solutions, with streamlined workflows and flexibility in their choices. Being able to use SQL effectively on Hadoop and other big data systems is a big step toward meeting that goal.