DA 4000 - Self-service SQL Analytics with Apache Drill


Feb 20 - 21
Tokyo, Japan
199,000 yen (+ tax)

About this course

You will write SQL queries on a variety of data types, including structured data in a Hive table, semi-structured data in HBase or MapR-DB, and complex data file types, such as Parquet and JSON. You will also learn the different services involved at each step, and how Drill optimizes a query for distributed SQL execution.

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide if you are ready to succeed in this course. The instructor will move forward with lab exercises, assuming that you have mastered the skills listed below.

  • Required:
    • Basic Linux knowledge, including familiarity with basic commands such as mv, cp, cd, ls, ssh, and scp
    • Access to, and the ability to use, a laptop with a terminal program installed (such as terminal on the Mac, or PuTTY and WinSCP on Windows)
    • Beginner to intermediate fluency with SQL
  • Recommended:
  • Optional: Basic Hadoop knowledge

Right for you?

  • For data analysts and data scientists who use SQL to perform data analysis.
  • This is a data analysis course; you must have SQL experience to do the exercises.


This course prepares you for the MapR Certified Data Analyst (MCDA) certification exam.


Included in this 2-day course are:

  • Slide Guide pdf
  • Lab Guide pdf
  • Lab Code

Day 1

  • Write familiar SQL queries on structured data
    • Get familiar with the ease of using SQL with Drill
  • Write familiar SQL queries on a range of data types
    • Perform SQL queries on semi-structured data
    • Use SQL to query complex and nested JSON data
  • Use Drill Explorer to explore semi-structured data
    • Discover the schema of unknown data
    • Preview unknown data
  • List the different types of data Drill can explore and query
    • Simple and complex data types
    • Structured and semi-structured data
  • Describe how Drill interacts with data and discovers its schema
  • Explore data to create queries using multiple data sources
    • Join simple, complex, structured, and semi-structured data in the same query
  • Use views to visualize data in BI tools
    • Create a view to save common queries for easy reuse
    • Load a view into your BI tools to visualize queries without writing code
  • Components of Drill
    • Learn the components of a Drillbit
    • Customize extensible Drill core modules
  • Drill execution
    • Follow a query through execution in Drill
    • Learn how a query is broken into fragments and distributed on a cluster
    • Describe the stages involved in query planning
  • Optimization and flexibility of Drill
    • Learn the cost-based and rule-based techniques used by the Drill optimizer
    • Discover the flexibility and extensibility of Drill

Lab Exercises

  • Familiar SQL queries on structured Hive data
  • Familiar SQL queries on complex data
    • Query Parquet data
    • Query JSON data
    • A single query that joins Hive, HBase and JSON
  • Explore multiple data sources with the Drill Explorer
    • Drill Explorer Interface
    • Data sources
    • Discover data schema
    • Preview data
    • Save a view