Parth Chandra is a member of the Apache Drill team at MapR. He has 20 years of software development experience in application software, data integration, web advertising, and distributed systems. He was most recently Chief Architect at Jivox, a web advertising startup. Before that, he was a Senior Architect at Informatica.
In this Whiteboard Walkthrough, Parth Chandra, Chair of the PMC for the Apache Drill project and a member of the MapR engineering team, describes how the Apache Drill SQL query engine reads data in Parquet format and shares some best practices for getting maximum performance from Parquet.
During the early days of developing Apache Drill, the Drill team realized the need for an efficient way to represent complex, columnar data in memory. Projects like Protobuf provided an efficient way to represent data that had a predefined schema for transmission over the network, and the Apache Parquet project had implemented an efficient way to represent complex columnar data on disk.