Seattle Data Science Apr 2016
Seattle, WA
Thursday, April 21, 2016
Data scientists in Seattle are doing incredible work: making graph models of symptoms and human disease, extracting insight from huge amounts of real-time data, and building tools to make the whole process easier. The Seattle Data Science Meetup gives data scientists an ongoing opportunity to promote and contribute to this expanding field.


Putting Apache Drill to Use: Best Practices for Production Deployments

Neeraja Rentachintala

Apache Drill is the industry's first schema-free SQL query engine for big data. With its flexibility to explore both structured and complex datasets on the fly from a variety of data sources, combined with distributed query processing that provides low-latency performance at petabyte scale, Drill is being rapidly adopted by organizations to open up their Hadoop deployments to a wide variety of users in a self-service fashion. This session provides a deep dive into the use cases of Apache Drill and best practices for deploying it in production, using real customer examples. We will start with an introduction to Apache Drill and how it fits into the Hadoop ecosystem, then quickly delve into the topics that matter for production rollout, such as data ingestion methodologies, data model trade-offs, storage format selection, picking the data layout for optimal performance, and query design tips and tricks, and finally wrap up with a preview of the road ahead for the project in 2016.


Neeraja Rentachintala

As Director of Product Management, Neeraja is responsible for the product strategy, roadmap, and requirements of MapR's SQL initiatives. Prior to MapR, Neeraja held numerous product management and engineering roles at Informatica, Microsoft SQL Server, and Oracle, most recently as the principal product manager for Informatica Data Services/Data Virtualization.