Big Data Everywhere - Denver
Denver, CO
Thursday, May 21, 2015
Big Data Everywhere is a one-day event focused on Hadoop and surrounding technologies, as well as business applications. Big Data Everywhere will bring together users and developers to share their expertise and experience about these projects. Learn emerging trends from big data industry thought leaders, share best practices with users and developers, and gain valuable insights from some of today's most successful Hadoop deployments.


Rethinking SQL for Big Data with Apache Drill

Neeraja Rentachintala

Can I reduce the time to value for my business users on Hadoop data?

How can I do SQL on semi-structured and NoSQL data formats?

How do I create and manage schemas when the applications are changing fast?

What types of distributed systems problems do I have to solve when I move beyond traditional MPP scale to Hadoop scale?

Overall, a new way of thinking is needed to bring end-to-end agility to BI/analytics environments operating on Hadoop and NoSQL data. Along with the table-stakes requirements of supporting a broad ecosystem of SQL tools and providing interactive performance at scale, close attention must be paid to new requirements such as working with semi-structured data formats and incorporating data from fast-changing data models. This session will cover how Apache Drill, the most flexible SQL-on-Hadoop technology, can achieve lightning-fast performance and provide groundbreaking flexibility and ease of use to enable a self-service data exploration experience for users on Hadoop and NoSQL data. The session will provide an in-depth walkthrough of the powerful SQL query capabilities in Drill, an overview of the architecture that enables this performance and flexibility, and example use cases from customer deployments.

Math in Spark: How Mahout makes Spark do Math

Ted Dunning

The new Mahout DSL has two aims. The first is to make it easy to program distributed machine learning algorithms using a math-like notation. The second is to allow such programs to be fairly performant by supporting alternative back-end computational engines. The primary back-end for Mahout is currently Spark, but there is also work going on with the H2O system. In this session, Ted will talk about how these back-ends help achieve both goals, with particular attention to how speed is achieved.


Neeraja Rentachintala

As Director of Product Management, Neeraja is responsible for the product strategy, roadmap, and requirements of MapR's SQL initiatives. Prior to MapR, Neeraja held numerous product management and engineering roles at Informatica, Microsoft SQL Server, and Oracle, most recently as the principal product manager for Informatica Data Services/Data Virtualization.

Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock), and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.