Big Data Everywhere New York 2016
New York, NY
Thursday, May 19, 2016
Big Data Everywhere is a half-day conference focused on Hadoop, Spark, and other big data technologies. It brings together users and developers to share their experience through technical sessions and user success stories. Industry and technical experts will share their knowledge and best practices, and discuss use cases and business applications.


Modern Streaming Analytics, Flow versus State

Ted Dunning

The leading edge of big data architectural practice is rapidly moving to flow-based computing using streaming architectures, as opposed to state-based computing based on batch programs plus workflow schedulers. This transition is part of the larger movement towards microservices and DevOps-oriented development and maintenance of large systems. This fashion is spreading quickly, but the understanding of why flow-based computing is different from state-based computing, and what this means in practice, is lagging behind. In fact, there is a huge difference, and it has the potential to massively simplify big data systems, thereby improving reliability and time to market. I will describe the necessary key concepts and illustrate them with practical examples. I will also describe why this matters in the real world.


Self-Service Data Exploration Analytics on Hadoop - Introduction to Apache Drill

Vince Gonzalez

SQL is one of the most widely used languages to access, analyze, and manipulate structured data. As Hadoop gains traction within enterprise data architectures across industries, the need for SQL on both structured and loosely structured data on Hadoop is growing rapidly. Apache Drill started off with the audacious goal of delivering consistent, millisecond ANSI SQL query capability across a wide range of data formats. At a high level, this translates to two key requirements: schema flexibility and performance. Drill lets users interact with big data on Hadoop much faster and far more easily using the familiar SQL language. Users are no longer dependent on central IT teams and DBAs to produce schemas and then maintain them when the structure changes for a few records. Drill alleviates the pain of structuring unstructured data before gaining any insights by providing a simple mechanism to query any dataset on Hadoop, whether flat files, Parquet or JSON files, or HBase tables. This session will give you an overview of several different use cases that enterprises are testing Drill for.


Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock), and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Vince Gonzalez

Vince Gonzalez is a Systems Engineer for MapR. He has 20 years of experience designing, implementing, and automating infrastructure and applications. Previously, he was a Field Application Engineer for AMD Data Center Server Solutions (formerly SeaMicro), where he supported customers in the eastern U.S. Prior to AMD/SeaMicro, Vince worked for BlueArc Corporation, where he held a variety of roles, including Field Engineer, Sales Engineer, and Director of Technical Services, in which he led the Eastern region field technical team. Earlier in his career, Vince was a UNIX/Linux administrator.