Hadoop Summit 2015
San Jose, CA
Tuesday, June 9, 2015 to Thursday, June 11, 2015
MapR is proud to be a Platinum Sponsor of the 8th Annual Hadoop Summit, a leading conference for the Apache Hadoop community. This 3-day event features many of the Apache Hadoop thought leaders who will showcase successful Hadoop use cases, share development and administration tips and tricks, and educate organizations about how best to leverage Apache Hadoop as a key component in their enterprise data architecture.

Talks

Drilling into Data with Apache Drill

Jacques Nadeau & Tomer Shiran

June 9, 2015 at 11:15am

Apache Drill is a next-generation SQL engine for Hadoop and NoSQL. Its unique schema-free approach enables self-service data exploration with the agility that organizations need in this new era of rapidly growing and evolving data. In this talk we present an overview of Apache Drill, including its unique features and architecture, and explain how to get started with Drill.

Computing with Chaos

Ted Dunning

June 9, 2015 at 12:05pm

Chaos.  Randomness.  Utter unpredictability.

These sound like trouble, but they are actually tools that you can use to make software more reliable, solve paradoxes, test theorems and much, much more.

With a few slides, a bit of live coding and a lot of fun I will show how you can become a Lord of Chaos.  Specifically, I will show how you can use randomness to:

- Solve the Monty Hall problem and explain your answer

- Do precise proximity searches on geo-hashed databases

- Safely build machine learning models in the open to solve problems on secure data

- Design software that behaves predictably without corner cases

- Test statistical hypotheses without remembering formulas
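As an illustration of the first item in the list above, the Monty Hall problem can be checked by simulation rather than by argument. The following is a minimal sketch, not material from the talk: the function names play and win_rate, the trial count, and the seed are assumptions chosen for the example. It assumes the standard rules, in which the host always opens a door that hides no car and is not the contestant's pick.

```python
import random


def play(switch, rng):
    """Play one round of Monty Hall; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)   # door hiding the car
    pick = rng.choice(doors)  # contestant's first choice
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning under a fixed strategy (hypothetical helper)."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With enough trials the estimates should settle near 1/3 for staying and 2/3 for switching, which is the usual analytic answer; the simulation makes the result easy to explain because the only way switching loses is when the first pick was already the car.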

Real Time and Big Data – It’s About Time

Tomer Shiran

June 9, 2015 at 5:25pm

To deliver real-time impact from big data, organizations must evolve beyond traditional analytic approaches to support a new class of agile, distributed applications. Real-time Hadoop overcomes batch programs reliant on data transformations and schema management. This session highlights how leading organizations are leveraging Hadoop and NoSQL to merge analytics and production data to make adjustments while business is happening to optimize revenue, mitigate risk and reduce operational costs. Details include how companies have achieved real-time impact on their business, collapsed data silos, and automated in-line analytics with operational data for immediate impact.

Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks

Nick Amato

June 10, 2015 at 12:05pm

This session examines practical ways you can begin leveraging network data sources in Hadoop using familiar technologies like SQL and BI tools. Using the diverse sets of sources available, such as traces, routing protocol data, and direct packet captures from critical network locations, we will examine the capabilities of BI tools in the network context and examine cases for extracting value from data collected from the network infrastructure.

HBase and Drill: How Loosely Typed SQL is Ideal for NoSQL

Ted Dunning

June 10, 2015 at 3:25pm

The Apache HBase approach to data has a huge potential for expressing NoSQL-y, non-relational programs. Apache Drill supports SQL for non-relational data. Paradoxically, combining this NoSQL with this SQL tool results in something even better. I will show and explain how to combine HBase and Drill to access time series data and to support high performance secondary indexing.

How (the internet of) Things are Turning the Internet Upside Down

Ted Dunning

June 10, 2015 at 5:25pm

Just when we thought the last mile problem was solved, the Internet of Things is turning the last mile problem of the consumer internet into the first mile problem of the industrial internet. This inversion impacts every aspect of the design of networked applications. I will show how to use existing Hadoop ecosystem tools, such as Spark, Drill and others, to deal successfully with this inversion. I will present real examples of how data from things leads to real business benefits and describe real techniques for how these examples work.

Realistic Synthetic Generation Allows Secure Development

Ted Dunning

June 11, 2015 at 1:30pm

Open source is great, if developed in the open. Privacy is great, but things have to be private. So what happens when you find an open source bug with private data? How do you even file the bug report? Likewise, how can you develop fraud detection algorithms in academic settings when the training data can't be transported outside a secure perimeter? One answer is really good fake data. Good enough to fool the bug. Good enough to emulate the fraud. I will describe log-synth and several physics-based approaches that can do this and tell some real stories about fake data.

Speakers

Jacques Nadeau & Tomer Shiran

Jacques Nadeau leads Apache Drill development efforts at MapR Technologies. He is an industry veteran with over 15 years of big data and analytics experience. Most recently, he was cofounder and CTO of search engine startup YapMap. Before that, he was director of new product engineering with Quigo (contextual advertising, acquired by AOL in 2007). He also built the Avenue A | Razorfish analytics data warehousing system and associated services practice (acquired by Microsoft).

Tomer Shiran heads the product management team at MapR and is responsible for product strategy, roadmap and requirements. Prior to MapR, Tomer held numerous product management and engineering roles at Microsoft, most recently as the product manager for Microsoft Internet Security & Acceleration Server (now Microsoft Forefront). He is the founder of two websites that have served tens of millions of users, and received coverage in prestigious publications such as The New York Times, USA Today and The Times of London. Tomer is also the author of a 900-page programming book. He holds an MS in Computer Engineering from Carnegie Mellon University and a BS in Computer Science from Technion - Israel Institute of Technology.

Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock) and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Nick Amato

Nick works with MapR's ecosystem and technology partners to identify new opportunities where the MapR platform can bring value to our customers. His areas of focus include third-party integrations with BI tools, benchmarking, architecture, and enabling scalable data platforms.