Barcelona, Spain
Wednesday, November 19, 2014
Friday, November 21, 2014
The best minds in data will gather in Barcelona this November for the O'Reilly Strata Conference to learn, connect, and explore the complex issues and exciting opportunities brought to business by big data, data science, and pervasive computing. MapR is proud to be an Exabyte Sponsor - come and meet us on stand 217.


Data, Data Everywhere and Only Map-Reduce to Drink

Michael Hausenblas

Friday, November 21, 2014
We will describe our experiences in implementing a full-scale application applied to a large anonymised dataset from the mobile operator Telefonica. In the course of building this application, we faced and solved classic ETL and aggregation problems in a Map-Reduce setting. More importantly, we also developed methods for integrating diverse tools including MongoDB, the statistical system R, Hadoop Map-Reduce and a project-optimized database known as jumboDB. Throughout, we used primitive operations based on Map-Reduce as the core element of computation. The lessons learned have broad implications beyond this single project. Our project was unusual in the breadth of techniques used and also in the diversity of our goals. We will describe which forms of Map-Reduce worked well and which did not, and which sorts of problems were appropriate for Map-Reduce and which were not. Having a single programming model was useful for us in this project, but it was also an impediment in some respects. We will provide our perspective based on our project and examine how upcoming technologies would have affected our efforts.
Doing the Impossible, Almost (A survey of approximation algorithms that make queries vastly faster)

Ted Dunning

Friday, November 21, 2014
Computing quantities such as medians or the number of unique elements exactly requires a lot of time, a lot of memory, or both. It is, however, possible to get really close to the exact answer with much less time and much less memory. Some of these algorithms are much simpler than you might expect. I will describe a selection of these algorithms, including some not-yet-published results. I will also outline how these algorithms can be applied to practical problems like anomaly detection.
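As a rough illustration of the kind of trade-off described here (not necessarily one of the algorithms covered in the talk), the sketch below estimates the number of unique elements with a "k minimum values" approach: each item is hashed to a value between 0 and 1, only the k smallest hash values are kept, and the count is estimated from how small the k-th value is. The function names `_hash01` and `approx_distinct`, the choice of SHA-1 as the hash, and the default `k` are assumptions made for this example.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1] (hypothetical helper)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_distinct(items, k=256):
    """Estimate the number of distinct items while storing only k hash values.

    Illustrative sketch of a k-minimum-values estimator; memory use is
    bounded by k no matter how many items are seen.
    """
    heap = []        # negated hashes, so heap[0] is the largest hash we keep
    members = set()  # hash values currently stored in the heap
    for item in items:
        h = _hash01(item)
        if h in members:
            continue  # already tracking this hash value
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)
```

Larger values of `k` cost more memory but give tighter estimates; the relative error shrinks roughly in proportion to one over the square root of `k`.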
Resistance is Futile: The Next Generation Big Data Architecture

Jim Scott

Friday, November 21, 2014
Apache Mesos, Apache Hadoop, Apache Spark + Custom Enterprise Applications: combined, this stack is greater than the sum of its parts. Mesos can manage resources across an entire data center, Hadoop provides a distributed data store and scalable data processing, and Spark delivers strong in-memory and disk-based data processing performance as well as streaming capabilities. Couple all of that with custom enterprise applications, and the data center turns into a well-oiled machine. When combined, this software stack delivers unlimited flexibility for the entire data center.


Michael Hausenblas

Michael is Chief Data Engineer, EMEA, for MapR, where he helps people tap the potential of Big Data by bridging the technical side (architecture, scalability, etc.) and the business side (RoI, TCO, etc.). His background is in large-scale data integration, the Internet of Things, and Web applications, and he is experienced in advocacy and standardization (World Wide Web Consortium). Michael shares his experience with the Lambda Architecture, distributed systems and polyglot persistence through blog posts and public speaking engagements, and is a contributor to Apache Drill.

Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock) and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Jim Scott

Jim drives enterprise architecture and strategy at MapR. He is the cofounder of the Chicago Hadoop Users Group and, in that role, has helped build the Hadoop community in Chicago over the past four years. He has implemented Hadoop at three different companies, supporting a variety of enterprise use cases ranging from managing Points of Interest for mapping applications and Online Transactional Processing in advertising to full data center monitoring and general data processing. Prior to MapR, Jim was SVP of Information Technology and Operations at SPINS, the leading provider of retail consumer insights, analytics reporting and consulting services for the Natural, Organic and Specialty Products industry. Additionally, he served as Lead Engineer/Architect for dotomi, one of the world’s largest and most diversified digital marketing companies. Prior to dotomi, Jim held several architect positions with companies such as aircell, NAVTEQ, Classified Ventures, Innobean, Imagitek, and Dow Chemical, where his work with high-throughput computing was a precursor to more standardized big data concepts like Hadoop.