Strata + Hadoop World, London 2016
London
Wednesday, June 1, 2016 to Friday, June 3, 2016
At Strata + Hadoop World, learn the skills and technologies you need to build successful, data-driven projects and organizations. MapR is pleased to be an Exabyte sponsor and speaker at this event.

Talks

Anomaly Detection in Telecom with Spark

Ted Dunning

Telecom operators need to find operational anomalies in their networks very quickly. Many other industries share this need, so there are lessons here for all of us. Spark plus a streaming architecture can solve these problems very nicely. I will present a practical architecture, design patterns, and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations. All code for this talk will be open source and available on GitHub.
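
As a rough illustration of what a simple, general detector for event streams might look like, the sketch below flags a per-interval event count that sits far from the recent average. It is not the code from the talk and does not use Spark; the class name `RollingZScoreDetector`, the window size, the threshold, and the sample counts are all assumptions made for this example.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flags a value as anomalous when it is far from the recent mean.

    Hypothetical helper for illustration only, not the speaker's algorithm.
    """

    def __init__(self, window=20, threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)  # most recent observations
        self.threshold = threshold           # allowed distance in standard deviations
        self.min_history = min_history       # observations needed before scoring

    def observe(self, value):
        """Score one value against the history, then add it to the history."""
        is_anomaly = False
        if len(self.history) >= self.min_history:
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
            # Assumption: floor the deviation at 1.0 so a perfectly flat
            # history of counts does not make tiny changes look anomalous.
            std = max(sqrt(var), 1.0)
            is_anomaly = abs(value - mean) / std > self.threshold
        self.history.append(value)
        return is_anomaly


# Made-up events-per-second counts with one obvious spike.
counts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500, 101]
detector = RollingZScoreDetector()
flags = [detector.observe(c) for c in counts]
anomalous_positions = [i for i, flagged in enumerate(flags) if flagged]
```

In a streaming setting, the same `observe` call would be applied to each new interval's count as it arrives, one detector per monitored signal.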
Building Better Cross-Team Communication

Ellen Friedman

There is a huge advantage to be gained by creating effective communication between groups with widely different backgrounds and levels of technical expertise. Those with a business background can better understand the value of data and data-driven decisions and better communicate their own specific goals to the domain experts and technical teams who will develop projects. Without this cross-team communication, valuable development time can be lost through misunderstanding, unrealistic expectations, or a poor fit between business goals and technical solutions. Yet it may seem as though different groups speak different languages. It is not easy, but it is possible to learn how to communicate effectively even between diverse stakeholders. One key to effective cross-team communication lies in finding the critical concepts and essential processes that drive whatever project you’re doing and making certain that they are clearly understood on all sides. This is more than just avoiding jargon: it’s about identifying what is most important to the project. Not only does this improve communication, it also helps build better understanding of data-based work. In this talk the audience will learn specific actions they can take to help improve their skills in cross-team communication. These include:
• Identification of key concepts and potential challenges
• Building respect for different stakeholders
• Finding common language to support important ideas
• Making data and data-based decisions more understandable
The talk will include examples from successful big data projects.
Adding Complex Data to Spark Stack

Neeraja Rentachintala

Apache Drill is an evolving SQL technology that allows users to instantly query and manipulate complex and semi-structured data such as JSON in its native format without requiring any upfront schema definitions. Apache Spark is a proven data processing framework that allows users to quickly build in-memory data pipelines for advanced analytics and machine learning using a wide variety of language APIs. The latest integration between Drill and Spark brings the best of both of these technologies together. In this talk, we will go through how Spark users can leverage Drill’s dynamic schema discovery capabilities to create Spark RDDs directly on complex and semi-structured data, build data pipelines using Drill’s ANSI SQL extensions to manipulate the complex data within Spark programs, mix in Spark’s transformations, and then persist the Spark RDDs back to disk for queries by BI and analytics tools. The session will cover the use cases for the integration and show a live demo of these technologies working together.
Real-time Hadoop: What an ideal messaging system should bring to Hadoop

Jim Scott

Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging. Messaging solutions have existed for a long time now, but in the age of Hadoop, newer solutions like Kafka are being introduced that have higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than the legacy systems and have vastly different architectures. Because of this, I’m often asked, “What is the ideal messaging system for Hadoop?” In this talk I will dive into the architectural details and tradeoffs of both legacy and new messaging solutions, seeking an answer to that question. We will discuss the following topics during this presentation:
• Queues versus logs
• Security issues like authentication, authorization and encryption
• Scalability and performance
• Handling applications that span multiple data centers
• Multi-tenancy considerations
• APIs and integration points, and more
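
The "queues versus logs" distinction can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions, not Kafka's API or any real messaging system's API; the class names `SimpleQueue` and `SimpleLog`, their methods, and the consumer names are hypothetical. In the queue, a message disappears once any consumer takes it. In the log, records stay in place and each consumer tracks its own read offset, so several consumers can read the same data and a consumer can rewind.

```python
from collections import deque


class SimpleQueue:
    """Toy queue: each message is delivered to exactly one consumer, then gone."""

    def __init__(self):
        self._items = deque()

    def publish(self, message):
        self._items.append(message)

    def consume(self):
        # Removing the message means no other consumer will ever see it.
        return self._items.popleft() if self._items else None


class SimpleLog:
    """Toy log: records are retained; each consumer keeps its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> index of next record to read

    def publish(self, message):
        self._records.append(message)

    def consume(self, consumer):
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._records):
            return None  # this consumer has caught up
        self._offsets[consumer] = offset + 1
        return self._records[offset]

    def rewind(self, consumer, offset=0):
        # Replaying is possible because reading never deletes records.
        self._offsets[consumer] = offset


queue = SimpleQueue()
log = SimpleLog()
for event in ["call-started", "call-dropped"]:
    queue.publish(event)
    log.publish(event)

queue_reads = [queue.consume(), queue.consume(), queue.consume()]
billing_reads = [log.consume("billing"), log.consume("billing")]
audit_first = log.consume("audit")  # a second consumer starts from the beginning
log.rewind("billing")
billing_replay = log.consume("billing")
```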

Speakers

Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock), and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Ellen Friedman

Ellen Friedman is a consultant and commentator on big data topics. Active in open source, Ellen is a committer for Apache Mahout, a co-author of the book Mahout in Action, and a contributor to Apache Drill. She has a PhD in biochemistry, did laboratory research in microbiology, molecular biology, and plant genetic engineering, and has written about a wide range of technical topics including biology, oceanography, and the genetics of learning and memory.

Ellen thinks rabbits are funny, so she helped design magic-themed cartoons in the book "A Rabbit Under the Hat."

Neeraja Rentachintala

As Director of Product Management, Neeraja is responsible for the product strategy, roadmap and requirements of MapR's SQL initiatives. Prior to MapR, Neeraja held numerous product management and engineering roles at Informatica, Microsoft SQL Server, Oracle and Expedia.com, most recently as the principal product manager for Informatica Data Services/Data Virtualization.

Jim Scott

Jim Scott drives enterprise architecture and strategy at MapR. He is the cofounder of the Chicago Hadoop Users Group and, in that role, has helped build the Hadoop community in Chicago over the past four years. He has implemented Hadoop at three different companies, supporting a variety of enterprise use cases, from managing points of interest for mapping applications to online transaction processing in advertising, as well as full data center monitoring and general data processing. Prior to MapR, Jim was SVP of Information Technology and Operations at SPINS, the leading provider of retail consumer insights, analytics reporting, and consulting services for the Natural, Organic and Specialty Products industry. Additionally, he served as Lead Engineer/Architect for Dotomi, one of the world’s largest and most diversified digital marketing companies. Prior to Dotomi, Jim held several architect positions with companies such as Aircell, NAVTEQ, Classified Ventures, Innobean, Imagitek, and Dow Chemical, where his work with high-throughput computing was a precursor to more standardized big data concepts like Hadoop.