Berlin Buzzwords 2015
Berlin, Germany
Sunday, May 31, 2015
Wednesday, June 3, 2015

Berlin Buzzwords is Germany's most exciting conference on storing, processing and searching large amounts of digital data. It focuses on open source software projects. The 6th edition of Berlin Buzzwords will be held from May 31 to June 3, 2015 at Postbahnhof.

On Sunday, May 31, Berlin Buzzwords starts with the barcamp, followed by the two main conference days on Monday and Tuesday. On Wednesday, attendees can join several workshops, meetups and hackathons taking place all over Berlin, hosted by Berlin Buzzwords and local companies.


Practical t-digest Applications

Ted Dunning

The t-digest is a state-of-the-art algorithm for computing approximate quantiles with adjustable accuracy limits and very few limitations.

Implementations of the t-digest algorithm are easy to use and have been integrated into all kinds of software, from Elasticsearch to Apache Mahout. Certain kinds of queries, such as finding the 99.999th percentile, can be accelerated by several orders of magnitude by using t-digest.

I will describe the basic algorithm and demonstrate the effect of some variations of the algorithm. I will also show how to use the algorithm in your code or your queries.
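To give a flavour of the kind of computation the talk is about, here is a minimal Python sketch of a centroid-based quantile summary. It is not the t-digest library or the speaker's code. The names `build_digest`, `quantile` and `compression` are hypothetical, the size cap proportional to q(1 - q) is chosen here for illustration, and the sketch sorts all values in memory, whereas real implementations work on streams. The idea it illustrates is that data is summarized as (mean, count) clusters that are kept small near the tails, so extreme percentiles stay accurate.

```python
import random


def build_digest(values, compression=100):
    """Summarize values as (mean, count) centroids, kept small near the tails.

    Simplification (assumption of this sketch): all data is sorted in memory.
    """
    data = sorted(values)
    n = len(data)
    centroids = []
    total, count, seen = 0.0, 0, 0  # open centroid sum and size, points already flushed
    for x in data:
        q = (seen + count + 0.5) / n  # approximate quantile of the incoming point
        # Illustrative size cap: large in the middle, 1 at the extremes.
        limit = max(1.0, 4.0 * n * q * (1.0 - q) / compression)
        if count > 0 and count + 1 > limit:
            centroids.append((total / count, count))
            seen += count
            total, count = 0.0, 0
        total += x
        count += 1
    if count > 0:
        centroids.append((total / count, count))
    return centroids


def quantile(centroids, q):
    """Estimate the q-quantile by interpolating between centroid means."""
    if not centroids:
        raise ValueError("empty digest")
    n = sum(count for _, count in centroids)
    target = q * n
    cum = 0.0
    prev_mean, prev_mid = None, None
    for mean, count in centroids:
        mid = cum + count / 2.0  # rank at the centre of this centroid
        if target <= mid:
            if prev_mean is None:
                return mean
            frac = (target - prev_mid) / (mid - prev_mid)
            return prev_mean + frac * (mean - prev_mean)
        prev_mean, prev_mid = mean, mid
        cum += count
    return centroids[-1][0]


# Usage: summarize 100,000 shuffled integers and query two quantiles.
values = list(range(100000))
random.Random(0).shuffle(values)
digest = build_digest(values, compression=100)
print(len(digest), quantile(digest, 0.5), quantile(digest, 0.99999))
```

Raising `compression` in this sketch produces more, smaller clusters, trading memory for accuracy.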

How to gain insights from a fleet of vehicles without breaking a sweat

Michael Hausenblas

One of the more mature areas of the Internet of Things (IoT) is the application of sensors in the context of vehicles. Going beyond the mechanical and electrical challenges of deploying the sensors and delivering their readings, we will discuss scalable architectures by means of reviewing three applications (connected car, trucks, and agricultural equipment). From a technological point of view we will be dealing with message queues (Kafka and fluentd), stream processing platforms (Storm and Spark Streaming) as well as time series databases (InfluxDB and OpenTSDB). A live demo from the automotive domain is included in this talk.

An introduction to Apache Kylin - Business Intelligence meets Big Data

Apache Kylin is an open source distributed analytics engine that originated at eBay Inc. and provides a SQL interface and multi-dimensional analysis for online analytical processing. Kylin was recently accepted by the Apache Software Foundation as an incubator project and already has a number of integrations with HDFS, MapReduce, Hive, HBase and Apache Drill. This session will provide an overview of Apache Kylin, how it works and what it does. It will also give an introduction to Online Analytical Processing (OLAP), how Business Intelligence works, and how Kylin and Hadoop can help in analyzing extremely large datasets.

Talk the Talk: How to Communicate with the Non-Coder

Your first thought may be: Why would I want to talk to non-coders? Buzzwords is a developers’ conference, and most users of open source software are also developers. But there’s a huge advantage to be gained by being able to describe what you do – the capabilities and the reasonable limitations – in powerful, non-technical ways that let you communicate effectively with project managers, those with useful domain knowledge, and in the case of some open source projects, the users. Take the new Apache Drill offering, for example. The community of Drill users comprises widely different groups. It includes developers who will appreciate the flexibility and extensibility of Drill as they incorporate it into their own projects, as well as business analysts who have less technical depth but strong experience and serious goals in analyzing big data with BI tools. It helps for those developing Drill to be able to clearly see these users’ needs and talk about how Drill may address them.

Describing technical work in non-technical terms does not mean “dumbing it down”. Quite the contrary: it means having a sufficiently clear conceptual understanding of your own work that you can cut to the heart of the essential aspects and communicate them to people with a wide range of background expertise. This approach is particularly useful with machine learning projects, in which clear communication with business clients and domain experts about the applicability of available data sources can make or break a project.

There’s another advantage to developing the skill of conceptual communication: it improves your own thinking in terms of seeing the critically important aspects of your work and in leaving you open to innovation. This talk will examine concrete steps to take to learn how to best communicate the strength of your work to other groups and best conceptualize your own roadmap.


Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock) and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Michael Hausenblas

Michael is Chief Data Engineer, EMEA, for MapR, where he helps people tap the potential of Big Data by bridging the technical (architecture, scalability, etc.) and the business side (RoI, TCO, etc.). His background is in large-scale data integration, the Internet of Things, and Web applications, and he's experienced in advocacy and standardization (World Wide Web Consortium). Michael shares his experience with the Lambda Architecture, distributed systems and polyglot persistence through blog posts and public speaking engagements, and is a contributor to Apache Drill.