data2day 2014
Karlsruhe, Germany
Wednesday, November 26, 2014
Friday, November 28, 2014
Big Data appeals to a variety of industries. Companies of every size are looking to make their data usable for the business. The way to get there is by no means simple: the data must be available and accessible, and you must be able to process it appropriately, i.e., fast, safely, and reliably enough to generate business-relevant information from it. To help master these challenges, data2day presents the tools and approaches currently in use and gives end users the opportunity to report on their Big Data projects.


Lambda Architecture: Implementing the Speed Layer with Storm and Spark Streaming

Michael Hausenblas

The Lambda Architecture (LA) enables developers to build large-scale, distributed data processing systems that are tolerant of human error. In the LA we deal with three layers: (1) the batch layer, which manages the master dataset and pre-computes batch views; (2) the serving layer, which indexes the batch views so that they can be queried with low latency; and (3) the speed layer, which deals with recent data only and processes incoming data online. In this talk we focus on how to implement the speed layer using two prominent Apache projects: Storm and Spark Streaming. We will discuss their pros and cons and typical use cases, and demonstrate concrete examples in this domain.
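The interplay of the three layers can be illustrated with a minimal, framework-free sketch. It assumes a hypothetical page-view counting use case, and the names `compute_batch_view`, `SpeedLayer`, and `query` are illustrative only; they are not part of the Storm or Spark Streaming APIs. Merging the batch view and the real-time view at query time is one common way to combine the layers, not necessarily the approach shown in the talk.

```python
from collections import Counter


def compute_batch_view(master_dataset):
    """Batch layer: recompute the view from scratch over the whole
    immutable master dataset (here: page views counted per page)."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally maintains a real-time view that covers
    only the events that arrived since the last batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def process(self, event):
        # Online, per-event update; in a real system this role would be
        # played by, e.g., a Storm bolt or a Spark Streaming micro-batch.
        self.realtime_view[event] += 1

    def expire(self):
        # Once the batch layer has absorbed the recent events, the
        # real-time view is superseded and can be discarded.
        self.realtime_view.clear()


def query(batch_view, realtime_view, key):
    """Query side: merge the pre-computed batch view with the
    real-time view at query time."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


if __name__ == "__main__":
    # Hypothetical master dataset of page-view events.
    master = ["home", "about", "home", "blog"]
    current_batch = compute_batch_view(master)

    speed = SpeedLayer()
    for event in ["home", "blog", "contact"]:
        speed.process(event)   # speed layer sees the event immediately
        master.append(event)   # the event is also appended to the master dataset

    print(query(current_batch, speed.realtime_view, "home"))  # 3

    # Next batch run: recompute from the (now larger) master dataset
    # and drop the real-time view it has superseded.
    current_batch = compute_batch_view(master)
    speed.expire()
    print(query(current_batch, speed.realtime_view, "home"))  # still 3
```

The design choice here is that the speed layer never has to be exact forever: any error it accumulates is discarded at the next batch run, which recomputes the view from the complete master dataset.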
Security in the Hadoop Ecosystem

Wednesday, November 26, 2014
The session gives a neutral overview of current topics in Hadoop security. It covers IAM (user permissions, authentication, RBAC, authorization), data protection at the filesystem level (data and volume encryption), protection of data in transit (cluster-to-cluster and in-cluster encryption), logging and monitoring (privileged user monitoring, data metering, monitoring, auditing), and operational security (disaster recovery, backup ...). The goal of the session is to enable attendees to assess the various security-relevant aspects of the Hadoop ecosystem.


Michael Hausenblas

Michael is Chief Data Engineer, EMEA, for MapR, where he helps people tap the potential of Big Data by bridging the technical side (architecture, scalability, etc.) and the business side (ROI, TCO, etc.). His background is in large-scale data integration, the Internet of Things, and Web applications, and he is experienced in advocacy and standardization (World Wide Web Consortium). Michael shares his experience with the Lambda Architecture, distributed systems, and polyglot persistence through blog posts and public speaking engagements, and he is a contributor to Apache Drill.