Hadoop Summit 2016 Dublin
Dublin, Ireland
Wednesday, April 13, 2016
Thursday, April 14, 2016
Hadoop Summit is community-focused: all conference sessions are voted on by the public and selected by a committee of industry luminaries. At Hadoop Summit you'll get deep-dive technical content from committers across a wide range of basic and advanced topics and projects. MapR is pleased to be a Gold Sponsor of Hadoop Summit Europe. Come and visit our booth, meet the team and hear us speak.


Real-World NoSQL Schema Design

Tugdual Grall

Wednesday, April 13, 2016
There are lots of claims about the benefits of NoSQL databases, but few realistic demonstrations of the impact that such a database can have on anything more than toy-sized data. In this talk, I will deconstruct a real-world database schema into the corresponding NoSQL design. The database that I will use is the MusicBrainz database, which exhibits many important idioms found in real databases, such as factoring relations into multiple tables to implement column families, linkage tables, and many-to-one relationships. In spite of such radical structural changes, the resulting denormalized and nested data can still be queried with SQL using Apache Drill, and the queries are often noticeably simpler than the queries used against the original data structures. The methods are practical and easy to apply, and can sometimes be largely automated. For example, I'll show how a percolator pattern can be used to allow the resulting NoSQL database to be automatically maintained in multiple NoSQL technologies simultaneously, so that full-text search, recommendations, and the HBase API can all be used to access the same data.
Detecting Persistent Threats Using Sequence Statistics

Ted Dunning

Thursday, April 14, 2016
In a persistent threat, the attacker often penetrates a system but exploits the information captured there elsewhere, at a throttled rate, to avoid detection. In some cases, the attacker even takes measures to protect the penetrated system from other attackers, to avoid the detailed inspection that often accompanies the detection of a compromise. I will describe one particular situation in which a single point of compromise is used to extract consumer financial information that is then used elsewhere to commit fraud. This kind of attack can be difficult to detect and hard to trace. However, detailed examination of transaction histories across thousands to millions of accounts can provide a very sensitive indicator of such activity and can often pinpoint the original point of compromise. The detection technique that I will describe has very broad applicability across many problems that involve sequences of symbols, and it has produced state-of-the-art results in genomics, fraud detection, text analysis, retail recommendations, and the prediction of attrition and profitability. The specific case that I describe in this talk is also interesting because the technique was initially developed using synthetic data that emulated real data so closely that a fraud ring was detected the first time it was applied.


Tugdual Grall

"Tug" is Technical Evangelist EMEA at MapR, an open source advocate and a passionate developer. He currently works with the European developer communities to ease MapR, Hadoop and NoSQL adoption.

Before joining MapR, Tug was Technical Evangelist at MongoDB and Couchbase. Tug has also worked as CTO at eXo Platform, and as Java EE product manager and software engineer at Oracle.

Tugdual also writes a blog available at http://tgrall.github.io/
Twitter: @tgrall
Email: tug@mapr.com

Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock), and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.