Apache Spark Tour: PhillyDB
Philadelphia, PA
Tuesday, June 24, 2014

PhillyDB members are database developers, data architects, DBAs, analysts, and application developers who meet to learn about enterprise systems (Oracle, SQL Server, DB2), open source systems (MySQL, PostgreSQL, SQLite), big data management systems (Hadoop, MPP, NoSQL, NewSQL, and real-time SQL databases), and analytics tools.


Apache Spark

Keys Botzum


Hadoop disrupted decades of data management practices and technologies by introducing an open source, massively parallel processing framework. The Hadoop community and the component ecosystem it has developed have been an unqualified success. The widely anticipated Apache Spark project is the newest addition to that ecosystem.

The Spark software stack includes:

  • Spark - the core data-processing engine
  • Shark - interface for interactive querying
  • Spark Streaming - for streaming data analysis
  • MLlib - for machine learning
  • GraphX - for graph analysis

Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk will introduce the Spark stack, explain how Spark achieves lightning-fast results, and show how it complements your existing Apache Hadoop investment.


Keys Botzum

Keys is a Senior Principal Technologist with MapR Technologies, where he wears many hats. His primary responsibility is interacting with customers in the field, but he also teaches classes, contributes to documentation, and works with engineering teams. He has over 15 years of experience in large-scale distributed system design. Previously, he was a Senior Technical Staff Member with IBM and a respected author of a book and many articles on the WebSphere Application Server. When not wearing one of his MapR hats, Keys enjoys spending time with friends and family and getting outside to play tennis and hike.