Spark Summit 2015
San Francisco, CA
Monday, June 15, 2015
Tuesday, June 16, 2015
The Spark Summit brings together the Apache Spark community to hear from leading production users of Spark, Spark SQL, Spark Streaming and related projects; find out where project development is heading; and learn how to use the Spark stack in a variety of applications. MapR is proud to be a Gold Sponsor of Spark Summit 2015.


Spark & Hadoop at Production Scale

Anil Gadre

June 15, 2015 at 10:35am

How are leading companies deploying Spark with Hadoop in production? What insights have they gained, and what key considerations should you keep in mind to put your innovative Spark-based app to work faster? Hear real-life customer examples of turning data into action using Spark and Hadoop, and learn how advanced users are deploying Hadoop and Spark applications on a single cluster with better reliability and performance at production scale.

Some Important Streaming Algorithms You Should Know About

Ted Dunning

June 16, 2015 at 4:00pm

Streaming algorithms are becoming extremely important as people push more and more toward real-time processing. Some of these algorithms are reasonably well known, such as k-min counters or HyperLogLog. There are, however, other important newer algorithms, such as t-digest and streaming k-means. I will survey these and other streaming algorithms in an approachable but sound presentation. I will pay particular attention to the newer algorithms, including t-digest, which allows extremely accurate quantile computation; streaming k-means, which allows accurate clustering with exactly one pass over the data and nearly bounded storage; and truly real-time collaborative filtering.
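As a rough illustration of the simplest algorithm named above, here is a minimal Python sketch of a k-minimum-values (k-min) distinct counter. It is not taken from the talk; the class name KMinValues, the use of SHA-1 as the hash, and the default k=1024 are assumptions chosen for illustration. Each item is hashed to a roughly uniform value in [0, 1), only the k smallest distinct hash values are kept, and the number of distinct items is estimated as (k - 1) divided by the k-th smallest value.

```python
import hashlib
import heapq


class KMinValues:
    """Hypothetical k-minimum-values sketch for approximate distinct counting."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap (stored negated) of the k smallest hash values
        self._members = set()  # the same hash values, for fast duplicate checks

    @staticmethod
    def _hash(item):
        # Assumption: SHA-1 truncated to 64 bits, scaled into [0, 1).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def add(self, item):
        h = self._hash(item)
        if h in self._members:
            return  # repeated item: it cannot change the set of minima
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New value beats the current k-th smallest: swap it in, evict the largest.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct items: exact count
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


sketch = KMinValues(k=1024)
for i in range(100_000):
    sketch.add(i % 20_000)  # 20,000 distinct values, each seen 5 times
print(round(sketch.estimate()))
```

Memory stays at k hash values however many items stream past; with k = 1024 the relative standard error is roughly 1/sqrt(k - 2), about 3%.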

Adding Complex Data to Spark Stack

Neeraja Rentachintala

June 16, 2015 at 5:30pm

This session discusses the latest integration between Apache Drill and Spark. The combination allows Spark users to leverage Drill's flexible schema and dynamic schema discovery capabilities to query and work with complex data directly, using familiar Spark programming paradigms.


Anil Gadre

Anil Gadre is the SVP of Product Management at MapR. Prior to MapR, Anil was the EVP of Product Management at Silver Spring Networks, responsible for product strategy, planning and marketing of networking and software products focused on the Smart Grid for the energy industry. Before that, Anil was with Sun Microsystems, a Fortune 200 technology leader, serving as EVP of the Application Platform Software organization; he had previously been the Chief Marketing Officer, leading global branding, demand creation and an extensive developer ecosystem program. At Sun Microsystems his experience covered diverse product lines, ranging from networked desktop and enterprise server systems to market-leading software products such as the Solaris operating system, Java, the MySQL database and various middleware products. He has a BSEE from Stanford University and an MM degree from the Kellogg School at Northwestern University.

Ted Dunning

Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (later purchased by LifeLock), and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he's not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Neeraja Rentachintala

As Director of Product Management, Neeraja is responsible for the product strategy, roadmap and requirements of MapR's SQL initiatives. Prior to MapR, Neeraja held numerous product management and engineering roles at Informatica, Microsoft SQL Server and Oracle, most recently as the principal product manager for Informatica Data Services/Data Virtualization.