Dallas Big Data Science
Dallas, TX
Thursday, January 21, 2016
The Dallas Big Data Science meetup is about sharing knowledge and learning about the latest developments in statistics, machine learning, math, analytics, parallel algorithms, and distributed systems.


Crunch in R, Analyze in Spark

Will Cairns

SparkR is an R package that provides a lightweight frontend for using Apache Spark from R. The project has gained enormous traction and is being used in many production systems. Both the R and Spark communities have exceeded expectations in their numbers of contributors and online forum participants. This talk will introduce SparkR, discuss some of its features, share my own experiences with it, and highlight the power of combining R's interactive console with Spark's distributed data. SparkR provides a distributed data frame implementation that supports operations such as selection, filtering, and aggregation (similar to R data frames and dplyr), but on large datasets. SparkR also supports distributed machine learning using MLlib.


Will Cairns

Will Cairns is a Data Scientist at MapR Technologies. Prior to joining MapR, Will held engineering positions with HP/Vertica, Teleglobe Telecom, Vonage, and ITXC. In addition, Will has worked independently as a statistical consultant for clients in a number of industries, including an NYC hedge fund, a number of Silicon Valley startups, and a Las Vegas casino. His specialties include statistics, machine learning, data mining, R Project, IPython, SAS, BI, SQL, PostgreSQL, hypothesis testing, and predictive modeling. Will has a master’s degree in statistics from the New Jersey Institute of Technology. In his free time, he enjoys running, playing guitar, and golfing.