Real-Time Profiles with Spark and Python

An appealing aspect of Spark for handling application workloads is its support for Python as a first-class citizen. This is especially useful for quickly aggregating and summarizing large datasets and for generating features for machine learning models. Join this 30-minute Free Code Friday with Nick Amato, Director of Technical Marketing for MapR. He'll show you how to:

  • Take an example dataset of individual customer behaviors,
  • Process the dataset in Spark with Python, and
  • Extract meaningful statistics and features for analysis