DEV 361 - Build and Monitor Apache Spark Applications


About this course

This course is the second in the Apache Spark series. You will learn to create and modify pair RDDs, perform aggregations, and control the layout of pair RDDs across nodes with data partitioning. The course also covers Spark SQL and DataFrames, the programming abstraction of Spark SQL. You will learn the different ways to load data into DataFrames; perform operations on DataFrames using DataFrame functions, actions, and language-integrated queries; and create and use user-defined functions (UDFs) with DataFrames. Finally, the course describes the components of the Spark execution model and uses the Spark Web UI to monitor Spark applications. Concepts are taught through Scala scenarios that also form the basis of the hands-on labs. Lab solutions are provided in Scala and Python.

Right for you?

  • For application developers

Prerequisites for success in the course:

  • Required
    • DEV 360 - Apache Spark Essentials
    • Basic to intermediate Linux knowledge, including:
      • The ability to use a text editor, such as vi
      • Familiarity with basic command-line tools such as mv, cp, ssh, grep, cd, useradd
    • Knowledge of application development principles
    • A Linux, Windows, or macOS computer with the MapR Sandbox installed (on-demand course)
    • A connection to a Hadoop cluster via SSH and a web browser (ILT and vILT courses)

What’s next?


This course helps prepare you for the MCSD – MapR Certified Spark Developer certification exam.


Lesson 4:
Work with Pair RDDs
  • Review loading and exploring data in RDDs
  • Lab: Load and explore data in RDDs
  • Describe and create pair RDDs
  • Lab: Create and explore pair RDDs
  • Control partitioning across nodes
  • Lab: Explore partitioning
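The two core ideas in this lesson — per-key aggregation and hash partitioning — can be pictured with a small plain-Python sketch. This is a conceptual model only, not the Spark API; the function names (`hash_partition`, `reduce_by_key`) are illustrative:

```python
def hash_partition(pairs, num_partitions):
    """Mimic Spark's HashPartitioner: a key lands in partition hash(key) % numPartitions."""
    parts = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        parts[hash(k) % num_partitions].append((k, v))
    return parts

def reduce_by_key(pairs, func, num_partitions=2):
    """Mimic reduceByKey: combine values per key locally in each
    partition, then merge the partial results across partitions."""
    merged = {}
    for part in hash_partition(pairs, num_partitions):
        local = {}
        for k, v in part:           # map-side (local) combine
            local[k] = func(local[k], v) if k in local else v
        for k, v in local.items():  # merge after the shuffle
            merged[k] = func(merged[k], v) if k in merged else v
    return merged

pairs = [("spark", 1), ("hadoop", 1), ("spark", 1)]
counts = reduce_by_key(pairs, lambda a, b: a + b)
# counts == {"spark": 2, "hadoop": 1}
```

Because all pairs with the same key hash to the same partition, the local combine plus the post-shuffle merge always produce the same totals regardless of how many partitions are used — which is why controlling partitioning changes data layout, not results.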
Lesson 5:
Work with DataFrames
  • Create DataFrames
    • From an existing RDD
    • From data sources
  • Lab: Create DataFrames using reflection
  • Work with data in DataFrames
    • Use DataFrame operations
    • Use SQL
  • Lab: Explore data in DataFrames
  • Create user-defined functions (UDFs)
    • UDFs used with the Scala DSL
    • UDFs used with SQL
  • Lab: Create and use user-defined functions
  • Repartition DataFrames
  • Lab: Build a standalone application
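The DataFrame operations in this lesson — selecting columns, filtering rows, and applying a UDF — can be sketched with a tiny plain-Python model of rows as dictionaries. This illustrates the semantics only; it is not the Spark DataFrame API, and the helper names are invented for the sketch:

```python
# Rows modeled as dicts, like a DataFrame built from a case class or schema.
rows = [
    {"name": "alice", "age": 34},
    {"name": "bob",   "age": 19},
]

def select(rows, *cols):
    """Like df.select("name"): keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]

def where(rows, pred):
    """Like df.filter(...): keep only rows matching the predicate."""
    return [r for r in rows if pred(r)]

def with_column(rows, name, udf, *cols):
    """Like df.withColumn(name, udf(col)): add a column computed
    by a user-defined function over existing columns."""
    return [{**r, name: udf(*(r[c] for c in cols))} for r in rows]

adults = select(where(rows, lambda r: r["age"] >= 21), "name")
# adults == [{"name": "alice"}]
upper = with_column(rows, "NAME", str.upper, "name")
# upper[0]["NAME"] == "ALICE"
```

The same chain could be expressed as a SQL query over a registered table — the language-integrated (DSL) and SQL forms are two front ends to the same operations.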
Lesson 6:
Monitor Apache Spark Applications
  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Lab: Use the Spark Web UI
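The execution-model idea behind this lesson — an action triggers a job, the job is split into stages at shuffle boundaries, and each stage runs one task per partition — can be sketched in plain Python. This is a conceptual model of what the Spark Web UI displays, not Spark internals; the lineage below is a made-up word-count pipeline:

```python
# A job's lineage: transformations flagged as narrow (map, filter)
# or wide (reduceByKey, join). A wide dependency needs a shuffle,
# and the shuffle starts a new stage.
lineage = [
    ("textFile",       "narrow"),
    ("flatMap",        "narrow"),
    ("map",            "narrow"),
    ("reduceByKey",    "wide"),   # shuffle boundary
    ("saveAsTextFile", "narrow"),
]

def split_into_stages(lineage):
    """Cut the lineage into stages at each wide (shuffle) dependency."""
    stages, current = [], []
    for op, dep in lineage:
        if dep == "wide" and current:
            stages.append(current)  # close the stage before the shuffle
            current = []
        current.append(op)
    stages.append(current)
    return stages

stages = split_into_stages(lineage)
# stages == [["textFile", "flatMap", "map"], ["reduceByKey", "saveAsTextFile"]]

num_partitions = 4
tasks_per_stage = [num_partitions] * len(stages)  # one task per partition
```

Reading the Spark Web UI with this model in mind: the Jobs page shows one entry per action, the Stages page shows the cuts made at shuffle boundaries, and each stage's task count matches the partitioning — which is why skewed or excessive partitions show up directly in task timelines when tuning.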