DEV 362 - Create Data Pipeline Applications Using Apache Spark


About this course

This course is the third in the Apache Spark series. It covers the following Apache Spark libraries: Spark Streaming, Spark SQL, Spark MLlib, and Spark GraphX. It describes the benefits of the Apache Spark unified platform and shows how to build a data pipeline application using these libraries. The concepts are taught using scenarios in Scala that also form the basis of the hands-on labs.

Right for you?

  • For application developers

Prerequisites for success in the course:

  • Required
    • DEV 361 Build and Monitor Apache Spark Applications
    • Basic to intermediate Linux knowledge, including:
      • The ability to use a text editor, such as vi
      • Familiarity with basic command-line tools such as mv, cp, ssh, grep, cd, and useradd
    • Knowledge of application development principles
    • A Linux, Windows, or Mac OS computer with the MapR Sandbox installed (for the on-demand course)
    • Connection to a Hadoop cluster via SSH and web browser (for the ILT and vILT courses)
  • Recommended


This course helps prepare you for the MCSD – MapR Certified Spark Developer certification exam.


Lesson 7:
Introduction to Apache Spark Data Pipelines
  • Identify the components of the Apache Spark unified stack
  • List the benefits of Apache Spark over the Hadoop ecosystem
  • Describe data pipeline use cases
Lesson 8:
Create an Apache Spark Streaming Application
  • Describe Spark Streaming architecture
  • Create DStreams and a Spark Streaming application
  • Lab: Build and run a Streaming application that writes to HBase
  • Apply operations on DStreams
  • Define window operations
  • Labs:
    • Build and run a Streaming application with SQL
    • Build and run a Streaming application with window operations and SQL
  • Describe how Streaming applications are fault-tolerant
Lesson 9:
Use Apache Spark GraphX
  • Describe GraphX
  • Define regular, directed, and property graphs
  • Create a property graph
  • Perform operations on graphs
  • Labs:
    • Create a property graph
    • Apply graph operations
Lesson 10:
Use Apache Spark MLlib
  • Describe Spark MLlib
  • Describe the following machine learning techniques:
    • Classification
    • Clustering
    • Collaborative filtering
  • Use collaborative filtering to predict user choice
  • Labs:
    • Load and inspect data using the Spark shell
    • Use Spark MLlib to make movie recommendations