DEV 362 - Create Data Pipeline Applications Using Apache Spark

About this course

This course is the third in the Apache Spark series. It covers four Apache Spark libraries: Spark Streaming, Spark SQL, Spark MLlib, and Spark GraphX. It describes the benefits of the Apache Spark unified platform and shows how to build a data pipeline application with these libraries. The concepts are taught through scenarios in Scala, which also form the basis of the hands-on labs.

Right for you?

  • For application developers

Prerequisites for success in the course:

  • Required
    • DEV 361 Build and Monitor Apache Spark Applications
    • Basic to intermediate Linux knowledge, including:
      • The ability to use a text editor, such as vi
      • Familiarity with basic command-line commands such as mv, cp, ssh, grep, cd, useradd
    • Knowledge of application development principles
    • A Linux, Windows, or Mac OS computer with the MapR Sandbox installed (On-demand course)
    • Connection to a Hadoop cluster via SSH and web browser (for the ILT and vILT course)
  • Recommended

Certification

Syllabus

Lesson 7 – Introduction to Apache Spark Data Pipelines

  • Identify the components of the Apache Spark Unified Stack
  • List the benefits of Apache Spark over the Hadoop ecosystem
  • Describe data pipeline use cases

Lesson 8 – Create an Apache Spark Streaming Application

  • Describe Spark Streaming architecture
  • Create DStreams and a Spark Streaming application
    • Lab: Build and run a Streaming application which writes to HBase
  • Apply operations on DStreams
  • Define window operations
  • Labs:
    • Build and run a Streaming application with SQL
    • Build and run a Streaming application with Windows and SQL
  • Describe how Streaming applications are fault-tolerant

Lesson 9 – Use Apache Spark GraphX

  • Describe GraphX
  • Define regular, directed, and property graphs
  • Create a property graph
  • Perform operations on graphs
  • Labs:
    • Create a property graph
    • Apply graph operations

Lesson 10 – Use Apache Spark MLlib

  • Describe Spark MLlib
  • Describe machine learning techniques
    • Classification
    • Clustering
    • Collaborative filtering
  • Use collaborative filtering to predict user choice
  • Labs:
    • Load and inspect data using the Spark shell
    • Use Spark MLlib to make movie recommendations

 

