DEV 301 - Developing Hadoop Applications


About this course

This course teaches developers how to write effective MapReduce applications in Java. Each lecture is paired with a hands-on exercise, and the course uses multiple data sets, approaches, and demonstrations so that students can explore the variety of solutions the MapReduce paradigm makes possible. Every lesson and lab contributes to that primary objective. The course also covers debugging, managing jobs, improving performance, working with different data sources, managing workflows, and using other programming languages for MapReduce.

Right for you?

  • Application developers, data analysts, data architects, and database administrators

Are you ready?

Yes, if you have:
  • Taken ESS 100 – Introduction to Big Data
  • Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE
  • Basic Hadoop knowledge (helpful but not required)
  • A Linux machine, PC, or Mac with the MapR Sandbox downloaded



Lesson 1:
Introduction to Developing Hadoop Applications
  • Illustrate the MapReduce model conceptually
  • Brief history of MapReduce
  • Discuss how MapReduce works at a high level
  • Define how data flows in MapReduce
  • Hands-on Exercises
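To make the Lesson 1 data flow concrete, here is a minimal plain-Java sketch of word count that runs the map, shuffle/sort, and reduce phases in memory on a single machine. It deliberately does not use the Hadoop API (the course's own programs use the framework's Mapper, Reducer, and Job classes); the class name WordCountFlow, the Pair record, and the helper methods are hypothetical and chosen only for illustration. It assumes Java 16 or later for the record syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of MapReduce data flow; not the Hadoop API.
public class WordCountFlow {

    // A (key, value) record, standing in for the pairs passed between phases.
    record Pair(String key, int value) {}

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // split can yield "" for leading whitespace
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    // Shuffle/sort phase: group values by key; TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Pair> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return grouped;
    }

    // Reduce phase: sum all values seen for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run map over every line, then shuffle, then reduce each key group.
    static Map<String, Integer> run(List<String> lines) {
        List<Pair> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of("the quick fox", "the lazy dog"));
        System.out.println(counts); // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

On a real cluster the framework performs the shuffle across machines and feeds each reducer its keys in sorted order; in this sketch a TreeMap stands in for that grouping and sorting step.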
Lesson 2:
Job Execution Framework: MapReduce v1 and v2
  • Describe the MapReduce v1 job execution framework
  • Compare MapReduce v1 to MapReduce v2 (YARN)
  • Describe how jobs execute in YARN
  • Describe how to manage jobs in YARN
  • Hands-on Exercises
Lesson 3:
Write a MapReduce Program
  • Summary of the programming problem
  • Design and implement the Mapper class, Reducer class, and driver
  • Build and execute the code then examine the output
  • Describe the data set for the programming problem
  • Hands-on Exercises

Lesson 4:
Use the MapReduce API
  • API overview
  • Data flow for Mapper input processing and Reducer output processing
  • Explore the Mapper, Reducer, and Job class APIs
  • Hands-on Exercises
Lesson 5:
Managing, Monitoring, and Testing MapReduce Jobs
  • Work with counters
  • Use the MapR Control System (MCS) to monitor jobs
  • Use the Hadoop CLI to manage jobs
  • Display job history and logs
  • Write unit tests for MapReduce programs
  • Hands-on Exercises
Lesson 6:
Managing Performance
  • Learn the components of MapReduce performance
  • Improve the performance of your MapReduce jobs
  • Overview of MapR performance enhancements
  • Hands-on Exercises

Lesson 7:
Working With Data
  • Work with sequence files
  • Work with the distributed cache
  • Work with HBase
  • Hands-on Exercises
Lesson 8:
Launching Jobs
  • Implement programmatic job control in the driver
  • Use MapReduce chaining
  • Use Oozie to manage MapReduce workflows
  • Hands-on Exercises
Lesson 9:
Using Non-Java Programs (Streaming MapReduce)
  • Overview of the MapReduce streaming paradigm
  • Configure MapReduce streaming parameters
  • Define the programming contract for mappers and reducers
  • Monitor and debug MapReduce streaming jobs
  • Hands-on Exercises