DEV 3000 - Developing Hadoop Applications


About this course

This course teaches developers how to write effective MapReduce applications. Every lecture is accompanied by a hands-on exercise. The course uses multiple data sets, approaches, and demonstrations so that students can explore the variety of solutions available in the MapReduce paradigm. The primary objective is to understand how to write effective MapReduce applications in Java, and every lesson and lab contributes to that objective. The course also covers debugging, managing jobs, improving performance, working with different data sources, managing workflows, and using other programming languages for MapReduce.

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide if you are ready to succeed in this course. If you have not met the prerequisites, you may fall behind in the lab work, and the instructor will not have time to provide individual assistance.

  • Required:
    • The ability to use a text editor, such as vi
    • Basic to intermediate Linux knowledge, including familiarity with basic commands such as mv, cp, ssh, grep, cd, and useradd
    • Access to, and the ability to use, a laptop with a browser and a terminal program installed (such as Terminal on the Mac or PuTTY on Windows)
  • Recommended:
  • Optional: Basic Hadoop knowledge

Right for you?

  • For application developers


This course prepares you for the MapR Certified Hadoop Developer (MCHD) certification exam.


Included in this 3-day course are
  • Slide Guide PDF
  • Lab Guide PDF
  • Lab Code
Day 1
  • Lesson 1: Introduction to Developing Hadoop Applications
    • Illustrate the MapReduce model conceptually
    • Brief history of MapReduce
    • Discuss how MapReduce works at a high level
    • Define how data flows in MapReduce
    • Hands-on exercises
  • Lesson 2: Job Execution Framework MapReduce v1 & v2
    • Describe the MapReduce v1 job execution framework
    • Compare MapReduce v1 to MapReduce v2 (YARN)
    • Describe how jobs execute in YARN
    • Describe how to manage jobs in YARN
    • Hands-on exercises
  • Lesson 3: Write a MapReduce Program
    • Summary of the programming problem
    • Design and implement the Mapper class, Reducer class and driver
    • Build and execute the code, then examine the output
    • Describe the data set for the programming problem
    • Hands-on exercises
Day 2
  • Lesson 4: Use the MapReduce API
    • API overview
    • Mapper input processing and Reducer output processing data flow
    • Explore the Mapper, Reducer and Job class API
    • Hands-on exercises
  • Lesson 5: Managing, Monitoring, and Testing MapReduce Jobs
    • Work with counters
    • Use the MCS to monitor jobs
    • Use the Hadoop CLI to manage jobs
    • Display job history and logs
    • Write unit tests for MapReduce programs
    • Hands-on exercises
  • Lesson 6: Characterizing and Improving MapReduce Job Performance
    • Learn components of MapReduce performance
    • Enhance performance in your MapReduce jobs
    • Overview of MapR performance enhancements
    • Hands-on exercises
Day 3
  • Lesson 7: Working with Different Data Sources in MapReduce
    • Work with sequence files
    • Work with the distributed cache
    • Work with HBase
    • Hands-on exercises
  • Lesson 8: Managing Multiple MapReduce Jobs
    • Different approaches to launching multiple MapReduce jobs
    • Implement programmatic job control in the driver
    • Use MapReduce chaining
    • Use Oozie to manage MapReduce workflows
    • Hands-on exercises
  • Lesson 9: Using MapReduce Streaming
    • Overview of the MapReduce streaming paradigm
    • Configure MapReduce streaming parameters
    • Define the programming contract for mappers and reducers
    • Monitor and debug MapReduce streaming jobs
    • Hands-on exercises