ESS 101 – Apache Hadoop Essentials

About this course

This course introduces students to the basics of Apache Hadoop. It begins with a brief introduction to the Hadoop Distributed File System (HDFS) and MapReduce, then covers several open-source ecosystem tools such as Apache Spark, Apache Drill, and Apache Flume. Finally, it applies these tools to real-world use cases.

Prerequisites

  • ESS 100 – Introduction to Big Data

What’s Included

  • Slide guide
  • Glossary
  • This is a non-lab course.

What’s next?

ESS 102 – MapR Converged Data Platform. Together, ESS 100, ESS 101, and ESS 102 make up the ESS 1000 series, which is a prerequisite for all courses in the Administrator (ADM), Data Analyst (DA), and Developer (DEV) learning paths.


Syllabus

Lesson 3: Core Elements of Apache Hadoop
  • Compare and contrast local and distributed file systems
  • Explain data management in the Hadoop file system
  • Summarize the MapReduce programming model
Lesson 4: The Apache Hadoop Ecosystem
  • Define the following ecosystem components:
    • Administration: ZooKeeper, YARN
    • Ingestion: Flume, Oozie, Sqoop
    • Processing: Spark, HBase, Pig
    • Analysis: Hive, Drill, Mahout
Lesson 5: Solving Big Data Problems with Apache Hadoop
  • Summarize the following use cases:
    • Data Warehouse Optimization
    • Recommendation Engine
    • Large-Scale Log Analysis