DA 450 - Apache Pig Essentials


About this course

DA 450 - Apache Pig Essentials is an introductory-level course designed for data analysts and developers. The course begins with a review of data pipeline tools, then covers how to load and manipulate relations in Pig.

Are you ready?

  • Required:
    • Familiarity with a command-line interface, such as a Unix shell
    • Familiarity with relational databases (RDBMS) and SQL
    • Access to a laptop with an internet connection and a terminal program installed (such as Terminal on macOS or PuTTY on Windows), and the ability to use it
  • Recommended:

Right for you?

  • For data analysts and developers interested in the data pipeline
  • For data scientists and business analysts who are familiar with SQL and want to work with data stored in HDFS
  • This is a programming course; you need some programming experience to complete the exercises

What's next?


This course prepares you for the MapR Certified Data Analyst (MCDA) certification exam.


Lesson 1:
Pig in the Hadoop Ecosystem
  • Use cases of Pig
  • Steps in the data pipeline
Lesson 2:
Extract, Transform, and Load Data
  • Load data into relations
  • Debug Pig scripts
  • Perform simple manipulations
  • Save relations as files
Lesson 3:
Manipulate Data
  • Subset relations
  • Combine relations
  • Use UDFs on relations