DA 450 - Apache Pig Essentials



About this course

DA 450 - Apache Pig Essentials is an introductory-level course designed for data analysts and developers. The course begins with a review of data pipeline tools, then covers how to load and manipulate relations in Pig.

Are you ready?

  • Required:
    • Familiarity with a command-line interface, such as a Unix shell
    • Familiarity with relational database (RDBMS) tools and SQL
    • Access to a laptop with an internet connection and a terminal program installed (such as Terminal on a Mac, or PuTTY on Windows), and the ability to use it.
  • Recommended:

Right for you?

  • For data analysts and developers who work with data pipelines
  • For data scientists and business analysts who are familiar with SQL and want to work with data stored in HDFS
  • This is a programming course; you must have some programming experience to complete the exercises

What's next?


Apache Pig Essentials
  • Pig in the Hadoop Ecosystem
    • Use cases of Pig
    • Steps in the data pipeline
  • Extract, Transform, and Load Data
    • Load data into relations
    • Debug Pig scripts
    • Perform simple manipulations
    • Save relations as files
  • Manipulate Data
    • Subset relations
    • Combine relations
    • Use UDFs on relations


Related Resources

MapR Sandbox with Drill


Advice from the front.

Other Resources

Pig Documentation

Apache Pig Website


On-demand Training
DA 410 - Apache Drill Essentials

DA 440 - Apache Hive Essentials

Instructor-led Training
DA 4000 - Apache Drill