DA 450 - Apache Pig Essentials

About this course

DA 450 - Apache Pig Essentials is an introductory-level course designed for data analysts and developers. The course begins with a review of data pipeline tools, then covers how to load and manipulate relations in Pig.

Are you ready?

  • Required:
    • Familiarity with a command-line interface, such as a Unix shell
    • Familiarity with relational databases (RDBMS) and SQL
    • Access to, and the ability to use, a laptop with an internet connection and a terminal program installed (such as Terminal on macOS or PuTTY on Windows)

Right for you?

  • For data analysts and developers interested in the data pipeline
  • For data scientists and business analysts who are familiar with SQL and want to work with data stored in HDFS
  • This is a programming course; you must have some programming experience to do the exercises

Syllabus

Apache Pig Essentials
  • Pig in the Hadoop Ecosystem
    • Use cases of Pig
    • Steps in the data pipeline
  • Extract, Transform, and Load Data
    • Load data into relations
    • Debug Pig scripts
    • Perform simple manipulations
    • Save relations as files
  • Manipulate Data
    • Subset relations
    • Combine relations
    • Use UDFs on relations

Related Resources

  • Sandbox: MapR Sandbox with Drill
  • Blog: Advice from the front

Other Resources

  • Pig Documentation
  • Apache Pig Website

You may also like

  • On-demand Training:
    • DA 410 - Apache Drill Essentials
    • DA 440 - Apache Hive Essentials
  • Instructor-led Training:
    • DA 4000 - Apache Drill