About this course
DA 450 - Apache Pig Essentials is an introductory-level course designed for data analysts and developers. The course begins with a review of data pipeline tools, then covers how to load and manipulate relations in Pig.
Are you ready?
- Familiarity with a command-line interface, such as a Unix shell
- Familiarity with relational databases and query languages, such as SQL
- Access to, and the ability to use, a laptop with an internet connection and a terminal program installed (such as Terminal on a Mac or PuTTY on Windows)
- Familiarity with Hadoop
- Completion of the on-demand course ESS 100 – Introduction to Big Data
Right for you?
- For data analysts and developers interested in the data pipeline
- For data scientists and business analysts who are familiar with SQL and want to work with data stored in HDFS
- This is a programming course; you must have some programming experience to do the exercises
Syllabus: Apache Pig Essentials
- Pig in the Hadoop Ecosystem
  - Use cases of Pig
  - Steps in the data pipeline
- Extract, Transform, and Load Data
  - Load data into relations
  - Debug Pig scripts
  - Perform simple manipulations
  - Save relations as files
- Manipulate Data
  - Subset relations
  - Combine relations
  - Use UDFs on relations
Resources
- MapR Sandbox with Drill
- Advice from the front
- Apache Pig Website