Hive and Pig Hands-On Training

Duration

2 days

Delivery

  • Instructor-led Training
  • Students must provide their own workstation

Target Audience

Developers with basic Hadoop knowledge

Course Syllabus

Day 1

  • HIVE
  • Overview of Hadoop
    • Big Data
    • Distributed File System
    • MapReduce
  • Hive Introduction
    • Why Hive?
    • Comparison with SQL
    • Use Cases
  • Hive Architecture – Building Blocks
  • Hive CLI and Language (Exercise)
    • HDFS Shell
    • Hive CLI
    • Data Types
    • Hive Cheat Sheet
    • Data Definition Statements
    • Data Manipulation Statements
    • Select, Views, GroupBy, SortBy/DistributeBy/ClusterBy/OrderBy, Joins
    • Built-in Functions
    • Union, Sub Queries, Sampling, Explain
  • Hive Use Case Implementation (Exercise)
    • Use Case 1
    • Use Case 2
  • Best Practices
  • Advanced Features
    • Transform and Map-Reduce Scripts
    • Custom UDF
    • UDTF
    • SerDe
  • Recap and Q&A

Day 2

  • PIG
  • Pig Introduction
    • Pig's position in the Hadoop ecosystem
    • Why Pig and not MapReduce?
    • Simple example (slides) comparing Pig and MapReduce
    • Who uses Pig today and what the main use cases are
  • Pig Architecture
    • High-level components of Pig
  • Pig Grunt
  • Pig Latin Programming
    • Data Types
    • Cheat sheet
    • Schema
    • Expressions
    • Commands and Exercises
    • Load, Store, Dump, Relational Operations, Foreach, Filter, Group, Order By, Distinct, Join, Cogroup, Union, Cross, Limit, Sample, Parallel
  • Use Cases (working exercise)
    • Use Case 1
    • Use Case 2
    • Use Case 3 (compare Pig and Hive)
  • Advanced Features, UDFs
  • Best Practices and Common Pitfalls
  • Recap and Q&A