About this course
DA 440 is an introductory-level course designed for data analysts and developers. You will learn how Apache Hive fits in the Hadoop ecosystem, how to create and load tables in Hive, and how to query data using the Hive Query Language.
Are you ready?
- Familiarity with a command-line interface, such as a Unix shell
- Familiarity with relational databases (RDBMS) and SQL
- Access to, and the ability to use, a laptop with an internet connection and a terminal program installed (such as Terminal on macOS or PuTTY on Windows)
- Familiarity with Hadoop
- Completion of the on-demand course ESS 100 – Introduction to Big Data
Right for you?
- For data analysts and developers interested in the data pipeline
- For data scientists and business analysts who are familiar with SQL and want to work with data stored in HDFS
- This is a programming course; you must have some programming experience to do the exercises
Syllabus
Apache Hive Essentials
- Hive in the Hadoop Ecosystem
  - Use cases of Hive
  - Steps in the data pipeline
- Create and Load Data
  - Create databases, internal tables, external tables, and partitioned tables
  - Learn about data types and casting in Hive
  - Load data into tables and databases
- Query and Manipulate Data
  - Query, sort, and filter data
  - Manipulate data with user-defined functions