DEV 3200 - HBase Applications Design and Build

About this course

Learn how to architect and write HBase programs using Hadoop as a distributed NoSQL datastore. This course introduces HBase architecture, the HBase data model, and the most important APIs for writing programs. The course also introduces schema design, performance tuning, bulk-loading of data, and storing complex data structures.

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide whether you are ready to succeed in this programming-oriented course. The instructor will move through the lab exercises assuming that you have mastered the skills listed below.

  • Required:
    • Basic Linux knowledge, including familiarity with basic commands such as mv, cp, cd, ls, ssh, and scp
    • Access to, and the ability to use, a laptop with a terminal program installed (such as Terminal on the Mac, or PuTTY and WinSCP on Windows)
    • Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE such as Eclipse
  • Optional: Basic Hadoop and database knowledge

Right for you?

  • For developers interested in designing and developing HBase applications
  • This is a programming course; you must have Java programming experience to do the exercises.

Certification

This course prepares you for the MapR Certified HBase Developer (MCHBD) certification exam.

Syllabus

Included in this 3-day course are:

  • Access to a multi-node Amazon Web Services (AWS) cluster
  • Slide Guide (PDF)
  • Lab Guide (PDF)
Day 1
  • Introduction to HBase
    • Differentiate between RDBMS and HBase
    • Identify typical HBase use cases
  • HBase Data Model
    • Describe the HBase data model and data model components
    • Describe how the logical data model maps to physical storage on disk
    • Use data model operations
    • Create an HBase table
  • HBase Architecture
    • Identify the components of an HBase cluster
    • Describe how the HBase components work together
    • Describe how regions work and their benefits
    • Define the function of minor and major compactions
    • Describe Region Server splits
    • Describe how HBase handles fault tolerance
    • Differentiate MapR-DB from HBase
  • Basic Schema Design
    • List the elements of schema design
    • Design row keys for data access patterns (see the row-key sketch following the Day 1 outline)
    • Design table shape and column families for data access patterns
    • Define column family properties
    • Design a schema for a given scenario
  • Design Schemas for Complex Data Structures
    • Transition from relational model to HBase
    • Use intelligent keys
    • Use secondary indexes or lookup tables
    • Design for other complex data structures
    • Evolve schemas over time
  • Using Hive to Query HBase
    • Use Hive to query HBase/MapR tables
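
Referenced from the row key design topic above: much of Day 1's schema design material turns on choosing row keys that match your read patterns, since HBase sorts rows and scans ranges by key. The following is a minimal, hypothetical sketch (class, method, and field names are illustrative, not from the course materials) of one common pattern, a composite key built from an entity id plus a reversed timestamp so the newest rows for that entity sort first and can be read with a short scan.

    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeySketch {
        // Composite row key: <userId>|<Long.MAX_VALUE - timestamp>.
        // Reversing the timestamp makes the newest events for a user sort first,
        // so "latest N events for user X" becomes a short prefix scan.
        public static byte[] eventKey(String userId, long eventTimeMillis) {
            long reversedTs = Long.MAX_VALUE - eventTimeMillis;
            return Bytes.add(Bytes.toBytes(userId),
                             Bytes.toBytes("|"),
                             Bytes.toBytes(reversedTs));
        }

        public static void main(String[] args) {
            byte[] key = eventKey("user42", System.currentTimeMillis());
            System.out.println(Bytes.toStringBinary(key));
        }
    }

A key like this trades write hot-spotting (all writes for one user land in one region) against fast per-user reads; salting or hashing the leading component is the usual counter-measure when write load is heavily skewed, a trade-off the schema design modules explore.
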
Day 2
  • Java Client API Part 1
    • Define the CRUD operations in the HBase Java API and discuss when and how to use them
    • Get, Put, Delete, Scan
    • Describe the data flow between Client and Server when using these APIs
    • Define the various helper classes for these APIs: KeyValue, Result, ResultScanner (Scan)
    • Lab on Java Client API Get, Put, Delete, Scan: Use these APIs to create an application (see the client API sketch following the Day 2 outline)
  • Java API Part 2
    • Client-side write buffer
    • HTable batch operations
    • checkAndPut: atomic put operation
    • KeyValue and Result objects
    • Lab on Java Client API HTable Batch, checkAndPut
    • Use HTable Batch APIs in an application
    • Use HTable checkAndPut APIs for row transactions in an application
  • Java Client API for Administrative Features
    • HTableDescriptor
    • HColumnDescriptor
    • HBaseAdmin
    • Lab: Create Tables and Define Properties using the HBaseAdmin Java interface
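
Tying the Day 2 modules together, below is a minimal sketch against the classic (pre-1.0) HBase Java client API used in this course; the table name, column family, and values are placeholders rather than official lab code. It creates a table with HBaseAdmin, then exercises Put, Get, checkAndPut, and Scan.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ClientApiSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            byte[] cf = Bytes.toBytes("d");                // assumed column family

            // Administrative API: create the table if it does not exist yet.
            HBaseAdmin admin = new HBaseAdmin(conf);
            if (!admin.tableExists("demo_events")) {       // placeholder table name
                HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("demo_events"));
                desc.addFamily(new HColumnDescriptor(cf));
                admin.createTable(desc);
            }
            admin.close();

            HTable table = new HTable(conf, "demo_events");
            try {
                byte[] row = Bytes.toBytes("user42|100");

                // Put: write one cell.
                Put put = new Put(row);
                put.add(cf, Bytes.toBytes("clicks"), Bytes.toBytes("7"));
                table.put(put);

                // Get: read the cell back through a Result object.
                Result result = table.get(new Get(row));
                System.out.println(Bytes.toString(result.getValue(cf, Bytes.toBytes("clicks"))));

                // checkAndPut: apply the new Put only if the current value still matches.
                Put update = new Put(row);
                update.add(cf, Bytes.toBytes("clicks"), Bytes.toBytes("8"));
                boolean applied = table.checkAndPut(row, cf, Bytes.toBytes("clicks"),
                        Bytes.toBytes("7"), update);
                System.out.println("checkAndPut applied: " + applied);

                // Scan: iterate a row-key range with a ResultScanner.
                Scan scan = new Scan(Bytes.toBytes("user42"), Bytes.toBytes("user43"));
                ResultScanner scanner = table.getScanner(scan);
                try {
                    for (Result r : scanner) {
                        System.out.println(Bytes.toStringBinary(r.getRow()));
                    }
                } finally {
                    scanner.close();
                }
            } finally {
                table.close();
            }
        }
    }

The same HTable interface also exposes Delete and batch operations, which the write-buffer and batch lab builds on.
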
Day 3
  • Advanced HBase Java API
    • Filters
    • Counters
    • Lab: Filters and Counters (see the sketch following the Day 3 outline)
    • Use filters in an application
    • Use counter increment for row transactions in an application
  • Time Series Application with Flat-Wide and Tall-Narrow Implementations
    • Explanation of time series application implementation
    • Lab: Programming a Time Series Application
  • MapReduce on HBase
    • How is MapReduce used on HBase?
    • How to program MapReduce applications for HBase
    • Lab: Reading from HBase and Writing Back Daily Statistics
  • Social Application
    • Explanation of social application implementation
    • Lab: Programming a Social Application
  • Bulk Loading of Data
    • Using the importtsv bulk load tool
    • Use a MapReduce job to import data
    • Lab: Using importtsv and MapReduce to Load from a File into HBase
  • Performance
    • Performance considerations
    • Monitoring
    • Benchmarking
    • Lab: YCSB Benchmarking
  • Security
    • Authentication, authorization, auditing, encryption
    • Access Control Expressions, roles, permissions
    • Lab: Table Authorization
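
As referenced from the Advanced HBase Java API module, here is a minimal, hypothetical sketch (table name and column family are placeholders) of the two features that module covers: a server-side counter increment and a scan restricted by a PrefixFilter.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilterCounterSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "demo_events");   // placeholder table name
            try {
                byte[] cf = Bytes.toBytes("d");               // assumed column family

                // Counter: the increment is applied atomically on the region server,
                // so concurrent clients do not lose updates.
                long total = table.incrementColumnValue(
                        Bytes.toBytes("user42|100"), cf, Bytes.toBytes("pageViews"), 1L);
                System.out.println("pageViews = " + total);

                // Filter: a PrefixFilter pushes row selection to the region servers,
                // so only matching rows travel back to the client.
                Scan scan = new Scan();
                scan.setFilter(new PrefixFilter(Bytes.toBytes("user42")));
                ResultScanner scanner = table.getScanner(scan);
                try {
                    for (Result r : scanner) {
                        System.out.println(Bytes.toStringBinary(r.getRow()));
                    }
                } finally {
                    scanner.close();
                }
            } finally {
                table.close();
            }
        }
    }

Counter cells are stored as 8-byte longs, so a normal Get plus Bytes.toLong reads the current value back.
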

 

Related Resources

  • Sandbox: MapR Sandbox for Hadoop
  • Blog: Advice from the front
  • Prerequisite (on-demand training): ESS 100 – Introduction to Big Data