Become a Big Data Expert with Free, Comprehensive
Hadoop Online Training Courses
Hadoop On-Demand Training offers full-length courses on a range of Hadoop technologies for developers, data analysts, and administrators. These self-paced courses fit your schedule and availability, and will lead you on the path to becoming a certified Hadoop professional. Read our FAQ for more details.
- Free 24x7x365 access. Take online Hadoop courses anytime, from anywhere in the world. Refresh your knowledge whenever or wherever you want.
- In-depth online Hadoop curriculum covers MapReduce, HBase, Hive, and Apache Drill. Learn with interactive labs and quizzes.
- Get certified as a Hadoop expert. Add value to your current and future employers – and put your newly acquired skills into action right away.
- Stay at the forefront of Hadoop and big data. Maintain your competitive edge in a data-hungry world.

ESS 100 - Introduction to Big Data
This course introduces students to the basics of big data. Students will learn about big data concepts and how different tools and roles can help solve real-world big data problems.
This course is designed to introduce students to the basics of Apache Hadoop. The course begins with a brief introduction to the Hadoop Distributed File System and MapReduce, then covers several open source ecosystem tools such as Apache Spark, Apache Drill, and Apache Flume. Finally, these tools are applied to real-world use cases.
This course is designed to introduce students to the basics of the MapR Converged Data Platform. At the end of the course, students will have the foundational knowledge needed for other MapR Academy courses.
This is the first course in the Cluster Administration curriculum. This course covers pre-installation testing and verification, installing a MapR cluster, and performing post-installation benchmarking.
ADM 201 is the second course in the Cluster Administration curriculum. This course covers how to configure the cluster’s storage resources once the cluster has been installed.
ADM 202 is the third course in the Cluster Administration curriculum. This course defines methods for data ingestion, and covers the use of snapshots and mirrors.
This is the fourth and final course in the Cluster Administration curriculum. This course teaches you how to configure cluster settings, monitor the cluster, resolve issues, and optimize cluster performance.
This course teaches developers, with lectures and hands-on lab exercises, how to write Hadoop Applications using MapReduce and YARN in Java. The course extensively covers MapReduce programming, debugging, managing jobs, improving performance, working with custom data, managing workflows, and using other programming languages for MapReduce.
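The map-shuffle-reduce flow that this curriculum is built around can be sketched in a few lines of plain Python. This is a model of the programming paradigm only, not the Hadoop Java API the course actually uses; the function names here are illustrative:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does automatically between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 2
```

In a real MapReduce job, the map and reduce functions run in parallel across cluster nodes and the shuffle moves data over the network; the logic per phase, however, follows this shape.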
This course is intended for data analysts, data architects and application developers. DEV 320 provides you with a thorough understanding of the HBase data model and architecture, which is required before going on to designing HBase schemas and developing HBase applications.
Targeted towards data analysts, data architects and application developers, the goal of this course is to enable you to design HBase schemas based on design guidelines. You will learn about the various elements of schema design and how to design for data access patterns. The course offers an in-depth look at designing row keys, avoiding hot-spotting and designing column families. It discusses how to transition from a relational model to an HBase model. You will learn the differences between tall tables and wide tables. Concepts are conveyed through lectures, hands-on labs and analysis of scenarios.
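One widely used technique for the hot-spotting problem mentioned above is salting: prefixing a monotonically increasing row key (such as a timestamp) with a deterministic bucket number so writes spread across regions. The sketch below illustrates the idea in plain Python; the bucket count and key format are illustrative assumptions, not values from the course:

```python
import hashlib

NUM_BUCKETS = 16  # illustrative; real values depend on cluster region count

def salted_row_key(timestamp_key: str) -> str:
    # Monotonically increasing keys all land on one region server,
    # creating a write hot-spot. A deterministic salt prefix spreads
    # writes across NUM_BUCKETS ranges while keeping the full key
    # reconstructible from the original value.
    bucket = int(hashlib.md5(timestamp_key.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}-{timestamp_key}"

print(salted_row_key("20240101120000"))
```

The trade-off, which schema design courses typically cover, is that range scans must now fan out across all buckets instead of reading one contiguous key range.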
Targeted towards data architects and application developers who have experience with Java, the goal of this course is to learn how to write HBase programs using HBase as a distributed NoSQL datastore.
Targeted towards data architects and application developers who have experience with Java, the goal of this course is to learn how to write HBase programs using HBase as a distributed NoSQL datastore. It builds on DEV 320 and DEV 325 - HBase Data Model and Schema Design, and is a continuation of DEV 330 - Developing HBase Applications: Basics.
Targeted towards data analysts, data architects, and application developers, the goal of this course is to learn more about architecting your Apache HBase applications for performance and security. This course covers how to bulk load data into HBase, performance considerations and tips for designing your HBase application, benchmarking and monitoring your HBase application, and MapR-DB security. Concepts are conveyed through lectures, hands-on labs, and scenario analyses.
This introductory-level course teaches the core concepts necessary to understand and begin using MapR Streams to develop big data processing applications.
This course gives developers and administrators the core concepts necessary to build simple MapR Streams applications.
This introductory course enables developers to get started developing big data applications with Apache Spark. In the first part of the course, you will use Spark’s interactive shell to load and inspect data. The course then describes the various modes for launching a Spark application. You will then go on to build and launch a standalone Spark application.
This course is the second in the Apache Spark series. You will learn to create and modify pair RDDs, perform aggregations, and control the layout of pair RDDs across nodes with data partitioning. It also covers Spark SQL and DataFrames, the programming abstraction of Spark SQL, and describes the components of the Spark execution model, using the Spark Web UI to monitor Spark applications.
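The pair-RDD aggregation pattern at the heart of this course can be mimicked in plain Python. This sketch models only the semantics of a reduce-by-key operation; the course itself uses Spark's API, where the same step runs in parallel across partitions:

```python
def reduce_by_key(pairs, func):
    # Plain-Python analogue of a pair RDD's reduceByKey: combine all
    # values that share a key using a binary function.
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc

sales = [("east", 10), ("west", 5), ("east", 7)]
print(reduce_by_key(sales, lambda a, b: a + b))  # {'east': 17, 'west': 5}
```

In Spark, data partitioning controls which node holds which keys, so a well-partitioned pair RDD can perform this aggregation with minimal data movement.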
This course is the third in the Apache Spark series. In this course, you cover the following Apache Spark libraries: Spark Streaming, Spark SQL, Spark MLlib, and Spark GraphX. This course describes the benefits of the Apache Spark unified platform and how to build a data pipeline application using Spark Streaming, Spark SQL, Spark GraphX, and MLlib. The concepts are taught using scenarios in Scala that also form the basis of hands-on labs.
This introductory Apache Drill course, targeted at data analysts, data scientists, and SQL programmers, covers how to use Drill to explore known or unknown data without writing code. You will write SQL queries on a variety of data types, including structured data in a Hive table, semi-structured data in HBase or MapR-DB, and complex data file types such as Parquet and JSON.
DA 415 is an intermediate level course designed for data analysts, developers, and systems administrators. It is a continuation of DA 410 - Apache Drill Essentials, and describes how a query is received and executed by Drill. You will learn the different services involved at each step, and how Drill optimizes a query for distributed SQL execution.
DA 440 is an introductory-level course designed for data analysts and developers. You will learn how Apache Hive fits in the Hadoop ecosystem, how to create and load tables in Hive, and how to query data using the Hive Query Language.
DA 450 - Apache Pig Essentials is an introductory-level course designed for data analysts and developers. The course begins with a review of data pipeline tools, then covers how to load and manipulate relations in Pig.
This certification demonstrates proficiency in the installation, administration, and maintenance of clusters in the MapR Converged Data Platform.
This certification exam measures and validates the technical knowledge, skills and abilities required to write HBase programs using HBase as a distributed NoSQL datastore. This exam covers HBase architecture, the HBase data model, APIs, schema design, performance tuning, bulk-loading of data, and storing complex data structures.
This certification exam measures the specific technical knowledge, skills, and abilities required to design and develop MapReduce programs in Java. This exam covers writing MapReduce programs, using the MapReduce API, and managing, monitoring, and testing MapReduce programs and workflows.
The MapR Certified Spark Developer credential is designed for engineers, programmers, and developers who prepare and process large amounts of data using Spark. The certification tests your ability to use Spark in a production environment; where coding knowledge is tested, we lean toward the use of Scala for our code samples.