Tutorials

The tutorials for the MapR Sandbox get you started with Hadoop development in minutes.

These tutorials cover a range of topics on Hadoop and the ecosystem projects.

Getting Started with Drill

Apache Drill is a schema-less SQL query engine that allows you to run ANSI SQL queries against Hadoop and NoSQL data sources without the need to create centralized schemas. Get started with Drill using this tutorial and see it in action. Note that you will need to download the MapR Sandbox for Apache Drill for this tutorial.

Tags: Hive, Apache Drill
Getting Started with Spark on MapR Sandbox

This tutorial walks you through the steps required to build and run your first Spark application on the MapR Sandbox for Hadoop.

Tags: Apache Spark
Building a Classification Model Using Spark

In this tutorial, we build a real-time classification model using Spark on the MapR Converged Data Platform. The example is a hypothetical music streaming site where customers log in to the service and listen to music tracks. We use Python, PySpark, and MLlib to compute some basic statistics and a simple training model that is used as input to the classifier.

Tags: HBase, Apache Spark
Recommender System with Mahout and Elasticsearch

This tutorial will get you started with recommender systems in the easiest way possible. The tutorial provides step-by-step instructions on how to code Mahout’s collaborative filtering algorithm to build and train a machine learning model, and then use search technology such as Elasticsearch to simplify deployment of the recommender.

Tags: Elasticsearch, Mahout
Getting Started with HBase Shell

In this tutorial, we use the HBase shell to perform CRUD operations: create an HBase table, put data into the table, retrieve data from the table, and delete data from the table. This tutorial also gives a brief introduction to the MapR Control System (MCS), where you'll learn how to create an HBase table, add column families, and manage their properties using MCS.
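
To make the create, put, get, and delete sequence concrete, here is a minimal sketch in Python. It is an in-memory toy model, not the HBase shell or any HBase client API; the class name `ToyTable`, its methods, and the sample table and column names are all hypothetical and chosen only for illustration. It assumes the usual HBase convention that a column is addressed as `family:qualifier` and that a write to an undeclared column family is rejected.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase table (illustrative only, not the HBase API)."""

    def __init__(self, name, column_families):
        # "Create": a table is declared with a fixed set of column families.
        self.name = name
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": value}
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one cell; the column family must have been declared at creation."""
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key):
        """Return a copy of all cells stored for a row (empty dict if none)."""
        return dict(self.rows.get(row_key, {}))

    def delete(self, row_key, column=None):
        """Delete one cell, or the whole row when no column is given."""
        if column is None:
            self.rows.pop(row_key, None)
        else:
            self.rows.get(row_key, {}).pop(column, None)


# Hypothetical sample data: one row with cells in two column families.
table = ToyTable("customer", ["addr", "order"])
table.put("jsmith", "addr:city", "denver")
table.put("jsmith", "order:numb", "1235")
row_before = table.get("jsmith")
table.delete("jsmith", "order:numb")
row_after = table.get("jsmith")
```

In the real HBase shell the corresponding commands are `create`, `put`, `get`, and `delete`; the tutorial itself covers their exact arguments.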

Tags: MapR Control System, HBase
Running SQL Queries on a JSON (YELP) Dataset using Drill

Apache Drill allows you to query semi-structured data without the need to build a centralized schema. This tutorial walks you through the steps required to install Drill on your laptop and run it against the Yelp data set.

Tags: Hive, Apache Drill
Hue Tutorial Part 1: FileBrowser, Metastore Manager and Beeswax

Hue is an interface for interacting with web applications that access the MapR File System (MapR-FS). Use the applications in Hue to access MapR-FS, work with tables, and run Hive queries, MapReduce jobs, and Oozie workflows. In this first part of the two-part series, you will learn about the FileBrowser, the Metastore Manager, and Beeswax functionality.

Tags: Hue
Hue Tutorial Part 2: Pig, Job Designer and Oozie

Hue is an interface for interacting with web applications that access the MapR File System (MapR-FS). Use the applications in Hue to access MapR-FS, work with tables, and run Hive queries, MapReduce jobs, and Oozie workflows. In this second part of the two-part series, you will learn how to write a simple Pig script, and understand the Job Designer and Oozie functionality.

Tags: Pig, Hue
How to Mount a MapR Cluster Using NFS

When you mount a MapR cluster directly via NFS, your applications can read and write data directly in the cluster with standard tools, applications, and scripts. MapR enables direct file modification and multiple concurrent reads and writes via POSIX semantics. For example, you can run a MapReduce job that outputs to a CSV file, then import the CSV file directly into a SQL database via NFS.
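
The CSV example can be sketched in a few lines of Python using only ordinary file I/O, which is the point of an NFS mount: nothing in the code is cluster-specific. This is a sketch under stated assumptions, not the tutorial's own procedure. A temporary directory stands in for the mounted cluster path, SQLite stands in for the SQL database, and the function names, file name, and table layout are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def write_job_output(mount_dir, rows):
    """Write (word, count) rows as CSV, as a job writing to the mounted path might."""
    out_path = Path(mount_dir) / "job_output.csv"  # hypothetical file name
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(rows)
    return out_path


def load_csv_into_sql(csv_path, conn):
    """Read the CSV with plain file I/O and insert its rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts (word TEXT, count INTEGER)"
    )
    with Path(csv_path).open(newline="") as f:
        reader = csv.DictReader(f)
        conn.executemany(
            "INSERT INTO word_counts (word, count) VALUES (?, ?)",
            ((r["word"], int(r["count"])) for r in reader),
        )
    conn.commit()


def demo():
    """Run the write-then-import round trip and return the summed counts."""
    # Assumption: a temporary directory stands in for the NFS mount point.
    with tempfile.TemporaryDirectory() as mount_dir:
        csv_path = write_job_output(mount_dir, [("hadoop", 3), ("drill", 5)])
        conn = sqlite3.connect(":memory:")
        try:
            load_csv_into_sql(csv_path, conn)
            return conn.execute(
                "SELECT SUM(count) FROM word_counts"
            ).fetchone()[0]
        finally:
            conn.close()
```

On a real cluster, `mount_dir` would be the directory where the cluster is mounted, and the database would be whatever SQL system you use.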

Tags: NFS
MapR Control System Part 1: Dashboard and Setting Topology

The MapR Control System (MCS) is a graphical, programmatic control panel for cluster administration that provides complete cluster monitoring functionality and most of the functionality of the command line. In Part 1 of this three-part series, you will be introduced to the MCS dashboard, and you'll see how easy it is to set up a topology.

Tags: Hadoop Admin, MapR Control System
MapR Control System Part 2: Setting up Volumes, Snapshots and Mirrors

The MapR Control System (MCS) is a graphical, programmatic control panel for cluster administration that provides complete cluster monitoring functionality and most of the functionality of the command line. Part 2 of this three-part series covers setting up volumes, snapshots, and mirrors using MCS.

Tags: Hadoop Admin, MapR Control System
MapR Control System Part 3: Alarms and Metrics

The MapR Control System (MCS) is a graphical, programmatic control panel for cluster administration that provides complete cluster monitoring functionality and most of the functionality of the command line. Part 3 of this three-part series covers setting up alarms and notifications, and the metrics available in MCS.

Tags: Hadoop Admin, MapR Control System