This Guide provides installation instructions for the M7 Edition of the MapR distribution for Hadoop. Installation instructions for the M3 and M5 editions are available separately.
MapR is a complete, industry-standard Hadoop distribution with key improvements. MapR Hadoop is API-compatible and includes or works with the family of Hadoop ecosystem components, such as HBase, Hive, Pig, and Flume, among others. MapR provides a version of Hadoop and key ecosystem components that have been tested together on specific platforms.
For example, while MapR supports the Hadoop FileSystem abstraction interface, MapR improves the performance and robustness of the distributed file system and eliminates the NameNode entirely. The MapR distribution for Hadoop supports continuous read/write access, improving data load and unload processes.
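Because the MapR file system supports the standard Hadoop FS interface, existing Hadoop shell commands and applications work unchanged. For example (the paths and file names shown are illustrative, and these commands assume a running cluster):

```shell
# Standard Hadoop shell commands run against the MapR file system unmodified
hadoop fs -mkdir /user/alice         # create a directory in the cluster
hadoop fs -put data.csv /user/alice  # copy a local file into the cluster
hadoop fs -ls /user/alice            # list the directory's contents
```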
The diagram above illustrates the services surrounding the basic Hadoop idea of Map and Reduce operations performed across a distributed storage system. Some services provide management and others run at the application level.
The MapR Control System (MCS) is a browser-based management console that provides a way to view and control the entire cluster.
MapR offers multiple editions of the MapR distribution for Apache Hadoop.
| Edition | Description |
| --- | --- |
| M3 | Free community edition |
| M5 | Adds high availability and data protection, including multi-node NFS |
| M7 | Supports structured table data natively in the storage layer, providing a flexible, NoSQL database compatible with Apache HBase. Available with MapR version 3.0 and later. |
The type of license you apply determines which features will be available on the cluster. The installation steps are similar for all editions, but you will plan the cluster differently depending on the license you apply.
This Installation Guide has been designed as a set of sequential steps. Complete each step before proceeding to the next.
Installing MapR Hadoop involves these steps:
- Planning the Cluster
  - Determine which services will run on each node. It is important to see the big picture before installing and configuring the individual management and compute nodes.
- Preparing Each Node
  - Check that each node is a suitable platform for its intended use. Nodes must meet minimum requirements for operating system, memory, disk resources, and installed software, such as Java. Including unsuitable nodes in a cluster is a major source of installation difficulty.
- Installing MapR
  - Each node in the cluster, even a purely data/compute node, runs several services. Obtain and install the MapR packages using a package manager, a local repository, or a downloaded tarball.
  - After installing services on a node, configure it to participate in the cluster, then initialize its raw disk resources.
- Bringing Up the Cluster
  - Start the nodes and check the cluster. Verify that the nodes can communicate and that services are up and running.
  - Create one or more volumes to organize data.
- Installing Hadoop Ecosystem Components
  - Install additional Hadoop ecosystem components alongside the MapR services.
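As a preview, the steps above might look like the following on a single node. This is a hedged sketch rather than a complete procedure: it assumes an RPM-based system, and the hostnames (cldb1, zk1), cluster name, disks file path, and volume name are placeholders for values from your own cluster plan.

```shell
# 1. Prepare: spot-check the node (illustrative checks, not official minimums)
java -version
free -g
lsblk    # raw disks intended for MapR must be unmounted and unformatted

# 2. Install: packages for this node's planned services (a data/compute node
#    is shown; control nodes would also get packages such as mapr-cldb
#    and mapr-zookeeper)
yum install -y mapr-fileserver mapr-tasktracker

# 3. Configure: point the node at the cluster's CLDB and ZooKeeper nodes,
#    then format the raw disks listed in a disks file
/opt/mapr/server/configure.sh -C cldb1 -Z zk1 -N my.cluster.com
/opt/mapr/server/disksetup -F /tmp/disks.txt

# 4. Bring up: start services, verify them, and create a volume for data
service mapr-warden start
maprcli node list -columns svc
maprcli volume create -name project.data -path /projects/data

# 5. Ecosystem: add components such as Hive and Pig
yum install -y mapr-hive mapr-pig
```

Each of these steps is covered in detail in the sections that follow.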
To begin, proceed to Planning the Cluster.