MapR Distribution including Apache™ Hadoop®
The MapR Distribution including Apache Hadoop provides you with an enterprise-grade distributed data platform to reliably store and process big data. With its unbiased approach to open source, MapR gives you a broad range of technologies to choose from (multiple projects for SQL-on-Hadoop, NoSQL databases, execution engines, and so on), so you can use the right tool for your needs. With backward compatibility and multi-version project support, you can upgrade projects and existing applications individually on your own schedule. With the browser-based management console, the MapR Control System, you can monitor and manage your Hadoop cluster easily and efficiently. Key MapR features are described below.

MapR supports multiple projects for SQL-on-Hadoop, NoSQL databases, execution engines, etc.,
to let you choose the tools that meet your needs.
Integrated Enterprise-Grade NoSQL - MapR-DB
MapR provides an optional, enterprise-grade NoSQL database in Hadoop to run operational and analytical workloads together in a single cluster. You can run your existing HBase applications on MapR-DB because it incorporates HBase functionality. It supports the HBase API, the flexible wide-column data model, Hadoop scalability, data locality with MapReduce jobs, row-level ACID transactions, strong data consistency, etc. It runs on the same nodes in a cluster with Hadoop and stores data in Hadoop, letting you run database workloads alongside Hadoop analytics. It shares administrative functionality with Hadoop, including capabilities around high availability, disaster recovery, snapshots, and security (authentication, authorization, wire-level encryption). It is architected to deliver high performance, continuous low latency (no compaction/defragmentation delays), and extreme scalability.
Direct Access NFS and Interoperability
Industry-standard NFS access lets you read and write Hadoop data as if it were stored on a regular file system. Unlike other distributions, MapR gives you true NFS capabilities with its Direct Access NFS feature, which supports full read/write access. This lets you run existing file system-based applications on MapR without any changes. Direct Access NFS was architected for high performance and high availability, making it ready for your large-scale, business-critical, distributed deployments.
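Because an NFS mount looks like an ordinary directory, file-based code needs no Hadoop-specific client. The following is a minimal Python sketch of that idea, not MapR sample code. The helper names `write_report` and `read_report` are hypothetical, and the mount point mentioned in the comment is an assumption; a temporary directory stands in for the mount so the sketch runs anywhere.

```python
import os
import tempfile


def write_report(mount_root, relative_path, lines):
    """Write text lines to a file under mount_root using ordinary file I/O."""
    target = os.path.join(mount_root, relative_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
    return target


def read_report(mount_root, relative_path):
    """Read the file back as a list of lines, again with plain file I/O."""
    with open(os.path.join(mount_root, relative_path), encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Assumption: on a real cluster, mount_root would be the NFS mount point,
# for example a path such as /mapr/my.cluster.com (hypothetical name).
# A temporary directory stands in for it here.
demo_root = tempfile.mkdtemp()
write_report(demo_root, "logs/daily.txt", ["alpha", "beta"])
print(read_report(demo_root, "logs/daily.txt"))
```

The same two functions would work unchanged against any mounted path, which is the point of exposing cluster data through a standard file system interface.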
MapR promotes interoperability with support for other popular industry standards—all Hadoop interfaces (including HDFS and HBase), ODBC/JDBC, Kerberos, LDAP, etc.—so you can deploy MapR into your existing enterprise architecture and avoid vendor lock-in.
High Availability and Disaster Recovery
The built-in MapR high availability (HA) features eliminate single points of failure at the node, file system metadata, NFS access, resource management (YARN), and job tracking levels. Together, these features give you high uptime with zero data loss. Work in progress is also preserved when a node fails, so jobs do not have to restart from scratch. Rolling Upgrades let you upgrade live clusters one node at a time to minimize planned downtime.
The built-in MapR disaster recovery (DR) features let you develop a true business continuity strategy to overcome a site-wide disaster. MapR Mirroring lets you create a consistent remote replica or "mirror" for disaster recovery, as well as for load balancing and geographic distribution. Scheduled mirroring sends only block-level differentials to minimize both synchronization time and bandwidth utilization, and can help you define an appropriate recovery point objective (RPO) for your needs. The Promotable Mirrors feature lets you easily enable a remote mirror as the active master cluster, to help you define a low recovery time objective (RTO). Mirror Cascades let you create chains of mirrors (mirrors of mirrors) to support multiple remote data centers.
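The idea of sending only block-level differentials can be illustrated with a small Python sketch. This is a generic illustration under stated assumptions, not MapR's mirroring protocol: the function names are hypothetical, the block size is deliberately tiny, and per-block hashes stand in for whatever change tracking the file system actually uses.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for the demo; real systems use far larger blocks


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block_index: new_block} for every block of new that differs from old."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old, block_size)]
    delta = {}
    for index, block in enumerate(split_blocks(new, block_size)):
        if index >= len(old_hashes) or hashlib.sha256(block).digest() != old_hashes[index]:
            delta[index] = block
    return delta


def apply_delta(old, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the new data on the replica from its old copy plus the delta."""
    blocks = split_blocks(old, block_size)
    for index, block in delta.items():  # indices arrive in ascending order
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


source_before = b"aaaabbbbccccdddd"
source_after = b"aaaaBBBBccccdddd"
delta = changed_blocks(source_before, source_after)
replica = apply_delta(source_before, delta, len(source_after))
print(len(delta), "of", len(split_blocks(source_after)), "blocks transferred")
```

Only the modified block crosses the wire in this example, which is why differential synchronization reduces both transfer time and bandwidth compared with copying the whole data set on every schedule.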
MapR Snapshots let you recover quickly from file deletion or corruption. MapR Snapshots are consistent, meaning they accurately reflect the state of the cluster at the moment the snapshot was taken.

Integrated Security
MapR provides security controls to ensure that sensitive data is accessible only by authorized users. Hadoop data is protected using standard Unix file permissions, along with advanced role-based access control lists. For authentication services, you can integrate with Kerberos and/or LDAP via Pluggable Authentication Modules (PAM). A native authentication system is also available as an alternative to Kerberos, and is ideal for environments that do not have or need external authentication systems. High-performance wire-level encryption protects data sent between nodes to ensure data privacy.
High Performance and Scalability
MapR has set world records for both TeraSort and MinuteSort while using far fewer hardware resources than other contenders. A MapR customer recently set a MinuteSort record by sorting 1.65 TB of data in one minute on one-seventh the number of servers used by the previous record holder. Innovations in the file system for faster file access, together with an optimized MapReduce shuffle engine, let you get more work out of your hardware investment than with other distributions.
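To put the MinuteSort figure in perspective, the average cluster-wide sort rate follows from simple arithmetic. The short Python sketch below assumes decimal units (1 TB = 1000 GB) and uses a hypothetical helper name; it works out to roughly 27.5 GB sorted per second.

```python
def sustained_rate_gb_per_s(terabytes, seconds):
    """Average throughput in decimal GB/s, assuming 1 TB = 1000 GB."""
    return terabytes * 1000 / seconds


# 1.65 TB sorted in one minute (60 seconds), as quoted above.
rate = sustained_rate_gb_per_s(1.65, 60)
print(f"{rate:.1f} GB/s")
```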
With its distributed file system metadata architecture, MapR scales linearly with the number of nodes and supports up to 1 trillion files. MapR clusters are designed to scale to 10,000 nodes to provide plenty of headroom for today's growing big data deployments.
MapR-DB can scale to levels beyond other NoSQL technologies, and supports up to 1 trillion tables, millions of columns across trillions of rows, and cell sizes up to 2 GB.
Multi-Tenancy
With multi-tenancy, a capability unique to MapR, you can manage distinct user groups, data sets, and applications in a single cluster while keeping them isolated from each other. You can run different jobs at the same time safely, securely, and efficiently. Several features contribute to the multi-tenancy capability in MapR:
- Volumes - logical partitions of the cluster for creating separate administrative policies such as quotas, permissions, and capacity planning
- Security - role-based access controls to limit data access to authorized users
- Data placement control - specify on which nodes data resides to isolate distinct data sets
- Job placement control - specify which nodes will run jobs to take advantage of resources in specific parts of the cluster, used in conjunction with data placement control
- ExpressLane - automatically lets small jobs run to completion quickly even if the cluster is busy with other large jobs
- YARN - use the Hadoop 2.X resource scheduler for an alternative level of resource control when running multiple jobs in a cluster





