MapR Data Platform
The MapR Data Platform is a key component of the MapR Distribution for Apache Hadoop. It provides organizations with the enterprise-grade functionality needed to take Hadoop into production, including:
- High Performance: The MapR Data Platform has several architectural performance advantages; for example, it reads and writes the underlying storage media directly, which yields near-native performance.
- High Availability: With features such as Self-Healing HA (a no-NameNode architecture), JobTracker HA and NFS HA, MapR was designed to have no single point of failure. By distributing file system metadata across the cluster, MapR eliminates the NameNode, removing one of the most frustrating failure modes of other Hadoop distributions and providing both scale and performance advantages. JobTracker HA prevents lost jobs and painful restarts, while NFS HA provides continuous, reliable NFS access to cluster data.
- Consistent Snapshots: MapR is the only distribution that provides consistent, point-in-time recovery, thanks to its unique read-write storage architecture. MapR snapshots have very low overhead and can be taken frequently. Recovering data is as simple as copying files or directories from the snapshot directory back into the live directory.
- Mirrors: Going far beyond replication, mirroring lets users set policies based on recovery time objectives (RTO) and mirror data automatically within the cluster, between two data centers, or between on-premises and cloud infrastructure.
- Compression: MapR compresses data automatically and transparently, helping organizations get the most value from their hardware investments and relieving software developers of the burden of handling compression.
- Multi-tenancy: MapR protects the core system by isolating it from user jobs so runaway jobs can’t bring down the entire cluster. Volumes can be used to create customized environments for users, groups and applications with different usage, security and hardware requirements. Quotas, labels and queues help manage job execution, improving user experience.
- Security: The MapR distribution includes comprehensive authentication, authorization, and encryption features.
- Integrated Tables: MapR M7 Enterprise Database Edition integrates column-oriented NoSQL tables into the platform, offering consistently high performance, low latency and high availability while remaining fully compatible with the HBase APIs.
- Seamless Access: Users can mount the cluster over NFS, so Hadoop supports full random read/write access with multiple concurrent readers and writers. Unlike other Hadoop distributions, which cannot process files that are still open for writing, MapR lets you ingest data directly into the cluster, run analytics on streaming data and access the results in real time.
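Because the cluster can be mounted over NFS, snapshot recovery can be done with ordinary file tools rather than Hadoop-specific commands. The following is a minimal Python sketch, assuming the volume is NFS-mounted at a path such as `/mapr/<cluster-name>/<volume>` and that snapshots are exposed under a `.snapshot/<snapshot-name>/` directory at the volume root. The helper `restore_from_snapshot` is hypothetical and not part of any MapR API.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(volume_root: str, snapshot_name: str, rel_path: str) -> Path:
    """Hypothetical helper: copy a file or directory from a snapshot back into the live volume.

    Assumes (not confirmed by MapR documentation here) that `volume_root` is the
    NFS mount point of the volume and that snapshots appear under
    `<volume_root>/.snapshot/<snapshot_name>/`.
    """
    root = Path(volume_root)
    source = root / ".snapshot" / snapshot_name / rel_path
    target = root / rel_path

    if not source.exists():
        raise FileNotFoundError(f"{rel_path!r} not found in snapshot {snapshot_name!r}")

    # Make sure the destination's parent directory exists in the live volume.
    target.parent.mkdir(parents=True, exist_ok=True)

    if source.is_dir():
        # Recursively copy, overwriting existing files (dirs_exist_ok needs Python 3.8+).
        shutil.copytree(source, target, dirs_exist_ok=True)
    else:
        # Copy a single file, preserving timestamps and permission bits where possible.
        shutil.copy2(source, target)
    return target
```

For example, `restore_from_snapshot("/mapr/my.cluster.com/projects", "daily", "reports/q4.csv")` would copy the snapshotted file over the live copy; the mount point, snapshot name and paths are placeholders. Using plain `shutil` rather than a Hadoop client illustrates the point of NFS access: standard tools operate directly on cluster data.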
The MapR Data Platform maintains full compatibility with other Hadoop distributions by exposing industry-standard interfaces, including:
- HDFS API
- HBase API
- RESTful API