MapR Converged Data Platform
Enterprise-Grade Platform Services
MapR-FS is the enterprise standard POSIX file system that provides high-performance read/write data storage for the MapR Converged Data Platform. MapR-FS includes important features for production deployments such as fast NFS access, access controls, and transparent data compression at a virtually unlimited scale.
MapR-DB is an enterprise-grade, high-performance, in-Hadoop NoSQL database management system. It is used to add real-time, operational analytics capabilities to applications built on the Hadoop or Spark ecosystems. Because it is integrated into the MapR Converged Data Platform, it inherits the platform's protections and high-performance capabilities.
MapR Streams is a global publish-subscribe event streaming system for big data. It connects data producers and consumers worldwide in real time, with unlimited scale. MapR Streams is built into the MapR Converged Data Platform, making it the only highly available streaming system to support global event replication.
The built-in MapR high availability (HA) features eliminate single points of failure at the node, file system metadata, NFS access, resource management (YARN), and job tracking levels. You can benefit from:
- High uptime with zero data loss, despite multiple node failures in the cluster.
- No loss of work upon node failure, so jobs do not have to restart from scratch.
- Rolling upgrades which let you upgrade live clusters one node at a time to minimize planned downtime.
- Zero configuration required to get HA, unlike other big data platforms. No complex setup or manual intervention is needed.
MapR gives you real-time capabilities beyond what other platforms can provide in a single cluster. With businesses always seeking to respond faster to new events, MapR provides key real-time capabilities for:
- Immediate access to large data files in MapR-FS, even as they are being loaded into the system.
- Interactive read and write operations for business applications with MapR-DB.
- Self-service exploration of new data with SQL via Apache Drill, without having to first create a formal schema.
- Reliable delivery of global, high-speed streams of event data with MapR Streams.
MapR provides security controls to ensure that sensitive data is accessible only by authorized users. MapR provides:
- Authentication via Kerberos and/or LDAP via Pluggable Authentication Modules, or a native username/password authentication system as an alternative to Kerberos.
- Access controls for files, databases, and streams, including Access Control Expressions (ACEs) for fine-grained, Boolean expression-based permissions.
- Performant wire-level encryption to protect data sent between nodes and applications and ensure data privacy.
- Comprehensive auditing on data accesses, authentication, and administrative operations.
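To make the idea of Boolean expression-based permissions concrete, here is a minimal sketch of how an expression over users, groups, and roles could be evaluated for one identity. It is an illustration only, not MapR's implementation. The grammar is an assumption modeled on the ACE style: `u:`, `g:`, and `r:` operands for user, group, and role, `p` for public access, and the operators `!`, `&`, and `|`. The names `tokenize` and `is_allowed` are hypothetical.

```python
import re

# Assumed token grammar: parentheses, the operators & | !, the public
# operand "p", and typed operands such as u:alice, g:analysts, r:admin.
TOKEN = re.compile(r"\s*([()&|!]|[ugr]:[\w.-]+|p)")


def tokenize(expr):
    """Split an expression string into tokens, rejecting unknown text."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected text at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_allowed(expr, user, groups=(), roles=()):
    """Evaluate expr for one identity. Precedence: ! first, then &, then |."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == "p":
            return True  # public: matches every identity
        if ":" not in tok:
            raise ValueError(f"unexpected token {tok!r}")
        kind, name = tok.split(":", 1)
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        return name in roles  # kind == "r"

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, under these assumptions the expression `u:alice | (g:analysts & !r:contractor)` grants access to the user alice, and to any member of the analysts group who does not hold the contractor role.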
With multi-tenancy, a capability unique to MapR, you can manage distinct user groups, data sets, and applications in a single cluster while keeping them isolated from each other. You can run different jobs at the same time safely, securely, and efficiently. Several features contribute to the multi-tenancy capability in MapR:
- Volumes - logical partitions of the cluster for creating separate administrative policies such as quotas, permissions, and capacity planning.
- Security - role-based access controls to limit data access to authorized users.
- Data/job placement control - specify on which nodes data resides and jobs run.
- YARN - use the Hadoop 2.X resource scheduler as another level of resource control when running multiple jobs in a cluster.
MapR Platform Services support distinct global cluster deployments that run as a single logical, global cluster. With global namespace support, you can:
- Access any data sets (with the appropriate access controls) on any remote cluster as if they were part of the local cluster.
- Submit jobs from a cluster at one site to a cluster at a remote site.
- Perform administrative tasks for any remote cluster, anywhere in the world, from a single administrative interface.
Ensure fidelity and protection of your critical data with mirroring, replication, and consistent, point-in-time snapshots.
- Scheduled, incremental, block-level mirroring allows you to deploy your mission-critical disaster recovery strategy on large files with low recovery point objectives (RPO) and low recovery time objectives (RTO).
- MapR-DB and MapR Streams deliver immediate updates to remote replicas in real time to enable very low RPOs. Replicas are immediately available for active use upon failover.
- Consistent snapshots protect against data loss or corruption due to user or application errors. Snapshots also can be used for creating consistent, online backups.
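As a rough way to reason about the recovery point objective of scheduled mirroring, the sketch below estimates the worst-case data-loss window from the mirror schedule. It is a simplified model, not a MapR formula: it assumes mirrors start at a fixed interval, that each sync takes a fixed time shorter than the interval, and that a completed mirror reflects the source as it was when that sync started. The function name and parameters are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(mirror_interval, sync_duration):
    """Estimate the largest data-loss window under the simplified model.

    If the source fails just before a sync finishes, the newest usable
    mirror is the previous one, which reflects the source as of one full
    interval plus one sync duration earlier.
    """
    zero = timedelta(0)
    if mirror_interval <= zero or sync_duration < zero:
        raise ValueError("interval must be positive and duration non-negative")
    if sync_duration >= mirror_interval:
        raise ValueError("model assumes each sync finishes before the next starts")
    return mirror_interval + sync_duration


# Hourly mirroring with ten-minute incremental syncs.
rpo = worst_case_rpo(timedelta(hours=1), timedelta(minutes=10))
print(rpo)
```

Under this model, shortening either the mirror interval or the sync time lowers the worst-case RPO, which is the reason incremental, block-level transfers help.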
MapR delivers a powerful node recovery process via patented innovations. MapR serves your big data environments that cannot lose data, must run on a 24x7 basis, and require immediate recovery from node and site failures, all with a smaller data center footprint. MapR supports these capabilities for the broadest set of applications, from batch analytics to interactive querying and real-time streaming.
Management & Monitoring
Manage and monitor your big data cluster with the interface that best suits your workflow: browser-based, REST API, or command line.
- Easily provision nodes in your cluster with appliance-like simplicity using the browser-based Auto-Provisioning Templates.
- Manage your infrastructure with instant views of, and alerts on, your cluster health through heatmaps and alarms.
- Manage applications by viewing running jobs for troubleshooting or utilization auditing.
- Manage data with volumes, security, mirroring, and snapshots.
When applications go from idea to reality, MapR provides the only production-ready platform for Hadoop, Spark, and related technologies.
The design of the patented MapR Converged Data Platform speaks directly to Enterprise Architects who know best that architecture matters.
MapR provides developers the widest variety of popular open source projects for developing data applications.