MapR-DB adds high-performance operational database capabilities to the MapR Converged Data Platform.
MapR-DB is an enterprise-grade, high-performance, in-Hadoop NoSQL database management system. As part of the MapR Converged Data Platform, MapR-DB provides the first in-Hadoop document database, which lets developers deliver scalable applications that also leverage continuous analytics on real-time data. MapR-DB also supports a wide column data model and API to run existing Apache HBase™ applications faster and more reliably.
With MapR-DB, the MapR Platform is the only data platform built for running NoSQL operations, Hadoop analytics, and event stream processing in the same cluster.
Forrester Research ranked MapR-DB as the strongest "Current Offering" when compared against 14 other leading NoSQL big data technologies.
Download the full report: The Forrester Wave™: Big Data NoSQL, Q3 2016
MapR-DB: The Optimized In-Hadoop NoSQL Database
Only MapR-DB handles high-volume, high-velocity database workloads alongside batch analytical tasks. This is due to the MapR architecture, which uniquely enables fast, random read/write access to files and tables. MapR-DB is a multi-model NoSQL database that uses the Open JSON Application Interface (OJAI™) for document database capabilities, as well as a wide column API to run existing HBase applications faster and more reliably. It manages many operational data formats, including log data, sensor data, metadata, clickstreams, user profiles, session states, and link/semantics/relationship data.
The MapR Converged Data Platform also provides support for HBase as an add-on to give developers a choice of in-Hadoop databases. Developers can run applications simultaneously on HBase and MapR-DB in the same deployment.
High availability (HA). The MapR architecture eliminates single points of failure, avoiding data and job loss even when multiple nodes in the cluster fail. HA for MapR-DB leverages the same data replication system used for Hadoop files.
Instant recovery. Upon node failure, a replica instantly takes over for the failed node without the failover lag seen in other distributions.
Disaster recovery (DR). Multi-master, real-time table replication enables distributed applications on global data while reducing the risk of data loss in DR scenarios.
Point-in-time recovery. Consistent snapshots instantly mark database tables at a specific time to recover from accidental deletions, overwrites, or corruption. Only MapR enables exact recovery, even for files and database tables that are open at the time of the snapshot.
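The idea behind point-in-time recovery can be illustrated with a small in-memory sketch. This is a toy model only: the `SnapshotTable` class and its methods are hypothetical names invented for illustration, and the versioning scheme is an assumption, not a description of how MapR implements snapshots internally.

```python
class SnapshotTable:
    """Toy table that keeps every write so earlier states stay readable."""

    def __init__(self):
        self._version = 0     # increases by one on every write
        self._history = {}    # key -> list of (version, value); None marks a delete
        self._snapshots = {}  # snapshot name -> version at snapshot time

    def put(self, key, value):
        self._version += 1
        self._history.setdefault(key, []).append((self._version, value))

    def delete(self, key):
        # A delete is recorded as a write of None, so older values survive.
        self.put(key, None)

    def snapshot(self, name):
        # Taking a snapshot only records a version number; no data is copied.
        self._snapshots[name] = self._version

    def get(self, key, snapshot=None):
        # Read the newest value written at or before the requested point in time.
        limit = self._version if snapshot is None else self._snapshots[snapshot]
        for version, value in reversed(self._history.get(key, [])):
            if version <= limit:
                return value
        return None


table = SnapshotTable()
table.put("user001", {"name": "Ada"})
table.snapshot("before-cleanup")
table.delete("user001")                     # accidental deletion

current = table.get("user001")              # the live table no longer has the row
recovered = table.get("user001", snapshot="before-cleanup")
```

Reading through the snapshot name returns the row as it existed when the snapshot was taken, which is the behavior an accidental delete or overwrite needs.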
Minimal Database Administration
Integrated database operations. Database operations are efficiently run in the underlying, core MapR platform services layer, so no extra servers and administration are required.
Automatic optimizations. MapR-DB handles region splits (i.e., sharding) automatically and eliminates compaction (defragmentation) delays. Unlike other in-Hadoop databases, MapR-DB is self-optimizing and does not require application-level database administration code.
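To make the notion of an automatic region split concrete, here is a minimal sketch of a range-partitioned table that splits a region at its median key once it grows past a threshold. The class name, the threshold, and the median-split policy are all assumptions chosen for illustration; the source does not describe MapR-DB's actual split criteria.

```python
import bisect


class RangePartitionedTable:
    """Toy table that stores rows in sorted key ranges ("regions")."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]               # first key covered by each region
        self.regions = [{}]                  # one dict of rows per region

    def _region_index(self, key):
        # Rightmost region whose start key is <= key.
        return bisect.bisect_right(self.start_keys, key) - 1

    def put(self, key, value):
        i = self._region_index(key)
        self.regions[i][key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def get(self, key):
        return self.regions[self._region_index(key)].get(key)

    def _split(self, i):
        # Split at the median key so both halves hold roughly equal rows.
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]
        lower = {k: v for k, v in self.regions[i].items() if k < mid}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid}
        self.regions[i] = lower
        self.start_keys.insert(i + 1, mid)
        self.regions.insert(i + 1, upper)


table = RangePartitionedTable(max_rows_per_region=4)
for n in range(10):
    table.put(f"user{n:03d}", {"visits": n})
```

The caller only ever issues `put` and `get`; the table decides when and where to split, which is the sense in which the application carries no sharding logic.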
Built-in HA/DR. The HA/DR capabilities of the MapR Converged Data Platform also include MapR-DB data. The MapR Control System (MCS) handles cluster administration as well as database-specific administration such as creating and modifying tables. A command line interface (CLI) and REST API are also available for administration.
Access controls. Access Control Expressions (ACEs) set permissions at several levels, down to individual columns and sub-documents, using combinations of users, groups, and roles.
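As a rough illustration of how an expression over users, groups, and roles might be evaluated, the sketch below parses a small Boolean grammar with `u:`, `g:`, and `r:` terms and the operators `&`, `|`, and `!`. The grammar, the operator precedence (`!` binds tightest, then `&`, then `|`), and the `evaluate` function are assumptions made for this example; consult the MapR documentation for the authoritative ACE syntax.

```python
import re

# One token: a parenthesis, an operator, or a u:/g:/r: term.
TOKEN = re.compile(r"\s*([()&|!]|[ugr]:[\w.-]+)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad token at position {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, user, groups=(), roles=()):
    """Return True if the given identity satisfies the expression."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()       # always parse the right side to consume tokens
            result = result or rhs
        return result

    def parse_and():
        result = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return result
        if tok is None or ":" not in tok:
            raise ValueError(f"unexpected token {tok!r}")
        kind, name = tok.split(":", 1)
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        return name in roles        # kind == "r"

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


rule = "(u:alice | g:admins) & !r:contractor"
alice_allowed = evaluate(rule, "alice")
bob_allowed = evaluate(rule, "bob", groups={"admins"}, roles={"contractor"})
```

Here `alice` matches by user name, while `bob` is in the `admins` group but is excluded by the negated role term.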
Kerberos and LDAP integration. MapR-DB can authenticate users with Kerberos and/or LDAP.
Native authentication. MapR also offers a standards-based authentication system as a simpler alternative to Kerberos that leverages Linux Pluggable Authentication Modules (PAM) to provide the widest registry support.
Comprehensive auditing. MapR-DB audit logs help analyze user behavior and meet regulatory compliance requirements. MapR-DB uses the JSON format to log accesses at various levels, including the column and sub-document levels. MapR also audits at the administrative, authentication, and file levels.
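Because the audit logs are JSON, they are straightforward to analyze with ordinary tooling. The sketch below counts denied operations per user from a few sample records. The record layout (`timestamp`, `user`, `operation`, `table`, `field`, `status`) and the sample values are invented for illustration and should not be read as the actual MapR-DB audit schema.

```python
import json
from collections import Counter

# Hypothetical audit records, one JSON object per line. The field names
# are illustrative assumptions, not the real MapR-DB log format.
SAMPLE_LOG = """\
{"timestamp": "2016-09-01T10:00:00Z", "user": "alice", "operation": "read", "table": "/apps/users", "field": "profile.email", "status": "ok"}
{"timestamp": "2016-09-01T10:00:02Z", "user": "bob", "operation": "update", "table": "/apps/users", "field": "profile.salary", "status": "denied"}
{"timestamp": "2016-09-01T10:00:05Z", "user": "bob", "operation": "read", "table": "/apps/users", "field": "profile.salary", "status": "denied"}
{"timestamp": "2016-09-01T10:00:09Z", "user": "carol", "operation": "read", "table": "/apps/orders", "field": "total", "status": "ok"}
"""


def denied_by_user(lines):
    """Count audit records whose status is 'denied', grouped by user."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record["status"] == "denied":
            counts[record["user"]] += 1
    return counts


denied = denied_by_user(SAMPLE_LOG.splitlines())
```

The same loop extends naturally to other questions, such as which sub-document fields are accessed most often, by grouping on a different key.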
MapR-DB is built on the core MapR platform services layer that set records on both the TeraSort and the MinuteSort benchmarks. Recently, MapR-DB ran over 30,000 batch put operations per second on one node, and showed as much as an elevenfold speed advantage over HBase. With its in-memory feature, MapR-DB can store a database in memory for additional performance gains.
Continuous Low Latency
Auto-tuning and data structure innovations ensure consistently low latency, even at the 95th and 99th percentile latency measurements. MapR (in red on the graph) consistently responds quickly, while the other distribution (in blue) shows frequent high-latency spikes due to suboptimal disk cleanup.
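Percentile measurements matter because an average can hide occasional slow responses. The following sketch computes nearest-rank percentiles over a made-up set of latency samples; the sample values are invented for illustration and are not MapR-DB benchmark results.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Invented data: mostly fast responses plus two slow outliers.
latencies_ms = [2.0] * 94 + [3.0] * 4 + [180.0, 250.0]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

In this data set the median and 95th percentile stay within a few milliseconds, while the 99th percentile lands on one of the outliers, which is why tail percentiles are the usual way to expose latency spikes.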
Users can scale out their MapR clusters linearly and incrementally, and manage data sets with millions of columns across trillions of rows.
MapR-DB can scale to:
- Thousands of nodes in a cluster
- Trillions of tables
- Trillions of records per table
- 64 column families per table
- Cell/document sizes up to 2 GB
MapR-DB features include:
- Multi-model support with a JSON-based document API, and Apache HBase Java and C APIs
- HA/DR, instant recovery, and point-in-time recovery
- Multi-master, real-time table replication
- Strong data consistency
- Row/document-level ACID transactions
- Real-time updates to search indexes
- Integration with Hadoop
- Access control expressions (ACE) for fine-grained authorization
- Kerberos/LDAP integration, and easy-to-configure native authentication
- Comprehensive auditing at various granular levels
- In-memory database options for even faster speeds
- Consistently fast responsiveness
- Petabyte scalability