A Quick Video Explanation of the MapR Converged Data Platform

One Platform for Big Data Applications

With MapR, data does not need to be moved to specialized silos for processing; it can be processed in place. We have applied the concept of "Polyglot Persistence" to the MapR Platform, so you can work with multiple data types and formats directly, depending on your use case. The MapR Converged Data Platform enables direct processing of files, tables, and event streams. The MapR Platform also makes it easier to reuse existing applications and solutions by supporting POSIX-compliant, industry-standard NFS. Additionally, containerized applications can use the MapR Persistent Application Client Containers to securely access MapR platform services (MapR-FS, MapR-DB, MapR Streams) as a persistent data store.

To support a diverse set of applications and users, the platform also includes a range of enterprise-grade features: unified security, a global namespace, high availability, data protection, and disaster recovery support; multi-tenancy and volume support; data and job placement control, so applications can be selectively executed in a cluster to take advantage of faster CPUs or SSD drives; and support for heterogeneous hardware clusters.


MapR sets MinuteSort record using Google Compute Engine and MapR Distribution for Apache Hadoop


MapR provides world-record performance for MapReduce operations. MapR holds the MinuteSort world record by sorting 1.5 TB of data in one minute. The previous record was less than 600 GB. With an advanced architecture that is built in C/C++ and combines distributed metadata with an optimized shuffle process, MapR delivers consistently high performance.

MapR and the Easiest Access to Hadoop Data

File-Based Applications

MapR is a 100% POSIX-compliant system that fully supports random read-write operations. Because MapR supports industry-standard NFS, users can mount a MapR cluster and execute any file-based application, written in any language, directly on the data residing in the cluster. All standard tools in the enterprise, including browsers, UNIX tools, spreadsheets, and scripts, can access the cluster directly without any modifications.
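As a rough illustration of what "any file-based application" means in practice, the sketch below uses only ordinary Python file calls to write a file and then overwrite a few bytes in the middle of it (a random write). Nothing in it is MapR-specific. The `MAPR_MOUNT` environment variable and the `patch_in_place` helper are hypothetical names introduced for this example, and the assumption is that the cluster has already been NFS-mounted at whatever directory that variable points to; without it, the script falls back to a local temporary directory.

```python
import os
import tempfile
from pathlib import Path


def patch_in_place(path: Path) -> bytes:
    """Create a file, then overwrite five bytes in the middle of it."""
    path.write_bytes(b"hello world\n")
    # "r+b" opens the existing file for reading and writing without truncating.
    with open(path, "r+b") as f:
        f.seek(6)            # jump to the start of "world"
        f.write(b"MapR!")    # overwrite exactly five bytes in place
    return path.read_bytes()


if __name__ == "__main__":
    # MAPR_MOUNT is a hypothetical variable: set it to the directory where the
    # cluster is NFS-mounted. A local temp directory is used if it is unset.
    base = os.environ.get("MAPR_MOUNT") or tempfile.mkdtemp()
    print(patch_in_place(Path(base) / "demo.txt"))
```

The same code runs unchanged against a local disk or an NFS mount, which is the point of the POSIX claim above: the application does not need to know where the file lives.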

Interactive SQL

A number of applications support SQL access to data stored in MapR, including Apache Drill, Apache Hive, Spark SQL, Impala, and others. MapR is a key contributor to Apache Drill, a query engine for Hadoop, NoSQL, and cloud storage. Drill provides the first schema-free SQL engine for big data, enabling instant self-service data exploration across multi-structured data including NoSQL and Hadoop as well as traditional RDBMS. With ANSI SQL compatibility, Drill supports all of the standard tools that the enterprise uses to build and run SQL queries.
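To make the schema-free idea concrete, here is a minimal sketch of submitting a SQL query to Drill over its REST interface from Python. It assumes a Drillbit is reachable with its web port at the default 8047 and that a `dfs` storage plugin is configured; the host name, the file path in the query, and the helper names `build_drill_request` and `run_query` are placeholders invented for this example.

```python
import json
import urllib.request


def build_drill_request(host: str, sql: str, port: int = 8047) -> urllib.request.Request:
    """Build a POST request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=body,  # supplying a body makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )


def run_query(host: str, sql: str) -> dict:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_drill_request(host, sql)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical host and file path; the JSON file is queried directly,
    # without defining a table or schema first.
    sql = "SELECT * FROM dfs.`/data/events.json` LIMIT 5"
    print(run_query("drill.example.com", sql))
```

Note that the query names a raw JSON file rather than a predefined table, which is what "schema-free" refers to here.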

A Quick Video Explanation of MapR-DB


MapR has removed the trade-offs organizations face when looking to deploy a NoSQL solution. Specifically, MapR-DB delivers ease of use, dependability, and performance advantages for HBase applications, and supports both key-value data and native JSON documents. MapR provides scale, strong consistency, reliability, and continuous low latency with an architecture that eliminates delays due to compactions or consistency checks. MapR-DB is ideal for big data use cases such as Internet of Things data analytics, and was demonstrated to load 100 million data points per second on only 4 nodes.

A Quick Video Explanation of MapR Streams

Stream Processing

MapR Streams is a global publish-subscribe event streaming system for big data. It connects data producers and consumers worldwide in real time, with unlimited scale. MapR Streams makes real-time data instantly available to stream processing and other frameworks, providing:
  • Utility-grade reliability with a self-healing architecture that has no single point of failure
  • Out-of-the-box integration with popular stream processing frameworks like Spark Streaming, Storm, Flink, and Apex
  • Kafka API for real-time producers and consumers for easy application migration