One Platform for Big Data Applications

With MapR, data does not need to be moved into specialized silos for processing; it can be processed in place. We have applied the concept of "polyglot persistence" to the MapR platform: multiple data types and formats can be used directly, depending on your use case. The MapR Distribution's unified architecture enables direct processing of both files and tables, and support for industry-standard NFS with full POSIX semantics makes it easy to reuse existing applications and solutions.

A range of enterprise-grade features supports this diverse set of applications and users: high availability, data protection, and disaster recovery; multi-tenancy with volume support; data and job placement control, so that applications can be run on selected nodes in a cluster to take advantage of faster CPUs or SSD drives; and support for heterogeneous hardware within a single cluster.

KEY FEATURES

MapReduce

MapR provides world-record performance for MapReduce operations on Hadoop. MapR holds the MinuteSort world record, sorting 1.5 TB of data in one minute; the previous Hadoop record was under 600 GB. Its architecture, implemented in C/C++, combines distributed metadata with an optimized shuffle to deliver consistently high performance.
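
Because MapR implements the standard Hadoop MapReduce API, existing jobs run on the platform unchanged. As a minimal sketch, here is the canonical word-count job written against the stock Hadoop API; the input and output paths are supplied on the command line:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);   // emit (word, 1) for each token
          }
        }
      }

      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();           // sum the counts for this word
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar and launched with the usual hadoop jar invocation, the job reads from and writes to the MapR cluster like any other Hadoop job.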

File-Based Applications

MapR is a 100% POSIX-compliant system that fully supports random read-write operations. Because it supports industry-standard NFS, users can mount a MapR cluster and run any file-based application, written in any language, directly against the data residing in the cluster. Standard enterprise tools, including browsers, UNIX utilities, spreadsheets, and scripts, can access the cluster directly without any modification.
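
As a small illustration, and assuming the cluster is NFS-mounted at the hypothetical mount point /mapr/my.cluster.com, plain Java file I/O reads and writes cluster data with no Hadoop API involved:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.List;

    public class NfsAccess {
      public static void main(String[] args) throws IOException {
        // Hypothetical NFS mount point of the MapR cluster.
        Path logFile = Paths.get("/mapr/my.cluster.com/apps/logs/events.log");

        // Append a record using ordinary file I/O; the write lands
        // directly in the distributed file system.
        String record = "2014-01-01T00:00:00Z user=alice action=login\n";
        Files.write(logFile, record.getBytes(StandardCharsets.UTF_8),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);

        // Read it back the way any file-based application would.
        List<String> lines = Files.readAllLines(logFile, StandardCharsets.UTF_8);
        for (String line : lines) {
          System.out.println(line);
        }
      }
    }

The same path is visible to shell scripts, spreadsheets, or any other tool on a machine that has the cluster mounted.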

Interactive SQL

A number of applications provide SQL access to data stored in MapR, including Apache Hive, Hadapt, and others. MapR is also spearheading the development of Apache Drill, which brings ANSI SQL capabilities to Hadoop. Inspired by Google's Dremel project, Apache Drill delivers low-latency, interactive queries over large-scale distributed datasets. Drill supports nested and hierarchical data structures as well as schema discovery, and can work with NoSQL stores, Hadoop, and traditional RDBMSs. With ANSI SQL compatibility, Drill works with the standard tools enterprises already use to build and run SQL queries.
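
Since these engines speak SQL over standard interfaces, existing tooling connects to them as it would to any database. The sketch below queries Hive over JDBC; the HiveServer2 endpoint, table name, and credentials are illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
      public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the endpoint is illustrative.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:hive2://hiveserver.example.com:10000/default", "user", "");
        try {
          Statement stmt = conn.createStatement();
          // Standard SQL executed against data stored in the cluster.
          ResultSet rs = stmt.executeQuery(
              "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page");
          while (rs.next()) {
            System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
          }
        } finally {
          conn.close();
        }
      }
    }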

Database

MapR removes the trade-offs organizations face when deploying a NoSQL solution. Specifically, MapR delivers ease of use, dependability, and performance advantages for HBase applications. MapR provides scale, strong consistency, reliability, and consistently low latency with an architecture that requires neither compactions nor background consistency checks. From a performance standpoint, MapR delivers over a million operations per second from just a 10-node cluster.
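
MapR tables are accessed through the standard HBase client API, so a conventional HBase program works as-is; on MapR, a table can also be named by a path in the cluster namespace. A minimal put/get sketch with illustrative table and column names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TableExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // On MapR, the table name can be a path in the file system namespace.
        HTable table = new HTable(conf, "/apps/user_profiles");
        try {
          // Write one cell: row "alice", family "info", qualifier "city".
          Put put = new Put(Bytes.toBytes("alice"));
          put.add(Bytes.toBytes("info"), Bytes.toBytes("city"),
              Bytes.toBytes("Berlin"));
          table.put(put);

          // Read it back.
          Get get = new Get(Bytes.toBytes("alice"));
          Result result = table.get(get);
          byte[] city = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
          System.out.println("city = " + Bytes.toString(city));
        } finally {
          table.close();
        }
      }
    }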

Search

MapR is the first Hadoop distribution to integrate enterprise-grade search. On a single platform, customers can perform predictive analytics, full search and discovery, and advanced database operations. The MapR search capability works directly on Hadoop data, but it can also index and search standard files without any conversion or transformation. All search content and results are protected with enterprise-grade high availability and data protection, including snapshots and mirrors, enabling a full restore of search capabilities.
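
As one possible illustration, assuming the integrated search layer exposes a Solr-compatible interface (MapR's enterprise search integration is based on LucidWorks Search, which builds on Apache Solr), a standard SolrJ client can query indexed cluster data; the endpoint, collection, and field names below are illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class SearchExample {
      public static void main(String[] args) throws Exception {
        // Illustrative endpoint for the Solr-based search service.
        HttpSolrServer solr = new HttpSolrServer(
            "http://search.example.com:8888/solr/collection1");

        // Full-text query over documents indexed from cluster data.
        SolrQuery query = new SolrQuery("body:hadoop");
        query.setRows(10);
        QueryResponse response = solr.query(query);

        for (SolrDocument doc : response.getResults()) {
          System.out.println(doc.getFieldValue("id"));
        }
      }
    }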

Stream Processing

MapR provides a dramatically simplified architecture for real-time stream-processing engines such as Apache Storm. Streaming data feeds can be written directly to the MapR platform for long-term storage and MapReduce processing. Because data streams can be written directly to the cluster, administrators can eliminate separate queuing systems such as Kafka or Kestrel and implement publish-subscribe patterns within the data platform itself. Storm can then 'tail' a file to which it wishes to subscribe: as soon as new data hits the file system, it is injected into the Storm topology. The result is strong Storm/Hadoop interoperability and the unification of these technologies on one enterprise Hadoop platform.
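
A sketch of the tailing pattern just described: a Storm spout that follows a file on the NFS-mounted cluster and emits each newly appended line as a tuple. The mount path is illustrative, and the offset handling is deliberately minimal (a production spout would also deal with partial lines and failure recovery):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.Map;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class FileTailSpout extends BaseRichSpout {
      // Illustrative path: a stream file on the NFS-mounted MapR cluster.
      private static final String STREAM_FILE =
          "/mapr/my.cluster.com/streams/events.log";

      private SpoutOutputCollector collector;
      private RandomAccessFile reader;
      private long offset;

      @Override
      public void open(Map conf, TopologyContext context,
                       SpoutOutputCollector collector) {
        this.collector = collector;
        try {
          reader = new RandomAccessFile(STREAM_FILE, "r");
          offset = reader.length();   // start at the end: emit only new data
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      @Override
      public void nextTuple() {
        try {
          reader.seek(offset);
          String line;
          while ((line = reader.readLine()) != null) {
            collector.emit(new Values(line));  // each new line becomes a tuple
          }
          offset = reader.getFilePointer();
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
      }
    }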