MapR supports mission-critical and real-time Big Data analytics across industries. It is used in financial services, retail, media, healthcare, manufacturing, telecommunications and government organizations, as well as by leading Fortune 100 and Web 2.0 companies. The MapR platform for Big Data can be used for a variety of use cases, from batch applications that leverage MapReduce with data sources such as clickstreams to real-time applications that leverage sensor data. The MapR platform for Apache Hadoop™ integrates a growing set of functions including MapReduce, file-based applications, interactive SQL, NoSQL databases, search and discovery, and real-time stream processing. With MapR, data does not need to be moved to specialized silos for processing; it can be processed in place.
This full range of applications and data sources benefits from MapR’s enterprise-grade platform and unified architecture for files and tables. The MapR platform provides high availability, data protection and disaster recovery to support mission-critical applications. The MapR platform also makes it easier to leverage existing applications and solutions by supporting industry-standard interfaces such as NFS. To support a diverse set of applications and users, MapR also provides multi-tenancy features and volume support. These features include support for heterogeneous hardware within a cluster, along with data and job placement control, so applications can be selectively executed on the nodes in a cluster that have faster CPUs or SSD drives.
MapR brings unmatched dependability, ease-of-use, and world-record speed to Hadoop, NoSQL, database and streaming applications in one unified Big Data platform.
MapReduce
MapR provides world-record performance for MapReduce operations on Hadoop. MapR holds the MinuteSort world record, sorting 1.5 TB of data in one minute. The previous Hadoop record was less than 600 GB. With an advanced architecture that is built in C/C++ and combines distributed metadata with an optimized shuffle process, MapR delivers consistently high performance.
File-Based Applications
MapR is a 100% POSIX-compliant system that fully supports random read-write operations. Because MapR supports industry-standard NFS, users can mount a MapR cluster and execute any file-based application, written in any language, directly on the data residing in the cluster. All standard tools in the enterprise, including browsers, UNIX tools, spreadsheets, and scripts, can access the cluster directly without any modifications.
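As an illustration of what random read-write access over NFS makes possible, the following minimal sketch updates one fixed-width record in the middle of a file using only ordinary file calls. The record layout, the function names, and the file name are assumptions made for this example and are not part of MapR. A temporary directory stands in for the mount point so the sketch runs anywhere; on a mounted cluster the same calls would simply be pointed at a path under the mount.

```python
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record width, in bytes


def create_records(path, count):
    """Create a file holding `count` blank fixed-width records."""
    with open(path, "wb") as f:
        f.write(b" " * (RECORD_SIZE * count))


def write_record(path, index, payload):
    """Overwrite record `index` in place with a seek followed by a write."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload does not fit in one record")
    with open(path, "r+b") as f:
        f.seek(index * RECORD_SIZE)
        f.write(payload.ljust(RECORD_SIZE, b" "))


def read_record(path, index):
    """Read record `index` without scanning the records before it."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        return f.read(RECORD_SIZE).rstrip(b" ")


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount point (assumption).
    with tempfile.TemporaryDirectory() as base:
        path = os.path.join(base, "records.dat")
        create_records(path, 3)
        write_record(path, 1, b"updated")
        print(read_record(path, 1))
```

Nothing in the sketch is specific to a cluster client library, which is the point: an application written against standard file I/O needs no changes to work on data reached through an NFS mount.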
SQL
A number of applications support SQL access to data stored in MapR, including Hive, Hadapt and others. MapR is also spearheading the development of Apache Drill, which brings ANSI SQL capabilities to Hadoop. Apache Drill, inspired by Google’s Dremel project, delivers low-latency interactive query capability for large-scale distributed datasets. Apache Drill supports nested/hierarchical data structures and schema discovery, and is capable of working with NoSQL and Hadoop data stores as well as traditional RDBMSs. With ANSI SQL compatibility, Drill supports all of the standard tools that the enterprise uses to build and run SQL queries.
Database
MapR has removed the trade-offs organizations face when looking to deploy a NoSQL solution. Specifically, MapR delivers ease of use, dependability and performance advantages for HBase applications. MapR provides scale, strong consistency, reliability and continuous low latency with an architecture that does not require compactions or background consistency checks. From a performance standpoint, MapR delivers over a million operations per second from just a 10-node cluster.
Search
MapR is the first Hadoop distribution to integrate enterprise-grade search. On a single platform, customers can now perform predictive analytics, carry out full search and discovery, and conduct advanced database operations. The MapR enterprise-grade search capability works directly on Hadoop data but can also index and search standard files without having to perform any conversion or transformation. All search content and results are protected with enterprise-grade high availability and data protection, including snapshots and mirrors that enable a full restore of search capabilities.
By integrating the search technology of the industry leader, LucidWorks, MapR and its customers benefit from the added value that LucidWorks Search delivers in the areas of security, connectivity and user management for Apache Lucene/Solr.
Stream Processing
MapR provides a dramatically simplified architecture for real-time stream computational engines such as Storm. Streaming data feeds can be written directly to the MapR platform for Hadoop for long-term storage and MapReduce processing. Because MapR enables data streams to be written directly to the MapR cluster, administrators can eliminate queuing systems such as Kafka or Kestrel and implement publish-subscribe models within the data platform. Storm can then ‘tail’ a file to which it wishes to subscribe, and as soon as new data hits the file system, it is injected into the Storm topology. This allows for strong Storm/Hadoop interoperability, and a unification and simplification of technologies onto one platform.
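The ‘tail’ pattern described above can be sketched in a few lines. This is a minimal illustration in Python, not Storm's spout API and not MapR code: the function names, the polling interval, and the idle limit are all assumptions made for the example. A subscriber keeps a file open, hands each newly appended complete line to the consumer, and leaves a partially written line unread until its newline arrives.

```python
import time


def read_new_lines(handle):
    """Return the complete lines appended since the last call.

    `handle` must be opened in binary mode. A trailing line with no
    newline yet is treated as still being written and is left unread.
    """
    lines = []
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            break  # reached the current end of the file
        if not line.endswith(b"\n"):
            handle.seek(position)  # partial line: retry on the next poll
            break
        lines.append(line[:-1].decode("utf-8"))
    return lines


def follow(path, poll_interval=0.5, max_idle_polls=None):
    """Yield lines as they are appended to `path` (polling, like tail -f).

    Stops after `max_idle_polls` consecutive empty polls, or runs until
    interrupted when `max_idle_polls` is None.
    """
    with open(path, "rb") as handle:
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            lines = read_new_lines(handle)
            if lines:
                idle = 0
                yield from lines
            else:
                idle += 1
                time.sleep(poll_interval)
```

In this sketch a publisher is any process that appends lines to the file, and each subscriber is an independent `follow` loop over the same path, so the file itself plays the role that a separate queue would otherwise fill.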