White papers

Key Considerations When Productionizing Hadoop

Download now and learn the critical questions to ask before selecting a Hadoop distribution for your production environment.

The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014

Forrester Research, Inc. says Apache™ Hadoop® is transforming how companies store, process, analyze, and share big data. See who’s leading the way for big data Hadoop solutions.

451 Research: MapR ties YARN and HP Vertica to its MapR Distribution

MapR has become the latest distributor of Apache Hadoop to update its distribution to the Hadoop 2.x code base, including the Apache YARN resource management framework. The company has also expanded its SQL-on-Hadoop strategy to embrace HP's Vertica Analytics Platform as part of an open approach that will support various SQL-on-Hadoop technologies and projects.

Hadoop in the Enterprise: MapR - Informatica White Paper

Informatica offers the industry’s leading independent data integration platform, which uniquely enables organizations to maximize the return on big data and support top business imperatives. Informatica is also integrated with Hadoop, which is purpose-built for processing big data effectively and affordably, and specifically with the MapR Distribution for Hadoop, which improves performance, scalability, reliability, and ease of data access.

MapR M7 Performance – Client Advisory Note

According to the Evaluator Group, MapR M7 demonstrates consistent, real-time performance.

HP Reference Architecture for MapR M7

This white paper provides several performance-optimized configurations for deploying clusters of varying sizes running the MapR M7 distribution of Apache Hadoop on HP infrastructure. These configurations significantly reduce complexity while increasing value and performance.

HP Reference Architecture for MapR M5

This paper provides several performance-optimized configurations for deploying clusters of varying sizes running the MapR M5 distribution of Apache Hadoop on HP infrastructure. The reference architecture configurations for MapR M5 provide a significant reduction in complexity, faster time to value, and improved performance. This paper is intended to assist in the rapid design and deployment of MapR M5 software on HP infrastructure for clusters of various sizes.

Apache Drill: Interactive Ad-Hoc Analysis at Scale

Apache Drill is a distributed system for interactive ad-hoc analysis of large-scale datasets. In this article, we introduce Drill's architecture, discuss its extensibility points, and put it into the context of the emerging offerings in the interactive analytics realm.

MapR White Paper: High Availability in the Hadoop Ecosystem

The MapR Distribution for Apache Hadoop provides high availability with no single point of failure across the entire stack. In the storage layer, MapR’s Distributed NameNode HA™ architecture provides high availability with self-healing and support for multiple simultaneous failures, without requiring any additional hardware.

The MapR Distribution for Apache Hadoop: Learn how MapR makes Hadoop Easy, Dependable and Fast.

The MapR Distribution for Apache Hadoop adds innovation to the excellent work already done by a large community of developers. With key new technology advances, MapR transforms Hadoop into a dependable and interactive system with real-time data flows.

Quantifying the Value of MapR

MapR makes development, administration, end-user file access, and insight much simpler and faster. Developed specifically for high availability and data protection, MapR delivers 100% uptime for your business analytics processes, recovery from user and application errors, and strong protection against data loss.

Evaluator Group: Evaluating Hadoop in the Data Center

What will make Hadoop an enterprise data center-grade analytics platform?

CITO Research: Choosing a Provider from the Hadoop Ecosystem

Enterprises are faced with new requirements for data. We now have big data that is different from the structured, cleansed corporate data repositories of the past. Before, we had to plan out structured queries. In the Hadoop world, we don’t have to sort data according to a predetermined schema when we collect it. We can store data as it arrives and decide what to do with it later. Today, there are different ways to analyze data collected in Hadoop—but which one is the best way forward?

Intelligent Business Strategies: Offloading and Accelerating Data Warehouse ETL Processing Using Hadoop

Mike Ferguson, Managing Director of Intelligent Business Strategies and former Chief Architect at Teradata, discusses offloading ETL processing to a much lower-cost Hadoop platform, where it can scale to handle increasing transaction volumes and integrate that data with newer, more complex, high-value data types such as clickstream data and unmodeled multi-structured data. Hadoop can serve as a long-term store for big data and for archived data warehouse data, and as an analytical platform for workloads that are unlikely to run in traditional data warehouses.

Evaluator Group: Advancing Hadoop - MapR M7 Edition

The number of enterprise-level deployments of Hadoop MapReduce is rising quickly, driven by a need to understand and potentially adopt this new business analytics platform for business applications. We note that pilot Hadoop projects are underway within many of the Fortune 1000 group of companies. Responding to this demand, the Hadoop ecosystem is now offering "enterprise" versions of Hadoop.

Kusnetzky Group: MapR Technologies M7 Making Big Data Work for Everyone

Big Data is emerging as an important tool to help organizations learn more about their business operations, product performance, and customer purchasing behavior. It is often misunderstood by the media, making it difficult for organizations to determine whether investing in this tool will bring results, such as improved efficiency, better products and services, or a better understanding of customer requirements.

MapR and LucidWorks: Crowd Sourcing Reflected Intelligence Using Search and Big Data

Search has evolved in recent years beyond keyword search into a more broadly applicable information discovery tool by using principles of reflected intelligence. Learn how several organizations combine big data, search and reflected intelligence to improve search results and decision-making, and how LucidWorks and MapR work together to make it possible for organizations to get started using reflected intelligence in their search applications.

Datasheets

MapR: M7 Edition

The M7 Edition is an enterprise-grade platform for NoSQL and Hadoop, providing unique ease-of-use, dependability, and performance advantages. M7 removes the trade-offs organizations face when looking to deploy a NoSQL solution. M7 not only delivers enterprise-grade features such as Instant Recovery, Snapshots, and Mirroring but also provides scale, strong consistency, reliability, and continuous low latency.

MapR: M5 Edition

The M5 Edition is an enterprise-grade platform for Hadoop that includes features such as Mirroring, Snapshots, NFS HA, data placement control, and more. The M5 Edition also offers full support, on-demand patches, and online incident submission.

MapR: M3 Edition

The M3 Edition is free and available for unlimited production use. Support is provided on a community basis and through MapR's Forums.

MapR Professional Services Datasheet

MapR Professional Services brings world-class expertise to help you get the most out of your Hadoop investment. This datasheet describes the offerings from the MapR Professional Services team: from implementation to data migration to tuning and optimization to data engineering and advanced analytics, they will work with you every step of the way.

MapR Support

MapR Premium Support offers world-class support engineers, thorough documentation and MapR forums to make your ramp-up easy and ensure the smooth operation of your mission-critical applications. MapR also offers a full range of Service Level Agreements (SLAs) to match your business needs.

MapR Academy

At MapR, we’ve developed a full range of training resources to help you understand and leverage the power of the industry’s most advanced distribution for Apache Hadoop. By taking advantage of our wide breadth of MapR Academy training, ranging from instructor-led courses to videos, you’ll soon be on your way to creating real solutions for Big Data.

E-Books

The Executive's Guide to Big Data and Apache Hadoop

This special edition ebook, from the author of Hadoop for Dummies, contains everything you need to know to get started with big data and Hadoop. Download this ebook to learn critical big data concepts and trends, real-world applications of Hadoop in production, and 10 things to look for when evaluating Hadoop technology.

Hadoop Buyer's Guide

Increasing numbers of enterprises are turning to Hadoop as an indispensable component of the mission-critical applications that drive their core business operations. This ebook, by the author of Hadoop for Dummies, presents a series of guidelines you can use when searching for the essential Hadoop infrastructure that will sustain your organization for years to come.

Practical Machine Learning

Machine learning is a critical tool for gaining actionable insight and relevant inferences from your ever-increasing amount of data. In this guide, authors and Mahout committers Ted Dunning and Ellen Friedman shed light on a more approachable recommendation engine design and the business advantages of leveraging this innovative implementation style.

Tech briefs

MapR Direct Access NFS

MapR is the only distribution for Apache™ Hadoop® that leverages the full power of NFS. The POSIX-compliant MapR platform can be exported via NFS to perform fully random read-write operations on files stored in Hadoop.

MapR Snapshots

Snapshots are intended to provide point-in-time recovery, that is, to provide the ability to recover the data to a precise and consistent state in the past. This tech brief discusses how MapR Snapshots do that, along with other benefits of MapR Snapshots.

Stream Processing with MapR

This tech brief delves into data stream processing on Apache™ Hadoop® in the context of the Lambda Architecture, a useful framework for thinking through the architectural layout of big data systems.

Cisco UCS CPA for Big Data with MapR - Tech Brief

As part of the Cisco Validated Design program, which consists of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments, this document is intended to assist solution architects, sales engineers, field consultants, professional services, IT managers, partner engineers, and customers in deploying MapR on the Cisco Common Platform Architecture (CPA) for Big Data.

IBM System x Reference Architecture for Hadoop: MapR

The MapR-validated reference architecture solution from IBM for Hadoop big data analytics is built around powerful, affordable, scalable System x servers and IBM networking solutions so you can deploy your MapR-validated solution more quickly.

RHadoop and MapR

RHadoop is an open source collection of three R packages, created by Revolution Analytics, that allow users to manage and analyze data with Hadoop from an R environment. It allows data scientists familiar with R to quickly combine the enterprise-grade capabilities of the MapR Hadoop distribution with the analytic capabilities of R.

Managing MapR Clusters on Google Compute Engine

This paper presents several techniques for those who wish to manage their own MapR installations on Google Compute Engine, and examines selected scenarios (migration across zones, disaster recovery, and high availability) that arise when dealing with long-lived clusters and operating across multiple zones.

Launching a MapR Cluster on Google Compute Engine

The MapR Distribution for Hadoop is fully integrated with the Google Compute Engine (GCE) framework, allowing customers to deploy a MapR cluster with ready access to Google’s cloud infrastructure.

MapR, Hive, Pig on Google Compute Engine

This paper describes how you can take advantage of Google Compute Engine, with support from Google Cloud Storage, and run a self-managed MapR cluster with Apache Hive and Apache Pig as part of a Big Data processing solution.

M7 Performance Benchmark Report

Apache HBase applications running on MapR M7 experience dramatic performance advantages compared to HBase applications running on other distributions.

HP Vertica and MapR Solution Brief

Unlike other approaches, with HP Vertica Analytics Platform and MapR, you can more quickly leverage existing SQL skills and BI tools to unlock insights from all your data in Hadoop.

Solution Briefs

Cisco UCS Common Platform Architecture for Big Data with MapR

MapR on the Cisco UCS® Common Platform Architecture for Big Data delivers a fully optimized Apache™ Hadoop® solution that provides lights-out data center capabilities and ease of use with superior performance for different classes of Hadoop applications.

MapR - Splunk Solution Brief: Explore, Analyze, and Visualize Data in Hadoop