White papers

Optimize Your Enterprise Architecture with Hadoop and NoSQL

Relational databases have endured for a reason – they fit well with the types of data that organizations use to run their business. One area where emerging technologies can complement relational database technologies is big data. With the rapidly growing volumes of data, along with the many new sources of data, organizations look for ways to relieve pressure from their existing systems. That’s where Hadoop and NoSQL come in.

Foundations for Data-Driven Enterprises: The Rapidly Evolving Hadoop-based Enterprise Data Hub

Enterprise Management Associates

Hadoop-based solutions have helped solve many of the historical challenges of data warehousing and advanced analytical processing. Recently, Hadoop has evolved, both in terms of open source Hadoop and commercial distributions, to the point where a far wider set of applications and users, in both business and IT, are able to take advantage of Hadoop-based “enterprise data hubs.”

The Hadoop Data Refinery and Enterprise Data Hub

Mike Ferguson, Intelligent Business Strategies

Companies today have seen a huge explosion in the volume of data that they generate. In order to analyze all of this data, companies are realizing that they can’t do it all with just a single enterprise data warehouse, and are turning to new big data platforms like Hadoop, stream processing engines, and NoSQL graph DBMSs to manage their big data workloads. Download this white paper, from Intelligent Business Strategies, to learn about a cost-effective, reliable way to implement a data management hub for your entire big data analytical ecosystem.

Key Considerations When Productionizing Hadoop

Robert D. Schneider

Download now and learn the critical questions to ask before selecting a Hadoop distribution for your production environment.

TDWI Best Practices Report: Evolving Data Warehouse Architectures in the Age of Big Data

Phillip Russom, The Data Warehousing Institute

This best practices report quantifies trends in data warehouse architectures, catalogs newly available, relevant technologies, and documents how successful organizations are evolving their architectures to leverage new business opportunities for big data.

The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014

Forrester Research

Forrester Research, Inc. says Apache™ Hadoop® is transforming how companies store, process, analyze, and share big data. See who’s leading the way for big data Hadoop solutions.

Apache Drill: Interactive Ad-Hoc Analysis at Scale

Michael Hausenblas and Jacques Nadeau

Apache Drill is a distributed system for interactive ad-hoc analysis of large scale datasets. In this article, we introduce Drill's architecture, discuss its extensibility points, and put it into the context of the emerging offerings in the interactive analytics realm.

MapR ties YARN and HP Vertica to its MapR Distribution

Matt Aslett, 451 Research Analyst

MapR has become the latest distributor of Apache Hadoop to update its distribution to the Hadoop 2.x code base, including the Apache YARN resource management framework. The company has also expanded its SQL-on-Hadoop strategy to embrace HP's Vertica Analytics Platform as part of an open approach that will support various SQL-on-Hadoop approaches and projects.

MapR M7 Performance – Client Advisory Note

John Webster, Evaluator Group

According to the Evaluator Group, MapR M7 demonstrates consistent, real-time performance.

Evaluating Hadoop in the Data Center

John Webster, Evaluator Group

What will make Hadoop an enterprise data center-grade analytics platform?

Choosing a Provider from the Hadoop Ecosystem

CITO Research

Enterprises are faced with new requirements for data. We now have big data that is different from the structured, cleansed corporate data repositories of the past. Before, we had to plan out structured queries. In the Hadoop world, we don’t have to sort data according to a predetermined schema when we collect it. We can store data as it arrives and decide what to do with it later. Today, there are different ways to analyze data collected in Hadoop—but which one is the best way forward?

Advancing Hadoop - MapR M7 Edition

John Webster, Evaluator Group

The number of enterprise-level deployments of Hadoop MapReduce is rising quickly, driven by a need to understand and potentially adopt this new business analytics platform for business applications. We note that pilot Hadoop projects are underway within many of the Fortune 1000 group of companies. Responding to this demand, the Hadoop ecosystem is now offering "enterprise" versions of Hadoop.

MapR Technologies M7 Making Big Data Work for Everyone

Dan Kusnetzky, The Kusnetzky Group

Big Data is emerging as an important tool to help organizations learn more about their business operations, product performance, and customer purchasing behavior. It is misunderstood by the media, which makes it difficult for organizations to determine whether investing in this tool will bring results and make it possible to improve efficiency, bring out better products and services, or better understand customer requirements.

Crowd Sourcing Reflected Intelligence Using Search and Big Data

Grant Ingersoll, Chief Scientist, LucidWorks; Ted Dunning, Chief Application Architect, MapR

Search has evolved in recent years beyond keyword search into a more broadly applicable information discovery tool by using principles of reflected intelligence. Learn how several organizations combine big data, search and reflected intelligence to improve search results and decision-making, and how LucidWorks and MapR work together to make it possible for organizations to get started using reflected intelligence in their search applications.


MapR M7 Edition

The M7 Edition is an enterprise-grade platform for NoSQL and Hadoop, providing unique ease of use, dependability and performance advantages. M7 has removed the trade-offs organizations face when looking to deploy a NoSQL solution. M7 not only delivers enterprise-grade features such as Instant Recovery, Snapshots and Mirroring but also provides scale, strong consistency, reliability and continuous low latency.

MapR M5 Edition

The M5 Edition is an enterprise-grade platform for Hadoop which includes features such as Mirroring, Snapshots, NFS HA, data placement control, and many more. The M5 Edition also offers full support, on-demand patches, and online incident submission.

MapR M3 Edition

The M3 Edition is free and available for unlimited production use. Support is provided on a community basis and through MapR's Forums.

Professional Services

MapR Professional Services brings world-class expertise to help you get the most out of your Hadoop investment. This datasheet describes the offerings from the MapR Professional Services team: from implementation to data migration to tuning and optimization to data engineering and advanced analytics, they will work with you every step of the way.

Premium Support

MapR Premium Support offers world-class support engineers, thorough documentation and MapR forums to make your ramp-up easy and ensure the smooth operation of your mission-critical applications. MapR also offers a full range of Service Level Agreements (SLAs) to match your business needs.

MapR Academy

At MapR, we’ve developed a full range of training resources to help you understand and leverage the power of the industry’s most advanced distribution for Apache Hadoop. By taking advantage of our wide breadth of MapR Academy training, ranging from instructor-led courses to videos, you’ll soon be on your way to creating real solutions for Big Data.

MapR Academy Course Catalog

The catalog lists all the courses available through MapR Academy in 2014.


A New Look At Anomaly Detection

This is the second book in the series Practical Machine Learning by Ted Dunning and Ellen Friedman.

Anomaly detection is the detective work of machine learning: finding the unusual, catching the fraud, discovering strange activity in large and complex data sets. From banking security to natural sciences, medicine, and marketing, anomaly detection has many useful applications in this age of big data. In this O’Reilly report, two committers of the Apache Mahout project use practical examples to explain how the underlying concepts of anomaly detection work.

Innovations in Recommendations

This is the first book in the series Practical Machine Learning by Ted Dunning and Ellen Friedman.

Machine Learning is a critical tool for gaining actionable insight and relevant inferences from your ever-increasing amount of data. In this guide, authors and Mahout committers Ted Dunning and Ellen Friedman shed light on a more approachable recommendation engine design and the business advantages of leveraging this innovative implementation style.

Hadoop Buyer's Guide

Robert D. Schneider

Increasing numbers of enterprises are turning to Hadoop as an indispensable component for the mission-critical applications that drive their core business operations. This ebook, by the author of Hadoop for Dummies, presents a series of guidelines that you can use when searching for the essential Hadoop infrastructure that will be sustaining your organization for years to come.

The Executive's Guide to Big Data and Apache Hadoop

Robert D. Schneider

This special edition ebook, from the author of Hadoop for Dummies, contains everything you need to know to get started with big data and Hadoop. Download this ebook to learn critical big data concepts and trends, real-world applications of Hadoop in production, and 10 things to look for when evaluating Hadoop technology.

Tech briefs

Multi-tenancy with MapR

Organizations seek to share IT resources cost-efficiently and securely among multiple applications, data, and user groups. Platforms that support this architecture are commonly known as multi-tenancy technologies. Big data platforms are increasingly expected to support multi-tenancy out of the box. The key to multi-tenancy is isolation of the distinct tenants, in terms of both the data contained in the data platform and the compute resources.

MapR Direct Access NFS

MapR is the only distribution for Apache™ Hadoop® that leverages the full power of NFS. The MapR POSIX-compliant platform can be exported via NFS to perform fully random read-write operations on files stored in Hadoop.

MapR Snapshots

Snapshots are intended to provide point-in-time recovery, that is, to provide the ability to recover the data to a precise and consistent state in the past. This tech brief discusses how MapR Snapshots do that, along with other benefits of MapR Snapshots.

HP Reference Architecture for MapR M7

This white paper provides several performance-optimized configurations for deploying the MapR M7 distribution of Apache Hadoop on HP infrastructure for clusters of varying sizes. These configurations provide a significant reduction in complexity and an increase in value and performance.

HP Reference Architecture for MapR M5

This paper provides several performance-optimized configurations for deploying the MapR M5 distribution of Apache Hadoop on HP infrastructure for clusters of varying sizes. The reference architecture configurations for MapR M5 provide a significant reduction in complexity, faster time to value, and an improvement in performance. This paper has been created to assist in the rapid design and deployment of MapR M5 software on HP infrastructure.

Stream Processing with MapR

This tech brief delves into data stream processing on Apache™ Hadoop® in the context of the Lambda Architecture, a useful framework for thinking through the architectural layout of big data systems.

Cisco UCS CPA for Big Data with MapR - Tech Brief

This document is part of the Cisco Validated Design program, which consists of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments. It is intended to assist solution architects, sales engineers, field consultants, professional services, IT managers, partner engineering, and customers in deploying MapR on the Cisco Common Platform Architecture (CPA) for Big Data.

IBM System x Reference Architecture for Hadoop: MapR

The MapR-validated reference architecture solution from IBM for Hadoop big data analytics is built around powerful, affordable, scalable System x servers and IBM networking solutions so you can deploy your MapR-validated solution more quickly.

RHadoop and MapR

RHadoop is an open source collection of three R packages created by Revolution Analytics that allow users to manage and analyze data with Hadoop from an R environment. It allows data scientists familiar with R to quickly utilize the enterprise-grade capabilities of the MapR Hadoop distribution directly with the analytic capabilities of R.

Managing MapR Clusters on Google Compute Engine

This paper presents several techniques for those who wish to manage their own MapR installations on Google Compute Engine. It also covers selected scenarios (migration across zones, disaster recovery, and high availability) that arise when dealing with long-lived clusters and operating across multiple zones.

Launching a MapR Cluster on Google Compute Engine

The MapR Distribution for Hadoop is fully integrated with the Google Compute Engine (GCE) framework, allowing customers to deploy a MapR cluster with ready access to Google’s cloud infrastructure.

MapR, Hive, Pig on Google Compute Engine

This paper describes how you can take advantage of Google Compute Engine, with support from Google Cloud Storage, and run a self-managed MapR cluster with Apache Hive and Apache Pig as part of a Big Data processing solution.

M7 Performance Benchmark Report

Apache HBase applications running on MapR M7 experience dramatic performance advantages compared to HBase applications running on other distributions.

Solution briefs

Data Warehouse Optimization with MapR and Informatica

Organizations seek more and larger data sets in their data warehouses (DW) to extract more value. They derive better insights when analyzing a complete picture of enterprise-wide data. The MapR/Informatica data warehouse optimization (DWO) solution lets organizations cost-effectively add more data, more types of data, and more capabilities to their data warehouse environments.

High Availability on MapR

High availability (HA) is the ability of a system to remain up and running despite unforeseen failures, avoiding unplanned downtime or service disruption. HA is a critical feature that businesses rely on to support customer-facing applications and service level agreements. Advanced HA features in the MapR Distribution for Hadoop provide numerous benefits to organizations trying to harness big data.

Business Intelligence at the Speed-of-Thought and the Scale-of-Big-Data

Apache Hadoop offers new ways to unearth valuable insights from information scattered across your company’s various departments and disparate technology systems. Hadoop can deliver unparalleled value in revealing new analytics-driven revenue streams, improving customer acquisition and retention, and increasing operational efficiencies.

Big Data and Apache Hadoop for the Banking and Securities Industry

Investment banks have been dealing with high-velocity data for a long time, but volume is a relatively new factor and is emerging as the strongest driver for banks to look at big data and Apache™ Hadoop®.

Enterprise Data Hub: Optimizing Your Data Architecture with Hadoop

Many organizations today face the challenges of big data and need a scalable, cost-effective way to manage their data growth and boost their enterprise data architecture with the ecosystem of technologies around Hadoop.

HP Vertica and MapR Solution Brief

Unlike other approaches, with HP Vertica Analytics Platform and MapR, you can more quickly leverage existing SQL skills and BI tools to unlock insights from all your data in Hadoop.

MapR - Splunk Solution Brief: Explore, Analyze, and Visualize Data in Hadoop

Cisco UCS Common Platform Architecture for Big Data with MapR

MapR on the Cisco UCS® Common Platform Architecture for Big Data delivers a fully optimized Apache™ Hadoop® solution that provides lights-out data center capabilities and ease of use with superior performance for different classes of Hadoop applications.

Big Data and Apache Hadoop for the Pharmaceutical Industry

The pharmaceutical industry is experiencing significant growth in the volume and variety of data from several sources, including the R&D process, retailers, patients, and caregivers. Sales and marketing functions in the pharmaceutical industry have been leading adopters of big data technology, and other functions are starting to move in that direction, especially Research and Development.