Relational databases have endured for a reason – they fit well with the types of data that organizations use to run their business. One area where emerging technologies can complement relational database technologies is big data. With rapidly growing volumes of data and many new sources of data, organizations are looking for ways to relieve pressure on their existing systems. That’s where Hadoop and NoSQL come in.
Enterprise Management Associates
Hadoop-based solutions have helped solve many of the historical challenges of data warehousing and advanced analytical processing. Recently, Hadoop has evolved, both in terms of open source Hadoop and commercial distributions, to the point where a far wider set of applications and users, in both business and IT, are able to take advantage of Hadoop-based “enterprise data hubs.”
Mike Ferguson, Intelligent Business Strategies
Companies today have seen a huge explosion in the volume of data that they generate. In order to analyze all of this data, companies are realizing that they can’t do it all with just a single enterprise data warehouse, and are turning to new big data platforms like Hadoop, stream processing engines, and NoSQL graph DBMSs to manage their big data workloads. Download this white paper, from Intelligent Business Strategies, to learn about a cost-effective, reliable way to implement a data management hub for your entire big data analytical ecosystem.
Robert D. Schneider
Download now and learn the critical questions to ask before selecting a Hadoop distribution for your production environment.
Phillip Russom, The Data Warehousing Institute
This best practices report quantifies trends in data warehouse architectures, catalogs newly available, relevant technologies, and documents how successful organizations are evolving their architectures to leverage new business opportunities for big data.
Forrester Research, Inc. says Apache™ Hadoop® is transforming how companies store, process, analyze, and share big data. See who’s leading the way for big data Hadoop solutions.
Michael Hausenblas and Jacques Nadeau
Apache Drill is a distributed system for interactive ad-hoc analysis of large scale datasets. In this article, we introduce Drill's architecture, discuss its extensibility points, and put it into the context of the emerging offerings in the interactive analytics realm.
Matt Aslett, 451 Research Analyst
MapR has become the latest distributor of Apache Hadoop to update its distribution to the Hadoop 2.x code base, including the Apache YARN resource management framework. The company has also expanded its SQL-on-Hadoop strategy to embrace HP's Vertica Analytics Platform as part of an open approach that will support various SQL-on-Hadoop approaches and projects.
John Webster, Evaluator Group
According to the Evaluator Group, MapR M7 demonstrates consistent, real-time performance.
John Webster, Evaluator Group
What will make Hadoop an enterprise data center-grade analytics platform?
Enterprises are faced with new requirements for data. We now have big data that is different from the structured, cleansed corporate data repositories of the past. Before, we had to plan out structured queries. In the Hadoop world, we don’t have to sort data according to a predetermined schema when we collect it. We can store data as it arrives and decide what to do with it later. Today, there are different ways to analyze data collected in Hadoop—but which one is the best way forward?
By John Webster, Evaluator Group
The number of enterprise-level deployments of Hadoop MapReduce is rising quickly, driven by a need to understand and potentially adopt this new business analytics platform for business applications. We note that pilot Hadoop projects are underway within many of the Fortune 1000 group of companies. Responding to this demand, the Hadoop ecosystem is now offering "enterprise" versions of Hadoop.
Dan Kusnetzky, The Kusnetzky Group
Big Data is emerging as an important tool to help organizations learn more about their business operations, product performance, and customer purchasing behavior. It is misunderstood by the media, which makes it difficult for organizations to determine whether investing in this tool will bring results and make it possible to improve efficiency, bring out better products and services, or better understand customer requirements.
Grant Ingersoll, Chief Scientist, LucidWorks, and Ted Dunning, Chief Application Architect, MapR
Search has evolved in recent years beyond keyword search into a more broadly applicable information discovery tool by using principles of reflected intelligence. Learn how several organizations combine big data, search and reflected intelligence to improve search results and decision-making, and how LucidWorks and MapR work together to make it possible for organizations to get started using reflected intelligence in their search applications.
The M7 Edition is an enterprise-grade platform for NoSQL and Hadoop, providing unique ease of use, dependability and performance advantages. M7 has removed the trade-offs organizations face when looking to deploy a NoSQL solution. M7 not only delivers enterprise-grade features such as Instant Recovery, Snapshots and Mirroring but also provides scale, strong consistency, reliability and continuous low latency.
MapR Professional Services brings world-class expertise to help you get the most out of your Hadoop investment. This datasheet describes the offerings from the MapR Professional Services team: from implementation to data migration to tuning and optimization to data engineering and advanced analytics, they will work with you every step of the way.
At MapR, we’ve developed a full range of training resources to help you understand and leverage the power of the industry’s most advanced distribution for Apache Hadoop. By taking advantage of our wide breadth of MapR Academy training, ranging from instructor-led courses to videos, you’ll soon be on your way to creating real solutions for Big Data.
The catalog lists all the courses available through MapR Academy in 2014.
This is the second book in the series Practical Machine Learning by Ted Dunning and Ellen Friedman
Anomaly detection is the detective work of machine learning: finding the unusual, catching the fraud, discovering strange activity in large and complex data sets. From banking security to natural sciences, medicine, and marketing, anomaly detection has many useful applications in this age of big data. In this O’Reilly report, two committers of the Apache Mahout project use practical examples to explain how the underlying concepts of anomaly detection work.
This is the first book in the series Practical Machine Learning by Ted Dunning and Ellen Friedman
Machine Learning is a critical tool for gaining actionable insight and relevant inferences from your ever-increasing amount of data. In this guide, authors and Mahout committers Ted Dunning and Ellen Friedman shed light on a more approachable recommendation engine design and the business advantages of leveraging this innovative implementation style.
Robert D. Schneider
Increasing numbers of enterprises are turning to Hadoop as an indispensable component for the mission-critical applications that drive their core business operations. This ebook, by the author of Hadoop for Dummies, presents a series of guidelines that you can use when searching for the essential Hadoop infrastructure that will be sustaining your organization for years to come.
Robert D. Schneider
This special edition ebook, from the author of Hadoop for Dummies, contains everything you need to know to get started with big data and Hadoop. Download this ebook to learn critical big data concepts and trends, real-world applications of Hadoop in production, and 10 things to look for when evaluating Hadoop technology.
Organizations seek to share IT resources cost-efficiently and securely among multiple applications, data, and user groups. Platforms that support this architecture are commonly known as multi-tenancy technologies. Big data platforms are increasingly expected to support multi-tenancy out of the box. The key to multi-tenancy is isolation of the distinct tenants, both in terms of the data contained in the data platform and in terms of compute.
MapR is the only distribution for Apache™ Hadoop® that leverages the full power of NFS. The POSIX-compliant MapR platform can be exported via NFS to perform fully random read-write operations on files stored in Hadoop.
Snapshots are intended to provide point-in-time recovery, that is, to provide the ability to recover the data to a precise and consistent state in the past. This tech brief discusses how MapR Snapshots do that, along with other benefits of MapR Snapshots.
This white paper provides several performance-optimized configurations for deploying clusters of varying sizes running the MapR M7 distribution of Apache Hadoop on HP infrastructure. The configurations provide a significant reduction in complexity and an increase in value and performance.
This paper provides several performance-optimized configurations for deploying clusters of varying sizes running the MapR M5 distribution of Apache Hadoop on HP infrastructure. The reference architecture configurations for MapR M5 provide a significant reduction in complexity, faster time to value, and an improvement in performance. This paper has been created to assist in the rapid design and deployment of MapR M5 software on HP infrastructure for clusters of various sizes.
This tech brief delves into data stream processing on Apache™ Hadoop® in the context of the Lambda Architecture - a useful framework to think through the architectural layout of big data systems.
As part of the Cisco Validated Design program, consisting of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments, this document is intended to assist solution architects, sales engineers, field consultants, professional services, IT managers, partner engineering and customers in deploying MapR on the Cisco Common Platform Architecture (CPA) for Big Data.
The MapR-validated reference architecture solution from IBM for Hadoop big data analytics is built around powerful, affordable, scalable System x servers and IBM networking solutions so you can deploy your MapR-validated solution more quickly.
RHadoop is an open source collection of three R packages created by Revolution Analytics that allow users to manage and analyze data with Hadoop from an R environment. It allows data scientists familiar with R to quickly utilize the enterprise-grade capabilities of the MapR Hadoop distribution directly with the analytic capabilities of R.
This paper presents several techniques for those who wish to manage their own MapR installations on Google Compute Engine, and select scenarios (migration across zones, disaster recovery and high availability) that arise when dealing with long-lived clusters and operating across multiple zones.
The MapR Distribution for Hadoop is fully integrated with the Google Compute Engine (GCE) framework, allowing customers to deploy a MapR cluster with ready access to Google’s cloud infrastructure.
This paper describes how you can take advantage of Google Compute Engine, with support from Google Cloud Storage, and run a self-managed MapR cluster with Apache Hive and Apache Pig as part of a Big Data processing solution.
Apache HBase applications running on MapR M7 experience dramatic performance advantages compared to HBase applications running on other distributions.
Organizations seek more and larger data sets in their data warehouses (DW) to extract more value. They derive better insights when analyzing a complete picture of enterprise-wide data. The MapR/Informatica data warehouse optimization (DWO) solution lets organizations cost-effectively add more data, more types of data, and more capabilities to their data warehouse environments.
High availability (HA) is the ability of a system to remain up and running despite unforeseen failures, avoiding unplanned downtime or service disruption. HA is a critical feature that businesses rely on to support customer-facing applications and service level agreements. Advanced HA features in the MapR Distribution for Hadoop provide numerous benefits to organizations trying to harness big data.
Investment banks have been dealing with high velocity for a long time, but volume is a relatively new factor and emerging as the strongest driver for banks to look at big data and Apache™ Hadoop®.
Many organizations today face the challenges of big data, and need a scalable and cost-effective way to manage their data growth and boost their enterprise data architecture with the ecosystem of technologies around Hadoop.
Unlike other approaches, with HP Vertica Analytics Platform and MapR, you can more quickly leverage existing SQL skills and BI tools to unlock insights from all your data in Hadoop.
MapR on the Cisco UCS® Common Platform Architecture for Big Data delivers a fully optimized Apache™ Hadoop® solution that provides lights-out data center capabilities and ease of use with superior performance for different classes of Hadoop applications.
The pharmaceutical industry is experiencing significant growth in the volume and variety of data from several sources, including the R&D process, retailers, patients, and caregivers. Sales and marketing functions in the pharmaceutical industry have been leading adopters of big data technology and other functions are starting to move in that direction, especially Research and Development.