Organizations everywhere are grappling with how to manage growing big data sets from ERP and e-commerce systems, log files, sensor data, social media and more. Apache Hadoop can serve as a cost-effective enterprise data hub (EDH) in which to store, transform, cleanse, filter, and analyze all kinds of data and gain new value from it.
Architecture for Enterprise Data Hub
Specific use cases include:
- Data Reservoir or “Data Lake”: Collect raw data that was previously too expensive to store and process. The data is managed and governed here, and the reservoir can also serve as an online archive for infrequently accessed data.
- Data Refining: Optimize the process of integrating diverse data types from multiple sources to discover relationships. Parse, cleanse, transform, and integrate data.
- Big Data Exploration: Perform investigative analytics on large data volumes of unknown value. Apply a combination of SQL-on-Hadoop, machine learning, statistics, and graph analysis techniques to unlock new insights and improve operational analytics such as anomaly detection and recommendations.
- Data Warehouse Optimization: Capture, store, and refine incoming big data in the EDH to free up valuable processing and storage space on the data warehouse for mission-critical reporting and analysis. Create an online archive of infrequently queried data.
- Mainframe Optimization: Offload data and batch processing to Hadoop to free up expensive MIPS cycles and modernize the enterprise data architecture.
The MapR Enterprise Data Hub
Deploying an EDH with MapR leverages the high-performance, massively scalable, and reliable MapR Distribution for Hadoop to give organizations a powerful, enterprise-grade, distributed computing platform. Some of the important features of the MapR EDH include:
- Easy data access: Copying data to and from the MapR EDH is as simple as copying data to a standard file system using Direct Access NFS™.
- Multi-tenancy: Support multiple user groups, any and all enterprise data sets, and multiple applications in the same cluster.
- Business continuity: The MapR EDH provides integrated high availability (HA), data protection, and disaster recovery (DR) capabilities to protect against both hardware failures and site-wide failures.
- Data storage in native formats: The MapR EDH supports all data types without requiring predefined schemas so that all data sources from across the enterprise can be included for a 360-degree view of your business.
- High performance: The MapR EDH was designed for high performance, with respect to both high throughput and low latency. In addition, running the MapR EDH requires only a fraction of the servers needed by other Hadoop distributions, leading to a simpler architecture and lower capital and operational expenses.
Overview of an Enterprise Data Hub that illustrates how customers are using MapR for this solution.