Distributed Processing and Analytics

Apache™ Hadoop® enables big data applications for both operations and analytics and is one of the fastest-growing technologies providing competitive advantage for businesses across industries. Hadoop is a key component of the next-generation data architecture, providing a massively scalable distributed storage and processing platform. Hadoop enables organizations to build new data-driven applications while freeing up resources from existing systems.

Hadoop has two main layers. The first is the compute layer, an expanding group of open source components (Hive, Pig, YARN, Oozie, Sqoop, Flume, and others). The second is the data layer, the Hadoop Distributed File System (HDFS). Most of the focus tends to be on the compute layer and its robust, thriving ecosystem. However, the data layer is an important and often overlooked aspect of Hadoop. In fact, HDFS still has major limitations that have existed since Hadoop was created 10 years ago. It lacks enterprise-grade features, including consistent snapshots and mirroring for disaster recovery (DR). It is a batch file system rather than a real-time one, and legacy applications cannot use it like a standard file system.

MapR has addressed these limitations with the Converged Data Platform that provides an underlying data layer for Hadoop that is real-time, enterprise-grade, fast, and scalable.
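To make the "standard file system" limitation concrete, here is a minimal sketch of the kind of in-place update that many legacy applications perform through ordinary file calls. The function name and the sample record layout are hypothetical and chosen only for illustration. HDFS follows a write-once model (with append), so an overwrite in the middle of an existing file, as shown here, is not something an application can do against it through plain file calls.

```python
import os
import tempfile


def update_record_in_place(path, offset, new_bytes):
    """Overwrite bytes in the middle of an existing file.

    This relies on random-write support, which ordinary local and
    network file systems provide. It is an illustrative helper,
    not part of any Hadoop or MapR API.
    """
    with open(path, "r+b") as f:   # open for reading and writing, no truncation
        f.seek(offset)             # jump to the record to be changed
        f.write(new_bytes)         # overwrite it in place


if __name__ == "__main__":
    # Hypothetical 12-byte file made of three 4-byte records.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"AAAABBBBCCCC")

    update_record_in_place(path, 4, b"XXXX")  # replace the second record

    with open(path, "rb") as f:
        print(f.read())
    os.remove(path)
```

An application written this way needs no changes to run on any file system that supports random writes, which is why this limitation matters for legacy workloads.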

KEY BENEFITS

Change the Economics of Your Data

With the emergence of Hadoop, CIOs are rethinking their enterprise data architecture. Data that was previously too expensive to store can now be made available for analysis to improve business insights at 1/10 to 1/50 of the cost on a per-terabyte basis. MapR also enables organizations to capture and store data from every touch point, while eliminating the separate silos used to process that data (e.g., data transformation, cleansing, analysis, scoring).
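As a rough illustration of what the 1/10 to 1/50 range means in practice, the sketch below applies both ratios to a baseline storage cost. The baseline figure and data volume in the usage example are hypothetical placeholders, not numbers from MapR or any vendor; substitute your own.

```python
def estimated_cost_range(baseline_cost_per_tb, terabytes):
    """Return (low, high) estimated cost when storage costs
    1/50 to 1/10 of the baseline on a per-terabyte basis."""
    baseline_total = baseline_cost_per_tb * terabytes
    low = baseline_total / 50    # best case: 1/50 of the baseline
    high = baseline_total / 10   # conservative case: 1/10 of the baseline
    return low, high


if __name__ == "__main__":
    # Hypothetical inputs: $10,000 per terabyte on the existing system, 100 TB of data.
    low, high = estimated_cost_range(10_000, 100)
    print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

With these placeholder inputs, a $1,000,000 baseline falls to between $20,000 and $100,000.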

More Data Leads to Better Insights

Users across industries can now bring structured, unstructured, and semi-structured data sources together on one platform to perform deeper and more accurate analysis for improved customer interactions and day-to-day operations. Examples include building 360-degree customer views of both transactions and interactions, measuring brand health across channels, improving real-time fleet logistics, improving risk modeling and fraud detection algorithms, and improving quality control on assembly lines. One insight common to these customer experiences is that expanding the data available to a given model can dramatically improve the analysis.

Game-Changing Business Applications

In addition to data management and analytic applications powered by Hadoop, more organizations are powering business-critical operational applications where low latency and transactional consistency are important. A large number of companies are gaining competitive advantage by running Hadoop-based applications, such as advertisement auction engines, product search, and family history services, on a converged data platform, while large enterprises are leveraging search and NoSQL databases on Hadoop (in-Hadoop databases) to create data-driven services for their customers. Examples include real-time product offers for customers, power grid optimization, and cable advertising optimization using set-top box data.

Hadoop is Transforming Businesses Across Industries

Every industry is benefiting from the scale and processing power that Hadoop brings as part of a converged data platform, becoming more data-driven and gaining deeper insights into customers and operations.