Building the Foundation for Production Success on Hadoop

As we begin to process the data deluge from customer web interactions, mobile devices, supply chain management, and the widespread use of sensors, tightly integrated data systems become critical for organizations. Harnessing data from new sources, including the Internet of Everything, opens opportunities to provide deeper, real-time intelligence and interactions with customers and business operations. Insights gained from combining data from different parts of the business will not only create tremendous operational efficiencies and profits but also open the door to new data-driven services.

As big data applications graduate from batch analytics to real-time, customer-facing applications, large volumes of data arriving at extremely high rates need to be processed continuously and in real time. The data system needs to be dependable and scalable, yet easy to manage and administer. Apache Hadoop has emerged as the software platform of choice for big data storage and processing, but it needs to be tightly integrated with best-in-class hardware and networking built for the future. That is where the leading-edge innovation that MapR and Cisco bring to the market together comes in.

For years, the Cisco UCS and MapR reference architecture has provided one of the best combinations of performance and ease of management for big data deployments. For example, MapR software on Cisco hardware broke the TeraSort performance world record with just 300 nodes (compared with the thousands of nodes used by alternative Hadoop approaches). Our customers have also seen tremendous success with the Cisco UCS and MapR solution, to the point where 80% of them have tripled the number of applications and use cases on their Hadoop clusters within the first 12 months.

One of the challenges that customers face, especially when moving to enterprise-grade production clusters, is the need for a centralized “single pane of glass” for system management and automated node deployment. We are excited that Cisco has released UCS Director Express for Big Data to address this issue by allowing fully automated deployment of the entire cluster at the click of a button. UCS Director Express for Big Data is built on top of the industry-leading Cisco UCS Common Platform Architecture (CPA) for Big Data. Users can now deploy the complete hardware and software stack, including the OS and Hadoop software, across the entire cluster and manage the full cluster through a single management interface. It also propagates high-level job metrics from individual nodes all the way to the management interface, giving a single view of cluster resource usage.

Another critical facet of production clusters is the need to fully support multi-tenancy at both the data and hardware levels. Different departments typically need to build workloads that have different memory, CPU, and disk utilization requirements. Supported by features such as volumes, data placement control, and job placement control, MapR provides the only multi-tenant cluster capability that can also accommodate multiple hardware configurations on the same cluster. With the latest hardware additions to the Cisco UCS CPA family, users can now deploy more hardware configurations on the same cluster and manage all of them through a unified interface. For customers, this means lower capital expenses because of higher hardware resource utilization, as well as lower operational and administrative costs.

Quickly harnessing data from various sources and putting it to work requires seamless data movement. MapR provides several APIs, including NFS, for direct data ingestion into the cluster, allowing data to be landed and processed at very high speeds. Multiple options for configuring and implementing stream processing and high-performance time-series database applications also allow the best real-time solutions to be deployed on hardware customized for Hadoop applications.
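To make the NFS ingestion point concrete, here is a minimal sketch of what "direct ingestion" can look like from an application's point of view: because the cluster is reachable as a mounted file system, landing data is an ordinary file copy. The function name `land_file`, the directory layout, and the mount path in the usage note are illustrative assumptions, not part of any MapR or Cisco API. The write-then-rename step is a common convention for keeping partial files out of view, not something the platform requires.

```python
import shutil
import tempfile
from pathlib import Path


def land_file(source: Path, landing_dir: Path) -> Path:
    """Copy a local file into a landing directory using plain file I/O.

    If landing_dir lives on an NFS mount of the cluster (an assumption of
    this sketch), this copy is the whole ingestion step.
    """
    landing_dir.mkdir(parents=True, exist_ok=True)

    # Write under a temporary name first, then rename within the same
    # directory, so downstream jobs do not pick up a half-written file.
    staging = landing_dir / f".{source.name}.part"
    shutil.copyfile(source, staging)

    final = landing_dir / source.name
    staging.replace(final)
    return final


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "events.log"
        src.write_text("click,42\n")
        landed = land_file(src, Path(tmp) / "landing" / "web")
        print(f"landed {landed.name} ({landed.stat().st_size} bytes)")
```

On a real deployment the second argument would point at wherever the cluster is mounted, for example a hypothetical path such as `/mapr/my.cluster.com/landing/web`; the actual mount point depends on how the cluster and NFS gateway are configured.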

To learn how MapR and Cisco work together to address the crucial challenges of big data, I encourage you to check out our joint customer success stories on the Cisco blogs and MapR partner pages.

How do you think harnessing big data will change your business operations and customer interactions? 
