Persistent Storage for Enterprise-Grade Spark Applications

Apache Spark is becoming the de facto standard processing and compute engine for big data workloads.

In-memory processing and support for multiple programming languages have made it much easier to develop big data applications with Spark.

However, Spark has no persistent data storage capabilities of its own, and in a Hadoop deployment it is layered on top of HDFS.

Most IT managers consider HDFS less than ideal for an enterprise production data center environment because of:

  • Inadequate data protection
  • Subpar disaster recovery capabilities
  • The need to move data between clusters
  • The lack of true multi-tenant capabilities

To understand how the MapR Converged Data Platform addresses these shortcomings, download the white paper “Persistent Storage for Apache Spark in the Enterprise” by The Evaluator Group.

The paper takes an in-depth look at how the MapR Platform meets the key requirements for the persistent storage layer of Spark applications.