Frequently Asked Questions
Q: What is MapR Streams?
MapR Streams is a global publish-subscribe event streaming system for big data. It connects data producers and consumers worldwide in real time, with unlimited scale. Publishers (data producers) write data to one or more topics in MapR Streams. Subscribers (data consumers) of those topics can read the data instantaneously, anywhere across the globe.
MapR Streams is unique for two reasons. First, it is the first big data-scale streaming system to be built into a converged data platform. Second, it is the only big data streaming system to support global event replication at Internet-of-Things (IoT) scale and reliability, providing failover endpoints across up to thousands of distributed clusters.
Q: What is an event stream?
An event stream is a continuous flow of event data that is transported between multiple applications or services. The events are typically generated by diverse data sources including web applications, system logs, social media, sensors, connected devices, and machine logs. In contrast to the types of events that legacy message queue-oriented systems were designed to handle, big data event streams are often generated by millions of sources worldwide, reaching millions or sometimes billions of events per second.
Q: What is a publish-subscribe model?
Publish-subscribe is a messaging paradigm where the data producers (referred to as publishers) do not send data directly to data consumers. Instead, they publish the data to a system that manages “topics.” The data consumers (referred to as subscribers) subscribe to relevant topics to retrieve the data. This model decouples the two sides: publishers and subscribers need no knowledge of each other and can write and read at different rates.
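The decoupling described above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the `Broker` class, its methods, and the topic and subscriber names are all hypothetical and are not part of the MapR Streams API. It shows publishers writing to a topic without knowing who reads it, and two subscribers consuming the same topic at different rates by tracking their own read positions.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory topic manager (hypothetical; not the MapR Streams API)."""

    def __init__(self):
        # topic name -> append-only list of messages
        self._topics = defaultdict(list)
        # (subscriber id, topic) -> index of the next message to read
        self._offsets = {}

    def publish(self, topic, message):
        """Append a message to a topic; the publisher never sees subscribers."""
        self._topics[topic].append(message)

    def subscribe(self, subscriber_id, topic):
        """Register a subscriber, starting from the oldest retained message."""
        self._offsets.setdefault((subscriber_id, topic), 0)

    def poll(self, subscriber_id, topic, max_messages=10):
        """Return up to max_messages unread messages and advance the offset."""
        start = self._offsets[(subscriber_id, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(subscriber_id, topic)] = start + len(batch)
        return batch


broker = Broker()
broker.subscribe("dashboard", "sensor-readings")
broker.subscribe("archiver", "sensor-readings")

for reading in (20.5, 21.0, 21.4):
    broker.publish("sensor-readings", reading)

# The two subscribers read the same topic independently and at their own pace.
fast = broker.poll("dashboard", "sensor-readings")
slow = broker.poll("archiver", "sensor-readings", max_messages=1)
print(fast, slow)
```

Because each subscriber keeps its own offset, a slow reader does not hold back a fast one, and neither affects the publisher.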
Q: How is MapR Streams related to the MapR Converged Data Platform?
MapR Streams is an integral part of the MapR Converged Data Platform, which also includes file storage, database services, and processing frameworks in a single cluster. Batch, interactive, and stream processing frameworks have direct access to event streams, eliminating data movement and ensuring consistency. MapR Streams derives enterprise features such as secure access control, encryption, multi-tenancy, and strong consistency from the MapR Converged Data Platform.
Q: What are the industry challenges for event streams?
Data volume and diversity: Modern businesses are being overwhelmed by the onslaught of data created continuously by diverse sources such as web applications, social media, sensors, connected devices, and machine logs, to name a few.
Geographic dispersion: To add to the complexity, the diverse sources mentioned above are often geographically distributed, sending data to the closest data center for low latency. This distributed data needs to be centralized and joined with data from enterprise applications to paint a complete picture of the state of business.
Delayed processing and insights: Although the data is created continuously, it is consumed for transformation, movement, or processing at a predetermined frequency. This introduces data pipeline complexity and precludes the ability to respond immediately to new information.
Architectural complexity: Businesses typically deploy data transport systems and data processing systems in separate clusters. This creates complexity in analyzing new data available in the data streams in real time, as well as administrative overhead of managing separate clusters.
Q: How does MapR Streams address these challenges?
MapR Streams provides a reliable, globally scalable streaming system that connects data producers and consumers via topics. MapR Streams is integrated into one converged data platform with file, database, and stream processing services.
Converged data platform reduces architectural complexity for streaming: MapR Streams brings together data transport and data processing in the same cluster. Batch, interactive, and stream processing frameworks have direct access to event streams, eliminating data movement and ensuring consistency. Like other services in the MapR Converged Data Platform, MapR Streams provides enterprise features such as secure access control, encryption, and multi-tenancy.
Continuous real-time data processing avoids delayed processing and insights: MapR Streams makes real-time data directly available for processing. Real-time data can be processed by stream processing frameworks such as Spark Streaming to enable sub-second response and automated actions. Enterprise features such as high availability with no single point of failure and disaster recovery mirroring ensure that your system is always on for business critical environments.
Global scalability handles data diversity and geographic dispersion: MapR Streams scales linearly as nodes are added, allowing billions of events per second to be sent across billions of topics. Further, MapR Streams is designed for geographically dispersed systems, with real-time global replication. You can access data created at multiple geographical locations, and process it in real time to get a complete state-of-the-business picture. Producers and consumers can fail over between distributed clusters for high availability.
Q: What are the key features of MapR Streams?
- Converged cluster for files, database tables, and streams.
- Converged analytics with batch and streaming analytics in-place using an optimized OJAI API, avoiding data movement.
- Converged security with authentication, fine-grained authorization, and wire-level encryption under a unified security model between MapR services.
- Multi-tenant architecture where users, groups, or applications have separate topic domains, security policies, and replication rules.
- Integrates with stream processing frameworks like Apache Spark Streaming, Apache Storm, Apache Flink, and Apache Apex.
- Persistence of all messages in a stream, with retention up to an unlimited time span.
- High availability with no single point of failure.
- Strong consistency due to synchronous replication.
- Linearly scalable, with each node handling over 1 million messages per second in reliable mode.
- Real-time, reliable stream replication between up to thousands of global clusters in an arbitrary topology with producer and consumer failover.
- Intra-cluster capacity and performance scale linearly as servers are added.
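As an illustration of the "strong consistency due to synchronous replication" item in the list above, the following is a minimal conceptual sketch. It assumes a simplified model in which a write is acknowledged only after every replica holds it; the `ReplicatedTopic` class and its methods are hypothetical and do not describe how MapR Streams is actually implemented.

```python
class ReplicatedTopic:
    """Toy topic whose writes reach every replica before being acknowledged."""

    def __init__(self, replica_count=3):
        # Each inner list stands in for one replica's copy of the log.
        self._replicas = [[] for _ in range(replica_count)]

    def append(self, message):
        """Write to all replicas, then return the offset as the acknowledgement."""
        for replica in self._replicas:
            replica.append(message)
        # Only reached after every replica holds the message.
        return len(self._replicas[0]) - 1

    def read(self, replica_index, offset):
        """Read one message from a chosen replica."""
        return self._replicas[replica_index][offset]


topic = ReplicatedTopic(replica_count=3)
offset = topic.append("order-created")

# Because the write was acknowledged only after full replication,
# every replica returns the same message at the acknowledged offset.
values = [topic.read(i, offset) for i in range(3)]
print(values)
```

The point of the sketch is the ordering: the acknowledgement comes after replication, so a reader never observes an acknowledged message on one replica that is missing on another.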
Q: Who will benefit from using MapR Streams?
Business leaders: Improve responsiveness to critical events with continuous processing of real-time big data.
Enterprise architects/lead engineers: Simplify the flow of data across data sources, formats, and locations, to reduce architectural complexity and TCO.
Developers: Improve time-to-market for advanced data streaming applications to meet the growing demands of your stakeholders.
Q: What is the relationship between MapR Streams and other “streaming” components like Spark, Storm, Apex, and Flink?
MapR Streams provides reliable data ingestion, transport, and buffering for stream processing frameworks such as Spark, Storm, Apex, and Flink. These stream processing frameworks are fully integrated with MapR Streams, and work together with it to enable real-time global streaming analytics.
Q: How does MapR Streams compare with Kafka?
MapR Streams is similar to Kafka, as both systems use the same API for publish and subscribe. What differentiates MapR Streams is its proven enterprise features such as global replication, security and multi-tenancy, and HA/DR, all of which it inherits from the MapR Converged Data Platform.
Q: What are some of the functional use cases for MapR Streams?
Stream processing: MapR Streams provides the ingest, transport, and buffering layer for stream processing frameworks such as Spark Streaming to enable real-time operations such as calculations and aggregations on data as it’s delivered.
Database change capture: Change capture keeps the operational system-of-record synchronized with other systems.
Application logs and metrics delivery: MapR Streams can provide a pipeline for log/metrics data coming out of appliances, servers, and applications, making them available to infrastructure monitoring systems for alerting, dashboarding, and search.
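To make the stream processing use case above more concrete, here is a small sketch of the kind of aggregation a stream processing framework might run on events as they are delivered: counting events per key in fixed, non-overlapping time windows. It is plain Python over an in-memory list, not Spark Streaming or MapR Streams code, and the function name, event shape, and sample data are assumptions made for illustration.

```python
from collections import Counter


def count_per_window(events, window_seconds):
    """Count events per (window start, key) using fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical page-view events: (timestamp in seconds, page name).
events = [(0, "home"), (3, "home"), (7, "cart"),
          (12, "home"), (14, "cart"), (19, "cart")]
result = count_per_window(events, window_seconds=10)
print(result)
```

In a real deployment the events would arrive continuously from a topic rather than from a list, and the framework would emit each window's counts as the window closes.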
Q: What are some vertical use cases that can benefit from MapR Streams?
- Real-time antenna optimization based on user location data.
- Real-time charging and billing based on customer usage, with up-to-date usage dashboards for users.
- Mobile offers.
- Optimized advertising for video/audio content based on what users are consuming.
- Smart hospitals - collect data and readings from hospital devices (vitals, IVs, MRI, etc.) and analyze and alert in real time.
- Biometrics - collect and analyze data from patient devices that collect vitals while outside of care facilities.
- Ad Tech
- Real-time user targeting based on segment and preferences.
- Build an intelligent supply chain by placing sensors or RFID tags on items to alert if items aren’t in the right place, or proactively order more if supply is low.
- Smart logistics with real-time end-to-end tracking of delivery trucks.
- Financial Services
- Real-time fraud detection.
- Real-time mobile notifications.
- Oil & Gas
- Real-time monitoring of pumps/rigs.
Q: How can I try MapR Streams?
When MapR 5.1 becomes available in early 2016, MapR will provide MapR Streams as part of the free MapR Converged Community Edition. We will also release a virtual machine sandbox with MapR Streams along with tutorials, sample code, and video demos to make getting started easy.
Q: How can I buy MapR Streams?
You can license MapR Streams as an individual product or bundle it with our other enterprise products. Please reach out to email@example.com to learn more.
Q: How do I use MapR Streams if I already have MapR?
You can license MapR Streams as an add-on to your current MapR Distribution. Please reach out to firstname.lastname@example.org to learn more.
Q: What version of MapR do I need to use MapR Streams?
You need MapR 5.1 or later.