The capability to process live data streams enables businesses to make real-time, data-driven decisions. These decisions can be based on simple data aggregation rules or on complex business logic. The engines that support these decision models have to be fast, scalable, and reliable. Hadoop, with its rapidly growing ecosystem, is fast emerging as the data platform that supports such real-time stream processing engines.
Recently, the Apache Storm project graduated from Incubation to an Apache Top-Level Project. In addition to validating the rapidly growing developer community, this milestone also reflects the value that real-time stream processing frameworks such as Storm provide to the user community.
At MapR, we have always been excited about Storm. I fondly remember the “Frog Eggs” demo we built in early 2013, showcasing real-time Twitter firehose analysis using Storm on Hadoop. Here is the original blog post on the topic, if you are interested. Since then, with our own Ted Dunning serving as a mentor, we have seen major developments in the Storm project as it moved from Apache Incubation to its recent status as an Apache Top-Level Project.
With scalable, fault-tolerant, and flexible distributed processing frameworks such as Apache Storm, users can easily deploy solutions for real-time processing of streaming data. These frameworks support varied data sources, including Kafka, HDFS, MapR-FS, and S3, as well as destinations including an RDBMS, HBase, MapR-FS, or a web application.
MapR Customer Use Cases
Over the past year, our MapR customer base has shown tremendous interest in streaming applications, including Storm. The interest is spread across different sizes of companies as well as different types of industries, including retail, telecom, manufacturing, media and advertising. Here is a small sampling of the use cases we have worked on with our customer base:
Real-time Click-Stream Analysis:
A Fortune 100 MapR customer augments its online customer experience with targeted product placements based on both historical customer behavior and real-time click-stream data from online customers. The Storm bolts are designed so that aggregated counters on streaming clicks trigger the necessary product placement in the web application serving the end user.
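To make the counter-and-trigger idea concrete, here is a minimal sketch in plain Python. It is not the customer's implementation and it does not use the Storm API. The class name ClickCounterBolt, the (user_id, category) tuple shape, the threshold of 3, and the on_trigger callback are all assumptions made for illustration.

```python
from collections import defaultdict


class ClickCounterBolt:
    """Hypothetical stand-in for a Storm bolt: counts clicks, fires once."""

    def __init__(self, threshold, on_trigger):
        self.threshold = threshold        # assumed trigger level
        self.on_trigger = on_trigger      # callback standing in for the web app
        self.counts = defaultdict(int)    # (user_id, category) -> clicks seen

    def execute(self, click):
        """Process one click tuple of the assumed form (user_id, category)."""
        key = (click[0], click[1])
        self.counts[key] += 1
        # Fire exactly once, at the moment the counter reaches the threshold.
        if self.counts[key] == self.threshold:
            self.on_trigger(*key)


# Usage: collect triggered placements in a list instead of calling a web app.
placements = []
bolt = ClickCounterBolt(
    threshold=3,
    on_trigger=lambda user, category: placements.append((user, category)),
)
clicks = [
    ("u1", "shoes"),
    ("u1", "shoes"),
    ("u2", "hats"),
    ("u1", "shoes"),
    ("u1", "shoes"),
]
for click in clicks:
    bolt.execute(click)
```

In this sketch the counters live in the memory of a single process. A real topology would typically group the stream by key so that all clicks for a given key reach the same bolt instance.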
Real-time Threat Detection:
A large managed security services provider augments its cloud security service with real-time threat detection using Storm on a MapR cluster. As sensor data and logs from intrusion protection systems are collected centrally, the very first action taken on these streams is in-memory processing that checks whether the incoming streams contain any known threat footprints that need to be flagged immediately.
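The in-memory check described above could look something like the following sketch, again in plain Python and without the Storm API. The footprint values, the log line format, and the function name flag_threats are made up for illustration. The provider's actual matching logic is not described in this post.

```python
# Hypothetical known-threat footprints held in memory.
KNOWN_THREAT_FOOTPRINTS = {
    "198.51.100.7",   # made-up source address (documentation range)
    "DROP TABLE",     # made-up payload fragment
}


def flag_threats(events, footprints=KNOWN_THREAT_FOOTPRINTS):
    """Yield (event, matched_footprints) for each event containing a footprint.

    Matching is a plain substring test, which keeps the sketch short;
    a production system would likely parse fields and match more precisely.
    """
    for event in events:
        matched = sorted(fp for fp in footprints if fp in event)
        if matched:
            yield event, matched


# Usage: three made-up log lines, two of which contain a known footprint.
sample_events = [
    "src=203.0.113.5 action=allow payload=GET /index.html",
    "src=198.51.100.7 action=deny payload=GET /admin",
    "src=203.0.113.9 action=allow payload=q=1; DROP TABLE users",
]
flagged = list(flag_threats(sample_events))
```

Because flag_threats is a generator, each event is checked as it arrives and nothing has to be buffered, which matches the "first action on the stream" role described above.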
Enhanced Network Service Quality:
A large telecom company is using Storm running on its MapR cluster to make real-time service quality decisions. Using Storm and MapR together allows real-time systems to integrate with batch systems to analyze long-term trends.
We have seen strong demand among our customers to move beyond exclusively batch operations. The need to include real-time capabilities is only likely to increase, and Storm provides important capabilities in this regard.
Look for more strong stream processing solutions such as Storm and Spark Streaming from MapR and the wider community. As with SQL-on-Hadoop, NoSQL, and machine learning technologies, MapR gives users multiple options for developing stream processing applications and works closely with customers to identify and deploy the right solution in production environments.
Feel free to reach out to us if you have any questions, and please share if you have any interesting use cases on stream processing. Be sure to check out our “Stream Processing with MapR” tech brief if you are new to streaming applications on Hadoop.