Streaming Blog Posts

Posted on January 18, 2017 by Jack Norris

In this week's Whiteboard Walkthrough, Jack Norris, Senior Vice President of Data and Applications at MapR, explains how the MapR Converged Data Platform opens up the use of containers to the big data environment so that you can access data directly, thus taking advantage of otherwise underutilized assets.

Posted on January 4, 2017 by Ted Dunning

In this week’s Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, provides some pointers for building better machine learning models, including the advantages of data streams and microservices-style design in the example of a credit card fraud detector, the need for metrics, and how reconstruction of data from an auto-encoder can serve as a figure of merit that helps identify good models.

Posted on October 19, 2016 by Ted Dunning

In this week's Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, explains in detail how to use streaming IoT sensor data from handsets and devices as well as cell tower data to detect anomalies. He takes us from best practices for data architecture, including the advantages of multi-master writes with MapR Streams, through analysis of the telecom data using clustering methods to discover normal and anomalous behaviors.

Posted on October 11, 2016 by Kirk Borne

Much has been written about the power of big data collections to enable the 360 view of our customers, our business, our employees, and our processes. When our numerous disparate heterogeneous data collections are aggregated and joined in the data lake, with appropriate data tagging and data discovery tools in place (such as Apache Drill), then we can reach for that ideal: the 360 view of our domain!

Posted on October 4, 2016 by Carol McDonald

With the rapid expansion of smart phones and other connected mobile devices, communications service providers (CSPs) need to rapidly process, store, and derive insights from the diverse volume of data travelling across their networks. Big data analytics can help CSPs improve profitability by optimizing network services/usage, enhancing customer experience, and improving security.

Posted on September 27, 2016 by Rachel Silver

MapR is pleased to announce support for event-driven microservices on the MapR Converged Data Platform. In this blog post, I’d like to explain what this means, and how it fits into our bigger idea of “convergence.” Microservices are simple, single-purpose applications that work in unison via lightweight communications, such as data streams. They allow you to more easily manage segmented efforts to build, integrate, and coordinate your applications in ways that have traditionally been impossible with monolithic applications.

Posted on September 26, 2016 by Tugdual Grall

Get an introduction to streaming analytics, which gives you real-time insight from captured events and big data. There are applications across industries, from finance to wine making, though there are two primary challenges to be addressed.

Posted on August 16, 2016 by Ellen Friedman

It’s not just a concern when ordering coffee. Something similar can happen as we investigate new and innovative big data technologies and techniques. I used the cappuccino example in a talk I presented recently at the Strata + Hadoop World Conference in London. The talk, titled “Building Better Cross Team Communication,” highlighted the importance of identifying and addressing the difference in how each side thinks the world works when two groups that have different experience and skills come together.

Posted on July 29, 2016 by Ankur Desai

Oil and gas wells produce a huge amount of information. Sensors monitor things like temperature, pressure, fluid viscosity, the presence of foreign substances, and seismic activity. Sensors must be monitored in real time to optimize both performance and safety. A slight change in pressure underground may indicate a fracture that can jeopardize the whole well.

Posted on July 27, 2016 by Ted Dunning

In this week’s Whiteboard Walkthrough Part I, Ted Dunning, Chief Application Architect at MapR, explains the key capabilities required of a streaming platform in the context of micro-services and the advantages they offer.

Posted on July 27, 2016 by Ted Dunning

In this week’s Whiteboard Walkthrough Part II, Ted Dunning, Chief Application Architect at MapR, talks about the design freedom gained by adopting a micro-services architecture based on streaming data. When you move, one step at a time, from an old-style architecture that suffers from too much dependence on a shared global state database to a stream-based flow architecture, the isolation between micro-services results in reduced strain on the original database, improved flexibility, and often improved speed.

Posted on July 25, 2016 by Jim Scott

This post discusses message-driven architectures, which are, in short, a subset of service-oriented architecture (SOA). The model has been around for many years and is very popular. As you go through this post, you will find that the foundational message-driven architecture competes more directly with the concept of the enterprise service bus (ESB).

Posted on July 19, 2016 by Ellen Friedman

In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now that we’re a bit past mid-year, it’s a good time to check how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.

Posted on July 8, 2016 by Ankur Desai

I was at the annual Hadoop Summit in San Jose last week. As usual, the MapR booth was buzzing with big data enthusiasts and experts alike. We showcased demos that spanned multiple topics including multi-cluster Hadoop monitoring using Grafana and Kibana (as part of our new Spyglass Initiative), IoT stream analysis using MapR Streams and Spark Streaming, and self-service big data analytics using Apache Drill.

Posted on June 21, 2016 by Ellen Friedman

Streaming data can be used as a long-term auditable history when you choose a messaging system with persistence, but is this approach practical in terms of the cost of storing years of data at scale? The answer is “yes”, particularly because of the way topic partitions are handled in MapR Streams. Here’s how it works.

Posted on June 8, 2016 by Ellen Friedman

In this week's Whiteboard Walkthrough, Ellen Friedman, a consultant at MapR, talks about how to design a system that handles real-time applications and also takes advantage of streaming data beyond those in-the-moment insights.

Posted on June 7, 2016 by Carol McDonald

Standards and incentives for digitizing and sharing healthcare data, along with improvements and decreasing costs in storage and parallel processing on commodity hardware, are causing a big data revolution in healthcare with the goal of better care at lower cost.

Posted on May 23, 2016 by Charu Madan

In today’s world of immense competition and customer churn, telecom providers are reinventing and transforming themselves to provide their customers with the best possible customer care and satisfaction.

Posted on May 16, 2016 by Ellen Friedman

Streaming data is now a big focus for many big data projects, including real-time applications, so there’s a lot of interest in excellent messaging technologies such as Apache Kafka or MapR Streams, which uses the Kafka 0.9 API.

Posted on May 11, 2016 by Ellen Friedman

What capabilities should you look for in a messaging system when you design the architecture for a streaming data project? Let’s start with a hypothetical IoT data aggregation example to illustrate specific business goals and the requirements they place on the messaging technology and data architecture needed to meet those goals...

Posted on May 5, 2016 by Will Ochandarena

The first Kafka Summit was recently held in San Francisco. While the size of the conference was relatively small at 600 attendees, it was encouraging to see the variety of companies that are embracing real-time data pipelines.

Posted on April 18, 2016 by Bill Peterson

MapR, Cisco, and SAP have been collaborating for years to help you gain insight from all of your data sources. Today, we’re excited to announce that Cisco has developed an appliance that includes the MapR Converged Data Platform for SAP HANA, making it much easier and faster for you to harness the power of big data.

Posted on April 14, 2016 by Ankur Desai

This blog post provides an introduction to the components of a typical streaming architecture and the options available at each stage. The three major components are producers, a streaming system, and consumers. The enthusiasm over real-time processing is being met with a host of technologies. Learn about some of them...

Posted on April 8, 2016 by Ankur Desai

What is predictive maintenance?
If we can predict a part failure well in advance, we can schedule maintenance or repair work for the part at our convenience while continuing to operate the equipment, avoiding unexpected downtime. This will help reduce large repair expenses, as the part will be repaired or replaced well before it fails.

Posted on April 7, 2016 by Jack Norris

There are substantial advantages to being able to make decisions at the speed required to respond to events in the moment. In fact, real time is at the foundation of many transformational applications. Let’s take a closer look at what real time really means, and why real time is required across the entire process.

Posted on March 28, 2016 by Ankur Desai

Can we agree at the outset that modern businesses rely heavily on data to make critical decisions, and the ability to make decisions in real time is very valuable? Good.

Posted on February 16, 2016 by Karen Whipple

Most likely, you’ve seen quite a few “Internet of Things” headlines in the last year. But how will the IoT really transform the world as we know it? Here are just a few ways both organizations and consumers are benefiting from IoT.

Posted on February 15, 2016 by Ellen Friedman

Actionable insights from real-time analytics – that’s a goal for many new projects being designed to make use of streaming data, and it’s no wonder so many organizations are aiming at this prize. If you can develop programs to process streaming data with near-real-time or actual real-time analytics, you gain the ability to react to life as it happens.

Posted on February 10, 2016 by Jim Scott

Processing data from social media streams and sensor devices in real time is becoming increasingly prevalent, and there are plenty of open source solutions to choose from. Here is the presentation that I gave at Strata+Hadoop World, where I compared three popular Apache projects that allow you to do stream processing: Apache Storm, Apache Spark, and Apache Samza.

Posted on February 2, 2016 by Will Ochandarena

Two blogs came out recently that share some very interesting perspectives on the blurring lines between architectures and implementation of different data services, ranging from file systems to databases to publish/subscribe streaming services.

Posted on January 27, 2016 by Tugdual Grall

In this week's Whiteboard Walkthrough, Tugdual Grall, technical evangelist at MapR, explains the advantages of a publish-subscribe model for real-time data streams.

Posted on January 18, 2016 by Kirk Borne

The Internet of Things (IoT), with its ubiquitous sensors and streams of big data for big insights, has an estimated market valuation of 17 trillion U.S. dollars. Apparently, the "sensoring" of the world is a seriously big deal, generating insights into people, processes, and products on a scale that is almost incomprehensible. Certainly 17 trillion dollars is almost incomprehensible.

Posted on January 13, 2016 by Balaji Mohanam

In this week's Whiteboard Walkthrough, Balaji Mohanam, Product Manager at MapR, explains the difference between Apache Spark and Apache Flink and how to decide which to use.

Posted on December 15, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, discusses a business use case that leverages the power of MapR Streams.

Posted on December 8, 2015 by Will Ochandarena

Over the last 5 years of shipping product we’ve watched our customers get enormous value out of storing and processing big data. The use cases are far and wide, from performing predictive maintenance on oil rigs to building fraud and risk models on financial transactions.

Posted on December 7, 2015 by Jim Scott

If you’re thinking about working with big data, you might be wondering which tools you should use. If you are trying to enable SQL-on-Hadoop then you might be considering the use of Apache Spark or Apache Drill.

Posted on November 30, 2015 by Jim Scott

The faster you can ask questions, the faster you can get answers. Waiting for data to be shipped off servers to a central processing platform can take time, and most businesses these days want to get as close to real time as possible.

Posted on November 24, 2015 by Jim Scott

Streaming data enables businesses to respond to customers as close to real time as possible. There are many different ways to leverage a streaming platform, and using Spark in your streaming architecture is easier than you might think.

Posted on October 13, 2015 by Nitin Bandugula

The recent Attunity and MapR webinar “Give your Enterprise a Spark: How to Deploy Hadoop with Spark in Production” proved to be highly interactive and engaging. As promised, Nitin and Rodan have provided follow-up answers to your questions.

Posted on October 8, 2015 by Jim Scott

At Strata London in 2015, someone said to me, “Spark is like a fighter jet that you have to build yourself. Once you have it built, though, you have a fighter jet. Pretty awesome. Now you have to learn to fly it.”

Posted on October 6, 2015 by Jim Scott

Stream processing is a capability that has been added alongside Spark Core and its original design goal of rapid in-memory data processing.

Posted on September 28, 2015 by Jim Scott

Apache Spark is a top-level project of the Apache Software Foundation, designed to be used with a range of programming languages on a variety of architectures.

Posted on September 24, 2015 by Ellen Friedman

If you’re not already looking at ways to efficiently handle streaming data flow, chances are you will be soon. An increasing number of organizations are shifting their approach from a largely batch-based design to one that incorporates more streaming processes. What’s the allure?

Posted on September 23, 2015 by Jim Scott

In this blog post, I’ll talk about the relationship between Spark and Hadoop, what Hadoop gives Spark, and what Spark gives Hadoop.

Posted on September 22, 2015 by Michele Nemschoff

Australian shoppers are some of the most digitally influenced in the world; a majority of Australians go online to research a product before buying it, according to a 2015 report by Deloitte.

Posted on September 21, 2015 by Jim Scott

Recently, a new name has entered many of the conversations about big data. Some people see the popular newcomer Apache Spark™ as a more accessible and more powerful replacement for Hadoop, the original technology of choice for big data.

Posted on September 15, 2015 by Michele Nemschoff

It's an exciting time for those in pharmaceutical research, given that research organizations can now leverage big data to improve their business.

Posted on September 14, 2015 by Sean Suchter

As more organizations begin to deploy Spark in their production clusters, the need for fine-grained monitoring tools becomes paramount.

Posted on September 10, 2015 by Steve Wooledge

In this week's Whiteboard Walkthrough, Steve Wooledge, VP of Industry Solutions at MapR, talks about an Apache Spark + Hadoop use case for drug discovery that one of our customers is currently running in production.

Posted on September 8, 2015 by Michele Nemschoff

The explosion of data from new devices and technologies has forced the telecommunications industry to completely change the way it handles big data. Its traditional storage and analytics solutions cannot adequately manage the expanding, diverse volume of data generated today.

Posted on August 21, 2015 by Jim Scott

Apache Hadoop is revolutionizing big data in more than one way. While the Hadoop platform introduced reliable distributed storage and processing, various packages such as Spark on top of Hadoop make it possible to build applications and analyze data much faster. Here are some cool ways the Hadoop stack is being used right now.

Posted on August 11, 2015 by Nitin Bandugula

Apache Spark on Hadoop is great for processing large amounts of data quickly. The story gets even better when you get into the realm of real-time applications.

Posted on July 15, 2015 by Anil Gadre

You are probably all somewhere on the Spark journey to production scale: you're either at Spark Summit to learn or to start doing something with Spark, or perhaps you already have mission-critical applications running in your enterprise. On this journey, there's a lot to think about, mostly about your application, but you also need to figure out how to actually get Spark to production scale, as more and more groups will want the power of the results and the value of using Spark in mission-critical, operational deployments.

Posted on June 23, 2015 by Nitin Bandugula

In this blog, I’d like to talk about the differences between Apache Spark and MapReduce, why it’s easier to develop on Spark, and the top five use cases.

Posted on June 15, 2015 by Sameer Nori

We thought the Kickstart song by Mötley Crüe was appropriate, since everyone is excited about kickstarting their Spark-based applications these days. That’s our theme for the Quick Start Solutions we’re announcing today at Spark Summit West—you can kickstart your Spark efforts into high gear with our Spark Quick Starts. You’ll be able to develop at high speeds, use streaming data, and build applications faster.

Posted on May 15, 2015 by Arsalan Tavakoli-Shiraji

Apache Spark recently celebrated its five-year anniversary as an open source project. While we are always humbled and excited by the open source success of Spark, it gives us far greater pleasure to know that more and more organizations this year are deploying Spark into production business applications.

Posted on December 31, 2014 by Karen Whipple

As we close out the year, here is a look back at our 10 most popular blogs of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.

Posted on December 16, 2014 by Nitin Bandugula

Apache Spark is one of the most popular tools in the Apache Hadoop ecosystem, and there’s been a lot of noise made about it, for good reason. It complements the existing Hadoop ecosystem by adding easy-to-use APIs and data-pipelining capabilities to Hadoop data, and the project support continues to grow. Since its launch in 2010, Spark has seen over 400 contributors from more than 50 different companies.

Posted on October 8, 2014 by Nitin Bandugula

In this blog series, we’re showcasing the top 10 reasons customers choose the MapR Distribution for Hadoop to optimize their data-driven strategies. Reason #8: MapR provides Unbiased Open Source supporting 20+ OSS projects.

Posted on July 18, 2014 by Nitin Bandugula

M.C. Srivas, CTO and Co-Founder of MapR Technologies, spoke recently at Spark Summit 2014 on “Why Spark on Hadoop Matters.” Spark, with an in-memory processing framework, provides a complementary full stack on Hadoop, and this integration is showing tremendous promise for MapR customers.

Posted on April 14, 2014 by Karen Whipple

We just wrapped up a great quarter for MapR! We introduced our free Sandbox for Hadoop, achieved the highest ranking for Current Offering in a Big Data Hadoop Solutions report by Forrester, and announced the MapR Distribution for Hadoop with YARN and HP Vertica. Read about our latest announcements, top blog posts, webinars, white papers and more in this information-packed newsletter.

Posted on April 10, 2014 by Arsalan Tavakoli-Shiraji

Today, MapR announced that it will distribute and support the Apache Spark platform as part of the MapR Distribution for Hadoop in partnership with Databricks. We’re thrilled to start on this journey with MapR for a multitude of reasons.
