In this week's Whiteboard Walkthrough, Jack Norris, Senior Vice President of Data and Applications at MapR, explains how the MapR Converged Data Platform opens up the use of containers to the big data environment so that you can access data directly, thus taking advantage of otherwise underutilized assets.
Streaming Blog Posts
Debugging a real-life distributed application can be a pretty daunting task. Most common Google searches don't turn out to be very useful, at least at first. In this blog post, I will give a fairly detailed account of how we managed to speed up an Apache Kafka/Spark Streaming/Apache Ignite application by almost 10x and turn a development prototype into a useful, stable streaming application that eventually exceeded the performance goals set for it.
This series of blog posts details my findings as I bring to production a fully modern take on Complex Event Processing, or CEP for short. In many applications, ranging from financials to retail and IoT, there is tremendous value in automating tasks that require taking action in real time. Putting aside the IT system and frameworks that would support this capability, this is clearly a useful capability.
This post is intended as a detailed account of a project I undertook to integrate an OSS business rules engine with a modern stream messaging system in the Kafka style. The goal of the project, better known as Complex Event Processing (CEP), is to enable real-time decisions on streaming data, such as in IoT use cases.
In this week’s Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, provides some pointers for building better machine learning models, including the advantages of data streams and microservices-style design in the example of a credit card fraud detector, the need for metrics, and how reconstruction of data from an auto-encoder can serve as a figure of merit that helps identify good models.
In this week’s Whiteboard Walkthrough, Ankur Desai, Senior Product Marketing Manager at MapR, describes how Apache Kafka Connect and a REST API simplify and improve agility in working with streaming data from a variety of data sources, including legacy databases or data warehouses. He also explains the differences in this architecture when you use MapR Streams versus Kafka for data transport.
We’re pleased to announce the general release of the MapR Ecosystem Pack (MEP) version 2.0. This represents the second major release of a MapR Ecosystem Pack since the beginning of this new process of delivering ecosystem upgrades.
In my previous blog post, I explained the three major components of a streaming architecture: producers, a streaming system, and consumers. Producers (such as Apache Flume) publish event data into a streaming system after collecting it from the data source, transforming it into the desired format, and optionally filtering, aggregating, and enriching it.
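The producer, streaming system, and consumer roles described above can be sketched in a few lines. This is a minimal illustration only, not code from the post: the in-memory `queue.Queue` stands in for a real streaming system, and the names `produce`, `consume`, the event fields, and the `"C"` unit are all hypothetical.

```python
import json
import queue

# In-memory stand-in for the streaming system (an assumption for illustration;
# a real deployment would use a dedicated messaging system instead).
stream = queue.Queue()

def produce(raw_events, stream):
    """Collect raw events, filter and transform them, then publish to the stream."""
    published = 0
    for raw in raw_events:
        # Filter: drop events that carry no reading.
        if raw.get("value") is None:
            continue
        # Transform and enrich: normalize to JSON and attach a (hypothetical) unit.
        message = json.dumps(
            {"sensor": raw["sensor"], "value": float(raw["value"]), "unit": "C"}
        )
        stream.put(message)
        published += 1
    return published

def consume(stream):
    """Drain the stream and return the decoded events."""
    events = []
    while not stream.empty():
        events.append(json.loads(stream.get()))
    return events

raw_events = [
    {"sensor": "a", "value": "21.5"},
    {"sensor": "b", "value": None},
    {"sensor": "c", "value": 19},
]
count = produce(raw_events, stream)
received = consume(stream)
print(count, received)
```

The point of the sketch is the separation of concerns: the producer owns collection, filtering, and formatting, while the consumer only sees well-formed messages from the stream.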
In this week's Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, explains in detail how to use streaming IoT sensor data from handsets and devices as well as cell tower data to detect anomalies. He takes us from best practices for data architecture, including the advantages of multi-master writes with MapR Streams, through analysis of the telecom data using clustering methods to discover normal and anomalous behaviors.
Much has been written about the power of big data collections to enable the 360 view of our customers, our business, our employees, and our processes. When our numerous disparate heterogeneous data collections are aggregated and joined in the data lake, with appropriate data tagging and data discovery tools in place (such as Apache Drill), then we can reach for that ideal: the 360 view of our domain!
With the rapid expansion of smartphones and other connected mobile devices, communications service providers (CSPs) need to rapidly process, store, and derive insights from the diverse volume of data travelling across their networks. Big data analytics can help CSPs improve profitability by optimizing network services/usage, enhancing customer experience, and improving security.
MapR is pleased to announce support for event-driven microservices on the MapR Converged Data Platform. In this blog post, I’d like to explain what this means, and how it fits into our bigger idea of “convergence.” Microservices are simple, single-purpose applications that work in unison via lightweight communications, such as data streams. They allow you to more easily manage segmented efforts to build, integrate, and coordinate your applications in ways that have traditionally been impossible with monolithic applications.
Get an introduction to streaming analytics, which allows you real-time insight from captured events and big data. There are applications across industries, from finance to wine making, though there are two primary challenges to be addressed.
It’s not just a concern when ordering coffee. Something similar can happen as we investigate new and innovative big data technologies and techniques. I used the cappuccino example in a talk I presented recently at the Strata + Hadoop World Conference in London. The talk, titled “Building Better Cross Team Communication,” highlighted the importance of identifying and addressing the difference in how each side thinks the world works when two groups that have different experience and skills come together.
Oil and gas wells produce a huge amount of information. Sensors monitor things like temperature, pressure, fluid viscosity, the presence of foreign substances, and seismic activity. Sensors must be monitored in real time to optimize both performance and safety. A slight change in pressure underground may indicate a fracture that can jeopardize the whole well.
In this week’s Whiteboard Walkthrough Part I, Ted Dunning, Chief Application Architect at MapR, explains the key capabilities required of a streaming platform in the context of micro-services and the advantages they offer.
In this week’s Whiteboard Walkthrough Part II, Ted Dunning, Chief Application Architect at MapR, talks about the design freedom gained by adopting a micro-services architecture based on streaming data. When you move – one step at a time – from an old-style architecture that suffers from too much dependence on a shared global state database to a stream-based flow architecture, the isolation between micro-services results in reduced strain on the original database, improved flexibility, and often improved speed.
Within this post you will see mention of message-driven architectures. This is, in short, a subset of a service-oriented architecture (SOA). It has been around for many years and is a very popular model. What you will find going through this post is that the foundational message-driven architecture competes more directly with the concept of the enterprise service bus (ESB).
In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now we’ve reached the mid-and-a-bit-more year, so it’s a good time to check them out and see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.
I was at the annual Hadoop Summit in San Jose last week. As usual, the MapR booth was buzzing with big data enthusiasts and experts alike. We showcased demos that spanned multiple topics including multi-cluster Hadoop monitoring using Grafana and Kibana (as part of our new Spyglass Initiative), IoT stream analysis using MapR Streams and Spark Streaming, and self-service big data analytics using Apache Drill.
Streaming data can be used as a long-term auditable history when you choose a messaging system with persistence, but is this approach practical in terms of the cost of storing years of data at scale? The answer is “yes”, particularly because of the way topic partitions are handled in MapR Streams. Here’s how it works.
In this week's Whiteboard Walkthrough, Ellen Friedman, a consultant at MapR, talks about how to design a system to handle real-time applications, and also how to take advantage of streaming data beyond those in-the-moment insights.
Standards and incentives for digitizing and sharing healthcare data, along with improvements and decreasing costs in storage and parallel processing on commodity hardware, are causing a big data revolution in healthcare, with the goal of better care at lower cost.
Streaming data is now a big focus for many big data projects, including real-time applications, so there’s a lot of interest in excellent messaging technologies such as Apache Kafka or MapR Streams, which uses the Kafka 0.9 API.
What capabilities should you look for in a messaging system when you design the architecture for a streaming data project? Let’s start with a hypothetical IoT data aggregation example to illustrate specific business goals and the requirements they place on messaging technology and data architecture needed to meet those goals...
The first Kafka Summit was recently held in San Francisco. While the size of the conference was relatively small at 600 attendees, it was encouraging to see the variety of companies that are embracing real-time data pipelines.
MapR, Cisco, and SAP have been collaborating for years to help you gain insight from all of your data sources. Today, we’re excited to announce that Cisco has developed an appliance that includes the MapR Converged Data Platform for SAP HANA, making it much easier and faster for you to harness the power of big data.
This blog post provides an introduction to the components of a typical streaming architecture and the options available at each stage. The three major components are producers, a streaming system, and consumers. The enthusiasm over real-time processing is being met with a host of technologies. Learn about some of them...
What is predictive maintenance? If we can predict a part failure well in advance, we can schedule maintenance or repair work for the part at our convenience, while continuing to operate the equipment and avoiding unexpected downtime. This will help reduce large repair expenses, as the part will be repaired or replaced well before it fails.
There are substantial advantages to being able to make decisions at the speed required to respond to events in the moment. In fact, real time is at the foundation of many transformational applications. Let’s take a closer look at what real time really means, and why real time is required across the entire process.
Can we agree at the outset that modern businesses rely heavily on data to make critical decisions, and the ability to make decisions in real time is very valuable? Good.
Most likely, you’ve seen quite a few “Internet of Things” headlines in the last year. But how will the IoT really transform the world as we know it? Here are just a few ways both organizations and consumers are benefiting from IoT.
Actionable insights from real time analytics – that’s a goal for many new projects being designed to make use of streaming data, and it’s no wonder so many organizations are aiming at this prize. If you can develop programs to process streaming data with near or actual real time analytics, you gain the ability to react to life as it happens.
Processing data from social media streams and sensor devices in real time is becoming increasingly prevalent, and there are plenty of open source solutions to choose from. Here is the presentation that I gave at Strata+Hadoop World, where I compared three popular Apache projects that allow you to do stream processing: Apache Storm, Apache Spark, and Apache Samza.
Two blogs came out recently that share some very interesting perspectives on the blurring lines between architectures and implementation of different data services, ranging from file systems to databases to publish/subscribe streaming services.
In this week's whiteboard walkthrough, Tugdual Grall, technical evangelist at MapR, explains the advantages of a publish-subscribe model for real-time data streams.
The Internet of Things (IoT), with its ubiquitous sensors and streams of big data for big insights, has an estimated market valuation of 17 trillion U.S. dollars. Apparently, the "sensoring" of the world is a seriously big deal, generating insights into people, processes, and products on a scale that is almost incomprehensible. Certainly 17 trillion dollars is almost incomprehensible.
In this week's whiteboard walkthrough, Balaji Mohanam, Product Manager at MapR, explains the difference between Apache Spark and Apache Flink and how to decide which to use.
In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, discusses a business use case that leverages the power of MapR Streams.
Over the last 5 years of shipping product we’ve watched our customers get enormous value out of storing and processing big data. The use cases are far and wide, from performing predictive maintenance on oil rigs to building fraud and risk models on financial transactions.
The faster questions can be asked, the faster you can get answers. Waiting for data to be shipped off of servers to a central processing platform can take time, and most businesses these days want to get as close to real time as possible.
Streaming data enables businesses to respond to customers as close to real time as possible. There are many different ways to leverage a streaming platform, and utilizing Spark in your streaming architecture is easier than you might think.
The recent Attunity and MapR webinar “Give your Enterprise a Spark: How to Deploy Hadoop with Spark in Production” proved to be highly interactive and engaging. As promised, Nitin and Rodan have provided follow-up answers to your questions.
If you’re not already looking at ways to efficiently handle streaming data flow, chances are you will be soon. An increasing number of organizations are shifting their approach from a largely batch-based design to one that incorporates more streaming processes. What’s the allure?
In this blog post, I’ll talk about the relationship between Spark and Hadoop, what Hadoop gives Spark, and what Spark gives Hadoop.
Australian shoppers are some of the most digitally influenced in the world; a majority of Australians go online to research a product before buying it, according to a 2015 report by Deloitte.
Recently, a new name has entered many of the conversations about big data. Some people see the popular newcomer Apache Spark™ as a more accessible and more powerful replacement for Hadoop, the original technology of choice for big data.
It's an exciting time for those in pharmaceutical research these days, given that research organizations can now leverage big data to improve their business.
As more organizations begin to deploy Spark in their production clusters, the need for fine-grained monitoring tools becomes paramount.
In this week's Whiteboard Walkthrough, Steve Wooledge, VP of Industry Solutions at MapR, talks about an Apache Spark + Hadoop use case for drug discovery that one of our customers is currently running in production.
The explosion of data from new devices and technologies has forced the telecommunications industry to completely change the way they handle big data. Their traditional storage and analytics solutions cannot adequately manage the expanding, diverse volume of data generated today.
Apache Hadoop is revolutionizing big data in more than one way. While the Hadoop platform introduced reliable distributed storage and processing, various packages such as Spark on top of Hadoop make it possible to build applications and analyze data much faster. Here are some cool ways the Hadoop stack is being used right now.
Apache Spark on Hadoop is great for processing large amounts of data quickly. The story gets even better when you get into the realm of real-time applications.
You are probably all somewhere on the Spark journey to production scale—you're either at Spark Summit to learn, to start doing something with Spark, or perhaps you have mission-critical applications already running in your enterprise. On this journey, there's a lot to think about—mostly about your application—but you also need to figure out how to actually get Spark into production scale as more and more groups will want the power of the results and the value of using Spark in mission-critical, operational deployments.
In this blog, I’d like to talk about the differences between Apache Spark and MapReduce, why it’s easier to develop on Spark, and the top five use cases.
We thought the Kickstart song by Mötley Crüe was appropriate, since everyone is excited about kickstarting their Spark-based applications these days. That’s our theme for the Quick Start Solutions we’re announcing today at Spark Summit West—you can kickstart your Spark efforts into high gear with our Spark Quick Starts. You’ll be able to develop at high speeds, use streaming data, and build applications faster.
Apache Spark recently celebrated its five-year anniversary as an open source project. While we are always humbled and excited by the open source success of Spark, it gives us far greater pleasure in knowing that there are more and more organizations this year that are deploying Spark into production business applications.
As we close out the year, here is a look back at our 10 most popular blogs of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.
Apache Spark is one of the most popular tools in the Apache Hadoop ecosystem, and there’s been a lot of noise made about it – for good reason. It complements the existing Hadoop ecosystem by adding easy-to-use APIs and data-pipelining capabilities to Hadoop data, and the project support continues to grow. Since its launch in 2010, Spark has seen over 400 contributors from more than 50 different companies.
In this blog series, we’re showcasing the top 10 reasons customers choose the MapR Distribution for Hadoop to optimize their data-driven strategies. Reason #8: MapR provides Unbiased Open Source supporting 20+ OSS projects.
M.C. Srivas, CTO and Co-Founder of MapR Technologies, spoke recently at Spark Summit 2014 on “Why Spark on Hadoop Matters.” Spark, with an in-memory processing framework, provides a complementary full stack on Hadoop, and this integration is showing tremendous promise for MapR customers.