Use Cases Blog Posts

Posted on October 24, 2016 by Sameer Nori

Offloading cold or unused data and ETL workloads from a data warehouse to Hadoop/big data platforms is a very common starting point for enterprises beginning their big data journey. Platforms like Hadoop provide an economical way to store data and do bulk processing of large data sets; hence, it’s not surprising that cost is the primary driver for this initial use case.

Posted on October 5, 2016 by Ted Dunning

In this Whiteboard Walkthrough, MapR’s Chief Application Architect, Ted Dunning, explains the move from state to flow and shows how it works in a financial services example. Ted describes the revolution underway in moving from a traditional system with multiple programs built around a shared database to a new flow-based system that instead uses a shared state queue in the form of a message stream built with technology such as Apache Kafka or MapR Streams. This new architecture lets decisions be made locally and supports a micro-services style approach.

Posted on September 22, 2016 by Raphaël Velfre

A very common use case for the MapR Converged Data Platform is collecting and analyzing data from a variety of sources, including traditional relational databases. Until recently, data engineers would build an ETL pipeline that periodically walks the relational database and loads the data into files on the MapR cluster, then perform batch analytics on that data.

Posted on August 29, 2016 by Carol McDonald

Building a robust, responsive, secure data service for healthcare is tricky. For starters, healthcare data lends itself to multiple models: Document representation for patient profile views or updates; Graph representation to query relationships between patients, providers, and medications; Search representation for advanced lookups. This post will describe how stream-first architectures can solve these challenges, and look at how this has been implemented at Liaison Technologies.

Posted on August 8, 2016 by Carol McDonald

This post is the first in a series where we will review examples of how Joe Blue, a Data Scientist in MapR Professional Services, assisted MapR customers in identifying new data sources and applying machine learning algorithms in order to better understand their customers. The first example in the series is an advertising customer 360°; the next example in the series will be banking and healthcare customer 360° examples.

Posted on June 29, 2016 by Carol McDonald

This post will use Apache Spark SQL and DataFrames to query, compare and explore S&P 500, Exxon and Anadarko Petroleum Corporation stock prices.

Posted on June 13, 2016 by Ellen Friedman

The power of SQL for business analytics is a given, but the challenge in big data settings is that SQL is normally a static language that assumes pre-defined, fixed and well-known schema. SQL also needs flat data structures. It has been assumed that you need fixed schema for performance.

Posted on May 17, 2016 by Mathieu Dumoulin

In this blog post, I would like to share another, much less talked about advantage that emerges from this strategy. This is because a MapR cluster can naturally take advantage of the very well regarded Elasticsearch and Kibana stack to give cluster admins a near real-time view of their cluster’s health and performance.

Posted on May 10, 2016 by Nicolas Perez

Streaming data is a hot topic these days, and Apache Spark is an excellent framework for streaming. In this blog post, I'll show you how to integrate custom data sources into Spark.

Posted on May 4, 2016 by Nick Amato

If you’ve had a chance to work with Hadoop or Spark a little, you probably already know that HDFS doesn't support full random read-writes or many other capabilities typically required in a production-ready file system.

Posted on May 3, 2016 by Carol McDonald

In this post we are going to discuss building a real time solution for credit card fraud detection.

Posted on April 27, 2016 by Mathieu Dumoulin

We have experimented with on a 5 node MapR 5.1 cluster running Spark 1.5.2 and will share our experience, difficulties, and solutions on this blog post.

Posted on April 21, 2016 by Leon Clayton

In this article we will explore what it means to have a converged data platform for building and delivering business applications. This sample application will be to create blog articles for a personal website.

Posted on April 20, 2016 by Nick Amato

One of the most useful things to do with machine learning is inform assumptions about customer behaviors. This has a wide variety of applications: everything from helping customers make superior choices (and often, more profitable ones), making them contagiously happy about your business, and building loyalty over time.

Posted on April 15, 2016 by Nick Amato

This is a great example of Pig and Hive in action. The data set used is publicly available, making it a great self-help tutorial to play with.

Posted on April 7, 2016 by William Cairns

Having participated in a number of fantasy sports leagues and being a Data Scientist at MapR gives me a unique perspective on my approach to choosing who I think will most likely “win” the predictions for the six players, ranked in order, who I predict will most likely to finish in 10th or better place this year (and hopefully 1st) based on my statistical modeling are:

Posted on April 6, 2016 by Neeraja Rentachintala

Today we are very excited to announce the release of Apache Drill 1.6 on the MapR Converged Data Platform. Drill has been on the path of rapid iterative releases for one and a half years now, gathering amazing traction with customers and OSS community users on the way.

Posted on March 22, 2016 by Ben Sadeghi

Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a subscription to a service. Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, insurance firms, and other verticals.

Posted on March 1, 2016 by Nicolas Perez

An important part of any application is the underlying log system we incorporate into it. Logs are not only for debugging and traceability, but also for business intelligence. Building a robust logging system within our apps could be use as a great insights of the business problems we are solving.

Posted on February 17, 2016 by Parth Chandra

During the early days of developing Apache Drill, the Drill team realized the need for an efficient way to represent complex, columnar data in memory. Projects like Protobuf provided an efficient way to represent data that had a predefined schema for transmission over the network, and the Apache Parquet project had implemented an efficient way to represent complex columnar data on disk.

Posted on February 9, 2016 by Tugdual Grall

Streaming data is of growing interest to many organizations, and most applications need to use a producer-consumer model to ingest and process data in real time. Many messaging solutions exist today on the market, but few of them have been built to handle the challenges of modern deployment related to IoT, large web based applications and related big data projects.

Posted on January 28, 2016 by Joseph Blue

This time, it’s personal. Super Bowl 50 is being played at Levi’s Stadium in Santa Clara – within sight of many of the world’s most innovative technology companies, including MapR.

Posted on January 26, 2016 by Jim Scott

Are you ready to start streaming all the events in your business? What happens to your streaming solution when you outgrow your single data center? What happens when you are at a company that is already running multiple data centers and you need to implement streaming across data centers?

Posted on January 25, 2016 by Ranjit Lingaiah

In the wide column data model of MapR-DB, all rows are stored by a row key, column family, column qualifier, value, and timestamps. In the current version, the row key is the only field that is indexed, which fits the common pattern of queries based on the row key.

Posted on January 7, 2016 by Dong Meng

XGBoost is a library that is designed for boosted (tree) algorithms. It has become a popular machine learning framework among data science practitioners, especially on Kaggle, which is a platform for data prediction competitions where researchers post their data and statisticians and data miners compete to produce the best models.

Posted on August 5, 2015 by Joseph Blue

Did Harper Lee write To Kill a Mockingbird? For many years, conspiracy buffs supported the urban legend that Truman Capote, Lee’s close friend with considerably more literary creds, might have ghost-authored the novel. The author’s reticence on that subject (as well as every other subject) fueled the rumors and it became another urban legend.

Posted on September 15, 2014 by Dale Kim

At the Big Data Everywhere conference held in Israel, Atzmon Hen-Tov, Vice President of R&D of Pontis, and Lior Schachter, Director of Cloud Technology and Platform Group Manager of Pontis, gave an informative talk titled “Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform.” The five phase, two-year migration of their operational and analytical functions to MapR resulted in a true, real-time operational analytics environment on Hadoop.

Blog Sign Up

Sign up and get the top posts from each week delivered to your inbox every Friday!

Streaming Data Architecture:

New Designs Using Apache Kafka and MapR Streams




Download for free