Latest

Posted on July 21, 2016 by Stephan Ewen

In this week’s Whiteboard Walkthrough, Stephan Ewen, PMC member of Apache Flink and CTO of data Artisans, explains how to use savepoints, a unique feature in Apache Flink stream processing, to let you reprocess data, do bug fixes, deal with upgrades, and do A/B testing.

Featured

Posted on May 3, 2016 by Carol McDonald

In this post we are going to discuss building a real time solution for credit card fraud detection.

Posted on July 20, 2016 by Sameer Nori

One of the customer questions has centered around wanting to understand how to determine the degree of parallelism being used for various operators in queries. We’ll address this question and the best practice that originated from this in the rest of this blog post.

Posted on July 19, 2016 by Ellen Friedman

In January, I made predictions about six big data trends for 2016. (“What Will You Do in 2016?”) Now we’ve reached the mid-and-a-bit-more-year so it’s a good time to check them out and see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.

Posted on July 15, 2016 by Ryan Victory

MapR Streams and MapR-DB are both very exciting developments in the MapR Converged Data Platform. In this blog post, I’m going to show you how to get Ruby code to natively interact with MapR-DB and MapR Streams.

Posted on July 14, 2016 by Nick Amato

Sooner or later, if you eyeball enough data sets, you will encounter some that look like a graph, or are best represented a graph. Whether it's social media, computer networks, or interactions between machines, graph representations are often a straightforward choice for representing relationships among one or more entities.

Posted on July 13, 2016 by Philippe Cuzey

As a data analyst that primarily used Apache Pig in the past, I eventually needed to program more challenging jobs that required the use of Apache Spark, a more advanced and flexible language. At first, Spark may look a bit intimidating, but this blog post will show that the transition to Spark (especially PySpark) is quite easy.

Posted on July 12, 2016 by Carol McDonald

Random forests are one of the most successful machine learning models for classification. In this blog post, I’ll help you get started using Apache Spark’s spark.ml Random forests for classification of bank loan credit risk.

Posted on July 11, 2016 by Prashant Rathi

In this week’s Whiteboard Walkthrough, Prashant Rathi, Senior Product Manager at MapR, describes the architecture for fine-grained monitoring in the MapR converged data platform from collection to storage and visualization from a variety of data sources as part of the Spyglass Initiative.

Posted on July 8, 2016 by Craig Warman

In this blog post, I’ll describe how to install Apache Drill on the MapR Sandbox for Hadoop, resulting in a "super" sandbox environment that essentially provides the best of both worlds—a fully-functional, single-node MapR/Hadoop/Spark deployment with Apache Drill.

Blog Sign Up

Sign up and get the top posts from each week delivered to your inbox every Friday!


Featured Author

Data Engineer, MapR
Mathieu is a Data Engineer on the MapR Professional Services team, and is based in the Asia-Pacific region.

Streaming Data Architecture:

New Designs Using Apache Kafka and MapR Streams

 

 

 

Download for free