Posted on July 28, 2016 by Nicolas Perez

Logging in Apache Spark is easy, since Spark offers access to a log object out of the box; only some configuration is needed. In a previous post, we looked at how to do this and identified some problems that may arise. However, the solution presented there can cause problems when you are ready to collect the logs, since they are distributed across the entire cluster.


Posted on July 12, 2016 by Carol McDonald

Random forests are one of the most successful machine learning models for classification. In this blog post, I’ll help you get started using Apache Spark’s random forests to classify bank loan credit risk.

Posted on July 27, 2016 by Ted Dunning

In this week’s Whiteboard Walkthrough Part I, Ted Dunning, Chief Application Architect at MapR, explains the key capabilities required of a streaming platform in the context of micro-services and the advantages they offer.

Posted on July 27, 2016 by Ted Dunning

In this week’s Whiteboard Walkthrough Part II, Ted Dunning, Chief Application Architect at MapR, talks about the design freedom gained by adopting a micro-services architecture based on streaming data. When you move, one step at a time, from an old-style architecture that depends too heavily on a shared global state database to a stream-based flow architecture, the isolation between micro-services results in reduced strain on the original database, improved flexibility, and often improved speed.

Posted on July 26, 2016 by Manny Puentes

“Big Data” is no longer a buzzword. Businesses big and small that don’t invest now in big data technologies risk getting left behind as the marketplace becomes more and more data-driven. In fact, a recent McKinsey and Company report suggested that companies that invest in big data and analytics consistently outperform their peers in both productivity and revenue.

Posted on July 25, 2016 by Jim Scott

This post refers to message-driven architectures. In short, a message-driven architecture is a subset of a service-oriented architecture (SOA), a model that has been around for many years and is very popular. As you go through this post, you will find that the foundational message-driven architecture competes more with the concepts of the enterprise service bus (ESB).

Posted on July 21, 2016 by Stephan Ewen

In this week’s Whiteboard Walkthrough, Stephan Ewen, PMC member of Apache Flink and CTO of data Artisans, explains how to use savepoints, a unique feature in Apache Flink stream processing, to let you reprocess data, do bug fixes, deal with upgrades, and do A/B testing.

Posted on July 20, 2016 by Sameer Nori

One of our customer questions centers on how to determine the degree of parallelism used for various operators in queries. The rest of this blog post addresses this question and the best practice that came out of it.

Posted on July 19, 2016 by Ellen Friedman

In January, I made predictions about six big data trends for 2016. (“What Will You Do in 2016?”) Now we’ve reached the mid-and-a-bit-more-year so it’s a good time to check them out and see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.

Posted on July 15, 2016 by Ryan Victory

MapR Streams and MapR-DB are both very exciting developments in the MapR Converged Data Platform. In this blog post, I’m going to show you how to get Ruby code to natively interact with MapR-DB and MapR Streams.
