Featured Author

Mathieu Dumoulin
Data Engineer, MapR
Mathieu is a Data Engineer on the MapR Professional Services team and is based in the Asia-Pacific region. He started using Hadoop in 2012 at the Fujitsu Canada Innovation Lab, where he built a large-scale text classification system from scratch. Since then, Mathieu has split his time between working as a Search Engineer and managing a new Data Science team for a large Japanese HR company. His current interests are Apache Drill, Apache Spark, and Deep Learning. Mathieu holds both a B.A.Sc. in Computer Science and a Master of Computer Science degree from the Université Laval in Canada.

Author's Posts

Posted on January 17, 2017 by Mathieu Dumoulin

Debugging a real-life distributed application can be a daunting task. Most common Google searches don't turn out to be very useful, at least at first. In this blog post, I will give a fairly detailed account of how we made an Apache Kafka/Spark Streaming/Apache Ignite application almost 10x faster and turned a development prototype into a useful, stable streaming application that eventually exceeded its performance goals.

Posted on January 10, 2017 by Mathieu Dumoulin

This series of blog posts details my findings as I bring to production a fully modern take on Complex Event Processing, or CEP for short. In many applications, ranging from finance to retail and IoT, there is tremendous value in automating tasks that require taking action in real time. Putting aside the IT systems and frameworks that would support it, this is clearly a useful capability.

Posted on January 9, 2017 by Mathieu Dumoulin

This post is a detailed account of a project I carried out to integrate an OSS business rules engine with a modern stream messaging system in the Kafka style. The goal of the project, an approach better known as Complex Event Processing (CEP), is to enable real-time decisions on streaming data, such as in IoT use cases.

Posted on November 7, 2016 by Mathieu Dumoulin

Automatic replication of MapR-DB data to Elasticsearch is useful for many environments, and I want to share information about a specific customer deployment I worked on recently. Their use case is related to log security analytics and is centered around using Drill for running interactive queries on aggregated data.

Posted on May 17, 2016 by Mathieu Dumoulin

In this blog post, I would like to share another, much less talked-about advantage that emerges from this strategy: a MapR cluster can naturally take advantage of the very well-regarded Elasticsearch and Kibana stack to give cluster admins a near real-time view of their cluster's health and performance.

Posted on April 29, 2016 by Mathieu Dumoulin

There are many options for monitoring the performance and health of a MapR cluster. In this post, I will present a lesser-known method for monitoring the CLDB using the Java Management Extensions (JMX).

Posted on April 27, 2016 by Mathieu Dumoulin

We have experimented on a 5-node MapR 5.1 cluster running Spark 1.5.2 and will share our experience, difficulties, and solutions in this blog post.
