Logging in Apache Spark is very easy to do, since Spark offers access to a logobject out of the box; only some configuration setups need to be done. In a previous post, we looked at how to do this while identifying some problems that may arise. However, the solution presented might cause some problems when you are ready to collect the logs, since they are distributed across the entire cluster.
Random forests are one of the most successful machine learning models for classification. In this blog post, I’ll help you get started using Apache Spark’s spark.ml Random forests for classification of bank loan credit risk.
In this week’s Whiteboard Walkthrough Part I, Ted Dunning, Chief Application Architect at MapR, explains the key capabilities required of a streaming platform in the context of micro-services and the advantages they offer.
In this week’s Whiteboard Walkthrough Part II, Ted Dunning, Chief Application Architect at MapR, talks about the design freedom gained by adopting a micro-services architecture based on streaming data. When you move – one step at a time - from an old style architecture that suffers from too much dependence on a shared global state database to a stream-based flow architecture, the isolation between micro-services results in reduced strain on the original database, improved flexibility and often speed.
“Big Data” is no longer a buzzword. Businesses big and small that don’t invest now in big data technologies risk getting left behind as the marketplace becomes more and more data-driven. In fact, a recent McKinsey and Company report suggested that companies that invest in big data and analytics consistently outperform their peers in both productivity and revenue.
Within this post you will see mention of message-driven architectures. This is in short a subset of a service oriented architecture (SOA). This has been around for many years and is a very popular model. What you will find going through this post is that the foundational message-driven architecture is more competitive to the concepts of the enterprise service bus (ESB).
In this week’s Whiteboard Walkthrough, Stephan Ewen, PMC member of Apache Flink and CTO of data Artisans, explains how to use savepoints, a unique feature in Apache Flink stream processing, to let you reprocess data, do bug fixes, deal with upgrades, and do A/B testing.
One of the customer questions has centered around wanting to understand how to determine the degree of parallelism being used for various operators in queries. We’ll address this question and the best practice that originated from this in the rest of this blog post.
In January, I made predictions about six big data trends for 2016. (“What Will You Do in 2016?”) Now we’ve reached the mid-and-a-bit-more-year so it’s a good time to check them out and see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.
MapR Streams and MapR-DB are both very exciting developments in the MapR Converged Data Platform. In this blog post, I’m going to show you how to get Ruby code to natively interact with MapR-DB and MapR Streams.
- 1 of 84
Blog Sign Up
Sign up and get the top posts from each week delivered to your inbox every Friday!