If you want to try out the MapR Converged Data Platform to see its unique big data capabilities but don’t have a cluster of hardware immediately available, you still have a few other options. For example, you can spin up a MapR cluster in the cloud using multiple node instances on one of our IaaS partners (Amazon, Azure, etc.).
According to Gartner, by 2020, a quarter of a billion connected cars will form a major element of the Internet of Things. Connected vehicles are projected to generate 25GB of data per hour, which can be analyzed to provide real-time monitoring and apps, and will lead to new concepts of mobility and vehicle usage.
In the past year, the big data pendulum for financial services has officially swung from passing fad or experiment to large deployments. That puts a somewhat different slant on big data trends when compared to 2016 trends. The question of big data hype versus reality has finally been put to rest for banks.
A lot of people choose MapR as their core platform for processing and storing big data because of its advantages for speed and performance. MapR consistently performs faster than any other big data platform for all kinds of applications, including Hadoop, distributed file I/O, NoSQL data storage, and data streaming. In this post, I’m focusing on the latter to provide some perspective on how much better/faster/cheaper MapR Streams can be compared to Apache Kafka as a data streaming technology.
This series of blog posts details my findings as I bring to production a fully modern take on Complex Event Processing, or CEP for short. In many applications, ranging from financials to retail and IoT applications, there is tremendous value in automating tasks that require to take action in real time. Putting aside the IT system and frameworks that would support this capability, this is clearly a useful capability.
This post is intended as a detailed account of a project I have made to integrate an OSS business rules engine with a modern stream messaging system in the Kafka style. The goal of the project, better known as Complex Event Processing (CEP), is to enable real-time decisions on streaming data, such as in IoT use cases.
There has been a lot of research in document image processing over the past 20 years, but not much research has been done in terms of parallel processing. Some of the solutions proposed for parallel processing have been to create threads of execution for each image, or to use GNU Parallel.
The first post discussed creating a machine learning model using Apache Spark’s K-means algorithm to cluster Uber data based on location. This second post will discuss using the saved K-means model with streaming data to do real-time analysis of where and when Uber cars are clustered.
In this week’s Whiteboard Walkthrough Ted Dunning, Chief Application Architect at MapR, provides some pointers for building better machine learning models, including the advantages of data streams and microservices style design in the example of a credit card fraud detector, the need for metrics, and how reconstruction of data from an auto-encoder can serve as a figure of merit that helps identify good models.
Pentaho Data Integration (PDI) provides the ETL capabilities that facilitate the process of capturing, cleansing, and storing data. Its uniform and consistent format makes it accessible and relevant to end-users and IoT technologies. Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against different data sets with various formats, e.g. JSON, CSV, Parquet, HBase, etc.
- 1 of 95
Blog Sign Up
Sign up and get the top posts from each week delivered to your inbox every Friday!