This series of blog posts details my findings as I bring to production a fully modern take on Complex Event Processing, or CEP for short. In many applications, ranging from financial services to retail and IoT, there is tremendous value in automating tasks that require taking action in real time. Even setting aside the IT systems and frameworks needed to support it, this is clearly a valuable capability.
Machine Learning Blog Posts
This post is intended as a detailed account of a project I undertook to integrate an OSS business rules engine with a modern stream messaging system in the Kafka style. The goal of the project is Complex Event Processing (CEP): enabling real-time decisions on streaming data, such as in IoT use cases.
This post is the second in a series where we will go over examples of how MapR data scientist Joe Blue assisted MapR customers, in this case a regional bank, to identify new data sources and apply machine learning algorithms in order to better understand their customers. In this second part, we will cover a bank customer profitability 360° example, presenting the before, during and after.
Business owners and executives today know the power of social media, mobile technology, cloud computing, and analytics. If you pay attention, however, you will notice that truly mature and successful digital businesses do not jump at every new technological tool or platform.
Standards and incentives for digitizing and sharing healthcare data, along with improvements in storage and parallel processing on commodity hardware and their decreasing costs, are causing a big data revolution in healthcare, with the goal of better care at lower cost.
Just a few years ago, using a fingerprint to sign on to your phone seemed futuristic. Today, it’s everywhere and just the beginning of how biometrics will be woven into our lives.
What is predictive maintenance? If we can predict a part failure well in advance, we can schedule maintenance or repair work at our convenience while continuing to operate the equipment, avoiding unexpected downtime. This also helps reduce large repair expenses, since the part is repaired or replaced well before it fails.
Having participated in a number of fantasy sports leagues and being a Data Scientist at MapR gives me a unique perspective on choosing who I think will most likely “win” the tournament. Based on my statistical modeling, here are my predictions for the six players, ranked in order, most likely to finish in 10th place or better this year (and hopefully 1st):
There are 150 quintillion (i.e. the one after quadrillion) permutations to consider when completing your NCAA bracket. Some of us don’t have time to review them all; if you are likewise short on time, you can let MapR do the heavy lifting for you and get your personalized bracket from the Crystal B-Ball!
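That figure can be sanity-checked with one line of arithmetic. Assuming it counts every possible outcome of a 67-game tournament (the 63 bracket games plus the First Four), each game having two possible winners, the count is 2^67:

```python
# Sanity check on the bracket count (an assumption about where the
# figure comes from): 67 games, each with 2 possible winners.
brackets = 2 ** 67
print(brackets)  # 147573952589676412928, roughly 148 quintillion
```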
Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently to find its key descriptive, predictive, and explanatory features is fundamental to coping with the Big Data avalanche.
Companies everywhere are looking for ways to improve customer service. For example, companies with call-in support centers might track how long agents take to answer calls, or how long customers stay on hold.
The Internet of Things (IoT), with its ubiquitous sensors and streams of big data for big insights, has an estimated market valuation of 17 trillion U.S. dollars. Apparently, the "sensoring" of the world is a seriously big deal, generating insights into people, processes, and products on a scale that is almost incomprehensible. Certainly 17 trillion dollars is almost incomprehensible.
Someone once said, “If you can’t measure something, you can’t understand it.” Another version of this belief says, “If you can’t measure it, it doesn’t exist.” This is a false way of thinking, a fallacy; in fact, it is sometimes called the McNamara fallacy.
The meteoric growth of available data has precipitated the need for data scientists who can leverage that surplus of information. This spotlight has caused many industrious people to wonder, “Can I be a data scientist, and what skills would I need?”
I was recently asked five questions by Alex Woodie of Datanami for the article, “So You Want To Be A Data Scientist” that he was preparing. He used a few snippets from my full set of answers. The longer version of my answers provided additional advice. For aspiring data scientists of all ages, I provide here the full, unabridged version of my answers, which may help you even more to achieve your goal.
Curious to know how American Express uses machine learning successfully, in production, at very large scale?
Fraud represents the biggest loss for banks, accounting for upward of $1.744 billion in losses annually. The banking industry spends millions each year on technologies aimed at reducing fraud and retaining customers, but that spending does little to protect banks. Let’s focus on why current fraud detection approaches don't work as well as they should and how machine learning on big data can help.
As we close out the year, here is a look back at our 10 most popular blogs of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.
The recent Skytree and MapR webinar ”Predictive Analytics with Machine Learning and Hadoop” proved to be highly interactive and engaging. As promised, Nitin and Jin have provided answers to questions that we were not able to get to during the webinar:
We previously discussed the “Top 8 Reasons that Characterization is Right for Your Data.” Here we move the discussion of characterization from the theoretical to the practical, by providing four simple examples of characterizations of data. In each of these cases, the set of characterizations that are generated can then be fed into different types of analytics algorithms for discovery from your data: predictive patterns, clusters (segments), associations, correlations, trends, and anomalies (outliers, surprises).
This question invariably comes up during big data discussions – ‘What is big data good for?’ Those who are close to the subject can quickly identify numerous examples of how big data can be used for the greater good, including some that are listed here: “Big Data and Hadoop for Competitive Advantage – 5 Sources of Insights and Opportunities.”
Many machine learning algorithms that are used for data mining and data science work with numeric data. And many algorithms tend to be very mathematical (such as Support Vector Machines, which we previously discussed). But, association rule mining is perfect for categorical (non-numeric) data and it involves little more than simple counting!
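To illustrate what “little more than simple counting” means, here is a minimal Python sketch of association rule mining over made-up transaction data (the items and thresholds are hypothetical, not from the post). It counts item and item-pair occurrences, then derives each rule's support and confidence:

```python
# Minimal association rule mining by counting, over hypothetical
# categorical transaction data (items are made up for illustration).
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk", "butter"},
]
n = len(transactions)

# Count how often each item and each item pair occurs.
item_counts = Counter()
pair_counts = Counter()
for t in transactions:
    for item in t:
        item_counts[item] += 1
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# A rule {a} -> {b} is scored by support (how common the pair is
# overall) and confidence (how often b appears when a does).
for (a, b), count in sorted(pair_counts.items()):
    support = count / n
    confidence = count / item_counts[a]
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

No matrix algebra or optimization is involved: both metrics fall directly out of the occurrence counts, which is exactly why the technique suits categorical data.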
Real estate experts like to say that the three most important features of a property are: location, location, location! Likewise, weather events are highly location-dependent. We will see below how a similar perspective is also applicable to machine learning algorithms.
M.C. Srivas, CTO, will participate on the Big Data Ecosystem partner panel at Splunk .conf2013.