Machine Learning Blog Posts

Posted on February 10, 2017 by Ronald van Loon

Businesses today need to do more than merely acknowledge big data. They need to embrace data and analytics and make them an integral part of their company. Of course, this will require building a quality team of data scientists to handle the data and analytics for the company.

Posted on December 8, 2016 by Carol McDonald

This post is the second in a series where we will go over examples of how MapR data scientist Joe Blue assisted MapR customers, in this case a regional bank, to identify new data sources and apply machine learning algorithms in order to better understand their customers. In this second part, we will cover a bank customer profitability 360° example, presenting the before, during and after.

Posted on December 5, 2016 by Ronald van Loon

Business owners and executives today know the power of social media, mobile technology, cloud computing, and analytics. If you pay attention, however, you will notice that truly mature and successful digital businesses do not jump at every new technological tool or platform.

Posted on June 7, 2016 by Carol McDonald

Standards and incentives for digitizing and sharing healthcare data, along with improvements and decreasing costs in storage and parallel processing on commodity hardware, are causing a big data revolution in health care, with the goal of better care at lower cost.

Posted on May 31, 2016 by Jim Scott

Just a few years ago, using a fingerprint to sign on to your phone seemed futuristic. Today, it’s everywhere and just the beginning of how biometrics will be woven into our lives.

Posted on April 8, 2016 by Ankur Desai

What is predictive maintenance?
If we can predict a part failure well in advance, we can schedule maintenance or repair work for the part at our convenience, while continuing to operate the equipment and avoiding unexpected downtime. This will help reduce large repair expenses, as the part will be repaired or replaced well before it fails.

Posted on April 7, 2016 by William Cairns

Having participated in a number of fantasy sports leagues and being a Data Scientist at MapR gives me a unique perspective on choosing who I think will most likely “win.” The six players, ranked in order, who I predict will most likely finish in 10th place or better this year (and hopefully 1st), based on my statistical modeling, are:

Posted on March 14, 2016 by Joseph Blue

There are 150 quintillion (i.e. the one after quadrillion) permutations to consider when completing your NCAA bracket. Some of us don’t have time to review them all; if you are likewise short on time, you can let MapR do the heavy lifting for you and get your personalized bracket from the Crystal B-Ball!
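
As a rough check on that figure, here is a minimal sketch of the arithmetic. It assumes the count comes from 67 games with two possible winners each (63 main-bracket games plus the four play-in games); the teaser does not say how the number was derived, so the game count and the function name `bracket_count` are illustrative assumptions.

```python
def bracket_count(games: int) -> int:
    """Number of distinct ways to pick a winner for every game."""
    # Each game has two possible outcomes, and the picks are independent.
    return 2 ** games


# Assumption: 67 games (63 in the main bracket plus 4 play-in games).
total = bracket_count(67)
print(f"{total:,} possible brackets")
print(f"about {total / 1e18:.0f} quintillion")
```

Under that assumption the total is roughly 148 quintillion, which is in line with the rounded "150 quintillion" quoted above.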

Posted on March 2, 2016 by Kirk Borne

Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently in order to find the key descriptive, predictive and explanatory features of the collection is a fundamental required capability for coping with the Big Data avalanche.

Posted on January 22, 2016 by Kirk Borne

As a lifelong computational scientist (and now data scientist) I have always been fascinated with numbers, especially lists and tables of things (= databases!).

Posted on January 21, 2016 by Jim Scott

Getting from point A to point B has been one of humanity’s greatest preoccupations throughout history. While we’ve developed new methods of transportation such as railroads, cars, trucks, and airplanes, they never seem to be fast enough.

Posted on January 19, 2016 by Jim Scott

Companies everywhere are looking for ways to improve customer service. For example, companies with call-in support centers might track how long agents take to answer calls, or how long customers stay on hold.

Posted on January 18, 2016 by Kirk Borne

The Internet of Things (IoT), with its ubiquitous sensors and streams of big data for big insights, has an estimated market valuation of 17 trillion U.S. dollars. Apparently, the "sensoring" of the world is a seriously big deal, generating insights into people, processes, and products on a scale that is almost incomprehensible. Certainly 17 trillion dollars is almost incomprehensible.

Posted on January 6, 2016 by Kirk Borne

Someone once said, “If you can’t measure something, you can’t understand it.” Another version of this belief says: “If you can’t measure it, it doesn’t exist.” This is a false way of thinking, a fallacy; in fact, it is sometimes called the McNamara fallacy.

Posted on December 1, 2015 by Joseph Blue

The meteoric growth of available data has precipitated the need for data scientists to leverage that surplus of information. This spotlight has caused many industrious people to wonder, “Can I be a data scientist, and what are the skills I would need?”

Posted on June 12, 2015 by Kirk Borne

I was recently asked five questions by Alex Woodie of Datanami for the article, “So You Want To Be A Data Scientist” that he was preparing. He used a few snippets from my full set of answers. The longer version of my answers provided additional advice. For aspiring data scientists of all ages, I provide here the full, unabridged version of my answers, which may help you even more to achieve your goal.

Posted on April 3, 2015 by Ellen Friedman

Curious to know how American Express uses machine learning successfully, in production, at very large scale?

Posted on March 13, 2015 by Nitesh Kumar

Fraud represents the biggest loss for banks, accounting for upward of $1.744 billion in losses annually. The banking industry spends millions each year on technologies aimed at reducing fraud and retaining customers, but that spending does little to protect banks. Let’s focus on why the current fraud detection approaches don’t work as well as they should and how machine learning on big data can help.

Posted on December 31, 2014 by Karen Whipple

As we close out the year, here is a look back at our 10 most popular blogs of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.

Posted on August 12, 2014 by Pat Farrel

There are big changes happening in Apache Mahout. For years it’s been the go-to machine learning library for Hadoop. It contained most of the best-in-class algorithms for scalable machine learning, which means clustering, classification, and recommendation. But it was written for Hadoop and MapReduce. Today a number of new parallel execution engines (Spark, H2O, Flink) show great promise in speeding calculations by as much as 10-100x. That means instead of buying 10 computers for a cluster, a single one may do. That should get your manager’s attention.

Posted on July 25, 2014 by Karen Whipple

The recent Skytree and MapR webinar “Predictive Analytics with Machine Learning and Hadoop” proved to be highly interactive and engaging. As promised, Nitin and Jin have provided answers to questions that we were not able to get to during the webinar:

Posted on June 12, 2014 by Kirk Borne

We previously discussed the “Top 8 Reasons that Characterization is Right for Your Data.” Here we move the discussion of characterization from the theoretical to the practical, by providing four simple examples of characterizations of data. In each of these cases, the set of characterizations that are generated can then be fed into different types of analytics algorithms for discovery from your data: predictive patterns, clusters (segments), associations, correlations, trends, and anomalies (outliers, surprises).

Posted on May 8, 2014 by Kirk Borne

This question invariably comes up during big data discussions: ‘What is big data good for?’ Those who are close to the subject can quickly identify numerous examples of how big data can be used for the greater good, including some that are listed here: “Big Data and Hadoop for Competitive Advantage – 5 Sources of Insights and Opportunities.”

Posted on April 28, 2014 by Kirk Borne

Many machine learning algorithms that are used for data mining and data science work with numeric data. And many algorithms tend to be very mathematical (such as Support Vector Machines, which we previously discussed). But association rule mining is perfect for categorical (non-numeric) data, and it involves little more than simple counting!
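
To illustrate the "simple counting" idea, here is a minimal sketch that derives pairwise association rules from categorical data. The basket contents, the function name `pair_rules`, and the `min_support` threshold are all hypothetical choices for illustration; this is not the approach from the post itself, just one plain way to compute support and confidence by counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (lhs, rhs, support, confidence) for every frequent item pair."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        # Count each item and each unordered pair of items in the basket.
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            rules.append((lhs, rhs, support, confidence))
    return rules


for lhs, rhs, support, confidence in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Everything here is a tally followed by a division, which is why the technique suits categorical data. A production implementation would typically prune candidates (as Apriori does) rather than enumerate every pair.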

Posted on March 27, 2014 by Kirk Borne

In a previous post, I presented a “Big Data A to Z Glossary of my Favorite Data Science Things”. I would like to focus here on one of those favorite things, which also happens to be the entry associated with my favorite letter.

Posted on March 13, 2014 by Kirk Borne

Real estate experts like to say that the three most important features of a property are: location, location, location! Likewise, weather events are highly location-dependent. We will see below how a similar perspective is also applicable to machine learning algorithms.

Posted on October 1, 2013 by Karen Whipple

MapR will be at a wide range of events this month where we will participate on panels, deliver keynotes and present tutorials.

M.C. Srivas, CTO, will participate on the Big Data Ecosystem partner panel at Splunk .conf2013.

Posted on June 10, 2013 by Ellen Friedman

When people sit down to build a real-time big data reporting system, it is very common that compromises creep into the design. These compromises result in a “quick and dirty” analysis – the thought being that in order to get rapid results, you must give up accuracy or consistency or even any notion of what failure modes might exist. But Ted Dunning says that to get “quick” you don’t have to settle for “dirty.”

Posted on June 7, 2013 by Ellen Friedman

It was really exciting to be in Berlin for the open source Berlin Buzzwords 2013 conference this week. The audiences were energized, and search was one of the really hot topics, drawing a lot of interest and enthusiasm.

Posted on August 2, 2011 by Jack Norris

Ted is just back from OSCON where he co-presented Hands On Mahout - Mammoth Scale Machine Learning. The slides are available for this tutorial that covered the evolution of Mahout for clustering and classification of large datasets. If you’re interested in hands-on instruction for machine learning and how specific algorithms can be used to solve real-world problems – this is the session.
