Building a Simple Recommender

One of the most accessible ways to use machine learning at big-data scale is to build a recommender with Apache Mahout. Building a recommender is one thing; building one that works really well is quite another. With the right approach, however, building a simple, effective recommender can be easier than you might think.

Mahout committer and MapR Chief Application Architect Ted Dunning has been talking about tips and tricks that make building and deploying a recommender easier. He recently presented aspects of this topic at the Twin Cities HUG in St. Paul, Minn., at the DFW Big Data meet-up in Dallas, and most recently at the San Diego Hadoop User Group.

His advice can make the task much less daunting for the developer and the result a much better experience for the user:
  • Avoid ratings as input data to show preferences
  • Use co-occurrence and the Mahout LLR algorithm to build an offline recommendation engine
  • If possible, use more than one type of input data (multi-modal recommendation) for much stronger results
  • Revolutionize the deployment by using conventional search technology such as Apache Solr

How do you get started? The first key task in building a great recommender is to choose the right type of data. But what data accurately reflects human preferences? The answer may surprise you. People often assume that human-produced ratings are one of the best ways to know preferences, but generally this is not correct. Instead of ratings, watch people's behavior.

Watching what people do is far more powerful than having them record what they say they like.

Ted also explains how to use co-occurrence statistics based on actual behavior to accurately predict preferences. This approach does require some knowledge of matrices and linear algebra, but much less than you might expect. In his presentation he shows the basic pattern of analysis that lets you build a co-occurrence matrix based on behavior. By setting many of its elements to zero, you end up with the indicator matrix needed for making recommendations.
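To make the pattern concrete, here is a minimal sketch in plain Python rather than Mahout itself. It counts how often pairs of items appear in the same user's history, scores each pair with the standard log-likelihood ratio test for a 2x2 contingency table, and keeps only the pairs that score above a cutoff. The function names, the toy data layout, and the threshold are assumptions made for illustration; they are not taken from Ted's talk or from the Mahout API.

```python
from collections import defaultdict
from itertools import combinations
from math import log


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table.

    k11: users with both items    k12: users with A but not B
    k21: users with B but not A   k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative rounding


def indicators(histories, threshold):
    """Map each item to the items whose co-occurrence with it scores high.

    histories: dict of user id -> set of item ids (hypothetical layout).
    threshold: LLR cutoff; pairs below it are the "elements set to zero".
    """
    n_users = len(histories)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in histories.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    result = defaultdict(set)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n_users - k11 - k12 - k21
        if llr(k11, k12, k21, k22) >= threshold:
            result[a].add(b)
            result[b].add(a)
    return dict(result)
```

In this sketch an item that everyone interacts with co-occurs with everything, yet it scores close to zero and drops out, which is the point of using a statistical test instead of raw co-occurrence counts. Only pairs that co-occur at least once are scored, so this toy version does not try to handle pairs that appear together less often than expected.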

Keep in mind that the initial phase of the project, building the Mahout recommender offline, can be carried out on a Hadoop cluster.

Building the recommender on a cluster: Input data is a complete history of user behavior related to specific items. Co-occurrence analysis sets up the basis for making new recommendations based on the past behavior of the same or other users.

The most surprising aspect of the novel approach Ted describes is that you can use existing text retrieval software in new ways in order to deploy a recommender. For example, Apache Solr can be used to easily deploy recommendations derived with the Mahout recommender.

Conventional search technology provides a shortcut to deploying the Mahout recommender.

This use of search technology capitalizes on the high reliability and performance of text retrieval software but searches for preferences instead of text, which enormously simplifies deployment of a recommendation engine.
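The idea of searching for preferences instead of text can be illustrated with a small in-memory stand-in for a search index. In this sketch each item is a "document" whose indicator field lists related item IDs, and the "query" is the user's recent history. The `recommend` function, the data layout, and the overlap-count scoring are all assumptions made for illustration; a real engine such as Apache Solr would apply its own relevance ranking rather than a raw count.

```python
def recommend(item_docs, user_history, top_n=3):
    """Rank items by how many of their indicators appear in the history.

    item_docs: dict of item id -> set of indicator item ids (hypothetical
               layout; in a search engine this would be a field on the
               item's document).
    user_history: iterable of item ids the user recently interacted with.
    """
    history = set(user_history)
    scored = []
    for item, indicator_ids in item_docs.items():
        if item in history:
            continue  # do not recommend what the user already has
        score = len(indicator_ids & history)  # simplified relevance score
        if score > 0:
            scored.append((score, item))
    # Highest score first; ties broken by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:top_n]]


if __name__ == "__main__":
    docs = {"a": {"b"}, "b": {"a"}, "c": {"a", "b"}, "d": set()}
    print(recommend(docs, ["a", "b"]))
```

The design point is that the expensive analysis happens offline, while serving a recommendation is an ordinary query against fields the search engine already knows how to index.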

The new approach to recommendation with Mahout was presented at the Twin Cities HUG on the same day that version 0.8 of Apache Mahout was released. The talk was well received by HUG participants.



If you'd like to know more about Apache Mahout recommendation:

Watch a video of Ted Dunning's Twin Cities HUG presentation: http://www.youtube.com/watch?feature=player_embedded&v=bXtX6lPoBME
See Ted's slides: http://slidesha.re/1enPV4c
Read the Mahout in Action book published by Manning. Ted Dunning and Ellen Friedman are two of the co-authors.

Join the Apache Mahout meet-up in the Bay Area http://www.meetup.com/Mahout/
