Ted Dunning (Chief Application Architect at MapR) and Ellen Friedman have written a new O’Reilly Media book, “Practical Machine Learning: Innovations in Recommendation” (released in January 2014). This book examines one of the most interesting, fun, and powerful data science applications in the big data universe: recommendation systems. For me, this was one of the applications of data mining that immediately captured my imagination after I embarked on the journey to data science (drifting away from my astrophysics roots) about a dozen years ago. It is also one of the most common use cases taught in data science MOOCs and other analytics training courses.
Why Recommendation Systems?
I believe that the love affair with recommender systems can be partly attributed to two things. First, nearly all of us have experienced, benefited from, and greatly appreciated Amazon’s remarkable insight in building the first enterprise-scale recommender system, as well as the wild success that Amazon (along with Netflix and many others) has had with it. Second, recommender systems provide a clear and demonstrable proof of the value of big data and data science (as if we need any more proofs), and I use examples of recommender science in nearly all of my public presentations. Dunning and Friedman’s book begins with a simple toy example: everyone wants a pony! (We will come back to this later.) It is important to note that the book uses this toy example as a playful illustration in order to demonstrate the exact opposite: why knowing that everybody wants a pony is not the best way to make interesting and effective recommendations.
A recommender system is relatively easy to understand, straightforward to justify (to upper management), and intuitive to design. It provides valuable insights and actionable intelligence on your customers, and it has obvious metrics of success; the ROI (Return On Investment, or Return On Innovation) of such big data science initiatives is convenient to measure and track. Usually, the required metrics are already part of your website and customer analytics packages, right out of the box. In other words, the answers to these questions are readily available: What pages did my customers visit? What products were shown to them? What did they click on? What did they add to their shopping cart? Which shopping carts were abandoned? What was in those abandoned carts? What did they finally purchase? Designing and performing experiments on different implementations of your recommender engine (to find the most predictive and profitable models) is “powerful Jedi” data science.
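To make the idea of "obvious metrics" concrete, here is a minimal sketch of how two variants of a recommender engine might be compared using the kinds of events listed above. The event log, the variant labels "A" and "B", and the event names are all hypothetical; a real analytics package will have its own schema.

```python
from collections import defaultdict

# Hypothetical event log: (recommender variant, session id, event name).
EVENTS = [
    ("A", 1, "shown"), ("A", 1, "click"), ("A", 1, "add_to_cart"), ("A", 1, "purchase"),
    ("A", 2, "shown"),
    ("A", 3, "shown"), ("A", 3, "click"), ("A", 3, "add_to_cart"),
    ("B", 4, "shown"), ("B", 4, "click"),
    ("B", 5, "shown"),
    ("B", 6, "shown"), ("B", 6, "click"), ("B", 6, "add_to_cart"), ("B", 6, "purchase"),
    ("B", 7, "shown"), ("B", 7, "click"), ("B", 7, "add_to_cart"), ("B", 7, "purchase"),
]


def summarize(events):
    """Compute click-through and cart-abandonment rates per variant."""
    # variant -> event name -> set of session ids in which that event occurred
    sessions = defaultdict(lambda: defaultdict(set))
    for variant, session_id, event in events:
        sessions[variant][event].add(session_id)

    report = {}
    for variant, by_event in sessions.items():
        shown = len(by_event["shown"])
        carts = len(by_event["add_to_cart"])
        # A cart counts as abandoned if the session never reached a purchase.
        abandoned = len(by_event["add_to_cart"] - by_event["purchase"])
        report[variant] = {
            "click_through_rate": len(by_event["click"]) / shown if shown else 0.0,
            "cart_abandonment_rate": abandoned / carts if carts else 0.0,
        }
    return report


for variant, metrics in sorted(summarize(EVENTS).items()):
    print(variant, metrics)
```

With real traffic you would also want a significance test before declaring a winner; this sketch only shows how the raw rates fall out of the logs.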
Design Patterns for Predictive Analytics
As the questions above suggest, recommender systems are essentially predictive analytics engines. Starting with (a) historical training data (e.g., the web usage and/or purchase patterns of a particular customer relative to previous customers) and (b) a variety of similarity calculations (measuring the overlap in those purchase patterns), the engine can then predict what each particular customer is most likely to do next: advance to a specific page on your website, or download a specific item from your inventory, or purchase a particular product in your catalog. Consequently, we see the two main categories of elements in the design pattern for any predictive analytics activity (and for recommender systems in particular): historical logs (training data) and a supervised machine learning algorithm. We will briefly describe here a few examples of design patterns for recommender systems. Then, in our next article, we will examine, explore, and expand upon some special aspects of these in more depth.
So, what is a design pattern? Here is a good definition: “a general repeatable solution to a commonly occurring problem; a description or template for how to solve a problem that can be used in many different situations.” In essence, a design pattern is a proven development paradigm applied to a particular class of problems. For us, the specific class of problem is designing a recommender engine. The design patterns that we list here can be used for movies, books, restaurants, news articles, music, and more—the patterns are content-agnostic. We identify four different design patterns that are useful in recommender engines for predicting customer behavior in the customer experience environment (e.g., online store, browser, smartphone app, or whatever): co-occurrence matrices, vector space models, Markov models, and “everyone gets a pony (the most popular item).”
- The co-occurrence matrix, described in Dunning and Friedman’s book, records for every pair of products A and B how often the two were purchased together by prior customers. Analysis of the non-zero elements in this matrix identifies which co-occurrences are anomalous, that is, more frequent than you’d expect if the items occurred independently. These anomalous co-occurrences become indicators for potential offers of product B to customers who buy product A. This approach is closely related to association rule mining, the family of algorithms commonly used for market basket analysis.
- Vector space models are useful for both customer modeling and product modeling. This begins with building a feature vector, consisting of either a set of features that describe a customer (e.g., products of interest, features of interest, manufacturers of interest, purchase frequency, price range, etc.) or a set of features that describe a product (e.g., content, author/creator, theme/genre, etc.). Cosine similarity calculations are then made against these feature vectors to identify similar customers (X,Y) and similar products (A,B). In the first case, products are offered to customer X based upon the purchase history of similar customer Y. In the second case, the customer is offered product A based upon its similarity to product B that the customer has previously purchased or has recently looked at (but not purchased).
- Markov models are a form of probabilistic model that can be used to predict elements of a sequence, usually a temporal sequence (e.g., the weather, the stock market, network traffic, web clicks within a site, or purchase patterns). Markov models have a restricted form, mathematically speaking (in the simplest case, the next element depends only on the current one, not on the full history), and this restriction can sometimes make it possible to learn a Markov model from past data (training data). This model can be used to predict probabilities of future events, such as the next most likely thing that the customer will do or buy.
- Most Popular (“Top 40”) Items, or “Everyone Gets a Pony”: this model is the world’s simplest. After you find the items that nearly everyone likes and has purchased, you offer those items to every new customer, since they too may buy them. This “Top 40” model is not very interesting and does not require a complex learning model, but the product may be a guaranteed seller. Such a simplistic model may be most useful in online stores that have a specific branding (e.g., electronics, or books, or movie rentals) and that also have other popular items that customers may not be aware of. Consequently, you want to make customers aware of those popular off-brand items (e.g., gift cards, reading lamps, or popcorn) while they are still shopping in your store.
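The co-occurrence pattern in the list above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the baskets are made-up data, and "anomalous" is scored here with lift (observed co-purchases divided by the number expected if the two items were independent), which is a simple stand-in for whatever statistical test a production system would use. The function names and thresholds are assumptions chosen for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of products per prior customer.
BASKETS = [
    {"pony", "saddle", "hay"},
    {"pony", "saddle"},
    {"pony", "hay"},
    {"pony", "novel"},
    {"saddle", "hay"},
    {"pony", "novel", "lamp"},
]


def count_items_and_pairs(baskets):
    """Count baskets containing each item and each unordered item pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(basket), 2)
        )
    return item_counts, pair_counts


def lift(a, b, item_counts, pair_counts, n_baskets):
    """Observed co-occurrences divided by the count expected under independence."""
    expected = item_counts[a] * item_counts[b] / n_baskets
    return pair_counts[frozenset((a, b))] / expected


def indicators_for(item, baskets, min_count=2, min_lift=1.0):
    """Items that co-occur with `item` more often than independence predicts."""
    item_counts, pair_counts = count_items_and_pairs(baskets)
    scored = []
    for pair, count in pair_counts.items():
        if item in pair and count >= min_count:
            (other,) = pair - {item}
            score = lift(item, other, item_counts, pair_counts, len(baskets))
            if score > min_lift:
                scored.append((score, other))
    return [other for _, other in sorted(scored, reverse=True)]


print(indicators_for("saddle", BASKETS))
print(indicators_for("pony", BASKETS))
```

In this toy data the pony shows up in five of the six baskets, so its co-purchase with a saddle is no more frequent than chance would predict (lift 0.8) and it is not returned as an indicator for saddle buyers. That is the "everyone wants a pony" point in miniature: raw popularity is not the same as an interesting co-occurrence.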
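For the vector space pattern in the list above, the core calculation is cosine similarity between feature vectors. The sketch below assumes sparse vectors stored as plain dictionaries; the product names and feature labels are hypothetical, and all weights are set to 1.0 purely for simplicity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse feature vectors (dicts)."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical product feature vectors (feature name -> weight).
PRODUCTS = {
    "space_opera_novel": {"genre:scifi": 1.0, "format:book": 1.0, "author:a": 1.0},
    "space_opera_sequel": {
        "genre:scifi": 1.0, "format:book": 1.0, "author:a": 1.0, "series:yes": 1.0,
    },
    "cookbook": {"genre:cooking": 1.0, "format:book": 1.0},
}


def most_similar(name, catalog):
    """Return the other catalog entry whose vector is closest to `name`."""
    target = catalog[name]
    scored = [
        (cosine_similarity(target, vector), other)
        for other, vector in catalog.items()
        if other != name
    ]
    return max(scored)[1]


print(most_similar("space_opera_novel", PRODUCTS))
```

The same function works unchanged for customer vectors: swap the product dictionary for one keyed by customer, and the nearest neighbor becomes "similar customer Y" whose purchase history can drive offers to customer X.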
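The Markov pattern in the list above can be illustrated with a first-order model, where the next step depends only on the current one. This is a minimal sketch under that assumption: the click sessions and page names are invented, and transition probabilities are estimated by simple counting with no smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical click sessions, each an ordered list of page types.
SESSIONS = [
    ["home", "search", "product", "cart", "checkout"],
    ["home", "product", "cart"],
    ["home", "search", "product", "home"],
    ["search", "product", "cart", "checkout"],
]


def fit_transitions(sessions):
    """Estimate P(next | current) from consecutive pairs in each session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(ctr.values()) for nxt, n in ctr.items()}
        for state, ctr in counts.items()
    }


def predict_next(model, state):
    """Most probable next state, or None if `state` was never seen with a successor."""
    options = model.get(state)
    if not options:
        return None
    return max(options, key=options.get)


model = fit_transitions(SESSIONS)
print(model["product"])
print(predict_next(model, "product"))
```

Because the model only needs counts of consecutive pairs, it can be learned from ordinary clickstream or purchase logs, which is the practical benefit of the restricted form mentioned above.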
Diversity in Recommender Systems
Finally, we note that the Holy Grail in recommender systems is diversity—that’s where the real gold at the end of the rainbow will be found! In other words, there is a higher chance that a recommendation will be accepted by the customer if that offer is interesting, surprising, and unexpected, compared to a routine “predictable” recommendation. For example, if I buy a washing machine, then the worst possible recommendation would be for another washing machine from a different manufacturer, while a better recommendation might be for a companion clothes dryer from the original washing machine’s manufacturer. An even more interesting recommendation might be for a year’s supply of washing powder at 40% off the regular price. In the case of books or movies, what might be really interesting are recommendations for products that are similar to your original purchase except for being very different in one or two features (e.g., not a book by the same author on the same subject, but an illustrated companion to the other). So, even if you want to offer everyone a pony, then at least offer them a little horse of a different color.
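One very simple way to express the "similar except for one or two features" idea is to count how many features differ between the purchased item and each candidate, and keep only candidates inside a small window. The catalog, feature names, and the 1-to-2 window below are all assumptions made for illustration; real systems typically use richer diversity measures.

```python
# Hypothetical catalog: item name -> descriptive features.
CATALOG = {
    "novel": {"author": "Author A", "subject": "dragons", "format": "novel"},
    "novel_sequel": {"author": "Author A", "subject": "dragons", "format": "novel"},
    "illustrated_companion": {
        "author": "Author A", "subject": "dragons", "format": "illustrated guide",
    },
    "cookbook": {"author": "Author B", "subject": "baking", "format": "cookbook"},
}


def feature_differences(a, b):
    """Names of the features on which two items disagree."""
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}


def diverse_candidates(purchased, catalog, min_diff=1, max_diff=2):
    """Items that differ from the purchase in a few features, but not all."""
    base = catalog[purchased]
    picks = []
    for name, features in catalog.items():
        if name == purchased:
            continue
        n_diff = len(feature_differences(base, features))
        if min_diff <= n_diff <= max_diff:
            picks.append(name)
    return picks


print(diverse_candidates("novel", CATALOG))
```

Here the sequel is skipped for being too predictable (no differing features) and the cookbook for being unrelated (every feature differs), which leaves the illustrated companion: a little horse of a different color.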