This post was originally published on the O'Reilly blog and is reposted with permission.
Does it make sense for me to have a car? If so, which kind best fits my needs: gasoline, hybrid, or electric? And should I buy or lease?
To make a good decision, I need to understand key issues about the design, performance, and cost of cars, whether or not I know how to build one myself. The same is true for people deciding whether machine learning is a good choice for their business goals or project. Will the payoff be worth the effort? Which machine learning approach is most likely to produce valuable results in your particular situation? How large a team, and with what expertise, is needed to develop, deploy, and maintain your machine learning system?
Given the complex and previously esoteric nature of machine learning as a field (the sometimes daunting array of learning algorithms and the math needed to understand and apply them), many people feel the topic is best left to a few specialists.
It doesn’t have to be that way. Presented clearly, the key concepts in machine learning are both useful and approachable. Even for those with the expertise to build a learning model themselves, it’s a big advantage to understand the underlying ideas, the best strategies, and the strengths and weaknesses of different designs.
For that reason, we wanted our recent report for O’Reilly to open the door to practical machine learning and make it useful to almost anyone whose work involves large datasets, regardless of technical expertise. We chose recommendation as our focus: it is not only one of the most accessible types of practical machine learning, but it can also reap huge benefits.
Do you know how to build a simple but powerful recommender? It’s easier than you might think both to judge whether recommendation is a good idea for you and to understand how to do it, particularly if you know smart ways to simplify the process. The right kinds of simplification can shorten the time-to-market for building and deploying a recommender while keeping its performance high.
My coauthor Ted Dunning, Chief Applications Architect at MapR Technologies, has built some of the best-performing recommenders ever deployed in real-world settings. Together we explored new ways to introduce and explain innovative, powerful designs for recommendation. We start with toy examples of user behavior (“I want a pony”) and move on to a more serious example: recommending artists on a mock music-listening website.
We use an algorithm from the open source project Apache Mahout to build and train a machine learning model, and then, perhaps surprisingly, use search technology from Apache Solr to greatly simplify deploying the recommender. We describe what you need to know to decide whether this approach will be valuable to you, and how to dive into the technical details if you choose to have your team build one.
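To give a feel for why search technology fits this job, here is a minimal sketch under our own assumptions (it is not code from the report, Mahout, or Solr). We assume the trained model is reduced to a list of "indicator" items stored with each item, much like a text field in a search index. A user's recent history then acts as the query, and items whose indicators best match it are recommended. A tiny in-memory index with a simple IDF-weighted overlap score stands in for the search engine's relevance ranking; the artist names, the `INDICATORS` table, and the `recommend` function are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical indicator field for each artist: other artists whose listeners
# also listened to this one. In a real system these would come from the model.
INDICATORS = {
    "artist_a": ["artist_b", "artist_c"],
    "artist_b": ["artist_a", "artist_d"],
    "artist_c": ["artist_a", "artist_b"],
    "artist_d": ["artist_b"],
}


def idf_weights(index):
    """Weight each indicator term by how rare it is across the indexed items."""
    doc_freq = Counter(term for terms in index.values() for term in set(terms))
    n_docs = len(index)
    return {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}


def recommend(history, index, top_n=3):
    """Rank items by the IDF-weighted overlap between their indicators and the user's history."""
    idf = idf_weights(index)
    query = set(history)
    scores = {}
    for item, terms in index.items():
        if item in query:
            continue  # skip items the user has already interacted with
        score = sum(idf[term] for term in set(terms) if term in query)
        if score > 0:
            scores[item] = score
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
```

For example, `recommend(["artist_a", "artist_d"], INDICATORS)` returns `["artist_b", "artist_c"]`: `artist_b` lists both history items as indicators, while `artist_c` matches only one. The point of the design is that once the model is expressed as indicator fields, serving recommendations is just running a query, which a production search engine already does quickly and at scale.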
At the root of all successful machine learning is the art of choosing the right input data for the job. Here’s where one more trick comes into play: let the public do much of the work for you. By “watching” what people do, for instance by logging their behavior on a website, you can discover patterns that are valuable clues to what to recommend to them. Here, actions generally speak louder (or more effectively) than words, such as those in ratings. In other words, you can understand what people want by watching what they do.
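To make "learning from what people do" concrete, here is a minimal sketch that assumes behavior is logged as (user, item) events. It counts how often pairs of items appear in the same user's history and scores each pair with a log-likelihood ratio (G²) test on a 2×2 contingency table, one common way to separate meaningful co-occurrence from chance. This is our own illustration, not code from the report or from Mahout; the event log and the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalized Shannon entropy: N*log(N) - sum(k*log(k))."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative rounding errors


def cooccurrence_scores(events):
    """Score every pair of items that share at least one user in the (user, item) log."""
    histories = defaultdict(set)
    for user, item in events:
        histories[user].add(item)
    n_users = len(histories)
    item_users = defaultdict(int)  # users who interacted with each item
    pair_users = defaultdict(int)  # users who interacted with both items of a pair
    for items in histories.values():
        for item in items:
            item_users[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_users[(a, b)] += 1
    scores = {}
    for (a, b), both in pair_users.items():
        k11 = both                                            # users with a and b
        k12 = item_users[a] - both                            # a but not b
        k21 = item_users[b] - both                            # b but not a
        k22 = n_users - item_users[a] - item_users[b] + both  # neither
        scores[(a, b)] = llr(k11, k12, k21, k22)
    return scores


# Hypothetical behavior log in the spirit of the "I want a pony" toy example.
EVENTS = [
    ("u1", "pony"), ("u1", "saddle"),
    ("u2", "pony"), ("u2", "saddle"),
    ("u3", "car"), ("u3", "gas"),
    ("u4", "car"), ("u4", "gas"),
    ("u5", "pony"), ("u5", "car"),
    ("u6", "book"), ("u7", "book"), ("u8", "book"),
]
```

On this log, `("pony", "saddle")` scores far higher than `("car", "pony")`, which co-occur about as often as chance alone would predict. Note that the test is two-sided: a pair that co-occurs much *less* often than expected can also score high, so a real system would also check that the observed count exceeds the expected one before using the pair as a recommendation clue.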
But keep in mind that even if I do buy a car, I still want a pony…