Association Rule Mining – Not Your Typical Data Science Algorithm

Many machine learning algorithms that are used for data mining and data science work with numeric data.  And many algorithms tend to be very mathematical (such as Support Vector Machines, which we previously discussed).  But, association rule mining is perfect for categorical (non-numeric) data and it involves little more than simple counting! That’s the kind of algorithm that MapReduce is really good at, and it can also lead to some really interesting discoveries.

Association rule mining is primarily focused on finding frequent co-occurring associations among a collection of items.  It is sometimes referred to as “Market Basket Analysis”, since that was the original application area of association mining. The goal is to find associations of items that occur together more often than you would expect from a random sampling of all possibilities.  The classic example of this is the famous Beer and Diapers association that is often mentioned in data mining books. The story goes like this: men who go to the store to buy diapers will also tend to buy beer at the same time.  Let us illustrate this with a simple example.  Suppose that a store’s retail transactions database includes the following information:

  • There are 600,000 transactions in total.
  • 7,500 transactions contain diapers (1.25 percent)
  • 60,000 transactions contain beer (10 percent)
  • 6,000 transactions contain both diapers and beer (1.0 percent)

If there was no association between beer and diapers (i.e., they are statistically independent), then we expect only 10% of diaper purchasers to also buy beer (since 10% of all customers buy beer).  However, we discover that 80% (=6000/7500) of diaper purchasers also buy beer.  This is a factor of 8 increase over what was expected – that is called Lift, which is the ratio of the observed frequency of co-occurrence to the expected frequency.  This was determined simply by counting the transactions in the database.  So, in this case, the association rule would state that diaper purchasers will also buy beer with a Lift factor of 8.   In statistics, Lift is simply estimated by the ratio of the joint probability of two items x and y, divided by the product of their individual probabilities:  Lift = P(x,y)/[P(x)P(y)].  If the two items are statistically independent, then P(x,y)=P(x)P(y), corresponding to Lift = 1 in that case.  Note that anti-correlation yields Lift values less than 1, which is also an interesting discovery – corresponding to mutually exclusive items that rarely co-occur together.

The above simple example was made up, and it is very rare in real world cases to have Lift factors as high as 8.  But, there was a case where it did happen.  That case was discovered by Walmart in 2004 when a series of hurricanes crossed the state of Florida. After the first hurricane, there were several more hurricanes seen in the Atlantic Ocean heading toward Florida, and so Walmart mined their massive retail transaction database to see what their customers really wanted to buy prior to the arrival of a hurricane.  They found one particular item that increased in sales by a factor of 7 over normal shopping days.  That was a huge Lift factor for a real-world case.  That one item was not bottled water, or batteries, or beer, or flashlights, or generators, or any of the usual things that we might imagine. The item was strawberry pop tarts!  One could imagine lots of reasons why this was the most desired product prior to the arrival of a hurricane – pop tarts do not require refrigeration, they do not need to be cooked, they come in individually wrapped portions, they have a long shelf life, they are a snack food, they are a breakfast food, kids love them, and we love them.  Despite these “obvious” reasons, it was a still a huge surprise!  And so Walmart stocked their stores with tons of strawberry pop tarts prior to the next hurricanes, and they sold them out.  That is a win-win: Walmart wins by making the sell, and customers win by getting the product that they most want.

Another example of association mining was provided to me by a colleague of mine at George Mason University. He is a professor of geoinformation systems and earth science. He used this algorithm to examine the characteristics of hurricanes (internal wind speed, atmospheric pressure in the eye of the hurricane, wind shear, rainfall amounts, direction and propagation speed of the hurricane, etc.​), and he found a strong association between the final strength (category) of the hurricane and the values of those different characteristics.  He was able to predict hurricane intensification and its ultimate strength more accurately with association mining than the standard hurricane model used by the national hurricane center.  That was an amazing application of an algorithm that was initially developed for retail store transaction mining.

There was an equally impressive scientific application that I encountered several years ago when I was working at NASA.  Every summer we would have student interns working with us. These students were usually college undergraduates, and they were always very bright.  In one of those summers, we had a student who was not yet a senior in high school.  He heard me give a lunch talk on data mining, which I presented to the full cohort of summer interns that year.  He was working on a project with a NASA space physicist to try to predict when solar energetic particles would reach the earth after the occurrence of a major solar storm on the Sun.  He decided to try association mining.  What he did was very clever.  Similar to the hurricane example mentioned above, he collected characteristics of the solar storms on the Sun and geomagnetic events around the earth (as measured by NASA’s satellites) to look for predictive patterns. But the special thing that he did was to look at time-shifted data values.  For example, he compared events on the Sun with geo events with time lags of 1 hour, 2 hours, 3 hours, 4 hours, etc. in order to see when the peak correlation (association!) occurred.  He found it – the strongest geomagnetic effects were measured around the earth at approximately 2-3 hours after the solar storm.  His NASA mentor called me into his office to show me the amazing discovery by this high school junior, using the simple techniques that I taught him in my lunchtime seminar.  We were all quite impressed!

The above examples illustrate two very useful approaches when mining your own big data collections: (1) search for rare and unusual co-occurring associations of non-numeric items (which then makes for powerful predictive analytics); and (2) if you have time-based data, consider the effects of introducing a time lag in your data mining experiments to see if the strength of the correlation reaches it peak some time later.  My final example of association rule mining was discovered many years ago by a major electronics store that sold video cameras and video (VHS) players/recorders.  They mined their retail customer database and found that customers who buy a VHS player/recorder tend to come back to the store about 3-4 months later to buy a video camera (camcorder). The store then used this information to send discount coupons for camcorders to all of its customers who bought VHS player/recorders a few months earlier, in order to bring those customers back into the store to purchase a camcorder. Apparently, this customer engagement program worked!  And its success was due to association rule mining. With the massive quantities of big data that are now available, and with powerful technologies to perform analytics on those data, one can only imagine what surprising and useful associations are waiting to be discovered that can boost your bottom line. Start counting!


Streaming Data Architecture:

New Designs Using Apache Kafka and MapR Streams




Download for free