Would you like to analyze data seamlessly, from up-to-the-minute real-time reporting to long-term aggregation, without having to reprocess temporary real-time estimates? Would you like to do that accurately and with a simple architecture? And would you like your CEO to stop finding nasty discrepancies in your metrics?
Last week at Berlin Buzzwords 2013, MapR’s Ted Dunning showed how to do this for both metrics and many forms of machine learning in his fourth #bbuzz talk, “Real-time Learning for Fun and Profit,” presented to a packed room.
Several key MapR features make the approach Ted described remarkably simple to implement: NFS access to MapR distributed storage, and reliable, small-footprint MapR snapshots. Under the covers, the approach combines replay logs, aggregation checkpoints, and snapshots to implement a real-time system with an analysis horizon that runs from now to years in the future.
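To make the checkpoint-plus-replay idea concrete, here is a minimal Python sketch; it is not Ted’s or MapR’s actual code. It assumes an in-memory list stands in for the append-only replay log (on a real cluster this would be a file on MapR storage, for example written over NFS), and `Checkpoint`, `take_checkpoint`, and `current_counts` are hypothetical names. A checkpoint records aggregate counts together with the log offset it covers. The exact current answer is the checkpoint plus a replay of only the log entries written after that offset, so there is never a temporary estimate to reconcile later.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical aggregate covering log entries [0, offset)."""
    offset: int = 0
    counts: Counter = field(default_factory=Counter)


def take_checkpoint(log, prev):
    """Fold the log entries written after prev.offset into a new checkpoint."""
    counts = Counter(prev.counts)      # copy so the old checkpoint stays intact
    counts.update(log[prev.offset:])   # count each key in the log tail
    return Checkpoint(offset=len(log), counts=counts)


def current_counts(log, checkpoint):
    """Exact up-to-the-instant counts: checkpoint plus replay of the log tail."""
    counts = Counter(checkpoint.counts)
    counts.update(log[checkpoint.offset:])
    return counts


# Usage: the log is a list of page keys, one entry per page view.
log = ["home", "about", "home"]
cp = take_checkpoint(log, Checkpoint())
log.append("home")                     # a view arriving after the checkpoint
live = current_counts(log, cp)         # includes the new view exactly
```

Because each new checkpoint is built by folding the log tail into the previous one, it always agrees with what a live query would have returned at that offset. In the setup described above, snapshots would presumably be used to freeze a log and its checkpoint together so the two stay consistent; the sketch leaves that part out.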
If you need to collect and react to data as it arrives but also need to store data over a long time frame, this approach may help. Traditionally, it has been difficult to do this accurately while keeping reporting current to the instant. Ted’s approach, applied on a MapR system, is exact, correct, and consistent as analysis moves smoothly from real time to long term.
The example Ted focused on at Buzzwords was maintaining simple counts, such as the number of page views for a particular site, but the same approach applies to any problem involving associative aggregations. These include unique counts (e.g., how many unique visitors a web site has), finding heavy hitters or trending topics (the items getting the highest number of hits), and even the co-occurrence counting required by recommendation engines.
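One approach covers all of these because each aggregate can be merged associatively: summaries for small time buckets combine into summaries for larger ones, and the grouping does not change the result. The Python sketch below is illustrative only. `Summary`, `summarize`, and `rollup` are hypothetical names, and it assumes events arrive as (visitor id, pages viewed in one session) pairs. It keeps exact counts and an exact visitor set; at scale one would likely swap in mergeable sketches such as HyperLogLog for unique counts, which also merge associatively.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce
from itertools import combinations


@dataclass
class Summary:
    """Hypothetical per-interval aggregate; every field merges associatively."""
    page_views: Counter = field(default_factory=Counter)  # page -> view count
    visitors: frozenset = frozenset()                       # exact unique visitor ids
    cooccur: Counter = field(default_factory=Counter)     # (page_a, page_b) -> sessions with both

    def merge(self, other):
        """Combine two summaries; (a.merge(b)).merge(c) equals a.merge(b.merge(c))."""
        return Summary(
            page_views=self.page_views + other.page_views,
            visitors=self.visitors | other.visitors,
            cooccur=self.cooccur + other.cooccur,
        )


def summarize(sessions):
    """Summarize (visitor_id, [pages viewed]) sessions from one time bucket."""
    views, visitors, cooccur = Counter(), set(), Counter()
    for visitor, pages in sessions:
        views.update(pages)
        visitors.add(visitor)
        # Count each unordered pair of distinct pages once per session.
        for pair in combinations(sorted(set(pages)), 2):
            cooccur[pair] += 1
    return Summary(views, frozenset(visitors), cooccur)


def rollup(summaries):
    """Fold many small summaries (e.g. minutes) into one larger one (e.g. a day)."""
    return reduce(Summary.merge, summaries, Summary())


# Usage: two minute-sized buckets rolled up into a larger interval.
minute1 = summarize([("u1", ["home", "pricing"]), ("u2", ["home"])])
minute2 = summarize([("u1", ["home", "docs"]), ("u3", ["pricing", "home"])])
day = rollup([minute1, minute2])
top_pages = day.page_views.most_common(2)  # heavy hitters from merged counts
```

Rolling minute buckets into a day gives the same summary as computing the day directly, and heavy hitters fall out of the merged counts. Co-occurrence is counted within a single session here, which is what keeps it mergeable: assuming each session falls entirely within one bucket, summing per-bucket pair counts loses nothing.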
If you would like to know how this approach could improve your business metrics and machine learning efforts, contact Ted at MapR (email@example.com) or view his slides.