It’s no surprise to hear that data is growing quickly. An IDC study earlier this year confirmed that data is growing faster than Moore’s Law. This means that however you’re processing data today, tomorrow you’re going to be doing it with many more servers. Clusters will continue to expand within your environment.
Put another way, the rate of data growth has shifted the bottleneck: the network, not the disk, is now the limiting factor. With this much data to analyze, dragging it across the network is unwieldy. It’s much more efficient to bring the computation to where the data lives and send only the results over the network.
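To make the data-locality argument concrete, here is a minimal single-process sketch of the idea: each hypothetical "node" counts words in its own partition, and only the small per-node summaries are merged, standing in for what would travel over the network. The node names and data are invented for illustration. A real cluster framework such as Hadoop also handles distribution, shuffling, and fault tolerance, all of which this toy omits.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each "node" holds its own partition of the raw data.
# In a real cluster these partitions would sit on separate machines' local disks.
partitions = {
    "node-1": ["the quick brown fox", "the lazy dog"],
    "node-2": ["the dog barks", "a quick fox"],
    "node-3": ["brown dog", "the end"],
}

def local_word_count(lines):
    """Map plus local combine: runs where the data lives and emits a small summary."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge_counts(a, b):
    """Reduce step: merge two partial summaries into one."""
    return a + b

# Only these per-node summaries would cross the network, not the raw lines.
partial_results = [local_word_count(lines) for lines in partitions.values()]
total = reduce(merge_counts, partial_results, Counter())

print(total.most_common(2))
```

For a word count, each summary holds at most one entry per distinct word, so on large inputs it is usually far smaller than the raw partition it describes. That size gap is where the network savings come from.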
This shift introduces a new computing paradigm and is the driving force behind MapReduce. The poster child for this is Google. We now take Google’s dominance for granted, but when Google launched its beta in 1998 it was late: it was the 19th search engine to enter the market. Yahoo was dominant, and there were also Infoseek, Excite, Lycos, Ask Jeeves, AltaVista, and a host of others. Within two years Google was the leader. It wasn’t until Google published papers on the Google File System (2003) and MapReduce (2004) that we got a glimpse of its back-end architecture. Google reached dominance because it recognized the paradigm shift early and was able to index more data, deliver better results, and do so much, much more efficiently and cost-effectively than its competitors. It went from 19th to first in a few short years because of MapReduce.
Doug Cutting, an engineer who later joined Yahoo, read Google’s papers and developed a Java implementation of MapReduce that became the basis for the open source Hadoop project, named after his son’s stuffed elephant. Hadoop has since grown to include a robust ecosystem. MapR is dedicated to expanding the capabilities of Hadoop to bring the full promise of MapReduce to all organizations. Given the incredible power of MapReduce, it’s important for your organization to realize its benefits before the 19th player in your market rises to dominance. We’re here to help.