Perhaps you’re old enough to remember when the library was the place we went to learn. We foraged through card catalogs, encyclopedias and the Readers’ Guide to Periodical Literature in hopes that we’d be able to understand what was going on in other people’s minds when they decided what went where.
The process was time-consuming, frustrating and often futile. We collected more data than we needed because we didn’t want to have to come back to look it up. If our objective was to write a report, we had to anticipate every detail in advance because once the results were on paper, there was no going back.
Think of how search changed things. We no longer have to memorize or write down facts on the chance that they might be needed. When we want a fact, we can look it up in an instant. There’s no need to navigate for information because search engines index everything, and information can be categorized many ways at once. And we don’t have to clog up our notebooks – or our minds – with facts we don’t need because we can get them later.
Search has changed our behavior. It has made the once-arduous process of finding and retaining information fast and fluid. It has freed us from the constraints of rigid organizational structures so we can think creatively. And it has made research a process of continuous iteration and discovery. As we try new terms, we stumble upon unexpected ideas and perspectives, which may take us off in new directions. There are no barriers to entry and no rules.
Think of the economics of big data the same way. It changes not only the scope and volume of data we can process, but also the way we think about data in the first place.
Traditional data warehouses are a lot like libraries. They contain only the information that humans have made a conscious decision to put into them. They’re structured, organized and accessible only within narrow parameters defined by data types and query languages. They have great value, but they don’t align well with the way people think.
Big data is to data warehousing what a search engine is to a library. In fact, Hadoop has its roots in search engine research. It easily handles structured, semi-structured and unstructured data, as does the new class of NoSQL databases. You can easily add new data types and queries, continually experimenting and refining your questions to look for new angles. Often you discover correlations that you didn’t expect, and that may take you off in a new direction. Just like a search engine.
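To make the contrast with a warehouse concrete, here is a minimal Python sketch of the "store it raw, interpret it when you ask" idea. It is only an illustration, not how Hadoop or any particular NoSQL database works internally. The sample records and the helper names `parse` and `mentions` are hypothetical, invented for this example.

```python
import json
from collections import Counter

# Hypothetical raw records, stored as-is with no shared schema.
RAW_LINES = [
    '{"type": "order", "customer": "a17", "total": 42.5}',
    '{"type": "click", "customer": "a17", "page": "/pricing"}',
    '{"type": "order", "customer": "b02", "total": 19.0, "coupon": "SPRING"}',
    'free-text support note: customer a17 asked about pricing',
]


def parse(line):
    """Interpret a line at read time; treat anything that is not a JSON object as text."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return {"type": "text", "body": line}
    if isinstance(record, dict):
        return record
    return {"type": "text", "body": line}


def mentions(record, term):
    """Return True if the term appears in any field of the record."""
    return any(term in str(value) for value in record.values())


records = [parse(line) for line in RAW_LINES]

# First question: how many records of each kind are there?
counts = Counter(r.get("type", "unknown") for r in records)

# A later question, asked without reloading or restructuring anything:
# which records mention customer a17 anywhere?
a17_records = [r for r in records if mentions(r, "a17")]

print(dict(counts))
print(len(a17_records))
```

Nothing about the second question had to be anticipated when the data was stored. That is the library-versus-search-engine difference in miniature.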
Hadoop has also redefined the economics of big data, reducing the per-terabyte cost by more than 95%. When cost becomes a non-issue, adoption soars. Smart organizations are using these new economic rules to make big data services available to anyone who needs them.
Big data should change how we think about collecting and processing data in four ways:
Access to analytics should be ubiquitous. People in the organization who believe they can be more effective by better understanding data should have the opportunity to do so. There are simply no valid economic arguments to the contrary.
Analytics should be more exploratory. The cost of data warehouses limits their use to applications that have well-defined business value. Big data gives you the freedom to experiment. Think of this analogy: In the days when mainframes ruled, almost no one had the resources to perform “what-if” modeling. Personal computers obliterated those cost barriers, freeing users to innovate in how they used data. That completely changed the way business was done.
Analytics should be iterative. Here we can steal an idea from DevOps, the wildly popular approach to development and deployment that breaks big projects up into tiny pieces with frequent deliverables and a constant feedback loop. DevOps yields better software because code is continually improved as it is developed. Think of analytics the same way: not as a “big bang” event with a definitive outcome, but as a process in which interesting discoveries are investigated, revisited and refined.
Gather as much data as possible. The cost of storing and processing data is now so low that the question of what to keep has given way to the question of what not to keep. Give users a lot of latitude to import and crunch new data sets. Interesting relationships will invariably emerge.