This question invariably comes up during big data discussions: ‘What is big data good for?’ Those who are close to the subject can quickly identify numerous examples of how big data can be used for the greater good, including some that are listed in “Big Data and Hadoop for Competitive Advantage – 5 Sources of Insights and Opportunities.” In a recent discussion on this topic with a high-level government official, I paused before answering in order to offer a fresh perspective. Ultimately, I decided to characterize my point of view simply as a trio of D2D’s: Data-to-Discovery, Data-to-Decisions, and Data-to-Dollars. I summarize each of these here:
As a scientist, I was first drawn into data science and into becoming an advocate for big data as a consequence of massive data’s enormous potential for new discoveries. In order to achieve those discoveries, the algorithms and methods of data mining and machine learning come into play. These data science techniques enable 4 major categories of Data-to-Discovery:
Correlation discovery - finding the hidden patterns and trends in the data
Novelty discovery - finding surprises, anomalies, outliers, and unexpected items in your data space
Association discovery - finding unusual, improbable co-occurring features or products in the data set
Class discovery - finding new categories and classes of items, events, or behaviors in your domain
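As a rough illustration of the four categories above, here is a minimal Python sketch on toy data. The data values, thresholds, and function names are all hypothetical, and each technique (Pearson correlation, a z-score test, lift, and a tiny one-dimensional k-means) is only one simple stand-in for its category, not a prescribed method.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Correlation discovery: linear correlation of two features.

    Assumes neither feature is constant (otherwise this divides by zero).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def outliers(values, z_cut=2.0):
    """Novelty discovery: values more than z_cut std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > z_cut]


def lift(baskets, a, b):
    """Association discovery: co-occurrence of a and b relative to chance.

    A lift above 1 means the pair appears together more often than
    independent purchases would predict.
    """
    n = len(baskets)
    p_a = sum(a in basket for basket in baskets) / n
    p_b = sum(b in basket for basket in baskets) / n
    p_ab = sum(a in basket and b in basket for basket in baskets) / n
    return p_ab / (p_a * p_b)


def two_means(values, iters=10):
    """Class discovery: split 1-D values into two groups (k-means, k=2)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values), []
    group_lo, group_hi = [], []
    for _ in range(iters):
        # Assign each value to the nearer of the two current centers.
        group_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        # Move each center to the mean of its group.
        lo, hi = mean(group_lo), mean(group_hi)
    return group_lo, group_hi


# Hypothetical toy data, for illustration only.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
order_totals = [20, 22, 19, 21, 20, 23, 18, 95]
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"milk", "bread"},
    {"beer", "diapers"},
    {"milk", "chips"},
    {"bread"},
]
monthly_visits = [1, 2, 2, 3, 20, 22, 21]

print("correlation:", pearson(ad_spend, sales))
print("outliers:", outliers(order_totals))
print("lift(beer, diapers):", lift(baskets, "beer", "diapers"))
print("two groups:", two_means(monthly_visits))
```

On real big data collections the same ideas would normally be run with a data mining or machine learning library at scale; the point here is only to show what each kind of discovery looks for.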
Since these correlations and associations represent “knowledge” about our domain, we sometimes refer to this D2D as “D2K: Data-to-Knowledge”. For example, the NIH has focused a major research initiative in this area, called BD2K (Big Data to Knowledge): http://bd2k.nih.gov/. One does not need to be a scientist to appreciate the value of discovery in any domain – e.g., in retail analytics, businesses try to discover new marketing opportunities, new customers, new ways to engage existing customers, new signals that they are about to lose a customer, new categories of customer interests, or any insights that move them closer to their business goals. Discovery brings joy to business analysts, marketing pros, and to scientists, especially to the data scientist. The potential for greater discoveries makes big data even more of a joy to work with. We are always searching for the next example of “beer and diapers” or “hurricanes and strawberry pop tarts” in our big data collections.
Achieving actionable intelligence (that informs good decision-making) from big data can often be very difficult. This is sometimes referred to as “the last mile challenge.” Taking the bits and bytes of big data, converting those into information packets, and then connecting those information “dots” into actionable knowledge takes both data science and business instinct. Joint human-computer cooperation is essential, particularly for non-trivial decisions. Some decisions may not require human intervention, such as the decision to deliver a discount coupon to a disengaged customer, or to send a welcome back message to a returning customer, or to offer a recommendation from a recommender engine to a customer when they view their online shopping cart.
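The split between routine and non-trivial decisions can be sketched as a simple routing rule. This is a minimal, hypothetical example: the event names, actions, and the `route_decision` function are assumptions made for illustration, not part of any particular product.

```python
# Routine events that business rules allow an algorithm to handle alone.
# Both the event names and the actions are hypothetical.
ROUTINE_ACTIONS = {
    "disengaged_customer": "send discount coupon",
    "returning_customer": "send welcome-back message",
    "viewing_cart": "show recommendation",
}


def route_decision(event, routine_actions=ROUTINE_ACTIONS):
    """Automate routine events; escalate everything else to a person."""
    action = routine_actions.get(event)
    if action is not None:
        return {"event": event, "action": action, "decided_by": "algorithm"}
    return {
        "event": event,
        "action": "queue for analyst review",
        "decided_by": "human",
    }


for event in ["disengaged_customer", "viewing_cart", "large_refund_request"]:
    print(route_decision(event))
```

Keeping the list of automated events explicit makes the default safe: anything the business rules do not name is sent to a person rather than decided automatically.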
Algorithms can make suggestions, even autonomously, based upon your business rules and processes, but critical decisions need a person-in-the-loop. Data enables, empowers, and informs objective, evidence-based decisions by busy humans. As your enterprise grows, you may choose to hand more decisions to autonomous decision agents (e.g., Syntasa’s Decision Science-as-a-Service). These “agents” are guided by machine learning algorithms that have been trained and validated by data scientists and analysts exploring your big data collections. Sometimes, even simple statistical measures of a data sample can inform important decisions; data profiling is one of the simplest tools for data-to-decisions. The 4 primary steps in data profiling are:
Data Preview and Selection
Data Cleansing and Preparation
Data Typing for Normalization and Transformation
Each of these steps brings you closer to your data (i.e., “knowing your data”), thus bridging that last mile gap: from data to actionable intelligence and data-driven decisions.
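The profiling steps named above might look like the following on a small table. This is a minimal sketch using only the Python standard library; the sample rows, the set of missing-value markers, and all function names are hypothetical, and real profiling tools do considerably more.

```python
# Hypothetical raw records, as they might arrive from a CSV export.
rows = [
    {"customer": "a", "age": "34", "spend": "120.5"},
    {"customer": "b", "age": "", "spend": "80"},
    {"customer": "c", "age": "51", "spend": "n/a"},
    {"customer": "d", "age": "27", "spend": "310"},
]

# Strings treated as "missing" (an assumption; adjust to your data).
MISSING = {"", "n/a", "na", "null", "none"}


def preview(rows, columns, n=3):
    """Preview and selection: first n rows, restricted to chosen columns."""
    return [{c: row[c] for c in columns} for row in rows[:n]]


def cleanse(rows):
    """Cleansing and preparation: drop rows containing a missing value."""
    return [
        row for row in rows
        if all(str(v).strip().lower() not in MISSING for v in row.values())
    ]


def infer_type(values):
    """Data typing: 'numeric' if every value parses as a float, else 'text'."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return "text"
    return "numeric"


def min_max(values):
    """Normalization: rescale numeric values to the range [0, 1]."""
    nums = [float(v) for v in values]
    lo, hi = min(nums), max(nums)
    if hi == lo:
        return [0.0 for _ in nums]
    return [(x - lo) / (hi - lo) for x in nums]


print(preview(rows, ["customer", "spend"], n=2))
clean = cleanse(rows)
types = {col: infer_type([row[col] for row in clean]) for col in clean[0]}
print(types)
print(min_max([row["spend"] for row in clean]))
```

Dropping incomplete rows is the bluntest possible cleansing choice; imputing or flagging missing values is often preferable, depending on the decision the data will inform.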
We owe the instantiation of the Data-to-Dollars concept to Jaime Fitzgerald of Fitzgerald Analytics. The meaning is clear – big data is the “new oil”. It is the new source of revenue, the fuel of the new innovation economy, the driver of wealth creation in the information age, and the MVA (Most Valuable Asset) for most businesses, industries, domains, and agencies. Insights on customer behaviors, preferences, and responses to stimuli can be delivered to marketing teams in real time, autonomously, at a person-specific level of granularity. This is pure gold to business. Both Data-to-Discovery (insights) and Data-to-Decisions are essential ingredients toward achieving business value (Data-to-Dollars) from the big data and analytics resources within your organization.
As itemized in the article “Business Leaders Need R’s not V’s: The 5 R’s of Big Data”, we see how big data delivers relevant, realistic, and reliable insights (discoveries) about your business domain (Data-to-Discovery), enables real-time decisions (Data-to-Decisions), and provides greater ROI from your data assets (Data-to-Dollars). Big Data is especially good for achieving the modern version of ROI: Return On Innovation! Therefore, after hearing a lot of discussion about the “V’s of Big Data” over the past few years, which has now been expanded to include the “R’s of Big Data”, we have inevitably arrived at the D2D’s of big data goodness. It is especially exciting to see the powerful new tools that bring these promises to fruition, including the new Apache Spark implementation within the MapR Hadoop distribution. Check it out, and you will definitely see what it is good for!