The Big C of Big Data: Top 8 Reasons that Characterization is ‘ROIght’ for Your Data

The game of Monopoly has been popular for over 100 years, which is perhaps due to the game's features that seem to parallel real life: strategizing and making deals in order to accumulate benefits from the aggregate of many moves and decisions along the way – hoping to maximize ROI.  Moving forward and reaping rewards are critical – you do not want to be sent back several steps or to pay the “Luxury Tax” (or worse).  One of the other features of Monopoly that creates rousing debates within every family that has played the game is the large variation in each family’s interpretation of the rules:  What is the right thing to do in a given situation? Is a given action even allowed by the rules?

As in the game of Monopoly, we face similar challenges in the era of big data:  What are the essential moves that will take us beyond the feelings of shock and awe when we see the enormity of the data set sizes, analytics challenges, and potential rewards that lie before us?  How do we move forward with analytics, advancing toward our business goals while avoiding any seriously “wrong moves” along the way? (Side note: it is inevitable that we will make some wrong moves, but we usually want to make such mistakes early in the game, before they become irreversible – this is called “Fast Fail”, which will be the topic of another article.)

In addition to the all-important business moves that we need to make in order to take big data to the next level (e.g., strategic planning, goal-setting, silo-smashing, and culture-building), one of the first “data sciencey” (analytics) moves that we should make is data Profiling (the “big P of big data” – i.e., getting to know our data).  One of the next important analytics moves is big data characterization (the “big C of big data”).  We will elaborate on that topic below, but first let us put this “big C” into the larger big data context.

We previously discussed this suggested new definition of big data:  big data is everything, quantified and tracked.  In that article we gave several general examples, justifications, and related business actions that correspond to this perspective on big data.  In fact, we introduced this concept previously under the letter “Q” in our article Big Data A to ZZ – A Glossary of My Favorite Data Science Things, where “Q” referred to the big data concept “Quantified and tracked.”  The “big P” (data profiling) was also listed there.  Another entry in that big data glossary was “C – Characterization”!  We described characterization this way:  “methodology for generating descriptive parameters that describe the behavior and characteristics of a data item, for use in any unsupervised learning algorithm to find clusters, patterns, and trends without the bias of incorporating class labels.”  We give a detailed example of characterization, applied to time series data analysis, in the article “Learning from Data, Big and Small, using Characterizations”.
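To make that definition concrete, here is a minimal sketch in Python of what “generating descriptive parameters” might look like for small time series. The sensor names, the data values, and the choice of three parameters (mean, spread, and trend) are illustrative assumptions, not taken from the articles cited above. Note that no class labels are used anywhere: the characterizations alone are what an unsupervised algorithm would compare.

```python
import math
import statistics


def characterize(series):
    """Reduce a raw time series to a few descriptive parameters."""
    n = len(series)
    mean = statistics.fmean(series)
    spread = statistics.pstdev(series)  # population standard deviation
    # Least-squares slope against the sample index 0..n-1 (overall trend).
    t_mean = (n - 1) / 2
    numerator = sum((t - t_mean) * (x - mean) for t, x in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    trend = numerator / denominator
    return (mean, spread, trend)


# Hypothetical, made-up readings used only for illustration.
readings = {
    "sensor_a": [1, 2, 3, 4, 5],
    "sensor_b": [2, 3, 4, 5, 6],
    "sensor_c": [5, 5, 5, 5, 5],
}

# Each data item is now represented by its characterization, not its raw values.
features = {name: characterize(values) for name, values in readings.items()}

# Distances between characterizations are the input to unsupervised grouping.
dist_ab = math.dist(features["sensor_a"], features["sensor_b"])
dist_ac = math.dist(features["sensor_a"], features["sensor_c"])
print(features)
print(dist_ab < dist_ac)  # the two rising series sit closer together
```

In practice the parameters would usually be rescaled to comparable ranges before being handed to a clustering algorithm, and the set of parameters would be chosen to suit the data and the business question.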

Here are some ways that you can tap into the “big C” value stream, to improve your big data ROI (Return On your big data Investments, and Return On big data Innovation).  In particular, here are the top 8 reasons that characterization of your big data is the “ROIght” thing to do! 

Characterization provides the following benefits:

1) Generates useful metrics that track, monitor, and signal changes, events, and emergent behavior in your data assets (i.e., fulfilling the definition: “big data is everything, quantified and tracked”);

2) Creates compact, condensed representations of the essential information content of your big data;

3) Accomplishes the critical Data-to-Information conversion that brings you one step closer to completing the full Data-to-Information-to-Knowledge transformation, and thereby achieving your business’ D2D goals (Data-to-Discovery, Data-to-Decisions, and Data-to-Dollars);

4) Can be used to index and tag specific granules, objects, events, and features in your data collection for fast search, retrieval, and re-use; 

5) Provides the foundation of a versatile, information-centric infrastructure, driven by business use cases, for many uses and users of your data assets – tailored to specific needs;

6) Can be incorporated into business processes (e.g., using BPEL, the Business Process Execution Language) – through automated data characterization, human-assisted tag generation, or both;

7) Helps to find the Signals, Swans, and Shockwaves within data streams from multi-channel sources and sensors (including social, web, machine data, transaction logs, detectors, and other measurement systems), and then converts those signals into actionable information nuggets with targeted business value (i.e., finding patterns in the data haystack); and

8) Productively utilizes and delivers ROI from your data technology investments (such as the MapR M7 Database Edition) by providing a tailor-made application (characterization) for Hadoop big data processing, e.g., mapping, crunching, and reducing customer data into customer characteristics across distributed clusters of processing nodes for competitive advantage and business insights.
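The last item, reducing customer data into customer characteristics, can be illustrated with a small sketch of the map, group, and reduce pattern. This is plain, single-process Python and does not use any Hadoop or MapR API; the transaction records, the function names, and the chosen characteristics are all hypothetical. On a real cluster the same three steps would be distributed across processing nodes by the framework.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, purchase_amount).
transactions = [
    ("c1", 20.0),
    ("c2", 5.0),
    ("c1", 40.0),
    ("c2", 15.0),
    ("c1", 30.0),
]


def map_phase(records):
    """Emit one (key, value) pair per record, keyed by customer."""
    for customer_id, amount in records:
        yield customer_id, amount


def group_by_key(pairs):
    """Collect values per key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(amounts):
    """Condense one customer's raw values into a few characteristics."""
    return {
        "purchases": len(amounts),
        "total": sum(amounts),
        "average": sum(amounts) / len(amounts),
        "largest": max(amounts),
    }


characteristics = {
    customer_id: reduce_phase(amounts)
    for customer_id, amounts in group_by_key(map_phase(transactions)).items()
}
print(characteristics)
```

The resulting per-customer records are compact enough to serve as index tags for search (item 4) or as input to the unsupervised methods described earlier.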

Ultimately, data profiling (the “big P”) and characterization (the “big C”) enable you to increase ROI from your big data portfolio by assisting your organization in its moves to reach these five fundamental business objectives in the big data era:

(a) Shifting now to a big data mindset;

(b) Developing your organization’s strategic big data analytics use case(s) that will differentiate your business from others;

(c) Preparing yourself for continuous disruption;

(d) Focusing on analytics value, not on big data volume; and

(e) Creating and nurturing an information-centric business culture at all levels, everywhere in your company.


Don’t be stuck with poorly performing assets in the Monopoly game of life – get moving on your big data analytics journey.  Move forward to greater ROI with big data characterization. Become a data-driven business with best practices that drive innovation, revenue, and growth by making all the “ROIght” moves.
