Raising the Standard in the Big Data Analytics Profession

A sign of maturity for most technologies is the appearance of standards. Standards are used to enable, promote, measure, and perhaps govern the use of a technology across a wide spectrum of communities. Standardization also encourages independent use and comparative evaluation of the technology.

Standards may pertain to processes, including business process improvement (such as Six Sigma), software engineering (such as CMM, the capability maturity model), quality management (such as ISO 9000/9001), education delivery (such as Common Core), and data mining (such as CRISP-DM, the Cross Industry Standard Process for Data Mining).  Standards may also pertain to codes of conduct (as in the military, medical, accounting, and legal professions).  Other standards apply to digital content, including: (a) interoperable data exchange (such as GIS, CDF, or XML-based data standards); (b) data formats (such as ASCII or IEEE 754); (c) image formats (such as GIF or JPEG); (d) metadata coding standards (such as ICD-10 for the medical profession, or the Dublin Core for cultural, research, and information artifacts); and (e) standards for the sharing of models (such as PMML, the predictive model markup language, for data mining models).  Standards are ubiquitous.  This abundance causes some folks to quip: “The nice thing about standards is that there are so many to choose from.”
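
To make the idea of a digital-content standard concrete, here is a small Python sketch (my own illustration, with an arbitrarily chosen value, not drawn from any standards body's materials) showing how a single number is encoded under IEEE 754, the floating-point format mentioned above. Because the byte layout is standardized, any conforming system can decode the same bytes back to the same value.

```python
import struct

# IEEE 754 defines a standard binary layout for floating-point numbers,
# so a value is encoded identically on any conforming system.
value = 0.1

# 32-bit (single precision) and 64-bit (double precision) encodings, big-endian
single = struct.pack(">f", value)   # 4 bytes
double = struct.pack(">d", value)   # 8 bytes

print(single.hex())  # 3dcccccd
print(double.hex())  # 3fb999999999999a

# Decoding the bytes elsewhere recovers the same (approximate) value,
# which is exactly the interoperability that such a standard guarantees.
print(struct.unpack(">f", single)[0])
print(struct.unpack(">d", double)[0])
```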

Standards are now beginning to appear in the worlds of big data and data science as well, providing evidence of the growing maturity of those professions.  I am not referring to standard programming paradigms such as Hadoop, or to standards for enterprise search, though those are standards too.  I am referring to standards related to the “big data profession” (if we accept that there is such a thing).  I describe two of those standards here and introduce a third.

1. Big Data Analytics Maturity Models – There are in fact several “standards” emerging in the area of analytics capability maturity. Fortunately, they are not very different and are therefore likely to converge.  One of these has been presented by TIBCO – their six steps toward analytics maturity are: Measure, Diagnose, Predict and Optimize, Operationalize, Automate, and Transform.  Another example is presented through the SAS Analytics Assessment, which evaluates your business analytics readiness and capabilities in several areas. The B-eye Network analytics maturity model mimics software engineering’s CMM – its six levels of maturity are: Level 0 = Incomplete; Level 1 = Performed; Level 2 = Managed; Level 3 = Defined; Level 4 = Quantitatively Managed; and Level 5 = Optimizing.  The most “mature” standard in the field is probably the IDC Big Data and Analytics (BDA) MaturityScape Framework. This BDA framework (measured across the five core dimensions of intent, data, technology, process, and people) consists of five stages of maturity, which essentially parallel those presented above: Ad hoc, Opportunistic, Repeatable, Managed, and Optimized.  An analysis of a utility industry survey based on the IDC BDA framework revealed the top 10 traits that most distinguish high-achieving BDA utilities – that article contains many good insights and links to the relevant IDC reports.

All of these are excellent models for analytics maturity.  But if you find these models too theoretical, opaque, or unattainable, then I suggest a more practical progression for your business analytics, from ground zero all the way up to cognitive analytics:  from Descriptive, to Diagnostic, to Predictive, to Prescriptive, to Cognitive.
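
To ground that progression, here is a minimal Python sketch (my own illustration, using made-up monthly figures) contrasting the first and third rungs: descriptive analytics summarizes what has already happened, while predictive analytics projects what is likely to happen next.

```python
import statistics

# Hypothetical monthly sales figures (illustrative numbers only)
sales = [120, 135, 150, 160, 172, 185]

# Descriptive analytics: summarize what happened
print("mean:", statistics.mean(sales))
print("stdev:", statistics.pstdev(sales))

# Predictive analytics: fit a simple linear trend and project next month
# (statistics.linear_regression requires Python 3.10+)
months = list(range(len(sales)))
slope, intercept = statistics.linear_regression(months, sales)
forecast = slope * len(sales) + intercept
print("forecast for next month:", round(forecast, 1))
```

The later rungs build on this: prescriptive analytics would recommend an action based on the forecast, and cognitive analytics would learn and adapt those recommendations over time.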

2. Data Science Code of Professional Conduct – The Data Science Association has formalized a code of professional conduct for data scientists.  This extensive model of standard behavior in the profession provides very detailed guidance under nine rule categories:  Terminology, Competence, Scope of Professional Services, Communication with Clients, Confidential Information, Conflicts of Interest, Duties to Prospective Clients, Quality of Data and Quality of Evidence, and Maintaining Integrity (including examples of misconduct).   Data Ethics is therefore an essential component of data literacy and data science practice.  Education programs in data science should cover ethical issues.  For example, the Data Ethics course at George Mason University addresses these objectives:  “Students engage in activities and discussion related to the serious ethical issues arising from the widespread distribution of data and information in the Internet age. Students gain a deeper understanding of ethics as it applies to the use and interpretation of data in the sciences. In addition to statistical and scientific case studies, students are presented with practical ethical challenges that they may face in their future corporate, government, or academic employment. As an added benefit, students acquire RCR (Responsible Conduct in Research) Certification or else HSR (Human Subjects in Research) Certification.” As part of the course, students are required to read the excellent little classic “How To Lie With Statistics”, which provides serious and humorous examples of misuse and abuse of statistics (intended to show what not to do).
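
In the spirit of that book, here is a tiny Python sketch (a contrived example of my own, not taken from the book or the course) of the kind of distortion it warns against: quoting a mean where the median tells the real story.

```python
import statistics

# Hypothetical salaries at a small firm: several modest salaries plus one executive
salaries = [42_000, 45_000, 47_000, 48_000, 50_000, 52_000, 400_000]

# "Average salary" sounds impressive if you quote the mean...
print("mean:  ", round(statistics.mean(salaries)))   # ~97,714, skewed by the outlier
# ...but the median describes the typical employee far better.
print("median:", statistics.median(salaries))        # 48,000
```

Choosing the statistic that flatters the story, rather than the one that represents the data, is precisely the kind of conduct a professional code is meant to discourage.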

3. Meaningful Use (MU) – the Health IT profession has established MU criteria and stages for using EHR (Electronic Health Record) technology.  It appears that similar MU criteria would make a lot of sense in the Big Data Analytics profession, particularly since EHR usage is a form of Big Data Analytics (specifically for healthcare). The three stages of MU for big data analytics would look like this:  Stage 1 – data capture and sharing; Stage 2 – advance big data analytics practice and processes; and Stage 3 – improve outcomes.   The objectives of MU in healthcare are also transferable to big data analytics: (a) improve quality, safety, and efficiency, and reduce disparities; (b) engage stakeholders; (c) improve coordination and outcomes; and (d) maintain privacy and security of information. A significant amount of effort is going into the development of MU in healthcare – consequently, big data professionals can follow that lead, identify appropriate best practices in MU, and transfer those practices to the analytics profession.  Several good examples of this are presented in the article “From Meaningful Use to Meaningful Analytics.”

Therefore, as we progress on our big data analytics journeys, let us be confident that the field is maturing on all fronts:  technology, algorithms, employee skills, best practices, and governance of the profession.  When in doubt, look around – someone is probably addressing the issue that you are facing and providing one or more standards for dealing with it.
