Over the last year or so, the buzzphrase “dark data” has entered the common lexicon of data management, information management, technology and business analytics circles. When I first heard the term, my mind conjured up an image of the Batcave, wired with the technical capabilities to track Gotham City’s supervillains. Armed with my rapacious curiosity, I set out on a deliberate quest to study the shadowed periphery of the information landscape.
This four-part blog series will examine the emergent phenomenon known as “dark data,” with the objective of evaluating and contextualizing the trend’s influence on the practice and discipline of information management.
PART 1: PEERING BEHIND THE MASK – LOOKING BEYOND THE HYPE TO DEFINE DARK DATA
My comic-book daydreams aside… there is still no common definition of what constitutes dark data among those who seek to use it. For example, Gartner positions dark data as “information assets that are collected, processed and stored” in the regular course of business but then never leveraged for additional business purposes (such as analytics, direct monetization or business relationships). The Gartner definition goes further, explicitly stating that many organizations retain dark data solely for compliance purposes, incurring additional operational expense and risk rather than gaining business value. In doing so, Gartner’s definition seemingly pits compliance and legal risk against operational risk… vanquishing the potential for a balanced, risk-based approach to data management in any organization.
I, on the other hand, would define dark data as information that an organization cannot efficiently identify, process and/or utilize, but knows exists because of the effect it has on the organization’s other information-based assets. Archivists, librarians and records managers alike are well acquainted with this reality… they know that a large share of an organization’s information assets has traditionally been paper-based and stored offsite, with little present ability to derive analytics from the collection beyond the initial indexing performed when the records were sent away. Records managers know that these offsite records are there … somewhere… and implement organizational strategies to minimize the risks associated with what they commonly call “orphaned” or “unmanaged” record collections. Practitioners of this traditional information management model have learned that there is an important difference between the information an organization can’t see and the information it hasn’t yet discovered. These same patterns of practice deserve consideration in the digital age, as many organizations begin to map out and implement their big data management strategies.
Stay tuned for Part 2 of our series…
Activating the Bat-Signal – Shining a Light on Enterprise Information Governance