The digital universe, consisting of all the data we create each year, is currently doubling in size approximately every twelve months. According to research by IDC, the total is expected to reach 44 zettabytes by 2020. That is 44 trillion gigabytes, and it will contain nearly as many digital bits as there are stars in the universe. It is also predicted that by 2030 more than 90% of this data will be unstructured. This explosion of data far exceeds our capacity to actually use it. Nearly all companies (and even individuals) store data that they will never access again, simply because cloud storage is now cheap and available to everyone.

Only a small fraction of all that data is in a traditional, structured form that organisations can easily access and use. A more substantial part of big data is unstructured; some of it is at least accessible, while the vast majority is hidden altogether, going unseen and unused. This is what we call dark data. The growing flow of machine and sensor data generated by the Internet of Things and the massive stores of raw data found in the unexplored depths of the deep web are both part of dark data.

It is clear that the majority of all the data being created is dark, unstructured data. The term dark data was coined by the IT consulting firm Gartner, which defined it as the data assets organisations collect, process and store during normal business activities but commonly fail to use for other purposes.


What Is Dark Data Within An Organisation?