Introduction

Data science, machine learning, artificial intelligence: those terms are all over the news. They get everyone excited with promises of automation, new savings or higher earnings, new features, markets or techniques. Some of those promises are well-founded, while others are still in their infancy or haven't made it past the proof-of-concept stage (another way of saying they're still at the wishful-thinking stage).

There have been major improvements in the techniques we use to extract, transform and load data. New and refined techniques such as PCA and hyperparameter optimization, and designs such as neural networks, have improved outcomes. But there's one aspect that doesn't get enough attention, the ugly duckling of the field. If you're accustomed to working with data, you might have already guessed it. If not, you'll find out next. Let's dive in.

The Unloved One In Data Science

At the heart of everything in business and research, aside from money, is data. Data is the new oil, or the new electricity, depending on who you ask. Computers make it easy to collect, share and analyse, and that has turned it into a key strategic asset.

But there's one aspect of data that isn't discussed enough: its quality. Quantity, whether Big Data or small data, doesn't matter if the quality of the data is poor.

Garbage in, garbage out

No matter how good your data pipeline, your cleaning, or your training and testing of models, no matter your hypothesis or the complexity of your algorithm, nothing valuable will come out of your work if your data is of poor quality. That's the famous "garbage in, garbage out". You can't bake a good cake with rotten ingredients.

This flow provides another way to look at data quality:

Data Quality → Information Quality → Decision Quality → Business Outcome
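
To make the first link in that chain concrete, here is a minimal sketch of the kind of sanity checks that can sit at the very start of a pipeline, before any modelling happens. It assumes a hypothetical pandas DataFrame of customer records; the column names (`age`, `signup_date`) and the thresholds are illustrative, not prescriptive.

```python
# Minimal data-quality sketch (assumed columns and thresholds are illustrative).
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Return a few basic quality indicators for a DataFrame."""
    report = {
        "rows": len(df),
        # Fully identical rows are usually a collection or join defect
        "duplicate_rows": int(df.duplicated().sum()),
        # Share of missing values per column
        "missing_ratio": df.isna().mean().round(3).to_dict(),
    }
    # Domain-specific range check: ages outside 0-120 are suspicious
    if "age" in df.columns:
        report["age_out_of_range"] = int(((df["age"] < 0) | (df["age"] > 120)).sum())
    return report


if __name__ == "__main__":
    # Hypothetical customer records with typical quality problems:
    # a duplicated row, a missing signup date, an impossible age.
    customers = pd.DataFrame({
        "customer_id": [1, 2, 2, 4],
        "age": [34, 41, 41, 230],
        "signup_date": ["2021-01-05", "2021-02-11", "2021-02-11", None],
    })
    print(quality_report(customers))
```

Checks this simple won't replace proper data governance, but they catch the silent defects that otherwise degrade every downstream model and decision.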

