I often see articles or posts that identify data integration or preparation as the key issue facing data science projects. This always puzzles me, because it is not our lived experience: it is not what we see when we work with Fortune 500 companies adopting predictive analytics, machine learning, or AI. But I think I have figured it out. The problem is as follows:

What data scientists think counts as a “data science project” is not, in fact, a data science project.

Let me illustrate this with some data from a great study. Back in 2016, the Economist Intelligence Unit published a survey report, “Broken links: Why analytics investments have yet to pay off.” Below, you can see how its data appears to support the argument that data problems are #1.

Wow. It seems pretty clear that data integration/preparation is the biggest problem, with nearly twice as many projects reporting it as reported the next most common problem.


Data Scientists think data is their problem. Here’s why they’re wrong