I often see articles or posts that identify data integration or preparation as the key issue facing data science projects. This always puzzles me because it is not our lived experience - not what we see when we work with Fortune 500 companies adopting predictive analytics, machine learning, or AI. But I think I have figured it out. The problem is as follows:
What data scientists think counts as a “data science project” is not, in fact, a data science project.
Let me illustrate this with some data from a great study. Back in 2016, the Economist Intelligence Unit published a survey report titled “Broken links: Why analytics investments have yet to pay off”. Below, you can see how its data appears to support the argument that data problems are #1.
Wow - pretty clear that data integration/preparation is the biggest problem, reported by nearly twice as many projects as the next problem on the list.