Data cleaning is finally being automated

Data cleaning is finally being automated

Ihab Ilyas on the TDS podcast. Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris.

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called [SharpestMinds_](http://sharpestminds.com/). You can listen to the podcast below:_

It’s cliché to say that data cleaning accounts for 80% of a data scientist’s job, but it’s directionally true.

That’s too bad, because fun things like data exploration, visualization and modelling are the reason most people get into data science. So it’s a good thing that there’s a major push underway in industry to automate data cleaning as much as possible.

One of the leaders of that effort is Ihab Ilyas, a professor at the University of Waterloo and founder of two companies, Tamr and Inductiv, both of which are focused on the early stages of the data science lifecycle: data cleaning and data integration. Ihab knows an awful lot about data cleaning and data engineering, and has some really great insights to share about the future direction of the space — including what work is left for data scientists, once you automate away data cleaning.

Here were some of my biggest takeaways from the conversation:

  • Data cleaning involves a lot of things, one of which is dealing with missing values. Historically, missing values have often been filled in manually by subject matter experts who can make educated guesses about the data, but automated techniques can work well (and usually do better) at scale.
  • These automated strategies can range from fairly naive approaches (e.g. replacing a value with the median or average value of other points in the dataset), to more sophisticated techniques (e.g. using a predictive model to guess at missing values).

data-science podcast data-engineering tds-podcast towards-data-science data science

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

What Are The Advantages and Disadvantages of Data Science?

Online Data Science Training in Noida at CETPA, best institute in India for Data Science Online Course and Certification. Call now at 9911417779 to avail 50% discount.

Is data science merging with data engineering?

Kenny Ning on the TDS podcast. Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris.

50 Data Science Jobs That Opened Just Last Week

Data Science and Analytics market evolves to adapt to the constantly changing economic and business environments. Our latest survey report suggests that as the overall Data Science and Analytics market evolves to adapt to the constantly changing economic and business environments, data scientists and AI practitioners should be aware of the skills and tools that the broader community is working on. A good grip in these skills will further help data science enthusiasts to get the best jobs that various industries in their data science functions are offering.

The brave new world of data engineering

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds.

Intro to Data Engineering for Data Scientists

Intro to Data Engineering for Data Scientists: An overview of data infrastructure which is frequently asked during interviews