Let’s Impute Missing Values with SQL

Automated and easy-to-understand script to get the job done.

Missing values are a huge problem in machine learning. Now that machine learning can be done directly in the database, it’s worth asking how to perform adequate data preparation with SQL alone, without reaching for other programming languages such as Python and R. Today we’ll see just how easy it is.

We’ll use Oracle Cloud for this article, as it’s free and can be used through SQL Developer Web without any downloads or installations on your machine. If you decide to follow along, create a free OLTP database, then go to Service Console, Development, and finally SQL Developer Web.

As for the dataset, we’ll use the well-known Titanic dataset for two reasons:

  • It’s simple and easy to understand
  • It contains enough missing values for us to play with

Once you have the dataset downloaded, you can use the Upload Data functionality of SQL Developer Web to create the table and upload data:

Change data types using your best judgment and you’re ready to roll!


Preparation and exploration

I don’t want to mess anything up with the source table, called titanic, so let’s make a copy of it:

CREATE TABLE cp_titanic AS 
SELECT * FROM titanic;

Let’s run a quick SELECT to verify that everything is as it should be:

SELECT * FROM cp_titanic;
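
Since the goal is to impute missing values, a natural next check is how many values are missing in each column. Here is a minimal sketch, assuming the uploaded table kept the dataset’s usual column names (age, cabin, embarked) and that missing entries were loaded as NULL; adjust the names to match your upload:

```sql
-- COUNT(*) counts every row, while COUNT(column) skips NULLs,
-- so the difference is the number of missing values in that column.
-- Column names are assumed from the standard Titanic dataset.
SELECT
    COUNT(*)                   AS total_rows,
    COUNT(*) - COUNT(age)      AS missing_age,
    COUNT(*) - COUNT(cabin)    AS missing_cabin,
    COUNT(*) - COUNT(embarked) AS missing_embarked
FROM cp_titanic;
```

The query runs against the copy, so the source table stays untouched.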
