Machine Learning with SQL : A Complete Guide to Regression

Machine Learning with SQL : A Complete Guide to Regression

We'll explore how to perform Machine Learning with SQL. To be more precise, we'll explore how to apply regression modeling to the well-known housing prices dataset. It seems like everybody knows how to train predictive models in languages like Python, R, and Julia. But what about SQL? How can you leverage the power of a well-known database language for machine learning?

In this video, we'll explore how to perform Machine Learning with SQL. To be more precise, we'll explore how to apply regression modeling to the well-known housing prices dataset.

DATASET: https://www.betterdatascience.com/wp-content/uploads/2020/11/HousingPrices.csv

CODE SNIPPETS:


It seems like everybody knows how to train predictive models in languages like Python, R, and Julia. But what about SQL? How can you leverage the power of a well-known database language for machine learning?

SQL has been around for decades, but still isn’t recognized as a language for machine learning. Sure, I’d pick Python any day of the week, but sometimes in-database machine learning is the only option.

We’ll use Oracle Cloud for this article. It’s free, so please register and create an instance of the OLTP database (Version 19c, has 0.2TB of storage). Once done, establish a connection through the SQL Developer Web — or any other tool.

We’ll use a slightly modified version of a well-known Housing prices dataset — you can download it here. I’ve chosen this dataset because it doesn’t require any extensive preparation, so the focus can immediately be shifted to predictive modeling.

The article is structured as follows:

  • Dataset loading and preparation
  • Model settings
  • Model training and evaluation
  • Conclusion

Dataset loading and preparation

If you are following along, I hope you’ve downloaded the dataset already. You need to load it into the database with a tool of your choice — I’m using SQL Developer Web, a free tool provided by Oracle Cloud.

The loading process is straightforward — click on the upload button, choose the CSV file, and click Next a couple of time:

Image for post

Image 1 — Dataset loading with Oracle SQL Developer Web (image by author)

Mine is now stored in the housingprices table. Here’s how the first couple of rows look like:

Image for post

Image 2 — First 10 rows of the loaded dataset (image by author)

Oracle Machine Learning (OML) is a bit strange when it comes to creating models. Your data table must contain a column with an ID — numbers generated from a sequence. Our dataset doesn’t have that by default, so let’s add it manually:

CREATE SEQUENCE seq_housingprices;
ALTER TABLE housingprices ADD (
    row_id INTEGER DEFAULT seq_housingprices.NEXTVAL
);

Image for post

Image 3 — Added a column for row ID (image by author)

There’s only one thing left to do in this section: the train/test split. In SQL, that’s done by creating two views. The first view has 75% of the data randomly sampled, and the second one has the remaining 25%. The percentages were chosen arbitrarily:

BEGIN
    EXECUTE IMMEDIATE 
        'CREATE OR REPLACE VIEW 
            housing_train_data AS 
            SELECT * FROM housingprices 
            SAMPLE (75) SEED (42)';
    EXECUTE IMMEDIATE 
        'CREATE OR REPLACE VIEW 
            housing_test_data AS 
            SELECT * FROM housingprices 
            MINUS 
            SELECT * FROM housing_train_data';
END;
/

We now have two views created — housing_train_data for training and housing_test_data for testing. There’s still one thing left to do before model training: to specify settings for the model. Let’s do that in the next section.

Model settings

Oracle uses this settings table style for training predictive models. The settings table is made of two columns — one for the name and the other for the value.

Here’s how to create the table:

CREATE TABLE housing_model_settings(
    setting_name VARCHAR2(30),
      setting_value VARCHAR2(4000)
);

The following information will be stored in this table:

  • Type of model — let’s use a Generalized Linear Model (GLM)
  • Diagnostics table name — for model statistics
  • Data preparation technique — let’s go with the automatic
  • Feature selection — disabled or enabled, let’s go with enabled
  • Feature generation — same as with feature selection

Yes, you are reading this right — all of that is done automatically without the need for your assistance. Let’s fill the settings table next:

BEGIN
    INSERT INTO housing_model_settings VALUES(DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_GENERALIZED_LINEAR_MODEL);

    -- Row diagnostic statistics
    INSERT INTO housing_model_settings VALUES(DBMS_DATA_MINING.GLMS_DIAGNOSTICS_TABLE_NAME, 'GLMS_HOUSING_DIAG');

    -- Auto data preparation
    INSERT INTO housing_model_settings VALUES(DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);

    -- Feature selection
    INSERT INTO housing_model_settings VALUES(DBMS_DATA_MINING.GLMS_FTR_SELECTION, DBMS_DATA_MINING.GLMS_FTR_SELECTION_ENABLE);

    -- Feature generation
    INSERT INTO housing_model_settings VALUES(DBMS_DATA_MINING.GLMS_FTR_GENERATION, DBMS_DATA_MINING.GLMS_FTR_GENERATION_ENABLE);
END;
/

Full Article: https://towardsdatascience.com/machine-learning-with-sql-a-complete-guide-to-regression-b0e5e2db22da

sql data-science machine-learning developer

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Introduction to Structured Query Language SQL pdf

SQL stands for Structured Query Language. SQL is a scripting language expected to store, control, and inquiry information put away in social databases. The main manifestation of SQL showed up in 1974, when a gathering in IBM built up the principal model of a social database. The primary business social database was discharged by Relational Software later turning out to be Oracle.

Hire Machine Learning Developers in India

We supply you with world class machine learning experts / ML Developers with years of domain experience who can add more value to your business.

Most popular Data Science and Machine Learning courses — July 2020

Most popular Data Science and Machine Learning courses — August 2020. This list was last updated in August 2020 — and will be updated regularly so as to keep it relevant

15 Machine Learning and Data Science Project Ideas with Datasets

Learning is a new fun in the field of Machine Learning and Data Science. In this article, we’ll be discussing 15 machine learning and data science projects.

SQL for Data Science - Learn SQL Basics for Data Science

This blog on SQL for Data Science is structured to help Data Scientists understand the importance and gain the learning of databases and data management.