Machine Learning with SQL

Python (and soon JavaScript with TensorFlow.js) is a dominant language for Machine Learning. What about SQL? There is a way to build/run Machine Learning models in SQL. There could be a benefit to run model training close to the database, where data stays. With SQL we can leverage strong data analysis out of the box and run algorithms without fetching data to the outside world (which could be an expensive operation in terms of performance, especially with large datasets). This post is to describe how to do Machine Learning in the database with SQL.

I will be using Oracle autonomous DB running in Oracle Cloud Free Tier. Oracle DB comes with out of the box support for Machine Learning. Recently Oracle came up with Oracle Cloud Free Tier, which includes the database. This means we can run it for free, for production too.

Setup

Once you are in Oracle Cloud console, go to Autonomous Data Warehouse, there you can create Always Free DB instance with ML functionality included:

Machine Learning with SQL

To be able to access ML environment, make sure to set up ML user. To do this, go to DB instance in Cloud console and select Service Console option:

Machine Learning with SQL

Then go to the administration section and select Manage Oracle ML Users option:

Machine Learning with SQL

There you can create a new user (with this user we will log in into Oracle ML notebook). This is DB user, it means DB schema with the same name will be created and we can login directly into DB with the same credentials:

Machine Learning with SQL

To be able to login directly through SQL Developer or another client (useful, when need to do data pre-processing), select DB Connection option and download client credentials:

Machine Learning with SQL

Connect to DB with SQL Developer and create table PIMA_INDIANS_DIABETES (read more about Pima Indians Diabetes dataset here). Populate the table with data by running SQL script from my GitHub repo for this post:

Machine Learning with SQL

Let’s open Oracle ML notebook. Go back to Oracle Cloud console for Autonomous Data Warehouse and select Development section. Under this section, click on Oracle ML SQL Notebooks:

Machine Learning with SQL

In the next step, it will ask for username/password. Here you should provide the same ML user info you created in few steps before (I’m using user REDSAM):

Machine Learning with SQL

Machine Learning

Oracle Machine Learning home page. Click on Notebooks, this will navigate to the form to create a new notebook:

Machine Learning with SQL

Create a new notebook — Pima Indians Diabetes:

Machine Learning with SQL

Go to notebook. Within notebook you can access the same DB schema, like the one you were connecting from DB client and uploading data. Technically you can build ML model directly from SQL client. Oracle ML notebook provides a better user experience though.

In the first step, let’s fetch data from the table, to make sure we can access DB schema and see the data:

Machine Learning with SQL

Next, create DB tables for test and train datasets, you can fetch data directly into these tables from the main table:

Machine Learning with SQL

Before running the training and building model, it is recommended to clean up previously trained model, if any:

Machine Learning with SQL

Here we train the model, using a decision tree algorithm. Point to the table with training data and run PL/SQL function — CREATE_MODEL, the model will be built behind the scenes for you:

Machine Learning with SQL

Model metadata is stored in generated DB tables:

Machine Learning with SQL

Add the prediction column to the test table, this is where prediction results will be stored:

Machine Learning with SQL

Run prediction function with the trained model, results will be stored in the prediction column:

Machine Learning with SQL

We can calculate prediction accuracy, by comparing the predicted value with actual value in test dataset:

Machine Learning with SQL

Confusion matrix — this helps to understand model quality quickly:

Machine Learning with SQL

Oracle Machine Learning SQL notebook and data are available on my GitHub repo.

#machine-learning #sql

Machine Learning with SQL
9.25 GEEK