Python (and soon JavaScript with TensorFlow.js) is a dominant language for Machine Learning. What about SQL? There is a way to build/run Machine Learning models in SQL. There could be a benefit to run model training close to the database, where data stays. With SQL we can leverage strong data analysis out of the box and run algorithms without fetching data to the outside world (which could be an expensive operation in terms of performance, especially with large datasets). This post is to describe how to do Machine Learning in the database with SQL.
I will be using Oracle autonomous DB running in Oracle Cloud Free Tier. Oracle DB comes with out of the box support for Machine Learning. Recently Oracle came up with Oracle Cloud Free Tier, which includes the database. This means we can run it for free, for production too.
Once you are in Oracle Cloud console, go to Autonomous Data Warehouse, there you can create Always Free DB instance with ML functionality included:
To be able to access ML environment, make sure to set up ML user. To do this, go to DB instance in Cloud console and select Service Console option:
Then go to the administration section and select Manage Oracle ML Users option:
There you can create a new user (with this user we will log in into Oracle ML notebook). This is DB user, it means DB schema with the same name will be created and we can login directly into DB with the same credentials:
To be able to login directly through SQL Developer or another client (useful, when need to do data pre-processing), select DB Connection option and download client credentials:
Connect to DB with SQL Developer and create table PIMA_INDIANS_DIABETES (read more about Pima Indians Diabetes dataset here). Populate the table with data by running SQL script from my GitHub repo for this post:
Let’s open Oracle ML notebook. Go back to Oracle Cloud console for Autonomous Data Warehouse and select Development section. Under this section, click on Oracle ML SQL Notebooks:
In the next step, it will ask for username/password. Here you should provide the same ML user info you created in few steps before (I’m using user REDSAM):
Oracle Machine Learning home page. Click on Notebooks, this will navigate to the form to create a new notebook:
Create a new notebook — Pima Indians Diabetes:
Go to notebook. Within notebook you can access the same DB schema, like the one you were connecting from DB client and uploading data. Technically you can build ML model directly from SQL client. Oracle ML notebook provides a better user experience though.
In the first step, let’s fetch data from the table, to make sure we can access DB schema and see the data:
Next, create DB tables for test and train datasets, you can fetch data directly into these tables from the main table:
Before running the training and building model, it is recommended to clean up previously trained model, if any:
Here we train the model, using a decision tree algorithm. Point to the table with training data and run PL/SQL function — CREATE_MODEL, the model will be built behind the scenes for you:
Model metadata is stored in generated DB tables:
Add the prediction column to the test table, this is where prediction results will be stored:
Run prediction function with the trained model, results will be stored in the prediction column:
We can calculate prediction accuracy, by comparing the predicted value with actual value in test dataset:
Confusion matrix — this helps to understand model quality quickly:
Oracle Machine Learning SQL notebook and data are available on my GitHub repo.
#machine-learning #sql