This is the first article in my MLflow tutorial series.
MLflow is an open-source platform for managing the machine learning lifecycle. Recently, I set up MLflow in production with a Postgres database as the backend store for the Tracking Server and SFTP for transferring artifacts over the network. It took me about two weeks to get all the components right, but this post should help you set up MLflow in a production environment in about 10 minutes.
The Tracking Server stores the metadata that you see in the MLflow UI. First, let's create a new Conda environment:
conda create -n mlflow_env
conda activate mlflow_env
Install Python into the environment, followed by the MLflow and pysftp libraries:
conda install python
pip install mlflow
pip install pysftp
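You can confirm that the installation succeeded by checking the version:
mlflow --version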
Our Tracking Server will use a Postgres database as the backend store for the metadata, so let's install PostgreSQL:
sudo apt-get install postgresql postgresql-contrib postgresql-server-dev-all
Next, we will create an admin user and a database for the Tracking Server:
sudo -u postgres psql
In the psql console:
CREATE DATABASE mlflow_db;
CREATE USER mlflow_user WITH ENCRYPTED PASSWORD 'mlflow';
GRANT ALL PRIVILEGES ON DATABASE mlflow_db TO mlflow_user;
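Type \q to leave the psql console. MLflow will later reach this database through a standard SQLAlchemy-style URI; assuming Postgres runs on the same host as the Tracking Server, the URI built from the credentials above looks like this:
postgresql://mlflow_user:mlflow@localhost:5432/mlflow_db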
As we'll need to interact with Postgres from Python, we have to install the psycopg2 library. To ensure a successful installation, however, we need to install the GCC Linux package first:
sudo apt install gcc
pip install psycopg2-binary
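To verify that Python can reach the database, you can run a quick sanity check; this snippet assumes Postgres is running locally with the credentials created above:
import psycopg2

# Connect with the credentials created in the psql console above
conn = psycopg2.connect(dbname="mlflow_db", user="mlflow_user",
                        password="mlflow", host="localhost")
print(conn.server_version)  # prints the server version as an integer, e.g. 120005
conn.close()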
If you would like to connect to the PostgreSQL server remotely, or to give other users access to it, you will need to edit its configuration files:
cd /var/lib/pgsql/data
Note that this path is distribution-specific; with the apt-get installation above, the files typically live under /etc/postgresql/<version>/main instead. Then add the following line at the end of the postgresql.conf file:
listen_addresses = '*'
You can then specify a remote IP from which connections to the PostgreSQL server are allowed by adding an entry at the end of the pg_hba.conf file.
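For example, the following entry allows a single client (the IP is a placeholder; substitute your own) to reach all databases with password authentication:
host    all    all    10.10.10.10/32    md5
After changing both files, restart PostgreSQL so the new settings take effect:
sudo service postgresql restart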
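With the backend store in place, the Tracking Server itself is started with the mlflow server command. Here is a minimal sketch, assuming the Postgres credentials above and a hypothetical SFTP user and artifact directory (adjust both to your setup):
mlflow server --backend-store-uri postgresql://mlflow_user:mlflow@localhost/mlflow_db --default-artifact-root sftp://mlflow_user@<server-ip>/home/mlflow_user/mlflow-artifacts --host 0.0.0.0 --port 5000
The --backend-store-uri option points MLflow at the Postgres database we just configured, while --default-artifact-root tells clients where to push artifacts over SFTP, which is why we installed pysftp earlier.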