One of the known truths of the Machine Learning (ML) world is that it takes far longer to deploy ML models to production than to develop them.¹
The problem of deploying ML models to production is well known. Modern software requires crucial properties such as on-demand scaling and high availability, so deploying models into production correctly can take considerable effort and time.
Let’s discuss the options you have when it comes to deploying ML models. They are presented in order from the most general to the most ML-specific.
The most direct way to deploy anything is to rent a VM, wrap the model in some kind of server, and leave it running. While extremely straightforward and customizable, this method has notable drawbacks, such as difficult integration into CI/CD pipelines and poor isolation.
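To make "wrap the model in some kind of server" concrete, here is a minimal sketch using only the Python standard library. The `predict` function is a hypothetical stand-in for a real trained model, and the port number is an arbitrary choice:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a real trained model:
# here it simply returns the mean of the input features.
def predict(features):
    return sum(features) / len(features)

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"features": [1.0, 2.0, 3.0]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = predict(payload["features"])

        # Send the prediction back as JSON.
        body = json.dumps({"prediction": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # On a rented VM you would just leave this process running.
    HTTPServer(("0.0.0.0", 8000), ModelHandler).serve_forever()
```

This is exactly the kind of hand-rolled setup that is easy to start with but hard to scale, monitor, and keep isolated.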
It is also possible to deploy ML models in Docker containers using Kubernetes or similar orchestration tools. This option brings many quality-of-life improvements. Models can be easily wrapped in purpose-built servers such as NVIDIA Triton or TensorFlow Serving (which work for the VM option as well), and they can even be chained together using sophisticated frameworks such as Kubeflow.
However, this customizability comes at the cost of DevOps complexity and the need to maintain the technologies that keep your model running.
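As a sketch of what the client side of such a purpose-built server looks like, the snippet below assembles a request for TensorFlow Serving's REST predict API, which expects a JSON body with an `"instances"` list. The host, port, and model name are placeholders:

```python
import json
import urllib.request

def make_predict_request(host, model_name, instances):
    """Build a POST request for a TensorFlow Serving REST endpoint.

    The URL scheme follows TF Serving's REST API:
    /v1/models/<model_name>:predict
    """
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

# "localhost:8501" and "my_model" are assumed values for illustration.
req = make_predict_request("localhost:8501", "my_model", [[1.0, 2.0, 3.0]])
```

Sending `req` with `urllib.request.urlopen` against a running TF Serving container would return the model's predictions as JSON; the point is that the serving protocol, batching, and GPU scheduling are handled by the server, not by your code.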
#devops #kubeflow #serverless #cloud