Cloud processing is now simpler and cheaper!

A *very simple* and *cheap* way to run/distribute your *existing* processing/training code on the cloud

It happened to me, and I’m sure it’s happening to you and to many other data scientists working on small or medium-sized projects:

You’ve invested a lot in your own training pipeline (pre-processing -> training -> testing), tried it locally a few times with different parameters, and it seems to work great. But… then you realize you need much more RAM, CPU, GPU, or GPU memory, or all of them together, to get the most out of it.

It can happen for many reasons:

  • The training takes too much time with your local setup
  • You need the batch size to be larger, and it can’t fit in your local GPU memory
  • You’d like to tune the hyperparameters, so many training runs are required
  • You’d like to move some of the preprocessing steps into training, e.g. to save disk space / loading time, and your local CPU / RAM can’t keep up
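To get a feel for how quickly the hyperparameter case adds up, here is a minimal sketch that counts the training runs in a small grid search. The search space, its values, and the 30 minutes per run are all made-up assumptions for illustration, not numbers from a real project.

```python
from itertools import product

# Hypothetical search space; the names and values are illustrative only.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64, 128],
    "dropout": [0.0, 0.25, 0.5, 0.75],
}


def grid(space):
    """Return one dict per hyperparameter combination in the grid."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


runs = grid(search_space)
minutes_per_run = 30  # assumed duration of a single local training run
total_hours = len(runs) * minutes_per_run / 60

print(f"{len(runs)} runs, about {total_hours:.0f} hours if run one after another")
# prints "36 runs, about 18 hours if run one after another"
```

Even this tiny grid means 36 separate runs, which is exactly the kind of workload you’d rather spread over several machines than run one by one on a laptop.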

So, _theoretically_, you have everything you need; you just need to run it on better hardware… Should be a non-issue today, shouldn’t it?

Existing solutions

Well, there are indeed many solutions out there. Here are a few related technologies / platforms / solutions:


  1. Apache Airflow: “a platform … to programmatically author, schedule and monitor workflows”
  2. Ray: “fast and simple distributed computing”

Cloud providers’ AI solutions

  1. Kubeflow: “the machine learning toolkit for Kubernetes” (pipelines)
  2. GCP AI Platform: “one platform to build, deploy, and manage machine learning models” (training, pipelines, distributed PyTorch, distributed TensorFlow)
  3. Azure Machine Learning: “enterprise-grade machine learning service to build and deploy models faster” (training)
  4. AWS SageMaker: “Machine learning for every developer and data scientist” (training, distributed PyTorch, distributed TensorFlow)

