Cloud processing is now simpler and cheaper!

A *very simple* and *cheap* way to run/distribute your *existing* processing/training code on the cloud

It happened to me, and I’m sure it’s happening to many other data scientists working on small/medium-sized projects out there:

You’ve invested a lot in your own training pipeline (pre-processing -> training -> testing), tried it locally a few times with different parameters, and it seems great. But… then you realize you need much more RAM, CPU, GPU power, or GPU memory (or all of them together) to get the most out of it.

It can happen for many reasons —

  • The training takes too much time on your local setup
  • You need a larger batch size, and it can’t fit in your local GPU memory
  • You’d like to tune the hyperparameters, which requires many training runs
  • You’d like to move some preprocessing steps into training time, e.g. to save disk space / loading time, and the CPU/RAM can’t keep up

So, *theoretically*, you have everything you need; you just need to run it on better hardware… That should be a non-issue today, shouldn’t it?
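
To make this concrete, below is a minimal sketch of the kind of *existing* local pipeline this post has in mind (the names and the toy model are made up for illustration):

```python
# A toy pre-process -> train pipeline: runs fine locally,
# until the data, batch size, or model outgrows the machine.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def preprocess():
    # Stand-in for your real pre-processing step.
    x, y = torch.randn(1024, 16), torch.randn(1024, 1)
    return TensorDataset(x, y)

def train(dataset, epochs=3, batch_size=64, lr=1e-3):
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for xb, yb in DataLoader(dataset, batch_size=batch_size, shuffle=True):
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    return model

if __name__ == "__main__":
    train(preprocess())
```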

Existing solutions

Well, there are indeed many solutions out there. Here are a few related technologies and platforms:

General

  1. Apache Airflow — “a platform … to programmatically author, schedule and monitor workflows”
  2. Ray — “fast and simple distributed computing”
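
To give a feel for the general-purpose route, here’s a minimal Ray sketch (using Ray’s public task API; the `square` function is just a placeholder for a real, heavier workload) that fans work out across workers:

```python
import ray

ray.init()  # starts a local Ray runtime; pass an address to join a remote cluster

@ray.remote
def square(x):
    # Placeholder for your real, heavier task.
    return x * x

# Launch the tasks in parallel and collect the results.
print(ray.get([square.remote(i) for i in range(8)]))  # [0, 1, 4, 9, 16, 25, 36, 49]
```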

Cloud providers’ AI solutions

  1. Kubeflow — “the machine learning toolkit for Kubernetes” (pipelines)
  2. GCP AI Platform — “one platform to build, deploy, and manage machine learning models” (training, pipelines, distributed PyTorch, distributed TensorFlow)
  3. Azure Machine Learning — “enterprise-grade machine learning service to build and deploy models faster” (training)
  4. AWS SageMaker — “machine learning for every developer and data scientist” (training, distributed PyTorch, distributed TensorFlow; see the sketch below)
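
And here is roughly what the cloud-provider route looks like with the SageMaker Python SDK’s PyTorch estimator (a sketch; the role ARN, script name, S3 path, and instance choices are placeholders you’d swap for your own):

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",  # your existing training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    framework_version="1.6.0",
    py_version="py3",
    instance_count=2,  # distribute the job across two instances
    instance_type="ml.p3.2xlarge",
    hyperparameters={"epochs": 10, "batch-size": 256},
)

# Uploads your code, provisions the instances, runs the job, and tears it all down.
estimator.fit({"training": "s3://my-bucket/my-dataset"})
```

Note that even on this managed path, your `train.py` is still expected to be distribution-ready, e.g. to initialize its process group and shard the data across workers.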


