Introduction

I’m going to try to keep this article simple. Let’s start with some pros and cons of this method.


Photo by Artem Sapegin on Unsplash

PROS:

  • Lower-cost training! You can do your day-to-day coding on a very cheap instance and launch a separate GPU instance only when you kick off a training job; that GPU instance shuts down automatically once your training code finishes. You’re only paying for GPU time while the model is actually training, not while you’re coding. (A minimal sketch of launching such a job is shown right after this list.)

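Here’s a minimal sketch of what that looks like with the SageMaker Python SDK (v2-style argument names). The entry point `train.py`, the S3 path, the instance type, and the framework/Python versions are placeholders you would swap for your own:

```python
# Launch a script-mode TensorFlow training job from a cheap coding instance.
import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()  # IAM role the training job runs under

estimator = TensorFlow(
    entry_point="train.py",          # your local TensorFlow training script
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",   # GPU instance spun up only for this job
    framework_version="2.4",         # match the TF version your script targets
    py_version="py37",
)

# Starts the GPU instance, runs train.py against the data channel below,
# then terminates the instance, so you're billed for GPU time only while
# the job is actually running.
estimator.fit({"training": "s3://your-bucket/path/to/training-data"})
```

The key point is that the `fit()` call provisions the GPU instance, runs your script, and tears the instance down for you; your cheap coding instance just orchestrates the job.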
CONS:

  • Everything is a trade-off, and a new advantage rarely comes without a new disadvantage. Here, the disadvantage is that you have to learn some new skills. Say you have a local training job you want to bring to SageMaker (maybe to run it on a bigger GPU), and you want to use this method to cut costs: you’ll need to write a bit more code to get it working, and you’ll need to understand how the pieces are wired together.

You could instead run your training in a notebook instance with fewer code changes, but then your job won’t auto-shutdown, since everything runs on the notebook instance itself. So it’s a trade-off. If you just want to run your job in a notebook instance, see the link in step #1 coming up.
