This blog post accompanies a talk I gave at AWS re:Invent 2020, in which I described some of the ways my team at Mobileye (officially known as Mobileye, an Intel Company) uses Amazon SageMaker Debugger in its daily DNN development.
Monitoring the Learning Process
A critical part of training machine learning models, and particularly deep neural networks (DNNs), is monitoring one’s learning process. (This is sometimes called babysitting one’s learning process.) Monitoring the learning process refers to the art of tracking different metrics during training in order to evaluate how the training is proceeding and to determine which hyper-parameters to tune in order to improve it.
On our team, we track a wide range of metrics, which can be broadly divided into three categories: