In 2019, the Cruise machine learning platform team worked with the Google CMLE team to enable distributed TensorFlow model training with Horovod. We will present that work and what we learned about training performance analysis, fault tolerance, monitoring, and cost.

Subscribe to the channel https://www.youtube.com/watch?v=I29_VZ82KW4

#tensorflow #ai

Distributed TensorFlow model training on Cloud AI Platform