Introduction
The focus of this paper is to propose a seamless transition from a Shared Memory Model to a Distributed Memory Model when developing machine learning models. But let's first discuss why this transition is so important and why it should be made in the first place. Most of us are only familiar with the Shared Memory Model. This is the model where you only have to deal with one machine, and you can pass variables freely among computing threads because all the cores in that machine have access to the same memory, hence the name "shared memory". Now consider the Distributed Memory Model (DMM). In the distributed paradigm you have to be aware that the computing threads live on different machines, you often need to know their network addresses, and you have to know how to move data among them. You have to consider data serialization, machine-native numerical formats, and so forth. So the DMM is clearly more complex, and one can pose the question: why go there if it is so much harder?
The simple answer is that, although the Shared Memory Model is much easier for developers, it comes with limitations. You have the advantage of the single-machine abstraction, but you are also limited to the number of cores and the amount of memory of a single machine. What if your model grows larger than the RAM available in a single machine? What if your model is compute-bound and requires hundreds or thousands of cores? This is when you enter the realm of scalable machine learning, and this is when you need to endure the complexity of the Distributed Memory Model in order to reap the benefits of virtually unlimited scalability.
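To make this concrete, here is a minimal sketch of how TensorFlow 2.x can hide much of that distributed-memory complexity behind its tf.distribute strategies. The toy model and data below are illustrative assumptions, not the example developed later in this paper: the point is only that the same training code runs on a single shared-memory machine with MirroredStrategy and on a multi-machine cluster with MultiWorkerMirroredStrategy, where worker addresses come from the TF_CONFIG environment variable rather than from the code itself.

```python
import numpy as np
import tensorflow as tf

# Shared memory: one machine, all local devices mirror the same variables.
strategy = tf.distribute.MirroredStrategy()

# Distributed memory: swap in the multi-worker strategy and the same code
# runs across machines; their network addresses are read from TF_CONFIG,
# not hard-coded here.
# strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are placed and replicated
    # by the chosen strategy, so the training code stays unchanged.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Synthetic data (an assumption for this sketch) just to make it runnable.
x = np.random.rand(256, 20).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, batch_size=32, epochs=1)
```

The design point is that the strategy object, not the model code, owns the decision of where variables live and how gradients are aggregated, which is what makes the shared-to-distributed transition feel seamless.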
