In this article, you will learn what quantization is, why we need it, and the different types of quantization, and then build a deep learning model with quantization-aware training in TensorFlow.
Quantization is the process of transforming a deep learning model's parameters (weights, activations, and biases) from a higher-precision floating-point representation to a lower-bit representation.
Quantizing weights, biases, and activations from float32 to uint8
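To make the float32-to-uint8 mapping concrete, here is a minimal NumPy sketch of affine (asymmetric) quantization, the scale-and-zero-point scheme commonly used for 8-bit quantization. The function names are illustrative, not part of the TensorFlow API:

```python
import numpy as np

def quantize(x, num_bits=8):
    # Map float32 values onto the uint8 range 0..255 using a scale and
    # zero point derived from the tensor's min/max (illustrative helper).
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float32 values from the 8-bit representation.
    return scale * (q.astype(np.float32) - zero_point)

weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize(weights)
recovered = dequantize(q, s, z)  # close to the originals, with small rounding error
print(q, recovered)
```

The round trip is lossy: each value can be off by up to half the scale, which is the source of the small accuracy trade-off discussed below.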
Quantization helps with model compression and reduced latency.

Model size can be compressed by a factor of 4: if your TF Core deep learning model is 40 MB, it can be reduced to about 10 MB. A smaller model is lightweight, reduces the amount of computation, and requires less memory, resulting in lower latency.
Quantized models
Quantized models have reduced size and improved latency; however, there is a slight trade-off in accuracy.
You can currently apply two types of quantization to your deep learning models: post-training quantization and quantization-aware training.
#machine-learning #tensorflow #quantization #edge-computing #deep-learning