High Performance Deep Learning, Part 1

Machine learning is used in countless applications today. It is a natural fit in domains where no single hand-crafted algorithm works well and the system must predict the right output on large amounts of unseen data. Unlike traditional algorithmic problems, where we expect exact optimal answers, machine learning applications can tolerate approximate answers. Deep learning with neural networks has been the dominant methodology for training machine learning models over the past decade. Its rise to prominence is often attributed to the ImageNet [1] competition in 2012. That year, a University of Toronto team submitted a deep convolutional network (AlexNet [2], named after its lead developer, Alex Krizhevsky) that achieved a top-5 error rate of 15.3% versus 26.2% for the runner-up, roughly 41% better in relative terms.

Deep and convolutional networks had been tried before this but never quite delivered on their promise. Convolutional layers were first proposed by LeCun et al. in the 1990s [3], and a variety of neural network architectures had been proposed throughout the 1980s and 1990s. So why did it take so long for deep networks to outperform hand-tuned, feature-engineered models?

What was different this time around was the combination of several factors:

  1. Compute: AlexNet was one of the earlier models to rely on Graphics Processing Units (GPUs) for training.
  2. Algorithms: A critical change was the adoption of the ReLU activation function, which lets gradients back-propagate through many more layers. Earlier deep networks used sigmoid or tanh activations, which saturate (sigmoid near 0.0 and 1.0, tanh near -1.0 and 1.0) everywhere except a small range of inputs. In the saturated regime, changing the input yields only a vanishingly small gradient, and with a large number of layers the gradient essentially vanishes by the time it reaches the early layers (see the sketch after this list).
  3. Data: ImageNet provides more than 1 million labeled images across 1,000 classes. With the advent of internet-based products, collecting labeled data from user actions also became much cheaper.
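
To make the vanishing-gradient point in item 2 concrete, here is a minimal Python sketch (an illustration assumed for this article, not taken from it; the network depth and pre-activation values are arbitrary choices). By the chain rule, the gradient reaching the first layer scales with the product of per-layer activation derivatives; that product collapses toward zero for sigmoid but stays at 1 for ReLU on positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, and tiny outside a narrow input band

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for any positive pre-activation

rng = np.random.default_rng(0)
depth = 30
# Hypothetical positive pre-activations, one per layer, chosen for illustration.
pre_acts = rng.uniform(0.5, 3.0, size=depth)

# Product of per-layer derivatives, as the chain rule computes during
# back-propagation: sigmoid's shrinks exponentially with depth, ReLU's does not.
print(f"sigmoid: product of {depth} derivatives ~ {np.prod(sigmoid_grad(pre_acts)):.3e}")
print(f"relu:    product of {depth} derivatives ~ {np.prod(relu_grad(pre_acts)):.3e}")
```

Running this prints a sigmoid product on the order of 1e-20 or smaller against exactly 1.0 for ReLU, which is why stacking many sigmoid layers starves the early layers of gradient signal.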

Rapid Growth of Deep Learning Models

Efficient Deep Learning

References

