Machine learning is used in countless applications today. It is a natural fit in domains where no single hand-crafted algorithm works well and where a model must generalize to a large amount of unseen data. Unlike traditional algorithmic problems, where we expect exact optimal answers, machine learning applications can tolerate approximate ones. Deep learning with neural networks has been the dominant methodology for training machine learning models over the past decade. Its rise to prominence is often attributed to the ImageNet [1] competition in 2012. That year, a University of Toronto team submitted a deep convolutional network (AlexNet [2], named after its lead developer, Alex Krizhevsky) that reduced the error rate by roughly 41% relative to the next best submission.
Deep and convolutional networks had been tried before this but had never delivered on their promise. Convolutional layers were first proposed by LeCun et al. in the 90s [3], and a variety of neural network architectures had been explored since the 80s. Why did it take so long for deep networks to outperform hand-tuned, feature-engineered models?
What was different this time around was a combination of several factors: