Backpropagation made easy. Backpropagation is fundamental to machine learning, yet it seems daunting. In fact, it is easier than it looks.

It doesn't take a math genius to learn Machine Learning (ML). Basically, all you need is first-year college calculus, linear algebra, and probability theory, and you are good to go. But behind that seemingly benign first impression, there is a lot of mathematical theory underlying ML. For many people, the first real obstacle in learning ML is back-propagation (BP). It is the method we use to compute the gradient of the loss with respect to the parameters of a neural network (NN), and Gradient Descent needs that gradient at every step to train a model.

BP is a very basic step in any NN training. It involves nothing more than the chain rule and matrix multiplication. However, the way BP is introduced in many ML courses and tutorials is not satisfactory. When I was first learning BP in Coursera’s Machine Learning class, I was so confused about its calculation process that I paused for several months. Meanwhile, I searched for more explanations of BP. I managed to pass the course. I finished the coding assignment. But BP still remained a messy and confusing blur in my brain.

It doesn’t really hurt if you don’t understand BP at all and simply regard it as a black box, because TensorFlow or PyTorch can automatically perform BP for you. But recently I was reviewing my notes on ML, and I started to properly understand BP. My method is to set up a simple NN, write down every parameter and variable matrix/vector explicitly, and then write down the gradient calculation for each parameter matrix/vector step by step using the chain rule. At the end of the day, BP turns out to be so much easier than I originally thought.
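
To give a feel for that method, here is a minimal NumPy sketch. Everything in it is my own assumption for illustration rather than the network from the course: a two-layer NN with a tanh hidden layer, a mean squared-error loss, and hypothetical names such as `W1`, `b1`, `W2`, `b2`, `backward`, and `numerical_grad`. Each line in `backward` is one application of the chain rule, and the result is compared against a finite-difference estimate.

```python
import numpy as np

def forward(params, x):
    """Forward pass; x has shape (n_inputs, n_samples)."""
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1        # hidden pre-activation
    a1 = np.tanh(z1)        # hidden activation
    y_hat = W2 @ a1 + b2    # linear output layer
    return z1, a1, y_hat

def loss_fn(params, x, y):
    """Squared error, averaged over the samples."""
    _, _, y_hat = forward(params, x)
    return 0.5 * np.sum((y_hat - y) ** 2) / x.shape[1]

def backward(params, x, y):
    """Gradients of loss_fn w.r.t. each parameter, via the chain rule."""
    W1, b1, W2, b2 = params
    n = x.shape[1]
    z1, a1, y_hat = forward(params, x)
    d_yhat = (y_hat - y) / n                   # dL/dy_hat
    dW2 = d_yhat @ a1.T                        # dL/dW2
    db2 = d_yhat.sum(axis=1, keepdims=True)    # dL/db2
    d_a1 = W2.T @ d_yhat                       # dL/da1
    d_z1 = d_a1 * (1.0 - a1 ** 2)              # tanh'(z1) = 1 - tanh(z1)^2
    dW1 = d_z1 @ x.T                           # dL/dW1
    db1 = d_z1.sum(axis=1, keepdims=True)      # dL/db1
    return dW1, db1, dW2, db2

def numerical_grad(params, x, y, idx, eps=1e-6):
    """Central-difference estimate of dL/d(params[idx]), one entry at a time."""
    p = params[idx]
    grad = np.zeros_like(p)
    for i in np.ndindex(p.shape):
        old = p[i]
        p[i] = old + eps
        plus = loss_fn(params, x, y)
        p[i] = old - eps
        minus = loss_fn(params, x, y)
        p[i] = old                             # restore the original value
        grad[i] = (plus - minus) / (2 * eps)
    return grad

# Toy data: 3 inputs, 4 hidden units, 2 outputs, 5 samples.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
y = rng.normal(size=(2, 5))
params = [rng.normal(size=(4, 3)), np.zeros((4, 1)),
          rng.normal(size=(2, 4)), np.zeros((2, 1))]

analytic = backward(params, x, y)
max_err = max(np.max(np.abs(g - numerical_grad(params, x, y, i)))
              for i, g in enumerate(analytic))
print("largest gap between BP and finite differences:", max_err)
```

If the hand-derived gradients are right, the printed gap should be tiny, which is a handy sanity check whenever you derive BP by hand for a new network.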

