Differentiable Programming from Scratch

Last year I had the great opportunity to attend a talk with Yann Lecun, at the Facebook Artificial Intelligence Research Centre in Paris.

As a math enthusiast, what struck me during his talk is its rethinking of Deep Learning as Differentiable Programming. Yann Lecun has detailed his thoughts in a Facebook post:

An increasingly large number of people are defining the networks procedurally in a data-dependent way (with loops and conditionals), allowing them to change dynamically as a function of the input data fed to them. It’s really very much like a regular progam, except it’s parameterized, automatically differentiated, and trainable/optimizable.

_- _Yann Lecun, Director of the FAIR

The overall goal of Differentiable Programming is to compute:

Image for post

Why do we need to be able to differentiate a program?

i.e to quantify the sensibility of the program and its outputs with respect to some of its parameters.

In this article, we are going to explain what Differentiable Programming is by developing from scratch all the tools needed for this exciting new kind of programming.

The big picture

Let’s start with a simple toy example, showing how we’d write a program to estimate the duration of a taxi ride using traditional, data-independent code:

import numpy as np

	def linear_predictions(weights, inputs):
	    ## y = weights[0] inputs[0] + weights[1] * inputs[1]
	    ## where inputs[0] = 1.0
	    return np.dot(inputs, weights) * 60.0 ## minutes

	v_avg = 30 ## km/h
	startup_time = 2 /60.0 ## hours

	inputs = np.array([[1.0, 6.0],
	                   [1.0, 4.0 ]])

	weights = np.array([startup_time, 1.0 / v_avg]) ## Program params are estimaed by experience, stats analysis, Least Square Error, ...

	print("Predictions:", linear_predictions(weights, inputs))

In this code, we compute the taxi trip duration using precomputed average speed in NYC: around 30km/h. This is the legacy way to make a program, i.e. data does not influence its parameters.

We use predefined parameters, here the average speed estimated by an expert, multiply the inverse of this speed by the trip distance, and we get the expected travel duration.

No matter how many times we run it, it will never improve. It will never learn from its error.

What Differentiable Programming offers is exactly the opposite: each run can be used to fine-tune the application parameters. Let’s see how this is accomplished.

#mathematics #optimization #math #data-science #differentiable

towardsdatascience.com

Differentiable Programming from Scratch