Concepts and relations explored by mathematicians purely for their own sake often turn out, decades later, to be the unexpected solutions to problems they never imagined. Riemann's geometry, developed with absolutely no application in mind, was later used by Einstein to describe the fabric of space-time in general relativity.

In Reinforcement Learning (RL), an agent seeks an optimal policy for a sequential decision-making problem. The common approach to reinforcement learning models the expectation of the return, i.e. the value. Recent advances under the banner of "Distributional RL", however, focus on the distribution of the random return received by the agent: the state-action value is treated explicitly as a random variable Z whose expectation is the value Q.
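As a sketch in standard MDP notation (assuming the usual infinite-horizon discounted setting, with discount factor γ and per-step rewards R_t), the relationship between the random return and the value can be written as:

```latex
Z^{\pi}(s,a) \;=\; \sum_{t=0}^{\infty} \gamma^{t} R_t,
\quad \text{with } s_0 = s,\; a_0 = a,\; a_t \sim \pi(\cdot \mid s_t),
\qquad
Q^{\pi}(s,a) \;=\; \mathbb{E}\!\left[ Z^{\pi}(s,a) \right].
```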

Eq1: Normal Bellman operator B
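The equation image itself did not survive extraction; as a sketch, the operator in its optimality form (the one used in Q-learning) reads:

```latex
(\mathcal{B}Q)(s,a) \;=\; \mathbb{E}\big[R(s,a)\big] \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[\max_{a'} Q(s',a')\Big]
```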

The normal Bellman operator (Eq. 1) plays a crucial role in approximating the Q values: TD-learning iteratively minimises the squared (L2) distance between Q and BQ.
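A minimal tabular sketch of that idea (illustrative only; the array shapes and hyper-parameters below are assumptions, not part of the original post): each update is a stochastic-gradient step on the squared distance between Q(s, a) and a sample of (BQ)(s, a).

```python
import numpy as np

def td_update(Q, s, a, r, s_next, gamma=0.99, lr=0.1, done=False):
    """One TD/Q-learning step: Q(s,a) <- Q(s,a) + lr * (target - Q(s,a))."""
    target = r if done else r + gamma * np.max(Q[s_next])  # sample of (B Q)(s, a)
    td_error = target - Q[s, a]                            # gradient of 0.5 * (target - Q(s,a))^2
    Q[s, a] += lr * td_error
    return Q

# Usage (n_states and n_actions are hypothetical):
# Q = np.zeros((n_states, n_actions))
# Q = td_update(Q, s, a, r, s_next)
```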

Eq2: Distributional Bellman operator 𝒯π
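Again the image is missing; in the notation of the distributional RL literature, the operator is defined through an equality in distribution:

```latex
(\mathcal{T}^{\pi} Z)(s,a) \;\overset{D}{=}\; R(s,a) \;+\; \gamma\, Z(S', A'),
\qquad S' \sim P(\cdot \mid s,a),\; A' \sim \pi(\cdot \mid S')
```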

Similarly, the distributional Bellman operator 𝒯π approximates the Z values by iteratively minimising the distance between Z and 𝒯πZ.

But Z and 𝒯πZ are not vectors; they are distributions. So how does one calculate the distance between two different probability distributions? There are many candidate answers (the KL divergence, among other metrics), but here we are particularly interested in the Wasserstein distance.
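As an illustrative sketch (the sample distributions below are made up for the example), the 1-Wasserstein distance between two one-dimensional empirical distributions, say samples of Z(s, a) and of 𝒯πZ(s, a), reduces to the mean absolute difference between their sorted samples; SciPy's general-purpose routine gives the same answer.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def w1_empirical(x, y):
    """1-Wasserstein distance between two equal-size empirical 1-D distributions."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    assert len(x) == len(y), "this shortcut assumes equally many samples"
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(0)
z  = rng.normal(loc=0.0, scale=1.0, size=10_000)   # pretend samples of Z(s, a)
tz = rng.normal(loc=0.5, scale=1.0, size=10_000)   # pretend samples of (T^pi Z)(s, a)

print(w1_empirical(z, tz))            # ~0.5: the distributions differ only by a shift
print(wasserstein_distance(z, tz))    # SciPy computes the same 1-D optimal-transport cost
```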

