Distributional Reinforcement Learning

Motivation

Value-based reinforcement learning methods such as DQN model the expected total return, also called the value.

That is, the value of an action a in state s is the expected return, or discounted sum of rewards, obtained by starting in state s, taking action a, and following a given policy thereafter.
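In symbols, the value can be written as follows. The notation is standard but assumed here: π is the policy, γ ∈ [0, 1) the discount factor, P the transition distribution and R(s_t, a_t) the reward received at step t.

```latex
Q^{\pi}(s, a) = \mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \;\middle|\; s_0 = s,\; a_0 = a,\; s_{t+1} \sim P(\cdot \mid s_t, a_t),\; a_{t+1} \sim \pi(\cdot \mid s_{t+1}) \right]
```

Every quantity inside the expectation (next states, actions, rewards) may be random. The expectation collapses all of that randomness into a single number.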

The state transitions, actions, and rewards that make up the long-term return can all be random when they are sampled stochastically. The return is therefore a random variable, and it is useful to describe it by its full distribution, called the value distribution: the distribution of the random return received by a reinforcement learning agent.
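To make the value distribution concrete, here is a minimal sketch on a hypothetical toy episode that does not come from the article: three steps, where each step independently pays a reward of 1 with probability 0.3 and 0 otherwise, with discount factor 0.9. Enumerating every reward sequence gives the exact distribution of the discounted return, while the value is only its mean. The names `return_distribution`, `NUM_STEPS`, `P_REWARD` and `GAMMA` are illustrative choices.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy episode (assumption, not from the article): each of
# NUM_STEPS steps independently pays reward 1 with probability P_REWARD, else 0.
NUM_STEPS = 3
P_REWARD = 0.3
GAMMA = 0.9


def return_distribution(num_steps, p, gamma):
    """Exact distribution of the discounted return G = sum_t gamma^t * r_t.

    Returns a dict mapping each possible return value to its probability.
    """
    dist = defaultdict(float)
    for rewards in product((0, 1), repeat=num_steps):
        g = sum(gamma ** t * r for t, r in enumerate(rewards))
        prob = 1.0
        for r in rewards:
            prob *= p if r == 1 else 1 - p
        # Round so that float noise does not split one return value into two keys.
        dist[round(g, 10)] += prob
    return dict(dist)


dist = return_distribution(NUM_STEPS, P_REWARD, GAMMA)
# The single number a mean-based method such as DQN would keep.
value = sum(g * prob for g, prob in dist.items())

for g in sorted(dist):
    print(f"G = {g:.3f}  P = {dist[g]:.4f}")
print(f"mean (value) = {value:.4f}")
```

The episode has eight possible returns, ranging from 0 (no reward at all, probability 0.7³ = 0.343) to 2.71 (reward at every step). The value 0.813 is just the average of these outcomes.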

However, traditional value-based algorithms such as DQN average over this randomness and estimate only the value.
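What averaging throws away shows up clearly with two hypothetical actions (an illustration, not an example from the article) whose return distributions have the same mean but very different shapes. The sketch below writes each distribution as a `{return: probability}` dict and compares the mean with two properties that the mean cannot express.

```python
# Two hypothetical actions, each described by its return distribution
# as {return value: probability}.
safe = {1.0: 1.0}             # always returns 1
risky = {0.0: 0.5, 2.0: 0.5}  # returns 0 or 2 with equal probability


def mean(dist):
    """Expected return: the only quantity a mean-based method keeps."""
    return sum(g * p for g, p in dist.items())


def variance(dist):
    """Spread of the return around its mean."""
    m = mean(dist)
    return sum(p * (g - m) ** 2 for g, p in dist.items())


def prob_below(dist, threshold):
    """Probability that the return is strictly below `threshold`."""
    return sum(p for g, p in dist.items() if g < threshold)


for name, dist in [("safe", safe), ("risky", risky)]:
    print(f"{name}: mean={mean(dist):.2f}  var={variance(dist):.2f}  "
          f"P(G < 0.5)={prob_below(dist, 0.5):.2f}")
```

Both actions have value 1.0, so a method that keeps only the mean cannot tell them apart, even though the risky action returns nothing half of the time (variance 1.0 versus 0.0).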

Distributional reinforcement learning methods model the distribution over returns explicitly instead of estimating only its mean. The full distribution carries more information than a single expected value, for example about how widely the possible outcomes spread. Empirical results show that modeling the value distribution is useful in its own right, leading to faster and more stable learning.
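The difference can be stated compactly. In the standard notation assumed here, Z^π(s, a) is the random return whose expectation is the value Q^π(s, a), R(s, a) is the (random) immediate reward, P the transition distribution, π the policy, γ the discount factor, and the D above the equals sign means equality in distribution. Mean-based methods build on the Bellman equation for the expectation, while distributional methods build on the distributional Bellman equation, which relates whole random variables:

```latex
Q^{\pi}(s, a) = \mathbb{E}\left[R(s, a)\right] + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\; a' \sim \pi(\cdot \mid s')}\left[ Q^{\pi}(s', a') \right]

Z^{\pi}(s, a) \stackrel{D}{=} R(s, a) + \gamma \, Z^{\pi}(S', A'), \qquad S' \sim P(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S')

Q^{\pi}(s, a) = \mathbb{E}\left[ Z^{\pi}(s, a) \right]
```

Taking the expectation of both sides of the second equation recovers the first, so the value is still available from the learned distribution.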

Since the first distributional RL algorithms were developed, the field has made steady progress, producing a series of increasingly capable algorithms.


Figure: Progress and evolution of distributional RL algorithms over time

In this article and an upcoming one, I will cover four distributional RL algorithms: C51, QR-DQN, IQN, and FQF. Alongside the articles, I will provide my own from-scratch implementation of each algorithm. These implementations are referenced in the text and can also be found on my GitHub.

medium.com