Policy Gradient REINFORCE Algorithm with Baseline

Policy gradient methods are a popular family of reinforcement learning (RL) algorithms. They model the policy directly rather than deriving it from a value function, and they work in both discrete and continuous action spaces. In this article, we will:

give a short overview of the underlying math of policy gradient;
implement the policy gradient REINFORCE algorithm in TensorFlow to play CartPole;
compare Policy Gradient and Deep Q-Network (DQN).
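
To make the "with baseline" part of the title concrete before the math, here is a minimal, framework-free sketch of the quantity REINFORCE uses to weight each action's log-probability gradient: the discounted return minus a baseline. The function names, the discount factor, and the choice of the episode-mean return as the baseline are illustrative assumptions, not necessarily what the TensorFlow implementation in this article uses; a learned state-value function is the other common choice.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages_with_mean_baseline(returns):
    """Subtract the episode-mean return (an assumed, very simple baseline)."""
    baseline = sum(returns) / len(returns)
    return [g - baseline for g in returns]


# Hypothetical three-step episode with a reward of 1 per step, as in CartPole.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
advantages = advantages_with_mean_baseline(returns)
print(returns)
print(advantages)
```

Actions taken early in the episode end up with positive weights and late ones with negative weights, which is the signal the policy update then scales its gradients by. Subtracting a baseline leaves the relative ordering of the returns unchanged and typically reduces the variance of the gradient estimate.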

I assume readers have an understanding of reinforcement learning basics. For a refresher, take a quick look at the first section of my previous post, A Structural Overview of Reinforcement Learning Algorithms.

I have also previously implemented a Deep Q-Network (DQN) in TensorFlow to play CartPole. Check it out here if you are interested. :)

#reinforcement-learning #artificial-intelligence #policy-gradient #tensorflow

towardsdatascience.com
