This is the second post devoted to **Deep Q-Network (DQN)** in the Deep Reinforcement Learning Explained series, in which we will analyse some challenges that appear when we apply Deep Learning to Reinforcement Learning. We will also present in detail the code that solves the OpenAI Gym Pong game using the DQN network introduced in the previous post.

Challenges in Deep Reinforcement Learning

Unfortunately, reinforcement learning becomes more unstable when neural networks are used to represent the action-values, despite applying the wrappers introduced in the previous section. Training such a network requires a lot of data, and even then it is not guaranteed to converge on the optimal value function. In fact, the network weights can oscillate or even diverge due to the high correlation between successive states and actions.

In order to solve this, in this section we will introduce two techniques used by the Deep Q-Network:

  • Experience Replay
  • Target Network

There are many more tips and tricks that researchers have discovered to make DQN training more stable and efficient, and we will cover the best of them in future posts in this series.

Experience Replay

We are trying to approximate a complex, nonlinear function, Q(s, a), with a Neural Network. To do this, we calculate targets using the Bellman equation and then treat the problem as a supervised learning one. However, one of the fundamental requirements for SGD optimization is that the training data be independent and identically distributed, whereas the sequence of experience tuples generated as the Agent interacts with the Environment can be highly correlated. A naive Q-learning algorithm that learns from each of these experience tuples in sequential order runs the risk of being swayed by the effects of this correlation.
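To make this concrete, below is a minimal sketch of an experience replay buffer in Python. The class name `ReplayBuffer`, the `Experience` tuple layout, and the fixed `capacity` are illustrative assumptions rather than the exact code of the Pong agent presented later: the idea is simply to store the Agent's transitions and to sample training batches uniformly at random, which breaks the temporal correlation between consecutive samples.

```python
import collections
import random

import numpy as np

# Illustrative transition record: (state, action, reward, done, next_state).
Experience = collections.namedtuple(
    "Experience", ["state", "action", "reward", "done", "next_state"]
)


class ReplayBuffer:
    """Fixed-size buffer of past experiences (a sketch, not the post's exact code)."""

    def __init__(self, capacity):
        # Old experiences are discarded automatically once capacity is reached.
        self.buffer = collections.deque(maxlen=capacity)

    def __len__(self):
        return len(self.buffer)

    def append(self, experience):
        # Called once per environment step with the latest transition.
        self.buffer.append(experience)

    def sample(self, batch_size):
        # Uniform random sampling decorrelates the training batch from the
        # temporal order in which the Agent experienced the transitions.
        indices = random.sample(range(len(self.buffer)), batch_size)
        states, actions, rewards, dones, next_states = zip(
            *(self.buffer[idx] for idx in indices)
        )
        return (
            np.array(states),
            np.array(actions),
            np.array(rewards, dtype=np.float32),
            np.array(dones, dtype=np.uint8),
            np.array(next_states),
        )
```

In a typical training loop, every step appends one `Experience` to the buffer, and once the buffer holds enough transitions, a random batch is sampled to compute the Bellman targets and perform one SGD update.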

