Abstract

Google DeepMind achieved human-level performance on 49 Atari games using the Arcade Learning Environment (ALE). This article describes the methods I used to reproduce this performance and discusses the effectiveness of the mechanisms DeepMind and OpenAI use to inject stochasticity into the ALE.

Introduction

As a side project, I spent some time trying to reproduce DeepMind's human-level performance on Breakout and Space Invaders. Although I understand there are many better-performing architectures, my goal was to use the same network as the one presented in DeepMind's 2015 Nature paper, "Human-level control through deep reinforcement learning". I did this to better understand the challenges DeepMind faced while playing some iconic games from my childhood. One of those challenges is particularly interesting to me: are these environments stochastic or deterministic? Did DeepMind and OpenAI fight deterministic waves of Space Invaders? In this article, I discuss the effectiveness of the mechanisms DeepMind and OpenAI use to inject stochasticity into the ALE.
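To make these two mechanisms concrete, here is a minimal sketch of both: DeepMind-style random no-op starts (up to 30 no-op frames at the beginning of each episode, so the agent never sees exactly the same initial state) and OpenAI-style sticky actions (the emulator repeats the previous action with some probability, 0.25 by default in Gym's v0 Atari environments). The environment id "SpaceInvaders-v0", the constants, and the classic Gym reset/step API are assumptions for illustration; this is not the exact code from my repository.

```python
import random
import gym

NOOP_ACTION = 0    # in the ALE, action 0 is NOOP (assumption: standard action set)
MAX_NOOPS = 30     # DeepMind's reported maximum number of random no-ops
STICKY_PROB = 0.25 # OpenAI Gym's default repeat_action_probability for v0 envs

def reset_with_noops(env):
    """DeepMind-style start: play a random number of no-op frames
    so each episode begins from a slightly different state."""
    obs = env.reset()
    for _ in range(random.randint(1, MAX_NOOPS)):
        obs, _, done, _ = env.step(NOOP_ACTION)
        if done:
            obs = env.reset()
    return obs

class StickyActions:
    """OpenAI-style sticky actions: with probability p the emulator
    repeats the previous action instead of the one the agent chose."""
    def __init__(self, env, p=STICKY_PROB):
        self.env = env
        self.p = p
        self.last_action = NOOP_ACTION

    def step(self, action):
        if random.random() < self.p:
            action = self.last_action
        self.last_action = action
        return self.env.step(action)

env = StickyActions(gym.make("SpaceInvaders-v0"))
obs = reset_with_noops(env.env)
obs, reward, done, info = env.step(1)
```

Note the difference in character: no-op starts only randomize the initial state and leave the rest of the episode deterministic, while sticky actions keep perturbing the trajectory throughout the episode.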

The source code can be accessed at https://github.com/NicMaq/Reinforcement-Learning

This repository contains the code I used to support my conclusions, as well as the data from my TensorBoard runs, to encourage discussion and facilitate comparisons.

Additionally, for readers who want to learn how my algorithm works, I published "Breakout explained" and "e-greedy and softmax explained", two Google Colab notebooks where I explain Expected SARSA and the implementation of the two policies, e-greedy and softmax.
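For readers who prefer code to prose, here are minimal sketches (not the notebooks' exact code) of the two action-selection policies and of the Expected SARSA target they feed into. The epsilon, tau, and gamma values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def e_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a uniformly random action,
    otherwise the greedy (highest-value) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_policy(q_values, tau=1.0):
    """Sample an action with probability proportional to exp(Q/tau);
    tau (the temperature) controls exploration: higher = more random."""
    prefs = np.asarray(q_values, dtype=float) / tau
    prefs -= prefs.max()  # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

def expected_sarsa_target(reward, q_next, gamma=0.99, epsilon=0.1):
    """Expected SARSA backs up the expectation of Q over the policy's
    action probabilities (here, e-greedy probabilities), rather than
    the max (Q-learning) or a single sampled action (SARSA)."""
    n = len(q_next)
    probs = np.full(n, epsilon / n)
    probs[np.argmax(q_next)] += 1.0 - epsilon
    return reward + gamma * np.dot(probs, q_next)

q = [0.1, 0.5, 0.2]
print(e_greedy(q), softmax_policy(q, tau=0.5))
print(expected_sarsa_target(reward=1.0, q_next=np.array(q)))
```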

