This blog post elaborates on how to implement a reinforcement learning algorithm that not only masters the game “Snake”, but even outperforms a human opponent when two players share one playing field.

The structure of the blog post is as follows: First, the theory of a specific type of reinforcement learning is discussed, namely Q-tables. This approach is then benchmarked against a pure deep learning approach to show the superiority of reinforcement learning for this kind of problem. Second, we explore how to place a human-controlled snake next to the machine-controlled snake on the same playing field.

Reinforcement Learning: Basics of Q-Tables

For the Snake to play the game on its own, it has to answer the same question at every step: what should it do given conditions such as the position of its body, the position of the snack, and the position of the walls, which end the game if the Snake touches them? The information used to describe the situation, or state, of the Snake is referred to as “state variables”. To illustrate the problem in a different context, consider a visual example. Imagine a fictional character who is dressed in one of the following four ways:

The way the character is dressed should be taken as a given and represents the state variables. The fictional character now has four possible destinations to go to next. The destinations look like this:

The question at hand is: at which of these four places would the fictional character experience the most pleasure, given its clothes? Answering this question might seem intuitive. Perhaps the experience of wearing clothing that is too warm (or even winter clothing) in a hot environment has already been made in the past, and the lesson from the resulting discomfort has been learned. In other words, this combination of state variables and action has already been evaluated: in the situation of winter clothes and the beach, the outcome was most definitely negative. Evaluating an action given the state variables, based on what we have experienced and learned from in our own past, is exactly the point of reinforcement learning. Using past experiences to indicate whether a certain action is a good or a bad decision is precisely how Q-tables work. It is crucial to note that such an evaluation is only possible because the combination of wearing winter clothes (state variables) and going to the beach (action) has already been experienced.
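To make the idea of a Q-table concrete, here is a minimal sketch in Python. The state names, action names, and reward values are purely illustrative, following the clothing analogy rather than the Snake implementation discussed later; the update function is the standard Q-learning rule applied to a simple dictionary of state-action values.

```python
import random
from collections import defaultdict

# Hypothetical states and actions from the clothing analogy.
# In the Snake game, states would instead encode body, snack, and wall positions.
states = ["winter_clothes", "rain_gear", "swimwear", "business_suit"]
actions = ["beach", "ski_slope", "office", "cinema"]

# Q-table: expected quality of taking an action in a given state, initialised to 0.
q_table = defaultdict(float)

def choose_action(state, epsilon=0.1):
    """Occasionally pick a random action (exploration), otherwise the best known one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Q-learning update: move the old estimate towards reward + discounted future value."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])

# Example: wearing winter clothes and going to the beach is punished with a negative reward.
# (The next_state here is arbitrary, since the analogy has no real episode structure.)
update("winter_clothes", "beach", reward=-1.0, next_state="winter_clothes")
print(q_table[("winter_clothes", "beach")])  # now slightly negative
```

After enough of these evaluated experiences, the table itself encodes which action is best in each state: the agent simply looks up the row for its current state and picks the action with the highest value.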
