I recently watched AlphaGo - The Movie, a documentary about DeepMind’s AlphaGo. AlphaGo is an AI that plays the game of Go, and the documentary tells the story leading up to its match against Lee Sedol. When IBM’s Deep Blue defeated the chess grandmaster Garry Kasparov in 1997, Go players around the world posited that the same would not be possible with Go. Nearly 20 years later, in 2016, Google’s AI beat one of the best Go players the world has ever seen. Some of the strategies it employed are considered very creative and are being studied by Go experts. I found all of this fascinating.

While I would not be able to build an AI of that caliber, I wanted to explore reinforcement learning. I decided on the game Snake (a much simpler game!) and it didn’t take very long to get some pretty great results. My code is shared in full on GitHub.

What is Q-Learning?

Quality Learning, or Q-learning, is similar to training a dog. My dog was a puppy when we first brought her home. She didn’t know any tricks. She didn’t know not to bite our shoes. And most importantly, she wasn’t potty trained. But she loved treats, which gave us a way to incentivize her. Every time she sat on command or shook her paw, we gave her a treat. If she bit our shoes… well, nothing really, she just didn’t get a treat. Over time, she even learned to press down on our feet when she needed to be let outside to use the washroom.

Q-learning is a reinforcement learning method that teaches a learning agent how to perform a task by rewarding good behavior and punishing bad behavior. In Snake, for example, moving closer to the food is good. Going off the screen is bad. At each point in the game, the agent will choose the action with the highest expected reward.
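To make the idea of "choosing the action with the highest expected reward" a bit more concrete, here is a minimal sketch of tabular Q-learning. It is not the code from my repository; the class name `QTable`, the string state labels, and the values of `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate) are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical action set for Snake; the real game may encode moves differently.
ACTIONS = ["up", "down", "left", "right"]


class QTable:
    """Illustrative tabular Q-learning agent (names and defaults are assumptions)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha = alpha      # learning rate: how far each update moves the estimate
        self.gamma = gamma      # discount factor: how much future reward matters
        self.epsilon = epsilon  # probability of trying a random action
        # Every unseen state starts with an estimate of 0.0 for each action.
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def choose_action(self, state):
        # Occasionally explore; otherwise pick the action with the highest estimate.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        values = self.q[state]
        return max(ACTIONS, key=lambda a: values[a])

    def update(self, state, action, reward, next_state, done):
        # Standard Q-learning update:
        # Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = 0.0 if done else max(self.q[next_state].values())
        target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a game loop, you would call `choose_action` with the current state, apply the move, and then call `update` with the reward the game hands back (for example, positive for moving closer to the food and negative for going off the screen). The exact reward values and state encoding are design choices that this sketch leaves open.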

So what does that look like?

At first, the snake doesn’t know how to eat the food and moves less “purposefully”. It also tends to die a lot by turning directly opposite to the way it’s currently going and immediately hitting its tail. But it doesn’t take very long for the agent to learn how to play the game. After fewer than 30 games, it plays quite well.

[Image: the agent during training]

[Image: the agent after 100 games]

Let’s take a look at how we got there.

Game Engine

In this post, we will focus more on the learning agent than on the game. That said, we still need a game engine. I found this tutorial by Edureka, which provides a great introduction to Pygame using Snake as an example. I made some minor changes to the code that allow it to interact with the learning agent, and it was ready to go. Thank you, Edureka!

towardsdatascience.com

Teaching a computer how to play Snake with Q-Learning