After more than two months without publishing, I’m back! Now I want to share with you my latest experiences studying Reinforcement Learning and solving some problems.

The first algorithm for any newbie in Reinforcement Learning is usually Q-Learning. Why? Because it’s a very simple algorithm, easy to understand, and powerful enough for many problems!

In this post, we’ll build together an agent to play the Taxi-v3 game from OpenAI Gym using just numpy and a few lines of code. After this article, you’ll be able to apply Q-Learning to solve other problems in different environments.

But first, we need to understand: what is Reinforcement Learning?

A Short Summary of Reinforcement Learning

[Image: Reinforcement Learning overview diagram]

The image above summarizes the core idea of Reinforcement Learning, where we have:

  • Agent: Think of the agent as our model. It is responsible for making the magic happen, such as playing Pacman like a professional.
  • Environment: The Environment is where the magic happens. In this example, it will be the Taxi-v3 game.
  • Reward: The feedback given by the Environment to say whether the action taken by the Agent was good or bad. The reward can be positive or negative.
  • Action: The action taken by the Agent.
  • State: The current situation of the Agent in the Environment, such as low life, out of ammunition, or facing a wall.

The main goal of the Agent is to take actions that maximize its future reward. So the flow is:

  • Take an action;
  • Receive feedback from the environment;
  • Receive the new state;
  • Take a new action.
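
To make this flow concrete, here is a minimal sketch in plain Python. The `CorridorEnv` class is a hypothetical toy environment invented for this example (it is not Taxi-v3 and not part of Gym): the agent starts in cell 0 of a short corridor and has to reach the last cell. The agent here only takes random actions, so the point is just to see the loop of action, feedback and new state.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state is just the current cell index

    def step(self, action):
        # action 0 moves left, action 1 moves right (clamped to the corridor)
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 10 if done else -1  # small penalty per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, max_steps=1000, seed=0):
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = rng.choice([0, 1])             # take an action (random here)
        state, reward, done = env.step(action)  # receive the feedback and the new state
        total_reward += reward
        if done:
            break
    return state, total_reward


final_state, total_reward = run_episode(CorridorEnv())
print(final_state, total_reward)
```

The `reset`/`step` shape loosely mirrors the way Gym environments are usually driven, but the exact return values of a real Gym environment depend on the Gym version you have installed.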

Our Agent has two ways to make a decision in a given situation: Exploration and Exploitation. In Exploration, the Agent takes random actions, which is useful for learning about the environment. In Exploitation, the Agent takes actions based on what it already knows.
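
One common way to mix both behaviors is the epsilon-greedy rule. Epsilon is not mentioned above, so treat it as an assumption of this sketch: with probability `epsilon` the agent explores, otherwise it exploits. The `estimated_values` list and the `choose_action` helper are hypothetical names used only for illustration.

```python
import random


def choose_action(estimated_values, epsilon, rng=random):
    """Pick an action index given the agent's current value estimate for each action."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random
        return rng.randrange(len(estimated_values))
    # Exploitation: pick the action the agent currently believes is best
    return max(range(len(estimated_values)), key=lambda a: estimated_values[a])


# With epsilon=0.0 the agent always exploits, with epsilon=1.0 it always explores
print(choose_action([0.1, 0.9, 0.3], epsilon=0.0))
```

A typical setup starts with a high epsilon and lowers it over time, so the agent explores a lot at first and relies on what it learned later on.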

In the amazing video below, you can see Reinforcement Learning in practice, with 4 agents playing hide and seek. Don’t forget to check it out!

Now you know what Reinforcement Learning is and why it’s such an amazing field of Artificial Intelligence!

Let’s see how Q-Learning works.

Q-Learning Summary

As I said before, Q-Learning is a very easy algorithm to understand and highly recommended for beginners in Reinforcement Learning, because it’s powerful and can be implemented in a few lines of code.

Basically, in Q-Learning we create a table of states and actions, called the Q-Table. This table helps our agent choose the best action at each moment. The table looks like this:

[Image: Q-Table]

In the beginning, we initialize this table with 0 in all values. The idea is to let the agent explore the environment by taking random actions, and then use the rewards received from these actions to populate the table. This is Exploration.
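
As a small sketch of that starting point with numpy: Taxi-v3 has 500 discrete states and 6 actions, so the table has 500 rows and 6 columns. The sizes are hard-coded here to keep the snippet self-contained. With Gym you would normally read them from `env.observation_space.n` and `env.action_space.n` instead.

```python
import numpy as np

# Taxi-v3 has 500 discrete states and 6 actions
# (south, north, east, west, pickup, dropoff)
n_states, n_actions = 500, 6

# One row per state, one column per action, every value starting at 0
q_table = np.zeros((n_states, n_actions))

print(q_table.shape)
```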

After that, we start Exploitation, where the agent uses the table to take actions that maximize its future reward. But during Exploitation the Q-Table keeps changing, and its values depend on the state: a good action in one state will not necessarily be a good action in another.

To decide which action maximizes the future reward, we use the formula below:

[Image: action-selection formula]
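
The original formula is an image, so the version here is the usual greedy rule of Q-Learning and its notation may differ from the figure. In words: in the current state, pick the action with the highest value in the Q-Table. Here s_t is the current state, a is a candidate action, and Q(s_t, a) is the table entry for that pair.

```latex
a_t = \arg\max_{a} Q(s_t, a)
```

With a numpy Q-Table, this is the same as taking `np.argmax` over the row of the current state.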

After that, our agent receives a reward from the environment, which can be negative or positive. We then use the formula below to update our Q-Table:

[Image: Q-Table update formula]
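
The original formula is an image, so what follows is the standard Q-Learning update written out, under the assumption that the figure shows the same rule. The symbols α (learning rate) and γ (discount factor) are the conventional names and may not match the figure. r_{t+1} is the reward received after taking action a_t in state s_t, and s_{t+1} is the new state.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

In words: we nudge the old value toward the reward we just received plus the discounted value of the best action available in the new state.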

This is how the Q-Learning algorithm works. Remember this flow:

[Image: Q-Learning flow]
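
Putting the whole flow together, here is a minimal sketch of Q-Learning with numpy. To keep it runnable without installing Gym, it uses a hypothetical 5-cell corridor instead of Taxi-v3, and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are illustrative assumptions, not tuned values from this post. The choice between Exploration and Exploitation is made with the common epsilon-greedy rule, and the update is the standard Q-Learning rule.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2  # hypothetical corridor: cells 0..4, actions 0=left, 1=right
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    done = next_state == GOAL
    return next_state, (10.0 if done else -1.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q_table = np.zeros((N_STATES, N_ACTIONS))  # start with 0 in all values
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))    # Exploration
            else:
                action = int(np.argmax(q_table[state]))  # Exploitation
            next_state, reward, done = step(state, action)
            # Standard Q-Learning update: move Q(s, a) toward the reward
            # plus the discounted best value of the next state
            best_next = 0.0 if done else np.max(q_table[next_state])
            td_target = reward + gamma * best_next
            q_table[state, action] += alpha * (td_target - q_table[state, action])
            state = next_state
    return q_table


q_table = train()
greedy_policy = np.argmax(q_table[:GOAL], axis=1)  # best action for each non-goal cell
print(greedy_policy)
```

On this toy problem the learned table should end up preferring action 1 (right) in every non-goal cell. For Taxi-v3 the structure stays the same, only with a 500 x 6 table and the Gym environment in place of the `step` function.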


Reinforcement Learning: Using Q-Learning to Drive a Taxi!