After more than two months without publishing, I'm back! Now I want to share with you my latest experiences studying Reinforcement Learning and solving some problems.
The first algorithm for any newbie in Reinforcement Learning is usually Q-Learning. Why? Because it's a very simple algorithm, easy to understand and powerful for many problems!
In this post, we'll build together an agent to play the Taxi-v3 game from OpenAI Gym using just numpy and a few lines of code. After this article, you'll be able to apply Q-Learning to solve other problems in different environments.
But first, we need to understand: what is Reinforcement Learning?
The image above summarizes the core idea of Reinforcement Learning, where we have:
The main goal of the Agent is to take actions that maximize its future reward. So the flow is:
Our Agent has two ways to make a decision in a given situation: Exploration and Exploitation. In Exploration, our Agent takes random actions; this is useful for learning about the environment. In Exploitation, our Agent takes actions based on what it already knows.
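This trade-off is commonly implemented as an epsilon-greedy policy. A minimal sketch (the function name and the `epsilon` parameter are my own choices for illustration, not from this post):

```python
import numpy as np

def choose_action(q_table, state, epsilon, n_actions):
    """Epsilon-greedy policy: with probability epsilon take a random
    action (Exploration), otherwise take the best known action
    (Exploitation)."""
    if np.random.random() < epsilon:
        return int(np.random.randint(n_actions))  # Exploration: random action
    return int(np.argmax(q_table[state]))         # Exploitation: best known action
```

With `epsilon = 1.0` the agent acts completely at random; with `epsilon = 0.0` it always trusts its table.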
In the amazing video below, you can see Reinforcement Learning in practice, with four agents playing hide and seek. Don't forget to check it out!
Now you know what Reinforcement Learning is and why it's such an amazing field of Artificial Intelligence!
Let’s see how Q-Learning works.
Like I said before, Q-Learning is a very simple algorithm to understand and highly recommended for beginners in Reinforcement Learning, because it's powerful and can be implemented in a few lines of code.
Basically, in Q-Learning we create a table of states and actions, called the Q-Table. This table helps our agent take the best action for the moment. The table looks like this:
But in the beginning, we initialize every value in this table to 0. The idea is to let the agent explore the environment by taking random actions, and then use the rewards received from those actions to populate the table: this is Exploration.
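Initializing the table is one line with numpy. Taxi-v3 has 500 discrete states and 6 actions (a sketch with the sizes hard-coded; in practice you would read them from `env.observation_space.n` and `env.action_space.n`):

```python
import numpy as np

n_states, n_actions = 500, 6               # Taxi-v3: 500 states, 6 actions
q_table = np.zeros((n_states, n_actions))  # every Q-value starts at 0
```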
After that, we start Exploitation, where the agent uses the table to take the actions that maximize its future reward. But even during Exploitation the Q-Table keeps changing with the states: a good action in one state won't necessarily be a good action in another state.
To decide which action maximizes the future reward, we use the formula below:
After that, our agent receives a reward from the environment, which can be negative or positive, and we use the formula below to update our Q-Table:
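This is the classic Q-Learning update rule, Q(s,a) ← Q(s,a) + α · (r + γ · max Q(s',a') − Q(s,a)). A minimal sketch (the function name and the alpha/gamma defaults are my own choices):

```python
import numpy as np

def update_q(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.6):
    """Move Q(s, a) toward the target r + gamma * max_a' Q(s', a')."""
    best_next = np.max(q_table[next_state])  # max_a' Q(s', a')
    td_error = reward + gamma * best_next - q_table[state, action]
    q_table[state, action] += alpha * td_error
```

Here `alpha` (the learning rate) controls how fast old estimates are overwritten, and `gamma` (the discount factor) controls how much the agent cares about future rewards.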
This is how the Q-Learning algorithm works. Remember this flow:
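Putting the whole flow together, here is a sketch of a full training loop. To keep it self-contained, I train on a tiny hand-rolled corridor environment with a gym-like reset/step API (the `TinyCorridor` class is purely illustrative and not part of Gym); for the real thing you would swap it for `gym.make("Taxi-v3")`:

```python
import numpy as np

class TinyCorridor:
    """Illustrative stand-in environment (NOT part of Gym): 5 states in a
    row, action 0 = left, action 1 = right, reward +1 for reaching the
    rightmost state, which ends the episode."""
    n_states, n_actions = 5, 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(self.n_states - 1, max(0, self.state + move))
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done

env = TinyCorridor()
q_table = np.zeros((env.n_states, env.n_actions))
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(0)

for episode in range(500):
    state = env.reset()
    epsilon = max(0.1, 1.0 - episode / 100)  # explore a lot, then exploit
    for _ in range(100):                     # cap episode length
        # 1. Choose an action (epsilon-greedy)
        if rng.random() < epsilon:
            action = int(rng.integers(env.n_actions))
        else:
            action = int(np.argmax(q_table[state]))
        # 2. Act and observe the reward and the next state
        next_state, reward, done = env.step(action)
        # 3. Q-Learning update (no bootstrapping from terminal states)
        target = reward + gamma * np.max(q_table[next_state]) * (not done)
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state
        if done:
            break
```

After training, the greedy policy (`np.argmax` over each row of the table) moves right in every non-terminal state, which is exactly the optimal behavior for this toy environment.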