Table-Based Q-Learning in Under 1KB
Q-learning is an algorithm in which an agent interacts with its environment and collects rewards for taking desirable actions.
The simplest implementation of Q-learning is referred to as tabular or table-based Q-learning. There are plenty of articles and tutorials on the web that describe Q-learning, so I won’t go into excruciating detail here. Instead, I want to show how efficiently table-based Q-learning can be done using tinymind. In this article, I will describe how tinymind implements Q-learning using C++ templates and fixed-point (Q-format) numbers, and walk through the example in the repo.
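To make the fixed-point idea concrete, here is a minimal sketch of a single Q-learning update done entirely in integer arithmetic. It assumes a Q16.16 format (16 integer bits, 16 fractional bits) stored in an `int32_t`. The names `fixed_t`, `fromInt`, `multiply`, and `updateQ` are hypothetical and chosen for illustration; this is not tinymind's template API, only the standard update rule written with Q-format numbers.

```cpp
#include <cstdint>

// Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fractional bits.
// Illustration only; tinymind's own Q-format templates are not shown here.
using fixed_t = int32_t;
constexpr int FRACTIONAL_BITS = 16;
constexpr fixed_t ONE = fixed_t(1) << FRACTIONAL_BITS;  // 1.0 in Q16.16

// Convert a small integer to fixed-point (no overflow checking in this sketch).
constexpr fixed_t fromInt(int32_t value) { return value * ONE; }

// Multiply two Q16.16 values. The product is widened to 64 bits and shifted
// back down. Right-shifting a negative value is assumed to be arithmetic,
// which holds on mainstream compilers and is guaranteed from C++20 on.
constexpr fixed_t multiply(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>((static_cast<int64_t>(a) * b) >> FRACTIONAL_BITS);
}

// Standard tabular Q-learning update:
//   Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max Q(s',a') - Q(s,a))
constexpr fixed_t updateQ(fixed_t oldQ, fixed_t reward, fixed_t maxNextQ,
                          fixed_t alpha, fixed_t gamma) {
    return oldQ + multiply(alpha, (reward + multiply(gamma, maxNextQ)) - oldQ);
}
```

No floating-point unit is needed for any of this, which is the point of using Q-format numbers on small targets.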
A common table-based Q-learning problem is to train a virtual mouse to find its way out of a maze to get the cheese (reward). Tinymind contains an example program which demonstrates how the Q-learning template library works.
In the example program, we define the maze:
/*
Q-Learning unit test. Learn the best path out of a simple maze.
5 == Outside the maze
________________________________________________
|                       |                       |
|                       |                       |
|           0           |          1            /  5
|                       |                       |
|____________/  ________|__/  __________________|_______________________
|                       |                       |                       |
|                       |                       /                       |
|           4           |          3            |           2           |
|                       /                       |                       |
|__/  __________________|_______________________|_______________________|
    5
The paths out of the maze:
0->4->5
0->4->3->1->5
1->5
1->3->4->5
2->3->1->5
2->3->4->5
3->1->5
3->4->5
4->5
4->3->1->5
*/
We define all of our types in a common header so that we can separate the maze learner code from the training and file management code. I have done this so that we can measure the amount of code and data required for the Q-learner alone. The common header defines the maze as well as the types required to hold states and actions:
#include <cstdint>

// 6 rooms and 6 actions
#define NUMBER_OF_STATES 6
#define NUMBER_OF_ACTIONS 6
typedef uint8_t state_t;
typedef uint8_t action_t;
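One way to encode the maze itself is as a reward table indexed by the current room and the room the mouse tries to move to. The sketch below is an assumption about how such a table could look, built from the doors in the maze drawing. The constants `INVALID_MOVE`, `NO_REWARD`, and `CHEESE`, their values, and the helper `isValidMove` are hypothetical names for illustration, not taken from the tinymind header.

```cpp
#include <cstdint>

#define NUMBER_OF_STATES 6
#define NUMBER_OF_ACTIONS 6
typedef uint8_t state_t;
typedef uint8_t action_t;

// Hypothetical reward values for this sketch.
constexpr int INVALID_MOVE = -1;   // there is a wall between the two rooms
constexpr int NO_REWARD    = 0;    // there is a door, but no cheese
constexpr int CHEESE       = 100;  // the move ends outside the maze (room 5)

// rewards[from][to]: an action is "move to room `to`".
// Doors in the maze: 0-4, 1-3, 1-5, 2-3, 3-4, 4-5, plus staying put at 5.
constexpr int rewards[NUMBER_OF_STATES][NUMBER_OF_ACTIONS] = {
    /* from 0 */ {INVALID_MOVE, INVALID_MOVE, INVALID_MOVE, INVALID_MOVE, NO_REWARD,    INVALID_MOVE},
    /* from 1 */ {INVALID_MOVE, INVALID_MOVE, INVALID_MOVE, NO_REWARD,    INVALID_MOVE, CHEESE},
    /* from 2 */ {INVALID_MOVE, INVALID_MOVE, INVALID_MOVE, NO_REWARD,    INVALID_MOVE, INVALID_MOVE},
    /* from 3 */ {INVALID_MOVE, NO_REWARD,    NO_REWARD,    INVALID_MOVE, NO_REWARD,    INVALID_MOVE},
    /* from 4 */ {NO_REWARD,    INVALID_MOVE, INVALID_MOVE, NO_REWARD,    INVALID_MOVE, CHEESE},
    /* from 5 */ {INVALID_MOVE, NO_REWARD,    INVALID_MOVE, INVALID_MOVE, NO_REWARD,    CHEESE},
};

// True when there is a door from room `from` to room `to`.
constexpr bool isValidMove(state_t from, action_t to) {
    return rewards[from][to] != INVALID_MOVE;
}
```

Because both states and actions fit in a `uint8_t` and the table is `constexpr`, the maze description can live in read-only memory.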
We train the mouse by dropping it into a randomly selected room (or on the outside of the maze, where the cheese is). The mouse starts off by taking a random action from a list of available actions at each step. The mouse receives a reward only when he finds the cheese (i.e. makes it to position 5 outside the maze). If the mouse is dropped into position 5, he has to learn to stay there and not wander back into the maze.
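The training procedure described above can be sketched as a short loop. This is a simplified illustration, not the tinymind example program: it uses `double` instead of fixed-point numbers for readability, an assumed reward table (-1 for a wall, 0 for a door, 100 for reaching the cheese), and assumed values for the learning rate and discount factor. The names `qTable`, `bestAction`, and `train` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

#define NUMBER_OF_STATES 6
#define NUMBER_OF_ACTIONS 6
typedef uint8_t state_t;
typedef uint8_t action_t;

constexpr state_t GOAL_STATE = 5;  // outside the maze, where the cheese is

// Assumed reward table: -1 = wall, 0 = door, 100 = move that reaches the cheese.
static const int rewards[NUMBER_OF_STATES][NUMBER_OF_ACTIONS] = {
    {-1, -1, -1, -1,  0,  -1},
    {-1, -1, -1,  0, -1, 100},
    {-1, -1, -1,  0, -1,  -1},
    {-1,  0,  0, -1,  0,  -1},
    { 0, -1, -1,  0, -1, 100},
    {-1,  0, -1, -1,  0, 100},
};

static double qTable[NUMBER_OF_STATES][NUMBER_OF_ACTIONS] = {};

// Valid action with the highest learned Q-value for this state.
action_t bestAction(state_t state) {
    action_t best = 0;
    bool found = false;
    for (action_t a = 0; a < NUMBER_OF_ACTIONS; ++a) {
        if (rewards[state][a] < 0) continue;  // skip walls
        if (!found || qTable[state][a] > qTable[state][best]) {
            best = a;
            found = true;
        }
    }
    return best;
}

// Drop the mouse in a random room and let it wander randomly until it
// reaches the cheese, updating the Q-table after every step.
void train(unsigned episodes, unsigned seed) {
    const double alpha = 0.5;  // learning rate (assumed value)
    const double gamma = 0.8;  // discount factor (assumed value)
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> anyRoom(0, NUMBER_OF_STATES - 1);

    for (unsigned episode = 0; episode < episodes; ++episode) {
        state_t state = static_cast<state_t>(anyRoom(rng));
        bool done = false;
        while (!done) {
            // Collect the actions available from the current room.
            action_t valid[NUMBER_OF_ACTIONS];
            std::size_t count = 0;
            for (action_t a = 0; a < NUMBER_OF_ACTIONS; ++a) {
                if (rewards[state][a] >= 0) valid[count++] = a;
            }
            std::uniform_int_distribution<std::size_t> pick(0, count - 1);
            const action_t action = valid[pick(rng)];
            const state_t next = action;  // an action is "move to that room"

            const double target =
                rewards[state][action] + gamma * qTable[next][bestAction(next)];
            qTable[state][action] += alpha * (target - qTable[state][action]);

            state = next;
            done = (state == GOAL_STATE);
        }
    }
}
```

After a call such as `train(5000, 42)`, `bestAction(4)` should return 5 (head straight for the cheese) and `bestAction(5)` should return 5 (stay outside), since staying put keeps collecting the reward while stepping back into the maze does not.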