Informal Introduction

Recall your childhood, when your curiosity was at its peak! Your inquisitiveness would lead you to explore lands where no child had gone before. Often, these expeditions ended in misadventure: your wrongdoings were caught and the consequences were unfortunate, like not being allowed to watch your favourite cartoon or to undertake the noble responsibility of saving Mario’s princess!

Those punishments were not meant to curb your free thinking but to set a ‘precedent’ about what is wrong, so that the ‘action’ would not be repeated.

Growing up, in the lower grades, I didn’t care much about why I was taught the American Revolution or Shakespeare’s works. What I did care about was that if I got good grades, I would get a new video game for my PC. The result? I stayed near the top of my class throughout my school life (obviously, in the hope of new video games).

To summarize, I was more likely to repeat my ‘action’ if I got a ‘reward’ for that.

The examples above might be the simplest, though certainly not the most accurate, way to represent how ‘reinforcement learning’ works.

[Image: An example of reinforcement learning in the real world.]

Formal Introduction

An ‘agent’ can perform any action from its ‘action space’ in a given ‘environment’, and it is ‘rewarded’ when an action meets some criterion. The agent’s goal is to choose actions that maximize the reward it collects.

This is the basic principle of reinforcement learning.
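
As a minimal sketch of that principle (not a method taken from any particular library), the snippet below lets an agent try each action once and then mostly repeat whichever action has earned the highest average reward, with occasional random exploration. This scheme is commonly called epsilon-greedy. The two actions, the reward rule and the names `environment` and `train` are assumptions made up for illustration.

```python
import random

# Hypothetical action space with two possible actions.
ACTIONS = ["left", "right"]


def environment(action):
    """Hypothetical environment: reward 1.0 if the action meets the criterion."""
    return 1.0 if action == "right" else 0.0


def train(steps=200, epsilon=0.1, seed=0):
    """Estimate the average reward of each action with epsilon-greedy choices."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # running average reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was taken
    for t in range(steps):
        if t < len(ACTIONS):
            action = ACTIONS[t]                    # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(ACTIONS)           # explore
        else:
            action = max(ACTIONS, key=value.get)   # exploit the best estimate
        reward = environment(action)
        count[action] += 1
        # Incremental update of the running average.
        value[action] += (reward - value[action]) / count[action]
    return value


values = train()
best_action = max(values, key=values.get)
print(values, best_action)
```

Because the rewarded action keeps the higher average, the agent ends up repeating it, much like the grades-for-video-games story above.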

Reinforcement learning is a subfield of machine learning, and hence of artificial intelligence, with applications in robotics, industrial automation, business and more.

Some commonly used terms in reinforcement learning

**Action Space:** The set of all the distinct actions an agent can take in an environment. For example, the mountain car in Gym’s MountainCar task can push left, push right, or do nothing.
**Agent:** The entity that performs an action.
**Observation Space:** An environment-specific object describing what the agent can observe of the environment. For example, the velocity of the mountain car or the coordinates of a robotic arm in space.
**Reward:** A value the environment returns to indicate how good the last action was.
**Policy:** The plan or strategy the agent uses to choose the next action.
**Episode:** One complete run of the task: the sequence of states, with the actions taken and rewards received, from the initial state to the final state.
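
To make these terms concrete, here is a small sketch that maps each of them onto a made-up task: an agent walking along a corridor of cells 0 to 4 and trying to reach the last cell. The corridor, the reward values and the function names are all assumptions chosen for illustration, not part of Gym or any other library.

```python
# Hypothetical toy task: reach cell 4 of a corridor, starting from cell 0.
ACTION_SPACE = (-1, +1)  # action space: step left or step right
GOAL = 4


def step(position, action):
    """Environment dynamics: return (next observation, reward, done)."""
    next_position = min(max(position + action, 0), GOAL)
    done = next_position == GOAL
    reward = 1.0 if done else -0.1  # small penalty per step, bonus at the goal
    return next_position, reward, done


def policy(observation):
    """Policy: map an observation to the next action (here: always step right)."""
    return +1


def run_episode(max_steps=20):
    """Episode: every (observation, action, reward) from start to final state."""
    observation = 0  # initial state
    episode = []
    for _ in range(max_steps):
        action = policy(observation)
        next_observation, reward, done = step(observation, action)
        episode.append((observation, action, reward))
        observation = next_observation
        if done:
            break
    return episode


episode = run_episode()
total_reward = sum(reward for _, _, reward in episode)
print(len(episode), total_reward)
```

Here the agent is simply the code that calls `policy`, and the observation is the current cell. A learning algorithm would replace the fixed `policy` with one that improves from the rewards it receives.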

Recent Advancements

Game-playing machines have come a long way since IBM’s Deep Blue beat Kasparov using search and handcrafted evaluation rather than learning. With DeepMind’s AlphaGo vs. Lee Sedol and Google’s Agile and Intelligent Locomotion, reinforcement learning has shown that it can perform intelligent tasks in complex environments.

Agile and Intelligent Locomotion via Deep Reinforcement Learning (ai.googleblog.com)

With the current COVID-19 situation, reinforcement learning could be a valuable tool in robotics and the medical field, for example for performing remote, non-contact surgeries and for disinfecting surfaces.

Let’s move on to setting up the system for working with MuJoCo and OpenAI Gym.

Reinforcement Learning with OpenAI Gym

OpenAI Gym is a great open-source tool for working with reinforcement learning algorithms. Before Gym existed, researchers lacked standard environments that they could use for developing and rapidly prototyping their algorithms.

The advent of Gym made reinforcement learning a more practical, easier-to-implement alternative to traditional machine learning methods.
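
As a rough sketch of what working with a standard environment looks like, the helper below runs one episode with random actions against any object that follows Gym’s `reset()` / `step()` / `action_space.sample()` interface. The names `run_random_episode` and `demo` are hypothetical. The code assumes that older Gym releases return four values from `step()` while newer ones (0.26 onwards) return five, and it handles both; `demo` additionally assumes Gym is installed and that the `MountainCar-v0` environment is available.

```python
def run_random_episode(env, max_steps=200):
    """Run one episode with random actions; return (steps taken, total reward)."""
    reset_out = env.reset()
    # Newer Gym releases return (observation, info); older ones only the observation.
    observation = reset_out[0] if isinstance(reset_out, tuple) else reset_out
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random action from the action space
        result = env.step(action)
        if len(result) == 5:
            # Newer API: separate "terminated" and "truncated" flags.
            observation, reward, terminated, truncated, _info = result
            done = terminated or truncated
        else:
            # Classic API: a single "done" flag.
            observation, reward, done, _info = result
        total_reward += reward
        steps += 1
        if done:
            break
    return steps, total_reward


def demo():
    """Assumes Gym is installed (for example via `pip install gym`)."""
    import gym

    env = gym.make("MountainCar-v0")
    print(env.action_space, env.observation_space)
    print(run_random_episode(env))
    env.close()
```

Calling `demo()` on a machine with Gym installed prints the environment’s action and observation spaces, followed by the length and total reward of one random episode. A random policy is only a baseline; a learning algorithm would choose `action` from the observation instead.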

Gym: A toolkit for developing and comparing re
