Machine Learning (ML) and Artificial Intelligence (AI) algorithms increasingly power our modern society, leaving their mark on everything from finance to healthcare to transportation. If the latter half of the 20th century was about general progress in computing and connectivity (internet infrastructure), the 21st century is shaping up to be dominated by intelligent computing and a race toward smarter machines.

Most of the discussion and awareness around these novel computing paradigms, however, centers on so-called ‘supervised learning,’ in which Deep Learning (DL) occupies a central position. The recent advances and astounding successes of Deep Neural Networks (DNNs) – from disease classification to image segmentation to speech recognition – have led to much excitement about, and application of, DNNs in all facets of high-tech systems.

DNN systems, however, need a lot of training data (labelled samples for which the answer is already known) to work properly, and they do not exactly mimic the way human beings learn and apply their intelligence. Almost all AI experts agree that simply scaling up the size and speed of DNN-based systems will never lead to true “human-like” AI systems or anything even close to it.

Consequently, there is a great deal of research interest in ML/AI paradigms and algorithms that go beyond the realm of supervised learning and try to follow the human learning process more closely. Reinforcement Learning (RL) is the most widely researched and exciting of these.

In this article, we briefly discuss how modern DL and RL can be enmeshed together in a field called Deep Reinforcement Learning (DRL) to produce powerful AI systems.

What is Deep Reinforcement Learning?

What is Reinforcement Learning?

Humans excel at solving a wide variety of challenging problems, from low-level motor control (e.g., walking, running, playing tennis) to high-level cognitive tasks (e.g., doing mathematics, writing poetry, conversation).

Reinforcement learning aims to enable a software/hardware agent to mimic this human behavior through well-defined, well-designed computing algorithms. The goal of such a learning paradigm is not to map labelled examples in a simple input/output functional manner (as a standalone DL system does) but to build a strategy that helps the intelligent agent take a sequence of actions toward fulfilling some ultimate goal.

More formally, RL refers to goal-oriented algorithms that learn how to attain a complex objective (goal), or how to maximize some measure of performance, over many steps. The following examples illustrate their use:

  • A board-game player maximizing its probability of winning
  • A trading agent in a financial simulation maximizing its gain per transaction
  • A robot moving through a complex environment while minimizing errors in its movements

The idea is that the agent receives input from the environment through sensor data, processes it using RL algorithms, and then takes an action towards satisfying the predetermined goal. This is very similar to how we humans behave in our daily life.
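
To make this sense-act loop concrete, here is a minimal Python sketch. The RandomAgent class and the environment interface (reset/step returning the next state, a reward, and a done flag) are hypothetical placeholders, loosely modelled on the common Gym-style convention; a real RL agent would replace the random choice with a learned policy.

import random

class RandomAgent:
    """A placeholder agent that simply picks an action at random."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, state):
        # A learning agent would consult its policy here instead of chance.
        return random.choice(self.actions)

def run_episode(env, agent, max_steps=100):
    """One episode of the sense-act loop: observe, act, receive reward, repeat."""
    state = env.reset()                          # initial sensor reading
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward                   # accumulate feedback
        if done:                                 # goal reached or episode over
            break
    return total_reward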

Some Essential Definitions in Deep Reinforcement Learning

It is useful, for the forthcoming discussion, to have a better understanding of some key terms used in RL.

Agent: A software/hardware mechanism that takes actions depending on its interaction with the surrounding environment; for example, a drone making a delivery, or Super Mario navigating a video game. The algorithm is the agent.

Action: An action is any one of the possible moves the agent can make. An action is almost self-explanatory, but it should be noted that agents usually choose from a list of discrete possible actions.

Environment: The world through which the agent moves, and which responds to the agent. The environment takes the agent’s current state and action as input, and returns as output the agent’s reward and its next state.
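
As a toy sketch of this interface, an environment can be written as a class whose step method consumes the agent's action and returns the next state, the reward, and a flag indicating whether the episode has ended. The corridor task below is invented purely for illustration and plugs straight into the loop sketched earlier.

class CorridorEnv:
    """A made-up 1-D corridor: the agent starts at position 0 and must reach the far end."""
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (step left) or +1 (step right)
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = (self.state == self.length - 1)
        reward = 1.0 if done else 0.0        # reward only at the goal state
        return self.state, reward, done

Running the earlier loop with RandomAgent([-1, +1]) and CorridorEnv() will usually reach the goal within the step limit and collect the reward of 1.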

State: A state is a concrete and immediate situation in which the agent finds itself, i.e., a specific place and moment, an instantaneous configuration that puts the agent in relation to other significant things. An example is a particular configuration of a chessboard.

Reward: A reward is the feedback by which we measure the success or failure of an agent’s actions in a given state. For example, in a game of chess, important actions such as capturing the opponent’s bishop can bring some reward, while winning the game may bring a big reward. Negative rewards are defined in a similar sense, e.g., losing the game.

Discount factor: The discount factor is a multiplier. Future rewards, as discovered by the agent, are multiplied by this factor in order to dampen their cumulative effect on the agent’s current choice of action. This is at the heart of RL: gradually reducing the value of future rewards so that rewards received sooner carry more weight than those received later. This is critically important for a paradigm that works on the principle of ‘delayed reward.’
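
As a small illustrative sketch (the reward sequences and the value of gamma are arbitrary), discounting simply multiplies a reward received t steps in the future by gamma raised to the power t before summing everything into a single return:

def discounted_return(rewards, gamma=0.9):
    """Sum r_0 + gamma*r_1 + gamma^2*r_2 + ... for a sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# The same reward of 1 counts for less the further it lies in the future:
print(discounted_return([1, 0, 0, 0]))   # 1.0
print(discounted_return([0, 0, 0, 1]))   # 0.9**3, approximately 0.729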

Policy: The policy is the strategy that the agent employs to determine the next action based on the current state. It maps states to the actions that promise the highest reward.

Value: The expected long-term, discounted return, as opposed to the short-term reward. The value of a state is defined as the expected long-term return obtained from that state under a particular policy.

Q-value or action-value: The Q-value is similar to the value, except that it takes an extra parameter: the current action. It refers to the expected long-term return of taking a specific action from the current state and then following a specific policy.
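
The following sketch ties the last three definitions together. The Q-table, its state names, and its numbers are invented for illustration; in practice these values would be learned rather than written by hand.

# Hypothetical, hand-filled Q-values: the expected long-term return of
# taking an action in a state (and acting sensibly afterwards).
Q = {
    ("s0", "left"): 0.1, ("s0", "right"): 0.7,
    ("s1", "left"): 0.4, ("s1", "right"): 0.2,
}
ACTIONS = ("left", "right")

def greedy_policy(state):
    """Policy: map a state to the action with the highest Q-value."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def state_value(state):
    """Value of a state under the greedy policy: its best Q-value."""
    return max(Q[(state, a)] for a in ACTIONS)

print(greedy_policy("s0"))   # 'right'
print(state_value("s0"))     # 0.7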

