Tia Gottlieb

Deep Q-Network (DQN)-III

This is the third post devoted to **Deep Q-Network (DQN)** in the Deep Reinforcement Learning Explained series, in which we show how to use TensorBoard to obtain a graphical view of the model's performance. We also present a way to watch how our trained Agent is able to play Pong.

The entire code of this post can be found on GitHub (and can be run as a Google Colab notebook using this link).

Running the training loop

When we run the notebook of this example, we obtain this output in our console:

[Training console output screenshots omitted.]

Due to randomness in the training process, your actual dynamics will differ from what is displayed here.
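
For readers who want to reproduce these TensorBoard curves, here is a minimal sketch of how episode metrics can be logged during the training loop. This is an illustration, not the notebook's exact code; the helper name and the variables passed in are placeholders.

```python
# Minimal illustrative sketch (not the notebook's exact code) of logging
# training metrics so that TensorBoard can plot them. Assumes PyTorch's
# SummaryWriter; `frame_idx`, `episode_reward`, `mean_reward` and `epsilon`
# are placeholder names for values produced by the training loop.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(comment="-pong-dqn")

def log_progress(writer, frame_idx, episode_reward, mean_reward, epsilon):
    """Write one point per metric; TensorBoard draws the curves over frames."""
    writer.add_scalar("reward", episode_reward, frame_idx)
    writer.add_scalar("mean_reward_100", mean_reward, frame_idx)
    writer.add_scalar("epsilon", epsilon, frame_idx)

# Example call from inside the training loop:
# log_progress(writer, frame_idx, episode_reward, mean_reward, epsilon)
# After (or during) training, run `tensorboard --logdir runs` to see the plots.
```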

#artificial-intelligence #towards-data-science #deep-learning #reinforcement-learning #deep-r-l-explained #deep learning


Paper Summary: Playing Atari with Deep Reinforcement Learning

This paper presents a deep reinforcement learning model that learns control policies directly from high-dimensional sensory inputs (raw pixels/video data). The deep learning model, created by DeepMind, consisted of a CNN trained with a variant of Q-learning. The model learned to play seven Atari 2600 games, and the results showed that the algorithm outperformed all previous approaches on six of them and surpassed a human expert on three.

Introduction

The paper lists some of the challenges faced by Reinforcement Learning algorithms in comparison to other Deep Learning techniques. Unlike other DL algorithms that have a large amount of labeled training data, RL relies on scalar reward functions that are sparse and delayed. The reward for a particular action could be received after several thousand time-steps. This delay between input and output reward is in stark contrast to the direct association between input and target in supervised learning. In other DL approaches, the inputs are independent, whereas, in RL, the input is a sequence of highly correlated states. Also, in RL, the data distribution changes as the algorithm learns new behaviors.

This paper trains a CNN with a variant of the Q-learning algorithm to overcome the drawbacks mentioned above and uses stochastic gradient descent to update the weights. The problem of correlated data and non-stationary distributions is handled using an experience replay mechanism that randomly samples previous transitions. The approach is tested on the Atari game environment, which provides a training ground with high-dimensional visual input. The goal is to create a neural network learning agent whose only inputs are the raw video frames, the reward and terminal signals, and the set of valid actions. It is noteworthy that the same neural network architecture and hyperparameters were used for all the games.

1. Background

In the proposed task, an agent interacts with the environment Ɛ (the Atari emulator) by performing an action a_t from a set of valid actions A = {1, …, K}. Ɛ may be stochastic, as the game environment is unpredictable. The agent observes an image x_t ∈ R^d, a vector of raw pixels from the emulator, and receives a reward r_t depending on the action taken. The agent only sees the current screen x_t, which does not fully describe the current state. Hence, the paper defines the state s_t of the system at time t as the sequence of observations and actions s_t = x_1, a_1, x_2, a_2, …, a_{t−1}, x_t. This makes the machine learn in a more human-like manner. The goal is to maximize the future rewards, which are discounted by a factor γ per time-step.
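
Written out, the discounted return that the agent maximizes from time-step t is (with T the time-step at which the game terminates):

```latex
% Discounted future return from time-step t, with discount factor \gamma
% and the game terminating at time-step T.
R_t = \sum_{t'=t}^{T} \gamma^{\,t'-t} \, r_{t'}
```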

The optimal action-value function is defined as Q*(s, a) = max_π E[R_t | s_t = s, a_t = a, π], and it obeys the Bellman equation. The optimal value function could in principle be found by value iteration, but since that is impractical for large state spaces, the authors use a non-linear function approximator, a neural network, to estimate the optimal Q-value. The Q-network is trained by minimizing a sequence of loss functions L_i(θ_i) that changes at each iteration i.
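
In symbols, the Bellman equation obeyed by Q* and the per-iteration loss used to train the Q-network are:

```latex
% Bellman optimality equation for the action-value function
Q^{*}(s,a) \;=\; \mathbb{E}_{s' \sim \mathcal{E}}\!\left[\, r + \gamma \max_{a'} Q^{*}(s',a') \;\middle|\; s, a \right]

% Loss minimized at iteration i; the target y_i uses the parameters
% \theta_{i-1} from the previous iteration
L_i(\theta_i) \;=\; \mathbb{E}_{s,a \sim \rho(\cdot)}\!\left[\bigl(y_i - Q(s,a;\theta_i)\bigr)^{2}\right],
\qquad
y_i \;=\; \mathbb{E}_{s' \sim \mathcal{E}}\!\left[\, r + \gamma \max_{a'} Q(s',a';\theta_{i-1}) \;\middle|\; s, a \right]
```

Here ρ(s, a) denotes the behaviour distribution over states and actions.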

The Q-learning algorithm is model-free, meaning it does not need to learn the rules of the game, as opposed to model-based approaches, where the rules of the game are embedded in a transition matrix. The Q-learning algorithm is also off-policy rather than on-policy. In fact, the paper follows a more hybrid approach. In the game environment, the agent can take one of several possible actions, and each action is associated with a Q-value. Instead of always selecting the action with the highest Q-value (greedy approach/on-policy), the agent can choose other actions to see where they lead (off-policy) and ‘build an experience’. In this paper, the agent picks the greedy action with probability (1 − ε), where ε controls the randomness of the choice. At first, the network ‘explores’ the environment with ε close to 1, and ε is gradually reduced towards 0 as the agent starts ‘exploiting’ the environment.
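
To make the ε-greedy scheme concrete, here is a minimal sketch; the function names, the annealing schedule, and the final ε value are illustrative, not taken from the paper.

```python
import random

import numpy as np


def epsilon_greedy_action(q_values, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one.

    q_values: 1-D array of Q(s, a) estimates for every valid action in state s.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore
    return int(np.argmax(q_values))              # exploit


def linear_epsilon(step, eps_start=1.0, eps_final=0.02, decay_steps=100_000):
    """Anneal epsilon linearly from eps_start towards eps_final over decay_steps.

    The exact schedule and final value here are illustrative placeholders.
    """
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_final - eps_start)
```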

The Atari game environment suffers from a ‘perceptual aliasing’ problem: a single frame at any given instant does not reveal the full state of the environment (for example, it says nothing about the ball’s velocity). The paper addresses this by taking the last 4 frames of history and stacking them to produce the input to the Q-function.

2. Related Work

The most closely related work to this paper is neural fitted Q-learning (NFQ), which optimizes a sequence of loss functions using the RPROP algorithm. However, NFQ uses a batch update whose computational cost grows with the size of the dataset, whereas this paper uses stochastic gradient descent, which has a low constant cost per iteration and can scale to large datasets. NFQ also used deep autoencoders to first learn a low-dimensional representation of the task. This paper’s novelty lies in applying reinforcement learning end-to-end, directly from the raw visual inputs, thus learning features that are directly relevant to discriminating action values.

3. Deep Reinforcement Learning architecture

The paper aims to connect a reinforcement learning algorithm to a deep neural network that takes raw RGB images directly as input and is trained with SGD. To do this, it utilizes a technique called Experience Replay. The agent’s experience at each step is stored as a tuple e_t = (s_t, a_t, r_t, s_{t+1}) in a dataset pooled over many episodes, called the replay memory. Q-learning updates are then applied to random mini-batches of samples drawn from this pool. This makes the training samples more random and uncorrelated, and therefore more ‘stationary’ from the neural network’s point of view, since each new batch mixes experiences from many different points of the training history.
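
A minimal sketch of such a replay memory follows; the class and method names are mine, and real implementations add details such as capacity tuning or prioritized sampling.

```python
import random
from collections import deque, namedtuple

# One experience tuple e_t = (s_t, a_t, r_t, done, s_{t+1})
Experience = namedtuple("Experience", ["state", "action", "reward", "done", "next_state"])


class ReplayMemory:
    """Fixed-size buffer that stores transitions and returns random mini-batches."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are discarded first

    def push(self, *args):
        self.buffer.append(Experience(*args))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```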

The deep Q-learning algorithm with Experience Replay has the following advantages:

· Each step of the experience is potentially used in many weight updates, which allows for greater data efficiency.

· By using experience replay the behavior distribution is averaged over many of its previous states, smoothing out learning and avoiding oscillations or divergence in the parameters.

The proposed algorithm has the limitation that it only stores the last N experience tuples in memory and samples them uniformly. A better approach would be to place more emphasis on the transitions from which the agent can learn the most.

The raw input frames are 210x160 pixels, which is large and computationally expensive to work with directly. To simplify, each raw frame is pre-processed by converting it to grey-scale and down-sampling it to 110x84. The image is then cropped to an 84x84 region because the GPU implementation of 2D convolutions used at the time accepted only square inputs. The last four frames of history are pre-processed this way and stacked to produce the input to the Q-function.
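
Here is a minimal sketch of this pre-processing pipeline, assuming OpenCV for the image operations; the exact crop offsets are not given in the summary, so the values below are illustrative.

```python
from collections import deque

import cv2
import numpy as np


def preprocess_frame(frame_rgb):
    """210x160x3 RGB frame -> 84x84 grey-scale frame, as described above."""
    grey = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)                  # to grey-scale
    small = cv2.resize(grey, (84, 110), interpolation=cv2.INTER_AREA)   # down-sample to 110x84
    cropped = small[18:102, :]   # crop an 84x84 playing area (the offset here is illustrative)
    return cropped.astype(np.uint8)


class FrameStack:
    """Keeps the last 4 pre-processed frames and stacks them into an 84x84x4 input."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame_rgb):
        f = preprocess_frame(frame_rgb)
        for _ in range(self.frames.maxlen):
            self.frames.append(f)
        return np.stack(self.frames, axis=-1)

    def step(self, frame_rgb):
        self.frames.append(preprocess_frame(frame_rgb))
        return np.stack(self.frames, axis=-1)   # shape (84, 84, 4)
```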

Instead of giving a (state, action) pair as the input to the neural network, in this architecture, only the state representation is given as input and separate output units are present for each possible action for the given state. The main advantage of this type of architecture is the ability to compute Q-values for all possible actions in a given state with only a single forward pass through the network.

The input to the neural network is an 84x84x4 image, which is passed through the following DQN structure (a minimal code sketch follows the list):

· a convolutional layer with 16 8x8 filters, stride 4, and ReLU activation

· a convolutional layer with 32 4x4 filters, stride 2, and ReLU activation

· a fully connected layer with 256 rectified units, followed by a linear output layer with one unit per valid action
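
A minimal PyTorch sketch of this network is shown below; the framework choice is mine (the original work predates today's deep-learning libraries), but the layer sizes follow the list above.

```python
import torch
import torch.nn as nn


class DQN(nn.Module):
    """CNN from the list above: 84x84x4 input -> one Q-value per valid action."""

    def __init__(self, n_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 16 8x8 filters, stride 4
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 32 4x4 filters, stride 2
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # 256 rectified units
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one output per action
        )

    def forward(self, x):
        # x: batch of stacked frames, shape (N, 4, 84, 84), values scaled to [0, 1]
        return self.fc(self.conv(x))


# Example: Q-values for one state in an environment with 6 valid actions.
net = DQN(n_actions=6)
q_values = net(torch.zeros(1, 4, 84, 84))   # shape (1, 6)
```

A single forward pass returns one Q-value per action, which is exactly the "all actions in one pass" property discussed above.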

#q-learning #deep-reinforcement #deep-learning #atari #dqn #deep learning

Tia Gottlieb

Deep Q-Network (DQN)-I

In the previous post, we presented solution methods that represent the action-values in a small table, which we referred to as a Q-table. In the next three posts of the Deep Reinforcement Learning Explained series, we will introduce the reader to the idea of using neural networks to expand the size of the problems that we can solve with reinforcement learning, presenting the Deep Q-Network (DQN), which represents the optimal action-value function as a neural network instead of a table. In this post, we will give an overview of DQN and introduce the OpenAI Gym framework for Pong. In the next two posts, we will present the algorithm and its implementation.

Atari 2600 games

The Q-learning method that we covered in previous posts solves the problem by iterating over the full set of states. However, we often find that we have too many states to track. An example is Atari games, which can show a huge variety of different screens; in this case, the problem cannot be solved with a Q-table.

The Atari 2600 game console was very popular in the 1980s, and many arcade-style games were available for it. The Atari console is archaic by today’s gaming standards, but its games are still challenging for computers, and the platform remains a very popular benchmark within RL research (using an emulator).


#artificial-intelligence #reinforcement-learning #towards-data-science #deep-learning #deep-r-l-explained #deep learning

Tia Gottlieb

Deep Q-Network (DQN)-II

This is the second post devoted to **Deep Q-Network (DQN)** in the Deep Reinforcement Learning Explained series, in which we will analyse some challenges that appear when we apply Deep Learning to Reinforcement Learning. We will also present in detail the code that solves the OpenAI Gym Pong game using the DQN network introduced in the previous post.

Challenges in Deep Reinforcement Learning

Unfortunately, reinforcement learning is more unstable when neural networks are used to represent the action-values, despite applying the wrappers introduced in the previous section. Training such a network requires a lot of data, but even then, it is not guaranteed to converge on the optimal value function. In fact, there are situations where the network weights can oscillate or diverge, due to the high correlation between actions and states.

In order to solve this, in this section we will introduce two techniques used by the Deep Q-Network:

  • Experience Replay
  • Target Network

There are many more tips and tricks that researchers have discovered to make DQN training more stable and efficient, and we will cover the best of them in future posts in this series.
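
As a preview of the second technique, the target network is simply a periodically synchronized copy of the Q-network that is used to compute the training targets. Below is a minimal sketch assuming a PyTorch Q-network; the stand-in network and the sync interval are illustrative, not the values used in this series.

```python
import torch
import torch.nn as nn

# Stand-in Q-network for illustration; in this series it would be the DQN CNN.
net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# The target network starts as an exact copy of the Q-network...
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(net.state_dict())

SYNC_EVERY = 1_000   # illustrative value; the sync frequency is a hyperparameter

for step in range(10_000):
    # ... a real training step would sample a batch from the replay memory and
    # compute the Bellman targets with target_net (not net), so the targets
    # stay fixed between synchronizations ...
    if step % SYNC_EVERY == 0:
        # ...and the copy is refreshed with the current weights periodically.
        target_net.load_state_dict(net.state_dict())
```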

Experience Replay

We are trying to approximate a complex, nonlinear function, Q(s, a), with a neural network. To do this, we calculate targets using the Bellman equation and then treat the problem as a supervised learning one. However, one of the fundamental requirements for SGD optimization is that the training data be independent and identically distributed, and when the Agent interacts with the Environment, the sequence of experience tuples can be highly correlated. A naive Q-learning algorithm that learns from these experience tuples in sequential order runs the risk of being swayed by the effects of this correlation.

#reinforcement-learning #deep-r-l-explained #deep-learning #artificial-intelligence #towards-data-science #deep learning

Marlon Boyle

Autonomous Driving Network (ADN) On Its Way

When it comes to inspiration in the networking industry, nothing generates more than the Autonomous Driving Network (ADN). You may have heard about it and wondered what it is about, and whether it has anything to do with autonomous driving vehicles. Your guess is right; the ADN concept is derived from, or inspired by, the rapid development of the autonomous driving car in recent years.


Driverless Car of the Future, the advertisement for “America’s Electric Light and Power Companies,” Saturday Evening Post, the 1950s.

The vision of autonomous driving has been around for more than 70 years, but for a long time engineers’ attempts to achieve it met with little success, and the concept remained fiction. In 2004, the US Defense Advanced Research Projects Agency (DARPA) organized the Grand Challenge, in which teams of autonomous vehicles competed for a grand prize of $1 million. I remember watching those competing vehicles on TV: they behaved as if driven by a drunk driver and had a really tough time driving by themselves. I thought the autonomous driving vision still had a long way to go. To my surprise, the next year, 2005, Stanford University’s vehicle autonomously drove 131 miles in California’s Mojave Desert without a scratch and took the $1 million Grand Challenge prize. How was that possible? Later I learned that the secret ingredient that made this possible was the latest ML (Machine Learning) enabled AI (Artificial Intelligence) technology.

Since then, AI technologies have advanced rapidly and have been implemented in all verticals. Around the 2016 time frame, the concept of the Autonomous Driving Network started to emerge, combining AI and networking to achieve network operational autonomy. The automation concept is nothing new in the networking industry; network operations are continually being automated here and there. But this time, ADN goes beyond automating mundane tasks; it reaches a whole new level. With the help of AI technologies and the advancement of other critical ingredients like SDN (Software Defined Networking), autonomous networking has a great chance of moving from a vision to a future reality.

In this article, we will examine some critical components of the ADN, the current landscape, and the factors that are important for ADN to be a success.

The Vision

At the current stage, different organizations use different terminologies to describe the ADN vision.

Even though the terminologies differ slightly, the industry is converging on common terms and a consensus around “autonomous networks” (e.g. TMF, ETSI, ITU-T, GSMA). The core vision includes both business and network aspects. The autonomous network delivers the “hyper-loop” from business requirements all the way down to the network and device layers.

On the network layer, it covers the following critical aspects:

  • **Intent-Driven:** Understand the operator’s business intent and automatically translate it into the necessary network operations. The operation can be a one-time operation, like disconnecting a connection service, or a continuous operation, like maintaining a specified SLA (Service Level Agreement) at all times.
  • **Self-Discover:** Automatically discover hardware/software changes in the network and propagate the changes to the necessary subsystems to maintain an always-synchronized state.
  • **Self-Config/Self-Organize:** Whenever network changes happen, automatically configure the corresponding hardware/software parameters so that the network stays at the pre-defined target states.
  • **Self-Monitor:** Constantly and automatically monitor network/service operational states and health conditions.
  • **Auto-Detect:** Detect network faults, abnormalities, and intrusions automatically.
  • **Self-Diagnose:** Automatically run an inference process to figure out the root causes of issues.
  • **Self-Healing:** Automatically take the necessary actions to address issues and bring the networks/services back to the desired state.
  • **Self-Report:** Automatically communicate with its environment and exchange the necessary information.
  • **Automated common operational scenarios:** Automatically perform operations like network planning, customer and service onboarding, and network change management.

On top of those, these capabilities need to span multiple services, multiple domains, and the entire lifecycle (TMF, 2019).

No doubt, this is the most ambitious goal that the networking industry has ever aimed at. It has been described as the “end-state” and “ultimate goal” of networking evolution. This is not just a vision on PowerPoint slides; the networking industry is already on the move toward the goal.

David Wang, Huawei’s Executive Director of the Board and President of Products & Solutions, said in his 2018 Ultra-Broadband Forum (UBBF) keynote speech (David W., 2018):

“In a fully connected and intelligent era, autonomous driving is becoming a reality. Industries like automotive, aerospace, and manufacturing are modernizing and renewing themselves by introducing autonomous technologies. However, the telecom sector is facing a major structural problem: Networks are growing year by year, but OPEX is growing faster than revenue. What’s more, it takes 100 times more effort for telecom operators to maintain their networks than OTT players. Therefore, it’s imperative that telecom operators build autonomous driving networks.”

Juniper CEO Rami Rahim said in his keynote at the company’s virtual AI event (CRN, 2020):

“The goal now is a self-driving network. The call to action is to embrace the change. We can all benefit from putting more time into higher-layer activities, like keeping distributors out of the business. The future, I truly believe, is about getting the network out of the way. It is time for the infrastructure to take a back seat to the self-driving network.”

Is This Vision Achievable?

If you had asked me this question 15 years ago, my answer would have been “no chance,” as I could not imagine that an autonomous driving vehicle was possible then. But now, the vision is no longer far-fetched, not only because of the rapid advancement of ML/AI technology but also because other key building blocks have made significant progress. To name a few:

  • software-defined networking (SDN) control
  • industry-standard models and open APIs
  • Real-time analytics/telemetry
  • big data processing
  • cross-domain orchestration
  • programmable infrastructure
  • cloud-native virtualized network functions (VNF)
  • DevOps agile development process
  • everything-as-service design paradigm
  • intelligent process automation
  • edge computing
  • cloud infrastructure
  • programming paradigms suitable for building an autonomous system, e.g., teleo-reactive programs: sets of reactive rules that continuously sense the environment and trigger actions whose continuous execution eventually leads the system to satisfy a goal (Nils Nilsson, 1996); a minimal sketch follows this list
  • open-source solutions
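
As promised above, here is a minimal sketch of the teleo-reactive idea: an ordered list of condition/action rules is scanned continuously, and the first rule whose condition holds fires. The network-flavoured conditions and actions below are purely illustrative stand-ins, not a real ADN API.

```python
import time

# A teleo-reactive program: an ordered list of (condition, action) rules.
# On every scan of the loop, the first rule whose condition is true fires.


def make_rules(network):
    return [
        (lambda: network["sla_met"] and not network["faults"],
         lambda: None),                                # goal satisfied: do nothing
        (lambda: bool(network["faults"]),
         lambda: network.update(faults=[])),           # heal detected faults
        (lambda: not network["sla_met"],
         lambda: network.update(sla_met=True)),        # re-optimize towards the SLA
    ]


def teleo_reactive_loop(network, scans=5, period=0.1):
    rules = make_rules(network)
    for _ in range(scans):
        for condition, action in rules:   # highest-priority rule first
            if condition():
                action()
                break
        time.sleep(period)


# Toy run: start with a fault and an unmet SLA; the loop drives both to the goal.
state = {"faults": ["link-3 down"], "sla_met": False}
teleo_reactive_loop(state)
print(state)   # {'faults': [], 'sla_met': True}
```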

#network-automation #autonomous-network #ai-in-network #self-driving-network #neural-networks