Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of Google's TensorFlow framework and requires Python 3.
Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:
A stable version of Tensorforce is periodically updated on PyPI and installed as follows:
pip3 install tensorforce
To always use the latest version of Tensorforce, install the GitHub version instead:
git clone https://github.com/tensorforce/tensorforce.git
pip3 install -e tensorforce
Note on installation on M1 Macs: At the moment TensorFlow, which is a core dependency of Tensorforce, cannot be installed on M1 Macs directly. Follow the "M1 Macs" section in the documentation for a workaround.
Environments require additional packages, for which setup options are available (ale, gym, retro, vizdoom, carla; or envs for all environments); however, some environments require additional tools to be installed separately (see the environments documentation). Other setup options include tfa for TensorFlow Addons and tune for HpBandSter, which is required for the tune.py script.
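For example, assuming the usual pip extras syntax for these setup options (exact names may differ; see the installation documentation), the Gym environments and the tuning dependencies would be installed as follows:
pip3 install tensorforce[gym]
pip3 install tensorforce[tune]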
Note on GPU usage: Unlike (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on the environment and agent configuration. In particular, for environments with low-dimensional state spaces (i.e., no images), it is hence worth trying to run on CPU only.
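If you want to force CPU-only execution, one generic option (a TensorFlow-level sketch, not a Tensorforce-specific setting) is to hide all GPUs before creating the agent:
import tensorflow as tf
# Hide all GPU devices so that subsequent TensorFlow computation runs on CPU only
tf.config.set_visible_devices([], 'GPU')
The following quickstart example trains an agent on the CartPole environment using the act-observe interface: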
from tensorforce import Agent, Environment

# Pre-defined or custom environment
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='tensorforce',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20)
)

# Train for 300 episodes
for _ in range(300):

    # Initialize episode
    states = environment.reset()
    terminal = False

    while not terminal:
        # Episode timestep
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
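Instead of writing the act-observe loop by hand, the same training can be expressed more compactly with Tensorforce's Runner utility; the following is a sketch (the PPO hyperparameters are chosen only for illustration, so check the documentation for the exact interface):
from tensorforce import Agent, Environment
from tensorforce.execution import Runner

# Same CartPole setup as above, but the episode loop is handled by Runner
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)
runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=300)
runner.close()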
Tensorforce comes with a range of example configurations for different popular reinforcement learning environments. For instance, to run Tensorforce's implementation of the popular Proximal Policy Optimization (PPO) algorithm on the OpenAI Gym CartPole environment, execute the following line:
python3 run.py --agent benchmarks/configs/ppo.json --environment gym \
--level CartPole-v1 --episodes 100
For more information check out the documentation.
Remote environment execution is supported via Python's multiprocessing and socket modules. By combining these modular components in different ways, a variety of popular deep reinforcement learning models/features can be replicated:
Note that in general the replication is not 100% faithful, since the models as described in the corresponding papers often involve additional minor tweaks and modifications which are hard to support with a modular design (and it is arguably questionable whether supporting them is important or desirable). On the upside, these models are just a few examples from the multitude of module combinations supported by Tensorforce.
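As a rough illustration of the remote execution mentioned above, here is a sketch of running several environment copies in separate processes (the num_parallel and remote arguments are assumed from recent versions; verify them against the documentation):
from tensorforce.execution import Runner

# Sketch: four CartPole copies executed in parallel via Python's multiprocessing
runner = Runner(
    agent=dict(agent='ppo', batch_size=10),
    environment=dict(environment='gym', level='CartPole-v1'),
    max_episode_timesteps=500,
    num_parallel=4, remote='multiprocessing'
)
runner.run(num_episodes=300)
runner.close()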
Please get in touch via mail or on Gitter if you have questions, feedback, ideas for features/collaboration, or if you seek support for applying Tensorforce to your problem.
If you want to support the Tensorforce core team (see below), please also consider donating: GitHub Sponsors or Liberapay.
Tensorforce is currently developed and maintained by Alexander Kuhnle.
Earlier versions of Tensorforce (<= 0.4.2) were developed by Michael Schaarschmidt, Alexander Kuhnle and Kai Fricke.
The advanced parallel execution functionality was originally contributed by Jean Rabault (@jerabaul29) and Vincent Belus (@vbelus). Moreover, the pretraining feature was largely developed in collaboration with Hongwei Tang (@thw1021) and Jean Rabault (@jerabaul29).
The CARLA environment wrapper is currently developed by Luca Anzalone (@luca96).
We are very grateful to our open-source contributors (listed according to GitHub, updated periodically):
Islandman93, sven1977, Mazecreator, wassname, lefnire, daggertye, trickmeyer, mkempers, mryellow, ImpulseAdventure, janislavjankov, andrewekhalel, HassamSheikh, skervim, beflix, coord-e, benelot, tms1337, vwxyzjn, erniejunior, Deathn0t, petrbel, nrhodes, batu, yellowbee686, tgianko, AdamStelmaszczyk, BorisSchaeling, christianhidber, Davidnet, ekerazha, gitter-badger, kborozdin, Kismuz, mannsi, milesmcc, nagachika, neitzal, ngoodger, perara, sohakes, tomhennigan.
Please cite the framework as follows:
@misc{tensorforce,
author = {Kuhnle, Alexander and Schaarschmidt, Michael and Fricke, Kai},
title = {Tensorforce: a TensorFlow library for applied reinforcement learning},
howpublished = {Web page},
url = {https://github.com/tensorforce/tensorforce},
year = {2017}
}
If you use the parallel execution functionality, please additionally cite it as follows:
@article{rabault2019accelerating,
title = {Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach},
author = {Rabault, Jean and Kuhnle, Alexander},
journal = {Physics of Fluids},
volume = {31},
number = {9},
pages = {094105},
year = {2019},
publisher = {AIP Publishing}
}
If you use Tensorforce in your research, you may additionally consider citing the following paper:
@article{lift-tensorforce,
author = {Schaarschmidt, Michael and Kuhnle, Alexander and Ellis, Ben and Fricke, Kai and Gessert, Felix and Yoneki, Eiko},
title = {{LIFT}: Reinforcement Learning in Computer Systems by Learning From Demonstrations},
journal = {CoRR},
volume = {abs/1808.07903},
year = {2018},
url = {http://arxiv.org/abs/1808.07903},
archivePrefix = {arXiv},
eprint = {1808.07903}
}
Download Details:
Author: tensorforce
Source Code: https://github.com/tensorforce/tensorforce
License: Apache-2.0 License
#tensorflow #machine-learning #deep-learning #python
The Association of Data Scientists (AdaSci), a global professional body of data science and ML practitioners, is holding a full-day workshop on building games using reinforcement learning on Saturday, February 20.
Artificial intelligence systems are outperforming humans at many tasks, from driving cars, recognising images and objects, and generating voices, to imitating art, predicting the weather, and playing chess. Systems such as AlphaGo and the agents built for DOTA 2 and StarCraft II are case studies in reinforcement learning.
Reinforcement learning enables the agent to learn and perform a task under uncertainty in a complex environment. The machine learning paradigm is currently applied to various fields like robotics, pattern recognition, personalised medical treatment, drug discovery, speech recognition, and more.
With an increase in the exciting applications of reinforcement learning across the industries, the demand for RL experts has soared. Taking the cue, the Association of Data Scientists, in collaboration with Analytics India Magazine, is bringing an extensive workshop on reinforcement learning aimed at developers and machine learning practitioners.
#ai workshops #deep reinforcement learning workshop #future of deep reinforcement learning #reinforcement learning #workshop on a saturday #workshop on deep reinforcement learning
Reinforcement learning has gained considerable popularity with the relatively recent success of DeepMind's AlphaGo method in beating the world champion Go player. The AlphaGo method was trained in part by reinforcement learning on deep neural networks.
This style of learning is a feature of machine learning distinct from the classical supervised and unsupervised paradigms. In reinforcement learning, a deep neural network processes the environment's data (called the state) and selects the agent's actions in an attempt to maximise a reward.
This technique allows a network to learn how to play games, such as Atari or other video games, or any other problem that can be posed as a kind of game. In this tutorial I will introduce the broad principles of Q learning, a common model of reinforcement learning, and demonstrate how to implement deep Q learning in TensorFlow.
As mentioned above, reinforcement learning consists of a few basic entities or concepts: an environment, which produces a state and a reward, and an agent, which performs actions in that environment. In the diagram below, you can see this interaction:
The task of the agent in such a setting is to examine the state and reward information it receives and to choose an action that maximises the reward it obtains. The agent learns by repeated interaction with the environment, or, in other words, by repeatedly playing the game.
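To make this loop concrete, here is a minimal sketch of the state-action-reward interaction using OpenAI Gym's classic API, with a random policy standing in for the agent:
import gym

# Create the environment and play one episode with a random (placeholder) policy
env = gym.make('CartPole-v1')
state = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # the agent's policy would choose the action here
    state, reward, done, info = env.step(action)
    total_reward += reward
print('Episode reward:', total_reward)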
In order to succeed, it is necessary for the agent to:
1. Learn the relationship between states, actions and resulting rewards
2. Determine which is the best action to pick given (1)
Implementing (1) requires defining a set of values that can be used to inform (2), and (2) is referred to as the action policy. One of the most common ways of implementing (1) and (2) with deep learning is the deep Q network together with the epsilon-greedy policy.
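To make these ideas concrete, here is a minimal sketch (not a full training implementation) of a Q-network with an epsilon-greedy policy in TensorFlow, assuming a CartPole-like task with four state dimensions and two actions:
import numpy as np
import tensorflow as tf

num_states, num_actions = 4, 2

# Q-network: maps a state vector to one estimated Q-value per action
q_network = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(num_states,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(num_actions)
])

def epsilon_greedy_action(state, epsilon=0.1):
    # With probability epsilon explore randomly, otherwise exploit the current Q-estimates
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)
    q_values = q_network(state[np.newaxis, :].astype(np.float32))
    return int(tf.argmax(q_values[0]))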
#artificial intelligence #machine learning #reinforcement learning #tensorflow
Due to the memory limitations of LSTMs, most current deep learning models use attention mechanisms instead. DeepMind published a paper on this topic, 'Deep reinforcement learning with relational inductive biases', at ICLR 2019.
In this paper, a grid-type environment is used to verify the performance of the model. However, that environment takes too long to train relative to what it offers, because of the way the blocks move. That is why I decided to use another environment that has relational features but a simpler movement scheme. After confirming that the algorithm from the paper works, we can use it in the StarCraft II environment.
…
#reinforcement-learning #relational-intelligence #tensorflow #deep-learning #relational deep reinforcement learning
Reinforcement learning (RL) is a rapidly rising field, influenced in no small part by the performance of AlphaZero (the best chess engine as of now). RL is a subfield of machine learning that teaches agents to act in an environment so as to maximize rewards over time.
Among RL's model-free methods is temporal difference (TD) learning, with SARSA and Q-learning (QL) being two of the most widely used algorithms. I chose to explore SARSA and QL to highlight a subtle difference between on-policy and off-policy learning, which we will discuss later in the post.
This post assumes you have basic knowledge of the agent, environment, action, and rewards within RL’s scope. A brief introduction can be found here.
The outline of this post include:
We will compare these two algorithms via a CartPole game implementation. This post's code can be found here: QL code, SARSA code, and the fully functioning code (the fully functioning code has both algorithms implemented and trained on the CartPole game).
The TD learning section will be a bit mathematical, but feel free to skim through it and jump directly to QL and SARSA.
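Before diving in, here is a small sketch of the difference between the two update rules in tabular form (illustrative only; CartPole has a continuous state space, so in practice the states would need to be discretised or approximated):
import numpy as np

num_states, num_actions = 16, 2  # hypothetical discretised state space
Q = np.zeros((num_states, num_actions))

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy (max) action in the next state
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action actually taken in the next state
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])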
#reinforcement-learning #artificial-intelligence #machine-learning #deep-learning #learning
Although the field of deep learning is evolving extremely fast, unique research with the potential to get us closer to Artificial General Intelligence (AGI) is rare and hard to find. One exception to this rule can be found in the field of meta-learning. Recently, meta-learning has also been applied to Reinforcement Learning (RL) with some success. The paper “Discovering Reinforcement Learning Agents” by Oh et al. from DeepMind provides a new and refreshing look at the application of meta-learning to RL.
**Traditionally, RL relied on hand-crafted algorithms** such as Temporal Difference learning (TD-learning) and Monte Carlo learning, various Policy Gradient methods, or combinations thereof such as Actor-Critic models. These RL algorithms are usually finely tuned to train models for a very specific task, such as playing Go or Dota. One reason for this is that multiple hyperparameters, such as the discount factor γ and the bootstrapping parameter λ, need to be tuned for stable training. Furthermore, the update rules themselves, as well as the choice of predictors such as value functions, need to be chosen diligently to ensure good performance of the model. The entire process has to be performed manually and is often tedious and time-consuming.
DeepMind is trying to change this with their latest publication. In the paper, the authors propose a new meta-learning approach that discovers the learning objective as well as the exploration procedure by interacting with a set of simple environments. They call the approach the Learned Policy Gradient (LPG). The most appealing result of the paper is that the algorithm is able to effectively generalize to more complex environments, suggesting the potential to discover novel RL frameworks purely by interaction.
In this post, I will try to explain the paper in detail and provide additional explanation where I had problems with understanding. Hereby, I will stay close to the structure of the paper in order to allow you to find the relevant parts in the original text if you want to get additional details. Let’s dive in!
#meta-learning #reinforcement-learning #machine-learning #ai #deep-learning #deep learning