Action Masking with RLlib

Parametric actions to improve reinforcement learning

RL algorithms learn via trial and error. The agent searches the state space early on and takes random actions to learn what leads to a good reward. Pretty straightforward.

Unfortunately, this isn’t terribly efficient, especially if we already know something about what makes a good vs. bad action in some states. Thankfully, we can use action masking — a simple technique that sets the probability of bad actions to 0 — to speed learning and improve our policies.
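To make "sets the probability of bad actions to 0" concrete, here is a minimal NumPy sketch. It is not RLlib's own implementation: the function name `masked_action_probs` and the example logits and mask are hypothetical. It assumes the policy outputs one logit per discrete action and that the environment supplies a 0/1 mask of valid actions.

```python
import numpy as np


def masked_action_probs(logits, mask):
    """Softmax over logits, with masked-out actions (mask == 0) given probability 0."""
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp(-inf) = 0 in the softmax.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest valid logit for numerical stability.
    shifted = masked - masked[mask].max()
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical example: four actions, of which actions 1 and 3 are invalid.
probs = masked_action_probs([1.0, 2.0, 0.5, 3.0], [1, 0, 1, 0])
print(probs)
```

The remaining probability mass is renormalized over the valid actions only, so the agent can never sample an action the mask rules out.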

TL;DR

We enforce constraints via action masking for a knapsack packing environment and show how to do this using RLlib.

Enforcing Constraints

Let’s use the classic knapsack problem to develop a concrete example.

The knapsack problem (KP) asks you to pack a knapsack to maximize the value in the bag without overloading it. If you have a collection of items like the one shown below, the optimal packing contains three of the yellow boxes and three of the gray boxes for a total of $36 and 15 kg. (This is the unbounded knapsack problem, because there is no limit on how many of each box you can choose.)

Image for post

Typically, this problem is solved using dynamic programming or math programming. If we set it up as a math program, we can write out the model as follows:

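$$
\begin{aligned}
\max \quad & z = \sum_{i} v_i x_i \\
\text{s.t.} \quad & \sum_{i} w_i x_i \le W \\
& x_i \in \mathbb{Z}_{\ge 0} \quad \forall i
\end{aligned}
$$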

In this case, x_i can be any integer ≥ 0 and represents the number of copies of item i we place into the knapsack. v_i and w_i are the values and weights of the items, respectively.

In plain language, this small model says we want to maximize the total value in the knapsack (which we call z). We do this by choosing how many of each item to pack (x_i), weighted by their values (v_i), without exceeding the weight limit of the knapsack (W). This formulation is known as an integer program (IP) because the decision variables are integers (we can’t pack parts of items, only whole ones), and it is solved using a solver like CPLEX, Gurobi, or GLPK (the last one is free and open source).
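The text also names dynamic programming as a standard approach. As a quick sanity check on the model, here is a small pure-Python sketch of the unbounded knapsack recurrence. The item list is an assumption: the figure's values and weights are not given in the text, so the numbers below are hypothetical and were chosen only so the result agrees with the $36 and 15 kg optimum stated earlier. The function name `unbounded_knapsack` is also illustrative.

```python
def unbounded_knapsack(items, capacity):
    """Solve the unbounded knapsack problem by dynamic programming.

    items: list of (name, value, weight) tuples with positive integer weights.
    Returns (best_value, counts), where counts maps item name -> copies packed.
    """
    best = [0] * (capacity + 1)        # best[c] = max value with weight limit c
    choice = [None] * (capacity + 1)   # index of the item added last at limit c
    for c in range(1, capacity + 1):
        best[c] = best[c - 1]          # option: leave one unit of capacity unused
        for idx, (_, value, weight) in enumerate(items):
            if weight <= c and best[c - weight] + value > best[c]:
                best[c] = best[c - weight] + value
                choice[c] = idx
    # Walk back through the choices to recover how many of each item we packed.
    counts = {name: 0 for name, _, _ in items}
    c = capacity
    while c > 0:
        idx = choice[c]
        if idx is None:
            c -= 1
        else:
            counts[items[idx][0]] += 1
            c -= items[idx][2]
    return best[capacity], counts


# ASSUMED items as (name, value in $, weight in kg); not taken from the article.
ITEMS = [
    ("green", 4, 12),
    ("gray", 2, 1),
    ("blue", 2, 2),
    ("yellow", 10, 4),
    ("orange", 1, 1),
]

best_value, counts = unbounded_knapsack(ITEMS, capacity=15)
print(best_value, counts)
```

With these assumed items and W = 15, the sketch returns a value of 36 using three yellow and three gray boxes, which matches the packing described above. An IP solver would reach the same answer from the model directly; the DP version is shown only because it needs no external dependencies.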
