Consistent Value Function Definitions

Mathematically defining value functions and policies, as well as some of the emergent properties of these definitions.

In this post, I will define the standard γ-discounted value functions used in reinforcement learning (RL). From these definitions, I will discuss two important emergent properties of value functions that prove the self-consistency of the definitions. I will build up these concepts mathematically, writing out every step in the derivations and discussing the implications of each step. These equations are the foundation of many important mathematical proofs in RL, and understanding them completely is important to building a theoretical understanding of the field.

Value Functions

Value functions are at the core of reinforcement learning. For any given state, an agent can query a value function to determine the “value” associated with being in that state. We traditionally define “value” as the sum of rewards that the agent obtains in the future. Because it depends on the rewards the agent will see in the future, a value function must be defined for a given strategy of behavior, called a policy. That is, the value of a state depends on how the agent behaves after visiting that state, so a discussion of “value” that is independent of behavior is meaningless.
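To make the dependence on behavior concrete, here is a minimal Python sketch. It assumes value is measured as a γ-discounted sum of rewards along a single trajectory. The function name `discounted_return`, the two reward sequences, and the choice γ = 0.5 are all hypothetical, picked only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    total = 0.0
    # Work backwards so each step needs one multiply and one add:
    # G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences seen from the SAME start state
# under two different ways of behaving.
rewards_cautious = [1.0, 1.0, 1.0]  # small reward at every step
rewards_greedy = [0.0, 0.0, 5.0]    # one large reward, but delayed

gamma = 0.5  # assumed discount factor
value_cautious = discounted_return(rewards_cautious, gamma)  # 1 + 0.5 + 0.25 = 1.75
value_greedy = discounted_return(rewards_greedy, gamma)      # 0.25 * 5 = 1.25
print(value_cautious, value_greedy)
```

The start state is identical in both cases, yet the number attached to it changes with the behavior that follows, which is why “value” only makes sense relative to a policy.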

We denote a policy as a function that maps a state to a probability distribution over actions. Formally,

π : 𝒮 → Δ(𝒜),

where 𝒮 denotes the set of all possible states that the agent can visit (often called the “state space”), 𝒜 denotes the set of all possible actions (often called the “action space”), and Δ(𝒜) denotes the standard simplex over the set of actions. The standard simplex is a formal way of writing the set of all probability distributions over the action space: every action receives a non-negative weight, and the weights sum to one. Simply put, a policy takes a state and returns a weighting over which actions the agent should take in that state. A large weighting leads to a high frequency of selecting that action, and a small weighting leads to a low frequency.
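As a rough illustration of a policy as a map from states to points on the simplex, here is a small Python sketch for a tabular case. The states `"s0"` and `"s1"`, the actions `"left"` and `"right"`, the weights, and the helper names `is_on_simplex` and `sample_action` are all hypothetical and not part of the definitions above.

```python
import random

# Hypothetical tabular policy: each state maps to a weighting over actions.
policy = {
    "s0": {"left": 0.9, "right": 0.1},
    "s1": {"left": 0.5, "right": 0.5},
}


def is_on_simplex(dist, tol=1e-9):
    """Check that the weights are non-negative and sum to one."""
    weights = list(dist.values())
    return all(w >= 0.0 for w in weights) and abs(sum(weights) - 1.0) <= tol


def sample_action(policy, state, rng):
    """Draw one action according to the policy's weighting for this state."""
    dist = policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the run is repeatable
draws = [sample_action(policy, "s0", rng) for _ in range(10_000)]
left_freq = draws.count("left") / len(draws)
print(f"empirical frequency of 'left' in s0: {left_freq:.3f}")
```

With enough draws, the empirical frequency of each action approaches its weight, which is the sense in which a large weighting leads to frequent selection.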
