Last year, I started my journey into machine learning through a Master’s program at Cornell Tech. One topic that particularly caught my eye was reinforcement learning (RL), which we approached from both the traditional direction of Markov Decision Processes (MDP) and from the direction of Deep Learning (DL). While the coursework was very informative, I wanted to take it a step further. Here, I have documented my (ongoing) attempt to do just that by training an agent to solve a Rubik’s Cube.
A couple of introductory notes:
A Markov Decision Process captures how an agent takes *actions* in an *environment*. Each action puts the agent in a different environmental *state*, usually according to some probability distribution, where the agent then has the possibility of receiving some *reward*. The agent’s goal is to learn a *policy* (i.e. the appropriate action to take in a given state) in order to maximize the long-run reward it receives.
Source: Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.
More concretely, an MDP is defined by a tuple (S, A, Pₐ, Rₐ), where:
S is the set of all possible states in the environment
A is the set of all possible actions the agent can take
Pₐ defines the probability distribution for transitioning from state *s* to state *s′* when taking action *a*
Rₐ specifies the reward received for taking action *a* in state *s*
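To make the (S, A, Pₐ, Rₐ) tuple concrete, here is a minimal sketch of a toy MDP in Python. The two-state "cube" environment below (a scrambled/solved state pair, with made-up transition probabilities and rewards) is purely illustrative and is not part of the actual Rubik's Cube environment used later:

```python
import random

# Toy MDP illustrating the (S, A, P_a, R_a) tuple.
# States, actions, probabilities, and rewards here are invented for illustration.

S = ["scrambled", "solved"]   # set of states
A = ["twist", "wait"]         # set of actions

# P[a][s] -> list of (next_state, probability): the transition distribution P_a
P = {
    "twist": {"scrambled": [("solved", 0.1), ("scrambled", 0.9)],
              "solved":    [("scrambled", 1.0)]},
    "wait":  {"scrambled": [("scrambled", 1.0)],
              "solved":    [("solved", 1.0)]},
}

# R[a][s] -> immediate reward for taking action a in state s: the reward function R_a
R = {
    "twist": {"scrambled": 0.0, "solved": -1.0},
    "wait":  {"scrambled": 0.0, "solved": 1.0},
}

def step(state, action):
    """Sample one environment transition: return (next_state, reward)."""
    next_states, probs = zip(*P[action][state])
    next_state = random.choices(next_states, weights=probs)[0]
    return next_state, R[action][state]

state = "scrambled"
state, reward = step(state, "twist")  # the agent acts; the environment responds
```

An agent's policy would then be a mapping from each state in `S` to an action in `A`, and learning amounts to adjusting that mapping to maximize the long-run sum of rewards returned by `step`.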
#markov-decision-process #rubiks-cube #reinforcement-learning #deep-learning #deep-q-learning