Last year, I started my journey into machine learning through a Master’s program at Cornell Tech. One topic that particularly caught my eye was reinforcement learning (RL), which we approached from both the traditional direction of Markov Decision Processes (MDPs) and from the direction of Deep Learning (DL). While the coursework was very informative, I wanted to take it a step further. Here, I have documented my (ongoing) attempt to do just that, by training an agent to solve a Rubik’s Cube.

A couple of introductory notes:

  1. Solving a Rubik’s Cube with reinforcement learning is not a new problem and I will be basing most of my work off this paper by Stephen McAleer et al., with some modifications.
  2. For certain concepts, I will try to go into as much detail as possible about my task-specific implementations, so some familiarity with probability, machine learning, RL, and DL is recommended. That being said, I am by no means an expert, and any and all feedback is much appreciated!

Introducing Markov Decision Processes

A Markov Decision Process captures how an agent takes _actions_ in an _environment_. Each action puts the agent in a different environmental _state_, usually according to some probability distribution, where the agent then has the possibility of receiving some _reward_. The agent’s goal is to learn a _policy_ (i.e., the appropriate action to take in a given state) in order to maximize the long-run reward that the agent receives.

[Figure omitted.] Source: Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
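To make the interaction described above a bit more concrete, here is a minimal Python sketch of an agent acting in an environment. Everything in it is a made-up illustration rather than part of the Rubik’s Cube setup: `ToyCorridor`, `always_right`, and `run_episode` are hypothetical names, and the environment is just a short corridor where the agent earns a reward of 1 for reaching the last cell.

```python
import random


class ToyCorridor:
    """Hypothetical environment: cells 0..n_states-1, reward 1.0 at the last cell."""

    def __init__(self, n_states=5, slip_prob=0.1, seed=0):
        self.n_states = n_states
        self.slip_prob = slip_prob      # chance that a move goes the opposite way
        self.rng = random.Random(seed)  # private RNG so runs are repeatable
        self.state = 0

    def reset(self):
        """Put the agent back in cell 0 and return that state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (next_state, reward, done)."""
        if self.rng.random() < self.slip_prob:
            action = -action  # stochastic transition: the move is reversed
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def always_right(state):
    """A fixed policy: a function from state to action."""
    return +1


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward
```

With `slip_prob=0.0` the corridor is deterministic, so `run_episode(ToyCorridor(slip_prob=0.0), always_right)` collects a total reward of 1.0, while a policy that always moves left never leaves cell 0 and collects nothing. A learning agent would be trying to discover the better of such policies on its own.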

More concretely, an MDP is defined by a tuple (S, A, Pₐ, Rₐ), where:

S is the set of all possible states in the environment

A is the set of all possible actions the agent can take

Pₐ defines the probability distribution for transitioning from state _s⁰_ to state _s¹_ when taking action _a_

Rₐ specifies the reward received for taking action _a_ in state _s_
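As a rough illustration of the four pieces above, an MDP can be written down directly as a small data structure. The sketch below is only one possible encoding, not the representation used for the cube: the `MDP` class, its field names, and the two-state `toy` example are all hypothetical. Transitions are stored as a dictionary from a (state, action) pair to a distribution over next states, and rewards as a dictionary from a (state, action) pair to a number.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: set        # S: all possible states
    actions: set       # A: all possible actions
    transitions: dict  # P_a: (state, action) -> {next_state: probability}
    rewards: dict      # R_a: (state, action) -> reward

    def is_valid(self, tol=1e-9):
        """Check that every transition entry is a proper probability distribution."""
        for (state, action), dist in self.transitions.items():
            if state not in self.states or action not in self.actions:
                return False
            if any(nxt not in self.states for nxt in dist):
                return False
            if any(p < 0 for p in dist.values()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
        return True


# Hypothetical two-state example, invented purely for illustration.
toy = MDP(
    states={"s0", "s1"},
    actions={"stay", "move"},
    transitions={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "move"): {"s1": 0.9, "s0": 0.1},  # the move fails 10% of the time
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "move"): {"s0": 1.0},
    },
    rewards={
        ("s0", "stay"): 0.0,
        ("s0", "move"): 1.0,
        ("s1", "stay"): 0.0,
        ("s1", "move"): 0.0,
    },
)
```

Calling `toy.is_valid()` confirms that the probabilities for each (state, action) pair sum to 1. A dictionary like this only works when the state space is tiny; for a problem with a very large number of states, the transitions would have to be computed by a function rather than stored in a table.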

