The Bandit Framework

Baby Robot is lost in the mall. Using Reinforcement Learning, we want to help him find his way back to his mum. However, before he can even begin looking for her, he needs to recharge from a set of power sockets, each of which gives a slightly different amount of charge.

Using the strategies from the __multi-armed bandit__ problem, we need to find the best socket in the shortest amount of time, so that Baby Robot can get charged up and on his way.

This is the second part of a six-part series on Multi-Armed Bandits. In Part 1 we covered all the basic terminology and mathematics required to describe the bandit problem.

In this part we’ll take a look at the problem we’ll be solving in the forthcoming articles, describing exactly how the power socket problem will be set up. This covers all the code used to create the basic socket simulator and the test framework used to evaluate these sockets.

So, although we won’t yet get to the actual Bandit algorithms, we’ll do all the groundwork required to start examining the various Bandit strategies in subsequent parts.

All code for the bandit algorithms and testing framework can be found on GitHub: Multi_Armed_Bandits

