In this, the fourth part of our series on Multi-Armed Bandits, we’re going to take a look at the Upper Confidence Bound (UCB) algorithm that can be used to solve the bandit problem.
If you’re not already familiar with the bandit problem and its terminology, you may want to first take a look at the earlier parts of this series.
All code for the bandit algorithms and testing framework can be found on GitHub: Multi_Armed_Bandits
Baby Robot is lost in the mall. Using Reinforcement Learning, we want to help him find his way back to his mum. However, before he can even begin looking for her, he needs to recharge from a set of power sockets, each of which gives a slightly different amount of charge.
Using the strategies from the _multi-armed bandit_ problem, we need to find the best socket, in the shortest amount of time, to allow Baby Robot to get charged up and on his way.
Baby Robot has entered a charging room containing 5 different power sockets. Each of these sockets returns a slightly different amount of charge. We want to get Baby Robot charged up in the minimum amount of time, so we need to locate the best socket and then use it until charging is complete.
This is identical to the Multi-Armed Bandit problem except that, instead of looking for a slot machine that gives the best payout, we’re looking for a power socket that gives the most charge.
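To make the setup concrete, here is a minimal sketch of the socket problem together with a simple UCB-style selection rule. Everything here is illustrative: the socket means, the Gaussian charge noise, the exploration constant `c`, and the function names are all assumptions for the sketch, not the series' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 5 power sockets, each returning a noisy amount of
# charge around a fixed, unknown mean. These values are purely illustrative.
true_means = [2.0, 3.0, 2.5, 8.0, 4.0]


def charge(socket):
    """Sample one unit of charge from a socket (Gaussian noise assumed)."""
    return rng.normal(true_means[socket], 1.0)


def ucb(n_steps=1000, c=2.0):
    """Pick sockets by UCB: estimated mean plus an exploration bonus."""
    n_sockets = len(true_means)
    counts = np.zeros(n_sockets)     # times each socket has been tried
    estimates = np.zeros(n_sockets)  # running mean charge per socket

    for t in range(1, n_steps + 1):
        if t <= n_sockets:
            s = t - 1  # try each socket once to initialise the estimates
        else:
            bonus = np.sqrt(c * np.log(t) / counts)
            s = int(np.argmax(estimates + bonus))
        reward = charge(s)
        counts[s] += 1
        estimates[s] += (reward - estimates[s]) / counts[s]

    # The socket chosen most often is our best guess at the optimal one.
    return int(np.argmax(counts))
```

Running `ucb()` on this toy environment should settle on socket 3 (the one with the highest mean charge), spending only a logarithmically growing number of trials on the weaker sockets.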