We live with uncertainty, and from time to time we have to choose among several options, none of which we know much about. If we face the same options over and over again, we can gradually learn which one tends to give the highest reward, which one is second best, and so on. Much of human learning can be described this way: it is guided by the goal of maximizing total reward (or, equivalently, minimizing total loss or regret).

Moreover, some important business applications can be modeled this way too. Think about the following situations:

(1) Given a set of stock tickers, each with varying returns, how would you rationally pick the ones that maximize your returns?

(2) You have three website landing page designs that you would like to try out. How would you pick the design that maximizes your metric, such as conversion rate?

(3) Suppose you want to promote your business and have three different advertising venues. How would you choose the venue that gives you the best value for your budget?

These realistic business problems can all be handily abstracted into the following scenario:

Suppose you have N slot machines (or bandits), each with its own unknown probability of paying out the reward R. How do you figure out which slot machine to play over time in order to collect as much reward as possible?
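To make the scenario concrete, here is a minimal sketch of one common Bayesian approach to it, Thompson sampling with a Beta prior on each machine's payout probability. The function name `thompson_sampling`, the payout probabilities, the number of rounds, and the uniform Beta(1, 1) prior are all assumptions chosen for illustration; they are not taken from the article.

```python
import random

def thompson_sampling(true_probs, n_rounds, reward=1.0, seed=0):
    """Play n_rounds on simulated slot machines using Thompson sampling.

    true_probs holds each machine's payout probability. The player never
    reads it directly; it is only used to simulate pulls. (Hypothetical
    helper written for this illustration.)
    """
    rng = random.Random(seed)
    n = len(true_probs)
    wins = [0] * n      # pulls that paid out, per machine
    losses = [0] * n    # pulls that paid nothing, per machine
    total_reward = 0.0

    for _ in range(n_rounds):
        # Draw one plausible payout rate per machine from its posterior,
        # Beta(1 + wins, 1 + losses), starting from a uniform Beta(1, 1) prior.
        samples = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(n)]
        # Play the machine whose sampled rate is highest.
        arm = max(range(n), key=samples.__getitem__)
        # Simulate the pull and update that machine's counts.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
            total_reward += reward
        else:
            losses[arm] += 1

    pulls = [w + l for w, l in zip(wins, losses)]
    return pulls, wins, total_reward

# Three machines with assumed payout probabilities of 20%, 50%, and 70%.
pulls, wins, total = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
print("pulls per machine:", pulls)
print("total reward:", total)
```

Because each machine is chosen in proportion to how likely it is to be the best given the evidence so far, early rounds are spread across all machines while later rounds tend to concentrate on the one with the highest payout rate.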


Learn to Bet — Use Bayesian Bandits for Decision-Making