Cracking Blackjack — Finishing the First-Visit Monte Carlo Algorithm

Hi!

If you haven’t done so already, please read Parts 1–4 before continuing. The rest of this article will assume you have read and understood the previous articles.


Image from Unsplash


Outline for this Article

  • Learn how the run_mc() function facilitates the algorithm.
  • Dive deep into Step 6 of the First-Visit MC algorithm, which is where the Q-table and Prob-table are updated after each episode.

10,000 ft Overview of First-Visit MC from Part 4

  1. Initialize the Blackjack environment from Part 2.
  2. Define the Q-table, Prob-table, alpha (α), epsilon (ε), ε-decay, ε-min, and gamma (γ) explained in Part 3.
  3. Define how many episodes you would like your agent to learn from. More episodes usually yield a more profitable policy.
  4. Play an episode. Record all of the (state → action → reward) tuples in the episode.
  5. After the episode, decay ε by multiplying it by ε-decay, but never let it drop below ε-min.
  6. Then, update the Q-table and the Prob-table using the (state → action → reward) tuples from Step 4 and the associated formulas (sketched just after this list).
  7. Repeat Steps 4–6 for the number of episodes defined in Step 3.
  8. After all episodes, the resulting Q-table and Prob-table represent the optimized policy for Blackjack that the AI agent just learned.
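
To make the "associated formulas" in Step 6 concrete, here is a minimal sketch of the update for a single first-visit (state, action) pair with return G. It assumes a constant-α incremental Q update and an ε-greedy Prob-table; the function and variable names are illustrative, not the exact code from this series.

```python
import numpy as np

def update_pair(Q, prob, state, action, G, alpha, eps, n_actions):
    """Sketch (assumed names) of the Step 6 update for one first-visit (state, action) pair."""
    # Nudge the Q value toward the observed return G (constant-alpha incremental update).
    Q[state][action] += alpha * (G - Q[state][action])

    # Rebuild the epsilon-greedy Prob-table row for this state:
    # every action keeps eps / n_actions probability, and the greedy action gets the rest.
    best_action = np.argmax(Q[state])
    prob[state] = np.full(n_actions, eps / n_actions)
    prob[state][best_action] += 1.0 - eps
    return Q, prob
```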

How to Facilitate Steps 4–6

In Part 4, we learned how Step 4 is implemented in the play_game() function. Before diving into Steps 5 and 6 at the same level of detail, I would like to introduce the run_mc() function, which ties Steps 4–6 together.

Skim the code below; I will explain it in detail afterward. You can view the code in its entirety here.
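
Here is a minimal sketch of what a run_mc()-style driver could look like. It assumes that the play_game() function from Part 4 returns a list of (state, action, reward) tuples and reuses the update_pair() helper sketched after the step list above; the signatures and hyperparameter handling are assumptions, not necessarily the exact code.

```python
from collections import defaultdict
import numpy as np

def run_mc(env, num_episodes, alpha, eps, eps_decay, eps_min, gamma):
    n_actions = env.action_space.n
    Q = defaultdict(lambda: np.zeros(n_actions))                  # Q-table (Step 2)
    prob = defaultdict(lambda: np.ones(n_actions) / n_actions)    # Prob-table (Step 2)

    for _ in range(num_episodes):                                 # Step 7: repeat for every episode
        episode = play_game(env, prob)                            # Step 4: [(state, action, reward), ...]
        eps = max(eps * eps_decay, eps_min)                       # Step 5: decay epsilon, with a floor

        # Step 6: walk the episode backwards to accumulate discounted returns, keeping
        # only the return that follows the FIRST visit of each (state, action) pair.
        G, first_visit_return = 0.0, {}
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            first_visit_return[(state, action)] = G               # later overwrites = earlier timesteps win

        for (state, action), ret in first_visit_return.items():
            Q, prob = update_pair(Q, prob, state, action, ret, alpha, eps, n_actions)

    return Q, prob
```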

#reinforcement-learning #machine-learning #data-science #cracking-blackjack #artificial-intelligence #deep-learning


Cracking Blackjack: Setup + Simple Monte Carlo Simulations

Hi!

I am writing this “Cracking Blackjack” blog series as a way of consolidating the knowledge I have gained in the Reinforcement Learning space. I needed to combine the info of 30+ articles and blog posts when I was creating this Blackjack project, and I want to make it easier for the next person.

I am by no means an expert on this stuff. Please feel free to reach out if you have any questions or concerns about the information I am about to present! My contact info is at the bottom.



Quick Summary of Blackjack

Blackjack is a card game played against a dealer. At the start of a round, both the player and the dealer are dealt 2 cards. The player can only see one of the dealer’s cards. The goal of the game is to get the value of the player’s cards as close to 21 as possible without going over 21. Number cards (2–10) are worth their face value, face cards (J, Q, K) are worth 10, and an Ace is worth either 1 or 11.
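
As a quick illustration of how a hand’s value can be computed (counting an Ace as 11 only when that does not bust the hand), here is a small helper. Representing cards as plain integer values is an assumption for this sketch, not necessarily how the article’s environment encodes them.

```python
def hand_value(cards):
    """Return the best Blackjack value of a hand.

    `cards` is a list of card values: 2-10 for number cards, 10 for J/Q/K,
    and 1 for an Ace (promoted to 11 when that keeps the hand at 21 or below).
    """
    total = sum(cards)
    # Count one Ace as 11 (i.e. add 10) only if it does not push the hand over 21.
    if 1 in cards and total + 10 <= 21:
        return total + 10
    return total

# Example: Ace + 7 is a "soft 18"; Ace + 7 + 9 falls back to 17.
print(hand_value([1, 7]))     # 18
print(hand_value([1, 7, 9]))  # 17
```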

#reinforcement-learning #data-science #monte-carlo-method #cracking-blackjack #artificial-intelligence #deep-learning

A Structural Overview of Reinforcement Learning Algorithms

Reinforcement learning has gained tremendous popularity in the last decade with a series of successful real-world applications in robotics, games and many other fields.

In this article, I will provide a high-level structural overview of classic reinforcement learning algorithms. The discussion will be organized around their similarities and differences and the intricacies of each algorithm.

RL Basics

Let’s start with a quick refresher on some basic concepts. If you are already familiar with all the terms of RL, feel free to skip this section.

Reinforcement learning models are a type of state-based model built on the Markov decision process (MDP). The basic elements of RL include:

Episode (rollout): playing out the whole sequence of states and actions until reaching the terminal state;

Current state s (or s_t): the state the agent is currently in;

Next state s′ (or s_t+1): the state reached from the current state;

Action a: the action to take at state s;

Transition probability P(s′|s, a): the probability of reaching s′ when taking action a at state s;

Policy π(s, a): a mapping from each state to an action that determines how the agent acts at each state; it can be either deterministic or stochastic;

Reward r (or R(s, a)): a reward function that generates a reward for taking action a at state s;

Return G_t: the total future reward from state s_t onward;

Value V(s): the expected return starting from state s;

Q value Q(s, a): the expected return starting from state s and taking action a.

Bellman equation

According to the Bellman equation, the value of the current state equals the current reward plus the discounted (γ) value of the next state, following the policy π:

V_π(s) = E_π[ R(s, a) + γ · V_π(s′) ]

It can also be expressed using the Q value:

Q_π(s, a) = R(s, a) + γ · Σ_s′ P(s′|s, a) · Σ_a′ π(s′, a′) · Q_π(s′, a′)

This is the theoretical core of most reinforcement learning algorithms.

Prediction vs. Control Tasks

There are two fundamental tasks of reinforcement learning: prediction and control.

In prediction tasks, we are given a policy and our goal is to evaluate it by estimating the value or Q value of taking actions following this policy.

In control tasks, we don’t know the policy, and the goal is to find the optimal policy that allows us to collect the most reward. In this article, we will focus only on control problems.

RL Algorithm Structure

Below is a graph I made to visualize the high-level structure of different types of algorithms. In the next few sections, we will delve into the intricacies of each type.

MDP World

In the MDP world, we have a mental model of how the world works, meaning that we know the MDP dynamics (transition P(s’|s,a) and reward function R(s, a)), so we can directly build a model using the Bellman equation.

Again, in control tasks our goal is to find a policy that gives us maximum rewards. To achieve it, we use dynamic programming.

Dynamic Programming (Iterative Methods)

1. Policy Iteration

Policy iteration essentially performs two steps repeatedly until convergence: policy evaluation and policy improvement.

In the policy evaluation step, we evaluate the policy π by calculating the Q value of each state-action pair under π using the Bellman equation:

Q_π(s, a) = R(s, a) + γ · Σ_s′ P(s′|s, a) · Q_π(s′, π(s′))

In the policy improvement step, we update the policy by greedily searching for the action that maximizes the Q value at each state.

Let’s see how policy iteration works.
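
As a rough illustration of the two alternating steps, here is a minimal policy-iteration sketch on a tiny toy MDP with known dynamics. The two-state, two-action MDP, its numbers, and all variable names are assumptions made for this example, not taken from the article.

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9

# Assumed toy MDP dynamics: P[s, a, s'] (transition probabilities) and R[s, a] (rewards).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

policy = np.zeros(n_states, dtype=int)           # start from an arbitrary deterministic policy
while True:
    # Policy evaluation: solve V = R_pi + gamma * P_pi V exactly for the current policy.
    P_pi = P[np.arange(n_states), policy]        # (n_states, n_states)
    R_pi = R[np.arange(n_states), policy]        # (n_states,)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

    # Policy improvement: act greedily with respect to the resulting Q values.
    Q = R + gamma * P @ V                        # (n_states, n_actions)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):       # stop once the policy is stable
        break
    policy = new_policy

print("optimal policy:", policy, "values:", V)
```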

2. Value Iteration

Value iteration combines the two steps of policy iteration so that we only need to update the Q value. We can interpret value iteration as always following a greedy policy, because at each step it takes the action that maximizes the value. Once the values converge, the optimal policy can be extracted from the value function.
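
Value iteration can be sketched in the same toy setting: repeatedly apply the greedy Bellman backup until the values stop changing, then read the policy off the converged Q values. The dynamics below are again assumed purely for illustration.

```python
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9

# Same style of assumed toy dynamics: P[s, a, s'] and R[s, a].
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

V = np.zeros(n_states)
while True:
    Q = R + gamma * P @ V                # Bellman optimality backup for every (s, a)
    V_new = Q.max(axis=1)                # always act greedily
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)                # extract the optimal policy once values converge
print("optimal policy:", policy, "values:", V)
```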

In most real-world scenarios, we don’t know the MDP dynamics so the applications of iterative methods are limited. In the next section, we will switch gears and discuss reinforcement learning methods that can deal with the unknown world.

#monte-carlo #deep-q-learning #algorithms

August Larson

Monte Carlo Simulation and Variants with Python

Your Guide to Monte Carlo Simulation and Must-Know Statistical Sampling Techniques With Python Implementation

Monte Carlo Simulation is based on repeated random sampling. The underlying concept of Monte Carlo is to use randomness to solve problems that might be deterministic in principle. Monte Carlo simulation is one of the most popular techniques to draw inferences about a population without knowing the true underlying population distribution. This sampling technique becomes especially handy when one doesn’t have the luxury to repeatedly sample from the original population. Applications of Monte Carlo Simulation range from solving problems in theoretical physics to predicting trends in financial investments.

Monte Carlo has 3 main uses: estimating parameters or statistical measures, examining the properties of the estimates, and approximating integrals.
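
As a tiny illustration of the third use, here is a sketch that approximates an integral by averaging the integrand at random points; the integrand exp(-x²) and the sample size are purely illustrative choices, not the article’s example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate the integral of exp(-x^2) over [0, 1] as the average of the
# integrand at uniform random points (times the interval length, which is 1 here).
x = rng.uniform(0.0, 1.0, size=100_000)
estimate = np.mean(np.exp(-x**2))

print(f"MC estimate: {estimate:.4f}")   # the true value is roughly 0.7468
```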

This article is about these 3 uses of Monte Carlo procedures and about 3 Monte Carlo variants (statistical sampling techniques that can be used to generate independent random samples). The article will cover the following topics:

- Introduction to Monte Carlo Simulation
- MC Parameter Estimation
- MC Examining the Estimate Properties
- MC Integrals Approximation
- Importance Sampling
- Rejection Sampling
- Inverse Transform Sampling (a quick sketch follows this list)
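
As a small taste of the sampling variants, here is a minimal inverse transform sampling sketch that turns uniform random numbers into exponential samples; the Exponential(λ) target is an illustrative choice, not necessarily the article’s example.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                   # rate of the target Exponential(lambda) distribution

# Inverse transform sampling: if U ~ Uniform(0, 1), then applying the inverse CDF
# F^-1(u) = -ln(1 - u) / lambda yields samples with the Exponential(lambda) distribution.
u = rng.uniform(0.0, 1.0, size=100_000)
samples = -np.log(1.0 - u) / lam

print(f"sample mean: {samples.mean():.3f} (theoretical mean: {1/lam:.3f})")
```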

This article is suited for readers who have prior statistical knowledge, since it will cover medium-level statistical concepts and examples. If you want to learn essential statistical concepts from scratch, you can check my previous article about Fundamentals Of Statistics here.

#machine-learning #programming #monte-carlo #data-science #python #monte carlo simulation and variants with python

Dejah Reinger

API-First, Mobile-First, Design-First... How Do I Know Where to Start?

Dear Frustrated,

I understand your frustration and I have some good news and bad news.

Bad News First (First joke!)
  • Stick around another 5-10 years and there will be plenty more firsts to add to your collection!
  • Definitions of these Firsts can vary from expert to expert.
  • You cannot just pick a single first and run with it. No first is an island. You will probably end up using a lot of these…

Good News

While there are a lot of different “first” methodologies out there, some are very similar and have simply matured along with our technology stack.

Here is the first stack I recommend looking at when you are starting a new project:

1. Design-First (Big Picture)

Know the high-level, big-picture view of what you are building. Define the problem you are solving and the requirements to solve it. Are you going to need a Mobile app? Website? Something else?

Have the foresight to realize that whatever you think you will need, it will change in the future. I am not saying design for every possible outcome but use wisdom and listen to your experts.

2. API First

API First means you think of APIs as being at the center of your little universe. APIs run the world, and they are the core of every (well, almost every) technical product you put on a user’s phone, computer, watch, TV, etc. If you break this first, you will find yourself in a world of hurt.

Part of this First is the knowledge that you had better focus on your API first, before you start looking at your web page, mobile app, etc. If you try to build your mobile app first and then go back and try to create an API that matches the particular needs of that one app, the above world of hurt applies.

Not only that, but having a working API will make the design and implementation of your mobile app or website MUCH easier!

Another important point to remember: there will most likely be another client that needs what this API is handing out, so take that into consideration as well.

3. API Design First and Code-First

I’ve grouped these next two together. Now, I know I am going to take a lot of flak for this, but hear me out.

Code-First

I agree that you should always design your API first and not just dig into building it. However, code is a legitimate design tool in the right hands. Not everyone wants to use some WYSIWYG tool that may add eons to your learning curve and timetable. Good architects (and I mean GOOD!) can design an API in a fraction of the time it takes to use some API design tools. I am NOT saying everyone should do this, but don’t rule out Code-First because it has the word “Code” in it.

You have to know where to stop though.

Designing your API with code means you are doing design-only. You still have to work with the technical and non-technical members of your team to ensure that your API solves your business problem and is the best solution. If you can’t translate your code-design into some visual format that everyone can see and understand, DON’T use code.

#devops #integration #code first #design first #api first #api