How to Win Texas Hold'em Poker




How to Win Texas Hold'em Poker
Richard Mealing
Machine Learning and Optimisation Group
School of Computer Science
University of Manchester

How to Play Texas Hold'em Poker

1. Deal 2 private cards per player
2. 1st (sequential) betting round
3. Deal 3 shared cards (the "flop")
4. 2nd betting round
5. Deal 1 shared card (the "turn")
6. 3rd betting round
7. Deal 1 shared card (the "river")
8. 4th (final) betting round

If all but 1 player folds, that player wins the pot (the total bet). Otherwise, at the end of the game the remaining hands are compared (the "showdown") and the player with the best hand wins the pot.


How to Play Texas Hold'em Poker

Ante = forced bet (everyone pays)
Blinds = forced bets (2 people pay big/small)
If players > 2 then (big blind player, small blind player, dealer)
If players = 2 ("heads-up") then (big blind, small blind/dealer)

No-Limit Texas Hold'em lets you bet all your money in a round
Minimum bet = big blind
Maximum bet = all your money

Limit Texas Hold'em Poker has fixed betting limits
A $4/$8 game means in betting rounds 1 & 2 bets = $4 and in betting rounds 3 & 4 bets = $8
Big blind usually equals the small bet e.g. $4, and the small blind is usually 50% of the big blind e.g. $2
Total number of raises per betting round is usually capped at 4 or 5

1-Card Poker Trees: Game tree - both players' private cards are known

1-Card Poker Trees: Public tree - both players' private cards are hidden

1-Card Poker Trees: P1 information set tree - P2's private card is hidden

1-Card Poker Trees: P2 information set tree - P1's private card is hidden

1-Card Poker Trees
1. Game tree - both players' private cards are known
2. Public tree - both players' private cards are hidden
3. P1 information set tree - P2's private card is hidden
4. P2 information set tree - P1's private card is hidden

Heads-Up Limit Texas Hold'em Poker Tree Size

[Figure: cards dealt interleaved with the betting tree - fold (F), call (C), raise (R) branches at each round.]

P1 dealt 2 private cards = (52 choose 2) = 1326
P2 dealt 2 private cards = (50 choose 2) = 1225
1st betting round = 19 sequences, 9 continuing
Flop dealt = (48 choose 3) = 17296
2nd betting round = 19 sequences, 9 continuing
Turn dealt = 45
3rd betting round = 19 sequences, 9 continuing
River dealt = 44
4th betting round = 19 sequences

Heads-Up Limit Texas Hold'em Poker Tree Size

Player 1 Deal = 1326
Player 2 Deal = 1225
1st Betting Round = 1326 × 1225 × 19
2nd Betting Round = 1326 × 1225 × 9 × 17296 × 19
3rd Betting Round = 1326 × 1225 × 9 × 17296 × 9 × 45 × 19
4th Betting Round = 1326 × 1225 × 9 × 17296 × 9 × 45 × 9 × 44 × 19
Total ≈ 1.79 × 10^18 (quintillion)
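The dealing factors are simple binomial coefficients and can be re-derived in a couple of lines; the betting factors (9 continuing sequences per round) are taken from the slide, so the printed magnitude is indicative rather than an exact reproduction of the slide's total:

from math import comb

p1_hands = comb(52, 2)  # 1326 private-card combinations for player 1
p2_hands = comb(50, 2)  # 1225 combinations for player 2 from the remaining 50
flops    = comb(48, 3)  # 17296 possible flops
turns, rivers = 45, 44  # remaining cards for the turn and river

# 9 betting sequences per round continue to the next round (per the slide)
cont = 9
reach_round4 = p1_hands * p2_hands * cont * flops * cont * turns * cont * rivers
print(p1_hands, p2_hands, flops)     # 1326 1225 17296
print(f"{reach_round4 * cont:.2e}")  # ~3.6e17 states in the last betting round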

Abstraction

Lossless
- Suit isomorphism: at the start (pre-flop), two hands are strategically the same if their card ranks match and both are suited or both off-suit, e.g. (A♣K♣, A♥K♥) or (T♣J♣, T♥J♥); there are 169 equivalence classes of starting hands, which cuts down the 1,624,350 possible two-player deals

Lossy
- Bucketing (binning) groups hands into equivalence classes, e.g. based on their probability of winning at showdown against a random hand
- Imperfect recall eliminates past information
- Betting round reduction
- Betting round elimination
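The 169-class figure is easy to verify: canonicalise each of the 1326 two-card starting hands to (high rank, low rank, suited?) and count the distinct keys. This snippet is illustrative, not from the slides:

from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
deck = [r + s for r in RANKS for s in SUITS]

# (high rank, low rank, suited?) fully determines pre-flop strategy
classes = set()
for c1, c2 in combinations(deck, 2):
    hi, lo = sorted((c1[0], c2[0]), key=RANKS.index, reverse=True)
    classes.add((hi, lo, c1[1] == c2[1]))

print(len(list(combinations(deck, 2))))  # 1326 raw starting hands
print(len(classes))                      # 169 equivalence classes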

Abstraction

Heads-up Limit Texas Hold'em poker has around 10^18 states
Abstraction can reduce the game to e.g. 10^7 states
Nesterov's excessive gap technique can find approximate Nash equilibria in a game with 10^10 states
Counterfactual regret minimization can find approximate Nash equilibria in a game with 10^12 states

Nash Equilibrium

Game-theoretic solution
A set of strategies, one per player, such that no one can do better by changing their strategy if the others keep their strategies fixed
Nash proved that every game with finitely many players and finitely many pure strategies has at least 1 (possibly mixed) Nash equilibrium

Annual Computer Poker Competition

Heads-up Limit Texas Hold'em
- Total Bankroll: 1 Slumbot (Eric Jackson, USA), 2 Little Rock (Rod Byrnes, Australia) and Zbot (Ilkka Rajala, Finland)
- Bankroll Instant Run-off: 1 Slumbot (Eric Jackson, USA), 2 Hyperborean (University of Alberta, Canada), 3 Zbot (Ilkka Rajala, Finland)

Heads-up No-Limit Texas Hold'em
- Total Bankroll: 1 Little Rock (Rod Byrnes, Australia), 2 Hyperborean (University of Alberta, Canada), 3 Tartanian5 (Carnegie Mellon University, USA)
- Bankroll Instant Run-off: 1 Hyperborean (University of Alberta, Canada), 2 Tartanian5 (Carnegie Mellon University, USA), 3 Neo Poker Bot (Alexander Lee, Spain)

3-player Limit Texas Hold'em
- Total Bankroll: 1 Hyperborean (University of Alberta, Canada), 2 Little Rock (Rod Byrnes, Australia), 3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)
- Bankroll Instant Run-off: 1 Hyperborean (University of Alberta, Canada), 2 Little Rock (Rod Byrnes, Australia), 3 Neo Poker Bot (Alexander Lee, Spain) and Sartre (University of Auckland, New Zealand)

Source: http://www.computerpokercompetition.org/index.php/competitions/results/9--results

Annual Computer Poker Competition

Total Bankroll = total money won against all agents

Bankroll Instant Run-off (sketched in code below):
1. Set S = all agents
2. Set N = number of agents in a game
3. Play every (|S| choose N) possible match between agents in S, storing each agent's total bankroll
4. Remove the agent(s) with the lowest total bankroll from S
5. Repeat steps 3 and 4 until S only contains N agents
6. Play a match between the last N agents and rank them according to their total bankroll in this game
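A minimal sketch of this run-off in Python, assuming a hypothetical play_match(table) helper supplied by the tournament infrastructure that returns each agent's bankroll for one match:

from itertools import combinations

def runoff_ranking(agents, n_per_game, play_match):
    """Bankroll instant run-off: repeatedly drop the worst agent(s).

    play_match(table) -> {agent: bankroll} is assumed to be provided
    by the tournament infrastructure (hypothetical here)."""
    s = set(agents)
    while len(s) > n_per_game:
        totals = {a: 0.0 for a in s}
        for table in combinations(sorted(s), n_per_game):
            for agent, bankroll in play_match(table).items():
                totals[agent] += bankroll
        worst = min(totals.values())
        s -= {a for a in s if totals[a] == worst}  # ties are all dropped
    final = play_match(tuple(sorted(s)))           # rank the last N agents
    return sorted(final, key=final.get, reverse=True)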

Extensive-Form Game

- A finite set of players N = {1, 2, ..., |N|} ∪ {c}
- A finite set of action sequences or histories, e.g. H = {(), ..., (A♠A♥), ...}
- Z ⊆ H are terminal histories, e.g. Z = {..., (A♠A♥, 7♦2♣, r, F), ...}
- A(h) = {a : (h, a) ∈ H} are the actions available after history h ∈ H\Z
- P(h) ∈ N ∪ {c} is the player who takes an action after history h ∈ H\Z
- u_i : Z → R is the utility function for player i

Extensive-Form Game

- f_c maps every history h where P(h) = c to an independent probability distribution f_c(a|h) over all a ∈ A(h)
- I_i is an information partition for player i (a set of nonempty subsets of the histories where i acts, with each such history in exactly 1 subset)
- I_j ∈ I_i is player i's j-th information set, containing histories i cannot distinguish, e.g. I_j = {..., (A♠A♥, 7♦2♣), ..., (A♠A♥, 6♦3♣), ...}
- Player i's strategy σ_i is a function that assigns a distribution over A(I_j) for all I_j ∈ I_i, where A(I_j) = A(h) for any h ∈ I_j
- A strategy profile σ is a strategy for each player: σ = {σ_1, σ_2, ..., σ_|N|}
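To make the notation concrete, here is a small encoding of the one-card poker game used on the next slides; the J/K deck and the ante-1/raise-1 payoffs (ties split the pot) are my assumptions, since the leaf values on the original slides are not fully legible:

from itertools import product

CARDS = ("J", "K")                    # each player is dealt J or K (prob 0.5)

def A(betting):
    """A(h): actions available after the betting sequence so far."""
    if betting in ("", "C"):          # acting player may check/call or raise
        return ("C", "R")
    if betting in ("R", "CR"):        # facing a raise: fold or call
        return ("F", "C")
    return ()                         # terminal history

def P(betting):
    """P(h): the player to act (1 or 2); chance c deals before any betting."""
    return 1 + len(betting) % 2

def u1(c1, c2, betting):
    """u_1(z): payoff to player 1 at a terminal history (ante 1, raise 1)."""
    if betting.endswith("F"):
        folder = P(betting[:-1])      # whoever took the final fold action
        return 1 if folder == 2 else -1
    pot = 2 if "R" in betting else 1  # a called raise doubles the win
    return 0 if c1 == c2 else (pot if c1 == "K" else -pot)

TERMINAL = ("CC", "RF", "RC", "CRF", "CRC")
Z = [(c1, c2, b) for c1, c2 in product(CARDS, CARDS) for b in TERMINAL]
print(len(Z), u1("K", "J", "CRC"))    # 20 terminal histories; +2 for P1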

Nash Equilibrium

Nash Equilibrium:
u_1(σ) ≥ max_{σ'_1 ∈ Σ_1} u_1(σ'_1, σ_2)
u_2(σ) ≥ max_{σ'_2 ∈ Σ_2} u_2(σ_1, σ'_2)

ε-Nash Equilibrium:
u_1(σ) + ε ≥ max_{σ'_1 ∈ Σ_1} u_1(σ'_1, σ_2)
u_2(σ) + ε ≥ max_{σ'_2 ∈ Σ_2} u_2(σ_1, σ'_2)

Extensive-Form Game

[Figure: game tree of the one-card poker example. Chance deals each player J or K with probability 0.5. P1 checks/raises (0.6 C / 0.4 R with J; 0.3 C / 0.7 R with K); after a check P2 checks/raises (0.8 C / 0.2 R with J; 0.1 C / 0.9 R with K); facing a raise each player folds or calls; leaves show payoffs to P1.]

I_1 = {I_1, I_2, I_7, I_8} and I_2 = {I_3, I_4, I_5, I_6}
A((J, J)) = {C, R} and P((J, J)) = 1


Extensive-Form Game

[Figure: the same game tree, highlighting the chance probabilities and P1's strategy at I_1.]

f_c(J | (J)) = 0.5 and f_c(K | (J)) = 0.5
σ_1(I_1, C) = 0.6 and σ_1(I_1, R) = 0.4


Counterfactual Regret Minimization

Counterfactual regret minimization minimizes the maximum counterfactual regret (over all actions) at every information set
Minimizing the counterfactual regrets minimizes overall regret
In a two-player zero-sum game at time T, if both players' average overall regret is less than ε, then σ̄^T is a 2ε-Nash equilibrium

Counterfactual Regret Minimization

Counterfactual Value
v_i(I_j | σ) = Σ_{n ∈ I_j} π^σ_{−i}(root, n) u_i(n)
u_i(n) = Σ_{z ∈ Z[n]} π^σ(n, z) u_i(z)

- v_i(I_j | σ) is the counterfactual value to player i of information set I_j given strategy profile σ
- π^σ_{−i}(root, n) is the probability of reaching node n from the root, ignoring player i's contributions, according to strategy profile σ
- π^σ(n, z) is the probability of reaching node z from node n according to strategy profile σ
- u_i(n) is the payoff to player i at node n if it is a leaf node, or its expected payoff if it is a non-leaf node
- Z[n] is the set of terminal nodes that can be reached from node n

Counterfactual Regret Minimization

[Figure: the game tree, highlighting the two nodes of P1's information set I_8.]

v_1(I_8 | σ) = Σ_{n ∈ I_8} π^σ_{−1}(root, n) u_1(n)
= 0.5 × 0.5 × 0.2 × (0 × (−1) + 1 × 2) + 0.5 × 0.5 × 0.9 × (0 × (−1) + 1 × 0)
= 0.1

Counterfactual Regret Minimization

Counterfactual Regret
r(I_j, a) = v_i(I_j | σ_{I_j → a}) − v_i(I_j | σ)

- r(I_j, a) is the counterfactual regret of not playing action a at information set I_j
- Positive regret means the player would have preferred to play action a rather than follow their strategy
- Zero regret means the player is indifferent between their strategy and action a
- Negative regret means the player prefers their strategy to playing action a

Counterfactual Regret Minimization

[Figure: the game tree, highlighting the two nodes of P1's information set I_8.]

v_1(I_8 | σ_{I_8 → F}) = 0.5 × 0.5 × 0.2 × (1 × (−1) + 0 × 2) + 0.5 × 0.5 × 0.9 × (1 × (−1) + 0 × 0) = −0.275
r_1(I_8, F) = v_1(I_8 | σ_{I_8 → F}) − v_1(I_8 | σ) = −0.275 − 0.1 = −0.375
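The two counterfactual values and the regret can be checked in a few lines, reading the 0.2/0.9 raise probabilities and P1's always-call strategy at I_8 off the tree above:

# Reach probabilities to the two nodes of I_8, excluding P1's own actions:
# chance (P1's card) x chance (P2's card) x P2's raise probability.
reach_vs_J = 0.5 * 0.5 * 0.2   # P2 holds J and raises with prob 0.2
reach_vs_K = 0.5 * 0.5 * 0.9   # P2 holds K and raises with prob 0.9

v_sigma = reach_vs_J * 2 + reach_vs_K * 0   # always call: +2 vs J, tie vs K
v_fold  = (reach_vs_J + reach_vs_K) * (-1)  # always fold: lose the ante
print(v_sigma, v_fold, v_fold - v_sigma)    # 0.1 -0.275 -0.375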

Counterfactual Regret Minimization

Cumulative Counterfactual Regret
R^T(I_j, a) = Σ_{t=1}^{T} r^t(I_j, a)

- R^T(I_j, a) is the cumulative counterfactual regret of not playing action a at information set I_j over T time steps
- Positive cumulative regret means the player would have preferred to play action a rather than follow their strategy over those T steps
- Zero cumulative regret means the player is indifferent between their strategy and action a over those T steps
- Negative cumulative regret means the player prefers their strategy to playing action a over those T steps

Counterfactual Regret Minimization

Regret Matching
σ^{T+1}(I_j, a) = R^{T,+}(I_j, a) / Σ_{a' ∈ A(I_j)} R^{T,+}(I_j, a')   if the denominator is positive
σ^{T+1}(I_j, a) = 1 / |A(I_j)|   otherwise
where R^{T,+}(I_j, a) = max(R^T(I_j, a), 0)
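In code, regret matching is a couple of lines. Fed the regrets from the running example it returns the uniform distribution (no action has positive regret); with a hypothetical positive regret for C it puts all weight on calling:

def regret_match(cum_regret):
    """Distribution proportional to positive cumulative regret,
    uniform when no action has positive regret."""
    positive = {a: max(r, 0.0) for a, r in cum_regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    return {a: 1.0 / len(cum_regret) for a in cum_regret}

print(regret_match({"F": -0.375, "C": 0.0}))    # no positive regret -> uniform
print(regret_match({"F": -0.375, "C": 0.425}))  # all weight on C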

Counterfactual Regret Minimization

1. Initialise the strategy profile σ, e.g. for all i ∈ N, all I_j ∈ I_i and all a ∈ A(I_j), set σ(I_j, a) = 1/|A(I_j)|
2. For each player i ∈ N, for all I_j ∈ I_i and all a ∈ A(I_j), calculate r(I_j, a) and add it to R(I_j, a)
3. For each player i ∈ N, for all I_j ∈ I_i and all a ∈ A(I_j), use regret matching to update σ(I_j, a)
4. Repeat from 2

(A compact implementation for the one-card poker example follows below.)
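A self-contained vanilla CFR sketch for the one-card poker example (same assumed deck and payoffs as earlier). Each iteration walks the tree for every deal, accumulates counterfactual regrets weighted by the opponent's reach probability, regret matching yields the next strategy, and the average strategy is what converges towards equilibrium:

from itertools import product

CARDS = ("J", "K")

def legal(bets):
    if bets in ("", "C"):   return ("C", "R")
    if bets in ("R", "CR"): return ("F", "C")
    return ()

def u1(c1, c2, bets):                  # payoff to player 1 (assumed ante 1, raise 1)
    if bets.endswith("F"):
        return 1 if len(bets) % 2 == 0 else -1   # +1 iff player 2 folded
    pot = 2 if "R" in bets else 1
    return 0 if c1 == c2 else (pot if c1 == "K" else -pot)

cum_regret, cum_strategy = {}, {}

def matched_strategy(info, acts):
    pos = [max(cum_regret.get((info, a), 0.0), 0.0) for a in acts]
    total = sum(pos)
    return [p / total if total > 0 else 1.0 / len(acts) for p in pos]

def cfr(c1, c2, bets, reach1, reach2):
    """Returns the node value for player 1; reach_i = player i's own reach."""
    acts = legal(bets)
    if not acts:
        return u1(c1, c2, bets)
    player = 1 + len(bets) % 2
    info = ((c1 if player == 1 else c2), bets)   # own card + public betting
    sigma = matched_strategy(info, acts)
    vals = []
    for a, p in zip(acts, sigma):
        if player == 1:
            vals.append(cfr(c1, c2, bets + a, reach1 * p, reach2))
        else:
            vals.append(cfr(c1, c2, bets + a, reach1, reach2 * p))
    node_val = sum(p * v for p, v in zip(sigma, vals))
    my_reach, opp_reach = (reach1, reach2) if player == 1 else (reach2, reach1)
    sign = 1 if player == 1 else -1              # regret from the actor's viewpoint
    for a, p, v in zip(acts, sigma, vals):
        key = (info, a)
        cum_regret[key] = cum_regret.get(key, 0.0) + opp_reach * sign * (v - node_val)
        cum_strategy[key] = cum_strategy.get(key, 0.0) + my_reach * p
    return node_val

for _ in range(20000):
    for c1, c2 in product(CARDS, CARDS):         # uniform chance: constant 0.25 factor
        cfr(c1, c2, "", 1.0, 1.0)

for info in sorted({i for i, _ in cum_strategy}):
    acts = legal(info[1])
    total = sum(cum_strategy[(info, a)] for a in acts)
    print(info, {a: round(cum_strategy[(info, a)] / total, 3) for a in acts})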

Counterfactual Regret Minimization

Cumulative counterfactual regret is bounded by
R^T_i(I_j) ≤ (max_z u_i(z) − min_z u_i(z)) √|A(I_j)| / √T

Total counterfactual regret is bounded by
R^T_i ≤ |I_i| (max_z u_i(z) − min_z u_i(z)) √(max_{h : P(h)=i} |A(h)|) / √T
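Instantiating the bounds for the one-card example (payoff range Δ = 4 since u_1 ∈ [−2, 2], at most 2 actions, and |I_1| = 4 information sets for player 1) shows the O(1/√T) decay:

from math import sqrt

delta, max_actions, n_infosets = 4.0, 2, 4   # one-card poker, player 1
for T in (10**2, 10**4, 10**6):
    per_set = delta * sqrt(max_actions) / sqrt(T)  # per-information-set bound
    print(T, round(per_set, 4), round(n_infosets * per_set, 4))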

Counterfactual Regret Minimization

[Figure (a): number of game states, number of iterations, computation time, and exploitability of the resulting strategy for different sized abstractions. Figure (b): convergence rates for three different sized abstractions; the x-axis shows iterations divided by the number of information sets in the abstraction.]

Source: 2008 - Regret Minimization in Games with Incomplete Information - Zinkevich et al.

Summary

If you want to win (in expectation) at Texas Hold'em poker (against exploitable players) then...
1. Abstract the version of Texas Hold'em poker you are interested in so that it has at most around 10^12 game states
2. Run the counterfactual regret minimization algorithm on the abstraction for T iterations and obtain the average strategy profile σ̄^T_abs
3. Map the average strategy profile σ̄^T_abs for the abstracted game to a profile σ̄^T for the real game
4. Play your average strategy profile σ̄^T against your (exploitable) opponents

References

1. Annual Computer Poker Competition Website - http://www.computerpokercompetition.org/
2. 2008 - Regret Minimization in Games with Incomplete Information - Zinkevich et al. - http://martin.zinkevich.org/publications/regretpoker.pdf
3. 2007 - Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player - Johanson - http://poker.cs.ualberta.ca/publications/johanson.msc.pdf
4. 2013 - Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games - Lanctot - http://era.library.ualberta.ca/public/view/item/uuid:48ae86c-45-4c-b9c-3e7ce9bc9ae