Tartanian5: A Heads-Up No-Limit Texas Hold'em Poker-Playing Program

Sam Ganzfried and Tuomas Sandholm
Computer Science Department
Carnegie Mellon University
{sganzfri, sandholm}@cs.cmu.edu

This material is based upon work supported by the National Science Foundation under grants IIS, IIS, and CCF. We also acknowledge Intel Corporation and IBM for their machine gifts. Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We present an overview of Tartanian5, a no-limit Texas Hold'em agent which we submitted to the 2012 Annual Computer Poker Competition. The agent plays a game-theoretic approximate Nash equilibrium strategy. First, it applies a potential-aware, perfect-recall, automated abstraction algorithm to group similar game states together and construct a smaller game that is strategically similar to the full game. In order to maintain a tractable number of possible betting sequences, it employs a discretized betting model, where only a small number of bet sizes are allowed at each game state. The strategies for both players are then computed using an improved version of Nesterov's excessive gap technique specialized for poker. To mitigate the effect of overfitting, we employ an ex-post purification procedure to remove actions that are played with small probability. One final feature of our agent is a novel algorithm for interpreting bet sizes of the opponent that fall outside our model. We describe our new approach in detail, and present theoretical and empirical advantages over prior approaches. Finally, we briefly describe ongoing research in our group involving real-time computation and opponent exploitation, which will hopefully be incorporated into our agents in future years.

Introduction

Developing effective strategies for agents in multiagent systems is an important and challenging problem. It has received significant attention in recent years from several different communities, in part due to the competitions held at top conferences (e.g., the computer poker, robo-soccer, and trading agent competitions). As many domains are so large that solving them directly (i.e., computing a Nash equilibrium or a solution according to some other relevant solution concept) is computationally infeasible, some amount of approximation is necessary to produce agents. Specifically, significant work has been done on computing approximate game-theory-based strategies in large imperfect-information games (Sandholm 2010). This work typically follows a three-step approach, which is depicted in Figure 1. First, a smaller game G' is constructed which is strategically similar to the original game G. Initially this was done by a hand-crafted abstraction procedure (Billings et al. 2003), and more recently it has been done by automated abstraction algorithms (Gilpin and Sandholm 2006; 2007; Shi and Littman 2002). Second, an equilibrium-finding algorithm is run on G' to compute an ε-equilibrium σ' (Hoda et al. 2010; Zinkevich et al. 2007). Third, a reverse mapping is applied to σ' to compute an approximate equilibrium σ in the full game G (Gilpin, Sandholm, and Sørensen 2008; Schnizlein, Bowling, and Szafron 2009).

[Figure 1: General game-theoretic approach for solving large games: original game -> (abstraction algorithm) -> abstracted game -> (custom equilibrium-finding algorithm) -> Nash equilibrium of abstracted game -> (reverse mapping) -> Nash equilibrium of original game.]
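A minimal sketch of this three-step pipeline is shown below. The function parameters and the dictionary-style strategy lookup are illustrative placeholders rather than the actual Tartanian5 implementation.

# Minimal sketch of the three-step approach in Figure 1. The helpers are
# passed in as parameters because they stand in for entire subsystems
# (abstraction, equilibrium finding); nothing here is the real Tartanian5 code.

def solve_large_game(full_game, build_abstraction, find_equilibrium):
    # Step 1: build a smaller, strategically similar game G' plus a map from
    # full-game states to abstract states.
    abstract_game, to_abstract_state = build_abstraction(full_game)

    # Step 2: compute an (approximate) equilibrium sigma' of the abstract game.
    sigma_abstract = find_equilibrium(abstract_game)

    # Step 3: reverse mapping -- interpret each full-game state through the
    # abstraction to obtain a strategy sigma for the full game G.
    def sigma(full_game_state):
        return sigma_abstract[to_abstract_state(full_game_state)]

    return sigma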
No-limit Texas Hold'em poker is an especially challenging domain, as its game tree contains approximately 10^71 states (Gilpin, Sandholm, and Sørensen 2008). (This figure is for the two-player version with starting stacks of 500 big blinds where stacks are reset after each hand. Initially the Annual Computer Poker Competition used 500 big blind starting stacks, but for the last few years 200 big blind stacks have been used; thus, the number of game states in the competition variant is actually smaller.) State-of-the-art equilibrium-finding algorithms can only scale to games with approximately 10^12 game tree states (Hoda et al. 2010; Zinkevich et al. 2007); hence, significant abstraction and approximation is necessary to apply a game-theoretic approach.
Interestingly, only two agents from the 2011 competition were based on approximate equilibrium strategies, while 4 of the 7 agents that reported a description used hand-crafted expert systems. The final agent used a case-based approach designed to imitate the strategies of top-performing agents in previous years' competitions (Rubin and Watson 2012). In 2010, two of the five agents submitted to the two-player no-limit competition employed approximate equilibrium strategies, including the previous variant of our program, Tartanian4, which finished in first place in the total bankroll competition and in third place in the bankroll instant run-off competition.

In this paper we describe our new no-limit approximate-equilibrium agent, Tartanian5. The primary components of our agent are a potential-aware automated abstraction algorithm, a discretized betting model, a custom equilibrium-finding algorithm, an ex-post purification procedure, and a novel reverse mapping algorithm. We describe our new reverse mapping algorithm in detail, as it has changed significantly from the algorithm used in our prior agents. We present theoretical and empirical advantages of our new approach over prior approaches in simplified settings, which we expect to translate directly to improved performance in Texas Hold'em.

An overview of Tartanian5

In this section we present the different technical components of our agent Tartanian5, and discuss the key differences from prior versions.

Automated card abstraction

In order to reduce the size of the game tree to make equilibrium computation feasible, we applied a potential-aware automated abstraction algorithm (Gilpin, Sandholm, and Sørensen 2007). The algorithm uses clustering to group together hands that have similar histograms over future outcomes, and integer programming to determine how many children to allocate to each node. Many abstraction algorithms group together hands that have similar expected hand strength against a uniform rollout of the opponent's hands. Our approach takes into account that the strength of hands can vary significantly over the course of a hand, and that some hands with similar expected hand strength should be played quite differently. For example, QJ suited and 55 both win about 60% of the time against a random hand, but should be played very differently. Our algorithm uses perfect recall, in which each player is forced to remember his hole cards and the exact sequence of community cards throughout the hand. Future work will involve considering imperfect recall abstractions, in which agents potentially forget information that they once knew (Waugh et al. 2009).
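As a rough, self-contained illustration of the clustering idea only (not the actual algorithm, which works over all betting rounds and uses integer programming to allocate buckets), the sketch below groups hands by k-means over toy histograms of future outcomes; the histograms and bucket counts are made up for illustration.

import numpy as np

# Each hand is represented by a histogram over possible future states (here:
# three toy future-equity buckets); hands with similar histograms share an
# abstract bucket. Purely illustrative, not the Tartanian5 abstraction.

def kmeans_histograms(histograms, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = histograms[rng.choice(len(histograms), size=k, replace=False)]
    for _ in range(iters):
        # Assign each hand's histogram to the nearest cluster center (L2).
        d = np.linalg.norm(histograms[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean histogram of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = histograms[labels == j].mean(axis=0)
    return labels, centers

# Toy example: 4 hands, histograms over 3 future-outcome buckets.
hands = np.array([
    [0.10, 0.20, 0.70],   # mostly strong future outcomes
    [0.70, 0.20, 0.10],   # mostly weak future outcomes
    [0.15, 0.25, 0.60],
    [0.65, 0.25, 0.10],
])
labels, _ = kmeans_histograms(hands, k=2)
print(labels)  # hands with similar future-outcome distributions share a bucket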
The previous version of our program, Tartanian4, used a potential-aware card abstraction with branching factors of 1, 5, 6, and 6 for the four betting rounds. Perhaps counterintuitively, this year we decided to use a smaller abstraction; Tartanian5 uses an abstraction with branching factors of 8, 1, 4, and 4. The reason for this change was that we realized the abstraction was so large that our equilibrium-finding algorithm was not able to converge sufficiently fast. Define the exploitability of strategy σ_i to be the difference between the value of the game to player i, v_i, and the performance of σ_i against its nemesis; formally,

expl(σ_i) = v_i - min_{σ_{-i}} u_i(σ_i, σ_{-i}).

Define the total exploitability of an agent which plays strategy σ_1 as player 1 and strategy σ_2 as player 2 to be expl(σ_1) + expl(σ_2). Even within its abstracted game, Tartanian4 had an exploitability of approximately 0.8 big blinds per hand. This is noteworthy for several reasons. First, notice that the strategy of always folding would have an exploitability of 0.75 big blinds per hand (since it would lose 0.5 big blinds when it is in the small blind and 1 big blind when it is in the big blind). So our program, which came in first place in the 2010 no-limit Texas Hold'em division of the computer poker competition, was more exploitable than always folding! Furthermore, the exploitability of 0.8 BB/hand assumed that the opponent was restricted to the same card and betting abstraction; the full-game exploitability of our strategies was probably significantly higher. This indicates that there is still a long way to go in terms of developing strong game-theoretic agents for no-limit Texas Hold'em. It also suggests that perhaps worst-case exploitability is not the most useful metric for evaluating the quality of strategies.

This year, we decided to use a smaller abstraction that had a smaller exploitability within its abstraction. At the time of our decision, one candidate abstraction had an exploitability of 0.55 BB/hand, another had an exploitability of 0.13 BB/hand, and the Tartanian4 abstraction had an exploitability of 0.8 BB/hand. We played these three agents against each other, and the agent with the intermediate abstraction size and exploitability beat the other two, so we chose that abstraction size. The current exploitability of our strategy is approximately 0.3 BB/hand within the abstraction.

Discretized betting model

In limit Texas Hold'em there is a fixed bet size; thus at each game state a player must choose between 3 options: folding, calling/checking, or betting. By contrast, in no-limit Texas Hold'em a player can bet any amount (up to the number of chips in his stack) at any game state. This causes an enormous blowup in the size of the game tree. While two-player limit Texas Hold'em has about 10^18 game states in its tree, two-player no-limit Texas Hold'em has about 10^71 states. Thus, in order to develop a no-limit agent, some form of betting abstraction or discretization is needed. Our approach is to discretize the betting space into a small number of allowable bet sizes at each game state (Gilpin, Sandholm, and Sørensen 2008). Folding, calling/checking, and going all-in are available at every game state. In addition, one or two intermediate bet sizes are also available at each game state (which ones are available depends on the betting history so far). These include bet sizes of 1/2-pot, 2/3-pot, and pot. This year we made some changes to our discretization, and allowed bets of 1.5-pot and 2-pot in certain situations.
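A minimal sketch of what such a discretized betting model can look like is below; the particular pot fractions offered and the condition gating the 1.5-pot and 2-pot overbets are assumptions for illustration, not Tartanian5's actual rules.

# Illustrative discretized betting model: at each decision point only a small
# set of actions is legal in the abstraction -- fold, check/call, a few
# pot-fraction bets, and all-in. The fractions and the overbet condition are
# placeholder assumptions.

def abstract_actions(pot, stack, allow_overbets=False):
    fractions = [0.5, 2.0 / 3.0, 1.0]
    if allow_overbets:
        fractions += [1.5, 2.0]
    actions = ["fold", "check/call"]
    for frac in fractions:
        bet = frac * pot
        if bet < stack:          # only include sizes below the all-in amount
            actions.append(("bet", round(bet, 2)))
    actions.append(("all-in", stack))
    return actions

print(abstract_actions(pot=10, stack=200, allow_overbets=True))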
Equilibrium finding

Once we fixed our card abstraction and betting model, we ran a Perl script which automatically generates C++ code for running an equilibrium-finding algorithm on the abstract game (Gilpin, Sandholm, and Sørensen 2008). The algorithm we used is a gradient-based algorithm which uses an improved version of Nesterov's excessive gap technique specialized for poker (Hoda et al. 2010). The computation was run using 64 cores at the Pittsburgh Supercomputing Center on the world's largest shared-memory supercomputer (called Blacklight).

Purification

Rather than play the exact strategies output by our equilibrium-finding algorithm, we decided to modify the action probabilities by preferring the higher-probability actions. The intuition for doing this is that we should ignore actions that are played with small probability in the abstract equilibrium, as they are likely due to abstraction coarseness, overfitting, or failure of the equilibrium-finding algorithm to fully converge (Ganzfried, Sandholm, and Waugh 2012). In 2010, we submitted two versions of our Tartanian4 agent: one that employed full purification, and one that employed thresholding using a threshold of 0.15. The purification agent played the highest-probability action with probability 1 at each information set, and played the other actions with probability 0 (randomizing for ties). The thresholded agent rounded all action probabilities below 0.15 to 0, then renormalized. The purification agent beat the thresholded agent head-to-head, and outperformed it against all competition opponents except one. Thus, we decided to use full purification in Tartanian5.

Reverse mapping

While the strategies we computed assume the opponent is following our discretized betting model, our actual opponent may in fact make bets of sizes that fall outside of our model. Thus a reverse mapping algorithm is necessary to map his bet size to one of the bet sizes in our model. Tartanian5 uses a new reverse mapping algorithm that differs from our algorithm of previous years, as well as from the algorithms used by prior agents.

Discussion of our new reverse mapping algorithm and comparison to prior approaches

In this section, we present our new reverse mapping and discuss its advantages over prior approaches. We first present several basic properties that a reverse mapping should satisfy, and show that each of the prior mappings either fails to satisfy some of the properties, or produces strategies that are highly exploitable. Furthermore, the prior mappings are based on heuristics and do not have any theoretical justification. By contrast, our mapping satisfies all of the properties, has low exploitability, and is theoretically motivated as the solution to simplified poker games.

Problem formulation

Suppose our opponent makes a bet of size x ∈ [A, B], where A denotes the largest bet size in our abstraction less than or equal to x, and B denotes the smallest bet size in our abstraction greater than or equal to x (we require that 0 ≤ A < B). The reverse mapping problem is to determine whether we should map x to A or to B (perhaps probabilistically). Thus, our goal is to find a function f_{A,B}(x), which denotes the probability that we map x to A (1 - f_{A,B}(x) denotes the probability that we map x to B); this is our reverse mapping function. We assume the pot size is 1, and that all values have been normalized accordingly. For simplicity, we will sometimes write f_{A,B} just as f.
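The sketch below shows how such a reverse mapping f_{A,B} is used at play time: the observed bet is bracketed by the nearest abstraction sizes A and B and is then interpreted probabilistically. The bet-size grid and the mapping passed in are placeholders, not Tartanian5's actual abstraction.

import bisect
import random

# Given an observed pot-normalized bet x, find the abstraction sizes A <= x <= B
# and interpret x as A with probability f(A, B, x), otherwise as B.

def translate_bet(x, abstraction_sizes, f, rng=random):
    sizes = sorted(abstraction_sizes)
    if x <= sizes[0]:
        return sizes[0]
    if x >= sizes[-1]:
        return sizes[-1]
    i = bisect.bisect_left(sizes, x)
    A, B = sizes[i - 1], sizes[i]
    if x == B:                      # exact match with an abstract size
        return B
    return A if rng.random() < f(A, B, x) else B

# Example with the randomized arithmetic mapping f_{A,B}(x) = (B - x)/(B - A),
# on a toy grid of pot-normalized bet sizes.
arith = lambda A, B, x: (B - x) / (B - A)
print(translate_bet(0.8, [0, 0.5, 1.0, 2.0, 20.0], arith))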
Basic properties a reverse mapping algorithm should have

It seems clear that every reverse mapping function should satisfy the following basic properties:

1. f_{A,B}(A) = 1
2. f_{A,B}(B) = 0
3. f_{A,B} is monotonically decreasing
4. For all k > 0 and x ∈ [A, B], f_{kA,kB}(kx) = f_{A,B}(x)

Boundary constraints

If the opponent takes an action that is actually in our abstraction, then it is natural to map his action x to the corresponding action with probability 1. Hence we require that f(A) = 1 and f(B) = 0.

Monotonicity

As the opponent's action x moves away from A towards B, it is natural to require that the probability of his action being mapped to A decreases. Thus we require that f be monotonically decreasing.

Scale invariance

This condition requires that scaling A, B, and x by some multiplicative factor k > 0 does not affect our reverse mapping. In poker, for example, it is common to scale all bet sizes by the size of the big blind or the size of the pot.

Survey of existing reverse mapping algorithms

In this section, we present several reverse mapping algorithms that have been proposed in the literature (Gilpin, Sandholm, and Sørensen 2008; Rubin and Watson 2012; Schnizlein, Bowling, and Szafron 2009). For each algorithm, we check whether it satisfies the above properties, and discuss additional strengths and limitations.

Deterministic arithmetic

The deterministic arithmetic algorithm is perhaps the simplest. If x < (A + B)/2, then x is mapped to A with probability 1; otherwise x is mapped to B with probability 1. This mapping satisfies Properties 1, 2, and 4, but not Property 3 (it is constant from x = A to (A + B)/2 and from (A + B)/2 to B, and is therefore not monotonically decreasing). Furthermore, it is potentially highly exploitable. For example, suppose A is a pot-size bet and B is an all-in. Then the opponent could significantly exploit us by betting slightly under (A + B)/2 with his strong hands.
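A small numerical check of these properties (it samples a grid rather than proving anything, and reads Property 3 as strictly decreasing) reproduces the observation that the deterministic arithmetic mapping satisfies Properties 1, 2, and 4 but not 3.

# Numerical sanity check of the four properties for a candidate mapping
# f(A, B, x). This is only a grid-based check, not a proof.

def det_arithmetic(A, B, x):
    # Deterministic arithmetic mapping: threshold at the midpoint (A + B)/2.
    return 1.0 if x < (A + B) / 2 else 0.0

def check_properties(f, A=1.0, B=4.0, k=2.5, n=200, tol=1e-9):
    xs = [A + (B - A) * i / n for i in range(n + 1)]
    vals = [f(A, B, x) for x in xs]
    return {
        "boundary (1 and 2)": abs(f(A, B, A) - 1.0) < tol and abs(f(A, B, B)) < tol,
        "strictly decreasing (3)": all(u > v for u, v in zip(vals, vals[1:])),
        "scale invariant (4)": all(abs(f(k * A, k * B, k * x) - f(A, B, x)) < tol
                                   for x in xs),
    }

print(check_properties(det_arithmetic))
# -> boundary and scale invariance hold, but strict monotonicity fails.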
Randomized arithmetic

f_{A,B}(x) = (B - x) / (B - A)

This mapping satisfies all four properties, but it is exploitable in the same way as the deterministic arithmetic mapping (though to a lesser extent).

Deterministic geometric

The motivation behind this algorithm is to reduce the level of exploitability of the arithmetic approach by placing the threshold x* at the point where the ratios of x to A and of B to x are the same (rather than the differences). Specifically, if A/x > x/B, then x is mapped to A with probability 1; otherwise x is mapped to B with probability 1. Thus, the threshold will be x* = sqrt(AB) rather than (A + B)/2. This diminishes the effectiveness of the exploitation described above, namely making a large value bet just below the threshold. Like the deterministic arithmetic mapping, this mapping satisfies Properties 1, 2, and 4, but not Property 3. An additional undesirable property is that f(x) = 0 for all x when A = 0 (i.e., when A corresponds to the check action). If A = 0, B = 1, and x = 0.1, for example, we will believe the opponent's small bet of 0.1 times the pot is actually a full pot-sized bet. This function also behaves undesirably as A -> 0.

Randomized geometric 1

g_{A,B}(x) = (A/x - A/B) / (1 - A/B)
h_{A,B}(x) = (x/B - A/B) / (1 - A/B)
f_{A,B}(x) = g_{A,B}(x) / (g_{A,B}(x) + h_{A,B}(x))

Like the deterministic geometric mapping, this violates monotonicity at A = 0. This mapping has been used in previous competition agents (e.g., Hyperborean from Alberta (Schnizlein, Bowling, and Szafron 2009) and Sartre from Auckland (Rubin and Watson 2012)).

Randomized geometric 2

f_{A,B}(x) = AB(A + B) / ((B - A)(x^2 + AB)) + A/(A - B)

This mapping behaves similarly to the first randomized geometric mapping, and also violates monotonicity at A = 0. This was the mapping we used in our Tartanian4 agent. Note that both randomized geometric mappings have f_{A,B}(sqrt(AB)) = 1/2.

Our new reverse mapping algorithm

The motivating problem for our new reverse mapping is a half-street no-limit clairvoyance game (Ankenman and Chen 2006): There are two players, P1 and P2. The initial size of the pot is $1. Both players have stacks of size n. P1 is clairvoyant, and his hand is randomly drawn from a distribution that is half winning hands and half losing hands. P1 is allowed to bet a fixed amount s, where s ∈ [0, n]. P2 is allowed to call or fold.

For a given bet size s, equilibrium strategies for both players are as follows (Ankenman and Chen 2006):

- P1 bets s with probability 1 with a winning hand.
- P1 bets s with probability s/(1 + s) with a losing hand (and checks otherwise).
- P2 calls with probability 1/(1 + s).

In fact, these betting and calling frequencies are shown to be optimal in many other no-limit poker games as well (Ankenman and Chen 2006). Using this as motivation, our reverse mapping will be the solution to

f_{A,B}(x)/(1 + A) + (1 - f_{A,B}(x))/(1 + B) = 1/(1 + x).

Specifically, our mapping is

f_{A,B}(x) = (B - x)(1 + A) / ((B - A)(1 + x)).

This is the only mapping consistent with calling a bet of size x with probability 1/(1 + x) for all x ∈ [A, B]. This mapping satisfies all of the properties, is well-behaved as A -> 0, and is not as susceptible to the exploitations previously described. The threshold x* for which f_{A,B}(x*) = 1/2 is

x* = (A + B + 2AB) / (A + B + 2).
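For reference, the four randomized mappings can be implemented directly from the formulas above; the short script below also checks the calling-frequency property that motivates the new mapping and prints the thresholds for the A = 0.01, B = 1 example discussed next.

import math

# The four randomized reverse mappings, written directly from the formulas in
# the text (pot-normalized values, 0 <= A < B, x in [A, B]).

def rand_arithmetic(A, B, x):
    return (B - x) / (B - A)

def rand_geometric_1(A, B, x):
    g = (A / x - A / B) / (1 - A / B)
    h = (x / B - A / B) / (1 - A / B)
    return g / (g + h)

def rand_geometric_2(A, B, x):
    return A * B * (A + B) / ((B - A) * (x * x + A * B)) + A / (A - B)

def new_mapping(A, B, x):
    # The mapping introduced above: f = (B - x)(1 + A) / ((B - A)(1 + x)).
    return (B - x) * (1 + A) / ((B - A) * (1 + x))

A, B = 0.01, 1.0
for name, f in [("rand-arith", rand_arithmetic),
                ("rand-geo-1", rand_geometric_1),
                ("rand-geo-2", rand_geometric_2),
                ("new", new_mapping)]:
    print(name, round(f(A, B, 0.1), 3))
# Both geometric mappings give 0.5 at x = 0.1 = sqrt(A*B); the arithmetic and
# new mappings still send a 0.1-pot bet to A most of the time.

# Thresholds where each mapping equals 1/2: the arithmetic midpoint, sqrt(A*B)
# for the geometric mappings, and (A + B + 2AB)/(A + B + 2) for the new one.
print((A + B) / 2, math.sqrt(A * B), (A + B + 2 * A * B) / (A + B + 2))

# The new mapping solves f/(1 + A) + (1 - f)/(1 + B) = 1/(1 + x), i.e., the
# opponent's bet of x ends up being called with probability 1/(1 + x).
x = 0.3
f = new_mapping(A, B, x)
assert abs(f / (1 + A) + (1 - f) / (1 + B) - 1 / (1 + x)) < 1e-12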
Examples

In Figure 2 we plot all four randomized reverse mapping algorithms using A = 0.01 and B = 1. As the figure shows, both of the randomized geometric algorithms map a bet size of 0.1-pot to A and B with roughly equal probability, while the corresponding threshold of the arithmetic algorithm is around 0.5-pot and that of the new algorithm is around 0.35-pot. In this case, the algorithms differ quite significantly. In Figure 3, we plot the algorithms using A = 1 and B = 4. In this case our mapping is relatively similar to the geometric mappings, while the arithmetic mapping differs significantly from the others.

[Figure 2: Plots of the four randomized reverse mapping algorithms using A = 0.01 and B = 1.]

[Figure 3: Plots of the four randomized reverse mapping algorithms using A = 1 and B = 4.]
Additional research relevant to poker and imperfect-information games

In addition to the research that went into Tartanian5, our group has worked on several other topics highly related to imperfect-information games, and to poker in particular.

We have developed algorithms for computing an ε-equilibrium in the endgame of a 3-player no-limit Texas Hold'em tournament for provably small ε (Ganzfried and Sandholm 2008; 2009).
We have developed an algorithm for quickly solving river subgames in real time in limit Texas Hold'em that exploits qualitative information on the structure of equilibrium strategies (Ganzfried and Sandholm 2010).

We have developed an opponent exploitation algorithm that leads to improved performance against a variety of agents in two-player limit Texas Hold'em (Ganzfried and Sandholm 2011). It works by constructing a model of the opponent based on deviations from a precomputed approximate equilibrium, and computing approximate best responses in real time.

We have developed algorithms for exploiting suboptimal opponents that guarantee at least the value of the game in expectation (Ganzfried and Sandholm 2012). Our algorithms led to a significant performance improvement against several opponent classes in Kuhn poker.

Conclusion

We introduced our no-limit Texas Hold'em agent Tartanian5 and described its primary components: an automated abstraction algorithm, a discretized betting model, a custom equilibrium-finding algorithm, an ex-post purification procedure, and a new reverse mapping algorithm. We also briefly described ongoing research in our group involving real-time computation and opponent exploitation, which will hopefully be incorporated into our agents in future years.

We described several theoretical and empirical advantages of our new reverse mapping algorithm over prior approaches. Specifically, all prior mappings either fail to satisfy some basic properties or are highly exploitable; furthermore, they are all ad hoc and lack any theoretical justification. By contrast, our mapping satisfies all of the properties, has low exploitability, and is theoretically motivated as the solution to simplified poker games. Future work will involve doing a thorough comparison of the various approaches against agents submitted to this year's competition.

References

Ankenman, J., and Chen, B. 2006. The Mathematics of Poker. ConJelCo LLC.

Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer, J.; Schauenberg, T.; and Szafron, D. 2003. Approximating game-theoretic optimal strategies for full-scale poker. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI).

Ganzfried, S., and Sandholm, T. 2008. Computing an approximate jam/fold equilibrium for 3-player no-limit Texas Hold'em tournaments. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).

Ganzfried, S., and Sandholm, T. 2009. Computing equilibria in multiplayer stochastic games of imperfect information. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI).

Ganzfried, S., and Sandholm, T. 2010. Computing equilibria by incorporating qualitative models. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).

Ganzfried, S., and Sandholm, T. 2011. Game theory-based opponent modeling in large imperfect-information games. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).

Ganzfried, S., and Sandholm, T. 2012. Safe opponent exploitation. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC).

Ganzfried, S.; Sandholm, T.; and Waugh, K. 2012. Strategy purification and thresholding: Effective non-equilibrium approaches for playing large games. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
Gilpin, A., and Sandholm, T. 2006. A competitive Texas Hold'em poker player via automated abstraction and real-time equilibrium computation. In Proceedings of the National Conference on Artificial Intelligence (AAAI).

Gilpin, A., and Sandholm, T. 2007. Lossless abstraction of imperfect information games. Journal of the ACM 54(5).

Gilpin, A.; Sandholm, T.; and Sørensen, T. B. 2007. Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of Texas Hold'em poker. In Proceedings of the National Conference on Artificial Intelligence (AAAI).

Gilpin, A.; Sandholm, T.; and Sørensen, T. B. 2008. A heads-up no-limit Texas Hold'em poker player: Discretized betting models and automatically generated equilibrium-finding programs. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).

Hoda, S.; Gilpin, A.; Peña, J.; and Sandholm, T. 2010. Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research 35(2).

Rubin, J., and Watson, I. 2012. Case-based strategies in computer poker. AI Communications 25(1).

Sandholm, T. 2010. The state of solving large incomplete-information games, and application to poker. AI Magazine, Special Issue on Algorithmic Game Theory.

Schnizlein, D.; Bowling, M.; and Szafron, D. 2009. Probabilistic state translation in extensive games with large action sets. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI).

Shi, J., and Littman, M. 2002. Abstraction methods for game theoretic poker. In CG '00: Revised Papers from the Second International Conference on Computers and Games. Springer-Verlag.

Waugh, K.; Zinkevich, M.; Johanson, M.; Kan, M.; Schnizlein, D.; and Bowling, M. 2009. A practical use of imperfect recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA).

Zinkevich, M.; Bowling, M.; Johanson, M.; and Piccione, C. 2007. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS).