Monte-Carlo Tree Search (MCTS) for Computer Go



Bruno Bouzy
bruno.bouzy@parisdescartes.fr
Université Paris Descartes
BigMC Seminar, 5 May 2011

Outline
The game of Go: a 9x9 game
The «old» approach (*-2002)
The Monte-Carlo approach (2002-2005)
The MCTS approach (2006-today)
Conclusion

The game of Go
4000 years
Originated from China
Developed by Japan (20th century)
Best players in Korea, Japan, China
19x19: official board size
9x9: beginners' board size

A 9x9 game
The board has 81 «intersections». Initially, it is empty.
Black moves first. A «stone» is played on an intersection.
White moves second.
Moves alternate between Black and White.
Two adjacent stones of the same color build a «string» with «liberties» (4-adjacency).
Strings are created.
A white stone is in «atari» (one liberty).
The white string has five liberties.
The black stone is in «atari».
White «captures» the black stone.
For human players, the game is over. Huh? Why?
What happens if White contests black «territory»?
White has invaded. Two strings are in atari!
Black captures!
White insists but its string is in atari...
Black has proved its «territory».
Black may contest white territory too.

A 9x9 terminal position
The game is over for computers. Huh? Who won?

A 9x9 game
The game ends when both players pass.
One black (resp. white) point for each black (resp. white) stone and each black (resp. white) «eye» on the board.
One black (resp. white) eye = an empty intersection surrounded by black (resp. white) stones.

A 9x9 game
Scoring:
  Black = 44
  White = 37
  Komi = 7.5
  Score = 44 - 37 - 7.5 = -0.5
White wins!
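The scoring arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `final_score` and the sign convention (positive means Black wins) are assumptions made for illustration, chosen so that the result matches the slide's Score = -0.5.

```python
def final_score(black_points, white_points, komi):
    """Black's margin after komi: positive means Black wins, negative means White wins."""
    return black_points - white_points - komi


score = final_score(black_points=44, white_points=37, komi=7.5)
winner = "Black" if score > 0 else "White"
print(score, winner)
```

With a half-point komi the score can never be zero, so the game always has a winner.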

Go ranking: «kyu» and «dan»
[Figure: ranking scale, strongest at the top.
  Pro ranking: 9 dan down to 1 dan
  Amateur ranking: 9 dan, 6 dan, 1 dan, 1 kyu, 10 kyu, 20 kyu, 30 kyu
  Player categories, from top to bottom: top professional players, very strong players, strong players, average players, beginners, very beginners]

Computer Go (old history)
First go program (Lefkovitz 1960)
Zobrist hashing (Zobrist 1969)
Interim2 (Wilcox 1979)
Life and death model (Benson 1988)
Patterns: Goliath (Boon 1990)
Mathematical Go (Berlekamp 1991)
Handtalk (Chen 1995)

The old approach
Evaluation of non-terminal positions
  Knowledge-based
  Breaking-down of a position into sub-positions
Fixed-depth global tree search
  Depth = 0: action with the best value
  Depth = 1: action leading to the position with the best evaluation
  Depth > 1: alpha-beta or minimax

The old approach
[Figure labels: current position; 361; bounded-depth tree search (depth 2 or 3); evaluation of non-terminal positions; «Huhu?»; terminal positions]

Position evaluation
Break-down
  Whole game (win/loss or score)
  Goal-oriented sub-game
Local searches
  String capture
  Connections, dividers, eyes, life and death
  Alpha-beta and enhancements
  Proof-number search

A 19x19 middle-game position
[Figure]

A possible black break-down
[Figure]

A possible white break-down
[Figure]

Possible local evaluations (1)
[Figure labels: alive and territory; unstable; not important; alive; dead; alive]

Possible local evaluations (2)
[Figure labels: alive; unstable; unstable; alive + big territory; unstable]

Position evaluation
Local results
  Obtained with local tree search
  Result if white plays first (resp. black)
  Combinatorial game theory (Conway)
  Switches {a|b}, >, <, *, 0
Global recomposition
  move generation and evaluation
  position evaluation

Position evaluation
[Figure]

Drawbacks (1/2)
The break-down is not unique
Performing a (wrong) local tree search on a (possibly irrelevant) local position
Misevaluating the size of the local position
Different kinds of local information
  Symbolic (group: dead, alive, unstable)
  Numerical (territory size, reduction, increase)

Drawbacks (2/2)
Local positions interact
Complicated
Domain-dependent knowledge
  Need of human expertise
  Difficult to program and maintain
  Holes of knowledge
  Erratic behaviour

Upsides
Feasible on 1990's computers
Execution is fast
Some specific local tree searches are accurate and fast

The old approach
[Figure: the same ranking scale (pro 9 dan to 1 dan; amateur 9 dan, 6 dan, 1 dan, 1 kyu, 10 kyu, 20 kyu, 30 kyu) with the «old approach» marked in the kyu range, next to 10 kyu]

End of part one!
Next: the Monte-Carlo approach...

The Monte-Carlo (MC) approach
Games containing chance
  Backgammon (Tesauro 1989)
Games with hidden information
  Bridge (Ginsberg 2001)
  Poker (Billings & al. 2002)
  Scrabble (Sheppard 2002)

The Monte-Carlo approach
Games with complete information
  A general model (Abramson 1990)
Simulated annealing Go (Brügmann 1993)
  2 sequences of moves
  «all moves as first» heuristic
  Gobble on 9x9

The Monte-Carlo approach
Position evaluation:
  Launch N random games
  Evaluation = mean value of outcomes
Depth-one MC algorithm:
  For each move m {
    Play m on the ref position
    Launch N random games
    Move value (m) = mean value
  }
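The depth-one algorithm above can be sketched in Python. To stay self-contained, the sketch does not use Go: it assumes a toy take-away game (one pile of stones, each player removes 1 to 3, whoever takes the last stone wins). The names `legal_moves`, `random_playout` and `depth_one_mc` are illustrative, not taken from any program cited in the talk.

```python
import random


def legal_moves(stones):
    """Toy game (an assumption, not Go): remove 1, 2 or 3 stones from the pile."""
    return [k for k in (1, 2, 3) if k <= stones]


def random_playout(stones, to_move, rng):
    """Play uniformly random moves; return the player (0 or 1) who takes the last stone.

    Requires stones > 0.
    """
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return to_move
        to_move = 1 - to_move


def depth_one_mc(stones, n_games, rng):
    """For each move m of player 0: play m, launch n_games random games, keep the mean."""
    values = {}
    for m in legal_moves(stones):
        rest = stones - m
        if rest == 0:
            values[m] = 1.0  # player 0 took the last stone: immediate win
            continue
        wins = sum(random_playout(rest, 1, rng) == 0 for _ in range(n_games))
        values[m] = wins / n_games
    return values


values = depth_one_mc(stones=5, n_games=2000, rng=random.Random(0))
best = max(values, key=values.get)
print(values, best)
```

From a pile of 5, taking one stone leaves the opponent with 4, which is the move the mean values should favour.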

Depth-one Monte-Carlo
[Figure]

Progressive pruning (Billings 2002, Sheppard 2002, Bouzy & Helmstetter 2003)
[Figure labels: current best move; second best; still explored; pruned]

Upper bound
Optimism in face of uncertainty
  Intestim (Kaelbling 1993), UCB multi-armed bandit (Auer & al 2002)
[Figure labels: current best proven move; current best promising move; second best promising]

All-moves-as-first heuristic (1/3)
[Figure: root r with candidate moves A, B, C, D]

All-moves-as-first heuristic (2/3)
[Figure: an actual simulation played from the root r down to an outcome o]

All-moves-as-first heuristic (3/3)
[Figure: the actual simulation next to a virtual simulation]
Virtual simulation = actual simulation assuming C is played «as first»
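One way to read the all-moves-as-first idea in code: after a simulation, its outcome is credited to every move the first player made in it, as if each had been played first. The sketch below is a hedged illustration, not the bookkeeping of any cited program; `amaf_update` and the layout of `stats` (move to `[count, sum of outcomes]`) are assumptions.

```python
from collections import defaultdict


def amaf_update(stats, simulation, outcome):
    """Credit `outcome` to every distinct move the first player made in `simulation`.

    simulation: list of moves with players alternating, first player first.
    stats: move -> [number of simulations credited, sum of outcomes].
    """
    own_moves = simulation[0::2]  # moves at even indices belong to the first player
    for move in dict.fromkeys(own_moves):  # each distinct move once, order preserved
        stats[move][0] += 1
        stats[move][1] += outcome


stats = defaultdict(lambda: [0, 0.0])
amaf_update(stats, ["A", "B", "C", "D"], outcome=1.0)  # first player played A and C
amaf_update(stats, ["C", "D", "A", "B"], outcome=0.0)  # first player played C and A
print(dict(stats))
```

Two simulations yield two samples for both A and C, where a plain depth-one estimate would give one sample to A and one to C.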

The Monte-Carlo approach
Upsides
  Robust evaluation
  Global search
  Move quality increases with computing power
Way of playing
  Good strategic sense but weak tactically
Easy to program
  Follow the rules of the game
  No break-down problem

Monte-Carlo and knowledge
Pseudo-random simulations using Go knowledge (Bouzy 2003)
  Moves played with a probability depending on specific domain-dependent knowledge
2 basic concepts
  string capture
  3x3 shapes

Monte-Carlo and knowledge
Results are impressive
  MC(random) << MC(pseudo random)
  Size     9x9   13x13   19x19
  % wins   68    93      98
Other works on simulations
  Patterns in MoGo, proximity rule (Wang & al 2006)
  Simulation balancing (Silver & Tesauro 2009)

Monte-Carlo and knowledge
Pseudo-random player
  3x3 pattern urgency table with 3^8 patterns
  Only a few dozen relevant patterns
Patterns gathered by
  Human expertise
  Reinforcement Learning (Bouzy & Chaslot 2006)
Warning
  p1 better than p2 does not mean MC(p1) better than MC(p2)

Monte-Carlo Tree Search (MCTS)
How to integrate MC and TS?
UCT = UCB for Trees (Kocsis & Szepesvari 2006)
  Superposition of UCB (Auer & al 2002)
MCTS
  Selection, expansion, updating (Chaslot & al) (Coulom 2006)
  Simulation (Bouzy 2003) (Wang & Gelly 2006)

MCTS (1/2)
while (hastime) {
  playouttreebasedgame()
  expandtree()
  outcome = playoutrandomgame()
  updatenodes(outcome)
}
then choose the node with...
  ... the best mean value
  ... the highest visit number

MCTS (2/2)
PlayOutTreeBasedGame() {
  node = getnode(position)
  while (node) {
    move = selectmove(node)
    play(move)
    node = getnode(position)
  }
}

UCT move selection
Move selection rule to browse the tree:
  move = argmax (s*mean + C*sqrt(log(t)/n))
Mean value for exploitation
  s (= +1 or -1): color to move
UCT bias for exploration
  C: constant term set up by experiments
  t: number of visits of the parent node
  n: number of visits of the current node
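The loop and the UCT rule above can be put together in a short runnable sketch. It is an illustration under stated assumptions rather than a Go engine: the game is a toy take-away game (one pile, remove 1 to 3 stones, taking the last stone wins), means are stored as player 0's win rate, and the names `Node`, `iterate`, `best_move` and the value `c=0.7` are hypothetical choices.

```python
import math
import random


class Node:
    def __init__(self, stones, to_move):
        self.stones = stones      # stones left in the pile
        self.to_move = to_move    # player about to move (0 or 1)
        self.children = {}        # move -> Node
        self.visits = 0
        self.wins = 0.0           # simulations won by player 0


def legal_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]


def select_move(node, c):
    """UCT rule: argmax of s*mean + C*sqrt(log(t)/n), with s = +1 for player 0, -1 for player 1."""
    s = 1 if node.to_move == 0 else -1

    def uct(move):
        child = node.children[move]
        mean = child.wins / child.visits
        return s * mean + c * math.sqrt(math.log(node.visits) / child.visits)

    return max(node.children, key=uct)


def random_playout(stones, to_move, rng):
    """Return 1.0 if player 0 takes the last stone, else 0.0."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        to_move = 1 - to_move
    return 1.0 if to_move == 1 else 0.0  # the player who just moved took the last stone


def iterate(root, c, rng):
    path, node = [root], root
    # 1. Play out the tree-based part of the game while nodes are fully expanded.
    while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
        node = node.children[select_move(node, c)]
        path.append(node)
    # 2. Expand the tree by one node.
    if node.stones > 0:
        untried = [m for m in legal_moves(node.stones) if m not in node.children]
        move = rng.choice(untried)
        node.children[move] = Node(node.stones - move, 1 - node.to_move)
        node = node.children[move]
        path.append(node)
    # 3. Play out a random game from the new node.
    outcome = random_playout(node.stones, node.to_move, rng)
    # 4. Update every node on the path.
    for visited in path:
        visited.visits += 1
        visited.wins += outcome


def best_move(stones, iterations=5000, c=0.7, seed=0):
    rng = random.Random(seed)
    root = Node(stones, 0)
    for _ in range(iterations):
        iterate(root, c, rng)
    # Final choice: the child with the highest visit number.
    return max(root.children, key=lambda m: root.children[m].visits)


print(best_move(5), best_move(6))
```

In this toy game the winning strategy is to leave the opponent a multiple of 4, so the search should settle on taking 1 from a pile of 5 and 2 from a pile of 6.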

Example, 1 iteration: 1/1 1

Example, 2 iterations: 1/2 0/1 0

Example, 3 iterations: 2/3 1/1 0/1 1

Example, 4 iterations: 2/4 0/1 1/1 0/1

Example, 5 iterations: 3/5 0/1 1/1 0/1 1/1

Example, 6 iterations: 3/6 0/1 1/2 0/1 1/1 0/1

Example, 7 iterations: 3/7 0/1 1/2 0/1 1/2 0/1 0/1

Example, 8 iterations: 4/8 0/1 2/3 0/1 1/2 1/1 0/1 0/1

Example, 9 iterations: 4/9 0/1 2/4 0/1 1/2 0/1 1/1 0/1 0/1

Example, 10 iterations: 5/10 0/1 2/4 0/1 2/3 0/1 1/1 0/1 1/1 0/1

Example, 11 iterations: 6/11 0/1 2/4 0/1 3/4 0/1 1/1 0/1 1/1 0/1 1/1

Example, 12 iterations: 7/12 0/1 2/4 0/1 4/5 0/1 1/1 0/1 1/1 1/2 1/1 1/1

Example
Clarity
  C = 0
Notice
  With C != 0 a node cannot stay unvisited
  min or max rule according to the node depth
  Unvisited children have an infinite mean
Practice
  Mean initialized optimistically

MCTS enhancements
The raw version can be enhanced
  Tuning UCT C value
  Outcome = score or win/loss info (+1/-1)
  Doubling the simulation number
  RAVE
Using Go knowledge
  In the tree or in the simulations
Speed-up
  Optimizing, pondering, parallelizing

Assessing an enhancement
Self-play
  The new version vs the reference version
  % wins with a few hundred games
  9x9 (or 19x19 boards)
Against differently designed programs
  GTP (Go Text Protocol)
  CGOS (Computer Go Operating System)
Competitions

Move selection formula tuning
Using UCB
  Best value for C? 60-40%
Using «UCB-tuned» (Auer & al 2002)
  C replaced by min(1/4, variance) 55-45%
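A hedged sketch of the «UCB-tuned» exploration term follows. It uses the UCB1-Tuned form from Auer & al 2002, where the variance term is an upper estimate (empirical variance plus sqrt(2 log(t)/n)) capped at 1/4, the largest variance of an outcome in [0, 1]. How the talk's program plugged this into its tree search is not stated on the slide, so the function name and argument layout here are assumptions.

```python
import math


def ucb_tuned_bias(sum_x, sum_x2, n, t):
    """Exploration term sqrt(log(t)/n * min(1/4, V)) for a move visited n times.

    sum_x, sum_x2: sum of outcomes and sum of squared outcomes for this move
    (outcomes assumed to lie in [0, 1]); t: number of visits of the parent.
    """
    mean = sum_x / n
    variance = sum_x2 / n - mean * mean
    v = variance + math.sqrt(2.0 * math.log(t) / n)  # upper estimate of the variance
    return math.sqrt(math.log(t) / n * min(0.25, v))


# A move with mixed results keeps the full 1/4 cap ...
mixed = ucb_tuned_bias(sum_x=50, sum_x2=50, n=100, t=1000)
# ... while a well-sampled move that always wins gets a smaller exploration term.
steady = ucb_tuned_bias(sum_x=10000, sum_x2=10000, n=10000, t=20000)
print(mixed, steady)
```

The effect is that moves with stable outcomes are explored less than a fixed constant C would explore them.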

Exploration vs exploitation
General idea: explore at the beginning and exploit at the end of the thinking time
  Diminishing C linearly in the remaining time (Vermorel & al 2005) 55-45%
At the end:
  Argmax over the mean value or over the number of visits? 55-45%

Kind of outcome
2 kinds of outcomes
  Score (S) or win/loss information (WLI)?
  Probability of winning or expected score?
  Combining both (S+WLI) (score +45 if win)
Results
  WLI vs S 65-35%
  S+WLI vs S 65-35%

Doubling the number of simulations
N = 100,000
Results
  2N vs N 60-40%
  4N vs 2N 58-42%

Tree management
Transposition tables
  Tree -> Directed Acyclic Graph (DAG)
  Different sequences of moves may lead to the same position
  Interest for MC Go: merge the results
  Result: 60-40%
Keeping the tree from one move to the next
  Result: 65-35%

RAVE (1/3)
Rapid Action Value Estimation
  Mogo 2007
  Use the AMAF heuristic (Brügmann 1993)
  There are «many» virtual sequences that are transposed from the actually played sequence
Result: 70-30%

RAVE (2/3)
AMAF heuristic
Which nodes to update?
  Actual sequence: ACBD
  Virtual sequences: BCAD, ADBC, BDAC
[Figure: actual nodes and virtual nodes in the tree over moves A, B, C, D]
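The slide's example is consistent with the following reading, offered here as an assumption: a virtual sequence keeps each player's set of moves but changes the order in which that player made them. The sketch enumerates such sequences for the actual sequence ACBD; `virtual_sequences` is an illustrative name.

```python
from itertools import permutations


def virtual_sequences(actual):
    """All reorderings of `actual` that permute each player's own moves, minus `actual` itself."""
    first = actual[0::2]   # moves of the player who moves first
    second = actual[1::2]  # moves of the other player
    result = set()
    for p in permutations(first):
        for q in permutations(second):
            seq = [None] * len(actual)
            seq[0::2] = p
            seq[1::2] = q
            result.add("".join(seq))
    result.discard("".join(actual))
    return sorted(result)


print(virtual_sequences("ACBD"))
```

For ACBD this produces ADBC, BCAD and BDAC, the three virtual sequences listed on the slide. The count grows factorially with the length of the simulation, which is why there are «many» of them.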

RAVE (3/3)
3 variables
  Usual mean value M_u
  AMAF mean value M_amaf
  M = β M_amaf + (1-β) M_u
  β = sqrt(k/(k+3n))
k set up experimentally
M varies from M_amaf to M_u as n grows
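The blend above is easy to check numerically. In this sketch `n` is taken to be the visit count of the move (as on the UCT slide) and `k=1000` is only an example value, since the slide says k is set experimentally.

```python
import math


def rave_value(m_u, m_amaf, n, k):
    """M = beta * M_amaf + (1 - beta) * M_u with beta = sqrt(k / (k + 3n))."""
    beta = math.sqrt(k / (k + 3 * n))
    return beta * m_amaf + (1.0 - beta) * m_u


for n in (0, 1000, 1_000_000):
    print(n, rave_value(m_u=0.6, m_amaf=0.4, n=n, k=1000))
```

With no visits β = 1 and M equals the AMAF mean; at n = k, β = 1/2 and the two means are averaged; for large n, M approaches the usual mean.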

Knowledge in the simulations
High urgency for...
  capture/escape 55-45%
  3x3 patterns 60-40%
  Proximity rule 60-40%
Mercy rule
  Interrupt the game when the difference of captured stones is greater than a threshold (Hillis 2006) 51-49%

Knowledge in the tree
Virtual wins for good looking moves
Automatic acquisition of patterns of pro games (Coulom 2007) (Bouzy & Chaslot 2005)
Matching has a high cost
  Progressive widening (Chaslot & al 2008)
  Interesting under strong time constraints
Result: 60-40%

Speeding up the simulations
Fully random simulations (2007), in games per second
  50,000 (Lew 2006)
  20,000 (commonly heard)
  10,000 (my program)
Pseudo-random
  5,000 (my program in 2007)
Rough optimization is worthwhile

Pondering
Think on the opponent's time 55-45%
  Possible doubling of thinking time
  The move of the opponent may not be the planned move on which you think
Side effect: play quickly to think on the opponent's time

Summing up the enhancements
MCTS with all enhancements vs raw MCTS
  Exploration and exploitation: 60-40%
  Win/loss outcome: 65-35%
  Rough optimization of simulations: 60-40%
  Transposition table: 60-40%
  RAVE: 70-30%
  Knowledge in the simulations: 70-30%
  Knowledge in the tree: 60-40%
  Pondering: 55-45%
  Parallelization: 70-30%
Result: 99-1%

Parallelization
Computer Chess: Deep Blue
Multi-core computer
  Symmetric MultiProcessor (SMP)
  one thread per processor
  shared memory, low latency
  mutual exclusion (mutex) mechanism
Cluster of computers
  Message Passing Interface (MPI)

Parallelization
while (hastime) {
  playouttreebasedgame()
  expandtree()
  outcome = playoutrandomgame()
  updatenodes(outcome)
}

Leaf parallelization
[Figure]

Leaf parallelization (Cazenave Jouandeau 2007)
Easy to program
Drawbacks
  Wait for the longest simulation
  When some of the simulation outcomes are already losses, running the remaining simulations may not be worthwhile
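Leaf parallelization can be sketched as running several simulations from the same leaf at once and updating the tree with all their outcomes. The sketch below is an assumption-laden illustration: `playout` is a stand-in that flips a seeded coin instead of playing a Go game, and Python threads are used only to show the structure (a real engine would use native threads or processes for speed).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def playout(seed):
    """Stand-in for one random game from the leaf: returns 1 (win) or 0 (loss)."""
    return 1 if random.Random(seed).random() < 0.5 else 0


def leaf_parallel_evaluate(n_workers, first_seed=0):
    """Run one playout per worker from the same leaf and wait for all of them."""
    seeds = range(first_seed, first_seed + n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        outcomes = list(pool.map(playout, seeds))  # blocks until the longest one ends
    return sum(outcomes), len(outcomes)


wins, games = leaf_parallel_evaluate(n_workers=4)
print(wins, games)
```

The call to `pool.map` returns only when every playout has finished, which is the «wait for the longest simulation» drawback named above.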

Root parallelization
[Figure]

Root parallelization (Cazenave Jouandeau 2007)
Easy to program
No communication
At completion, merge the trees
4 MCTS for 1 sec > 1 MCTS for 4 sec
Good way for low time settings and a small number of threads
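The merge step of root parallelization can be sketched as follows. This is a simplification: only the visit counts of the root's children are merged, not whole trees, and the four dictionaries stand in for the results of four independent searches (the numbers are made up for illustration).

```python
from collections import Counter


def merge_root_visits(per_search_visits):
    """Sum, move by move, the root visit counts reported by independent searches."""
    total = Counter()
    for visits in per_search_visits:
        total.update(visits)  # Counter.update adds counts instead of replacing them
    return total


# Hypothetical root statistics from 4 independent searches of the same position.
searches = [
    {"A": 120, "B": 80},
    {"A": 90, "B": 110},
    {"A": 130, "B": 70},
    {"A": 100, "B": 100},
]
merged = merge_root_visits(searches)
best = max(merged, key=merged.get)
print(dict(merged), best)
```

Because the searches never communicate until this final step, no mutex is needed while they run.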

Tree parallelization, global mutex
[Figure]

Tree parallelization, local mutex
[Figure]

Tree parallelization
One shared tree, several threads
Mutex
  Global: the whole tree has a mutex
  Local: each node has a mutex
«Virtual loss»
  Given to a node browsed by a thread
  Removed at update stage
  Preventing threads from similar simulations
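A minimal sketch of a node with a local mutex and virtual losses is given below. The class layout and method names are assumptions for illustration; the idea shown is only that a node being browsed temporarily counts an extra lost visit, so its mean drops and other threads are steered elsewhere until the real outcome arrives.

```python
import threading


class Node:
    def __init__(self):
        self.lock = threading.Lock()  # local mutex: one per node
        self.visits = 0
        self.wins = 0.0
        self.virtual_losses = 0

    def add_virtual_loss(self):
        """Called when a thread descends through this node."""
        with self.lock:
            self.virtual_losses += 1

    def update(self, outcome):
        """Called at the update stage: remove the virtual loss, record the real outcome."""
        with self.lock:
            self.virtual_losses -= 1
            self.visits += 1
            self.wins += outcome

    def mean(self):
        """Mean value seen by other threads, counting pending virtual losses as losses."""
        with self.lock:
            n = self.visits + self.virtual_losses
            return self.wins / n if n else 0.0


node = Node()
node.add_virtual_loss()
node.update(1.0)             # one real win recorded
before = node.mean()
node.add_virtual_loss()      # another thread starts browsing this node
during = node.mean()
node.update(1.0)             # its simulation ends with a win
after = node.mean()
print(before, during, after)
```

While the second simulation is in flight the node looks like 1 win in 2 visits, which makes a second thread less likely to pick the same path.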

Computer-computer results
Computer Olympiads
  Year   19x19 (1st, 2nd, 3rd)              9x9 (1st)
  2010   Erica, Zen, MFGo                   MyGoFriend
  2009   Zen, Fuego, Mogo                   Fuego
  2008   MFGo, Mogo, Leela                  MFGo
  2007   Mogo, CrazyStone, GNU Go           Steenvreter
  2006   GNU Go, Go Intellect, Indigo       CrazyStone
  2005   Handtalk, Go Intellect, Aya        Go Intellect
  2004   Go Intellect, MFGo, Indigo         Go Intellect

Human-computer results
9x9:
  2009: Mogo beat a pro with black
  2009: Fuego beat a pro with white
19x19:
  2008: Mogo beat a pro with 9 stones
        Crazy Stone beat a pro with 8 stones
        Crazy Stone beat a pro with 7 stones
  2009: Mogo beat a pro with 6 stones

MCTS and the old approach
[Figure: the same ranking scale (pro 9 dan to 1 dan; amateur 9 dan, 6 dan, 1 dan, 1 kyu, 10 kyu, 20 kyu, 30 kyu) with three markers: MCTS on 9x9 go, MCTS on 19x19 go, and the old approach]

Computer Go (MC history)
Monte-Carlo Go (Brügmann 1993)
MCGo devel. (Bouzy & Helmstetter 2003)
MC+knowledge (Bouzy 2003)
UCT (Kocsis & Szepesvari 2006)
Crazy Stone (Coulom 2006)
Mogo (Wang & Gelly 2006)

Conclusion
Monte-Carlo brought a big improvement in Computer Go over the last decade!
  No old-approach-based program anymore!
  All go programs are MCTS based!
  Professional level on 9x9!
  Dan level on 19x19!
Unbelievable 10 years ago!

Some references
PhD, MCTS and Go (Chaslot 2010)
PhD, Reinf. Learning and Go (Silver 2010)
PhD, R. Learning: applic. to Go (Gelly 2007)
UCT (Kocsis & Szepesvari 2006)
1st MCTS go program (Coulom 2006)

Web links
http://www.grappa.univ-lille3.fr/icga/
http://cgos.boardspace.net/
http://www.gokgs.com/
http://www.lri.fr/~gelly/mogo.htm
http://remi.coulom.free.fr/crazystone/
http://fuego.sourceforge.net/
...

Thank you for your attention!