CS182 Lecture 5: Games and Adversarial Search


CS182 Lecture 5: Games and Adversarial Search. Agenda: EOLQ (uninformed and informed search); games: the difference an adversary makes; minimax; alpha-beta pruning; static evaluation functions (for depth-limited search); games with chance (briefly: Expectimax); announcements; EOLQ.

Announcements. AIMA 3e reading: 5 through 5.5 and 5.7 (skip 5.6). Assignment 1: (1) due Thursday; (2) extra credit problem: optional; can earn up to 2 points, but can't get more than 100 total points on the assignment. Good job answering each other in the forum!!! Depth-first search in PacMan: impact of graph search, saving states to check for repetitions. Lecture of interest: Dave Ferrucci (leader of the IBM Watson effort), Wednesday September 18th, 5-6pm in MD 119. See the course website for more information.

Search Summary. Systematic search finds sequences of actions (aka "paths") from the initial state to a goal state. Uninformed search: BFS, DFS, UCS, IDS. Heuristic functions estimate costs of shortest paths and can dramatically reduce search cost. A* search expands lowest g+h: complete and optimal. IDA*. Greedy best-first search expands lowest h: incomplete and not always optimal. Local search, for optimization: don't need the path (sequence of actions), so can ignore history; do need an objective (evaluation) function. Hill-climbing (many variants), simulated annealing, beam search, genetic algorithms.

Why/When Use Local Search? The solution, not the path, is what matters. Need an objective function; "better" is better than "best". Large/infinite, complex state spaces. Gain: little memory and the ability to operate in large landscapes. Lose: optimality/completeness.

An Additional Species of Search Problem. So far: systematic search for action sequences (least-cost path to a goal at unknown depth), and searching for a goal state (local/optimization search). New: choosing an action in the presence of an adversary: games! Examples: tic-tac-toe, chess, backgammon, ... The adversary might prevent the path to the best goal, so we want the best assured outcome.

Game Playing. Characteristics of games: an unpredictable opponent, so the solution is a strategy, specifying a move for every possible opponent reply. Time limits: need to approximate rather than do a complete search. Also, the course will consider only two-player, zero-sum games (win/lose/draw games or games with numeric values). Search goal: given a state of the game, choose a next move by evaluating the future, i.e., how the game might proceed from this state, and choosing as the best move the one that leads to the best payoff in the future. What's "best"?

Why games? One of the first tasks undertaken by AI. Easy to represent, with precise rules. Games and strategy coincide with our intuitive notions of intelligence (humans as symbol manipulators, Newell & Simon). Better than people at Othello and checkers; defeated world champions in chess and backgammon... but not Go. Games are often too hard to solve optimally: chess has a branching factor of ~35, giving ~35^100 nodes. Drives bounded-rationality research. They're fun!

Types of Games

Research Program Advice We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best. It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English... Again I do not know what the right answer is, but I think both approaches should be tried. Turing, Computing Machinery and Intelligence, Mind, 1950. p. 460

Representing Games as Trees. [Diagram: a game tree alternating levels of agent moves and opponent moves, from the start state S down to G, a possible goal state (winning situation for the agent).]

Game Tree for Tic-Tac-Toe

Tic-tac-toe: first, second, and third moves. [Diagrams: from the start node, the tree of X's distinct opening moves, O's possible replies, and X's third-move options.]

The Rational Opponent. The game tree gives us the possible ways the game could proceed, but how do we decide what's the best move? Especially since we can't plan or control the whole path. Minimax principle [von Neumann/Morgenstern, 1944]: assume that both players always play optimally; you try to maximize your winnings, and they try to minimize your winnings.

Winning Strategy? [Diagram: small game trees alternating "I choose" and "you choose" levels with "I win"/"I lose" leaves, contrasting a position where I don't have a winning strategy with one where I do.]

Evaluating Moves. [Diagram: a small tree with 0/1 leaf values.] value("me" node) = max of children's values; value("you" node) = min of children's values.

Generalize to Different Leaf-node Values. [Diagram: a "me" = max level over a "you" = min level over leaves 2, 7, 1, 8; the backed-up root value is 2.]

A Max vs Min Game Tree. Suppose there are two players: MAX (that's you!) and MIN (your opponent), and that after one turn each the game is over. What move should MAX pick? MAX has 3 possible moves: a1, a2, a3. MIN has 3 possible responses in each case, and the leaves carry the utility (winnings) of the final state. [Diagram: the MIN nodes back up min(3,12,8) = 3, min(2,4,6) = 2, and min(14,5,2) = 2; the root backs up max(3,2,2) = 3.] Conceptually: construct the game tree (root to leaves), then propagate utility upwards (leaves to root). Minimax principle: maximize your winnings, given that your opponent will try to minimize them.

Minimax Approach. Basic algorithm: construct the search tree down to the leaves, determine minimax values at the leaves, propagate minimax values up the tree, and eventually read off the minimax decision at the root. Formally, let n be a node and n' range over the children of n:
MINIMAX-VALUE(n) = Utility(n), if n is a terminal node (leaf);
MINIMAX-VALUE(n) = max over n' of MINIMAX-VALUE(n'), if n is a MAX node (maximize the value);
MINIMAX-VALUE(n) = min over n' of MINIMAX-VALUE(n'), if n is a MIN node (minimize the value).
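As a minimal sketch (not the lecture's code), the recursion above in Python, assuming a hypothetical game interface with is_terminal(state), utility(state), successors(state) yielding (move, next_state) pairs, and to_move(state) returning 'MAX' or 'MIN':

def minimax_value(state, game):
    # Backed-up value of a node: utility at leaves, max at MAX nodes,
    # min at MIN nodes. (All interface names here are assumptions.)
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax_value(child, game)
              for _, child in game.successors(state)]
    return max(values) if game.to_move(state) == 'MAX' else min(values)

def minimax_decision(state, game):
    # The minimax decision at the root: pick the move whose child backs
    # up the highest value (assuming MAX moves at the root).
    return max(game.successors(state),
               key=lambda mc: minimax_value(mc[1], game))[0]

Called on the root, minimax_decision returns the move leading to the best assured outcome against an optimal opponent.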

Minimax algorithm: uses depth-first search (to save space).

Exercise: Minimax Search. [Diagram: a game tree whose leaf values, left to right, are 3, 6, 7, 5, 9, 7, 9, 11, 6, 5, 3, 7, 3, 2, 8, 9; back up the minimax value at the root.]

Properties of minimax. Complete? Only if the tree is finite; note that a finite strategy can exist even in an infinite tree. Optimal? Yes, against an optimal opponent; otherwise? Time complexity? O(b^m) (b = legal moves, m = maximum depth of tree). Yikes!! Space complexity? O(bm) (depth-first exploration).

Time Complexity Too High to Play the Games. Suppose the game tree depth is at most m, with b possible moves per state. Time = O(b^m) is still exponential. What is the size for chess? Tic-tac-toe? Go? Backgammon?

Some Sizes of Game Trees. Chess: b ≈ 35 (average branching factor), d ≈ 100 (depth of game tree for a typical game), so b^d ≈ 35^100 (≈ 10^154) nodes!! Tic-tac-toe: at most 9 half-moves, with ≤ 9 choices each; 9! = 362,880 (computer goes first), 8! = 40,320 (computer goes second). Backgammon: b ≈ 20 × 20 (because of chance nodes). Go: the branching factor starts at 361 (19×19 board).

Can We Avoid Exploring Every Path? In games, time is everything! Two strategies help: prune the tree (alpha-beta pruning), i.e., don't waste time on situations that cannot change the outcome; and look only a limited way into the future, using a heuristic function to estimate the future goodness of a choice (à la A*?) without exploring all the way.

Pruning the Search. [Diagram: a max ("me") root over min ("you") nodes with leaves 1, 2, 7; the backed-up root value is 2.]

α and β Values. [Diagram: the same tree, with the max ("me") root's α value = 2; a min ("you") node whose value has already dropped to 1 cannot beat it, so there is no point expanding that node further.]

α and β Values Going Deeper. [Diagram: the same idea one level deeper: with α = 2 at the max ("me") root and β = 2 at a min ("you") node, a max node below it that has already reached 7 will never be chosen by MIN; there is no point expanding that node further.]

α-β pruning example. [Five slides step through a worked example, updating α and β at each node and pruning branches as the search proceeds.]

α-β Pruning Approach: cutoff at a MIN node. Consider a MAX node, and the second MIN node below it. If the MAX player will not choose the action leading to that node, then we can prune it. [Diagram: a MAX node with branches a, b, c; branch a backs up 3.] At a MAX node, α = the best (highest) choice we have found so far at any choice point along the path for MAX. Eventually, your minimax value is ≥ 3; anything less than 3 on the b branch: prune this effort!

α-β Pruning Approach: cutoff at a MAX node. Now consider a MIN node, and the second MAX node below it. Again, if the MIN player will not choose the action leading to that node, then we can prune it. [Diagram: a MIN node with branches a, b, c; branch a backs up 100.] At a MIN node, β = the best (lowest) choice we have found so far at any choice point along the path for MIN. Eventually, your minimax value is ≤ 100; anything more than 100 on the b branch: prune this effort!

Putting it Together. [Diagram: a worked tree combining both cutoffs. At a MAX node, α = the best (highest) choice found so far at any choice point along the path for MAX; at a MIN node, β = the best (lowest) choice found so far along the path for MIN. The annotations track values such as β = 3, β = 2, and α = 100, 150, 3 as the search proceeds.]

α and β Cut-off Values. Provisional backed-up values become final when the search below them is done. PBV("me" node) = max(values of successors so far) = the α value; an α value can never decrease. PBV("you" node) = min(values of successors so far) = the β value; a β value can never increase.

Cutting Off Useless Search. α-β procedure: stop searching below any min ("you") node whose β value is ≤ the α value of any max ancestor [α cutoff], and below any max ("me") node whose α value is ≥ the β value of any min ancestor [β cutoff].

The α-β algorithm
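The algorithm slide's pseudocode did not survive transcription; here is a sketch of the same depth-first recursion with α-β cutoffs, using the same hypothetical game interface as the minimax sketch above:

def alphabeta(state, game, alpha=float('-inf'), beta=float('inf')):
    # alpha = best value found so far for MAX along the path;
    # beta = best value found so far for MIN along the path.
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == 'MAX':
        value = float('-inf')
        for _, child in game.successors(state):
            value = max(value, alphabeta(child, game, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # beta cutoff: a MIN ancestor avoids this node
                break
        return value
    else:
        value = float('inf')
        for _, child in game.successors(state):
            value = min(value, alphabeta(child, game, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:   # alpha cutoff: a MAX ancestor avoids this node
                break
        return value

α and β start at -∞ and +∞ and tighten as the search proceeds; the two break statements are exactly the α and β cutoffs from the previous slide.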

[Diagram: a worked α-β search on a four-ply MAX/MIN tree, with α and β annotated at each node; where a cutoff fires, the entire subtree is pruned.]

Effectiveness of α-β pruning. The effectiveness relies on having a good move-ordering heuristic. If we consider children left to right, we don't prune the last subtree, even though we could have.

Properties of α-β. Pruning does not affect the final result. Good move ordering improves the effectiveness of pruning: with perfect ordering, time complexity = O(b^(m/2)), which doubles the depth of search (in the worst case, there is no improvement). A simple example of the value of reasoning about which computations are relevant (a form of metareasoning). Unfortunately, 35^50 is still impossible.

Try It Out: α-β Search. [Diagram: the same tree as the minimax exercise, with leaf values 3, 6, 7, 5, 9, 7, 9, 11, 6, 5, 3, 7, 3, 2, 8, 9; which subtrees get pruned?]

The Importance of Move Ordering (Knuth & Moore '75). [Best case] If successors are ordered best-first, we examine only O(b^(d/2)) nodes instead of O(b^d), so we can look twice as far ahead in the same amount of time! [Average case] If successors are examined in random order, then nodes will be O(b^(3d/4)) for moderate b. [Worst case] No improvement over exhaustive search. For chess, a fairly simple ordering function (e.g., captures, then threats, then forward, then backward moves) gets within about a factor of 2 of the theoretical limit. (A sketch of how ordering plugs into the search follows.)
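One hedged way to wire ordering into the α-β sketch above: replace game.successors(state) in the loops with a sorted version. quick_score here is a hypothetical cheap estimator (e.g., captures score highest); it is an illustration, not the lecture's code.

def ordered_successors(state, game, quick_score):
    # Try the (estimated) best move first; good ordering pushes
    # alpha-beta toward its O(b^(d/2)) best case.
    best_first = game.to_move(state) == 'MAX'
    return sorted(game.successors(state),
                  key=lambda mc: quick_score(mc[0]),
                  reverse=best_first)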

What more can we do? Minimax search with alpha-beta pruning still has to search all the way to terminal states at least once, which is much too expensive in typical games such as chess. What else can we do? Bound the depth of search (a depth limit), treat the bottom nodes as terminal nodes, and use an evaluation function (aka heuristic evaluation function) to estimate the utility of those nodes (whether they will win). What makes a good heuristic?

Heuristic Evaluation Functions. Examples. Othello: number of white pieces - number of black pieces. Chess: value of all white pieces - value of all black pieces. Heuristics are where domain knowledge comes in; this is not as theoretically nice as with A* and admissibility. Instead, you ask a domain expert for features (possibly ratios, e.g. f3 = f1/f2), express the evaluation as a linear weighted sum, and maybe learn the weights? Problems?? Features for chess: f1 = number of white pieces, f2 = number of black pieces, f4 = number of white bishops, f5 = estimate of threat to white king; Eval(s) = w1 f1(s) + w2 f2(s) + ... + w5 f5(s).
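A minimal sketch of that linear weighted sum in Python; the feature functions and state attributes below are hypothetical stand-ins, not the lecture's definitions:

def linear_eval(state, weights, features):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s), where each feature
    # is a function of the state (piece counts, king safety, etc.).
    return sum(w * f(state) for w, f in zip(weights, features))

# Example: an Othello-style material difference as the only feature,
# assuming a state with hypothetical white_count/black_count attributes.
material = [lambda s: s.white_count - s.black_count]
# eval_fn = lambda s: linear_eval(s, [1.0], material)

In a depth-limited search, linear_eval replaces game.utility at nodes cut off at the depth bound.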

Other Important Techniques. Iterative deepening: instead of a fixed depth, keep solutions at different depths; when you run out of time, use the last (deepest) completed solution. Transposition table: different permutations of moves can lead to previously seen positions; store them and prune repeated states (essential!). (A sketch of a transposition table follows.)
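A sketch of a transposition table layered over the minimax sketch above; game.key(state) is a hypothetical hashable encoding of the position (a real chess program might use a Zobrist hash):

def minimax_tt(state, game, table=None):
    # Cache backed-up values by position, so states reached through
    # different move orders (transpositions) are searched only once.
    if table is None:
        table = {}
    k = game.key(state)
    if k in table:
        return table[k]
    if game.is_terminal(state):
        value = game.utility(state)
    else:
        values = [minimax_tt(child, game, table)
                  for _, child in game.successors(state)]
        value = max(values) if game.to_move(state) == 'MAX' else min(values)
    table[k] = value
    return value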

More Important Techniques. Quiescence search: search deeper on positions where there might be wild swings in value (non-quiescent positions), e.g. in chess when there is the potential for a capture (domain knowledge is very important here). Endgame/opening databases: precompute choices for smaller games and store them. Checkers: 400 billion positions with fewer than 9 pieces. Chess: Ken Thompson (of UNIX fame) and Stiller solved all 5-piece endgames (these games can be quite long!!!).

Games with chance. Some games have chance events: rolling a die, flipping a coin, drawing a card. These are represented by chance nodes. How do we handle chance nodes? Evaluate the expected value.

Example Game with Chance: Backgammon. In backgammon, the goal is to get your pieces (say white) all the way off the board; white must travel from point 0 to point 25 to do this. You can move to a position so long as it holds at most one opponent piece; if it holds exactly one, you capture it (reset it to zero). CHANCE comes in because you roll dice to determine what moves you can take.

With Chance, Need to Compute Expected Value. Assume the die has n outcomes d1, d2, d3, ..., dn. Every outcome has a value v(d1), v(d2), ..., v(dn) and a probability p(d1), p(d2), ..., p(dn). The expected value is p(d1)v(d1) + p(d2)v(d2) + ... + p(dn)v(dn), also called the weighted average. For example, outcomes worth 2 and 4 with probabilities 0.5 and 0.5 have expected value 0.5·2 + 0.5·4 = 3.

Computing the Optimal Move with Chance Nodes. Include chance nodes in addition to MAX and MIN nodes, and calculate expected minimax values. This explodes the branching factor: O(b^d n^d) for n distinct rolls. In backgammon, n = 21 and b ≈ 20, so only around 3-ply lookahead is possible.
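A sketch of this expectiminimax recursion, extending the hypothetical game interface with outcomes(state) yielding (probability, next_state) pairs at chance nodes and letting to_move(state) also return 'CHANCE':

def expectiminimax(state, game):
    # MAX and MIN nodes back up max/min as before; chance nodes back up
    # the expected value: the sum of p(d) * value(d) over their outcomes.
    if game.is_terminal(state):
        return game.utility(state)
    player = game.to_move(state)
    if player == 'CHANCE':
        return sum(p * expectiminimax(child, game)
                   for p, child in game.outcomes(state))
    values = [expectiminimax(child, game)
              for _, child in game.successors(state)]
    return max(values) if player == 'MAX' else min(values)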


Search tree with probabilities. [Two diagrams of the same MAX-over-chance-over-MIN tree: the MIN nodes back up 2, 4, 0, -2 from leaves 2, 4, 7, 4, 6, 0, 5, -2. With chance probabilities 0.5/0.5 the left chance node is worth 3 and the right -1, so MAX's value is 3; changing the left probabilities to 0.2/0.8 makes it 3.6.]

Game Tree with Chance Nodes. [Diagram: alternating levels of MAX moves, chance nodes (rolls for MIN), MIN moves, chance nodes (rolls for MAX), and MAX moves.]

Summary of Adversarial Search Algorithms. Game-playing programs: the goal is to choose a next move by evaluating potential futures and then choosing the best move, modeling the opponent as being just as smart. Two key questions: how do we compute "best"? How do we compute it fast? We saw the minimax principle ("best"), alpha-beta pruning ("fast"), and heuristic evaluation functions (fastest, but at the expense of "best"). Chance games are a lot harder.

Games have a long history in AI. Famous early computer scientists thought a lot about games, competition, and economics: John McCarthy, Allen Newell, and Herb Simon (chess); Arthur Samuel, who wrote a checkers program that could learn; Hans Berliner (backgammon); von Neumann (cf. game theory). Chess ratings for chess programs: 1950s, Shannon and Turing; 1958, NSS Chess (Newell et al., McCarthy); 1967, MacHack 6 (Greenblatt et al.), rating 1400; 1974, Kaissa (Moscow ITEP), 1900; 1982, Belle (Condon & Thompson), rating 2250, the first master-level program; 1987, HITECH (Berliner et al.), first to defeat a human grandmaster; 1997, Deep Blue. Before MacHack, programs rated below novice.

State-of-the-art game programs. Optimal strategies for Othello (Logistello, 1997). The backgammon champion is a computer.

Deep Blue: Chess. Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Other strong programs: Deep Junior and Fritz. Two camps: the emulation camp and the engineering camp; Deep Blue was mostly engineering plus lots of knowledge. Complex evaluation function: material value, positional value, pawn structure, obstructed lines, center situation, mobility, etc. A vast library of openings and endgames compiled by a team of chess masters. No learning!!

Chinook: optimal checkers player. In 2007, the entire search tree of checkers was explored and the optimal strategy was calculated. How does Chinook work? Parallel iterative alpha-beta, with deep searches (17-30 ply). Heuristic function: a weighted sum of 25 components, with 4 parameter sets for different phases of the game. Perfect knowledge of all endgames with ≤ 7 pieces and 40% of 8-piece positions (about 40×10^9 positions) in 2 GB. An openings library of 6,000 positions, plus an "antibook" library of bad positions.

Checkers Is Solved. Originally published in Science Express on 19 July 2007; Science, 14 September 2007, Vol. 317, No. 5844, pp. 1518-1522. Jonathan Schaeffer (University of Alberta), Neil Burch, Yngvi Björnsson, Akihiro Kishimoto, Martin Müller, Robert Lake, Paul Lu, Steve Sutphen.

What do most programs do? A combination of brute-force search, heuristics, and game databases. Some programs attempt to learn: improving the heuristic function by comparing its estimates with actual outcomes, or attempting to discover rules and heuristic functions.

Game of Go. A very popular game in Japan, more than 3,000 years old. The branching factor is ~360 and the search depth is also ~360, and there is no good evaluation function, so the minimax approach is not effective. An open challenge! A very recent approach achieved master level on a small version of Go, using a different (not minimax) approach.

Games Summary. Games are fun to work on! They illustrate several important points about AI: perfection is unattainable, so we must approximate; it is a good idea to think about what to think about; uncertainty constrains the assignment of values to states; and optimal decisions depend on the information state, not the real state. Games are to AI as grand prix racing is to automobile design.

Games: Some Final Project Ideas. Scrabble: the dictionary gives the computer an edge in finding words, but what is a good strategy? ("chance") Connect 4 tutor: game trees allow you to model games; instead of using this to defeat an opponent, can you train the opponent? Can you help them learn to be better? Simulated soccer: soccer is a *very* different game, distributed and in continuous space/time; how would you design a team? (RoboCup simulation league) Technical game-playing algorithms: local search (over moves), Monte Carlo evaluation, etc.

Summary: Classes of Search Algorithms. Systematic search for an action sequence: want the least-cost path to a goal, which is typically at unknown depth. Optimization/local search: find a state that maximizes/minimizes an objective function (which captures important/relevant properties of states). Decisions with an adversary (games): search to find the best assured outcome in the presence of an adversary who might prevent the path to the best goal.

Upcoming lectures and sections. Sept. 19: constraint satisfaction. Thursday/Friday sections: games and constraint satisfaction. Next week: representation and modeling of wider (than search) worlds.

Announcements. For Thursday: AIMA 3e Chapter 6 (skim 6.5). Keep up the good work answering each other in the forum!!! Assignment 1: (1) due Thursday; (2) extra credit problem: optional; can earn up to 2 points, but can't get more than 100 total points on the assignment. Lecture of interest: Dave Ferrucci (leader of the IBM Watson effort), Wednesday September 18th, 5-6pm in MD 119. See the course website for more information.

EOLQ. What question didn't you get to ask today? What's still puzzling or not clear? What idea would you like to hear a bit more about?