COMP-424: Artificial intelligence. Lecture 7: Game Playing

Transcription

1 COMP Artificial Intelligence Lecture 7: Game Playing Instructor: Joelle Pineau (jpineau@cs.mcgill.ca) Class web page: Unless otherwise noted, all material posted for this course are copyright of the instructor, and cannot be reused or reposted without the i nstructor s written permission. Quick recap Standard assumptions (except for the last lecture) Discrete (vs continuous) state space. Deterministic (vs stochastic) environment. Observable (vs unobservable) environment. Static (vs changing) environment. A single AI agent. 2 1

2 Overview of today s lecture Adversarial search => games with 2 players Minimax search Evaluation functions Alpha-beta pruning State-of-the-art game playing programs 3 Game playing One of the oldest, most well-studied domains in AI! Why? People like them! And are good at playing them. Many games are hard: State spaces are very large and complicated. E.g. Chess has branching factor of 35, games go to 50 moves per player, so nodes in search tree! Sometimes there is stochasticity and imperfect information. Real-time constraints (e.g. fixed amount of time between moves). Clear, clean description of the environment. Easy performance indicator. Winning is everything! 4 2

3 Types of games Perfect vs Imperfect information Perfect: See the exact state of the game Imperfect: Information is hidden 5 Types of games Perfect vs Imperfect information Perfect: See the exact state of the game Imperfect: Information is hidden Deterministic vs Stochastic Deterministic: Change in state is fully determined by player move. Stochastic: Change in state is partially determined by chance. 6 3

4 What type of game? Yes Perfect information? No Deterministic Checkers Othello Chess Go Mastermind Stochastic Backgammon Scrabble Bridge Poker Most card games 7 Game playing as search Consider 2-player, perfect information, deterministic games. Can we formulate them as a search problem? State: Operators: Goal: Cost: We want to find a strategy (i.e. way of picking moves) that maximizes utility (i.e. max prob. of win or min cost). 8 4

5 Game playing as search Consider 2-player, perfect information, deterministic games. Can we formulate them as a search problem? State: State of the board, player turn Operators: Legal moves Goal: States in which the game is won/lost/drawn Cost: We want to find a strategy (i.e. way of picking moves) that maximizes utility (i.e. max prob. of win or min cost). 9 Game playing as search Consider 2-player, perfect information, deterministic games. Can we formulate them as a search problem? State: State of the board, player turn Operators: Legal moves Goal: States in which the game is won/lost/drawn Cost: Simple case: +1 for winning, -1 for loosing, 0 for a draw. More complex cases: points won, money, We want to find a strategy (i.e. way of picking moves) that maximizes utility (i.e. max prob. of win or min cost). 10 5

6 Game search challenge Not quite the same as simple searching. There is an opponent! The opponent is malicious! Opponent is trying to make things good for itself, and bad for us. We have to simulate the opponent s decisions. Key idea: Define a max player (who wants to maximize its utility) And a min player (who wants to minimize it.) 11 Example: Game Tree for Tic-Tac-Toe 12 6

7 Minimax search Expand complete search tree, until terminal states have been reached and their utilities computed. Go back up from leaves towards the current state of the game. At each min node: backup the worst value among the children. At each max node: backup the best value among the children. 13 Minimax search Expand complete search tree, until terminal states have been reached and their utilities computed. Go back up from leaves towards the current state of the game. At each min node: backup the worst value among the children. At each max node: backup the best value among the children. 14 7

8 Minimax Algorithm Operator MinimaxDecision() For each legal operator o: Apply the operator o and obtain the new game state s. Value[o] = MinimaxValue(s) Return the operator with the highest value Value[o]. Double MinimaxValue(s) If isterminal(s), return Utility(s). For each state s in Successors(s) Let Value(s ) = MinimaxValue(s ). If Max is to move in s, return max s Value(s ). If Min is to move in s, return min s Value(s ). 15 Properties of Minimax search Complete? Optimal? Time complexity? Space complexity? 16 8

9 Properties of Minimax search Complete? If the game tree is finite. Optimal? Against an optimal opponent (otherwise don t know.) Time complexity? O(b m ) Space complexity? O(bm) if we use DFS Why not use Minimax to solve chess? In chess: b 35, m 100, so an exact solution is impossible! 17 Coping with resource limitations Suppose we have 100 seconds to make a move, and we can search 10 4 nodes per second. Can only search 10 6 nodes. (Or even fewer, if we spend time deciding which nodes to search.) Possible approach: Use a cutoff test (e.g. based on depth limit) Use an evaluation function for the nodes where we cutoff the search. Need to think about real-time search. 18 9

10 Evaluation functions An evaluation function v(s) represents the goodness of a board state (e.g. chance of winning from that position). If the features of the board can be evaluated independently, use a weighted linear function: v(s) = w 1 f 1 (s) + w 2 f 2 (s) + + w n f n (s) (where s is board state) This function can be given by the designer or learned from experience. 19 Example: Chess Black to move White slightly better White to move Black winning Linear evaluation function: v(s) = w 1 f 1 (s) + w 2 f 2 (s) w 1 = 9 f 1 (s) = (# white queens) - (# black queens) w 2 = 3 f 2 (s) = (# white pawns) - (# black pawns) 20 10

11 How precise should the evaluation fn be? Evaluation function is only approximate, and is usually better if we are close to the end of the game. Move chosen is the same if we apply a monotonic transformation to the evaluation function. 21 How precise should the evaluation fn be? Evaluation function is only approximate, and is usually better if we are close to the end of the game. Move chosen is the same if we apply a monotonic transformation to the evaluation function. Only the order of the numbers matter: payoffs in deterministic games act as an ordinal utility function

12 Minimax with an evaluation function Use evaluation function to evaluate non-terminal nodes. Helps us make a decision without searching until the end of the game. Minimax cutoff algorithm: Same as standard Minimax, except stop at some maximum depth m and use the evaluation function on those nodes. 23 Minimax cutoff in Chess How many moves ahead can we search in Chess? >> 10 6 nodes with b=35 allows us to search 4 moves ahead! Is this useful? 4-moves ahead novice player 8-moves ahead human master, typical PC 12-moves ahead Deep Blue, Kasparov Key idea: Search few lines of play, but search them deeply.need pruning! 24 12

13 α-β Pruning example α-β Pruning example

14 α-β Pruning example X X 27 α-β Pruning example X X 28 14

15 α-β Pruning example X X 29 α-β Pruning example X X

16 α-β Pruning Simple idea: if a path looks worse than what we already have, discard it. If the best move at a node cannot change (regardless of what we would find by searching) then no need to search further! Algorithm is like Minimax, but keeps track of best leaf value for our player (α) and best one for the opponent (β). Standard technique for deterministic, perfect information games. 31 α-β algorithm Instead of MinimaxValue, we have two functions. Double MaxValue(s,α,β) If cutoff(s), return Evaluation(s). For each state s in Successors(s) Let α = max { α, MinValues(s,α,β) }. If α β, return β. Return α. Double MinValue(s,α,β) If cutoff(s), return Evaluation(s). For each state s in Successors(s) Let β = min { β, MinValues(s,α,β) }. If α β, return α. Return β

17 Example 33 Example 34 17

22 Example 43 Important lessons of α-β pruning Pruning can greatly increase efficiency!! Pruning does not affect the final result. Best moves are same as returned by Minimax (assuming the opponent is optimal and the evaluation function is correct.) 44 22

23 Important lessons of α-β pruning Pruning can greatly increase efficiency!! Pruning does not affect the final result. Best moves are same as returned by Minimax (assuming the opponent is optimal and the evaluation function is correct.) Order matters for search efficiency! 45 α-β Pruning example X X On the middle branch, nodes were ordered well and we pruned a lot. On the right branch, the order was bad and there was no pruning

24 Important lessons of α-β pruning Pruning can greatly increase efficiency!! Pruning does not affect the final result. Best moves are same as returned by Minimax (assuming the opponent is optimal and the evaluation function is correct.) Order matters for search efficiency! The α-β pruning demonstrates the value of reasoning about which computations are important! 47 Efficiency of α-β pruning With bad move ordering, time complexity is O(b m ) Means nothing was pruned. With perfect ordering, time complexity is O(b m/2 ) Means double the search depth, for same resources. In chess: this is difference between novice and expert player. On average, O(b 3m/4 ), if we expect to find the max/min after b/2 expansions. Randomizing the move ordering can achieve the average case. Evaluation function can be used to order the nodes

25 Forward pruning Simple idea (for domains with large branching factor): Only explore n best moves for current state (according to the evaluation function). Unlike α-β pruning, this can lead to sub-optimal solution. Can be very efficient with a good evaluation function. 49 Drawbacks of α-β If the branching factor is really big, search depth is still too limited. E.g. in Go, where branching factor b~300. More on this next time! Optimal play is guaranteed against an optimal opponent if search proceeds to the end of the game. But the opponent may not be optimal! If heuristics are used, this assumption turns into the opponent playing optimally according to the same heuristic function as the player. This is a very big assumption! What to do if the opponent plays very differently? 50 25

26 Chinook (Schaeffer et al., U. of Alberta) Best checkers player (since 1990 s). Plain α-β search, performed on standard PCs. Evaluation function based on expert features of the board. Opening database. HUGE endgame database! Chinook has perfect information for all checkers positions involving 8 or fewer pieces on the board (a total of 443,748,401,247 positions). Only a few moves in middle of the game are actually searched. They have now done an exhaustive search for checkers, and through optimal play can force at least a draw. 51 Deep Blue (IBM) Specialized chess processor, special-purpose memory architecture. Very sophisticated evaluation function (expert features, tuned weights). Database of standard openings/closings. Uses a version of α-β pruning (with undisclosed improvements) Can search up to 40-deep in some branches. Can search over 200 billion positions per second! Overall, an impressive engineering feat. Now, several computer programs running on regular hardware are on par with human champions (e.g. Fritz)

27 53 Some possible strategies for AI in games Design a compact state representation. Search using iterative deepening to get anytime real-time play. Use alpha-beta pruning with an evaluation function. Order moves using the evaluation function. Tune the evaluation function using domain knowledge, trial-and-error, learning. Searching deeper is often more important than having a good evaluation function. Consider using different strategies for opening, middle and endgame. Use randomization to break ties. Consider that the opponent might not be optimal. It is crucial to decide where to spend the computation effort! 54 27

28 Summary Understand the different types of games. Understand Minimax and alpha-beta pruning, what they compute, how to implement, and common methods to make them work (e.g. node ordering). Understand the role of the evaluation function. Be familiar with state-of-the-art for solving some common games