Computational aspects of two-player zero-sum games

Course notes for Computational Game Theory, Sections 1 and 2, Fall 2010

Peter Bro Miltersen

November 1, 2010, Version 1.2

1 Introduction

The topic of this course is computational game theory. Informally, a game is a mathematical model of a situation where agents with conflicting interests interact. Computational game theory studies algorithms for solving games, i.e., algorithms where the input is a finite description of a game and the output is a solution to the game.

What we mean by "solution" may vary. Game theory has several solution concepts. Some of these solution concepts are motivated descriptively or even predictively, such as Nash equilibrium, which is, informally, a stable way of playing a game. Other solution concepts are motivated prescriptively. For such solution concepts, solutions are best thought of as advice we may give to agents about how to play the game well. An example is a maximin strategy, which is a strategy achieving the best possible guaranteed outcome from the perspective of one of the agents. While there are many other solution concepts, the two above are extremely central, and variations of them will take up most of our time. In fact, in this incarnation of the course we shall be concerned almost exclusively with two-player zero-sum games, where the two notions will be seen to coincide.

As theoretical computer scientists studying computational game theory, we shall be concerned with different ways of discretely representing games. As it happens, this is a concern shared with pure game theory, where succinct representations such as the extensive form were developed already in the 1940s. A more distinctive concern of computer science is our focus on worst case correctness and worst case time complexity of the algorithms we develop. The insistence on correctness may seem obvious, but it becomes less so when one considers that we shall be working in a domain where real numbers play a big role. We shall not be happy with methods that sometimes work, or that only work if no numerical issues spoil the fun, without a thorough understanding of the classes of instances where they are guaranteed to work, and a worst case analysis ensuring that all numerical issues can be dealt with when the algorithm is implemented on a discrete computer. Many (most?) algorithms of numerical analysis in fact do not satisfy this criterion.

When we consider worst case time complexity, we are as computer scientists particularly interested in the time complexity as a function of the combinatorial parameters (the "size") of the game. In contrast, other disciplines analyzing games often consider the game a constant and study the convergence rate of their methods as a function of the desired accuracy only. We will often be interested in precise big-O bounds on the complexity of our algorithms, but our basic definition of computational efficiency is that the algorithms have polynomial time complexity. Sometimes we shall not be able to arrive at a polynomial time algorithm for the task we consider, and to explain why, we shall derive computational hardness results for the task, such as NP-hardness.

The prerequisites for reading these notes are familiarity with linear programming, algorithm analysis (in particular, big-O time complexity) and the notions of polynomial reductions and NP-hardness and NP-completeness. In particular, the courses dads, dopt, and dkombsoeg of the computer science program at Aarhus University make the perfect background. There are no prerequisites on game theoretic topics, but we refer to various external notes along the way that should be read together with these notes.

2 Games in Strategic Form

Please read Ferguson, Game Theory, Part 2, Sections 1-4, as a supplement to these notes.

2.1 Basic definitions

Definition 1 A (non-cooperative) game G in strategic form is given by:

- A set $I$ of players, $I = \{1, 2, \dots, l\}$.
- For each player $i$, a strategy space $S_i$. For now we will assume that $S_i$ is finite. We shall also refer to $S_i$ as the set of pure strategies of Player $i$.
- For each player $i$, a utility/payoff function $u_i : S_1 \times S_2 \times \cdots \times S_l \to \mathbb{R}$.

The set $S_1 \times S_2 \times \cdots \times S_l$ is also called the set of pure strategy profiles of the game. The result of applying $u_i$ to a particular strategy profile is called the outcome.

The notion of a game in strategic form can obviously be used to model situations where only one simultaneous move is performed by the players. Less obviously, by defining the strategy space appropriately, it can also be used to model games played over time. The computer scientist may find it easier to see this by interpreting the sets $S_i$ as the sets of possible deterministic programs for playing such games.

We want to be able to consider randomized ways of playing a game.

Definition 2 The mixed extension $\tilde{G}$ of a game $G$ is obtained by:

- Extending $S_i$ to the set $\tilde{S}_i = \Delta(S_i)$, the set of probability distributions on $S_i$. The set $\tilde{S}_i$ is also referred to as the set of mixed strategies of Player $i$.
- Extending $u_i$ to $\tilde{u}_i$ with domain $\tilde{S}_1 \times \tilde{S}_2 \times \cdots \times \tilde{S}_l$, where $\tilde{u}_i(\sigma_1, \sigma_2, \dots, \sigma_l)$ is defined to be the expected payoff when the mixed strategies $\sigma_i$ are played against each other, i.e., the expected payoff when $l$ pure strategies are sampled independently from $S_1, S_2, \dots, S_l$ according to the probability distributions $\sigma_1, \dots, \sigma_l$.

The set $\tilde{S}_1 \times \tilde{S}_2 \times \cdots \times \tilde{S}_l$ is also called the set of mixed strategy profiles of the game.

Definition 3 A two-player zero-sum game is a game where $l = 2$ (two players) and $u_2 = -u_1$ (zero-sum).

When the strategy spaces $S_1, S_2$ are finite, a two-player zero-sum game is also called a matrix game. Indeed, we can use matrix notation to conveniently represent the game. Concretely, let $S_1 = \{1, \dots, n\}$ and $S_2 = \{1, \dots, m\}$. Then the game can be represented as a payoff matrix $A = (a_{ij})$ with $n$ rows and $m$ columns and with $a_{ij} = u_1(i, j)$. Henceforth, we shall also refer to Player 1 as the row player and Player 2 as the column player. Note that the matrix entries are the payoffs of the row player. The payoffs of the column player can be obtained by negating these.

In an $n \times m$ matrix game, a mixed strategy $x$ of Player 1 is a member of $\Delta_n$, the set of probability distributions on $\{1, 2, \dots, n\}$, and a mixed strategy $y$ of Player 2 is a member of $\Delta_m$. It is convenient to consider $x$ and $y$ to be column vectors of dimension $n$ and $m$, respectively, with the $i$th entry being the probability that pure strategy $i$ is played. Then, the expected payoff when $x$ is played against $y$ is easily seen to be given by $x^T A y$.
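Since mixed strategies are just stochastic vectors, this expected payoff is a one-line computation. The following sketch (in Python with NumPy; the matrix and strategies are illustrative values, not taken from these notes) evaluates $x^T A y$:

```python
import numpy as np

# An illustrative 2x3 payoff matrix for the row player.
A = np.array([[1.0, -2.0,  3.0],
              [0.0,  4.0, -1.0]])

x = np.array([0.5, 0.5])        # mixed strategy of the row player (sums to 1)
y = np.array([0.2, 0.3, 0.5])   # mixed strategy of the column player (sums to 1)

# Expected payoff to the row player when x is played against y: x^T A y.
print(x @ A @ y)                # 0.9
```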

The central solution concept for matrix games is the notion of a maximin strategy. Informally, a maximin strategy is a way of playing the game in a randomized way so that the best possible guarantee on the expected payoff is achieved. Formally:

Definition 4 A maximin strategy for Player 1 is any member of

$$\arg\max_{x \in \Delta_n} \min_{y \in \Delta_m} x^T A y.$$

For Player 2, a minimax strategy is any member of

$$\arg\min_{y \in \Delta_m} \max_{x \in \Delta_n} x^T A y.$$

The guarantees obtained also have names.

Definition 5 The lower value $\underline{v}$ of the game is

$$\underline{v} = \max_{x \in \Delta_n} \min_{y \in \Delta_m} x^T A y.$$

The upper value $\overline{v}$ of the game is

$$\overline{v} = \min_{y \in \Delta_m} \max_{x \in \Delta_n} x^T A y.$$

The lower value $\underline{v}$ is a lower bound on the amount that Player 1 will win (in expectation) when he plays by a maximin strategy. Similarly, the upper value $\overline{v}$ is an upper bound on the amount that Player 2 will lose (in expectation) when she plays by a minimax strategy. It should be obvious that $\underline{v} \le \overline{v}$.

Maximin and minimax strategies are jointly known as optimal strategies, though this terminology is a bit dangerous (and in fact often leads to misunderstandings), as such an "optimal" strategy is not necessarily optimal to use in any particular situation. For instance, when playing rock-scissors-paper, if you happen to know that your opponent will choose rock, the best move is not to play the maximin mixed strategy (which is derived below) but to play paper. The reason for the terminology is this: the maximin strategy provides, by definition, the best possible (i.e., optimal) guarantee on the expected payoff against an unknown opponent.

The following lemma is convenient. It expresses that once Player 1 has committed to play by the mixed strategy $x$, Player 2 does not lose anything by playing a pure strategy rather than a mixed one (in the lemma, $e_j$ is the $j$th unit vector).

Lemma 6 For any $x \in \Delta_n$,

$$\min_{y \in \Delta_m} x^T A y = \min_{j \in \{1,\dots,m\}} (x^T A)_j = \min_{j \in \{1,\dots,m\}} x^T A e_j.$$

Proof Since $y$ is a probability distribution (a stochastic vector), $x^T A y$ is a weighted average of the numbers $x^T A e_j$ for $j \in \{k \mid y_k > 0\}$. An average cannot be strictly smaller than all the items it is an average of!

Consider now as an example the game of rock, scissors and paper, played for one dollar. It can be represented using the following payoff matrix:

         r    s    p
    R    0    1   -1
    S   -1    0    1
    P    1   -1    0

Consider the mixed strategy $x$ of Player 1 that assigns a probability of $\frac{1}{3}$ to each of the three rows. Let us analyze what guarantee this strategy obtains. If Player 2 plays rock, the expected payoff to Player 1 is $\frac{1}{3} \cdot 0 + \frac{1}{3} \cdot (-1) + \frac{1}{3} \cdot 1 = 0$. If Player 2 plays scissors, the expected payoff to Player 1 is $\frac{1}{3} \cdot 1 + \frac{1}{3} \cdot 0 + \frac{1}{3} \cdot (-1) = 0$. If Player 2 plays paper, the expected payoff to Player 1 is $\frac{1}{3} \cdot (-1) + \frac{1}{3} \cdot 1 + \frac{1}{3} \cdot 0 = 0$. The minimum of 0, 0 and 0 is 0, so the guarantee obtained by the strategy is 0. That is, the lower value of the game is at least 0. Symmetrically, we can argue by looking at the mixed strategy $y$ of Player 2 that assigns $\frac{1}{3}$ to each column that the upper value of the game is at most 0. Since the lower value is at most the upper value, they must both be equal to 0, and hence the uniform distributions on the rows and columns are in fact maximin and minimax strategies.
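By Lemma 6, such guarantees are easy to check mechanically: the guarantee of a fixed mixed strategy $x$ is simply the smallest entry of the row vector $x^T A$. A small sketch (in Python with NumPy) verifying the hand computation above:

```python
import numpy as np

def guarantee(A, x):
    # By Lemma 6, the min over mixed y of x^T A y is attained at a pure
    # strategy, so the guarantee of x is the smallest entry of x^T A.
    return (x @ A).min()

# Rock-scissors-paper payoff matrix for the row player.
A = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]])

uniform = np.ones(3) / 3
print(guarantee(A, uniform))   # 0.0: the lower value is at least 0
print((A @ uniform).max())     # 0.0: the upper value is at most 0
```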

As a more difficult example, we consider a modified rock-scissors-paper game we call Paper Rules. In this game, if the row player wins with paper, he wins two dollars rather than one:

         r    s    p
    R    0    1   -1
    S   -1    0    1
    P    2   -1    0

We ask: how much can the row player offer the column player for playing this game? This is given by the lower value of the game, i.e.,

$$\max_{x_R, x_S, x_P} \min\{-x_S + 2x_P,\; x_R - x_P,\; -x_R + x_S\} \quad \text{s.t. } x_R + x_S + x_P = 1,\; x_R, x_S, x_P \ge 0.$$

In order to evaluate this expression, we make a reasonable guess, which we must of course verify later. We guess that at the maximum we have the following equalities:

$$-x_S + 2x_P = x_R - x_P = -x_R + x_S.$$

Adding the equation $x_R + x_S + x_P = 1$, we have three equations in three unknowns, which yields the following unique solution:

$$x_R = \frac{1}{3}, \quad x_S = \frac{5}{12}, \quad x_P = \frac{1}{4},$$

which yields an expected payoff for Player 1 of $\frac{1}{12}$, no matter what Player 2 plays. That is, whether or not our guess above is correct, we have that the lower value is at least $\frac{1}{12}$. By similar reasoning, we arrive at a guess for the minimax strategy of Player 2:

$$y_r = \frac{1}{4}, \quad y_s = \frac{5}{12}, \quad y_p = \frac{1}{3},$$

with a guarantee of $\frac{1}{12}$. That is, the upper value is at most $\frac{1}{12}$. Since the lower value is at most the upper value, they are in fact both equal to $\frac{1}{12}$, and the strategies we arrived at are in fact the maximin/minimax strategies for this game. So, if Player 1 pays more than $\frac{1}{12}$ to play the game, he has paid too much (feel free to try to make money out of this fact in a bar, taking the role of Player 2).

2.2 Von Neumann's theorem

In the derivation of the maximin/minimax strategies above, we depended on a lucky guess. We will now see how to derive the maximin/minimax strategies in general, using linear programming (the lucky guess above corresponds to guessing the basis of an optimal basic solution to this program). This follows from the proof of the fundamental theorem on matrix games, namely:

Theorem 7 (von Neumann's min-max theorem, 1928) For all matrix games, $\underline{v} = \overline{v}$.

Indeed, this was the case for our two examples above. Since the lower and upper values are equal, we shall refer to both as simply the value of the game.
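Both the lucky guess and the theorem are easy to check numerically for this example: solve the guessed indifference equations as a linear system, then compare the guarantee of the resulting $x$ with the guarantee of the guessed $y$. A sketch in Python with NumPy:

```python
import numpy as np

A = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 2, -1,  0]])            # Paper Rules

# Guessed indifference equations, rewritten with all variables on the left:
#   -x_R - x_S + 3 x_P = 0    (from -x_S + 2 x_P =  x_R - x_P)
#   2 x_R - x_S -  x_P = 0    (from  x_R -   x_P = -x_R + x_S)
#     x_R + x_S +  x_P = 1
M = np.array([[-1.0, -1.0,  3.0],
              [ 2.0, -1.0, -1.0],
              [ 1.0,  1.0,  1.0]])
x = np.linalg.solve(M, np.array([0.0, 0.0, 1.0]))
y = np.array([1/4, 5/12, 1/3])          # guessed minimax strategy of Player 2

print(x)               # [1/3, 5/12, 1/4]
print((x @ A).min())   # 1/12: the lower value is at least this
print((A @ y).max())   # 1/12: the upper value is at most this, so the value is 1/12
```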

We will state and prove a more general result, which will be useful later. In the general result, rather than taking the max and the min over probability distributions, we take them over arbitrary non-empty and bounded polytopes.

Theorem 8 (Generalized min-max theorem) Let real matrices $A, E, F$ and real column vectors $e, f$ be given so that $X = \{x : Ex = e, x \ge 0\}$ and $Y = \{y : Fy = f, y \ge 0\}$ are non-empty and bounded polytopes. Then

$$\max_{x \in X} \min_{y \in Y} x^T A y = \min_{y \in Y} \max_{x \in X} x^T A y.$$

Proof By using the duality theorem for linear programming we get

$$\max_{Ex=e,\, x \ge 0} \; \min_{Fy=f,\, y \ge 0} x^T A y \;=\; \max_{Ex=e,\, x \ge 0} \; \max_{q:\, q^T F \le x^T A} q^T f \qquad (1)$$

$$=\; \max_{x,q:\; Ex=e,\; q^T F \le x^T A,\; x \ge 0} q^T f. \qquad (2)$$

We use the duality theorem a second time to obtain

$$\min_{Fy=f,\, y \ge 0} \; \max_{Ex=e,\, x \ge 0} x^T A y \;=\; \min_{Fy=f,\, y \ge 0} \; \min_{r:\, r^T E \ge (Ay)^T} r^T e \qquad (3)$$

$$=\; \min_{y,r:\; Fy=f,\; r^T E \ge (Ay)^T,\; y \ge 0} r^T e, \qquad (4)$$

and applying the duality theorem a third time, now to (2), we obtain (4), which proves the theorem.

Note that the expression (2) is just a linear program! In particular, finding maximin (and minimax) strategies and values in matrix games reduces to solving linear programs. Here is what the program looks like for the Paper Rules game. There are four variables: $x_R$, $x_S$ and $x_P$ are the probabilities for playing each of the three pure strategies, and $v$ is the value. Also, there is one constraint for each move the opponent might make.

$$\max_{x_R, x_S, x_P, v} \; v$$

s.t.

r: $0 \cdot x_R + (-1) \cdot x_S + 2 \cdot x_P \ge v$
s: $1 \cdot x_R + 0 \cdot x_S + (-1) \cdot x_P \ge v$
p: $(-1) \cdot x_R + 1 \cdot x_S + 0 \cdot x_P \ge v$
   $x_R + x_S + x_P = 1$
   $x_R, x_S, x_P \ge 0$
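This program can be handed directly to an off-the-shelf LP solver. Below is a minimal sketch using scipy.optimize.linprog (assuming SciPy is available; linprog minimizes, so we maximize $v$ by minimizing $-v$). The construction works for any payoff matrix; here it is instantiated with Paper Rules.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Maximin strategy and value of the row player, via the LP above.

    Variables are (x_1, ..., x_n, v): maximize v subject to
    (x^T A)_j >= v for every column j, and x a probability distribution.
    """
    n, m = A.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # minimize -v, i.e., maximize v
    A_ub = np.hstack([-A.T, np.ones((m, 1))])      # v - (x^T A)_j <= 0 for each j
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])   # x_1 + ... + x_n = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]      # x >= 0, v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

A = np.array([[ 0.,  1., -1.],
              [-1.,  0.,  1.],
              [ 2., -1.,  0.]])                    # Paper Rules
x, v = solve_matrix_game(A)
print(x, v)                                        # approx (1/3, 5/12, 1/4) and 1/12
```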

Note that the reduction from solving matrix games to solving linear programs is very simple; it essentially consists of copying coefficients of the matrix $A$ and letting them be coefficients of the linear program. In particular, the reduction from solving matrix games to solving linear programs is a strongly polynomial time reduction.

Let us remind ourselves what a strongly polynomial time algorithm is. A strongly polynomial time algorithm is an algorithm for computing a semi-algebraic function (that is, a function from real vectors to real vectors that can be defined in first order logic using the vocabulary $+, -, \cdot, /, \le$) using a number of arithmetic operations and comparisons that is polynomial in the dimension of the domain of the function. We know that linear programming has polynomial time algorithms (e.g., the ellipsoid algorithm and many interior point algorithms), but it is an open problem whether it has a strongly polynomial time algorithm. In particular, the polynomial time ellipsoid algorithm is not a strongly polynomial time algorithm: it cannot be defined on an arbitrary real input as an algorithm using the operations $+, -, \cdot, /, \le$ only. Also, while it can be used to exactly solve linear programs in Turing machine time polynomial in the bit length of a rational input, it needs more iterations and more time on inputs containing numbers with more digits, a deficiency not shared by a strongly polynomial time algorithm. It is very likely that we will someday find a strongly polynomial time algorithm for linear programming. We do have candidates for such algorithms. In particular, the simplex algorithm with some ingenious pivoting rule could very well be such an algorithm (on the other hand, the standard pivoting rules have all been shown to lead to worst case exponential time complexity). Summing up, we now know:

Corollary 9

- Maximin/minimax strategies and values can be found in polynomial time (given as input the matrix of a matrix game, with entries rational numbers given as fractions).
- If there is a strongly polynomial time algorithm for linear programming, then there is even a strongly polynomial time algorithm for computing maximin strategies for given matrices.

Note that we are very careful about stating the representation we have in mind when we consider the notion of polynomial time solvability.

It is very interesting to ask whether the implication in the last bullet of the corollary can be reversed. Could we hope for a strongly polynomial time algorithm for computing maximin strategies without finding one for linear programming? To kill such hopes, we would have to provide a strongly polynomial reduction from solving linear programs to solving matrix games. That is, we should postulate a black box finding maximin strategies for matrix games given to it as input, and use such a black box to solve a given linear program using a polynomial number of arithmetic operations and applications of the black box. There seems to be a folklore belief that it is known how to do this (in fact, the lecturer has been ridiculed on more than one occasion for claiming that it is not known!). The folklore belief seems to stem from a reduction due to Dantzig (1948) that does indeed, in some sense, reduce solving linear programs to solving matrix games, but does not do quite what we want. Let us have a look at Dantzig's reduction. Given an LP in standard form,

$$P: \quad \max c^T x \quad \text{s.t. } Ax \le b,\; x \ge 0,$$

we want to know if it has an optimal solution (so that it is neither infeasible nor unbounded). The answer is that this is the case if and only if

$$P': \quad Ax \le b, \quad A^T y \ge c, \quad b^T y = c^T x, \quad x, y \ge 0$$

is feasible, by the duality theorem. Dantzig's observation is that $P'$ is feasible if and only if the matrix game

$$G = \begin{pmatrix} 0 & -A^T & c \\ A & 0 & -b \\ -c^T & b^T & 0 \end{pmatrix}$$

has some maximin strategy $(x^*, y^*, z^*)$ that plays the last row with nonzero probability $z^* > 0$. We shall not give a proof of this statement, but we remark that the $x$-part of the feasible solution to $P'$ is in that case given by $x = x^*/z^*$ and the $y$-part is given by $y = y^*/z^*$. Since $G$ is skew-symmetric, the game appears identical to the two players. We call such a game symmetric. It is easy to see that the value of a symmetric game is 0, as in the (unmodified) rock, scissors and paper game. In particular, Dantzig certainly did not reduce the general linear programming problem to computing the value of a matrix game.
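Building $G$ from the LP data is purely mechanical, which makes the skew-symmetry easy to sanity-check. A sketch in Python with NumPy, using the sign convention reconstructed above (the small LP instance is illustrative, not from the notes):

```python
import numpy as np

def dantzig_game(A, b, c):
    """Skew-symmetric payoff matrix of Dantzig's game
    for the LP: max c^T x s.t. Ax <= b, x >= 0 (A is m x n)."""
    m, n = A.shape
    return np.block([
        [np.zeros((n, n)),  -A.T,               c.reshape(-1, 1)],
        [A,                  np.zeros((m, m)), -b.reshape(-1, 1)],
        [-c.reshape(1, -1),  b.reshape(1, -1),  np.zeros((1, 1))],
    ])

# An illustrative LP: max x1 + x2 s.t. x1 + 2 x2 <= 4, 3 x1 + x2 <= 6, x >= 0.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([1.0, 1.0])

G = dantzig_game(A, b, c)
print(G.shape)                # (5, 5): n + m + 1 rows and columns
print(np.allclose(G, -G.T))   # True: G is skew-symmetric, hence a symmetric game
```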

Thus, it seems that the following problem is still open:

Open problem 1 Is there a strongly polynomial time reduction from finding optimal solutions to linear programs to finding maximin strategies of matrix games?

2.3 Maximin strategies and Nash equilibria

Definition 10 (for general games) Let $\sigma_i \in \tilde{S}_i$ (a mixed strategy for player $i$) and $\sigma_{-i} \in \prod_{j \ne i} \tilde{S}_j$ (mixed strategies for the other players) be given. We say that $\sigma_i$ is a best reply (or best response) to $\sigma_{-i}$ if

$$\sigma_i \in \arg\max_{\pi_i \in \tilde{S}_i} \tilde{u}_i(\pi_i, \sigma_{-i}).$$

Definition 11 A strategy profile $\sigma \in \prod_{i=1}^{l} \tilde{S}_i$ is a Nash equilibrium if $\sigma_i$ is a best reply to $\sigma_{-i}$ for all $i$.

Nash equilibrium is a central solution concept in game theory. Note that it is conceptually very different from maximin: Nash equilibrium is a descriptive notion capturing stability, while maximin is a normative notion capturing optimal guarantees. Still, for the case of a two-player, zero-sum game, we can show that the two notions coincide.

Proposition 12 For a two-player, zero-sum game, we have:

Nash equilibria = Maximin strategies $\times$ Minimax strategies

Proof "$\supseteq$": Let $(x^*, y^*)$ be a strategy profile where $x^*$ is maximin and $y^*$ is minimax. The expected outcome of play must be $(x^*)^T A y^* = \underline{v} = \overline{v} = v$, as both players are guaranteed an outcome at least this good. Can any player deviate and get a better outcome? No! This would violate the guarantee of the other player.

"$\subseteq$": Let $(x^*, y^*)$ be a Nash equilibrium and let $v(x^*) = \min_{y \in \Delta_m} (x^*)^T A y$ and $v(y^*) = \max_{x \in \Delta_n} x^T A y^*$ (we may call $v(x^*)$ the value of the strategy $x^*$). We should prove that $x^*$ is maximin and that $y^*$ is minimax. We shall just show that $x^*$ is maximin; the other proof is similar. So suppose, to the contrary, that $x^*$ is not maximin.

That is, $v(x^*) \ne \underline{v}$, and hence $v(x^*) < \underline{v}$, since $\underline{v}$ is by definition the maximum value of any strategy. If $(x^*)^T A y^* > v(x^*)$, then Player 2 can deviate and achieve $v(x^*)$, contradicting the Nash equilibrium property. On the other hand, if $(x^*)^T A y^* \le v(x^*)$, then Player 1 can deviate to his maximin strategy and achieve $\underline{v} > v(x^*)$, contradicting the Nash equilibrium property.

The following corollary is often very useful when deriving maximin strategies by hand:

Corollary 13 (Principle of Indifference) Let a matrix game be given, together with a maximin mixed strategy $x$ for the row player and a minimax mixed strategy $y$ for the column player. If the column player plays according to $y$, the expected payoff for the row player is the same, no matter which pure strategy $i$ he chooses, as long as $x_i > 0$.

Proof If not, the Nash equilibrium condition would be violated.
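The principle is easy to observe numerically: against the minimax strategy $y$, every pure strategy in the support of $x$ earns the same expected payoff, namely the value. A small sketch for Paper Rules (Python with NumPy, using the strategies derived in Section 2.1):

```python
import numpy as np

A = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 2, -1,  0]])       # Paper Rules

y = np.array([1/4, 5/12, 1/3])    # minimax strategy of the column player

# Expected payoff of each pure row strategy against y; the maximin strategy
# x = (1/3, 5/12, 1/4) has full support, so all three entries must coincide.
print(A @ y)                       # [1/12, 1/12, 1/12]
```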