A Lyapunov Optimization Approach to Repeated Stochastic Games

Transcription

1 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT A Lyapunov Optmzaton Approach to Repeated Stochastc Games Mchael J. Neely Unversty of Southern Calforna mjneely Abstract Ths paper consders a tme-varyng game wth N players. Every tme slot, players observe ther own random events and then take a control acton. The events and control actons affect the ndvdual utltes earned by each player. The goal s to maxmze a concave functon of tme average utltes subject to equlbrum constrants. Specfcally, partcpatng players are provded access to a common source of randomness from whch they can optmally correlate ther decsons. The equlbrum constrants ncentvze partcpaton by ensurng that players cannot earn more utlty f they choose not to partcpate. Ths form of equlbrum s smlar to the notons of Nash equlbrum and correlated equlbrum, but s smpler to attan. A Lyapunov method s developed that solves the problem n an onlne maxweght fashon by selectng actons based on a set of tmevaryng weghts. The algorthm does not requre knowledge of the event probabltes. A smlar method can be used to compute a standard correlated equlbrum, albet wth ncreased complexty. I. INTRODUCTION Consder a repeated game wth N players and one game manager. The game s played over an nfnte sequence of tme slots t {0, 1, 2,...}. Every slot t there s a random event vector ω(t = (ω 0 (t, ω 1 (t,..., ω N (t. The game manager observes the full vector ω(t, whle each player {1,..., N} observes only the component ω (t. The value ω 0 (t represents nformaton known only to the manager. After the slot t event s observed, the game manager sends a message to each player. Based on ths message, the players choose a control acton α (t. The random event and the collecton of all control actons for slot t determne ndvdual utltes u (t for each player {1,..., N}. Each player s nterested n maxmzng the tme average of ts own utlty process. The game manager s nterested n provdng messages that lead to a far allocaton of tme average utltes across players. Specfcally, let u be the tme average of u (t. The farness of an acheved vector of tme average utltes s defned by a concave farness functon φ(u 1,..., u N. The goal s to devse strateges that maxmze φ(u 1,..., u N subject to certan game-theoretc equlbrum constrants. For example, suppose the farness functon s a sum of logarthms: N φ(u 1,..., u N = log(u Ths corresponds to proportonal far utlty maxmzaton, a concept often studed n the context of communcaton networks [1]. Another natural concave farness functon s: φ(u 1,..., u N = mn[u 1,..., u N, c] for some gven constant c > 0. Ths farness functon assgns no added value when the average utlty of one player exceeds that of another. Let M(t = (M 1 (t,..., M N (t be the message vector provded by the game manager on slot t. The value M (t s an element of the set A and represents the acton the manager would lke player to take. A player {1,..., N} s sad to partcpate f she always chooses the suggeston of the manager, that s, f α (t = M (t for all t {0, 1, 2,...}. At the begnnng of the game, each player makes a partcpaton agreement. Partcpatng players receve the messages M (t, whle non-partcpatng players do not. Ths paper consders the class of algorthms that delver message vectors M(t as a statonary and randomzed functon of the observed ω(t. Assumng that all players partcpate, ths nduces a condtonal probablty dstrbuton on the actons, gven the current ω(t. The condtonal dstrbuton s defned as a coarse correlated equlbrum (CCE f t yelds a tme average utlty vector (u 1,..., u N wth the followng property [2]: For each player {1,..., N}, the average utlty u s at least as large as the maxmum tme average utlty ths player could acheve f she dd not partcpate (assumng the actons of all other players do not change. Overall, the goal s to maxmze φ(u 1,..., u N subject to the CCE constrants. A. Contrbutons and related work The noton of coarse correlated equlbrum (CCE was ntroduced n [2] n the statc case where there s no event process ω(t. The CCE defnton s smlar to a correlated equlbrum (CE [3][4][5]. The dfference s as follows: A correlated equlbrum (CE s more strngent and requres the utlty acheved by each player to be at least as large as the utlty she could acheve f she dd not partcpate but f she stll knew the M (t messages on every slot. It s known that both CCE and CE constrants can be wrtten as lnear programs. The concept of Nash equlbrum (NE s more strngent stll: The NE constrant requres all players to act ndependently wthout the ad of a message process M(t [6][5]. Unfortunately, the problem of computng a Nash equlbrum s nonconvex. Ths paper uses the NE, CE, and CCE concepts n the context of a stochastc game wth random events ω(t. When farness functons are concave but nonlnear, the optmal acton assocated wth a partcular event can depend on whether or not the event s rare. Ths paper develops an onlne algorthm

2 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT that s nfluenced by the event probabltes, but does not requre knowledge of these probabltes. The algorthm uses the Lyapunov optmzaton theory of [7][8] and s of the max-weght type. Specfcally, every slot t, the game manager observes the ω(t realzaton and chooses a suggeston vector by greedly mnmzng a drft-plus-penalty expresson. Such Lyapunov methods are used extensvely n the context of queueng networks [9][10] (see also related methods n [11][12][13]. Ths s perhaps the frst use of such technques n a game-theoretc settng. One reason the soluton of ths paper can have a smple structure s that the random event process ω(t s assumed to be ndependent of the pror control actons. Specfcally, whle the components ω (t are allowed to be arbtrarly correlated across {0, 1,..., N}, the vector ω(t s assumed to be ndependent and dentcally dstrbuted (..d. over slots. Pror work on stochastc games consders more complex problems where ω(t 1 s nfluenced by the control acton of slot t, ncludng work n [14] whch studes correlated equlbra n ths context. Ths typcally nvolves Markov decson theory and has complexty that grows exponentally wth the dmenson of the state vector ω(t, and hence exponentally n the number of players N. In contrast, whle the current paper treats a stochastc problem wth more lmted structure, the resultng soluton scales gracefully wth N. Specfcally, the algorthm uses a number of vrtual queues that s lnear n N, rather than exponental n N, resultng n polynomal tme bounds on complexty and convergence tme. Furthermore, the number of vrtual queues s ndependent of the number of possble values of ω 0 (t. Unfortunately, the number of vrtual queues s exponental n the number of possble values of ω (t for {1,..., N}. Hence, the algorthm works best when players observe only a small number of possble random events. II. STATIC GAMES Ths secton ntroduces the problem n the statc case wthout random processes ω 0 (t, ω 1 (t,..., ω N (t. The dfferent forms of equlbrum are defned and compared through a smple example. The general stochastc problem s treated n Secton III. Suppose there are N players, where N s an nteger larger than 1. Each player {1,..., N} has an acton space A, assumed to be a fnte set. The game operates n slotted tme t {0, 1, 2,...}. Every slot t, each player chooses an acton α (t A. Let α(t = (α 1 (t,..., α N (t be the vector of control actons on slot t. The utlty u (t earned by player on slot t s a real-valued functon of α(t: u (t = û (α(t {1,..., N} The utlty functons û (α can be dfferent for each player. Defne A = A 1 A N. Consder startng wth a partcular vector α A and modfyng t by changng a sngle entry from α to some other acton β. Ths new vector s represented by the notaton (β, α. Defne A as the set of all vectors α, beng the set product of A j over all j. The three dfferent forms of equlbrum consdered n ths secton are defned by probablty mass functons P r[α] for α A. It s assumed throughout that: P r[α] 0 for all α A. P r[α] = 1. If actons α(t are chosen ndependently every slot accordng to the same probablty mass functon P r[α], the law of large numbers ensures that, wth probablty 1, the tme average utlty of each player {1,..., N} s: A. Nash equlbrum (NE u = P r[α]û (α The standard concept of Nash equlbrum assumes players take ndependent actons, so that P r[α] = N g [α ], where g [α ] s defned for all {1,..., N} and α A by: g [α ] = P r[α (t = α ] A collecton of such functons g [α ] for {1,..., N} defnes a mxed strategy Nash equlbrum (NE f [15][6]: N N g j [α j ]û (β, α g j [α j ]û (α j=1 j=1 {1,..., N}, β A (1 B. Correlated equlbrum (CE The standard concept of correlated equlbrum from [3][4] can be motvated by a game manager that provdes suggested actons (α 1 (t,..., α N (t every slot t, where player 1 only sees α 1 (t, player 2 only sees α 2 (t, and so on. Assume the suggeston vector s ndependent and dentcally dstrbuted (..d. over slots wth some probablty mass functon P r[α]. Assume all players partcpate, so that every slot ther chosen actons match the suggestons. The probablty mass functon P r[α] s a correlated equlbrum (CE f [3][4]: P r[α, α ]û (α, α P r[α, α ]û (β, α α A α A {1,..., N}, α A, β A wth β α (2 Ths can be understood as follows: Fx an {1,..., N} and an α A such that P r[α (t = α ] > 0. Dvde both sdes of the above nequalty by P r[α (t = α ]. Then: The left-hand-sde s the condtonal expected utlty of player, gven that all players partcpate and that player sees suggeston α on the current slot. The rght-hand-sde s the condtonal expected utlty of player, gven that she sees α on the current slot, that all other players j partcpate, and that player chooses acton β nstead of α (so player does not partcpate. The correlated equlbrum constrants are lnear n the P r[α] varables. Defne A as the number of actons n set A. The number of lnear constrants specfed by (2 s then: N A ( A 1

3 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT C. Coarse correlated equlbrum (CCE The defnton of correlated equlbrum assumes that nonpartcpatng players stll receve the suggestons from the game manager. As the suggeston α (t for player may be correlated wth the suggestons α j (t of other players j, ths can gve a non-partcpatng player a great deal of nformaton about the lkelhood of actons from other players. The followng smple modfcaton assumes that nonpartcpatng players do not receve any suggestons from the game manager. A probablty mass functon P r[α] s a coarse correlated equlbrum (CCE f: P r[α]û (β, α P r[α]û (α {1,..., N}, β A (3 Ths CCE defnton was ntroduced n [2]. Whle the above constrants are defned n terms of fxed alternatves β, t can be shown that these constrants mply that no player can reach an mproved payoff by usng a dfferent strategy. Intutvely, ths s because any dfferent strategy can, at a gven tme t, be vewed as a probablstc mxture of fxed actons. These CCE constrants (3 are lnear n the P r[α] values. The number of CCE constrants s: N A Ths number s typcally much less than the number of constrants requred for a CE n (2. Assumng that A 2 for each player (so that each player has at least 2 acton optons, the number of CCE constrants s always less than or equal to the number of CE constrants, wth equalty f and only f A = 2 for all players. D. A superset result The assumpton that all sets A are fnte make the game a fnte game. Fx a fnte game and defne E Nash, E CE, and E CCE as the set of all probablty mass functons P r[α] that defne a (mxed strategy Nash equlbrum, a correlated equlbrum, and a coarse correlated equlbrum, respectvely. It s known that every such fnte game has at least one mxed strategy Nash equlbrum, and so E Nash s nonempty [15][6]. Furthermore, n [3][4] t s shown that any (mxed strategy Nash equlbrum s also a correlated equlbrum. Smlarly, t can be shown that any correlated equlbrum s also a coarse correlated equlbrum [2]. Thus: E Nash E CE E CCE (4 Furthermore, the sets E CE and E CCE are closed, bounded, and convex. For example, ths s true for E CCE because ths set s the ntersecton of the lnear constrants (3 and the closed, bounded, and convex probablty smplex. E. A smple example Consder a game where player 1 has three control optons and player 2 has two control optons: A 1 = {α, β, γ}, A 2 = {α, β} The utlty functons û 1 (α 1, α 2 and û 2 (α 1, α 2 are specfed n the table of Fg. 1, where player 1 actons are lsted by row and player 2 actons are lsted by column. Utlty 1 Utlty 2 Probabltes α β α β α β α 2 5 α 50 1 α a b β 4 2 β 2 4 β c d γ 3 5 γ 3 0 γ e f Fg. 1. Example utlty functons û 1 (α 1, α 2 and û 2 (α 1, α 2. There are sx possble acton vectors (α 1, α 2. Defne the mass functon P r[α] by values a, b, c, d, e, f assocated wth each of the sx possbltes, as shown n Fg. 1. The eght CE constrants for ths problem are: player 1 sees α: player 1 sees α: player 1 sees β: player 1 sees β: player 1 sees γ: player 1 sees γ: player 2 sees α: player 2 sees β: 2a 5b 4a 2b 2a 5b 3a 5b 4c 2d 2c 5d 4c 2d 3c 5d 3e 5f 2e 5f 3e 5f 4e 2f 50a 2c 3e a 4c 0e b 4d 0f 50b 2d 3f It can be shown that there s a sngle probablty mass functon P r[α] that satsfes all of these CE constrants: a = b = 0, c = 0.45, d = 0.15, e = 0.3, f = 0.1 Ths s also the only NE. The average utlty vector assocated wth ths mass functon s (u 1, u 2 = (3.5, 2.4. In contrast, the fve CCE constrants for ths problem are: player 1 chooses α: player 1 chooses β: player 1 chooses γ: player 2 chooses α: player 2 chooses β: 2a 5b 4c 2d 3e 5f 2(a c e 5(b d f 2a 5b 4c 2d 3e 5f 4(a c e 2(b d f 2a 5b 4c 2d 3e 5f 3(a c e 5(b d f 50a b 2c 4d 3e 50(a b 2(c d 3(e f 50a b 2c 4d 3e 1(a b 4(c d 0(e f There are an nfnte number of probablty mass functons P r[α] that satsfy these CCE constrants. Three dfferent ones are gven n the table of Fg. 2, labeled dstrbuton 1, dstrbuton 2, and dstrbuton 3. Dstrbuton 1 corresponds to the CE and NE dstrbuton. The set of all utlty vectors (u 1, u 2 achevable under CCE constrants s the trangular regon shown n Fg. 3. The three vertces of the trangle correspond to the three dstrbutons n Fg. 2, and are: (u 1, u 2 {(3.5, 2.4, (3.5, 9.3, (3.8773, }

4 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT The pont (3.5, 2.4 s the lower left vertex of the trangle and corresponds to the CE (and NE dstrbuton. It s clear that both players can sgnfcantly ncrease ther utlty by changng from CE constrants to CCE constrants. Ths llustrates the followng general prncple: All players beneft f non-partcpants are dened access to the suggestons of the game manager. Ths prncple s justfed by (4. Dstrbuton 1 Dstrbuton 2 Dstrbuton 3 α β α β α β α 0 0 α.15 0 α β β β γ γ 0.10 γ Fg. 2. Three dfferent probablty dstrbutons that satsfy the CCE constrants. The frst dstrbuton also satsfes the CE and NE constrants. Utlty 2 average (3.5, 9.3 (3.5, 2.4 Utlty regon under CCE constrants NE and CE pont (3.7323, (3.8773, Utlty 1 average Fg. 3. The regon of (u 1, u 2 values achevable under CCE constrants. All ponts nsde and on the trangle are achevable. The NE and CCE pont s the lower left vertex. The pont (3.7323, s the soluton to the convex optmzaton example of Secton II-F. F. Utlty optmzaton wth equlbrum constrants For convenence, assume all utlty functons are nonnegatve. Defne u max as an upper bound on the utlty for each player {1,..., N}, so that: 0 û (α u max α A Defne φ(u 1,..., u N as a contnuous and concave functon that maps the set N [0, umax ] to the real numbers. Ths s called the farness functon. The game manager chooses a probablty mass functon P r[α] wth the goal of maxmzng φ(u 1,..., u N subject to CCE constrants: Maxmze: φ(u 1,..., u N (5 Subject to: u = P r[α]û (α {1,..., N} (6 P r[α] 0 α A (7 P r[α] = 1 (8 CCE constrants (3 are satsfed (9 The above s a convex optmzaton problem. If the CCE constrants are replaced by the CE constrants (2, the problem remans convex but can have sgnfcantly more constrants. If the CCE constrants are replaced wth the NE constrants (1, the problem becomes nonconvex. Consder the specal case example of Secton II-E wth farness functon gven by: φ(u 1, u 2 = 10 log(1 u 1 log(1 u 2 where player 1 s gven a hgher prorty. The optmal utlty s (u 1, u 2 = (3.7323, , plotted n Fg. 3. III. STOCHASTIC GAMES Let ω(t = (ω 0 (t, ω 1 (t,..., ω N (t be a vector of random events for slot t {0, 1, 2,...}. Each component ω (t takes values n some fnte set Ω, for {0, 1,..., N}. Defne Ω = Ω 0 Ω 1 Ω N. The vector process ω(t s assumed to be ndependent and dentcally dstrbuted (..d. over slots wth probablty mass functon: π[ω] =P r[ω(t = ω] ω Ω where the notaton = means defned to be equal to. On each slot t, the components of the vector ω(t can be arbtrarly correlated. At the begnnng of each slot t, each player {1,..., N} observes ts own random event ω (t. The game manager observes the full vector ω(t, ncludng the addtonal nformaton ω 0 (t. It then sends a suggested acton M (t to each partcpatng player {1,..., N}. Assume M (t A, where A s the fnte set of actons avalable to player. Each player chooses an acton α (t A. Partcpatng players always choose α (t = M (t. Non-partcpatng players do not receve M (t and choose α (t usng knowledge of only ω (t and of events that occurred before slot t. Let α(t = (α 1 (t,..., α N (t be the acton vector. The utlty u (t earned by each player on slot t s a functon of α(t and ω(t: u (t = û (α(t, ω(t For convenence, assume utlty functons are nonnegatve wth maxmum values u max for {1,..., N}, so that: 0 û (α(t, ω(t u max A. Dscusson of game structures Ths model can be used to treat varous game structures. For example, the scenaro where all players have full nformaton can be treated by defnng ω (t = ω 0 (t for all {1,..., N}. Ths s useful n games related to economc

5 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT markets, where ω 0 (t can represent a commonly known vector of current prces. Alternatvely, one can magne a game wth a sngle random event process ω 0 (t that s known to the game manager but unknown to all players. For example, consder a game defned over a wreless multple access system. Wreless users are players n the game, and the access pont s the game manager. In ths example, ω 0 (t can represent a vector of current channel condtons known only to the access pont. Such games can be treated by settng ω (t to a default constant value for all {1,..., N} and all slots t. B. Pure strateges and the vrtual statc game Assume all players partcpate, so that M (t = α (t for all. Lke the prevous secton, the goal here s to defne equlbrum condtons that ensure no player can beneft by ndvdually devatng from the suggested actons of the game manager. For each {1,..., N}, denote the szes of sets Ω and A by Ω and A, respectvely. Defne a pure strategy functon for player as a functon b (ω that maps Ω to the set A. There are A Ω such functons. Defne: S = {1, 2,..., A Ω } Enumerate the pure strategy functons for player and represent them by b (s (ω for s S. Defne: S =S 1 S 2 S N Each vector (s 1, s 2,..., s N S can be used to specfy a profle of pure strateges used by each player. For each s S and each ω Ω, defne: b (s (ω = (b (s1 1 (ω 1, b (s2 2 (ω 2,..., b (s N (ω N The vector b (s (ω s equal to (α 1 (t,..., α N (t n the specal case when ω(t = ω and when the acton of each player on slot t s defned by pure strategy s. The average utlty earned by player on such a slot t s defned: h (s = ω Ω π[ω]û (b (s (ω, ω (10 The stochastc game can be transformed nto a vrtual statc game as follows: The vrtual statc game also has N players. The vrtual acton space of each player s vewed as the set of pure strateges S. The vrtual utlty functons are gven by the functons h (s. As the vrtual acton spaces of the players reman fnte, the vrtual statc game s ndeed a fnte game. Hence, the NE, CE, and CCE defntons specfed for statc games n the prevous secton can be used for ths vrtual statc game. Ths provdes a natural path for extendng these equlbrum concepts to the stochastc context. In partcular, let P r[s] be a probablty mass functon over the fnte set of strategy profles s S. Ths generates a condtonal probablty mass functon P r[α ω] defned over all α A and ω Ω: P r[α ω] = s S P r[s]1{b(s (ω = α} (11 where 1{b (s (ω = α} s an ndcator functon that s 1 f b (s (ω = α, and s 0 else. The next lemma shows that N every condtonal probablty mass functon P r[α ω] can be generated n ths way (proof n [16]. Lemma 1: Let P r[α ω] be a condtonal probablty mass functon defned over (α, ω A Ω. Then there exsts a probablty mass functon P r[s], defned over s S, for whch (11 holds. Now suppose P r[s] s a CCE of the vrtual statc game. By defnton of CCE: s S P r[s]h (s s S P r[s]h (r, s {1,..., N}, r S (12 Lemma 2: If P r[s] and P r[α ω] are probablty mass functons that satsfy (11, then P r[s] satsfes the CCE constrants (12 for the vrtual statc game f and only f P r[α ω] satsfes the followng constrants: π[ω]p r[α ω]û (α, ω ω Ω ( π[ω]p r[α ω]û (b (s (ω, α, ω ω Ω {1,..., N}, s S (13 Proof: It suffces to show that the left-hand-sde of (12 s equal to the left-hand-sde of (13, and that the same s true of the rght-hand-sdes. The followng two denttes are useful. Fx {1,..., N}, ω Ω, s S. Then: û (b (s (ω, ω = 1{b (s (ω = α}û (α, ω (14 Fx r S, and defne s = (r, s, beng the strategy profle that replaces the th component of s wth strategy r. Then: û (b (s (ω, ω = 1{b (s (ω = α}û ((b (r (ω, α, ω (15 Now consder the left-hand-sde of (12: P r[s]h (s s S = s S P r[s] ω Ω π[ω]û (b (s (ω, ω (16 = P r[s] π[ω] 1{b (s (ω = α}û (α, ω (17 s S ω Ω = π[ω]p r[α ω]û (α, ω (18 ω Ω where (16 follows by (10, (17 follows by (14, and (18 follows by (11. Ths proves the left-hand-sdes are equal. A smlar argument proves the rght-hand-sdes are equal. C. Equlbrum for the stochastc game Let P r[α ω] be a condtonal probablty mass functon defned over ω Ω, α A. It s assumed throughout that: P r[α ω] 0 α A, ω Ω (19 P r[α ω] = 1 ω Ω (20 Lemma 2 suggests the followng defnton: A condtonal probablty mass functon P r[α ω] s a coarse correlated

6 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT equlbrum (CCE for the stochastc game f constrants (13 are satsfed for all {1,..., N} and all s S. Defntons of NE and CE can be smlarly extended to ths stochastc context. A probablty mass functon P r[α ω] s a mxed strategy Nash equlbrum (NE for the stochastc game f t has the product form: P r[α ω] = N P r[α ω ] and f the followng constrants are satsfed: N π[ω] P r[α ω ]û (α, ω ω Ω N ( π[ω] P r[α ω ]û (b (s (ω, α, ω ω Ω {1,..., N}, s S (21 A condtonal probablty mass functon P r[α ω] s a correlated equlbrum (CE for the stochastc game f: π[ω]p r[α ω]û (α, ω ω Ω α =c ( π[ω]p r[α ω]û (b (s (ω, α, ω ω Ω α =c {1,..., N}, s S, c A (22 Let ENE stoc, E CE stoc, E CCE stoc be the set of all condtonal probablty mass functons P r[α ω] that are NE, CE, and CCE, respectvely, for the stochastc game. Lemma 3: For the general stochastc game defned above: (a The set ENE stoc s nonempty. (b ENE stoc E CE stoc E CCE stoc. (c Sets ECE stoc stoc and ECCE are closed, bounded, and convex. Proof: See [16]. As n the statc case, t can be shown that the CCE constrants (13 mply that no player can beneft by ndvdually choosng not to partcpate (assumng non-partcpants do not receve the messages from the game manager [16]. Lkewse, the CE constrants (22 mply that no player can beneft by ndvdually choosng not to partcpate (assumng nonpartcpants stll receve the messages. D. Optmzaton objectve As before, defne φ(u 1,..., u N as a contnuous and concave functon that maps N [0, umax ] to the set of real numbers. The goal s to choose messages M(t = α(t accordng to a condtonal probablty mass functon P r[α(t ω(t] that solves the problem below: Maxmze: φ(u 1,..., u N (23 Subject to: u = ω Ω π[ω]p r[α ω]û (α, ω {1,..., N} (24 CCE constrants (13 are satsfed (25 P r[α ω] 0 α A, ω Ω (26 P r[α ω] = 1 ω Ω (27 Ths s a convex program n the unknowns P r[α ω]. Secton IV presents an onlne soluton technque that does not requre knowledge of the probabltes π[ω]. IV. LYAPUNOV OPTIMIZATION For a real-valued stochastc process u(t defned over slots t {0, 1, 2,...}, defne: u(t = 1 t t 1 τ=0 E [u(τ] Recall that u (t =û (α(t, ω(t. For each {1,..., N}, defne u (s (t =û (s (α(t, ω(t, where û (s (α(t, ω(t s the correspondng utlty when the player acton s replaced by the acton of the pure strategy b (s (ω : û (s (α(t, ω(t =û (( b (s (ω (t, α (t, ω(t Consder the followng modfcaton of the problem (23- (27: Every slot t {0, 1, 2,...} the game manager observes ω(t and chooses an acton vector α(t A to solve: Maxmze: lm nf t φ(u 1(t,..., u N (t (28 Subject to: lm nf (t u (s t (t] 0 {1,..., N}, s S (29 α(t A t {0, 1, 2,...} (30 It can be shown that optmalty can be acheved by a statonary and randomzed algorthm that observes ω(t and ndependently chooses α(t accordng to the same condtonal probablty mass functon P r[α ω] every slot. Such algorthms yeld well defned lmts. Any probablty mass functon P r[α ω] that solves (23-(27 must also solve (28- (30. Moreover, any soluton to (28-(30 must have tme average expectatons that are arbtrarly close to condtonal probablty mass functons P r[α ω] that solve (23-(27. A. Transformaton va Jensen s nequalty Usng the auxlary varable technque of [7], the problem (28-(30, whch maxmzes a nonlnear functon of a tme average, can be transformed nto a maxmzaton of the tme average of a nonlnear functon. To ths end, let γ(t = (γ 1 (t,..., γ N (t be an auxlary vector that the game manager chooses on slot t, assumed to satsfy 0 γ (t u max for all t and all. Defne: g(t =φ(γ 1 (t,..., γ N (t Jensen s nequalty mples that for all slots t > 0: g(t φ(γ 1 (t,..., γ N (t (31 Now consder the followng problem: Every slot t {0, 1, 2,...} the game manager observes ω(t and chooses both an acton vector α(t A and an auxlary vector γ(t

7 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT to solve: Maxmze: lm nf g(t (32 t Subject to: lm γ (t u (t = 0 {1,..., N} (33 t lm nf (t u (s t (t] 0 {1,..., N}, s S (34 α(t A t (35 0 γ (t u max t, {1,..., N} (36 For ntuton, suppose all lmts exst, so that constrant (33 s equvalent to γ = u. Ths together wth Jensen s nequalty (31 ensures the optmal value of the objectve functon n the above problem s less than or equal to that of the problem (28- (30. On the other hand, the optmal value of (28-(30 can be acheved by choosng γ (t = u for all t, where (u 1,..., u N are optmal tme average utltes for problem (28-(30. The problems (28-(30 and (32-(36 are equvalent. B. The drft-plus-penalty algorthm The problem (32-(36 can be solved va the drft-pluspenalty algorthm of [7]. To enforce the constrants (34, defne a vrtual queue Q (s (t for all {1,..., N} and all s S, wth update equaton: Q (s (t 1 = max[q (s (t u (s (t u (t, 0] (37 The above looks lke a slotted tme queueng equaton wth arrval process u (s (t and servce process u (t. The ntuton s that f a control algorthm s constructed that makes these queues mean rate stable, so that: [ ] E Q (s (t lm = 0 t t then constrant (34 s satsfed [7]. Lkewse, to enforce the constrants (33, defne a vrtual queue Z (t for all {1,..., N}, wth update equaton: Z (t 1 = Z (t γ (t u (t (38 For smplcty, assume all vrtual queues are ntalzed to 0. Defne L(t as a sum of squares of all vrtual queues (dvded by 2 for convenence: L(t = 1 2 N s S Q (s (t 2 1 N 2 Z (t 2 Ths s called a Lyapunov functon. Defne (t =L(t 1 L(t, called the Lyapunov drft on slot t. The drft-plus-penalty algorthm s defned by choosng control actons greedly every slot to mnmze a bound on the drft-plus-penalty expresson (t V g(t. Here, g(t s a penalty and V s a nonnegatve constant that affects a tradeoff between convergence tme and proxmty to the optmal soluton. Lemma 4: For all slots t one has: (t V g(t B V g(t N where: B = 1 2 N Q (s s S (t[u (s (t u (t] N Z (t[γ (t u (t] (39 s S (u max 2 1 N 2 (umax 2 Proof: The result follows mmedately from the fact that max[x, 0] 2 x 2 (see detals n [16]. Greedly mnmzng the rght-hand-sde of (39 every slot leads to the followng algorthm: Every slot t, the game manager observes the queues and the current ω(t. Then: Auxlary varables: Choose γ (t [0, u max ] for all {1,..., N} to maxmze: V φ(γ 1 (t,..., γ N (t N Z (tγ (t Suggested actons: Choose α(t A 1 A N to mnmze: N Z (tû (α(t, ω(t N Q (s s S (t[û (s (α(t, ω(t û (α(t, ω(t] Then send suggested actons α (t to each (partcpatng player {1,..., N}. Update vrtual queues va (37 and (38. Ths s an onlne algorthm that does not requre knowledge of the probabltes π[ω]. C. Performance analyss Defne φ as the optmal value of the objectve functon for problem (23-(27, and note by equvalence of the transformatons that ths s also the optmal value for the problem (32- (36. Defne θ(t as the vector of all vrtual queues Q (s (t and Z (t, and defne θ = 2L(t. Theorem 1: If the above algorthm s mplemented usng a fxed value V > 0, then: (a For all slots t > 0 one has: φ (γ 1 (t,..., γ N (t φ B/V (b All vrtual queues Q (s (t and Z (t are mean rate stable, so that the constrants (33-(36 are satsfed. (c E[ θ(t ] 2B2V (g t max φ t for all slots t > 0, where g max s the maxmum possble value for g(t, beng the maxmum of φ(γ 1,..., γ N over γ [0, u max ] for all {1,..., N}.

8 PROC. ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING, OCT Proof: The algorthm mnmzes the rght-hand-sde of (39 every slot t, and so: (t V g(t B V φ(γ1(t,..., γn (t N Q (s s S (t[u (s (t u (t] N Z (t[γ (t u (t] (40 for all alternatve decsons α (t A and γ (t that satsfy 0 γ (t umax, where: u (t u (s (t = û (α (t, ω(t (( = û b (s (ω (t, α (t, ω(t Now randomly choose α (t as a functon of ω(t (and ndependently of queue backlog accordng to the probabltes P r[α ω] that solve (23-(27. Let u be the expected utlty of player under ths dstrbuton, and note that: φ(u 1,..., u N = φ Choose γ (t = u for {1,..., N}. Takng expectatons of (40 then gves: E [ (t V g(t] B V φ Fx a slot T > 0. Summng the above over slots t {0, 1, 2,..., T 1} and usng L(0 = 0 gves: T 1 E [L(T ] V E [g(t] BT V φ T (41 t=0 Rearrangng (41 and usng the defnton of g(t gves: T 1 1 E [φ(γ 1 (t,..., γ N (t] φ B T V t=0 E [L(T ] V T Usng Jensen s nequalty and E [L(T ] 0 proves part (a. Agan rearrangng (41 gves: E [ θ(t 2] 2BT 2V T (g max φ Usng the fact that E [ θ(t ] 2 E [ θ(t 2], dvdng by T 2, and takng square roots proves part (c. Part (b follows mmedately from part (c. Defne ɛ = 1/V. Theorem 1 shows that average utlty s wthn O(ɛ of optmalty. Part (c of the theorem mples that constrant volaton s wthn O(ɛ after tme O(1/ɛ 3. If a Slater condton holds, ths convergence tme s mproved to O(1/ɛ 2 [7]. Smlar bounds can be shown for nfnte horzon tme averages (rather than tme average expectatons [7]. D. Dscusson The onlne algorthm ensures the constrants (34 are satsfed. Ths shows that average utlty of each player s greater than or equal to the achevable utlty f the player were to constantly use some other pure strategy. Ths corresponds to the constrant n (13. If an algorthm makes random decsons ndependently every slot accordng to a condtonal probablty mass functon P r[α ω], then constrant (13 mples player cannot do better under any alternatve decsons, possbly those that mx pure strateges wth dfferent mxng probabltes every slot. A subtlety s that the onlne algorthm does not make statonary and randomzed decsons. Thus, t s not clear f a player wth knowledge of the algorthm could mprove average utlty by makng alternatve decsons that do not correspond to a pure strategy. Of course, the onlne algorthm yelds tme averages that correspond to a desred P r[α ω]. Thus, a potental fx s to run the onlne algorthm n the background and make α(t decsons accordng to the tme averages that emerge. A faster method mght be to modfy the acton choce selecton by usng tme averages of the Z (t and Q (s (t values, rather than ther nstantaneous values. ACKNOWLEDGEMENT Ths work s supported n part by: NSF grant CCF , the Network Scence Collaboratve Technology Allance sponsored by the U.S. Army Research Lab W911NF REFERENCES [1] F. Kelly. Chargng and rate control for elastc traffc. European Transactons on Telecommuncatons, vol. 8, no. 1 pp , Jan.-Feb [2] H. Mouln and J. P. Val. Strategcally zero-sum games: The class of games whose completely mxed equlbra cannot be mproved upon. Internatonal Journal of Game Theory, vol. 7, no. 3/4, pp , [3] R. Aumann. Subjectvty and correlaton n randomzed strateges. Journal of Mathematcal Economcs, vol. 1, pp , [4] R. Aumann. Correlated equlbrum as an expresson of bayesan ratonalty. Econometrca, vol. 55, pp. 1-18, [5] M. J. Osborne and A. Rubnsten. A Course n Game Theory. MIT Press, Cambrdge, MA, [6] J. F. Nash. Non-cooperatve games. Annals of Mathematcs, vol. 54, pp , [7] M. J. Neely. Stochastc Network Optmzaton wth Applcaton to Communcaton and Queueng Systems. Morgan & Claypool, [8] L. Georgads, M. J. Neely, and L. Tassulas. Resource allocaton and cross-layer control n wreless networks. Foundatons and Trends n Networkng, vol. 1, no. 1, pp , [9] L. Tassulas and A. Ephremdes. Dynamc server allocaton to parallel queues wth randomly varyng connectvty. IEEE Transactons on Informaton Theory, vol. 39, no. 2, pp , March [10] M. J. Neely, E. Modano, and C. L. Farness and optmal stochastc control for heterogeneous networks. IEEE/ACM Transactons on Networkng, vol. 16, no. 2, pp , Aprl [11] A. Erylmaz and R. Srkant. Far resource allocaton n wreless networks usng queue-length-based schedulng and congeston control. IEEE/ACM Transactons on Networkng, vol. 15, no. 6, pp , Dec [12] A. Stolyar. Greedy prmal-dual algorthm for dynamc resource allocaton n complex networks. Queueng Systems, vol. 54, no. 3, pp , [13] X. Ln, N. B. Shroff, and R. Srkant. A tutoral on cross-layer optmzaton n wreless networks. IEEE Journal on Selected Areas n Communcatons, Specal Issue on Nonlnear Optmzaton of Communcaton Systems, vol. 14, no. 8, Aug [14] E. Solan and N. Velle. Correlated equlbrum n stochastc games. Games and Economc Behavor, vol. 38, pp , [15] J. F. Nash. Equlbrum ponts n n-person games. Proceedngs of the Natonal Academy of Scences of the Unted States of Amerca, vol. 36, pp , [16] M. J. Neely. A Lyapunov optmzaton approach to repeated stochastc games. ArXv techncal report, Oct