Noisy mean field stochastic games with network applications


Hamidou Tembine, LSS, CNRS-Supélec-Univ. Paris Sud, France
Pedro Vilanova, AMCS, KAUST, Saudi Arabia
Mérouane Debbah, Chaire Alcatel-Lucent en radio-flexible, Supélec, France

Abstract

We consider a class of stochastic games with a finite number of resource states, individual states and actions per state. At each stage, a random set of players interact. The states and the actions of all the interacting players together determine the instantaneous payoffs and the transitions to the next states. We study the convergence of the stochastic game with a variable set of interacting players when the total number of possible players grows without bound. We show that the optimal payoffs and the mean field equilibrium payoffs are solutions of a coupled system of backward-forward equations. The limiting games are equivalent to discrete-time anonymous sequential games or to differential population games. Using multidimensional diffusion processes, a general mean field convergence to a stochastic differential equation is given. We illustrate the controlled mean field limit in wireless networks.

CONTENTS
I Introduction
II Stochastic game with individual states
  II-A The setting
  II-B Policies and strategies
  II-C Different types of payoffs
  II-D Computation of equilibria
  II-E Q-values
  II-F Dynamic team problems and stochastic potential games
III The classic mean field approach
  III-A Mean field interaction model []
  III-B Mean field asymptotics of Markov decision evolutionary games and teams model [2]
IV Controlled mean field interaction
V Noisy mean field approach
VI Convergence to discrete time mean field
  VI-A Big step-size
    VI-A1 Mean field coordination games
  VI-B Vanishing step-size
    VI-B1 Centralized mean field control
VII Application to malware propagation
  VII-A Homogeneous system
    VII-A1 Uncontrolled behaviour
    VII-A2 Controlled behaviour
  VII-B Heterogeneous system
  VII-C Optimal strategy for the homogeneous system
  VII-D Noisy mean field
  VII-E Cycling behavior in IEEE 802.11 CSMA-based cognitive networks
VIII Concluding remarks
References
Appendix

I. INTRODUCTION

Dynamic game theory deals with sequential situations involving several decision makers (often called players), where the objective of each player may depend not only on its own preferences and decisions but also on the decisions of the other players. Dynamic games make it possible to model sequential decision making, time-varying interaction, uncertainty, and randomness of interaction among the players. They allow one to model situations in which the parameters defining the game vary in time and the players can adapt their strategies (or policies) according to the evolution of the environment. At any given time, each player takes a decision (also called an action) according to some strategy. A strategy of a player is a collection of history-dependent maps that specify, at each time, the choice (which can be probabilistic) of that player. The vector of actions chosen by the players at a given time (called the action profile) may determine not only the payoff of each player at that time; it can also determine the state evolution.

A particular class of dynamic games widely studied in the literature is the class of stochastic games. These are dynamic games with probabilistic state transitions (stochastic state evolution) controlled by one or more players. In discrete time, the state evolution is often modeled by interactive Markov decision processes, while in continuous time the resulting games are referred to as stochastic differential games. Discounted stochastic games were introduced in [3]. Stochastic games and interactive Markov decision processes are widely used to model sequential decision-making problems that arise in engineering, computer science, operations research, and the social sciences. However, it is well known that many real-world problems modeled by stochastic games have huge state and/or action spaces, leading to the well-known curse of dimensionality that makes the resulting models intractable to solve.
In addition, as the size of the system grows, the number of parameters (states, actions, transitions) explodes exponentially. In this paper we develop a noisy mean field limit for stochastic games with a variable number of interacting players. Centralized and decentralized mean field solutions are obtained by identifying a consistency relationship in the individual-state-mass interaction such that, in the population limit, each individual optimally responds to the mass effect and these individual strategies also collectively produce the same mass effect presumed initially. This leads to a coupled system of forward/backward equations.

Related work

Mean field interactions with spatially distributed players and types can be described as a sequence of dynamic games. Since the population profile involves many players for each type or class, a common approach is to ignore individual players and to use continuous variables to represent the aggregate average of type-location-secondary actions. The validity of this method has been proven only under specific time-scaling techniques and regularity assumptions. The mean field limit is then modeled by a state- and location-dependent time process. This type of aggregate model, also known as a non-atomic or population game, was studied by Wardrop (1952) in a deterministic and stationary setting with identical players. We describe below some recent advances in mean field control and games. Lasry & Lions introduced in recent papers [4], [5], [6] a general mathematical modeling approach for high-dimensional systems of evolution equations corresponding to a large number of players (particles or agents). They extended such mean-field approaches to problems in economics, finance and game theory. They studied n-player stochastic differential games and the related problem of the existence of equilibrium points (see also the population mass-action interpretation in Nash (1950)).
By letting n tend to infinity, they derived the mean-field limit equation. However, discrete-time Markov games are not examined in their analysis. Decentralized mean field stochastic control and Nash Certainty Equivalence have been studied in [7], [8] for large population stochastic dynamic systems. Inspired by mean field approximations in statistical mechanics and linear quadratic differential games, the authors analyzed a common situation where the dynamics and rewards of any given agent are influenced by a certain aggregate of the mass multi-agent behaviors, and established the existence of equilibria and optimal control strategies. In the infinite population limit, the agents become statistically independent under some assumptions, a phenomenon related to the propagation of chaos in mathematical physics. The paradigm of mean field dynamics has been to associate a relative growth rate to actions according to the expected payoff they achieve, and then to study the asymptotic trajectories of the state of the system, i.e., the fraction of users that adopt the different individual states and actions. Classes of mean field optimal controls have been studied in [7], [9], []. The aspects of heterogeneity of players, which play an important role in heterogeneous systems, are not considered in [7], [9], [], [8], []. The authors in [2] introduced evolving stochastic games with a finite number of players, in which each player in the population interacts with other randomly selected players (the number of players in interaction evolves in time). The states and actions of each player in an interaction together determine the instantaneous payoff for all involved players. They also determine the transition probabilities to move to the next state. Each individual wishes to maximize the total expected payoff over an infinite horizon. Under a restricted class of Markovian strategies, the random process consisting of one specific player and the remaining population converges weakly to a jump process driven by the solution of a system of differential equations.
The authors proved that the large population asymptotics of the microscopic model is equivalent to a (macroscopic) stochastic population game in which a local interaction is described by a single player against an evolving population profile. Another characterization of the long-run mean field game is given by so-called differential population games. Here we extend the work in [2], which is limited to a deterministic limit under strong conditions (bounded second moment, stationary strategies). Our work also extends [], in which the decisions of the players are not examined.

Our contribution

The contribution of this paper is threefold. First, from the theoretical point of view, we present a novel mean field approach that tries to overcome one of the limitations of the classical mean field approach, namely the approximation of an inherently stochastic system by a deterministic representation (an ordinary differential equation). We propose a new approach in which we preserve the main advantage of the classical mean field, that is, the reduction of the number of parameters in the analysis of large systems, while adding a random or noisy component. This addition can lead to a more realistic mathematical model of the original situation, in which many local and simultaneous transitions may occur at the same time, making the second moment unbounded. In this context the works [], [2], [2] are no longer applicable, because the second moment of the number of object transitions per time slot may not vanish when the number of objects goes to infinity. A typical scenario is when many players make parallel transitions. The idea of the proposed analysis is that, if the third-order term in the Taylor approximation is bounded, then the noise is not negligible and convergence to a stochastic mean field limit can be established. Inspired by the work of [3] based on multidimensional diffusion processes, we establish a mean field convergence to non-deterministic differential equations and extend the previous works on mean field interaction models (with and without controls).
This new mean field limit, which is stochastic, is called the noisy mean field limit, and it is applied in this work to malware propagation in opportunistic networks. Second, from the malware propagation modelling point of view, we extend the model developed in [], in which types are not used and the impact of the control parameters is not specifically studied. This limits the results obtained there, because different types of systems could lead, for example, to slower propagation rates. Types can represent, for example, different operating systems, different versions of an operating system, or patched/unpatched versions of the same operating system. To the best of our knowledge, in most of the related work on malware spreading in large networks, the authors do not model the heterogeneity of the network. The authors in [4] examined controlled dissemination using the Pontryagin maximum principle. However, the mean field convergence of their fluid model is not provided. In [5], the authors analyze a spatial mean field model between different locations, but the control framework is not examined there. We observe that control parameters are important in the mean field limit, since they give new insights into the uncontrolled mean field framework, which may be constrained (e.g., by energy limitations). This helps in controlling the proportion of infected nodes. Third, we study the impact of control design in IEEE 802.11 CSMA-based cognitive radio networks. We show that the control can help in stabilizing a cycling configuration under which the decoupling assumption may hold (and the fixed point approximation needs more justification).

Organization

The rest of the paper is structured as follows. In the next section we present the stochastic game with a random set of interacting players. Then we present background on existing mean field models in discrete time. We provide a general convergence result to a mean field limit characterized by a stochastic differential equation, and the payoff evolutions are solutions of partial differential equations. We characterize mean field optimality by a coupled system of Bellman-Shapley optimality equations combined with a discrete-time mean field (forward) equation, or by a Hamilton-Jacobi-Bellman equation combined with a mean field differential equation (Kolmogorov forward or Fokker-Planck equation). We examine the existence of solutions of these systems and derive the existence of a mean field equilibrium. Connections to differential population games and anonymous sequential games are established. Finally, we apply the mean field approach to large opportunistic wireless networks and illustrate the importance of control design in IEEE 802.11 CSMA-based cognitive radio networks.

II. STOCHASTIC GAME WITH INDIVIDUAL STATES

In this section, we present the class of stochastic games with individual states and a random number of interacting players, which is closely related to the mean field model that we study here. We consider a stochastic game in discrete time in which each player has a finite number of states. For each individual state, the player has a finite number of actions. In contrast to the classical formulation, here we describe explicitly the individual state evolution and the variability of the set of interacting players, leading to a stochastic game with a random number of interacting players.
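A. The setting

A stochastic game with individual states and a random set of interacting players is a collection
$$\Gamma = \big(\mathcal{N}, \mathcal{S}, (\mathcal{X}_j, \tilde{\mathcal{A}}_j, \mathcal{A}_j, r_j)_{j\in\mathcal{N}}, q, B^n\big),$$
where:
- $\mathcal{N}$ is the set of players; its cardinality is $n$.
- $\mathcal{S}$ is a set of environment states.
- For every player $j$, $\mathcal{X}_j$ is its own state space. A state has two components: the type of the player and its internal state. The type is constant during the game. The state of player $j$ at time $t$ is denoted by $X_j(t) = (\theta_j, Y_j(t))$, where $\theta_j$ is the type. The set of possible states is $\mathcal{X}_j = \{1, 2, \dots, \Theta\} \times \mathcal{Y}_j$, where $\mathcal{Y}_j$ is finite. $\mathcal{Y}_j$ may include other parameters, such as spatial location, current direction, and so on.
- For every player $j \in \mathcal{N}$, $\tilde{\mathcal{A}}_j$ is the set of actions of that player. $\mathcal{A}_j : \mathcal{S} \times \mathcal{X}_j \to 2^{\tilde{\mathcal{A}}_j}$ is a set-valued map (correspondence) that assigns to each state $(s, x_j) \in \mathcal{S} \times \mathcal{X}_j$ the set of actions $\mathcal{A}_j(s, x_j)$ available to player $j$. We denote the state-action space by
$$\mathcal{SXA} = \Big\{(s, x, a) : (s, x_1, \dots, x_n) \in \mathcal{S} \times \prod_{j\in\mathcal{N}} \mathcal{X}_j,\ a = (a_j)_{j\in\mathcal{N}},\ a_j \in \mathcal{A}_j(s, x_j)\ \forall j \in \mathcal{N}\Big\}.$$
- For every player $j \in \mathcal{N}$, $r_j : \mathcal{SXA} \to \mathbb{R}$ is the instant payoff function of player $j$.
- $q : \mathcal{SXA} \to \Delta(\mathcal{S} \times \prod_j \mathcal{X}_j)$ is a transition function, where $\Delta(\mathcal{S} \times \prod_j \mathcal{X}_j)$ is the space of probability distributions over $\mathcal{S} \times \prod_j \mathcal{X}_j$. The marginal of $q$ is denoted by $q_j$; it represents the transition probabilities of the state of player $j$.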
- $B^n$ is a random set of players: $B^n \subseteq \mathcal{N}$ denotes the set of interacting players at the current time. The payoff function of player $j$ is then denoted by $r_j^{B^n}$.

How does the dynamic game evolve? The dynamic game starts at an initial state $(s_0, x_0) \in \mathcal{S} \times \prod_j \mathcal{X}_j$ and is played as follows. Time is discrete and the time space is $\mathbb{N}$. A random set $B^n_t$ of players is selected for a one-shot interaction. The current state $(s_t, x_t)$ and the set $B^n_t$ are known by all the selected players. At each time $t \in \mathbb{N}$, each player $j \in B^n_t$ chooses an action $a_{j,t} \in \mathcal{A}_j(s_t, x_t)$ and receives an instant payoff $r_j(s_t, x_t, a_t)$, where $a_t = (a_{j,t})_{j\in\mathcal{N}}$, and the game moves to a new state according to the probability distribution $q(\cdot \mid s_t, x_t, a_t) \in \Delta(\mathcal{S} \times \prod_j \mathcal{X}_j)$. Non-selected players receive zero payoff and make no transition, and the payoff function $r_j^{B^n_t}$ of a selected player $j$ does not depend on the states and actions of the non-selected players.

B. Policies and Strategies

We start with the case where the actions and the values of the past states are observed, that is, at each time $t$, the collection $\{s_0, x_0, B_0, a_0, s_1, x_1, B_1, a_1, \dots, s_t, x_t, B_t\}$ of states visited in the past and of actions chosen by all the players is known at time $t$. This assumption is too strong for most network applications and needs to be relaxed: for example, the set of active players is not necessarily known by all the players, and the states of the other players are not known.

Histories. Under complete information of past play, a history of length $t+1$ corresponds to the sequence $h_t = (s_0, x_0, B_0, a_0, s_1, x_1, B_1, a_1, \dots, s_t, x_t, B_t)$.

Strategies. A pure strategy $\sigma_j$ of player $j$ is a mapping that assigns to every finite history $h_t$ an element of $\mathcal{A}_j(s_t, x_t)$. A mixed strategy is a probability distribution over pure strategies. The set of mixed strategies is denoted by $\Sigma_j$. A behavioral strategy of $j$ is a collection of mappings $(\sigma_{j,t})_t$, where $\sigma_{j,t}$ maps the space of histories to a probability distribution over $\mathcal{A}_j(s_t, x_t) \subseteq \tilde{\mathcal{A}}_j$.

Stationary strategies. A simple class of strategies is the class of stationary strategies.
A strategy profile $(\sigma_j)_{j\in\mathcal{N}}$ is stationary if, for every $j$, $\sigma_j(h_t)$ depends only on the current pair $(s_t, x_t)$. A stationary strategy of player $j$ can be identified with an element of the product space $\prod_{(s,x)\in\mathcal{S}\times\mathcal{X}} \Delta(\mathcal{A}_j(s,x))$. Every profile $\sigma = (\sigma_j)_{j\in\mathcal{N}} \in \prod_j \Sigma_j$ of mixed strategies, together with the initial state $(s, x)$, induces a probability distribution $\mathbb{P}_{s,x,\sigma}$ over the space of infinite plays $\mathcal{SXA}^{\mathbb{N}}$. We denote the corresponding expectation operator by $\mathbb{E}_{s,x,\sigma}$.

C. Different types of payoffs

We examine three classes of payoff functions to evaluate the sequences of payoffs that the players receive in a stochastic game. Let $\sigma$ be a mixed strategy profile.

In the finite-horizon payoff, a player considers the cumulative payoff during the first $T$ time steps. For every finite $T \in \mathbb{N}$, the finite-horizon payoff of player $j$ is
$$F_{j,T}(s, x, \sigma) = \mathbb{E}_{s,x,\sigma}\Big[\sum_{t=0}^{T-1} r_j^{B^n_t}(s_t, x_t, a_t)\,\mathbb{1}_{\{j\in B^n_t\}}\Big]. \qquad (1)$$
We denote the associated stochastic game by $\Gamma_T$.

In the discounted payoff, a player considers the discounted sum of its instantaneous payoffs. For discount factors $0 \le \beta_j < 1$ (possibly different across players), the $\beta_j$-discounted payoff of player $j$ under the strategy profile $\sigma$ is
$$F_{j,\beta}(s, x, \sigma) = (1-\beta_j)\,\mathbb{E}_{s,x,\sigma}\Big[\sum_{t=0}^{+\infty} \beta_j^{t}\, r_j^{B^n_t}(s_t, x_t, a_t)\,\mathbb{1}_{\{j\in B^n_t\}}\Big]. \qquad (2)$$
The term $(1-\beta_j)$ is a normalization factor: since $\sum_{t=0}^{+\infty} \beta_j^t = \frac{1}{1-\beta_j}$, multiplying both sides by $(1-\beta_j)$ gives $(1-\beta_j)\sum_{t=0}^{+\infty} \beta_j^t = 1$. We denote the associated stochastic game by $\Gamma_\beta$.
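For a single realized play (so no expectation is taken), criteria (1) and (2) reduce to sums over the stream of instant payoffs $r_t$ and activity flags $\mathbb{1}_{\{j\in B^n_t\}}$. The Python sketch below assumes time is indexed from $t = 0$, truncates the infinite sum in (2) at the length of the recorded stream, and uses hypothetical function names; it also checks the normalization $(1-\beta)\sum_t \beta^t = 1$ numerically.

```python
from typing import Sequence


def finite_horizon_payoff(rewards: Sequence[float], active: Sequence[bool], T: int) -> float:
    """Realized version of (1): sum of r_t * 1{j in B_t} for t = 0, ..., T-1."""
    return sum(r for r, a in zip(rewards[:T], active[:T]) if a)


def discounted_payoff(rewards: Sequence[float], active: Sequence[bool], beta: float) -> float:
    """Realized, truncated version of (2): (1 - beta) * sum_t beta**t * r_t * 1{j in B_t}."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("the discount factor must satisfy 0 <= beta < 1")
    total = 0.0
    weight = 1.0  # beta**t, updated incrementally
    for r, a in zip(rewards, active):
        if a:
            total += weight * r
        weight *= beta
    return (1.0 - beta) * total


if __name__ == "__main__":
    # Always active, always earning 1: the normalized discounted payoff is close to 1.
    horizon = 2000
    print(discounted_payoff([1.0] * horizon, [True] * horizon, beta=0.9))
    # Active every other stage, earning 2 when active: 5 active stages out of 10.
    rewards = [2.0] * 10
    active = [t % 2 == 0 for t in range(10)]
    print(finite_horizon_payoff(rewards, active, T=10))  # 10.0
```

Because of the $(1-\beta)$ factor, discounted and average payoffs live on the same scale as the instant payoffs, which is what makes the three criteria comparable.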
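The lim inf (or lim sup) payoff, in which a player considers the lim inf of its long-run average payoffs. For player $j$, the lim inf payoff is
$$F_{j,\infty}(s, x, \sigma) = \mathbb{E}_{s,x,\sigma}\left[\liminf_{T\to+\infty} \frac{\sum_{t=0}^{T-1} r_j^{B^n_t}(s_t, x_t, a_t)\,\mathbb{1}_{\{j\in B^n_t\}}}{\sum_{t=0}^{T-1} \mathbb{1}_{\{j\in B^n_t\}}}\right]. \qquad (3)$$
We consider the lim inf or the lim sup when the limit of the Cesàro-mean payoff does not exist. We denote the associated stochastic game by $\Gamma_\infty$.

Definition 1. Let $\epsilon \ge 0$. A strategy profile $\sigma^* = (\sigma^*_j)_{j\in\mathcal{N}}$ is a $T$-stage $\epsilon$-(Nash) equilibrium if
$$\forall (s, x) \in \mathcal{S} \times \prod_j \mathcal{X}_j,\quad \forall j \in \mathcal{N},\quad \forall \sigma_j \in \Sigma_j, \qquad (4)$$
$$F_{j,T}(s, x, \sigma^*) \ge F_{j,T}\big(s, x, (\sigma_j, \sigma^*_{-j})\big) - \epsilon, \qquad (5)$$
where $\sigma^*_{-j}$ means $(\sigma^*_i)_{i\neq j}$ and $(\sigma_j, \sigma^*_{-j})$ denotes the profile $(\sigma^*_1, \dots, \sigma^*_{j-1}, \sigma_j, \sigma^*_{j+1}, \dots, \sigma^*_n)$.
A strategy profile $\sigma^* = (\sigma^*_j)_{j\in\mathcal{N}}$ is a $\beta$-discounted $\epsilon$-equilibrium if
$$\forall (s, x) \in \mathcal{S} \times \prod_j \mathcal{X}_j,\quad \forall j \in \mathcal{N},\quad \forall \sigma_j \in \Sigma_j, \qquad (6)$$
$$F_{j,\beta}(s, x, \sigma^*) \ge F_{j,\beta}\big(s, x, (\sigma_j, \sigma^*_{-j})\big) - \epsilon. \qquad (7)$$
A strategy profile $\sigma^* = (\sigma^*_j)_{j\in\mathcal{N}}$ is a lim inf $\epsilon$-equilibrium if
$$\forall (s, x) \in \mathcal{S} \times \prod_j \mathcal{X}_j,\quad \forall j \in \mathcal{N},\quad \forall \sigma_j \in \Sigma_j, \qquad (8)$$
$$F_{j,\infty}(s, x, \sigma^*) \ge F_{j,\infty}\big(s, x, (\sigma_j, \sigma^*_{-j})\big) - \epsilon. \qquad (9)$$
The payoff of an $\epsilon$-equilibrium is called an $\epsilon$-equilibrium payoff. A 0-equilibrium (simply called a Nash equilibrium) of the stochastic game is a strategy profile such that no player can profit by a unilateral deviation. A refined notion of (Nash) equilibrium is the subgame perfect equilibrium and the subgame perfect $\epsilon$-equilibrium; it can be seen as a requirement for the credibility of an equilibrium strategy. A strategy profile $\sigma$ is called a subgame perfect $\epsilon$-equilibrium if $\sigma(h_t)$ is an $\epsilon$-equilibrium strategy profile for every finite history $h_t$.

Note that the random set $B^n_t$ can be seen as a state component. When $j \in B^n_t$, its action space is $\mathcal{A}_j(s_t, x_t)$; otherwise its action space is a set of actions that are dummy in terms of payoffs, i.e.,
$$\mathcal{A}^{B}_j(s_t, x_t, B^n_t) = \begin{cases} \mathcal{A}_j(s_t, x_t) & \text{if } j \in B^n_t,\\ \mathcal{B}_j & \text{otherwise,} \end{cases}$$
where the actions in $\mathcal{B}_j$ do not affect the payoffs. Then, under complete information, the stochastic game with a random number of interacting players can be transformed into the classical formulation with $(s_t, x_t, B^n_t)$ as the augmented state of the dynamic game. Therefore, the following result holds.

Lemma 1.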
For every $n$, the stochastic games with a random number of interacting players $\Gamma_T$ and $\Gamma_\beta$ have at least one equilibrium. The stochastic game $\Gamma_\infty$ has an equilibrium under ergodicity conditions [6]. The discounted stochastic game $\Gamma_\beta$ has at least one stationary equilibrium. Moreover, $\Gamma_\beta$ has at least one subgame perfect $\epsilon$-equilibrium.

The first and second statements are classical results in stochastic games with individual states; see [2] and the references therein. The proof of the last statement follows from [6].
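The lim inf (or lim sup) used in (3) for $\Gamma_\infty$ is needed because the Cesàro mean of a bounded payoff stream need not converge. As an illustration (a hypothetical payoff stream, not taken from the model), a player who is always active and receives payoff 1 and 0 in alternating blocks of doubling length has running averages that keep oscillating between about 1/3 and 2/3, so the limit in (3) does not exist while the lim inf equals 1/3. The helper names below are made up for this sketch.

```python
import math
from typing import List, Sequence


def running_average_per_activation(rewards: Sequence[float], active: Sequence[bool]) -> List[float]:
    """For T = 1, 2, ...: the ratio inside (3), i.e. payoff collected while active
    divided by the number of active stages (nan before the first activation)."""
    averages: List[float] = []
    total, count = 0.0, 0
    for r, a in zip(rewards, active):
        if a:
            total += r
            count += 1
        averages.append(total / count if count else math.nan)
    return averages


def doubling_blocks(n_blocks: int) -> List[float]:
    """Hypothetical payoff stream: 1 (once), 0 (twice), 1 (4 times), 0 (8 times), ..."""
    stream: List[float] = []
    for k in range(n_blocks):
        stream.extend([1.0 if k % 2 == 0 else 0.0] * 2 ** k)
    return stream


if __name__ == "__main__":
    rewards = doubling_blocks(16)
    avg = running_average_per_activation(rewards, [True] * len(rewards))
    tail = avg[1000:]
    # The running average never settles: its lim inf is 1/3 and its lim sup is 2/3.
    print(min(tail), max(tail))
```

At the end of every block of zeros the running average equals exactly 1/3, and at the end of every block of ones it is slightly above 2/3, which is why a lim inf criterion is used instead of a plain limit.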
D. Computation of equilibria

Under discounting, complete information, and a fixed set of players during the whole interaction, we can write the Bellman-Shapley operator to compute the equilibria (the solutions obtained by dynamic programming are also subgame perfect equilibria, and the obtained strategies are Markovian). The operator $O = (O_j)_{j\in\mathcal{N}}$ is given by
$$O_j(V)[s, x] = \max_{u_j}\Big\{(1-\beta_j)\, r_j(s, x, u) + \beta_j \sum_{s', x'} q(s', x' \mid s, x, u)\, V_j(s', x')\Big\},$$
where $u = (u_1, \dots, u_n)$ is a stationary strategy profile and $O_j(V) = (O_j(V)[s, x])_{s,x}$. A fixed point of $O$ gives the equilibrium payoffs, and a $\beta$-discounted equilibrium is obtained by collecting the maximizers of $(1-\beta_j)\, r_j(s, x, a) + \beta_j \sum_{s', x'} q(s', x' \mid s, x, a)\, V_j(s', x')$. Under the same assumptions, one can write a similar dynamic programming principle (by omitting the factor $\beta_j$) for the finite-horizon case (backward induction).

Under a variable set of interacting players, the dynamic programming principle becomes
$$O_j(V)[s, x, B, u_{-j}] = \max_{u_j}\Big\{(1-\beta_j)\, r_j^{B}(s, x, u) + \beta_j \sum_{s', x', B'} L(s', x', B' \mid s, x, B, u)\, V_j(s', x', B', u_{-j})\Big\},$$
where $L(s', x', B' \mid s, x, B, u) = q(s', x' \mid s, x, u)\, q_2(B' \mid B)$, and $q_2(B' \mid B) = \mathbb{P}(B^n_t = B')$ is the probability that $B'$ is the set of active players at the current time. The set $\mathcal{B}$ describes all the possible subsets of $\mathcal{N}$ (except the empty set), so it has $2^n - 1$ elements, in addition to the set $\mathcal{S} \times \prod_j \mathcal{X}_j$.

Naturally, several questions arise about the computational complexity of this system. The first question is how to reduce the complexity of solving the dynamic programming equations. The second question is what the dynamic programming principle becomes when the players have less information and the observation assumptions are relaxed. These questions are crucial for computing the solutions and for applying stochastic games to experimental interactive scenarios. In the next subsection, we examine the case where the knowledge of the transition probabilities can be relaxed via Q-learning and stochastic approximation.

E. Q-values

While an optimal strategy can, in principle, be obtained by the methods of dynamic programming, policy iteration, and value iteration, such computations are often prohibitively time-consuming. In particular, the size of the state space grows exponentially with the number of state variables, a phenomenon referred to by Bellman as the curse of dimensionality. Similarly, the size of the action space can also lead to computational intractability. In the single-player case, Q-learning [7], perhaps the most well-known example of reinforcement learning, is a stochastic-approximation-based approach to solving the fixed-point equation
$$Q^*(s, x, a) = (1-\beta)\, r(s, x, a) + \beta \sum_{s', x'} q(s', x' \mid s, x, a) \sup_{b\in\mathcal{A}(s', x')} Q^*(s', x', b).$$
One nice feature is that the Q-values are learned without knowledge of the transition probabilities. The iterative version is given by
$$Q_{t+1}(s, x, a) = Q_t(s, x, a) + \lambda_t(s, x, a)\Big[(1-\beta)\, r(s, x, a) + \beta \sup_{b\in\mathcal{A}(s', x')} Q_t(s', x', b) - Q_t(s, x, a)\Big],$$
where $(s', x')$ is the observed next state and $\lambda_t(s, x, a)$ is a learning rate satisfying $\sum_t \lambda_t(s, x, a) = +\infty$ and $\sum_t \lambda_t(s, x, a)^2 < +\infty$, which is a standard assumption in stochastic approximation. In the single-player case, Q-learning algorithms have good convergence properties. However, the convergence of the algorithm in the multi-player stochastic game case is still an open and challenging issue. In particular, in the presence of multiple equilibria, the convergence of the Q-values is unclear.
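The two routes described in subsections D and E can be contrasted on a hypothetical single-player problem with two (flattened) states $(s,x)$ and two actions: value iteration applies the normalized operator until it reaches its fixed point, while tabular Q-learning only uses sampled transitions. The learning rate $\lambda_t = N_t(s,a)^{-0.7}$, where $N_t(s,a)$ counts visits to $(s,a)$, is one common choice satisfying both summability conditions. The payoff table `R`, transition table `P`, uniform exploration and function names are all illustrative assumptions, and this sketch does not address the multi-player case.

```python
import random
from typing import List, Tuple

# Hypothetical single-player problem: two flattened states (s, x), two actions.
R = [[0.0, 1.0], [0.5, 0.0]]                  # instant payoffs r(state, action) in [0, 1]
P = [[[0.9, 0.1], [0.2, 0.8]],                # P[state][action][next] = q(next | state, action)
     [[0.7, 0.3], [0.1, 0.9]]]


def bellman(V: List[float], beta: float) -> Tuple[List[float], List[int]]:
    """O(V)[s] = max_a {(1 - beta) r(s, a) + beta * sum_s' q(s' | s, a) V(s')} and its maximizers."""
    new_V, policy = [], []
    for s in range(len(R)):
        vals = [(1 - beta) * R[s][a] + beta * sum(p * v for p, v in zip(P[s][a], V))
                for a in range(len(R[s]))]
        best = max(range(len(vals)), key=vals.__getitem__)
        new_V.append(vals[best])
        policy.append(best)
    return new_V, policy


def value_iteration(beta: float, tol: float = 1e-10) -> Tuple[List[float], List[int]]:
    """Iterate O from V = 0; O is a beta-contraction in the sup norm, so this terminates."""
    V = [0.0] * len(R)
    while True:
        new_V, policy = bellman(V, beta)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V, policy
        V = new_V


def q_learning(beta: float, steps: int, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning: P only simulates the environment; the update sees samples."""
    rng = random.Random(seed)
    Q = [[0.0] * len(R[s]) for s in range(len(R))]
    visits = [[0] * len(R[s]) for s in range(len(R))]
    s = 0
    for _ in range(steps):
        a = rng.randrange(len(R[s]))                               # uniform exploration
        s_next = rng.choices(range(len(R)), weights=P[s][a])[0]    # environment sample
        visits[s][a] += 1
        lam = visits[s][a] ** -0.7         # sum of lam diverges, sum of lam**2 converges
        target = (1 - beta) * R[s][a] + beta * max(Q[s_next])
        Q[s][a] += lam * (target - Q[s][a])
        s = s_next
    return Q


if __name__ == "__main__":
    V, policy = value_iteration(beta=0.9)
    Q = q_learning(beta=0.9, steps=50_000)
    print("value iteration:", V, policy)
    print("Q-learning greedy values:", [max(row) for row in Q])
```

With the $(1-\beta)$ normalization and payoffs in $[0,1]$, both the value function and the Q-table stay in $[0,1]$, which gives a cheap sanity check on either implementation.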
F. Dynamic team problems and stochastic potential games

In the case of action-independent transitions, we show the existence of a pure equilibrium if the auxiliary games are potential games. Below, we present sufficient conditions for the existence of a pure equilibrium in the game $\Gamma_\beta$.

Proposition 1.
- If in the discounted stochastic game $\Gamma_\beta$ the payoff function is the same for all the players, i.e., $r_j(\cdot) = r(\cdot)$, then the discounted stochastic game is a team problem (or coordination game) and has at least one pure (Nash) equilibrium.
- If in the discounted stochastic game $\Gamma_\beta$ the transitions between the states are independent of the actions played and, for each state $(s, x)$, the finite game $G(s, x) = (\mathcal{N}, \mathcal{A}_j(s, x), \bar r_j(s, x, \cdot))$ is a potential game in the sense of Monderer & Shapley (1996, [8]), then the discounted stochastic game has at least one pure (Nash) equilibrium, where
$$\bar r_j(s, x, \cdot) = (1-\beta_j)\, r_j(s, x, \cdot) + \beta_j \sum_{s', x'} q(s', x' \mid s, x)\, V_j(s', x').$$

Proof: The first result follows from the fact that the discounted stochastic game can be seen as a discounted team problem. Using Bellman optimality, we associate a dynamic programming principle with the common objective function $r$. Then we use the result of Blackwell (1962, [9]), which gives the existence of optimal stationary pure strategies:
$$\exists V \in \mathbb{R}^{\mathcal{S}\times\mathcal{X}},\quad V(s, x) = \max_{u_j \in \Delta(\mathcal{A}_j(s, x))}\Big\{r(s, x, u) + \beta_j \sum_{s', x'} q(s', x' \mid s, x, u)\, V(s', x')\Big\}.$$
We now prove the second result. Since the players have no influence on the transition probabilities, each player does its best in the visited states. For each state $(s, x)$, the auxiliary game $G(s, x)$ with payoffs $\bar r_j(s, x, \cdot)$ has a pure (Nash) equilibrium $a^*_{s,x}$, by Monderer & Shapley [8]. The collection of the pure equilibria $a^* = (a^*_{s,x})_{(s,x)\in\mathcal{S}\times\mathcal{X}}$ is a stationary pure equilibrium of the discounted stochastic game. This completes the proof.

For the remainder, we assume that the states of the other players are not known.
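The second part of Proposition 1 rests on the Monderer & Shapley result that a finite potential game has a pure equilibrium, and that any sequence of strict unilateral improvements terminates (each step raises the potential). The sketch below runs round-robin improvement steps on a hypothetical one-shot stage game, a channel-selection congestion game in which each player's payoff is minus the number of players sharing its channel; such congestion games are potential games (Rosenthal). It illustrates only a static auxiliary game $G(s,x)$, not the full stochastic game, and all names are illustrative.

```python
from itertools import product
from typing import Callable, List, Sequence

Payoff = Callable[[int, Sequence[int]], float]


def congestion_payoff(j: int, profile: Sequence[int]) -> float:
    """Hypothetical stage game: player j receives minus the load of its chosen channel."""
    return -float(sum(1 for a in profile if a == profile[j]))


def is_pure_nash(profile: Sequence[int], n_actions: int, payoff: Payoff) -> bool:
    """True if no player has a strictly profitable unilateral deviation."""
    for j in range(len(profile)):
        for b in range(n_actions):
            deviation = list(profile)
            deviation[j] = b
            if payoff(j, deviation) > payoff(j, profile):
                return False
    return True


def improvement_dynamics(start: Sequence[int], n_actions: int, payoff: Payoff,
                         max_rounds: int = 1000) -> List[int]:
    """Round-robin strict unilateral improvements. In a finite potential game every
    strict improvement raises the potential, so the loop ends at a pure equilibrium."""
    profile = list(start)
    for _ in range(max_rounds):
        improved = False
        for j in range(len(profile)):
            for b in range(n_actions):
                trial = profile.copy()
                trial[j] = b
                if payoff(j, trial) > payoff(j, profile):
                    profile, improved = trial, True
        if not improved:
            return profile
    raise RuntimeError("no pure equilibrium reached within max_rounds")


if __name__ == "__main__":
    eq = improvement_dynamics([0, 0, 0], n_actions=2, payoff=congestion_payoff)
    print(eq, is_pure_nash(eq, 2, congestion_payoff))   # [1, 0, 0] True
    all_eq = [p for p in product(range(2), repeat=3) if is_pure_nash(p, 2, congestion_payoff)]
    print(len(all_eq))                                   # 6
```

With three players and two channels, every profile that splits the players 2/1 is a pure equilibrium, while the two profiles with all players on one channel are not.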
Each player has its own private history $h_{j,t} = (s_0, x_{j,0}, O_0, a_{j,0}, \dots, s_t, x_{j,t}, O_t, a_{j,t})$, where $O_t$ is a random variable indicating whether $j$ is active or not, $O_t = \mathbb{1}_{\{j\in B^n_t\}}$. In particular, a player does not observe the past actions of the other players and does not know with whom it is currently interacting (the subset $B^n_t$ is unknown to the players). We assume that $\mathcal{A}_j(s, x) = \mathcal{A}_j(s, x_j)$, i.e., the available actions do not depend on $x_{-j} := (x_{j'})_{j'\neq j} \in \prod_{j'\neq j} \mathcal{X}_{j'}$.

III. THE CLASSIC MEAN FIELD APPROACH

In this section we present a summary of previous work on mean field models and its relation with stochastic games.

A. Mean Field Interaction model []

The main interests of [] are models of interacting objects in discrete time, with a finite number of states. The objects share local resources, which have a finite number of states. The objects are observable only through their state. In the limit, when the number of objects goes to infinity, the system can be approximated by a deterministic, usually non-linear, dynamical system called the mean field limit. The mean field limit is in discrete or continuous time, depending on how the model scales with the number $n$ of objects. If the expected number of transitions per object per time slot vanishes when $n$ grows, then the limit is in continuous time; otherwise, the limit is in discrete time. One of the main advantages of the mean field approach is that it can drastically reduce the number of parameters in the analysis of large systems by using an aggregate dynamical system description.

The first assumption is that $X^n(t) \in \mathcal{X}^n$ is a homogeneous Markov chain. The second is that the transition kernel $L^n$ of $X^n(t)$ is invariant under any permutation of the labeling of the objects. That is,
$$L^n(x_1, \dots, x_n, s;\ x'_1, \dots, x'_n, s') = L^n(x_{\sigma(1)}, \dots, x_{\sigma(n)}, s;\ x'_{\sigma(1)}, \dots, x'_{\sigma(n)}, s')$$
$$= \mathbb{P}\big(X^n_1(t+1) = x'_1, \dots, X^n_n(t+1) = x'_n,\ S(t+1) = s' \mid X^n_1(t) = x_1, \dots, X^n_n(t) = x_n,\ S(t) = s\big)$$
for any permutation $\sigma$ of the index set $\{1, 2, \dots, n\}$. Then $X^n(t)$ is called a mean field interaction model with $n$ objects. The occupancy measure is defined as
$$M^n_i(t) = \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{X^n_j(t) = i\}}.$$
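Because the occupancy measure forgets the labels of the objects, it is unchanged by any relabeling, which is why the permutation invariance of $L^n$ makes $(S(t), M^n(t))$ Markov. A minimal Python sketch, with hypothetical state labels, computes $M^n(t)$ from a configuration $X^n(t)$ using exact fractions so that the values visibly lie on the grid $\{0, 1/n, \dots, 1\}$.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Hashable, Sequence


def occupancy_measure(states: Sequence[Hashable],
                      state_space: Sequence[Hashable]) -> Dict[Hashable, Fraction]:
    """M^n_i = (1/n) * #{j : X_j = i}, kept as exact fractions on the grid {0, 1/n, ..., 1}."""
    n = len(states)
    if n == 0:
        raise ValueError("need at least one object")
    counts = Counter(states)
    return {i: Fraction(counts[i], n) for i in state_space}


if __name__ == "__main__":
    X = ["S", "I", "S", "R", "S"]           # hypothetical individual states of n = 5 objects
    m = occupancy_measure(X, ["S", "I", "R"])
    print(m)                                # S -> 3/5, I -> 1/5, R -> 1/5
    # Relabeling the objects (any permutation) leaves the measure unchanged.
    assert occupancy_measure(X[::-1], ["S", "I", "R"]) == m
```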
By the invariance assumption, $(S(t), M^n(t))$ is a homogeneous Markov chain. Its state space is $\mathcal{S} \times \Delta$, where $\Delta$ is the simplex $\{m : \sum_i m_i = 1,\ m_i \ge 0\ \forall i\}$. Since we are interested in asymptotic results when $n$ grows to infinity, we need to set the proper time scaling. Assume there exists a vanishing sequence $\delta^n$, called the intensity (or step-size), which can be understood as the expected number of objects that make a transition in one time slot. Also define the drift
$$f^n(m, s) = \mathbb{E}\big(M^n(t+1) - M^n(t) \mid M^n(t) = m,\ S(t) = s\big),$$
which is the expected change in the occupancy measure in one time slot. The main scaling assumptions in this model are:

H1: The resource does not scale with $n$, that is, there exists
$$\lim_{n\to\infty} L^n_{s;s'}(m) = L_{s;s'}(m), \quad \forall s, s' \in \mathcal{S},$$
where $L^n_{s;s'}(m) = \mathbb{P}(S(t+1) = s' \mid M^n(t) = m,\ S(t) = s)$. Moreover, the transition matrix $L(m)$ is ergodic.

H2: The intensity vanishes at rate $\delta^n$, that is, there exists $\delta^n$ such that $\lim_{n\to\infty} \delta^n = 0$ and
$$\lim_{n\to\infty} \frac{f^n(m, s)}{\delta^n} = f(m, s), \quad \forall s \in \mathcal{S}.$$

H3: The second moment of the number of object transitions per time slot is bounded, which is equivalent to
$$\lim_{n\to\infty} \frac{1}{\delta^n}\int_{m'} \mathbb{1}_{\{\|m'-m\|_2 \le 1\}}\,(m'_x - m_x)(m'_{x'} - m_{x'})\, L^n(dm'; m, s) = 0, \quad \forall s \in \mathcal{S},\ x, x' \in \mathcal{X},$$
where
$$L^n(m'; m, s) := \sum_{(x'_1, \dots, x'_n):\ \frac{1}{n}\sum_{j=1}^{n} \delta_{x'_j} = m'} L^n(x'; x, s).$$

H4: $L^n_{s;s'}(m)$ is a smooth function of $\delta^n$ and $m$.

H5: $f^n(m, s)$ is a smooth function of $\delta^n$ and $m$.

Later we will see that these assumptions are relaxed in our approach. The following theorem can be used to approximate $M^n(t)$ by the solution $m(\delta^n t)$ of an ODE with the same initial condition.

Theorem 1. If $M^n(0) \to m_0$ in probability [resp. in mean square] as $n \to \infty$, then
$$\sup_{0 \le \tau \le T} \|\bar M^n(\tau) - m(\tau)\| \to 0$$
in probability [resp. in mean square], where $\bar M^n(\tau) := M^n(t)$ for $\tau \in [t\delta^n, (t+1)\delta^n)$ and $m(\tau)$ satisfies
$$\dot m = f(m), \qquad m(0) = m_0,$$
with $f(m) = \sum_s w_s(m) f(m, s)$, where $w(m)$ is the invariant probability of the transition matrix $L(m)$.

B. Mean Field Asymptotics of Markov Decision Evolutionary Games and Teams model [2]

The main interests of [2] are large populations of players in which frequent interactions occur between small numbers of chosen individuals. Each interaction in which a player is involved can be described as one stage of a dynamic game. The state and actions of the players at each stage determine an immediate payoff for each player as well as the transition probabilities of a controlled Markov chain.
Each player wishes to maximize or minimize its expected payoff averaged over time. This model with a finite number of players is in general difficult to analyze because of the huge state space required to describe all the players. Taking the asymptotics as the number of players grows to infinity, the whole behavior of the population is replaced by a deterministic limit that represents the system's state, namely the fraction of the population in each individual state that uses a given action. For large $n$, under assumptions analogous to H1-H5, the mean field converges, under any stationary strategy, to a deterministic measure that satisfies a non-linear ordinary differential equation. The authors show that the mean field interaction is asymptotically equivalent to a Markov decision evolutionary game. The mean field asymptotic calculations for large $n$, for given choices of strategies, make it possible to compute the equilibrium of the game in the asymptotic regime.

Theorem 2. If $M^n(0) \to m_0$ in probability as $n \to \infty$, then, for any stationary strategy $u$ and any time $t$, $M^n(t)$ converges in distribution to the solution of
$$\dot m = f(u, m), \qquad m(0) = m_0,$$
where $u$ is formally defined in the next section.
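Theorems 1 and 2 can be checked numerically on a toy model. The sketch below is hypothetical throughout: a single resource state, two individual states (0 = healthy, 1 = infected), and a scalar control $u$ standing in for a stationary strategy (an extra per-slot recovery probability proportional to $u$). Per-object transition probabilities are proportional to $\delta^n$, so the expected one-slot change of the infected fraction is exactly $\delta^n f(u,m)$ with $f(u,m) = \lambda m(1-m) - (\mu+u)m$, in the spirit of H2. This logistic ODE has a closed-form solution, which the simulated $M^n(t)$ should track at time $\tau = t\delta^n$ for large $n$; its limit point $1 - (\mu+u)/\lambda$ shows how the control lowers the infected fraction.

```python
import math
import random
from typing import List


def drift(m: float, u: float, lam: float, mu: float) -> float:
    """Hypothetical controlled drift f(u, m) of the infected fraction m."""
    return lam * m * (1.0 - m) - (mu + u) * m


def ode_solution(tau: float, m0: float, u: float, lam: float, mu: float) -> float:
    """Closed form of dm/dtau = f(u, m): a logistic curve (assumes lam > mu + u)."""
    r = lam - mu - u
    K = r / lam
    return K / (1.0 + (K / m0 - 1.0) * math.exp(-r * tau))


def simulate(n: int, delta: float, steps: int, m0: float, u: float,
             lam: float, mu: float, seed: int = 0) -> List[float]:
    """n-object chain whose expected one-slot change of m is exactly delta * f(u, m)."""
    rng = random.Random(seed)
    infected = round(m0 * n)
    path = [infected / n]
    for _ in range(steps):
        m = infected / n
        p_inf = delta * lam * m       # per-object probability 0 -> 1
        p_rec = delta * (mu + u)      # per-object probability 1 -> 0
        new_inf = sum(rng.random() < p_inf for _ in range(n - infected))
        new_rec = sum(rng.random() < p_rec for _ in range(infected))
        infected += new_inf - new_rec
        path.append(infected / n)
    return path


if __name__ == "__main__":
    lam, mu, delta, steps = 2.0, 0.5, 0.05, 100
    for u in (0.0, 0.5):
        path = simulate(5000, delta, steps, 0.2, u, lam, mu)
        gap = max(abs(p - ode_solution(t * delta, 0.2, u, lam, mu)) for t, p in enumerate(path))
        print(f"u={u}: limit {1 - (mu + u) / lam:.3f}, max gap to ODE {gap:.4f}")
```

The gap printed here combines sampling noise of order $1/\sqrt{n}$ with a discretization error of order $\delta^n$, so both $n$ and $\delta^n$ matter for how closely the ODE describes the finite system.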
IV. CONTROLLED MEAN FIELD INTERACTION

In this section, we introduce a controlled mean field interaction model. The finite version of this model is a particular case of stochastic games with individual states. We restrict our attention to a particular class of behavioral strategies within which we are able to establish mean field convergence. This restriction is due to the fact that, when the number $n$ goes to infinity, the dimension of the set of stationary strategies goes to infinity as well. The mean field approach offers an alternative to the direct computation of the infinite system, which suffers from the curse of dimensionality. By letting the size of the system go to infinity, the discrete stochastic game problem is replaced by a limiting system of Hamilton-Jacobi-Bellman equations coupled with a mean field limit ODE, or by a coupled system of Bellman-Shapley optimality equations and discrete mean field evolution; these are deterministic, and the dimensionality of the original system has been transformed into the mass behavior of the system.

Time $t \in \mathbb{N}$ is discrete. The global state of the system at time $t$ is $(S(t), X^n(t)) = (S(t), X^n_1(t), \dots, X^n_n(t))$. Denote by $A^n(t) = (A^n_1(t), \dots, A^n_n(t))$ the action profile at time $t$. The system $(S(t), X^n(t))$ is Markovian once the action profiles $A^n(t)$ are drawn under Markovian strategies. We denote the set of Markovian strategies by $\mathcal{U}$. Define $M^n(t)$ to be the current population profile, i.e., $M^n_x(t) = \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{X^n_j(t) = x\}}$. At each time $t$, $M^n(t)$ is in the finite set $\{0, \frac{1}{n}, \frac{2}{n}, \dots, 1\}^{\mathcal{X}}$, and $M^n_x(t)$ is the fraction of players in individual state $x$. For a subset $\mathcal{X}' \subseteq \mathcal{X}$, define $M^n[u](t)(\mathcal{X}') := \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{X^n_j[u](t) \in \mathcal{X}'\}}$. Similarly, we associate the process $U^n_a(t) = \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}_{\{A^n_j(t) = a\}}$ with the fraction of actions.

Strategies and random set of interacting players: at time slot $t$, an ordered list $B^n_t$ of players in $\{1, 2, \dots, n\}$, without repetition, is selected randomly as follows. First we draw a random number of players $k_t$ such that $\mathbb{P}(|B^n_t| = k \mid M^n(t) = m) =: J^n_k(m)$, where the distribution $J^n_k(m)$ is given for any $n$ and $m \in \{0, \frac{1}{n}, \frac{2}{n}, \dots, 1\}^{\mathcal{X}}$.
Second, we set $B^n_t$ to an ordered list of $k_t$ players drawn uniformly at random among the $n(n-1)\cdots(n-k_t+1)$ possible ones. Each player $j \in B^n_t$ takes part in a one-shot interaction at time $t$, as follows. First, each selected player $j \in B^n_t$ chooses an action $a_{j,t} \in \mathcal{A}(s, x_j)$ with probability $u(a_j \mid s, x_j)$, where $(s, x_j)$ is the current player state. The stochastic array $u$ is the strategy profile of the population. Denote the current set of interacting players by $B^n_t = \{j_1, \dots, j_k\}$. Given the actions $a_{j_1}, \dots, a_{j_k}$ drawn by the $k$ players, we draw a new set of individual states $(x'_{j_1}, \dots, x'_{j_k})$ and resource state $s'$ with probability $L^n_{s;s'}(k, m, a)$, where $a$ is the vector of actions selected by the interacting players.

We assume that, for any given Markovian strategy, the transition kernel $L^n$ is invariant under any permutation of the indices of the players within the same type. This implies in particular that the players are only distinguishable through their individual state. Moreover, this means that the process $M^n(t)$ is also Markovian once the sequence of strategies is given. Denote by $w^n_{s,s'}(u, m)$ the marginal transition probability between resource states. Given any Markov strategy and any vector $m \in \Delta(\mathcal{X})$, the resource state generates an irreducible Markov decision process with limiting invariant measure $w_s(u, m)$. Then we can simplify the analysis by fixing the resource state $S(t) = s$ without loss of generality. The model is entirely determined by the probability distributions $J^n$, the transition kernels $L^n$ and the strategy profile $u$.

V. NOISY MEAN FIELD APPROACH

We provide a general convergence result of the mean field to a stochastic differential equation, and a martingale problem is formulated for the law of the process $M^n_t$. We establish a mean field convergence to non-deterministic differential equations, thus extending the previous works on mean field interaction [], [5], on mean field Markov decision teams [2], [2], and on mean field Markov games [2], [9].
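The two-step selection of $B^n_t$ and the subsequent action draws can be sketched in Python as follows. The interaction-size distribution `J` (taken independent of $m$ here), the action names and `example_policy` are hypothetical; `random.sample` is used because it returns distinct indices in selection order, so each ordered list of $k$ distinct players is equally likely, matching the uniform draw among the $n(n-1)\cdots(n-k+1)$ possibilities.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Policy = Callable[[int, int], Dict[str, float]]


def draw_interacting_players(n: int, J: Dict[int, float], rng: random.Random) -> List[int]:
    """Draw k with probability J[k], then an ordered list of k distinct players.

    random.sample returns its picks in selection order, so every ordered list
    of k distinct indices has probability 1 / (n (n-1) ... (n-k+1)).
    """
    sizes = list(J)
    k = rng.choices(sizes, weights=[J[size] for size in sizes])[0]
    if not 1 <= k <= n:
        raise ValueError(f"interaction size {k} is not in 1..{n}")
    return rng.sample(range(n), k)


def draw_actions(players: Sequence[int], s: int, x: Sequence[int], u: Policy,
                 rng: random.Random) -> List[Tuple[int, str]]:
    """Each selected player j draws a_j with probability u(a_j | s, x_j)."""
    chosen = []
    for j in players:
        dist = u(s, x[j])
        actions = list(dist)
        a = rng.choices(actions, weights=[dist[b] for b in actions])[0]
        chosen.append((j, a))
    return chosen


def example_policy(s: int, x_j: int) -> Dict[str, float]:
    """Hypothetical stationary strategy u(. | s, x_j) over made-up actions."""
    return {"scan": 0.3, "idle": 0.7} if x_j == 0 else {"spread": 1.0}


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0, 1, 0, 0, 1, 0]                 # individual states of n = 6 players
    J = {2: 0.7, 3: 0.3}                   # hypothetical J_k, independent of m here
    B = draw_interacting_players(len(x), J, rng)
    print(B, draw_actions(B, 0, x, example_policy, rng))
```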
We show that even if the expected number of players that make a transition in one time slot is not bounded, one can still have a mean field limit, albeit a stochastic one. This mean field limit is referred to as the noisy mean field. Before presenting the main theoretical results of this paper, we first introduce some preliminary notions. Let $\mathcal{F}^n_t = \sigma(X^n(t'), A^n(t'), t' \le t)$ be the filtration generated by the sequence of states and actions up to time t. The evolution of
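the system depends on the decisions of the interacting players. Given a history $h_t = (S(0), X^n(0), A^n(0), \dots, S(t) = s, X^n(t), A^n(t))$, $X^n(t+1)$ evolves according to the transition probability
$$L^n(x'; x, u, s) = P\big(X^n(t+1) = x' \mid h_t\big).$$
The term $L^n(x'; x, u, s)$ is the transition kernel on $\mathcal{X}^n$ under the strategy $u \in \mathcal{U}$. Let $x = (x_1, \dots, x_n)$ be such that $\frac{1}{n}\sum_{j=1}^{n} \delta_{x_j} = m$, and define
$$L^n(m'; m, u, s) = \sum_{(x'_1, \dots, x'_n):\ \frac{1}{n}\sum_{j=1}^{n} \delta_{x'_j} = m'} L^n(x'; x, u, s).$$
The system evolves according to the kernel
$$L^n(m'; m, u, s) := P\big(M^n(t+1) = m' \mid M^n(t) = m, U^n(t) = u, S(t) = s\big) = P\big(M^n(t+1) = m' \mid h_t\big),$$
where $h_t = (S(t'), X^n(t'), A^n(t'), t' \le t, S(t) = s, X^n(t) = x)$ with $\frac{1}{n}\sum_{j=1}^{n} \delta_{x_j} = m$. The term $L^n(m'; m, u, s)$ corresponds to the projected kernel of $L^n$.

Below we give sufficient conditions on the transition kernels to obtain weak convergence of the process $M^n_t$ under the strategy $U^n(t)$. Inspired by the chapter in [3] on multidimensional diffusion processes, we establish mean field convergence to non-deterministic differential equations, thus extending previous work on mean field interaction [], [2], [5], on mean field Markov decision teams [2], [], [2] and on mean field Markov games [2], [2], [9]. We first present the main assumptions of this paper.

A0: $w^n_s(u, m)$ is continuously differentiable in m and u.

A1: There exist $\delta_n \to 0$ and a continuous mapping $a: \mathbb{R}^d \times \mathcal{U} \times \mathcal{S} \to \mathbb{R}^{d \times d}$ such that
$$\lim_{n \to \infty} \sup_{u \in \mathcal{U}} \sup_{m} \Big\| \frac{a^n(m, u, s)}{\delta_n} - a(m, u, s) \Big\| = 0, \quad s \in \mathcal{S},$$
where
$$a^n_{x,x'}(m, u, s) = \int_{\|m' - m\| \le 1} (m'_x - m_x)(m'_{x'} - m_{x'})\, L^n(dm'; m, u, s), \quad (x, x', s) \in \mathcal{X}^2 \times \mathcal{S},$$
and the third moment is finite.

A2: There exists a continuous mapping $f: \mathbb{R}^d \times \mathcal{U} \times \mathcal{S} \to \mathbb{R}^d$ such that
$$\lim_{n \to \infty} \sup_{u \in \mathcal{U}} \sup_{m} \Big\| \frac{f^n(m, u, s)}{\delta_n} - f(m, u, s) \Big\| = 0, \quad s \in \mathcal{S},$$
where
$$f^n_x(m, u, s) = \int_{\|m' - m\| \le 1} (m'_x - m_x)\, L^n(dm'; m, u, s), \quad x \in \mathcal{X},\ s \in \mathcal{S}.$$

A3: For all $\epsilon > 0$,
$$\lim_{n \to \infty} \sup_{m \in \mathbb{R}^d} \sup_{u \in \mathcal{U}} \frac{1}{\delta_n} \int_{\|m' - m\| > \epsilon} L^n(dm'; m, u, s) = 0, \quad s \in \mathcal{S}.$$

A3': $$\sup_{n} \sup_{u \in \mathcal{U}} \sup_{m \in \mathbb{R}^d} \Big[ \frac{\|a^n(m, u, s)\|}{\delta_n} + \frac{\|f^n(m, u, s)\|}{\delta_n} \Big] < \infty, \quad s \in \mathcal{S}.$$

Given a test function φ defined on $\mathbb{R}^d \times \mathcal{U}$, let
$$\mathcal{L}\phi(m, u) = \sum_{x} \bar f_x(m, u)\, \frac{\partial \phi}{\partial m_x}(m, u) + \frac{1}{2} \sum_{x, x'} \tilde a_{x,x'}(m, u)\, \frac{\partial^2 \phi}{\partial m_x \partial m_{x'}}(m, u),$$
where $\bar f_x(m, u) = \sum_{s \in \mathcal{S}} w_s(m, u) f_x(m, u, s)$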
and $\tilde a_{x,x'}(m, u) = \sum_{s \in \mathcal{S}} w_s(m, u)\, a_{x,x'}(m, u, s)$.

Let $D_\phi(t) = \phi(m_t) - \int_0^t \mathcal{L}_z \phi(m_z, u_z)\, dz$. Then $D_\phi(t)$ is a martingale if, for all $t_1 \le t_2$,
$$\mathbb{E}\Big(\phi(m_{t_2}) - \phi(m_{t_1}) - \int_{t_1}^{t_2} \mathcal{L}_z \phi(m_z, u_z)\, dz \;\Big|\; m_t, u_t,\ t \le t_1\Big) = 0.$$
With the notion of martingale, the characterization of the mean field trajectory can be formulated as a martingale problem. One can ask whether the property that $D_\phi(\cdot)$ is a martingale for all test functions φ uniquely characterizes the mean field $m(\cdot)$, apart from specifying an initial condition $m(0)$. The martingale problem for $\mathcal{L}_z$ is: does there exist a probability measure π defined on $\mathbb{R}^d$ such that $\pi(m_0 = m) = 1$ and $D_\phi(\cdot)$ is a martingale for any test function φ? Is there at most one such π for each m?

We now define the martingale problem for the interpolated process obtained from $M^n_t$, denoted by $\tilde M^n(\cdot)$:
$$\pi^{n,m,u}\big(\tilde M^n(0) = m\big) = 1,$$
$$\pi^{n,m,u}\Big(\tilde M^n(t') = \tilde M^n(\delta_n t) + \frac{t' - \delta_n t}{\delta_n}\big(\tilde M^n(\delta_n(t+1)) - \tilde M^n(\delta_n t)\big),\ t' \in [\delta_n t, \delta_n(t+1)]\Big) = 1, \quad t \in \mathbb{N},$$
$$\pi^{n,m,u}\big(\tilde M^n(\delta_n(t+1)) \in E \mid \mathcal{F}^n_t\big) = \int_E L^n(dm'; \tilde M^n(\delta_n t), u, s), \quad t \in \mathbb{N},$$
where E is a measurable subset of $\mathbb{R}^d$.

Theorem 3. Assume A0-A3. Then, for any test function φ, the generator satisfies $\frac{1}{\delta_n}\mathcal{L}^n\phi(m, u, s) \to \mathcal{L}\phi(m, u)$ for any m, u. Moreover, suppose that the functions $\tilde a(\cdot, \cdot)$ and $\bar f(\cdot, \cdot)$ are such that, for each $(m, u) \in \mathbb{R}^d \times \mathcal{U}$, the martingale problem for $\tilde a$ and $\bar f$ has exactly one solution $\pi^{m,u}$ starting from m. Then $\pi^{n,m,u} \to \pi^{m,u}$ as $\delta_n \to 0$, uniformly in m, for any strategy $u \in \mathcal{U}$, where $\pi^{n,m,u}$ is the law of the interpolated process obtained from $M^n(t)$. In addition, if A3' holds, then the martingale problem has a unique solution.

The consequence of this result is threefold. First, it gives weak convergence of the mean field $M^n_t$ to a solution of the stochastic differential equation
$$dm_t = \bar f(m_t, u_t)\, dt + \sigma(m_t, u_t)\, dW_t,$$
where $W_t$ is a Brownian motion (also called a Wiener process) and $\sigma\sigma^t = \tilde a$. Second, this result generalizes the deterministic mean field limit conditions established in [] for $\mathcal{U}$ equal to a singleton. It also generalizes the deterministic controlled mean field dynamics obtained in [2]. Third, the conditions A0-A3, A3' are weaker than those given in []. Under the conditions in [], the noise term a vanishes as n goes to infinity.
Moreover, the continuity assumption on the drift limit is not needed: if $\bar f$ admits a unique integral curve and a is bounded and continuous, the result applies as well. This allows us to apply it in a wide range of networking scenarios with a discontinuous drift limit that has lower semicontinuity properties. Note that the uniformity in u may not be satisfied; in that case, a local mean field solution is derived. An example of such a discontinuity is provided in [2], [2]. Our result also extends the convergence theorem in [2], which was restricted to stationary strategies. Here $u_t$ is an admissible strategy at time t.

Using Itô's formula (see also [22]), the payoff evolution for a fixed horizon T is then given by
$$v(T, m) = g(m_T),$$
$$-\partial_t v(t, m) = r(m, u) + \sum_{x \in \mathcal{X}} \bar f_x(m, u)\, \frac{\partial}{\partial m_x} v(t, m) + \frac{1}{2} \sum_{(x, x') \in \mathcal{X}^2} \tilde a_{x,x'}(m, u)\, \frac{\partial^2}{\partial m_x \partial m_{x'}} v(t, m).$$
By combining the system, one gets the mean field optimality:

Theorem 4. The mean field optimality for horizon T is given by
$$v(T, m) = g(m_T),$$
$$-\partial_t v(t, m) = \sup_{u_t \in \mathcal{U}_t} \Big\{ r(m_t, u_t) + \sum_{x \in \mathcal{X}} \bar f_x(m_t, u_t)\, \frac{\partial}{\partial m_x} v(t, m_t) + \frac{1}{2} \sum_{(x, x') \in \mathcal{X}^2} \tilde a_{x,x'}(m_t, u_t)\, \frac{\partial^2}{\partial m_x \partial m_{x'}} v(t, m_t) \Big\},$$
$$dm_t = \bar f(m_t, u_t)\, dt + \sigma(m_t, u_t)\, dW_t, \quad t > 0, \qquad m_0 = m.$$

VI. CONVERGENCE TO DISCRETE TIME MEAN FIELD

A. Big step size

In this subsection, we analyze the case where the step size does not vanish when n goes to infinity. In that case, the deterministic mean field limit is in discrete time and is driven by the transition probabilities $L_t(u, m)$. Given an initial population profile $m_0$ and a terminal payoff, the sequence of population profiles $\{m_t\}_t$ is driven by the transition probabilities $\{L_{t,s',x',s,x}(u, m_t)\}_t$:
$$m_{t+1}(x) = \sum_{x' \in \mathcal{X}} m_t(x')\, L_{t,x',x}(u, m_t),$$
where $L_{t,x',x}(u, m) = \sum_k \sum_s w_s(u, m)\, L_{t,s,x',x}(u, m; k)\, J_k(m)$, and $L_{t,s,x',x}(u, m; k)$ is the limiting transition probability from x' to x when the resource state is s and the number of interacting players is k. Combining this with the Bellman-Shapley optimality criterion, one gets the following system in the finite horizon case:
$$v_{j,t}(s, x, m) = \max_{u_j} \Big\{ r(s, x, u, m_t) + \sum_{s', x'} L(s', x' \mid s, x, u, m_t)\, v_{j,t+1}(s', x') \Big\},$$
$$m_{t+1}(x) = \sum_{x' \in \mathcal{X}} m_t(x')\, L_{t,x',x}(u, m_t).$$
For the discounted case, one gets
$$v_{j,\beta}(s, x, m) = \max_{u_j} \Big\{ (1 - \beta_j)\, r(s, x, u, m_t) + \beta_j \sum_{s', x'} L(s', x' \mid s, x, u, m_t)\, v_{j,\beta}(s', x') \Big\},$$
$$m_{t+1}(x) = \sum_{x' \in \mathcal{X}} m_t(x')\, L_{t,x',x}(u, m_t),$$
and for the average case
$$h_j + v_j(s, x, m) = \max_{u_j} \Big\{ r(s, x, u, m) + \sum_{s', x'} L(s', x' \mid s, x, u, m)\, v_j(s', x') \Big\},$$
$$m_{t+1}(x) = \sum_{x' \in \mathcal{X}} m_t(x')\, L_{t,x',x}(u, m_t),$$
where $h_j$ is a constant.

Definition 2 (Anonymous sequential population game). To a game as defined in Section IV, we associate a macroscopic population game, defined as follows. Each member j of the population has state $X_j(t)$ and faces the population profile $m_t[u]$. The initial conditions of the game are $S(0) = s$, $X_j(0) = x$, $m[u](0) = m_0$. The population profile is the solution of the discrete-time evolution, and $(S(t), X_j(t))$ evolves as a jump process given by the marginal of L, which depends on $m_t$ and u.
Further, let $F_T(s, x, u', u, m)$ be the T-stage payoff of player j in this game, given that $X_j(0) = x$ and $m(0) = m$, i.e.
$$F_T(s, x, u', u, m) = \mathbb{E}\Big( \sum_{t=1}^{T} r(S(t), X_j(t), u', u, m_t) \;\Big|\; S(0) = s,\ X_j(0) = x,\ m(0) = m \Big),$$
where $m_t$ evolves according to the mean field recursion $m_{t+1}(x) = \sum_{x'} m_t(x') L_{t,x',x}(u, m_t)$. The discounted payoff $F_\beta$ and the limiting average payoff $F_\infty$ are defined in the same way.

Equivalence to the anonymous sequential game of [23]. Under the above assumptions, the discrete-time mean field stochastic game is equivalent to a large-population anonymous sequential game. The discounted case is an anonymous sequential population game in the sense of [23]: the instantaneous payoff of an individual with strategy v and state x, when the population profile (mass behavior) is $m_t$ driven by u, is $r(s, x, v, u, m_t)$. The transition between the states is given by $L(s', x' \mid s, x, v, u, m_t)$.
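To make the coupled forward-backward structure of the finite-horizon system concrete, the sketch below solves it for a hypothetical two-state, two-action example (not taken from the paper), with the resource state fixed as allowed in Section IV. It alternates a forward pass of the mean field recursion $m_{t+1}(x) = \sum_{x'} m_t(x') L_{t,x',x}(u, m_t)$ with a backward Bellman pass against the resulting profile sequence, and stops when the policy is a best response to the profile it generates. This best-response iteration is our choice of solver, not the paper's, and it is not guaranteed to converge in general.

```python
import numpy as np

# Hypothetical example: state 0 = "idle", state 1 = "active";
# action 0 = "stay", action 1 = "switch" (moves to the other state w.p. 0.9).
STAY = np.eye(2)
SWITCH = np.array([[0.1, 0.9], [0.9, 0.1]])


def L(a, m):
    """Individual transition matrix L[x, x'] under action a (here independent of m)."""
    return SWITCH if a == 1 else STAY


def r(x, a, m):
    """Being active pays 1 minus a congestion term; switching costs 0.1."""
    return float(x) - 0.1 * a - 0.5 * m[1] * x


def forward(m0, policy):
    """Mean field recursion m_{t+1}(x) = sum_{x'} m_t(x') L_{x',x}(u, m_t)."""
    ms = [np.asarray(m0, dtype=float)]
    for t in range(policy.shape[0]):
        m = ms[-1]
        P = np.array([L(policy[t, x], m)[x] for x in range(len(m))])
        ms.append(m @ P)
    return np.array(ms)


def backward(ms, g, n_actions=2):
    """Finite-horizon Bellman recursion against the fixed profile sequence ms."""
    T, nx = len(ms) - 1, ms.shape[1]
    v = np.zeros((T + 1, nx))
    v[T] = g
    policy = np.zeros((T, nx), dtype=int)
    for t in range(T - 1, -1, -1):
        q = np.array([[r(x, a, ms[t]) + L(a, ms[t])[x] @ v[t + 1]
                       for a in range(n_actions)] for x in range(nx)])
        policy[t] = q.argmax(axis=1)
        v[t] = q.max(axis=1)
    return v, policy


def best_response_iteration(m0, g, T, max_iter=50):
    """Alternate forward and backward passes until the policy stops changing."""
    policy = np.zeros((T, len(m0)), dtype=int)
    for _ in range(max_iter):
        ms = forward(m0, policy)
        v, new_policy = backward(ms, g)
        if np.array_equal(new_policy, policy):
            return ms, v, policy, True
        policy = new_policy
    return ms, v, policy, False


ms, v, policy, converged = best_response_iteration(m0=[0.8, 0.2], g=np.zeros(2), T=3)
```

In this example the iteration stops after two passes: idle players switch in the first two stages and nobody switches in the last one, which is a pure Markov policy consistent with the profile sequence it induces.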
If (u, m) satisfies the following system:
$$m_{t+1}(x') = \sum_{x} m_t(x)\, L_{x;x'}(u_t, m_t), \quad \forall t,$$
$$\forall x, a \text{ such that } m_x > 0:\quad u_t(a \mid s, x) > 0 \implies a \in \arg\max_{b} \Big\{ (1 - \beta)\, r(s, x, b, m_t) + \beta \sum_{s', x'} v_{t+1}(s', x', m_t)\, L(s', x' \mid s, x, b, u_t, m_t) \Big\},$$
then we say that (u, m) is a mean field equilibrium.

Proposition 2. Under the above assumptions, the finite horizon (resp. the discounted) anonymous sequential population game has at least one mean field equilibrium.

Proof: The proof of this result follows from Jovanovic & Rosenthal (1988).

1) Mean field coordination games: In this subsection, we provide sufficient conditions for mean field convergence in team problems and global optimization problems. Consider one team. The payoff of a player in the team is r(x, a) when the team action is $a \in \mathcal{A}$ and the state is $x \in \mathcal{X}$. The payoff function is the same for each member of the team. For a measure $m \in \Delta(\mathcal{X})$, we define the expected payoff of the team as $(r_a(m))_{a \in \mathcal{A}}$, where
$$r_a(m) = \sum_{x \in \mathcal{X}} r(x, a)\, m_x.$$
A history of length t is now a sequence $(A^n(t'), M^n(t'))_{t' \le t}$. Let σ be a behavioral strategy. The payoffs defined above become
$$F^n_T(m, \sigma) = \mathbb{E}_{m,\sigma}\Big[ \sum_{t=1}^{T} r_{a_t}(M^n_t) \Big],$$
$$F^n_\beta(m, \sigma) = (1 - \beta)\, \mathbb{E}_{m,\sigma}\Big[ \sum_{t=0}^{+\infty} \beta^t\, r_{a_t}(M^n_t) \Big],$$
$$F^n_\infty(m, \sigma) = \mathbb{E}_{m,\sigma}\Big[ \liminf_{T \to +\infty} \frac{1}{T} \sum_{t=1}^{T} r_{a_t}(M^n_t) \Big].$$
We keep the index n in the payoff functions because the strategy σ takes a sequence of occupancy measures (population profiles) $M^n(t)$ and gives $A^n(t)$; these random processes depend on n. Our aim is to prove the convergence of the finite coordination game to a mean field coordination game for (i) strategies, (ii) population profiles and (iii) payoffs.

Proposition 3. Assume that
(a) $\lim_{n \to \infty} \sup_{x:\, m(x) > 0} \| L^n(\cdot \mid x, u, m) - L(\cdot \mid x, u, m) \| = 0$;
(b) there exists c > 0 such that for all u, x, m, m', one gets $\| L(\cdot \mid x, u, m) - L(\cdot \mid x, u, m') \| \le c\, \| m - m' \|$;
(c) $M^n(0) \to m_0$.
Then the following holds.
Convergence of payoff functions: $\lim_{n \to \infty} F^n_\beta(M^n(0)) = F_\beta(m_0)$.
Characterization of limiting optimal strategies: there exists N such that, for all $n \ge N$, $BR^n_\beta(\theta, M^n(0)) \subseteq BR_\beta(\theta, m_0)$, where
$$v^n_\beta(\theta, m) = \sup_{a} \Big\{ (1 - \beta)\, r_{\theta,a}(m) + \beta \sum_{m'} L^n(m' \mid m, \theta, a)\, v^n_\beta(\theta, m') \Big\},$$
$$BR^n_\beta(\theta, m) = \Big\{ a : v^n_\beta(\theta, m) = (1 - \beta)\, r_{\theta,a}(m) + \beta \sum_{m'} L^n(m' \mid m, \theta, a)\, v^n_\beta(\theta, m') \Big\}$$
is the set of actions reaching the maximum, and
$$BR_\beta(\theta, m) = \Big\{ a : v_\beta(\theta, m) = (1 - \beta)\, r_{\theta,a}(m) + \beta\, v_\beta\big(\theta, \langle m, L(\cdot \mid m, \theta, \cdot) \rangle\big) \Big\}.$$
A mean field limit solves $m = \langle m, L(\cdot \mid u, m, \theta, \cdot) \rangle$ and $u \in BR(m)$.

Proof: In the appendix.

B. Vanishing step size

We define the rescaled processes with step size $\delta_n$:
$$\tilde X^n_j(\delta_n t) = X^n_j(t),\ j \in \mathbb{N}, \qquad \tilde M^n(\delta_n t) = M^n(t),\ t \in \mathbb{N},$$
$$\tilde M^n(\delta_n t + t') = M^n(t) + \frac{t'}{\delta_n}\big(M^n(t+1) - M^n(t)\big), \quad t' \in (0, \delta_n).$$
Define the drift as
$$f^n(u, m, s) = \mathbb{E}\Big( \tilde M^n(\delta_n(t+1)) - \tilde M^n(\delta_n t) \;\Big|\; \tilde M^n(\delta_n t) = m,\ \tilde A^n(\delta_n t) \sim u,\ S(\delta_n t) = s \Big).$$
We assume that (i) $\frac{1}{\delta_n} f^n(u, m, s) \to f(u, m, s)$, where f is locally Lipschitz, and (ii) the first and second moments of $|B^n_t|$ remain finite as $n \to +\infty$. Under these assumptions, if the initial process $\tilde M^n(0)$ converges in distribution to $m_0 \in \Delta(\bigcup_j \mathcal{X}_j)$, then the process $\tilde M^n$ converges in distribution to a deterministic measure (the mean field limit) m(t), which is the solution of the ordinary differential equation $\dot m = f(m)$.

Proposition 4 ([2]). Under the same assumptions, the process $\tilde M^n[u]$ (that is, $\tilde M^n$ under the strategy u) converges in distribution to a deterministic measure $m[u](t)$, which is the solution of the ordinary differential equation $\dot m = f(u, m)$.

Definition 3 (Individual optimization framework in a population game). To a game as defined in Section IV, we associate a macroscopic population game, defined as follows. Each member j of the population has state $X_j(t)$ and faces the population profile $m[u](t)$. The initial condition of the game is $S(0) = s$, $X_j(0) = x$, $m[u](0) = m_0$. The population profile is the solution of the mean field ODE, and $(S(t), X_j(t))$ evolves as a jump process given by the marginal of q, which depends on m(t), u and the strategy u' of player j. Further, let $F_T(s, x, u', u, m)$ be the T-stage payoff of player j in this game, given that $X_j(0) = x$ and $m(0) = m$, i.e.
$$F_T(s, x, u', u, m) = \mathbb{E}\Big( \int_0^T r(S(t), X_j(t), u', u, m(t))\, dt \;\Big|\; S(0) = s,\ X_j(0) = x,\ m(0) = m \Big),$$
where m is the solution of the ODE $\dot m(t) = f(u, m(t))$. The discounted payoff $F_\beta$ and the limiting average payoff $F_\infty$ are defined in the same way.
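The vanishing step-size limit can be checked numerically on a toy model. The sketch below assumes a hypothetical two-state infection model (not from the paper): in each slot one uniformly chosen node is updated, so that $\delta_n = 1/n$, $f^n(m)/\delta_n = \beta m(1 - m) - \gamma m$, and the limit ODE is $\dot m = \beta m(1 - m) - \gamma m$. It compares $\tilde M^n(T)$ with an Euler solution of the ODE; the gap is expected to shrink as n grows.

```python
import random


def simulate_rescaled(n, m0, beta, gamma, T, rng):
    """Return M^n at rescaled time T for the toy model (one node updated per slot).

    Slots have length delta_n = 1/n, so T time units correspond to n*T slots.
    Requires beta <= 1 so that beta * m is a probability.
    """
    infected = round(m0 * n)
    for _ in range(round(T * n)):
        if rng.random() < infected / n:       # the picked node is infected
            if rng.random() < gamma:          # ... and recovers
                infected -= 1
        else:                                 # the picked node is susceptible
            if rng.random() < beta * infected / n:
                infected += 1
    return infected / n


def mean_field_ode(m0, beta, gamma, T, dt=1e-3):
    """Explicit Euler scheme for dm/dt = beta*m*(1 - m) - gamma*m."""
    m = m0
    for _ in range(round(T / dt)):
        m += dt * (beta * m * (1 - m) - gamma * m)
    return m


limit = mean_field_ode(0.1, beta=0.9, gamma=0.3, T=3.0)
rng = random.Random(1)
results = {n: simulate_rescaled(n, 0.1, 0.9, 0.3, 3.0, rng) for n in (100, 1000, 20000)}
for n, value in results.items():
    print(f"n={n:6d}  M^n(T)={value:.4f}  mean field m(T)={limit:.4f}")
```

For these parameters the ODE is a logistic equation with rate $\beta - \gamma = 0.6$ and carrying capacity $1 - \gamma/\beta = 2/3$, so $m(3) \approx 0.344$ can also be checked in closed form.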
Equivalence to the differential population game of [2]. Under the above assumptions, the continuous-time mean field stochastic game is equivalent to a differential population game. The instantaneous payoff of an individual with strategy v and state x, when the population profile (mass behavior) is m(t) driven by u, is $r(s, x, v, u, m)$. The transition between the states is given by $L(s', x' \mid s, x, v, u, m)$.

1) Centralized mean field control: We now provide the feedback optimality principle for the global expected mean field payoff r(u, m). The T-stage mean field optimization problem is written as
$$\sup_{u} \int_0^T r(u(t), m(t))\, dt + g(m_T)$$
subject to $\dot m = f(u, m)$, $m(0) = m_0$.
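One way to compute such a feedback strategy numerically, sketched here as an assumption rather than as the method used in the paper, is backward dynamic programming on a grid: $v(t, m) = \max_u \{ r(u, m)\, dt + v(t + dt, m + f(u, m)\, dt) \}$ with $v(T, \cdot) = g$, which is a semi-Lagrangian discretization of the associated Hamilton-Jacobi-Bellman equation. The example uses a one-dimensional state and a hypothetical linear-quadratic problem ($r = -u^2/2$, $f = u$, $g(m) = -(m - 1)^2/2$, T = 1) whose exact solution is known: $v(0, 0) = -0.25$ and $u^*(0, 0) = 0.5$.

```python
import numpy as np


def solve_hjb(r, f, g, T, dt, m_grid, u_grid):
    """Backward DP for sup_u int_0^T r(u, m) dt + g(m_T) with dm/dt = f(u, m).

    Recursion: v(t, m) = max_u [ r(u, m) dt + v(t + dt, m + f(u, m) dt) ].
    Returns v(0, .) on m_grid and the feedback control u*(k dt, .) for each step k.
    """
    M = m_grid[:, None]            # shape (n_m, 1)
    U = u_grid[None, :]            # shape (1, n_u)
    v = g(m_grid)                  # terminal condition v(T, m) = g(m)
    controls = []
    for _ in range(round(T / dt)):
        m_next = M + f(U, M) * dt
        # np.interp evaluates v(t + dt, .) between grid points (clamped at the edges).
        q = r(U, M) * dt + np.interp(m_next, m_grid, v)
        controls.append(u_grid[q.argmax(axis=1)])
        v = q.max(axis=1)
    controls.reverse()             # controls[k] is the feedback at time k * dt
    return v, np.array(controls)


m_grid = np.linspace(-1.0, 2.0, 301)
u_grid = np.linspace(-2.0, 2.0, 201)
v0, controls = solve_hjb(
    r=lambda u, m: -0.5 * u**2 + 0.0 * m,
    f=lambda u, m: u + 0.0 * m,
    g=lambda m: -0.5 * (m - 1.0) ** 2,
    T=1.0, dt=0.01, m_grid=m_grid, u_grid=u_grid,
)
value_at_0 = np.interp(0.0, m_grid, v0)            # approximates v(0, 0) = -0.25
control_at_0 = np.interp(0.0, m_grid, controls[0])  # approximates u*(0, 0) = 0.5
```

The grid-based recursion scales poorly with the dimension of m, which is one reason the paper reduces the problem to the population profile rather than the full state of the n players.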
A strategy $u^*(t) = \phi(t, m)$ constitutes an optimal mean field solution if there exists a continuously differentiable function $v: [0, T] \times \mathbb{R}^d \to \mathbb{R}$ satisfying the following Hamilton-Jacobi-Bellman equation combined with the mean field ODE:
$$0 = \partial_t v(t, m) + \max_{u} \big\{ r(u, m) + \langle \nabla_m v(t, m), f(u, m) \rangle \big\},$$
$$0 = \partial_t v(t, m) + r(\phi(t, m), m) + \langle \nabla_m v(t, m), f(\phi, m) \rangle,$$
$$v(T, m) = g(m_T),$$
$$\dot m = f(u, m), \quad m(0) = m_0.$$
In many situations, the horizon of the game is not known (for example, the lifetime of a user in the system is unknown). One technique to handle this is to consider the infinite horizon game. For the discounted payoff
$$\sup_{u} \int_{t_0}^{+\infty} e^{-\beta(t - t_0)}\, r(u(t), m(t))\, dt$$
subject to $\dot m = f(u, m)$, where β > 0, the optimality equation writes:
$$0 = -\beta v(t, m) + \max_{u} \big\{ r(u, m) + \langle \nabla_m v(t, m), f(u, m) \rangle \big\} = -\beta v(t, m) + r(\phi(t, m), m) + \langle \nabla_m v(t, m), f(\phi, m) \rangle,$$
$$v(T, m) = g(m_T),$$
$$\dot m = f(u, m), \quad m(t_0).$$

VII. APPLICATION TO MALWARE PROPAGATION

In this section, we apply the mean field approach to controlled malware propagation in opportunistic networks. The malware propagation model is based on [], in which the impact of the control parameters is not examined and player types are not used. The types can represent, for example, different operating systems, different versions of an operating system, or patched and unpatched versions of the same operating system. As far as we know, most related work on malware spreading in large networks does not model the heterogeneity of the systems that form the network. This limits the results obtained, because different types of systems could lead, for example, to slower propagation rates.

In this example, we have mobile nodes that can be infected by a malicious code. There are two infected states: passive and active. Non-infected nodes are susceptible. The set of possible states of a node is therefore {P, A, S} (for passive, active and susceptible), and the set of possible types is $\{\theta_1, \theta_2\}$.
The state of the system at time t is $X^n(t) = (P^n_1(t), P^n_2(t), A^n_1(t), A^n_2(t), S^n(t))$, where $P^n(t) + A^n(t) + S^n(t) = n$, $\sum_j P^n_j(t) = P^n(t)$ and $\sum_j A^n_j(t) = A^n(t)$ for all t, and n is the total number of mobiles in the system. In this example, there is no resource. The occupancy measure is $M^n_\theta(t) = (P^n_\theta(t)/n, A^n_\theta(t)/n, S^n(t)/n) = (P_\theta(t), A_\theta(t), S(t))$. At every time step we want to control the proportion of infected nodes, $I^n(t) := (A^n(t) + P^n(t))/n$. There are two fundamental ways to get infected:
1) Caused by a system flaw (e.g., an exploit that could allow arbitrary code execution).
2) Caused by a human flaw (e.g., the user is deceived and executes a dangerous piece of code).
We can model this system as a controlled mean field interaction model. The interaction is simulated using the following rules:
1) A passive node may become susceptible (inoculation) with probability $\delta_P$.
2) A passive node of type θ may opportunistically encounter another passive node of type θ', and both become active. This occurs with probability proportional to the frequency of other passive nodes at time t. For type θ, the probability is $\lambda(P_{\theta'}(t) - \mathbb{1}_{\{\theta = \theta'\}}/n)$. Note that the passive node can decide to contact the other passive node or not, so there are two possible actions: $\{m, \bar m\}$ (for meet and not meet). Those events will be modeled
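as a Bernoulli random variable with success (meeting) probability $\delta_m$, which represents $u(m \mid P, \theta)$. Here we model the possibility of getting infected by a system flaw.
3) An active node may become susceptible (inoculation) with probability $\delta_A$.
4) An active node of type θ may cause a passive node to become active with probability $\beta \frac{P_\theta(t)}{h_\theta + P_\theta(t)}$ at time t. Here it is assumed that, at high concentrations of passive nodes, each active node infects some maximum number of passive ones per time step. This reflects finite total bandwidth. The parameter β has the interpretation of the maximum infection rate. The parameter $h_\theta$ is the passive node density at which the infection proceeds at half of its maximum rate. Here we model the possibility of getting infected by a system flaw.
5) A susceptible node may become active with probability $\delta_S$.
6) A susceptible node may become passive in two ways. First, $\delta_{Sm}$ is the probability of getting infected by a human flaw. In this case, the susceptible node can decide to be deceived or not, so there are two possible actions: $\{o, \bar o\}$. The stationary strategy in this case is modeled as a coin toss with probability $\delta_e$. Second, $\eta(P_\theta(t) + P_{\theta'}(t))$ models the probability of encountering a passive node. In this case, the passive node can decide to contact the susceptible node or not, and this is modeled analogously to the other two cases.

At every time step, one of the transitions is randomly selected and performed. The number of nodes that make a transition in one time slot is always 0, 1 or 2. In order to control the infected population, each transition has a payoff contribution, which is 0 if no infected node is inoculated, 1/n if a node is inoculated and -1/n if a node is infected. Table I lists the transition probabilities, the contribution to $M^n(t+1) - M^n(t)$, the set of actions and the contribution to the total payoff.

TABLE I
PROBABILITIES, EFFECTS, ACTIONS AND PAYOFFS
Case | Transition probability ($\theta, \theta' \in \{1, 2\}$) | $M^n_\theta(t+1) - M^n_\theta(t)$ | Actions | Payoff contribution
1 | $P_\theta(t)\,\delta_P$ | $(-1, 0, 1)/n$ | singleton set | $1/n$
2 | $P_\theta(t)\,\delta_m^2\lambda\,(P_{\theta'}(t) - \mathbb{1}_{\{\theta=\theta'\}}/n)$ | $(-2, 2, 0)/n$ | $\{m, \bar m\}$ | $0$
3 | $A_\theta(t)\,\delta_A$ | $(0, -1, 1)/n$ | singleton set | $1/n$
4 | $A_\theta(t)\,\beta \frac{P_\theta(t)}{h_\theta + P_\theta(t)}$ | $(-1, 1, 0)/n$ | singleton set | $0$
5 | $S(t)\,\delta_S$ | $(0, 1, -1)/n$ | singleton set | $-1/n$
6 | $S(t)(\delta_e\delta_{Sm} + \delta_m \eta P(t))$ | $(1, 0, -1)/n$ | $\{o, \bar o, m, \bar m\}$ | $-1/n$

The intensity, that is, the probability that one arbitrary object makes a transition in one time slot, is of the order of 1/n. The drift, that is, the expected change of $M^n$ in one time step given the current state of the system, is
$$f^n_\theta(m) = \mathbb{E}\big(M^n_\theta(t+1) - M^n_\theta(t) \mid M^n(t) = m\big) = \frac{1}{n}\begin{pmatrix} -p_\theta\delta_P - 2p_\theta\delta_m^2\lambda\,(p_\theta - \frac{1}{n}) - a_\theta\beta\frac{p_\theta}{h_\theta + p_\theta} + s\big(\delta_e\delta_{Sm} + \delta_m\eta(p_\theta + p_{\theta'})\big) \\ 2p_\theta\delta_m^2\lambda\,(p_\theta - \frac{1}{n}) - a_\theta\delta_A + a_\theta\beta\frac{p_\theta}{h_\theta + p_\theta} + s\delta_S \\ p_\theta\delta_P + a_\theta\delta_A - s\delta_S - s\big(\delta_e\delta_{Sm} + \delta_m\eta(p_\theta + p_{\theta'})\big) \end{pmatrix},$$
where $m = (p_\theta, p_{\theta'}, a_\theta, a_{\theta'}, s)$. Taking $\delta_n = 1/n$, the limit $f(m) = \lim_{n} f^n(m)/\delta_n$ is then
$$f(m) = \begin{pmatrix} -p_\theta\delta_P - 2\lambda p_\theta^2\delta_m^2 - a_\theta\beta\frac{p_\theta}{h_\theta + p_\theta} + s\big(\delta_e\delta_{Sm} + \delta_m\eta(p_\theta + p_{\theta'})\big) \\ -p_{\theta'}\delta_P - 2\lambda p_{\theta'}^2\delta_m^2 - a_{\theta'}\beta\frac{p_{\theta'}}{h_{\theta'} + p_{\theta'}} + s\big(\delta_e\delta_{Sm} + \delta_m\eta(p_\theta + p_{\theta'})\big) \\ 2\lambda p_\theta^2\delta_m^2 - a_\theta\delta_A + a_\theta\beta\frac{p_\theta}{h_\theta + p_\theta} + s\delta_S \\ 2\lambda p_{\theta'}^2\delta_m^2 - a_{\theta'}\delta_A + a_{\theta'}\beta\frac{p_{\theta'}}{h_{\theta'} + p_{\theta'}} + s\delta_S \\ (p_\theta + p_{\theta'})\delta_P + (a_\theta + a_{\theta'})\delta_A - 2s\delta_S - 2s\big(\delta_e\delta_{Sm} + \delta_m\eta(p_\theta + p_{\theta'})\big) \end{pmatrix}.$$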
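The five-component limit drift of the heterogeneous system translates directly into code. The following sketch implements it and integrates $\dot m = f(m)$ with an explicit Euler scheme; the parameter values and the function names are illustrative assumptions, not the values behind the paper's figures. Because the components of f(m) sum to zero, the total mass $p_\theta + p_{\theta'} + a_\theta + a_{\theta'} + s$ is conserved, which gives a simple sanity check on the implementation.

```python
import numpy as np


def heterogeneous_drift(m, beta, h, lam, dP, dA, dS, dSm, dm, de, eta):
    """Limit drift f(m) for m = (p1, p2, a1, a2, s) with two types.

    h = (h1, h2) holds the half-saturation densities of the two types.
    """
    p1, p2, a1, a2, s = m
    to_passive = s * (de * dSm + dm * eta * (p1 + p2))    # S -> P (per type)
    meet = [2 * lam * p**2 * dm**2 for p in (p1, p2)]       # P + P -> A + A
    act = [a * beta * p / (hh + p)                          # P -> A via active nodes
           for a, p, hh in ((a1, p1, h[0]), (a2, p2, h[1]))]
    return np.array([
        -p1 * dP - meet[0] - act[0] + to_passive,
        -p2 * dP - meet[1] - act[1] + to_passive,
        meet[0] - a1 * dA + act[0] + s * dS,
        meet[1] - a2 * dA + act[1] + s * dS,
        (p1 + p2) * dP + (a1 + a2) * dA - 2 * s * dS - 2 * to_passive,
    ])


def euler(m0, T, dt, **prm):
    """Explicit Euler path of dm/dt = f(m)."""
    m = np.asarray(m0, dtype=float)
    path = [m]
    for _ in range(round(T / dt)):
        m = m + dt * heterogeneous_drift(m, **prm)
        path.append(m)
    return np.array(path)


# Illustrative parameter values, chosen for this sketch only.
params = dict(beta=0.5, h=(0.5, 0.1), lam=0.5, dP=0.01, dA=0.05, dS=0.01,
              dSm=0.01, dm=1.0, de=1.0, eta=0.5)
path = euler([0.1, 0.1, 0.0, 0.0, 0.8], T=50.0, dt=0.01, **params)
```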
A. Homogeneous system
We briefly mention the homogeneous mean field. The drift is obtained by computing the expected change in one time slot:
$$f^n(m) = \mathbb{E}\big(M^n(t+1) - M^n(t) \mid M^n(t) = m\big) = \frac{1}{n}\begin{pmatrix} -p\delta_P - 2p\delta_m^2\lambda\,(p - \frac{1}{n}) - a\beta\frac{p}{h + p} + s(\delta_e\delta_{Sm} + \delta_m\eta p) \\ 2p\delta_m^2\lambda\,(p - \frac{1}{n}) - a\delta_A + a\beta\frac{p}{h + p} + s\delta_S \\ p\delta_P + a\delta_A - s\delta_S - s(\delta_e\delta_{Sm} + \delta_m\eta p) \end{pmatrix},$$
where m = (p, a, s). The limit is then
$$f(m) = \begin{pmatrix} -p\delta_P - 2p^2\delta_m^2\lambda - a\beta\frac{p}{h + p} + s(\delta_e\delta_{Sm} + \delta_m\eta p) \\ 2p^2\delta_m^2\lambda - a\delta_A + a\beta\frac{p}{h + p} + s\delta_S \\ p\delta_P + a\delta_A - s\delta_S - s(\delta_e\delta_{Sm} + \delta_m\eta p) \end{pmatrix}.$$
In all the simulations, we kept these parameters unchanged: β = 2, δ_A = 5 3 and δ_P = δ_S = δ_Sm = 4. The stability of the system depends on the parameter h (the passive node density at which the infection by active nodes proceeds at half of its maximum rate). Here we set h = 2 in order to obtain an unstable behaviour. Regarding the control parameters, $\delta_m = 1$, $\delta_e = 1$ means no control. We investigate the evolution of the system in the following scenarios: trajectory of one run of the simulation (Fig. 1), mean trajectory of multiple simulations (Fig. 2), mean field limit trajectory (Fig. 3), trajectory of the payoff function, controlled mean field limit (Fig. 4), trajectory of the heterogeneous malware propagation (Fig. 6), optimal control under the mean field limit (Fig. 7), and noisy mean field, i.e. a stochastic path (Fig. 8). These configurations are analyzed with and without control parameters. We observe that the time mean $\frac{1}{T}\int_0^T m(s)\, ds$ converges to the stationary point inside the limit cycle.

1) Uncontrolled behaviour: Figures 1, 2 and 3 show the simulation results, obtained using the well-known algorithm for exact simulation of a discrete-time Markov chain. The initial configuration is (0.2, 0, 0.8). An oscillating behaviour of the total reward can be seen.

Fig. 1. Left: Path of one simulation without control (Active vs. Passive; trajectory, starting point and time mean of the trajectory). Right: Total reward and its time mean.
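The exact simulation of the n-node chain can be sketched as follows: in each slot at most one row of Table I fires, with the probability listed in the table, so drawing one uniform number per slot simulates the chain exactly. This sketch assumes the homogeneous specialization of Table I (a single type, so the partner term is $p - 1/n$) and uses illustrative parameter values chosen so that the row probabilities always sum to less than 1; the names and values are ours, not the paper's.

```python
import random

# Rows 1-6 of Table I (homogeneous case): change of the counts (P, A, S)
# and payoff contribution, the latter to be divided by n.
MOVES = [(-1, 0, 1), (-2, 2, 0), (0, -1, 1), (-1, 1, 0), (0, 1, -1), (1, 0, -1)]
PAYOFF = [1, 0, 1, 0, -1, -1]


def transition_probs(P, A, S, n, prm):
    """Probability that each row of Table I is the transition performed in this slot."""
    p, a, s = P / n, A / n, S / n
    return [
        p * prm["dP"],
        p * prm["dm"] ** 2 * prm["lam"] * (p - 1 / n),
        a * prm["dA"],
        a * prm["beta"] * p / (prm["h"] + p),
        s * prm["dS"],
        s * (prm["de"] * prm["dSm"] + prm["dm"] * prm["eta"] * p),
    ]


def simulate_chain(n, m0, steps, prm, rng):
    """Exact simulation: at most one row of Table I is performed per slot."""
    P, A = round(m0[0] * n), round(m0[1] * n)
    S = n - P - A
    reward, path = 0.0, [(P / n, A / n, S / n)]
    for _ in range(steps):
        probs = transition_probs(P, A, S, n, prm)
        if sum(probs) > 1:
            raise ValueError("parameters give a total transition probability above 1")
        x = rng.random()
        for (dp, da, ds), pay, q in zip(MOVES, PAYOFF, probs):
            if x < q:                  # this row fires
                P, A, S = P + dp, A + da, S + ds
                reward += pay / n
                break
            x -= q                     # otherwise test the next row
        path.append((P / n, A / n, S / n))
    return path, reward


# Illustrative parameter values, chosen for this sketch only.
params = dict(beta=0.5, h=0.2, lam=0.5, dP=0.01, dA=0.05, dS=0.01,
              dSm=0.01, dm=1.0, de=1.0, eta=0.5)
path, total_reward = simulate_chain(1000, (0.2, 0.0, 0.8), 20000, params, random.Random(0))
```

Every row probability carries a factor equal to the fraction of nodes it removes from (and row 2 vanishes when fewer than two passive nodes remain), so the counts can never become negative.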
2) Controlled behaviour: To illustrate the control parameters, suppose we want to keep the proportion of infected nodes below 0.9 at all times. One simple way to achieve this is to reduce the contact tendency of passive nodes. We therefore set $\delta_m = 0.75$; $\delta_m$ is the most relevant control parameter. The results can be seen in Figures 4 and 5.
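The effect of the meeting probability $\delta_m$ on the homogeneous mean field limit can be explored by integrating $\dot m = f(m)$ for $\delta_m = 1$ (meetings not restricted) and $\delta_m = 0.75$. The sketch below does this with an explicit Euler scheme and also returns a Riemann approximation of the time mean $\frac{1}{T}\int_0^T m(s)\, ds$. The remaining parameter values are illustrative assumptions, so the numbers it prints are not those of Figs. 4 and 5.

```python
import numpy as np


def homogeneous_drift(m, beta, h, lam, dP, dA, dS, dSm, dm, de, eta):
    """Limit drift f(m) of the homogeneous system, m = (p, a, s)."""
    p, a, s = m
    meet = 2 * p**2 * dm**2 * lam               # P + P -> A + A
    act = a * beta * p / (h + p)                # P -> A, driven by active nodes
    to_passive = s * (de * dSm + dm * eta * p)  # S -> P
    return np.array([
        -p * dP - meet - act + to_passive,
        meet - a * dA + act + s * dS,
        p * dP + a * dA - s * dS - to_passive,
    ])


def run(m0, T, dt, **prm):
    """Euler path of dm/dt = f(m) and its time mean (1/T) * integral of m."""
    m = np.asarray(m0, dtype=float)
    path = [m]
    for _ in range(round(T / dt)):
        m = m + dt * homogeneous_drift(m, **prm)
        path.append(m)
    path = np.array(path)
    return path, path.mean(axis=0)    # Riemann approximation of the time mean


base = dict(beta=0.5, h=0.2, lam=0.5, dP=0.01, dA=0.05, dS=0.01,
            dSm=0.01, de=1.0, eta=0.5)
results = {}
for dm in (1.0, 0.75):
    path, time_mean = run((0.2, 0.0, 0.8), T=200.0, dt=0.01, dm=dm, **base)
    results[dm] = (path, time_mean)
    infected = path[:, 0] + path[:, 1]
    print(f"delta_m={dm}: max infected {infected.max():.3f}, time mean {np.round(time_mean, 3)}")
```

Lowering $\delta_m$ scales the pairwise activation term by $\delta_m^2$ and the encounter term of susceptible nodes by $\delta_m$, which is why it is an effective lever on the infected proportion.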
Fig. 2. Left: Mean of multiple simulation trajectories without control (Active vs. Passive; mean of trajectories, starting point and time mean of trajectories). Right: Total reward and its time mean.
Fig. 3. Left: Mean field with no control (Active vs. Passive; mean field, starting point and time mean of the trajectory). Right: Total reward and its time mean.

B. Heterogeneous system
We investigate numerically the behavior of the mean field limit for two types θ and θ'. Figure 6 shows that it is possible to stabilize the homogeneous system by using classes.

C. Optimal strategy for the homogeneous system
Since the payoff function is the same for all the players, i.e. $r_j(\cdot) = r(\cdot)$, the discounted stochastic game with common payoff can be transformed into a team problem. Moreover, the set of actions is the same for all the players. Figure 7 shows the optimal strategy obtained by solving, by backward induction, the finite-horizon system of Section VI-A. The existence of a dominant strategy can be observed in the plot.

D. Noisy mean field
To give some feeling for how the noisy mean field evolves with time, Figure 8 shows two different realizations, obtained with two different numbers of particles n. The noise term $\sigma_n$ is obtained from $\delta_n$ and the drift f, and its sup norm is bounded by $2D\lambda$; we then have
$$d\tilde m = f(\tilde m)\, dt + \sigma_n\, dW_t.$$
Note that the smooth version of the last equation is $\dot m^n = f^n(m^n)/\delta_n$. It is worth mentioning that, in this case, the simulation algorithm is only exact when the numerical time step vanishes. Figure 9 compares the classic mean field with the mean trajectory of the noisy mean field.

E. Cycling behavior in IEEE 802.11 CSMA-based Cognitive Networks
The backoff process in IEEE 802.11 is governed by a Markov decision process if the duration of the per-stage backoff is taken into account: every node in backoff state k attempts transmission with probability $u_k$ in every time slot; if it succeeds, k changes to 0; otherwise, k changes to $(k + 1) \bmod (K + 1)$, where K is the index of the maximum backoff state.

Fig. 4. Left: Mean field with controlled passive node contact tendency (Active vs. Passive, with a Susceptible axis; mean field, starting point and time mean of the trajectory). Right: Total reward and its time mean.
Fig. 5. Evolution of the limit point of the time mean (Susceptible, Active and Passive axes) for several values of the meeting probability $\delta_m$.

The standard Markov chain models, which have been widely used for IEEE 802.11, very often lead to excessive complications. Recently, a mean field approach has been proposed in []. The authors of [] have pointed out that the validity of the decoupling assumption (independence of the nodes in the asymptotic regime) should not be justified just by a simple fixed-point method: a deeper study of the stability of the ordinary differential equation is needed. This is because the existence and uniqueness of a rest point does not imply that the dynamics converge to this fixed point. In this section, we illustrate this statement via limit cycle behavior in a heterogeneous CSMA system under control parameters. To construct cycling behavior of the mean field limit, we follow the work of [2]. We show that the mean field limit depends on the control parameters and that the uncontrolled mean field limit system can be suboptimal, depending on the performance metric. Hence, the control parameters give new insights and help in understanding the behavior of the mean field limit.

Fig. 6. Active vs. Passive phase portraits. Left: Unstable mean field for $h_1 = .5$, $h_2 = .5$. Right: Stable mean field for $h_1 = .9$, $h_2 = \ldots$.
Fig. 7. Optimal strategy using backward induction for the homogeneous system.
Fig. 8. Paths of one simulation of the noisy mean field (Active vs. Passive) for two different numbers of particles n (left and right).
Fig. 9. Left: Classic mean field. Right: Mean of noisy trajectories (Active vs. Passive).

We consider a cognitive network with two classes of users: primary users and secondary users (the classes are based on the differentiation of the contention window or of the arbitration interframe space, as done in [2]). The primary users have high priority in the sense that they have reserved time slots, but they can also attempt a transmission in the other time slots (the common time slots). Both classes of users can access the channel during common slots. The backoff states of the secondary users are suspended during the reserved time slots. Denote by n the total size of the population of users and by $n_\theta$ the size of subpopulation θ. Hence, $\sum_\theta n_\theta = n$. The set of users in