Notes on Mean Field Games
(from P.-L. Lions' lectures at Collège de France)

Pierre Cardaliaguet

January 5, 202

CEREMADE, UMR CNRS 7534, Université de Paris - Dauphine, Place du Maréchal De Lattre De Tassigny, Paris CEDEX 16 - France. Lecture given at Tor Vergata, April-May 2010. The author wishes to thank the University for its hospitality and INdAM for the kind invitation. These notes are posted with the authorization of Pierre-Louis Lions.

Contents

1 Introduction
2 Nash equilibria in games with a large number of players
2.1 Symmetric functions of many variables
2.2 Limits of Nash equilibria in pure strategies
2.3 Limits of Nash equilibria in mixed strategies
2.4 A uniqueness result
2.5 Example: potential games
2.6 Comments
3 Analysis of second order MFEs
3.1 On the Fokker-Planck equation
3.2 Proof of the existence Theorem
3.3 Uniqueness
3.4 Application to games with finitely many players
3.5 Comments
4 Analysis of first order MFEs
4.1 Semi-concavity estimates
4.2 On the continuity equation
4.3 Proof of the existence Theorem
4.4 The vanishing viscosity limit
4.5 Comments
5 The space of probability measures
5.1 The Monge-Kantorovich distances
5.2 The Wasserstein space of probability measures on R^d
5.3 Polynomials on P(Q)
5.4 Hewitt and Savage Theorem
5.5 Comments
6 Hamilton-Jacobi equations in the space of probability measures
6.1 Derivative in the Wasserstein space
6.2 First order Hamilton-Jacobi equations
6.3 Comments
7 Heuristic derivation of the mean field equation
7.1 The differential game
7.2 Derivation of the equation in P_2
7.3 From the equation in P_2 to the mean field equation
8 Appendix
8.1 Nash equilibria in classical differential games
8.2 Desintegration of a measure
8.3 Ekeland's and Stegall's variational principles

1 Introduction

Mean field game theory is devoted to the analysis of differential games with a (very) large number of "small" players. By small player, we mean a player who has very little influence on the overall system. This theory has been recently developed by J.-M. Lasry and P.-L. Lions in a series of papers [42, 43, 44, 45] and presented through several lectures of P.-L. Lions at the Collège de France. Its name comes from the analogy with the mean field models in mathematical physics, which analyze the behavior of many identical particles (see for instance Sznitman's notes [52]). Related ideas have been developed independently, and at about the same time, by Huang-Caines-Malhamé [32, 33, 34, 35].

The aim of these notes is to present, in a simplified framework, some of the ideas developed in the above references. It is not our intention to give a full picture of this fast growing area: we will only select a few topics, but will try to provide an approach as self-contained as possible. We strongly advise the interested reader to go back to the original works by J.-M. Lasry and P.-L. Lions for further and sharper results. Let us also warn the reader that these notes only partially reflect the state of the art by 2008: for lack of time we did not cover the lectures of Pierre-Louis Lions at the Collège de France for the years 2009 and 2010.

The typical model for Mean Field Games (MFG) is the following system:

(i) −∂_t u − νΔu + H(x, m, Du) = 0 in R^d × (0, T)
(ii) ∂_t m − νΔm − div(D_p H(x, m, Du) m) = 0 in R^d × (0, T)      (1)
(iii) m(0) = m_0, u(x, T) = G(x, m(T))

In the above system, ν is a nonnegative parameter. The first equation has to be understood backward in time and the second one forward in time. There are two crucial structure conditions for this system: the first one is the convexity of H = H(x, m, p) with respect to the last variable. This condition implies that the first equation (a Hamilton-Jacobi equation) is associated with an optimal control problem; its solution is the value function of a typical small player. The second structure condition is that m_0 (and therefore m(t)) is (the density of) a probability measure.

The heuristic interpretation of this system is the following. An average agent controls the stochastic differential equation

dX_t = α_t dt + √(2ν) dB_t,

where (B_t) is a standard Brownian motion. He aims at minimizing the quantity

E [ ∫_0^T L(X_s, m(s), α_s) ds + G(X_T, m(T)) ],

where L is the Fenchel conjugate of H with respect to the p variable. Note that in this cost the evolution of the measure m(s) enters as a parameter. The value function of our average player is then given by (1-(i)). His optimal control is, at least heuristically, given in feedback form by α*(x, t) = −D_p H(x, m, Du). Now, if all agents argue in this way, their repartition will move with a velocity which is due, on the one hand, to the diffusion, and, on the other hand, to the drift term −D_p H(x, m, Du). This leads to the Kolmogorov equation (1-(ii)).

The mean field game theory developed so far has been focused on two main issues: first, investigate equations of the form (1) and give an interpretation (in economics for instance) of such systems; second, analyze differential games with a finite but large number of players and link their limiting behavior, as the number of players goes to infinity, with equation (1).

So far the first issue is well understood and well documented. The original works by Lasry and Lions give a certain number of conditions under which equation (1) has a solution, discuss its uniqueness and its stability (see also Huang-Caines-Malhamé [32, 33, 34, 35]). Several papers also study the numerical approximation of the solution of (1): see Achdou and Capuzzo Dolcetta [1], Achdou, Camilli and Capuzzo Dolcetta [2], Gomes, Mohr and Souza [26], Lachapelle, Salomon and Turinici [40]. The mean field games theory also seems particularly well adapted to model problems in economics: see Guéant [28], [29], Lachapelle [39], Lasry, Lions, Guéant [46], and the references therein. As for the second part of the program, the limiting behavior of differential games when the number of players tends to infinity has been understood only for ergodic differential games [45]. The general case remains largely open. The largest part of this paper is dedicated to the first issue, and we only consider the second one in an oversimplified framework.

These notes are organized as follows. We first study, as a toy example, classical (one-shot) games with a very large number of identical players: this allows us to present some typical phenomena for functions of many variables. We start the analysis of mean field games with the second order case (i.e., when ν = 1). If we assume (to fix the ideas) that F and G are regularizing, then existence of a solution of (1) is fairly easy. As a byproduct, we provide an interpretation of the mean field system for a large (but finite) number of players. Then we turn to the first order mean field equation (ν = 0): in this case existence of a solution is more involved and strongly relies on the regularizing properties of F and G. Then we summarize some typical results on the space of probability measures needed for the rest of the presentation. The end of the notes is devoted, on the one hand, to an approach to Hamilton-Jacobi equations in the Wasserstein space and, on the other hand, to a heuristic derivation of the mean field equation from a system of Nash equilibria for a large number of players.
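To make the heuristic derivation above a little more explicit, here is the formal dynamic programming computation linking the control problem to (1-(i)) and (1-(ii)); this is a sketch, assuming enough smoothness to apply Itô's formula, and is not part of the original text. The value function

u(x, t) = inf_α E [ ∫_t^T L(X_s, m(s), α_s) ds + G(X_T, m(T)) | X_t = x ]

formally satisfies

−∂_t u − νΔu + sup_α { −⟨α, Du⟩ − L(x, m(t), α) } = 0,   u(x, T) = G(x, m(T)),

and, when H and L are convex conjugates in the sense used in these notes (as in the quadratic case L(x, m, α) = (1/2)|α|² + F(x, m), H(x, m, p) = (1/2)|p|² − F(x, m) of Sections 3 and 4), the supremum equals H(x, m(t), Du), which is (1-(i)). The maximizing control is α*(x, t) = −D_p H(x, m(t), Du(x, t)), and writing the Kolmogorov (Fokker-Planck) equation for the law of the optimally controlled state with this drift gives exactly (1-(ii)).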
2 Nash equilibria in games with a large number of players

Before starting the analysis of differential games with a large number of players, it is not uninteresting to have a look at this question for classical (one-shot) games. The general framework is the following: let N be a (large) number of players. We assume that the players are symmetric. In particular, the set of strategies Q is the same for all players. We denote by F^N_i = F^N_i(x_1, ..., x_N) the payoff (= the cost) of player i ∈ {1, ..., N}. Our symmetry assumption means that

F^N_{σ(i)}(x_{σ(1)}, ..., x_{σ(N)}) = F^N_i(x_1, ..., x_N)

for all permutations σ of {1, ..., N}. We consider Nash equilibria for this game and want to analyse their behavior as N → +∞. For this we first describe the limit of maps of many variables. We proceed with the analysis of the limit of Nash equilibria in pure, and then in mixed, strategies. We finally discuss the uniqueness of the solution of the limit equation and present some examples.

2.1 Symmetric functions of many variables

Let Q be a compact metric space and u_N : Q^N → R be a symmetric function: u_N(x_1, ..., x_N) = u_N(x_{σ(1)}, ..., x_{σ(N)}) for any permutation σ of {1, ..., N}. Our aim is to define a limit for the u_N. For this let us introduce the set P(Q) of Borel probability measures on Q. This set is endowed with the topology of weak-* convergence: a sequence (m_N) of P(Q) converges to m ∈ P(Q) if

lim_N ∫_Q ϕ(x) dm_N(x) = ∫_Q ϕ(x) dm(x)   for all ϕ ∈ C^0(Q).

Let us recall that P(Q) is a compact metric space for this topology, which can be metrized by the distance (often called the Kantorovich-Rubinstein distance)

d_1(µ, ν) = sup { ∫_Q f d(µ − ν) : f : Q → R is 1-Lipschitz continuous }.

Other formulations for this distance will be given later (section 5). In order to show that the (u_N) have a limit, we assume the following:

1. (Uniform bound) there is some C > 0 with

‖u_N‖_{L^∞(Q^N)} ≤ C.      (2)

2. (Uniform continuity) there is a modulus of continuity ω independent of N such that

|u_N(X) − u_N(Y)| ≤ ω(d_1(m^N_X, m^N_Y))   for all X, Y ∈ Q^N and all N,      (3)

where m^N_X = (1/N) Σ_{i=1}^N δ_{x_i} and m^N_Y = (1/N) Σ_{i=1}^N δ_{y_i} if X = (x_1, ..., x_N) and Y = (y_1, ..., y_N).

Theorem 2.1 If the u_N are symmetric and satisfy (2) and (3), then there is a subsequence (u_{n_k}) of (u_N) and a continuous map U : P(Q) → R such that

lim_{k→+∞} sup_{X ∈ Q^{n_k}} | u_{n_k}(X) − U(m^{n_k}_X) | = 0.
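Before turning to the proof, the following Python sketch illustrates the two objects appearing in condition (3): the empirical measure m^N_X attached to a configuration X and the distance d_1 between two such measures. In dimension one d_1 coincides with the 1-Wasserstein distance computed by scipy; the symmetric function u_N used below is an arbitrary illustrative choice, not one taken from the text.

import numpy as np
from scipy.stats import wasserstein_distance

# Empirical measures m^N_X = (1/N) sum_i delta_{x_i} for configurations in Q = [0, 1].
rng = np.random.default_rng(0)
N = 200
X = rng.uniform(0.0, 1.0, N)
Y = np.clip(X + rng.normal(scale=0.01, size=N), 0.0, 1.0)   # a small perturbation of X

# In dimension one, d_1(m^N_X, m^N_Y) is the 1-Wasserstein distance between the samples.
d1 = wasserstein_distance(X, Y)

# An illustrative symmetric map u_N(X) = (1/N) sum_i phi(x_i): it depends on X only
# through m^N_X, and satisfies (3) with omega(t) = Lip(phi) * t.
phi = lambda x: np.cos(2.0 * np.pi * x)
u_N = lambda Z: float(np.mean(phi(Z)))

print("d_1(m^N_X, m^N_Y)    ~", d1)
print("|u_N(X) - u_N(Y)|    ~", abs(u_N(X) - u_N(Y)))
print("Lip(phi) * d_1 bound ~", 2.0 * np.pi * d1)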
Proof of Theorem 2.1: Without loss of generality we can assume that the modulus ω is concave. Let us define the sequence of maps U_N : P(Q) → R by

U_N(m) = inf_{X ∈ Q^N} { u_N(X) + ω(d_1(m^N_X, m)) }   for m ∈ P(Q).

Then, by condition (3), U_N(m^N_X) = u_N(X) for any X ∈ Q^N. Let us show that the U_N have ω as a modulus of continuity on P(Q): indeed, if m_1, m_2 ∈ P(Q) and if X ∈ Q^N is ɛ-optimal in the definition of U_N(m_2), then

U_N(m_1) ≤ u_N(X) + ω(d_1(m^N_X, m_1)) ≤ U_N(m_2) + ɛ + ω(d_1(m^N_X, m_2) + d_1(m_1, m_2)) − ω(d_1(m^N_X, m_2)) ≤ U_N(m_2) + ω(d_1(m_1, m_2)) + ɛ

because ω is concave. Hence the U_N are equicontinuous on the compact set P(Q) and uniformly bounded. We complete the proof thanks to Ascoli's Theorem.

Remark 2.2 Some uniform continuity condition is needed: for instance if Q is a compact subset of R^d and u_N(X) = max_i |x_i|, then u_N converges to U(m) = sup_{x ∈ spt(m)} |x|, which is not continuous. Of course the convergence is not uniform.

Remark 2.3 If Q is a compact subset of some finite dimensional space R^d, a typical condition which ensures (3) is the existence of a constant C > 0, independent of N, such that

sup_{i=1,...,N} ‖D_{x_i} u_N‖_∞ ≤ C/N   for all N.

2.2 Limits of Nash equilibria in pure strategies

Let Q be a compact metric space and P(Q) be the set of Borel probability measures on Q. We consider a one-shot game with a large number N of players. Our main assumption is that the payoffs F^N_1, ..., F^N_N of the players are symmetric. In particular, under suitable bounds and uniform continuity, we know from Theorem 2.1 that the F^N_i have a limit, which has the form F(x_i, m) (the dependence on x_i is here to keep track of the dependence on i of the functions F^N_i). So the payoffs of the players are very close to payoffs of the form

F(x_1, (1/(N−1)) Σ_{j≥2} δ_{x_j}), ..., F(x_N, (1/(N−1)) Σ_{j≤N−1} δ_{x_j}).

In order to keep the presentation as simple as possible, we suppose that the payoffs already have this form. That is, we suppose that there is a continuous map F : Q × P(Q) → R such that, for any i ∈ {1, ..., N},

F^N_i(x_1, ..., x_N) = F( x_i, (1/(N−1)) Σ_{j≠i} δ_{x_j} )   for all (x_1, ..., x_N) ∈ Q^N.

Let us recall that a Nash equilibrium for the game (F^N_1, ..., F^N_N) is an element (x̄^N_1, ..., x̄^N_N) ∈ Q^N such that

F^N_i(x̄^N_1, ..., x̄^N_{i−1}, y_i, x̄^N_{i+1}, ..., x̄^N_N) ≥ F^N_i(x̄^N_1, ..., x̄^N_N)   for all y_i ∈ Q.

We set X̄^N = (x̄^N_1, ..., x̄^N_N) and m̄^N = (1/N) Σ_{i=1}^N δ_{x̄^N_i}.
6 Theorem 2.4 Assume that, for any N, N = ( x N,..., xn N ) is a Nash equilibrium for the game F N,..., F N N. Then up to a subsequence, the sequence of measures ( mn ) converges to a measure m P(Q) such that F (y, m)d m(y) = inf F (y, m)dm(y). (4) Q m P(Q) Q Remark 2.5 The mean field equation (4) is equivalent to saying that the support of m is contained in the set of minima of F (y, m). Indeed, if Spt m arg min y Q F (y, m), then clearly m satisfies (4). Conversely, if (4) holds, then choosing m = δ x shows that Q F (y, m)d m(y) F (x, m) for any x Q. Therefore Q F (y, m)d m(y) min x Q F (x, m), which implies that m is supported in arg min y Q F (y, m). Remark 2.6 The result is not completely satisfying because it requires the existence of Nash equilibria in the N player game, which does not always hold. However there always exists Nash equilibria in mixed strategies, i.e., when the player are allowed to randomize their behavior by playing strategies in P(Q) instead of Q. We discuss this issue below. Proof : Without loss of generality we can assume that the sequence ( m N ) converges to some m. Let us check that m satisfies (4). For this we note that, by definition, the measure δ x N is a minimum of the problem i inf F (y, δ x N )dm(y). m P(Q) Q N j Since d N j i δ x N, m N 2 j N, and since F is continuous, the measure δ x N is also ɛ optimal for the problem i inf F (y, m N )dm(y) m P(Q) Q j i as soon as N is sufficiently large. By linearity, so is m N : F (y, m N )d m N (y) F (y, m N )dm(y) + ɛ. Letting N + gives the result. Q inf m P(Q) Q 2.3 Limit of Nash equilibria in mixted strategies We now assume that the players play the same game F N,..., F N N as before, but there are allowed to play in mixed strategies, i.e., they minimize over elements of P(Q) instead of minimizing over elements of Q (which are now viewed as pure strategies). If the players play the mixed strategies π,..., π N P(Q), then the outcome of Player i (still denoted, by abuse of notation, FN i ) is FN(π i,..., π N ) = FN(x i,..., x N )dπ (x )... dπ N (x N ), Q N 6
7 or, recalling the definition of F N,..., F N N, FN(π i,..., π N ) = x i, N Q N F j i δ xj dπ (x )... dπ N (x N ). The notion of Nash equilibria in mixted strategies can be defined as before: ( π,..., π N ) (P(Q)) N is a Nash equilibrium if, for any i {,..., N}, F i N( π,..., π N ) F N i (( π j ) j i, π i ) π i P(Q). Note that the above inequality is equivalent to F i N( π,..., π N ) F N i (( π j ) j i, x i ) x i Q. Nash Theorem states that Nash equilibria in mixted strategies do exist (see Theorem 8.3 below). In fact, because of the special struture of the game, there also exists symmetric Nash equilibria, i.e., equilibria of the form ( π,..., π), where π P(Q) (Theorem 8.4). Theorem 2.7 We assume that F is Lipschitz continuous. Let, for any N, ( π N,..., π N ) be a symmetric Nash equilibrium in mixed strategies for the game F N,..., F N N. Then, up to a subsequence, ( π N ) converges to a measure m satisfying (4). Remark 2.8 In particular the above Theorem proves the existence of a solution to the mean field equation (4). Proof: Let m be a limit, up to subsequences, of the ( π N ). Since the map x j F (y, N is Lip(F )/(N ) Lipschitz continuous, we have, by definition of the distance d, F (y, δ xj ) d π N (x j ) F (y, δ xj ) d m(x j ) Q N N j i Q N N j i j i (Lip(F ) + F )d ( π N, m) y Q. j i j i δ x j ) A direct application of the Hewitt and Savage Theorem (see Theorem 5.0 below) gives lim F (y, δ xj ) d m(x j ) = F (y, m), (6) N + Q N N j i j i where the convergence is uniform with respect to y Q thanks to the (Lipschitz) continuity of F. Since ( π,..., π N ) is a Nash equilibrium, inequality (5) implies that, for any ɛ > 0 and if we can choose N large enough, Q N F (y, N δ xj ) d m(x j )d m(x i ) F (y, j i Q N j i N (5) δ xj ) d m(x j )dm(x i ) + ɛ, j i for any m P(Q). Letting N + on both sides of the inequality gives, in view of (6), F (x i, m)d m(x i ) F (x i, m)dm(x i ) + ɛ m P(Q), Q Q j i which gives the result, since ɛ is arbitrary. 7
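As a concrete illustration of the mean field equation (4), whose solvability has just been established, the following Python sketch looks for an equilibrium measure on a finite strategy set by a damped best-response iteration. The cost F used here (a potential term plus a smoothed congestion term) and the damping parameter are illustrative choices, not taken from the text, and the iteration is only a heuristic with no general convergence guarantee.

import numpy as np

# Finite strategy set Q = {q_0, ..., q_{K-1}} (a grid of [0, 1]) and a cost
# F(x, m) = V(x) + (kernel * m)(x): a potential term plus a smoothed congestion term.
K = 101
Q = np.linspace(0.0, 1.0, K)
V = (Q - 0.3) ** 2
kernel = np.exp(-((Q[:, None] - Q[None, :]) ** 2) / 0.01)   # congestion kernel

def F(m):
    """Cost F(., m) on the grid, for a probability vector m."""
    return V + kernel @ m

# Damped best-response iteration: move a small amount of mass toward the
# (near-)minimizers of F(., m_k) at each step.
m = np.full(K, 1.0 / K)
tau = 0.05
for _ in range(2000):
    costs = F(m)
    best = np.isclose(costs, costs.min(), atol=1e-9).astype(float)
    m = (1.0 - tau) * m + tau * best / best.sum()

# Residual of (4): int F(y, m) dm(y) - min_y F(y, m) is nonnegative, and it is
# close to 0 exactly when m is (approximately) supported in argmin F(., m).
costs = F(m)
print("residual:", float(costs @ m - costs.min()))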
8 2.4 A uniqueness result One obtains the full convergence of the measure m N (or π N ) if there is a unique measure m satisfying the condition (4). This is the case under the following assumption: Proposition 2.9 Assume that F satisfies (F (y, m ) F (y, m 2 ))d(m m 2 )(y) > 0 m m 2. (7) Q Then there is at most one measure satisfying (4). Remark 2.0 Requiring at the same time the continuity of F and the above monotonicity condition seems rather restrictive for applications. Condition (7) is more easily fulfilled for mapping defined on strict subsets of P(Q). For instance, if Q is a compact subset of R d of positive measure and P ac (Q) is the set of absolutely continuous measures on Q (absolutely continuous with respect to the Lebesgue measure), then { G(m(y)) if m Pac (Q) F (y, m) = + otherwise satisfies (7) as soon as G : R R is continuous and increasing. If we assume that Q is the closure of some smooth open bounded subset Ω of R d, another example is given by { um (y) if m P F (y, m) = ac (Q) L 2 (Q) + otherwise where u m is the solution in H (Q) of { um = m in Ω u m = 0 on Ω Note that in this case the map y F (y, m) is continuous. Proof of Proposition 2.9: Q Let m, m 2 satisfying (4). Then F (y, m )d m (y) F (y, m )d m 2 (y) Q and F (y, m 2 )d m 2 (y) = Therefore Q Q Q F (y, m 2 )d m (y). (F (y, m ) F (y, m 2 ))d( m m 2 )(y) 0, which implies that m = m 2 thanks to assumption (7). 8
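For the first example of the remark above (F(y, m) = G(m(y)) on absolutely continuous measures, with G continuous and increasing), the monotonicity condition (7) can be checked directly; this short verification is added for the reader's convenience. Writing m_1, m_2 for the densities,

∫_Q (F(y, m_1) − F(y, m_2)) d(m_1 − m_2)(y) = ∫_Q (G(m_1(y)) − G(m_2(y))) (m_1(y) − m_2(y)) dy ≥ 0,

since G is increasing and so the integrand is nonnegative pointwise. The integral vanishes only if G(m_1) = G(m_2) almost everywhere; when "increasing" is understood in the strict sense this forces m_1 = m_2 a.e., which gives the strict inequality in (7) for m_1 ≠ m_2.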
2.5 Example: potential games

The heuristic idea is that, if F(x, m) can somehow be represented as the derivative of some mapping Φ(x, m) with respect to the m variable, and if the problem

inf_{m ∈ P(Q)} ∫_Q Φ(x, m) dx

has a minimum m̄, then

∫_Q Φ'(x, m̄)(m − m̄) ≥ 0   for all m ∈ P(Q).

So ∫_Q F(x, m̄) dm ≥ ∫_Q F(x, m̄) dm̄ for all m ∈ P(Q), which shows that m̄ is an equilibrium.

For instance let us assume that

F(x, m) = V(x)m(x) + G(m(x)) if m ∈ P_ac(Q),   F(x, m) = +∞ otherwise,

where V : Q → R is continuous and G : [0, +∞) → R is continuous, strictly increasing, with G(0) = 0 and G(s) ≥ cs for some c > 0. Then let

Φ(x, m) = V(x)m(x) + H(m(x))   if m is a.c.,

where H is a primitive of G with H(0) = 0. Note that H is strictly convex with H(s) ≥ ∫_0^s cτ dτ = (c/2)s². Hence the problem

inf_{m ∈ P_ac(Q)} ∫_Q [ V(x)m(x) + H(m(x)) ] dx

has a unique solution m̄ ∈ L²(Q). Then we have, for any m ∈ P_ac(Q),

∫_Q (V(x) + G(m̄(x))) m(x) dx ≥ ∫_Q (V(x) + G(m̄(x))) m̄(x) dx,

so that m̄ satisfies (a slightly modified version of) the mean field equation (4). In particular, we have V(x) + G(m̄(x)) = min_y [ V(y) + G(m̄(y)) ] for any x ∈ Spt(m̄). Let us set λ = min_y [ V(y) + G(m̄(y)) ]. Then

m̄(x) = G^{-1}( (λ − V(x))_+ ).

For instance, if we plug formally Q = R^d, V(x) = |x|²/2 and G(s) = log(s) into the above equality, we get m̄(x) = e^{−|x|²/2}/(2π)^{d/2}.

2.6 Comments

There is a huge literature on games with a continuum of players, starting from the seminal work by Aumann [7]. See for instance the paper by Carmona [4] for recent results and an overview. Here we only consider the simplest framework of games with identical players.
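Returning to the potential-game example of subsection 2.5, the formula m̄(x) = G^{-1}((λ − V(x))_+) can be checked numerically. The Python sketch below picks the illustrative choices G(s) = s and V(x) = x² on Q = [−1, 1] (which satisfy the assumptions with c = 1), finds λ by bisection so that m̄ has unit mass, and verifies that V + G(m̄) equals λ on the support of m̄ and is at least λ outside it; this is an illustration only, not part of the original text.

import numpy as np

# Illustrative data: Q = [-1, 1], V(x) = x^2, G(s) = s (so G^{-1} is the identity).
x = np.linspace(-1.0, 1.0, 2001)
V = x ** 2
Ginv = lambda y: y

def mass(lam):
    """Total mass of m_lam(x) = G^{-1}((lam - V(x))_+)."""
    return np.trapz(Ginv(np.maximum(lam - V, 0.0)), x)

# Bisection on lambda so that m_lam is a probability density.
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if mass(mid) > 1.0 else (mid, hi)
lam = 0.5 * (lo + hi)
m_bar = Ginv(np.maximum(lam - V, 0.0))

# On the support of m_bar the cost V(x) + G(m_bar(x)) should equal lambda,
# and it should be >= lambda outside the support.
cost = V + m_bar                      # G(m_bar) = m_bar since G(s) = s
supp = m_bar > 1e-12
print("lambda                        ~", lam)
print("max |cost - lambda| on support ~", float(np.abs(cost[supp] - lam).max()))
print("min (cost - lambda) off support~", float((cost[~supp] - lam).min()))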
3 Analysis of second order MFEs

Let us start our investigation of Mean Field Games with the case of second order equations:

(i) −∂_t u − Δu + (1/2)|Du|² = F(x, m) in R^d × (0, T)
(ii) ∂_t m − Δm − div(m Du) = 0 in R^d × (0, T)      (8)
(iii) m(0) = m_0, u(x, T) = G(x, m(T)) in R^d

Our aim is to prove the existence of classical solutions for this system and to give an interpretation in terms of games with finitely many players. For this our main assumption is that F and G are regularizing on the set of probability measures on R^d. To make our life simple, we assume that all measures considered in this section have a finite first order moment: let P_1 be the set of Borel probability measures m on R^d such that ∫_{R^d} |x| dm(x) < +∞. The set P_1 can be endowed with the following (Kantorovich-Rubinstein) distance:

d_1(µ, ν) = inf_{γ ∈ Π(µ,ν)} ∫_{R^{2d}} |x − y| dγ(x, y)      (9)

where Π(µ, ν) is the set of Borel probability measures γ on R^{2d} such that γ(A × R^d) = µ(A) and γ(R^d × A) = ν(A) for any Borel set A ⊂ R^d. This distance is directly related to the one introduced in the previous section (see section 5).

Here are our main assumptions on F, G and m_0: we suppose that there is some constant C_0 such that

1. (Bounds on F and G) F and G are uniformly bounded by C_0 over R^d × P_1.

2. (Lipschitz continuity of F and G)

|F(x_1, m_1) − F(x_2, m_2)| ≤ C_0 [ |x_1 − x_2| + d_1(m_1, m_2) ]   for all (x_1, m_1), (x_2, m_2) ∈ R^d × P_1,

and

|G(x_1, m_1) − G(x_2, m_2)| ≤ C_0 [ |x_1 − x_2| + d_1(m_1, m_2) ]   for all (x_1, m_1), (x_2, m_2) ∈ R^d × P_1.

3. The probability measure m_0 is absolutely continuous with respect to the Lebesgue measure, and has a Hölder continuous density (still denoted m_0) which satisfies ∫_{R^d} |x|² m_0(x) dx < +∞.

A pair (u, m) is a classical solution to (8) if u, m : R^d × [0, T] → R are continuous, of class C² in space and C¹ in time, and (u, m) satisfies (8) in the classical sense. The main result of this section is the following:

Theorem 3.1 Under the above assumptions, there is at least one classical solution to (8).

The proof is relatively easy and relies on basic estimates for the heat equation as well as some remarks on the Fokker-Planck equation (8-(ii)).

3.1 On the Fokker-Planck equation

Let b : R^d × [0, T] → R^d be a given vector field. Our aim is to analyse the Fokker-Planck equation

∂_t m − Δm − div(m b) = 0 in R^d × (0, T),   m(0) = m_0      (10)

as an evolution equation in the space of probability measures. We assume here that the vector field b : R^d × [0, T] → R^d is continuous, uniformly Lipschitz continuous in space, and bounded.
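The probabilistic point of view used in this subsection (and made precise in Lemma 3.3 below) can be illustrated by a small particle simulation: sample the initial condition from m_0, run the stochastic differential equation with drift b by the Euler-Maruyama scheme, and read off the law of X_t, which Lemma 3.3 identifies with a weak solution of the Fokker-Planck equation. In the Python sketch below the drift b and the initial density are illustrative choices satisfying the standing assumptions; nothing here is taken from the text.

import numpy as np

# Particle approximation of m(t) = law of X_t, where
#   dX_t = b(X_t, t) dt + sqrt(2) dB_t,   X_0 ~ m_0   (one space dimension).
rng = np.random.default_rng(1)

def b(x, t):
    return -np.tanh(x)          # bounded, Lipschitz-in-space drift (illustrative)

n_particles, T, n_steps = 50_000, 1.0, 200
dt = T / n_steps
X = rng.normal(loc=1.0, scale=0.5, size=n_particles)   # X_0 ~ m_0 = N(1, 0.25)

for k in range(n_steps):
    t = k * dt
    X = X + b(X, t) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_particles)

# The empirical measure (1/n) sum_i delta_{X_T^i} approximates m(T); a histogram
# approximates its density, and the moments appearing in the definition of P_1
# (and in Lemma 3.5) can be estimated by averages over the particles.
hist, edges = np.histogram(X, bins=80, density=True)
print("first-order moment  E|X_T|   ~", float(np.mean(np.abs(X))))
print("second-order moment E|X_T|^2 ~", float(np.mean(X ** 2)))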
11 Definition 3.2 (Weak solution to (0)) We say that m is a weak solution to (0) if m L ([0, T ], P ) is such that, for any test function ϕ C c (R d [0, T )), we have R d φ(x, 0)dm 0 (x) + T 0 R d ( t ϕ(x, t) + ϕ(x, t) + Dϕ(x, t), b(x, t) ) dm(t)(x) = 0. In order to analyse some particular solutions of (0), it is convenient to introduce the following stochastic differential equation (SDE) { dt = b( t, t)dt + 2dB t, t [0, T ] 0 = Z 0 () where (B t ) is a standard d dimensional Brownian motion over some probability space (Ω, A, P) and where the initial condition Z 0 L (Ω) is random and independent of (B t ). Under the above assumptions on b, there is a unique solution to (). This solution is closely related to equation (0): Lemma 3.3 If L(Z 0 ) = m 0, then m(t) := L( t ) a weak solution of (0). Proof : This is a straightforward consequence of Itô s formula, which says that, if ϕ : R d [0, T ] R is bounded, of class C 2 in space and C in time, then ϕ( t, t) = ϕ(z 0, 0) + t 0 [ϕ t ( s, s) + Dϕ( s, s), b( s, s) + ϕ( s, s)] ds + Taking the expectation on both sides of the equality, we have, since [ t ] E Dϕ( s, s), db s = 0 0 t 0 Dϕ( s, s), db s. because t t 0 Dϕ( s, s), db s is a martingale, [ t ] E [ϕ( t, t)] = E ϕ(z 0, 0) + [ϕ t ( s, s) + Dϕ( s, s), b( s, s) + ϕ( s, s)] ds 0 So by definition of m(t), we get ϕ(x, t)dm(t)(x) = ϕ(x, 0)dm 0 (x) R d R d t + [ϕ t (x, s) + Dϕ(x, s), b(x, s) + ϕ(x, s)] dm(s)(x) ds, 0 R d i.e., m is a weak solution to (0). This above interpretation of the continuity equation allows to get very easily some estimates on the map t m(t) in P 2. Lemma 3.4 Let m be defined as above. There is a constant c 0 = c 0 (T ), such that d (m(t), m(s)) c 0 ( + b ) t s 2 s, t [0, T ]..
12 Proof : Recalling the definition of d we note that the law γ of the pair ( t, s ) belongs to Π(m(t), m(s)), so that d (m(t), m(s)) x y dγ(x, y) = E [ t s ]. R 2d Therefore, if for instance s < t, [ t E [ t s ] E b( τ, τ) dτ + ] 2 B t B s s b (t s) + 2(t s) Moreover we also obtain some integral estimates: Lemma 3.5 There is a constant c 0 = c 0 (T ) such that x 2 dm(t)(x) c 0 ( x 2 dm 0 (x) + + b 2 ) t [0, T ]. R d R d Proof : Indeed: x 2 dm(t)(x) = E [ [ t 2] 2E 0 2 t ] 2 + b( τ, τ)dτ + 2 B t 2 R d 0 [ ] 2 x 2 dm 0 (x) + t 2 b 2 + 2t R d 3.2 Proof of the existence Theorem Before starting the proof of Theorem 3., let us recall some basic existence/uniqueness result for the heat equation { wt w + a(x, t), Dw + b(x, t)w = f(x, t) in R d [0, T ] w(x, 0) = w 0 (x) in R d (2) For this it will be convenient to denote by C s+α (for an integer s 0 and α (0, )) the set of maps z : R d [0, T ] R such that the derivatives k t D l xz exist for any pair (k, l) with 2k + l s and such that these derivatives are bounded and α Hölder continuous in space and (α/2) Hölder continuous in time. If we assume that, for some α (0, ), a : R d [0, T ] R, b, f : R d [0, T ] R and w 0 : R d R belong to C α, then the above heat equation is has a unique weak solution. Furthermore this solution belongs to C 2+α (Theorem 5. p. 320 of [4]). We will also need the following interior estimate (Theorem. p. 2 of [4]): if a = b = 0 and f is continuous and bounded, any classical, bounded solution w of (2) satisfies, for any compact set K R d (0, T ), D x w(x, t) D x w(y, s) sup (x,t),(y,s) K x y β + t s β/2 C(K, w ) f, (3) where β (0, ) depends only on the dimension d while C(K, w ) depends on the compact set K, on w and on d. 2
13 Let C be a large constant to be chosen below and C be the set of maps µ C 0 ([0, T ], P ) such that d (µ(s), µ(t)) sup C s t t s (4) 2 and sup x 2 dm(t)(x) C. t [0,T ] R d Then C is a convex closed subset of C 0 ([0, T ], P ). It is actually compact, because the set of probability measures m for which R d x 2 dm(x) C is finite, is compact in P (see Lemma 5.7). To any µ C we associate m = Ψ(µ) C in the following way: Let u be the unique solution to { t u u + 2 Du 2 = F (x, µ(t)) in R d (0, T ) u(x, T ) = G(x, µ(t )) in R d (5) Then we define m = Ψ(µ) as the solution of the Fokker-Planck equation { t m m div (m Du) = 0 in R d (0, T ) m(0) = m 0 in R d (6) Let us check that Ψ is well-defined and continuous. To see that a solution to (5) exists and is unique, we use the Hopf-Cole transform: setting w = e u/2 we easily check that u is a solution of (5) if and only if w is a solution of the linear (backward) equation { t w w = wf (x, µ(t)) in R d (0, T ) G(x,µ(T ))/2 w(x, T ) = e in R d Note that the maps (x, t) F (x, m(t)) and x e G(x,µ(T ))/2 belong to C /2, because µ satisfies (4) and from our assumptions on F and G. Therefore the above equation is uniquely solvable and the solution belongs to C 2+α with α = 2, which in turn implies the unique solvability of (5) with a solution u which belongs to C 2+α. Recall that the maps x F (x, µ(t)) and x G(x, µ(t )) are bounded by C 0, so that a straightforward application of the comparison principle implies that u is bounded by ( + T )C 0. In the same way, since moreover the maps x F (x, µ(t)) and x G(x, µ(t )) are C 0 Lipschitz continuous (again by our assumptions on F and G), u is also C 0 Lipschitz continous. Hence Du is bounded by C 0. Next we turn to the Fokker-Planck equation (6), that we write into the form t m m Dm, Du(x, t) m u(x, t) = 0. Since u C 2+α, the maps (x, t) Du(x, t) and (x, t) u(x, t) belong to C α, so that this equation is uniquely solvable and the solution m belongs to C 2+α. Moreover, in view of the discussion of subsection 3., we have the following estimates on m: and d (m(t), m(s)) c 0 ( + C 0 ) t s 2 s, t [0, T ] R d x 2 dm(t)(x) c 0 ( + C 2 0) t [0, T ], where c 0 depends only on T. So if we choose C = max{c 0 ( + C 0 ), c 0 ( + C0 2 )}, m belongs to C. 3
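For the reader's convenience, here is the computation behind the Hopf-Cole transform used above (a formal verification, assuming u is smooth, added here and not part of the original text). With w = e^{−u/2} one has

∂_t w = −(1/2) w ∂_t u,   Dw = −(1/2) w Du,   Δw = −(1/2) w Δu + (1/4) w |Du|²,

so that

∂_t w + Δw = −(1/2) w ( ∂_t u + Δu − (1/2)|Du|² ) = (1/2) w F(x, µ(t)),

where the last equality uses the backward Hamilton-Jacobi equation satisfied by u, namely −∂_t u − Δu + (1/2)|Du|² = F(x, µ(t)). Together with the terminal condition w(x, T) = e^{−G(x, µ(T))/2}, this is the linear backward equation used in the proof; conversely, u = −2 log w recovers u from any positive solution w.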
14 We have just proved that the mapping Ψ : µ m = Ψ(µ) is well-defined. Let us check that it is continuous. Let µ n C converge to some µ. Let (u n, m n ) and (u, m) be the corresponding solutions. Note that (x, t) F (x, µ n (t)) and x G(x, µ n (T )) locally uniformly converge to (x, t) F (x, µ(t)) and x G(x, µ(t )). Then one gets the local uniform convergence of (u n ) to u by standard arguments (of viscosity solutions for instance). Since the (D x u n ) are uniformly bounded, the (u n ) solve an equation of the form t u n u n = f n where f n = 2 D xu n 2 F (x, m n ) is uniformly bounded in x and n. Then the interior regularity result (3) implies that (D x u n ) is locally uniformly Hölder continuous and therefore locally uniformly converges to D x u. This easily implies that any converging subsequence of the relatively compact sequence m n is a weak solution of (6). But m is the unique weak solution of (6), which proves that (m n ) converges to m. We conclude by Schauder fixed point Theorem that the continuous map µ m = Ψ(µ) has a fixed point in C. Then this fixed point is a solution of our system (8). 3.3 Uniqueness Let us assume that, besides assumptions given at the beginning of the section, the following conditions hold: R d (F (x, m ) F (x, m 2 ))d(m m 2 )(x) > 0 m, m 2 P, m m 2 (7) and R d (G(x, m ) G(x, m 2 ))d(m m 2 )(x) 0 m, m 2 P. (8) Theorem 3.6 Under the above conditions, there is a unique classical solution to the mean field equation (8). Proof : Before starting the proof, let us notice that we can use as a test function for m any map w which is of class C 2 : indeed, the result follows easily by regularizing and truncating w by cut-off functions of the form φ ɛ (t)ψ R (x) where if t T ɛ if x R φ ɛ (t) = + (T ɛ t)/ɛ if T ɛ t T and ψ R (x) = R + x if R x R + 0 if t T 0 if x R + Let now consider (u, m ) and (u 2, m 2 ) two classical solutions of (8). We set ū = u u 2 and m = m m 2. Then while t ū ū + 2 ( Dx u 2 D x u 2 2) (F (x, m ) F (x, m 2 )) = 0 (9) t m m div (m D x u m 2 D x u 2 ) = 0. Let us use ū as a test function in the second equation. Since ū is C 2 we have (recall the remark at the begining of the proof) T T ( mū)(t ) + m 0 ū(0) + ( t ū + ū) d m D x ū, m D x u m 2 D x u 2 = 0 R d R d 0 R d 0 R d 4
15 Let us multiply equality (9) by m, integrate over R d (0, T ) and add to the previous equality. We get, after simplification and using that m(0) = 0, m(t ) (G(m (T )) G(m 2 (T ))) R d T ( m ( + Dx u 2 D x u 2 2) m (F (x, m ) F (x, m 2 )) D x ū, m D x u m 2 D x u 2 ) = 0. 0 R d 2 Let us recall that and also note that so that m 2 R d m(t ) (G(m (T )) G(m 2 (T ))) 0, ( Dx u 2 D x u 2 2) D x ū, m D x u m 2 D x u 2 = m 2 D xu D x u 2 2, T 0 R d m (F (x, m ) F (x, m 2 )) 0. In view of our assumptions, this implies that m = 0 and, therefore, that ū = 0 since u and u 2 solve the same equation. 3.4 Application to games with finitely many players Before starting the discussion of games with a large number of players, let us fix a solution (u, m) of the mean field equation (8) and investigate the optimal strategy of a generic player who considers the density m of the other players as given. He faces the following minimization problem inf α [ T J (α) where J (α) = E 0 ] 2 α s 2 + F ( s, m(s)) ds + G ( T, m(t )) In the above formula, t = 0 + t 0 α sds + 2B s, 0 is a fixed random intial condition with law m 0 and the control α is adapted to some filtration (F t ). We assume that (B t ) is an d dimensional Brownian motion adapted to (F t ) and that 0 and (B t ) are independent. We claim that the feedback strategy ᾱ(x, t) := D x u(x, t) is optimal for this optimal stochastic control problem. Lemma 3.7 Let ( t ) be the solution of the stochastic differential equation { d t = ᾱ( t, t)dt + 2dB t 0 = 0. and α(t) = ᾱ( t, t). Then inf α J (α) = J ( α) = R N u(x, 0) dm 0 (x). Proof : This kind of result is known as a verification Theorem: one has a good candidate for an optimal control, and one checks, using the equation satisfied by the value function u, that 5
16 this is indeed the minimum. Let α be an adapted control. We have, from Itô s formula, E[G( T, m(t ))] = E[u( T, T )] [ T ] = E u( 0, 0) + ( t u( s, s) + α s, D x u( s, s) + u( s, s)) ds [ 0 T = E u( 0, 0) + ( ] 0 2 D xu( s, s) 2 + α s, D x u( s, s) F ( s, m(s))) ds [ T E u( 0, 0) + ( ] 2 α s 2 F ( s, m(s))) ds 0 This shows that E [u( 0, 0)] J (α) for any adapted control α. If we replace α by α in the above computations, then, since the process ( t ) becomes ( t ), the above inequalities are all equalities. So E [u( 0, 0)] = J (ᾱ) and the result is proved. We now consider a differential game with N players which consists in a kind of discrete version of the mean field game. In this game player i (i =,..., N) is controlling through his control α i a dynamics of the form d i t = α i tdt + 2dB i t (20) where (Bt) i is a d dimensional brownian motion. The initial condition 0 i for this system is also random and has for law m 0. We assume that the all 0 i and all the brownian motions (Bt) i (i =,..., N) are independent. However player i can choose his control α i adapted to the filtration (F t = σ( j 0, Bj s, s t, j =,..., N}). His payoff is then given by J N i (α,..., α N ) T = E 2 αi s 2 + F s, i N 0 j i δ j s ds + G T i, N δ j T j i Our aim is to explain that the strategy given by the mean field game is suitable for this problem. More precisely, let (u, m) be one classical solution to (8) and let us set ᾱ(x, t) = D x u(x, t). With the closed loop strategy ᾱ one can associate the open-loop control α i obtained by solving the SDE d i t = ᾱ( i t, t)dt + 2dB i t (2) with random initial condition i 0 and setting αi t = ᾱ( i t, t). Note that this control is just adapted to the filtration (F i t = σ( i 0, Bi s, s t}), and not to the full filtration (F t ) defined above. Theorem 3.8 For any ɛ > 0, there is some N 0 such that, if N N 0, then the symmetric strategy ( α,..., α N ) is an ɛ Nash equilibrium in the game J N,..., J N N : Namely J N i ( α,..., α N ) Ji N (( α j ) j i, α) + ɛ for any control α adapted to the filtration (F t ) and any i {,..., N}. Remark 3.9 This result is very close to one-shot games and its proof is mainly based on the stability property of the mean field equation. In some sense it is rather cheap : what one would really like to understand is in what extend Nash equilibria for differential games in feedback strategies give rise to a mean field equation. 6
17 Proof : Fix ɛ > 0. Since the problem is symmetrical, it is enough to show that J N ( α,..., α N ) J N (( α j ) j, α) + ɛ (22) for any control α, as soon as N is large enough. Let us denote by j t the solution of the stochastic differential equation (2) with initial condition j 0. We note that the ( j t ) are independent and identically distributed with law m(t) (the law comes from Lemma 3.3). Therefore, using (as in subsection 2.3) the Hewitt and Savage Theorem 5.0 (or, more precisely, its Corollary 5.3), there is some N 0 such that, if N N 0, E sup y / ɛ F y, δ N j s F (y, m(t)) ɛ (23) for any t [0, T ] and E sup y / ɛ j 2 G y, N δ j T j 2 G(y, m(t )) ɛ. (24) For the first inequality, one can indeed choose N 0 independent of t because, F being C 0 Lipschitz continuous with respect to m, we have E sup y ɛ F y, δ N j F y, δ t N j s j 2 j 2 E C 0 d δ N j, δ t N j s j 2 j 2 [ E j t N ] s j c 0 ( + ᾱ )(t s) /2, j 2 where the last inequality easily comes from computations similar to that for Lemma 3.4. Let now α be a control adapted to the filtration (F t ) and t be the solution to d t = α t dt + 2dB t with random initial condition 0. Let us set K = 2(T F + G ) + E[ T 0 that, if E[ T 0 2 α s 2 ds] K, then (22) holds. Let us now assume that E[ T 0 [ [ E ] sup t 2 2E t [0,T ] α s 2 ds] K. We first estimate E[sup t [0,T ] t 2 ]: T 0 2 ᾱ s 2 ds]. Note ] 2 α s 2 ds + 2 sup B t 2 2E[ 0 2 ] + 2K + 4T t [0,T ] where the last estimates comes from the Burkholder-Davis-Gundy inequality (see [37]). Denoting by K the right-hand side of the above inequality we obtain therefore that [ ] P sup t / ɛ t [0,T ] K ɛ. 7
18 Let us now fix N N 0 and estimate J N(( αj ) j, α) by separating the expectation for the F and G terms according to the fact that sup t [0,T ] t / ɛ or not. Taking into account (23) and (24) we have [ T J N (( α j ) j 2, α) E 0 2 α s 2 + F ( s, i m(t) ) + G ( T i, m(t ) ) ] ( + T )ɛ [ ] 2P sup t / ɛ t [0,T ] J N(( αj ) j ) Cɛ (T F + G ) for some constant C independent of N and α, where the last inequality comes from the optimality of ᾱ in Lemma Comments Existence of solutions for second order mean field equations hold under more general assumptions. For instance [43, 44] considers equations of the form (i) t u u + H(x, ( Du) = F (x, m) in Q (0, T ) (ii) t m m div m H ) (x, Du) = 0 in Q (0, T ) p (iii) m(0) = m 0, u(x, T ) = G(x, m(t )) in Q where Q = [0, ] d (with periodic boundary conditions), H : R d R d is Lipschitz continuous with respect to x uniformly from bounded p, convex and of class C with respect to p. The conditions on F and G are: either F and G are regularizing (i.e., satisfy conditions as in Theorem 3.), or F (x, m) = f(x, m(x)), G(x, m) = g(x, m(x)), where f = f(x, λ) and g = g(x, λ) satisfy suitable growth conditions with respect to to λ and H is sufficiently strictly convex. It is conjectured in [43, 44] that symmetric, closed loop Nash equilibria of differential games with N players with dynamics given by (20) and payoff T E L(s, i αs) i + +F s, i δ t 0 N j s ds + G T i, δ N j, T j i where L is the Fenchel conjugate of H with respect to the p variable, converge to the solution of the mean field game. This program is carried out in [45] for ergodic differential games. Finally, although all the results of this part come from [43, 44], the application to a finite number of players given in subsection 3.4 can be found in [32]. 4 Analysis of first order MFEs In this section we investigate the Mean Field Equation (i) t u(x, t) + 2 Du(x, t) 2 = F (x, m(t)) in R d (0, T ) (ii) t m(x, t) div ((Du(x, t)m(x, t)) = 0 in R d (0, T ) (25) (iii) m(0) = m 0, u(x, T ) = G(x, m(t )) in R d 8 j i
Let us briefly recall the heuristic interpretation of this system: the map u is the value function of a typical agent who controls his velocity α(t) and has to minimize his cost

∫_0^T ( (1/2)|α(t)|² + F(x(t), m(t)) ) dt + G(x(T), m(T)),

where x(t) = x_0 + ∫_0^t α(s) ds. His only knowledge of the overall world is the distribution of the other agents, represented by the density m(t) of some probability measure. His feedback strategy, i.e., the way he ideally controls his velocity at each time and at each point, is then given by α(x, t) = −Du(x, t). Now if all agents argue in this way, the density m(x, t) of their distribution m(t) over the space will evolve in time according to the conservation law (25-(ii)).

Here again we work in the space P_1 of Borel probability measures on R^d with finite first order moment and we endow P_1 with the Kantorovich-Rubinstein distance d_1 defined by (9) (see also section 5). We assume that F : R^d × P_1 → R and G : R^d × P_1 → R are continuous and regularizing. More precisely we suppose that:

1. F and G are continuous over R^d × P_1.

2. There is a constant C such that

‖F(·, m)‖_{C²} ≤ C,   ‖G(·, m)‖_{C²} ≤ C   for all m ∈ P_1,      (26)

where C² is the space of functions with continuous second order derivatives, endowed with the norm

‖f‖_{C²} = sup_{x ∈ R^d} [ |f(x)| + |D_x f(x)| + |D²_{xx} f(x)| ].

3. Finally we suppose that m_0 is absolutely continuous, with a density (still denoted m_0) which is bounded and has a compact support.

By a solution of (25) we mean a pair (u, m) ∈ W^{1,∞}_{loc}(R^d × [0, T]) × L^∞(R^d × (0, T)) such that (i) is satisfied in the viscosity sense while (ii) is satisfied in the sense of distributions. References on viscosity solutions can be found, for instance, in the monographs [8, 9, 24], while, for equations of conservation law, see, for example, [9]. The main result of this section is the following existence result:

Theorem 4.1 Under the above assumptions, there is at least one solution to (25).

Remark 4.2 Uniqueness holds under the same conditions as for Theorem 3.6. In fact the proof is exactly the same, since we can now use the Lipschitz continuous map ū as a test function, because the density m̄ is bounded and has bounded support.

Remark 4.3 The method of proof also shows that the solutions of (25) are stable with respect to F, G and m_0.

We give two proofs of Theorem 4.1: the first one is direct but rather technical; it requires fine uniqueness properties of the continuity equation (25-(ii)). The second one uses the existence result established in the previous section for viscous mean field equations. In both proofs, semi-concavity estimates for the solutions of Hamilton-Jacobi equations of the form (25-(i)) play a key role.
20 4. Semi-concavity estimates The aim of this section is to investigate some properties of the local Hamilton-Jacobi equation { t u + 2 Du x 2 = f(x, t) in R d (0, T ) u(x, T ) = g(x) in R d (27) The most fundamental regularity property of the solution of this equation being semi-concavity, let us recall some basic facts on this notion. Proofs and references can for instance, be found in the monograph [2]. Definition 4.4 A map w : R d R is semi-concave if there is some C > 0 such that one of the following equivalent conditions is satisfied. the map x w(x) C 2 x 2 is concave in R d, 2. w(λx + ( λ)y) λw(x) + ( λ)w(y) Cλ( λ) x y 2 for any x, y R d, λ [0, ], 3. D 2 w C I d in the sense of distributions, 4. p q, x y C x y 2 for any x, y R d, t [0, T ], p D x + w(x) and q D x + w(y), where D x + w denotes the super-differential of w with respect to the x variable, namely { } D x + w(x) = p R d w(y) w(x) p, y x ; lim sup 0 y x y x Lemma 4.5 Let w : R d R be semi-concave. Then w is locally Lipschitz continuous in R d. Moreover D x + w(x) is the closed convex hull of the set Dxw(x) of reachable gradients defined by { } Dxw(x) = p R d, x n x such that D x w(x n ) exists and converges to p. In particular, D + x w(x) is compact, convex and non empty subset of R d for any x R d. Finally w is differentiable at x if and only if D + x w(x) is a singleton. Lemma 4.6 Let (w n ) be a sequence of uniformly semi-concave maps on R d which point-wisely converge to a map w : R d R. Then the convergence is locally uniform and w is semi-concave. Moreover, for any x n x and any p n D + x w n (x n ), the set of cluster points of (p n ) is contained in D + x w(x). Finally, Dw n (x) converge to Dw(x) for a.e. x R d. Let us now turn to the analysis of equation (27). Lemma 4.7 For any C > 0 there is a constant C = C (C) such that, if f : R d [0, T ] R and h : R d R are continuous and such that f(, t) C 2 C, t [0, T ], g C 2 C, (28) then equation (27) has a unique bounded uniformly continuous viscosity solution which is given by the representation formula: u(x, t) = inf α L 2 ([t,t ],R d ) T t 2 α(s) 2 + f(x(s), s)ds + g(x(t )), (29) where x(s) = x + s t α(τ)dτ. Moreover u is Lipschitz continuous and satisfies D x,t u C, D 2 xxu C I d where the last inequality holds in the sense of distributions. 20
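Before turning to the proof, the representation formula (29) can be explored numerically. The Python sketch below approximates u(x, t) by minimizing the discretized action over piecewise-constant controls; the data f and g are illustrative choices satisfying (28), and scipy's quasi-Newton minimizer (run from a few starting points) stands in for the exact infimum, so the output is only an approximation from above of the true value.

import numpy as np
from scipy.optimize import minimize

# Illustrative data satisfying (28): f(x, t) = cos(x), g(x) = sin(x), d = 1.
f = lambda x, t: np.cos(x)
g = lambda x: np.sin(x)
T = 1.0

def u_approx(x0, t, n_steps=50):
    """Approximate u(x0, t) from (29) with piecewise-constant controls."""
    dt = (T - t) / n_steps
    times = t + dt * np.arange(n_steps)

    def action(alpha):
        # Trajectory x(s) driven by the piecewise-constant control alpha.
        xs = x0 + dt * np.concatenate(([0.0], np.cumsum(alpha)[:-1]))
        running = np.sum(0.5 * alpha ** 2 + f(xs, times)) * dt
        return running + g(x0 + dt * alpha.sum())

    best = min(
        (minimize(action, a0 * np.ones(n_steps), method="BFGS")
         for a0 in (-1.0, 0.0, 1.0)),      # a few starts to guard against local minima
        key=lambda res: res.fun,
    )
    return best.fun

# The value should lie below the cost of the zero control alpha = 0, which equals
# int_t^T f(x0, s) ds + g(x0).
x0, t = 0.5, 0.0
print("u(x0, t)          ~", u_approx(x0, t))
print("zero-control cost :", (T - t) * np.cos(x0) + np.sin(x0))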
21 Proof : Let us recall that the solution u to (27) has a unique bounded uniformly continuous viscosity solution. Moreover, writing down the dynamic programming principle satisfied by the map T v(x, t) = inf α L 2 ([t,t ],R d ) t 2 α(s) 2 + f(x(s), s)ds + g(x(t )), one can check that v is a bounded, uniformly continuous viscosity solution to (27). So u = v. Next we check that u is Lipschitz continuous with respect to x. Indeed let (x, x 2, t) R d R d [0, T ] and α L 2 ([t, T ], R d ) be ɛ optimal for u(x, t). Then if we set x(s) = x + s t α(τ)dτ, we have: u(x 2, t) T t 2 α(s) 2 + f(x(s) + x 2 x, s)ds + g(x(t ) + x 2 x ) T t 2 α(s) 2 + f(x(s), s)ds + g(x(t )) + C(T + ) x 2 x u(x, t) + ɛ + C(T + ) x 2 x Hence u is Lipschitz continuous with respect to x with a Lipschitz constant C(T + ). We now prove that u is Lipschitz continuous with respect to the time variable. From the dynamic programming principle we have, if α is optimal for u(x, t) and x( ) is its associated trajectory, s u(x, t) = 2 α(τ) 2 + f(x(τ), τ) dτ + u(x(s), s), t for any t < s T. We prove in Lemma 4.8 below that α is bounded by a constant C = C (C) independent of (x, t). Hence u(x, t) u(x, s) u(x, t) u(x(s), s) + C x(s) x s t 2 α(τ) 2 + f(x(τ), τ) dτ + C (s t) α (s t) [ ] 2 α 2 + f + C α So u(x, ) is Lipschitz continuous with a Lipschitz constant independent of x. It remains to show that u is semi-concave with respect to the x variable. Let (x, y, t) R d R d [0, T ], λ (0, ), x λ = λx + ( λ)y. Let also α L 2 ([t, T ], R d ) be ɛ optimal for u(x λ, t) and set x λ (s) = x λ + s t α(τ)dτ. Then λu(x, t) + [ ( λ)u(y, t) ] T λ t 2 α(s) 2 + f(x λ (s) + x x λ, s)ds + g(x λ (T ) + x x λ ) [ ] T +( λ) t 2 α(s) 2 + f(x λ (s) + y x λ, s)ds + g(x λ (T ) + y x λ ) T t 2 α(s) 2 + f(x λ (s), s)ds + g(x λ (T )) + C(T + )λ( λ) x y 2 u(x λ, t) + ɛ + C(T + )λ( λ) x y 2 Hence u is semi-concave with respect to x with a semi-concavity constant C(T + ). For (x, t) R d [0, T ) we denote by A(x, t) the set of optimal controls of the control problem (29). One easily checks that such set is nonempty, and that, if (x n, t n ) (t, x) and α n A(x n, t n ), then, up to some subsequence, (α n ) weakly converges in L 2 to some α A(x, t). Let us recall the well-known classical Euler-Lagrange optimality condition for optimizers in (29): 2
22 Lemma 4.8 (Euler-Lagrange optimality condition) If α A(x, t), then α is of class C on [t, T ] with α (s) = D x f(x(s), s) s [t, T ], α(t ) = D x g(x(t )). In particular, there is a constant C = C (C) such that, for (x, t) R d [0, T ) and any α A(x, t) we have α C, where C is given by (28). We need to analyse precisely the connexion between the differentiability of u with respect to the x variable and the uniquess of the minimizer in (29). Lemma 4.9 (Regularity of u along optimal solutions) Let (x, t) R d [0, T ], α A(x, t) and let us set x(s) = x + s t α(τ)dτ. Then. (Uniqueness of the optimal control along optimal trajectories) for any s (t, T ], the restriction of α to [s, T ] is the unique element of A(x(s), s). 2. (Uniqueness of the optimal trajectories) D x u(x, t) exists if and only if A(x, t) is a reduced to singleton. In this case, D x u(x, t) = α(t) where A(x, t) = {α}. Remark 4.0 In particular, if we combine the above statements, we see that u(, s) is always differentiable at x(s) for s (t, T ), with D x u(x(s), s) = α(s). Proof : Let α A(x(s), s) and set x (τ) = x(s) + τ s α (σ)dσ. For any h > 0 small we build some α h A(x, t) in the following way: α(τ) if τ [t, s h) x α h (τ) = (s + h) x(s h) if τ [s h, s + h) 2h α (τ) if τ [s + h, T ] Then one easily checks that x h (τ) = x + τ t α h(σ)dσ is given by x(τ) if τ [t, s h) x h (τ) = x(s h) + (τ (s h)) x (s + h) x(s h) if τ [s h, s + h) 2h x (τ) if τ [s + h, T ] Since α [s,t ] and α are optimal for u(x(s), s), the concatenation α 0 of α [t,s] and α is also optimal for u(x, t). Note that x 0 (τ) = x + τ t α 0(σ)dσ is given by x(τ) on [t, s] and x (τ) on [s, T ]. So, comparing the payoff for α 0 (which is optimal) and the payoff for α h we have s t Hence s+h T 2 α(τ) 2 + f(x(τ), τ)dτ + s 2 α (τ) 2 + f(x (τ), τ)dτ s h s+h t 2 α(τ) 2 + f(x(τ), τ)dτ + s h 2 T + 2 α (τ) 2 + f(x (τ), τ)dτ. s s h s+h x (s + h) x(s h) 2h 2 α(τ) 2 + f(x(τ), τ)dτ + s 2 α (τ) 2 + f(x (τ), τ)dτ s+h x (s + h) x(s h) 2 2 2h + f(x h (τ), τ)dτ 0. s h f(x h (τ), τ)dτ
23 We divide this inequality by h and let h 0 + to get 2 α(s) α (s) 2 4 α(s) + α (s) 2 0, since lim h 0, s [s h,s+h] x h (s) = x(s) = x (s). So 2 α(s) α (s) 2 0, i.e., α(s) = α (s). In particular x( ) and x ( ) satisfy the same second order differential equation: y (τ) = D x f(y(τ), τ) with the same initial conditions x(s) = x (s) and x (s) = α(s) = α (s) = x (s). Therefore x(τ) = x (τ) on [s, T ] and α = α on [s, T ]. This means that the optimal solution for u(x(s), s) is unique. Next we show that, if D x u(x, t) exists, then A(x, t) is a reduced to singleton and D x u(x, t) = α(t) where A(x, t) = {α}. Indeed, let α A(x, t) and x(s) = x + s t α(τ)dτ be the associated trajectory. Then, for any v R d, u(x + v, t) T t 2 α(s) 2 ds + f(x(s) + v, s)ds + g(x(t ) + v). Since equality holds for v = 0 and since left- and right-hand sides are differentiable with respect to v at v = 0 we get D x u(x, t) = T t D x f(x(s), s)ds + D x g((t )) = α(t), where the last equality comes from the necessary conditions satisfied by α (see Lemma 4.8). In particular x( ) has to be the unique solution of the second order differential equation This is turn implies that α = x is unique. x (s) = D x f(x(s), s), x(t) = x, x (t) = D x u(x, t). Conversely, let us prove that, if A(x, t) is a singleton, then u(, t) is differentiable at x. For this we note that, if p belongs to D xu(x, t) (the set of reachable gradients of the map u(, t)), then the solution to x (s) = D x f(x(s), s), x(t) = x, x (t) = p is optimal. Indeed, by definition of p, there is a sequence x n x such that u(, t) is differentiable at x n and Du(x n, t) p. Now, since u(, t) is differentiable at x n, we know that the unique solution x n ( ) of x n(s) = D x f(x n (s), s), x(t) = x, x (t) = Du(x n, t) is optimal. Passing to the limit as n + implies (by the stability of optimal solutions mentionned above), that x( ), which is the uniform limit of the x n ( ), is also optimal. Now, from our assumptions, there is a unique optimal solution in A(x, t). Therefore Dxu(x, t) has to be reduced to a singleton, which implies, since u(, t) is semi-concave, that u(, t) is differentiable at x (Lemma 4.5). Let us consider again (x, t) R d [0, T ), α A(x, t) and x(s) = x + s t α(τ)dτ. Then we have just proved that u(, s) is differentiable at x(s) for any s (t, T ), with x (s) = α(s) = D x u(x(s), s). So, if α is optimal, its associated trajectory x( ) is a solution of the differential 23
24 equation x (s) = D x u(x(s), s) on [t, T ]. The following Lemma states that the reverse also holds: any solution of the differential equation x (s) = D x u(x(s), s) on [t, T ] is optimal on [t, T ]. This is an optimal synthesis result, since its says that the optimal control can be obtained at each position y and at each time s as by the synthesis α (y, s) = D x u(y, s). Lemma 4. (Optimal synthesis) Let (x, t) R d [0, T ) and x( ) be an absolutely continuous solution to the differential equation { x (s) = D x u(x(s), s) a.e. in [t, T ] (30) x(t) = x Then the control α := x is optimal for u(x, t). In particular, if u(, t) is differentiable at x, then equation (30) has a unique solution, corresponding to the optimal trajectory. Proof : We first note that x( ) is Lipschitz continuous because so is u. Let s (t, T ) be such that equation (30) holds (in particular u is differentiable with respect to x at (x(s), s)) and the Lipschitz continuous map s u(x(s), s) has a derivative at s. Since u is Lipschitz continuous, Lebourg s mean value Theorem [5], Th , states that, for any h > 0 small there is some (y h, s h ) [(x(s), s), (x(s + h), s + h)] and some (ξx, h ξt h ) CoDx,tu(y h, s h ) with u(x(s + h), s + h) u(x(s), s) = ξ h x, x(s + h) x(s) + ξ h t h, (3) (where CoDx,tu(y, s) stands for the closure of the convex hull of the set of reachable gradients Dx,tu(y, s)). From Caratheodory Theorem, there are (λ h,i, ξx h,i, ξ h,i t ) i=,...,d+2 such that λ h,i 0, i λh,i =, (ξx h,i, ξ h,i ) Dx,tu(y h, s h ) and t (ξ h x, ξ h t ) = i λ h,i (ξx h,i, ξ h,i t ). Note that the ξx h,i converge to D x u(x(s), s) as h 0 because, from Lemma 4.6, any cluster point of the ξx h,i must belong to D x + u(x(x), s), which is reduced to D x u(x(s), s) since u(, s) is differentiable at x(s). In particular, ξx h = i λh,i ξx h,i converges to D x u(x(s), s) as h 0. Since u is a viscosity solution of (27) and (ξx h,i, ξ h,i ) Dx,tu(y h, s h ), we have ξ h,i t + ξx h,i 2 = f(y h, s h ). 2 Therefore ξt h = λ h,i ξ h,i t = λ h,i ξ h,i x 2 f(y h, s h ) converges to 2 2 D xu(x(s), s) 2 f(x(s), s) i i as h 0. Then, dividing (3) by h and letting h 0 we get d ds u(x(s), s) = D xu(x(s), s), x (s) + 2 D xu(x(s), s) 2 f(x(s), s). Since x (s) = D x u(x(s), s), this implies that d ds u(x(s), s) = x (s) 2 f(x(s), s) a.e. in (t, T ). 2 t 24
25 Integrating the above inequality over [t, T ] we finally obtain, since u(y, T ) = g(y), u(x, t) = T t x (s) 2 + f(x(s), s) ds + g(x(t )). 2 Therefore α := x is optimal. The last statement of the Lemma is a direct consequence on Lemma 4.9-(2). From the stability of optimal solutions, the graph of the map (x, t) A(x, t) is closed when the set L 2 ([0, T ], R d ) is endowed with the weak topology. This implies that the map (x, t) A(x, t) is measurable with nonempty closed values, so that it has a Borel measurable selection ᾱ: namely ᾱ(x, t) A(x, t) for any (x, t) (see [6]). We define the flow Φ(x, t, s) = x + s Lemma 4.2 The flow Φ has the semi-group property Moreover it satisfies and t ᾱ(x, t)(τ)dτ s [t, T ]. Φ(x, t, s ) = Φ(Φ(x, t, s), s, s ) t s s T (32) s Φ(x, t, s) = Du(Φ(x, t, s), s) x R d, s (t, T ) Φ(x, t, s ) Φ(x, t, s) Du s s x R d, t s s T. (33) Proof : For any s (t, T ), we} know that from Lemma 4.9-() that A(Φ(x, t, s), s) is reduced to the singleton {ᾱ(x, t) [s,t ]. Hence (32) holds. Moreover, Lemma 4.9 also states that u(, s) is differentiable at Φ(x, t, s) with D x u(φ(x, t, s), s) = ᾱ(x, t)(s). Since, by definition, s Φ(x, t, s) = ᾱ(x, t)(s), we have s Φ(x, t, s) = D x u(φ(x, t, s), s), which clearly implies (33). Finally we shall need below the following contraction property: Lemma 4.3 If C is given by (28), then there is some constant C 2 = C 2 (C) such that, if u is a solution of (27), then x y C 2 Φ(x, t, s) Φ(y, t, s) 0 t < s T, x, y R d. In particular the map x Φ(x, s, t) has a Lipschitz continuous inverse on the set Φ(R d, t, s). Proof : We already know that Dxxu 2 C I d on R d (0, T ). Let x(τ) = Φ(x, t, s τ) and y(τ) = Φ(y, t, s τ) for τ [0, s t]. Then, from Lemma 4.2, x( ) and y( ) satisfy respectively x (τ) = D x u(x(τ), s τ), y (τ) = D x u(y(τ), s τ) τ [0, s t) (34) with initial condition x(0) = Φ(x, t, s) and y(0) = Φ(y, t, s). Note that, for almost all τ [0, s t], we have ( ) d (x y)(τ) 2 = (x y )(τ), (x y)(τ) C (x y)(τ) 2 dτ 2 where the last inequality comes from (34) and the fact that D 2 xxu C I d (see Definition 4.4). Hence x(0) y(0) e C(s τ) (x y)(τ) τ [0, s t], which proves the claim. 25
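The flow Φ and the contraction property just proved feed into the next subsection, where the solution of the continuity equation is identified with the push-forward µ(s) = Φ(·, 0, s)♯m_0. Numerically this is just particle transport along the characteristics x'(s) = −D_x u(x(s), s), as in the following Python sketch; the smooth value function u used here (through its gradient) is a hypothetical example with bounded, Lipschitz derivative, not one produced by the construction in the text.

import numpy as np

# Hypothetical spatial derivative D_x u of a smooth value function u(x, s),
# in one space dimension (bounded and Lipschitz in x).
Dxu = lambda x, s: np.sin(x) * np.exp(-s)

# Particles sampled from m_0 (here uniform on [-1, 1]) are transported by the
# characteristics x'(s) = -D_x u(x(s), s); the empirical measure of the particles
# at time s then approximates the push-forward mu(s) = Phi(., 0, s) # m_0.
rng = np.random.default_rng(2)
n_particles, T, n_steps = 100_000, 1.0, 400
dt = T / n_steps
X = rng.uniform(-1.0, 1.0, n_particles)      # X ~ m_0

for k in range(n_steps):
    s = k * dt
    X = X - Dxu(X, s) * dt                   # explicit Euler step of the flow

# A histogram approximates the density of mu(T); by the contraction property the
# map x -> Phi(x, 0, T) has a Lipschitz inverse, so this density stays bounded
# whenever m_0 is bounded (this is the content of the first lemma of the next
# subsection).
density, edges = np.histogram(X, bins=100, density=True)
print("mass check (should be ~1):     ", float(np.sum(density * np.diff(edges))))
print("sup of the approximate density:", float(density.max()))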
26 4.2 On the continuity equation Our aim is now to show that, given a solution u to (27) and under assumption (28), the continuity equation { t µ(x, s) div (Du(x, s)µ(x, s)) = 0 in R d (0, T ) µ(x, 0) = m 0 (x) in R d (35) has a unique solution which is the density of the measure µ(s) = Φ(, 0, s) m 0 for s [0, T ], where Φ(, 0, s) m 0 denotes the push-forward of the measure m 0 by the map Φ(, 0, s), i.e., the measure defined by Φ(, 0, s) m 0 (A) = m 0 (Φ(, 0, s) (A)) for any Borel set A R d. In a first step, we show that the measure Φ(, 0, s) m 0 is absolutely continuous with respect to the Lebesgue measure. Lemma 4.4 Let us assume that there is a constant C such that the conditions (28) on f and g hold and such that m 0 is absolutely continuous, has a support contained in the ball B(0, C) and satisfies m 0 L C. Let us set µ(s) = Φ(, 0, s) m 0 for s [0, T ]. Then there is a constant C 3 = C 3 (C) such that, for any s [0, T ], µ(s) is absolutely continuous, has a support contained in the ball B(0, C 3 ) and satisfies m 0 L C 3. Moreover d (µ(s ), µ(s)) D x u s s t s s T. Proof : By definition µ satisfies d (µ(s ), µ(s)) Φ(x, 0, s ) Φ(x, 0, s) dm 0 (x) D x u (s s). R d Since D x u C and m 0 has a compact support contained in B(0, C), the (µ(s)) have a compact support contained in B(0, R) where R = C + T C. Let us now fix t [0, T ]. Recalling Lemma 4.3, there is some C 2 such that the map x Φ(x, 0, t) has a C 2 Lipschitz continuous inverse on the set Φ(R d, 0, t). Let us denote this inverse by Ψ. Then, if E is a Borel subset of R d, we have µ(s)(e) = m 0 (Φ (, 0, t)(e)) = m 0 (Ψ(E)) m 0 L d (Ψ(E)) m 0 C 2 L d (E). Therefore µ(s) is absolutely continuous with a density (still denoted µ(s)) which satisfies µ(s) m 0 C 2 s [0, T ]. Our aim is to show that the map s µ(s) := Φ(, 0, s) m 0 is the unique weak solution of (35). The proof this statement is a little involve and requires several steps. We start with the easy part: Lemma 4.5 The map s µ(s) := Φ(, 0, s) m 0 is a weak solution of (35). Proof: Let ϕ Cc (R N [0, T )). Then, since s µ(s) is Lipschitz continuous in P, the map s R ϕ(x, s)µ(x, s)dx is absolutely continuous and we have, thanks to Lemma 4.2, d d ϕ(x, s)µ(x, s)dx = d N ϕ(φ(x, 0, s), s)m 0 (x)dx ds R d ds R = ( s ϕ(φ(x, 0, s), s) + D x ϕ(φ(x, 0, s), t), s Φ(x, 0, s) ) m 0 (x)dx R d = ( s ϕ(φ(x, 0, s), s) D x ϕ(φ(x, 0, s), s), D x u(φ(x, 0, s), s) ) m 0 (x)dx R d = ( s ϕ(y, s) D x ϕ(y, s), D x u(y, s) ) µ(y, s)dy R d 26
27 Integrating the above inequality between 0 and T we get, since µ(0) = m 0, T φ(y, 0)m 0 (y)dy + ( s ϕ(y, s) D x ϕ(y, s), D x u(y, s) ) µ(y, s)dy = 0, R d 0 R d which means that m is a weak solution of (35). We now turn to the difficult part of the problem: the uniqueness issue. The difficulty comes from the discontinuity of the vector field Du(x, t). In fact, if this vector field had some Lipschitz regularity property, then the uniqueness would easily follow, as we explain now. Lemma 4.6 Let b L (R d (0, T ), R d ) be such that, for any R > 0 and for almost all t [0, T ], there is a constant L = L R with b(, t) is L R Lipschitz continuous on B(0, R). Then the continuity equation { t µ(x, s) + div (b(x, s)µ(x, s)) = 0 in R d (0, T ) µ(x, 0) = m 0 (x) in R d (36) has a unique solution, given by µ(t) = Φ(, t) m 0 where Φ is the flow of the differential equation { s Φ(x, s) = b(φ(x, s), s) Φ(x, 0) = x Remark 4.7 Note that, for any smooth function ϕ Cc (R d ), we have, in the sense of distributions, d ϕ(x)m(x, t)dx = Dϕ(x), b(x, t) m(x, t)dx. dt R d R d Since the right-hand side of the above equality belongs to L, the map t R d ϕ(x)m(x, t)dx is absolutely continuous, and, therefore, has a continuous representative. By using the separability of C 0 b (Rd ), this implies that m(t) has a continuous representative on [0, T ] as a measure on R d and the initial condition m(0) = m 0 holds in the classical sense. Proof : The fact that the map t Φ(, t) m 0 is a solution of the continuity equation (36) can be established as in Lemma 4.5, so we omit the proof. Let us recall that the map x Φ(x, t) is locally Lipschitz continuous, with a locally Lipschitz continuous inverse denoted by Ψ(x, t). Note also that Ψ is actually locally Lipschitz continuous in space-time. Let ϕ C c (R d ) and let us consider the map w defined by w(x, t) = ϕ(ψ(x, t)). Then w is Lipschitz continuous with compact support and satisfies: 0 = d dt ϕ(x) = d dt w(φ(x, t), t) = tw(φ(x, t), t) + Dw(Φ(x, t), t), b(φ(x, t), t) a.e., so that w is a solution to t w(y, t) + Dw(y, t), b(y, t) = 0 a.e. in R d (0, T ). Using w as a test function for µ we have d w(y, t)µ(y, t)dy = ( t w(y, t) + Dw(y, t), b(y, t) )µ(y, t)dy = 0, dt R d R d and therefore ϕ(ψ(y, t))µ(y, t)dy = ϕ(y)m 0 (y)dy. R d R d 27
28 Changing the test function this proves that ψ(y)µ(y, t)dy = ψ(φ(y, s))m 0 (y)dy, R d R d for any ψ C 0 c (R N ), which shows that µ(t) = Φ(, t) m 0. Let us come back to the continuity equation (35) and we consider a solution µ. We now regularize this solution by using a smooth kernel ρ ɛ, assumed to be positive in R d (for instance the Gaussian kernel). We set µ ɛ (x, t) = µ ρ ɛ and b ɛ (x, t) = (Duµ) ρ ɛ(x, t) µ ɛ (x, t) Then b ɛ D x u and b ɛ is locally Lipschitz continuous in the sense of Lemma 4.6 above. Moreover µ ɛ satisfies the continuity equation for b ɛ because t µ ɛ + div (b ɛ µ ɛ ) = ( t µ) ρ ɛ div ((Duµ) ρ ɛ ) = [ t µ div (Duµ)] ρ ɛ = 0.. So, according to Lemma 4.6, µ ɛ (t) = Φ ɛ (, t) m ɛ, where m ɛ = m 0 ρ ɛ and Φ ɛ is the flow associated to b ɛ : { s Φ ɛ (x, s) = b ɛ (Φ(x, s), s) Φ ɛ (x, 0) = x The difficulty now boils down to passing to the limit in the equality µ ɛ (t) = Φ ɛ (, t) m ɛ. Let us set, to simplify the notations, Γ T = C 0 ([0, T ], R d ) and let us associate with µ ɛ the measure η ɛ on R d Γ T defined by ϕ(x, γ)dη ɛ (x, γ) = R d Γ T ϕ(x, Φ(x, ))m ɛ (x)dx R d ϕ C 0 (R d Γ T, R). For t [0, T ] we denote by e t the evaluation map at t, i.e., e t (γ) = γ(t) for γ Γ T. Then, for any ϕ Cb 0(RN, R) we have ϕ(e t (γ))dη ɛ (x, γ) = ϕ(φ ɛ (x, t))m ɛ (x)dx = ϕ(x)µ ɛ (x, t)dx. (37) R d Γ T R d R d Let us now prove that (η ɛ ) is tight in R d Γ T. Indeed, since m ɛ converges to m 0 as ɛ 0, we can find for any δ > 0 some compact set K δ R d such that m ɛ (K δ ) δ for any ɛ > 0. Let K δ be the subset of K δ Γ T consisting in pairs (x, γ) where x K δ, γ(0) = x, γ is Lipschitz continuous with γ D x u. Then, K δ is compact and by definition of η ɛ, η ɛ (K δ ) = m ɛ (K δ ) δ ɛ > 0. Therefore (η ɛ ) is tight and, from Prokhorov compactness Theorem one find find a subsequence, still labelled (η ɛ ), which narrowly converges to some probability measure η on R d Γ T. Letting ɛ 0 in (37) gives ϕ(e t (γ))dη(x, γ) = R d Γ T ϕ(x)µ(x, t)dx R d t [0, T ] (38) for any ϕ Cb 0(Rd, R), and therefore for any Borel bounded measurable map ϕ : R d R, thanks to Riesz representation Theorem. Moreover, since, by definition of η ɛ, we have ϕ(x)dη ɛ (x, γ) = R d Γ T ϕ(x)m ɛ (x)dx R d ϕ C 0 (R d, R), 28
29 we also have that ϕ(x)dη(x, γ) = R d Γ T ϕ(x)m 0 (x)dx R d ϕ C 0 (R d, R), (39) i.e., the first marginal of η is m 0. The key step of the proof consists in showing that η is concentrated on solutions of the differential equation for D x u. More precisely, we want to show the following superposition principle : for any t [0, T ], t γ(t) x + D x u(γ(s), s)ds dη(x, γ) = 0. (40) R d Γ T 0 For this we have to regularize a bit the vector field D x u. Let c : R d [0, T ] R d be a continuous vector field with compact support. We claim that t γ(t) x T c(γ(s), s)ds dη(x, γ) c(x, t) + D x u(x, t) µ(x, t)dxdt. (4) 0 0 R d R d Γ T Proof of (4): We have, for any ɛ > 0 small, t γ(t) x c(γ(s), s)ds dηɛ (x, γ) R d Γ T 0 t = Φɛ (x, t) x c(φ ɛ (x, s), s)ds m ɛ(x)dx R d 0 t = (b ɛ (Φ ɛ (x, t), s) c(φ ɛ (x, s), s))ds m ɛ(x)dx R d 0 t b ɛ (Φ ɛ (x, t), s) c(φ ɛ (x, s), s) m ɛ (x)dxds 0 R d t b ɛ (y, s) c(y, s))ds µ ɛ (x, t)dx 0 R d where, setting c ɛ = (cµ) ρɛ µ, we have ɛ t 0 b ɛ (y, s) c(y, s))ds µ ɛ (x, t)dx R d t b ɛ (y, s) c ɛ (y, s))ds µ ɛ (x, t)dx + 0 R d t b(y, s) c(y, s))ds µ(x, t)dx + 0 R d 0 t 0 t R d c ɛ (y, s) c(y, s))ds µ ɛ (x, t)dx R d c ɛ (y, s) c(y, s))ds µ ɛ (x, t)dx Note that the last term converges to 0 as ɛ 0 thanks to the continuity of c. This gives (4). Proof of (40): To complete the proof of (40) we just take a sequence of uniformly bounded, continuous maps c n with compact support which converges a.e. to Du. Replacing c by c n in (4) gives the desired result since, from (38), t t D x u(γ(s), s) + c n (γ(s), s) dη(x, γ)ds = D x u(c, s) + c n (x, s) µ(x, s)ds 0 R d Γ T 0 R d Let us now desintegrate η with respect to its first marginal, which, according to (39), is m 0 (see the desintegration Theorem 8.5 below). We get dη(x, γ) = dη x (γ)dm 0 (x). Then (40) implies that, for m 0 a.e. x R d, η x a.e. γ is a solution of the differential equation { γ (s) = D x u(γ(s), s) s [t, T ] γ(t) = x 29
30 But for almost all x R d, u(, 0) is differentiable at x and Lemma 4. then says that the above differential equation has a unique solution given by Φ(x, 0, ). Since m 0 is absolutely continuous, this implies that, for m 0 a.e. x R d, η x a.e. γ is given by Φ(x, 0, ). Then equality (38) becomes ϕ(x)µ(x, t)dx = ϕ(e t (γ))m 0 (x)dη x (γ)dx = ϕ(φ(x, 0, t))m 0 (x)dx R d R d Γ T R d Γ T for any ϕ C 0 b (Rd, R) and t [0, T ]. This proves that µ(t) is given by Φ(, 0, t) m 0. In conclusion we have just established the following result: Theorem 4.8 Given a solution u to (27) and under assumption (28), the map s µ(s) := Φ(, 0, s) m 0 is the unique weak solution of (35). 4.3 Proof of the existence Theorem Before starting the proof of Theorem 4., we need to show that the system (25) is, somehow, stable. Let (m n ) be a sequence of C([0, T ], P ) which uniformly converges to m C([0, T ], P ). Let u n be the solution to { t u n + 2 D xu n 2 = F (x, m n (t)) in R d (0, T ) u n (x, T ) = g(x, m n (T )) in R d and u be the solution to { t u + 2 D xu 2 = F (x, m(t)) in R d (0, T ) u(x, T ) = g(x, m(t )) in R d Let us denote by Φ n (resp. Φ) the flow associated to u n (resp. to u) as above and let us set µ n (s) = Φ n (, 0, s) m 0 and µ(s) = Φ(, 0, s) m 0. Lemma 4.9 (Stability) The solution (u n ) locally uniformly converges u in R d [0, T ] while (µ n ) converges to µ in C([0, T ], P ). Proof : From our assumptions on F and g, the sequences of maps (x, t) F (x, m n (t)) and (x, t) g(x, m n (T )) locally uniformly converge to the maps (x, t) F (x, m(t)) and (x, t) g(x, m(t )) respectively. So the local uniform convergence of (u n ) to u is a straightforward consequence of the standard stability of viscosity solutions. From Lemma 4.7 there is a constant C such that Dxxu 2 n C I d for all n. By local uniform convergence of (u n ) to u this implies that D x u n converges almost everywhere in R d (0, T ) to D x u (see Lemma 4.6). From Lemma 4.4 we know that the (µ n ) are absolutely continuous with support contained in K := B(0, C 3 ) and µ n C 3. Moreover Lemma 4.4 also states that d (µ n (s ), µ n (s)) C s s t s s T, n 0. Since P (K), the set of probability measures on K, is compact, Ascoli Theorem states that the sequence (µ n ) is precompact in C([0, T ], P (K)), and therefore a subsequence (still denoted (µ n )) of the (µ n ) converges in C([0, T ], P (K)) and in L weak-* to some m which has a support in K [0, T ], belongs to L (R d [0, T ]) and to C([0, T ], P (K)). Since the (µ n ) solve the continuity equation for (u n ), one easily gets by passing to the limit that m satisfies the continuity equation for u. By uniqueness this implies that m = µ and the proof is complete. 30
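Before turning to the existence proof, it may help to see the representation formula of Theorem 4.8 in action. The following minimal sketch (in Python; the potential $u$, the initial measure and all numerical parameters are illustrative choices of ours, not objects from the text) pushes a sample of $m_0$ along the characteristics $\dot x(s) = -D_x u(x(s),s)$ and compares the resulting empirical distribution with the exact flow, i.e., it realizes $\mu(s)=\Phi(\cdot,0,s)\sharp m_0$ by particles; one could equally compare with an upwind discretization of (35).

```python
import numpy as np

# Illustrative 1D sketch of Theorem 4.8: the solution of the continuity
# equation  d_t mu - div(D_x u * mu) = 0,  mu(0) = m_0,  is the push-forward
# of m_0 by the flow of  x'(s) = -D_x u(x(s), s).
# Here u is a hand-picked smooth function (NOT a solution of the HJ equation);
# the point is only to compare "transport of samples along characteristics"
# with the exact flow.

def Du(x, t):
    # gradient of u(x, t) = 0.5*(1+t)*x**2, i.e. D_x u = (1+t)*x
    return (1.0 + t) * x

T, nt = 1.0, 500
dt = T / nt

# m_0: standard Gaussian, represented by samples (particles)
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=50_000)

# push the samples along the characteristics x' = -Du(x, t)  (explicit Euler)
x = particles.copy()
for k in range(nt):
    t = k * dt
    x -= dt * Du(x, t)

# compare the push-forward measure Phi(.,0,T)#m_0 with the exact flow,
# x'(s) = -(1+s) x  =>  x(T) = x0 * exp(-(T + T**2/2)), through their c.d.f.
grid = np.linspace(-4, 4, 81)
cdf_euler = np.searchsorted(np.sort(x), grid) / x.size
x_exact = particles * np.exp(-(T + T**2 / 2))
cdf_exact = np.searchsorted(np.sort(x_exact), grid) / x_exact.size

print("max c.d.f. gap between Euler particles and exact flow:",
      np.abs(cdf_euler - cdf_exact).max())
```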
31 We are now ready to prove Theorem 4.. Let C be the convex subset of maps m C([0, T ], P ) such that m(0) = m 0. To any m C one associates the unique solution u to { t u + 2 D xu 2 = F (x, m(t)) in R d (0, T ) u(x, T ) = G(x, m(t )) in R d and to this solution one associates the unique solution to the continuity equation { t µ div ((Du(x, s)µ(x, s)) = 0 in R d (0, T ) µ(0) = m 0 Then µ C and, from Lemma 4.9, the mapping m µ is continuous. From Lemma 4.4 there is a constant C 3, independent of m, such that, for any s [0, T ], µ(s) has a support in B(0, C 3 ) and satisfies d (µ(s), µ(s )) C s s s, s [0, T ]. This implies that the mapping m µ is compact because s µ(s) is uniformly Lipschitz continuous with values in the compact set of probability measures on B(0, C 3 ). One completes the proof thanks to Schauder fix point Theorem. 4.4 The vanishing viscosity limit We now investigate the limit as σ 0 of the solutions to (i) t u σ u + 2 Du 2 = F (x, m) in R d (0, T ) (ii) t m σ m div (m Du) = 0 in R d (0, T ) (42) (iii) m(0) = m 0, u(x, T ) = G(x, m(t )) in R d Theorem 4.20 Let (u σ, m σ ) be a solution to (42). Then, under the assumptions of Theorem 4., as σ 0 and up to a subsequence, (u σ, m σ ) converges to a solution of (25): for (u σ ) the convergence is locally uniform in R d [0, T ], while for (m σ ), it is in C 0 ([0, T ], P (R d )). Remark 4.2 Incidentally this Theorem provides another (but only slightly different) proof of the existence Theorem 4.. Proof : As in the proof of Theorem 3. one can check that the (m σ ) all belong to the compact subset C of C 0 ([0, T ], P ) defined as the set of maps µ C 0 ([0, T ], P ) such that and d (µ(s), µ(t)) sup C s t t s 2 sup x 2 dm(t)(x) C, t [0,T ] R d provided C is large enough. In particular, up to a subsequence, the m σ converge to some m C. Following now the proof of Lemma 4.7 one can also check that the u σ are uniformly bounded, uniformly Lipschitz continuous and uniformly semi-concave. Since moreover the maps (x, t) 3
$\mapsto F(x, m_\sigma(t))$ and $x \mapsto G(x, m_\sigma(T))$ locally uniformly converge to the maps $(x,t)\mapsto F(x, m(t))$ and $x\mapsto G(x, m(T))$, the limit $u$ is a solution to
$$\begin{cases} -\partial_t u + \tfrac12 |D_x u|^2 = F(x, m(t)) & \text{in } \mathbb R^d\times (0,T),\\ u(x,T) = G(x, m(T)) & \text{in } \mathbb R^d. \end{cases}$$
Because of the semi-concavity bounds on $u_\sigma$, Lemma 4.6 states that the $D_x u_\sigma$ converge a.e. to $D_x u$.
We now need to pass to the limit in equation (42-(ii)). For this we need some bounds on $m_\sigma$. Recalling that the $m_\sigma$ are solutions to
$$\partial_t m_\sigma - \sigma \Delta m_\sigma - \langle Dm_\sigma, Du_\sigma\rangle - m_\sigma \Delta u_\sigma = 0 \qquad (43)$$
with $\Delta u_\sigma \le C$ by uniform semi-concavity of $u_\sigma$, one easily checks that
$$\|m_\sigma(\cdot,t)\|_\infty \le \|m_0\|_\infty\, e^{Ct} \qquad \forall t\in[0,T],$$
because $m_\sigma \ge 0$ and the right-hand side of the above inequality is a super-solution of (43). So the $(m_\sigma)$ converge, still up to a subsequence, to $m$ in $L^\infty_{loc}$ weak-*. Now, recalling that $D_x u_\sigma$ converges a.e. to $D_x u$, we can pass to the limit in the weak formulation of (42-(ii)) to get that $m$ is a weak solution of (25-(ii)).

4.5 Comments

Existence results for first order mean field equations are obtained in [43, 44] under more general conditions. The case of local operators (i.e., where $F(x,m) = F(x, m(x))$ and $G(x,m) = G(x, m(x))$) is also discussed, with links to the classical Euler equation. In the case where $F = F(m(x))$ with $F$ a strictly increasing function, the system enjoys a surprising comparison principle: the key idea is to reduce the system to a single equation on $m$, which turns out to be elliptic (see Lions' lectures [47]).

Other couplings between first order Hamilton-Jacobi equations and transport equations can be found in the literature: in particular James and Gosse [27] and Ben Moussa and Kossioris [0] analyze a system (coming from geometric optics) which is close to the mean field equation (25). The main difference is that in [0, 27] the first equation is forward and not backward in time. The analysis of the resulting system turns out to be completely different from that of the mean field equation: while the second equation enjoys stronger uniqueness and stability properties, measure-valued solutions are unavoidable.

Most of the material of subsection 4.1 on semi-concavity properties of the value function in optimal control is borrowed from the monograph by Cannarsa and Sinestrari [2]. In fact, results such as Lemmata 4.7, 4.9 or 4. hold for a much larger class of optimal control problems, which leaves room for generalizations of the existence Theorem for the mean field equation.

The analysis of transport equations with discontinuous vector fields has attracted a lot of attention since the seminal paper of Di Perna and Lions [20]. In subsection 4.2, we face a particularly simple situation where the vector field generates a unique solution from almost every starting point. Nevertheless, uniqueness of the solution of the associated continuity equation requires rather subtle arguments. We rely here on Ambrosio's approach [3, 4], in particular for the superposition principle (formula (40)).
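Before leaving this section, here is a hedged numerical sketch of the vanishing viscosity mechanism of Theorem 4.20, for the Hamilton-Jacobi part of (42) alone: we freeze the coupling $F(x, m_\sigma(t))$ to a fixed function $f(x)$ (so this is not the full coupled system), solve the viscous equation on the one-dimensional torus with a monotone Godunov discretization of $p\mapsto \tfrac12 p^2$, and observe $u_\sigma$ approaching the $\sigma=0$ solution. The data $f$, $g$ and all grid parameters are illustrative choices.

```python
import numpy as np

# Sketch of the vanishing viscosity limit of Theorem 4.20 for the HJ part only:
# we freeze F(x, m(t)) to a fixed f(x) and solve
#   -d_t u - sigma * u_xx + 0.5*|u_x|^2 = f(x),   u(., T) = g,
# backward in time (equivalently forward in tau = T - t) on the torus [0,1),
# with a Godunov numerical Hamiltonian for H(p) = 0.5*p^2.

N, T = 200, 1.0
dx = 1.0 / N
x = np.arange(N) * dx
f = np.cos(2 * np.pi * x)          # frozen right-hand side
g = 0.1 * np.sin(2 * np.pi * x)    # terminal condition

def solve_hj(sigma, dt=5e-5):
    v = g.copy()                   # v(tau) = u(T - tau)
    for _ in range(int(T / dt)):
        vp, vm = np.roll(v, -1), np.roll(v, 1)
        dplus, dminus = (vp - v) / dx, (v - vm) / dx
        # Godunov flux for the convex Hamiltonian H(p) = 0.5 p^2 (minimum at 0)
        ham = 0.5 * np.maximum(np.maximum(dminus, 0.0)**2,
                               np.minimum(dplus, 0.0)**2)
        lap = (vp - 2 * v + vm) / dx**2
        v = v + dt * (sigma * lap - ham + f)
    return v

u_limit = solve_hj(0.0)
for sigma in (0.1, 0.02, 0.004):
    err = np.abs(solve_hj(sigma) - u_limit).max()
    print(f"sigma = {sigma:6.3f}   max |u_sigma - u_0| = {err:.4f}")
```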
5 The space of probability measures

We have already seen the important role played by the space of probability measures in mean field game theory. It is now time to investigate the basic properties of this space more thoroughly. The first two parts of this section are dedicated to metric aspects of spaces of probability measures. The results are given mostly without proofs, which can be found, for instance, in Villani's monographs [55, 56] or in the monograph by Ambrosio, Gigli and Savaré [5].

5.1 The Monge-Kantorovich distances

Let $X$ be a Polish space (i.e., a complete separable metric space) and $P(X)$ be the set of Borel probability measures on $X$. A sequence of measures $(\mu_n)$ is narrowly convergent to a measure $\mu\in P(X)$ if
$$\lim_{n\to+\infty}\int_X f(x)\,d\mu_n(x) = \int_X f(x)\,d\mu(x) \qquad \forall f\in C^0_b(X),$$
where $C^0_b(X)$ is the set of continuous, bounded maps on $X$. Prokhorov's Theorem states that a subset $\mathcal K$ of $P(X)$ is relatively compact in $P(X)$ if and only if it is tight, i.e.,
$$\forall \epsilon>0,\ \exists K_\epsilon \text{ compact subset of } X \text{ with } \mu(X\setminus K_\epsilon)\le\epsilon \quad \forall\mu\in\mathcal K.$$
In particular, for any $\mu\in P(X)$ and any $\epsilon>0$, there is some compact subset $K_\epsilon$ of $X$ with $\mu(X\setminus K_\epsilon)\le\epsilon$ (Ulam's Lemma).

There are several ways to metrize the topology of narrow convergence, at least on some subsets of $P(X)$. Let us denote by $d$ the distance on $X$ and, for $p\in[1,+\infty)$, by $P_p(X)$ the set of probability measures $m$ such that $\int_X d^p(x_0,x)\,dm(x)<+\infty$ for some (and hence for all) point $x_0\in X$. The Monge-Kantorovich distance on $P_p(X)$ is given by
$$d_p(m,m') = \inf_{\gamma\in\Pi(m,m')}\left[\int_{X\times X} d(x,y)^p\,d\gamma(x,y)\right]^{1/p} \qquad (44)$$
where $\Pi(m,m')$ is the set of Borel probability measures $\gamma$ on $X\times X$ such that $\gamma(A\times X)=m(A)$ and $\gamma(X\times A)=m'(A)$ for any Borel set $A\subset X$. In other words, a Borel probability measure $\gamma$ on $X\times X$ belongs to $\Pi(m,m')$ if and only if
$$\int_{X\times X}\varphi(x)\,d\gamma(x,y)=\int_X\varphi(x)\,dm(x) \quad\text{and}\quad \int_{X\times X}\varphi(y)\,d\gamma(x,y)=\int_X\varphi(y)\,dm'(y)$$
for any bounded Borel measurable map $\varphi:X\to\mathbb R$. Note that $\Pi(m,m')$ is non-empty, because for instance $m\otimes m'$ always belongs to $\Pi(m,m')$. Moreover, by Hölder's inequality, $P_{p'}(X)\subset P_p(X)$ for any $p\le p'$ and
$$d_p(m,m')\le d_{p'}(m,m') \qquad \forall m,m'\in P_{p'}(X).$$
We now explain that there exists at least one optimal measure in (44). This optimal measure is often referred to as an optimal transport plan from $m$ to $m'$.
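For finitely supported measures the infimum in (44) is a finite-dimensional linear program, which makes the definition easy to experiment with. The sketch below (illustrative data; `linprog` from SciPy is only one possible solver) computes $d_p$ between two discrete measures on the real line and also illustrates the monotonicity $d_1\le d_2$ noted above.

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: computing the Monge-Kantorovich distance (44) between two finitely
# supported measures by linear programming.  The transport plan gamma is an
# (n x m) nonnegative matrix with prescribed marginals; the support points
# and the weights below are arbitrary illustrative data.

def monge_kantorovich(xs, a, ys, b, p=2):
    """d_p between sum_i a_i delta_{xs_i} and sum_j b_j delta_{ys_j}."""
    xs, ys = np.atleast_2d(xs), np.atleast_2d(ys)
    n, m = len(a), len(b)
    cost = np.linalg.norm(xs[:, None, :] - ys[None, :, :], axis=2) ** p
    # marginal constraints: row sums of gamma = a, column sums = b
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun ** (1.0 / p)

# two measures on the real line
xs = np.array([[0.0], [1.0], [2.0]]); a = np.array([0.5, 0.25, 0.25])
ys = np.array([[0.5], [3.0]]);        b = np.array([0.6, 0.4])
print("d_1 =", monge_kantorovich(xs, a, ys, b, p=1))
print("d_2 =", monge_kantorovich(xs, a, ys, b, p=2))
```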
34 Lemma 5. (Existence of an optimal transport plan) For any m, m P p (), there is at least one measure γ Π(m, m ) with [ ] /p d p (m, m ) = d(x, y) p d γ(x, y). 2 Proof : We first show that Π(µ, ν) is tight. For any ɛ > 0 there exists a compact set K ɛ such that m(k ɛ ) ɛ/2 and m(k ɛ ) ɛ/2. Then, for any γ (µ, ν), we have γ(k ɛ K ɛ ) γ(k ɛ ) γ(k ɛ (\K ɛ )) µ(k ɛ ) γ( (R N \K ɛ )) ɛ/2 ν(\k ɛ ) ɛ. This means that Π(µ, ν) is tight. It is also closed for the weak-* convergence. Since the map γ x y p dγ(x, y) is lower semi-continuous for the weak-* convergence, it has a minimum 2 on Π(m, m ). Let us now check that d p is a distance. Lemma 5.2 For any p, d p is a distance on P p. Proof : Only the triangle inequality presents some difficulty. Let m, m, m P p and γ, γ be optimal transport plans from m to m and from m to m respectively. We desintegrate the measures γ and γ with respect to m : dγ(x, y) = dγ y (x)dm (y) and dγ (y, z) = dγ y(z)dm (y) and we defined the measure π on by ϕ(x, z)dπ(x, z) = ϕ(x, z)dγ y (x)dγ y(z)dm (y). Then one easily checks that π Π(m, m ) and we have, by Hölder inequality, [ d p (x, z)dπ(x, z)] /p [ [ ] /p (d(x, y) + d(y, z)) p dγ y (x)dγ y(z)dm (y) /p [ d p (x, y)dγ y (x)dm (y)] + d p (y, z)dγ y (z)dm (y) = d p (m, m ) + d p (m, m ) ] /p So d p (m, m ) d p (m, m ) + d p (m, m ). We now prove that the distance d p metricize the weak-* convergence of measures. Proposition 5.3 If a sequence of measures (m n ) of P p () converges to m for d p, then (m n ) weakly converges to m. Conversely, if the (m n ) are concentrated on a fixed compact subset of and weakly converge to m, then the (m n ) converge to m in d p. Remark 5.4 The sharpest statement can be found in [55]: a sequence of measures (m n ) of P p () converges to m for d p if and only if (m n ) weakly converges to m and d p (x, x 0 )dm n (x) = d p (x, x 0 )dm(x) for some (and thus any) x 0. lim n + 34
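The following small experiment (our own illustrative example, in the spirit of the remark above and of Lemma 5.7 below) shows on the real line why narrow convergence alone does not give convergence in $d_1$: for $m_n=(1-\frac1n)\delta_0+\frac1n\delta_n$, integrals of bounded continuous functions converge to their value at $0$, while $d_1(m_n,\delta_0)$ stays equal to $1$, because a small amount of mass escaping to infinity is invisible to narrow convergence but not to the first moment.

```python
import numpy as np
from scipy.stats import wasserstein_distance  # = d_1 on the real line

# m_n = (1 - 1/n) delta_0 + (1/n) delta_n converges narrowly to delta_0,
# but d_1(m_n, delta_0) = 1 for every n (illustrative example).
for n in (10, 100, 1000):
    d1 = wasserstein_distance([0.0, float(n)], [0.0],
                              u_weights=[1 - 1 / n, 1 / n], v_weights=[1.0])
    f = lambda x: np.tanh(x)            # a bounded continuous test function
    narrow_gap = abs((1 - 1 / n) * f(0.0) + (1 / n) * f(n) - f(0.0))
    print(f"n = {n:5d}   d_1(m_n, delta_0) = {d1:.3f}   "
          f"|int f dm_n - f(0)| = {narrow_gap:.4f}")
```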
35 Proof : In a first step, we only show now that, if (m n ) converges to m for d p, then ϕ(x)dm n (x) = ϕ(x)dm(x) (45) lim n + for any ϕ Cb 0 (). The proof of the converse statement is explained after Theorem 5.5. We first prove that (45) holds for Lipschitz continuous maps: indeed, if ϕ is Lipschitz continuous for some Lipschitz constant L, then, for any optimal transport plan γ n Π(m n, m) from m n to m, we have ϕ(x)dm n (x) ϕ(x)dm(x) = (ϕ(x) ϕ(y))dγ n (x) L d(x, y)dγ n (x) Ld p (m n, m). So (45) holds for any Lipschitz continuous ϕ. If now ϕ Cb 0 (), we approximate ϕ by the Lipschitz continuous map ϕ ɛ (x) = inf {ϕ(y) ɛ } d(x, y) x. y Then it is an easy exercise to show that ϕ ɛ (x) ϕ(x) as ɛ 0. Moreover ϕ ɛ is (/ɛ) Lipschitz continuous, bounded by ϕ and satisfies ϕ ɛ ϕ. In particular, from Lebesgue Theorem, ϕ ɛ (x)dm(x) = ϕ(x)dm(x). lim ɛ 0 Applying (45) to the Lipschitz continuous map ϕ ɛ we have lim sup n + ϕ(x)dm n (x) lim sup n + ϕ ɛ (x)dm n (x) = Then, letting ɛ 0, we get lim sup n + ϕ(x)dm n (x) Applying the above inequality to ϕ also gives lim inf ϕ(x)dm n (x) n + ϕ(x)dm(x). ϕ(x)dm(x). ϕ ɛ (x)dm(x). So (45) holds for any ϕ C 0 b (). In these notes, we are mainly interested in two Monge-Kantorovich distances, d and d 2. The distance d 2, which is often called the Wasserstein distance, is particularly usefull when is a Euclidean or a Hilbert space. Its analysis will be the object of the next subsection. As for the distance d, which often takes the name of the Kantorovich-Rubinstein distance, we have already encountered it several times. Let us point out a very important equivalent representation: 35
36 Theorem 5.5 (Kantorovich-Rubinstein Theorem) For any m, m P (), { } d (m, m ) = sup f(x)dm(x) f(x)dm (x) where the supremum is taken over the set of all Lipschitz continuous maps f : R. Remark 5.6 In fact the above Kantorovich duality result holds for much more general costs (i.e., it is not necessary to minimize the power of a distance). The typical assertion in this framework is, for any lower semicontinuous map c : R + {+ }, the following equality holds: inf c(x, y)dγ(x, y) = sup f(x)dm(x) + g(y)dm (y), γ Π(m,m ) f,g where the supremum is taken over the maps f L m(), g L m () such that f(x) + g(y) c(x, y) for dm almost all x and dm almost all y. Ideas of proof of Theorem 5.5: The complete proof of this result exceeds the scope of these note and can be found in several text books (see [55] for instance). First note that, if f is Lipschitz continuous, then f(x) f(y) d(x, y) (x, y). Integrating this inequality over any measure γ Π(m, m ) gives f(x)dm(x) f(y)dm (y) d(x, y)dγ(x, y), so that, taking the infimum over γ and the supremum of f gives { } sup f(x)dm(x) f(x)dm (x) d (m, m ). The opposite inequality is much more subtle. We now assume that is compact and denote by M + ( 2 ) the set of all nonnegative Borel measures on. We first note that, for any γ M + ( 2 ) sup f,g C 0 () So d (m, m ) = f(x)dm(x)+ inf sup γ M( 2 ) f,g C 0 () { g(y)dm 0 if γ Π(m, m (y) (f(x)+g(y))dγ(x, y) = ) + otherwise (d(x, y) f(x) g(y))dγ(x, y)+ f(x)dm(x)+ g(y)dm (y) If we could use the min-max Theorem, then we would have d (m, m ) = sup inf (d(x, y) f(x) g(y))dγ(x, y)+ where inf γ M( 2 ) f,g C 0 () γ M( 2 ) f(x)dm(x)+ g(y)dm (y) { 0 if f(x) + g(y) d(x, y) x, y (d(x, y) f(x) g(y))dγ(x, y) = otherwise 36
37 So d (m, m ) = sup f(x)dm(x) + g(y)dm (y) f,g where the supremum is taken over the maps f, g C 0 () such that f(x) + g(y) d(x, y) holds for any x, y. Let us fix f, g C 0 () satisfying this inequality and set f(x) = min y [d(x, y) g(y)] for any x. Then, by definition, f is Lipschitz continuous, f f and f(x) + g(y) d(x, y). So f(x)dm(x) + g(y)dm (y) f(x)dm(x) + g(y)dm (y). We can play the same game by replacing g by g(y) = min x d(x, y) f(x), which is also Lipschitz continuous and satisfies g g and f(x) + g(y) d(x, y). But one easily checks that g(y) = f(y). So f(x)dm(x) + g(y)dm (y) f(x)dm(x) f(y)dm (y). Hence d (m, m ) sup f f(x)dm(x) f(y)dm (y) where the supremum is taken over the Lipschitz continuous maps f. formal proof of the result. This completes the End of the proof of Proposition 5.3 : It remains to show that, if the (m n ) are concentrated on a fixed compact subset K of and weakly converge to m, then the (m n ) converge to m in d p. Note that m(k) =, so that m is also concentrated on K. We now show that it is enough to do the proof for p = : indeed, if γ Π(m n, m), then γ(k K) = because m n and m are concentrated on K. Therefore d p (x, y)dγ(x, y) = d p (x, y)dγ(x, y) [diam(k)] p d(x, y)dγ(x, y) K K where diam(k) denotes the diameter of K, i.e., diam(k) = max x,y K d(x, y), which is bounded since K is compact. Setting C = [diam(k)] (p )/p, we get d p (m n, m) K K [ ] /p inf C d(x, y)dγ(x, y) C[d (m n, m)] /p γ Π(m n,m) K K and it is clearly enough to show that the right-hand side has a limit. In order to prove that (m n ) converge to m in d, we use Theorem 5.5 which implies that we just have to show that lim sup f(x)d(m n m)(x) = 0. n + Lip(f) K Note that can can take the supremum over the set of maps f such that f(x 0 ) = 0 (for some fixed point x 0 K). Now, by Ascoli Theorem, the set F of maps f such that f(x 0 ) = 0 and Lip(f) is compact. In particular, for any n, there is some f n F such that d (m n, m) = 37
38 K f n(x)d(m n m)(x). Let f F be a limit of a subsequence of the (f n ) (still denoted (f n )). Then, by uniform convergence of (f n ) to f and weak convergence of (m n ) to m, we have lim sup d (m n, m) = lim sup f n (x)d(m n m)(x) = 0, n n which proves that, for any converging subsequence of the precompact family (f n ) there is a subsequence of the (d (m n, m)) which converges to 0. This implies that the full sequence (d (m n, m)) converges to 0. In the case where = R d, we repeatedly use the following compactness criterium: Lemma 5.7 Let r p > 0 and K P p be such that sup µ K K R d x r dµ(x) < +. Then the set K is tight. If moreover r > p, then K is relatively compact for the d p distance. Note carefully that bounded subsets of P p are not relatively compact for the d k distance. For instance, in dimension d = and for p = 2, the sequence of measures µ n = n n δ 0 + n δ n 2 satisfies d 2 (µ n, δ 0 ) = for any n but µ n narrowly converges to δ 0. Proof of Lemma 5.7: Let ɛ > 0 and R > 0 sufficiently large. We have for any µ K: µ(r d x r \B R (0)) R r dµ(x) C R r < ɛ, R d \B R (0) where C = sup µ K R x r dµ(x) < +. So K is tight. d Let now (µ n ) be a sequence in K. From the previous step we know that (µ n ) is tight and therefore there is a subsequence, again denoted (µ n ), which narrowly converges to some µ. Let us prove that the convergence holds for the distance d p. Let R > 0 be large and let us set µ R n := Π BR (0) µ n and µ R := Π BR (0) µ, where Π BR (0) denotes the projection onto B R (0). Note that d p p(µ R n, µ n ) Π BR (0)(x) x p dµ n (x) x p dµ n (x) R d (B R (0)) c R r p x r dµ n (x) C (B R (0)) c R r p. C In the same way, p p(µ R C, µ). Let us fix ɛ > 0 and let us choose R such that (ɛ/3) p. R r p r p Since the µ R n have a support in the compact set B R (0) and weakly converge to µ R, Proposition 5.3 states that the sequence (µ R n ) converges to µ R for the distance d p. So we can choose n 0 large enough such that d p (µ R n, µ R ) ɛ/3 for n n 0. Then d p (µ n, µ) d p (µ R n, µ n ) + d p (µ R n, µ R ) + d p (µ R, µ) ɛ n n 0. 38
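The Kantorovich-Rubinstein formula of Theorem 5.5 is easy to test numerically on a finite set of points, where the supremum over Lipschitz maps becomes a linear program in the values $f(x_i)$. The sketch below (illustrative random data) compares the primal value $d_1$ with this dual value; the two coincide up to solver tolerance.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance

# Numerical check of the Kantorovich-Rubinstein formula on a finite subset of
# the real line: the primal value d_1(m, m') is compared with
#   sup { sum_x f(x)(m(x) - m'(x)) : |f(x) - f(y)| <= |x - y| },
# the supremum being a linear program in the values f(x).

rng = np.random.default_rng(1)
pts = np.sort(rng.uniform(0, 1, 8))
m = rng.dirichlet(np.ones(8))
mp = rng.dirichlet(np.ones(8))

# primal: d_1 between the two discrete measures (same support, different weights)
primal = wasserstein_distance(pts, pts, u_weights=m, v_weights=mp)

# dual: maximize (m - mp).f subject to f(x_i) - f(x_j) <= |x_i - x_j|
n = len(pts)
rows, rhs = [], []
for i in range(n):
    for j in range(n):
        if i != j:
            row = np.zeros(n)
            row[i], row[j] = 1.0, -1.0
            rows.append(row)
            rhs.append(abs(pts[i] - pts[j]))
res = linprog(-(m - mp), A_ub=np.array(rows), b_ub=np.array(rhs),
              bounds=(None, None), method="highs")
dual = -res.fun

print(f"primal d_1 = {primal:.6f}   dual value = {dual:.6f}")
```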
5.2 The Wasserstein space of probability measures on $\mathbb R^d$

From now on we work in $X=\mathbb R^d$. Let $P_2=P_2(\mathbb R^d)$ be the set of Borel probability measures on $\mathbb R^d$ with a finite second order moment: $m$ belongs to $P_2$ if $m$ is a Borel probability measure on $\mathbb R^d$ with $\int_{\mathbb R^d}|x|^2\,m(dx)<+\infty$. The Wasserstein distance is just the Monge-Kantorovich distance $d_p$ when $p=2$:
$$d_2(\mu,\nu)=\inf_{\gamma\in\Pi(\mu,\nu)}\left[\int_{\mathbb R^{2d}}|x-y|^2\,d\gamma(x,y)\right]^{1/2} \qquad (46)$$
where $\Pi(\mu,\nu)$ is the set of Borel probability measures on $\mathbb R^{2d}$ such that $\gamma(A\times\mathbb R^d)=\mu(A)$ and $\gamma(\mathbb R^d\times A)=\nu(A)$ for any Borel set $A\subset\mathbb R^d$.
An important point, that we shall use sometimes, is the fact that the optimal transport plan can be realized as an optimal transport map whenever $\mu$ is absolutely continuous.

Theorem 5.8 (Existence of an optimal transport map) If $\mu\in P_2$ is absolutely continuous, then, for any $\nu\in P_2$, there exists a convex map $\Phi:\mathbb R^d\to\mathbb R$ such that the measure $(\mathrm{id}_{\mathbb R^d},D\Phi)\sharp\mu$ is optimal for $d_2(\mu,\nu)$. In particular $\nu=D\Phi\sharp\mu$. Conversely, if a convex map $\Phi:\mathbb R^d\to\mathbb R$ satisfies $\nu=D\Phi\sharp\mu$, then the measure $(\mathrm{id}_{\mathbb R^d},D\Phi)\sharp\mu$ is optimal for $d_2(\mu,\nu)$.

The proof of this result, due to Y. Brenier [], exceeds the scope of these notes. It can be found in various places, such as [55].

5.3 Polynomials on P(Q)

Let $Q$ be a compact metric space and let us denote as usual by $P(Q)$ the set of probability measures on $Q$. We say that a map $P\in C^0(P(Q))$ is a monomial of degree $k$ if there are $k$ real-valued continuous maps $\varphi_i:Q\to\mathbb R$ ($i=1,\dots,k$) such that
$$P(m)=\prod_{i=1}^k\int_Q\varphi_i(x)\,dm(x) \qquad \forall m\in P(Q).$$
If $Q$ is a compact subset of $\mathbb R^d$, it is often convenient to also assume that the maps $\varphi_i$ are $C^1$. Note that the product of two monomials is still a monomial. Hence the set of polynomials, i.e., the set of finite linear combinations of monomials, is a subalgebra of $C^0(P(Q))$. It contains the unity: $P(m)=1$ for all $m\in P(Q)$ (choose $\varphi_1\equiv1$). It also separates points: indeed, if $m_1,m_2\in P(Q)$ are distinct, then there is some smooth map $\varphi:\mathbb R^d\to\mathbb R$ with compact support such that $\int_Q\varphi(x)\,dm_1(x)\ne\int_Q\varphi(x)\,dm_2(x)$. Then the monomial $P(m)=\int_Q\varphi(x)\,dm(x)$ separates $m_1$ and $m_2$. Using the Stone-Weierstrass Theorem we have proved the following:

Proposition 5.9 The set of polynomials is dense in $C^0(P(Q))$.

5.4 Hewitt and Savage Theorem

We now investigate the asymptotic behavior of symmetric measures in a large number of variables. Let us fix a compact metric space $Q$. We say that a measure $\mu$ on $Q^k$ (where $k\in\mathbb N^*$) is symmetric if, for any permutation $\sigma$ on $\{1,\dots,k\}$, $\pi_\sigma\sharp\mu=\mu$, where $\pi_\sigma(x_1,\dots,x_k)=(x_{\sigma(1)},\dots,x_{\sigma(k)})$.
40 For any k, Let m k be a symmetric measure on Q k and let us set, for any n < k, m k n = dm k (x n+,..., x n ). Q n k Then, from a diagonal argument, we can find a subsequence k + such that (m k n ) has a limit m n as k + for any n 0. Note that the m n are still symmetric and satisfies Q dm n+(x n+ ) = m n for any n. Hewitt and Savage describes the structure of such sequence of measures. Theorem 5.0 (Hewitt and Savage) Let (m n ) be a sequence of symmetric probability measures on Q n such that Q dm n+(x n+ ) = m n for any n. Then there is a probability measure µ on P(Q) such that, for any continuous map f C 0 (P(Q)), Moreover lim f n + Q n ( n ) n δ xi dm n (x,..., x n ) = i= m n (A A n ) = P(Q) for any n N and any Borel sets A,..., A n Q. P(Q) f(m)dµ(m). (47) m(a )... m(a n )dµ(m) (48) Remark 5. An important case is when the measure m n = n i= m 0, where m 0 P(Q). Then, because of (48), the limit measure has to be δ m0. In particular, for any continuous map f C 0 (P(Q)), (47) becomes lim f n + Q n ( n ) n δ xi dm n (x,..., x n ) = f(m 0 ). In particular, if d is the Kantorovich-Rubinstein distance on P(Q), then ( ) n lim d δ xi, m 0 dm n (x,..., x n ) = 0. n + Q n n i= i= Remark 5.2 (Probabilistic interpretation of the Hewitt and Savage Theorem) The above result is strongly related with De Finetti s Theorem (see for instance [38]). Let (Ω, A, P) be a probability space and ( k ) a sequence of random variables with values in Q. The sequence ( k ) is said to be exchangeable if for all n N, the law of ( σ(),..., σ(n) ) is the same as the law of (,..., n ) for any permutation σ of {,..., n}. For instance, if the ( n ) are iid, then the sequence is exchangeable. De Finetti s Theorem states that there is a σ algebra F conditional to which the ( i ) are iid: namely n P [ A,..., n A n F ] = P [ i A i F ] for any n N and any Borel sets A,..., A n Q. i= 40
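Here is a hedged Monte Carlo illustration of the Hewitt and Savage Theorem above and of its probabilistic interpretation, on the two-point space $Q=\{0,1\}$ (the mixture model, the function $f$ and all sample sizes are our own illustrative choices): an exchangeable sequence is produced by first drawing a Bernoulli parameter $p\in\{0.2,0.8\}$ and then sampling i.i.d. coin flips, and the average of $f(\text{empirical measure})$ is seen to approach $\int_{P(Q)}f\,d\mu$, where $\mu$ charges the two Bernoulli laws with weight $1/2$ each.

```python
import numpy as np

# Q = {0,1}.  Exchangeable sequence = mixture of iid Bernoulli(p), p in {0.2, 0.8}
# with equal probability.  A probability measure on Q is identified with its
# mass at 1, so the empirical measure of (X_1,...,X_n) is (number of 1's)/n,
# and the mixing measure mu puts weight 1/2 on Bern(0.2) and 1/2 on Bern(0.8).

rng = np.random.default_rng(3)
f = lambda mass_at_one: mass_at_one ** 2     # a continuous function on P({0,1})

def average_of_f(n, n_draws=20_000):
    p = rng.choice([0.2, 0.8], size=n_draws)  # the "hidden" mixing variable
    emp = rng.binomial(n, p) / n              # mass at 1 of the empirical measure
    return f(emp).mean()

target = 0.5 * f(0.2) + 0.5 * f(0.8)          # = int f dmu
for n in (10, 100, 1000):
    print(f"n = {n:5d}   E[f(empirical)] = {average_of_f(n):.4f}   limit = {target:.4f}")
```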
41 Proof of Theorem 5.0: For any n let us define the linear functional L n (C 0 (P(Q))) by L n (P ) = P ( n δ yi )m n (dy,..., dy n ) P C 0 (P(Q)). Q n n i= We want to show that L n has a limit as n +. Since the L n are obviously uniformly bounded, it is enough to show that L n (P ) has a limit for any map P of the form P (m) = φ(x,..., x j )dm(x )... dm(x j ) (49) Q j where φ : Q j R is continuous, because such class of functions contain the monomials defined in subsection 5.3, and the set of resulting polynomials is dense in C 0 (P(Q)). Note that, for any n j and any y,..., y n Q, P ( n δ yi ) = n n j φ(y i,..., y ij ) i= (i,...,i j ) where the sum is taken over the (i,..., i j ) {,..., n} j. So L n (P ) = n j φ(y i,..., y ij )m n (dy,..., dy n ) Q n i,...,i j Since m n is symmetric and satisfies Q dm n j n (x j+,..., x n ) = m j, if i,..., i j are distinct we have φ(y i,..., y ij )m n (dy,..., dy n ) = Q n φ(y,..., y j )dm j (x,..., x j ). Q j On another hand n! {(i,..., i j ), i,..., i j distinct} = (n j)! n + n j, so that lim L n(p ) = φ(y,..., y j )dm j (x,..., x j ). n + Q j This prove the existence of a limit L of L n as n +. Note that L (C 0 (P(Q))), that L is nonegative and that L() =. By Riesz representation Theorem there is a unique Borel measure µ on P(Q) such that L(P ) = P(Q) P (m)dµ(m). It remains to show that the measure µ satisfies relation (48). Let P be again defined by (49). We have already proved that L(P ) = φ(y,..., y j )dm j (x,..., x j ) = P (m)µ(dm) Q j P(Q) where P(Q) P (m)µ(dm) = P(Q) ( ) φ(x,..., x j )dm(x )... dm(x j ) µ(dm) Q j Let now A,..., A j be closed subsets of Q. We can find a nonincreasing sequence (φ k ) of continuous functions on R j which converges to A (x )... Aj (x j ). This gives (48) for any closed subsets A,..., A j of Q, and therefore for any Borel measurable subset of A,..., A j of Q. The fact that we are working on a compact set plays little role and this assumption can be removed, as we show in a particular case. 4
42 Corollary 5.3 Let m 0 be probability measure on a Polish space with a first order moment (i.e., m 0 P ()) and let m n = n i= m 0 be the law on N of n iid random variables with law m 0. Then, for any Lipschitz continuous map f C 0 (P ()), lim n + n f ( n ) n δ xi dm n (x,..., x n ) = f(m 0 ). i= Proof : For ɛ > 0 let K ɛ be a compact subset of such that µ 0 (K ɛ ) ɛ. We also choose K ɛ in such a way that, for some fixed x, \K ɛ d(x, x)dm 0 (x) ɛ. Without loss of generality we can suppose that x K ɛ. Let us now denote by π the map defined by π(x) = x if x K ɛ, π(x) = x otherwise, and set m ɛ = π m 0 and m ɛ n = n i= m ɛ. Note that by definition m ɛ n = (π,..., π) m n. Since m ɛ is concentrated on a compact set, we have, from Theorem 5.0, ( ) n lim δ xi dm ɛ n + n n(x,..., x n ) = f(m ɛ ). n f i= On the other hand, using the Lipschitz continuity of f, one has for any n: ( ) n ( ) ( f δ xi d(m ɛ n m n ) n n i= n f n n δ xi f δ n n π(xi )) dm n i= i= ( ) n Lip(f) d δ xi, n δ n n n π(xi ) dm n i= i= Lip(f) d(x, x) dm 0 (x) Lip(f)ɛ \K ɛ In the same way, f(m 0 ) f(m ɛ ) Lip(f)d (m 0, m ɛ ) Lip(f)ɛ. Combining the above inequalities easy gives the result. Another consequence of the Hewitt and Savage Theorem is: Theorem 5.4 Let Q be compact and u n : Q n R be symmetric and converge to U : P(Q) R in the sense of Theorem 2.: lim n + sup u n () U(m n ) = 0 Q n and (m n ) be a sequence of symmetric probability measures on Q n such that Q dm n+(x n+ ) = m n for all n and µ be the associate probability measure on P(Q) as in the Hewitt and Savage Theorem. Then lim n + Q n u n (x,..., x n )dm n (x,..., x n ) = P(Q) U(m)dµ(m). Proof : From the convergence of u n to U we have lim n + u n (x,..., x n )dm n (x,..., x n ) U(m n x,...,x n )dm n (x,..., x n ) = 0, Q n Q n while, since U is continuous, Hewitt and Savage Theorem states that lim U(m n n + x,...,x n )dm n (x,..., x n ) = U(m)dµ(m). Q n P(Q) Combining these two relations gives the result. 42
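As a sanity check of the corollary just proved (a hedged illustration with our own choices: $m_0$ is the standard Gaussian on the real line and the sample sizes are arbitrary), one can estimate numerically how fast $d_1$ between the empirical measure of $n$ i.i.d. samples and $m_0$ goes to zero; here $d_1$ is approximated by comparing the empirical measure with one large reference sample of $m_0$, which is a shortcut rather than the exact distance to $m_0$ itself.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Monte Carlo illustration: for X_1,...,X_n iid with law m_0, the
# Kantorovich-Rubinstein distance between (1/n) sum delta_{X_i} and m_0
# tends to 0, so E[f(empirical)] -> f(m_0) for f Lipschitz.
rng = np.random.default_rng(4)
reference = rng.normal(size=500_000)       # stands in for m_0 (shortcut)

for n in (10, 100, 1000, 10_000):
    dists = [wasserstein_distance(rng.normal(size=n), reference) for _ in range(20)]
    print(f"n = {n:6d}   E[ d_1(empirical, m_0) ] ~ {np.mean(dists):.4f}")
```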
5.5 Comments

The study of optimal transport and of the Monge-Kantorovich distances is probably one of the most dynamic areas in analysis of these last two decades. The applications of this analysis are numerous, from probability theory to P.D.E.s and to geometry. The first two subsections of this part rely on Villani's monographs [55, 56] and on the monograph by Ambrosio, Gigli and Savaré [5]. The definition of polynomials on $P(Q)$ comes from [47], as well as the proof of the Hewitt and Savage Theorem (see also the original reference by Hewitt and Savage [3] and Kingman [38] for a survey on exchangeability).

6 Hamilton-Jacobi equations in the space of probability measures

We are now interested in the analysis of Hamilton-Jacobi equations in the space of measures. As we shall see in section 7, such equations provide the right framework for the study of limits of large systems of Hamilton-Jacobi equations in finite dimensional spaces. The first part of this section is devoted to the notion of derivative in the Wasserstein space. The study of the Hamilton-Jacobi equations comes as a byproduct.

6.1 Derivative in the Wasserstein space

The key idea of this section is to see probability measures on $\mathbb R^d$ as laws of $\mathbb R^d$-valued random variables on some probability space $(\Omega,\mathcal A,\mathbb P)$, and to use the vector structure of the set of random variables to define derivatives.

Let $(\Omega,\mathcal A,\mathbb P)$ be a probability space, where $\Omega$ is a Polish space, $\mathcal A$ the Borel $\sigma$-algebra and $\mathbb P$ an atomless Borel probability measure on $(\Omega,\mathcal A)$. If $X$ is a random variable on $(\Omega,\mathcal A,\mathbb P)$ we denote by $\mathcal L(X)$ the law of $X$. Recall that, for any integer $k$ and any probability measure $m\in P_2(\mathbb R^k)$, there is some random vector $X$ with values in $\mathbb R^k$ such that $\mathcal L(X)=m$.

Let $L^2(\Omega)$ be the set of random variables $X$ such that $\mathbb E[|X|^2]<+\infty$. If $X\in L^2(\Omega)$, we denote by $\|X\|_2$ its norm. Note that $X$ belongs to $L^2(\Omega)$ if and only if $\mathcal L(X)$ belongs to the Wasserstein space $P_2$. It is an easy exercise to show that $d_2$ can be realized on the probability space $(\Omega,\mathcal A,\mathbb P)$ as follows:
$$d_2(m_1,m_2)=\inf\{\|X_1-X_2\|_2,\ \mathcal L(X_1)=m_1,\ \mathcal L(X_2)=m_2\} \qquad \forall m_1,m_2\in P_2.$$
Let $u:P_2\to\mathbb R$. We denote by $U$ its extension to $L^2(\Omega)$ defined by
$$U[X]=u(\mathcal L(X)) \qquad \forall X\in L^2(\Omega). \qquad (50)$$
Note that the map $U[\cdot]$ has the very particular property of depending only on the law of $X$.

Example: If $u:P_2\to\mathbb R$ is the monomial
$$u(m)=\prod_{i=1}^k\int_{\mathbb R^d}\varphi_i(x)\,dm(x) \qquad \forall m\in P_2, \qquad (5)$$
where the $\varphi_i\in C^1_c(\mathbb R^d)$, then the associated function $U$ is just
$$U[X]=\prod_{i=1}^k\mathbb E\left[\varphi_i(X)\right] \qquad \forall X\in L^2(\Omega).$$
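To see the lifting (50) at work, here is a small numerical sketch (our own illustrative functions and samples, with $d=1$, and with $\Omega$ replaced by a large finite sample so that expectations are empirical means): it compares a finite-difference directional derivative of $U$ at $X$ with the product-rule expression $\sum_i\big(\prod_{j\ne i}\mathbb E[\varphi_j(X)]\big)\varphi_i'(X)$ tested against the direction, which matches the formula for the gradient of a monomial given in the example further below.

```python
import numpy as np

# Sketch of the lifting: u(m) = prod_i int phi_i dm is lifted to
# U[X] = prod_i E[phi_i(X)] on L^2(Omega).  With a fixed sample playing the
# role of Omega, we compare a finite-difference directional derivative of U
# at X in the direction Y with the product-rule candidate for DU[X].
# phi_1, phi_2 and the samples are illustrative choices (d = 1).

rng = np.random.default_rng(5)
X = rng.normal(size=100_000)                   # a random variable in L^2(Omega)
Y = np.sin(X) + 0.5 * rng.normal(size=X.size)  # an L^2 direction

phis = [np.tanh, np.sin]
dphis = [lambda x: 1.0 / np.cosh(x) ** 2, np.cos]

def U(Z):
    return np.prod([phi(Z).mean() for phi in phis])

# finite-difference Gateaux derivative of U at X in the direction Y
h = 1e-5
fd = (U(X + h * Y) - U(X - h * Y)) / (2 * h)

# product-rule expression sum_i (prod_{j != i} E[phi_j(X)]) phi_i'(X), tested on Y
means = [phi(X).mean() for phi in phis]
DU = sum(np.prod([means[j] for j in range(len(phis)) if j != i]) * dphis[i](X)
         for i in range(len(phis)))
print("finite difference:", fd, "   E[<DU[X], Y>]:", np.mean(DU * Y))
```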
44 Definition 6. We say that u is differentiable at m 0 P 2 if there is 0 L 2 (Ω) such that L( 0 ) = m 0 and U is Frechet differentiable at 0. We say that u is of class C in a neighborhood of m 0 P 2 if there is 0 L 2 (Ω) such that L( 0 ) = m 0 and U is of class C in a neighborhood of 0. Let us identify L 2 (Ω) with its dual space. If U [ 0 ] exists, one define the gradient DU[ 0 ] of U[ ] at 0 by U [ 0 ](Y ) = E( DU[ 0 ], Y ) Y L 2 (Ω). Theorem 6.2 (Law of the gradient) Let u : P 2 R and U be defined by (50). If u is differentiable at m 0 P, then for any L 2 (Ω) such that L() = m 0, U[ ] is differentiable at and the law of DU[] does not depend on. Remark 6.3 In fact the proof of the Theorem also applies if one change the probability space. So the definition is really intrinsic. Example : For instance a monomial u of the form (5) is of class C on P 2, because k DU[] = E [ϕ j ()] Dφ i () L 2 (Ω). j k i= Hence D m u(m)(x) = k ϕ j (y)dm(y) Dφ i (x) x R d, m P 2. j k R d i= In the proof of Theorem 6.2 we use the following technical result, proved below. Here we need the fact that Ω is an atomless Polish space: Lemma 6.4 Let, Y L 2 (Ω) with the same law. Then, for any ɛ > 0, there is a bijective map τ : Ω Ω such that τ and τ are measurable and measure preserving and Y τ ɛ. Proof of Theorem 6.2: Since u is differentiable, there is some 0 L 2 (Ω) such that L( 0 ) = m 0 and U is differentiable at 0. Let L 2 (Ω) such that L() = m 0. Since and 0 have the same law, Lemma 6.4 states that, for any h > 0 there is a bijective map τ h : Ω Ω such that τ h and τ h are measurable and measure preserving and 0 τ h h. In particular 0 τ h 2 h. Since U depends only on the law of the random variable and since it is differentiable at 0, we have U[ + H] = U[ τ h + H τ h ] = U[ 0 ] + E [ DU[ 0 ], τ h + H τ h 0 ] + τ h + H τ h 2 ɛ( τ h + H τ h ) = U[] + E [ DU[ 0 ] τ h, + H 0 τ h ] + τ h + H τ h 2 ɛ( τ h + H τ h ) (52) Let us show that (DU[ 0 ] τ h ) is a Cauchy sequence as h 0. Indeed, if we fix ɛ > 0 and choose δ (0, ɛ) such that ɛ(h) ɛ if H 2 2δ, then for any h, h (0, δ 2 ) and any H L 2 (Ω) with H 2 δ, we have, thanks to (52) E [ DU[ 0 ] τ h DU[ 0 ] τ h, H ] 2 DU[ 0 ] 2 δ 2 + [δ + δ 2 ]ɛ, 44
45 which gives, for a suitable choice of H, DU[0 ] τ h DU[ 0 ] τ 2 h 2 DU[ 0 ] 2 δ + [ + δ]ɛ Cɛ. Let us denote by Z the limit of DU[ 0 ] τ h as h 0. Since τ h is measure preserving, the DU[ 0 ] τ h, and therefore Z, have the same law as DU[ 0]. Letting finally h 0 in (52) gives U[ + H] U[] E [ Z, H ] H 2 ɛ for any ɛ > 0 and any H with H 2 sufficiently small. This proves that DU[] exists and is equal to Z. Proof of Lemma 6.4 : Let us cover R d with an enumerable family (A n ) of Borel sets with a diameter less than ɛ. Let B n = (A n ) and B n = Y (A n ). Then (B n ) and (B n) form two measurable partitions of Ω. Since and Y have the same law, P(B n ) = P(B n). Hence, Ω being an atomless Polish space, it is well-known that there is bijection τ n : B n B n such that τ n and τn are measurable and measure preserving. If we set τ = n τ n Bn, then τ is a bijection of Ω and τ and τ preserve the measure. Moreover, Y τ ɛ. Theorem 6.5 (Structure of the gradient) Let u : P 2 R be of classe C, µ P 2 and L 2 (Ω) be such that L() = µ. Then there is some ξ L 2 µ(r d, R d ) such that DU[] = ξ() P a.s. Proof : Let us first assume that µ is absolutely continuous with respect to the Lebesgue measure and satisfies x 6 dµ(x) < +. (53) R d For ɛ, α > 0 we look at the perturbed problem min Y L 4 (Ω) U[Y ] + 2ɛ E[ Y 2 ] + αe[ Y 4 ] (54) Let us first show that minimum exists: let (Z n ) be a minimizing sequence, ν n = L(Z n ). Since µ is absolutely continuous, Theorem 5.8 states that there is some convex map ψ n such that ν n = ψ n µ. Let Y n = ψ n (). Then L(Y n ) = ν n and d 2 (µ, ν n ) = Y n 2. Note that (Y n ) is also a minimizing sequence because U[Y n ] + 2ɛ E[ Y n 2 ] + αe[ Y n 4 ] U[Z n ] + 2ɛ E[ Z n 2 ] + αe[ Z n 4 ] since U[ ] and E[ 4 ] depend only of the law of the random variable, and since, by definition, E[ Y n 2 ] E[ Z n 2 ]. Note that, since U[ ] is differentiable at, it is locally bounded in a neighborhood of. So, for ɛ > 0 sufficiently small, sup n x 4 dν n (x) = sup R d n E [ Y n 4] < +. In particular the sequence (ν n ) is tight. Let us still denote by (ν n ) a converging subsequence and let ν be its limit. Then Lemma 5.7 states that (ν n ) converges to ν in P 2. Therefore lim n U[Y n ] = lim n u(ν n ) = u(ν) and lim n E[ Y n 2 ] = lim n d 2 2(µ, ν n ) = d 2 2(µ, ν) 45
46 while, by convexity, lim inf n E[ Y n 4 ] = lim inf n x 4 dν n (x) x 4 dν(x). R d R d Since µ is absolutely continuous, there is a convex map ψ α,ɛ : R d R d such that ν = Dψ α,ɛ µ and d 2 2 (µ, ν) = R d Dψ α,ɛ (x) x 2 dµ(x). Then Y α,ɛ = Dψ α,ɛ () is a minimizer in (54). Since U is everywhere differentiable and Y α,ɛ = Dψ α,ɛ () is a minimizer in (54), we have, for any Z L (Ω), U [Y α,ɛ ](Z) + ɛ E[ (Y α,ɛ ), Z ] + 4αE[ Y α,ɛ 2 Y α,ɛ, Z ] = 0. Since Z is dense in L 2 (Ω) and, Y α,ɛ, DU[Y α,ɛ ] L 2 (Ω), the above equality implies that Y α,ɛ 2 Y α,ɛ L 2 (Ω) and that DU[Y α,ɛ ] = ɛ (Y α,ɛ ) 4α Y α,ɛ 2 Y α,ɛ. In particular DU[Y α,ɛ ] σ(). But σ() is closed in L 2 (Ω) and (Y α,ɛ ) converges to in L 2 (Ω) as ɛ 0 because of (53). So DU[] σ() thanks to the continuity of DU. Next we remove the assumption (53) but still assume that is absolutely continuous. We note that n = n/(n+ ) converges to in L 2 (Ω) and is absolutely continuous. On another hand, since E[ n 4 ] < +, we have from the previous step that DU[ n ] σ( n ) σ(). Letting n + we get that DU[] σ(). Finally we remove the absolute continuity assumption. We can extend the probability space Ω in such a way that there are two a Gaussian variables W and W 2 such that, W and W 2 are independent. Then n := + W i /n (for i =, 2) is absolutely continuous and therefore DU[ n ] σ(, W i ). Letting n + gives that DU[] σ(, W i ) for i =, 2. Since σ(, W ) σ(, W 2 ) = σ(), we get the result. Recall that a random variable Y R d is adapted if and only if there exists a Lebesgue measurable map ξ : R d R d such that Y = ξ() P a.s. (cf. Billingsley Probability and Measure, Theorem 20.). So there is some measurable map ξ such that DU[] = ξ(). Note that ξ is defined µ a.e. and belongs to L 2 µ because DU[] = ξ() L 2 (Ω). 6.2 First order Hamilton-Jacobi equations We consider equations of the form u t (m, t) + H(m, t, D mu(m, t)) = 0 in P 2 (R d ) [0, T ] (55) where H = H(m, t, ξ) is defined from (m, t) P 2 [0, T ] and ξ L 2 m(r d, R d ). Definition 6.6 We say that a map u is a (sub, super) solution of the HJ equation (55) if the map U : L 2 (Ω, R d ) R defined by is a (sub, super) solution of the HJ equation U[] = u(l()) L 2 (Ω, R d ) U t (, t) + H(, t, DU(, t)) = 0 in L 2 (Ω, R d ) [0, T ] (56) where H : L 2 (Ω) [0, T ] L 2 (Ω, R d ) R coincides with H is the sense that H(, t, ξ()) = H(L(), t, ξ) (, t) L 2 (Ω) [0, T ], ξ L 2 L() (Rd, R d ). 46
47 Let us recall that the definition of a viscosity solution in a Hilbert space does not differ from the usual one: Definition 6.7 We say that U is a subsolution of the HJ equation (56) if, for any test function φ C (L 2 ) such that the map U φ has a local maximum at ( 0, t 0 ) L 2 (0, T ] one has φ t ( 0, t 0 ) + H( 0, t 0, DU( 0, t 0 )) 0 in L 2 (Ω, R d ) [0, T ] In a symmetric way U is a supersolution of the HJ equation (56) if, for any test function φ C (L 2 ) such that the map U φ has a local minimum at ( 0, t 0 ) L 2 (0, T ] one has φ t ( 0, t 0 ) + H( 0, t 0, DU( 0, t 0 )) 0 in L 2 (Ω, R d ) [0, T ] Finally U is a solution of (56) if U is a sub and a super solution (56). To illustrate the powerful aspect of the approach described above, let us give some ideas on the analysis to the toy model of the Eikonal equation: u t + 2 D mu 2 = 0 in P 2 (R d ) [0, T ] (57) The main point is the following comparison Theorem: Theorem 6.8 Let u, u 2 : P 2 R be two uniformly continuous, bounded maps such that u is a subsolution of (57) and u 2 is a supersolution of (57) with u u 2 at t = 0. Then u u 2. Remark 6.9 In fact, much more is said in Lions lectures [47] about this equation, that we do not develop here for lack of time: for instance, one can show the following representation formula: { u(m, t) = inf u 0 (m ) + } m P 2 t d2 2(m, m ) (m, t) P 2 (0, T ]. One can also prove the following stability property: for N, let u N be the solution to { t u N + N N 2 i= D x i u N 2 = 0 in R Nd (0, T ) u N (x,..., x N, 0) = u N 0 (x,..., x N ) in R Nd If u N 0 converges to some u 0 : P 2 R is the sense of Theorem 2. (with d 2 replacing d ), then u N converge to the unique solution of the Eikonal equation (57) with initial condition u 0. Proof : The proof is a direct application of [6]. Let U i [, t] = u i (L(), t). Then U i are uniformly continuous and bounded on L 2, U is a subsolution of equation U t (, t) + DU(, t) 2 2 = 0 in L 2 (Ω, R d ) [0, T ] (58) while U 2 is a supersolution of (58). Let ɛ, σ, δ, α (0, ) to be choose later and look at the map Φ(, t, Y, s) = U [, t] U 2 [Y, s] 2ɛ (, t) (Y, s) 2 2 α 2 ( Y 2 2) σt From Stegall Theorem there are ξ s, ξ t R, ξ, ξ Y L 2 such that ξ s, ξ t, ξ 2, ξ Y 2 δ and the map (, t, Y, s) Φ(, t, Y, s) ξ, ξ Y, Y ξ t t ξ s s has a maximum at a point, t, Ȳ, s. 47
48 Since the U i are bounded by some M and Φ(, t, Ȳ, s) ξ, ξ Y, Y ξ t t ξ s s Φ(0, 0, 0, 0) we have α( Ȳ 2 2) + ɛ (, t) (Ȳ, s) 2 2 C( + δ( Ȳ 2 2) 2 ) where C only depends on M. So ( ( Ȳ 2 2) δ 2 C α + ) α (59) and (, t) (Ȳ, s) 2 C ɛ ( + δ ( δ α + ) ) 2 α. (60) Let us now assume for a while that t > 0 and s > 0. Since the map (, t) Φ(, t, Ȳ, s) ξ, ξ Y, Ȳ ξ tt ξ s s has a maximum at (, t) and U is a subsolution, we have ξ t + σ + t s + ɛ ξ + α + Ȳ 2 ɛ 0. In the same way, since the map (Y, s) Φ(, t, Y, s) ξ, ξ Y, Y ξ t t ξ s s has a maximum at (Ȳ, s) and U 2 is a supersolution, we have ξ s + t s + ɛ ξ Y αȳ + Ȳ 2 ɛ 0. Computing the difference in the two inequalities gives σ + ξ t + ξ s + ξ + α + Ȳ 2 ɛ ξ Y αȳ + Ȳ ɛ Hence σ 2δ 2( ξ 2 + ξ Y 2 + α( 2 + Ȳ 2)) Ȳ ɛ ( ξ Y 2 + α Ȳ 2) Using estimates (59) and (60) and the fact that ɛ we get σ 2δ C(δ + ( α) + ( δ δ α + ) ) 2 (δ + α) 2 0. α If we let first δ 0, and then α 0 we get a contradiction because σ > 0. So we can choose δ and α small enough (independently of ɛ) such that either t = 0 or s = 0. Let us assume to fix the ideas that t = 0. Let ω be a modulus of continuity of U and U 2. Then for any (, t), Φ(, t,, t) (ξ + ξ Y ), (ξ t + ξ s )t Φ(, 0, Ȳ, s) ξ, ξ Y, Ȳ ξ s s where, from (59) Φ(, 0, Ȳ (, s) U [, 0] U 2 [Ȳ, s] U 2[, 0] U 2 [Ȳ, s] ω( (, 0) (Ȳ, s) 2) ω C ( ɛ + ( δ δ α + ) )) 2 α 48
49 while, from (60), ( ξ, ξ Y, Ȳ ξ s s Cδ + δ α + ) α. So, letting ɛ 0, and then δ 0 and α 0, we get U [, t] U 2 [, t] Comments The approach of derivatives in the Wasserstein space through L 2 (Ω) exposed here goes back to Lions [47]. Lemma 6.4 is borrowed from [36]. There are several alternative definitions in the literature, which are often more direct and more intrisic: see in particular the monograph by Ambrosio, Gigli and Savaré [5] and the references therein. The link between all these approaches remains to be done. In particular, following [5], the tangent space T µ P 2 to P 2 at a measure µ P 2 can be defined as as the closure, in L 2 µ(r d, R d ), of the set of gradients of smooth, compactly supported functions: T µ P 2 = {Dφ, φ C c } L2µ. It is reasonable to expect that the derivative D m u of a differentiable map m : P 2 R belongs to T µ P 2, at least if µ is absolutely continuous with respect to the Lebesgue measure. Here is a definition of derivative introduced in [5]: if u : P 2 R we denote by D u(µ) the subdifferential of u at µ, which is the set of ξ T µ P 2 such that u(ν) u(µ) sup γ Π opt(µ,ν) R d R d ξ(x), x y dγ(x, y) + o(d 2 (µ, ν)) ν P 2, where Π opt (µ, ν) is the set of optimal plans from µ to ν. One easily checks that it is a closed convex subset of T µ P 2. The superdifferential D + u(µ) is defined by D + u(µ) = D ( u)(µ). One can prove that, if D u(µ) and D + u(µ) are both nonempty, then D u(µ) and D + u(µ) coincide and are reduced to a singleton {ξ}. So it is natural to call this element the derivative of u. Once introduced the identification between measure and random variables, it is tempting to work in the Hilbert space L 2 (Ω) instead of work in the metric space P 2 : in particular this approach of Hamilton-Jacobi equation (again coming back to Lions [47]) allows to use the various tools developed for first [6, 7, 8] and second order [48, 49, 50] HJ equations in infinite space dimension. This is however not the only possible definition of such HJ equations: other approaches, more intrisic, can be found in Cardaliaguet and Quincampoix [3], Feng and Katsoulakis [23] and Gangbo, Nguyen and Tudorascu [25]. Lions describes also in the lectures [47] how to handle second order HJ in the Wasserstein space, how to pass to the limit in the equations, etc... 7 Heuristic derivation of the mean field equation In this section we explain how the mean field equation can be at least heuristically derived from a large system of HJ equations arising when one consider Nash equilibria in feedback form for many players. This part is entirely borrowed from Lions s lectures [47]. 49
50 7. The differential game We consider a differential game involving N players in R d. The position of player i at time t is denoted by x i (t). Each player is controlling its velocity by using some control α i (t). Equation of motion is therefore just x i(t) = α i (t). Player i aims at minimizing a cost of the form J i (x, t, (α j ) j ) = T t L i (x (s),..., x N (s), α i (s))ds + g i (x (T ),..., x N (T )). We will work under the following structure conditions on L i and F i. Roughly speaking one only needs these function to be symmetric. However to simplify the notations we assume that L i (x,..., x N, α) = 2 α 2 + F δ xj N where F : P 2 R is continuous, and g i (x,..., x N ) = g x i, N where g : R d P 2 R is continuous. We assume that a smooth, symmetric Nash equilibrium in feedback form exists for this game. More precisely, we assume that there is a map U N : R d [0, T ] (R d ) (N ) R such that satisfies the system of HJ equations: d dt U N i t + 2 Dxi U N i j i j i δ xj U n i (x i, t, (x j ) j i ) = U N (x i, t, (x j ) j i ) 2 F N j i δ xj + j i D xj U N j, D xj U N i = 0 with Ui N = g i at t = T. We claim that the family of feedbacks (ᾱ i (x, t) = D xi Ui N (x, t)) provides a Nash equilibrium for the game. Indeed let us fix an initial condition ( x, t) R dn [0, T ) and some i and let us assume that Player i deviates and uses the time measurable control α i : [t, T ] R d instead of ᾱ i. Then [ T ] Ui N (x(t), t) L i (x(s), s, α(s))ds + D xi U N, α i + xj Ui j i D N, ᾱ j + 2 α i 2 + F t = U N i t U N i t 2 D x i U N 2 j i D xj U N i, D xj U N j + F = 0 with an equality everywhere if α i = ᾱ i. Integrating these inequalities over [ t, T ] gives [ T ] g i (x(t )) Ui N ( x, t) L i (x(s), s, α(s))ds 0, which means that t J i (x, t, α i, (ᾱ j ) j i ) U N i (x, t) = J i (x, t, (ᾱ j )). This proves that (ᾱ j ) is a Nash equilibrium of the game. 50
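As a very rough illustration of the Nash-equilibrium structure just used (and only of that structure: this is a static, one-shot analogue with a single displacement per player, not the differential game above; the interaction kernel, the number of players and the discretization are our own choices), one can compute an approximate equilibrium by iterated best responses when each player pays a quadratic moving cost plus an aversion to the crowd formed by the other players.

```python
import numpy as np

# Toy static analogue of the N-player game: player i, starting from x_i,
# chooses a displacement a_i and pays
#   0.5*a_i^2 + F(x_i + a_i, (1/(N-1)) sum_{j != i} delta_{x_j + a_j}),
# with F(y, m) = int exp(-(y-z)^2/0.08) dm(z)  (an aversion to crowded spots).
# If the best-response sweeps stabilize, the resulting profile is an
# (approximate) Nash equilibrium in pure strategies.

rng = np.random.default_rng(6)
N = 60
x0 = rng.normal(0.0, 0.3, size=N)              # initial positions
grid = np.linspace(-2.0, 2.0, 401)             # candidate displacements

def F(y, others):                              # y: array of candidate positions
    return np.mean(np.exp(-(y[:, None] - others[None, :]) ** 2 / 0.08), axis=1)

a = np.zeros(N)                                # current displacement profile
for sweep in range(50):
    moved = 0.0
    for i in range(N):
        others = np.delete(x0 + a, i)          # final positions of the other players
        costs = 0.5 * grid ** 2 + F(x0[i] + grid, others)
        best = grid[np.argmin(costs)]
        moved = max(moved, abs(best - a[i]))
        a[i] = best
    if moved < 1e-12:
        break

print(f"best-response sweeps used: {sweep + 1},  spread of final positions: "
      f"{np.std(x0 + a):.3f} (initial spread {np.std(x0):.3f})")
```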
51 7.2 Derivation of the equation in P 2 We now assume that the U N satisfy the estimes sup D x,tu N (x, t, (x j )) C x,t,(x j ) j 2 and sup x,t,(x j ) j 2 D xj U N (x, t, (x j )) C N for j 2. Under these condition, and up to a subsequence, we expect that there is a map U : R d [0, T ] P 2 R such that, for any R > 0 sup U N (x, t, m N (x j ) ) U(x, t, mn x ) 0 x R,t,(x j ) j 2 as N +, where as before we have set m N (x j ) = N δ xj and m N x = N j 2 N δ xj. j= One expects that, in the viscosity sense, and U N i t U(x, t, m) t D U N xi 2 i D x U(x, t, m) 2. The only term on which we have to work is D xj Uj N, D xj Ui N. It can be proved (see [47]) that j i D xj Uj N, D xj Ui N D m U(x, t, m), D x U(, t, m) L 2 m. j i So we have heuristically explained that the limit of the U N is some U C 0 (R d [0, T ] P 2 ) which satisfies { U t + 2 D xu(x, t, m) 2 F + D m U, D x U L 2 m = 0 in R d (0, T ) P 2 (6) U(x, T, m) = g(x, m) 7.3 From the equation in P 2 to the mean field equation In order to derive the Mean Field Equation (MFE) from equation (6) let us fix some initial measure m 0 P 2 and solve the equation m t div x ((D x U(x, t, m(t)))m(x, t)) = 0 in R d (0, T ), 5
52 with m(x, 0) = m 0 (x). This equation has to be understood in the sense of distributions: d φdm(t) = Dφ, D x U(x, t, m(t)) dm(t) φ Cc (R d ). dt R d R d The existence and uniqueness of a solution for this equation strongly depends on the regularity properties of the vector field (x, t) D x U(x, t, m(t)). Indeed, following subsection 4.2, if we denote by s,x t the flow of the differential equation x (t) = D x U(x(t), t, m(t)) with initial condition x(s) = x, then a solution of our problem is given by 0, t us set n(t) = 0, t m 0. Then, for any smooth test function φ we have d φdn(t) = d dt R d dh h=0 By uniqueness one should expect m(t) = n(t). We claim that, for any V C (P 2 ), we have Indeed m 0. Indeed, let R d φ( t,x t+h )dn(t) = R d Dφ(x), D x U(x, t, m(t)) dn(t) d dt V (m(t)) = D mv, D x U(, t, m(t)) L 2 m(t). d dt V (m(t)) = d dh h=0 V ( t, t+h m(t)). Now let Y L 2 (Ω) be a random variable such that L = m(t). Then L( t,y t+h ) = t, t+h m(t) and d dh h=0 t,y t+h = D xu(y, t, m(t)). Let us apply the chain rule to the extension of V to L 2 (Ω): We have d dh h=0 V [ t,y t+h ] = E [ DV [Y ], D xu(y, t, m(t)) ] = D m V, D x U(, t, m(t)) L 2 m(t), which proves the claim. Let us now set u(x, t) = U(x, t, m(t)). Then the above considerations show that u t = U t + D mu, D x U L 2 m(t) = 2 D xu 2 + F. In conclusion we end up with the system of mean field equation: u t + 2 Du 2 = F (m) in R d (0, T ) m t div ((Du(x, t)m(x, t)) = 0 in Rd (0, T ) m(0) = m 0, u(x, T ) = g(x, m(t )) (62) The first equation has to be satisfied in the viscosity sense, while the second one holds in the sense of distributions. 52
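To summarize the construction, here is a hedged numerical sketch of a fixed-point iteration for the system (62) on the one-dimensional torus, in the spirit of the Schauder argument of subsection 4.3: given a guess of $t\mapsto m(t)$, solve the Hamilton-Jacobi equation backward with right-hand side $F(x,m(t))$, transport $m_0$ forward with the drift $-\partial_x u$, damp and repeat. The coupling $F$ (a smoothed, nonlocal congestion cost), the terminal cost $G=0$ and all numerical parameters are illustrative choices; the signs follow the control convention $-\partial_t u+\tfrac12|\partial_x u|^2=F$, $\partial_t m-\partial_x(m\,\partial_x u)=0$ used throughout these notes.

```python
import numpy as np

# Fixed-point sketch for a first order MFG system on the torus [0,1):
#   -u_t + 0.5*|u_x|^2 = F(x, m(t)),  u(., T) = 0,
#    m_t - (m u_x)_x = 0,             m(0) = m_0,
# with F(., m) = rho * m (periodic smoothing kernel).  Illustrative data only.

N, T, nt = 100, 0.3, 300
dx, dt = 1.0 / N, T / nt
x = np.arange(N) * dx

dist = np.minimum(x, 1.0 - x)                       # periodic distance to 0
kernel = np.exp(-0.5 * (dist / 0.1) ** 2)
kernel /= kernel.sum() * dx
def F(m):                                           # periodic convolution rho * m
    return dx * np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(m)))

m0 = 1.0 + 0.5 * np.cos(2 * np.pi * x)              # initial density (mass 1)

def solve_hjb(m_path):                              # backward HJ, Godunov scheme
    u = np.zeros((nt + 1, N))                       # u[nt] = terminal cost G = 0
    for k in range(nt - 1, -1, -1):
        v = u[k + 1]
        dp, dm = (np.roll(v, -1) - v) / dx, (v - np.roll(v, 1)) / dx
        ham = 0.5 * np.maximum(np.maximum(dm, 0.0) ** 2, np.minimum(dp, 0.0) ** 2)
        u[k] = v + dt * (F(m_path[k + 1]) - ham)    # from  u_t = H - F
    return u

def transport(u):                                   # m_t + ((-u_x) m)_x = 0, upwind
    m_path = np.zeros((nt + 1, N)); m_path[0] = m0
    for k in range(nt):
        m = m_path[k]
        vel = -(np.roll(u[k], -1) - np.roll(u[k], 1)) / (2 * dx)
        vface = 0.5 * (vel + np.roll(vel, -1))      # velocity at interface i+1/2
        flux = np.maximum(vface, 0) * m + np.minimum(vface, 0) * np.roll(m, -1)
        m_path[k + 1] = m - dt / dx * (flux - np.roll(flux, 1))
    return m_path

m_path = np.tile(m0, (nt + 1, 1))
for it in range(30):                                # damped Picard iteration
    u = solve_hjb(m_path)
    new_path = transport(u)
    change = np.abs(new_path - m_path).max()
    m_path = 0.5 * m_path + 0.5 * new_path
print("last fixed-point update:", change, "  mass at time T:", m_path[-1].sum() * dx)
```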
8 Appendix

8.1 Nash equilibria in classical differential games

Let $S_1,\dots,S_N$ be compact metric spaces and $J_1,\dots,J_N$ be continuous real-valued functions on $\prod_{i=1}^N S_i$. We denote by $P(S_i)$ the compact metric space of all Borel probability measures defined on $S_i$.

Definition 8.1 A Nash equilibrium in mixed strategies is an $N$-tuple $(\bar\pi_1,\dots,\bar\pi_N)\in\prod_{i=1}^N P(S_i)$ such that, for any $i=1,\dots,N$,
$$J_i(\bar\pi_1,\dots,\bar\pi_N)\le J_i((\bar\pi_j)_{j\ne i},\pi_i) \qquad \forall\pi_i\in P(S_i), \qquad (63)$$
where, by abuse of notation,
$$J_i(\pi_1,\dots,\pi_N)=\int_{S_1\times\cdots\times S_N}J_i(s_1,\dots,s_N)\,d\pi_1(s_1)\dots d\pi_N(s_N).$$

Remark 8.2 Note that condition (63) is equivalent to
$$J_i(\bar\pi_1,\dots,\bar\pi_N)\le J_i((\bar\pi_j)_{j\ne i},s_i) \qquad \forall s_i\in S_i.$$
This latter characterization is very convenient and is used throughout the notes.

Theorem 8.3 (Nash, 1950; Glicksberg, 1952) Under the above assumptions, there exists at least one equilibrium point in mixed strategies.

Proof: It is a straightforward application of Fan's fixed point Theorem [22]: let $X$ be a nonempty, compact and convex subset of a locally convex topological vector space and let $\phi:X\to 2^X$ be an upper semicontinuous set-valued map such that $\phi(x)$ is non-empty, compact and convex for all $x\in X$. Then $\phi$ has a fixed point: $\bar x\in X$ with $\bar x\in\phi(\bar x)$. Let us recall that the upper semicontinuity of the set-valued map $\phi:X\to 2^X$ means that, for every open set $W\subset X$, the set $\{x\in X,\ \phi(x)\subset W\}$ is open in $X$.
Let us set $X=\prod_{j=1}^N P(S_j)$ and let us consider the best response map $R_i:X\to 2^{P(S_i)}$ of player $i$ defined by
$$R_i((\pi_j)_{j=1,\dots,N})=\Big\{\pi\in P(S_i),\ J_i((\pi_j)_{j\ne i},\pi)=\min_{\pi'\in P(S_i)}J_i((\pi_j)_{j\ne i},\pi')\Big\}.$$
Then the map $\phi((\pi_j)_{j=1,\dots,N})=\prod_{i=1}^N R_i((\pi_j)_{j=1,\dots,N})$ is upper semicontinuous with non-empty, compact and convex values. Therefore it has a fixed point, which is a Nash equilibrium.

We now consider the case where the game is symmetric. Namely, we assume that, for all $i\in\{1,\dots,N\}$, $S_i=S$ and
$$J_i(s_1,\dots,s_N)=J_{\theta(i)}(s_{\theta(1)},\dots,s_{\theta(N)})$$
for all $i$ and all permutations $\theta$ on $\{1,\dots,N\}$.

Theorem 8.4 (Symmetric games) If the game is symmetric, then there is an equilibrium of the form $(\bar\pi,\dots,\bar\pi)$, where $\bar\pi\in P(S)$ is a mixed strategy.

Proof: Let $X=P(S)$ and $R:X\to 2^X$ be the set-valued map defined by
$$R(\pi)=\Big\{\sigma\in P(S),\ J_1(\sigma,\pi,\dots,\pi)=\min_{\sigma'\in P(S)}J_1(\sigma',\pi,\dots,\pi)\Big\}.$$
Then $R$ is upper semicontinuous with nonempty convex compact values. By Fan's fixed point Theorem, it has a fixed point $\bar\pi$ and, from the symmetry of the game, the $N$-tuple $(\bar\pi,\dots,\bar\pi)$ is a Nash equilibrium.
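Theorems 8.3 and 8.4 are easy to illustrate on a finite symmetric game, where $P(S_i)$ is a simplex and the expected costs are multilinear. The sketch below (an illustrative computation, not a construction from the text) runs fictitious play on rock-paper-scissors, written as a zero-sum symmetric cost matrix, and recovers the symmetric mixed equilibrium $(1/3,1/3,1/3)$; it also checks that no pure deviation improves on the equilibrium cost.

```python
import numpy as np

# Fictitious play on a finite symmetric game: each player best-responds to the
# empirical mixed strategy of the other.  For this zero-sum symmetric game the
# empirical frequencies converge to the mixed Nash equilibrium (1/3, 1/3, 1/3).

C = np.array([[ 0,  1, -1],     # cost C[a, b] of playing a against b
              [-1,  0,  1],     # rows/cols: rock, paper, scissors
              [ 1, -1,  0]])    # (cost = -payoff, zero-sum symmetric)

counts = np.ones((2, 3))        # fictitious action counts of both players
for t in range(200_000):
    for i in (0, 1):
        opp_mixed = counts[1 - i] / counts[1 - i].sum()
        best_reply = np.argmin(C @ opp_mixed)   # pure best reply to the empirical mix
        counts[i, best_reply] += 1

pi = counts / counts.sum(axis=1, keepdims=True)
print("empirical mixed strategies:", np.round(pi, 3))
cost_at_eq = pi[0] @ C @ pi[1]
deviations = C @ pi[1]          # cost of each pure deviation for player 1
print("equilibrium cost:", round(cost_at_eq, 4),
      "  best pure deviation cost:", round(deviations.min(), 4))
```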
8.2 Disintegration of a measure

Theorem 8.5 Let $X$ and $Y$ be two Polish spaces and $\lambda$ be a Borel probability measure on $X\times Y$. Let us set $\mu=\pi\sharp\lambda$, where $\pi$ is the canonical projection from $X\times Y$ onto $X$. Then there exists a $\mu$-almost everywhere uniquely determined family of Borel probability measures $(\lambda_x)_{x\in X}$ on $Y$ such that
1. the function $x\mapsto\lambda_x$ is Borel measurable, in the sense that $x\mapsto\lambda_x(B)$ is a Borel measurable function for each Borel measurable set $B\subset Y$,
2. for every Borel measurable function $f:X\times Y\to[0,+\infty]$,
$$\int_{X\times Y}f(x,y)\,d\lambda(x,y)=\int_X\int_Y f(x,y)\,d\lambda_x(y)\,d\mu(x).$$
See for instance the monograph [5].

8.3 Ekeland's and Stegall's variational principles

When working with Hamilton-Jacobi equations in infinite dimension, one needs to know to what extent a function attains its minimum, at least approximately. There are two types of results in this direction: Ekeland's variational principle (which works in metric spaces, so that it can be used for direct approaches to HJ equations in the Wasserstein space) and Stegall's variational principle (where the underlying space must be a Banach space with some dentability property).

Let us start with Ekeland's variational principle. Let $(X,d)$ be a complete metric space and $f:X\to\mathbb R\cup\{+\infty\}$ be a proper lower semi-continuous map which is bounded from below.

Theorem 8.6 (Ekeland) For any $\epsilon>0$ and any $x_0\in X$ there is some $\bar x\in X$ such that
$$\begin{cases} \text{i)}\ f(\bar x)+\epsilon\, d(\bar x,x_0)\le f(x_0),\\ \text{ii)}\ f(\bar x)<f(x)+\epsilon\, d(x,\bar x) \qquad \forall x\in X\setminus\{\bar x\}. \end{cases}$$

As an immediate consequence we have:

Corollary 8.7 Under the same assumptions as in Theorem 8.6, let $\lambda,\epsilon>0$ and $x_0\in X$ be such that $f(x_0)\le\inf_X f+\lambda\epsilon$. Then there is some $\bar x\in X$ such that
i) $f(\bar x)\le f(x_0)$, ii) $d(\bar x,x_0)\le\lambda$, iii) $f(\bar x)\le f(x)+\epsilon\, d(x,\bar x)$ for all $x\in X$.

Proof of Theorem 8.6: It is enough to do the proof for $\epsilon=1$. Let us set
$$F(x)=\{y\in X,\ f(y)+d(x,y)\le f(x)\} \quad\text{and}\quad v(x)=\inf_{y\in F(x)}f(y), \qquad x\in X.$$
Note that $F(x)\ne\emptyset$, because $x\in F(x)$, that $y\in F(x)$ implies $F(y)\subset F(x)$, and that $\mathrm{diam}(F(x))\le f(x)-v(x)$ for all $x\in X$.
55 We define by induction a sequence (x n ) starting at x 0 and such that x n+ F (x n ) and f(x n+ ) v(x n ) + 2 n n N. Then (F (x n )) is a nonincreasing family of closed sets. Its diameter converges to 0 because diam(f (x n+ )) f(x n+ ) v(x n+ ) v(x n ) v(x n+ ) + 2 n 2 n. Since is complete, this implies that there is some x n F (x n). We claim that F ( x) = { x}. Indeed, if y F ( x), then y F (x n ) for any n so that d( x, y) diam(f (x n )) 0. So F ( x) = { x}, which implies that f( x) < f(y) + d( x, y) for any y \{x}. Finally x F (x 0 ), so that f( x) + d( x, x 0 ) f(x 0 ). Now we turn to Stegall variational principle. For this we assume that (, ) be a real Banach space. The closed unit ball around the origin in is denoted by B. We say that is dentable if for every ɛ > 0 every nonempty bounded set D has a slice with norm diameter at most ɛ, i.e., there are ξ and α > 0 so that x x 2 ɛ whenever x i D and ξ, x i > sup ξ, D α, i =, 2. There exist plenty of conditions which are equivalent to the dentability of : for instance the Radon-Nikodym property, which states that for every nonempty closed bounded set D there exist x D and 0 ξ such that ξ, x = sup ξ, D. In particular Hilbert spaces are dentable. For f : [0, + ] is said to attain a strong minimum at x if x is a minimum of f and x n x whenever x and f(x n ) f(x). Theorem 8.8 (Stegall s variational principle) For a real Banach space (, ) the following assertions are equivalent. (i) is dentable. (ii) For every coercive lower semicontinuous function f : [0, + ] and for every ɛ > 0 there are x and ξ, with ξ < ɛ, such that the function f ξ attains a strong minimum at x. (iii) For every coercive continuous convex function f : [0, + ) there are x and ξ such that the function f ξ attains a strong minimum at x. Stegall proved in ([53], [54]) that (ii) and (iii) hold in any Radon-Nikodym space. The above equivalence is given in [2]. See also the very nice monograph by Phelps [5]. References [] Achdou Y. and Capuzzo Dolcetta I. Mean field games: Numerical methods. Pre-print hal [2] Achdou Y., Camilli F. and Capuzzo Dolcetta Mean field games: numerical methods for the planning problem, Preprint 200. [3] Ambrosio, L., Transport equation and Cauchy problem for non-smooth vector fields. Calculus of variations and nonlinear partial differential equations, 4, Lecture Notes in Math., 927, Springer, Berlin,
References

[1] Achdou Y. and Capuzzo Dolcetta I. Mean field games: numerical methods. Preprint hal.

[2] Achdou Y., Camilli F. and Capuzzo Dolcetta I. Mean field games: numerical methods for the planning problem. Preprint, 2010.

[3] Ambrosio, L. Transport equation and Cauchy problem for non-smooth vector fields. Calculus of variations and nonlinear partial differential equations, 1–41, Lecture Notes in Math., 1927, Springer, Berlin, 2008.

[4] Ambrosio, L. Transport equation and Cauchy problem for BV vector fields. Invent. Math. 158 (2004).

[5] Ambrosio, L., Gigli, N., Savaré, G. Gradient flows in metric spaces and in the space of probability measures. Second edition. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 2008.

[6] Aubin J.-P. and Frankowska H. Set-valued analysis. Birkhäuser, Boston, 1990.

[7] Aumann R. Markets with a continuum of traders. Econometrica, 32(1/2), 1964.

[8] Bardi M. & Capuzzo Dolcetta I. (1996) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Birkhäuser.

[9] Barles G. (1994) Solutions de viscosité des équations de Hamilton-Jacobi. Springer-Verlag, Berlin.

[10] Ben Moussa B., Kossioris G.T. On the system of Hamilton-Jacobi and transport equations arising in geometric optics. Comm. Partial Diff. Eq., 28 (2003).

[11] Brenier, Y. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math. 44, 4 (1991).

[12] Cannarsa P. and Sinestrari C. Semiconcave functions, Hamilton-Jacobi equations, and optimal control. Progress in Nonlinear Differential Equations and their Applications, 58. Birkhäuser Boston Inc., Boston, MA, 2004.

[13] Cardaliaguet P. & Quincampoix M. Deterministic differential games under probability knowledge of initial condition. Int. Game Theory Rev. 10 (2008), no. 1, 1–16.

[14] Carmona G. Nash equilibria of games with a continuum of players. Preprint.

[15] Clarke F. H. Optimization and nonsmooth analysis. Second edition. Classics in Applied Mathematics, 5. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990.

[16] Crandall M. & Lions P.-L. (1985) Hamilton-Jacobi equations in infinite dimensions I. Journal of Functional Analysis, Vol. 62.

[17] Crandall M. & Lions P.-L. (1986) Hamilton-Jacobi equations in infinite dimensions II. Journal of Functional Analysis, Vol. 65.

[18] Crandall M. & Lions P.-L. (1986) Hamilton-Jacobi equations in infinite dimensions III. Journal of Functional Analysis, Vol. 68.

[19] Dafermos, C. Hyperbolic conservation laws in continuum physics. Springer Verlag.

[20] DiPerna R. J., Lions P.-L. Ordinary differential equations, transport theory and Sobolev spaces. Invent. Math. 98 (1989).

[21] Fabian, M., Finet, C. On Stegall's smooth variational principle. Nonlinear Analysis 66 (2007), no. 3.
[22] Fan, K. Fixed-point and Minimax Theorems in Locally Convex Topological Linear Spaces. Proc. Nat. Acad. Sci. U.S.A. 38(2) (1952).

[23] Feng J. and Katsoulakis M. A Comparison Principle for Hamilton-Jacobi Equations Related to Controlled Gradient Flows in Infinite Dimensions. Arch. Rational Mech. Anal. 192 (2009).

[24] Fleming W.H. & Soner H.M. (1993) Controlled Markov processes and viscosity solutions. Springer-Verlag, New York.

[25] Gangbo, W., Nguyen, T., Tudorascu, A. Hamilton-Jacobi equations in the Wasserstein space. Methods Appl. Anal. 15 (2008), no. 2.

[26] Gomes D.A., Mohr J. and Souza R.R. Discrete time, finite state space mean field games. Journal de Mathématiques Pures et Appliquées, 93(3) (2010).

[27] Gosse L. and James F. Convergence results for an inhomogeneous system arising in various high frequency approximations. Numerische Mathematik, 90 (2002), no. 4.

[28] Guéant O. Mean field games and applications to economics. PhD thesis, Université Paris-Dauphine.

[29] Guéant, O. A reference case for mean field games models. J. Math. Pures Appl. (9) 92 (2009), no. 3.

[30] Guéant, O., Lions, P.-L., Lasry, J.-M. Mean Field Games and Applications. Paris-Princeton Lectures on Mathematical Finance 2010. Springer, Berlin, 2011.

[31] Hewitt, E. and Savage, L. J. (1955). Symmetric measures on Cartesian products. Transactions of the American Mathematical Society, 80.

[32] Huang, M., Caines, P.E., Malhamé, R.P. (2006). Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle. Communications in Information and Systems, Vol. 6, No. 3.

[33] Huang, M., Caines, P.E., Malhamé, R.P. (2007). Large-Population Cost-Coupled LQG Problems With Nonuniform Agents: Individual-Mass Behavior and Decentralized ε-Nash Equilibria. IEEE Transactions on Automatic Control, 52(9).

[34] Huang, M., Caines, P.E., Malhamé, R.P. (2007). The Nash Certainty Equivalence Principle and McKean-Vlasov Systems: an Invariance Principle and Entry Adaptation. 46th IEEE Conference on Decision and Control.

[35] Huang, M., Caines, P.E., Malhamé, R.P. (2007). An Invariance Principle in Large Population Stochastic Dynamic Games. Journal of Systems Science & Complexity, 20(2).
[36] Jouini E., Schachermayer W. and Touzi N. Law Invariant Risk Measures have the Fatou Property. Advances in mathematical economics, Vol. 9, 49–71, Adv. Math. Econ., 9, Springer, Tokyo, 2006.

[37] Karatzas I. and Shreve S. Brownian motion and stochastic calculus. Second edition. Graduate Texts in Mathematics, 113. Springer-Verlag, New York, 1991.

[38] Kingman, J. F. C. Uses of exchangeability. Ann. Probability 6 (1978), no. 2.

[39] Lachapelle A. Human crowds and groups interactions: a mean field games approach. Preprint.

[40] Lachapelle A., Salomon J. and Turinici G. Computation of mean field equilibria in economics. Mathematical Models and Methods in Applied Sciences, 2010.

[41] Ladyženskaja O.A., Solonnikov V.A. and Ural'ceva N.N. Linear and quasilinear equations of parabolic type. Translations of Mathematical Monographs, Vol. 23. American Mathematical Society, Providence, R.I., 1967.

[42] Lasry, J.-M., Lions, P.-L. Large investor trading impacts on volatility. Ann. Inst. H. Poincaré Anal. Non Linéaire 24 (2007), no. 2.

[43] Lasry, J.-M., Lions, P.-L. Mean field games. Jpn. J. Math. 2 (2007), no. 1.

[44] Lasry, J.-M., Lions, P.-L. Jeux à champ moyen. II. Horizon fini et contrôle optimal. C. R. Math. Acad. Sci. Paris 343 (2006), no. 10.

[45] Lasry, J.-M., Lions, P.-L. Jeux à champ moyen. I. Le cas stationnaire. C. R. Math. Acad. Sci. Paris 343 (2006), no. 9.

[46] Lasry, J.-M., Lions, P.-L., Guéant O. Application of Mean Field Games to Growth Theory. Preprint.

[47] Lions, P.-L. Cours au Collège de France.

[48] Lions P.-L. Viscosity solutions of fully nonlinear second-order equations and optimal stochastic control in infinite dimensions. Part I: The case of bounded stochastic evolutions. Acta Math. 161 (1988), no. 3-4.

[49] Lions P.-L. Viscosity solutions of fully nonlinear second-order equations and optimal stochastic control in infinite dimensions. III. Uniqueness of viscosity solutions for general second-order equations. J. Funct. Anal. 86 (1989), no. 1, 1–18.

[50] Lions P.-L. Viscosity solutions of fully nonlinear second order equations and optimal stochastic control in infinite dimensions. II. Optimal control of Zakai's equation. Stochastic partial differential equations and applications, II (Trento, 1988), 147–170, Lecture Notes in Math., 1390, Springer, Berlin, 1989.

[51] Phelps, R. R. Convex functions, monotone operators and differentiability. Second edition. Lecture Notes in Mathematics, 1364. Springer-Verlag, Berlin, 1993.
[52] Sznitman, A.-S. Topics in propagation of chaos. Cours de l'École d'été de Saint-Flour, Lecture Notes in Mathematics, vol. 1464, Springer (1989).

[53] Stegall, Ch. Optimization of functions on certain subsets of Banach spaces. Math. Ann. 236 (1978).

[54] Stegall, Ch. Optimization and differentiation in Banach spaces. Linear Algebra Appl. 84 (1986), 191–211.

[55] Villani C. Topics in optimal transportation. Vol. 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.

[56] Villani, C. Optimal transport: old and new. Springer, Berlin, 2009.
