Chapter 11 Convergence in Distribution

Chapter 11: Convergence in Distribution
1. Weak convergence in metric spaces
2. Weak convergence in R
3. Tightness and subsequences
4. Metrizing weak convergence
5. Characterizing weak convergence in spaces of functions

1 Weak convergence in metric spaces

Suppose that $(M, d)$ is a metric space, and let $\mathcal{M}$ denote the Borel sigma-field (the sigma-field generated by the open sets in $M$). Let $C_b(M)$ denote the set of all real-valued, bounded continuous functions on $M$, and let $C_u(M)$ denote the set of all real-valued, bounded uniformly continuous functions on $M$.

Definition 1.1 (weak convergence) If $\{P_n\}$, $P$ are probability measures on $(M, \mathcal{M})$ satisfying $\int f\,dP_n \to \int f\,dP$ as $n \to \infty$ for all $f \in C_b(M)$, then we say that $P_n$ converges in distribution (or law) to $P$, or that $P_n$ converges weakly to $P$, and we write $P_n \to_d P$ or $P_n \Rightarrow P$. Similarly, if $\{X_n\}$, $X$ are random elements in $M$ (i.e. measurable maps from some probability space(s) $(\Omega, \mathcal{A}, Pr)$ (or $(\Omega_n, \mathcal{A}_n, Pr_n)$) to $(M, \mathcal{M})$) with $Ef(X_n) \to Ef(X)$ for all $f \in C_b(M)$, then we write $X_n \to_d X$ or $X_n \Rightarrow X$.

Definition 1.2 (boundary and P-continuity set) For any set $B \subset M$, the boundary of $B$ is $\partial B \equiv \overline{B} \setminus B^o$, where $\overline{B}$ is the closure of $B$ and $B^o$ is the interior of $B$; i.e. the largest open set contained in $B$. A set $B$ is called a continuity set of $P$ if $P(\partial B) = 0$.

Definition 1.3 (Bounded Lipschitz functions) A real-valued function $f$ on a metric space $(M, d)$ is said to satisfy a Lipschitz condition if there exists a finite constant $K$ for which $|f(x) - f(y)| \le K d(x, y)$ for all $x, y \in M$. We write $BL(M)$ for the vector space of all bounded Lipschitz functions on $M$. We can characterize the space $BL(M)$ in terms of a norm $\|f\|_{BL}$ defined for all real-valued functions $f$ on $M$ as follows: $\|f\|_{BL} \equiv \max\{K_1(f), 2K_2(f)\}$

where $K_1(f) \equiv \sup_{x \ne y} |f(x) - f(y)|/d(x, y)$ and $K_2(f) \equiv \sup_x |f(x)|$. Here we have followed Pollard (2002), who deviates from the usual definition of $\|f\|_{BL}$ in order to obtain the following nice inequality: $|f(x) - f(y)| \le \|f\|_{BL} \{1 \wedge d(x, y)\}$ for all $x, y \in M$.

Definition 1.4 (Lower and upper semicontinuous functions) A function $f : M \to R$ is said to be lower semicontinuous (or LSC) if $\{x : f(x) > t\}$ is an open set for each fixed $t$. A function $f$ is said to be upper semicontinuous (or USC) if $\{x : f(x) < t\}$ is open for each fixed $t$. Thus $f$ is USC if and only if $-f$ is LSC. If $f$ is both USC and LSC then it is continuous. The basic example of a lower semicontinuous function is the indicator function $1_B$ of an open set $B$; the basic example of an upper semicontinuous function is the indicator function $1_B$ of a closed set $B$. Our first theorem will use the following result connecting lower semicontinuous functions to functions in $BL(M)$.

Lemma 1.1 (LSC approximation) Let $g$ be a lower semicontinuous function bounded from below on a metric space $M$. Then there exists a sequence $\{f_m\}_{m=1}^\infty \subset BL(M)$ satisfying $f_m(x) \uparrow g(x)$ for each $x \in M$.

Proof. We may assume that $g \ge 0$ without loss of generality (if not, replace $g$ by $g + \sup_x(-g(x))$). For each $t > 0$ the set $B_t \equiv \{x : g(x) \le t\}$ is closed. The functions $f_{k,t}(x) \equiv t \wedge (k\,d(x, B_t))$ for $k \in N$ are in $BL(M)$ and satisfy $f_{k,t}(x) \uparrow t\,1_{B_t^c}(x) = t\,1_{[g(x) > t]}$ as $k \to \infty$, since $d(x, B_t) > 0$ if and only if $g(x) > t$. Now consider the countable collection $\mathcal{G} = \bigcup_{k \in N} \bigcup_{t \in Q^+} \{f_{k,t}\}$, where $Q^+$ is the set of all positive rational numbers. The pointwise supremum of $\mathcal{G}$ is $g$. If we enumerate $\mathcal{G}$ as $\{g_1, g_2, \ldots\}$, and then define $f_m \equiv \max_{j \le m} g_j$, it follows that $f_m$ is in $BL(M)$ for each $m$ and $f_m \uparrow g$.

Our first result gives a number of equivalences to the definition of weak convergence given in Definition 1.1.

Theorem 1.1 (portmanteau theorem) For probability measures $\{P_n\}$, $P$ on $(M, \mathcal{M})$ the following are equivalent:
(i) $\int f\,dP_n \to \int f\,dP$ for all $f \in C_b(M)$; i.e. $P_n \to_d P$.
(ii) $\int f\,dP_n \to \int f\,dP$ for all $f \in C_u(M)$.
(iii) $\int f\,dP_n \to \int f\,dP$ for all $f \in BL(M)$.
(iv) $\limsup_n \int f\,dP_n \le \int f\,dP$ for every upper semicontinuous $f$ bounded from above.
(v) $\liminf_n \int f\,dP_n \ge \int f\,dP$ for every lower semicontinuous $f$ bounded from below.

(vi) $\limsup_n P_n(B) \le P(B)$ for all closed sets $B \in \mathcal{M}$.
(vii) $\liminf_n P_n(B) \ge P(B)$ for all open sets $B \in \mathcal{M}$.
(viii) $\lim_n P_n(B) = P(B)$ for all $P$-continuity sets $B \in \mathcal{M}$.
(ix) $\lim_n \int f\,dP_n = \int f\,dP$ for all bounded measurable functions $f$ with $P(C_f) = 1$, where $C_f$ denotes the set of continuity points of $f$.

Proof. Clearly (i) implies (ii) and (ii) implies (iii) since $BL(M) \subset C_u(M) \subset C_b(M)$. We also note that (iv) and (v) are equivalent since $-f$ is lower semicontinuous and bounded from below if $f$ is upper semicontinuous and bounded from above. Similarly, (vi) and (vii) are equivalent by taking complements. Since the indicator function of an open set is lower semicontinuous and bounded from below, (v) implies (vii) (and similarly, (iv) implies (vi)).

Now we use Lemma 1.1 to show that (iii) implies (v): suppose that (iii) holds, and let $g$ be an LSC function bounded from below. By Lemma 1.1 there exists a sequence $\{f_m\}$ in $BL(M)$ with $f_m \uparrow g$ pointwise. Then, for each fixed $m$ we have $\liminf_n \int g\,dP_n \ge \liminf_n \int f_m\,dP_n = \int f_m\,dP$ since $\int f_m\,dP_n \to \int f_m\,dP$ by (iii). Take the supremum over $m$; by the monotone convergence theorem the right side in the last display converges to $\int g\,dP$, and thus (v) holds.

To see that (vi) and (vii) imply (viii), let $B$ be a $P$-continuity set. Then since $B^o$ is open and $\overline{B}$ is closed,
$$P(B^o) \le \liminf_n P_n(B^o) \le \liminf_n P_n(B) \le \limsup_n P_n(B) \le \limsup_n P_n(\overline{B}) \le P(\overline{B}).$$
Since $B$ is a $P$-continuity set, $P(\partial B) = 0$ and $P(\overline{B}) = P(B^o)$, so the extreme terms in the last display are equal and hence $\lim_n P_n(B) = P(B)$.

Next we show that (viii) implies (vi): let $B$ be a closed set and suppose that (viii) holds. Since $\partial\{x : d(x, B) \le \delta\} \subset \{x : d(x, B) = \delta\}$, the boundaries are disjoint for different $\delta > 0$, and hence at most countably many of them can have positive $P$-measure. Therefore for some sequence $\delta_k \downarrow 0$ the sets $B_k \equiv \{x : d(x, B) < \delta_k\}$ are $P$-continuity sets and $B_k \downarrow B$ if $B$ is closed. It follows that $\limsup_n P_n(B) \le \limsup_n P_n(B_k) = P(B_k)$ since $P_n(B_k) \to P(B_k)$ by (viii). Letting $k \to \infty$ yields (vi).

Now we show that (vi) implies (i). Suppose that (vi) holds and fix $f \in C_b(M)$. Without loss of generality we can transform $f$ so that $0 < f(x) < 1$ for all $x \in M$. Fix $k \ge 1$ and define the closed sets $B_j \equiv \{x \in M : j/k \le f(x)\}$ for $j = 0, \ldots, k$. Then it follows that
$$\sum_{j=1}^k \frac{j-1}{k} P(B_{j-1} \cap B_j^c) \le \int f\,dP \le \sum_{j=1}^k \frac{j}{k} P(B_{j-1} \cap B_j^c).$$

Rewriting the sum on the right side and summing by parts gives
$$\sum_{j=1}^k \frac{j}{k} \{P(B_{j-1}) - P(B_j)\} = \frac{1}{k} + \frac{1}{k} \sum_{j=1}^k P(B_j)$$
which, together with a similar summation by parts on the left side, yields
$$\frac{1}{k} \sum_{j=1}^k P(B_j) \le \int f\,dP \le \frac{1}{k} + \frac{1}{k} \sum_{j=1}^k P(B_j).$$
Since the sets $B_j$ are closed, it follows from the last display (also used with $P$ replaced by $P_n$ throughout) and (vi) that
$$\limsup_n \int f\,dP_n \le \limsup_n \Big\{ \frac{1}{k} + \frac{1}{k} \sum_{j=1}^k P_n(B_j) \Big\} \le \frac{1}{k} + \frac{1}{k} \sum_{j=1}^k P(B_j) \le \frac{1}{k} + \int f\,dP.$$
Letting $k \to \infty$ gives $\limsup_n \int f\,dP_n \le \int f\,dP$. Applying this last conclusion to $-f$ yields $\liminf_n \int f\,dP_n \ge \int f\,dP$. Combining these last two displays yields (i).

Since (ix) implies (viii) by taking $f = 1_B$, it remains only to show that (iv) (and (v), since (iv) and (v) are equivalent) implies (ix). Suppose that $f$ is a bounded measurable function and suppose that (iv) holds; without loss of generality we may assume that $0 \le f \le 1$. Define the lower semicontinuous function $f_*$ and the upper semicontinuous function $f^*$ by
$$f_* \equiv \sup\{g : g \le f,\ g \text{ LSC}\}, \qquad f^* \equiv \inf\{g : g \ge f,\ g \text{ USC}\}.$$
Note that this notation is sensible: if we take $f = 1_B$ for a Borel set $B$, then $(1_B)_* = 1_{B^o}$ and $(1_B)^* = 1_{\overline{B}}$. Also note that $f_* \le f \le f^*$. We claim that

(a) $E_f \equiv \{x : f_*(x) = f^*(x)\} = \{x : f \text{ is continuous at } x\} \equiv C_f$.

At any $x$ for which $f_*(x) = f(x)$, the set $\{y : f_*(y) > f(x) - \epsilon\}$ is an open neighborhood of $x$, and on this neighborhood $f(y) \ge f_*(y) > f(x) - \epsilon$. Similarly, if $f^*(x) = f(x)$, there exists a neighborhood of $x$ on which $f(y) < f(x) + \epsilon$. Putting these together shows that $f$ is continuous at each point of

$\{x : f_*(x) = f^*(x)\}$; i.e. $E_f \subset C_f$. To see the reverse inclusion, note that if $f$ is continuous at $x$, then for each $\epsilon > 0$ there is an open set $G$ containing $x$ for which $|f(y) - f(x)| < \epsilon$ for all $y \in G$. Then it follows that
$$(f(x) - \epsilon) 1_G(y) - 2 \cdot 1_{G^c}(y) \le f(y) \le (f(x) + \epsilon) 1_G(y) + 2 \cdot 1_{G^c}(y),$$
and the two bounds differ by $2\epsilon$ at $x$. Note that the upper bound in the last display is USC and the lower bound is LSC. This shows that $f^*(x) - f_*(x) \le 2\epsilon$ and hence that $f^*(x) = f_*(x)$. This shows that $C_f \subset E_f$ and completes the proof of (a).

Now, since $f_* \le f \le f^*$, (iv) and (v) give (using the abbreviated notation $Pf \equiv \int f\,dP$)
$$P f_* \le \liminf_n P_n f_* \le \liminf_n P_n f \le \limsup_n P_n f \le \limsup_n P_n f^* \le P f^*.$$
Since $P(C_f) = 1$ by hypothesis, it follows from (a) that $P f_* = P f = P f^*$. We thus conclude that (ix) holds.

The last part of the portmanteau theorem, part (ix), has an important consequence: weak convergence is preserved under a map $T$ to another metric space $(M', d')$ which is continuous at a sufficiently large set of points with respect to the limit measure $P$. This is the Mann-Wald or continuous mapping theorem.

Theorem 1.2 (Continuous mapping) Suppose that $T$ is a $\mathcal{M} \setminus \mathcal{M}'$ measurable mapping from $(M, d)$ into another metric space $(M', d')$ with Borel sigma-field $\mathcal{M}'$. Suppose that $T$ is continuous at each point of a measurable subset $C_T \subset M$. If $P_n \to_d P$ and $P(C_T) = 1$, then $P_n T^{-1} \to_d P T^{-1}$; equivalently, if $X_n \sim P_n$, $X \sim P$ are random elements in $(M, d)$ with $X_n \to_d X$, then $T(X_n) \to_d T(X)$ in $(M', d')$ provided $P(X \in C_T) = 1$.

Proof. Let $g \in C_b(M')$. Then $\int g\,d(P_n T^{-1}) = \int g(T)\,dP_n$ where $g(T) = g \circ T : M \to R$ is bounded and continuous a.e. $P$ since $P(C_T) = 1$. It therefore follows from (ix) of the portmanteau theorem that
$$\int g\,d(P_n T^{-1}) = \int g(T)\,dP_n \to \int g(T)\,dP = \int g\,d(P T^{-1}).$$
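The role of continuity sets in (viii) and of the set $C_T$ in the continuous mapping theorem can be seen in a deterministic toy case. The sketch below is only an illustration under assumed choices that are not part of the notes: $X_n \equiv 1/n$, $X \equiv 0$, the test function `f = atan`, the set $B = (-\infty, 0]$, and the map `T` equal to the indicator of $(0, \infty)$. Here $Ef(X_n) \to Ef(X)$, yet $P_n(B)$ does not converge to $P(B)$ because $P(\partial B) = 1$, and $T(X_n)$ does not converge to $T(X)$ because $P(C_T) = 0$.

```python
import math


def f(x):
    # A bounded continuous test function on R (an arbitrary illustrative choice).
    return math.atan(x)


def T(x):
    # Indicator of (0, inf); it is discontinuous exactly at 0.
    return 1.0 if x > 0 else 0.0


def expectations(n):
    # X_n is the constant 1/n, so E g(X_n) = g(1/n) for any function g.
    x_n = 1.0 / n
    return {
        "Ef": f(x_n),                         # E f(X_n)
        "P_n(B)": 1.0 if x_n <= 0 else 0.0,   # P(X_n in (-inf, 0])
        "ET": T(x_n),                         # E T(X_n)
    }


# The limit X is the constant 0.
limit = {"Ef": f(0.0), "P(B)": 1.0, "ET": T(0.0)}

for n in (1, 10, 100, 1000):
    print(n, expectations(n))
print("limit", limit)
```

Only the first column converges to its limiting value; the other two stay at fixed values that differ from the limit, which is exactly what the hypotheses $P(\partial B) = 0$ and $P(C_T) = 1$ are designed to exclude.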

2 Weak convergence in R and R^k

Weak convergence in R

When the metric space $M$ is $R$, further equivalences can be added to those given in the portmanteau theorem, Theorem 1.1. In particular we can add smoothness restrictions to the functions $f$ involved (that only make sense for functions defined on $R$). The following proposition is one such result in this direction.

Proposition 2.1 Suppose that $\{X_n\}$, $X$ are real-valued random variables, and suppose further that $Ef(X_n) \to Ef(X)$ for each $f \in C^\infty(R)$, the class of all bounded functions with bounded derivatives of all orders. Then $X_n \to_d X$.

Proof. Let $Z \sim N(0, 1)$. For a fixed $f \in BL(R)$ and $\sigma > 0$, define a smoothed function $f_\sigma$ by convolution:
$$f_\sigma(x) = Ef(x + \sigma Z) = \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big( -\frac{(x - y)^2}{2\sigma^2} \Big) f(y)\,dy.$$
Note that $f_\sigma \in C^\infty(R)$ (since we can justify repeated differentiation under the integral via the dominated convergence theorem), and $f_\sigma$ converges uniformly to $f$ since
$$|f_\sigma(x) - f(x)| \le E|f(x + \sigma Z) - f(x)| \le \|f\|_{BL}\, E\{1 \wedge \sigma|Z|\} \to 0$$
as $\sigma \to 0$ by the dominated convergence theorem. Suppose that $\epsilon > 0$ is given. Fix $\sigma > 0$ so that $\sup_x |f_\sigma(x) - f(x)| \le \epsilon$. Then $|Ef(X_n) - Ef(X)| \le |Ef_\sigma(X_n) - Ef_\sigma(X)| + 2\epsilon$, so that $\limsup_n |Ef(X_n) - Ef(X)| \le 2\epsilon$ since $f_\sigma \in C^\infty(R)$ and hence $Ef_\sigma(X_n) \to Ef_\sigma(X)$ by the hypothesis of the proposition. Since $\epsilon > 0$ was arbitrary, $Ef(X_n) \to Ef(X)$ for every $f \in BL(R)$, and $X_n \to_d X$ follows from (iii) of Theorem 1.1.

Here is another proposition of this type giving further equivalences:

Proposition 2.2 Suppose that $\{X_n\}$, $X$ are real-valued random variables. Then the following are equivalent:
(i) $F_n(x) = P(X_n \le x) \to P(X \le x) = F(x)$ for all $x$ with $P(X = x) = 0$ (i.e. all $P$-continuity intervals of the form $(-\infty, x]$).
(ii) $X_n \to_d X$; i.e. $Ef(X_n) \to Ef(X)$ for all $f \in C_b(R)$.
(iii) $Ef(X_n) \to Ef(X)$ for all $f \in C^3(R)$.
(iv) $Ef(X_n) \to Ef(X)$ for all $f \in C^\infty(R)$.
(v) $E\exp(itX_n) \to E\exp(itX)$ for all $t \in R$.

Proof. We have proved that (iv) implies (ii), and the reverse implication is trivially true. Since $C^\infty(R) \subset C^3(R) \subset C_b(R)$, the equivalences with (iii) follow easily. For the equivalence of (i) and (ii) see Exercise xx. The equivalence of (v) and (ii) will be established in Chapter 2.

On the real line $R$ we can metrize weak convergence in terms of the distribution functions: the metric that does this is the Lévy metric $\lambda$.

Proposition 2.3 (Lévy metric) For any distribution functions $F$ and $G$ define
$$\lambda(F, G) \equiv \inf\{\epsilon > 0 : F(x - \epsilon) - \epsilon \le G(x) \le F(x + \epsilon) + \epsilon \text{ for all } x \in R\}.$$
Then $\lambda$ is a metric. Moreover, the set of all distribution functions under $\lambda$ is a complete separable metric space. Also $F_n \to_d F$ as $n \to \infty$ if and only if $\lambda(F_n, F) \to 0$ as $n \to \infty$.

Proof. See Problem 6.5.

Our goal now is to use the equivalence of (ii) and (iii) in Proposition 2.2 to prove several basic central limit theorems using the method of Lindeberg. The proofs will use the following replacement inequality.

Proposition 2.4 (Lindeberg replacement inequality) Suppose that $X$ and $Y$ are independent random variables with $E|Y|^3 < \infty$, and suppose that $W$ is another random variable independent of $X$ with $E|W|^3 < \infty$. Suppose further that $EY = EW$ and $EY^2 = EW^2$. Then for $f \in C^3(R)$
$$|Ef(X + Y) - Ef(X + W)| \le C \big( E|Y|^3 + E|W|^3 \big)$$
where $C = (1/6) \sup_x |f'''(x)|$. In particular, when $W \sim N(\mu, \sigma^2)$,
$$|Ef(X + Y) - Ef(X + W)| \le C_1 E|Y|^3$$
where $C_1 \equiv (5 + 4E|Z|^3) C \approx 11.38\,C$ and $Z \sim N(0, 1)$, so that $E|Z|^3 = 2 (2\pi)^{-1/2} \int_0^\infty z^3 e^{-z^2/2}\,dz = 4 (2\pi)^{-1/2} \approx 1.596$.

Proof. Fix $f \in C^3(R)$; by Taylor's theorem
$$f(x + y) = f(x) + y f'(x) + \tfrac{1}{2} y^2 f''(x) + R(x, y)$$
where $R(x, y) = y^3 f'''(x^*)/6$ for some $x^*$ satisfying $|x^* - x| \le |y|$. Therefore it follows that

(a) $|R(x, y)| \le C |y|^3$ for all $x, y$.

Thus for any two random variables $X$ and $Y$,
$$Ef(X + Y) = Ef(X) + E(Y f'(X)) + \tfrac{1}{2} E(Y^2 f''(X)) + ER(X, Y).$$
Using independence of $X$ and $Y$ and the bound (a) it follows that
$$\big| Ef(X + Y) - Ef(X) - E(Y) E(f'(X)) - \tfrac{1}{2} E(Y^2) E(f''(X)) \big| \le C E|Y|^3.$$
Since the same inequality holds with $Y$ replaced by $W$ for another random variable $W$ independent of $X$ with $E|W|^3 < \infty$, if $Y$ and $W$ have $E(Y) = E(W)$ and $E(Y^2) = E(W^2)$, then we can subtract and, via cancellation of the first and second moment terms, conclude that

(b) $|Ef(X + Y) - Ef(X + W)| \le C \big( E|Y|^3 + E|W|^3 \big)$.

When $W \sim N(\mu, \sigma^2)$ we can further bound $E|W|^3$: since $Z \equiv (W - \mu)/\sigma \sim N(0, 1)$ we can write $W = \mu + \sigma Z$. Then by the $C_r$ inequality (with $r = 3$), and since $\mu = E(Y)$ and $\sigma^2 = Var(Y) \le E(Y^2)$,
$$E|W|^3 \le 4\{|\mu|^3 + \sigma^3 E|Z|^3\} \le 4\{|E(Y)|^3 + \{E(Y^2)\}^{3/2} E|Z|^3\} \le 4\{E|Y|^3 + E|Y|^3 E|Z|^3\} = (4 + 4E|Z|^3) E|Y|^3$$
where the last inequality follows from Jensen's inequality used twice. Combining the last display with (b) yields the second inequality of the proposition.

Now suppose that $\xi_1, \ldots, \xi_k$ are independent random variables with $\mu_i \equiv E\xi_i$, $\sigma_i^2 = Var(\xi_i)$, $E|\xi_i|^3 < \infty$. Suppose that $\{\eta_i\}$ are independent and independent of the collection $\{\xi_i\}$ with $\eta_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, k$. Define
$$S_k = \xi_1 + \cdots + \xi_k, \qquad T_k = \eta_1 + \cdots + \eta_k.$$
Note that $T_k \sim N(E(T_k), Var(T_k)) = N(\sum_1^k \mu_j, \sum_1^k \sigma_j^2)$. Now we set up notation to apply Proposition 2.4: we define, for each $i$,
$$X_i \equiv \xi_1 + \cdots + \xi_{i-1} + \eta_{i+1} + \cdots + \eta_k, \qquad Y_i \equiv \xi_i, \qquad W_i \equiv \eta_i.$$
By independence of the $2k$ random variables $\{\xi_i\}$ and $\{\eta_i\}$ it follows that $X_i$, $Y_i$, and $W_i$ are independent for each $i$. From the second bound of Proposition 2.4 it follows that
$$|Ef(X_i + Y_i) - Ef(X_i + W_i)| \le C_1 E|\xi_i|^3, \qquad 1 \le i \le k.$$
Also note that for $i = k$ the definitions yield $X_k + Y_k = S_k$, and for $i = 1$ they yield $X_1 + W_1 = T_k$. Each replacement of a $Y_i$ by a $W_i$ gives sums $X_i + Y_i$ and $X_i + W_i$ with one more normal random variable $\eta_i$, and taken together the $k$ replacements result in replacing all the non-Gaussian variables $\xi_i$ by the Gaussian random variables $\eta_i$ to get $T_k$. The total change in expected value is therefore bounded by a sum of third moment terms. Here are the details: since $X_j + W_j = X_{j-1} + Y_{j-1}$ for $j = 2, \ldots, k$,

(1) $|Ef(S_k) - Ef(T_k)| = |Ef(X_k + Y_k) - Ef(X_1 + W_1)| = \big| \sum_{j=1}^k (Ef(X_j + Y_j) - Ef(X_j + W_j)) \big| \le \sum_{j=1}^k |Ef(X_j + Y_j) - Ef(X_j + W_j)| \le C_1 \big( E|\xi_1|^3 + \cdots + E|\xi_k|^3 \big)$.

We will state the resulting theorem in terms of a triangular array of row-wise independent random variables $\{\xi_{n,i} : i = 1, \ldots, k_n,\ n \in N\}$ where $k_n$ is non-decreasing:

$\xi_{1,1}, \xi_{1,2}, \ldots, \xi_{1,k_1}$

$\xi_{2,1}, \xi_{2,2}, \ldots, \xi_{2,k_2}$
$\xi_{3,1}, \xi_{3,2}, \ldots, \xi_{3,k_3}$
$\ldots$

We assume that the random variables in each row are independent, but nothing is assumed about relationships between different rows. As we will see, this formulation is convenient for dealing with centering and scaling constants.

Theorem 2.1 (Basic triangular array CLT) Suppose that $\{\xi_{n,i} : i = 1, \ldots, k_n\}_{n=1}^\infty$ is a triangular array of row-wise independent random variables such that:
(i) $\sum_{i=1}^{k_n} E\xi_{n,i} \to \mu$ where $\mu \in R$ is finite.
(ii) $\sum_{i=1}^{k_n} Var(\xi_{n,i}) \to \sigma^2 < \infty$.
(iii) $\sum_{i=1}^{k_n} E|\xi_{n,i}|^3 \to 0$.
Then $\sum_{i=1}^{k_n} \xi_{n,i} \to_d N(\mu, \sigma^2)$.

Proof. Fix $f \in C^3(R)$. Application of the inequality (1) yields
$$\Big| Ef\Big( \sum_{i=1}^{k_n} \xi_{n,i} \Big) - Ef(T_n) \Big| \le C_1 \sum_{i=1}^{k_n} E|\xi_{n,i}|^3 \to 0$$
where $T_n \sim N(\mu_n, \sigma_n^2)$ with $\mu_n \equiv \sum_{i=1}^{k_n} E\xi_{n,i} \to \mu$ and $\sigma_n^2 \equiv \sum_{i=1}^{k_n} Var(\xi_{n,i}) \to \sigma^2$ by (i) and (ii). Since this implies that $T_n \to_d N(\mu, \sigma^2)$ (see Exercise yy), it follows that $Ef(\sum_{i=1}^{k_n} \xi_{n,i}) \to Ef(N(\mu, \sigma^2)) = Ef(\mu + \sigma Z)$ where $Z \sim N(0, 1)$, and this implies the claimed convergence in view of Proposition 2.2.

The basic central limit theorem for triangular arrays, Theorem 2.1, can be extended to cover sums of independent random variables without third moment hypotheses via truncation arguments. Our next result, the classical (Lindeberg) central limit theorem for independent identically distributed random variables with finite variances, is a good example of the technique.

Theorem 2.2 (Classical CLT) Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with $E(X_i) = 0$ and $E(X_i^2) = 1$. Then
$$n^{-1/2}(X_1 + \cdots + X_n) = \sqrt{n}(\overline{X}_n - 0) \to_d Z \sim N(0, 1).$$
In fact, for $f \in C^3(R)$,
$$|Ef(n^{1/2} \overline{X}_n) - Ef(Z)| \le C_1 E\Big\{ X_1^2 \Big( 1 \wedge \frac{|X_1|}{\sqrt{n}} \Big) \Big\} + \|f\|_{BL} \{2 + 2E|Z|\}\, E\{X_1^2 1_{[|X_1| > \sqrt{n}]}\}$$
where $C_1 \equiv (5 + 4E|Z|^3) C \approx 11.38\,C$ and $C \equiv \sup_x |f'''(x)|/6$.
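The numerical constant in these bounds depends only on $E|Z|^3$ for $Z \sim N(0, 1)$. As a sanity check, the following minimal sketch compares the closed form $4(2\pi)^{-1/2}$ with a midpoint-rule quadrature and evaluates the factor $5 + 4E|Z|^3$ that multiplies $\sup_x |f'''(x)|/6$. The function name, step count, and truncation point are illustrative assumptions, not part of the notes.

```python
import math


def abs_normal_third_moment(steps=200_000, upper=12.0):
    # Midpoint-rule approximation of E|Z|^3 = 2 * int_0^inf z^3 phi(z) dz,
    # truncating the integral at `upper` (the mass beyond 12 is negligible).
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += z**3 * math.exp(-0.5 * z * z)
    return 2.0 * total * h / math.sqrt(2.0 * math.pi)


closed_form = 4.0 / math.sqrt(2.0 * math.pi)  # exact value of E|Z|^3
numeric = abs_normal_third_moment()           # quadrature approximation
factor = 5.0 + 4.0 * closed_form              # the ratio (5 + 4 E|Z|^3)

print(closed_form, numeric, factor)
```

The two values of $E|Z|^3$ agree to many decimal places, which supports reading the constant as roughly $11.38$ times $\sup_x |f'''(x)|/6$.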

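Corollary (Berry-Esseen type bound) Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with $E(X_i) = 0$, $E(X_i^2) = 1$, and $E|X_i|^3 < \infty$. Then, for $f \in C^3(R)$,
$$|Ef(n^{1/2} \overline{X}_n) - Ef(Z)| \le K_f \frac{E|X_1|^3}{\sqrt{n}}$$
where $K_f \equiv C_1 + 2\|f\|_{BL}(1 + E|Z|)$.

Proof (of Theorem 2.2). The argument proceeds by applying Theorem 2.1 to the truncated and rescaled variables
$$\xi_{n,i} = n^{-1/2} X_i 1_{[|X_i| \le \sqrt{n}]}, \qquad i = 1, \ldots, n.$$
We compute $\mu_n \equiv \sum_{i=1}^n E\xi_{n,i} = nE\xi_{n,1} = -nE\{X_1 1_{[|X_1| > \sqrt{n}]}\}/\sqrt{n}$ since $E(X_1) = 0$, and this yields

(a) $|\mu_n| \le \sqrt{n}\, E\{|X_1| 1_{[|X_1| > \sqrt{n}]}\} \le E\{X_1^2 1_{[|X_1| > \sqrt{n}]}\} \to 0$

by the dominated convergence theorem. For the sum of variances we have
$$\sigma_n^2 \equiv \sum_{i=1}^n Var(\xi_{n,i}) = E\{X_1^2 1_{[|X_1| \le \sqrt{n}]}\} - n(E\xi_{n,1})^2 \to 1$$
since $E\xi_{n,1} = \mu_n/n = o(1/n)$ and by using the dominated convergence theorem again. In fact, we can also conclude that
$$|\sigma_n^2 - 1| \le E\{X_1^2 1_{[|X_1| > \sqrt{n}]}\} + n(E\xi_{n,1})^2 \le 2E\{X_1^2 1_{[|X_1| > \sqrt{n}]}\}$$
by (a) and Jensen's inequality. Finally the sum of third moments is controlled by
$$\sum_{i=1}^n E|\xi_{n,i}|^3 = n \cdot n^{-3/2} E\{|X_1|^3 1_{[|X_1| \le \sqrt{n}]}\} \le E\Big\{ X_1^2 \Big( 1 \wedge \frac{|X_1|}{\sqrt{n}} \Big) \Big\} \to 0$$
again by the dominated convergence theorem. In fact this argument shows that
$$\Big| Ef\Big( \sum_{i=1}^n \xi_{n,i} \Big) - Ef(T_n) \Big| \le C_1 E\Big\{ X_1^2 \Big( 1 \wedge \frac{|X_1|}{\sqrt{n}} \Big) \Big\}.$$
To conclude the proof we need to show that for $f \in C^3(R)$
$$\Big| Ef(n^{1/2} \overline{X}_n) - Ef\Big( \sum_{i=1}^n \xi_{n,i} \Big) \Big| \to 0.$$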

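But since $C^3(R) \subset BL(R)$, the Lipschitz inequality $|f(x) - f(y)| \le \|f\|_{BL}\{1 \wedge |x - y|\}$ yields
$$\Big| Ef(n^{1/2} \overline{X}_n) - Ef\Big( \sum_{i=1}^n \xi_{n,i} \Big) \Big| \le \|f\|_{BL}\, E\Big| n^{-1/2} \sum_{i=1}^n \big( X_i - X_i 1_{[|X_i| \le \sqrt{n}]} \big) \Big| \le \|f\|_{BL} \sqrt{n}\, E\{|X_1| 1_{[|X_1| > \sqrt{n}]}\} \le \|f\|_{BL}\, E\{X_1^2 1_{[|X_1| > \sqrt{n}]}\} \to 0.$$
This completes the proof of the first claim of the theorem. To finish the proof of the second claim, it remains to bound $|Ef(T_n) - Ef(Z)| = |Ef(\mu_n + \sigma_n Z) - Ef(Z)|$ where $T_n \sim N(\mu_n, \sigma_n^2)$ and $Z \sim N(0, 1)$. Again, for $f \in C^3(R)$ the same Lipschitz inequality yields
$$|Ef(\mu_n + \sigma_n Z) - Ef(Z)| \le \|f\|_{BL}\, E|\mu_n + (\sigma_n - 1)Z| \le \|f\|_{BL} \{|\mu_n| + |\sigma_n - 1|\, E|Z|\} \le \|f\|_{BL} \Big\{ E\{X_1^2 1_{[|X_1| > \sqrt{n}]}\} + E|Z| \frac{|1 - \sigma_n^2|}{1 + \sigma_n} \Big\} \le \|f\|_{BL} \{1 + 2E|Z|\}\, E\{X_1^2 1_{[|X_1| > \sqrt{n}]}\}$$
by (a). Collecting the bounds yields the second conclusion of the theorem.

To prove the direct half of the classical Lindeberg-Feller central limit theorem, we will use the following lemma.

Lemma 2.1 Suppose that $\Delta_n(\epsilon) \to 0$ for each fixed $\epsilon > 0$. Then there exists a sequence $\epsilon_n \to 0$ such that $\Delta_n(\epsilon_n) \to 0$.

Proof. For each positive integer $k$ there is an integer $n_k$ such that $\Delta_n(1/k) < 1/k$ for $n \ge n_k$. We may assume, without loss of generality, that $n_1 < n_2 < \cdots$. Set $\epsilon_n \equiv 1/2$ if $n < n_1$, and $\epsilon_n \equiv 1/k$ if $n_k \le n < n_{k+1}$. Then for $n \ge n_1$ it follows that $\epsilon_n = 1/k$ where $k$ satisfies $n_k \le n < n_{k+1}$. Note that $k \to \infty$ as $n \to \infty$, and for $n \ge n_1$, $\Delta_n(\epsilon_n) < 1/k \to 0$ as $n \to \infty$.

Our next theorem gives the forward half of the Lindeberg-Feller central limit theorem.

Theorem 2.3 (Lindeberg-Feller) Suppose that $\{X_{n,i} : 1 \le i \le n;\ n \in N\}$ is a triangular array of (row-wise independent) random variables with $E(X_{n,i}) = 0$ for all $i$ and $n \in N$ and $\sum_{i=1}^n E(X_{n,i}^2) = 1$. Then the following are equivalent:
(i) $\sum_{i=1}^n X_{n,i} \to_d Z \sim N(0, 1)$ and $\max_{i \le n} E(X_{n,i}^2) \to 0$;
(ii) $L_n(\epsilon) \equiv \sum_{i=1}^n E\{X_{n,i}^2 1_{[|X_{n,i}| > \epsilon]}\} \to 0$ for each $\epsilon > 0$.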
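To make the Lindeberg function $L_n(\epsilon)$ concrete, here is a small sketch for the normalized i.i.d. array $X_{n,i} = X_i/\sqrt{n}$, for which $L_n(\epsilon) = E\{X_1^2 1_{[|X_1| > \epsilon\sqrt{n}]}\}$. The three-point distribution below is a hypothetical example chosen only so that the mean is 0 and the variance is 1; the function name is likewise an assumption made for illustration.

```python
import math


def lindeberg_L(n, eps, values, probs):
    # For X_{n,i} = X_i / sqrt(n), with X_i i.i.d. taking `values` with `probs`
    # (assumed mean 0 and variance 1), the Lindeberg sum reduces to
    #   L_n(eps) = n * E{X_{n,1}^2 1[|X_{n,1}| > eps]}
    #            = E{X_1^2 1[|X_1| > eps * sqrt(n)]}.
    cut = eps * math.sqrt(n)
    return sum(p * x * x for x, p in zip(values, probs) if abs(x) > cut)


# Hypothetical example distribution: mean 0, variance 4 * 0.25 = 1.
values = [-2.0, 0.0, 2.0]
probs = [0.125, 0.75, 0.125]

for n in (1, 4, 15, 16, 100):
    print(n, lindeberg_L(n, 0.5, values, probs))
```

For a bounded variable such as this one, $L_n(\epsilon)$ is exactly zero as soon as $\epsilon\sqrt{n}$ reaches the bound, so condition (ii) holds trivially; for unbounded variables with finite variance the same quantity tends to zero by dominated convergence.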

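Proof. Here we show that the Lindeberg condition (ii) implies (i). By (ii) it follows that $\Delta_n(\epsilon) \equiv L_n(\epsilon)/\epsilon^2 \to 0$ for each $\epsilon > 0$. By Lemma 2.1 we can find $\epsilon_n \to 0$ slowly enough that $\Delta_n(\epsilon_n) \to 0$. Now we truncate the $X_{n,i}$'s at $\epsilon_n$: define a new triangular array $\{\xi_{n,i}\}$ by $\xi_{n,i} = X_{n,i} 1_{[|X_{n,i}| \le \epsilon_n]}$. Note that
$$P(\xi_{n,i} \ne X_{n,i} \text{ for some } i \le n) \le \sum_{i=1}^n P(|X_{n,i}| > \epsilon_n) \le L_n(\epsilon_n)/\epsilon_n^2 \to 0.$$
Thus it suffices to show that $\sum_{i=1}^n \xi_{n,i} \to_d Z$. To do this we use Theorem 2.1. Since the $X_{n,i}$ have mean zero,
$$\Big| \sum_{i=1}^n E(\xi_{n,i}) \Big| = \Big| \sum_{i=1}^n E\{X_{n,i} 1_{[|X_{n,i}| > \epsilon_n]}\} \Big| \le L_n(\epsilon_n)/\epsilon_n = \epsilon_n L_n(\epsilon_n)/\epsilon_n^2 \to 0.$$
Furthermore,
$$\sum_{i=1}^n Var(\xi_{n,i}) = \sum_{i=1}^n E\{X_{n,i}^2 1_{[|X_{n,i}| \le \epsilon_n]}\} - \sum_{i=1}^n \big( E\{X_{n,i} 1_{[|X_{n,i}| > \epsilon_n]}\} \big)^2 = \sum_{i=1}^n E(X_{n,i}^2) - L_n(\epsilon_n) - o(1) \to 1.$$
For the third moments we compute
$$\sum_{i=1}^n E|\xi_{n,i}|^3 \le \epsilon_n \sum_{i=1}^n E(X_{n,i}^2) = \epsilon_n \to 0.$$
Thus the hypotheses of Theorem 2.1 hold (with $\mu = 0$ and $\sigma^2 = 1$) and we conclude that $\sum_{i=1}^n \xi_{n,i} \to_d Z$. To complete the proof that (ii) implies (i) we need to show that $\max_{i \le n} E(X_{n,i}^2) \to 0$. But
$$E(X_{n,i}^2) = E(X_{n,i}^2 1_{[|X_{n,i}| \le \epsilon_n]}) + E(X_{n,i}^2 1_{[|X_{n,i}| > \epsilon_n]}) \le \epsilon_n^2 + L_n(\epsilon_n),$$
and hence $\max_{i \le n} E(X_{n,i}^2) \le \epsilon_n^2 + L_n(\epsilon_n) \to 0$. We will prove that (i) implies (ii) in Chapter 3(?).

A Converse CLT

Proposition 2.5 (Converse CLT) Suppose that $X_1, \ldots, X_n$ are i.i.d., and let $S_n \equiv n^{-1/2} \sum_{i=1}^n X_i$. If $S_n = O_p(1)$, then $E(X_1^2) < \infty$ and $E(X_1) = 0$.

Our proof of Proposition 2.5 will rely on the following three lemmas.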

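Lemma 2.2 (Symmetrization) For independent, identically distributed rv's $X_1, \ldots, X_n$ and $\epsilon_1, \ldots, \epsilon_n$ i.i.d. Rademacher rv's independent of the $X_i$'s,

(2) $P\big( |n^{-1/2} \sum_{i=1}^n \epsilon_i X_i| > 2t \big) \le 2 \sup_{k \le n} P\big( |n^{-1/2} \sum_{i=1}^k X_i| > t \big)$.

Proof. Since $\sum_{i=1}^n \epsilon_i X_i = \sum_{i : \epsilon_i = 1} X_i - \sum_{i : \epsilon_i = -1} X_i$, conditioning on the Rademacher variables shows that
$$P\Big( \Big| n^{-1/2} \sum_{i=1}^n \epsilon_i X_i \Big| > 2t \Big) \le E_\epsilon P_X\Big( \Big| n^{-1/2} \sum_{i : \epsilon_i = 1} X_i \Big| > t \Big) + E_\epsilon P_X\Big( \Big| n^{-1/2} \sum_{i : \epsilon_i = -1} X_i \Big| > t \Big) \le 2 \sup_{k \le n} P\Big( \Big| n^{-1/2} \sum_{i=1}^k X_i \Big| > t \Big),$$
where the last step uses the fact that, for fixed signs, a sum of $k$ of the $X_i$'s has the same distribution as $\sum_{i=1}^k X_i$; i.e. (2) holds.

Lemma 2.3 (Khinchine's inequalities) There exist constants $A_p$, $B_p$ such that, for $a = (a_1, \ldots, a_n) \in R^n$ and $p \ge 1$,
$$A_p \Big\{ \sum_{i=1}^n a_i^2 \Big\}^{p/2} \le E\Big| \sum_{i=1}^n a_i \epsilon_i \Big|^p \le B_p \Big\{ \sum_{i=1}^n a_i^2 \Big\}^{p/2}.$$
Recall that we proved this for $p = 1$ and found that $A_1 = 1/\sqrt{3}$ and $B_1 = 1$ work.

Lemma 2.4 (Paley-Zygmund inequality) Suppose that $Y$ is a non-negative random variable with mean $EY$ and second moment $E(Y^2) = \|Y\|_2^2$. Then

(3) $P(Y > t) \ge \Big( \dfrac{(EY - t)^+}{\|Y\|_2} \Big)^2$.

Proof. $E(Y) = E(Y 1_{[Y \le t]}) + E(Y 1_{[Y > t]}) \le t + \sqrt{E(Y^2) P(Y > t)}$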

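by the Cauchy-Schwarz inequality. Rearranging this inequality yields (3).

Proof. (Proposition 2.5) The following proof is from Giné and Zinn (1994). Since $|n^{-1/2} \sum_{i=1}^k X_i| = (k/n)^{1/2} |S_k| \le |S_k|$ for $k \le n$, Lemma 2.2 yields
$$\sup_n P\Big( \Big| n^{-1/2} \sum_{i=1}^n \epsilon_i X_i \Big| > 2t \Big) \le 2 \sup_n P(|S_n| > t).$$
Thus tightness of $\{S_n\}$ implies that $\{n^{-1/2} \sum_{i=1}^n \epsilon_i X_i\}$ is tight. By Khinchine's inequality (Lemma 2.3) with $p = 1$, regarding the $X_i$'s as fixed (conditioning on the $X_i$'s), we find that
$$E_\epsilon \Big| n^{-1/2} \sum_{i=1}^n \epsilon_i X_i \Big| \ge A_1 \Big( n^{-1} \sum_{i=1}^n X_i^2 \Big)^{1/2} \equiv c[S_n],$$
where $[S_n] \equiv (n^{-1} \sum_{i=1}^n X_i^2)^{1/2}$ and $c \equiv A_1$. Thus by the Paley-Zygmund inequality (Lemma 2.4) applied with $Y = |n^{-1/2} \sum_{i=1}^n \epsilon_i X_i|$ and the $X_i$'s held fixed (conditioning on the $X_i$'s), and using $E_\epsilon(Y^2) = [S_n]^2$,
$$P_\epsilon\Big( \Big| n^{-1/2} \sum_{i=1}^n \epsilon_i X_i \Big| > t \Big) \ge \Big( \frac{(E_\epsilon Y - t)^+}{(E_\epsilon(Y^2))^{1/2}} \Big)^2 \ge \Big( \frac{(c[S_n] - t)^+}{[S_n]} \Big)^2 = c^2 \Big( \Big( 1 - \frac{t}{c[S_n]} \Big)^+ \Big)^2 \ge \frac{c^2}{4} 1_{[[S_n] > 2t/c]}.$$
Taking expectations across this inequality with respect to the $X_i$'s yields
$$P\Big( \Big| n^{-1/2} \sum_{i=1}^n \epsilon_i X_i \Big| > t \Big) \ge \frac{c^2}{4} P([S_n] > 2t/c).$$
It follows that the sequence $\{[S_n]\}$ is tight. Now for fixed $M \in (0, \infty)$
$$n^{-1} \sum_{i=1}^n X_i^2 1_{[X_i^2 \le M]} \to_{a.s.} E(X_1^2 1_{[X_1^2 \le M]}) \quad \text{as } n \to \infty.$$
Thus in particular this convergence holds in probability and in distribution. Therefore, by the portmanteau theorem (the liminf statement for open sets, part (vii) of Theorem 1.1),
$$1_{[E(X_1^2 1_{[X_1^2 \le M]}) > t]} \le \liminf_n P\Big( n^{-1} \sum_{i=1}^n X_i^2 1_{[X_i^2 \le M]} > t \Big) \le \sup_n P\Big( n^{-1} \sum_{i=1}^n X_i^2 1_{[X_i^2 \le M]} > t \Big),$$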

so it follows that
$$\sup_{M > 0} 1_{[E(X_1^2 1_{[X_1^2 \le M]}) > t]} \le \sup_{M > 0} \sup_n P\Big( n^{-1} \sum_{i=1}^n X_i^2 1_{[X_i^2 \le M]} > t \Big) \le \sup_n P\Big( n^{-1} \sum_{i=1}^n X_i^2 > t \Big) = \sup_n P([S_n]^2 > t).$$
By the tightness of $\{[S_n]\}$, we can make the right side of the last display as small as we please; in particular there exists a number $t_0 < \infty$ such that the right side is less than $1/2$. But this implies that for this $t_0$ the indicator on the left side of the inequality must be zero, uniformly in $M$; i.e. $\sup_{M > 0} E(X_1^2 1_{[X_1^2 \le M]}) \le t_0$. But the last supremum is just $E(X_1^2)$, and hence we have $E(X_1^2) \le t_0 < \infty$.

To complete the proof, note that $E(X_1^2) < \infty$ implies that $E|X_1| < \infty$, and hence by the strong law of large numbers we have $n^{-1} \sum_{i=1}^n X_i \to_{a.s.} E(X_1)$. But the hypothesis $n^{-1/2} \sum_{i=1}^n X_i = O_p(1)$ implies that $n^{-1} \sum_{i=1}^n X_i \to_p 0$. Combining these two displays yields $E(X_1) = 0$.

Giné and Zinn (1994) use similar methods to establish the corresponding theorem for U-statistics.

Theorem. (Giné and Zinn, 1994). If the sequence $\{n^{m/2} U_n(h)\}_{n=1}^\infty$ is tight (stochastically bounded), then $Eh^2(X_1, \ldots, X_m) < \infty$ and $Eh(X_1, x_2, \ldots, x_m) = 0$ for almost every $(x_2, \ldots, x_m) \in \mathcal{X}^{m-1}$.

Reference: Giné, E. and Zinn, J. (1994). A remark on convergence in distribution of U-statistics. Ann. Probability 22.

Weak convergence in R^k

The next step is to extend the results for $M = R$ to $M = R^k$. We first state a set of equivalences for $\to_d$ in $R^k$.

Proposition 2.6 Suppose that $\{X_n\}$, $X$ are random vectors with values in $R^k$, and let $F_n(x) \equiv P(X_n \le x)$ and $F(x) \equiv P(X \le x)$ for $x \in R^k$. Then the following are equivalent:
(i) $F_n(x) = P(X_n \le x) \to P(X \le x) = F(x)$ for all $x \in C_F \equiv \{y \in R^k : F \text{ is continuous at } y\}$.
(ii) $X_n \to_d X$; i.e. $Ef(X_n) \to Ef(X)$ for all $f \in C_b(R^k)$.
(iii) $Ef(X_n) \to Ef(X)$ for all $f \in C^\infty(R^k)$.
(iv) $E\exp(it'X_n) \to E\exp(it'X)$ for all $t \in R^k$.

In Proposition 2.6 the equivalence of (ii) and (iii) depends on the equivalence of (i) and (iii) in Theorem 1.1 and then a generalization of Proposition 2.1 to $R^k$; see Exercise 6.6. The replacement techniques of Lindeberg can be extended in a straightforward way to random vectors; see Exercises 6.7 and 6.7 for the start of this. One concrete result in this direction is the following central limit theorem for sums of independent random vectors.

Theorem 2.4 (Classical multivariate CLT) Suppose that $X_1, \ldots, X_n$ are i.i.d. random vectors in $R^k$ with $E(X_1) = \mu$ and $E(\|X_1\|^2) < \infty$. Then
$$n^{-1/2}(X_1 + \cdots + X_n - n\mu) = \sqrt{n}(\overline{X}_n - \mu) \to_d Y \sim N_k(0, \Sigma)$$
where $\Sigma = E[(X_1 - \mu)(X_1 - \mu)^T] = (Cov(X_{1j}, X_{1j'}))_{j,j'=1}^k$.

On the other hand, the usual approach to deriving limit theorems of this type is via the result of Cramér and Wold (1936) characterizing convergence in distribution of random vectors in terms of the convergence of linear combinations in $R$.

Proposition 2.7 (Cramér-Wold device) Let $\{X_n\}$, $X$ be random vectors in $R^k$. Then $X_n \to_d X$ in $R^k$ if and only if $a'X_n \to_d a'X$ in $R$ for each $a \in R^k$.

Proof. Suppose that $X_n \to_d X$ in $R^k$ and let $a \in R^k$. Then $g(x) = a'x$ is a continuous function on $R^k$ and hence by the continuous mapping theorem $a'X_n = g(X_n) \to_d g(X) = a'X$. To prove the reverse implication we use part (iv) of Proposition 2.6. Suppose that $a'X_n \to_d a'X$ for every $a \in R^k$. Then by part (v) of Proposition 2.2 it follows that $E\exp(it(a'X_n)) \to E\exp(it(a'X))$ for all $t \in R$, and this holds for every $a \in R^k$. In particular, when $t = 1$ we have
$$\phi_{X_n}(a) = E\exp(ia'X_n) \to E\exp(ia'X) = \phi_X(a)$$
for every $a \in R^k$. But then by (iv) of Proposition 2.6 this implies that $X_n \to_d X$ in $R^k$.

Walther (1997) gives a proof of the result of Cramér and Wold without use of characteristic functions, and notes that related results were established by Radon (1917).

3 Tightness and subsequences

It is often useful to argue using subsequences in arguments involving convergence in distribution. The following basic proposition gives a starting point for our discussion:

Proposition 3.1 If $\{P_n\}$ and $P$ are distributions (probability measures) on $(M, \mathcal{M})$ such that for every subsequence $\{P_{n'}\}$ with $\{n'\} \subset N$ there is a further subsequence $\{P_{n''}\}$ such that $P_{n''} \to_d P$, then $P_n \Rightarrow P$.

Proof. Suppose not. Then for some $f \in C_b(M)$ we have $P_n f \not\to P f$. Thus for some $\epsilon > 0$ and subsequence $\{n'\}$ it follows that $|P_{n'} f - P f| > \epsilon$ for all $n' \in \{n'\}$. But then there is no further subsequence $\{n''\}$ for which $P_{n''} f \to P f$, contradicting the hypothesis.

To be able to extract convergent subsequences in general requires some appropriate notion of compactness. Here the right idea is to rule out escape of mass. On the real line this escape is possible only toward $\pm\infty$, but in more complicated spaces it can happen in many ways. The following definitions are aimed at ruling out the escape of mass in quite general settings.

Definition 3.1 (Tightness) A probability measure $P$ on $(M, \mathcal{M})$ is said to be tight if for each $\epsilon > 0$ there exists a compact set $K = K_\epsilon$ such that $P(K_\epsilon) > 1 - \epsilon$.

The basic result concerning tightness of individual measures $P$ is due to Ulam.

Theorem 3.1 (Ulam's theorem) If $M$ is separable and complete, then each $P$ on $(M, \mathcal{M})$ is tight.

Proof. Let $\epsilon > 0$. By the separability of $M$, for each $m \ge 1$ there is a sequence $A_{m1}, A_{m2}, \ldots$ of open $1/m$ spheres covering $M$. Choose $i_m$ so that $P(\bigcup_{i \le i_m} A_{mi}) > 1 - \epsilon/2^m$. Now the set $B \equiv \bigcap_{m=1}^\infty \bigcup_{i \le i_m} A_{mi}$ is totally bounded in $M$: for each $\delta > 0$ it has a finite $\delta$-net (i.e. a set of points $\{x_k\}$ with $d(x, x_k) < \delta$ for some $x_k$ for each $x \in B$). By completeness of $M$, $\overline{B}$ is complete and $\overline{B} \equiv K$ is compact. Since
$$P(K^c) = P(\overline{B}^c) \le P(B^c) \le \sum_{m=1}^\infty P\Big\{ \Big( \bigcup_{i \le i_m} A_{mi} \Big)^c \Big\} < \sum_{m=1}^\infty \frac{\epsilon}{2^m} = \epsilon,$$
the conclusion follows.

Definition 3.2 (Uniform tightness) If $\mathcal{P}$ is a set of probability measures on a metric space $(M, d)$, then $\mathcal{P}$ is called uniformly tight if and only if for every $\epsilon > 0$ there is a compact set $K \subset M$ such that $P(K) > 1 - \epsilon$ for all $P \in \mathcal{P}$.

In the case of a sequence of measures $\{P_n\}$ it is convenient to relax the requirement in Definition 3.2 slightly.

Definition 3.3 (Asymptotic tightness (of a sequence)) If $\{P_n\}$ is a sequence of probability measures on $(M, d)$, then $\{P_n\}$ is called asymptotically tight if and only if for every $\epsilon > 0$ there is a compact set $K = K_\epsilon$ such that $\limsup_n P_n(G^c) < \epsilon$ for every open set $G$ containing $K_\epsilon$.

The main result for an asymptotically tight sequence is the following theorem due to Prohorov (1956) and Le Cam (1957).

Theorem 3.2 (Prohorov, 1956; Le Cam, 1957) Suppose that $\{P_n\}$ on $(M, \mathcal{M})$ is asymptotically tight. Then there exists a subsequence $\{P_{n'}\}$ that satisfies $P_{n'} \to_d$ (some) $P$ where $P$ is tight.

Pollard (2001) relaxes the definition of uniform tightness for a sequence still further, and proves the same result for arbitrary metric spaces. The proof of the Prohorov-Le Cam theorem 3.2 depends on the following auxiliary results. The first of these gives a correspondence between tight measures and tight linear functionals.

Theorem 3.3 (Correspondence theorem) A linear functional $T : BL(M)^+ \to R^+$ with $T1 = 1$ defines a tight probability measure if and only if it is functionally tight: i.e. for each $\epsilon > 0$ there exists a compact set $K_\epsilon$ such that $T(l) < \epsilon$ for every $l \in BL(M)^+$ for which $l \le 1_{K_\epsilon^c}$.

Up to inconsequential constant multiples, asymptotic tightness is equivalent to: for each $\epsilon > 0$ there exists $K_\epsilon$ such that $\limsup_n P_n l < 2\epsilon$ for every $l \in BL(M)^+$ with $0 \le l \le 1_{K_\epsilon^c}$. To see that asymptotic tightness implies this, note that for such a function $l$, the set $G_\epsilon = \{l < \epsilon\}$ is open and $G_\epsilon \supset K_\epsilon$. Then $P_n(l) \le \epsilon + P_n(G_\epsilon^c) < 2\epsilon$ eventually.

The second analytic result we will use is:

Proposition 3.2 (Continuous partition of unity) For each $\delta > 0$, $\epsilon > 0$, and each compact set $K$, there exists a finite collection $\mathcal{G} = \{g_0, g_1, \ldots, g_k\} \subset BL(M)^+$ such that:
(i) $g_0(x) + g_1(x) + \cdots + g_k(x) = 1$ for each $x \in M$;
(ii) $diam[g_i > 0] \le \delta$ for $i \ge 1$, where $diam(A) \equiv \sup\{d(x, y) : x, y \in A\}$;
(iii) $g_0 < \epsilon$ on $K$.

Proof. Let $x_1, \ldots, x_k$ be the centers of open balls of radius $\delta/4$ whose union covers $K$. Define functions $f_0 \equiv \epsilon/2$, $f_i(x) = (1 - 2d(x, x_i)/\delta)^+$ for $i \ge 1$, so that $f_j \in BL(M)^+$ for $j = 0, \ldots, k$. Also note that $f_i(x) = 0$ if $d(x, x_i) \ge \delta/2$. Thus the set $\{f_i > 0\}$ has diameter at most $\delta$ for $i \ge 1$. The function $F(x) = \sum_{i=0}^k f_i(x)$ is everywhere at least $\epsilon/2$ and is in $BL(M)^+$. The non-negative functions $g_i \equiv f_i/F$ are bounded by 1 and satisfy a Lipschitz condition:
$$|g_i(x) - g_i(y)| = \frac{|F(y) f_i(x) - F(x) f_i(y)|}{F(x) F(y)} \le \frac{|f_i(x) - f_i(y)|}{F(x)} + \frac{|F(y) - F(x)|\, f_i(y)}{F(x) F(y)} \le \frac{\|f_i\|_{BL}\, d(x, y)}{\epsilon/2} + \frac{\|F\|_{BL}\, d(x, y)}{\epsilon/2}.$$
For each $x \in K$, there is an $i$ for which $d(x, x_i) < \delta/4$. For this $i$, $f_i(x) > 1/2$ and $g_0(x) \le f_0(x)/f_i(x) < (\epsilon/2)/(1/2) = \epsilon$. Thus the functions $g_i$ satisfy (i) - (iii).
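The construction in this proof can be carried out explicitly on the real line. The sketch below is only an illustration under assumed choices that are not part of the notes: $M = R$ with $d(x, y) = |x - y|$, $K = [0, 1]$, $\delta = 0.4$, $\epsilon = 0.1$, and centers $0, 0.1, \ldots, 1$ (so that the open balls of radius $\delta/4 = 0.1$ cover $K$). It builds $f_0 \equiv \epsilon/2$, $f_i(x) = (1 - 2|x - x_i|/\delta)^+$ and $g_i = f_i/F$, so that properties (i) - (iii) can be checked on a grid.

```python
def partition_of_unity(centers, delta, eps):
    # Returns [g_0, g_1, ..., g_k] on the real line with d(x, y) = |x - y|,
    # using f_0 = eps/2, f_i(x) = (1 - 2|x - x_i|/delta)^+ and g_i = f_i / F.
    def f(i, x):
        if i == 0:
            return eps / 2.0
        return max(0.0, 1.0 - 2.0 * abs(x - centers[i - 1]) / delta)

    def g(i, x):
        total = sum(f(j, x) for j in range(len(centers) + 1))  # F(x) >= eps/2
        return f(i, x) / total

    return [lambda x, i=i: g(i, x) for i in range(len(centers) + 1)]


# Hypothetical example: K = [0, 1], delta = 0.4, eps = 0.1.
delta, eps = 0.4, 0.1
centers = [0.1 * j for j in range(11)]  # balls of radius delta/4 cover K
gs = partition_of_unity(centers, delta, eps)
grid = [j / 1000.0 for j in range(-500, 1501)]

# (i) the g_i sum to one; (iii) g_0 is small on K.
print(max(abs(sum(g(x) for g in gs) - 1.0) for x in grid))
print(max(gs[0](x) for x in grid if 0.0 <= x <= 1.0))
```

Away from $K$ every $f_i$ with $i \ge 1$ vanishes, so $g_0 = 1$ there; this is how the partition keeps track of mass that escapes the compact set.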

Proof. (Prohorov-Le Cam theorem). Write $K_i$ for the compact set corresponding to $\epsilon = 1/i$, $i \ge 1$. Write $\mathcal{G}_i$ for the finite collection of functions in $BL(M)^+$ constructed in Proposition 3.2 with $\delta = \epsilon = 1/i$ and $K = K_i$. The collection $\mathcal{G} \equiv \bigcup_{i \in N} \mathcal{G}_i$ is countable. For each $g \in \mathcal{G}$ the sequence of real numbers $P_n g$ is bounded, so it has a convergent subsequence. Via Cantor's diagonalization argument we can construct a single subsequence $N_1 \subset N$ for which $\lim_{n \in N_1} P_n g$ exists for every $g \in \mathcal{G}$.

The approximation properties of $\mathcal{G}$ will allow us to show that $T(l) \equiv \lim_{n \in N_1} P_n l$ exists for every $l \in BL(M)^+$. Without loss of generality, suppose that $\|l\|_{BL} \le 1$. Given $\epsilon > 0$, choose an $i > 1/\epsilon$, then write $\mathcal{G}_i = \{g_0, g_1, \ldots, g_k\}$ for the finite collection guaranteed by Proposition 3.2. The open set $G_i = \{g_0 < \epsilon\}$ contains $K_i$, which implies that $\limsup_n P_n G_i^c < \epsilon$. For each $1 \le j \le k = k(i)$, let $x_j$ be any point at which $g_j(x_j) > 0$. If $x$ is any other point with $g_j(x) > 0$, then $|l(x) - l(x_j)| \le d(x, x_j) \le \epsilon$. It follows that for every $x \in M$
$$\Big| l(x) - \sum_{j=1}^k l(x_j) g_j(x) \Big| \le l(x) g_0(x) + \sum_{j=1}^k |l(x) - l(x_j)|\, g_j(x) \le (\epsilon + 1_{G_i^c}(x)) + \epsilon,$$
and this integrates to give
$$\Big| P_n l - \sum_{j=1}^k l(x_j) P_n(g_j) \Big| \le P_n G_i^c + 2\epsilon.$$
Since $\lim_{n \in N_1} P_n g_j$ exists, it follows that
$$\limsup_{n \in N_1} P_n l - \liminf_{n \in N_1} P_n l \le 6\epsilon.$$
This shows that $T(l) \equiv \lim_{n \in N_1} P_n l$ exists for each $l \in BL(M)^+$. Note that $T(1) = 1$ easily, and $T$ inherits functional tightness from asymptotic tightness of $\{P_n\}$. From the correspondence Theorem 3.3 the functionally tight linear functional $T$ corresponds to a tight probability measure $P$ to which $\{P_n : n \in N_1\}$ converges weakly.

Definition 3.4 (Relative compactness) Let $\mathcal{P}$ be a set of probability measures on $(M, \mathcal{M})$. We say that $\mathcal{P}$ is relatively compact if every sequence $\{P_n\} \subset \mathcal{P}$ contains a weakly convergent subsequence. Thus every $\{P_n\} \subset \mathcal{P}$ contains a subsequence $\{P_{n'}\}$ with $P_{n'} \to_d$ some $Q$ (not necessarily in $\mathcal{P}$).

Proposition 3.3 Let $(M, d)$ be a separable metric space.
(i) (Le Cam). If $P_n \to_d P$, then $\{P_n\}$ is uniformly tight.
(ii) If $P_n \to_d P$, then $\{P_n\}$ is relatively compact.
(iii) If $\{P_n\}$ is relatively compact and the set of limit points is just the single point $P$, then $P_n \Rightarrow P$.

Theorem 3.4 (Prohorov's theorem) Let $\mathcal{P}$ be a collection of probability measures on $(M, \mathcal{M})$.
(i) If $\mathcal{P}$ is uniformly tight, then it is relatively compact.
(ii) Suppose that $(M, d)$ is separable and complete. If $\mathcal{P}$ is relatively compact, then it is uniformly tight.

4 Metrizing weak convergence

The Lévy metric on distribution functions defined in Proposition 2.3 extends in a nice way to give a metric for $\to_d$ more generally. For any set $B \subset M$ and $\epsilon > 0$ define
$$B^\epsilon \equiv \{y \in M : d(x, y) < \epsilon \text{ for some } x \in B\}.$$

Definition 4.1 (Prohorov metric) For $P$, $Q$ two probability measures on $(M, \mathcal{M})$, the Prohorov distance $\rho(P, Q)$ between $P$ and $Q$ is defined by
$$\rho(P, Q) \equiv \inf\{\epsilon > 0 : P(B) \le Q(B^\epsilon) + \epsilon \text{ for all } B \in \mathcal{M}\}.$$

Another very useful metric on $\mathcal{P}$ is defined in terms of the bounded Lipschitz functions $BL(M)$ defined in Section 1.

Definition 4.2 (Bounded Lipschitz metric) For $P$, $Q$ two probability measures on $(M, \mathcal{M})$, the bounded Lipschitz distance $\beta(P, Q)$ between $P$ and $Q$ is defined by
$$\beta(P, Q) \equiv \sup\Big\{ \Big| \int f\,dP - \int f\,dQ \Big| : \|f\|_{BL} \le 1 \Big\}.$$

Proposition 4.1 Both $\rho$ and $\beta$ are metrics on $\mathcal{P} \equiv \{$all probability measures on $(M, \mathcal{M})\}$.

Proof. See Exercise 6.10.

The following theorem says that both $\rho$ and $\beta$ metrize $\to_d$ just as the Lévy metric metrized convergence of distribution functions on $R$.

Theorem 4.1 For any separable metric space $(M, d)$ and Borel probability measures $\{P_n\}$, $P$ on $(M, \mathcal{M})$ the following are equivalent:
(i) $P_n \to_d P$.
(ii) $\int f\,dP_n \to \int f\,dP$ for all $f \in BL(M)$.
(iii) $\beta(P_n, P) \to 0$.
(iv) $\rho(P_n, P) \to 0$.

Proof. We prove the result under the additional assumption that $M$ is complete. The equivalence of (i) and (ii) has been proved in Theorem 1.1. Now we show that (ii) implies (iii): by Ulam's Theorem 3.1, for any $\epsilon > 0$ we can choose $K$ compact so that $P(K) > 1 - \epsilon$. Now the set of functions $E = \{f \in BL(M) : \|f\|_{BL} \le 1\}$ restricted to $K$ forms a compact set of functions for the supremum norm on $K$ (by the Arzelà-Ascoli theorem; see e.g. Billingsley (1968) page 22). Thus for some finite $k$ there are $f_1, \ldots, f_k \in BL(M)$ such that for any $f \in E$ there is an $f_j$ with $\sup_{x \in K} |f(x) - f_j(x)| \le \epsilon$. Then, since $f$ and $f_j$ are Lipschitz,
$$\sup_{x \in K^\epsilon} |f(x) - f_j(x)| \le 3\epsilon.$$
Let $g(x) \equiv \max\{0, (1 - d(x, K)/\epsilon)\}$; then $g \in BL(M)$ and $1_K \le g \le 1_{K^\epsilon}$. For $n$ sufficiently large we have
$$P_n(K^\epsilon) \ge \int g\,dP_n > 1 - 2\epsilon,$$

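and hence for any $f \in E$

$$\begin{aligned}
\left| \int f\,dP_n - \int f\,dP \right|
&= \left| \int (f - f_j)\,d(P_n - P) + \int f_j\,d(P_n - P) \right| \\
&\le \int |f - f_j|\,dP_n + \int |f - f_j|\,dP + \left| \int f_j\,d(P_n - P) \right| \\
&\le 3\epsilon + 2 \cdot 2\epsilon + 2\epsilon + 2\epsilon + \left| \int f_j\,d(P_n - P) \right| \\
&\le 7\epsilon + 4\epsilon + \epsilon = 12\epsilon
\end{aligned}$$

by choosing $n$ large. Hence (iii) holds.

Now we show that (iii) implies (iv): given a Borel set $B$ and $\epsilon > 0$, let $f_\epsilon(x) \equiv \max\{0, (1 - d(x, B)/\epsilon)\}$. Then $f_\epsilon \in BL(M)$, $\|f_\epsilon\|_{BL} \le 2 \vee \epsilon^{-1}$, and $1_B \le f_\epsilon \le 1_{B^\epsilon}$. Therefore, for any $P$ and $Q$ on $M$ we have

$$Q(B) \le \int f_\epsilon\,dQ \le \int f_\epsilon\,dP + (2 \vee \epsilon^{-1})\beta(P, Q) \le P(B^\epsilon) + (2 \vee \epsilon^{-1})\beta(P, Q),$$

and it follows that

$$\rho(P, Q) \le \max\{\epsilon, (2 \vee \epsilon^{-1})\beta(P, Q)\}.$$

Hence if $\beta(P, Q) \le \epsilon^2$ with $\epsilon \le 1$, then

$$\rho(P, Q) \le \max\{\epsilon, (2 \vee \epsilon^{-1})\epsilon^2\} = \max\{2\epsilon^2, \epsilon\} \le \epsilon(1 + 2\epsilon) \le 3\epsilon.$$

Hence for all $P$, $Q$ we have $\rho(P, Q) \le 3\sqrt{\beta(P, Q)}$. Thus (iii) implies (iv). [It can also be shown that $c\beta(P, Q) \le \rho(P, Q)$ for some $c > 0$; see e.g. Dudley (1976), page 8.6.]

Finally we show that (iv) implies (i): Suppose that (iv) holds, let $B$ be a $P$-continuity set, and let $\epsilon > 0$. Then for $0 < \delta < \epsilon$ small, $P(B^\delta \setminus B) < \epsilon$ and $P((B^c)^\delta \setminus B^c) < \epsilon$. Then, for all $n$ large enough that $\rho(P_n, P) < \delta$,

$$P_n(B) \le P(B^\delta) + \delta \le P(B) + 2\epsilon$$

and

$$P_n(B^c) \le P((B^c)^\delta) + \delta \le P(B^c) + 2\epsilon;$$

combining these yields $|P_n(B) - P(B)| \le 2\epsilon$ and hence $P_n(B) \to P(B)$. By the portmanteau theorem, Theorem 1.1, this yields (i).

More Metrics on $\mathcal{P}$

There are other useful metrics on $\mathcal{P}$ that metrize topologies other than weak convergence. It is frequently useful to relate these to the Prohorov and bounded Lipschitz metrics $\rho$ and $\beta$ introduced earlier in this section.

Definition 4.3 For probability measures $P$, $Q$ on $(M, \mathcal{M})$, the total variation distance from $P$ to $Q$ is defined by

$$d_{TV}(P, Q) \equiv \sup\{ |P(A) - Q(A)| : A \in \mathcal{M} \}.$$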
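The remark that total variation induces a different topology from weak convergence can be seen on point masses. The sketch below relies on two standard facts that are not stated in the text, so treat them as assumptions: for $x \ne y$, $d_{TV}(\delta_x, \delta_y) = 1$, while $\beta(\delta_x, \delta_y) = \min\{1, d(x, y)\}$ under the norm $\|\cdot\|_{BL}$ of Section 1 (the upper bound comes from $|f(x) - f(y)| \le \|f\|_{BL}\{1 \wedge d(x, y)\}$, and the function below attains it). All function names are hypothetical.

```python
def tv_point_masses(x, y):
    """d_TV(delta_x, delta_y): 0 if the atoms coincide, 1 otherwise."""
    return 0.0 if x == y else 1.0

def bl_point_masses(x, y):
    """beta(delta_x, delta_y) = min(1, |x - y|) on the real line (assumed closed form)."""
    return min(1.0, abs(x - y))

def extremal_function(t, x):
    """f(t) = min(|t - x|, 1) - 1/2: Lipschitz constant 1 and sup norm 1/2,
    so max{K_1(f), 2 K_2(f)} = 1."""
    return min(abs(t - x), 1.0) - 0.5

# delta_{1/n} converges weakly to delta_0, but not in total variation.
for n in (1, 10, 100, 1000):
    x_n = 1.0 / n
    gap = abs(extremal_function(x_n, 0.0) - extremal_function(0.0, 0.0))
    print(n, bl_point_masses(x_n, 0.0), gap, tv_point_masses(x_n, 0.0))
```

The second and third printed columns agree and shrink to 0, while the last column stays at 1 for every $n$.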

Proposition 4.2 The total variation distance $d_{TV}(P, Q)$ is given by

$$d_{TV}(P, Q) = \frac{1}{2} \int |p - q|\,d\mu = 1 - \int (p \wedge q)\,d\mu,$$

where $p = dP/d\mu$, $q = dQ/d\mu$, and $\mu$ is any measure dominating both $P$ and $Q$ (e.g. $P + Q$).

Proof. See Exercise 6.

Definition 4.4 The Hellinger distance $H(P, Q)$ is defined by

$$H^2(P, Q) \equiv \frac{1}{2} \int \{\sqrt{p} - \sqrt{q}\}^2\,d\mu = 1 - \int \sqrt{pq}\,d\mu,$$

where $p = dP/d\mu$, $q = dQ/d\mu$, and $\mu$ is any measure dominating both $P$ and $Q$. It is not hard to show (see Exercise 6.2) that $H(P, Q)$ does not depend on the choice of the dominating measure $\mu$.

Here is a theorem relating these metrics to each other and to the Prohorov and bounded Lipschitz metrics.

Theorem 4.2 For $P$, $Q$ probability measures on $(M, \mathcal{M})$ the following inequalities hold:
(i) $\frac{1}{2}\beta(P, Q) \le \rho(P, Q) \le 3\sqrt{\beta(P, Q)}$.
(ii) $H^2(P, Q) \le d_{TV}(P, Q) \le H(P, Q)\{2 - H^2(P, Q)\}^{1/2}$.
(iii) $\rho(P, Q) \le d_{TV}(P, Q)$.
For distribution functions $F$, $G$ on $R$ (or on $R^k$) we have:
(iv) $\lambda(F, G) \le \rho(F, G) \le d_{TV}(F, G)$.
(v) $\lambda(F, G) \le d_K(F, G) \le d_{TV}(F, G)$, where $d_K(F, G) \equiv \|F - G\|_\infty \equiv \sup_x |F(x) - G(x)|$.

Proof. The right side of (i) was proved in the course of the proof of Theorem 4.1. For the left side, see Dudley (1976) section 8.6. We leave the remaining inequalities as exercises.
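The inequalities relating Hellinger, total variation, and Kolmogorov distances are easy to check numerically on a finite sample space. The sketch below assumes the normalizations $d_{TV} = \frac{1}{2}\sum_i |p_i - q_i|$ and $H^2 = \frac{1}{2}\sum_i (\sqrt{p_i} - \sqrt{q_i})^2$ with counting measure as the dominating measure, and checks $H^2 \le d_{TV} \le H\{2 - H^2\}^{1/2}$ and $d_K \le d_{TV}$. The probability vectors and function names are made up for illustration.

```python
import math

def total_variation(p, q):
    """d_TV = (1/2) * sum |p_i - q_i| for probability vectors on a common finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """H, where H^2 = (1/2) * sum (sqrt(p_i) - sqrt(q_i))^2."""
    h2 = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(h2)

def kolmogorov(p, q):
    """d_K = sup_x |F(x) - G(x)|, with atoms listed in increasing order."""
    F = G = 0.0
    worst = 0.0
    for a, b in zip(p, q):
        F += a
        G += b
        worst = max(worst, abs(F - G))
    return worst

# Hypothetical distributions on three ordered atoms.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv, h, dk = total_variation(p, q), hellinger(p, q), kolmogorov(p, q)
print("H^2 =", h ** 2, " d_TV =", tv, " upper bound =", h * math.sqrt(2.0 - h ** 2))
print("d_K =", dk)
```

A single pair of vectors only illustrates the inequalities; it is not a proof, and the bounds can be far from tight for other choices of $p$ and $q$.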

5 Characterizing weak convergence in spaces of functions

Suppose that $T$ is a set, and suppose that $X_n(t)$, $t \in T$, are stochastic processes indexed by the set $T$; that is, $X_n(t) : \Omega \to R$ is a measurable map for each $t \in T$ and $n \in N$. Assume that the processes $X_n$ have bounded sample functions almost surely (or have versions with bounded sample paths almost surely). Then $X_n(\cdot) \in \ell^\infty(T)$ almost surely, where $\ell^\infty(T)$ is the space of all bounded real-valued functions on $T$. The space $\ell^\infty(T)$ with the sup norm $\|\cdot\|_T$ is a Banach space; it is separable only if $T$ is finite. Hence we will not assume that the processes $X_n$ induce tight Borel probability laws on $\ell^\infty(T)$.

Now suppose that $X(t)$, $t \in T$, is a sample bounded process that does induce a tight Borel probability measure on $\ell^\infty(T)$. Then we say that $X_n$ converges weakly to $X$ (or, informally, $X_n$ converges in law to $X$ uniformly in $t \in T$), and write

$$X_n \Rightarrow X \quad \text{in } \ell^\infty(T),$$

if $E^* H(X_n) \to E H(X)$ for all bounded continuous functions $H : \ell^\infty(T) \to R$. Here $E^*$ denotes outer expectation.

It follows immediately from the preceding definition that weak convergence is preserved by continuous functions: if $g : \ell^\infty(T) \to D$ for some metric space $(D, d)$, where $g$ is continuous and $X_n \Rightarrow X$ in $\ell^\infty(T)$, then $g(X_n) \Rightarrow g(X)$ in $(D, d)$. (The condition of continuity of $g$ can be relaxed slightly; see e.g. van der Vaart and Wellner (1996), Theorem 1.3.6, page 20.) While this is not a deep result, it is one of the reasons that the concept of weak convergence is important.

The following example shows why the outer expectation in the definition of $\Rightarrow$ is necessary.

Example 5.1 Suppose that $U$ is a Uniform$(0, 1)$ random variable, and let $X(t) = 1\{U \le t\} = 1_{[0,t]}(U)$ for $t \in T = [0, 1]$. If we assume the axiom of choice, then there exists a nonmeasurable subset $A$ of $[0, 1]$. For this subset $A$, define $F_A = \{1_{[0,\cdot]}(s) : s \in A\} \subset \ell^\infty(T)$. Since $F_A$ is a discrete set for the sup norm, it is closed in $\ell^\infty(T)$. But $\{X \in F_A\} = \{U \in A\}$ is not measurable, and therefore the law of $X$ does not extend to a Borel probability measure on $\ell^\infty(T)$.
On the other hand, the following proposition gives a description of the sample bounded processes $X$ that do induce a tight Borel measure on $\ell^\infty(T)$.

Proposition 5.1 (de la Peña and Giné (1999), Lemma 5.1.1; van der Vaart and Wellner (1996), Lemma 1.5.9). Let $X(t)$, $t \in T$, be a sample bounded stochastic process. Then the finite-dimensional distributions of $X$ are those of a tight Borel probability measure on $\ell^\infty(T)$ if and only if there exists a pseudometric $\rho$ on $T$ for which $(T, \rho)$ is totally bounded and such that $X$ has a version with almost all its sample paths uniformly continuous for $\rho$.

Proof. Suppose that the induced probability measure of $X$ on $\ell^\infty(T)$ is a tight Borel measure $P_X$. Let $K_m$, $m \in N$, be an increasing sequence of compact sets in $\ell^\infty(T)$ such that $P_X(\bigcup_{m=1}^\infty K_m) = 1$, and let $K = \bigcup_{m=1}^\infty K_m$. Then we will show that the pseudometric $\rho$ on $T$ defined by

$$\rho(s, t) = \sum_{m=1}^\infty 2^{-m} (\rho_m(s, t) \wedge 1),$$

where $\rho_m(s, t) = \sup\{ |x(s) - x(t)| : x \in K_m \}$, makes $(T, \rho)$ totally bounded. To show this, let $\epsilon > 0$, choose $k$ so that $\sum_{m=k+1}^\infty 2^{-m} < \epsilon/4$, and let $x_1, \ldots, x_r$ be a finite subset of $\bigcup_{m=1}^k K_m = K_k$ that is $\epsilon/4$-dense in $K_k$ for the supremum norm; i.e. for each $x \in \bigcup_{m=1}^k K_m$ there is an integer $i \le r$ such that $\|x - x_i\|_T \le \epsilon/4$. Such a finite set exists by compactness. The subset $A$ of $R^r$ defined by $\{(x_1(t), \ldots, x_r(t)) : t \in T\}$ is bounded (note that $\bigcup_{m=1}^k K_m$ is compact and hence bounded). Therefore $A$ is totally bounded, and hence there exists a finite set $T_\epsilon = \{t_j : 1 \le j \le N\}$ such that, for each $t \in T$, there is a $j \le N$ for which $\max_{1 \le s \le r} |x_s(t) - x_s(t_j)| \le \epsilon/4$. It is easily seen that $T_\epsilon$ is $\epsilon$-dense in $T$ for the pseudometric $\rho$: if $t$ and $t_j$ are as above, then for $m \le k$ it follows that

$$\rho_m(t, t_j) = \sup_{x \in K_m} |x(t) - x(t_j)| \le \max_{s \le r} |x_s(t) - x_s(t_j)| + \frac{\epsilon}{2} \le \frac{3\epsilon}{4},$$

and hence

$$\rho(t, t_j) \le \frac{\epsilon}{4} + \sum_{m=1}^k 2^{-m} \rho_m(t, t_j) \le \epsilon.$$

Thus we have proved that $(T, \rho)$ is totally bounded. Furthermore, the functions $x \in K$ are uniformly $\rho$-continuous, since, if $x \in K_m$, then $|x(s) - x(t)| \le \rho_m(s, t) \le 2^m \rho(s, t)$ for all $s, t \in T$ with $\rho(s, t) < 2^{-m}$. Since $P_X(K) = 1$, the identity function of $(\ell^\infty(T), \mathcal{B}, P_X)$ yields a version of $X$ with almost all of its sample paths in $K$, hence in $C_u(T, \rho)$, the space of bounded uniformly $\rho$-continuous functions on $T$. This proves the direct half of the proposition.

Conversely, suppose that $X(t)$, $t \in T$, is a stochastic process with a version whose sample functions are almost all in $C_u(T, \rho)$ for a metric or pseudometric $\rho$ on $T$ for which $(T, \rho)$ is totally bounded. We will continue to use $X$ to denote the version with these properties. We can clearly assume that all the sample functions are uniformly continuous. If $(\Omega, \mathcal{A}, P)$ is the probability space where $X$ is defined, then the map $X : \Omega \to C_u(T, \rho)$ is Borel measurable because the random vectors $(X(t_1), \ldots, X(t_k))$, $t_i \in T$, $k \in N$, are measurable and the Borel $\sigma$-algebra of $C_u(T, \rho)$ is generated by the finite-dimensional sets $\{x \in C_u(T, \rho) : (x(t_1), \ldots, x(t_k)) \in A\}$ for all Borel sets $A$ of $R^k$, $t_i \in T$, $k \in N$.
Therefore the induced probability law $P_X$ of $X$ is a tight Borel measure on $C_u(T, \rho)$ by Ulam's theorem; see e.g. Billingsley (1968), Theorem 1.4, page 10, or Dudley (1989), Theorem 7.1.4, page 76. But the inclusion of $C_u(T, \rho)$ into $\ell^\infty(T)$ is continuous, so $P_X$ is also a tight Borel measure on $\ell^\infty(T)$.

Exhibiting convenient metrics $\rho$ for which total boundedness and continuity hold is more involved. It can be shown (see e.g. Hoffmann-Jørgensen (1984), (1991); Andersen (1985), Andersen and Dobric (1987)) that if any pseudometric works, then the pseudometric $\rho_0(s, t) = E \arctan |X(s) - X(t)|$ will do the job. However, $\rho_0$ may not be the most natural or convenient pseudometric for a particular problem. In particular, for the frequent situation in which the process $X$ is Gaussian, the pseudometrics $\rho_r$ defined by

$$\rho_r(s, t) = (E|X(s) - X(t)|^r)^{1/(r \vee 1)}$$

for $0 < r < \infty$ are often more convenient, and especially $\rho_2$ in the Gaussian case; see van der Vaart and Wellner (1996), Lemma 1.5.9, and the following discussion.

Proposition 5.1 motivates our next result, which characterizes weak convergence $X_n \Rightarrow X$ in terms of asymptotic equicontinuity and convergence of finite-dimensional distributions.

Theorem 5.1 The following are equivalent:
(i) All the finite-dimensional distributions of the sample bounded processes $X_n$ converge in law, and there exists a pseudometric $\rho$ on $T$ such that both: (a) $(T, \rho)$ is totally bounded, and (b) the processes $X_n$ are asymptotically equicontinuous in probability with respect to $\rho$: that is,

$$(1) \qquad \lim_{\delta \searrow 0} \limsup_{n \to \infty} \Pr{}^* \Big\{ \sup_{\rho(s,t) \le \delta} |X_n(s) - X_n(t)| > \epsilon \Big\} = 0$$

for all $\epsilon > 0$.
(ii) There exists a process $X$ with tight Borel probability distribution on $\ell^\infty(T)$ and such that $X_n \Rightarrow X$ in $\ell^\infty(T)$.
If (i) holds, then the process $X$ in (ii) (which is completely determined by the limiting finite-dimensional distributions of $\{X_n\}$) has a version with sample paths in $C_u(T, \rho)$, the space of all $\rho$-uniformly continuous real-valued functions on $T$. If $X$ in (ii) has sample functions in $C_u(T, \gamma)$ for some pseudometric $\gamma$ for which $(T, \gamma)$ is totally bounded, then (i) holds with the pseudometric $\rho$ taken to be $\gamma$.

Proof. Suppose that (i) holds. Let $T_\infty$ be a countable $\rho$-dense subset of $T$, and let $T_k$, $k \in N$, be finite subsets of $T_\infty$ satisfying $T_k \nearrow T_\infty$. (Such sets exist by virtue of the hypothesis that $(T, \rho)$ is totally bounded.) The limiting distributions of the processes $X_n$ are consistent, and thus define a stochastic process $X$ on $T$. Furthermore, by the portmanteau theorem for finite-dimensional convergence in distribution,

$$\begin{aligned}
\Pr\Big\{ \max_{\rho(s,t) \le \delta,\ s,t \in T_k} |X(s) - X(t)| > \epsilon \Big\}
&\le \liminf_{n \to \infty} \Pr\Big\{ \max_{\rho(s,t) \le \delta,\ s,t \in T_k} |X_n(s) - X_n(t)| > \epsilon \Big\} \\
&\le \liminf_{n \to \infty} \Pr\Big\{ \sup_{\rho(s,t) \le \delta,\ s,t \in T_\infty} |X_n(s) - X_n(t)| > \epsilon \Big\}.
\end{aligned}$$

Taking the limit in the last display as $k \to \infty$ and then using the asymptotic equicontinuity condition (1) with $\epsilon = 2^{-m}$, it follows that there is a sequence $\delta_m \searrow 0$ such that

$$\Pr\Big\{ \sup_{\rho(s,t) \le \delta_m,\ s,t \in T_\infty} |X(s) - X(t)| > 2^{-m} \Big\} \le 2^{-m}.$$

Hence it follows by Borel-Cantelli that there exists $m(\omega) < \infty$ a.s. such that

$$\sup_{\rho(s,t) \le \delta_m,\ s,t \in T_\infty} |X(s, \omega) - X(t, \omega)| \le 2^{-m}$$

for all $m > m(\omega)$. Therefore $X(t, \omega)$ is a $\rho$-uniformly continuous function of $t \in T_\infty$ for almost every $\omega$. The extension to $T$ by uniform continuity of the restriction of $X$ to $T_\infty$ yields a version of $X$ with sample paths all in $C_u(T, \rho)$; note that it suffices to consider only the set of $\omega$'s upon

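which $X$ is uniformly continuous. It then follows from Proposition 5.1 that the law of $X$ exists as a tight Borel measure on $\ell^\infty(T)$.

Our proof of convergence will be based on the following fact (see Exercise 6.6): if $H : \ell^\infty(T) \to R$ is bounded and continuous, and $K \subset \ell^\infty(T)$ is compact, then for every $\epsilon > 0$ there exists $\tau > 0$ such that: if $x \in K$ and $y \in \ell^\infty(T)$ with $\|x - y\|_T < \tau$, then

$$(a) \qquad |H(x) - H(y)| < \epsilon.$$

Now we are ready to prove the weak convergence part of (ii). Since $(T, \rho)$ is totally bounded, for every $\delta > 0$ there exists a finite set of points $t_1, \ldots, t_{N(\delta)}$ that is $\delta$-dense in $(T, \rho)$; i.e. $T \subset \bigcup_{i=1}^{N(\delta)} B(t_i, \delta)$, where $B(t, \delta)$ is the open ball with center $t$ and radius $\delta$. Thus, for each $t \in T$ we can choose $\pi_\delta(t) \in \{t_1, \ldots, t_{N(\delta)}\}$ so that $\rho(\pi_\delta(t), t) < \delta$. Then we can define processes $X_{n,\delta}$, $n \in N$, and $X_\delta$ by

$$X_{n,\delta}(t) = X_n(\pi_\delta(t)), \qquad X_\delta(t) = X(\pi_\delta(t)), \qquad t \in T.$$

Note that $X_{n,\delta}$ and $X_\delta$ are approximations of the processes $X_n$ and $X$ respectively that can take on at most $N(\delta)$ different values. Convergence of the finite-dimensional distributions of $X_n$ to those of $X$ implies that

$$(b) \qquad X_{n,\delta} \Rightarrow X_\delta \quad \text{in } \ell^\infty(T).$$

Furthermore, uniform continuity of the sample paths of $X$ yields

$$(c) \qquad \lim_{\delta \to 0} \|X - X_\delta\|_T = 0 \quad \text{a.s.}$$

Let $H : \ell^\infty(T) \to R$ be bounded and continuous. Then it follows that

$$\begin{aligned}
|E^* H(X_n) - E H(X)|
&\le |E^* H(X_n) - E H(X_{n,\delta})| + |E H(X_{n,\delta}) - E H(X_\delta)| + |E H(X_\delta) - E H(X)| \\
&\equiv I_{n,\delta} + II_{n,\delta} + III_\delta.
\end{aligned}$$

To show the convergence part of (ii) we need to show that $\lim_{\delta \to 0} \limsup_{n \to \infty}$ of each of these three terms is 0. This follows for $II_{n,\delta}$ by (b). Now we show that $\lim_{\delta \to 0} III_\delta = 0$. Given $\epsilon > 0$, let $K \subset \ell^\infty(T)$ be a compact set such that $\Pr\{X \in K^c\} < \epsilon/(6\|H\|_\infty)$, let $\tau > 0$ be such that (a) holds for $K$ and $\epsilon/6$, and let $\delta_1 > 0$ be such that $\Pr\{\|X_\delta - X\|_T \ge \tau\} < \epsilon/(6\|H\|_\infty)$ for all $\delta < \delta_1$; this can be done by virtue of (c). Then it follows that

$$\begin{aligned}
|E H(X_\delta) - E H(X)|
&\le 2\|H\|_\infty \Pr\{[X \in K^c] \cup [\|X_\delta - X\|_T \ge \tau]\} + \sup\{ |H(x) - H(y)| : x \in K,\ \|x - y\|_T < \tau \} \\
&\le 2\|H\|_\infty \left( \frac{\epsilon}{6\|H\|_\infty} + \frac{\epsilon}{6\|H\|_\infty} \right) + \frac{\epsilon}{6} < \epsilon,
\end{aligned}$$

so that $\lim_{\delta \to 0} III_\delta = 0$ holds. To show that $\lim_{\delta \to 0} \limsup_{n \to \infty} I_{n,\delta} = 0$, choose $\epsilon$, $\tau$, and $K$ as above.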
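Then we have

$$(d) \qquad |E^* H(X_n) - E H(X_{n,\delta})| \le 2\|H\|_\infty \Big\{ \Pr{}^*\{\|X_n - X_{n,\delta}\|_T \ge \tau/2\} + \Pr\{X_{n,\delta} \in (K^{\tau/2})^c\} \Big\} + \sup\{ |H(x) - H(y)| : x \in K,\ \|x - y\|_T < \tau \},$$

where $K^{\tau/2}$ is the $\tau/2$ open neighborhood of the set $K$ for the sup norm. The inequality in the previous display can be checked as follows: if $X_{n,\delta} \in K^{\tau/2}$ and $\|X_n - X_{n,\delta}\|_T < \tau/2$, then there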

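exists $x \in K$ such that $\|x - X_{n,\delta}\|_T < \tau/2$ and $\|x - X_n\|_T < \tau$. Now the asymptotic equicontinuity hypothesis implies that there is a $\delta_2$ such that

$$\limsup_{n \to \infty} \Pr{}^*\{\|X_{n,\delta} - X_n\|_T \ge \tau/2\} < \frac{\epsilon}{6\|H\|_\infty}$$

for all $\delta < \delta_2$, and finite-dimensional convergence yields

$$\limsup_{n \to \infty} \Pr\{X_{n,\delta} \in (K^{\tau/2})^c\} \le \Pr\{X_\delta \in (K^{\tau/2})^c\} \le \frac{\epsilon}{6\|H\|_\infty}.$$

Hence we conclude from (d) that, for $\delta < \delta_1 \wedge \delta_2$,

$$\limsup_{n \to \infty} |E^* H(X_n) - E H(X_{n,\delta})| < \epsilon,$$

and this completes the proof that (i) implies (ii).

The converse implication is an easy consequence of the closed set part of the portmanteau theorem: if $X_n \Rightarrow X$ in $\ell^\infty(T)$, then, as for usual convergence in law,

$$\limsup_{n \to \infty} \Pr{}^*\{X_n \in F\} \le \Pr\{X \in F\}$$

for every closed set $F \subset \ell^\infty(T)$; see e.g. van der Vaart and Wellner (1996), page 18. If (ii) holds, then by Proposition 5.1 there is a pseudometric $\rho$ on $T$ which makes $(T, \rho)$ totally bounded and such that $X$ has (a version with) sample paths in $C_u(T, \rho)$. Thus for the closed set $F = F_{\epsilon,\delta}$ defined by

$$F_{\epsilon,\delta} = \Big\{ x \in \ell^\infty(T) : \sup_{\rho(s,t) \le \delta} |x(s) - x(t)| \ge \epsilon \Big\},$$

we have

$$\limsup_{n \to \infty} \Pr{}^*\Big\{ \sup_{\rho(s,t) \le \delta} |X_n(s) - X_n(t)| \ge \epsilon \Big\} = \limsup_{n \to \infty} \Pr{}^*\{X_n \in F_{\epsilon,\delta}\} \le \Pr\{X \in F_{\epsilon,\delta}\} = \Pr\Big\{ \sup_{\rho(s,t) \le \delta} |X(s) - X(t)| \ge \epsilon \Big\}.$$

Taking limits across the resulting inequality as $\delta \to 0$ yields the asymptotic equicontinuity in view of the $\rho$-uniform continuity of the sample paths of $X$. Thus (ii) implies (i).

We conclude this section by stating an obvious corollary of Theorem 5.1 for the empirical process $\mathbb{G}_n$ indexed by a class $\mathcal{F}$ of measurable real-valued functions on the probability space $(\mathcal{X}, \mathcal{A}, P)$. Let $\rho_P$ be the pseudometric on $\mathcal{F}$ defined by

$$\rho_P^2(f, g) = Var_P(f(X) - g(X)) = P(f - g)^2 - [P(f - g)]^2.$$

Corollary Let $\mathcal{F}$ be a class of measurable functions on $(\mathcal{X}, \mathcal{A})$. Then the following are equivalent:
(i) $\mathcal{F}$ is $P$-Donsker: $\mathbb{G}_n \Rightarrow \mathbb{G}$ in $\ell^\infty(\mathcal{F})$.
(ii) $(\mathcal{F}, \rho_P)$ is totally bounded and $\mathbb{G}_n$ is asymptotically equicontinuous with respect to $\rho_P$ in probability: i.e.

$$(2) \qquad \lim_{\delta \searrow 0} \limsup_{n \to \infty} \Pr{}^*\Big\{ \sup_{f,g \in \mathcal{F}:\ \rho_P(f,g) < \delta} |\mathbb{G}_n(f) - \mathbb{G}_n(g)| > \epsilon \Big\} = 0$$

for all $\epsilon > 0$.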
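The equicontinuity condition for the empirical process can be explored by simulation in the classical special case of indicator functions $1_{[0,t]}$, $t \in [0, 1]$, with $P$ = Uniform$(0, 1)$; this choice of class and law is an assumption made here for illustration. In that case $\mathbb{G}_n(t) = \sqrt{n}(F_n(t) - t)$ and $\rho_P^2(s, t) = |s - t|(1 - |s - t|)$. The sketch below restricts the supremum to a finite grid, so it only gives a lower bound for the true modulus, and the function names, grid size, and sample sizes are hypothetical.

```python
import bisect
import math
import random

def empirical_process(sample, grid):
    """G_n(t) = sqrt(n) * (F_n(t) - t) on a grid, for a Uniform(0,1) sample."""
    n = len(sample)
    ordered = sorted(sample)
    return [math.sqrt(n) * (bisect.bisect_right(ordered, t) / n - t) for t in grid]

def rho_P(s, t):
    """Standard deviation of 1{X <= s} - 1{X <= t} when X is Uniform(0,1)."""
    d = abs(s - t)
    return math.sqrt(d * (1.0 - d))

def modulus(values, grid, delta):
    """max |G_n(s) - G_n(t)| over grid pairs with rho_P(s, t) < delta."""
    worst = 0.0
    for i in range(len(grid)):
        for j in range(i + 1, len(grid)):
            if rho_P(grid[i], grid[j]) < delta:
                worst = max(worst, abs(values[i] - values[j]))
    return worst

def exceedance_frequency(n, delta, eps, reps, rng):
    """Monte Carlo frequency of the event {modulus > eps} over independent samples."""
    grid = [k / 50 for k in range(51)]
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        if modulus(empirical_process(sample, grid), grid, delta) > eps:
            hits += 1
    return hits / reps

rng = random.Random(0)  # fixed seed so the run is reproducible
for delta in (0.5, 0.2, 0.1):
    print(delta, exceedance_frequency(n=200, delta=delta, eps=1.0, reps=100, rng=rng))
```

For a fixed sample the grid modulus can only decrease as $\delta$ decreases, since the maximum is taken over fewer pairs. The printed frequencies are Monte Carlo estimates at one finite $n$ and should be read as a rough picture of the probabilities appearing in the equicontinuity condition, not as a verification of the limit.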

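We close this section with another equivalent formulation of the asymptotic equicontinuity condition in terms of partitions of the set $T$. A sequence $\{X_n\}$ in $\ell^\infty(T)$ is said to be asymptotically tight if for every $\epsilon > 0$ there exists a compact set $K \subset \ell^\infty(T)$ such that

$$\liminf_{n \to \infty} P_*(X_n \in K^\delta) \ge 1 - \epsilon \quad \text{for every } \delta > 0.$$

Here $K^\delta = \{y \in \ell^\infty(T) : d(y, K) < \delta\}$ is the $\delta$-enlargement of $K$.

Theorem 5.2 The sequence $\{X_n\}$ in $\ell^\infty(T)$ is asymptotically tight if and only if $X_n(t)$ is asymptotically tight in $R$ for every $t \in T$ and, for every $\epsilon > 0$, $\eta > 0$, there exists a finite partition $T = \bigcup_{i=1}^k T_i$ such that

$$\limsup_{n \to \infty} P^*\Big( \sup_{1 \le i \le k}\ \sup_{s,t \in T_i} |X_n(s) - X_n(t)| > \epsilon \Big) < \eta.$$

Proof. See van der Vaart and Wellner (1996), Theorem 1.5.6, page 36.

Example 5.2 (Partial sum process) Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with $E(X_1) = 0$, $Var(X_1) = 1$. The partial sum process $S_n$ is defined by

$$S_n(t) \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} X_i \quad \text{for } 0 \le t < \infty.$$

We will consider the process $\{S_n(t) : 0 \le t \le 1\}$. Note that $S_n$ takes values in $D[0, 1]$ since it has jumps of size $X_i/\sqrt{n}$ at the points $t = i/n$, $i = 1, \ldots, n$. The linearly interpolated version $\bar{S}_n$ of the process $S_n$ is given by $\bar{S}_n(k/n) = S_n(k/n)$ and

$$\bar{S}_n(t) = S_n(k/n) + \sqrt{n}\,(t - k/n)\,X_{k+1}, \qquad k/n \le t \le (k+1)/n.$$

Note that $\bar{S}_n$ takes values in $C[0, 1]$, and that

$$(3) \qquad \|S_n - \bar{S}_n\|_\infty \le n^{-1/2} \max_{1 \le i \le n} |X_i| \to_{a.s.} 0$$

since $E(X_1^2) < \infty$. To show that the finite-dimensional distributions of $\bar{S}_n$ converge in distribution, we will show that the finite-dimensional distributions of $S_n$ converge in distribution. By (3) the same will hold for $\bar{S}_n$. Let $0 < t_1 < \cdots < t_k \le 1$, and consider the random vectors $Y_n \equiv (S_n(t_1), \ldots, S_n(t_k))$ in $R^k$. Define $g : R^k \to R^k$ by $g(y) = (y_1, y_2 - y_1, y_3 - y_2, \ldots, y_k - y_{k-1})$. Then $g(Y_n) = (S_n(t_1), S_n(t_2) - S_n(t_1), \ldots, S_n(t_k) - S_n(t_{k-1}))$ has components which are independent (by independence of the $X_i$'s), and

$$\begin{aligned}
S_n(t_j) - S_n(t_{j-1})
&= \frac{1}{\sqrt{n}} \sum_{i=\lfloor n t_{j-1} \rfloor + 1}^{\lfloor n t_j \rfloor} X_i \\
&= \sqrt{\frac{\lfloor n t_j \rfloor - \lfloor n t_{j-1} \rfloor}{n}} \cdot \frac{1}{\sqrt{\lfloor n t_j \rfloor - \lfloor n t_{j-1} \rfloor}} \sum_{i=\lfloor n t_{j-1} \rfloor + 1}^{\lfloor n t_j \rfloor} X_i \\
&\to_d \sqrt{t_j - t_{j-1}}\, Z_j \stackrel{d}{=} S(t_j) - S(t_{j-1}) \sim N(0, t_j - t_{j-1}), \qquad j = 1, \ldots, k
\end{aligned}$$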