STAT 830: Convergence in Distribution
Richard Lockhart, Simon Fraser University
STAT 830, Fall 2011
Purposes of These Notes

- Define convergence in distribution.
- State the central limit theorem.
- Discuss Edgeworth expansions.
- Discuss extensions of the central limit theorem.
- Discuss Slutsky's theorem and the $\delta$ method.
Convergence in Distribution (p. 72)

Undergraduate version of the central limit theorem:

Theorem: If $X_1, \ldots, X_n$ are iid from a population with mean $\mu$ and standard deviation $\sigma$ then $n^{1/2}(\bar X - \mu)/\sigma$ has approximately a normal distribution. Also, a Binomial$(n, p)$ random variable has approximately a $N(np, np(1-p))$ distribution.

What is the precise meaning of statements like "$X$ and $Y$ have approximately the same distribution"?
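A quick numerical illustration of the binomial statement (my addition, not part of the original notes; the choices $n = 100$ and $p = 0.3$ and the grid of points are arbitrary):

```python
# Sketch: compare the Binomial(n, p) cdf with its N(np, np(1-p)) approximation.
# The values n = 100, p = 0.3 and the x-grid are arbitrary choices.
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))
for x in [20, 25, 30, 35, 40]:
    exact = binom.cdf(x, n, p)
    approx = norm.cdf(x, loc=mu, scale=sd)  # a continuity correction would do better
    print(f"x={x}: exact={exact:.4f}, normal approx={approx:.4f}")
```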
Towards precision

Desired meaning: $X$ and $Y$ have nearly the same cdf. But care is needed.

Q1: If $n$ is a large number, is the $N(0,1/n)$ distribution close to the distribution of $X \equiv 0$?

Q2: Is $N(0,1/n)$ close to the $N(1/n, 1/n)$ distribution?

Q3: Is $N(0,1/n)$ close to the $N(1/\sqrt{n}, 1/n)$ distribution?

Q4: If $X_n \equiv 2^{-n}$, is the distribution of $X_n$ close to that of $X \equiv 0$?
Some numerical examples?

The answers depend on how close "close" needs to be, so it is a matter of definition. In practice the usual sort of approximation we want to make is to say that some random variable $X$, say, has nearly some continuous distribution, like $N(0,1)$. So: we want to know that probabilities like $P(X > x)$ are nearly $P(N(0,1) > x)$.

Real difficulty: the case of discrete random variables or infinite dimensions; not done in this course.

Mathematicians' meaning of "close": either they can provide an upper bound on the distance between the two things, or they are talking about taking a limit. In this course we take limits.
The definition (p. 75)

Defn: A sequence of random variables $X_n$ converges in distribution to a random variable $X$ if $E(g(X_n)) \to E(g(X))$ for every bounded continuous function $g$.

Theorem: The following are equivalent:

1. $X_n$ converges in distribution to $X$.
2. $P(X_n \le x) \to P(X \le x)$ for each $x$ such that $P(X = x) = 0$.
3. The limit of the characteristic functions of $X_n$ is the characteristic function of $X$: for every real $t$, $E(e^{itX_n}) \to E(e^{itX})$.

These are all implied by $M_{X_n}(t) \to M_X(t) < \infty$ for all $|t| \le \epsilon$ for some positive $\epsilon$.
Answering the questions

Q1: $X_n \sim N(0,1/n)$ and $X \equiv 0$. Then
$$P(X_n \le x) \to \begin{cases} 1 & x > 0 \\ 0 & x < 0 \\ 1/2 & x = 0. \end{cases}$$
The limit is the cdf of $X \equiv 0$ except at $x = 0$, and the cdf of $X$ is not continuous at $x = 0$, so yes, $X_n$ converges to $X$ in distribution.

Q2: I asked if $X_n \sim N(1/n, 1/n)$ had a distribution close to that of $Y_n \sim N(0,1/n)$. The definition I gave really requires me to answer by finding a limit $X$ and proving that both $X_n$ and $Y_n$ converge to $X$ in distribution. Take $X \equiv 0$. Then
$$E(e^{tX_n}) = e^{t/n + t^2/(2n)} \to 1 = E(e^{tX})$$
and
$$E(e^{tY_n}) = e^{t^2/(2n)} \to 1,$$
so both $X_n$ and $Y_n$ have the same limit in distribution.
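A small numeric check of these two moment generating function limits (my addition; the value of $t$ and the grid of $n$ are arbitrary choices):

```python
# Sketch: evaluate the mgfs of X_n ~ N(1/n, 1/n) and Y_n ~ N(0, 1/n) at a fixed t
# and watch both converge to 1 = E(e^{t*0}) as n grows.
import numpy as np

t = 2.0  # any fixed real t
for n in [10, 100, 1000, 10000]:
    mgf_x = np.exp(t / n + t**2 / (2 * n))  # mgf of N(1/n, 1/n) at t
    mgf_y = np.exp(t**2 / (2 * n))          # mgf of N(0, 1/n) at t
    print(f"n={n}: E(e^(t Xn))={mgf_x:.5f}, E(e^(t Yn))={mgf_y:.5f}")
```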
First graph

[Figure: cdfs of N(0,1/n) vs X = 0 for n = 10000; left panel over (-3, 3), right panel zoomed to (-0.03, 0.03).]
Second graph

[Figure: cdfs of N(1/n,1/n) vs N(0,1/n) for n = 10000; left panel over (-3, 3), right panel zoomed to (-0.03, 0.03).]
Scaling matters

Multiply both $X_n$ and $Y_n$ by $n^{1/2}$ and let $X \sim N(0,1)$. Then $\sqrt{n}\,X_n \sim N(n^{-1/2}, 1)$ and $\sqrt{n}\,Y_n \sim N(0,1)$. Use characteristic functions to prove that both $\sqrt{n}\,X_n$ and $\sqrt{n}\,Y_n$ converge to $N(0,1)$ in distribution.

If you now let $X_n \sim N(n^{-1/2}, 1/n)$ and $Y_n \sim N(0, 1/n)$ then again both $X_n$ and $Y_n$ converge to 0 in distribution.

If you multiply the $X_n$ and $Y_n$ in the previous point by $n^{1/2}$ then $n^{1/2} X_n \sim N(1,1)$ and $n^{1/2} Y_n \sim N(0,1)$, so $n^{1/2} X_n$ and $n^{1/2} Y_n$ are not close together in distribution.

You can check that $2^{-n} \to 0$ in distribution.
Third graph

[Figure: cdfs of N(1/sqrt(n),1/n) vs N(0,1/n) for n = 10000; left panel over (-3, 3), right panel zoomed to (-0.03, 0.03).]
Summary

To derive approximate distributions:

- Show the sequence of rvs $X_n$ converges to some $X$.
- The limit distribution (i.e. the distribution of $X$) should be non-trivial, like, say, $N(0,1)$.
- Don't say: $X_n$ is approximately $N(1/n, 1/n)$.
- Do say: $n^{1/2}(X_n - 1/n)$ converges to $N(0,1)$ in distribution.
The Central Limit Theorem (pp. 77-79)

Theorem: If $X_1, X_2, \ldots$ are iid with mean 0 and variance 1 then $n^{1/2} \bar X$ converges in distribution to $N(0,1)$. That is,
$$P(n^{1/2} \bar X \le x) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} \, dy.$$
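A Monte Carlo sketch of the theorem (my addition; the exponential summands, seed, and grid are arbitrary choices, picked because Exponential(1) variables minus 1 have mean 0 and variance 1):

```python
# Sketch: check P(n^{1/2} * Xbar <= x) against the N(0,1) cdf by simulation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(830)
n, reps = 50, 100_000
# Exponential(1) has mean 1 and variance 1, so X = E - 1 has mean 0, variance 1.
samples = rng.exponential(1.0, size=(reps, n)) - 1.0
z = np.sqrt(n) * samples.mean(axis=1)
for x in [-1.0, 0.0, 1.0, 2.0]:
    print(f"x={x}: simulated={np.mean(z <= x):.4f}, Phi(x)={norm.cdf(x):.4f}")
```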
Proof of CLT

As before, $E(e^{itn^{1/2}\bar X}) \to e^{-t^2/2}$. This is the characteristic function of $N(0,1)$, so we are done by our theorem.

This is the worst sort of mathematics, much beloved of statisticians: reduce the proof of one theorem to the proof of a much harder theorem. Then let someone else prove that.
Edgeworth expansions

In fact if $\gamma = E(X^3)$ then, keeping one more term,
$$\phi(t) \approx 1 - t^2/2 - i\gamma t^3/6 + \cdots.$$
Then $\log(\phi(t)) = \log(1+u)$ where $u = -t^2/2 - i\gamma t^3/6 + \cdots$. Use $\log(1+u) = u - u^2/2 + \cdots$ to get
$$\log(\phi(t)) \approx \left[-t^2/2 - i\gamma t^3/6 + \cdots\right] - \left[-t^2/2 - i\gamma t^3/6 + \cdots\right]^2/2 + \cdots,$$
which rearranged is
$$\log(\phi(t)) \approx -t^2/2 - i\gamma t^3/6 + \cdots.$$
Edgeworth Expansions

Now apply this calculation to $T = n^{1/2}\bar X$:
$$\log(\phi_T(t)) \approx -t^2/2 - iE(T^3)t^3/6 + \cdots.$$
Remember $E(T^3) = \gamma/\sqrt{n}$ and exponentiate to get
$$\phi_T(t) \approx e^{-t^2/2} \exp\{-i\gamma t^3/(6\sqrt{n}) + \cdots\}.$$
You can do a Taylor expansion of the second exponential around 0, because of the square root of $n$, and get
$$\phi_T(t) \approx e^{-t^2/2}\left(1 - i\gamma t^3/(6\sqrt{n})\right),$$
neglecting higher order terms. This approximation to the characteristic function of $T$ can be inverted to get an Edgeworth approximation to the density (or distribution) of $T$, which looks like
$$f_T(x) \approx \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\left[1 + \gamma(x^3 - 3x)/(6\sqrt{n}) + \cdots\right].$$
Remarks

The error made using the central limit theorem to approximate a density or a probability is proportional to $n^{-1/2}$. This is improved to $n^{-1}$ for symmetric densities, for which $\gamma = 0$.

These expansions are asymptotic. This means that the series indicated by $\cdots$ usually does not converge. When $n = 25$ it may help to take the second term but get worse if you include the third or fourth or more.

You can integrate the expansion above for the density to get an approximation for the cdf.
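A numeric sketch of this improvement (my addition): integrating the density expansion gives the cdf approximation $\Phi(x) - \gamma(x^2 - 1)\varphi(x)/(6\sqrt{n})$, and for standardized Exponential(1) sums $\gamma = 2$. The choices $n = 10$, the seed, and the x-grid are arbitrary:

```python
# Sketch: CLT vs one-term Edgeworth cdf approximation for standardized
# Exponential(1) sums, which have skewness gamma = 2.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(830)
n, reps, gamma = 10, 200_000, 2.0
t = np.sqrt(n) * (rng.exponential(1.0, size=(reps, n)).mean(axis=1) - 1.0)
for x in [-1.0, 0.0, 1.0, 2.0]:
    clt = norm.cdf(x)
    edge = clt - gamma * (x**2 - 1) * norm.pdf(x) / (6 * np.sqrt(n))
    print(f"x={x}: simulated={np.mean(t <= x):.4f}, CLT={clt:.4f}, Edgeworth={edge:.4f}")
```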
Multivariate convergence in distribution

Defn: $X_n \in R^p$ converges in distribution to $X \in R^p$ if $E(g(X_n)) \to E(g(X))$ for each bounded continuous real valued function $g$ on $R^p$. This is equivalent to either of:

Cramér-Wold device: $a^t X_n$ converges in distribution to $a^t X$ for each $a \in R^p$.

or

Convergence of characteristic functions: $E(e^{ia^t X_n}) \to E(e^{ia^t X})$ for each $a \in R^p$.
Extensions of the CLT

1. If $Y_1, Y_2, \ldots$ are iid in $R^p$ with mean $\mu$ and variance-covariance $\Sigma$ then $n^{1/2}(\bar Y - \mu)$ converges in distribution to $MVN(0, \Sigma)$.

2. Lyapunov CLT: if for each $n$ the $X_{n1}, \ldots, X_{nn}$ are independent rvs with $E(X_{ni}) = 0$, $\mathrm{Var}(\sum_i X_{ni}) = 1$ and $\sum_i E(|X_{ni}|^3) \to 0$, then $\sum_i X_{ni}$ converges in distribution to $N(0,1)$ (see the sketch after this list).

3. Lindeberg CLT: first two conditions of Lyapunov and $\sum_i E(X_{ni}^2 1(|X_{ni}| > \epsilon)) \to 0$ for each $\epsilon > 0$. Then $\sum_i X_{ni}$ converges in distribution to $N(0,1)$. (Lyapunov's condition implies Lindeberg's.)

4. Non-independent rvs: m-dependent CLT, martingale CLT, CLT for mixing processes.

5. Not sums: Slutsky's theorem, the $\delta$ method.
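A sketch of the Lyapunov CLT in action (my own example, not from the slides): take $X_{ni} = (B_i - p)/\sqrt{np(1-p)}$ with $B_i$ iid Bernoulli$(p)$; then $\sum_i E|X_{ni}|^3 = (p^2 + (1-p)^2)/\sqrt{np(1-p)} \to 0$. The values of $n$, $p$, and the seed are arbitrary:

```python
# Sketch: Lyapunov condition for X_ni = (B_i - p)/sqrt(n p (1-p)), B_i ~ Bernoulli(p).
# sum_i E|X_ni|^3 = (p^2 + (1-p)^2)/sqrt(n p (1-p)) -> 0, so sum_i X_ni => N(0,1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(830)
n, p, reps = 400, 0.3, 100_000
b = rng.binomial(1, p, size=(reps, n))
s = (b - p).sum(axis=1) / np.sqrt(n * p * (1 - p))   # sum_i X_ni
print("Lyapunov sum:", (p**2 + (1 - p)**2) / np.sqrt(n * p * (1 - p)))
print("P(S <= 1):", np.mean(s <= 1.0), "vs Phi(1):", round(norm.cdf(1.0), 4))
```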
Slutsky's Theorem (p. 75)

Theorem: If $X_n$ converges in distribution to $X$ and $Y_n$ converges in distribution (or in probability) to $c$, a constant, then $X_n + Y_n$ converges in distribution to $X + c$. More generally, if $f(x,y)$ is continuous then $f(X_n, Y_n)$ converges in distribution to $f(X, c)$.

Warning: the hypothesis that the limit of $Y_n$ be constant is essential.
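A simulation sketch (my own example, not from the slides): take $X_n = n^{1/2}(\bar X - \mu)/\sigma \Rightarrow N(0,1)$ and $Y_n = \sigma/s$, which converges in probability to 1; Slutsky then gives $n^{1/2}(\bar X - \mu)/s \Rightarrow N(0,1)$. The exponential population, seed, and sample sizes are arbitrary choices:

```python
# Sketch of Slutsky: sqrt(n)(Xbar - mu)/s -> N(0,1) because s -> sigma in probability.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(830)
n, reps = 100, 100_000
x = rng.exponential(2.0, size=(reps, n))   # population mean mu = 2; any distribution works
tstat = np.sqrt(n) * (x.mean(axis=1) - 2.0) / x.std(axis=1, ddof=1)
print("P(T <= 1):", np.mean(tstat <= 1.0), "vs Phi(1):", round(norm.cdf(1.0), 4))
```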
The delta method (pp. 79-80, 131-135)

Theorem: Suppose:

- the sequence $Y_n$ of rvs converges to some $y$, a constant;
- if $X_n = a_n(Y_n - y)$ then $X_n$ converges in distribution to some random variable $X$;
- $f$ is a differentiable function on the range of $Y_n$.

Then $a_n(f(Y_n) - f(y))$ converges in distribution to $f'(y)X$. If $X_n \in R^p$ and $f: R^p \to R^q$ then $f'$ is the $q \times p$ matrix of first derivatives of the components of $f$.
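Before the worked example on the following slides, here is a one-dimensional sketch of the theorem (my own example, not from the slides): take $Y_n = \bar X$, $y = \mu$, $a_n = n^{1/2}$ and $f(x) = x^2$; the theorem gives $n^{1/2}(\bar X^2 - \mu^2) \Rightarrow N(0, 4\mu^2\sigma^2)$. The values of $\mu$, $\sigma$, $n$, and the seed are arbitrary:

```python
# Sketch: delta method in one dimension, f(x) = x^2, f'(mu) = 2*mu,
# so the limit variance is (2*mu)^2 * sigma^2 = 4*mu^2*sigma^2.
import numpy as np

rng = np.random.default_rng(830)
n, reps, mu, sigma = 100, 100_000, 2.0, 1.0
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar**2 - mu**2)
print("simulated variance:", z.var(), "theory 4*mu^2*sigma^2:", 4 * mu**2 * sigma**2)
```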
Example

Suppose $X_1, \ldots, X_n$ are a sample from a population with mean $\mu$, variance $\sigma^2$, and third and fourth central moments $\mu_3$ and $\mu_4$. Then
$$n^{1/2}(s^2 - \sigma^2) \Rightarrow N(0, \mu_4 - \sigma^4),$$
where $\Rightarrow$ is notation for convergence in distribution. For simplicity I define $s^2 = \overline{X^2} - \bar X^2$.
How to apply the δ method

1. Write the statistic as a function of averages: Define
$$W_i = \begin{bmatrix} X_i^2 \\ X_i \end{bmatrix}.$$
See that
$$\bar W_n = \begin{bmatrix} \overline{X^2} \\ \bar X \end{bmatrix}.$$
Define $f(x_1, x_2) = x_1 - x_2^2$ and see that $s^2 = f(\bar W_n)$.

2. Compute the mean of your averages:
$$\mu_W \equiv E(\bar W_n) = \begin{bmatrix} E(X_i^2) \\ E(X_i) \end{bmatrix} = \begin{bmatrix} \mu^2 + \sigma^2 \\ \mu \end{bmatrix}.$$

3. In the δ method theorem take $Y_n = \bar W_n$ and $y = E(Y_n)$.
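A quick numeric sanity check of step 1 (my addition; the data are arbitrary):

```python
# Sketch: s^2 written as a function of averages, s^2 = f(Wbar) with f(x1, x2) = x1 - x2^2.
import numpy as np

rng = np.random.default_rng(830)
x = rng.exponential(1.0, size=1000)
w_bar = np.array([np.mean(x**2), np.mean(x)])   # Wbar_n = (mean of X^2, mean of X)
s2 = w_bar[0] - w_bar[1]**2                     # f(Wbar_n)
print(s2, np.var(x))                            # np.var uses the same 1/n convention
```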
Delta Method Continues

4. Take $a_n = n^{1/2}$.

5. Use the central limit theorem: $n^{1/2}(Y_n - y) \Rightarrow MVN(0, \Sigma)$, where $\Sigma = \mathrm{Var}(W_i)$.

6. To compute $\Sigma$ take the expected value of $(W - \mu_W)(W - \mu_W)^t$. There are 4 entries in this matrix. The top left entry is $(X^2 - \mu^2 - \sigma^2)^2$, which has expectation
$$E\left\{(X^2 - \mu^2 - \sigma^2)^2\right\} = E(X^4) - (\mu^2 + \sigma^2)^2.$$
Delta Method Continues

Using the binomial expansion:
$$E(X^4) = E\{(X - \mu + \mu)^4\} = \mu_4 + 4\mu\mu_3 + 6\mu^2\sigma^2 + 4\mu^3 E(X - \mu) + \mu^4.$$
So $\Sigma_{11} = \mu_4 - \sigma^4 + 4\mu\mu_3 + 4\mu^2\sigma^2$.

The top right entry is the expectation of
$$(X^2 - \mu^2 - \sigma^2)(X - \mu),$$
which is $E(X^3) - \mu E(X^2)$. Similarly to the 4th moment calculation, this is $\mu_3 + 2\mu\sigma^2$. The lower right entry is $\sigma^2$. So
$$\Sigma = \begin{bmatrix} \mu_4 - \sigma^4 + 4\mu\mu_3 + 4\mu^2\sigma^2 & \mu_3 + 2\mu\sigma^2 \\ \mu_3 + 2\mu\sigma^2 & \sigma^2 \end{bmatrix}.$$
Delta Method Continues

7. Compute the derivative (gradient) of $f$: it has components $(1, -2x_2)$. Evaluate at $y = (\mu^2 + \sigma^2, \mu)$ to get $a^t = (1, -2\mu)$. This leads to
$$n^{1/2}(s^2 - \sigma^2) \approx n^{1/2}\,[1, -2\mu] \begin{bmatrix} \overline{X^2} - (\mu^2 + \sigma^2) \\ \bar X - \mu \end{bmatrix},$$
which converges in distribution to $(1, -2\mu)\, MVN(0, \Sigma)$. This rv is $N(0, a^t \Sigma a) = N(0, \mu_4 - \sigma^4)$.
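A Monte Carlo sketch of this conclusion (my addition): for Exponential(1) data, $\sigma^2 = 1$, $\mu_3 = 2$ and $\mu_4 = 9$, so the limit variance $\mu_4 - \sigma^4$ is 8. The seed, $n$, and the number of replications are arbitrary:

```python
# Sketch: check Var(sqrt(n)(s^2 - sigma^2)) ~= mu4 - sigma^4 for Exponential(1),
# where sigma^2 = 1 and mu4 = 9, so the limit variance is 8.
import numpy as np

rng = np.random.default_rng(830)
n, reps = 200, 50_000
x = rng.exponential(1.0, size=(reps, n))
s2 = (x**2).mean(axis=1) - x.mean(axis=1)**2   # s^2 = mean(X^2) - Xbar^2, as defined above
z = np.sqrt(n) * (s2 - 1.0)
print("simulated variance:", z.var(), "theory (mu4 - sigma^4):", 8.0)
```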
Alternative approach

Suppose $c$ is constant. Define $X_i^* = X_i - c$. The sample variance of the $X_i^*$ is the same as the sample variance of the $X_i$. All central moments of $X_i^*$ are the same as for $X_i$, so there is no loss in taking $\mu = 0$. In this case:
$$a^t = (1, 0), \qquad \Sigma = \begin{bmatrix} \mu_4 - \sigma^4 & \mu_3 \\ \mu_3 & \sigma^2 \end{bmatrix}.$$
Notice that $a^t\Sigma = [\mu_4 - \sigma^4, \mu_3]$ and $a^t\Sigma a = \mu_4 - \sigma^4$.
Special Case: N(μ, σ²)

Then $\mu_3 = 0$ and $\mu_4 = 3\sigma^4$. Our calculation gives
$$n^{1/2}(s^2 - \sigma^2) \Rightarrow N(0, 2\sigma^4).$$
You can divide through by $\sigma^2$ and get
$$n^{1/2}(s^2/\sigma^2 - 1) \Rightarrow N(0, 2).$$
In fact $ns^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution, so the usual CLT shows
$$(n-1)^{-1/2}\left[ns^2/\sigma^2 - (n-1)\right] \Rightarrow N(0, 2)$$
(using the fact that the mean of a $\chi^2_1$ is 1 and its variance is 2). Factor out $n$ to get
$$\sqrt{\frac{n}{n-1}}\; n^{1/2}(s^2/\sigma^2 - 1) + (n-1)^{-1/2} \Rightarrow N(0, 2),$$
which is the δ method calculation except for some constants. The difference is unimportant: Slutsky's theorem.
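One can compare the exact $\chi^2_{n-1}$ probabilities with the $N(0,2)$ approximation numerically (a sketch; $n = 50$ and the y-grid are arbitrary choices):

```python
# Sketch: for N(mu, sigma^2) data, n*s^2/sigma^2 ~ chi^2_{n-1}; compare the exact
# probability P(sqrt(n)(s^2/sigma^2 - 1) <= y) with the N(0,2) approximation.
import numpy as np
from scipy.stats import chi2, norm

n = 50
for y in [-1.0, 0.0, 1.0]:
    # sqrt(n)(s^2/sigma^2 - 1) <= y  iff  n*s^2/sigma^2 <= n + sqrt(n)*y
    exact = chi2.cdf(n + np.sqrt(n) * y, df=n - 1)
    approx = norm.cdf(y, scale=np.sqrt(2))
    print(f"y={y}: exact={exact:.4f}, N(0,2) approx={approx:.4f}")
```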
Example: the median

Many, many statistics which are not explicitly functions of averages can be studied using averages. Later we will analyze MLEs and estimating equations this way. Here is an example which is less obvious.

Suppose $X_1, \ldots, X_n$ are iid with cdf $F$, density $f$ and median $m$. We study $\hat m$, the sample median. If $n = 2k - 1$ is odd then $\hat m$ is the $k$th largest. If $n = 2k$ then there are many potential choices for $\hat m$ between the $k$th and $(k+1)$th largest. I do the case of the $k$th largest.

The event $\hat m \le x$ is the same as the event that the number of $X_i \le x$ is at least $k$. That is,
$$P(\hat m \le x) = P\left(\sum_i 1(X_i \le x) \ge k\right).$$
The median

So
$$P(\hat m \le x) = P\left(\sum_i 1(X_i \le x) \ge k\right) = P\left(\sqrt{n}\,(\hat F_n(x) - F(x)) \ge \sqrt{n}\,(k/n - F(x))\right).$$
From the central limit theorem this is approximately
$$1 - \Phi\left(\frac{\sqrt{n}\,(k/n - F(x))}{\sqrt{F(x)(1 - F(x))}}\right).$$
Notice $k/n \to 1/2$.
Median

If we put $x = m + y/\sqrt{n}$ (where $m$ is the true median) we find $F(x) \to F(m) = 1/2$. Also $\sqrt{n}\,(F(x) - 1/2) \to f(m)y$, where $f$ is the density of $F$ (if $f$ exists). So
$$P(\sqrt{n}\,(\hat m - m) \le y) \to 1 - \Phi(-2f(m)y).$$
That is,
$$\sqrt{n}\,(\hat m - m) \Rightarrow N\left(0, 1/(4f^2(m))\right).$$
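A simulation sketch of this result (my addition): for Exponential(1) the true median is $m = \log 2$ and $f(m) = 1/2$, so the limit variance $1/(4f^2(m))$ equals 1. The seed and sample sizes are arbitrary:

```python
# Sketch: for Exponential(1), m = log 2 and f(m) = 1/2, so the asymptotic
# variance of sqrt(n)(mhat - m) is 1/(4 f(m)^2) = 1.
import numpy as np

rng = np.random.default_rng(830)
n, reps = 201, 50_000          # odd n, so the sample median is the middle order statistic
x = rng.exponential(1.0, size=(reps, n))
z = np.sqrt(n) * (np.median(x, axis=1) - np.log(2.0))
print("simulated variance:", z.var(), "theory:", 1.0)
```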