Efficiency and the Cramér-Rao Inequality


Clearly we would like an unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$ to produce, in the long run, estimates which are fairly concentrated, i.e. have high precision. An estimator $\hat\phi$ is unlikely to impress anybody if it produces widely varying estimates on different occasions. This suggests that $\mathrm{Var}(\hat\phi)$ should be as small as possible. But how small can $\mathrm{Var}(\hat\phi)$ be? The answer is given by the following result.

The Cramér-Rao Inequality. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with p.m./d.f. $f(x\,|\,\theta)$, where $\theta$ is a scalar parameter. Under certain regularity conditions on $f(x\,|\,\theta)$, for any unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$,
\[
\mathrm{Var}(\hat\phi) \ge \frac{\left[\frac{d}{d\theta}\phi(\theta)\right]^2}{E(S^2(X))}, \tag{2.1}
\]
where
\[
S(X) = \frac{\partial}{\partial\theta}\log f_X(X\,|\,\theta) = \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log f(X_i\,|\,\theta). \tag{2.2}
\]

Remark. (1) The function $S(X)$ is called the score statistic and will be seen to play an important role in statistical inference.

(2) The bound on the r.h.s. of the Cramér-Rao inequality, $\left[\frac{d}{d\theta}\phi(\theta)\right]^2 / E(S^2(X))$, is known as the Cramér-Rao Lower Bound (CRLB) for the variance of an unbiased estimator of $\phi(\theta)$.

(3) An estimator whose variance is equal to the CRLB is called a most efficient estimator. The efficiency $\mathrm{eff}(\hat\phi)$ of an unbiased estimator $\hat\phi$ of $\phi(\theta)$ is defined to be the ratio
\[
\mathrm{eff}(\hat\phi) = \frac{\mathrm{CRLB}}{\mathrm{Var}(\hat\phi)}.
\]
Note that $0 \le \mathrm{eff}(\hat\phi) \le 1$ for all $\theta \in \Theta$.

(4) The quantity
\[
I(\theta) = E(S^2(X)) \tag{2.3}
\]
is called the Fisher information on the parameter $\theta$ contained in the sample $X$. Note that (i) the Fisher information is, in general, a function of the value of $\theta$; and (ii) the higher the Fisher information, i.e. the more information there is in the sample, the smaller the CRLB and consequently the smaller the variance of the most efficient estimator (when it exists).

Proof of the Cramér-Rao inequality (for the case when the observations are continuous; for discrete observations replace integrals by sums). Since $\hat\phi$ is an unbiased estimator, $E(\hat\phi) = \phi(\theta)$ for all $\theta$, i.e.
\[
\int\!\cdots\!\int \hat\phi(x_1,x_2,\ldots,x_n)\, f_X(x_1,x_2,\ldots,x_n\,|\,\theta)\, dx_1\, dx_2\cdots dx_n = \phi(\theta),
\]
and for convenience we write
\[
\int \hat\phi(x)\, f_X(x\,|\,\theta)\, dx = \phi(\theta). \tag{2.4}
\]

Differentiating both sides with respect to $\theta$ and interchanging the order of differentiation and integration (assuming this is possible, i.e. that the regularity conditions hold) we get
\[
\int \hat\phi(x)\, \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)\, dx = \frac{d}{d\theta}\phi(\theta). \tag{2.5}
\]
We now note that from the definition of the score statistic we have
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = \frac{1}{f_X(x\,|\,\theta)}\,\frac{\partial}{\partial\theta} f_X(x\,|\,\theta),
\]
and hence
\[
\frac{\partial}{\partial\theta} f_X(x\,|\,\theta) = S(x)\, f_X(x\,|\,\theta). \tag{2.6}
\]
Substituting (2.6) into (2.5) we have
\[
\int \hat\phi(x)\, S(x)\, f_X(x\,|\,\theta)\, dx = \frac{d}{d\theta}\phi(\theta),
\]
i.e.
\[
E\big(\hat\phi(X) S(X)\big) = \frac{d}{d\theta}\phi(\theta). \tag{2.7}
\]
Next we note that the score statistic has zero expectation, since, because of (2.6),
\[
E(S(X)) = \int S(x)\, f_X(x\,|\,\theta)\, dx = \int \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)\, dx.
\]
Again interchanging differentiation with integration (i.e. assuming the necessary regularity conditions hold) we get
\[
E(S(X)) = \frac{\partial}{\partial\theta}\int f_X(x\,|\,\theta)\, dx = \frac{\partial}{\partial\theta}(1) = 0. \tag{2.8}
\]
Hence
\[
\mathrm{Cov}(\hat\phi, S) = E(\hat\phi S) - E(\hat\phi)\,E(S) = E(\hat\phi S) = \frac{d}{d\theta}\phi(\theta), \tag{2.9}
\]
the last equality following from (2.7), and
\[
\mathrm{Var}(S) = E(S^2) - E^2(S) = E(S^2) = I(\theta). \tag{2.10}
\]
Finally, since for any two random variables the correlation coefficient between them is always less than 1 in absolute value, we have $\rho^2(\hat\phi, S) \le 1$, i.e. $\big[\mathrm{Cov}(\hat\phi, S)\big]^2 \le \mathrm{Var}(\hat\phi)\,\mathrm{Var}(S)$, or
\[
\mathrm{Var}(\hat\phi) \ge \frac{\big[\mathrm{Cov}(\hat\phi, S)\big]^2}{\mathrm{Var}(S)} = \frac{\left[\frac{d}{d\theta}\phi(\theta)\right]^2}{I(\theta)},
\]
which proves the Cramér-Rao inequality.
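The identities used in the proof are easy to check numerically. Below is a minimal Monte Carlo sketch for the normal-mean case; the choice of the $N(\theta,\sigma^2)$ model, the estimator $\hat\phi = \bar X$, and all numerical values are illustrative assumptions, not part of these notes. The simulated score should have mean close to $0$ and variance close to $I(\theta) = n/\sigma^2$, and $\mathrm{Cov}(\hat\phi, S)$ should be close to $\frac{d}{d\theta}\phi(\theta) = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, reps = 2.0, 1.5, 10, 200_000

# reps independent samples of size n from N(theta, sigma^2)
x = rng.normal(theta, sigma, size=(reps, n))

phi_hat = x.mean(axis=1)                    # unbiased estimator of phi(theta) = theta
score = (x - theta).sum(axis=1) / sigma**2  # S(X) = sum_i (X_i - theta)/sigma^2

print(score.mean())                         # ~ 0, as in (2.8)
print(score.var(), n / sigma**2)            # ~ I(theta) = n/sigma^2, as in (2.10)
print(np.cov(phi_hat, score)[0, 1])         # ~ 1 = d(phi)/d(theta), as in (2.9)
print(phi_hat.var(), sigma**2 / n)          # here the variance attains the CRLB
```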

This proof has also provided us with the following result.

Corollary 4. Under certain regularity conditions the score statistic $S(X)$ has mean zero and variance equal to the Fisher information, i.e.
\[
E(S(X)) = 0, \tag{2.11}
\]
\[
\mathrm{Var}(S(X)) = I(\theta). \tag{2.12}
\]

Efficiency: Since we now know that the CRLB is the smallest value that the variance of any unbiased estimator of $\phi(\theta)$ can be, the efficiency of an estimator of $\phi(\theta)$ can be assessed by comparing its variance against the CRLB. In fact, as we have seen, we have the following definition for the efficiency $\mathrm{eff}(\hat\phi)$ of an estimator $\hat\phi$ of $\phi(\theta)$.

Definition. The efficiency of an estimator $\hat\phi$ of $\phi(\theta)$ is defined to be
\[
\mathrm{eff}(\hat\phi) = \frac{\mathrm{CRLB}}{\mathrm{Var}(\hat\phi)} = \frac{\left[\frac{d}{d\theta}\phi(\theta)\right]^2}{I(\theta)\,\mathrm{Var}(\hat\phi)}. \tag{2.13}
\]
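As a concrete illustration of (2.13), consider estimating the mean of a $N(\theta,\sigma^2)$ sample, for which the CRLB is $\sigma^2/n$. The sketch below (all numerical values are illustrative assumptions) compares the sample mean, which attains the bound, with the sample median, a well-known unbiased but less efficient alternative whose asymptotic efficiency is $2/\pi \approx 0.64$ — a classical fact, not derived in these notes.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, reps = 0.0, 1.0, 101, 100_000

x = rng.normal(theta, sigma, size=(reps, n))
crlb = sigma**2 / n                       # CRLB for unbiased estimators of the mean

eff_mean = crlb / x.mean(axis=1).var()
eff_median = crlb / np.median(x, axis=1).var()

print(eff_mean)     # ~ 1.0 : the sample mean is most efficient
print(eff_median)   # ~ 2/pi ~ 0.64 for large n (asymptotic efficiency)
```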

Remark. Note that $0 \le \mathrm{eff}(\hat\phi) \le 1$, and that the closer the value of $\mathrm{eff}(\hat\phi)$ is to 1, the more efficient the estimator $\hat\phi$ is; indeed when $\hat\phi$ is most efficient then $\mathrm{eff}(\hat\phi) = 1$. Note that a most efficient estimator of $\phi(\theta)$ need not always exist. When dealing with unbiased estimators, if a most efficient unbiased estimator does not exist (i.e. if there is no unbiased estimator whose variance is as small as the CRLB), we seek out the estimator whose variance is closest to the CRLB, i.e. the unbiased estimator with the highest efficiency.

When does a most efficient estimator exist? The following result not only gives the necessary and sufficient conditions for a most efficient estimator to exist, it also provides this estimator.

Theorem. Under the same regularity conditions as in the Cramér-Rao inequality, there exists an unbiased estimator $\hat\phi$ of $\phi(\theta)$ whose variance attains the CRLB (i.e. there exists a most efficient unbiased estimator $\hat\phi$ of $\phi(\theta)$) if and only if the score statistic $S(X)$ can be expressed in the form
\[
S(X) = \alpha(\theta)\left[\hat\phi(X) - \phi(\theta)\right], \tag{2.14}
\]
in which case
\[
\alpha(\theta) = \frac{I(\theta)}{\frac{d}{d\theta}\phi(\theta)};
\]
or, equivalently, if and only if the function
\[
S(X)\,\frac{\frac{d}{d\theta}\phi(\theta)}{I(\theta)} + \phi(\theta)
\]
is independent of $\theta$ and depends only on $X$, in which case this statistic is the unbiased most efficient estimator of $\phi(\theta)$.

Proof. Recall that the Cramér-Rao inequality was a result of the fact that for any unbiased estimator $\hat\phi$ of $\phi(\theta)$, $\rho^2(\hat\phi, S) \le 1$. Hence the CRLB is attained if and only if $\rho^2(\hat\phi, S) = 1$. But this can happen if and only if $\hat\phi$ and $S$ are linearly related random variables, i.e. if and only if there exist functions $\alpha(\theta)$ and $\beta(\theta)$ which are independent of $X$ such that
\[
S(X) = \alpha(\theta)\,\hat\phi(X) + \beta(\theta). \tag{2.15}
\]
Recall, however (from Corollary 4), that $S(X)$ has zero expectation and that $\hat\phi$ is an unbiased estimator of $\phi(\theta)$. Hence taking expectations of both sides of (2.15) we get $0 = \alpha(\theta)\phi(\theta) + \beta(\theta)$, or $\beta(\theta) = -\alpha(\theta)\phi(\theta)$, which, on substituting in (2.15), gives
\[
S(X) = \alpha(\theta)\,\hat\phi(X) - \alpha(\theta)\,\phi(\theta) = \alpha(\theta)\left[\hat\phi(X) - \phi(\theta)\right], \tag{2.16}
\]
confirming (2.14). Multiplying both sides of (2.16) by $S(X)$ and taking expectations throughout we get
\[
E\big(S^2(X)\big) = \alpha(\theta)\left[E\big(\hat\phi(X)S(X)\big) - \phi(\theta)\,E(S(X))\right],
\]
and in view of Corollary 4 we have
\[
E\big(S^2(X)\big) = \alpha(\theta)\,\mathrm{Cov}\big(\hat\phi(X), S(X)\big),
\]
which by (2.9) and (2.10) reduces to
\[
I(\theta) = \alpha(\theta)\,\frac{d}{d\theta}\phi(\theta),
\]
completing the proof.

2.0.3 A computationally useful result

The expression
\[
I(\theta) = E\big(S^2(X)\big) = E\!\left[\left(\frac{\partial}{\partial\theta}\log f_X(X\,|\,\theta)\right)^2\right]
\]
for the Fisher information does not lend itself to easy computation. The alternative form given below is computationally more attractive, and you are strongly recommended to use it when finding the Fisher information in examples, exercises or exam questions.

Proposition. Under the same regularity conditions as in the Cramér-Rao inequality,
\[
I(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right). \tag{2.17}
\]

Proof.
\[
\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)
= \frac{\partial}{\partial\theta}\left\{\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial}{\partial\theta} f_X(X\,|\,\theta)\right\}
= -\big[f_X(X\,|\,\theta)\big]^{-2}\left[\frac{\partial}{\partial\theta} f_X(X\,|\,\theta)\right]^2 + \frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)
\]
\[
= -\left[\frac{\partial}{\partial\theta}\log f_X(X\,|\,\theta)\right]^2 + \frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)
= -S^2(X) + \frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta).
\]
Taking expectations throughout we get
\[
E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right) = -E\big(S^2(X)\big) + E\!\left(\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right) = -I(\theta) + E\!\left(\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right). \tag{2.18}
\]
But (assuming continuous observations; replace integrals by summations if the observations are discrete)
\[
E\!\left(\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right)
= \int \frac{1}{f_X(x\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(x\,|\,\theta)\, f_X(x\,|\,\theta)\, dx
= \int \frac{\partial^2}{\partial\theta^2} f_X(x\,|\,\theta)\, dx
= \frac{\partial^2}{\partial\theta^2}\int f_X(x\,|\,\theta)\, dx
= \frac{\partial^2}{\partial\theta^2}(1) = 0,
\]
i.e. the second term in (2.18) is zero and the result follows.
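The agreement of the two forms of the Fisher information is easy to verify symbolically. The following sketch does so for a single Bernoulli($\theta$) observation, a model chosen purely for illustration (it is not one of the examples in these notes); both expressions simplify to $1/(\theta(1-\theta))$.

```python
import sympy as sp

theta = sp.symbols('theta', positive=True)
x = sp.symbols('x')

# log-likelihood of a single Bernoulli(theta) observation x in {0, 1}
logf = x * sp.log(theta) + (1 - x) * sp.log(1 - theta)

S = sp.diff(logf, theta)        # score
d2 = sp.diff(logf, theta, 2)    # second derivative

def expect(expr):
    # E over x ~ Bernoulli(theta): P(x=1) = theta, P(x=0) = 1 - theta
    return sp.simplify(theta * expr.subs(x, 1) + (1 - theta) * expr.subs(x, 0))

print(expect(S**2))   # 1/(theta*(1 - theta)) = I(theta), the defining form (2.3)
print(-expect(d2))    # the same expression, as (2.17) asserts
```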

Example. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from the exponential distribution with parameter $\theta$. Find the Cramér-Rao lower bound for unbiased estimators of $\theta$. Does there exist an unbiased estimator of $\theta$ whose variance is equal to the CRLB?

Solution: The Cramér-Rao inequality is
\[
\mathrm{Var}(\hat\theta) \ge \frac{1}{I(\theta)}
\]
with $I(\theta) = E\big(S^2(X)\big) = -E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right)$. But
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \theta e^{-\theta x_i} = \theta^n e^{-\theta\sum_{i=1}^n x_i},
\qquad
\log f_X(x\,|\,\theta) = n\log\theta - \theta\sum_{i=1}^n x_i,
\]
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = \frac{n}{\theta} - \sum_{i=1}^n x_i,
\qquad
\frac{\partial^2}{\partial\theta^2}\log f_X(x\,|\,\theta) = -\frac{n}{\theta^2}.
\]
Hence
\[
I(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right) = \frac{n}{\theta^2}
\quad\text{and}\quad
\mathrm{Var}(\hat\theta) \ge \frac{1}{I(\theta)} = \frac{\theta^2}{n}.
\]
Now
\[
S(X) = \frac{n}{\theta} - \sum_{i=1}^n X_i,
\]
and this cannot be expressed in the form $\alpha(\theta)\big[\hat\theta(X) - \theta\big]$, i.e. there does not exist an unbiased estimator of $\theta$ whose variance is equal to the CRLB. This can be confirmed by the fact that with $\phi(\theta) = \theta$ and $\frac{d}{d\theta}\phi(\theta) = 1$,
\[
S(X)\,\frac{\frac{d}{d\theta}\phi(\theta)}{I(\theta)} + \phi(\theta) = \left(\frac{n}{\theta} - \sum_{i=1}^n X_i\right)\frac{\theta^2}{n} + \theta = 2\theta - \theta^2\bar X,
\]
which is not independent of $\theta$.

Example. In the last example, what is the CRLB for the variances of unbiased estimators of $1/\theta$? Does there exist an unbiased estimator of $1/\theta$ whose variance is equal to the CRLB?

Solution: For any unbiased estimator $\hat\psi$ of $1/\theta$,
\[
\mathrm{Var}(\hat\psi) \ge \frac{\left[\frac{d}{d\theta}\left(\frac{1}{\theta}\right)\right]^2}{I(\theta)} = \frac{1/\theta^4}{n/\theta^2} = \frac{1}{n\theta^2}.
\]
Further, from the last example,
\[
S(X) = \frac{n}{\theta} - \sum_{i=1}^n X_i = -n\left[\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{\theta}\right] = -n\left[\bar X - \frac{1}{\theta}\right],
\]
which implies that $\bar X$ is an unbiased estimator of $1/\theta$ whose variance is equal to the CRLB $= \frac{1}{n\theta^2}$. Further, by the reparametrisation property of the Fisher information, with $\psi = 1/\theta$,
\[
I(\psi) = \left(\frac{d\theta}{d\psi}\right)^2 I(\theta) = \theta^4 \cdot \frac{n}{\theta^2} = n\theta^2 = \frac{n}{\psi^2},
\]
so that $1/I(\psi) = 1/(n\theta^2)$, which agrees with the bound computed directly above.
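Both conclusions are easy to check by simulation. The sketch below uses illustrative parameter values; the fact that $(n-1)/\sum X_i$ is unbiased for $\theta$ is a standard property of the Gamma distribution, assumed here rather than derived in these notes. The simulation shows a gap above the CRLB $\theta^2/n$ when estimating $\theta$, while $\bar X$ attains the CRLB $1/(n\theta^2)$ when estimating $1/\theta$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 10, 400_000

x = rng.exponential(1 / theta, size=(reps, n))  # Exp(theta) has mean 1/theta
s = x.sum(axis=1)

# (n-1)/sum(X_i) is unbiased for theta (standard Gamma-distribution fact)
theta_hat = (n - 1) / s
print(theta_hat.mean())                 # ~ theta : unbiased
print(theta_hat.var(), theta**2 / n)    # ~ theta^2/(n-2), strictly above the CRLB

# X-bar is unbiased for 1/theta and attains the CRLB 1/(n*theta^2)
xbar = x.mean(axis=1)
print(xbar.var(), 1 / (n * theta**2))   # the two agree
```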

A remark on the regularity conditions: We have seen that the regularity conditions we require are those which allow us to interchange differentiation with integration on two occasions. These regularity conditions will normally fail to hold if:

1. the density/mass function $f(x\,|\,\theta)$ does not tail off rapidly enough (i.e. high values of $x$ can be produced with relatively high probability), so that the integrals converge slowly. In practice most of the distributions we deal with satisfy this regularity condition.

2. the effective range of integration depends on $\theta$. This will be the case if the range of possible values of $X$ depends on $\theta$. One such case is when $X$ has the uniform $U(0,\theta)$ distribution over the interval $(0,\theta)$, so that
\[
f(x\,|\,\theta) = \begin{cases} 1/\theta & \text{for } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases}
\]

2.0.4 Multivariate version of the Cramér-Rao inequality

Theorem: Let $X$ be a sample of observations with joint p.d./m.f. $f_X(x\,|\,\theta)$ dependent on a vector parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_k)^T$. Let $\phi(\theta)$ be a real-valued function of $\theta$. Then under some regularity conditions, for any unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$,
\[
\mathrm{Var}(\hat\phi) \ge \delta^T [I(\theta)]^{-1} \delta,
\]
where $\delta$ is the vector of derivatives of $\phi(\theta)$, i.e.
\[
\delta = \left(\frac{\partial}{\partial\theta_1}\phi(\theta), \frac{\partial}{\partial\theta_2}\phi(\theta), \ldots, \frac{\partial}{\partial\theta_k}\phi(\theta)\right)^T, \tag{2.19}
\]
and the matrix $I(\theta)$ is the Fisher information matrix with $(i,j)$th element
\[
I_{ij}(\theta) = E\big(S_i(X)\,S_j(X)\big) = -E\!\left(\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log f_X(X\,|\,\theta)\right), \qquad i,j = 1,2,\ldots,k. \tag{2.20}
\]
Here
\[
S_i(X) = \frac{\partial}{\partial\theta_i}\log f_X(X\,|\,\theta) \tag{2.21}
\]
is the $i$th score statistic, $i = 1,2,\ldots,k$. In particular, if $\hat\theta_r(X)$ is an unbiased estimator of the $r$th parameter $\theta_r$, then under the same regularity conditions
\[
\mathrm{Var}(\hat\theta_r) \ge J_{rr}(\theta),
\]
where $J_{rr}(\theta)$ is the $r$th diagonal element of the inverse matrix $[I(\theta)]^{-1}$, $r = 1,2,\ldots,k$.

Proof: The proof is very similar to that of the scalar case and its details are not provided. Below, however, is a sketch of the proof. The details can be filled in by you.

1. Following exactly the same approach as in the scalar case, show that
\[
\frac{\partial}{\partial\theta_j}\phi(\theta) = \mathrm{Cov}\big(\hat\phi(X), S_j(X)\big).
\]

2. Make use of the fact that for any coefficients $c_1, c_2, \ldots, c_k$,
\[
\mathrm{Var}\!\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) \ge 0. \tag{2.22}
\]
In particular, this is true for the values $c_1 = c_1^*,\, c_2 = c_2^*,\, \ldots,\, c_k = c_k^*$ which minimize $\mathrm{Var}\big(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\big)$.

3. Using 1., show that
\[
\mathrm{Var}\!\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) = \mathrm{Var}\big(\hat\phi(X)\big) + c^T I(\theta)\, c - 2\, c^T \delta
\]
with $c = (c_1, c_2, \ldots, c_k)^T$, and then using calculus show that
\[
c^* = (c_1^*, c_2^*, \ldots, c_k^*)^T = [I(\theta)]^{-1}\delta.
\]

4. Thus show that
\[
\min_c \mathrm{Var}\!\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) = \mathrm{Var}\big(\hat\phi(X)\big) - \delta^T [I(\theta)]^{-1}\delta,
\]
and because of 2. the result follows.

Remark: As in the univariate case, the proof also shows that

1. $E(S_i(X)) = 0$, $i = 1,2,\ldots,k$. $\quad$ (2.23)

2. $I_{ij}(\theta) = \mathrm{Cov}\big(S_i(X), S_j(X)\big)$, $i,j = 1,2,\ldots,k$. $\quad$ (2.24)
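Numerically, evaluating the multivariate bound is a one-line matrix computation. The sketch below uses an arbitrary illustrative Fisher information matrix and gradient (both are assumptions chosen only to make the point; any positive-definite matrix would do), and also previews a point made at the end of this chapter: $J_{rr}(\theta) \ge 1/I_{rr}(\theta)$, with equality only when $I(\theta)$ is diagonal.

```python
import numpy as np

# An illustrative 2x2 Fisher information matrix (values are assumptions).
I = np.array([[4.0, 1.0],
              [1.0, 3.0]])
J = np.linalg.inv(I)

# Bound for a scalar function phi(theta) with (illustrative) gradient delta:
delta = np.array([1.0, 2.0])
print(delta @ J @ delta)           # CRLB: delta^T [I(theta)]^{-1} delta

# For estimating theta_r itself, delta = e_r and the bound is J_rr.
# Note J_rr >= 1/I_rr, with equality only when I(theta) is diagonal:
print(np.diag(J), 1 / np.diag(I))  # here (0.273, 0.364) vs (0.25, 0.333)
```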

Example. Let $X_1, X_2, \ldots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution. Find the CRLB (and, in cases 1. and 2., check whether it is attained) for the variance of an unbiased estimator of:

1. $\mu$ when $\sigma^2$ is known;

2. $\sigma^2$ when $\mu$ is known;

3. $\mu$ when $\sigma^2$ is unknown;

4. $\sigma^2$ when $\mu$ is unknown;

5. the coefficient of variation $\sigma/\mu$ when both $\mu$ and $\sigma^2$ are unknown.

Solution: The sample joint p.d.f. is
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
\]
and
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \sum_{i=1}^n \frac{(x_i-\mu)^2}{2\sigma^2}.
\]

1. When $\sigma^2$ is known, $\theta = \mu$ and
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \sum_{i=1}^n \frac{(x_i-\theta)^2}{2\sigma^2},
\]
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = \sum_{i=1}^n \frac{x_i-\theta}{\sigma^2} = \frac{n}{\sigma^2}\big[\bar x - \theta\big],
\]
where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$. By the theorem on the attainment of the CRLB, this implies that $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ is an unbiased estimator of $\theta = \mu$ whose variance equals the CRLB, and that $\frac{n}{\sigma^2} = \alpha(\theta) = I(\theta)$, i.e. $\mathrm{CRLB} = \frac{\sigma^2}{n}$. Thus $\bar X$ is a most efficient estimator.

2. When $\mu$ is known but $\sigma^2$ is unknown, $\theta = \sigma^2$ and
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\theta - \frac{1}{2\theta}\sum_{i=1}^n (x_i-\mu)^2.
\]
Hence
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^n (x_i-\mu)^2 = \frac{n}{2\theta^2}\left[\frac{1}{n}\sum_{i=1}^n (x_i-\mu)^2 - \theta\right].
\]
By the same theorem, $\frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2$ is an unbiased estimator of $\theta = \sigma^2$ whose variance equals the CRLB, and $\frac{n}{2\theta^2} = I(\theta)$, i.e. $\mathrm{CRLB} = \frac{2\theta^2}{n} = \frac{2\sigma^4}{n}$.

3. and 4. Case where both $\mu$ and $\sigma^2$ are unknown. Here $\theta = \binom{\theta_1}{\theta_2}$ with $\theta_1 = \mu$ and $\theta_2 = \sigma^2$, so that
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) = (2\pi)^{-n/2}\,\theta_2^{-n/2}\exp\!\left(-\frac{1}{2\theta_2}\sum_{i=1}^n (x_i-\theta_1)^2\right)
\]
and thus
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\theta_2 - \frac{1}{2\theta_2}\sum_{i=1}^n (x_i-\theta_1)^2,
\]
\[
\frac{\partial}{\partial\theta_1}\log f_X(x\,|\,\theta) = \frac{1}{\theta_2}\sum_{i=1}^n (x_i-\theta_1),
\qquad
\frac{\partial}{\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{i=1}^n (x_i-\theta_1)^2,
\]
\[
\frac{\partial^2}{\partial\theta_1^2}\log f_X(x\,|\,\theta) = -\frac{n}{\theta_2},
\qquad
\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{1}{\theta_2^2}\sum_{i=1}^n (x_i-\theta_1),
\]
\[
\frac{\partial^2}{\partial\theta_2^2}\log f_X(x\,|\,\theta) = \frac{n}{2\theta_2^2} - \frac{1}{\theta_2^3}\sum_{i=1}^n (x_i-\theta_1)^2.
\]
Consequently
\[
I_{11}(\theta) = -E\!\left(-\frac{n}{\theta_2}\right) = \frac{n}{\theta_2},
\qquad
I_{12}(\theta) = -E\!\left(-\frac{1}{\theta_2^2}\sum_{i=1}^n (X_i-\theta_1)\right) = 0,
\]
\[
I_{22}(\theta) = -E\!\left(\frac{n}{2\theta_2^2} - \frac{1}{\theta_2^3}\sum_{i=1}^n (X_i-\theta_1)^2\right) = -\frac{n}{2\theta_2^2} + \frac{n\theta_2}{\theta_2^3} = \frac{n}{2\theta_2^2},
\]
i.e.
\[
I(\theta) = \begin{pmatrix} \dfrac{n}{\theta_2} & 0 \\[6pt] 0 & \dfrac{n}{2\theta_2^2}\end{pmatrix}
\quad\text{and}\quad
[I(\theta)]^{-1} = J(\theta) = \begin{pmatrix} \dfrac{\theta_2}{n} & 0 \\[6pt] 0 & \dfrac{2\theta_2^2}{n}\end{pmatrix} = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\[6pt] 0 & \dfrac{2\sigma^4}{n}\end{pmatrix}.
\]
Consequently, for unbiased estimators $\hat\mu$, $\hat\sigma^2$ of $\mu$ and $\sigma^2$ respectively,
\[
\mathrm{Var}(\hat\mu) \ge \frac{\sigma^2}{n}
\quad\text{and}\quad
\mathrm{Var}(\hat\sigma^2) \ge \frac{2\sigma^4}{n}.
\]

5. We are now interested in estimating the ratio $\frac{\sigma}{\mu} = \frac{\sqrt{\theta_2}}{\theta_1}$ when both $\theta_1 = \mu$ and $\theta_2 = \sigma^2$ are unknown. Since
\[
\delta = \left(\frac{\partial}{\partial\theta_1}\left(\frac{\sqrt{\theta_2}}{\theta_1}\right), \frac{\partial}{\partial\theta_2}\left(\frac{\sqrt{\theta_2}}{\theta_1}\right)\right)^T = \left(-\frac{\sqrt{\theta_2}}{\theta_1^2},\; \frac{1}{2\theta_1\sqrt{\theta_2}}\right)^T = \left(-\frac{\sigma}{\mu^2},\; \frac{1}{2\mu\sigma}\right)^T,
\]
it follows that if $\hat\phi$ is an unbiased estimator of $\frac{\sigma}{\mu}$ then
\[
\mathrm{Var}(\hat\phi) \ge \left(-\frac{\sigma}{\mu^2},\; \frac{1}{2\mu\sigma}\right) [I(\theta)]^{-1} \begin{pmatrix} -\dfrac{\sigma}{\mu^2} \\[6pt] \dfrac{1}{2\mu\sigma} \end{pmatrix}
= \left(-\frac{\sigma}{\mu^2},\; \frac{1}{2\mu\sigma}\right) \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\[6pt] 0 & \dfrac{2\sigma^4}{n}\end{pmatrix} \begin{pmatrix} -\dfrac{\sigma}{\mu^2} \\[6pt] \dfrac{1}{2\mu\sigma} \end{pmatrix}
= \frac{\sigma^4}{n\mu^4} + \frac{\sigma^2}{2n\mu^2}.
\]
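A quick simulation (all numerical values are illustrative assumptions) confirms the flavour of cases 3.-5.: $\bar X$ attains the bound $\sigma^2/n$, whereas the usual unbiased variance estimator $s^2$ (with $\mu$ unknown) has variance $2\sigma^4/(n-1)$, strictly above the CRLB $2\sigma^4/n$; the exact value $\mathrm{Var}(s^2) = 2\sigma^4/(n-1)$ is a standard normal-theory fact, not derived in these notes.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 400_000

x = rng.normal(mu, sigma, size=(reps, n))

print(x.mean(axis=1).var(), sigma**2 / n)  # X-bar attains the bound sigma^2/n

s2 = x.var(axis=1, ddof=1)                 # usual unbiased estimator of sigma^2
print(s2.var(), 2 * sigma**4 / n)          # ~ 2*sigma^4/(n-1) > CRLB = 2*sigma^4/n

# CRLB for the coefficient of variation sigma/mu (both parameters unknown)
print(sigma**4 / (n * mu**4) + sigma**2 / (2 * n * mu**2))
```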

Example. Let $X_1, X_2, \ldots, X_n$ be a random sample from the trinomial distribution with p.m.f.
\[
f(x\,|\,\theta) = \frac{m!}{x_1!\,x_2!\,(m-x_1-x_2)!}\,\theta_1^{x_1}\,\theta_2^{x_2}\,(1-\theta_1-\theta_2)^{m-x_1-x_2},
\]
where $\theta = (\theta_1, \theta_2)$ are unknown values. Find lower bounds for the variance of unbiased estimators of $\theta_1$ and $\theta_2$.

Solution: Writing $x_i = (x_{i1}, x_{i2})$ for the $i$th observation, the sample joint p.m.f. is
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{m!}{x_{i1}!\,x_{i2}!\,(m-x_{i1}-x_{i2})!}\,\theta_1^{x_{i1}}\,\theta_2^{x_{i2}}\,(1-\theta_1-\theta_2)^{m-x_{i1}-x_{i2}}
\]
and
\[
\log f_X(x\,|\,\theta) = \text{const} + \sum_{i=1}^n x_{i1}\log\theta_1 + \sum_{i=1}^n x_{i2}\log\theta_2 + \sum_{i=1}^n (m-x_{i1}-x_{i2})\log(1-\theta_1-\theta_2).
\]
Hence
\[
\frac{\partial}{\partial\theta_1}\log f_X(x\,|\,\theta) = \frac{\sum_{i=1}^n x_{i1}}{\theta_1} - \frac{\sum_{i=1}^n (m-x_{i1}-x_{i2})}{1-\theta_1-\theta_2},
\]
\[
\frac{\partial^2}{\partial\theta_1^2}\log f_X(x\,|\,\theta) = -\frac{\sum_{i=1}^n x_{i1}}{\theta_1^2} - \frac{\sum_{i=1}^n (m-x_{i1}-x_{i2})}{(1-\theta_1-\theta_2)^2},
\]
and since $E(X_{i1}) = m\theta_1$ and $E(X_{i2}) = m\theta_2$,
\[
I_{11}(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta_1^2}\log f_X(X\,|\,\theta)\right) = \frac{nm\theta_1}{\theta_1^2} + \frac{nm(1-\theta_1-\theta_2)}{(1-\theta_1-\theta_2)^2} = \frac{nm}{\theta_1} + \frac{nm}{1-\theta_1-\theta_2}. \tag{2.25}
\]
By symmetry
\[
I_{22}(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta_2^2}\log f_X(X\,|\,\theta)\right) = \frac{nm}{\theta_2} + \frac{nm}{1-\theta_1-\theta_2}. \tag{2.26}
\]
Finally
\[
\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{\sum_{i=1}^n (m-x_{i1}-x_{i2})}{(1-\theta_1-\theta_2)^2},
\]
and hence
\[
I_{12}(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(X\,|\,\theta)\right) = \frac{nm}{1-\theta_1-\theta_2}. \tag{2.27}
\]
Hence the Fisher information matrix is
\[
I(\theta) = nm\begin{pmatrix} \dfrac{1}{\theta_1} + \dfrac{1}{1-\theta_1-\theta_2} & \dfrac{1}{1-\theta_1-\theta_2} \\[8pt] \dfrac{1}{1-\theta_1-\theta_2} & \dfrac{1}{\theta_2} + \dfrac{1}{1-\theta_1-\theta_2}\end{pmatrix}
\]
and
\[
J(\theta) = [I(\theta)]^{-1} = \frac{\theta_1\theta_2(1-\theta_1-\theta_2)}{nm}\begin{pmatrix} \dfrac{1}{\theta_2} + \dfrac{1}{1-\theta_1-\theta_2} & -\dfrac{1}{1-\theta_1-\theta_2} \\[8pt] -\dfrac{1}{1-\theta_1-\theta_2} & \dfrac{1}{\theta_1} + \dfrac{1}{1-\theta_1-\theta_2}\end{pmatrix}.
\]
Thus if $\hat\theta_1$ is an unbiased estimator of $\theta_1$,
\[
\mathrm{Var}(\hat\theta_1) \ge J_{11}(\theta) = \frac{\theta_1\theta_2(1-\theta_1-\theta_2)}{nm}\left(\frac{1}{\theta_2} + \frac{1}{1-\theta_1-\theta_2}\right) = \frac{\theta_1(1-\theta_1)}{nm}.
\]
Similarly, if $\hat\theta_2$ is an unbiased estimator of $\theta_2$,
\[
\mathrm{Var}(\hat\theta_2) \ge J_{22}(\theta) = \frac{\theta_2(1-\theta_2)}{nm}.
\]
Notice that if $\theta_2$ were assumed to be known then the CRLB for the variance of $\hat\theta_1$ would have been $\mathrm{Var}(\hat\theta_1) \ge \frac{1}{I_{11}(\theta)}$, i.e.
\[
\mathrm{Var}(\hat\theta_1) \ge \frac{1}{nm\left(\dfrac{1}{\theta_1} + \dfrac{1}{1-\theta_1-\theta_2}\right)} = \frac{\theta_1(1-\theta_1-\theta_2)}{nm(1-\theta_2)}.
\]
Since $(1-\theta_1)(1-\theta_2) > 1-\theta_1-\theta_2$ for $0 < \theta_1, \theta_2 < 1$, it follows that
\[
J_{11}(\theta) > \frac{1}{I_{11}(\theta)},
\]
which indicates that if $\theta_2$ is unknown it would be wrong to use the inequality $\mathrm{Var}(\hat\theta_1) \ge \frac{1}{I_{11}(\theta)}$, since this inequality is not as tight as the correct inequality $\mathrm{Var}(\hat\theta_1) \ge J_{11}(\theta)$. In general it can be shown that
\[
J_{ii}(\theta) \ge \frac{1}{I_{ii}(\theta)}
\]
for all $i$, with strict inequality (and hence an improved lower bound for the variance of an unbiased estimator of $\theta_i$) unless the Fisher information matrix $I(\theta)$ is diagonal (see C. R. Rao (1973), Linear Statistical Inference and Its Applications).
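As a closing numerical check (the parameter values are illustrative assumptions), the sketch below builds $I(\theta)$ for the trinomial example, inverts it, and confirms both the closed form $J_{11}(\theta) = \theta_1(1-\theta_1)/(nm)$ and the strict inequality $J_{11}(\theta) > 1/I_{11}(\theta)$.

```python
import numpy as np

n, m = 50, 20                # sample size and trials per observation (illustrative)
t1, t2 = 0.3, 0.5            # assumed parameter values
u = 1 - t1 - t2

I = n * m * np.array([[1/t1 + 1/u, 1/u],
                      [1/u,        1/t2 + 1/u]])
J = np.linalg.inv(I)

print(J[0, 0], t1 * (1 - t1) / (n * m))          # J_11 equals theta_1(1-theta_1)/(nm)
print(1 / I[0, 0], t1 * u / (n * m * (1 - t2)))  # the (looser) bound if theta_2 were known
print(J[0, 0] > 1 / I[0, 0])                     # True: the correct bound is strictly larger
```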