Efficiency and the Cramér-Rao Inequality


Clearly we would like an unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$ to produce, in the long run, estimates which are fairly concentrated, i.e. have high precision. An estimator $\hat\phi$ is unlikely to impress anybody if it produces widely varying estimates on different occasions. This suggests that $\mathrm{Var}(\hat\phi)$ should be as small as possible. But how small can $\mathrm{Var}(\hat\phi)$ be? The answer is given by the following result.

The Cramér-Rao Inequality. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from a distribution with p.m./d.f. $f(x\,|\,\theta)$, where $\theta$ is a scalar parameter. Under certain regularity conditions on $f(x\,|\,\theta)$, for any unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$,
\[
\mathrm{Var}(\hat\phi) \ge \frac{\left[\frac{d}{d\theta}\phi(\theta)\right]^2}{E(S^2(X))}, \tag{2.1}
\]
where
\[
S(X) = \frac{\partial}{\partial\theta}\log f_X(X\,|\,\theta) = \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log f(X_i\,|\,\theta). \tag{2.2}
\]

Remark. (1) The function $S(X)$ is called the score statistic and will be seen to play an important role in statistical inference.

(2) The bound on the r.h.s. of the Cramér-Rao inequality, $\left[\frac{d}{d\theta}\phi(\theta)\right]^2 / E(S^2(X))$, is known as the Cramér-Rao Lower Bound (CRLB) for the variance of an unbiased estimator of $\phi(\theta)$.

(3) An estimator whose variance is equal to the CRLB is called a most efficient estimator. The efficiency $\mathrm{eff}(\hat\phi)$ of an unbiased estimator $\hat\phi$ of $\phi(\theta)$ is defined to be the ratio
\[
\mathrm{eff}(\hat\phi) = \frac{\mathrm{CRLB}}{\mathrm{Var}(\hat\phi)}.
\]
Note that $0 \le \mathrm{eff}(\hat\phi) \le 1$ for all $\theta \in \Theta$.

(4) The quantity
\[
I(\theta) = E(S^2(X)) \tag{2.3}
\]
is called the Fisher information on the parameter $\theta$ contained in the sample $X$. Note that (i) the Fisher information is, in general, a function of the value of $\theta$; and (ii) the higher the Fisher information, i.e. the more information there is in the sample, the smaller the CRLB and consequently the smaller the variance of the most efficient estimator (when it exists).

Proof of the Cramér-Rao inequality (for the case when the observations are continuous; for discrete observations replace integrals by sums). Since $\hat\phi$ is an unbiased estimator, $E(\hat\phi) = \phi(\theta)$ for all $\theta$, i.e.
\[
\int\!\cdots\!\int \hat\phi(x_1,x_2,\ldots,x_n)\, f_X(x_1,x_2,\ldots,x_n\,|\,\theta)\, dx_1\, dx_2\cdots dx_n = \phi(\theta),
\]
and for convenience we write
\[
\int \hat\phi(x)\, f_X(x\,|\,\theta)\, dx = \phi(\theta). \tag{2.4}
\]

Differentiating both sides with respect to $\theta$ and interchanging the order of differentiation and integration (assuming this is possible, i.e. that the regularity conditions hold) we get
\[
\int \hat\phi(x)\, \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)\, dx = \frac{d}{d\theta}\phi(\theta). \tag{2.5}
\]
We now note that from the definition of the score statistic we have
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = \frac{1}{f_X(x\,|\,\theta)}\,\frac{\partial}{\partial\theta} f_X(x\,|\,\theta),
\]
and hence
\[
\frac{\partial}{\partial\theta} f_X(x\,|\,\theta) = S(x)\, f_X(x\,|\,\theta). \tag{2.6}
\]
Substituting (2.6) into (2.5) we have
\[
\int \hat\phi(x)\, S(x)\, f_X(x\,|\,\theta)\, dx = \frac{d}{d\theta}\phi(\theta),
\]
i.e.
\[
E\big(\hat\phi(X) S(X)\big) = \frac{d}{d\theta}\phi(\theta). \tag{2.7}
\]
Next we note that the score statistic has zero expectation, since, because of (2.6),
\[
E(S(X)) = \int S(x)\, f_X(x\,|\,\theta)\, dx = \int \frac{\partial}{\partial\theta} f_X(x\,|\,\theta)\, dx.
\]
Again interchanging differentiation with integration (i.e. assuming the necessary regularity conditions hold) we get
\[
E(S(X)) = \frac{\partial}{\partial\theta}\int f_X(x\,|\,\theta)\, dx = \frac{\partial}{\partial\theta}(1) = 0. \tag{2.8}
\]
Hence
\[
\mathrm{Cov}(\hat\phi, S) = E(\hat\phi S) - E(\hat\phi)\,E(S) = E(\hat\phi S) = \frac{d}{d\theta}\phi(\theta), \tag{2.9}
\]
the last equality following from (2.7), and
\[
\mathrm{Var}(S) = E(S^2) - E^2(S) = E(S^2) = I(\theta). \tag{2.10}
\]
Finally, since for any two random variables the correlation coefficient between them is always less than 1 in absolute value, we have $\rho^2(\hat\phi, S) \le 1$, i.e. $\big[\mathrm{Cov}(\hat\phi, S)\big]^2 \le \mathrm{Var}(\hat\phi)\,\mathrm{Var}(S)$, or
\[
\mathrm{Var}(\hat\phi) \ge \frac{\big[\mathrm{Cov}(\hat\phi, S)\big]^2}{\mathrm{Var}(S)} = \frac{\left[\frac{d}{d\theta}\phi(\theta)\right]^2}{I(\theta)},
\]
which proves the Cramér-Rao inequality.
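The identities used in the proof are easy to check numerically. Below is a minimal Monte Carlo sketch for the normal-mean case; the choice of the $N(\theta,\sigma^2)$ model, the estimator $\hat\phi = \bar X$, and all numerical values are illustrative assumptions, not part of these notes. The simulated score should have mean close to $0$ and variance close to $I(\theta) = n/\sigma^2$, and $\mathrm{Cov}(\hat\phi, S)$ should be close to $\frac{d}{d\theta}\phi(\theta) = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, reps = 2.0, 1.5, 10, 200_000

# reps independent samples of size n from N(theta, sigma^2)
x = rng.normal(theta, sigma, size=(reps, n))

phi_hat = x.mean(axis=1)                    # unbiased estimator of phi(theta) = theta
score = (x - theta).sum(axis=1) / sigma**2  # S(X) = sum_i (X_i - theta)/sigma^2

print(score.mean())                         # ~ 0, as in (2.8)
print(score.var(), n / sigma**2)            # ~ I(theta) = n/sigma^2, as in (2.10)
print(np.cov(phi_hat, score)[0, 1])         # ~ 1 = d(phi)/d(theta), as in (2.9)
print(phi_hat.var(), sigma**2 / n)          # here the variance attains the CRLB
```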

This proof has also provided us with the following result.

Corollary 4. Under certain regularity conditions the score statistic $S(X)$ has mean zero and variance equal to the Fisher information, i.e.
\[
E(S(X)) = 0, \tag{2.11}
\]
\[
\mathrm{Var}(S(X)) = I(\theta). \tag{2.12}
\]

Efficiency: Since we now know that the CRLB is the smallest value that the variance of any unbiased estimator of $\phi(\theta)$ can be, the efficiency of an estimator of $\phi(\theta)$ can be assessed by comparing its variance against the CRLB. In fact, as we have seen, we have the following definition for the efficiency $\mathrm{eff}(\hat\phi)$ of an estimator $\hat\phi$ of $\phi(\theta)$.

Definition. The efficiency of an estimator $\hat\phi$ of $\phi(\theta)$ is defined to be
\[
\mathrm{eff}(\hat\phi) = \frac{\mathrm{CRLB}}{\mathrm{Var}(\hat\phi)} = \frac{\left[\frac{d}{d\theta}\phi(\theta)\right]^2}{I(\theta)\,\mathrm{Var}(\hat\phi)}. \tag{2.13}
\]
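As a concrete illustration of (2.13), consider estimating the mean of a $N(\theta,\sigma^2)$ sample, for which the CRLB is $\sigma^2/n$. The sketch below (all numerical values are illustrative assumptions) compares the sample mean, which attains the bound, with the sample median, a well-known unbiased but less efficient alternative whose asymptotic efficiency is $2/\pi \approx 0.64$ — a classical fact, not derived in these notes.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, reps = 0.0, 1.0, 101, 100_000

x = rng.normal(theta, sigma, size=(reps, n))
crlb = sigma**2 / n                       # CRLB for unbiased estimators of the mean

eff_mean = crlb / x.mean(axis=1).var()
eff_median = crlb / np.median(x, axis=1).var()

print(eff_mean)     # ~ 1.0 : the sample mean is most efficient
print(eff_median)   # ~ 2/pi ~ 0.64 for large n (asymptotic efficiency)
```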

Remark. Note that $0 \le \mathrm{eff}(\hat\phi) \le 1$, and that the closer the value of $\mathrm{eff}(\hat\phi)$ is to 1, the more efficient the estimator $\hat\phi$ is; indeed when $\hat\phi$ is most efficient then $\mathrm{eff}(\hat\phi) = 1$. Note that a most efficient estimator of $\phi(\theta)$ need not always exist. When dealing with unbiased estimators, if a most efficient unbiased estimator does not exist (i.e. if there is no unbiased estimator whose variance is as small as the CRLB), we seek out the estimator whose variance is closest to the CRLB, i.e. the unbiased estimator with the highest efficiency.

When does a most efficient estimator exist? The following result not only gives the necessary and sufficient conditions for a most efficient estimator to exist, it also provides this estimator.

Theorem. Under the same regularity conditions as in the Cramér-Rao inequality, there exists an unbiased estimator $\hat\phi$ of $\phi(\theta)$ whose variance attains the CRLB (i.e. there exists a most efficient unbiased estimator $\hat\phi$ of $\phi(\theta)$) if and only if the score statistic $S(X)$ can be expressed in the form
\[
S(X) = \alpha(\theta)\left[\hat\phi(X) - \phi(\theta)\right], \tag{2.14}
\]
in which case
\[
\alpha(\theta) = \frac{I(\theta)}{\frac{d}{d\theta}\phi(\theta)};
\]
or, equivalently, if and only if the function
\[
S(X)\,\frac{\frac{d}{d\theta}\phi(\theta)}{I(\theta)} + \phi(\theta)
\]
is independent of $\theta$ and depends only on $X$, in which case this statistic is the unbiased most efficient estimator of $\phi(\theta)$.

Proof. Recall that the Cramér-Rao inequality was a result of the fact that for any unbiased estimator $\hat\phi$ of $\phi(\theta)$, $\rho^2(\hat\phi, S) \le 1$. Hence the CRLB is attained if and only if $\rho^2(\hat\phi, S) = 1$. But this can happen if and only if $\hat\phi$ and $S$ are linearly related random variables, i.e. if and only if there exist functions $\alpha(\theta)$ and $\beta(\theta)$ which are independent of $X$ such that
\[
S(X) = \alpha(\theta)\,\hat\phi(X) + \beta(\theta). \tag{2.15}
\]
Recall, however (from Corollary 4), that $S(X)$ has zero expectation and that $\hat\phi$ is an unbiased estimator of $\phi(\theta)$. Hence taking expectations of both sides of (2.15) we get $0 = \alpha(\theta)\phi(\theta) + \beta(\theta)$, or $\beta(\theta) = -\alpha(\theta)\phi(\theta)$, which, on substituting in (2.15), gives
\[
S(X) = \alpha(\theta)\,\hat\phi(X) - \alpha(\theta)\,\phi(\theta) = \alpha(\theta)\left[\hat\phi(X) - \phi(\theta)\right], \tag{2.16}
\]
confirming (2.14). Multiplying both sides of (2.16) by $S(X)$ and taking expectations throughout we get
\[
E\big(S^2(X)\big) = \alpha(\theta)\left[E\big(\hat\phi(X)S(X)\big) - \phi(\theta)\,E(S(X))\right],
\]
and in view of Corollary 4 we have
\[
E\big(S^2(X)\big) = \alpha(\theta)\,\mathrm{Cov}\big(\hat\phi(X), S(X)\big),
\]
which by (2.9) and (2.10) reduces to
\[
I(\theta) = \alpha(\theta)\,\frac{d}{d\theta}\phi(\theta),
\]
completing the proof.

2.0.3 A computationally useful result

The expression
\[
I(\theta) = E\big(S^2(X)\big) = E\!\left[\left(\frac{\partial}{\partial\theta}\log f_X(X\,|\,\theta)\right)^2\right]
\]
for the Fisher information does not lend itself to easy computation. The alternative form given below is computationally more attractive, and you are strongly recommended to use it when finding the Fisher information in examples, exercises or exam questions.

Proposition. Under the same regularity conditions as in the Cramér-Rao inequality,
\[
I(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right). \tag{2.17}
\]

Proof.
\[
\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)
= \frac{\partial}{\partial\theta}\left\{\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial}{\partial\theta} f_X(X\,|\,\theta)\right\}
= -\big[f_X(X\,|\,\theta)\big]^{-2}\left[\frac{\partial}{\partial\theta} f_X(X\,|\,\theta)\right]^2 + \frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)
\]
\[
= -\left[\frac{\partial}{\partial\theta}\log f_X(X\,|\,\theta)\right]^2 + \frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)
= -S^2(X) + \frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta).
\]
Taking expectations throughout we get
\[
E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right) = -E\big(S^2(X)\big) + E\!\left(\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right) = -I(\theta) + E\!\left(\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right). \tag{2.18}
\]
But (assuming continuous observations; replace integrals by summations if the observations are discrete)
\[
E\!\left(\frac{1}{f_X(X\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(X\,|\,\theta)\right)
= \int \frac{1}{f_X(x\,|\,\theta)}\,\frac{\partial^2}{\partial\theta^2} f_X(x\,|\,\theta)\, f_X(x\,|\,\theta)\, dx
= \int \frac{\partial^2}{\partial\theta^2} f_X(x\,|\,\theta)\, dx
= \frac{\partial^2}{\partial\theta^2}\int f_X(x\,|\,\theta)\, dx
= \frac{\partial^2}{\partial\theta^2}(1) = 0,
\]
i.e. the second term in (2.18) is zero and the result follows.
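The agreement of the two forms of the Fisher information is easy to verify symbolically. The following sketch does so for a single Bernoulli($\theta$) observation, a model chosen purely for illustration (it is not one of the examples in these notes); both expressions simplify to $1/(\theta(1-\theta))$.

```python
import sympy as sp

theta = sp.symbols('theta', positive=True)
x = sp.symbols('x')

# log-likelihood of a single Bernoulli(theta) observation x in {0, 1}
logf = x * sp.log(theta) + (1 - x) * sp.log(1 - theta)

S = sp.diff(logf, theta)        # score
d2 = sp.diff(logf, theta, 2)    # second derivative

def expect(expr):
    # E over x ~ Bernoulli(theta): P(x=1) = theta, P(x=0) = 1 - theta
    return sp.simplify(theta * expr.subs(x, 1) + (1 - theta) * expr.subs(x, 0))

print(expect(S**2))   # 1/(theta*(1 - theta)) = I(theta), the defining form (2.3)
print(-expect(d2))    # the same expression, as (2.17) asserts
```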

Example. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from the exponential distribution with parameter $\theta$. Find the Cramér-Rao lower bound for unbiased estimators of $\theta$. Does there exist an unbiased estimator of $\theta$ whose variance is equal to the CRLB?

Solution: The Cramér-Rao inequality is
\[
\mathrm{Var}(\hat\theta) \ge \frac{1}{I(\theta)}
\]
with $I(\theta) = E\big(S^2(X)\big) = -E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right)$. But
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \theta e^{-\theta x_i} = \theta^n e^{-\theta\sum_{i=1}^n x_i},
\qquad
\log f_X(x\,|\,\theta) = n\log\theta - \theta\sum_{i=1}^n x_i,
\]
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = \frac{n}{\theta} - \sum_{i=1}^n x_i,
\qquad
\frac{\partial^2}{\partial\theta^2}\log f_X(x\,|\,\theta) = -\frac{n}{\theta^2}.
\]
Hence
\[
I(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta^2}\log f_X(X\,|\,\theta)\right) = \frac{n}{\theta^2}
\quad\text{and}\quad
\mathrm{Var}(\hat\theta) \ge \frac{1}{I(\theta)} = \frac{\theta^2}{n}.
\]
Now
\[
S(X) = \frac{n}{\theta} - \sum_{i=1}^n X_i,
\]
and this cannot be expressed in the form $\alpha(\theta)\big[\hat\theta(X) - \theta\big]$, i.e. there does not exist an unbiased estimator of $\theta$ whose variance is equal to the CRLB. This can be confirmed by the fact that with $\phi(\theta) = \theta$ and $\frac{d}{d\theta}\phi(\theta) = 1$,
\[
S(X)\,\frac{\frac{d}{d\theta}\phi(\theta)}{I(\theta)} + \phi(\theta) = \left(\frac{n}{\theta} - \sum_{i=1}^n X_i\right)\frac{\theta^2}{n} + \theta = 2\theta - \theta^2\bar X,
\]
which is not independent of $\theta$.

Example. In the last example, what is the CRLB for the variances of unbiased estimators of $1/\theta$? Does there exist an unbiased estimator of $1/\theta$ whose variance is equal to the CRLB?

Solution: For any unbiased estimator $\hat\psi$ of $1/\theta$,
\[
\mathrm{Var}(\hat\psi) \ge \frac{\left[\frac{d}{d\theta}\left(\frac{1}{\theta}\right)\right]^2}{I(\theta)} = \frac{1/\theta^4}{n/\theta^2} = \frac{1}{n\theta^2}.
\]
Further, from the last example,
\[
S(X) = \frac{n}{\theta} - \sum_{i=1}^n X_i = -n\left[\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{\theta}\right] = -n\left[\bar X - \frac{1}{\theta}\right],
\]
which implies that $\bar X$ is an unbiased estimator of $1/\theta$ whose variance is equal to the CRLB $= \frac{1}{n\theta^2}$. Further, by the reparametrisation property of the Fisher information, with $\psi = 1/\theta$,
\[
I(\psi) = \left(\frac{d\theta}{d\psi}\right)^2 I(\theta) = \theta^4 \cdot \frac{n}{\theta^2} = n\theta^2 = \frac{n}{\psi^2},
\]
so that $1/I(\psi) = 1/(n\theta^2)$, which agrees with the bound computed directly above.
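Both conclusions are easy to check by simulation. The sketch below uses illustrative parameter values; the fact that $(n-1)/\sum X_i$ is unbiased for $\theta$ is a standard property of the Gamma distribution, assumed here rather than derived in these notes. The simulation shows a gap above the CRLB $\theta^2/n$ when estimating $\theta$, while $\bar X$ attains the CRLB $1/(n\theta^2)$ when estimating $1/\theta$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 10, 400_000

x = rng.exponential(1 / theta, size=(reps, n))  # Exp(theta) has mean 1/theta
s = x.sum(axis=1)

# (n-1)/sum(X_i) is unbiased for theta (standard Gamma-distribution fact)
theta_hat = (n - 1) / s
print(theta_hat.mean())                 # ~ theta : unbiased
print(theta_hat.var(), theta**2 / n)    # ~ theta^2/(n-2), strictly above the CRLB

# X-bar is unbiased for 1/theta and attains the CRLB 1/(n*theta^2)
xbar = x.mean(axis=1)
print(xbar.var(), 1 / (n * theta**2))   # the two agree
```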

A remark on the regularity conditions: We have seen that the regularity conditions we require are those which allow us to interchange differentiation with integration on two occasions. These regularity conditions will normally fail to hold if:

1. the density/mass function $f(x\,|\,\theta)$ does not tail off rapidly enough (i.e. high values of $x$ can be produced with relatively high probability), so that the integrals converge slowly. In practice most of the distributions we deal with satisfy this regularity condition.

2. the effective range of integration depends on $\theta$. This will be the case if the range of possible values of $X$ depends on $\theta$. One such case is when $X$ has the uniform $U(0,\theta)$ distribution over the interval $(0,\theta)$, so that
\[
f(x\,|\,\theta) = \begin{cases} 1/\theta & \text{for } 0 < x < \theta, \\ 0 & \text{otherwise.} \end{cases}
\]

2.0.4 Multivariate version of the Cramér-Rao inequality

Theorem: Let $X$ be a sample of observations with joint p.d./m.f. $f_X(x\,|\,\theta)$ dependent on a vector parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_k)^T$. Let $\phi(\theta)$ be a real-valued function of $\theta$. Then under some regularity conditions, for any unbiased estimator $\hat\phi(X)$ of $\phi(\theta)$,
\[
\mathrm{Var}(\hat\phi) \ge \delta^T [I(\theta)]^{-1} \delta,
\]
where $\delta$ is the vector of derivatives of $\phi(\theta)$, i.e.
\[
\delta = \left(\frac{\partial}{\partial\theta_1}\phi(\theta), \frac{\partial}{\partial\theta_2}\phi(\theta), \ldots, \frac{\partial}{\partial\theta_k}\phi(\theta)\right)^T, \tag{2.19}
\]
and the matrix $I(\theta)$ is the Fisher information matrix with $(i,j)$th element
\[
I_{ij}(\theta) = E\big(S_i(X)\,S_j(X)\big) = -E\!\left(\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\log f_X(X\,|\,\theta)\right), \qquad i,j = 1,2,\ldots,k. \tag{2.20}
\]
Here
\[
S_i(X) = \frac{\partial}{\partial\theta_i}\log f_X(X\,|\,\theta) \tag{2.21}
\]
is the $i$th score statistic, $i = 1,2,\ldots,k$. In particular, if $\hat\theta_r(X)$ is an unbiased estimator of the $r$th parameter $\theta_r$, then under the same regularity conditions
\[
\mathrm{Var}(\hat\theta_r) \ge J_{rr}(\theta),
\]
where $J_{rr}(\theta)$ is the $r$th diagonal element of the inverse matrix $[I(\theta)]^{-1}$, $r = 1,2,\ldots,k$.

Proof: The proof is very similar to that of the scalar case and its details are not provided. Below, however, is a sketch of the proof. The details can be filled in by you.

1. Following exactly the same approach as in the scalar case, show that
\[
\frac{\partial}{\partial\theta_j}\phi(\theta) = \mathrm{Cov}\big(\hat\phi(X), S_j(X)\big).
\]

2. Make use of the fact that for any coefficients $c_1, c_2, \ldots, c_k$,
\[
\mathrm{Var}\!\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) \ge 0. \tag{2.22}
\]
In particular, this is true for the values $c_1 = c_1^*,\, c_2 = c_2^*,\, \ldots,\, c_k = c_k^*$ which minimize $\mathrm{Var}\big(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\big)$.

3. Using 1., show that
\[
\mathrm{Var}\!\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) = \mathrm{Var}\big(\hat\phi(X)\big) + c^T I(\theta)\, c - 2\, c^T \delta
\]
with $c = (c_1, c_2, \ldots, c_k)^T$, and then using calculus show that
\[
c^* = (c_1^*, c_2^*, \ldots, c_k^*)^T = [I(\theta)]^{-1}\delta.
\]

4. Thus show that
\[
\min_c \mathrm{Var}\!\left(\hat\phi(X) - \sum_{i=1}^k c_i S_i(X)\right) = \mathrm{Var}\big(\hat\phi(X)\big) - \delta^T [I(\theta)]^{-1}\delta,
\]
and because of 2. the result follows.

Remark: As in the univariate case, the proof also shows that

1. $E(S_i(X)) = 0$, $i = 1,2,\ldots,k$. $\quad$ (2.23)

2. $I_{ij}(\theta) = \mathrm{Cov}\big(S_i(X), S_j(X)\big)$, $i,j = 1,2,\ldots,k$. $\quad$ (2.24)
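Numerically, evaluating the multivariate bound is a one-line matrix computation. The sketch below uses an arbitrary illustrative Fisher information matrix and gradient (both are assumptions chosen only to make the point; any positive-definite matrix would do), and also previews a point made at the end of this chapter: $J_{rr}(\theta) \ge 1/I_{rr}(\theta)$, with equality only when $I(\theta)$ is diagonal.

```python
import numpy as np

# An illustrative 2x2 Fisher information matrix (values are assumptions).
I = np.array([[4.0, 1.0],
              [1.0, 3.0]])
J = np.linalg.inv(I)

# Bound for a scalar function phi(theta) with (illustrative) gradient delta:
delta = np.array([1.0, 2.0])
print(delta @ J @ delta)           # CRLB: delta^T [I(theta)]^{-1} delta

# For estimating theta_r itself, delta = e_r and the bound is J_rr.
# Note J_rr >= 1/I_rr, with equality only when I(theta) is diagonal:
print(np.diag(J), 1 / np.diag(I))  # here (0.273, 0.364) vs (0.25, 0.333)
```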

Example. Let $X_1, X_2, \ldots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution. Find the CRLB (and, in cases 1. and 2., check whether it is attained) for the variance of an unbiased estimator of:

1. $\mu$ when $\sigma^2$ is known;

2. $\sigma^2$ when $\mu$ is known;

3. $\mu$ when $\sigma^2$ is unknown;

4. $\sigma^2$ when $\mu$ is unknown;

5. the coefficient of variation $\sigma/\mu$ when both $\mu$ and $\sigma^2$ are unknown.

Solution: The sample joint p.d.f. is
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
\]
and
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \sum_{i=1}^n \frac{(x_i-\mu)^2}{2\sigma^2}.
\]

1. When $\sigma^2$ is known, $\theta = \mu$ and
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \sum_{i=1}^n \frac{(x_i-\theta)^2}{2\sigma^2},
\]
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = \sum_{i=1}^n \frac{x_i-\theta}{\sigma^2} = \frac{n}{\sigma^2}\big[\bar x - \theta\big],
\]
where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$. By the theorem on the attainment of the CRLB, this implies that $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ is an unbiased estimator of $\theta = \mu$ whose variance equals the CRLB, and that $\frac{n}{\sigma^2} = \alpha(\theta) = I(\theta)$, i.e. $\mathrm{CRLB} = \frac{\sigma^2}{n}$. Thus $\bar X$ is a most efficient estimator.

2. When $\mu$ is known but $\sigma^2$ is unknown, $\theta = \sigma^2$ and
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\theta - \frac{1}{2\theta}\sum_{i=1}^n (x_i-\mu)^2.
\]
Hence
\[
S(x) = \frac{\partial}{\partial\theta}\log f_X(x\,|\,\theta) = -\frac{n}{2\theta} + \frac{1}{2\theta^2}\sum_{i=1}^n (x_i-\mu)^2 = \frac{n}{2\theta^2}\left[\frac{1}{n}\sum_{i=1}^n (x_i-\mu)^2 - \theta\right].
\]
By the same theorem, $\frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2$ is an unbiased estimator of $\theta = \sigma^2$ whose variance equals the CRLB, and $\frac{n}{2\theta^2} = I(\theta)$, i.e. $\mathrm{CRLB} = \frac{2\theta^2}{n} = \frac{2\sigma^4}{n}$.

3. and 4. Case where both $\mu$ and $\sigma^2$ are unknown. Here $\theta = \binom{\theta_1}{\theta_2}$ with $\theta_1 = \mu$ and $\theta_2 = \sigma^2$, so that
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) = (2\pi)^{-n/2}\,\theta_2^{-n/2}\exp\!\left(-\frac{1}{2\theta_2}\sum_{i=1}^n (x_i-\theta_1)^2\right)
\]
and thus
\[
\log f_X(x\,|\,\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\theta_2 - \frac{1}{2\theta_2}\sum_{i=1}^n (x_i-\theta_1)^2,
\]
\[
\frac{\partial}{\partial\theta_1}\log f_X(x\,|\,\theta) = \frac{1}{\theta_2}\sum_{i=1}^n (x_i-\theta_1),
\qquad
\frac{\partial}{\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2}\sum_{i=1}^n (x_i-\theta_1)^2,
\]
\[
\frac{\partial^2}{\partial\theta_1^2}\log f_X(x\,|\,\theta) = -\frac{n}{\theta_2},
\qquad
\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{1}{\theta_2^2}\sum_{i=1}^n (x_i-\theta_1),
\]
\[
\frac{\partial^2}{\partial\theta_2^2}\log f_X(x\,|\,\theta) = \frac{n}{2\theta_2^2} - \frac{1}{\theta_2^3}\sum_{i=1}^n (x_i-\theta_1)^2.
\]
Consequently
\[
I_{11}(\theta) = -E\!\left(-\frac{n}{\theta_2}\right) = \frac{n}{\theta_2},
\qquad
I_{12}(\theta) = -E\!\left(-\frac{1}{\theta_2^2}\sum_{i=1}^n (X_i-\theta_1)\right) = 0,
\]
\[
I_{22}(\theta) = -E\!\left(\frac{n}{2\theta_2^2} - \frac{1}{\theta_2^3}\sum_{i=1}^n (X_i-\theta_1)^2\right) = -\frac{n}{2\theta_2^2} + \frac{n\theta_2}{\theta_2^3} = \frac{n}{2\theta_2^2},
\]
i.e.
\[
I(\theta) = \begin{pmatrix} \dfrac{n}{\theta_2} & 0 \\[6pt] 0 & \dfrac{n}{2\theta_2^2}\end{pmatrix}
\quad\text{and}\quad
[I(\theta)]^{-1} = J(\theta) = \begin{pmatrix} \dfrac{\theta_2}{n} & 0 \\[6pt] 0 & \dfrac{2\theta_2^2}{n}\end{pmatrix} = \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\[6pt] 0 & \dfrac{2\sigma^4}{n}\end{pmatrix}.
\]
Consequently, for unbiased estimators $\hat\mu$, $\hat\sigma^2$ of $\mu$ and $\sigma^2$ respectively,
\[
\mathrm{Var}(\hat\mu) \ge \frac{\sigma^2}{n}
\quad\text{and}\quad
\mathrm{Var}(\hat\sigma^2) \ge \frac{2\sigma^4}{n}.
\]

5. We are now interested in estimating the ratio $\frac{\sigma}{\mu} = \frac{\sqrt{\theta_2}}{\theta_1}$ when both $\theta_1 = \mu$ and $\theta_2 = \sigma^2$ are unknown. Since
\[
\delta = \left(\frac{\partial}{\partial\theta_1}\left(\frac{\sqrt{\theta_2}}{\theta_1}\right), \frac{\partial}{\partial\theta_2}\left(\frac{\sqrt{\theta_2}}{\theta_1}\right)\right)^T = \left(-\frac{\sqrt{\theta_2}}{\theta_1^2},\; \frac{1}{2\theta_1\sqrt{\theta_2}}\right)^T = \left(-\frac{\sigma}{\mu^2},\; \frac{1}{2\mu\sigma}\right)^T,
\]
it follows that if $\hat\phi$ is an unbiased estimator of $\frac{\sigma}{\mu}$ then
\[
\mathrm{Var}(\hat\phi) \ge \left(-\frac{\sigma}{\mu^2},\; \frac{1}{2\mu\sigma}\right) [I(\theta)]^{-1} \begin{pmatrix} -\dfrac{\sigma}{\mu^2} \\[6pt] \dfrac{1}{2\mu\sigma} \end{pmatrix}
= \left(-\frac{\sigma}{\mu^2},\; \frac{1}{2\mu\sigma}\right) \begin{pmatrix} \dfrac{\sigma^2}{n} & 0 \\[6pt] 0 & \dfrac{2\sigma^4}{n}\end{pmatrix} \begin{pmatrix} -\dfrac{\sigma}{\mu^2} \\[6pt] \dfrac{1}{2\mu\sigma} \end{pmatrix}
= \frac{\sigma^4}{n\mu^4} + \frac{\sigma^2}{2n\mu^2}.
\]
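A quick simulation (all numerical values are illustrative assumptions) confirms the flavour of cases 3.-5.: $\bar X$ attains the bound $\sigma^2/n$, whereas the usual unbiased variance estimator $s^2$ (with $\mu$ unknown) has variance $2\sigma^4/(n-1)$, strictly above the CRLB $2\sigma^4/n$; the exact value $\mathrm{Var}(s^2) = 2\sigma^4/(n-1)$ is a standard normal-theory fact, not derived in these notes.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 400_000

x = rng.normal(mu, sigma, size=(reps, n))

print(x.mean(axis=1).var(), sigma**2 / n)  # X-bar attains the bound sigma^2/n

s2 = x.var(axis=1, ddof=1)                 # usual unbiased estimator of sigma^2
print(s2.var(), 2 * sigma**4 / n)          # ~ 2*sigma^4/(n-1) > CRLB = 2*sigma^4/n

# CRLB for the coefficient of variation sigma/mu (both parameters unknown)
print(sigma**4 / (n * mu**4) + sigma**2 / (2 * n * mu**2))
```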

Example. Let $X_1, X_2, \ldots, X_n$ be a random sample from the trinomial distribution with p.m.f.
\[
f(x\,|\,\theta) = \frac{m!}{x_1!\,x_2!\,(m-x_1-x_2)!}\,\theta_1^{x_1}\,\theta_2^{x_2}\,(1-\theta_1-\theta_2)^{m-x_1-x_2},
\]
where $\theta = (\theta_1, \theta_2)$ are unknown values. Find lower bounds for the variance of unbiased estimators of $\theta_1$ and $\theta_2$.

Solution: Writing $x_i = (x_{i1}, x_{i2})$ for the $i$th observation, the sample joint p.m.f. is
\[
f_X(x\,|\,\theta) = \prod_{i=1}^n \frac{m!}{x_{i1}!\,x_{i2}!\,(m-x_{i1}-x_{i2})!}\,\theta_1^{x_{i1}}\,\theta_2^{x_{i2}}\,(1-\theta_1-\theta_2)^{m-x_{i1}-x_{i2}}
\]
and
\[
\log f_X(x\,|\,\theta) = \text{const} + \sum_{i=1}^n x_{i1}\log\theta_1 + \sum_{i=1}^n x_{i2}\log\theta_2 + \sum_{i=1}^n (m-x_{i1}-x_{i2})\log(1-\theta_1-\theta_2).
\]
Hence
\[
\frac{\partial}{\partial\theta_1}\log f_X(x\,|\,\theta) = \frac{\sum_{i=1}^n x_{i1}}{\theta_1} - \frac{\sum_{i=1}^n (m-x_{i1}-x_{i2})}{1-\theta_1-\theta_2},
\]
\[
\frac{\partial^2}{\partial\theta_1^2}\log f_X(x\,|\,\theta) = -\frac{\sum_{i=1}^n x_{i1}}{\theta_1^2} - \frac{\sum_{i=1}^n (m-x_{i1}-x_{i2})}{(1-\theta_1-\theta_2)^2},
\]
and since $E(X_{i1}) = m\theta_1$ and $E(X_{i2}) = m\theta_2$,
\[
I_{11}(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta_1^2}\log f_X(X\,|\,\theta)\right) = \frac{nm\theta_1}{\theta_1^2} + \frac{nm(1-\theta_1-\theta_2)}{(1-\theta_1-\theta_2)^2} = \frac{nm}{\theta_1} + \frac{nm}{1-\theta_1-\theta_2}. \tag{2.25}
\]
By symmetry
\[
I_{22}(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta_2^2}\log f_X(X\,|\,\theta)\right) = \frac{nm}{\theta_2} + \frac{nm}{1-\theta_1-\theta_2}. \tag{2.26}
\]
Finally
\[
\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(x\,|\,\theta) = -\frac{\sum_{i=1}^n (m-x_{i1}-x_{i2})}{(1-\theta_1-\theta_2)^2},
\]
and hence
\[
I_{12}(\theta) = -E\!\left(\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\log f_X(X\,|\,\theta)\right) = \frac{nm}{1-\theta_1-\theta_2}. \tag{2.27}
\]
Hence the Fisher information matrix is
\[
I(\theta) = nm\begin{pmatrix} \dfrac{1}{\theta_1} + \dfrac{1}{1-\theta_1-\theta_2} & \dfrac{1}{1-\theta_1-\theta_2} \\[8pt] \dfrac{1}{1-\theta_1-\theta_2} & \dfrac{1}{\theta_2} + \dfrac{1}{1-\theta_1-\theta_2}\end{pmatrix}
\]
and
\[
J(\theta) = [I(\theta)]^{-1} = \frac{\theta_1\theta_2(1-\theta_1-\theta_2)}{nm}\begin{pmatrix} \dfrac{1}{\theta_2} + \dfrac{1}{1-\theta_1-\theta_2} & -\dfrac{1}{1-\theta_1-\theta_2} \\[8pt] -\dfrac{1}{1-\theta_1-\theta_2} & \dfrac{1}{\theta_1} + \dfrac{1}{1-\theta_1-\theta_2}\end{pmatrix}.
\]
Thus if $\hat\theta_1$ is an unbiased estimator of $\theta_1$,
\[
\mathrm{Var}(\hat\theta_1) \ge J_{11}(\theta) = \frac{\theta_1\theta_2(1-\theta_1-\theta_2)}{nm}\left(\frac{1}{\theta_2} + \frac{1}{1-\theta_1-\theta_2}\right) = \frac{\theta_1(1-\theta_1)}{nm}.
\]
Similarly, if $\hat\theta_2$ is an unbiased estimator of $\theta_2$,
\[
\mathrm{Var}(\hat\theta_2) \ge J_{22}(\theta) = \frac{\theta_2(1-\theta_2)}{nm}.
\]
Notice that if $\theta_2$ were assumed to be known then the CRLB for the variance of $\hat\theta_1$ would have been $\mathrm{Var}(\hat\theta_1) \ge \frac{1}{I_{11}(\theta)}$, i.e.
\[
\mathrm{Var}(\hat\theta_1) \ge \frac{1}{nm\left(\dfrac{1}{\theta_1} + \dfrac{1}{1-\theta_1-\theta_2}\right)} = \frac{\theta_1(1-\theta_1-\theta_2)}{nm(1-\theta_2)}.
\]
Since $(1-\theta_1)(1-\theta_2) > 1-\theta_1-\theta_2$ for $0 < \theta_1, \theta_2 < 1$, it follows that
\[
J_{11}(\theta) > \frac{1}{I_{11}(\theta)},
\]
which indicates that if $\theta_2$ is unknown it would be wrong to use the inequality $\mathrm{Var}(\hat\theta_1) \ge \frac{1}{I_{11}(\theta)}$, since this inequality is not as tight as the correct inequality $\mathrm{Var}(\hat\theta_1) \ge J_{11}(\theta)$. In general it can be shown that
\[
J_{ii}(\theta) \ge \frac{1}{I_{ii}(\theta)}
\]
for all $i$, with strict inequality (and hence an improved lower bound for the variance of an unbiased estimator of $\theta_i$) unless the Fisher information matrix $I(\theta)$ is diagonal (see C. R. Rao (1973), Linear Statistical Inference and Its Applications).
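As a closing numerical check (the parameter values are illustrative assumptions), the sketch below builds $I(\theta)$ for the trinomial example, inverts it, and confirms both the closed form $J_{11}(\theta) = \theta_1(1-\theta_1)/(nm)$ and the strict inequality $J_{11}(\theta) > 1/I_{11}(\theta)$.

```python
import numpy as np

n, m = 50, 20                # sample size and trials per observation (illustrative)
t1, t2 = 0.3, 0.5            # assumed parameter values
u = 1 - t1 - t2

I = n * m * np.array([[1/t1 + 1/u, 1/u],
                      [1/u,        1/t2 + 1/u]])
J = np.linalg.inv(I)

print(J[0, 0], t1 * (1 - t1) / (n * m))          # J_11 equals theta_1(1-theta_1)/(nm)
print(1 / I[0, 0], t1 * u / (n * m * (1 - t2)))  # the (looser) bound if theta_2 were known
print(J[0, 0] > 1 / I[0, 0])                     # True: the correct bound is strictly larger
```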