Math 541: Statistical Theory II
Fisher Information and Cramér-Rao Bound
Instructor: Songfeng Zheng

In parameter estimation problems, we obtain information about an unknown parameter from a sample of data coming from the underlying probability distribution. A natural question is: how much information can a sample of data provide about the unknown parameter? This section introduces such a measure of information. We will also see that this measure can be used to find bounds on the variance of estimators, that it can be used to approximate the sampling distribution of an estimator obtained from a large sample, and that it can further be used to obtain an approximate confidence interval in the large-sample case.

In this section, we consider a random variable $X$ for which the pdf or pmf is $f(x\mid\theta)$, where $\theta$ is an unknown parameter and $\theta \in \Theta$, with $\Theta$ the parameter space.

1 Fisher Information

Motivation: Intuitively, if an event has small probability, then the occurrence of this event brings us much information. For a random variable $X \sim f(x\mid\theta)$, if $\theta$ were the true value of the parameter, the likelihood function should take a large value, or equivalently, the derivative of the log-likelihood function should be close to zero; this is the basic principle of maximum likelihood estimation. We define $l(x\mid\theta) = \log f(x\mid\theta)$ as the log-likelihood function, and
$$l'(x\mid\theta) = \frac{\partial}{\partial\theta}\log f(x\mid\theta) = \frac{f'(x\mid\theta)}{f(x\mid\theta)},$$
where $f'(x\mid\theta)$ is the derivative of $f(x\mid\theta)$ with respect to $\theta$. Similarly, we denote the second order derivative of $f(x\mid\theta)$ with respect to $\theta$ by $f''(x\mid\theta)$.

According to the above analysis, if $l'(X\mid\theta)$ is close to zero, then the outcome is as expected, and thus the random variable does not provide much information about $\theta$; on the other hand, if $|l'(X\mid\theta)|$ or $[l'(X\mid\theta)]^2$ is large, the random variable provides much information about $\theta$. Thus, we can use $[l'(X\mid\theta)]^2$ to measure the amount of information provided by $X$. However, since $X$ is a random variable, we should consider the average case. Thus, we introduce the following definition: the Fisher information (for $\theta$) contained in the random variable $X$ is defined as
$$I(\theta) = E_\theta\left\{[l'(X\mid\theta)]^2\right\} = \int [l'(x\mid\theta)]^2 f(x\mid\theta)\,dx. \qquad (1)$$

We assume that we can exchange the order of differentiation and integration; then
$$\int f'(x\mid\theta)\,dx = \frac{\partial}{\partial\theta}\int f(x\mid\theta)\,dx = 0.$$
Similarly,
$$\int f''(x\mid\theta)\,dx = \frac{\partial^2}{\partial\theta^2}\int f(x\mid\theta)\,dx = 0.$$
It is easy to see that
$$E_\theta[l'(X\mid\theta)] = \int l'(x\mid\theta)f(x\mid\theta)\,dx = \int \frac{f'(x\mid\theta)}{f(x\mid\theta)}f(x\mid\theta)\,dx = \int f'(x\mid\theta)\,dx = 0.$$
Therefore, the definition of Fisher information (1) can be rewritten as
$$I(\theta) = \mathrm{Var}_\theta[l'(X\mid\theta)]. \qquad (2)$$
Also, notice that
$$l''(x\mid\theta) = \frac{\partial}{\partial\theta}\left[\frac{f'(x\mid\theta)}{f(x\mid\theta)}\right] = \frac{f''(x\mid\theta)f(x\mid\theta) - [f'(x\mid\theta)]^2}{[f(x\mid\theta)]^2} = \frac{f''(x\mid\theta)}{f(x\mid\theta)} - [l'(x\mid\theta)]^2.$$
Therefore,
$$E_\theta[l''(X\mid\theta)] = \int\left\{\frac{f''(x\mid\theta)}{f(x\mid\theta)} - [l'(x\mid\theta)]^2\right\}f(x\mid\theta)\,dx = \int f''(x\mid\theta)\,dx - E_\theta\left\{[l'(X\mid\theta)]^2\right\} = -I(\theta).$$
Finally, we have another formula to calculate Fisher information:
$$I(\theta) = -E_\theta[l''(X\mid\theta)] = -\int\left[\frac{\partial^2}{\partial\theta^2}\log f(x\mid\theta)\right]f(x\mid\theta)\,dx. \qquad (3)$$
To summarize, we have three methods to calculate Fisher information: equations (1), (2), and (3). In many problems, using (3) is the most convenient choice.

Example 1: Suppose the random variable $X$ has a Bernoulli distribution for which the parameter $\theta$ is unknown ($0 < \theta < 1$). We shall determine the Fisher information $I(\theta)$ in $X$. The point mass function of $X$ is $f(x\mid\theta) = \theta^x(1-\theta)^{1-x}$ for $x = 1$ or $x = 0$. Therefore
$$l(x\mid\theta) = \log f(x\mid\theta) = x\log\theta + (1-x)\log(1-\theta).$$

It follows that
$$l'(x\mid\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta}, \qquad l''(x\mid\theta) = -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2}.$$
Since $E(X) = \theta$, the Fisher information is
$$I(\theta) = -E[l''(X\mid\theta)] = \frac{E(X)}{\theta^2} + \frac{1-E(X)}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}.$$

Example 2: Suppose that $X \sim N(\mu, \sigma^2)$, where $\mu$ is unknown but the value of $\sigma^2$ is given. Find the Fisher information $I(\mu)$ in $X$. For $-\infty < x < \infty$, we have
$$l(x\mid\mu) = \log f(x\mid\mu) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}.$$
Hence,
$$l'(x\mid\mu) = \frac{x-\mu}{\sigma^2}, \qquad l''(x\mid\mu) = -\frac{1}{\sigma^2}.$$
It follows that the Fisher information is
$$I(\mu) = -E[l''(X\mid\mu)] = \frac{1}{\sigma^2}.$$

If we make a transformation of the parameter, we will have different expressions of the Fisher information under the different parameterizations. More specifically, let $X$ be a random variable for which the pdf or pmf is $f(x\mid\theta)$, where the value of the parameter $\theta$ is unknown but must lie in a space $\Theta$. Let $I_0(\theta)$ denote the Fisher information in $X$. Suppose now that the parameter $\theta$ is replaced by a new parameter $\mu$, where $\theta = \phi(\mu)$ and $\phi$ is a differentiable function. Let $I_1(\mu)$ denote the Fisher information in $X$ when the parameter is regarded as $\mu$. We will have
$$I_1(\mu) = [\phi'(\mu)]^2 I_0[\phi(\mu)].$$
Proof: Let $g(x\mid\mu)$ be the pdf or pmf of $X$ when $\mu$ is regarded as the parameter. Then $g(x\mid\mu) = f[x\mid\phi(\mu)]$. Therefore,
$$\log g(x\mid\mu) = \log f[x\mid\phi(\mu)] = l[x\mid\phi(\mu)],$$
and
$$\frac{\partial}{\partial\mu}\log g(x\mid\mu) = l'[x\mid\phi(\mu)]\,\phi'(\mu).$$
It follows that
$$I_1(\mu) = E\left\{\left[\frac{\partial}{\partial\mu}\log g(X\mid\mu)\right]^2\right\} = [\phi'(\mu)]^2\,E\left(\{l'[X\mid\phi(\mu)]\}^2\right) = [\phi'(\mu)]^2 I_0[\phi(\mu)].$$
This will be verified in the exercise problems.
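As a quick sanity check of the three equivalent formulas (1), (2), and (3), the following short Python sketch (my own addition, not part of the original notes; it assumes NumPy is available) evaluates all three expectations exactly for the Bernoulli model of Example 1, since the support has only the two points 0 and 1, and compares them with the closed form $1/[\theta(1-\theta)]$.

```python
import numpy as np

def fisher_info_bernoulli(theta):
    """Evaluate I(theta) for one Bernoulli(theta) observation via formulas (1), (2), (3)."""
    x = np.array([0.0, 1.0])                              # support of X
    p = np.array([1.0 - theta, theta])                    # pmf f(x | theta)
    score = x / theta - (1.0 - x) / (1.0 - theta)         # l'(x | theta)
    hess = -x / theta**2 - (1.0 - x) / (1.0 - theta)**2   # l''(x | theta)
    i1 = np.sum(score**2 * p)                             # (1): E[(l')^2]
    i2 = i1 - np.sum(score * p)**2                        # (2): Var(l'), since E[l'] = 0
    i3 = -np.sum(hess * p)                                # (3): -E[l'']
    return i1, i2, i3

theta = 0.3
print(fisher_info_bernoulli(theta))     # all three values agree
print(1.0 / (theta * (1.0 - theta)))    # closed form from Example 1
```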

Suppose now that we have a random sample $X_1, \ldots, X_n$ coming from a distribution for which the pdf or pmf is $f(x\mid\theta)$, where the value of the parameter $\theta$ is unknown. Let us now calculate the amount of information the random sample $X_1, \ldots, X_n$ provides for $\theta$. Let us denote the joint pdf of $X_1, \ldots, X_n$ by
$$f_n(\mathbf{x}\mid\theta) = \prod_{i=1}^n f(x_i\mid\theta);$$
then
$$l_n(\mathbf{x}\mid\theta) = \log f_n(\mathbf{x}\mid\theta) = \sum_{i=1}^n \log f(x_i\mid\theta) = \sum_{i=1}^n l(x_i\mid\theta),$$
and
$$l_n'(\mathbf{x}\mid\theta) = \frac{f_n'(\mathbf{x}\mid\theta)}{f_n(\mathbf{x}\mid\theta)}. \qquad (4)$$
We define the Fisher information $I_n(\theta)$ in the random sample $X_1, \ldots, X_n$ as
$$I_n(\theta) = E_\theta\left\{[l_n'(\mathbf{X}\mid\theta)]^2\right\} = \int\cdots\int [l_n'(\mathbf{x}\mid\theta)]^2 f_n(\mathbf{x}\mid\theta)\,dx_1\cdots dx_n,$$
which is an $n$-dimensional integral. We further assume that we can exchange the order of differentiation and integration; then we have
$$\int f_n'(\mathbf{x}\mid\theta)\,d\mathbf{x} = \frac{\partial}{\partial\theta}\int f_n(\mathbf{x}\mid\theta)\,d\mathbf{x} = 0,$$
and similarly
$$\int f_n''(\mathbf{x}\mid\theta)\,d\mathbf{x} = \frac{\partial^2}{\partial\theta^2}\int f_n(\mathbf{x}\mid\theta)\,d\mathbf{x} = 0.$$
It is easy to see that
$$E_\theta[l_n'(\mathbf{X}\mid\theta)] = \int l_n'(\mathbf{x}\mid\theta)f_n(\mathbf{x}\mid\theta)\,d\mathbf{x} = \int \frac{f_n'(\mathbf{x}\mid\theta)}{f_n(\mathbf{x}\mid\theta)}f_n(\mathbf{x}\mid\theta)\,d\mathbf{x} = \int f_n'(\mathbf{x}\mid\theta)\,d\mathbf{x} = 0. \qquad (5)$$
Therefore, the definition of Fisher information for the sample $X_1, \ldots, X_n$ can be rewritten as
$$I_n(\theta) = \mathrm{Var}_\theta[l_n'(\mathbf{X}\mid\theta)].$$
It can similarly be proved that the Fisher information can also be calculated as
$$I_n(\theta) = -E_\theta[l_n''(\mathbf{X}\mid\theta)].$$
From the definition of $l_n(\mathbf{x}\mid\theta)$, it follows that
$$l_n''(\mathbf{x}\mid\theta) = \sum_{i=1}^n l''(x_i\mid\theta).$$

Therefore, the Fisher information is
$$I_n(\theta) = -E_\theta[l_n''(\mathbf{X}\mid\theta)] = -E_\theta\left[\sum_{i=1}^n l''(X_i\mid\theta)\right] = -\sum_{i=1}^n E_\theta[l''(X_i\mid\theta)] = nI(\theta).$$
In other words, the Fisher information in a random sample of size $n$ is simply $n$ times the Fisher information in a single observation.

Example 3: Suppose $X_1, \ldots, X_n$ form a random sample from a Bernoulli distribution for which the parameter $\theta$ is unknown ($0 < \theta < 1$). Then the Fisher information $I_n(\theta)$ in this sample is
$$I_n(\theta) = nI(\theta) = \frac{n}{\theta(1-\theta)}.$$

Example 4: Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\mu$ is unknown but the value of $\sigma^2$ is given. Then the Fisher information $I_n(\mu)$ in this sample is
$$I_n(\mu) = nI(\mu) = \frac{n}{\sigma^2}.$$

2 Cramér-Rao Lower Bound and Asymptotic Distribution of Maximum Likelihood Estimators

Suppose that we have a random sample $X_1, \ldots, X_n$ coming from a distribution for which the pdf or pmf is $f(x\mid\theta)$, where the value of the parameter $\theta$ is unknown. We will show how to use the Fisher information to determine a lower bound for the variance of an estimator of the parameter $\theta$.

Let $\hat\theta = r(X_1, \ldots, X_n) = r(\mathbf{X})$ be an arbitrary estimator of $\theta$. Assume $E_\theta(\hat\theta) = m(\theta)$ and that the variance of $\hat\theta$ is finite. Let us consider the random variable $l_n'(\mathbf{X}\mid\theta)$ defined in (4); it was shown in (5) that $E_\theta[l_n'(\mathbf{X}\mid\theta)] = 0$. Therefore, the covariance between $\hat\theta$ and $l_n'(\mathbf{X}\mid\theta)$ is
$$\begin{aligned}
\mathrm{Cov}_\theta[\hat\theta, l_n'(\mathbf{X}\mid\theta)]
&= E_\theta\left\{[\hat\theta - E_\theta(\hat\theta)]\,[l_n'(\mathbf{X}\mid\theta) - E_\theta(l_n'(\mathbf{X}\mid\theta))]\right\} \\
&= E_\theta\left\{[r(\mathbf{X}) - m(\theta)]\,l_n'(\mathbf{X}\mid\theta)\right\} \\
&= E_\theta[r(\mathbf{X})\,l_n'(\mathbf{X}\mid\theta)] - m(\theta)\,E_\theta[l_n'(\mathbf{X}\mid\theta)] \\
&= E_\theta[r(\mathbf{X})\,l_n'(\mathbf{X}\mid\theta)] \\
&= \int\cdots\int r(\mathbf{x})\,l_n'(\mathbf{x}\mid\theta)\,f_n(\mathbf{x}\mid\theta)\,dx_1\cdots dx_n \\
&= \int\cdots\int r(\mathbf{x})\,f_n'(\mathbf{x}\mid\theta)\,dx_1\cdots dx_n \qquad \text{(use Equation 4)} \\
&= \frac{\partial}{\partial\theta}\int\cdots\int r(\mathbf{x})\,f_n(\mathbf{x}\mid\theta)\,dx_1\cdots dx_n \\
&= \frac{\partial}{\partial\theta}E_\theta[\hat\theta] = m'(\theta). \qquad (6)
\end{aligned}$$

By the Cauchy-Schwarz inequality and the definition of $I_n(\theta)$,
$$\left\{\mathrm{Cov}_\theta[\hat\theta, l_n'(\mathbf{X}\mid\theta)]\right\}^2 \le \mathrm{Var}_\theta[\hat\theta]\,\mathrm{Var}_\theta[l_n'(\mathbf{X}\mid\theta)] = \mathrm{Var}_\theta[\hat\theta]\,I_n(\theta),$$
i.e.,
$$[m'(\theta)]^2 \le \mathrm{Var}_\theta[\hat\theta]\,I_n(\theta) = nI(\theta)\,\mathrm{Var}_\theta[\hat\theta].$$
Finally, we get the lower bound on the variance of an arbitrary estimator $\hat\theta$:
$$\mathrm{Var}_\theta[\hat\theta] \ge \frac{[m'(\theta)]^2}{nI(\theta)}. \qquad (7)$$
The inequality (7) is called the information inequality, and is also known as the Cramér-Rao inequality, in honor of the Swedish statistician H. Cramér and the Indian statistician C. R. Rao, who independently developed this inequality during the 1940s. The information inequality shows that as $I(\theta)$ increases, the lower bound on the variance of the estimator decreases, and therefore the attainable quality of the estimator increases; that is why the quantity is called information.

If $\hat\theta$ is an unbiased estimator, then $m(\theta) = E_\theta(\hat\theta) = \theta$, so $m'(\theta) = 1$. Hence, by the information inequality, for an unbiased estimator $\hat\theta$,
$$\mathrm{Var}_\theta[\hat\theta] \ge \frac{1}{nI(\theta)}.$$
The right hand side is called the Cramér-Rao lower bound (CRLB): under certain regularity conditions, no unbiased estimator of the parameter $\theta$ based on an i.i.d. sample of size $n$ can have a variance smaller than the CRLB.

Example 5: Suppose we have a random sample $X_1, \ldots, X_n$ from a normal distribution $N(\mu, \theta)$, with $\mu$ given and the variance $\theta$ unknown. Calculate the lower bound on the variance of any unbiased estimator, and compare it to the variance of the sample variance $S^2$.

Solution: We know
$$f(x\mid\theta) = \frac{1}{\sqrt{2\pi\theta}}\exp\left\{-\frac{(x-\mu)^2}{2\theta}\right\},$$
so
$$l(x\mid\theta) = -\frac{(x-\mu)^2}{2\theta} - \frac{1}{2}\log 2\pi - \frac{1}{2}\log\theta.$$
Hence,
$$l'(x\mid\theta) = \frac{(x-\mu)^2}{2\theta^2} - \frac{1}{2\theta}, \qquad l''(x\mid\theta) = -\frac{(x-\mu)^2}{\theta^3} + \frac{1}{2\theta^2}.$$

Therefore,
$$I(\theta) = -E[l''(X\mid\theta)] = E\left[\frac{(X-\mu)^2}{\theta^3} - \frac{1}{2\theta^2}\right] = \frac{\theta}{\theta^3} - \frac{1}{2\theta^2} = \frac{1}{2\theta^2},$$
and
$$I_n(\theta) = nI(\theta) = \frac{n}{2\theta^2}.$$
Finally, we have the Cramér-Rao lower bound $\dfrac{2\theta^2}{n}$.

The sample variance is defined as
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$$
and it is known that
$$\frac{(n-1)S^2}{\theta} \sim \chi^2_{n-1};$$
then
$$\mathrm{Var}\left(\frac{(n-1)S^2}{\theta}\right) = 2(n-1), \quad\text{i.e.,}\quad \frac{(n-1)^2}{\theta^2}\,\mathrm{Var}(S^2) = 2(n-1).$$
Therefore,
$$\mathrm{Var}(S^2) = \frac{2\theta^2}{n-1} > \frac{2\theta^2}{n},$$
i.e., the variance of the estimator $S^2$ is bigger than the Cramér-Rao lower bound.
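The gap between $\mathrm{Var}(S^2) = 2\theta^2/(n-1)$ and the CRLB $2\theta^2/n$ is easy to see in a simulation. The following sketch is my own illustration (not part of the original notes); it assumes NumPy, and the chosen values of $\mu$, $\theta$, and $n$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, theta, n, reps = 0.0, 2.0, 20, 200_000       # theta is the true variance

x = rng.normal(mu, np.sqrt(theta), size=(reps, n))
s2 = x.var(axis=1, ddof=1)                       # sample variance S^2 in each replication

print("empirical Var(S^2) :", s2.var())          # close to 2*theta^2/(n-1)
print("2*theta^2/(n-1)    :", 2 * theta**2 / (n - 1))
print("CRLB 2*theta^2/n   :", 2 * theta**2 / n)  # strictly smaller
```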

Now, let us consider the MLE $\hat\theta$ of $\theta$. To make the notation clear, let us assume the true value of $\theta$ is $\theta_0$. We shall show that when the sample size $n$ is very large, the distribution of the MLE $\hat\theta$ is approximately normal with mean $\theta_0$ and variance $1/[nI(\theta_0)]$. Since this is merely a limiting result, which holds as the sample size tends to infinity, we say that the MLE is asymptotically unbiased, and we refer to the variance of the limiting normal distribution as the asymptotic variance of the MLE. More specifically, we have the following theorem:

Theorem (the asymptotic distribution of the MLE): Let $X_1, \ldots, X_n$ be a sample of size $n$ from a distribution for which the pdf or pmf is $f(x\mid\theta)$, with $\theta$ the unknown parameter. Assume that the true value of $\theta$ is $\theta_0$, and that the MLE of $\theta$ is $\hat\theta$. Then the probability distribution of $\sqrt{nI(\theta_0)}\,(\hat\theta - \theta_0)$ tends to a standard normal distribution. In other words, the asymptotic distribution of $\hat\theta$ is
$$N\left(\theta_0, \frac{1}{nI(\theta_0)}\right).$$

Proof: We shall prove that $\sqrt{nI(\theta_0)}\,(\hat\theta - \theta_0) \to N(0, 1)$ asymptotically. We will only give a sketch of the proof; the details of the argument are beyond the scope of this course.

Remember that the log-likelihood function is
$$l(\theta) = \sum_{i=1}^n \log f(X_i\mid\theta),$$
and $\hat\theta$ is the solution to $l'(\theta) = 0$. We apply a Taylor expansion of $l'(\hat\theta)$ at the point $\theta_0$, yielding
$$0 = l'(\hat\theta) \approx l'(\theta_0) + (\hat\theta - \theta_0)\,l''(\theta_0).$$
Therefore,
$$\hat\theta - \theta_0 \approx -\frac{l'(\theta_0)}{l''(\theta_0)}, \qquad \sqrt{n}\,(\hat\theta - \theta_0) \approx \frac{n^{-1/2}\,l'(\theta_0)}{-\frac{1}{n}\,l''(\theta_0)}.$$
First, let us consider the numerator of the last expression above. Its expectation is
$$E[n^{-1/2}\,l'(\theta_0)] = n^{-1/2}\sum_{i=1}^n E\left[\frac{\partial}{\partial\theta}\log f(X_i\mid\theta_0)\right] = n^{-1/2}\sum_{i=1}^n E[l'(X_i\mid\theta_0)] = 0,$$
and its variance is
$$\mathrm{Var}[n^{-1/2}\,l'(\theta_0)] = \frac{1}{n}\sum_{i=1}^n E\left[\frac{\partial}{\partial\theta}\log f(X_i\mid\theta_0)\right]^2 = \frac{1}{n}\sum_{i=1}^n E\left\{[l'(X_i\mid\theta_0)]^2\right\} = I(\theta_0).$$
Next, we consider the denominator:
$$-\frac{1}{n}\,l''(\theta_0) = -\frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\theta^2}\log f(X_i\mid\theta_0).$$
By the law of large numbers, this expression converges to
$$-E\left[\frac{\partial^2}{\partial\theta^2}\log f(X\mid\theta_0)\right] = I(\theta_0).$$
We thus have
$$\sqrt{n}\,(\hat\theta - \theta_0) \approx \frac{n^{-1/2}\,l'(\theta_0)}{I(\theta_0)}.$$
Therefore,
$$E[\sqrt{n}\,(\hat\theta - \theta_0)] \approx \frac{E[n^{-1/2}\,l'(\theta_0)]}{I(\theta_0)} = 0, \qquad \mathrm{Var}[\sqrt{n}\,(\hat\theta - \theta_0)] \approx \frac{\mathrm{Var}[n^{-1/2}\,l'(\theta_0)]}{I^2(\theta_0)} = \frac{I(\theta_0)}{I^2(\theta_0)} = \frac{1}{I(\theta_0)}.$$

As $n \to \infty$, applying the central limit theorem, we have
$$\sqrt{n}\,(\hat\theta - \theta_0) \to N\left(0, \frac{1}{I(\theta_0)}\right),$$
i.e.,
$$\sqrt{nI(\theta_0)}\,(\hat\theta - \theta_0) \to N(0, 1).$$
This completes the proof.

This theorem indicates the asymptotic optimality of the maximum likelihood estimator, since the asymptotic variance of the MLE achieves the CRLB. For this reason, the MLE is frequently used, especially with large samples.

Example 6: Suppose that $X_1, X_2, \ldots, X_n$ are i.i.d. random variables on the interval $[0, 1]$ with the density function
$$f(x\mid\alpha) = \frac{\Gamma(2\alpha)}{\Gamma(\alpha)^2}\,[x(1-x)]^{\alpha-1},$$
where $\alpha > 0$ is a parameter to be estimated from the sample. It can be shown that
$$E(X) = \frac{1}{2}, \qquad \mathrm{Var}(X) = \frac{1}{4(2\alpha+1)}.$$
What is the asymptotic variance of the MLE?

Solution: Let us calculate $I(\alpha)$. First,
$$\log f(x\mid\alpha) = \log\Gamma(2\alpha) - 2\log\Gamma(\alpha) + (\alpha-1)\log[x(1-x)].$$
Then,
$$\frac{\partial\log f(x\mid\alpha)}{\partial\alpha} = \frac{2\Gamma'(2\alpha)}{\Gamma(2\alpha)} - \frac{2\Gamma'(\alpha)}{\Gamma(\alpha)} + \log[x(1-x)],$$
and
$$\frac{\partial^2\log f(x\mid\alpha)}{\partial\alpha^2} = \frac{4\Gamma''(2\alpha)\Gamma(2\alpha) - [2\Gamma'(2\alpha)]^2}{\Gamma(2\alpha)^2} - \frac{2\Gamma''(\alpha)\Gamma(\alpha) - 2[\Gamma'(\alpha)]^2}{\Gamma(\alpha)^2}.$$
Therefore,
$$I(\alpha) = -E\left[\frac{\partial^2\log f(X\mid\alpha)}{\partial\alpha^2}\right] = \frac{2\Gamma''(\alpha)\Gamma(\alpha) - 2[\Gamma'(\alpha)]^2}{\Gamma^2(\alpha)} - \frac{4\Gamma''(2\alpha)\Gamma(2\alpha) - [2\Gamma'(2\alpha)]^2}{\Gamma^2(2\alpha)}.$$
The asymptotic variance of the MLE is $\dfrac{1}{nI(\alpha)}$.
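The expression for $I(\alpha)$ can be written more compactly with the trigamma function $\psi'$, since $[\Gamma''\Gamma - (\Gamma')^2]/\Gamma^2 = (\log\Gamma)'' = \psi'$, giving $I(\alpha) = 2\psi'(\alpha) - 4\psi'(2\alpha)$. The simulation sketch below is my own addition (not part of the original notes) and assumes NumPy and SciPy; the density in Example 6 is the symmetric Beta($\alpha$, $\alpha$), so the sampler below is appropriate, and the chosen $\alpha$, $n$, and replication count are arbitrary. It maximizes the log-likelihood numerically in each replication and compares the empirical variance of $\hat\alpha$ with $1/[nI(\alpha)]$.

```python
import numpy as np
from scipy.special import gammaln, polygamma
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
alpha0, n, reps = 2.0, 400, 2000

def beta_sym_mle(x):
    # MLE of alpha for f(x|a) = Gamma(2a)/Gamma(a)^2 * [x(1-x)]^(a-1)
    t = np.sum(np.log(x * (1.0 - x)))
    nll = lambda a: -(len(x) * (gammaln(2 * a) - 2 * gammaln(a)) + (a - 1) * t)
    return minimize_scalar(nll, bounds=(1e-3, 50.0), method="bounded").x

# each replication draws n points from the density of Example 6, i.e. Beta(alpha0, alpha0)
est = np.array([beta_sym_mle(rng.beta(alpha0, alpha0, size=n)) for _ in range(reps)])

info = 2 * polygamma(1, alpha0) - 4 * polygamma(1, 2 * alpha0)   # I(alpha), trigamma form
print("empirical Var(alpha_hat):", est.var())
print("1/(n*I(alpha))          :", 1 / (n * info))
```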

Example 7: The Pareto distribution has been used in economics as a model for a density function with a slowly decaying tail:
$$f(x\mid x_0, \theta) = \theta x_0^\theta x^{-\theta-1}, \qquad x \ge x_0,\ \theta > 1.$$
Assume that $x_0 > 0$ is given and that $X_1, X_2, \ldots, X_n$ is an i.i.d. sample. Find the asymptotic distribution of the MLE.

Solution: The asymptotic distribution of $\hat\theta_{MLE}$ is $N\left(\theta, \frac{1}{nI(\theta)}\right)$. Let us calculate $I(\theta)$. First,
$$\log f(x\mid\theta) = \log\theta + \theta\log x_0 - (\theta+1)\log x.$$
Then,
$$\frac{\partial\log f(x\mid\theta)}{\partial\theta} = \frac{1}{\theta} + \log x_0 - \log x,$$
so
$$\frac{\partial^2\log f(x\mid\theta)}{\partial\theta^2} = -\frac{1}{\theta^2}, \qquad I(\theta) = -E\left[\frac{\partial^2\log f(X\mid\theta)}{\partial\theta^2}\right] = \frac{1}{\theta^2}.$$
Therefore, the asymptotic distribution of the MLE is
$$N\left(\theta, \frac{1}{nI(\theta)}\right) = N\left(\theta, \frac{\theta^2}{n}\right).$$
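For this model the MLE happens to be available in closed form: setting the score $\sum_i\left[1/\theta + \log x_0 - \log X_i\right]$ to zero gives $\hat\theta = n / \sum_i \log(X_i/x_0)$. The short simulation below is my own illustration (not in the original notes; it assumes NumPy, and the values of $x_0$, $\theta$, $n$, and the number of replications are arbitrary); it checks that the variance of $\hat\theta$ is close to the asymptotic value $\theta^2/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
x0, theta0, n, reps = 1.0, 3.0, 500, 5000

# inverse-CDF sampling: F(x) = 1 - (x0/x)^theta  =>  X = x0 * U**(-1/theta)
u = rng.uniform(size=(reps, n))
x = x0 * u ** (-1.0 / theta0)

theta_hat = n / np.log(x / x0).sum(axis=1)       # closed-form MLE in each replication

print("empirical Var(theta_hat):", theta_hat.var())
print("theta0^2 / n            :", theta0**2 / n)
```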

3 Approximate Confidence Intervals

In previous lectures, we discussed exact confidence intervals. However, constructing an exact confidence interval requires detailed knowledge of the sampling distribution as well as some cleverness. An alternative method of constructing confidence intervals is based on the large sample theory of the previous section.

According to the large sample theory result, the distribution of $\sqrt{nI(\theta_0)}\,(\hat\theta - \theta_0)$ is approximately the standard normal distribution. Since the true value of $\theta$, $\theta_0$, is unknown, we will use the estimated value $\hat\theta$ to estimate $I(\theta_0)$. It can be further argued that the distribution of $\sqrt{nI(\hat\theta)}\,(\hat\theta - \theta_0)$ is also approximately standard normal. Since the standard normal distribution is symmetric about 0,
$$P\left(-z(1-\alpha/2) \le \sqrt{nI(\hat\theta)}\,(\hat\theta - \theta_0) \le z(1-\alpha/2)\right) \approx 1 - \alpha.$$
Manipulation of the inequalities yields
$$\hat\theta - z(1-\alpha/2)\frac{1}{\sqrt{nI(\hat\theta)}} \le \theta_0 \le \hat\theta + z(1-\alpha/2)\frac{1}{\sqrt{nI(\hat\theta)}}$$
as an approximate $100(1-\alpha)\%$ confidence interval.

Example 8: Let $X_1, \ldots, X_n$ denote a random sample from a Poisson distribution that has mean $\lambda > 0$. It is easy to see that the MLE of $\lambda$ is $\hat\lambda = \bar X$. Since the sum of independent Poisson random variables follows a Poisson distribution, the parameter of which is the sum of the parameters of the individual summands, $n\hat\lambda = \sum_{i=1}^n X_i$ follows a Poisson distribution with mean $n\lambda$. Therefore the sampling distribution of $\hat\lambda$ is known, and it depends on the true value of $\lambda$. Exact confidence intervals for $\lambda$ may be obtained by using this fact, and special tables are available.

For large samples, confidence intervals may be derived as follows. First, we need to calculate $I(\lambda)$. The probability mass function of a Poisson random variable with parameter $\lambda$ is
$$f(x\mid\lambda) = e^{-\lambda}\frac{\lambda^x}{x!} \qquad \text{for } x = 0, 1, 2, \ldots,$$
so
$$\log f(x\mid\lambda) = x\log\lambda - \lambda - \log x!$$
It is easy to verify that
$$\frac{\partial^2}{\partial\lambda^2}\log f(x\mid\lambda) = -\frac{x}{\lambda^2},$$
and therefore
$$I(\lambda) = E\left[\frac{X}{\lambda^2}\right] = \frac{1}{\lambda}.$$
Thus, an approximate $100(1-\alpha)\%$ confidence interval for $\lambda$ is
$$\left[\bar X - z(1-\alpha/2)\sqrt{\frac{\bar X}{n}},\ \ \bar X + z(1-\alpha/2)\sqrt{\frac{\bar X}{n}}\right].$$
Note that in this case the asymptotic variance is in fact the exact variance, as we can verify. The confidence interval, however, is only approximate, since the sampling distribution is only approximately normal.
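As a usage sketch of this interval (my own addition, not part of the original notes; it assumes NumPy and SciPy, and the true $\lambda$, sample size, and replication count are arbitrary choices), the helper below computes the approximate interval $\bar X \pm z(1-\alpha/2)\sqrt{\bar X/n}$ and estimates its coverage by simulation.

```python
import numpy as np
from scipy.stats import norm

def poisson_approx_ci(x, alpha=0.05):
    # approximate 100(1-alpha)% interval: xbar +/- z(1-alpha/2) * sqrt(xbar / n)
    xbar, n = np.mean(x), len(x)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(xbar / n)
    return xbar - half, xbar + half

rng = np.random.default_rng(3)
lam, n, reps = 4.0, 50, 10_000
covered = sum(lo <= lam <= hi
              for lo, hi in (poisson_approx_ci(rng.poisson(lam, size=n)) for _ in range(reps)))
print("empirical coverage:", covered / reps)     # should be close to the nominal 0.95
```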

4 Multiple Parameter Case

Suppose now that there is more than one parameter in the distribution model; that is, the random variable $X \sim f(x\mid\boldsymbol\theta)$ with $\boldsymbol\theta = (\theta_1, \ldots, \theta_k)^T$. We denote the log-likelihood function as $l(\boldsymbol\theta) = \log f(x\mid\boldsymbol\theta)$; its first order derivative with respect to $\boldsymbol\theta$ is a $k$-dimensional vector,
$$\frac{\partial l(\boldsymbol\theta)}{\partial\boldsymbol\theta} = \left(\frac{\partial l(\boldsymbol\theta)}{\partial\theta_1}, \ldots, \frac{\partial l(\boldsymbol\theta)}{\partial\theta_k}\right)^T,$$
and the second order derivative of $l(\boldsymbol\theta)$ with respect to $\boldsymbol\theta$ is a $k \times k$ matrix,
$$\frac{\partial^2 l(\boldsymbol\theta)}{\partial\boldsymbol\theta^2} = \left[\frac{\partial^2 l(\boldsymbol\theta)}{\partial\theta_i\,\partial\theta_j}\right]_{i=1,\ldots,k;\ j=1,\ldots,k}.$$
We define the Fisher information matrix as
$$I(\boldsymbol\theta) = E\left[\frac{\partial l(\boldsymbol\theta)}{\partial\boldsymbol\theta}\left(\frac{\partial l(\boldsymbol\theta)}{\partial\boldsymbol\theta}\right)^T\right] = \mathrm{Cov}\left[\frac{\partial l(\boldsymbol\theta)}{\partial\boldsymbol\theta}\right] = -E\left[\frac{\partial^2 l(\boldsymbol\theta)}{\partial\boldsymbol\theta^2}\right].$$
Since a covariance matrix is symmetric and positive semi-definite, these properties hold for the Fisher information matrix as well.

Example 9: Fisher information for the normal distribution $N(\mu, \sigma^2)$. We have
$$\boldsymbol\theta = (\mu, \sigma^2)^T, \qquad l(\boldsymbol\theta) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}.$$
Thus,
$$\frac{\partial l(\boldsymbol\theta)}{\partial\boldsymbol\theta} = \left(\frac{\partial l(\boldsymbol\theta)}{\partial\mu}, \frac{\partial l(\boldsymbol\theta)}{\partial\sigma^2}\right)^T = \left(\frac{x-\mu}{\sigma^2},\ -\frac{1}{2\sigma^2} + \frac{(x-\mu)^2}{2(\sigma^2)^2}\right)^T,$$
and
$$\frac{\partial^2 l(\boldsymbol\theta)}{\partial\boldsymbol\theta^2} = \begin{bmatrix} -\dfrac{1}{\sigma^2} & -\dfrac{x-\mu}{(\sigma^2)^2} \\[2mm] -\dfrac{x-\mu}{(\sigma^2)^2} & \dfrac{1}{2(\sigma^2)^2} - \dfrac{(x-\mu)^2}{(\sigma^2)^3} \end{bmatrix}.$$
For $X \sim N(\mu, \sigma^2)$, since $E(X-\mu) = 0$ and $E[(X-\mu)^2] = \sigma^2$, we can easily get the Fisher information matrix as
$$I(\boldsymbol\theta) = -E\left[\frac{\partial^2 l(\boldsymbol\theta)}{\partial\boldsymbol\theta^2}\right] = \begin{bmatrix} \dfrac{1}{\sigma^2} & 0 \\[2mm] 0 & \dfrac{1}{2\sigma^4} \end{bmatrix}.$$
Similar to the one parameter case, the asymptotic distribution of the MLE $\hat{\boldsymbol\theta}_{MLE}$ is approximately a multivariate normal distribution with the true value of $\boldsymbol\theta$ as the mean and $[nI(\boldsymbol\theta)]^{-1}$ as the covariance matrix.
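The matrix in Example 9 can be reproduced numerically from the outer-product form $I(\boldsymbol\theta) = E\left[\frac{\partial l}{\partial\boldsymbol\theta}\left(\frac{\partial l}{\partial\boldsymbol\theta}\right)^T\right]$. The following Monte Carlo sketch is my own addition (not part of the original notes; it assumes NumPy, and the values of $\mu$ and $\sigma^2$ are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, m = 1.0, 2.0, 1_000_000

x = rng.normal(mu, np.sqrt(sigma2), size=m)

# score vector (dl/dmu, dl/dsigma^2) evaluated at each simulated observation
s_mu = (x - mu) / sigma2
s_s2 = -1.0 / (2 * sigma2) + (x - mu) ** 2 / (2 * sigma2**2)
scores = np.column_stack([s_mu, s_s2])

I_mc = scores.T @ scores / m                       # Monte Carlo estimate of E[score score^T]
I_exact = np.array([[1 / sigma2, 0.0], [0.0, 1 / (2 * sigma2**2)]])
print(np.round(I_mc, 4))                           # close to [[0.5, 0], [0, 0.125]]
print(I_exact)
```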

5 Exercises

Problem 1: Suppose that a random variable $X$ has a Poisson distribution for which the mean $\theta$ is unknown ($\theta > 0$). Find the Fisher information $I(\theta)$ in $X$.

Problem 2: Suppose that a random variable $X$ has a normal distribution for which the mean is 0 and the standard deviation $\sigma$ is unknown ($\sigma > 0$). Find the Fisher information $I(\sigma)$ in $X$.

Problem 3: Suppose that a random variable $X$ has a normal distribution for which the mean is 0 and the standard deviation $\sigma$ is unknown ($\sigma > 0$). Find the Fisher information $I(\sigma^2)$ in $X$. Note that in this problem, the variance $\sigma^2$ is regarded as the parameter, whereas in Problem 2 the standard deviation $\sigma$ is regarded as the parameter.

Problem 4: The Rayleigh distribution is defined as
$$f(x\mid\theta) = \frac{x}{\theta^2}\,e^{-x^2/(2\theta^2)}, \qquad x \ge 0,\ \theta > 0.$$
Assume that $X_1, X_2, \ldots, X_n$ is an i.i.d. sample from the Rayleigh distribution. Find the asymptotic variance of the MLE.

Problem 5: Suppose that $X_1, \ldots, X_n$ form a random sample from a gamma distribution for which the value of the parameter $\alpha$ is unknown and the value of the parameter $\beta$ is known. Show that if $n$ is large, the distribution of the MLE of $\alpha$ will be approximately a normal distribution with mean $\alpha$ and variance
$$\frac{[\Gamma(\alpha)]^2}{n\left\{\Gamma(\alpha)\Gamma''(\alpha) - [\Gamma'(\alpha)]^2\right\}}.$$

Problem 6: Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample from an exponential distribution with the density function
$$f(x\mid\tau) = \frac{1}{\tau}\,e^{-x/\tau}, \qquad x \ge 0,\ \tau > 0.$$
a. Find the MLE of $\tau$.
b. What is the exact sampling distribution of the MLE?
c. Use the central limit theorem to find a normal approximation to the sampling distribution.
d. Show that the MLE is unbiased, and find its exact variance.
e. Is there any other unbiased estimator with smaller variance?
f. Using the large-sample property of the MLE, find the asymptotic distribution of the MLE. Is it the same as in c.?
g. Find the form of an approximate confidence interval for $\tau$.
h. Find the form of an exact confidence interval for $\tau$.