Unbiased Estimation. Topic 14. 14.1 Introduction

Similar documents
Properties of MLE: consistency, asymptotic normality. Fisher information.

Chapter 7 Methods of Finding Estimators

I. Chi-squared Distributions

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1)

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Overview of some probability distributions.

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

1. C. The formula for the confidence interval for a population mean is: x t, which was

Output Analysis (2, Chapters 10 &11 Law)

Hypothesis testing. Null and alternative hypotheses

Normal Distribution.

Sequences and Series

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006

5: Introduction to Estimation

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Confidence Intervals for One Mean

Measures of Spread and Boxplots Discrete Math, Section 9.4

CHAPTER 7: Central Limit Theorem: CLT for Averages (Means)

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

1 Computing the Standard Deviation of Sample Means

Chapter 7: Confidence Interval and Sample Size

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

Maximum Likelihood Estimators.

Math C067 Sampling Distributions

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

Convexity, Inequalities, and Norms

Determining the sample size

Our aim is to show that under reasonable assumptions a given 2π-periodic function f can be represented as convergent series

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

Section 11.3: The Integral Test


THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

A probabilistic proof of a binomial identity

Soving Recurrence Relations

Asymptotic Growth of Functions

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

Infinite Sequences and Series

Factoring x n 1: cyclotomic and Aurifeuillian polynomials Paul Garrett <garrett@math.umn.edu>

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Incremental calculation of weighted mean and variance

Basic Elements of Arithmetic Sequences and Series

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

PSYCHOLOGICAL STATISTICS

INFINITE SERIES KEITH CONRAD

Sampling Distribution And Central Limit Theorem

Theorems About Power Series

Confidence Intervals

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE The absolute value of the complex number z a bi is

Quadrat Sampling in Population Ecology

Partial Di erential Equations

Parametric (theoretical) probability distributions. (Wilks, Ch. 4) Discrete distributions: (e.g., yes/no; above normal, normal, below normal)

Class Meeting # 16: The Fourier Transform on R n

5 Boolean Decision Trees (February 11)

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

LECTURE 13: Cross-validation

Lesson 17 Pearson s Correlation Coefficient

Statistical inference: example 1. Inferential Statistics

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals

Chapter 5: Inner Product Spaces

Research Article Sign Data Derivative Recovery

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

CHAPTER 3 THE TIME VALUE OF MONEY

4.3. The Integral and Comparison Tests

Modified Line Search Method for Global Optimization

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

Building Blocks Problem Related to Harmonic Series

1 Correlation and Regression Analysis

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

Confidence intervals and hypothesis tests

Chapter 14 Nonparametric Statistics

Notes on exponential generating functions and structures.

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

Repeating Decimals are decimal numbers that have number(s) after the decimal point that repeat in a pattern.

Lesson 15 ANOVA (analysis of variance)

One-sample test of proportions

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

7. Concepts in Probability, Statistics and Stochastic Modelling

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

MEI Structured Mathematics. Module Summary Sheets. Statistics 2 (Version B: reference to new book)

1 The Gaussian channel

THE TWO-VARIABLE LINEAR REGRESSION MODEL

1 Review of Probability

Hypergeometric Distributions

A PROBABILISTIC VIEW ON THE ECONOMICS OF GAMBLING

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Department of Computer Science, University of Otago

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

3. Greatest Common Divisor - Least Common Multiple

MARTINGALES AND A BASIC APPLICATION

S. Tanny MAT 344 Spring be the minimum number of moves required.

Solving Logarithms and Exponential Equations

Multi-server Optimal Bandwidth Monitoring for QoS based Multimedia Delivery Anup Basu, Irene Cheng and Yinzhe Yu

, a Wishart distribution with n -1 degrees of freedom and scale matrix.

Exploratory Data Analysis

THE HEIGHT OF q-binary SEARCH TREES

Transcription:

Topic 4 Ubiased Estimatio 4. Itroductio I creatig a parameter estimator, a fudametal questio is whether or ot the estimator differs from the parameter i a systematic maer. Let s examie this by lookig a the computatio of the mea ad the variace of 6 flips of a fair coi. Give this task to 0 idividuals ad ask them report the umber of heads. We ca simulate this i R as follows > (x<-rbiom(0,6,0.5)) [] 8 5 9 7 7 9 7 8 8 0 Our estimate is obtaied by takig these 0 aswers ad averagig them. Ituitively we aticipate a aswer aroud 8. For these 0 observatios, we fid, i this case, that > sum(x)/0 [] 7.8 The result is a bit below 8. Is this systematic? To assess this, we appeal to the ideas behid Mote Carlo to perform a 000 simulatios of the example above. > meax<-rep(0,000) > for (i i :000){meax[i]<-mea(rbiom(0,6,0.5))} > mea(meax) [] 8.0049 From this, we surmise that we the estimate of the sample mea x either systematically overestimates or uderestimates the distributioal mea. From our kowledge of the biomial distributio, we kow that the mea µ p 6 0.5 8. I additio, the sample mea X also has mea E X 80 (8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8) 0 0 8 verifyig that we have o systematic error. The phrase that we use is that the sample mea X is a ubiased estimator of the distributioal mea µ. Here is the precise defiitio. Defiitio 4.. For observatios X (X,X,...,X ) based o a distributio havig parameter value, ad for d(x) a estimator for h( ), the bias is the mea of the differece d(x) h( ), i.e., b d ( ) E d(x) h( ). (4.) If b d ( ) 0for all values of the parameter, the d(x) is called a ubiased estimator. Ay estimator that is ot ubiased is called biased. 05

Itroductio to the Sciece of Statistics Ubiased Estimatio Example 4.. Let X,X,...,X be Beroulli trials with success parameter p ad set the estimator for p to be d(x) X, the sample mea. The, E p X (EX + EX + + EX ) (p + p + + p) p Thus, X is a ubiased estimator for p. I this circumstace, we geerally write ˆp istead of X. I additio, we ca use the fact that for idepedet radom variables, the variace of the sum is the sum of the variaces to see that Var(ˆp) (Var(X )+Var(X )+ + Var(X )) (p( p)+p( p)+ + p( p)) p( p). Example 4.3. If X,...,X form a simple radom sample with ukow fiite mea µ, the X is a ubiased estimator of µ. If the X i have variace, the Var( X). (4.) We ca assess the quality of a estimator by computig its mea square error, defied by E [(d(x) h( )) ]. (4.3) Estimators with smaller mea square error are geerally preferred to those with larger. Next we derive a simple relatioship betwee mea square error ad variace. We begi by substitutig (4.) ito (4.3), rearragig terms, ad expadig the square. E [(d(x) h( )) ]E [(d(x) (E d(x) b d ( ))) ]E [((d(x) E d(x)) + b d ( )) ] E [(d(x) E d(x)) ]+b d ( )E [d(x) E d(x)] + b d ( ) Var (d(x)) + b d ( ) Thus, the represetatio of the mea square error as equal to the variace of the estimator plus the square of the bias is called the bias-variace decompositio. I particular: The mea square error for a ubiased estimator is its variace. Bias always icreases the mea square error. 4. Computig Bias For the variace, we have bee preseted with two choices: (x i x) ad (x i x). (4.4) Usig bias as our criterio, we ca ow resolve betwee the two choices for the estimators for the variace. Agai, we use simulatios to make a cojecture, we the follow up with a computatio to verify our guess. For 6 tosses of a fair coi, we kow that the variace is p( p) 6 / / 4 For the example above, we begi by simulatig the coi tosses ad compute the sum of squares P 0 (x i x), > ssx<-rep(0,000) > for (i i :000){x<-rbiom(0,6,0.5);ssx[i]<-sum((x-mea(x))ˆ)} > mea(ssx) [] 35.85 06

Itroductio to the Sciece of Statistics Ubiased Estimatio The choice is to divide either by 0, for the first choice, or 9, for the secod. > mea(ssx)/0;mea(ssx)/9 [] 3.585 [] 3.983456 Exercise 4.4. Repeat the simulatio above, compute the sum of squares P 0 (x i 8). Show that these simulatios P support dividig by 0 rather tha 9. verify that (X i µ) / is a ubiased estimator for for idepedet radom variable X,...,X whose commo distributio has mea µ ad variace. I this case, because we kow all the aspects of the simulatio, ad thus we kow that the aswer ought to be ear 4. Cosequetly, divisio by 9 appears to be the appropriate choice. Let s check this out, begiig with what seems to be the iappropriate choice to see what goes wrog.. Frequecy 0 50 00 50 00 50 Histogram of ssx 0 0 40 60 80 00 0 Example 4.5. If a simple radom sample X,X,..., ssx has ukow fiite variace, the, we ca cosider the sample variace Figure 4.: Sum of squares about x for 000 simulatios. S (X i X). To fid the mea of S, we divide the differece betwee a observatio X i ad the distributioal mea ito two steps - the first from X i to the sample mea x ad ad the from the sample mea to the distributioal mea, i.e., X i µ (X i X)+( X µ). We shall soo see that the lack of kowledge of µ is the source of the bias. Make this substitutio ad expad the square to obtai (X i µ) ((X i X)+( X µ)) X (X i X) + (X i X)( X µ)+ ( X µ) (X i X) + ( X µ) (X i X)+( X µ) (X i X) + ( X µ) (Check for yourself that the middle term i the third lie equals 0.) Subtract the term ( X divide by to obtai the idetity µ) from both sides ad (X i X) (X i µ) ( X µ). 07

Itroductio to the Sciece of Statistics Ubiased Estimatio Usig the idetity above ad the liearity property of expectatio we fid that " # ES E (X i X) " # E (X i µ) ( X µ) E[(X i µ) ] E[( X µ) ] Var(X i ) Var( X) 6. The last lie uses (4.). This shows that S is a biased estimator for see that it is biased dowwards. b( ) Note that the bias is equal to Var( X). I additio, because apple E S E S ad S u S. (X i X). Usig the defiitio i (4.), we ca is a ubiased estimator for. As we shall lear i the ext sectio, because the square root is cocave dowward, S u p Su as a estimator for is dowwardly biased. Example 4.6. We have see, i the case of Beroulli trials havig x successes, that ˆp x/ is a ubiased estimator for the parameter p. This is the case, for example, i takig a simple radom sample of geetic markers at a particular biallelic locus. Let oe allele deote the wildtype ad the secod a variat. If the circumstaces i which variat is recessive, the a idividual expresses the variat pheotype oly i the case that both chromosomes cotai this marker. I the case of idepedet alleles from each paret, the probability of the variat pheotype is p. Naïvely, we could use the estimator ˆp. (Later, we will see that this is the maximum likelihood estimator.) To determie the bias of this estimator, ote that E ˆp (E ˆp) + Var(ˆp) p + p( p). (4.5) Thus, the bias b(p) p( p)/ ad the estimator ˆp is biased upward. Exercise 4.7. For Beroulli trials X,...,X, (X i ˆp) ˆp( ˆp). Based o this exercise, ad the computatio above yieldig a ubiased estimator, Su, for the variace, apple " # E ˆp( ˆp) E (X i ˆp) E[S u] Var(X ) p( p). 08

Itroductio to the Sciece of Statistics Ubiased Estimatio I other words, ˆp( ˆp) is a ubiased estimator of p( p)/. Returig to (4.5), apple E ˆp ˆp( ˆp) p + p( p) p( p) p. Thus, bp u ˆp ˆp( ˆp) is a ubiased estimator of p. To compare the two estimators for p, assume that we fid 3 variat alleles i a sample of 30, the ˆp 3/30 0.4333, ˆp 3 0.878, ad p b u 30 3 30 9 3 30 7 0.878 0.0085 0.793. 30 The bias for the estimate ˆp, i this case 0.0085, is subtracted to give the ubiased estimate p b u. The heterozygosity of a biallelic locus is h p( p). From the discussio above, we see that h has the ubiased estimator ĥ x x x( x) ˆp( ˆp) ( ). 4.3 Compesatig for Bias I the methods of momets estimatio, we have used g( X) as a estimator for g(µ). If g is a covex fuctio, we ca say somethig about the bias of this estimator. I Figure 4., we see the method of momets estimator for the estimator g( X) for a parameter i the Pareto distributio. The choice of 3correspods to a mea of µ 3/ for the Pareto radom variables. The cetral limit theorem states that the sample mea X is early ormally distributed with mea 3/. Thus, the distributio of X is early symmetric aroud 3/. From the figure, we ca see that the iterval from.4 to.5 uder the fuctio g maps ito a loger iterval above 3tha the iterval from.5 to.6 maps below 3. Thus, the fuctio g spreads the values of X above 3more tha below. Cosequetly, we aticipate that the estimator ˆ will be upwardly biased. To address this pheomea i more geeral terms, we use the characterizatio of a covex fuctio as a differetiable fuctio whose graph lies above ay taget lie. If we look at the value µ for the covex fuctio g, the this statemet becomes g(x) g(µ) g 0 (µ)(x µ). Now replace x with the radom variable X ad take expectatios. Cosequetly, E µ [g( X) g(µ)] E µ [g 0 (µ)( X µ)] g 0 (µ)e µ [ X µ] 0. E µ g( X) g(µ) (4.6) ad g( X) is biased upwards. The expressio i (4.6) is kow as Jese s iequality. Exercise 4.8. Show that the estimator S u is a dowwardly biased estimator for. To estimate the size of the bias, we look at a quadratic approximatio for g cetered at the value µ g(x) g(µ) g 0 (µ)(x µ)+ g00 (µ)(x µ). 09

Itroductio to the Sciece of Statistics Ubiased Estimatio 5 4.5 4 g(x) x/(x!)! 3.5 yg(µ)+g (µ)(x!µ) 3.5.5.3.35.4.45.5.55.6.65.7.75 x Figure 4.: Graph of a covex fuctio. Note that the taget lie is below the graph of g. Here we show the case i which µ.5 ad g(µ) 3. Notice that the iterval from x.4 to x.5 has a loger rage tha the iterval from x.5 to x.6 Because g spreads the values of X above 3more tha below, the estimator ˆ for is biased upward. We ca use a secod order Taylor series expasio to correct most of this bias. Agai, replace x i this expressio with the radom variable X ad the take expectatios. The, the bias b g (µ) E µ [g( X)] g(µ) E µ [g 0 (µ)( X µ)] + E[g00 (µ)( X µ) ] g00 (µ)var( X) g00 (µ). (4.7) (Remember that E µ [g 0 (µ)( X µ)] 0.) Thus, the bias has the ituitive properties of beig large for strogly covex fuctios, i.e., oes with a large value for the secod derivative evaluated at the mea µ, large for observatios havig high variace, ad small whe the umber of observatios is large. Exercise 4.9. Use (4.7) to estimate the bias i usig ˆp as a estimate of p is a sequece of Beroulli trials ad ote that it matches the value (4.5). Example 4.0. For the method of momets estimator for the Pareto radom variable, we determied that ad that X has g(µ) µ µ. mea µ ad variace By takig the secod derivative, we see that g 00 (µ) (µ Next, we have g 00 0 ( ) ( ) ) 3 > 0 ad, because µ>, g is a covex fuctio. 3 ( ) 3.

Itroductio to the Sciece of Statistics Ubiased Estimatio Thus, the bias b g ( ) g00 (µ) ( )3 ( ) ( ) ( ) ( ). So, for 3ad 00, the bias is approximately 0.06. Compare this to the estimated value of 0.053 from the simulatio i the previous sectio. Example 4.. For estimatig the populatio i mark ad recapture, we used the estimate N g(µ) kt µ for the total populatio. Here µ is the mea umber recaptured, k is the umber captured i the secod capture evet ad t is the umber tagged. The secod derivative g 00 (µ) kt µ 3 > 0 ad hece the method of momets estimate is biased upwards. I this siutatio, ad the umber recaptured is a hypergeometric radom variable. Hece its variace Thus, the bias b g (N) kt kt µ 3 N (N t)(n k) N(N ) kt N (N t)(n k) µ(n ) (N t)(n k). N(N ) (kt/µ t)(kt/µ k) µ(kt/µ ) kt(k µ)(t µ) µ. (kt µ) I the simulatio example, N 000,t 00,k 400 ad µ 40. This gives a estimate for the bias of 36.0. We ca compare this to the bias of 03.03-000 3.03 based o the simulatio i Example 3.. This suggests a ew estimator by takig the method of momets estimator ad subtractig the approximatio of the bias. ˆN kt r kt(k r)(t r) r (kt r) kt r (k r)(t r) r(kt r) The delta method gives us that the stadard deviatio of the estimator is g 0 (µ) / p. Thus the ratio of the bias of a estimator to its stadard deviatio as determied by the delta method is approximately g 00 (µ) /() g 0 (µ) / p g 00 (µ) g 0 (µ) p. If this ratio is, the the bias correctio is ot very importat. I the case of the example above, this ratio is. ad its usefuless i correctig bias is small. 36.0 68.40 0.34 4.4 Cosistecy Despite the desirability of usig a ubiased estimator, sometimes such a estimator is hard to fid ad at other times impossible. However, ote that i the examples above both the size of the bias ad the variace i the estimator decrease iversely proportioal to, the umber of observatios. Thus, these estimators improve, uder both of these criteria, with more observatios. A cocept that describes properties such as these is called cosistecy.

Itroductio to the Sciece of Statistics Ubiased Estimatio Defiitio 4.. Give data X,X,...ad a real valued fuctio h of the parameter space, a sequece of estimators d, based o the first observatios, is called cosistet if for every choice of wheever is the true state of ature. lim d (X,X,...,X )h( )! Thus, the bias of the estimator disappears i the limit of a large umber of observatios. I additio, the distributio of the estimators d (X,X,...,X ) become more ad more cocetrated ear h( ). For the ext example, we eed to recall the sequece defiitio of cotiuity: A fuctio g is cotiuous at a real umber x provided that for every sequece {x ; } with x! x, the, we have that g(x )! g(x). A fuctio is called cotiuous if it is cotiuous at every value of x i the domai of g. Thus, we ca write the expressio above more succictly by sayig that for every coverget sequece {x ; }, lim g(x )g( lim x ).!! Example 4.3. For a method of momet estimator, let s focus o the case of a sigle parameter (d ). For idepedet observatios, X,X,...,havig mea µ k( ), we have that E X µ, i. e. X, the sample mea for the first observatios, is a ubiased estimator for µ k( ). Also, by the law of large umbers, we have that lim! X µ. Assume that k has a cotiuous iverse g k. I particular, because µ k( ), we have that g(µ). Next, usig the methods of momets procedure, defie, for observatios, the estimators ˆ (X,X,...,X )g (X + + X ) g( X ). for the parameter. Usig the cotiuity of g, we fid that lim ˆ (X,X,...,X ) lim g( X )g( lim!!! ad so we have that g( X ) is a cosistet sequece of estimators for. X )g(µ) 4.5 Cramér-Rao Boud This topic is somewhat more advaced ad ca be skipped for the first readig. This sectio gives us a itroductio to the log-likelihood ad its derivative, the score fuctios. We shall ecouter these fuctios agai whe we itroduce maximum likelihood estimatio. I additio, the Cramér Rao boud, which is based o the variace of the score fuctio, kow as the Fisher iformatio, gives a lower boud for the variace of a ubiased estimator. These cocepts will be ecessary to describe the variace for maximum likelihood estimators. Amog ubiased estimators, oe importat goal is to fid a estimator that has as small a variace as possible, A more precise goal would be to fid a ubiased estimator d that has uiform miimum variace. I other words, d(x) has has a smaller variace tha for ay other ubiased estimator d for every value of the parameter.

Itroductio to the Sciece of Statistics Ubiased Estimatio Var d(x) apple Var d(x) for all. The efficiecy e( d) of ubiased estimator d is the miimum value of the ratio Var d(x) Var d(x) over all values of. Thus, the efficiecy is betwee 0 ad with a goal of fidig estimators with efficiecy as ear to oe as possible. For ubiased estimators, the Cramér-Rao boud tells us how small a variace is ever possible. The formula is a bit mysterious at first. However, we shall soo lear that this boud is a cosequece of the boud o correlatio that we have previously leared Recall that for two radom variables Y ad Z, the correlatio (Y,Z) Cov(Y,Z) p Var(Y )Var(Z). (4.8) takes values betwee - ad. Thus, (Y,Z) apple ad so Cov(Y,Z) apple Var(Y )Var(Z). (4.9) Exercise 4.4. If EZ 0, the Cov(Y,Z)EY Z We begi with data X (X,...,X ) draw from a ukow probability P. The parameter space R. Deote the joit desity of these radom variables, where x (x...,x ). I the case that the data comes from a simple radom sample the the joit desity is the product of the margial desities. f(x ) f(x ) (4.0) For cotiuous radom variables, the two basic properties of the desity are that 0 for all x ad that Z dx. (4.) R Now, let d be the ubiased estimator of h( ), the by the basic formula for computig expectatio, we have for cotiuous radom variables Z h( ) E d(x) d(x) dx. (4.) R If the fuctios i (4.) ad (4.) are differetiable with respect to the parameter ad we ca pass the derivative through the itegral, the we first differetiate both sides of equatio (4.), ad the use the logarithm fuctio to write this derivate as the expectatio of a radom variable, Z Z Z apple / l l f(x ) 0 dx dx dx E. (4.3) R R From a similar calculatio usig (4.), R apple h 0 l f(x ) ( ) E d(x) 3. (4.4)

Itroductio to the Sciece of Statistics Ubiased Estimatio Now, retur to the review o correlatio with Y d(x), the ubiased estimator for h( ) ad the score fuctio Z l f(x )/. From equatios (4.4) ad the (4.9), we fid that or, where apple h 0 ( ) l f(x ) E d(x) Cov d(x), Var (d(x)) l f(x ) l f(x ) apple Var (d(x))var, " # l f(x ) l f(x ) I( ) Var E h 0 ( ) I( ). (4.5) is called the Fisher iformatio. For the equality, recall that the variace Var(Z) EZ (EZ) ad recall from equatio (4.3) that the radom variable Z l f(x )/ has mea EZ 0. Equatio (4.5), called the Cramér-Rao lower boud or the iformatio iequality, states that the lower boud for the variace of a ubiased estimator is the reciprocal of the Fisher iformatio. I other words, the higher the iformatio, the lower is the possible value of the variace of a ubiased estimator. If we retur to the case of a simple radom sample, the take the logarithm of both sides of equatio (4.0) ad the differetiate with respect to the parameter, l lf(x )+ +lf(x ) l l f(x ) + + l f(x ). The radom variables { l f(x k )/; apple k apple } are idepedet ad have the same distributio. Usig the fact that the variace of the sum is the sum of the variaces for idepedet radom variables, we see that I, the Fisher iformatio for observatios is times the Fisher iformatio of a sigle observatio. l f(x ) I ( ) Var + + l f(x ) Var( l f(x ) )E[( l f(x ) ) ]. Notice the correspodece. Iformatio is liearly proportioal to the umber of observatios. If our estimator is a sample mea or a fuctio of the sample mea, the the variace is iversely proportioal to the umber of observatios. Example 4.5. For idepedet Beroulli radom variables with ukow success probability, the desity is x ( ) ( x). The mea is ad the variace is ( ). Takig logarithms, we fid that l x l +( x)l( ), l x The Fisher iformatio associated to a sigle observatio " # I( ) E l f(x ) x x ( ). ( ) E[(X ) ] ( ( ) ( ) ( ). ) Var(X) 4

Itroductio to the Sciece of Statistics Ubiased Estimatio Thus, the iformatio for observatios I ( ) /( ( )). Thus, by the Cramér-Rao lower boud, ay ubiased estimator of based o observatios must have variace al least ( )/. Now, otice that if we take d(x) x, the E X, ad Var d(x) Var( X) ( ). These two equatios show that X is a ubiased estimator havig uiformly miimum variace. Exercise 4.6. For idepedet ormal radom variables with kow variace uiformly miimum variace ubiased estimator. 0 ad ukow mea µ, X is a Exercise 4.7. Take two derivatives of l to show that " # apple l f(x ) l f(x ) I( ) E E. (4.6) This idetity is ofte a useful alterative to compute the Fisher Iformatio. Example 4.8. For a expoetial radom variable, Thus, by (4.6), l f(x )l x, I( ). f(x ). Now, X is a ubiased estimator for h( )/ By the Cramér-Rao lower boud, we have that with variace. g 0 ( ) I( ) / 4. Because X has this variace, it is a uiformly miimum variace ubiased estimator. Example 4.9. To give a estimator that does ot achieve the Cramér-Rao boud, let X,X,...,X be a simple radom sample of Pareto radom variables with desity The mea ad the variace µ Thus, X is a ubiased estimator of µ /( ) To compute the Fisher iformatio, ote that f X (x ) x +, x >., ( ) ( ). Var( X) ( ) ( ). l f(x )l ( + ) l x ad thus l f(x ). 5

Itroductio to the Sciece of Statistics Ubiased Estimatio Usig (4.6), we have that I( ). Next, for µ g( ), g0 ( ) Thus, the Cramér-Rao boud for the estimator is ( ), ad g0 ( ) ( ) 4. ad the efficiecy compared to the Cramér-Rao boud is g 0 ( ) /I ( ) Var( X) g 0 ( ) I ( ) ( ) 4. ( ) 4 ( ) ( ) ( ) ( ) ( ). The Pareto distributio does ot have a variace uless >. For just above, the efficiecy compared to its Cramér-Rao boud is low but improves with larger. 4.6 A Note o Efficiet Estimators For a efficiet estimator, we eed fid the cases that lead to equality i the correlatio iequality (4.8). Recall that equality occurs precisely whe the correlatio is ±. This occurs whe the estimator d(x) ad the score fuctio l f X (X )/ are liearly related with probability. After itegratig, we obtai, Z l f X (X ) l f X(X ) a( )d(x)+b( ). Z a( )d d(x)+ b( )d + j(x) ( )d(x)+ B( )+ j(x) Note that the costat of itegratio of itegratio is a fuctio of X. Now expoetiate both sides of this equatio f X (X ) c( )h(x)exp( ( )d(x)). (4.7) Here c( ) expb( ) ad h(x) expj(x). We shall call desity fuctios satisfyig equatio (4.7) a expoetial family with atural parameter ( ). Thus, if we have idepedet radom variables X,X,...X, the the joit desity is the product of the desities, amely, f(x ) c( ) h(x ) h(x )exp( ( )(d(x )+ + d(x )). (4.8) I additio, as a cosequece of this liear relatio i (4.8), is a efficiet estimator for h( ). Example 4.0 (Poisso radom variables). d(x) (d(x )+ + d(x )) f(x ) x x! e e x! exp(x l ). 6

Itroductio to the Sciece of Statistics Ubiased Estimatio Thus, Poisso radom variables are a expoetial family with c( )exp( ( )l. Because E X, X is a ubiased estimator of the parameter. The score fuctio l f(x ) (x l l x! ) x. The Fisher iformatio for oe observatio is " X # I( )E E [(X ) ]. ), h(x) /x!, ad atural parameter Thus, I ( )/ is the Fisher iformatio for observatios. I additio, ad d(x) x has efficiecy Var ( X) Var( X) /I ( ). This could have bee predicted. The desity of idepedet observatios is f(x ) e x! x e x! x e x +x x! x! e x x! x! ad so the score fuctio x l f(x ) ( + x l ) + showig that the estimate x ad the score fuctio are liearly related. Exercise 4.. Show that a Beroulli radom variable with parameter p is a expoetial family. Exercise 4.. Show that a ormal radom variable with kow variace family. 0 ad ukow mea µ is a expoetial 4.7 Aswers to Selected Exercises 4.4. Repeat the simulatio, replacig mea(x) by 8. > ssx<-rep(0,000) > for (i i :000){x<-rbiom(0,6,0.5);ssx[i]<-sum((x-8)ˆ)} > mea(ssx)/0;mea(ssx)/9 [] 3.998 [] 4.435333 Note that divisio by 0 gives a aswer very close to the correct value of 4. To verify that the estimator is ubiased, we write " # E (X i µ) E[(X i µ) ] Var(X i ). 7

Itroductio to the Sciece of Statistics Ubiased Estimatio 4.7. For a Beroulli trial ote that X i X i. Expad the square to obtai (X i ˆp) Divide by to obtai the result. X i ˆp X i + ˆp ˆp ˆp + ˆp (ˆp ˆp )ˆp( ˆp). 4.8. Recall that ES u. Check the secod derivative to see that g(t) p t is cocave dow for all t. For cocave dow fuctios, the directio of the iequality i Jese s iequality is reversed. Settig t S u, we have that ad S u is a dowwardly biased estimator of. ES u Eg(S u) apple g(es u)g( ) 4.9. Set g(p) p. The, g 00 (p). Recall that the variace of a Beroulli radom variable p( p) ad the bias b g (p) g00 (p) p) p( p) p(. 4.4. Cov(Y,Z) EY Z EY EZ EY Z wheever EZ 0. 4.6. For idepedet ormal radom variables with kow variace 0 ad ukow mea µ, the desity Thus, the score fuctio f(x µ) 0 p exp (x µ) 0, p (x µ) l f(x µ) l( 0 ). 0 µ l f(x µ) 0 (x µ). ad the Fisher iformatio associated to a sigle observatio " # I(µ) E l f(x µ) µ 4 E[(X µ) ] 4 Var(X). 0 0 0 Agai, the iformatio is the reciprocal of the variace. Thus, by the Cramér-Rao lower boud, ay ubiased estimator based o observatios must have variace al least 0 /. However, if we take d(x) x, the Var µ d(x) ad x is a uiformly miimum variace ubiased estimator. 4.7. First, we take two derivatives of l. 0. ad l / / l / (/) / l 8 /) (4.9)

Itroductio to the Sciece of Statistics Ubiased Estimatio upo substitutio from idetity (4.9). Thus, the expected values satisfy E apple l f(x ) E apple f(x )/ f(x ) " # l f(x ) E. h i Cosquetly, the exercise is complete if we show that E f(x )/ f(x ) 0. However, for a cotiuous radom variable, apple f(x )/ Z / Z Z E dx f(x ) dx dx 0. Note that the computatio require that we be able to pass two derivatives with respect to through the itegral sig. 4.. The Beroulli desity Thus, c(p) 4.. The ormal desity f(x p) p x ( p) x ( p) p p x ( p)exp x l p, h(x) ad the atural parameter (p) l p p, the log-odds. p p. f(x µ) 0 p exp (x µ) 0 0p e µ / 0 e x / 0 exp xµ 0 Thus, c(µ) p e µ / 0,h(x) e x / 0 ad the atural parameter (µ) µ/ 0. 0 9