Topic 14
Unbiased Estimation

14.1 Introduction

In creating a parameter estimator, a fundamental question is whether or not the estimator differs from the parameter in a systematic manner. Let's examine this by looking at the computation of the mean and the variance of 16 flips of a fair coin.

Give this task to 10 individuals and ask them to report the number of heads. We can simulate this in R as follows:

> (x<-rbinom(10,16,0.5))
[1] …

Our estimate is obtained by taking these 10 answers and averaging them. Intuitively, we anticipate an answer around 8. For these 10 observations, we find, in this case, that

> sum(x)/10
[1] 7.8

The result is a bit below 8. Is this systematic? To assess this, we appeal to the ideas behind Monte Carlo and perform 1000 simulations of the example above.

> meanx<-rep(0,1000)
> for (i in 1:1000){meanx[i]<-mean(rbinom(10,16,0.5))}
> mean(meanx)
[1] …

From this, we surmise that the sample mean $\bar x$ neither systematically overestimates nor underestimates the distributional mean. From our knowledge of the binomial distribution, we know that the mean is $\mu = np = 16 \cdot 0.5 = 8$. In addition, the sample mean $\bar X$ also has mean

$$E\bar X = \frac{1}{10}(8 + 8 + \cdots + 8) = \frac{80}{10} = 8,$$

verifying that we have no systematic error.

The phrase that we use is that the sample mean $\bar X$ is an unbiased estimator of the distributional mean $\mu$. Here is the precise definition.

Definition 14.1. For observations $X = (X_1, X_2, \ldots, X_n)$ based on a distribution having parameter value $\theta$, and for $d(X)$ an estimator for $h(\theta)$, the bias is the mean of the difference $d(X) - h(\theta)$, i.e.,

$$b_d(\theta) = E_\theta d(X) - h(\theta). \qquad (14.1)$$

If $b_d(\theta) = 0$ for all values of the parameter, then $d(X)$ is called an unbiased estimator. Any estimator that is not unbiased is called biased.
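Definition 14.1 can also be explored numerically. The following sketch is not part of the original text: the helper name `mc_bias` and its arguments are hypothetical, chosen only for illustration, and the seed and number of simulations are arbitrary. It approximates $b_d(\theta)$ by averaging $d(X) - h(\theta)$ over simulated data sets, first for the sample mean of 10 binomial counts and then, for contrast, for the square of the sample mean as an estimator of $\mu^2 = 64$, whose exact bias is $\operatorname{Var}(\bar X) = 4/10$.

```r
# Hypothetical helper (not from the text): Monte Carlo approximation of the
# bias E[d(X)] - h(theta) of an estimator d.
#   rsample: function of no arguments returning one simulated data set
#   d:       the estimator, a function of the data set
#   target:  the true value h(theta) used in the simulation
mc_bias <- function(rsample, d, target, nsim = 1000) {
  estimates <- replicate(nsim, d(rsample()))
  mean(estimates) - target
}

set.seed(14)  # arbitrary seed, for reproducibility only
draw <- function() rbinom(10, 16, 0.5)  # 10 individuals, 16 flips each

# Sample mean as an estimator of mu = 8: the bias should be near 0
bias_mean <- mc_bias(draw, mean, target = 8, nsim = 1e5)

# Squared sample mean as an estimator of mu^2 = 64: the bias should be
# near Var(X-bar) = 16 * 0.5 * 0.5 / 10 = 0.4
bias_sq <- mc_bias(draw, function(x) mean(x)^2, target = 64, nsim = 1e5)

bias_mean
bias_sq
```

The simulated values fluctuate from run to run, so they indicate the presence or absence of bias only up to Monte Carlo error.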

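Introduction to the Science of Statistics: Unbiased Estimation

Using the identity above and the linearity property of expectation, we find that

$$
\begin{aligned}
ES^2 &= E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar X - \mu)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^n E[(X_i - \mu)^2] - E[(\bar X - \mu)^2] \\
&= \frac{1}{n}\sum_{i=1}^n \operatorname{Var}(X_i) - \operatorname{Var}(\bar X) \\
&= \frac{1}{n}\, n\sigma^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2 \ne \sigma^2.
\end{aligned}
$$

The last line uses (14.2). This shows that $S^2$ is a biased estimator for $\sigma^2$. Using the definition in (14.1), we can see that it is biased downwards:

$$b(\sigma^2) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2.$$

Note that the bias is equal to $-\operatorname{Var}(\bar X)$. In addition, because

$$E\left[\frac{n}{n-1}S^2\right] = \frac{n}{n-1}E[S^2] = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2 = \sigma^2,$$

the estimator

$$S_u^2 = \frac{n}{n-1}S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$$

is an unbiased estimator for $\sigma^2$. As we shall learn in the next section, because the square root is concave downward, $S_u = \sqrt{S_u^2}$ as an estimator for $\sigma$ is downwardly biased.

Example 14.6. We have seen, in the case of $n$ Bernoulli trials having $x$ successes, that $\hat p = x/n$ is an unbiased estimator for the parameter $p$. This is the case, for example, in taking a simple random sample of genetic markers at a particular biallelic locus. Let one allele denote the wildtype and the second a variant. If the variant is recessive, then an individual expresses the variant phenotype only in the case that both chromosomes contain this marker. In the case of independent alleles from each parent, the probability of the variant phenotype is $p^2$. Naïvely, we could use the estimator $\hat p^2$. (Later, we will see that this is the maximum likelihood estimator.) To determine the bias of this estimator, note that

$$E\hat p^2 = (E\hat p)^2 + \operatorname{Var}(\hat p) = p^2 + \frac{1}{n}p(1-p). \qquad (14.5)$$

Thus, the bias is $b(p) = p(1-p)/n$ and the estimator $\hat p^2$ is biased upward.

Exercise 14.7. For Bernoulli trials $X_1, \ldots, X_n$,

$$\frac{1}{n}\sum_{i=1}^n (X_i - \hat p)^2 = \hat p(1 - \hat p).$$

Based on this exercise, and the computation above yielding an unbiased estimator, $S_u^2$, for the variance,

$$E\left[\frac{1}{n-1}\hat p(1-\hat p)\right] = \frac{1}{n}E\left[\frac{1}{n-1}\sum_{i=1}^n (X_i - \hat p)^2\right] = \frac{1}{n}E[S_u^2] = \frac{1}{n}\operatorname{Var}(X_1) = \frac{1}{n}p(1-p).$$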

In other words, $\frac{1}{n-1}\hat p(1-\hat p)$ is an unbiased estimator of $p(1-p)/n$. Returning to (14.5),

$$E\left[\hat p^2 - \frac{1}{n-1}\hat p(1-\hat p)\right] = p^2 + \frac{1}{n}p(1-p) - \frac{1}{n}p(1-p) = p^2.$$

Thus,

$$\widehat{p^2}_u = \hat p^2 - \frac{1}{n-1}\hat p(1-\hat p)$$

is an unbiased estimator of $p^2$.

To compare the two estimators for $p^2$, assume that we find 3 variant alleles in a sample of 30; then

$$\hat p = 3/\ldots, \qquad \hat p^2 = \ldots, \qquad \widehat{p^2}_u = \ldots$$

The bias for the estimate $\hat p^2$, in this case …, is subtracted to give the unbiased estimate $\widehat{p^2}_u$.

The heterozygosity of a biallelic locus is $h = 2p(1-p)$. From the discussion above, we see that $h$ has the unbiased estimator

$$\hat h = \frac{2n}{n-1}\hat p(1-\hat p) = \frac{2n}{n-1}\cdot\frac{x}{n}\cdot\frac{n-x}{n} = \frac{2x(n-x)}{n(n-1)}.$$

14.3 Compensating for Bias

In the method of moments estimation, we have used $g(\bar X)$ as an estimator for $g(\mu)$. If $g$ is a convex function, we can say something about the bias of this estimator. In the figure for this section, we see the method of moments estimator $g(\bar X)$ for a parameter $\beta$ in the Pareto distribution. The choice of $\beta = 3$ corresponds to a mean of $\mu = 3/2$ for the Pareto random variables. The central limit theorem states that the sample mean $\bar X$ is nearly normally distributed with mean $3/2$. Thus, the distribution of $\bar X$ is nearly symmetric around $3/2$. From the figure, we can see that the interval from 1.4 to 1.5 under the function $g$ maps into a longer interval above $\beta = 3$ than the interval from 1.5 to 1.6 maps below $\beta = 3$. Thus, the function $g$ spreads the values of $\bar X$ above $\beta = 3$ more than below. Consequently, we anticipate that the estimator $\hat\beta$ will be upwardly biased.

To address this phenomenon in more general terms, we use the characterization of a convex function as a differentiable function whose graph lies above any tangent line. If we look at the value $\mu$ for the convex function $g$, then this statement becomes

$$g(x) - g(\mu) \ge g'(\mu)(x - \mu).$$

Now replace $x$ with the random variable $\bar X$ and take expectations:

$$E_\mu[g(\bar X) - g(\mu)] \ge E_\mu[g'(\mu)(\bar X - \mu)] = g'(\mu)E_\mu[\bar X - \mu] = 0.$$

Consequently,

$$E_\mu g(\bar X) \ge g(\mu) \qquad (14.6)$$

and $g(\bar X)$ is biased upwards. The expression in (14.6) is known as Jensen's inequality.

Exercise 14.8. Show that the estimator $S_u$ is a downwardly biased estimator for $\sigma$.
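A short simulation can make Jensen's inequality and Exercise 14.8 concrete. The sketch below is an illustration only, under assumptions of our own choosing (normal observations with $\sigma = 1$, samples of size $n = 5$, an arbitrary seed); none of these choices comes from the text. It relies on the fact that R's `sd` uses the divisor $n - 1$, so that `sd(x)` is $S_u$ and `sd(x)^2` is $S_u^2$.

```r
set.seed(148)   # arbitrary seed, for reproducibility only
n     <- 5      # sample size (assumed for this illustration)
sigma <- 1      # true standard deviation (assumed)
nsim  <- 10000  # number of simulated samples

# One value of S_u per simulated sample; sd() divides by n - 1
su <- replicate(nsim, sd(rnorm(n, mean = 0, sd = sigma)))

mean_su2 <- mean(su^2)  # S_u^2 is unbiased: this should be close to sigma^2
mean_su  <- mean(su)    # S_u is biased downward: this should fall below sigma

mean_su2
mean_su
```

The square root is concave, so the average of $S_u$ falls below the square root of the average of $S_u^2$, which is the reversed form of (14.6).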
To estimate the size of the bias, we look at a quadratic approximation for $g$ centered at the value $\mu$:

$$g(x) - g(\mu) \approx g'(\mu)(x - \mu) + \frac{1}{2}g''(\mu)(x - \mu)^2.$$

[Figure: the curve $g(x) = x/(x-1)$ and its tangent line $y = g(\mu) + g'(\mu)(x - \mu)$, plotted against $x$.]

Figure: Graph of a convex function. Note that the tangent line is below the graph of $g$. Here we show the case in which $\mu = 1.5$ and $\beta = g(\mu) = 3$. Notice that the interval from $x = 1.4$ to $x = 1.5$ has a longer range than the interval from $x = 1.5$ to $x = 1.6$. Because $g$ spreads the values of $\bar X$ above $\beta = 3$ more than below, the estimator $\hat\beta$ for $\beta$ is biased upward. We can use a second order Taylor series expansion to correct most of this bias.

Again, replace $x$ in this expression with the random variable $\bar X$ and then take expectations. Then, the bias

$$b_g(\mu) = E_\mu[g(\bar X)] - g(\mu) \approx E_\mu[g'(\mu)(\bar X - \mu)] + \frac{1}{2}E_\mu[g''(\mu)(\bar X - \mu)^2] = \frac{1}{2}g''(\mu)\operatorname{Var}(\bar X) = \frac{1}{2}g''(\mu)\frac{\sigma^2}{n}. \qquad (14.7)$$

(Remember that $E_\mu[g'(\mu)(\bar X - \mu)] = 0$.)

Thus, the bias has the intuitive properties of being large for strongly convex functions, i.e., ones with a large value for the second derivative evaluated at the mean $\mu$, large for observations having high variance $\sigma^2$, and small when the number of observations $n$ is large.

Exercise 14.9. Use (14.7) to estimate the bias in using $\hat p^2$ as an estimate of $p^2$ in a sequence of $n$ Bernoulli trials and note that it matches the value (14.5).

Example 14.10. For the method of moments estimator for the Pareto random variable, we determined that

$$g(\mu) = \frac{\mu}{\mu - 1}$$

and that $\bar X$ has

$$\text{mean } \mu = \frac{\beta}{\beta - 1} \quad\text{and variance}\quad \frac{\sigma^2}{n} = \frac{\beta}{n(\beta-1)^2(\beta-2)}.$$

By taking the second derivative, we see that $g''(\mu) = 2(\mu - 1)^{-3} > 0$ and, because $\mu > 1$, $g$ is a convex function. Next, we have

$$g''\!\left(\frac{\beta}{\beta-1}\right) = \frac{2}{\left(\frac{\beta}{\beta-1} - 1\right)^3} = 2(\beta - 1)^3.$$

Definition 14.12. Given data $X_1, X_2, \ldots$ and a real valued function $h$ of the parameter space, a sequence of estimators $d_n$, based on the first $n$ observations, is called consistent if for every choice of $\theta$,

$$\lim_{n\to\infty} d_n(X_1, X_2, \ldots, X_n) = h(\theta)$$

whenever $\theta$ is the true state of nature.

Thus, the bias of the estimator disappears in the limit of a large number of observations. In addition, the distributions of the estimators $d_n(X_1, X_2, \ldots, X_n)$ become more and more concentrated near $h(\theta)$.

For the next example, we need to recall the sequence definition of continuity: A function $g$ is continuous at a real number $x$ provided that for every sequence $\{x_n; n \ge 1\}$ with $x_n \to x$, we have that $g(x_n) \to g(x)$. A function is called continuous if it is continuous at every value of $x$ in the domain of $g$. Thus, we can write the expression above more succinctly by saying that for every convergent sequence $\{x_n; n \ge 1\}$,

$$\lim_{n\to\infty} g(x_n) = g\left(\lim_{n\to\infty} x_n\right).$$

Example 14.13. For a method of moments estimator, let's focus on the case of a single parameter ($d = 1$). For independent observations $X_1, X_2, \ldots$ having mean $\mu = k(\theta)$, we have that

$$E\bar X_n = \mu,$$

i.e., $\bar X_n$, the sample mean for the first $n$ observations, is an unbiased estimator for $\mu = k(\theta)$. Also, by the law of large numbers, we have that

$$\lim_{n\to\infty} \bar X_n = \mu.$$

Assume that $k$ has a continuous inverse $g = k^{-1}$. In particular, because $\mu = k(\theta)$, we have that $g(\mu) = \theta$. Next, using the method of moments procedure, define, for $n$ observations, the estimators

$$\hat\theta_n(X_1, X_2, \ldots, X_n) = g\left(\frac{1}{n}(X_1 + \cdots + X_n)\right) = g(\bar X_n)$$

for the parameter $\theta$. Using the continuity of $g$, we find that

$$\lim_{n\to\infty} \hat\theta_n(X_1, X_2, \ldots, X_n) = \lim_{n\to\infty} g(\bar X_n) = g\left(\lim_{n\to\infty} \bar X_n\right) = g(\mu) = \theta,$$

and so we have that $g(\bar X_n)$ is a consistent sequence of estimators for $\theta$.

14.5 Cramér-Rao Bound

This topic is somewhat more advanced and can be skipped for the first reading. This section gives us an introduction to the log-likelihood and its derivative, the score functions. We shall encounter these functions again when we introduce maximum likelihood estimation.
In addition, the Cramér-Rao bound, which is based on the variance of the score function, known as the Fisher information, gives a lower bound for the variance of an unbiased estimator. These concepts will be necessary to describe the variance for maximum likelihood estimators.

Among unbiased estimators, one important goal is to find an estimator that has as small a variance as possible. A more precise goal would be to find an unbiased estimator $d$ that has uniform minimum variance. In other words, $d(X)$ has a variance no larger than that of any other unbiased estimator $\tilde d$, for every value $\theta$ of the parameter.
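The goal of small variance among unbiased estimators can be illustrated by simulation. The sketch below is our own illustration, not the text's: it assumes normal data with $\mu = 0$ and $\sigma = 1$, samples of size $n = 25$, and an arbitrary seed. Both the sample mean and the sample median are unbiased for the center of a symmetric distribution such as the normal, so they differ only in their variances.

```r
set.seed(1416)  # arbitrary seed, for reproducibility only
n     <- 25     # sample size (assumed for this illustration)
nsim  <- 10000  # number of simulated samples
mu    <- 0      # true mean (assumed)
sigma <- 1      # true standard deviation (assumed)

# Each row of the matrix is one simulated sample of size n
samples <- matrix(rnorm(nsim * n, mean = mu, sd = sigma), nrow = nsim)
means   <- rowMeans(samples)
medians <- apply(samples, 1, median)

# Both estimators are centered near mu ...
c(mean = mean(means), median = mean(medians))
# ... but the sample mean has the smaller variance, close to sigma^2 / n
c(mean = var(means), median = var(medians))
sigma^2 / n
```

A simulation of this kind compares only the two estimators tried; it cannot show that no other unbiased estimator does better, which is what a lower bound on the variance provides.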

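Thus, the information for $n$ observations is $I_n(p) = n/(p(1-p))$. Thus, by the Cramér-Rao lower bound, any unbiased estimator of $p$ based on $n$ observations must have variance at least $p(1-p)/n$. Now, notice that if we take $d(x) = \bar x$, then

$$E_p \bar X = p, \quad\text{and}\quad \operatorname{Var}_p d(X) = \operatorname{Var}(\bar X) = \frac{p(1-p)}{n}.$$

These two equations show that $\bar X$ is an unbiased estimator having uniformly minimum variance.

Exercise 14.16. For independent normal random variables with known variance $\sigma_0^2$ and unknown mean $\mu$, $\bar X$ is a uniformly minimum variance unbiased estimator.

Exercise 14.17. Take two derivatives of $\ln f(x|\theta)$ to show that

$$I(\theta) = E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right] = -E_\theta\left[\frac{\partial^2 \ln f(X|\theta)}{\partial\theta^2}\right]. \qquad (14.16)$$

This identity is often a useful alternative to compute the Fisher information.

Example 14.18. For an exponential random variable,

$$\ln f(x|\lambda) = \ln\lambda - \lambda x, \qquad \frac{\partial^2}{\partial\lambda^2}\ln f(x|\lambda) = -\frac{1}{\lambda^2}.$$

Thus, by (14.16),

$$I(\lambda) = \frac{1}{\lambda^2}.$$

Now, $\bar X$ is an unbiased estimator for $h(\lambda) = 1/\lambda$ with variance

$$\frac{1}{n\lambda^2}.$$

By the Cramér-Rao lower bound, we have that

$$\frac{h'(\lambda)^2}{nI(\lambda)} = \frac{1/\lambda^4}{n/\lambda^2} = \frac{1}{n\lambda^2}.$$

Because $\bar X$ has this variance, it is a uniformly minimum variance unbiased estimator.

Example 14.19. To give an estimator that does not achieve the Cramér-Rao bound, let $X_1, X_2, \ldots, X_n$ be a simple random sample of Pareto random variables with density

$$f_X(x|\beta) = \frac{\beta}{x^{\beta+1}}, \quad x > 1.$$

The mean and the variance are

$$\mu = \frac{\beta}{\beta-1}, \qquad \sigma^2 = \frac{\beta}{(\beta-1)^2(\beta-2)}.$$

Thus, $\bar X$ is an unbiased estimator of $\mu = \beta/(\beta-1)$, with

$$\operatorname{Var}(\bar X) = \frac{\beta}{n(\beta-1)^2(\beta-2)}.$$

To compute the Fisher information, note that

$$\ln f(x|\beta) = \ln\beta - (\beta+1)\ln x \quad\text{and thus}\quad \frac{\partial^2}{\partial\beta^2}\ln f(x|\beta) = -\frac{1}{\beta^2}.$$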

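Using (14.16), we have that

$$I(\beta) = \frac{1}{\beta^2}.$$

Next, for

$$\mu = g(\beta) = \frac{\beta}{\beta-1}, \qquad g'(\beta) = -\frac{1}{(\beta-1)^2}, \quad\text{and}\quad g'(\beta)^2 = \frac{1}{(\beta-1)^4}.$$

Thus, the Cramér-Rao bound for the estimator is

$$\frac{g'(\beta)^2}{I_n(\beta)} = \frac{\beta^2}{n(\beta-1)^4},$$

and the efficiency compared to the Cramér-Rao bound is

$$\frac{g'(\beta)^2/I_n(\beta)}{\operatorname{Var}(\bar X)} = \frac{\beta^2}{n(\beta-1)^4}\cdot\frac{n(\beta-1)^2(\beta-2)}{\beta} = \frac{\beta(\beta-2)}{(\beta-1)^2} = 1 - \frac{1}{(\beta-1)^2}.$$

The Pareto distribution does not have a variance unless $\beta > 2$. For $\beta$ just above 2, the efficiency compared to its Cramér-Rao bound is low but improves with larger $\beta$.

14.6 A Note on Efficient Estimators

For an efficient estimator, we need to find the cases that lead to equality in the correlation inequality (14.8). Recall that equality occurs precisely when the correlation is $\pm 1$. This occurs when the estimator $d(X)$ and the score function $\partial \ln f_X(X|\theta)/\partial\theta$ are linearly related with probability 1:

$$\frac{\partial}{\partial\theta}\ln f_X(X|\theta) = a(\theta)d(X) + b(\theta).$$

After integrating, we obtain

$$\ln f_X(X|\theta) = \int a(\theta)\,d\theta\; d(X) + \int b(\theta)\,d\theta + j(X) = \pi(\theta)d(X) + B(\theta) + j(X).$$

Note that the constant of integration is a function of $X$. Now exponentiate both sides of this equation:

$$f_X(X|\theta) = c(\theta)h(X)\exp(\pi(\theta)d(X)). \qquad (14.17)$$

Here $c(\theta) = \exp B(\theta)$ and $h(X) = \exp j(X)$.

We shall call density functions satisfying equation (14.17) an exponential family with natural parameter $\pi(\theta)$.

Thus, if we have independent random variables $X_1, X_2, \ldots, X_n$, then the joint density is the product of the densities, namely,

$$f(x|\theta) = c(\theta)^n h(x_1)\cdots h(x_n)\exp\big(\pi(\theta)(d(x_1) + \cdots + d(x_n))\big). \qquad (14.18)$$

In addition, as a consequence of this linear relation in (14.18),

$$d(x) = \frac{1}{n}(d(x_1) + \cdots + d(x_n))$$

is an efficient estimator for $h(\theta)$.

Example 14.20 (Poisson random variables).

$$f(x|\lambda) = \frac{\lambda^x}{x!}e^{-\lambda} = e^{-\lambda}\frac{1}{x!}\exp(x\ln\lambda).$$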

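Thus, Poisson random variables are an exponential family with $c(\lambda) = \exp(-\lambda)$, $h(x) = 1/x!$, and natural parameter $\pi(\lambda) = \ln\lambda$. Because

$$\lambda = E_\lambda \bar X,$$

$\bar X$ is an unbiased estimator of the parameter $\lambda$.

The score function is

$$\frac{\partial}{\partial\lambda}\ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(x\ln\lambda - \ln x! - \lambda) = \frac{x}{\lambda} - 1.$$

The Fisher information for one observation is

$$I(\lambda) = E_\lambda\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \frac{1}{\lambda^2}E_\lambda[(X - \lambda)^2] = \frac{1}{\lambda}.$$

Thus, $I_n(\lambda) = n/\lambda$ is the Fisher information for $n$ observations. In addition,

$$\operatorname{Var}_\lambda(\bar X) = \frac{\lambda}{n}$$

and $d(x) = \bar x$ has efficiency

$$\frac{1/I_n(\lambda)}{\operatorname{Var}_\lambda(\bar X)} = 1.$$

This could have been predicted. The density of $n$ independent observations is

$$f(x|\lambda) = \frac{e^{-\lambda}}{x_1!}\lambda^{x_1}\cdots\frac{e^{-\lambda}}{x_n!}\lambda^{x_n} = \frac{e^{-n\lambda}\lambda^{x_1 + \cdots + x_n}}{x_1!\cdots x_n!} = \frac{e^{-n\lambda}\lambda^{n\bar x}}{x_1!\cdots x_n!},$$

and so the score function is

$$\frac{\partial}{\partial\lambda}\ln f(x|\lambda) = \frac{\partial}{\partial\lambda}(-n\lambda + n\bar x\ln\lambda) = -n + \frac{n\bar x}{\lambda},$$

showing that the estimate $\bar x$ and the score function are linearly related.

Exercise 14.21. Show that a Bernoulli random variable with parameter $p$ is an exponential family.

Exercise 14.22. Show that a normal random variable with known variance $\sigma_0^2$ and unknown mean $\mu$ is an exponential family.

14.7 Answers to Selected Exercises

14.4. Repeat the simulation, replacing mean(x) by 8.

> ssx<-rep(0,1000)
> for (i in 1:1000){x<-rbinom(10,16,0.5);ssx[i]<-sum((x-8)^2)}
> mean(ssx)/10;mean(ssx)/9
[1] …
[1] …

Note that division by 10 gives an answer very close to the correct value of 4. To verify that the estimator is unbiased, we write

$$E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2\right] = \frac{1}{n}\sum_{i=1}^n E[(X_i - \mu)^2] = \frac{1}{n}\sum_{i=1}^n \operatorname{Var}(X_i) = \frac{1}{n}\,n\sigma^2 = \sigma^2.$$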

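14.7. For a Bernoulli trial note that $X_i^2 = X_i$. Expand the square to obtain

$$\sum_{i=1}^n (X_i - \hat p)^2 = \sum_{i=1}^n X_i^2 - 2\hat p\sum_{i=1}^n X_i + n\hat p^2 = n\hat p - 2n\hat p^2 + n\hat p^2 = n(\hat p - \hat p^2) = n\hat p(1 - \hat p).$$

Divide by $n$ to obtain the result.

14.8. Recall that $ES_u^2 = \sigma^2$. Check the second derivative to see that $g(t) = \sqrt{t}$ is concave down for all $t > 0$. For concave down functions, the direction of the inequality in Jensen's inequality is reversed. Setting $t = S_u^2$, we have that

$$ES_u = Eg(S_u^2) \le g(ES_u^2) = g(\sigma^2) = \sigma$$

and $S_u$ is a downwardly biased estimator of $\sigma$.

14.9. Set $g(p) = p^2$. Then, $g''(p) = 2$. Recall that the variance of a Bernoulli random variable is $\sigma^2 = p(1-p)$ and the bias

$$b_g(p) \approx \frac{1}{2}g''(p)\frac{\sigma^2}{n} = \frac{1}{2}\cdot 2\cdot\frac{p(1-p)}{n} = \frac{p(1-p)}{n}.$$

$$\operatorname{Cov}(Y, Z) = EYZ - EY\cdot EZ = EYZ \quad\text{whenever } EZ = 0.$$

14.16. For independent normal random variables with known variance $\sigma_0^2$ and unknown mean $\mu$, the density is

$$f(x|\mu) = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma_0^2}\right), \qquad \ln f(x|\mu) = -\ln(\sigma_0\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma_0^2}.$$

Thus, the score function is

$$\frac{\partial}{\partial\mu}\ln f(x|\mu) = \frac{1}{\sigma_0^2}(x - \mu),$$

and the Fisher information associated to a single observation is

$$I(\mu) = E\left[\left(\frac{\partial}{\partial\mu}\ln f(X|\mu)\right)^2\right] = \frac{1}{\sigma_0^4}E[(X - \mu)^2] = \frac{1}{\sigma_0^4}\operatorname{Var}(X) = \frac{1}{\sigma_0^2}.$$

Again, the information is the reciprocal of the variance. Thus, by the Cramér-Rao lower bound, any unbiased estimator based on $n$ observations must have variance at least $\sigma_0^2/n$. However, if we take $d(x) = \bar x$, then

$$\operatorname{Var}_\mu d(X) = \frac{\sigma_0^2}{n}$$

and $\bar x$ is a uniformly minimum variance unbiased estimator.

14.17. First, we take two derivatives of $\ln f(x|\theta)$:

$$\frac{\partial \ln f(x|\theta)}{\partial\theta} = \frac{\partial f(x|\theta)/\partial\theta}{f(x|\theta)} \qquad (14.19)$$

and

$$\frac{\partial^2 \ln f(x|\theta)}{\partial\theta^2} = \frac{\partial^2 f(x|\theta)/\partial\theta^2}{f(x|\theta)} - \frac{(\partial f(x|\theta)/\partial\theta)^2}{f(x|\theta)^2} = \frac{\partial^2 f(x|\theta)/\partial\theta^2}{f(x|\theta)} - \left(\frac{\partial \ln f(x|\theta)}{\partial\theta}\right)^2$$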

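upon substitution from identity (14.19). Thus, the expected values satisfy

$$E_\theta\left[\frac{\partial^2 \ln f(X|\theta)}{\partial\theta^2}\right] = E_\theta\left[\frac{\partial^2 f(X|\theta)/\partial\theta^2}{f(X|\theta)}\right] - E_\theta\left[\left(\frac{\partial \ln f(X|\theta)}{\partial\theta}\right)^2\right].$$

Consequently, the exercise is complete if we show that

$$E_\theta\left[\frac{\partial^2 f(X|\theta)/\partial\theta^2}{f(X|\theta)}\right] = 0.$$

However, for a continuous random variable,

$$E_\theta\left[\frac{\partial^2 f(X|\theta)/\partial\theta^2}{f(X|\theta)}\right] = \int \frac{\partial^2 f(x|\theta)/\partial\theta^2}{f(x|\theta)}\,f(x|\theta)\,dx = \int \frac{\partial^2 f(x|\theta)}{\partial\theta^2}\,dx = \frac{\partial^2}{\partial\theta^2}\int f(x|\theta)\,dx = \frac{\partial^2}{\partial\theta^2}1 = 0.$$

Note that the computation requires that we be able to pass two derivatives with respect to $\theta$ through the integral sign.

14.21. The Bernoulli density is

$$f(x|p) = p^x(1-p)^{1-x} = (1-p)\left(\frac{p}{1-p}\right)^x = (1-p)\exp\left(x\ln\frac{p}{1-p}\right).$$

Thus, $c(p) = 1 - p$, $h(x) = 1$ and the natural parameter is $\pi(p) = \ln\frac{p}{1-p}$, the log-odds.

14.22. The normal density is

$$f(x|\mu) = \frac{1}{\sigma_0\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma_0^2}\right) = \frac{1}{\sigma_0\sqrt{2\pi}}e^{-\mu^2/2\sigma_0^2}\,e^{-x^2/2\sigma_0^2}\exp\left(\frac{x\mu}{\sigma_0^2}\right).$$

Thus, $c(\mu) = \frac{1}{\sigma_0\sqrt{2\pi}}e^{-\mu^2/2\sigma_0^2}$, $h(x) = e^{-x^2/2\sigma_0^2}$ and the natural parameter is $\pi(\mu) = \mu/\sigma_0^2$.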
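The two factorizations in answers 14.21 and 14.22 can be checked numerically. The sketch below is our own check, not part of the text; the parameter values ($p = 0.3$, $\mu = 1.2$, $\sigma_0 = 2$) and the grid of $x$ values are arbitrary assumptions. It compares R's built-in densities `dbinom` and `dnorm` with the product $c(\theta)h(x)\exp(\pi(\theta)x)$.

```r
# Bernoulli: p^x (1 - p)^(1 - x) versus c(p) h(x) exp(pi(p) x)
p <- 0.3        # arbitrary parameter value for the check
x <- c(0, 1)    # the two possible outcomes
bern_direct <- dbinom(x, size = 1, prob = p)
bern_family <- (1 - p) * 1 * exp(x * log(p / (1 - p)))  # c(p) = 1 - p, h(x) = 1
bern_ok <- isTRUE(all.equal(bern_direct, bern_family))

# Normal with known sd sigma0: density versus c(mu) h(x) exp(pi(mu) x)
mu     <- 1.2   # arbitrary mean for the check
sigma0 <- 2     # arbitrary known standard deviation
xs     <- seq(-3, 5, by = 0.5)
norm_direct <- dnorm(xs, mean = mu, sd = sigma0)
c_mu <- exp(-mu^2 / (2 * sigma0^2)) / (sigma0 * sqrt(2 * pi))
h_x  <- exp(-xs^2 / (2 * sigma0^2))
norm_family <- c_mu * h_x * exp(xs * mu / sigma0^2)     # pi(mu) = mu / sigma0^2
norm_ok <- isTRUE(all.equal(norm_direct, norm_family))

c(bernoulli = bern_ok, normal = norm_ok)
```

Agreement at a handful of points is only a sanity check on the algebra; the identities themselves follow from the displayed manipulations.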

