BASIC STATISTICS

Date: September 27, 2004.




1. SAMPLES, RANDOM SAMPLING AND SAMPLE STATISTICS

1.1. Random Sample. The random variables X_1, X_2, \ldots, X_n are called a random sample of size n from the population f(x) if X_1, X_2, \ldots, X_n are mutually independent random variables and the marginal probability density function of each X_i is the same function f(x). Alternatively, X_1, X_2, \ldots, X_n are called independent and identically distributed random variables with pdf f(x). We abbreviate independent and identically distributed as iid.

Most experiments involve n > 1 repeated observations on a particular variable: the first observation is X_1, the second is X_2, and so on. Each X_i is an observation on the same variable and each X_i has a marginal distribution given by f(x). Given that the observations are collected in such a way that the value of one observation has no effect on or relationship with any of the other observations, X_1, X_2, \ldots, X_n are mutually independent. Therefore we can write the joint probability density for the sample X_1, X_2, \ldots, X_n as

    f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i)    (1)

If the underlying probability model is parameterized by \theta, then we can also write

    f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)    (2)

Note that the same \theta is used in each term of the product, i.e., in each marginal density. A different value of \theta would lead to different properties for the random sample.

1.2. Statistics. Let X_1, X_2, \ldots, X_n be a random sample of size n from a population and let T(x_1, x_2, \ldots, x_n) be a real-valued or vector-valued function whose domain includes the sample space of (X_1, X_2, \ldots, X_n). Then the random variable or random vector Y = T(X_1, X_2, \ldots, X_n) is called a statistic. A statistic is a map from the sample space of (X_1, X_2, \ldots, X_n), call it X, to some space of values, usually R^1 or R^n. T is what we compute when we observe the random variable X take on some specific values in a sample. The probability distribution of a statistic Y = T(X) is called the sampling distribution of Y. Notice that T(\cdot) is a function of sample values only; it does not depend on any underlying parameters \theta.

1.3. Some Commonly Used Statistics.

1.3.1. Sample mean. The sample mean is the arithmetic average of the values in a random sample. It is usually denoted

    \bar{X}(X_1, X_2, \ldots, X_n) = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i    (3)

The observed value of \bar{X} in any sample is denoted by the lower case letter, i.e., \bar{x}.
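To make equations (1) and (3) concrete, here is a minimal numerical sketch (Python with numpy; the normal distribution, seed, and parameter values are illustrative assumptions, not part of the notes). It evaluates the joint density of an iid sample as the product of the marginal densities and computes the sample mean.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n = 2.0, 1.5, 10
    x = rng.normal(mu, sigma, n)            # a random sample of size n

    def normal_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

    joint = np.prod(normal_pdf(x, mu, sigma))   # equation (1): product of the marginals
    xbar = x.sum() / n                          # equation (3): the sample mean
    print(joint, xbar)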

1.3.2. Sample variance. The sample variance is the statistic defined by

    S^2(X_1, X_2, \ldots, X_n) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2    (4)

The observed value of S^2 in any sample is denoted by the lower case letter, i.e., s^2.

1.3.3. Sample standard deviation. The sample standard deviation is the statistic defined by

    S = \sqrt{S^2}    (5)

1.3.4. Sample midrange. The sample midrange is the statistic defined by

    \frac{\max(X_1, X_2, \ldots, X_n) + \min(X_1, X_2, \ldots, X_n)}{2}    (6)

1.3.5. Empirical distribution function. The empirical distribution function is defined by

    \hat{F}_{(X_1, X_2, \ldots, X_n)}(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i < x)    (7)

where \hat{F}_{(X_1, X_2, \ldots, X_n)}(x) means we are evaluating the statistic \hat{F} at the particular value x. The random sample X_1, X_2, \ldots, X_n is assumed to come from a probability defined on R, and I(A) is the indicator of the event A. This statistic takes values in the set of all distribution functions on R. It estimates the function-valued parameter F defined by its evaluation at x \in R:

    F(P)(x) = P[X < x]    (8)

2. DISTRIBUTION OF SAMPLE STATISTICS

2.1. Theorem 1 on squared deviations and sample variances.

Theorem 1. Let x_1, x_2, \ldots, x_n be any numbers and let \bar{x} = (x_1 + x_2 + \cdots + x_n)/n. Then the following two items hold.

a: \min_a \sum_{i=1}^{n} (x_i - a)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2
b: (n-1) s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2

Part a says that the sample mean is the value about which the sum of squared deviations is minimized. Part b is a simple identity that will prove immensely useful in dealing with statistical data.
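Before turning to the proof, theorem 1 is easy to check numerically. The sketch below (Python/numpy; the data vector is an arbitrary illustration) verifies the identity in part b directly and confirms by grid search that the sum of squared deviations about a is smallest at a = \bar{x}.

    import numpy as np

    x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
    n, xbar = len(x), x.mean()

    # Part b: sum of squared deviations equals sum of squares minus n * xbar^2
    lhs = ((x - xbar) ** 2).sum()
    rhs = (x ** 2).sum() - n * xbar ** 2
    assert np.isclose(lhs, rhs)

    # Part a: sum_i (x_i - a)^2 is minimized at a = xbar
    grid = np.linspace(xbar - 2.0, xbar + 2.0, 2001)
    ssd = np.array([((x - a) ** 2).sum() for a in grid])
    assert abs(grid[ssd.argmin()] - xbar) < 1e-2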

Proof. First consider part a of theorem 1. Add and subtract \bar{x} from the expression on the left-hand side in part a and then expand as follows:

    \sum_{i=1}^{n} (x_i - \bar{x} + \bar{x} - a)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + 2 \sum_{i=1}^{n} (x_i - \bar{x})(\bar{x} - a) + n (\bar{x} - a)^2    (9)

Now write out the middle term in (9) and simplify:

    \sum_{i=1}^{n} (x_i - \bar{x})(\bar{x} - a) = \bar{x} \sum_{i=1}^{n} x_i - a \sum_{i=1}^{n} x_i - n \bar{x}^2 + n \bar{x} a
                                              = n \bar{x}^2 - a n \bar{x} - n \bar{x}^2 + n \bar{x} a = 0    (10)

We can then write (9) as

    \sum_{i=1}^{n} (x_i - a)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n (\bar{x} - a)^2    (11)

Equation (11) is clearly minimized when a = \bar{x}. Now consider part b of theorem 1. Expand the second expression in part b and simplify:

    \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2 \bar{x} \sum_{i=1}^{n} x_i + n \bar{x}^2 = \sum_{i=1}^{n} x_i^2 - 2 n \bar{x}^2 + n \bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2    (12)

2.2. Theorem 2 on expected values and variances of sums.

Theorem 2. Let X_1, X_2, \ldots, X_n be a random sample from a population and let g(x) be a function such that E g(X_1) and Var g(X_1) exist. Then the following two items hold.

a: E( \sum_{i=1}^{n} g(X_i) ) = n E g(X_1)
b: Var( \sum_{i=1}^{n} g(X_i) ) = n Var g(X_1)

Proof. First consider part a of theorem 2. Write the expected value of the sum as the sum of the expected values and then note that E g(X_1) = E g(X_2) = \cdots = E g(X_i) = \cdots = E g(X_n) because the X_i are all from the same distribution:

    E( \sum_{i=1}^{n} g(X_i) ) = \sum_{i=1}^{n} E g(X_i) = n E g(X_1)    (13)

Now consider part b of theorem 2. Write the definition of the variance for a variable z as E( z - E z )^2 and then combine terms inside the summation sign:

    Var( \sum_{i=1}^{n} g(X_i) ) = E[ \sum_{i=1}^{n} g(X_i) - E( \sum_{i=1}^{n} g(X_i) ) ]^2    (14)

Now write out the expression in equation (14): it consists of the n squared terms plus the n(n-1) cross-product terms,

    Var( \sum_{i=1}^{n} g(X_i) ) = \sum_{i=1}^{n} E[ g(X_i) - E g(X_i) ]^2 + \sum_{i=1}^{n} \sum_{j \ne i} E\{ [ g(X_i) - E g(X_i) ][ g(X_j) - E g(X_j) ] \}    (15)

Each of the squared terms in the summation is a variance, i.e., the variance of g(X_i). Specifically,

    E[ g(X_i) - E g(X_i) ]^2 = Var g(X_i) = Var g(X_1)    (16)

The other terms in the summation in (15) are covariances of the form

    E\{ [ g(X_i) - E g(X_i) ][ g(X_j) - E g(X_j) ] \} = Cov[ g(X_i), g(X_j) ]    (17)

Now we can use the fact that the X_i and X_j in the sample X_1, X_2, \ldots, X_n are independent to assert that each of the covariances in the sum in (15) is zero. We can then rewrite (15) as

    Var( \sum_{i=1}^{n} g(X_i) ) = \sum_{i=1}^{n} E[ g(X_i) - E g(X_i) ]^2 = Var g(X_1) + Var g(X_2) + \cdots + Var g(X_n) = n Var g(X_1)    (18)

2.3. Theorem 3 on expected values of sample statistics.

Theorem 3. Let X_1, X_2, \ldots, X_n be a random sample from a population with mean \mu and variance \sigma^2 < \infty. Then

a: E \bar{X} = \mu
b: Var \bar{X} = \sigma^2 / n
c: E S^2 = \sigma^2

Proof of part a. In theorem 2 let g(X) = g(X_i) = X_i / n. This implies that E g(X_i) = \mu / n. Then we can write

    E \bar{X} = E( \frac{1}{n} \sum_{i=1}^{n} X_i ) = \frac{1}{n} \sum_{i=1}^{n} E X_i = \frac{1}{n} n \mu = \mu    (19)

Proof of part b. In theorem 2 let g(X) = g(X_i) = X_i. This implies that Var g(X_i) = \sigma^2. Then we can write

    Var \bar{X} = Var( \frac{1}{n} \sum_{i=1}^{n} X_i ) = \frac{1}{n^2} Var( \sum_{i=1}^{n} X_i ) = \frac{1}{n^2} n \sigma^2 = \frac{\sigma^2}{n}    (20)

Proof of part c. As in part b of theorem 1, write S^2 as a function of the sum of squares of the X_i minus n times the mean of the X_i squared and then simplify:

    E S^2 = E[ \frac{1}{n-1} ( \sum_{i=1}^{n} X_i^2 - n \bar{X}^2 ) ]
          = \frac{1}{n-1} ( n E X_1^2 - n E \bar{X}^2 )
          = \frac{1}{n-1} ( n (\sigma^2 + \mu^2) - n ( \frac{\sigma^2}{n} + \mu^2 ) ) = \sigma^2    (21)

The last line follows from the definition of the variance of a random variable, i.e.,

    Var X = \sigma_X^2 = E X^2 - (E X)^2 = E X^2 - \mu_X^2  \Rightarrow  E X^2 = \sigma_X^2 + \mu_X^2    (22)
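A Monte Carlo check of theorem 3 (again a Python/numpy sketch; the normal population, seed, and values of \mu, \sigma, n are illustrative assumptions): averaging over many simulated samples, \bar{X} centers on \mu with variance near \sigma^2/n, and S^2 centers on \sigma^2.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, reps = 5.0, 2.0, 20, 200_000

    samples = rng.normal(mu, sigma, (reps, n))
    xbars = samples.mean(axis=1)
    s2 = samples.var(axis=1, ddof=1)    # divisor n-1, the S^2 of equation (4)

    print(xbars.mean())     # approx mu            (part a)
    print(xbars.var())      # approx sigma^2 / n   (part b)
    print(s2.mean())        # approx sigma^2       (part c)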

2.4. Unbiased Statistics. We say that a statistic T(X) is an unbiased statistic for the parameter \theta of the underlying probability distribution if E T(X) = \theta. Given this definition, \bar{X} is an unbiased statistic for \mu, and S^2 is an unbiased statistic for \sigma^2, in a random sample.

3. METHODS OF ESTIMATION

Let Y_1, Y_2, \ldots, Y_n denote a random sample from a parent population characterized by the parameters \theta_1, \theta_2, \ldots, \theta_k. It is assumed that the random variable Y has an associated density function f( \cdot ; \theta_1, \theta_2, \ldots, \theta_k ).

3.1. Method of Moments.

3.1.1. Definition of Moments. If Y is a random variable, the rth moment of Y, usually denoted by \mu'_r, is defined as

    \mu'_r = E(Y^r) = \int y^r f(y; \theta_1, \theta_2, \ldots, \theta_k) \, dy    (23)

if the expectation exists. Note that \mu'_1 = E(Y) = \mu_Y, the mean of Y. Moments are sometimes written as functions of \theta:

    E(Y^r) = \mu'_r = g_r(\theta_1, \theta_2, \ldots, \theta_k)    (24)

3.1.2. Definition of Central Moments. If Y is a random variable, the rth central moment of Y about a is defined as E[(Y - a)^r]. If a = \mu_Y, we have the rth central moment of Y about \mu_Y, denoted by \mu_r, which is

    \mu_r = E[(Y - \mu_Y)^r] = \int (y - \mu_y)^r f(y; \theta_1, \theta_2, \ldots, \theta_k) \, dy    (25)

Note that \mu_1 = E[(Y - \mu_Y)] = 0 and \mu_2 = E[(Y - \mu_Y)^2] = Var[Y]. Also note that all odd-numbered moments of Y around its mean are zero for symmetrical distributions, provided such moments exist.

3.1.3. Sample Moments about the Origin. The rth sample moment about the origin is defined as

    \hat\mu'_r = \frac{1}{n} \sum_{i=1}^{n} y_i^r    (26)
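The sample moments in equation (26) are trivial to compute; a small sketch (Python/numpy, with an arbitrary illustrative data vector) shows that the first raw moment is the sample mean and the second central moment is the variance with divisor n:

    import numpy as np

    y = np.array([1.2, 0.7, 2.9, 1.5, 0.4, 2.2])   # an arbitrary sample

    def raw_moment(y, r):
        # r-th sample moment about the origin, equation (26)
        return (y ** r).sum() / len(y)

    def central_moment(y, r):
        # r-th sample moment about the sample mean
        return ((y - y.mean()) ** r).sum() / len(y)

    print(raw_moment(y, 1), y.mean())       # first raw moment equals the sample mean
    print(central_moment(y, 2), y.var())    # second central moment: variance with divisor n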

3.1.4. Estimation Using the Method of Moments. In general \mu'_r will be a known function of the parameters \theta_1, \theta_2, \ldots, \theta_K of the distribution of Y, that is, \mu'_r = g_r(\theta_1, \theta_2, \ldots, \theta_K). Now let y_1, y_2, \ldots, y_n be a random sample from the density f( \cdot ; \theta_1, \theta_2, \ldots, \theta_K ). Form the K equations

    \mu'_1 = g_1(\theta_1, \theta_2, \ldots, \theta_K) = \hat\mu'_1 = \frac{1}{n} \sum_{i=1}^{n} y_i
    \mu'_2 = g_2(\theta_1, \theta_2, \ldots, \theta_K) = \hat\mu'_2 = \frac{1}{n} \sum_{i=1}^{n} y_i^2
    \vdots
    \mu'_K = g_K(\theta_1, \theta_2, \ldots, \theta_K) = \hat\mu'_K = \frac{1}{n} \sum_{i=1}^{n} y_i^K    (27)

The estimators of \theta_1, \theta_2, \ldots, \theta_K, based on the method of moments, are obtained by solving the system of equations for the K parameter estimates \hat\theta_1, \hat\theta_2, \ldots, \hat\theta_K. This principle of estimation is based upon the convention of picking the estimators of \theta_i in such a manner that the corresponding population (theoretical) moments are equal to the sample moments. These estimators are consistent under fairly general regularity conditions, but are not generally efficient. Method of moments estimators may also not be unique.

3.1.5. Example using the density function f(y) = (p+1)y^p. Consider a density function given by

    f(y) = (p+1) y^p for 0 ≤ y ≤ 1; 0 otherwise    (28)

Let Y_1, Y_2, \ldots, Y_n denote a random sample from the given population. Express the first moment of Y as a function of the parameters:

    E(Y) = \int_0^1 y f(y) \, dy = \int_0^1 y (p+1) y^p \, dy = \int_0^1 (p+1) y^{p+1} \, dy = \left[ \frac{(p+1) y^{p+2}}{p+2} \right]_0^1 = \frac{p+1}{p+2}    (29)

Then set this expression in the parameters equal to the first sample moment and solve for p.

    \mu'_1 = E(Y) = \frac{p+1}{p+2}

    \frac{p+1}{p+2} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}    (30)

    \Rightarrow p + 1 = (p+2) \bar{y} = p \bar{y} + 2 \bar{y}
    \Rightarrow p - p \bar{y} = 2 \bar{y} - 1
    \Rightarrow p (1 - \bar{y}) = 2 \bar{y} - 1
    \Rightarrow \hat{p} = \frac{2 \bar{y} - 1}{1 - \bar{y}}

3.1.6. Example using the Normal Distribution. Let Y_1, Y_2, \ldots, Y_n denote a random sample from a normal distribution with mean \mu and variance \sigma^2. Let (\theta_1, \theta_2) = (\mu, \sigma^2). Remember that \mu'_1 = \mu and \sigma^2 = E[Y^2] - E^2[Y] = \mu'_2 - (\mu'_1)^2.

    \mu'_1 = E(Y) = \mu
    \mu'_2 = E(Y^2) = \sigma^2 + E^2[Y] = \sigma^2 + \mu^2    (31)

Now set the first population moment equal to its sample analogue to obtain

    \mu = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}  \Rightarrow  \hat\mu = \bar{y}    (32)

Now set the second population moment equal to its sample analogue:

    \sigma^2 + \mu^2 = \frac{1}{n} \sum_{i=1}^{n} y_i^2
    \Rightarrow \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} y_i^2 - \mu^2
    \Rightarrow \sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} y_i^2 - \mu^2 }    (33)

Now replace \mu in equation (33) with its estimator from equation (32) to obtain

    \hat\sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} y_i^2 - \bar{y}^2 } = \sqrt{ \frac{ \sum_{i=1}^{n} (y_i - \bar{y})^2 }{ n } }    (34)

This is, of course, different from the sample standard deviation defined in equations (4) and (5), which uses the divisor n-1.
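Both method-of-moments examples can be checked by simulation. The sketch below (Python/numpy; seed and true parameter values are assumptions for illustration) draws from f(y) = (p+1)y^p by inverting the cdf F(y) = y^{p+1} and recovers p via equation (30), then checks the normal estimator of equation (34).

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000

    # Example 3.1.5: draw Y = U^(1/(p+1)) so that F(y) = y^(p+1)
    p_true = 3.0
    y = rng.random(n) ** (1.0 / (p_true + 1.0))
    ybar = y.mean()
    p_hat = (2 * ybar - 1) / (1 - ybar)    # equation (30)
    print(p_hat)                           # approx 3.0

    # Example 3.1.6: sigma_hat from equation (34) uses divisor n
    z = rng.normal(5.0, 2.0, n)
    sigma_hat = np.sqrt((z ** 2).mean() - z.mean() ** 2)
    print(sigma_hat)                       # approx 2.0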

3.1.7. Example using the Gamma Distribution. Let X_1, X_2, \ldots, X_n denote a random sample from a gamma distribution with parameters \theta and \alpha. The density function is given by

    f(x; \theta, \alpha) = \frac{1}{\theta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\theta} for 0 ≤ x < \infty; 0 otherwise    (35)

Find the first moment of the gamma distribution by integrating as follows:

    E(X) = \int_0^\infty x \frac{1}{\theta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\theta} \, dx = \frac{1}{\theta^\alpha \Gamma(\alpha)} \int_0^\infty x^{(1+\alpha) - 1} e^{-x/\theta} \, dx    (36)

If we multiply and divide equation (36) by \theta^{1+\alpha} \Gamma(1+\alpha), we obtain

    E(X) = \frac{ \theta^{1+\alpha} \Gamma(1+\alpha) }{ \theta^\alpha \Gamma(\alpha) } \int_0^\infty \frac{1}{ \theta^{1+\alpha} \Gamma(1+\alpha) } x^{(1+\alpha) - 1} e^{-x/\theta} \, dx    (37)

The integrand of equation (37) is a gamma density with parameters \theta and 1+\alpha. This integrand will integrate to one, so we obtain the expression in front of the integral sign as E(X):

    E(X) = \frac{ \theta^{1+\alpha} \Gamma(1+\alpha) }{ \theta^\alpha \Gamma(\alpha) } = \frac{ \theta \, \Gamma(1+\alpha) }{ \Gamma(\alpha) }    (38)

The gamma function has the property that \Gamma(t) = (t-1) \Gamma(t-1), or \Gamma(v+1) = v \Gamma(v). Replacing \Gamma(1+\alpha) with \alpha \Gamma(\alpha) in equation (38), we obtain

    E(X) = \frac{ \theta \, \alpha \Gamma(\alpha) }{ \Gamma(\alpha) } = \theta \alpha    (39)

We can find the second moment by finding E(X^2). To do this we multiply the gamma density in equation (36) by x^2 instead of x. Carrying out the computation we obtain

    E(X^2) = \int_0^\infty x^2 \frac{1}{\theta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\theta} \, dx = \frac{1}{\theta^\alpha \Gamma(\alpha)} \int_0^\infty x^{(2+\alpha) - 1} e^{-x/\theta} \, dx    (40)

If we then multiply and divide (40) by \theta^{2+\alpha} \Gamma(2+\alpha), we obtain

    E(X^2) = \frac{ \theta^{2+\alpha} \Gamma(2+\alpha) }{ \theta^\alpha \Gamma(\alpha) } \int_0^\infty \frac{1}{ \theta^{2+\alpha} \Gamma(2+\alpha) } x^{(2+\alpha) - 1} e^{-x/\theta} \, dx
           = \frac{ \theta^2 \, \Gamma(2+\alpha) }{ \Gamma(\alpha) } = \frac{ \theta^2 (\alpha+1) \Gamma(1+\alpha) }{ \Gamma(\alpha) } = \frac{ \theta^2 \alpha (\alpha+1) \Gamma(\alpha) }{ \Gamma(\alpha) } = \theta^2 \alpha (\alpha+1)    (41)

Now set the first population moment equal to the sample analogue to obtain

    \theta \alpha = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}  \Rightarrow  \hat\alpha = \frac{\bar{x}}{\theta}    (42)

Now set the second population moment equal to its sample analogue:

    \theta^2 \alpha (\alpha+1) = \frac{1}{n} \sum_{i=1}^{n} x_i^2

Substituting \hat\alpha = \bar{x} / \theta gives

    \theta^2 \frac{\bar{x}}{\theta} \left( \frac{\bar{x}}{\theta} + 1 \right) = \bar{x}^2 + \bar{x} \theta = \frac{1}{n} \sum_{i=1}^{n} x_i^2
    \Rightarrow \bar{x} \theta = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2
    \Rightarrow \hat\theta = \frac{ \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2 }{ \bar{x} } = \frac{ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 }{ \bar{x} }    (43)
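A simulation check of the gamma estimators in equations (42) and (43) (Python/numpy sketch; the seed and the true values of \alpha and \theta are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    alpha_true, theta_true, n = 2.5, 1.8, 200_000
    x = rng.gamma(shape=alpha_true, scale=theta_true, size=n)

    xbar = x.mean()
    m2 = (x ** 2).mean()
    theta_hat = (m2 - xbar ** 2) / xbar    # equation (43)
    alpha_hat = xbar / theta_hat           # equation (42)
    print(theta_hat, alpha_hat)            # approx 1.8 and 2.5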

3.2. Method of least squares estimation. Consider the situation in which the Y_i from the random sample can be written in the form

    Y_i = \beta + \varepsilon_i = \hat\beta + e_i    (44)

where E(\varepsilon_i) = 0 and Var(\varepsilon_i) = \sigma^2 for all i. This is equivalent to stating that the population from which the y_i are drawn has a mean of \beta and a variance of \sigma^2.

The least squares estimator of \beta is obtained by minimizing the sum of squared errors, SSE, defined by

    SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} ( y_i - \hat\beta )^2    (45)

The idea is to pick the value of \hat\beta to estimate \beta which minimizes SSE. Pictorially, we select the value of \hat\beta which minimizes the sum of squares of the vertical deviations in figure 1.

FIGURE 1. Least Squares Estimation

The solution is obtained by finding the value of \beta that minimizes equation (45):

    \frac{\partial SSE}{\partial \beta} = 2 \sum_{i=1}^{n} ( y_i - \hat\beta )(-1) = 0  \Rightarrow  \hat\beta = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}    (46)

This method chooses values of the parameters of the underlying distribution, \theta, such that the distance between the elements of the random sample and the predicted values is minimized.
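The first-order condition in equation (46) can be confirmed numerically: scanning SSE(b) over a grid of candidate values, the minimizer coincides with \bar{y} (Python/numpy sketch; the data-generating values and seed are illustrative assumptions).

    import numpy as np

    rng = np.random.default_rng(4)
    beta_true, n = 4.0, 500
    y = beta_true + rng.normal(0.0, 1.0, n)   # y_i = beta + eps_i, equation (44)

    # SSE(b) = sum (y_i - b)^2; scan a grid and confirm the minimizer is ybar
    grid = np.linspace(y.mean() - 1.0, y.mean() + 1.0, 4001)
    sse = ((y[None, :] - grid[:, None]) ** 2).sum(axis=1)
    print(grid[sse.argmin()], y.mean())       # the two agree, equation (46)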

3.3. Method of maximum likelihood estimation (MLE). Least squares is independent of a specification of a density function for the parent population. Now assume that

    y_i \sim f( \cdot ; \theta = (\theta_1, \ldots, \theta_K) ), i = 1, \ldots, n    (47)

3.3.1. Motivation for the MLE method. If a random variable Y has a probability density function f( \cdot ; \theta) characterized by the parameters \theta = (\theta_1, \ldots, \theta_k), then the maximum likelihood estimators (MLE) of \theta_1, \ldots, \theta_k are the values of these parameters which would have most likely generated the given sample.

3.3.2. Theoretical development of the MLE method. The joint density of a random sample y_1, y_2, \ldots, y_n is given by L = g(y_1, \ldots, y_n; \theta) = f(y_1; \theta) f(y_2; \theta) f(y_3; \theta) \cdots f(y_n; \theta). Given that we have a random sample, the joint density is just the product of the marginal density functions. This is referred to as the likelihood function. The MLE of the \theta_i are the \theta_i which maximize the likelihood function. The necessary conditions for an optimum are:

    \frac{\partial L}{\partial \theta_i} = 0, i = 1, 2, \ldots, k    (48)

This gives k equations in k unknowns to solve for the k parameters \theta_1, \ldots, \theta_k. In many instances it will be convenient to maximize \ell = \ln L rather than L, given that the log of a product is the sum of the logs.

3.3.3. Example 1. Let the random variable X_i be distributed as a normal N(\mu, \sigma^2), so that its density is given by

    f(x_i; \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 }    (49)

Its likelihood function is given by

    L = \prod_{i=1}^{n} f(x_i; \mu, \sigma^2) = f(x_1) f(x_2) \cdots f(x_n) = \left( \frac{1}{2 \pi \sigma^2} \right)^{n/2} e^{ -\frac{1}{2 \sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 }    (50)

    \ln L = -\frac{n}{2} \ln( 2 \pi \sigma^2 ) - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

The MLEs of \mu and \sigma^2 are obtained by taking the partial derivatives of equation (50).

    \frac{\partial \ln L}{\partial \mu} = \frac{1}{\hat\sigma^2} \sum_{i=1}^{n} (x_i - \hat\mu) = 0  \Rightarrow  \hat\mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}

    \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2} \left[ \frac{2 \pi}{2 \pi \hat\sigma^2} \right] + \frac{1}{2 (\hat\sigma^2)^2} \sum_{i=1}^{n} (x_i - \hat\mu)^2 = 0
    \Rightarrow n \hat\sigma^2 = \sum_{i=1}^{n} (x_i - \hat\mu)^2
    \Rightarrow \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat\mu)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n-1}{n} s^2    (51)

The MLE of \sigma^2 is equal to the sample variance with divisor n and not S^2; hence, the MLE is not unbiased, as can be seen from equation (21). The MLE of \mu is the sample mean.
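A quick check of equation (51) (Python/numpy sketch; the simulated sample and its parameters are illustrative assumptions): the closed-form (\hat\mu, \hat\sigma^2) yields a larger log-likelihood than nearby parameter values.

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(10.0, 3.0, 1_000)
    n = len(x)

    def loglik(mu, sig2):
        # ln L from equation (50)
        return -0.5 * n * np.log(2 * np.pi * sig2) - ((x - mu) ** 2).sum() / (2 * sig2)

    mu_hat = x.mean()                             # equation (51)
    sig2_hat = ((x - mu_hat) ** 2).sum() / n      # divisor n, not n-1

    # The closed-form MLE beats nearby parameter values
    for dmu, ds in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]:
        assert loglik(mu_hat, sig2_hat) > loglik(mu_hat + dmu, sig2_hat + ds)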

3.3.4. Example 2 — the Poisson distribution. The random variable X_i is distributed as a Poisson if the density of X_i is given by

    f(x_i; \lambda) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} for x_i a non-negative integer; 0 otherwise

with mean(X) = \lambda and Var(X) = \lambda. The likelihood function is given by

    L = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}    (52)

      = \frac{ e^{-n \lambda} \lambda^{\sum x_i} }{ \prod_{i=1}^{n} x_i! }    (53)

    \ln L = -n \lambda + ( \sum_{i=1}^{n} x_i ) \ln \lambda - \ln( \prod_{i=1}^{n} x_i! )

To obtain an MLE of \lambda, differentiate \ln L with respect to \lambda:

    \frac{\partial \ln L}{\partial \lambda} = -n + \frac{ \sum_{i=1}^{n} x_i }{ \lambda } = 0  \Rightarrow  \hat\lambda = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}    (54)

3.3.5. Example 3. Consider the density function

    f(y) = (p+1) y^p for 0 ≤ y ≤ 1; 0 otherwise    (55)

The likelihood function is given by

    L = \prod_{i=1}^{n} (p+1) y_i^p

    \ln L = \sum_{i=1}^{n} \ln[ (p+1) y_i^p ] = \sum_{i=1}^{n} ( \ln(p+1) + p \ln y_i ) = n \ln(p+1) + p \sum_{i=1}^{n} \ln y_i    (56)

To obtain the MLE, differentiate (56) with respect to p:

    \frac{\partial \ln L}{\partial p} = \frac{n}{p+1} + \sum_{i=1}^{n} \ln y_i = 0
    \Rightarrow \frac{n}{\hat{p}+1} = -\sum_{i=1}^{n} \ln y_i
    \Rightarrow \hat{p} + 1 = \frac{-n}{ \sum_{i=1}^{n} \ln y_i }
    \Rightarrow \hat{p} = \frac{-n}{ \sum_{i=1}^{n} \ln y_i } - 1    (57)
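Simulation checks of equations (54) and (57) (Python/numpy sketch; seeds and true parameter values are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100_000

    # Example 2 (Poisson): lambda_hat = xbar, equation (54)
    lam_true = 4.0
    x = rng.poisson(lam_true, n)
    print(x.mean())                        # approx 4.0

    # Example 3: draw from f(y) = (p+1) y^p by inverting F(y) = y^(p+1)
    p_true = 2.0
    y = rng.random(n) ** (1.0 / (p_true + 1.0))
    p_hat = -n / np.log(y).sum() - 1.0     # equation (57)
    print(p_hat)                           # approx 2.0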

3.3.6. Example 4. Consider the density function

    f(y_i) = p^{y_i} (1-p)^{1-y_i}, 0 ≤ p ≤ 1    (58)

The likelihood function is given by

    L = \prod_{i=1}^{n} p^{y_i} (1-p)^{1-y_i} = p^{\sum y_i} (1-p)^{n - \sum y_i}

    \ln L = ( \sum_{i=1}^{n} y_i ) \ln p + ( n - \sum_{i=1}^{n} y_i ) \ln(1-p)    (59)

To obtain the MLE, differentiate (59) with respect to p, where we assume that 0 < p < 1:

    \frac{\partial \ln L}{\partial p} = \frac{ \sum y_i }{ p } - \frac{ n - \sum y_i }{ 1-p } = 0
    \Rightarrow (1-p) \sum y_i = p ( n - \sum y_i )
    \Rightarrow \sum y_i - p \sum y_i = p n - p \sum y_i
    \Rightarrow \sum y_i = p n
    \Rightarrow \hat{p} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}    (60)

3.4. Principle of Best Linear Unbiased Estimation (BLUE).

3.4.1. Principle of Best Linear Unbiased Estimation. Start with some desired properties and deduce an estimator satisfying them. For example, suppose that we want the estimator to be linear in the observed random variables. This means that if the observations are y_1, \ldots, y_n, an estimator of \theta must satisfy

    \hat\theta = \sum_{i=1}^{n} a_i y_i    (61)

where the a_i are to be determined.

3.4.2. Some required properties of the estimator (arbitrary).

1: E(\hat\theta) = \theta (unbiased)
2: Var(\hat\theta) ≤ Var(\tilde\theta) (minimum variance), where \tilde\theta is any other linear combination of the y_i that also produces an unbiased estimator.

3.4.3. Example. Let Y_1, Y_2, \ldots, Y_n denote a random sample drawn from a population having a mean \mu and variance \sigma^2. Now derive the best linear unbiased estimator (BLUE) of \mu. Let the proposed estimator be denoted by \hat\theta. It is linear, so we can write it as follows:

    \hat\theta = \sum_{i=1}^{n} a_i y_i    (62)

If the estimator is to be unbiased, there will be restrictions on the a_i. Specifically:

Unbiasedness requires

    E(\hat\theta) = E( \sum_{i=1}^{n} a_i y_i ) = \sum_{i=1}^{n} a_i E(y_i) = \sum_{i=1}^{n} a_i \mu = \mu \sum_{i=1}^{n} a_i = \mu  \Rightarrow  \sum_{i=1}^{n} a_i = 1    (63)

Now consider the variance of \hat\theta:

    Var(\hat\theta) = Var( \sum_{i=1}^{n} a_i y_i ) = \sum_{i=1}^{n} a_i^2 Var(y_i) + \sum \sum_{i \ne j} a_i a_j Cov(y_i, y_j) = \sigma^2 \sum_{i=1}^{n} a_i^2    (64)

because the covariance between y_i and y_j (i ≠ j) is equal to zero, due to the fact that the y's are drawn from a random sample.

The problem of obtaining a BLUE of \mu becomes that of minimizing \sum_{i=1}^{n} a_i^2 subject to the constraint \sum_{i=1}^{n} a_i = 1. This is done by setting up a Lagrangian:

    L(a, \lambda) = \sum_{i=1}^{n} a_i^2 - \lambda ( \sum_{i=1}^{n} a_i - 1 )    (65)

The necessary conditions for an optimum are

    \frac{\partial L}{\partial a_1} = 2 a_1 - \lambda = 0
    \vdots
    \frac{\partial L}{\partial a_n} = 2 a_n - \lambda = 0
    \frac{\partial L}{\partial \lambda} = -\sum_{i=1}^{n} a_i + 1 = 0    (66)

The first n equations imply that a_1 = a_2 = a_3 = \cdots = a_n, so that the last equation implies that

    \sum_{i=1}^{n} a_i - 1 = 0  \Rightarrow  n a_i - 1 = 0  \Rightarrow  a_i = \frac{1}{n}  \Rightarrow  \hat\theta = \sum_{i=1}^{n} a_i y_i = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}    (67)

Note that equal weights are assigned to each observation.

4. FINITE SAMPLE PROPERTIES OF ESTIMATORS

4.1. Introduction to sample properties of estimators. In section 3 we discussed alternative methods of estimating the unknown parameters in a model. In order to compare the estimating techniques we will discuss some criteria which are frequently used in such a comparison. Let \theta denote an unknown parameter and let \hat\theta and \tilde\theta be alternative estimators. Now define the bias, variance, and mean squared error of \hat\theta as

    Bias(\hat\theta) = E(\hat\theta) - \theta
    Var(\hat\theta) = E( \hat\theta - E(\hat\theta) )^2
    MSE(\hat\theta) = E( \hat\theta - \theta )^2 = Var(\hat\theta) + ( Bias(\hat\theta) )^2    (68)

The result on mean squared error can be seen as follows:

    MSE(\hat\theta) = E( \hat\theta - \theta )^2
    = E( \hat\theta - E(\hat\theta) + E(\hat\theta) - \theta )^2
    = E[ ( \hat\theta - E(\hat\theta) ) + ( E(\hat\theta) - \theta ) ]^2
    = E( \hat\theta - E(\hat\theta) )^2 + 2 E[ ( \hat\theta - E(\hat\theta) ) ( E(\hat\theta) - \theta ) ] + ( E(\hat\theta) - \theta )^2
    = Var(\hat\theta) + ( Bias(\hat\theta) )^2    (69)

since E( \hat\theta - E(\hat\theta) ) = E(\hat\theta) - E(\hat\theta) = 0 and E(\hat\theta) - \theta is a constant.
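The decomposition in equations (68) and (69) can be verified by Monte Carlo, here using the biased variance estimator with divisor n as \hat\theta (Python/numpy sketch; the normal population, seed, and values of \sigma, n are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(7)
    mu, sigma, n, reps = 0.0, 2.0, 10, 400_000

    samples = rng.normal(mu, sigma, (reps, n))
    sig2_hat = samples.var(axis=1, ddof=0)        # biased estimator, divisor n

    mse = ((sig2_hat - sigma ** 2) ** 2).mean()
    var = sig2_hat.var()
    bias = sig2_hat.mean() - sigma ** 2
    print(mse, var + bias ** 2)                   # equation (68): the two agree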

4.2. Specific properties of estimators.

4.2.1. Unbiasedness. \hat\theta is said to be an unbiased estimator of \theta if E(\hat\theta) = \theta. In figure 2, \hat\theta is an unbiased estimator of \theta, while \tilde\theta is a biased estimator.

FIGURE 2. Unbiased Estimator

4.2.2. Minimum variance. \hat\theta is said to be a minimum variance estimator of \theta if

    Var(\hat\theta) ≤ Var(\tilde\theta)    (70)

where \tilde\theta is any other estimator of \theta. This criterion has its disadvantages, as can be seen by noting that \hat\theta = constant has zero variance and yet completely ignores any sample information that we may have. In figure 3, \tilde\theta has a lower variance than \hat\theta.

FIGURE 3. Estimators with the Same Mean but Different Variances

4.2.3. Mean squared error efficient. \hat\theta is said to be an MSE efficient estimator of \theta if

    MSE(\hat\theta) ≤ MSE(\tilde\theta)    (71)

where \tilde\theta is any other estimator of \theta. This criterion takes into account both the variance and the bias of the estimator under consideration. Figure 4 shows three alternative estimators of \theta.

FIGURE 4. Three Alternative Estimators

4.2.4. Best linear unbiased estimators. \hat\theta is the best linear unbiased estimator (BLUE) of \theta if

    \hat\theta = \sum_{i=1}^{n} a_i y_i (linear)
    E(\hat\theta) = \theta (unbiased)    (72)
    Var(\hat\theta) ≤ Var(\tilde\theta)

where \tilde\theta is any other linear unbiased estimator of \theta. For the class of unbiased estimators of \theta, the efficient estimators will also be minimum variance estimators.

4.2.5. Example. Let X_1, X_2, \ldots, X_n denote a random sample drawn from a population having a population mean equal to \mu and a population variance equal to \sigma^2. The sample mean (estimator of \mu) is calculated by the formula

    \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i    (73)

and is an unbiased estimator of \mu from theorem 3 and equation (19).

Two possible estimators of the population variance are

    \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2
    S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

We have shown previously in theorem 3 and equation (21) that \hat\sigma^2 is a biased estimator of \sigma^2, whereas S^2 is an unbiased estimator of \sigma^2. Note also that

    \hat\sigma^2 = \frac{n-1}{n} S^2
    E(\hat\sigma^2) = \frac{n-1}{n} E(S^2) = \frac{n-1}{n} \sigma^2    (74)

Also from theorem 3 and equation (20), we have that

    Var(\bar{X}) = \frac{\sigma^2}{n}    (75)

Now consider the mean square error of the two estimators \bar{X} and S^2, where X_1, X_2, \ldots, X_n are a random sample from a normal population with a mean of \mu and a variance of \sigma^2.

    E( \bar{X} - \mu )^2 = Var(\bar{X}) = \frac{\sigma^2}{n}

    E( S^2 - \sigma^2 )^2 = Var(S^2) = \frac{2 \sigma^4}{n-1}    (76)

The variance of S^2 was derived in the lecture on sample moments. The variance of \hat\sigma^2 is easily computed given the variance of S^2. Specifically,

    Var(\hat\sigma^2) = Var( \frac{n-1}{n} S^2 ) = \left( \frac{n-1}{n} \right)^2 Var(S^2) = \left( \frac{n-1}{n} \right)^2 \frac{2 \sigma^4}{n-1} = \frac{2 (n-1) \sigma^4}{n^2}    (77)

We can compute the MSE of \hat\sigma^2 using equations (68), (74), and (77) as follows:

    MSE(\hat\sigma^2) = E( \hat\sigma^2 - \sigma^2 )^2 = \frac{2 (n-1) \sigma^4}{n^2} + \left( \frac{n-1}{n} \sigma^2 - \sigma^2 \right)^2
    = \frac{2 (n-1) \sigma^4}{n^2} + \frac{\sigma^4}{n^2}
    = \frac{2n - 1}{n^2} \sigma^4    (78)

Now compare the MSEs of S^2 and \hat\sigma^2:

    MSE(\hat\sigma^2) = \frac{2n - 1}{n^2} \sigma^4 < \frac{2}{n-1} \sigma^4 = MSE(S^2)    (79)

So \hat\sigma^2 is a biased estimator of \sigma^2 but has a lower mean squared error than the unbiased estimator S^2.
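A Monte Carlo check of equations (76) through (79) (Python/numpy sketch; normality, the seed, and the values of \sigma and n are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(8)
    sigma, n, reps = 1.0, 10, 500_000
    samples = rng.normal(0.0, sigma, (reps, n))

    s2 = samples.var(axis=1, ddof=1)         # unbiased S^2, divisor n-1
    sig2_hat = samples.var(axis=1, ddof=0)   # biased estimator, divisor n

    mse_s2 = ((s2 - sigma ** 2) ** 2).mean()
    mse_hat = ((sig2_hat - sigma ** 2) ** 2).mean()

    print(mse_hat, (2 * n - 1) / n ** 2 * sigma ** 4)   # equation (78)
    print(mse_s2, 2 / (n - 1) * sigma ** 4)             # equation (76)
    print(mse_hat < mse_s2)                             # equation (79): True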