Statistical inference: example 1. Inferential Statistics




Inferential Statistics

[Figure: POPULATION and SAMPLE.]

Statistical inference is the branch of statistics concerned with drawing conclusions and/or making decisions concerning a population based only on sample data.

Statistical inference: example 1

A clothing store chain regularly buys large quantities of a certain piece of clothing from a supplier. Each item can be classified either as good quality or as top quality. The agreements require that the delivered goods comply with predetermined quality standards. In particular, the proportion of good quality items must not exceed 25% of the total. From a consignment, 40 items are extracted: 29 of these are of top quality, whereas the remaining 11 are of good quality.

INFERENTIAL PROBLEMS:
1. provide an estimate of $\pi$ and quantify the uncertainty associated with such estimate;
2. provide an interval of reasonable values for $\pi$;
3. decide whether the delivered goods should be returned to the supplier.

Statistical inference: example 1

Formalization of the problem:
POPULATION: all the pieces of clothing of the consignment;
VARIABLE OF INTEREST: good/top quality of the item (binary variable);
PARAMETER OF INTEREST: proportion of good quality items, $\pi$;
SAMPLE: 40 items extracted from the consignment.

The value of the parameter $\pi$ is unknown, but it affects the sampling values. Sampling evidence provides information on the parameter value.

Statistical inference: example 2

A machine in an industrial plant of a bottling company fills one-liter bottles. When the machine is operating normally, the quantity of liquid inserted in a bottle has mean $\mu = 1$ liter and standard deviation $\sigma = 0.01$ liters. Every working day 10 bottles are checked and, today, the average amount of liquid in the bottles is $\bar{x} = 1.0065$ with $s = 0.0095$.

INFERENTIAL PROBLEMS:
1. provide an estimate of $\mu$ and quantify the uncertainty associated with such estimate;
2. provide an interval of reasonable values for $\mu$;
3. decide whether the machine should be stopped and serviced.

Statistical inference: example 2

Formalization of the problem:
POPULATION: all the bottles filled by the machine;
VARIABLE OF INTEREST: amount of liquid in the bottles (continuous variable);
PARAMETERS OF INTEREST: mean $\mu$ and standard deviation $\sigma$ of the amount of liquid in the bottles;
SAMPLE: 10 bottles.

The values of the parameters $\mu$ and $\sigma$ are unknown, but they affect the sampling values. Sampling evidence provides information on the parameter values.

The sample

Census survey: an attempt to gather information from each and every unit of the population of interest;
sample survey: gathers information from only a subset of the units of the population of interest.

Why use a sample?
1. Less time consuming than a census;
2. less costly to administer than a census;
3. measuring the variable of interest may involve the destruction of the population unit;
4. a population may be infinite.

Probability sampling

A probability sampling scheme is one in which every unit in the population has a chance greater than zero of being selected in the sample, and this probability can be accurately determined.

SIMPLE RANDOM SAMPLING: every unit has an equal probability of being selected, and the selection of a unit does not change the probability of selecting any other unit. For instance:
extraction with replacement;
extraction without replacement.

For populations that are large compared to the sample size, the difference between these two sampling techniques is negligible. In the following we will always assume that samples are extracted with replacement from the population of interest.

Probabilistic description of a population

Units of the population;
variable $X$ measured on the population units;
sometimes the distribution of $X$ is known, for instance
(i) $X \sim N(\mu, \sigma^2)$;
(ii) $X \sim \text{Bernoulli}(\pi)$.

Probabilistic description of a sample

The observed sampling values are $x_1, x_2, \ldots, x_n$;
BEFORE the sample is observed, the sampling values are unknown and the sample can be written as a sequence of random variables $X_1, X_2, \ldots, X_n$;
for simple random samples (with replacement):
1. $X_1, X_2, \ldots, X_n$ are i.i.d.;
2. the distribution of $X_i$ is the same as that of $X$ for every $i = 1, \ldots, n$.

Sampling distribution of a statistic (1)

Suppose that the sample is used to compute a given statistic, for instance
(i) the sample mean $\bar{X}$;
(ii) the sample variance $S^2$;
(iii) the proportion $P$ of units with a given feature;
generically, we consider an arbitrary statistic $T = g(X_1, \ldots, X_n)$, where $g$ is a given function.

Sampling distribution of a statistic (2)

Once the sample is observed, the observed value of the statistic is given by $t = g(x_1, \ldots, x_n)$;
suppose that we draw all possible samples of size $n$ from the given population and that we compute the statistic $T$ for each sample;
the sampling distribution of $T$ is the distribution of the population of the values $t$ of all possible samples.

Normal population

Suppose that $X \sim N(\mu, \sigma^2)$; in this case the statistics of interest are:
(i) the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$;
(ii) the sample variance $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$;
the corresponding observed values are $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, respectively.

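The sample mean

The sample mean is a linear combination of the variables forming the sample, and this property can be exploited in the computation of
the expected value of $\bar{X}$, that is $E(\bar{X})$;
the variance of $\bar{X}$, that is $Var(\bar{X})$;
the probability distribution of $\bar{X}$.

Expected value of the sample mean

For a simple random sample $X_1, \ldots, X_n$, the expected value of $\bar{X}$ is

$$E(\bar{X}) = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n} E(X_1 + X_2 + \cdots + X_n) = \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] = \frac{1}{n}\, n\mu = \mu$$

Variance of the sample mean

For a simple random sample $X_1, \ldots, X_n$, the variance of $\bar{X}$ is

$$Var(\bar{X}) = Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n^2} Var(X_1 + X_2 + \cdots + X_n) = \frac{1}{n^2}\left[Var(X_1) + Var(X_2) + \cdots + Var(X_n)\right] = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$

where the third equality uses the independence of $X_1, \ldots, X_n$.

Sampling distribution of the mean

For a simple random sample $X_1, X_2, \ldots, X_n$, the sample mean $\bar{X}$ has expected value $\mu$ and variance $\sigma^2/n$;
if the distribution of $X$ is normal, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$;
more generally, the central limit theorem can be applied to state that the distribution of $\bar{X}$ is APPROXIMATELY normal.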
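The two results above, $E(\bar{X}) = \mu$ and $Var(\bar{X}) = \sigma^2/n$, can be checked empirically. The following is a minimal simulation sketch, not part of the original slides: the function name, the number of replications and the seed are illustrative assumptions, and the values $\mu = 1$, $\sigma = 0.01$, $n = 10$ are borrowed from the bottling example.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


# Hypothetical settings: bottling-example parameters, 20000 replications.
means = simulate_sample_means(mu=1.0, sigma=0.01, n=10, reps=20000)

# The average of the sample means should be close to mu = 1,
# and their variance close to sigma^2 / n = 1e-5.
print(statistics.fmean(means))
print(statistics.variance(means))
```

The simulated values only approximate the theoretical ones; the agreement improves as the number of replications grows.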

The sample variance

The sample variance is defined as $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$;
if $X_i \sim N(\mu, \sigma^2)$, then $\frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1}$;
$\bar{X}$ and $S^2$ are independent.

The chi-squared distribution

Let $Z_1, \ldots, Z_r$ be i.i.d. random variables with distribution $N(0, 1)$;
the random variable $X = Z_1^2 + \cdots + Z_r^2$ is said to follow a CHI-SQUARED distribution with $r$ degrees of freedom (d.f.);
we write $X \sim \chi^2_r$;
$E(X) = r$ and $Var(X) = 2r$.

Counting problems

The variable $X$ is binary, i.e. it takes only two possible values, for instance "success" and "failure";
the random variable $X$ takes values 1 (success) and 0 (failure);
the parameter of interest is $\pi$, the proportion of units in the population with value 1;
formally, $X \sim \text{Bernoulli}(\pi)$, so that $E(X) = \pi$ and $Var(X) = \pi(1 - \pi)$.

The sample proportion (1)

Simple random sample $X_1, \ldots, X_n$;
the variable $X$ takes two values, 0 and 1, and the sample proportion is a special case of the sample mean:
$P = \frac{\sum_{i=1}^{n} X_i}{n}$;
the observed value of $P$ is $p$.
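The definition of the chi-squared distribution as a sum of squared standard normal variables can be illustrated by simulation. This is a rough sketch under assumed settings ($r = 5$, 20000 draws, fixed seed), none of which come from the slides; it only checks that the simulated mean and variance are near $r$ and $2r$.

```python
import random
import statistics


def chi_squared_draws(r, reps, seed=0):
    """Simulate `reps` values of Z_1^2 + ... + Z_r^2 with Z_i i.i.d. N(0, 1)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(r)) for _ in range(reps)]


# Hypothetical choice of degrees of freedom and number of replications.
draws = chi_squared_draws(r=5, reps=20000)

print(statistics.fmean(draws))     # expected to be near r = 5
print(statistics.variance(draws))  # expected to be near 2r = 10
```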

The sample proportion (2)

The sample proportion is such that $E(P) = \pi$ and $Var(P) = \frac{\pi(1 - \pi)}{n}$;
by the central limit theorem, the distribution of $P$ is approximately normal;
sometimes the following empirical rules are used to decide whether the normal approximation is satisfactory:
1. $n\pi > 5$ and $n(1 - \pi) > 5$;
2. $n p (1 - p) > 9$.

Estimation

Parameters are specific numerical characteristics of a population, for instance:
a proportion $\pi$;
a mean $\mu$;
a variance $\sigma^2$.
When the value of a parameter is unknown, it can be estimated on the basis of a random sample.

Point estimation

A point estimate is an estimate that consists of a single value or point; for instance, one can estimate
a mean $\mu$ with the sample mean $\bar{x}$;
a proportion $\pi$ with a sample proportion $p$;
a point estimate is always provided with its standard error, which is a measure of the uncertainty associated with the estimation process.

Estimator vs estimate

An estimator of a population parameter is a random variable that depends on sample information, whose value provides an approximation to this unknown parameter.
A specific value of that random variable is called an estimate.

Estimation and uncertainty

Parameter $\theta$;
the sample statistic $T = g(X_1, \ldots, X_n)$ on which estimation is based is called the estimator of $\theta$, and we write $\hat{\theta} = T$;
the observed value of the estimator, $t$, is called an estimate of $\theta$, and we write $\hat{\theta} = t$;
it is fundamental to assess the uncertainty of $\hat{\theta}$;
a measure of uncertainty is the standard deviation of the estimator, that is $SD(T) = SD(\hat{\theta})$. This quantity is called the STANDARD ERROR of $\hat{\theta}$ and denoted by $SE(\hat{\theta})$.

Point estimation of a mean ($\sigma^2$ known)

Consider the case where $X_1, \ldots, X_n$ is a simple random sample from $X \sim N(\mu, \sigma^2)$;
parameters: $\mu$, unknown; assume that the value of $\sigma^2$ is known;
the sample mean can be used as estimator of $\mu$: $\hat{\mu} = \bar{X}$;
the distribution of the estimator is normal with $E(\hat{\mu}) = \mu$ and $Var(\hat{\mu}) = \frac{\sigma^2}{n}$;
STANDARD ERROR: $SE(\hat{\mu}) = \frac{\sigma}{\sqrt{n}}$.

Point estimation of a mean ($\sigma^2$ unknown)

Typically the value of $\sigma^2$ is not known;
in this case we estimate it as $\hat{\sigma}^2 = s^2$;
this can be used, for instance, to estimate the standard error of $\hat{\mu}$: $\widehat{SE}(\hat{\mu}) = \frac{\hat{\sigma}}{\sqrt{n}}$.

Point estimation of a mean with $\sigma^2$ known: example

In the bottling company example, assume that the quantity of liquid in the bottles is normally distributed. Then a point estimate of $\mu$ is $\hat{\mu} = 1.0065$ and the standard error of this estimate is

$$SE(\hat{\mu}) = \frac{\sigma}{\sqrt{10}} = \frac{0.01}{\sqrt{10}} = 0.0032$$

In the bottling company example, if $\sigma$ is unknown, the standard error can be estimated as

$$\widehat{SE}(\hat{\mu}) = \frac{0.0095}{\sqrt{10}} = 0.0030$$
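The two standard errors of the bottling example follow from the same formula, $\text{sd}/\sqrt{n}$, applied once to the known $\sigma$ and once to the sample value $s$. A minimal sketch (the function name is an illustrative choice, not something defined in the slides):

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Bottling example: n = 10, sigma = 0.01 (known), s = 0.0095 (estimated).
se_known = standard_error_of_mean(0.01, 10)
se_estimated = standard_error_of_mean(0.0095, 10)

# Rounded to four decimals these match the 0.0032 and 0.0030 quoted in the text.
print(round(se_known, 4), round(se_estimated, 4))
```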

Point estimation for the mean of a non-normal population

$X_1, \ldots, X_n$ i.i.d. with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$;
the distribution of $X_i$ is not normal;
by the central limit theorem, the distribution of $\bar{X}$ is approximately normal.

Point estimation of a proportion

Parameter: $\pi$;
the sample proportion $P$ is used as an estimator of $\pi$: $\hat{\pi} = P$;
this estimator is approximately normally distributed with $E(\hat{\pi}) = \pi$ and $Var(\hat{\pi}) = \frac{\pi(1 - \pi)}{n}$;
the STANDARD ERROR of the estimator is $SE(\hat{\pi}) = \sqrt{\frac{\pi(1 - \pi)}{n}}$, and in this case the value of the standard error is never known.

Estimation of a proportion: example

For the clothing store chain example, the estimate of the proportion $\pi$ of good quality items is

$$\hat{\pi} = \frac{11}{40} = 0.275$$

and an ESTIMATE of the standard error is

$$\widehat{SE}(\hat{\pi}) = \sqrt{\frac{0.275\,(1 - 0.275)}{40}} = 0.07$$

Properties of estimators: unbiasedness

A point estimator $\hat{\theta}$ is said to be an unbiased estimator of the parameter $\theta$ if the expected value, or mean, of the sampling distribution of $\hat{\theta}$ is $\theta$; formally, if $E(\hat{\theta}) = \theta$.

Interpretation of unbiasedness: if the sampling process were repeated, independently, an infinite number of times, obtaining in this way an infinite number of estimates of $\theta$, the arithmetic mean of such estimates would be equal to $\theta$. However, unbiasedness does not guarantee that the estimate based on one single sample coincides with the value of $\theta$.
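For the clothing store example, the point estimate of $\pi$ and its estimated standard error can be reproduced in a few lines. This is a sketch; the function name is an illustrative choice and not part of the slides.

```python
import math


def proportion_estimate(successes, n):
    """Return the sample proportion and its estimated standard error."""
    p_hat = successes / n
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # plug-in estimate of SE
    return p_hat, se_hat


# Clothing example: 11 good quality items out of 40.
p_hat, se_hat = proportion_estimate(11, 40)
print(p_hat, round(se_hat, 4))
```

The unrounded standard error is about 0.0706, which the text reports as 0.07.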

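Point estimator of the variance

The sample variance $S^2$ is an unbiased estimator of the variance $\sigma^2$ of a normally distributed random variable: $E(S^2) = \sigma^2$.
On the other hand, the estimator with divisor $n$, $\tilde{S}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$, is a biased estimator of $\sigma^2$: $E(\tilde{S}^2) = \frac{n-1}{n} \sigma^2$.

Bias of an estimator

Let $\hat{\theta}$ be an estimator of $\theta$. The bias of $\hat{\theta}$, $Bias(\hat{\theta})$, is defined as the difference between the expected value of $\hat{\theta}$ and $\theta$:

$$Bias(\hat{\theta}) = E(\hat{\theta}) - \theta$$

The bias of an unbiased estimator is 0.

Properties of estimators: Mean Squared Error (MSE)

For an estimator $\hat{\theta}$ of $\theta$, the (unknown) estimation error is given by $\theta - \hat{\theta}$.
The Mean Squared Error (MSE) is the expected value of the square of the error:

$$MSE(\hat{\theta}) = E[(\theta - \hat{\theta})^2] = Var(\hat{\theta}) + [\theta - E(\hat{\theta})]^2 = Var(\hat{\theta}) + Bias(\hat{\theta})^2$$

Hence, for an unbiased estimator, the MSE is equal to the variance.

Most Efficient Estimator

Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two estimators of $\theta$; then the MSE can be used to compare the two estimators;
if both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased, then $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$;
note that if $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$, then also $MSE(\hat{\theta}_1) < MSE(\hat{\theta}_2)$ and $SE(\hat{\theta}_1) < SE(\hat{\theta}_2)$;
the most efficient estimator, or the minimum variance unbiased estimator, of $\theta$ is the unbiased estimator with the smallest variance.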
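The bias of the divisor-$n$ variance estimator can be verified exactly on a toy case by enumerating all possible samples, in the spirit of the "all possible samples" description of a sampling distribution. The sketch below rests on assumptions that are not in the slides: a hypothetical population $\{1, \ldots, 6\}$ (which is not normal), samples of size 3 drawn with replacement, and illustrative function and variable names. Exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimators(population, n):
    """Average the divisor-(n-1) and divisor-n variance estimators over all
    equally likely samples of size n drawn with replacement."""
    samples = list(product(population, repeat=n))
    total_unbiased = Fraction(0)
    total_biased = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / len(samples), total_biased / len(samples)


# Hypothetical toy population; its variance sigma^2 uses divisor N.
population = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

e_s2, e_s2_tilde = expected_variance_estimators(population, n=3)
print(sigma2, e_s2, e_s2_tilde)
```

In this toy case the first expectation equals $\sigma^2$ and the second equals $\frac{n-1}{n}\sigma^2$, so the bias of the divisor-$n$ estimator is $-\sigma^2/n$.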

Interval estimation

A point estimate consists of a single value, so that if $\bar{X}$ is a point estimator of $\mu$ then it holds that $P(\bar{X} = \mu) = 0$; more generally, $P(\hat{\theta} = \theta) = 0$.
Interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.

Confidence interval for the mean of a normal population ($\sigma$ known)

$X_1, \ldots, X_n$ simple random sample with $X_i \sim N(\mu, \sigma^2)$;
assume $\sigma$ known;
a point estimator of $\mu$ is $\hat{\mu} = \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$;
the standard error of the estimator is $SE(\hat{\mu}) = \frac{\sigma}{\sqrt{n}}$.

Before the sample is extracted...

The sampling distribution of the estimator is completely known but for the value of $\mu$;
the uncertainty associated with the estimate depends on the size of the standard error.
For instance, the probability that $\hat{\mu} = \bar{X}$ takes a value in the interval $\mu \pm 1.96\, SE$ is 0.95 (that is, 95%):

$$P(\mu - 1.96\, SE \le \bar{X} \le \mu + 1.96\, SE) = 0.95$$

[Figure: density of $\bar{X}$ centered at $\mu$, horizontal axis from $\mu - 3\,SE$ to $\mu + 3\,SE$; the area between $\mu - 1.96\,SE$ and $\mu + 1.96\,SE$ is 95%.]

Confidence interval for $\mu$

The probability that $\bar{X}$ belongs to the interval $(\mu - 1.96\,SE,\ \mu + 1.96\,SE)$ is 95%;
this can also be stated as: the probability that the interval $(\bar{X} - 1.96\,SE,\ \bar{X} + 1.96\,SE)$ contains the parameter $\mu$ is 95%.

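Formal derivation of the 95% confidence interval for $\mu$ ($\sigma$ known)

It holds that

$$\frac{\bar{X} - \mu}{SE} \sim N(0, 1) \quad \text{where } SE = \frac{\sigma}{\sqrt{n}}$$

so that

$$\begin{aligned}
0.95 &= P\left(-1.96 \le \frac{\bar{X} - \mu}{SE} \le 1.96\right) \\
&= P(-1.96\, SE \le \bar{X} - \mu \le 1.96\, SE) \\
&= P(-\bar{X} - 1.96\, SE \le -\mu \le -\bar{X} + 1.96\, SE) \\
&= P(\bar{X} - 1.96\, SE \le \mu \le \bar{X} + 1.96\, SE)
\end{aligned}$$

Confidence interval for $\mu$ with $\sigma$ known: example

In the bottling company example, if one assumes $\sigma = 0.01$ known, a 95% confidence interval for $\mu$ is

$$\left(1.0065 - 1.96\, \frac{0.01}{\sqrt{10}};\ 1.0065 + 1.96\, \frac{0.01}{\sqrt{10}}\right)$$

that is

$$(1.0065 - 0.0062;\ 1.0065 + 0.0062)$$

so that

$$(1.0003;\ 1.0127)$$

After the sample is extracted...

On the basis of the sample values, the observed value $\hat{\mu} = \bar{x}$ is computed.
$\bar{x}$ may belong to the interval $\mu \pm 1.96\, SE$ or not. For instance:

[Figure: density of $\bar{X}$ with central area 95%, horizontal axis from $\mu - 3\,SE$ to $\mu + 3\,SE$; the observed $\bar{x}$ falls inside $\mu \pm 1.96\,SE$.]

In this case $\bar{x}$ belongs to the interval $\mu \pm 1.96\, SE$ and, as a consequence, the interval $(\bar{x} - 1.96\, SE;\ \bar{x} + 1.96\, SE)$ will also contain $\mu$.

A different sample...

A different sample may lead to a sample mean $\bar{x}$ that, as in the example below, does not belong to the interval $\mu \pm 1.96\, SE$ and, as a consequence, the interval $(\bar{x} - 1.96\, SE;\ \bar{x} + 1.96\, SE)$ will not contain $\mu$ either.

[Figure: density of $\bar{X}$ with central area 95%, horizontal axis from $\mu - 3\,SE$ to $\mu + 3\,SE$; the observed $\bar{x}$ falls outside $\mu \pm 1.96\,SE$.]

The interval $(\bar{x} - 1.96\, SE;\ \bar{x} + 1.96\, SE)$ will contain $\mu$ for 95% of all possible samples.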
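The arithmetic of the bottling example interval can be retraced as follows. This is a sketch using the critical value 1.96 quoted in the text; the function name is an illustrative assumption.

```python
import math


def z_interval_95(xbar, sigma, n):
    """95% confidence interval for the mean with sigma known (z = 1.96)."""
    margin = 1.96 * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


# Bottling example: xbar = 1.0065, sigma = 0.01, n = 10.
low, high = z_interval_95(1.0065, 0.01, 10)
print(round(low, 4), round(high, 4))
```

Since the hypothesized value $\mu = 1$ lies just below the lower limit, the interval already hints at the testing question raised for this example.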

Interpretation of confidence intervals

Probability is associated with the procedure that leads to the derivation of a confidence interval, not with the interval itself. A specific interval either will contain or will not contain the true parameter, and no probability is involved in a specific interval.

[Figure: confidence intervals for five different samples of size $n = 25$, extracted from a normal population with $\mu = 368$ and $\sigma = 15$.]

Confidence interval: definition

A confidence interval for a parameter is an interval constructed using a procedure that will contain the parameter a specified proportion of the times, typically 95% of the times.
A confidence interval estimate is made up of two quantities:
interval: set of scores that represent the estimate for the parameter;
confidence level: percentage of the intervals that will include the unknown population parameter.

A wider confidence interval for $\mu$

Since it also holds that

$$P(\mu - 2.58\, SE \le \bar{X} \le \mu + 2.58\, SE) = 0.99$$

the probability that $(\bar{X} - 2.58\,SE,\ \bar{X} + 2.58\,SE)$ contains $\mu$ is 99%.

[Figure: two densities of $\bar{X}$ with central area 99%, horizontal axis from $\mu - 3\,SE$ to $\mu + 3\,SE$, each with an observed $\bar{x}$.]

Confidence level

The confidence level is the percentage associated with the interval. A larger value of the confidence level will typically lead to an increase of the interval width. The most commonly used confidence levels are
68%, associated with the interval $\bar{X} \pm 1\,SE$;
95%, associated with the interval $\bar{X} \pm 1.96\,SE$;
99%, associated with the interval $\bar{X} \pm 2.58\,SE$;
where the values 1, 1.96 and 2.58 are derived from the standard normal distribution tables.
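The statement that the procedure, not a single interval, carries the 95% probability can be illustrated by repeating the sampling many times and counting how often the interval contains $\mu$. The sketch below reuses the settings of the figure caption ($n = 25$, $\mu = 368$, $\sigma = 15$); the number of replications, the seed and the function name are assumptions made for illustration.

```python
import math
import random
import statistics


def coverage(mu, sigma, n, reps, z=1.96, seed=0):
    """Fraction of simulated intervals xbar +/- z*SE that contain mu."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / reps


# Hypothetical number of replications; population settings from the caption.
rate = coverage(mu=368, sigma=15, n=25, reps=10000)
print(rate)  # expected to be close to 0.95
```

Each simulated interval either contains 368 or it does not; only the long-run fraction is close to 95%.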

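Notation: standard normal distribution tables

$Z \sim N(0, 1)$;
$\alpha$: value between zero and one;
$z_\alpha$: value such that the area under the pdf of $Z$ between $z_\alpha$ and $+\infty$ is equal to $\alpha$;
formally, $P(Z > z_\alpha) = \alpha$ and $P(Z < z_\alpha) = 1 - \alpha$;
furthermore, $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.

Confidence interval for $\mu$ with $\sigma$ known: formal derivation (1)

It holds that

$$\frac{\bar{X} - \mu}{SE} \sim N(0, 1) \quad \text{where } SE = \frac{\sigma}{\sqrt{n}}$$

so that

$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{SE} \le z_{\alpha/2}\right) = 1 - \alpha$$

or, equivalently,

$$P(\mu - z_{\alpha/2}\, SE \le \bar{X} \le \mu + z_{\alpha/2}\, SE) = 1 - \alpha$$

Confidence interval for $\mu$ with $\sigma$ known: formal derivation (2)

$$\begin{aligned}
1 - \alpha &= P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{SE} \le z_{\alpha/2}\right) \\
&= P(-z_{\alpha/2}\, SE \le \bar{X} - \mu \le z_{\alpha/2}\, SE) \\
&= P(-\bar{X} - z_{\alpha/2}\, SE \le -\mu \le -\bar{X} + z_{\alpha/2}\, SE) \\
&= P(\bar{X} - z_{\alpha/2}\, SE \le \mu \le \bar{X} + z_{\alpha/2}\, SE)
\end{aligned}$$

Confidence interval at the level $1 - \alpha$ for $\mu$ with $\sigma$ known

A confidence interval at the confidence level $1 - \alpha$, or $100(1 - \alpha)\%$, for $\mu$ is given by

$$(\bar{X} - z_{\alpha/2}\, SE;\ \bar{X} + z_{\alpha/2}\, SE)$$

Since $SE = \frac{\sigma}{\sqrt{n}}$, this is

$$\left(\bar{X} - z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}};\ \bar{X} + z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}\right)$$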
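Instead of reading $z_{\alpha/2}$ from printed tables, the value can be obtained from the inverse of the standard normal cdf. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Return z_{alpha/2}, the value with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Critical values for the 95% and 99% confidence levels.
z_95 = z_critical(0.05)
z_99 = z_critical(0.01)
print(round(z_95, 2), round(z_99, 2))
```

Rounded to two decimals these agree with the 1.96 and 2.58 used in the text.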

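Margin of error

The confidence interval $\bar{x} \pm z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$ can also be written as $\bar{x} \pm ME$, where

$$ME = z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$$

is called the margin of error;
the interval width is equal to twice the margin of error.

Reducing the margin of error

$$ME = z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$$

The margin of error can be reduced, without lowering the confidence level, by increasing the sample size $n$.

Confidence interval for $\mu$ with $\sigma$ unknown

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$;
$\frac{\bar{X} - \mu}{SE} \sim N(0, 1)$;
in this case the standard error is unknown and needs to be estimated:
$SE(\hat{\mu}) = \frac{\sigma}{\sqrt{n}}$ is estimated by $\widehat{SE}(\hat{\mu}) = \frac{\hat{\sigma}}{\sqrt{n}}$, where $\hat{\sigma} = S$,
and it holds that

$$\frac{\bar{X} - \mu}{\widehat{SE}} \sim t_{n-1}$$

The Student's t distribution (1)

For $Z \sim N(0, 1)$ and $X \sim \chi^2_r$, independent,
the random variable $T = \frac{Z}{\sqrt{X/r}}$ is said to follow a Student's t distribution with $r$ degrees of freedom;
the pdf of the t distribution differs from that of the standard normal distribution because it has heavier tails.

[Figure: $t_1$ and $N(0, 1)$ comparison; curves labeled Student's t and normal, horizontal axis from -4 to 4.]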
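Because $n$ enters the margin of error through $\sqrt{n}$, quadrupling the sample size halves the margin. A small sketch with the bottling example values (the sample size 40 is a hypothetical comparison case, and the function name is illustrative):

```python
import math


def margin_of_error(z, sigma, n):
    """Margin of error z * sigma / sqrt(n) of a z confidence interval."""
    return z * sigma / math.sqrt(n)


# Bottling example with n = 10, and a hypothetical sample four times larger.
me_10 = margin_of_error(1.96, 0.01, 10)
me_40 = margin_of_error(1.96, 0.01, 40)
print(round(me_10, 4), round(me_40, 4))
```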

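The Student's t distribution (2)

For $r \to +\infty$, the Student's t distribution converges to the standard normal distribution.

[Figure: $t_{25}$ and $N(0, 1)$ comparison; horizontal axis from -5 to 5, vertical axis from 0.1 to 0.4.]

Confidence interval for $\mu$ with $\sigma$ unknown

$$\begin{aligned}
1 - \alpha &= P\left(-t_{n-1,\alpha/2} \le \frac{\bar{X} - \mu}{\widehat{SE}} \le t_{n-1,\alpha/2}\right) \\
&= P(-t_{n-1,\alpha/2}\, \widehat{SE} \le \bar{X} - \mu \le t_{n-1,\alpha/2}\, \widehat{SE}) \\
&= P(-\bar{X} - t_{n-1,\alpha/2}\, \widehat{SE} \le -\mu \le -\bar{X} + t_{n-1,\alpha/2}\, \widehat{SE}) \\
&= P(\bar{X} - t_{n-1,\alpha/2}\, \widehat{SE} \le \mu \le \bar{X} + t_{n-1,\alpha/2}\, \widehat{SE})
\end{aligned}$$

where $t_{n-1,\alpha/2}$ is the value such that the area under the t pdf with $n - 1$ d.f. between $t_{n-1,\alpha/2}$ and $+\infty$ is equal to $\alpha/2$.
Hence, a confidence interval at the level $1 - \alpha$ for $\mu$ is

$$\left(\bar{X} - t_{n-1,\alpha/2}\, \frac{S}{\sqrt{n}};\ \bar{X} + t_{n-1,\alpha/2}\, \frac{S}{\sqrt{n}}\right)$$

Confidence interval for $\mu$ with $\sigma$ unknown: example

For the bottling company example, if the value of $\sigma$ is not known, then $s = 0.0095$ and $t_{9;0.025} = 2.2622$, and a 95% confidence interval for $\mu$ is

$$\left(1.0065 - 2.2622\, \frac{0.0095}{\sqrt{10}};\ 1.0065 + 2.2622\, \frac{0.0095}{\sqrt{10}}\right)$$

that is

$$(1.0065 - 0.0068;\ 1.0065 + 0.0068)$$

so that

$$(0.9997;\ 1.0133)$$

Confidence interval for the mean of a non-normal population

$X_1, \ldots, X_n$ i.i.d. with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$;
the distribution of $X_i$ is not normal;
by the central limit theorem, the distribution of $\bar{X}$ is approximately normal;
if one uses the procedures described above to construct a confidence interval for $\mu$, the nominal confidence level of the interval is only an approximation of the true confidence level.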
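The t interval of the bottling example can be recomputed with the tabulated critical value $t_{9;0.025} = 2.2622$ given in the text. The Python standard library has no t quantile function, so in this sketch the critical value is passed in as a number rather than computed; the function name is an illustrative assumption.

```python
import math


def t_confidence_interval(xbar, s, n, t_crit):
    """Confidence interval xbar +/- t_crit * s / sqrt(n) for sigma unknown."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Bottling example: xbar = 1.0065, s = 0.0095, n = 10, t_{9;0.025} = 2.2622.
low, high = t_confidence_interval(1.0065, 0.0095, 10, t_crit=2.2622)
print(round(low, 4), round(high, 4))
```

Unlike the interval based on known $\sigma$, this one contains the value 1, because the t critical value is larger than 1.96.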

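Confidence interval for $\pi$

By the central limit theorem,

$$\frac{P - \pi}{SE} \approx N(0, 1) \quad \text{where } SE = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

so that

$$\begin{aligned}
1 - \alpha &\approx P\left(-z_{\alpha/2} \le \frac{P - \pi}{SE} \le z_{\alpha/2}\right) \\
&= P(-z_{\alpha/2}\, SE \le P - \pi \le z_{\alpha/2}\, SE) \\
&= P(-P - z_{\alpha/2}\, SE \le -\pi \le -P + z_{\alpha/2}\, SE) \\
&= P(P - z_{\alpha/2}\, SE \le \pi \le P + z_{\alpha/2}\, SE)
\end{aligned}$$

Since $\pi$ is always unknown, it is always necessary to estimate the standard error.

Confidence interval for $\pi$: example

For the clothing store chain example, a 95% confidence interval for $\pi$ is

$$\left(\frac{11}{40} - 1.96\, SE(\hat{\pi});\ \frac{11}{40} + 1.96\, SE(\hat{\pi})\right)$$

where $SE(\hat{\pi}) = \sqrt{\frac{\pi(1 - \pi)}{n}}$ is estimated by $\widehat{SE}(\hat{\pi}) = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$ with $\hat{\pi} = \bar{x} = \frac{11}{40}$, and one obtains

$$\left(\frac{11}{40} - 1.96 \times 0.07;\ \frac{11}{40} + 1.96 \times 0.07\right)$$

so that, using the unrounded standard error 0.0706,

$$(0.137;\ 0.413)$$

Example of decision problem

Problem: in the example of the bottling company, the quality control department has to decide whether to stop the production in order to service the machine.
Hypothesis: the expected (mean) quantity of liquid in the bottles is equal to one liter.
The standard deviation is assumed known and equal to $\sigma = 0.01$.
The decision is based on a simple random sample of $n = 10$ bottles.

Statistical hypotheses

A decision problem is expressed by means of two statistical hypotheses:
the null hypothesis $H_0$;
the alternative hypothesis $H_1$;
the two hypotheses concern the value of an unknown population parameter, for instance $\mu$:

$$H_0: \mu = \mu_0 \qquad H_1: \mu \ne \mu_0$$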
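The interval for $\pi$ in the clothing example can be reproduced with the plug-in standard error. This is a sketch with an illustrative function name; it uses the critical value 1.96 from the text and keeps the standard error unrounded.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Approximate confidence interval p_hat +/- z * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat - z * se_hat, p_hat + z * se_hat


# Clothing example: 11 good quality items out of 40.
low, high = proportion_confidence_interval(11, 40)
print(round(low, 3), round(high, 3))
```

The contractual limit 0.25 lies well inside this interval, so the sample does not pin down on which side of the limit $\pi$ falls.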

Distribution of $\bar{X}$ under $H_0$

If $H_0$ is true (that is, under $H_0$), the distribution of the sample mean $\bar{X}$
has expected value equal to $\mu_0 = 1$;
has standard error equal to $SE = \sigma/\sqrt{10} = 0.00316$;
if $X_1, \ldots, X_{10}$ is a normally distributed i.i.d. sample, then $\bar{X}$ also follows a normal distribution; otherwise the distribution of $\bar{X}$ is only approximately normal (by the central limit theorem).

[Figure: density of $\bar{X}$ under $H_0$; horizontal axis ticks at 0.9905, 0.9937, 0.9968, 1.0000, 1.0032, 1.0063, 1.0095.]

Observed value of the sample mean

The observed value of the sample mean is $\bar{x}$.
$\bar{x}$ is almost surely different from $\mu_0 = 1$.
Under $H_0$, the expected value of $\bar{X}$ is equal to $\mu_0$, and the difference between $\mu_0$ and $\bar{x}$ is due solely to the sampling error.
HENCE THE SAMPLING ERROR IS $\bar{x} - \mu_0$, that is, observed value minus expected value.

Decision rule

The space of all possible sample means is partitioned into a
rejection region (also called critical region);
non-rejection region.

Outcomes and probabilities

There are two possible states of the world and two possible decisions. This leads to four possible outcomes.

                        H_0 TRUE                  H_0 FALSE
H_0 IS REJECTED         Type I error (alpha)      OK
H_0 IS NOT REJECTED     OK                        Type II error (beta)

The probability of the type I error is called the significance level of the test and can be arbitrarily fixed (typically 5%).

Test statistic

A test statistic is a function of the sample that can be used to perform a hypothesis test.
For the example considered, $\bar{X}$ is a valid test statistic, which is equivalent to the more common z test statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Hypothesis testing: example

5% significance level (arbitrarily fixed);
$Z = \frac{\bar{X} - 1}{0.00316} \sim N(0, 1)$;
the observed value of $Z$ is $z = \frac{1.0065 - 1}{0.00316} = 2.055$;
since $|z| = 2.055 > 1.96$, the empirical evidence leads to the rejection of $H_0$.

p-value approach to testing

The p-value, also called the observed level of significance, is the probability, under $H_0$, of obtaining a value of the test statistic at least as extreme as the observed sample value.
Decision rule: compare the p-value with $\alpha$:
p-value $< \alpha$ $\Rightarrow$ reject $H_0$;
p-value $\ge \alpha$ $\Rightarrow$ do not reject $H_0$.
For the example considered, p-value $= P(Z \le -2.055) + P(Z \ge 2.055) = 0.04$;
p-value $\le 5\%$: statistically significant result;
p-value $\le 1\%$: highly significant result.

z test for $\mu$ with $\sigma$ known

$X_1, \ldots, X_n$ i.i.d. with distribution $N(\mu, \sigma^2)$;
hypotheses: $H_0: \mu = \mu_0$, $H_1: \ldots$;
test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$;
under $H_0$ the test statistic $Z$ has distribution $N(0, 1)$.
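The observed statistic and the two-sided p-value of the bottling example can be recomputed with the standard normal cdf from the Python standard library. This is a sketch; the function name is an illustrative choice.

```python
import math
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n):
    """Return the z statistic and the two-sided p-value for H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # P(|Z| >= |z|) under H0
    return z, p_value


# Bottling example: xbar = 1.0065, mu0 = 1, sigma = 0.01, n = 10.
z, p_value = z_test_two_sided(1.0065, 1.0, 0.01, 10)
print(round(z, 3), round(p_value, 2))
```

The p-value is about 0.04, below the 5% significance level, which agrees with the rejection of $H_0$ in the text.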

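z test: two-sided hypothesis

$H_1: \mu \ne \mu_0$; in this case

$$\text{p-value} = P(|Z| > |z|)$$

[Figure: standard normal density with both tails beyond $-|z|$ and $|z|$ shaded; horizontal axis from -3.5 to 3.5.]

z test: one-sided hypothesis (right)

$H_1: \mu > \mu_0$; in this case

$$\text{p-value} = P(Z > z)$$

[Figure: standard normal density with the right tail beyond $z$ shaded; horizontal axis from -3.5 to 3.5.]

z test: one-sided hypothesis (left)

$H_1: \mu < \mu_0$; in this case

$$\text{p-value} = P(Z < z)$$

[Figure: standard normal density with the left tail below $z$ shaded; horizontal axis from -3.5 to 3.5.]

t test for $\mu$ with $\sigma$ unknown

Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$;
test statistic: $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$;
p-value: $P(|T_{n-1}| > |t|)$, where $T_{n-1}$ follows a Student's t distribution with $n - 1$ degrees of freedom.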

Test for a proportion

Null hypothesis: $H_0: \pi = \pi_0$;
under $H_0$ the sampling distribution of $P$ is approximately normal with expected value $E(P) = \pi_0$ and standard error

$$SE(P) = \sqrt{\frac{\pi_0 (1 - \pi_0)}{n}}$$

Note that under $H_0$ there are no unknown parameters.

z test for $\pi$

Hypotheses: $H_0: \pi = \pi_0$, $H_1: \pi \ne \pi_0$;
test statistic:

$$Z = \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$$

p-value: $P(|Z| > |z|)$.

z test for $\pi$: example

For the clothing store chain example, the hypotheses are

$$H_0: \pi = 0.25 \qquad H_1: \pi > 0.25$$

Hence, under $H_0$ the standard error is

$$SE = \sqrt{\frac{0.25\,(1 - 0.25)}{40}} = 0.068$$

so that

$$z = \frac{0.275 - 0.25}{0.068} = 0.37$$

and the p-value is $P(Z \ge 0.37) = 0.36$, so the null hypothesis cannot be rejected.
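The one-sided test of the clothing example can be retraced numerically. The sketch below computes the standard error under $H_0$ (using $\pi_0$, not the sample proportion) and the right-tail p-value; the function name is an illustrative assumption.

```python
import math
from statistics import NormalDist


def proportion_z_test_right(successes, n, pi0):
    """z statistic and right-tail p-value for H0: pi = pi0 vs H1: pi > pi0."""
    p_hat = successes / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)  # standard error under H0
    z = (p_hat - pi0) / se0
    p_value = 1 - NormalDist().cdf(z)     # P(Z >= z) under H0
    return z, p_value


# Clothing example: 11 good quality items out of 40, pi0 = 0.25.
z, p_value = proportion_z_test_right(11, 40, 0.25)
print(round(z, 2), round(p_value, 2))
```

With a p-value of about 0.36, the sample gives no grounds for rejecting $H_0$ and returning the consignment, as stated in the text.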