Hypothesis testing. Null and alternative hypotheses

Another important use of sampling distributions is to test hypotheses about population parameters, e.g. the mean, a proportion, regression coefficients, etc. For example, it is possible to stipulate that the population mean is equal to some specified value and then use sample information to decide whether the hypothetical value can be rejected or not in the light of sample evidence. The decision will depend on (1) the size of the difference between the hypothetical population mean and the sample mean, (2) the size of the sampling error associated with the sample mean, and (3) the degree of certainty the decision-maker requires before rejecting the initial hypothesis.

Null and alternative hypotheses

First we set up what is known as the null hypothesis, $H_0$, about the population parameter, e.g. we may claim that the population mean $\mu$ is equal to some value $\mu_0$, say. This is usually written as $H_0: \mu = \mu_0$. We then stipulate an alternative hypothesis, $H_1$, which may state, e.g., that the population mean is not equal to $\mu_0$, $H_1: \mu \neq \mu_0$. The purpose of hypothesis testing is to see if we have sufficient evidence to reject the null hypothesis.

Typically, the null hypothesis says that there is nothing unusual or important about the data we are considering. For example, if we were looking at the average test scores of children who have received a particular teaching method, the null hypothesis would be that the mean is equal to the national average. If we are testing a new drug, and are looking at the proportion of people taking the drug whose condition improves, we would take as our null the proportion who improve with a placebo, or with a previous drug. If we are looking for a relationship between two variables, the null hypothesis is usually that there is no relationship, that is, that the regression coefficient between them is 0.
The alternative hypothesis is thus that there is something interesting or different about the population: for example, that the average test score from the new teaching method is not equal to the national average, or that the proportion who improve with the new drug is not equal to the previous rate, or that there is a relationship between the two variables, so that the regression coefficient is not equal to 0.

We treat $H_0$ as our default position, and we usually require quite strong evidence to reject the null hypothesis: typically 90%, 95% or 99% confidence, depending on the context.

Test statistic

Having set up our null and alternative hypotheses, we look for a suitable test statistic that will give us evidence for or against the two hypotheses. For example, if we are looking for evidence about the population mean ($H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$), we will most likely use a statistic based on the sample mean, $\bar{X}$. From our work in section 4, a suitable statistic (assuming we know the standard deviation $\sigma$ of the population) is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

that is, we measure $\bar{X} - \mu_0$ in units of the standard error of $\bar{X}$ as an estimator for $\mu$, which is equal to $\sigma/\sqrt{n}$. For large samples, $n \geq 30$, we know that the distribution of $\bar{X}$ is approximately normal, so that, if $H_0$ is true, $Z$ will be a standard normal variable, that is $Z \sim N(0,1)$. The larger is $|\bar{X} - \mu_0|$, the bigger is $|Z|$, and the less credible it is that $H_0$ is correct. So essentially what we are trying to do is to measure whether the sample mean, $\bar{X}$, is significantly different from $\mu_0$.

Decision rule

We now have to decide how large $|Z|$ must be for us to reject $H_0$. This is related to the risk we are prepared to take of an incorrect decision. In deciding whether to accept or reject a null hypothesis, there are two types of error we may make:

A Type 1 error is to reject the null hypothesis when it is correct.
A Type 2 error is to accept the null hypothesis when it is incorrect.

We usually specify our decision rule in terms of the probability of a Type 1 error we are prepared to accept, denoted $\alpha$. Depending on $\alpha$, we can calculate critical values of the test statistic $Z$, so that if $Z$ lies beyond the critical values, we reject $H_0$, while if $Z$ lies within the critical values, we accept $H_0$.

Thus, in the case of the population mean, if our acceptable level of Type 1 error is $\alpha = 0.05$, then the critical values of the test statistic will be $Z = \pm 1.96$, since we know from section 4 that, if $H_0$ is true and $\mu = \mu_0$, then $P(-1.96 < Z < 1.96) = 0.95$. Hence we know that, if $\mu = \mu_0$, there would be only a 5% probability of obtaining a value of $Z$ greater than 1.96 or less than -1.96, so that the probability of a Type 1 error in rejecting $H_0$ is 5%. If we obtain a value of $Z$ between the critical values, we conclude that we do not have sufficient evidence to reject $H_0$, so we accept it.

The acceptable probability of Type 1 error is also called the significance level of the test. If, say, $\alpha = 5\%$, and we reject $H_0$, we will say that we reject $H_0$ at the 5% level of significance, or that $\bar{X}$ is significantly different from $\mu_0$ at the 5% level of significance, etc.

Thus, we set up our decision rule to give $H_0$ the benefit of the doubt. We require 95% confidence to reject it. Note again that if we reject the null hypothesis, we are not saying there is a 95% probability that $\mu \neq \mu_0$. $\mu$ is a constant which either is equal to $\mu_0$ or it isn't. What we are saying is that, if $\mu$ were equal to $\mu_0$, there would be a 95% chance of obtaining a test statistic between the critical values. Only 5% of the time would we obtain a value for $Z$ that would lead us to reject $H_0$. Hence $P(\text{Reject } H_0 \mid H_0 \text{ true}) = 0.05$.

Note that if we were prepared to accept a Type 1 error probability of 10%, we would set our critical values at $Z = \pm 1.645$, while if we were only prepared to accept a 1% Type 1 error, we would set critical values of $Z = \pm 2.58$.

Power of a test

Let $\beta$ denote the probability of a Type 2 error. The power of a hypothesis test is $1 - \beta$, the probability of rejecting the null hypothesis when it is incorrect. Given two tests of a hypothesis $H_0$, we say that one test is more powerful than the other if, given a specified level of Type 1 error, it has a lower probability of Type 2 error.

Example

Suppose we know that average household income in the population is 300 p.w., with standard deviation 50 per week. We are trying to see whether households in a particular town have a higher or lower average income. We take a random sample of 100 households in the town, and find an average income of 285 p.w. We wish to test the hypothesis that average household income in the town is equal to the national average, with a 5% level of significance.

Here $H_0$ is $\mu = 300$, and $H_1$ is $\mu \neq 300$. Our test statistic is

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

with $\mu_0 = 300$, $\sigma = 50$, and $n = 100$. From the sample, $\bar{X} = 285$. Hence,

$$Z = \frac{285 - 300}{50/\sqrt{100}} = \frac{-15}{5} = -3.$$

Given a 5% significance level, the critical values of the $Z$ statistic are $\pm 1.96$. Our decision rule is to accept $H_0$ if $-1.96 < Z < 1.96$, and reject $H_0$ otherwise. Hence, we reject $H_0$, and conclude that $\mu \neq 300$. In fact, we may conclude that the average household income in this town is significantly less than the national average, at the 5% (or indeed at the 1%) level of significance.

Two-tailed and one-tailed tests

The example above involved a two-tailed test of significance: we were trying to see if $\bar{X}$ was significantly higher or significantly lower than $\mu_0$. That is, $H_1$ was specified as $\mu \neq \mu_0$.

In a one-tailed test, the alternative hypothesis is $H_1: \mu > \mu_0$, or $H_1: \mu < \mu_0$. This would be appropriate if we had some a priori reason to believe that we were likely to find a difference in a particular direction. For example, if we were trying to see if graduates have the same income as the rest of the population, we might use a 1-tailed test, as we would naturally assume that graduates tend to enjoy a higher income, so $H_1$ would be that $\mu > \mu_0$, where $\mu$ is graduate average income, and $\mu_0$ is the average for the whole population.

When we use a 1-tailed test, the critical value of $Z$ is different. For example, at the 5% level of significance, we would use a critical value for $Z$ of 1.645, instead of $\pm 1.96$, since $P(Z > 1.645 \mid H_0) = 5\%$. (Hence $\pm 1.645$ as the 10% critical value for a 2-tailed test, since $P(Z < -1.645 \mid H_0)$ is also 5%, so we have 5% in each tail.) If our alternative hypothesis were $\mu < \mu_0$, then our critical value would be $Z = -1.645$, rejecting $H_0$ if $Z$ falls below this.
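The household income example and the critical values quoted above can be reproduced with a short Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the helper names `z_statistic` and `critical_value` are illustrative and do not come from the notes. The sample mean is taken as 285, which is the value consistent with the worked result of -15/5.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu_0, sigma, n):
    """Z = (sample mean - mu_0) / (sigma / sqrt(n)), with sigma assumed known."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))


def critical_value(alpha, two_tailed=True):
    """Positive critical value of Z for significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


# Household income example: mu_0 = 300, sigma = 50, n = 100, sample mean = 285.
z = z_statistic(285, 300, 50, 100)
z_crit = critical_value(0.05)      # two-tailed 5% critical value, about 1.96
reject_h0 = abs(z) >= z_crit       # reject H0 when Z lies beyond the critical values

print(f"Z = {z:.2f}, critical values = +/-{z_crit:.3f}, reject H0: {reject_h0}")

# Two-tailed critical values at the 10%, 5% and 1% levels.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: +/-{critical_value(alpha):.3f}")

# One-tailed critical value at the 5% level (upper tail).
print(f"one-tailed 5%: {critical_value(0.05, two_tailed=False):.3f}")
```

Passing `two_tailed=False` gives the one-tailed critical value; for a lower-tail alternative ($\mu < \mu_0$) the same number is used with a negative sign.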

[Figure: 1-tailed vs. two-tailed test; density f(z) plotted against Z, with the tail areas marked.]

Proportions

The procedure and rationale for testing hypotheses about population proportions are similar to those used for means. They are based on the normal distribution and apply to large samples, $n \geq 30$. The null hypothesis is specified in terms of the population proportion $P$, and the sample proportion, $p$, and the standard error, $SE(p) = \sqrt{P(1-P)/n}$, are used in the test statistic.

For example, suppose we wish to test the null hypothesis that the proportion of households in a certain town with at least one wage-earner is 0.85. We have a random sample of 100 households, and the proportion of the sample with at least one wage-earner is $p = 0.81$. We have

$$H_0: P = P_0 = 0.85, \qquad H_1: P \neq 0.85$$

$$Z = \frac{p - P_0}{\sqrt{P_0(1-P_0)/n}} = \frac{0.81 - 0.85}{\sqrt{0.85 \times 0.15/100}} = \frac{-0.04}{0.0357} = -1.12$$

Note that we use the standard error calculated from the population proportion based on the null hypothesis. This is because we are trying to say "If the null hypothesis were true, how likely would it be to get this much difference between the sample proportion and population proportion?". So we consider the probability distribution of the test statistic that would apply if the null hypothesis were true.

As $|Z| = 1.12 < 1.96$, the 2-tailed critical value of the $Z$ statistic at the 5% level of significance, we cannot reject $H_0$. In other words, the sample proportion is not significantly different from 0.85 (at the 5% level). We therefore accept $H_0$.

Difference between two sample means

So far we have made inferences from a single sample. Now we shall make inferences from two samples. Typically we shall have two random samples from two populations and we shall be making inferences about the difference between the means of the two populations using the difference between the two sample means. For example, we may be interested in testing whether boys are achieving significantly different results in school than girls. To be able to answer such a question, we first need to study the sampling distribution of the difference between two sample means.

If a random sample of size $n_1$ is taken from one population with mean $\mu_1$ and variance $\sigma_1^2$, and another random sample of size $n_2$ is taken from another population with mean $\mu_2$ and variance $\sigma_2^2$, the difference between the two sample means is defined as

$$d = \bar{X}_1 - \bar{X}_2$$

where $\bar{X}_1$ and $\bar{X}_2$ are independent random variables: random because they will vary from one set of two samples to another, and independent because changes in $\bar{X}_1$ are not influenced by changes in $\bar{X}_2$ and vice versa.

$$E(d) = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 = D$$

i.e. the sample difference $d$ is an unbiased estimator of the population difference $D$.

$$Var(d) = Var(\bar{X}_1 - \bar{X}_2) = Var(\bar{X}_1) + Var(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

since $\bar{X}_1$ and $\bar{X}_2$ are independent.
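The two results above, $E(d) = \mu_1 - \mu_2$ and $Var(d) = \sigma_1^2/n_1 + \sigma_2^2/n_2$, can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: the population values, the sample sizes, the number of replications and the function name `simulate_differences` are all hypothetical choices made for illustration, and both populations are assumed to be normal.

```python
import random
from statistics import mean, variance


def simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps, seed=0):
    """Draw `reps` pairs of independent normal samples; return each d = mean1 - mean2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        sample1 = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        sample2 = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        diffs.append(sum(sample1) / n1 - sum(sample2) / n2)
    return diffs


# Hypothetical population values, chosen only for illustration.
mu1, sigma1, n1 = 50.0, 10.0, 40
mu2, sigma2, n2 = 45.0, 8.0, 30

diffs = simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=4000)

theoretical_mean = mu1 - mu2                          # D = mu1 - mu2
theoretical_var = sigma1**2 / n1 + sigma2**2 / n2     # Var(d)

print(f"E(d):   simulated {mean(diffs):.3f}, theory {theoretical_mean:.3f}")
print(f"Var(d): simulated {variance(diffs):.3f}, theory {theoretical_var:.3f}")
```

With a few thousand replications the simulated mean and variance of $d$ should sit close to the theoretical values, and the agreement tightens as `reps` grows.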

The standard error of $d$ is given by

$$SE(d) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

and shows that the larger are the two variances and the smaller the sample sizes, the larger will be the sampling error of $d$.

If $X_1$ and $X_2$ are normally distributed, then $\bar{X}_1$ and $\bar{X}_2$ are also normally distributed. Also, if both samples are large ($n_1, n_2 \geq 30$), then even if $X_1$ and $X_2$ are not normally distributed, the Central Limit Theorem ensures that $\bar{X}_1$ and $\bar{X}_2$ will be approximately normally distributed. If either of these is true, then $d$ will also be normally distributed, as the difference between two independent normal variables. Thus,

$$d = \bar{X}_1 - \bar{X}_2 \sim N\left[\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right]$$

The confidence interval for the difference between the population means can now be easily calculated. The 95% confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_1 - \bar{X}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

The calculated confidence interval will contain the true population difference in 95% of samples.

Hence, the hypothesis test for the population difference can also be performed in the usual manner. Let $H_0: \mu_1 - \mu_2 = 0$, and $H_1: \mu_1 - \mu_2 \neq 0$. The test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

and the decision rule, for a 5% significance level, will be to reject $H_0$ if $|Z| \geq 1.96$, otherwise accept $H_0$.

Example

A school wants to find out if there is a difference in test performance between boys and girls. A sample of test scores of 60 boys and 50 girls is examined. It is found that the boys have sample mean $\bar{X}_1 = 54$ with standard deviation 14, and the girls have sample mean $\bar{X}_2 = 60$, with standard deviation 9. NB: we shall ignore for now the problem of estimating the population standard deviations, and assume these figures are correct.

We set up $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \neq 0$. Our test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{54 - 60}{\sqrt{\dfrac{14^2}{60} + \dfrac{9^2}{50}}} = \frac{-6}{\sqrt{4.887}} = \frac{-6}{2.211} \approx -2.71$$

As usual, for a 5% level of significance on a two-tailed test, our critical values for $Z$ are $\pm 1.96$. Since $Z \approx -2.71$ lies beyond $-1.96$, we reject the null hypothesis: on these figures, girls are doing significantly better than boys at the 5% level.

Difference between two sample proportions

This can be tested in a similar manner.

Exercise

Two different teaching methods are tried with different groups of students on the same course. In the first group, 47 out of 63 students pass. In the second group, 66 out of 78 pass. The department wants to work out whether one teaching method is significantly better than the other. Formulate suitable null and alternative hypotheses, and calculate a suitable test statistic, to test this.
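Returning to the boys and girls example, the two-sample test statistic can be computed directly from the figures as stated (60 boys with mean 54 and standard deviation 14; 50 girls with mean 60 and standard deviation 9). The sketch below is illustrative only: the function name `two_sample_z` is an assumption, and the sample standard deviations are treated as if they were the known population values, as the example itself does.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesised_diff=0.0):
    """Z = ((mean1 - mean2) - D0) / sqrt(sd1^2/n1 + sd2^2/n2), sds treated as known."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((mean1 - mean2) - hypothesised_diff) / standard_error


# Boys: n = 60, mean 54, sd 14.  Girls: n = 50, mean 60, sd 9.
z = two_sample_z(54, 14, 60, 60, 9, 50)

# Two-tailed critical value at the 5% level of significance (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) >= z_crit

print(f"Z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject_h0}")
```

The standard error here is $\sqrt{196/60 + 81/50} \approx 2.21$, so the statistic is about $-2.71$.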