Tests of Significance

Outline:
  General Procedure for Hypothesis Testing
  Null and Alternative Hypotheses
  Test Statistics
  p-values
  Interpretation of the Significance Level
  Tests for a Population Mean
  Interpretation of p-values
  Statistical vs. Practical Significance
  Confidence Intervals and Hypothesis Tests
  Potential Abuses of Tests

A confidence interval is a very useful statistical inference tool when the goal is to estimate a population parameter. When the goal is to assess the evidence provided by the data in favor of some claim about the population, tests of significance are used.

Example: Filling Coke Bottles

A machine at a Coke production plant is designed to fill bottles with 16 oz of Coke. The actual amount varies slightly from bottle to bottle. From past experience, it is known that the SD is 0.2 oz. An SRS of 100 bottles filled by the machine has a mean of 15.94 oz per bottle. Is this evidence that the machine needs to be recalibrated, or could this difference be a result of random variation?

Testing Hypotheses

A hypothesis test is an assessment of the evidence provided by the data in favor of (or against) some claim about the population.

For example, suppose we perform a randomized experiment or take a random sample and calculate some sample statistic, say the sample mean. We want to decide if the observed value of the sample statistic is consistent with some hypothesized value of the corresponding population parameter. If the observed and hypothesized values differ (as they almost certainly will), is the difference due to an incorrect hypothesis or merely due to chance variation?

General Procedure for Hypothesis Testing

1. Formulate the null hypothesis and the alternative hypothesis.

The null hypothesis $H_0$ is the statement being tested. Usually it states that the difference between the observed value and the hypothesized value is only due to chance variation. For example, $\mu = 16$ oz.

The alternative hypothesis $H_a$ is the statement we will favor if we find evidence that the null hypothesis is false. It usually states that there is a real difference between the observed and hypothesized values. For example, $\mu \neq 16$, $\mu > 16$, or $\mu < 16$.

A test is called
  two-sided if $H_a$ is of the form $\mu \neq \mu_0$,
  one-sided if $H_a$ is of the form $\mu > \mu_0$ or $\mu < \mu_0$,
where $\mu_0$ is the hypothesized value.
General Procedure for Hypothesis Testing (cont.)

Example: GRE Scores

The mean score of all examinees on the Verbal and Quantitative sections of the GRE is about 1040. Suppose 50 randomly sampled UC Berkeley graduate students have a mean GRE V+Q score of 1310. We are interested in determining if a mean GRE V+Q score of 1310 gives evidence that, as a whole, Berkeley graduate students have a higher mean GRE score than the national average.

What is $H_0$? What is $H_a$?

2. Calculate the test statistic on which the test will be based.

The test statistic measures the difference between the observed data and what would be expected if the null hypothesis were true. When $H_0$ is true, we expect the estimate based on the sample to take a value near the parameter value specified by $H_0$.

Our goal is to answer the question, "How far is the value calculated from the sample from what we would expect under the null hypothesis?"

In many common situations the test statistic has the form

$$\frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$$

For the Coke example, the mean of the sample is 15.94 oz. The population mean specified by the null hypothesis is 16 oz. A test statistic is

$$z = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3$$

(We'll have more to say about this in a moment.)

3. Find the p-value of the observed result.

The p-value is the probability of observing a test statistic as extreme or more extreme than actually observed, assuming the null hypothesis $H_0$ is true. The smaller the p-value, the stronger the evidence against the null hypothesis.

If the p-value is as small or smaller than some number $\alpha$ (e.g. 0.01, 0.05), we say that the result is statistically significant at level $\alpha$. $\alpha$ is called the significance level of the test.

In the case of the Coke example, $p = 0.0013$ for a one-sided test or $p = 0.0026$ for a two-sided test. (Once again, we'll have more to say about this in a moment.)
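The Coke calculation above can be reproduced in a few lines. This is only a sketch: it assumes Python 3.8+ for `statistics.NormalDist`, and the variable names are illustrative, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the Coke example in these notes.
mu0 = 16.0     # mean specified by the null hypothesis (oz)
xbar = 15.94   # observed sample mean (oz)
sigma = 0.2    # known population SD (oz)
n = 100        # sample size

# (estimate - hypothesized value) / (standard deviation of the estimate)
z = (xbar - mu0) / (sigma / sqrt(n))

Z = NormalDist()                    # standard normal distribution
p_one_sided = Z.cdf(z)              # lower tail, P(Z <= z)
p_two_sided = 2 * Z.cdf(-abs(z))    # area of both tails

print(round(z, 2), round(p_one_sided, 4), round(p_two_sided, 4))
```

The unrounded two-sided value is about 0.0027. The 0.0026 quoted in the notes is twice the rounded table value 0.0013.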
Interpretation of the Significance Level

To perform a test of significance level $\alpha$, we perform the previous three steps and then reject $H_0$ if the p-value is less than $\alpha$.

The following outcomes are possible when conducting a test:

                     Our Decision
  Reality      H_0               H_a
  H_0          (correct)         Type I Error
  H_a          Type II Error     (correct)

Suppose $H_0$ is actually true. If we draw many samples, and perform a test for each one, a fraction of about $\alpha$ of these tests will (incorrectly) reject $H_0$. In other words, $\alpha$ is the probability that we will make a Type I error.

Type II error is related to the notion of the power of a test, which we will discuss later.

Example: An Exact Binomial Test

In the last 51 World Series (through 2003) there have been 24 seven-game series. Suppose we wish to test the hypothesis

  $H_0$: Games within a World Series are independent, with each team having probability 1/2 of winning.

For the alternative hypothesis, let's use the generic

  $H_a$: The model in $H_0$ is incorrect.

Let $X$ denote the number of games in the World Series. Under $H_0$, $X$ has the following distribution:

  k            4      5      6      7
  P(X = k)     1/8    1/4    5/16   5/16

For our test statistic, let's just use

  M = # seven-game series

What is the p-value?

We need to find $m$ such that $P_{H_0}(M > m) \le 0.05$.

Assuming different years' World Series are independent (i.e. that the last 51 World Series are an SRS from the population of World Series), the number of seven-game series in 51 trials is $B(51, 5/16)$.

  $P(M > 20) = 0.086$
  $P(M > 21) = 0.049$

We want to have a significance level of no more than 5%, so the critical value will be 21.

Do we reject $H_0$ at significance level $\alpha = 0.05$? This is just a matter of checking whether our observed value of $M$ (24) exceeds the critical value (21). It does, so we reject $H_0$.

Tests for a Population Mean

In the preceding example, we were able to perform an exact Binomial test.
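The exact binomial calculation in the World Series example can be checked with exact rational arithmetic. This is a sketch under the model stated in $H_0$ (independent fair games, independent series). The function names `series_length_pmf` and `upper_tail` are made up for illustration.

```python
from fractions import Fraction
from math import comb

def series_length_pmf(k):
    """P(X = k) for a best-of-seven series with independent, fair games."""
    # A given team wins in exactly k games if it wins game k and exactly
    # 3 of the first k - 1 games; the factor 2 covers either team winning.
    return 2 * comb(k - 1, 3) * Fraction(1, 2) ** k

def upper_tail(n, p, m):
    """P(M > m) for M ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(m + 1, n + 1))

p7 = series_length_pmf(7)   # probability that a series goes seven games
n = 51                      # number of World Series considered

for m in (20, 21):
    print(m, float(upper_tail(n, p7, m)))
```

The two printed tail probabilities should agree with the 0.086 and 0.049 quoted above to three decimals, which is what makes 21 the critical value at the 5% level.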
Frequently, an exact test is impractical, but we can use the approximate normality of means to conduct an approximate test.

Suppose we want to test the hypothesis that $\mu$ has a specific value:

  $H_0: \mu = \mu_0$

Since $\bar{x}$ estimates $\mu$, the test is based on $\bar{x}$, which has a (perhaps approximately) Normal distribution. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

is a standard normal random variable, under the null hypothesis.

p-values for different alternative hypotheses:

  $H_a: \mu > \mu_0$      p-value is $P(Z \ge z)$ (area of right-hand tail)
  $H_a: \mu < \mu_0$      p-value is $P(Z \le z)$ (area of left-hand tail)
  $H_a: \mu \neq \mu_0$   p-value is $2P(Z \ge |z|)$ (area of both tails)

Example: Filling Coke Bottles (cont.)

We are interested in assessing whether or not the machine needs to be recalibrated, which will be the case if it is systematically over- or under-filling bottles. Thus, we will use the hypotheses

  $H_0: \mu = 16$
  $H_a: \mu \neq 16$

Recall that $\bar{x} = 15.94$, $\sigma = 0.2$, and $n = 100$. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3$$

The p-value for a two-sided test is $p = 2P(Z \ge 3) = 0.0026$.

If $\alpha = 0.01$, we reject $H_0$.
If $\alpha = 0.05$, we reject $H_0$.
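The three p-value rules for the z test can be collected into one small function. This is a minimal sketch, assuming a known $\sigma$ and Python 3.8+. The function name `z_test` and the labels "greater", "less" and "two-sided" are illustrative choices, not notation from the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, p_value) for H0: mu = mu0 when sigma is known.

    alternative is "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    Z = NormalDist()  # standard normal distribution
    if alternative == "greater":
        p = 1 - Z.cdf(z)             # area of right-hand tail
    elif alternative == "less":
        p = Z.cdf(z)                 # area of left-hand tail
    elif alternative == "two-sided":
        p = 2 * (1 - Z.cdf(abs(z)))  # area of both tails
    else:
        raise ValueError("unknown alternative: " + repr(alternative))
    return z, p

# Coke example: two-sided test of H0: mu = 16.
z, p = z_test(15.94, 16, 0.2, 100, "two-sided")
for alpha in (0.01, 0.05):
    print(alpha, "reject H0" if p <= alpha else "do not reject H0")
```

For the Coke data the test rejects at both levels, matching the conclusion above.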
Example: TV Tubes

TV tubes are taken at random and the lifetime measured. $n = 100$, $\sigma = 300$ and $\bar{x} = 1265$ (days). Test whether the population mean is 1200, or greater than 1200.

  $H_0: \mu = 1200$
  $H_a: \mu > 1200$

Under $H_0$, $\bar{x} \sim N(1200, 30)$, where $30 = 300/\sqrt{100}$ is the standard deviation of $\bar{x}$. So

$$z = \frac{\bar{x} - 1200}{30} \sim N(0, 1) \text{ under } H_0$$

The test statistic is

$$z = \frac{1265 - 1200}{30} = 2.17,$$

and the p-value is $P(Z \ge 2.17 \mid H_0) = 0.015$.

This is evidence against $H_0$ at significance level 0.05, so we reject $H_0$. That is, we conclude that the average lifetime of TV tubes is greater than 1200 days.

A Rough Interpretation of p-values

  p-value               Interpretation
  p > 0.10              no evidence against H_0
  0.05 < p <= 0.10      weak evidence against H_0
  0.01 < p <= 0.05      evidence against H_0
  p <= 0.01             strong evidence against H_0

Statistical vs. Practical Significance

Saying that a result is statistically significant does not signify that it is large or necessarily important. That decision depends on the particulars of the problem. A statistically significant result only says that there is substantial evidence that $H_0$ is false.

Failure to reject $H_0$ does not imply that $H_0$ is correct. It only implies that we have insufficient evidence to conclude that $H_0$ is incorrect.

Confidence Intervals and Hypothesis Tests

A level $\alpha$ two-sided test rejects a hypothesis $H_0: \mu = \mu_0$ exactly when the value of $\mu_0$ falls outside a $(1 - \alpha)$ confidence interval for $\mu$.

For example, consider a two-sided test of the following hypotheses

  $H_0: \mu = \mu_0$
  $H_a: \mu \neq \mu_0$

at the significance level $\alpha = 0.05$.

If $\mu_0$ is a value inside the 95% confidence interval for $\mu$, then this test will have a p-value greater than 0.05, and therefore will not reject $H_0$.

If $\mu_0$ is a value outside the 95% confidence interval for $\mu$, then this test will have a p-value smaller than 0.05, and therefore will reject $H_0$.

Example

A particular area contains 8000 condominium units. In a survey of the occupants, a simple random sample of size 100 yields the information that there are 160 motor vehicles in the sample, giving an average number of motor vehicles per unit of 1.6, with a sample standard deviation of 0.8.

Construct a confidence interval for the total number of vehicles in the area.

The city claims that there are only 11,000 vehicles in the area, so there is no need for a new garage. What do you think?
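One way to approach the condominium example is sketched below. The notes do not fix a confidence level, so the 95% level (multiplier 1.96) is an assumption. The sketch also assumes a normal approximation with the sample SD standing in for $\sigma$, and it ignores any finite-population correction.

```python
from math import sqrt

N = 8000         # condominium units in the area
n = 100          # sample size
xbar = 1.6       # sample mean vehicles per unit
s = 0.8          # sample standard deviation
claimed = 11000  # the city's claimed total

z_star = 1.96    # assumed: multiplier for an approximate 95% interval
se = s / sqrt(n) # standard error of the sample mean

# Interval for the mean per unit, then scaled up to the area total.
lo_mean, hi_mean = xbar - z_star * se, xbar + z_star * se
lo_total, hi_total = N * lo_mean, N * hi_mean

print(round(lo_total), round(hi_total))
print("claim inside interval:", lo_total <= claimed <= hi_total)
```

Under these assumptions the interval for the total runs from roughly 11,500 to 14,100 vehicles, so the claimed 11,000 falls below it. By the correspondence described above, a two-sided test at level 0.05 would reject the city's figure.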
More on Constructing Hypothesis Tests

Hypotheses always refer to some population or model, not to a particular outcome. As a result, $H_0$ and $H_a$ must be expressed in terms of some population parameter or parameters.

$H_a$ typically expresses the effect that we hope to find evidence for, so $H_a$ is usually carefully thought out first. We then set up $H_0$ to be the case when the hoped-for effect is not present.

It is not always clear whether $H_a$ should be one-sided or two-sided, i.e., whether the parameter differs from its null hypothesis value in a specified direction.

Note: You are not allowed to look at the data first and then frame $H_a$ to fit what the data show.

Potential Abuses of Tests

In many applications, a researcher constructs a null hypothesis with the intent of discrediting it. For example:

  $H_0$: new drug has the same effect as placebo
  $H_0$: men and women are paid equally

A small p-value can help a drug company get a drug approved by the FDA. Similarly, a researcher may have an easier time publishing results if the p-value is smaller than 0.05. Because of that we have to be aware of the following potential abuses:

  Using one-sided tests to make the p-value one-half as big
  Conducting repeated sampling and testing and reporting only the lowest p-value
  Testing many hypotheses or testing the same hypothesis on many different subgroups

In the last two, even if there is actually no effect, you will probably get at least one small p-value.
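The last point can be made concrete with a short calculation. As a simplifying assumption (not stated in the notes), suppose the $k$ tests are independent and every null hypothesis is true, so each test has probability $\alpha$ of producing a p-value below $\alpha$. The function name is illustrative.

```python
def prob_at_least_one_small_p(k, alpha=0.05):
    """P(at least one of k independent tests has p < alpha) when every H0 is true."""
    # Each test avoids a false rejection with probability 1 - alpha.
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 100):
    print(k, round(prob_at_least_one_small_p(k), 3))
```

With 20 such tests the chance of at least one "significant" result is already close to two in three, even though no effect exists.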