Tests of Significance




Outline:
- General Procedure for Hypothesis Testing
- Null and Alternative Hypotheses
- Test Statistics
- p-values
- Interpretation of the Significance Level
- Tests for a Population Mean
- Interpretation of p-values
- Statistical vs. Practical Significance
- Confidence Intervals and Hypothesis Tests
- Potential Abuses of Tests

A confidence interval is a very useful statistical inference tool when the goal is to estimate a population parameter. When the goal is to assess the evidence provided by the data in favor of some claim about the population, tests of significance are used.

Example: Filling Coke Bottles

A machine at a Coke production plant is designed to fill bottles with 16 oz of Coke. The actual amount varies slightly from bottle to bottle. From past experience, it is known that the SD is 0.2 oz. An SRS of 100 bottles filled by the machine has a mean of 15.94 oz per bottle. Is this evidence that the machine needs to be recalibrated, or could this difference be a result of random variation?

General Procedure for Hypothesis Testing

A hypothesis test is an assessment of the evidence provided by the data in favor of (or against) some claim about the population. For example, suppose we perform a randomized experiment or take a random sample and calculate some sample statistic, say the sample mean. We want to decide if the observed value of the sample statistic is consistent with some hypothesized value of the corresponding population parameter. If the observed and hypothesized values differ (as they almost certainly will), is the difference due to an incorrect hypothesis or merely due to chance variation?

1. Formulate the null hypothesis and the alternative hypothesis.

The null hypothesis $H_0$ is the statement being tested. Usually it states that the difference between the observed value and the hypothesized value is only due to chance variation. For example, $\mu = 16$ oz.

The alternative hypothesis $H_a$ is the statement we will favor if we find evidence that the null hypothesis is false. It usually states that there is a real difference between the observed and hypothesized values. For example, $\mu \neq 16$, $\mu > 16$, or $\mu < 16$.

A test is called
- two-sided if $H_a$ is of the form $\mu \neq \mu_0$;
- one-sided if $H_a$ is of the form $\mu > \mu_0$ or $\mu < \mu_0$.

Example: GRE Scores

The mean score of all examinees on the Verbal and Quantitative sections of the GRE is about 1040. Suppose 50 randomly sampled UC Berkeley graduate students have a mean GRE V+Q score of 1310. We are interested in determining whether a mean GRE V+Q score of 1310 gives evidence that, as a whole, Berkeley graduate students have a higher mean GRE score than the national average. What is $H_0$? What is $H_a$?

2. Calculate the test statistic on which the test will be based.

The test statistic measures the difference between the observed data and what would be expected if the null hypothesis were true. When $H_0$ is true, we expect the estimate based on the sample to take a value near the parameter value specified by $H_0$. Our goal is to answer the question, "How extreme is the value calculated from the sample, compared with what we would expect under the null hypothesis?" In many common situations the test statistic has the form

$$\text{test statistic} = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$$

For the Coke example, the sample mean is 15.94 oz and the population mean specified by the null hypothesis is 16 oz. A test statistic is

$$z = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3$$

(We'll have more to say about this in a moment.)

3. Find the p-value of the observed result.

The p-value is the probability of observing a test statistic as extreme as or more extreme than the one actually observed, assuming the null hypothesis $H_0$ is true. The smaller the p-value, the stronger the evidence against the null hypothesis. If the p-value is as small as or smaller than some number $\alpha$ (e.g. 0.01, 0.05), we say that the result is statistically significant at level $\alpha$.
$\alpha$ is called the significance level of the test. In the case of the Coke example, $p = 0.0013$ for a one-sided test or $p = 0.0026$ for a two-sided test. (Once again, we'll have more to say about this in a moment.)
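The Coke numbers can be reproduced with a few lines of Python. This is only a sketch: the helper `normal_cdf` and the variable names are illustrative choices, not part of the original slides, and the standard normal CDF is computed from the standard library's `math.erfc`.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, P(Z <= x), via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

# Values from the Coke example; H0 says mu = 16 oz.
mu0, xbar, sigma, n = 16.0, 15.94, 0.2, 100

standard_error = sigma / math.sqrt(n)    # SD of the sample mean
z = (xbar - mu0) / standard_error        # observed test statistic

p_one_sided = normal_cdf(z)              # P(Z <= z), alternative mu < 16
p_two_sided = 2.0 * normal_cdf(-abs(z))  # both tails, alternative mu != 16

print(f"z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.4f}")
print(f"two-sided p = {p_two_sided:.4f}")
```

Computed without intermediate rounding, the two-sided value is about 0.0027; doubling the rounded one-sided value 0.0013 gives the 0.0026 quoted in the text.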

Interpretation of the Significance Level

To perform a test of significance level $\alpha$, we perform the previous three steps and then reject $H_0$ if the p-value is as small as or smaller than $\alpha$.

The following outcomes are possible when conducting a test:

                       Reality
Our decision      H0 true            Ha true
H0                correct            Type II error
Ha                Type I error       correct

Suppose $H_0$ is actually true. If we draw many samples and perform a test for each one, a proportion $\alpha$ of these tests will (incorrectly) reject $H_0$. In other words, $\alpha$ is the probability that we will make a Type I error.

Type II error is related to the notion of the power of a test, which we will discuss later.

Example: An Exact Binomial Test

In the last 51 World Series (through 2003) there have been 24 seven-game series. Suppose we wish to test the hypothesis

$H_0$: Games within a World Series are independent, with each team having probability $1/2$ of winning.

For the alternative hypothesis, let's use the generic

$H_a$: The model in $H_0$ is incorrect.

Let $X$ denote the number of games in the World Series. Under $H_0$, $X$ has the following distribution:

k           4      5      6      7
P(X = k)    1/8    1/4    5/16   5/16

For our test statistic, let's just use $M$ = number of seven-game series. What is the p-value?

We need to find $m$ such that $P_{H_0}(M > m) \le 0.05$. Assuming different years' World Series are independent (i.e. that the last 51 World Series are an SRS from the population of World Series), the number of seven-game series in 51 trials is $B(51, 5/16)$.

$$P(M > 20) = 0.086 \qquad P(M > 21) = 0.049$$

We want to have a significance level of no more than 5%, so the critical value will be 21.

Do we reject $H_0$ at significance level $\alpha = 0.05$? This is just a matter of checking whether our observed value of $M$ (24) exceeds the critical value (21). It does, so we reject $H_0$.

Tests for a Population Mean

In the preceding example, we were able to perform an exact Binomial test.
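The exact calculation for the World Series example can be sketched in a few lines. This is a minimal sketch under the model in $H_0$, not the original course code: the helper names `series_length_pmf` and `binomial_upper_tail` are hypothetical, and the tail events are written as $P(M > 20) = P(M \ge 21)$ and $P(M > 21) = P(M \ge 22)$.

```python
from math import comb

def series_length_pmf(k: int) -> float:
    """P(X = k) for a best-of-seven series between evenly matched teams."""
    # The eventual winner takes game k and exactly 3 of the first k - 1 games;
    # the factor 2 covers either team being the winner.
    return 2 * comb(k - 1, 3) * 0.5**k

def binomial_upper_tail(n: int, p: float, m: int) -> float:
    """P(M >= m) for M ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

n_series = 51                    # World Series considered (from the example)
observed = 24                    # observed number of seven-game series
p_seven = series_length_pmf(7)   # chance of a seven-game series under H0

p_exceeds_20 = binomial_upper_tail(n_series, p_seven, 21)   # P(M > 20)
p_exceeds_21 = binomial_upper_tail(n_series, p_seven, 22)   # P(M > 21)
p_value = binomial_upper_tail(n_series, p_seven, observed)  # P(M >= 24)

print(f"P(seven games) = {p_seven}")
print(f"P(M > 20) = {p_exceeds_20:.3f}")
print(f"P(M > 21) = {p_exceeds_21:.3f}")
print(f"p-value for M = {observed}: {p_value:.4f}")
```

The last line reports the p-value for the observed count, that is, the probability under $H_0$ of seeing 24 or more seven-game series in 51 World Series.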
Frequently, an exact test is impractical, but we can use the approximate normality of means to conduct an approximate test.

Suppose we want to test the hypothesis that $\mu$ has a specific value:

$$H_0: \mu = \mu_0$$

Since $\bar{x}$ estimates $\mu$, the test is based on $\bar{x}$, which has a (perhaps approximately) Normal distribution. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

is a standard normal random variable under the null hypothesis.

p-values for different alternative hypotheses:
- $H_a: \mu > \mu_0$: the p-value is $P(Z \ge z)$ (area of the right-hand tail)
- $H_a: \mu < \mu_0$: the p-value is $P(Z \le z)$ (area of the left-hand tail)
- $H_a: \mu \neq \mu_0$: the p-value is $2P(Z \ge |z|)$ (area of both tails)

Example: Filling Coke Bottles (cont.)

We are interested in assessing whether or not the machine needs to be recalibrated, which will be the case if it is systematically over- or under-filling bottles. Thus, we will use the hypotheses

$$H_0: \mu = 16 \qquad H_a: \mu \neq 16$$

Recall that $\bar{x} = 15.94$, $\sigma = 0.2$, and $n = 100$. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = -3$$

The p-value for a two-sided test is $p = 2P(Z \ge 3) = 0.0026$.

If $\alpha = 0.01$, we reject $H_0$. If $\alpha = 0.05$, we reject $H_0$.
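The three tail-area rules can be collected into one small function. The sketch below assumes a known $\sigma$ and a normally distributed sample mean, as in the text; the function name `z_test` and the labels `"greater"`, `"less"`, and `"two-sided"` are illustrative choices, not notation from the slides.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, P(Z <= x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def z_test(xbar: float, mu0: float, sigma: float, n: int,
           alternative: str = "two-sided") -> tuple[float, float]:
    """Return (z, p-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if alternative == "greater":        # Ha: mu > mu0, right-hand tail
        p = normal_cdf(-z)              # equals P(Z >= z) by symmetry
    elif alternative == "less":         # Ha: mu < mu0, left-hand tail
        p = normal_cdf(z)
    elif alternative == "two-sided":    # Ha: mu != mu0, both tails
        p = 2.0 * normal_cdf(-abs(z))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p

# Coke example: two-sided test of H0: mu = 16.
z, p = z_test(xbar=15.94, mu0=16.0, sigma=0.2, n=100)
for alpha in (0.01, 0.05):
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

Keeping the alternative as an explicit argument makes it harder to switch from a two-sided to a one-sided test without noticing, since the choice has to be written down before the p-value is computed.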

Example: TV Tubes

TV tubes are taken at random and the lifetime measured. $n = 100$, $\sigma = 300$ and $\bar{x} = 1265$ (days). Test whether the population mean is 1200, or greater than 1200.

$$H_0: \mu = 1200 \qquad H_a: \mu > 1200$$

Under $H_0$, $\bar{x} \sim N(1200, 30)$, where 30 is the standard deviation $\sigma/\sqrt{n} = 300/\sqrt{100}$ of the sample mean, so

$$z = \frac{\bar{x} - 1200}{30} \sim N(0, 1) \text{ under } H_0$$

The test statistic is

$$z = \frac{1265 - 1200}{30} = 2.17$$

and the p-value is $P(Z \ge 2.17 \mid H_0) = 0.015$.

This is evidence against $H_0$ at significance level 0.05, so we reject $H_0$. That is, we conclude that the average lifetime of TV tubes is greater than 1200 days.

A Rough Interpretation of p-values

p-value                  Interpretation
p > 0.10                 no evidence against H0
0.05 < p <= 0.10         weak evidence against H0
0.01 < p <= 0.05         evidence against H0
p <= 0.01                strong evidence against H0

Statistical vs. Practical Significance

Saying that a result is statistically significant does not signify that it is large or necessarily important. That decision depends on the particulars of the problem. A statistically significant result only says that there is substantial evidence that $H_0$ is false.

Failure to reject $H_0$ does not imply that $H_0$ is correct. It only implies that we have insufficient evidence to conclude that $H_0$ is incorrect.

Confidence Intervals and Hypothesis Tests

A level $\alpha$ two-sided test rejects a hypothesis $H_0: \mu = \mu_0$ exactly when the value of $\mu_0$ falls outside a $(1 - \alpha)$ confidence interval for $\mu$.

For example, consider a two-sided test of the hypotheses

$$H_0: \mu = \mu_0 \qquad H_a: \mu \neq \mu_0$$

at the significance level $\alpha = 0.05$.

If $\mu_0$ is a value inside the 95% confidence interval for $\mu$, then this test will have a p-value greater than 0.05, and therefore will not reject $H_0$.
If $\mu_0$ is a value outside the 95% confidence interval for $\mu$, then this test will have a p-value smaller than 0.05, and therefore will reject $H_0$.

Example

A particular area contains 8000 condominium units. In a survey of the occupants, a simple random sample of size 100 yields the information that there are 160 motor vehicles in the sample, giving an average number of motor vehicles per unit of 1.6, with a sample standard deviation of 0.8. Construct a confidence interval for the total number of vehicles in the area.

The city claims that there are only 11,000 vehicles in the area, so there is no need for a new garage. What do you think?
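One way to work the condominium example is sketched below. Several choices here are assumptions rather than part of the problem statement: a 95% confidence level with the normal critical value 1.96, the sample standard deviation used in place of $\sigma$, and no finite-population correction (the sample is 100 of 8000 units). The variable names are illustrative.

```python
import math

units_in_area = 8000
n, mean_per_unit, sample_sd = 100, 1.6, 0.8
city_claim = 11_000

z_star = 1.96  # assumed: normal critical value for a 95% interval

# Confidence interval for the mean number of vehicles per unit.
standard_error = sample_sd / math.sqrt(n)
mean_low = mean_per_unit - z_star * standard_error
mean_high = mean_per_unit + z_star * standard_error

# Scale the per-unit interval up to the total for all units in the area.
total_low = units_in_area * mean_low
total_high = units_in_area * mean_high

claim_inside = total_low <= city_claim <= total_high
print(f"95% CI for the total: ({total_low:.0f}, {total_high:.0f})")
print(f"city claim inside interval: {claim_inside}")
```

Under these assumptions the interval for the total runs from roughly 11,500 to 14,100 vehicles, so the city's figure of 11,000 falls below it. By the correspondence between confidence intervals and tests, a two-sided test at level 0.05 would reject the hypothesis that the total is 11,000.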

More on Constructing Hypothesis Tests

Hypotheses always refer to some population or model, not to a particular outcome. As a result, $H_0$ and $H_a$ must be expressed in terms of some population parameter or parameters.

$H_a$ typically expresses the effect that we hope to find evidence for, so $H_a$ is usually carefully thought out first. We then set up $H_0$ to be the case when the hoped-for effect is not present.

It is not always clear whether $H_a$ should be one-sided or two-sided, i.e., whether the parameter differs from its null hypothesis value in a specified direction.

Note: You are not allowed to look at the data first and then frame $H_a$ to fit what the data show.

Potential Abuses of Tests

In many applications, a researcher constructs a null hypothesis with the intent of discrediting it. For example:
- $H_0$: the new drug has the same effect as a placebo
- $H_0$: men and women are paid equally

A small p-value can help a drug company get a drug approved by the FDA. Similarly, a researcher may have an easier time publishing results if the p-value is smaller than 0.05. Because of that, we have to be aware of the following potential abuses:
- Using one-sided tests to make the p-value one-half as big
- Conducting repeated sampling and testing and reporting only the lowest p-value
- Testing many hypotheses, or testing the same hypothesis on many different subgroups

In the last two, even if there is actually no effect, you will probably get at least one small p-value.
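The size of this multiple-testing effect is easy to quantify in an idealized setting. The sketch below assumes that every null hypothesis is true, that the tests are independent, and that each test rejects with probability exactly $\alpha$; real subgroup analyses are usually correlated, so this is only a rough guide. The function name is hypothetical.

```python
def prob_at_least_one_rejection(num_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p-value <= alpha) when all nulls are true.

    Assumes independent tests, each with Type I error probability alpha.
    """
    return 1.0 - (1.0 - alpha) ** num_tests

for num_tests in (1, 5, 20, 100):
    chance = prob_at_least_one_rejection(num_tests)
    print(f"{num_tests:3d} tests: {chance:.3f}")
```

With 20 independent tests at $\alpha = 0.05$, the chance of at least one "significant" result is already about 64%, even though no effect exists in any of them.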