SESUG 2012
Paper SD-7
Sample Size Determination for a Nonparametric Upper Tolerance Limit for any Order Statistic
Dr. Dennis Beal, Science Applications International Corporation, Oak Ridge, Tennessee

ABSTRACT
A nonparametric upper tolerance limit (UTL) bounds a specified percentage of the population distribution with specified confidence. The most common UTL is based on the largest order statistic (the maximum), where the number of samples required for a given confidence and coverage is easily derived for an infinitely large population. However, for other order statistics such as the second largest, third largest, etc., the equations used to determine the number of samples needed to achieve a specified confidence and coverage become more complex as the order statistic decreases from the maximum, because they require the incomplete Beta function. This paper uses the theory of order statistics to derive the equations from the incomplete Beta distribution for calculating the sample size for a one-sided nonparametric UTL using any order statistic. SAS code is shown that performs these calculations in a single macro. The number of samples required for various order statistics is compared for the incomplete Beta function, the normal approximation to the binomial, and the binomial distribution. Examples of SAS code are shown for each method. The binomial distribution is shown to be the most accurate for calculating the proper order statistic for any number of samples. This paper is for intermediate users of Base SAS who understand statistical intervals, statistical distributions, and SAS macros.
Key words: upper tolerance limit, macros, order statistics, sample size, confidence, coverage, binomial

INTRODUCTION
A one-sided distribution-free (nonparametric) upper tolerance limit (UTL) is equivalent to a one-sided distribution-free confidence bound for a percentile of that population. Since it is nonparametric, no distributional assumptions, such as normality, lognormality, gamma, or any other continuous distribution, are necessary. The nonparametric UTL does assume the data collected are randomly selected from an infinitely large population, are statistically independent samples, and are statistically representative of the population. UTLs have both a confidence and a coverage attribute.
The coverage of a UTL is the percentage of the population distribution that is bounded by the order statistic from the sample. The confidence of a UTL is how confident one is that the specified order statistic bounds that percentile of the population distribution; it is denoted 100×(1 − α)%, where α is the Type I error rate (0 < α < 1). A Type I error (α) is the probability of rejecting the null hypothesis when in fact the null hypothesis is true. Once the confidence, coverage, and desired order statistic are specified, the minimum number of samples (n) necessary to achieve these parameters can be calculated using the SAS code presented in this paper. For example, if α = 0.05 and the coverage p = 0.90, using the largest order statistic from n = 29 samples, then we would be 95% confident that the maximum from the 29 samples bounds at least 90% of the continuous population distribution. The SAS code uses the SAS System for personal computers version 9.3 running on Windows 7.

THEORY OF ORDER STATISTICS
A one-sided nonparametric UTL assuming an infinitely large population uses an incomplete Beta function described in Beyer (1966). The equation to solve for the number of samples (n) is shown in Equation 1.

$$\alpha \ge \frac{\Gamma(n+1)}{\Gamma(u)\,\Gamma(n+1-u)}\int_0^p x^{u-1}(1-x)^{n-u}\,dx \qquad (1)$$

where u = the order statistic of interest (u = n for the maximum, u = n − 1 for the second to maximum, etc.), Γ(n + 1) = n!, α = Type I error rate (0 < α < 1), and p = coverage (0 < p < 1).

LARGEST ORDER STATISTIC
For the largest (maximum) concentration, where u = n, Equation 1 reduces to Equation 2.
$$\alpha \ge n\int_0^p x^{n-1}\,dx \qquad (2)$$

Integrating Equation 2 yields Equation 3.

$$\alpha \ge p^{n} \qquad (3)$$

Solving Equation 3 for n yields Equation 4 for the one-sided UTL based on the maximum (the inequality reverses because ln p < 0).

$$n \ge \frac{\ln \alpha}{\ln p} \qquad (4)$$

For example, if α = 0.05 and p = 0.90, then ln(0.05)/ln(0.90) ≈ 28.4, so Equation 4 requires a minimum of n = 29 samples from an infinitely large population for a one-sided nonparametric UTL using the maximum order statistic. The maximum from the 29 samples bounds at least 90% of the population distribution with 95% confidence. Equation 4 is also shown and derived in Hahn and Meeker (1991).

SECOND LARGEST ORDER STATISTIC
However, suppose one suspects there is a high likelihood that an outlier could be part of the 29 samples. Including an outlier would force the outlier, as the maximum, to be the one-sided nonparametric UTL. This could underestimate the percentage of the population distribution that the outlier bounds with 95% confidence. Therefore, we want to calculate the number of samples required so that the second largest order statistic can be used as the UTL in the event a single outlier is present in the sample. Using Equation 1 with u = n − 1 for the second largest order statistic, Equation 1 reduces to Equation 5.

$$\alpha \ge n(n-1)\int_0^p x^{n-2}(1-x)\,dx \qquad (5)$$

Integrating Equation 5 yields Equation 6, as shown in Hahn and Meeker (1991).

$$\alpha \ge n\,p^{n-1} - (n-1)\,p^{n} \qquad (6)$$

However, there is no closed-form solution for solving Equation 6 for n as a function of p and α. Therefore, after specifying p and α, Equation 6 can be solved using any number of standard numerical techniques. The easiest method is to insert the function from Equation 6 into a SAS DO loop that increments n by one, evaluating Equation 6 until n is found such that the right-hand side of Equation 6 is ≤ α. Using this technique shows that n = 46 is the minimum sample size that causes Equation 6 to be true for α = 0.05 and p = 0.90 (the right-hand side is about 0.0524 at n = 45 and 0.0480 at n = 46). Therefore, if the sample size increases from n = 29 to n = 46, then in the event of a single outlier the second largest result in the sample of 46, instead of the maximum, is the one-sided nonparametric UTL with 95% confidence and 90% coverage. Equations 4 and 6 can be used for any confidence 100×(1 − α)% and coverage p.

THIRD LARGEST ORDER STATISTIC
Suppose one suspects there could be at most two outliers that could be part of the 46 samples.
Then we want to calculate the number of samples required so that the third largest order statistic can be used as the UTL in the event two outliers are present in the sample. Using Equation 1 with u = n − 2 for the third largest order statistic, Equation 1 reduces to Equation 7.

$$\alpha \ge \frac{n(n-1)(n-2)}{2!}\int_0^p x^{n-3}(1-x)^{2}\,dx \qquad (7)$$

Integrating Equation 7 yields Equation 8.
$$\alpha \ge \frac{n(n-1)(n-2)}{2!}\left[\frac{p^{n-2}}{n-2} - \frac{2\,p^{n-1}}{n-1} + \frac{p^{n}}{n}\right] \qquad (8)$$

There also is no closed-form solution for solving Equation 8 for n as a function of p and α. Therefore, after specifying p and α, Equation 8 can be solved using SAS by incrementing n by one and evaluating Equation 8 until n is found such that the right-hand side of Equation 8 is ≤ α. Using this technique shows that n = 61 is the minimum sample size that causes Equation 8 to be true for α = 0.05 and p = 0.90 (the right-hand side is about 0.0530 at n = 60 and 0.0491 at n = 61). Therefore, if the sample size increases from n = 46 to n = 61, then the third largest result in the sample of 61 is the one-sided nonparametric UTL. Equations 4, 6, and 8 can be used for any confidence 100×(1 − α)% and coverage p.

FOURTH LARGEST ORDER STATISTIC
Suppose we want to calculate the number of samples required so that the fourth largest order statistic can be used as the UTL in the event three outliers are present in the sample. Using Equation 1 with u = n − 3 for the fourth largest order statistic, Equation 1 reduces to Equation 9.

$$\alpha \ge \frac{n(n-1)(n-2)(n-3)}{3!}\int_0^p x^{n-4}(1-x)^{3}\,dx \qquad (9)$$

Integrating Equation 9 yields Equation 10.

$$\alpha \ge \frac{n(n-1)(n-2)(n-3)}{3!}\left[\frac{p^{n-3}}{n-3} - \frac{3\,p^{n-2}}{n-2} + \frac{3\,p^{n-1}}{n-1} - \frac{p^{n}}{n}\right] \qquad (10)$$

There also is no closed-form solution for solving Equation 10 for n as a function of p and α. Therefore, after specifying p and α, Equation 10 can be solved using SAS by incrementing n by one and evaluating Equation 10 until n is found such that the right-hand side of Equation 10 is ≤ α. Using this technique shows that n = 76 is the minimum sample size that causes Equation 10 to be true for α = 0.05 and p = 0.90 (the right-hand side is about 0.0504 at n = 75 and 0.0470 at n = 76). Therefore, if the sample size increases from n = 61 to n = 76, then the fourth largest result in the sample of 76 is the one-sided nonparametric UTL. Equations 4, 6, 8, and 10 can be used for any confidence 100×(1 − α)% and coverage p.

ANY ORDER STATISTIC
Using mathematical induction, we can derive the equation used to solve for n for the (m + 1)th observation from the maximum, for 0 ≤ m ≤ n − 1. So m = 0 corresponds to the largest order statistic (maximum), m = 1 is the second largest order statistic, etc. Using Equation 1 with u = n − m for the (m + 1)th largest observation, Equation 1 reduces to Equation 11.

$$\alpha \ge \frac{\prod_{i=0}^{m}(n-i)}{m!}\int_0^p x^{n-m-1}(1-x)^{m}\,dx \qquad (11)$$

Expanding (1 − x)^m with the binomial theorem and integrating term by term yields Equation 12.

$$\alpha \ge \frac{\prod_{i=0}^{m}(n-i)}{m!}\sum_{i=0}^{m}\frac{(-1)^{i}\,m!}{i!\,(m-i)!}\,\frac{p^{\,n-m+i}}{n-m+i} \qquad (12)$$

There is no closed-form solution for solving Equation 12 for n as a function of p and α. Therefore, after specifying m, p, and α, Equation 12 can be solved using SAS by incrementing n by one and evaluating Equation 12 until the right-hand side of Equation 12 is ≤ α. Equations 4, 6, 8, 10, and 12 can be used for any confidence 100×(1 − α)% and coverage p.
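As an independent cross-check outside SAS, the Python sketch below evaluates the right-hand side of Equation 12 in floating point and increments n by one until it drops to α or below, mirroring the DO-loop search described above. The function names (`eq12_rhs`, `min_n_eq12`) and the search cap `n_max` are illustrative assumptions, not part of the paper; the leading coefficient uses the identity ∏(n − i)/m! = (m + 1)·C(n, m + 1).

```python
from math import comb


def eq12_rhs(n: int, m: int, p: float) -> float:
    """Right-hand side of Eq. 12 for the (m+1)th largest of n values."""
    # prod_{i=0}^{m} (n - i) / m!  equals  (m + 1) * C(n, m + 1)
    coef = (m + 1) * comb(n, m + 1)
    # sum_{i=0}^{m} (-1)^i * C(m, i) * p^(n-m+i) / (n-m+i)
    total = sum((-1) ** i * comb(m, i) * p ** (n - m + i) / (n - m + i)
                for i in range(m + 1))
    return coef * total


def min_n_eq12(m: int, p: float, alpha: float, n_max: int = 100_000) -> int:
    """Increment n by one until the right-hand side of Eq. 12 is <= alpha."""
    for n in range(m + 1, n_max + 1):
        if eq12_rhs(n, m, p) <= alpha:
            return n
    raise ValueError("no n <= n_max satisfies Eq. 12")


if __name__ == "__main__":
    # alpha = 0.05, p = 0.90: maximum (m = 0) through fourth largest (m = 3)
    print([min_n_eq12(m, 0.90, 0.05) for m in range(4)])
```

Run as a script, it prints [29, 46, 61, 76], the sample sizes derived above for the maximum through the fourth largest order statistic. The alternating sum involves cancellation between large terms, so results for larger m should be treated with care.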
SAS CODE FOR EQUATION 12
The SAS code that solves Equation 12 for any α, p, and m is shown in the SAS macro UTL below. The macro parameter NUM = m + 1 identifies the order statistic (1 = maximum).

%macro utl(num);  ** num = order statistic (1=max, 2=second to max, 3=third to max, etc.);
data a&num;
  NUM = &num;
  do p = .95;                        ** coverage desired;
    do alpha = .05;
      CONF = (1 - alpha) * 100;      ** confidence as a percent;
      n = num - 1;
      f = 1;                         ** initialize f(n) = 1;
      do until (n = 2000);
        n + 1;                       ** increment n by one;
        %do t = 1 %to &num;
          T&t = (-1)**(&t + 1) * comb(num - 1, &t - 1) * p**(&t - 1) / (n - num + &t);
        %end;
        f = comb(n, n - num) * num * p**(n - num + 1) * sum(of T1-T&num) - alpha;  ** Eq. 12;
        output;                      ** smallest n with f <= 0 is the required sample size;
      end;
    end;
  end;
run;

proc print data=a&num;
  title "&num";
run;
%mend utl;

%utl(1)

NORMAL APPROXIMATION TO THE BINOMIAL
In practice, Equation 12 has been shown to be overly conservative for estimating the UTL when m is large. Equation 12 is best used when np < 5 or n(1 − p) < 5. Equation 12 solves correctly for n for m ≤ 9, but the function has multiple roots close together for m > 9, making it difficult to choose the proper value of n. When n is large enough, the normal approximation to the binomial distribution provides a more accurate order statistic than Equation 12. When np ≥ 5 and n(1 − p) ≥ 5, Equation 13 (U.S. Environmental Protection Agency 2010) can be used to determine the proper order statistic k (1 ≤ k ≤ n) for a one-sided nonparametric UTL.

$$k = np + z_{1-\alpha}\sqrt{np(1-p)} + 0.5 \qquad (13)$$

The $z_{1-\alpha}$ term in Equation 13 is the deviate from the standard normal distribution associated with a 100×(1 − α)% one-sided confidence interval. For example, when α = 0.05 for a 95% confidence interval, $z_{0.95} = 1.645$. The 0.5 term in Equation 13 is included as a continuity correction, because the continuous normal distribution approximates the discrete binomial distribution.

SAS CODE FOR EQUATION 13
The SAS code that calculates the order statistics k from Equation 13, using the normal distribution to approximate the binomial distribution, for any α, p, and n is shown below.

data a;
  p = .95;                             ** coverage;
  conf = .95;                          ** confidence, 1 - alpha;
  z = probit(conf);                    ** z(1-alpha) from the standard normal distribution;
  do n = 1 to 4000;                    ** n = number of samples;
    k = n*p + z*sqrt(n*p*(1-p)) + .5;  ** k = order statistic (Eq. 13);
    m = n - k;                         ** m = observations below the maximum;
    output;
  end;
run;

proc print data=a;
run;
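For readers without SAS at hand, the sketch below applies Equation 13 in Python, with the standard library's `statistics.NormalDist` supplying the normal deviate. It assumes, as one plausible reading of how the Normal column of Table 1 was tabulated, that the required sample size for a given m is the smallest n whose continuous m = n − k is at least that value; the helper names `eq13_order_stat` and `min_n_normal` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def eq13_order_stat(n: int, p: float, conf: float) -> float:
    """Order statistic k from Eq. 13 (normal approximation to the binomial)."""
    z = NormalDist().inv_cdf(conf)  # z_(1-alpha); about 1.645 for 95% confidence
    return n * p + z * sqrt(n * p * (1 - p)) + 0.5  # + 0.5 continuity correction


def min_n_normal(m: int, p: float, conf: float, n_max: int = 100_000) -> int:
    """Smallest n for which n - k from Eq. 13 is at least m (assumed criterion)."""
    for n in range(1, n_max + 1):
        if n - eq13_order_stat(n, p, conf) >= m:
            return n
    raise ValueError("no n <= n_max satisfies the criterion")


if __name__ == "__main__":
    # 95% coverage, 95% confidence; m = 0..3 observations below the maximum
    for m in range(4):
        print(m, min_n_normal(m, 0.95, 0.95))
```

With 95% coverage and 95% confidence this gives 70, 103, 133, and 161 for m = 0 through 3, which match the first rows of the Normal column in Table 1.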
THE BINOMIAL DISTRIBUTION
The order statistic k (1 ≤ k ≤ n) can be calculated directly and exactly from the binomial distribution. The cumulative binomial distribution is shown in Equation 14.

$$1-\alpha \le \sum_{i=0}^{k-1}\frac{n!}{i!\,(n-i)!}\,p^{i}(1-p)^{n-i} \qquad (14)$$

Equation 14 is used to calculate the smallest order statistic k such that the cumulative binomial distribution equals or exceeds the confidence coefficient 1 − α.

SAS CODE FOR EQUATION 14
The SAS code that calculates the exact order statistics k from Equation 14, using the cumulative binomial distribution, for any α, p, and n is shown below.

data b;
  p = .95;                             ** coverage;
  conf = .95;                          ** confidence, 1 - alpha;
  do n = 2 to 2000;                    ** n = number of samples;
    do k = 1 to n;                     ** k = order statistic (1 <= k <= n);
      m = n - k;                       ** m = number of observations below maximum (0 <= m <= n-1);
      prob = probbnml(p, n, k - 1);    ** cumulative binomial distribution (Eq. 14);
      if prob >= conf then do;
        output;                        ** smallest k that meets the confidence;
        leave;                         ** go on to the next n;
      end;
    end;
  end;
run;

proc print data=b;
run;

RESULTS
The results from implementing the SAS code for Equations 12, 13, and 14 for α = 0.05 (95% confidence) and p = 0.95 (95% coverage) are shown in Table 1, where m = number of observations below the maximum (0 ≤ m ≤ 30). For example, m = 0 is the maximum, m = 1 is the second largest order statistic, etc. Table 1 shows that the incomplete Beta function (Eq. 12) and the binomial distribution (Eq. 14) agree exactly for m ≤ 9 (down to the 10th largest order statistic), while the normal approximation (Eq. 13) requires from 7 to 11 more samples for m ≤ 9. For m > 9, the incomplete Beta function diverges much higher than both the normal approximation and the binomial, making the incomplete Beta function overly conservative because it requires many more samples than are necessary to bound the 95th percentile with 95% confidence. The normal approximation consistently requires 6 or 7 more samples than the binomial for m > 9, but requires fewer samples than the incomplete Beta. Figure 1 shows that the normal approximation to the binomial distribution plots consistently slightly above the binomial distribution for 0 ≤ m ≤ 30. However, the incomplete Beta function diverges from both the normal approximation to the binomial and the binomial beginning at m = 10, and the divergence increases as m increases.
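| m | Minimum n* using Incomplete Beta (Eq. 12) | Minimum n* using Normal (Eq. 13) | Minimum n* using Binomial (Eq. 14) |
|---|---|---|---|
| 0 | 59 | 70 | 59 |
| 1 | 93 | 103 | 93 |
| 2 | 124 | 133 | 124 |
| 3 | 153 | 161 | 153 |
| 4 | 181 | 189 | 181 |
| 5 | 208 | 216 | 208 |
| 6 | 234 | 242 | 234 |
| 7 | 260 | 268 | 260 |
| 8 | 286 | 293 | 286 |
| 9 | 311 | 318 | 311 |
| 10 | 345 | 343 | 336 |
| 11 | 436 | 368 | 361 |
| 12 | 577 | 392 | 386 |
| 13 | 745 | 417 | 410 |
| 14 |  | 441 | 434 |
| 15 |  | 465 | 458 |
| 16 | 1151 | 489 | 482 |
| 17 | 1287 | 513 | 506 |
| 18 | 1418 | 536 | 530 |
| 19 | 1535 | 560 | 554 |
| 20 | 1682 | 584 | 577 |
| 21 | 1799 | 607 | 601 |
| 22 | 1916 | 630 | 624 |
| 23 | 2058 | 654 | 647 |
| 24 | 2186 | 677 | 671 |
| 25 | 2293 | 700 | 694 |
| 26 | 2421 | 723 | 717 |
| 27 | 2559 | 746 | 740 |
| 28 | 2679 | 769 | 763 |
| 29 | 2788 | 792 | 786 |
| 30 | 2939 | 815 | 809 |

* Number of samples assuming 95% confidence with 95% coverage

Table 1. Minimum Sample Sizes for One-Sided Nonparametric UTLs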
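The Binomial column of Table 1 can also be cross-checked with a short Python sketch of Equation 14. Rather than summing Equation 14 directly, it uses the equivalent complementary form: with k = n − m, Equation 14 holds exactly when the sum over j = 0, …, m of C(n, j)(1 − p)^j p^(n − j) is ≤ α. This sums only m + 1 terms and keeps the binomial coefficients small. The names `binomial_tail` and `min_n_binomial` and the search cap `n_max` are illustrative assumptions.

```python
from math import comb


def binomial_tail(n: int, m: int, p: float) -> float:
    """1 minus the right-hand side of Eq. 14 for k = n - m (sum over j = 0..m)."""
    return sum(comb(n, j) * (1 - p) ** j * p ** (n - j) for j in range(m + 1))


def min_n_binomial(m: int, p: float, alpha: float, n_max: int = 100_000) -> int:
    """Smallest n for which the (m+1)th largest value is a 100(1-alpha)% UTL."""
    for n in range(m + 1, n_max + 1):
        if binomial_tail(n, m, p) <= alpha:
            return n
    raise ValueError("no n <= n_max satisfies Eq. 14")


if __name__ == "__main__":
    # 95% coverage, 95% confidence; m = 0..3 observations below the maximum
    print([min_n_binomial(m, 0.95, 0.05) for m in range(4)])
```

This prints [59, 93, 124, 153], the first four Binomial entries in Table 1. With p = 0.90 it returns 29, 46, 61, and 76 for m = 0 through 3, the same values obtained from Equations 4, 6, 8, and 10.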
Figure 1. Number of Samples (n) with the Number of Observations Below the Maximum (m) by Method. [Plot for 95% confidence with 95% coverage: Number of Samples (n), 0 to 3000, versus Number of Observations Below Maximum (m), 0 to 30, for the Exact Binomial, Incomplete Beta, and Normal Approximation methods.]

CONCLUSION
While the most commonly used one-sided nonparametric UTL is based on the largest order statistic (the maximum), the incomplete Beta function can be used with other order statistics, such as the second largest, third largest, etc., to determine the number of samples needed to achieve a specified confidence and coverage. The equation to use for any order statistic was derived in general for the incomplete Beta function. SAS code was presented in a single macro to calculate the number of samples required for a specified confidence and coverage for any order statistic. The incomplete Beta function performed well for the 10 largest order statistics, but provided overly conservative estimates beginning with the 11th largest order statistic. Beyond the 10 largest order statistics, the normal approximation to the binomial consistently requires 6 or 7 more samples than the cumulative binomial distribution. SAS code was presented to calculate the order statistics for the incomplete Beta, the normal approximation to the binomial, and the binomial distribution. Since the cumulative binomial distribution can be calculated easily in SAS, the binomial distribution has been shown to be the preferred method for calculating the proper order statistic for any number of samples. Examples of calculations using the SAS code for the three methods were shown for the 31 largest order statistics for 95% confidence and 95% coverage.

REFERENCES
Beyer, W. 1966. Handbook of Tables for Probability and Statistics. 251. Boca Raton, Florida: CRC Press, Inc.
Hahn, G. and W. Meeker. 1991. Statistical Intervals: A Guide for Practitioners. 91-92. New York, New York: John Wiley & Sons, Inc.
U.S. Environmental Protection Agency. 2010 (May). ProUCL Version 4.1 Technical Guide: Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations. 88. EPA/600/R-07/041. Washington, DC.
CONTACT INFORMATION
The author welcomes and encourages any questions, corrections, feedback, and remarks. Contact the author at:
Dennis J. Beal, Ph.D.
Senior Statistician / Risk Scientist
Science Applications International Corporation
151 Lafayette Drive
Oak Ridge, Tennessee 37831
phone: 865-481-8736
e-mail: beald@saic.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.