Paper SD-07. Key words: upper tolerance limit, macros, order statistics, sample size, confidence, coverage, binomial




SESUG 2012 Paper SD-07

Sample Size Determination for a Nonparametric Upper Tolerance Limit for any Order Statistic

D. Dennis Beal, Science Applications International Corporation, Oak Ridge, Tennessee

ABSTRACT

A nonparametric upper tolerance limit (UTL) bounds a specified percentage of the population distribution with specified confidence. The most common UTL is based on the largest order statistic (the maximum) where the number of samples required for a given confidence and coverage is easily derived for an infinitely large population. However, for other order statistics such as the second largest, third largest, etc., the equations used to determine the number of samples to achieve a specified confidence and coverage become more complex using the incomplete Beta function as the order statistic decreases from the maximum. This paper uses the theory of order statistics to derive the equations from the incomplete Beta distribution for calculating the sample size for a one-sided nonparametric UTL using any order statistic. SAS code is shown that performs these calculations in a single macro. The number of samples required for various order statistics is compared for the incomplete Beta function, the normal approximation to the binomial and the binomial distribution. Examples of SAS code are shown for each method. The binomial distribution is shown to be the most accurate for calculating the proper order statistic for any number of samples. This paper is for intermediate SAS users of Base SAS who understand statistical intervals, statistical distributions and SAS macros.

Key words: upper tolerance limit, macros, order statistics, sample size, confidence, coverage, binomial

INTRODUCTION

A one-sided distribution-free (nonparametric) upper tolerance limit (UTL) is equivalent to a one-sided distribution-free confidence bound for a percentile of that population. Since it is nonparametric, no distributional assumptions are necessary such as normality, lognormality, gamma or any other continuous distribution. The nonparametric UTL does assume the data collected are randomly selected from an infinitely large population, are statistically independent samples and are statistically representative of the population. UTLs have both a confidence and coverage attribution.

The coverage of a UTL is the percentage of the population distribution that is bounded by the order statistic from the sample. The confidence of a UTL is how confident one is that the specified order statistic bounds the percentile of the population distribution and is denoted 100(1 - α)% where α is the Type I error rate (0 < α < 1). A Type I error (α) is the probability of rejecting the null hypothesis when in fact the null hypothesis is true. Once the confidence, coverage and desired order statistic are specified, the minimum number of samples (n) necessary to achieve these parameters can be calculated using the SAS code presented in this paper. For example, if α = 0.05 and p = 0.90 using the largest order statistic from n = 29 samples, then we would be 95% confident that the maximum from the 29 samples bounds at least 90% of the continuous population distribution. The SAS code uses the SAS System for personal computers version 9.3 running on Windows 7.

THEORY OF ORDER STATISTICS

A one-sided nonparametric UTL assuming an infinitely large population uses an incomplete Beta function described in Beyer (Beyer 1966). The equation to solve for the number of samples (n) is shown in Equation 1.

$$\alpha \ge \frac{\Gamma(n+1)}{\Gamma(u)\,\Gamma(n+1-u)} \int_0^p x^{u-1}(1-x)^{n-u}\,dx \tag{1}$$

where
u = the order statistic of interest (u = n for the maximum, u = n - 1 for the second to maximum, etc.),
Γ(n + 1) = n!,
α = Type I error rate (0 < α < 1),
p = coverage (0 < p < 1).

LARGEST ORDER STATISTIC

Therefore, Equation 1 reduces to Equation 2 for the largest or maximum concentration where u = n.

$$\alpha \ge n \int_0^p x^{n-1}\,dx \tag{2}$$

Integrating Equation 2 yields Equation 3.

$$\alpha \ge p^{n} \tag{3}$$

Solving Equation 3 for n yields Equation 4 using the maximum for the one-sided UTL.

$$n \ge \frac{\ln \alpha}{\ln p} \tag{4}$$

For example, if α = 0.05 and p = 0.90, then a minimum of n = 29 samples from an infinitely large population are needed for a one-sided nonparametric UTL using the maximum order statistic from Equation 4. The maximum from the 29 samples bounds at least 90% of the population distribution with 95% confidence. Equation 4 is also shown and derived in Hahn and Meeker (1991).

SECOND LARGEST ORDER STATISTIC

However, suppose one suspects there is a high likelihood that an outlier could be part of the 29 samples. Including an outlier would force the outlier as the maximum to be the one-sided nonparametric UTL. This could underestimate the percentage of the population distribution that the outlier bounds with 95% confidence. Therefore, we want to calculate the number of samples required so the second largest order statistic can be used as the UTL in the event a single outlier is present in the sample. Using Equation 1 where u = n - 1 for the second largest order statistic, Equation 1 reduces to Equation 5.

$$\alpha \ge n(n-1) \int_0^p x^{n-2}(1-x)\,dx \tag{5}$$

Integrating Equation 5 yields Equation 6 as shown in Hahn and Meeker (1991).

$$\alpha \ge n\,p^{n-1} - (n-1)\,p^{n} \tag{6}$$

However, there is no closed form solution for solving Equation 6 for n as a function of p and α. Therefore, after specifying α and p, Equation 6 can be solved using any number of standard analytical techniques. The easiest method is to insert the function from Equation 6 into a SAS do loop that increments n by one and evaluates Equation 6 until n is found so that the right hand side of Equation 6 is ≤ α. Using this technique shows that n = 46 is the minimum sample size that causes Equation 6 to be true for α = 0.05 and p = 0.90. Therefore, if the sample size increases from n = 29 to n = 46, then the second largest result in the sample of 46 is the one-sided nonparametric UTL instead of the maximum order statistic in the event of a single outlier with 95% confidence and 90% coverage. Equations 4 and 6 can be used for any confidence 100(1 - α)% and coverage p.

THIRD LARGEST ORDER STATISTIC

Suppose one suspects there could be at most two outliers that could be part of the 46 samples. Then we want to calculate the number of samples required so the third largest order statistic can be used as the UTL in the event two outliers are present in the sample. Using Equation 1 where u = n - 2 for the third largest order statistic, Equation 1 reduces to Equation 7. Integrating Equation 7 yields Equation 8.

$$\alpha \ge \frac{n(n-1)(n-2)}{2!} \int_0^p x^{n-3}(1-x)^{2}\,dx \tag{7}$$
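The integration step for the third largest order statistic can be made explicit. As a short worked sketch (valid for n ≥ 3), expanding the squared term of the integrand in Equation 7 and integrating each power of x separately gives:

```latex
\int_0^p x^{n-3}(1-x)^{2}\,dx
  = \int_0^p \left( x^{n-3} - 2x^{n-2} + x^{n-1} \right) dx
  = \frac{p^{n-2}}{n-2} - \frac{2\,p^{n-1}}{n-1} + \frac{p^{n}}{n}
```

Multiplying this result by the leading factor n(n - 1)(n - 2)/2! from Equation 7 gives the bound on α in its integrated form.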

$$\alpha \ge \frac{n(n-1)(n-2)}{2!} \left[ \frac{p^{n-2}}{n-2} - \frac{2\,p^{n-1}}{n-1} + \frac{p^{n}}{n} \right] \tag{8}$$

There also is no closed form solution for solving Equation 8 for n as a function of p and α. Therefore, after specifying α and p, Equation 8 can be solved using SAS by incrementing n by one and evaluating Equation 8 until n is found so that the right hand side of Equation 8 is ≤ α. Using this technique shows that n = 61 is the minimum sample size that causes Equation 8 to be true for α = 0.05 and p = 0.90. Therefore, if the sample size increases from n = 46 to n = 61, then the third largest result in the sample of 61 is the one-sided nonparametric UTL. Equations 4, 6 and 8 can be used for any confidence 100(1 - α)% and coverage p.

FOURTH LARGEST ORDER STATISTIC

Suppose we want to calculate the number of samples required so the fourth largest order statistic can be used as the UTL in the event three outliers are present in the sample. Using Equation 1 where u = n - 3 for the fourth largest order statistic, Equation 1 reduces to Equation 9.

$$\alpha \ge \frac{n(n-1)(n-2)(n-3)}{3!} \int_0^p x^{n-4}(1-x)^{3}\,dx \tag{9}$$

Integrating Equation 9 yields Equation 10.

$$\alpha \ge \frac{n(n-1)(n-2)(n-3)}{3!} \left[ \frac{p^{n-3}}{n-3} - \frac{3\,p^{n-2}}{n-2} + \frac{3\,p^{n-1}}{n-1} - \frac{p^{n}}{n} \right] \tag{10}$$

There also is no closed form solution for solving Equation 10 for n as a function of p and α. Therefore, after specifying α and p, Equation 10 can be solved using SAS by incrementing n by one and evaluating Equation 10 until n is found so that the right hand side of Equation 10 is ≤ α. Using this technique shows that n = 76 is the minimum sample size that causes Equation 10 to be true for α = 0.05 and p = 0.90. Therefore, if the sample size increases from n = 61 to n = 76, then the fourth largest result in the sample of 76 is the one-sided nonparametric UTL. Equations 4, 6, 8 and 10 can be used for any confidence 100(1 - α)% and coverage p.

ANY ORDER STATISTIC

Using mathematical induction we can derive the equation used to derive n for any (m+1)th observation from the maximum for 0 ≤ m ≤ n - 1. So m = 0 corresponds to the largest order statistic (maximum), m = 1 is the second largest order statistic, etc. Using Equation 1 where u = n - m for the (m+1)th largest observation, Equation 1 reduces to Equation 11.

$$\alpha \ge \left[ \frac{1}{m!} \prod_{i=0}^{m} (n-i) \right] \int_0^p x^{n-m-1}(1-x)^{m}\,dx \tag{11}$$

Integrating Equation 11 yields Equation 12.

$$\alpha \ge \left[ \frac{1}{m!} \prod_{i=0}^{m} (n-i) \right] \sum_{i=0}^{m} (-1)^{i}\,\frac{m!}{i!\,(m-i)!}\,\frac{p^{\,n-m+i}}{n-m+i} \tag{12}$$

Clearly there is no closed form solution for solving Equation 12 for n as a function of p and α. Therefore, after specifying α, p and m, Equation 12 can be solved using SAS by incrementing n by one and evaluating Equation 12 until the right hand side of Equation 12 is ≤ α. Equations 4, 6, 8, 10 and 12 can be used for any confidence 100(1 - α)% and coverage p.
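The do-loop search described for the second largest order statistic can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's macro: the data set name eq6 and the variables n_max and rhs are hypothetical names introduced here, and α = 0.05 and p = 0.90 are taken from the worked example in the text.

```sas
data eq6;
  alpha = 0.05;                        ** Type I error rate ;
  p     = 0.90;                        ** coverage ;
  n_max = ceil(log(alpha) / log(p));   ** Eq. 4, closed form for the maximum ;
  n = 1;
  do until (rhs <= alpha);             ** stop at the first n that satisfies Eq. 6 ;
    n = n + 1;
    rhs = n*p**(n-1) - (n-1)*p**n;     ** right hand side of Eq. 6 ;
  end;
  output;                              ** one row holding n_max, n and rhs ;
run;

proc print data=eq6;
run;
```

With these inputs the printed row should show n_max = 29 and n = 46, the two sample sizes quoted in the text. The same loop applies to Equations 8 and 10 once the expression assigned to rhs is replaced and the starting value of n is raised so that every denominator is positive.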

SAS CODE FOR EQUATION 12

The SAS code that solves Equation 12 for any α, p, and m is shown in the SAS macro UTL below.

    %macro utl(num); ** order statistic (1=max, 2=second to max, 3=third to max, etc.) ;
    data a&num;
      NUM = &NUM;
      do p = 0.95; * percent coverage desired;
        do alpha = 0.05;
          CONF = (1 - alpha) * 100; ** confidence as an integer;
          n = num - 1;
          f = 1; ** initialize f(n) = 1;
          ** upper bound on n assumed to be 2000, its zeros were lost in transcription ;
          do until (n = 2000);
            n + 1;
            %do t = 1 %to &NUM;
              T&t = (-1)**(&t+1) * comb(num-1, &t.-1) * p**(&t.-1) / (n - num + &t.);
            %end;
            f = comb(n, n-num)*num*p**(n-num+1)*sum(of t1-t&num) - alpha; ** Eq. 12 ;
            output; ** the first n with f <= 0 is the minimum sample size ;
          end;
        end;
      end;
    run;
    proc print data=a&num; title "&NUM"; run;
    %mend utl;
    %utl(1)

NORMAL APPROXIMATION TO THE BINOMIAL

In practice, Equation 12 has been shown to be overly conservative for estimating the UTL when n is large. Equation 12 is best used when np < 5 or n(1-p) < 5. Equation 12 solves correctly for n for m ≤ 9, but the function has multiple roots close together for m > 9, making it difficult to choose the one value of n. When n is large enough, the normal approximation to the binomial distribution provides a more accurate order statistic than Equation 12. When np ≥ 5 and n(1-p) ≥ 5, Equation 13 (U.S. Environmental Protection Agency 2010) can be used to determine the proper order statistic k (1 ≤ k ≤ n) for a one-sided nonparametric UTL.

$$k = np + z_{1-\alpha}\sqrt{np(1-p)} + 0.5 \tag{13}$$

The z(1-α) term in Equation 13 is the deviate from the standard normal distribution associated with a 100(1 - α)% one-sided confidence interval. For example, when α = 0.05 for a 95% confidence interval, z(0.95) = 1.645. The 0.5 term in Equation 13 is included as a correction factor as the continuous normal distribution approximates the discrete binomial distribution.

SAS CODE FOR EQUATION 13

The SAS code that calculates the order statistics k from Equation 13 using the normal distribution to approximate the binomial distribution for any α, p, and n is shown below.

    data a;
      p = 0.95;                 ** coverage ;
      alpha = 0.05;             ** Type I error rate ;
      z = probit(1 - alpha);    ** deviate for confidence 1 - alpha, equal to probit(p) only because p = 1 - alpha here ;
      ** loop bounds assumed to be 1 and 4000, their zeros were lost in transcription ;
      do n = 1 to 4000; ** n = number of samples;
        k = n*p + z*sqrt(n*p*(1-p)) + 0.5; ** k = order statistic;
        m = n - k; ** m = observations below the maximum;
        output;
      end;
    run;
    proc print data=a; run;
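One way to turn Equation 13 into a minimum sample size for each m is sketched below. It assumes that the minimum n for a given m is the first n for which n - k ≥ m, which is an interpretation of how the normal approximation column of Table 1 is read off rather than something the paper states. The data set name normal_min is hypothetical.

```sas
data normal_min;
  alpha = 0.05;                          ** Type I error rate ;
  p     = 0.95;                          ** coverage ;
  z     = probit(1 - alpha);             ** standard normal deviate used in Eq. 13 ;
  do m = 0 to 30;                        ** observations below the maximum ;
    n = m;
    do until (n - k >= m);               ** first n with n - k >= m ;
      n = n + 1;
      k = n*p + z*sqrt(n*p*(1-p)) + 0.5; ** Eq. 13 ;
    end;
    output;                              ** one row per m ;
  end;
  keep m n k;
run;

proc print data=normal_min;
run;
```

Under that assumption the search should stop at n = 70 for m = 0 and n = 103 for m = 1. For instance, at n = 70 Equation 13 gives k = 66.5 + 1.645 × 1.8235 + 0.5 ≈ 70.0, while at n = 69 it gives k ≈ 69.03, which exceeds 69.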

THE BINOMIAL DISTRIBUTION

The order statistic k (1 ≤ k ≤ n) can be calculated directly and exactly from the binomial distribution. The cumulative binomial distribution is shown in Equation 14.

$$1 - \alpha \le \sum_{i=0}^{k} \frac{n!}{i!\,(n-i)!}\,p^{i}(1-p)^{n-i} \tag{14}$$

Equation 14 is used to calculate the smallest order statistic k such that the cumulative binomial distribution equals or exceeds the confidence coefficient 1 - α.

SAS CODE FOR EQUATION 14

The SAS code that calculates the exact order statistics k from Equation 14 using the cumulative binomial distribution for any α, p, and n is shown below.

    data b;
      p = 0.95; ** coverage ;
      conf = 0.95; ** confidence as a percent;
      ** bounds on n assumed to be 2 and 2000, their zeros were lost in transcription ;
      do n = 2 to 2000; ** n = number of samples;
        do k = 0 to n; ** k = order statistic (1 <= k <= n);
          m = n - k; ** m = number of observations below maximum (0 <= m <= n-1);
          prob = probbnml(p, n, k); ** cumulative binomial distribution;
          if prob >= conf then do;
            output;
            goto done;
          end;
        end;
        done: ;
      end;
    run;
    proc print data=b; run;

RESULTS

The results from implementing the SAS code from Equations 12, 13 and 14 for α = 0.05 (95% confidence) and p = 0.95 (95% coverage) are shown in Table 1 where m = number of observations below the maximum (0 ≤ m ≤ 30). For example, m = 0 is the maximum, m = 1 is the second largest order statistic, etc. Table 1 shows the incomplete Beta function (Eq. 12) and the binomial distribution (Eq. 14) agree exactly for m ≤ 9, while the normal approximation (Eq. 13) ranges from 7 to 11 samples higher for m ≤ 9 (the 10th largest order statistic). For m > 9, the incomplete Beta function diverges much higher from both the normal approximation and the binomial, causing the incomplete Beta function to be overly conservative by selecting a much higher order statistic than is necessary to bound the 95th percentile with 95% confidence. The normal approximation consistently requires 6 or 7 more samples than the binomial for m > 9, but requires fewer samples than the incomplete Beta.

Figure 1 shows the normal approximation to the binomial distribution plots consistently slightly above the binomial distribution for 0 ≤ m ≤ 30. However, the incomplete Beta function diverges from both the normal approximation to the binomial and the binomial beginning at m = 10. The divergence increases as m increases.
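The minimum n for each m can also be read directly from the cumulative binomial distribution, as in the sketch below. It assumes the minimum n is the first n for which at most n - m - 1 of the n observations fall below the pth percentile with probability at least 1 - α. That is one reading of how the binomial column of Table 1 follows from Equation 14, not a statement taken from the paper. The data set name binom_min is hypothetical.

```sas
data binom_min;
  p    = 0.95;                           ** coverage ;
  conf = 0.95;                           ** confidence coefficient 1 - alpha ;
  do m = 0 to 30;                        ** observations below the maximum ;
    n = m;
    do until (prob >= conf);             ** first n that reaches the confidence ;
      n = n + 1;
      prob = probbnml(p, n, n - m - 1);  ** P(at most n-m-1 of n values fall below the percentile) ;
    end;
    output;                              ** one row per m ;
  end;
  keep m n prob;
run;

proc print data=binom_min;
run;
```

Under that reading the search should stop at n = 59 for m = 0, where 1 - 0.95**59 ≈ 0.9515, and at n = 93 for m = 1.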

Table 1. Minimum Sample Sizes for One-Sided Nonparametric UTLs

     m    Minimum n* using     Minimum n* using    Minimum n* using
          Incomplete Beta      Normal              Binomial
          (Eq. 12)             (Eq. 13)            (Eq. 14)
     0          59                  70                  59
     1          93                 103                  93
     2         124                 133                 124
     3         153                 161                 153
     4         181                 189                 181
     5         208                 216                 208
     6         234                 242                 234
     7         260                 268                 260
     8         286                 293                 286
     9         311                 318                 311
    10         345                 343                 336
    11         436                 368                 361
    12         577                 392                 386
    13         745                 417                 410
    14          84†                441                 434
    15          17†                465                 458
    16        1151                 489                 482
    17        1287                 513                 506
    18        1418                 536                 530
    19        1535                 560                 554
    20        1682                 584                 577
    21        1799                 607                 601
    22        1916                 630                 624
    23        2058                 654                 647
    24        2186                 677                 671
    25        2293                 700                 694
    26        2421                 723                 717
    27        2559                 746                 740
    28        2679                 769                 763
    29        2788                 792                 786
    30        2939                 815                 809

* Number of samples assuming 95% confidence with 95% coverage
† A zero digit is missing from this entry in the transcription (804 or 840 for m = 14; 1007 or 1070 for m = 15).

[Figure 1: line plot titled "95% Confidence with 95% Coverage". Vertical axis: Number of Samples (n). Horizontal axis: Number of Observations Below Maximum (m). Series: Exact Binomial, Incomplete Beta, Normal Approximation.]

Figure 1. Number of Samples (n) with the Number of Observations Below the Maximum (m) by Method

CONCLUSION

While the most commonly used one-sided nonparametric UTL is based on the largest order statistic (the maximum), the incomplete Beta function can be used with other order statistics such as the second largest, third largest, etc., to determine the number of samples to achieve a specified confidence and coverage. The equation to use for any order statistic was derived in general for the incomplete Beta function. SAS code was presented in a single macro to calculate the number of samples required for specified confidence and coverage for any order statistic. The incomplete Beta function performed well for the 10 largest order statistics, but provided overly conservative estimates beginning with the 11th largest order statistic. The normal approximation to the binomial consistently requires 6 or 7 more samples than the cumulative binomial distribution. SAS code was presented to calculate the order statistics for the incomplete Beta, normal approximation to the binomial and the binomial distribution. Since the cumulative binomial distribution can be calculated easily in SAS, the binomial distribution has been shown to be the preferred method for calculating the proper order statistic for any number of samples. Examples of calculations using the SAS code for the three methods were shown for the 31 largest order statistics for 95% confidence and 95% coverage.

REFERENCES

Beyer, W. 1966. Handbook of Tables for Probability and Statistics. 251. Boca Raton, Florida: CRC Press, Inc.

Hahn, G. and W. Meeker. 1991. Statistical Intervals: A Guide for Practitioners. 91-92. New York, New York: John Wiley & Sons, Inc.

U.S. Environmental Protection Agency (May 2010). ProUCL Version 4.1. Technical Guide: Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations. 88. (EPA/600/R-07/041). Washington, DC.

CONTACT INFORMATION

The author welcomes and encourages any questions, corrections, feedback, and remarks. Contact the author at:

Dennis J. Beal, Ph.D.
Senior Statistician / Risk Scientist
Science Applications International Corporation
151 Lafayette Drive
Oak Ridge, Tennessee 37831
phone: 865-481-8736
e-mail: beald@saic.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.