ProUCL Version User Guide Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations


ProUCL Version User Guide
EPA/600/R-07/041
September

Statistical Software for Environmental Applications for Data Sets with and without Nondetect Observations

Prepared for:
Felicia Barnett, Director
ORD Site Characterization and Monitoring Technical Support Center (SCMTSC)
Superfund and Technology Liaison, Region 4
U.S. Environmental Protection Agency
61 Forsyth Street SW, Atlanta, GA

Prepared by:
Anita Singh, Ph.D. and Robert Maichle
Lockheed Martin IS&GS-CIVIL
2890 Woodbridge Ave
Edison NJ

U.S. Environmental Protection Agency
Office of Research and Development
Washington, DC

Notice: Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy. Mention of trade names and commercial products does not constitute endorsement or recommendation for use.
NOTICE

The United States Environmental Protection Agency (EPA) through its Office of Research and Development (ORD) funded and managed the research described in this ProUCL Technical Guide. It has been peer reviewed by the EPA and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation by the EPA for use.

ProUCL software was developed by Lockheed Martin, IS&GS - CIVIL under a contract with the EPA and is made available through the EPA Technical Support Center in Atlanta, Georgia. Use of any portion of ProUCL that does not comply with the ProUCL Technical Guide is not recommended.

ProUCL contains embedded licensed software. Any modification of the ProUCL source code may violate the embedded licensed software agreements and is expressly forbidden.

ProUCL software provided by the EPA was scanned with McAfee VirusScan v4.5.1 SP1 and is certified free of viruses.

With respect to ProUCL distributed software and documentation, neither the EPA nor any of their employees, assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed. Furthermore, software and documentation are supplied "as-is" without guarantee or warranty, expressed or implied, including without limitation, any warranty of merchantability or fitness for a specific purpose.
Minimum Hardware Requirements

ProUCL will function but will run slowly and page a lot.
Intel Pentium 1.0 GHz
45 MB of hard drive space
512 MB of memory (RAM)
CD-ROM drive or internet connection
Windows XP (with SP3), Vista (with SP1 or later), and Windows 7

ProUCL will function but some titles and some Graphical User Interfaces (GUIs) will need to be scrolled. Definition without color will be marginal.
800 by 600 pixels
Basic color is preferred

Preferred Hardware Requirements

1 gigahertz (GHz) or faster processor
1 gigabyte (GB) of memory (RAM)
1024 by 768 pixels or greater color display

Software Requirements

ProUCL has been developed in the Microsoft .NET Framework 4.0 using the C# programming language. To properly run ProUCL software, the computer using the program must have the .NET Framework 4.0 pre-installed. The downloadable .NET Framework 4.0 files can be obtained from one of the following websites:
Quicker site for 32-bit operating systems
Use this site if you have a 64-bit operating system
Installation Instructions when Downloading from the EPA Web Site

Download the file SETUP.EXE from the EPA Web site and save it to a temporary location. Run the SETUP.EXE program. This will create a ProUCL directory and two folders: 1) the USER GUIDE (this document), and 2) DATA (example data sets).

To run the program, use Windows Explorer to locate the ProUCL application file and double click on it, or use the RUN command from the start menu to locate the ProUCL.exe file, and run ProUCL.exe.

To uninstall the program, use Windows Explorer to locate and delete the ProUCL folder.

Caution: If previous versions of ProUCL are installed on your computer, you should remove or rename the directory in which the earlier ProUCL versions are currently located.

Installation Instructions when Copying from a CD

Create a folder named ProUCL 5.0 on a local hard drive of the machine on which you wish to install ProUCL 5.0. Extract the zipped file ProUCL.zip to the folder you have just created. Run ProUCL.exe.

Note: If you have extensions turned off, the program will show with the name ProUCL in your directory and have an icon with the label ProUCL.

Creating a Shortcut for ProUCL 5.0 on Desktop

To create a shortcut to the ProUCL program on your desktop, go to your ProUCL directory, right click on the executable program, and send it to the desktop. A ProUCL icon will be displayed on your desktop. This shortcut will point to the ProUCL directory containing all files required to execute ProUCL 5.0.

Caution: Since all files in your ProUCL directory are needed to execute the ProUCL software, you need to generate the shortcut using the process described above. Specifically, simply dragging the ProUCL executable file from Windows Explorer onto your desktop will not work (an error message will appear), because the other files needed to run the software are not available on your desktop. Your shortcut should point to the directory path with all required ProUCL files.
Getting Started

The functionality and the use of the methods and options available in ProUCL 5.0 have been illustrated using screen shots of output screens generated by ProUCL 5.0. ProUCL 5.0 uses a pull-down menu structure, similar to a typical Windows program. The screen shown below appears when the program is executed.

[Screen shot: ProUCL start-up screen, with the Navigation Panel, Main Window, and Log Panel labeled]

The above screen consists of three main window panels:

The MAIN WINDOW displays data sheets and outputs results from the procedure used.

The NAVIGATION PANEL displays the names of data sets and all generated outputs. The navigation panel can hold up to 40 output files. In order to see more files (data files or generated output files), one can click on Window Option. In the NAVIGATION PANEL, ProUCL assigns self-explanatory names to output files generated using the various modules of ProUCL. If the same module (e.g., Time Series Plot) is used many times, ProUCL identifies the outputs by using letters a, b, c, and so on as shown below. The user may want to assign names of their choice to these output files when saving them using the "Save" or "Save As" options.

The LOG PANEL displays transactions in green, warnings in orange, and errors in red. For example, when one attempts to run a procedure meant for left-censored data sets on a full (uncensored) data set, ProUCL 5.0 will print out a warning message in orange in this panel.

Should both panels be unnecessary, you can turn them off with the Panel ON/OFF option under Configure. The use of this option gives extra space to see and print out the statistics of interest. For example, one may want to turn off these panels when multiple variables (e.g., multiple quantile-quantile [Q-Q] plots) are analyzed and goodness-of-fit (GOF) statistics and other statistics may need to be captured for all of the selected variables.
EXECUTIVE SUMMARY

The main objective of the ProUCL software funded by the USEPA is to compute rigorous statistics to help decision makers and project teams in making correct decisions at a polluted site which are cost-effective, and protective of human health and the environment. The ProUCL software is based upon the philosophy that rigorous statistical methods can be used to compute correct estimates of population parameters and decision making statistics including: the upper confidence limit (UCL) of the mean, the upper tolerance limit (UTL), and the upper prediction limit (UPL) to help decision makers and project teams in making correct decisions. A few commonly used textbook-type methods (e.g., CLT, Student's t-UCL) alone cannot address all scenarios and situations occurring in the various environmental studies. Since many environmental decisions are based upon a 95% UCL (UCL95) of the population mean, it is important to compute correct UCLs of practical merit.

The use and applicability of a statistical method (e.g., Student's t-UCL, Central Limit Theorem (CLT)-UCL, adjusted gamma-UCL, Chebyshev UCL, bootstrap-t UCL) depend upon data size, data skewness, and data distribution. ProUCL computes decision statistics using several parametric and nonparametric methods covering a wide range of data variability, distribution, skewness, and sample size. It is anticipated that the availability of the statistical methods in the ProUCL software covering a wide range of environmental data sets will help the decision makers in making more informative and correct decisions at the various Superfund and RCRA sites.

It is noted that for moderately skewed to highly skewed environmental data sets, UCLs based on the CLT and the Student's t-statistic fail to provide the desired coverage (e.g., 0.95) to the population mean even when the sample sizes are as large as 100 or more. The sample size requirements associated with the CLT increase with skewness.
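The coverage shortfall described above can be checked with a small Monte Carlo experiment. The following Python sketch is not part of ProUCL and is not ProUCL's own procedure; it assumes lognormal data with log-mean 0, uses the usual one-sided CLT form of the UCL (sample mean plus a normal quantile times the standard error), and the function names `clt_ucl` and `coverage` are hypothetical. It estimates how often the computed UCL is at or above the true population mean as the log-scale standard deviation (and hence the skewness) grows.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def clt_ucl(sample, confidence=0.95):
    """One-sided CLT-based UCL of the mean: xbar + z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(confidence)
    return mean(sample) + z * stdev(sample) / math.sqrt(len(sample))


def coverage(n, sigma, trials=2000, confidence=0.95, seed=1):
    """Fraction of simulated lognormal samples whose UCL is >= the true mean."""
    rng = random.Random(seed)
    # Mean of a lognormal distribution with log-mean 0 and log-sd sigma.
    true_mean = math.exp(sigma ** 2 / 2.0)
    hits = 0
    for _ in range(trials):
        sample = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        if clt_ucl(sample, confidence) >= true_mean:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Larger sigma means a more skewed population.
    for sigma in (0.25, 1.0, 2.0):
        print(f"log-sd = {sigma}: estimated coverage = {coverage(20, sigma):.3f}")
```

Changing `n` in the call to `coverage` shows how slowly the estimated coverage approaches the nominal 0.95 for the more skewed cases.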
It is naive and incorrect to state that CLT or Student's t-statistic based UCLs are adequate to estimate EPC terms based upon skewed data sets. These facts have been described in the published documents summarizing simulation experiments conducted on positively skewed data sets to evaluate the performances of the various UCL computation methods. The use of a parametric lognormal distribution on a lognormally distributed data set yields unstable and impractically large UCL values, especially when the standard deviation (sd) of the log-transformed data becomes greater than 1.0 and the data set is of small size. Many environmental data sets can be modeled by a gamma as well as a lognormal distribution. The use of a gamma distribution on gamma distributed data sets tends to yield UCL values of practical merit. Therefore, the use of gamma distribution based decision statistics such as UCLs, UPLs, and UTLs cannot be dismissed by stating that it is easier (than a gamma model) to use a lognormal model to compute these upper limits.

The suggestions made in ProUCL are based upon the extensive experience of the developers in environmental statistical methods, published environmental literature, and procedures described in various EPA guidance documents.

The inclusion of outliers in the computation of the various decision statistics tends to yield inflated values of those decision statistics, which can lead to incorrect decisions. Often inflated statistics computed using a few outliers tend to represent those outliers rather than representing the main dominant population of interest (e.g., reference area). It is suggested to identify outliers, that is, observations coming from population(s) other than the main dominant population, before computing the decision statistics needed to address project objectives. The project team may want to perform the statistical evaluations twice, once with outliers and once without outliers. This exercise will help the project team in computing correct and defensible decision statistics needed to make cleanup and remediation decisions at polluted sites.
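The "twice, once with outliers and once without" exercise can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the concentrations are made-up values, the outlier is simply flagged by hand rather than by one of ProUCL's outlier tests, and the upper limit is a Chebyshev-inequality UCL written in its common textbook form (mean plus sqrt(1/alpha - 1) times the standard error), which may not match ProUCL's output digit for digit. The names `chebyshev_ucl` and `summarize` are hypothetical.

```python
import math
from statistics import mean, stdev


def chebyshev_ucl(sample, confidence=0.95):
    """Chebyshev-inequality UCL of the mean: xbar + sqrt(1/alpha - 1) * s / sqrt(n)."""
    alpha = 1.0 - confidence
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return mean(sample) + math.sqrt(1.0 / alpha - 1.0) * standard_error


def summarize(sample):
    """Collect the statistics compared in the two runs."""
    return {
        "n": len(sample),
        "mean": mean(sample),
        "sd": stdev(sample),
        "ucl95": chebyshev_ucl(sample),
    }


# Made-up background concentrations (mg/kg); 95.0 is the suspected outlier.
data = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5, 3.7, 2.9, 3.3, 95.0]
suspected_outliers = {95.0}  # flagged by the analyst for this illustration

with_outliers = summarize(data)
without_outliers = summarize([x for x in data if x not in suspected_outliers])

for label, stats in (("with outliers", with_outliers),
                     ("without outliers", without_outliers)):
    print(f"{label}: n={stats['n']}, mean={stats['mean']:.2f}, "
          f"sd={stats['sd']:.2f}, UCL95={stats['ucl95']:.2f}")
```

Reporting both rows side by side lets the project team see how much of the upper limit is driven by the flagged observation.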
The initial development and all subsequent upgrades and enhancements of the ProUCL software have been funded by USEPA through its Office of Research and Development (ORD). Initially ProUCL was developed as a research tool for USEPA scientists and researchers of the Technical Support Center and ORD-NERL, EPA Las Vegas. Background evaluations, groundwater monitoring, exposure and risk management and cleanup decisions in support of the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) and Resource Conservation and Recovery Act (RCRA) site projects of USEPA are often derived based upon the various test statistics (e.g., Shapiro-Wilk test, t-test, Wilcoxon-Mann-Whitney (WMW) test, analysis of variance [ANOVA], Mann-Kendall [MK] test) and decision statistics including UCLs of mean, UPLs, and UTLs. To address the statistical needs of the environmental projects of the USEPA, over the years ProUCL software has been upgraded and enhanced to include many graphical tools and statistical methods described in the various EPA guidance documents including: EPA 1989a, 1989b, 1991, 1992a, 1992b, 2000 (MARSSIM), 2002a, 2002b, 2002c, 2006a, and 2006b. Several statistically rigorous methods (e.g., for data sets with NDs) not easily available in the existing guidance documents and in the environmental literature are also available in this version of ProUCL (ProUCL 5.0).

ProUCL 5.0 has graphical, estimation, and hypotheses testing methods for uncensored full data sets and for left-censored data sets consisting of ND observations with multiple detection limits (DLs) or reporting limits (RLs). In addition to computing general statistics, ProUCL 5.0 has goodness-of-fit (GOF) tests for normal, lognormal and gamma distributions, and parametric and nonparametric methods, including bootstrap methods for skewed data sets, to compute various decision making statistics such as UCLs of mean (EPA 2002a), percentiles, UPLs for a certain number of future observations (e.g., k with k=1, 2, 3,...), UPLs for mean of future k (≥1) observations, and UTLs (e.g., EPA 1992b, 2002b, and 2009). Many positively skewed environmental data sets can be modeled by a lognormal as well as a gamma model.
It is well-known that for moderately skewed to highly skewed data sets, the use of a lognormal distribution tends to yield inflated and unrealistically large values of the decision statistics, especially when the sample size is small (e.g., <20-30). For gamma distributed skewed uncensored and left-censored data sets, ProUCL software computes decision statistics including UCLs, percentiles, UPLs for future k (≥1) observations, UTLs, and upper simultaneous limits (USLs).

For data sets with NDs, ProUCL has several estimation methods including the Kaplan-Meier (KM) method, regression on order statistics (ROS) methods and substitution methods (e.g., replacing NDs by DL, DL/2). ProUCL 5.0 can be used to compute upper limits which adjust for data skewness; specifically, for skewed data sets, ProUCL 5.0 computes upper limits using KM estimates in gamma (lognormal) UCL and UTL equations provided the detected observations in the left-censored data set follow a gamma (lognormal) distribution. Some poor performing commonly used and cited methods such as the DL/2 substitution method and H-statistic based UCL computation method have been incorporated in ProUCL for historical reasons, and research and comparison purposes.

The Sample Sizes module of ProUCL can be used to develop data quality objectives (DQOs) based sampling designs and to perform power evaluations needed to address statistical issues associated with the various polluted sites projects. ProUCL provides user friendly options to enter the desired values for the decision parameters such as Type I and Type II error rates, and other DQOs used to determine the minimum sample sizes needed to address project objectives. The Sample Sizes module can compute DQOs based minimum sample sizes needed: to estimate the population mean; to perform single and two-sample hypotheses testing approaches; and in acceptance sampling to accept or reject a batch of discrete items such as a lot of drums consisting of hazardous waste.
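To make the role of the decision parameters concrete, here is a minimal Python sketch of a DQO-based sample size calculation for a single-sample t-test. It is an illustration only: it assumes the widely quoted normal-approximation formula n = s^2 (z_{1-alpha} + z_{1-beta})^2 / Delta^2 + 0.5 z_{1-alpha}^2, where s is a preliminary estimate of the standard deviation and Delta is the width of the gray region. The formulas that ProUCL itself uses are given in its Technical Guide and may differ; the function name and the default error rates are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_one_sample_t(sd, delta, alpha=0.05, beta=0.10):
    """Approximate minimum n for a one-sided single-sample t-test.

    sd    : preliminary estimate of the population standard deviation
    delta : width of the gray region (difference that must be detected)
    alpha : Type I error rate
    beta  : Type II error rate
    """
    if sd <= 0 or delta <= 0:
        raise ValueError("sd and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    n = (sd ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2 + 0.5 * z_alpha ** 2
    return math.ceil(n)  # round up to a whole number of samples


if __name__ == "__main__":
    for delta in (2.0, 1.0, 0.5):
        n = sample_size_one_sample_t(sd=1.0, delta=delta)
        print(f"gray region width {delta}: at least {n} samples")
```

Halving the gray region width roughly quadruples the required number of samples, which is why the gray region is usually the most influential DQO input.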
Both parametric (e.g., t-test) and nonparametric (e.g., Sign test, WMW test, test for proportions) sample size determination methods are available in ProUCL.

ProUCL has exploratory graphical methods for both uncensored data sets and for left-censored data sets consisting of ND observations. Graphical methods in ProUCL include histograms, multiple quantile-quantile (Q-Q) plots, and side-by-side box plots. The use of graphical displays provides additional insight about the information contained in a data set that may not otherwise be revealed by the use of estimates (e.g., 95% upper limits) and test statistics (e.g., two-sample t-test, WMW test). In addition to providing information about the data distributions (e.g., normal or gamma), Q-Q plots are also useful in identifying outliers and the presence of mixture populations (e.g., data from several populations) potentially present in a data set. Side-by-side box plots and multiple Q-Q plots are useful to visually compare two or more data sets, such as: site-versus-background constituent concentrations, surface-versus-subsurface concentrations, and constituent concentrations of several groundwater monitoring wells (MWs). ProUCL also has a couple of classical outlier test procedures, such as the Dixon test and the Rosner test, which can be used on uncensored data sets as well as on left-censored data sets consisting of ND observations.

ProUCL has parametric and nonparametric single-sample and two-sample hypotheses testing approaches for uncensored as well as left-censored data sets. Single-sample hypotheses tests (Student's t-test, Sign test, Wilcoxon Signed Rank test, and the Proportion test) are used to compare site mean/median concentrations (or some other threshold such as an upper percentile) with some average cleanup standard, Cs (or a not-to-exceed compliance limit, A0) to verify the attainment of cleanup levels (EPA, 1989a; MARSSIM, 2000; EPA 2006a) at remediated site areas of concern. Single-sample tests such as the Sign test and Proportion test, and upper limits including UTLs and UPLs are also used to perform intra-well comparisons. Several two-sample hypotheses tests as described in EPA guidance documents (e.g., EPA 2002b, 2006b, 2009) are also available in the ProUCL software. The two-sample hypotheses testing approaches in ProUCL include: Student's t-test, WMW test, Gehan test and Tarone-Ware test. The two-sample tests are used to compare concentrations of two populations such as site versus background, surface versus subsurface soils, and upgradient versus downgradient wells.
The Oneway Analysis of Variance (ANOVA) module in ProUCL has both classical and nonparametric Kruskal-Wallis (K-W) tests. Oneway ANOVA is used to compare means (or medians) of multiple groups such as comparing mean concentrations of several areas of concern and to perform inter-well comparisons. In groundwater (GW) monitoring applications, ordinary least squares (OLS) regression, trend tests, and time series plots are used to identify upward or downward trends potentially present in constituent concentrations identified in GW monitoring wells over a certain period of time. The Trend Analysis module performs the Mann-Kendall trend test and the Theil-Sen trend test on data sets with missing values, and generates trend graphs displaying a parametric OLS regression line and a nonparametric Theil-Sen trend line. The Time Series Plots option can be used to compare multiple time-series data sets.

The use of the incremental sampling methodology (ISM) has been recommended (ITRC, 2012) to collect ISM soil samples needed to compute mean concentrations of the decision units (DUs) and sampling units (SUs) requiring characterization and remediation activities. At many polluted sites, a large amount of discrete onsite and/or offsite background data are already available which cannot be directly compared with newly collected ISM data. In order to provide a tool to compare the existing discrete background data with actual field onsite or background ISM data, a Monte Carlo Background Incremental Sample Simulator (BISS) module has been incorporated in ProUCL 5.0 (blocked for general public use) which may be used on a large existing discrete background data set. The BISS module simulates incremental sampling methodology based equivalent background incremental samples. The availability of a large discrete background data set collected from areas with geological conditions comparable to the DU(s) of interest is a prerequisite for successful application of this module.
The BISS module has been temporarily blocked for use in ProUCL 5.0 as this module is awaiting adequate instructions and guidance for its intended use on discrete background data sets.

ProUCL 5.0 is a user friendly freeware package providing statistical and graphical tools needed to address statistical issues described in the various EPA guidance documents. ProUCL 5.0 can process many constituents (variables) simultaneously to perform various tests (e.g., ANOVA and trend test statistics) and compute decision statistics including UCLs of mean, UPLs, and UTLs, a capability not available in several commercial software packages such as Minitab 16 and NADA for R (Helsel, 2013). ProUCL 5.0 also has the capability of processing data by group variables. ProUCL 5.0 is easy to use and it does not require any programming skills as needed when using other software packages such as Minitab, SAS, and programs written in R script.

Methods incorporated in ProUCL 5.0 have been tested and verified extensively by the developers and the various researchers, scientists, and users. The results obtained by ProUCL are in agreement with the results obtained by using other software packages including Minitab, SAS, and programs written in R script. ProUCL 5.0 computes decision statistics (e.g., UPL, UTL) based upon the KM method in a straightforward manner without flipping the data and re-flipping the computed statistics for left-censored data sets; these operations are not easy for a typical user to understand and perform, and they can become unnecessarily tedious when computing decision statistics for multiple variables/analytes. Moreover, unlike survival analysis, it is important to compute an accurate estimate of the sd, which is needed to compute decision making statistics including UPLs and UTLs. For left-censored data sets, ProUCL computes a KM estimate of sd directly. These issues are elaborated by examples discussed in this User Guide and in the accompanying ProUCL 5.0 Technical Guide.
Table of Contents

NOTICE
Minimum Hardware Requirements
Software Requirements
Installation Instructions when Downloading from the EPA Web Site
EXECUTIVE SUMMARY
Table of Contents
Contact Information for all Versions of ProUCL
ACRONYMS and ABBREVIATIONS
Acknowledgements

Introduction
  Overview of ProUCL Version Software
  The Need for ProUCL Software
  ProUCL 5.0 Capabilities
  ProUCL 5.0 Technical Guide

Chapter 1 Guidance on the Use of Statistical Methods and Associated Minimum Sample Size Requirements for ProUCL Software
  Background Data Sets
  Site Data Sets
  Discrete Samples or Composite Samples?
  Upper Limits and Their Use
  Point-by-Point Comparison of Site Observations with BTVs, Compliance Limits, and Other Threshold Values
  Hypothesis Testing Approaches and Their Use
  Single Sample Hypotheses (Pre-established BTVs and Not-to-Exceed Values are Known)
  Two-Sample Hypotheses (BTVs and Not-to-Exceed Values are Unknown)
  Minimum Sample Size Requirements and Power Assessment
  Sample Sizes for Bootstrap Methods
  Statistical Analyses by a Group ID
  Statistical Analyses for Many Constituents/Variables
  Use of Maximum Detected Value as Estimates of Upper Limits
  Use of Maximum Detected Value to Estimate BTVs and Not-to-Exceed Values
  Use of Maximum Detected Value to Estimate EPC Terms
  Chebyshev Inequality Based UCL
  Samples with Nondetect Observations
  Avoid the Use of DL/2 Method to Compute UCL
  Samples with Low Frequency of Detection
  Some Other Applications of Methods in ProUCL
  Identification of COPCs
  Identification of Non-Compliance Monitoring Wells
  Verification of the Attainment of Cleanup Standards, Cs
  Using BTVs (Upper Limits) to Identify Hot Spots
  Some General Issues and Recommendations made by ProUCL
  Multiple Detection Limits
  ProUCL Recommendation about ROS Method and Substitution (DL/2) Method
  The Unofficial User Guide to ProUCL4 (Helsel and Gilroy, 2012)

Chapter 2 Entering and Manipulating Data
  Creating a New Data Set
  Opening an Existing Data Set
  Input File Format
  Number Precision
  Entering and Changing a Header Name
  Saving Files
  Editing
  Handling Nondetect Observations and Generating Files with Nondetects
  Caution
  Summary Statistics for Data Sets with Nondetect Observations
  Warning Messages and Recommendations for Datasets with an Insufficient Amount of Data
  Handling Missing Values
  User Graphic Display Modification
  Graphics Tool Bar
  Drop-Down Menu Graphics Tools

Chapter 3 Select Variables Screen
  Select Variables Screen
  Graphs by Groups

Chapter 4 General Statistics
  General Statistics for Full Data Sets without NDs
  General Statistics with NDs

Chapter 5 Imputing Nondetects Using ROS Methods

Chapter 6 Graphical Methods (Graph)
  Box Plot
  Histogram
  Q-Q Plots
  Multiple Q-Q Plots
  Multiple Q-Q plots (Uncensored data sets)
  Multiple Box Plots
  Multiple Box plots (Uncensored data sets)

Chapter 7 Classical Outlier Tests
  Outlier Test for Full Data Set
  Outlier Test for Data Sets with NDs

Chapter 8 Goodness-of-Fit (GOF) Tests for Uncensored and Left-Censored Data Sets
  Goodness-of-Fit test in ProUCL
  Goodness-of-Fit Tests for Uncensored Full Data Sets
  GOF Tests for Normal and Lognormal Distribution
  GOF Tests for Gamma Distribution
  8.3 Goodness-of-Fit Tests Excluding NDs
  Normal and Lognormal Options
  Gamma Distribution Option
  Goodness-of-Fit Tests with ROS Methods
  Normal or Lognormal Distribution (Log-ROS Estimates)
  Gamma Distribution (Gamma-ROS Estimates)
  Goodness-of-Fit Tests with DL/2 Estimates
  Normal or Lognormal Distribution (DL/2 Estimates)
  Goodness-of-Fit Test Statistics

Chapter 9 Single-Sample and Two-Sample Hypotheses Testing Approaches
  Single-Sample Hypotheses Tests
  Single-Sample Hypothesis Testing for Full Data without Nondetects
  Single-Sample t-Test
  Single-Sample Proportion Test
  Single-Sample Sign Test
  Single-Sample Wilcoxon Signed Rank (WSR) Test
  Single-Sample Hypothesis Testing for Data Sets with Nondetects
  Single Proportion Test on Data Sets with NDs
  Single-Sample Sign Test with NDs
  Single-Sample Wilcoxon Signed Rank Test with NDs
  Two-Sample Hypotheses Testing Approaches
  Two-Sample Hypothesis Tests for Full Data
  Two-Sample t-Test without NDs
  Two-Sample Wilcoxon-Mann-Whitney (WMW) Test without NDs
  Two-Sample Hypothesis Testing for Data Sets with Nondetects
  Two-Sample Wilcoxon-Mann-Whitney Test with Nondetects
  Two-Sample Gehan Test for Data Sets with Nondetects
  Two-Sample Tarone-Ware Test for Data Sets with Nondetects

Chapter 10 Computing Upper Limits to Estimate Background Threshold Values Based Upon Full Uncensored Data Sets and Left-Censored Data Sets with Nondetects
  Background Statistics for Full Data Sets without Nondetects
  Normal or Lognormal Distribution
  Gamma Distribution
  Nonparametric Methods
  All Statistics Option
  Background Statistics with NDs
  Normal or Lognormal Distribution
  Gamma Distribution
  Nonparametric Methods (with NDs)
  All Statistics Option

Chapter 11 Computing Upper Confidence Limits (UCLs) of Mean Based Upon Full Uncensored Data Sets and Left-Censored Data Sets with Nondetects
  UCLs for Full (w/o NDs) Data Sets
  Normal Distribution (Full Data Sets without NDs)
  Gamma, Lognormal, Nonparametric, All Statistics Option (Full Data without NDs)
  11.2 UCL for Left-Censored Data Sets with NDs

Chapter 12 Sample Sizes Based Upon User Specified Data Quality Objectives (DQOs) and Power Assessment
  Estimation of Mean
  Sample Sizes for Single-Sample Hypothesis Tests
  Sample Size for Single-Sample t-Test
  Sample Size for Single-Sample Proportion Test
  Sample Size for Single-Sample Sign Test
  Sample Size for Single-Sample Wilcoxon Signed Rank Test
  Sample Sizes for Two-Sample Hypothesis Tests
  Sample Size for Two-Sample t-Test
  Sample Size for Two-Sample Wilcoxon Mann-Whitney Test
  Sample Sizes for Acceptance Sampling

Chapter 13 Analysis of Variance
  Classical Oneway ANOVA
  Nonparametric ANOVA

Chapter 14 Ordinary Least Squares of Regression and Trend Analysis
  Simple Linear Regression
  Mann-Kendall Test
  Theil-Sen Test
  Time Series Plots

Chapter 15 Background Incremental Sample Simulator (BISS) Simulating BISS Data from a Large Discrete Background Data

Chapter 16 Windows
  Copying and Saving Graphs
  Printing Graphs
  Printing Non-graphical Outputs
  Saving Output Screens as Excel Files

Chapter 18 Summary and Recommendations to Compute a 95% UCL for Full Uncensored and Left-Censored Data Sets with NDs
  Computing UCL95s of the Mean Based Upon Uncensored Full Data Sets
  Computing UCLs Based Upon Left-Censored Data Sets with Nondetects

GLOSSARY
REFERENCES
ProUCL Software

ProUCL 5.0, its earlier versions, and the associated Facts Sheet, User Guides and Technical Guides (e.g., EPA 2010a, 2010b) can be downloaded from the EPA website. Material for a couple of ProUCL webinars offered in March 2011, and relevant literature used in the development of ProUCL 5.0, can also be downloaded from the same EPA website.

Contact Information for all Versions of ProUCL

The ProUCL software is developed under the direction of the Technical Support Center (TSC). As of November 2007, the direction of the TSC was transferred from Brian Schumacher to Felicia Barnett. Therefore, any comments or questions concerning all versions of ProUCL should be addressed to:

Felicia Barnett, Director
ORD Site Characterization and Monitoring Technical Support Center (SCMTSC)
Superfund and Technology Liaison, Region 4
U.S. Environmental Protection Agency
61 Forsyth Street SW, Atlanta, GA
(404)
Fax: (404)
20 ACRONYMS and ABBREVIATIONS ACL AD, AD AM AOC ANOVA A 0 BCA BIS BISS BTV CC, cc CERCLA CL CLT COPC COPCs C s CSM CV DL, L DL/2 (t) DL/2 Estimates DOE DQOs DU EA EDF EM alternative cmpliance r cncentratin limit AndersnDarling test arithmetic mean area(s) f cncern analysis f variance nt t exceed cmpliance limit r specified actin level biascrrected accelerated btstrap methd Backgrund Incremental Sample Backgrund Incremental Sample Simulatr backgrund threshld value cnfidence cefficient Cmprehensive Envirnmental Recvery, Cmpensatin, and Liability Act cmpliance limit central limit therem cntaminant/cnstituent f ptential cncern Cntaminants/cnstituents f ptential cncern cleanup standards cnceptual site mdel cefficient f variatin detectin limit UCL based upn DL/2 methd using Student s tdistributin cutff value estimates based upn data set with NDs replaced by half f the respective detectin limits Department f Energy data quality bjectives decisin unit expsure area empirical distributin functin expectatin maximizatin xviii
21 EPA EPC GB GHz GROS GOF, G.O.F. GOF QQ Plt GUI HUCL H A H 0 i.i.d. United States Envirnmental Prtectin Agency expsure pint cncentratin Gigabyte Gigahertz gamma ROS gdnessffit QuantileQuantile Plt shwing GOF statistics graphical user interface UCL based upn Land s Hstatistic alternative hypthesis null hypthesis independently identically distributed ITRC Interstate Technlgy & Regulatry Cuncil k, K a psitive integer representing future r next k bservatins k K k hat k star KM (%) KM (Chebyshev) KM (t) KM (z) KM, KM KS, KS KW LCL LN, ln LPL LROS LTL number f nndetects in a sample shape parameter f a gamma distributin MLE f the shape parameter f a gamma distributin biased crrected MLE f the shape parameter f a gamma distributin UCL based upn KaplanMeier estimates using the percentile btstrap methd UCL based upn KaplanMeier estimates using the Chebyshev inequality UCL based upn KaplanMeier estimates using the Student s tdistributin critical value UCL based upn KaplanMeier estimates using critical value f a standard nrmal distributin KaplanMeier KlmgrvSmirnv Kruskal Wallis lwer cnfidence limit lgnrmal distributin lwer predictin limit lgros; rbust ROS lwer tlerance limit xix
22 LSL lwer simultaneus limit M, m applied t incremental sampling: number f increments in a BISS sample MAD median abslute deviatin MARSSIM MultiAgency Radiatin Survey and Site Investigatin Manual MCL maximum cncentratin limit, maximum cmpliance limit MDD minimum detectable difference MDL methd detectin limit MK, MK MannKendall ML maximum likelihd MLE maximum likelihd estimate MLE (t) UCL based upn ML estimates using Student s tdistributin critical value Multiple QQ multiple quantilequantile plt MVUE minimum variance unbiased estimate MW mnitring well ND, nd, Nd nndetect NERL Natinal Expsure Research Labratry NRC OKG OLS ORD PCA PDF, pdf Pdf PRG QQ R RAGS RCRA RL ROS RPM Nuclear Regulatry Cmmissin Orthgnalized Kettenring Gnanadesikan rdinary least squares Office f Research and Develpment principal cmpnent analysis prbability density functin files in pdf frmat preliminary remediatin gals quantilequantile applied t incremental sampling: number f replicate ISM Risk Assessment Guidance fr Superfund Resurce Cnservatin and Recvery Act reprting limit regressin n rder statistics Remedial Prject Manager xx
23 RSD relative standard deviatin S substantial difference SCMTSC Site Characterizatin and Mnitring Technical Supprt Center SD, Sd, sd standard deviatin SE standard errr s p SSL SQL SU SW, SW TS TSC TW, TW UCL UCL95 UPL U.S. EPA, USEPA UTL USGS USL WMW WRS WSR pled standard deviatin sil screening levels sample quantitatin limit sampling unit ShapirWilk TheilSen Technical Supprt Center TarneWare upper cnfidence limit 95% upper cnfidence limit upper predictin limit United States Envirnmental Prtectin Agency upper tlerance limit U.S. Gelgical Survey upper simultaneus limit WilcxnMannWhitney Wilcxn Rank Sum Wilcxn Signed Rank < less than > Greater than greater than r equal t less than r equal t X p Δ Σ p th percentile f a distributin Greek letter denting the width f the gray regin assciated with hypthesis testing Greek letter representing the summatin f several mathematical quantities, numbers % represents the percentage symbl α Type I errr rate xxi
β              Type II error rate
σ              standard deviation of a log-transformed sample
θ              scale parameter of a gamma distribution
Acknowledgements

We wish to express our gratitude and thanks to our friends and colleagues who have contributed during the development of past versions of ProUCL and to all of the many people who reviewed, tested, and gave helpful suggestions throughout the development of the ProUCL software package. We wish to especially acknowledge EPA scientists including Deana Crumbling, Nancy Rios-Jafolla, Tim Frederick, Dr. Maliha Nash, Kira Lynch, and Marc Stiffleman; James Durant of ATSDR, Dr. Steve Roberts of University of Florida, Dr. Elise A. Striz of NRC, and Drs. Phillip Goodrum and John Samuelian of Integral Consulting Inc. for testing and reviewing ProUCL 5.0 and its associated guidance documents, and for providing helpful comments and suggestions. Special thanks go to Ms. D. Getty and Mr. R. Leuser of Lockheed Martin for providing a thorough technical and editorial review of the ProUCL 5.0 User Guide and Technical Guide. A special note of thanks is due to Ms. Felicia Barnett of EPA ORD Site Characterization and Monitoring Technical Support Center (SCMTSC), without whose assistance the development of the ProUCL 5.0 software and associated guidance documents would not have been possible. Finally, we wish to dedicate the ProUCL 5.0 software package to our friend and colleague, John M. Nocerino, who contributed significantly to the development of the ProUCL and Scout software packages.
Introduction

Overview of ProUCL Version Software

The main objective of the ProUCL software funded by the USEPA is to compute rigorous decision statistics that help decision makers make correct decisions which are cost-effective and protective of human health and the environment. The ProUCL software is based upon the philosophy that rigorous statistical methods can be used to compute correct estimates of population parameters (e.g., site mean, background percentiles) and decision making statistics, including the upper confidence limit (UCL) of the mean, the upper tolerance limit (UTL), and the upper prediction limit (UPL), to help decision makers and project teams make correct decisions. The use and applicability of a statistical method (e.g., Student's t-UCL, Central Limit Theorem (CLT)-UCL, adjusted gamma-UCL, Chebyshev UCL, bootstrap-t UCL) depend upon data size, data variability, data skewness, and data distribution. ProUCL computes decision statistics using several parametric and nonparametric methods covering a wide range of data variability, skewness, and sample size. A couple of textbook methods described in most statistical textbooks (e.g., Hogg and Craig, 1995), based upon the Student's t-statistic and the CLT alone, cannot address all scenarios and situations commonly occurring in environmental studies. It is naive and incorrect to state or assume that Student's t-statistic and/or CLT based UCLs of the mean will provide the desired coverage (e.g., 0.95) of the population mean irrespective of the skewness of the data set/population under consideration. These issues are discussed in detail in Chapters 2 and 4 of the ProUCL 5.0 Technical Guide. Several examples are discussed throughout this guidance document and in the accompanying ProUCL 5.0 Technical Guide to elaborate on these issues.
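To make the contrast between UCL methods concrete, the following minimal sketch computes two of the limits named above using their standard textbook forms: the CLT-based UCL, x̄ + z(1-α) s/√n, and the Chebyshev inequality based UCL, x̄ + √(1/α - 1) s/√n. The function names and the sample data are hypothetical and are not taken from ProUCL; the sketch is an illustration, not ProUCL's implementation.

```python
import math
import statistics
from statistics import NormalDist


def clt_ucl(data, conf=0.95):
    """CLT-based UCL of the mean: xbar + z_conf * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (n - 1 divisor)
    z = NormalDist().inv_cdf(conf)      # standard normal critical value
    return xbar + z * s / math.sqrt(n)


def chebyshev_ucl(data, conf=0.95):
    """Chebyshev inequality based UCL: xbar + sqrt(1/alpha - 1) * s / sqrt(n)."""
    n = len(data)
    alpha = 1.0 - conf
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return xbar + math.sqrt(1.0 / alpha - 1.0) * s / math.sqrt(n)


# Hypothetical, positively skewed concentrations (illustration only).
sample = [0.8, 1.1, 1.3, 1.9, 2.4, 3.0, 4.7, 6.2, 11.5, 38.0]
print("CLT UCL95:      ", round(clt_ucl(sample), 2))
print("Chebyshev UCL95:", round(chebyshev_ucl(sample), 2))
```

For any data set the Chebyshev limit is the larger of the two, because its multiplier (about 4.36 at the 95% level) exceeds the normal critical value (about 1.645). This is why it is more conservative for skewed data.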
The use of a parametric lognormal distribution on a lognormally distributed data set tends to yield unstable, impractically large UCL values, especially when the standard deviation of the log-transformed data is greater than 1.0 and the data set is of small size such as less than (Hardin and Gilbert, 1993; Singh, Singh, and Engelhardt, 1997). Many environmental data sets can be modeled by a gamma as well as a lognormal distribution. Generally, the use of a gamma distribution on gamma distributed data sets yields UCL values of practical merit (Singh, Singh, and Iaci, 2002). Therefore, the use of gamma distribution based decision statistics such as UCLs, UPLs, and UTLs cannot be dismissed just because it is easier to use a lognormal model to compute these upper limits, or because one incorrectly assumes that the two distributions behave in a similar manner. The advantages of computing the gamma distribution based decision statistics are discussed in Chapters 2-5 of the ProUCL 5.0 Technical guidance document. Since many environmental decisions are made based upon a 95% UCL of the population mean, it is important to compute correct UCLs and other decision making statistics of practical merit. In an effort to compute correct UCLs of the population mean and other decision making statistics, in addition to computing the Student's t-statistic and the CLT based statistics (e.g., UCLs, UPLs), significant effort has been made to incorporate rigorous statistical methods based UCLs (and other limits) in the ProUCL software covering a wide range of data skewness and sample sizes (e.g., Singh, Singh, and Engelhardt, 1997; Singh, Singh, and Iaci, 2002; and Singh and Singh, 2003). It is anticipated that the availability of statistical methods in the ProUCL software covering a wide range of environmental data sets will help decision makers make more informative and correct decisions at the various polluted sites.
It is noted that even for skewed data sets, practitioners tend to use the CLT or Student's t-statistic based UCLs of the mean based upon samples of sizes (large sample rule-of-thumb to use CLT). However, this rule-of-thumb does not apply to moderately skewed to highly skewed data sets, specifically when σ
(standard deviation of the log-transformed data) starts exceeding 1. The large sample requirement associated with the use of the CLT depends upon the skewness of the data distribution under consideration. The sample size needed for the sample mean to follow an approximate normal distribution increases with the data skewness; for skewed data sets, even samples of size greater than (>) 100 may not be large enough for the sample mean to follow an approximate normal distribution. For moderately skewed to highly skewed environmental data sets, as expected, UCLs based on the CLT and the Student's t-statistic fail to provide the desired coverage of the population mean even when the sample sizes are as large as 100 or more. These facts have been verified in published simulation experiments conducted on positively skewed data sets (e.g., Singh, Singh, and Engelhardt, 1997; Singh, Singh, and Iaci, 2002; and Singh and Singh, 2003). The initial development and all subsequent upgrades and enhancements of the ProUCL software have been funded by the USEPA through its Office of Research and Development (ORD). Initially ProUCL was developed as a research tool for scientists and researchers of the Technical Support Center and ORD NERL, EPA Las Vegas. During , the initial intent and objectives of developing the ProUCL software (Version 1.0 and Version 2.0) were to provide a statistical research tool to EPA scientists which can be used to compute theoretically sound 95% upper confidence limits (UCL95s) of the mean routinely used in exposure assessment, risk management and cleanup decisions made at various CERCLA and RCRA sites (EPA 1992a, 2002a). During 2002, the peer-reviewed ProUCL version 2.1 (with Chebyshev inequality based UCLs) was released for public use.
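The coverage claim made above for skewed data can be checked with a small Monte Carlo experiment. The sketch below draws lognormal samples whose log-scale standard deviation exceeds 1 and counts how often the CLT-based UCL95 and the Chebyshev UCL95 actually lie at or above the true population mean. The sample size, σ value, number of trials, seed, and function name are all assumptions chosen for illustration; this is not a reproduction of the cited simulation studies.

```python
import math
import random
import statistics
from statistics import NormalDist


def ucl_coverage(n=30, sigma=2.0, trials=2000, conf=0.95, seed=1):
    """Estimate coverage of the CLT and Chebyshev UCLs for lognormal(0, sigma) data."""
    rng = random.Random(seed)
    true_mean = math.exp(sigma ** 2 / 2.0)       # mean of lognormal(mu=0, sigma)
    z = NormalDist().inv_cdf(conf)               # CLT multiplier
    k = math.sqrt(1.0 / (1.0 - conf) - 1.0)      # Chebyshev multiplier
    hits_clt = hits_cheb = 0
    for _ in range(trials):
        x = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        xbar = statistics.mean(x)
        se = statistics.stdev(x) / math.sqrt(n)
        hits_clt += (xbar + z * se) >= true_mean
        hits_cheb += (xbar + k * se) >= true_mean
    return hits_clt / trials, hits_cheb / trials


clt_cov, cheb_cov = ucl_coverage()
print(f"CLT UCL95 coverage:       {clt_cov:.3f}")
print(f"Chebyshev UCL95 coverage: {cheb_cov:.3f}")
```

Rerunning with a larger `n` or a smaller `sigma` shows how the gap between nominal and achieved coverage depends on both sample size and skewness.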
Several researchers have developed rigorous parametric and nonparametric statistical methods (e.g., Johnson, 1978; Grice and Bain, 1980; Efron (1981, 1982); Efron and Tibshirani, 1993; Hall (1988, 1992); Sutton, 1993; Chen, 1995; Singh, Singh, and Engelhardt, 1997; Singh, Singh, and Iaci, 2002) to compute upper limits (e.g., UCLs) which adjust for data skewness. Since the Student's t-UCL, CLT-UCL, and percentile bootstrap UCL fail to provide the desired coverage of the population mean of skewed distributions, several parametric (e.g., gamma distribution based) and nonparametric (e.g., BCA bootstrap and bootstrap-t, Chebyshev UCL) UCL computation methods which adjust for data skewness were incorporated in ProUCL versions 3.0 and during ProUCL version also had graphical quantile-quantile (Q-Q) plots and GOF tests for normal, lognormal, and gamma distributions; capabilities to statistically analyze multiple variables simultaneously were also incorporated in ProUCL (EPA 2004). It is important to compute decision statistics (e.g., UCLs, UTLs) which are cost-effective and protective of human health and the environment (balancing between Type I and Type II errors). Therefore, one cannot dismiss the use of the better performing [better than t-UCL, CLT-UCL, ROS and KM percentile bootstrap UCL, KM-UCL (t)] UCL computation methods, including gamma UCLs and the various bootstrap UCLs which adjust for data skewness. During , ProUCL was upgraded to versions , and These upgrades included exploratory graphical (e.g., Q-Q plots, box plots) and statistical (e.g., maximum likelihood estimation [MLE], KM, and ROS) methods for left-censored data sets consisting of nondetect (ND) observations with multiple DLs or RLs. For uncensored and left-censored data sets, these upgrades provide statistical methods to compute upper limits: percentiles, UPLs and UTLs needed to estimate site-specific background level constituent concentrations or background threshold values (BTVs).
To address statistical needs of background evaluation projects (e.g., MARSSIM, 2000; EPA 2002b), several single-sample and two-sample hypotheses testing approaches were also included in these ProUCL upgrades. During , ProUCL was upgraded to ProUCL The upgraded ProUCL was enhanced by including methods to compute gamma distribution based UPLs and UTLs (Krishnamoorthy, Mathew, and Mukherjee, 2008). The Sample Size module to compute DQOs based minimum sample sizes needed to
address statistical issues associated with the various environmental projects (e.g., MARSSIM, 2000; EPA [2002c, 2006a, 2006b]) was also incorporated in ProUCL During , ProUCL was upgraded to ProUCL 4.1 and ProUCL 4.1 (2010) and (2011) retain all capabilities of the previous versions of ProUCL software. Two new modules, One-way ANOVA and Trend Analysis, were included in ProUCL 4.1. The One-way ANOVA module has both parametric and nonparametric ANOVA tests to perform inter-well comparisons. The Trend Analysis module can be used to determine potential upward or downward trends present in constituent concentrations identified in GW monitoring wells (MWs). The Trend Analysis module can compute Mann-Kendall (MK) and Theil-Sen (TS) trend statistics to determine upward or downward trends potentially present in analyte concentrations. ProUCL 4.1 also has the Ordinary Least Squares (OLS) Regression module. In ProUCL 4.1, some modifications were made in decision tables used to make recommendations regarding the use of UCL95 to estimate EPC terms. Specifically, based upon recent experience, developers of ProUCL reiterated that the use of a lognormal distribution to estimate EPC terms and BTVs should be avoided, as the use of the lognormal distribution tends to yield unrealistic and unstable values of the decision making statistics including UCL, UPL, and UTL; this is especially true when the sample size is <20-30 and the data set is moderately skewed to highly skewed. During March 2011, a couple of webinars were presented describing the capabilities and use of the methods available in ProUCL 4.1. ProUCL version represents an upgrade of ProUCL (EPA, June 2011) which represents an upgrade of ProUCL (EPA 2010). For uncensored and left-censored data sets, ProUCL 5.0 consists of all statistical and graphical methods that are available in the previous versions of the ProUCL software package except for a couple of poor performing and restricted (e.g., can be used only when a single detection limit is present) estimation methods such as the MLE and winsorization methods for left-censored data sets.
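For readers unfamiliar with the two trend statistics named above, the sketch below shows their standard textbook definitions: the Mann-Kendall S statistic is the sum of the signs of all pairwise later-minus-earlier differences, and the Theil-Sen slope is the median of all pairwise slopes. The function names and the example data are hypothetical; significance testing, tie corrections, and the handling of nondetects are deliberately omitted, so this is not a description of how the Trend Analysis module is implemented.

```python
from itertools import combinations
from statistics import median


def mann_kendall_s(values):
    """Mann-Kendall S: sum of sign(x_j - x_i) over all pairs with i < j."""
    s = 0
    for earlier, later in combinations(values, 2):
        s += (later > earlier) - (later < earlier)
    return s


def theil_sen_slope(times, values):
    """Theil-Sen slope: median of pairwise slopes (pairs with equal times skipped)."""
    points = list(zip(times, values))
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(points, 2)
        if t2 != t1
    ]
    return median(slopes)


# Hypothetical quarterly concentrations from one monitoring well.
events = [1, 2, 3, 4, 5, 6]
conc = [4.1, 4.6, 4.4, 5.3, 5.9, 6.2]
print("MK S statistic: ", mann_kendall_s(conc))
print("Theil-Sen slope:", theil_sen_slope(events, conc))
```

A positive S with a positive Theil-Sen slope points toward an upward trend; whether that trend is statistically significant requires the MK test's critical values or p-value, which this sketch does not compute.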
ProUCL has GOF tests for normal, lognormal, and gamma distributions for uncensored and left-censored data sets with NDs. ProUCL 5.0 has the extended version of the Shapiro-Wilk (SW) test to perform normal and lognormal GOF tests for data sets of sizes up to 2000 (Royston [1982, 1982a]). In addition to normal and lognormal distribution based decision statistics, ProUCL software computes UCLs, UPLs, and UTLs based upon the gamma distribution. Several enhancements have been made in the UCLs and BTVs modules of the ProUCL 5.0 software. A new statistic, an upper simultaneous limit (Singh and Nocerino, 2002; Wilks, 1963), has been incorporated in the Upper Limits/BTVs module of ProUCL. For data sets consisting of NDs with multiple DLs, a two-sample hypothesis test, the Tarone-Ware (TW; Tarone and Ware, 1978) test, has been incorporated in ProUCL 5.0. Nonparametric tolerance limits have been enhanced, and for specific values of confidence coefficient, coverage probability, and sample size, ProUCL 5.0 outputs the confidence coefficient actually achieved by a UTL. The Trend Analysis and OLS Regression modules can handle missing events to compute trend test statistics and generate trend graphs. Some new methods using KM estimates in gamma (and lognormal) distribution based UCL, UPL, and UTL equations have been incorporated to compute the decision statistics for data sets consisting of nondetect observations. To facilitate the computation of UCLs from ISM based samples (ITRC, 2012), the minimum sample size requirement has been lowered to 3, so that one can compute the UCL95 based upon ISM data sets of sizes ≥ 3. To select an appropriate UCL95 of the mean for an ISM data set, the user should consult the ITRC (2012) Tech Reg Guide on Incremental Sampling Methodology. All known bugs, typographical errors, and discrepancies found by the developers and the various users of the ProUCL software package have been addressed in the ProUCL version Specifically, a discrepancy found in the estimate of the mean based upon the KM method has been fixed in ProUCL
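The idea of a confidence coefficient "actually achieved" by a nonparametric UTL can be illustrated with the standard order-statistic result: if the r-th smallest of n observations is used as the UTL for coverage p, the achieved confidence equals the probability that a Binomial(n, p) count is at most r - 1. The sketch below assumes this textbook formulation; the function name is hypothetical, and ProUCL's own procedure for choosing the order statistic is not reproduced here.

```python
from math import comb


def achieved_confidence(n, r, coverage):
    """Confidence that the r-th smallest of n observations is at or above the
    population quantile of order `coverage`: P(Binomial(n, coverage) <= r - 1)."""
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    return sum(
        comb(n, i) * coverage ** i * (1.0 - coverage) ** (n - i)
        for i in range(r)
    )


# Using the sample maximum (r = n) as a UTL with 95% coverage.
for n in (20, 58, 59):
    print(n, round(achieved_confidence(n, n, 0.95), 4))
```

With the maximum as the limit the expression reduces to 1 - p^n, which is why a nominal 95% confidence with 95% coverage cannot be reached by a nonparametric UTL until the sample contains at least 59 observations.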
Some changes have been made in the decision logic used in the GOF and UCL modules. In practice, based upon a given data set, it is well known that two statistical tests (e.g., Theil-Sen and OLS trend tests) can lead to different conclusions. To streamline the decision logic associated with the computation of the various UCLs, the decision tables in ProUCL 5.0 have been updated. Specifically, for each distribution, if at least one of the two GOF tests (e.g., Shapiro-Wilk or Lilliefors test for normality) determines that the hypothesized distribution holds, then ProUCL concludes that the data set follows the hypothesized distribution, and decision statistics are computed accordingly. Additionally, for gamma distributed data sets, ProUCL 5.0 suggests the use of: the adjusted gamma UCL for samples of sizes ≤ 50 (instead of 40 suggested in previous versions); and the approximate gamma UCL for samples of sizes > 50. Also, for samples of larger sizes (e.g., with n > 100) and small values of the gamma shape parameter, k (e.g., k ≤ 0.1), significant discrepancies were found in the critical values of the two gamma GOF test statistics (Anderson-Darling and Kolmogorov-Smirnov tests) obtained using the two gamma deviate generation algorithms: Whitaker (1974) and Marsaglia and Tsang (2000). For values of k ≤ 0.2, the critical values of the two gamma GOF tests, Anderson-Darling (AD) and Kolmogorov-Smirnov (KS), have been updated using the currently available, more accurate gamma deviate generation algorithm due to Marsaglia and Tsang (2000); more details about the implementation of their algorithm can be found in Kroese, Taimre, and Botev (2011). For values of the shape parameter, k = 0.025, 0.05, 0.1, and 0.2, the critical value tables for these two tests have been updated by incorporating the newly generated critical values for the three significance levels: 0.05, 0.1, and The updated tables are provided in Appendix A. It should be noted that for k = 0.2, the older and the newly generated critical values are in general agreement.
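The two decision rules described above are simple enough to state as code. The sketch below encodes only what the text says: a distribution is accepted when at least one of its two GOF tests does not reject it, and the suggested gamma UCL switches from adjusted to approximate once the sample size exceeds 50. The function names are hypothetical, and the sketch does not attempt to reproduce the full ProUCL decision tables.

```python
def follows_distribution(first_gof_passes: bool, second_gof_passes: bool) -> bool:
    """Accept the hypothesized distribution if at least one of the two GOF tests
    (e.g., Shapiro-Wilk or Lilliefors for normality) concludes that it holds."""
    return first_gof_passes or second_gof_passes


def suggested_gamma_ucl(n: int) -> str:
    """Gamma UCL suggested for gamma distributed data, by sample size n."""
    if n < 1:
        raise ValueError("sample size must be positive")
    return "adjusted gamma UCL" if n <= 50 else "approximate gamma UCL"


# One test passes, the other does not: the distribution is still accepted.
print(follows_distribution(True, False))
print(suggested_gamma_ucl(35))
print(suggested_gamma_ucl(120))
```

Accepting a distribution when either test passes makes the logic more permissive than requiring both tests to agree, so graphical checks such as Q-Q plots remain a useful complement.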
ProUCL 5.0 also has a new Background Incremental Sample Simulator (BISS) module (temporarily not available for general use) which can be used on a large existing discrete background data set to simulate background incremental samples (BIS). The availability of a large discrete data set collected from areas with geological formations and conditions comparable to the DUs (background or onsite) of interest is a requirement for successful application of this module. The simulated BISS data can be compared with the actual field ISM (ITRC, 2012) data collected from the various DUs using other modules of ProUCL 5.0. The values of the BISS data are not directly available to users; however, the simulated BISS data can be accessed by the various modules of ProUCL 5.0 to perform desired statistical evaluations. For example, the simulated background BISS data can be merged with the actual field ISM data after comparing the two data sets using a two-sample t-test; the simulated BISS or the merged data can be used to compute a UCL of the mean or a UTL.

Note: The ISM methodology used to develop the BISS module is a relatively new approach; methods incorporated in this BISS module require further investigation. The BISS module has been temporarily blocked for use in ProUCL 5.0 as this module is awaiting adequate guidance for its intended use on discrete background data sets.

Software ProUCL version 5.0, its earlier versions: ProUCL version , , , , and , associated Facts Sheet, User Guides and Technical Guides (e.g., EPA [2004, 2007, 2009a, 2009b, 2010a, 2010b]) can be downloaded from the EPA website. ProUCL 5.0 is a user-friendly freeware package providing statistical and graphical tools needed to address statistical issues described in several EPA guidance documents. Considerable effort has been
made to provide a detailed technical guide to help practitioners understand statistical methods needed to address statistical needs of their environmental projects. ProUCL generates detailed output sheets and graphical displays for each method which can be used to educate students learning environmental statistical methods. Like previous versions, ProUCL 5.0 can process many variables simultaneously to compute various tests (e.g., ANOVA and trend test statistics) and decision statistics including the UCL of the mean, UPLs, and UTLs, a capability not available in other software packages such as Minitab 16 and NADA for R (Helsel, 2013). Without this option, the user has to compute decision and test statistics for one variable at a time, which becomes cumbersome when dealing with a large number of variables. ProUCL 5.0 also has the capability of processing data by groups. ProUCL 5.0 is easy to use; it does not require any programming skills, as are needed when using programs written in R Script.

The Need for ProUCL Software

EPA guidance documents (e.g., EPA [1989a, 1989b, 1992a, 1992b, 1994, 1996, 2000, 2002a, 2002b, 2002c, 2006a, 2006b, 2009a, and 2009b]) describe statistical methods including: DQOs based sample size determination procedures; methods to compute decision statistics (UCL95, UPL, and UTLs); parametric and nonparametric hypotheses testing approaches; One-way ANOVA; OLS regression; and trend determination approaches. Specifically, EPA guidance documents (e.g., EPA [2002c, 2006a, 2006b; and MARSSIM, 2000]) describe DQOs based parametric and nonparametric minimum sample size determination procedures needed: to compute decision statistics (e.g., UCL95); to perform site versus background comparisons (e.g., t-test, proportion test, WMW test); and to determine the number of discrete items (e.g., drums filled with hazardous material) that need to be sampled to meet the DQOs (e.g., specified proportion, p0, of defective items, allowable error margin in an estimate of the mean).
Statistical methods are used to compute test statistics (e.g., SW test, t-test, WMW test, TS trend statistic) and decision statistics (e.g., 95% UCL, 95% UPL, UTL95-95) needed to address statistical issues associated with CERCLA and RCRA site projects. For example, exposure and risk management and cleanup decisions in support of EPA projects are often made based upon the mean concentrations of the contaminants/constituents of potential concern (COPCs). Site-specific BTVs are used in site versus background evaluation studies. A UCL95 is used to estimate the EPC terms (EPA 1992a, 2002a); and upper limits such as upper percentiles, UPLs, or UTLs are used to estimate BTVs or not-to-exceed values (EPA 1992b, 2002b, and 2009). The estimated BTVs are also used: to identify the COPCs; to identify the site areas of concern (AOCs); to perform intra-well comparisons to identify MWs not meeting specified standards; and to compare onsite constituent concentrations with site-specific background level constituent concentrations. One-way ANOVA is used to perform inter-well comparisons; OLS regression and trend tests are often used to determine potential trends present in constituent concentrations identified in groundwater monitoring wells (MWs). Most of the methods described in this paragraph are available in the ProUCL 5.0 software package. It is noted that not much guidance is available in the guidance documents cited above to compute rigorous UCLs, UPLs, and UTLs for moderately skewed to highly skewed uncensored and left-censored data sets consisting of NDs with multiple DLs, a common occurrence in environmental data sets. Several parametric and nonparametric methods are available in the statistical literature (Singh, Singh, and Engelhardt, 1997; Singh, Singh, and Iaci, 2002; Krishnamoorthy et al. 2008; Singh, Maichle, and Lee, 2006) to compute UCLs and other upper limits which adjust for data skewness.
During the years, as new methods became available to address statistical issues related to environmental projects, those methods were incorporated in ProUCL software so that environmental scientists and decision makers can make more accurate and informative decisions based upon those rigorous statistical methods. Until 2006, not much guidance was provided on how to compute the UCL95 of the mean and other upper limits (e.g., UPLs and UTLs) based upon data sets consisting of NDs with multiple DLs. For data sets with NDs, Singh,
Maichle, and Lee (EPA 2006) conducted an extensive simulation study to compare the performances of the various estimation methods (in terms of bias in the mean estimate) and UCL computation methods (in terms of coverage provided by a UCL). They demonstrated that the nonparametric KM method performs well in terms of bias in estimates of the mean. They also concluded that UCLs computed using the Student's t-statistic and the percentile bootstrap method using the KM estimates do not provide the desired coverage of the population mean of skewed data sets. They demonstrated that, depending upon sample size and data skewness, UCLs computed using KM estimates and: the BCA bootstrap method (mildly skewed data sets); the bootstrap-t method, and the Chebyshev inequality (moderately to highly skewed data sets) provide better coverage (closer to the specified 95% coverage) of the population mean than the various other UCL computation methods. Based upon their findings, during , several UCL and other upper limits computation methods based upon KM and ROS estimates were incorporated in the ProUCL 4.0 software. It is noted that since the inclusion of the KM method in ProUCL 4.0 (2007), the use of KM method based upper limits has become popular in many environmental applications to estimate EPC terms and background threshold values (BTVs). The KM method is also described in the latest version of the unified RCRA guidance document (EPA 2009). It is not easy to justify distributional assumptions of data sets consisting of both detects and NDs with multiple DLs. Therefore, based upon the published literature and recent experience, parametric UCL computation methods such as the MLE methods for normal and lognormal distributions are excluded from ProUCL 5.0. Additionally, the winsorization method (Gilbert, 1987) has also been excluded from ProUCL 5.0 due to its poor performance.
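As an illustration of how a KM estimate of the mean can be formed for left-censored data with multiple detection limits, the sketch below uses the textbook product-limit form: at each distinct detected value x, the estimated CDF just below x is the CDF at x multiplied by (1 - d/n), where d is the number of detects equal to x and n is the number of observations (detects, plus nondetects counted at their detection limits) less than or equal to x. Any probability mass remaining below the smallest detect is placed at the smallest detect. That convention, the function name, and the example data are assumptions of this sketch; the ProUCL Technical Guide should be consulted for the exact estimator the software uses.

```python
def km_mean(values, detected):
    """Kaplan-Meier mean for left-censored data.

    values   -- measured value for detects, detection limit for nondetects
    detected -- True for a detect, False for a nondetect
    """
    obs = list(zip(values, detected))
    detects = sorted({v for v, is_detect in obs if is_detect}, reverse=True)
    if not detects:
        raise ValueError("at least one detected observation is required")

    cdf_at_x = 1.0      # estimated CDF equals 1 at the largest detect
    mean = 0.0
    for idx, x in enumerate(detects):
        n_at_risk = sum(1 for v, _ in obs if v <= x)
        n_detects = sum(1 for v, is_detect in obs if is_detect and v == x)
        if idx == len(detects) - 1:
            cdf_below = 0.0     # assumed convention: leftover mass sits at the smallest detect
        else:
            cdf_below = cdf_at_x * (1.0 - n_detects / n_at_risk)
        mean += x * (cdf_at_x - cdf_below)
        cdf_at_x = cdf_below
    return mean


# Hypothetical data: two nondetects with different detection limits.
vals = [0.5, 1.0, 1.2, 2.0, 3.5, 7.1]
dets = [False, False, True, True, True, True]
print(round(km_mean(vals, dets), 3))
```

When every observation is a detect, the estimate reduces to the ordinary arithmetic mean, which is a convenient sanity check for any implementation.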
ProUCL software is also used for teaching environmental statistics courses. Therefore, in addition to statistical and graphical methods routinely used to address statistical needs of environmental projects, some poor performing methods, such as the substitution DL/2 method and Land's (1975) H-statistic based UCL computation method, have been retained in ProUCL version for research and comparison purposes because of their popularity. Methods incorporated in ProUCL 5.0 and in its earlier versions have been tested and verified extensively by the developers and various researchers, scientists, and users. Specifically, the results obtained by ProUCL 5.0 are in agreement with the results obtained by using other software packages including Minitab, SAS, and programs available in R Script (not all methods are available in these software packages). Additionally, ProUCL 5.0 outputs several intermediate results (e.g., k hat and bias-corrected k star estimates of the gamma shape parameter, k) and critical values (e.g., K factor used to compute UTLs, d2max needed to compute USL) needed to compute the various decision statistics of interest, which may help interested users verify statistical results computed by the ProUCL software. ProUCL is user-friendly software which can be used to: process multiple variables (analytes) simultaneously (e.g., perform ANOVA on many variables); process grouped data; and generate and display multiple plots (Q-Q plots) on the same graphical display. No programming skills are needed to use ProUCL software. ProUCL provides warning messages and makes suggestions to help a typical user in selecting the most appropriate decision statistic (e.g., UCL).

Note: The availability of intermediate results and critical values can be used to compute lower limits and two-sided intervals which are not as yet available in the ProUCL software.
For left-censored data sets, ProUCL 5.0 computes decision statistics (e.g., UCL, UPL, and UTL) based upon KM estimates computed in a straightforward manner without flipping the data and re-flipping the decision statistics; these operations are not easy for a typical user to understand and perform and can become quite tedious when multiple analytes need to be processed. Moreover, in environmental applications it is important to compute accurate estimates of standard deviations, which are needed to compute decision making statistics including UPLs and UTLs. Decision statistics (UPL, UTL) based upon a KM estimate of the standard deviation computed using indirect methods can be different from
the statistics computed using an estimate of the sd obtained using the KM method directly, especially when one is dealing with a skewed data set or using a log-transformation. These issues are elaborated by examples discussed in this Guide and the accompanying ProUCL 5.0 Tech Guide. For uncensored data sets, researchers (e.g., Johnson (1978), Chen (1995), Efron and Tibshirani (1993), Hall [1988, 1992], more references in Chapters 2 and 3) have developed parametric (e.g., gamma distribution based) and nonparametric (bootstrap-t and Hall's bootstrap method, modified-t) methods to compute decision statistics which adjust for data skewness. For uncensored positively skewed data sets, Singh, Singh, and Iaci (2002) and Singh and Singh (2003) performed simulation experiments to compare the performances (in terms of coverage probabilities) of the various UCL computation methods described in the literature. They demonstrated that for skewed data sets, UCLs based upon Student's t-statistic, the central limit theorem (CLT), and the percentile bootstrap method tend to underestimate the population mean (EPC term). It is reasonable to assume that the findings of the simulation studies performed on uncensored skewed data sets to compare the performances of the various UCL computation methods can be extended to skewed left-censored data sets. Based upon the findings of those studies performed on uncensored data sets and also using the findings summarized in Singh, Maichle, and Lee (2006), it is concluded that t-statistic, CLT, and percentile bootstrap method based UCLs computed using KM estimates (and also ROS estimates) underestimate the population mean of moderately skewed to highly skewed data sets. Interested users may want to verify these statements via simulation experiments or otherwise. As for uncensored skewed data sets, for left-censored data sets ProUCL 5.0 offers several parametric and nonparametric methods to compute UCLs and other limits which adjust for data skewness.
In earlier versions of the ProUCL software (e.g., ProUCL ), for left-censored data sets, KM estimates were used in the normal distribution based equations to compute the various upper limits. However, normal distribution based upper limits (e.g., t-UCL) using KM estimates (or any other estimates such as ROS estimates) fail to provide the specified coverage of the parameters (e.g., mean, percentiles) of populations with skewed distributions (Singh, Singh, and Iaci, 2002; Johnson, 1978; Chen, 1995). Also, the nonparametric UCL computation methods (e.g., percentile bootstrap) do not provide the desired coverage of the population means of skewed distributions (e.g., Hall [1988, 1992]; Efron and Tibshirani, 1993). For example, the use of the t-UCL or the percentile bootstrap UCL method on robust ROS estimates or on KM estimates underestimates the population mean for moderately skewed to highly skewed data sets. Chapters 3 and 5 of the ProUCL 5.0 Tech Guide describe parametric and nonparametric KM method based upper limits computation methods (available in ProUCL 5.0) which adjust for data skewness. The KM method yields good estimates of the population mean and standard deviation (Singh, Maichle, and Lee, 2006); however, upper limits computed using the KM or ROS estimates in normal equations or in the percentile bootstrap method do not account for skewness present in the data set. Appropriate UCL computation methods which account for data skewness should be used on KM or ROS estimates. For left-censored data sets, ProUCL 5.0 computes upper limits using KM estimates in gamma (lognormal) UCL, UPL, and UTL equations (e.g., also suggested in EPA 2009) provided the detected observations in the left-censored data set follow a gamma (lognormal) distribution. Recently, the use of the ISM methodology has been recommended (ISM ITRC, 2012) to collect soil samples needed to estimate mean concentrations of the DUs requiring characterization and remediation activities. ProUCL can be used to compute UCLs based upon ISM data as described and recommended in the ITRC ISM Tech Reg Guide (2012).
At many sites, a large amount of discrete background data is already available which is not directly comparable to the actual field ISM data (onsite or background). To compare the existing discrete background data with field ISM data, the BISS module of ProUCL 5.0 (blocked for general use in ProUCL version 5.0 and awaiting instructions and guidance for its intended
use) can be used on a large (e.g., consisting of at least 30 observations) existing discrete background data set. The BISS module simulates incremental sampling methodology based equivalent incremental background samples; each simulated BISS sample represents an estimate of the mean of the population represented by the discrete background data set. The availability of a large discrete background data set collected from areas with geological conditions comparable to the DU(s) of interest (onsite DUs) is a requirement for successful application of this module. The user cannot see the simulated BISS data; however, the simulated BISS data can be accessed by the various other modules of ProUCL 5.0 to perform desired statistical evaluations. For example, the simulated BISS data can be merged with the actual field ISM data (e.g., field background ISM data) after comparing the two data sets using a two-sample t-test. The actual field ISM or the merged ISM and BISS data can be accessed by the various modules of ProUCL to compute a UCL of the mean or a UTL.

ProUCL 5.0 Capabilities

A summary of statistical methods available in the ProUCL software is provided as follows.

Assumptions: Like most statistical methods, statistical methods to compute upper limits (e.g., UCLs, UPLs, UTLs) are based upon certain assumptions, including the availability of a randomly collected data set consisting of independently and identically distributed (i.i.d.) observations representing the population (e.g., site area, reference area) under investigation. A UCL of the mean (of a population) and BTV estimates (UPL, UTL) should be computed using a randomly collected (simple random or systematic random) data set representing a single statistical population (e.g., site population or background population).
If multiple populations (e.g., background and site data mixed together) are present in a data set, it is recommended to separate them first by using population partitioning techniques (e.g., Singh, Singh, and Flatman 1994), and then compute appropriate decision statistics (e.g., 95% UCLs) separately for each identified population. The topic of population partitioning and the extraction of a valid site-specific background data set from a broader mixture data set potentially consisting of both onsite and offsite data are beyond the scope of ProUCL 5.0. Parametric estimation and hypotheses testing methods (e.g., t-test, UCLs, UTLs) are based upon distributional (e.g., normal distribution, gamma) assumptions. ProUCL has GOF tests for normal, gamma, and lognormal distributions.

Multiple Constituents/Variables: Environmental scientists need to evaluate many constituents in their decision making processes (exposure and risk assessment). ProUCL can process multiple constituents/variables simultaneously in a user-friendly manner, an option not available in other freeware or commercial software packages such as NADA for R (Helsel, 2013). This option is very useful when one has to process many variables/analytes and compute decision statistics (e.g., UCLs, UPLs, and UTLs) and test statistics (e.g., ANOVA test, trend test) for those variables/analytes.

Analysis by a Group Variable: ProUCL also has the capability of processing data by groups. A valid group column should be included in the data file. The analyses of data categorized by a group ID variable such as: 1) Surface vs. Subsurface; 2) AOC1 vs. AOC2; 3) Site vs. Background; and 4) Upgradient vs. Downgradient MWs are common in many environmental applications. ProUCL offers this option for data sets with and without nondetects. The Group Option provides a useful way to perform various statistical tests and methods, including graphical displays, separately for each of the groups (samples from different populations) that may be present in a data set.
For example, the same data set may consist of analytical data from the various groups or populations representing site, background, two or more AOCs, surface, subsurface, and monitoring wells. By using this option, the graphical displays (e.g., box plots, Q-Q plots, histograms) and statistics, including background statistics, UCLs, ANOVA test, trend test and OLS regression statistics, can be easily computed separately for each group in the data set.
Exploratory Graphical Displays for Uncensored and Left-Censored Data Sets: Graphical methods included in the Graph module of ProUCL include: Q-Q plots (data in same column), multiple Q-Q plots (data in different columns), box plots, multiple box plots, and histograms. These graphs can also be generated for data sets consisting of ND observations. Additionally, the OLS Regression and Trend Analysis module can be used to generate graphs displaying parametric OLS regression lines with confidence intervals and prediction intervals around the regression lines and nonparametric Theil-Sen trend lines. The Trend Analysis module can generate trend graphs for data sets without a sampling event variable, and also generate time series graphs for data sets with a sampling event (time) variable. ProUCL 5.0 accepts only numerical values for the event variable. Graphical displays of a data set are useful to gain added insight contained in a data set that may not otherwise be clear by looking at test statistics such as t-test, Dixon test or TS test. Unlike test statistics (e.g., t-test, MK test, AD test) and decision statistics (e.g., UCL, UTL), graphical displays do not get influenced by outliers and nondetect observations. It is suggested that the final decisions be made based upon statistical results as well as graphical displays. Side-by-side box plots or multiple Q-Q plots are useful to graphically compare concentrations of two or more groups (e.g., several monitoring wells). The GOF module of ProUCL generates Q-Q plots for normal, gamma, and lognormal distributions based upon uncensored as well as left-censored data sets with NDs. All relevant information such as the test statistics, critical values and p-values (when available) are also displayed on the GOF Q-Q plots. In addition to providing information about the data distribution, a normal Q-Q plot in the original raw scale also helps to identify outliers and multiple populations that may be present in a data set.
On a Q-Q plot, observations well-separated from the majority of the data may represent potential outliers coming from a population different from the main dominant population (e.g., background population). In a Q-Q plot, jumps and breaks of significant magnitude suggest the presence of observations coming from multiple populations (onsite and offsite areas). ProUCL can also be used to display box plots with horizontal lines displayed at pre-specified compliance limits or computed upper limits (e.g., UPL, UTL) superimposed on the same graph. This kind of graph provides a visual comparison of site data with compliance limits and/or BTV estimates.

Outlier Tests: ProUCL also has a couple of classical outlier test procedures (EPA 2006b, 2009), such as the Dixon test and the Rosner test. The details of these outlier tests are described in Chapter 7. These outlier tests often suffer from masking effects in the presence of multiple outliers. It is suggested that the classical outlier procedures should always be accompanied by graphical displays including box plots and Q-Q plots. Description and use of the robust and resistant (to masking) outlier procedures (Rousseeuw and Leroy, 1987; Singh and Nocerino, 1995) are beyond the scope of ProUCL 5.0. Interested users are encouraged to try the Scout 2008 software package (EPA 2009) to use the robust outlier identification methods, especially when dealing with multivariate data sets consisting of data for several variables/analytes. Outliers represent observations coming from populations different from the main dominant population represented by the majority of the data set. Outliers distort most statistics (e.g., mean, UCLs, UPLs, test statistics) of interest. Therefore, it is desirable to compute decision statistics based upon data sets representing the main dominant population and not to compute distorted statistics by accommodating a few low probability outliers (e.g., by using a lognormal distribution).
Moreover, it should be noted that even though outliers might have minimal influence on hypotheses testing statistics based upon ranks (e.g., WMW test), outliers do distort several nonparametric statistics, including bootstrap methods such as bootstrap-t and Hall's bootstrap UCLs and other nonparametric UPLs and UTLs computed using the higher order statistics.
Goodness-of-Fit Tests: In addition to computing simple summary statistics for data sets with and without NDs, ProUCL 5.0 has GOF tests for normal, lognormal and gamma distributions. To test for normality (lognormality) of a data set, ProUCL has the Lilliefors test and the extended S-W test for samples of sizes up to 2000 (Royston, 1982, 1982a). For the gamma distribution, two GOF tests are available in ProUCL: the Anderson-Darling test (1954) and the Kolmogorov-Smirnov test (Schneider, 1978). For samples of larger sizes (e.g., with n > 100) and small values of the gamma shape parameter, k (e.g., k ≤ 0.1), significant discrepancies were found in the critical values of the two gamma GOF test statistics (Anderson-Darling and Kolmogorov-Smirnov tests) obtained using the two gamma deviate generation algorithms: Whitaker (1974) and Marsaglia and Tsang (2000). For values of k ≤ 0.2, the critical values of the two gamma GOF tests, the Anderson-Darling (AD) and Kolmogorov-Smirnov (KS) tests, have been updated using the currently available, more efficient gamma deviate generation algorithm due to Marsaglia and Tsang (2000); more details about the implementation of their algorithm can be found in Kroese, Taimre, and Botev (2011). For values of the shape parameter k = 0.025, 0.05, 0.1, and 0.2, the critical value tables for these two GOF tests have been updated by incorporating the newly generated critical values for three levels of significance, among them 0.05 and 0.1. The updated tables are provided in Appendix A. It should be noted that for k = 0.2, the older (generated in 2002) and the newly generated critical values are in general agreement. ProUCL also generates GOF Q-Q plots for normal, lognormal, and gamma distributions displaying all relevant statistics including GOF test statistics. GOF tests for data sets with and without NDs are described in Chapter 8 of this User Guide. For data sets consisting of NDs, it is not easy to verify the distributional assumptions correctly, especially when the data set consists of a large percentage of NDs with multiple DLs and NDs exceeding the detected values.
Typically, decisions about distributions of data sets with NDs are based upon GOF test statistics computed using the data obtained: without NDs; replacing NDs by 0, DL, or DL/2; or using imputed NDs based upon a ROS (e.g., lognormal ROS) method. For data sets with NDs, ProUCL can perform GOF tests using the methods listed above. Using the "Imputed NDs using ROS Methods" option of the "Stats/Sample Sizes" module of ProUCL 5.0, additional columns can be generated to store imputed (estimated) values for NDs based upon normal ROS, gamma ROS, and lognormal ROS (also known as robust ROS) methods.

Sample Size Determination and Power Evaluation: The Sample Sizes module in ProUCL can be used to develop DQOs-based sampling designs needed to address statistical issues associated with the various polluted sites projects. ProUCL 5.0 provides user-friendly options to enter the desired/pre-specified values for decision parameters (e.g., Type I and Type II error rates) and other DQOs used to determine minimum sample sizes for the selected statistical applications including: estimation of mean, single and two-sample hypothesis testing approaches, and acceptance sampling. Both parametric (e.g., for t-tests) and nonparametric (e.g., Sign test, WRS test) sample size determination methods as described in EPA (2002c, 2006a, 2006b) and MARSSIM (2000) guidance documents are available in ProUCL version 5.0. ProUCL also has the sample size determination option for acceptance sampling of lots of discrete objects such as a lot (batch, set) of drums containing hazardous waste (e.g., RCRA applications, EPA 2002c). When the sample size for an application (e.g., verification of cleanup level) is not computed using the DQOs-based sampling design process, the Sample Sizes module can be used to assess the power of the test statistic used in retrospect. The Sample Sizes module with examples is considered in Chapter 12 of this document.
Bootstrap Methods: Bootstrap methods are computer intensive nonparametric methods which can be used to compute decision statistics of interest when a data set does not follow a known distribution, or when it is difficult to analytically derive the distributions of statistics of interest. It is well-known that for moderately skewed to highly skewed data sets, UCLs based upon standard bootstrap and the percentile bootstrap methods do not perform well (e.g., Efron [1981, 1982]; Efron and Tibshirani, 1993; Hall
[1988, 1992]; Singh, Singh, and Iaci 2002; Singh and Singh, 2003; Singh, Maichle and Lee 2006) as the interval estimates based upon these bootstrap methods fail to provide the specified coverage of the population mean (e.g., a UCL95 does not provide adequate 95% coverage of the population mean). For skewed data sets, Efron and Tibshirani (1993) and Hall (1988, 1992) considered other bootstrap methods such as the BCA, bootstrap-t and Hall's bootstrap methods. For skewed data sets, bootstrap-t and Hall's bootstrap (meant to adjust for skewness) methods perform better (e.g., in terms of coverage for the population mean) than the other bootstrap methods. However, it has been noted (e.g., Efron and Tibshirani, 1993; Singh, Singh, and Iaci, 2002) that these two bootstrap methods tend to yield erratic and inflated UCL values (orders of magnitude higher than other UCLs) in the presence of outliers. Similar behavior of the bootstrap-t UCL and Hall's bootstrap UCL methods is observed on data sets consisting of NDs and outliers. For the reasons described above, whenever applicable, ProUCL 5.0 provides cautionary notes and warning messages regarding the use of bootstrap-t and Hall's bootstrap UCL methods. For nonparametric uncensored and left-censored data sets with NDs, depending upon data variability and skewness, ProUCL recommends the use of BCA bootstrap, bootstrap-t, or Chebyshev inequality based methods to compute decision statistics.

Hypotheses Testing Approaches: ProUCL software has both Single-Sample (e.g., Student's t-test, sign test, proportion test, WSR test) and Two-Sample (Student's t-test, WMW test, Gehan test, and T-W test) parametric and nonparametric hypotheses testing approaches. Hypotheses testing approaches in ProUCL can handle both full-uncensored data sets without NDs, and left-censored data sets with NDs. Most of the hypotheses tests also report associated p-values. For some hypotheses tests (e.g., WMW test, WSR test, proportion test), large sample p-values based upon normal approximation are computed using the continuity correction factors.
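The large-sample p-value with a continuity correction mentioned above can be illustrated for an upper-tailed single-sample proportion test. This is only a minimal sketch of the textbook normal approximation; the function names, the hypothesis direction, and the example counts are assumptions made for illustration and are not taken from ProUCL, whose exact formulae are given in its Technical Guide.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_test_upper(exceedances, n, p0):
    """Test H0: P <= p0 against H1: P > p0 (hypothetical helper).

    Uses the normal approximation to the binomial count of exceedances,
    shifting the observed count by 0.5 as a continuity correction.
    Returns the z statistic and the approximate one-sided p-value.
    """
    expected = n * p0
    sd = math.sqrt(n * p0 * (1.0 - p0))
    z = (exceedances - 0.5 - expected) / sd
    return z, 1.0 - normal_cdf(z)


# Illustrative numbers only: 18 of 200 site observations exceed the action
# level, and the allowable proportion of exceedances is 5%.
z, p_value = proportion_test_upper(exceedances=18, n=200, p0=0.05)
print(f"z = {z:.4f}, approximate p-value = {p_value:.4f}")
```

A small p-value here would suggest that the proportion of exceedances is larger than the allowable proportion; for small n, an exact binomial calculation would be more appropriate than this approximation.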
The various single-sample and two-sample hypotheses testing approaches are considered in Chapter 9.

Single-sample: parametric (Student's t-test) and nonparametric (Sign test, WSR test, tests for proportions and percentiles) hypotheses testing approaches are available in ProUCL. The single-sample hypotheses tests are used when the environmental parameters such as the cleanup standard, action level, or compliance limits are known, and the objective is to compare site concentrations with those known threshold values. Specifically, a t-test (or a sign test) may be used to verify the attainment of cleanup levels at an AOC after a remediation activity has taken place; and a test for proportion may be used to verify if the proportion of exceedances of an action level (or a compliance limit) by sample observations collected from an AOC (or a MW) exceeds a certain specified proportion (e.g., 1%, 5%, 10%). The differences between these tests should be noted and understood. Specifically, a t-test or a Wilcoxon Signed Rank (WSR) test is used to compare the measures of location and central tendency (e.g., mean, median) of a site area (e.g., AOC) to a cleanup standard, Cs, or action level also representing a measure of central tendency (e.g., mean, median); whereas a proportion test determines whether the proportion of site observations from an AOC exceeding a compliance limit (CL) exceeds a specified proportion, P0 (e.g., 5%, 10%). The percentile test compares a specified percentile (e.g., 95th) of the site data to a pre-specified upper threshold (e.g., action level).

Two-sample: Hypotheses tests (Student's t-test, WMW test, Gehan test, T-W test) are used to perform site versus background comparisons, compare concentrations of two or more AOCs, and compare concentrations of GW monitoring wells (MWs). It should be noted that, as cited in the literature, some of the hypotheses testing approaches (e.g., nonparametric two-sample WMW) deal with the single detection limit scenario. When using the WMW test on a data set with multiple detection limits, all
observations (detects and NDs) below the largest detection limit need to be considered as NDs (Gilbert, 1987). This in turn tends to reduce the power and increase the uncertainty associated with the test. As mentioned before, it is always desirable to supplement the test statistics and conclusions with graphical displays such as the multiple Q-Q plots and side-by-side box plots. The Gehan test or the Tarone-Ware test (new in ProUCL 5.0) should be used in cases where multiple detection limits are present.

Computation of Upper Limits including UCLs, UPLs, UTLs, and USLs: ProUCL software has parametric and nonparametric methods, including bootstrap and Chebyshev inequality based methods, to compute the various decision-making statistics such as UCLs of the mean (EPA 2002a), percentiles, UPLs for future k (≥ 1) observations, UTLs (e.g., EPA 1992b, EPA 2009) and upper simultaneous limits (USLs) (Singh and Nocerino, [1995, 2002]) based upon uncensored full data sets and left-censored data sets consisting of NDs with multiple DLs. Methods incorporated in ProUCL cover a wide range of skewed data distributions with and without NDs. In addition to normal and lognormal distribution based upper limits, ProUCL 5.0 can compute parametric UCLs, percentiles, UPLs for future k (≥ 1) observations, UTLs, and USLs based upon gamma distributed data sets. For data sets with NDs, ProUCL has several estimation methods including the KM method (1958), ROS methods (Helsel, 2005) and substitution methods such as replacing NDs by DL or DL/2 (Gilbert, 1987; EPA 2006b). The DL/2 substitution method has been incorporated in ProUCL for research and comparison purposes as requested by EPA scientists.

Computation of UCLs Based Upon Uncensored Data Sets without NDs: Parametric UCL computation methods in ProUCL for uncensored data sets include: Student's t-UCL, Approximate gamma UCL (using chi-square approximation), Adjusted gamma UCL (adjusted for level of significance), Land's H-UCL, and Chebyshev inequality-based UCL (using MVUEs of parameters of a lognormal distribution).
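As a rough illustration of how a Chebyshev inequality-based UCL of the mean is formed, the sketch below uses the plain sample mean and sample standard deviation (the nonparametric variant, not the MVUE-based lognormal variant listed above). The function name and the example data are hypothetical; this is the standard textbook form and is not claimed to reproduce ProUCL output.

```python
import math
import statistics


def chebyshev_ucl(data, confidence=0.95):
    """One-sided Chebyshev inequality-based UCL of the mean (hypothetical helper).

    UCL = mean + sqrt(1/alpha - 1) * s / sqrt(n), with alpha = 1 - confidence,
    where s is the sample standard deviation (n - 1 in the denominator).
    """
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are needed")
    alpha = 1.0 - confidence
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    return mean + math.sqrt(1.0 / alpha - 1.0) * standard_error


# Made-up concentrations used purely to exercise the function.
concentrations = [1.0, 2.0, 3.0, 4.0, 5.0]
print(f"95% Chebyshev UCL = {chebyshev_ucl(concentrations):.4f}")
```

Because the Chebyshev inequality holds for any distribution with finite variance, the resulting UCL tends to be conservative (large), which is why it is typically considered for skewed data sets.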
Nonparametric UCL computation methods for data sets without NDs include: CLT-based UCL, Modified-t-statistic (adjusted for skewness)-based UCL, Adjusted-CLT (adjusted for skewness)-based UCL, Chebyshev inequality-based UCL (using sample mean and standard deviation), Jackknife method-based UCL, UCL based upon standard bootstrap, UCL based upon percentile bootstrap, UCL based upon BCA bootstrap, UCL based upon bootstrap-t, and UCL based upon Hall's bootstrap method. The details of UCL computation methods for uncensored data sets are summarized in Chapter 2 of the associated ProUCL 5.0 Technical Guide; and computations of the various parametric and nonparametric UCLs using ProUCL 5.0 are described in Chapter 11 of this document.

Computations of UPLs, UTLs, and USLs Based Upon Uncensored Data Sets without NDs: For uncensored data sets without NDs, ProUCL can compute parametric percentiles, UPLs for k (k ≥ 1) future observations, UPLs for the mean of k (≥ 1) future observations, UTLs, and USLs based upon normal, gamma, and lognormal distributions. Nonparametric upper limits are typically based upon order statistics of a data set such as a background or a reference area data set. Depending upon the size of the data set, the higher order statistics (maximum, second largest, third largest, and so on) are used to compute these upper limits (e.g., UTLs). Depending upon the sample size, specified confidence coefficient and coverage probability, ProUCL 5.0 outputs the actual confidence coefficient achieved by a nonparametric UTL. The mathematical details of the various parametric and nonparametric computation methods for UPLs, UTLs, and USLs are described in Chapter 3 of the ProUCL 5.0 Technical Guide; and computations of these limits using ProUCL 5.0 are described in Chapter 10 of this User Guide.

Computation of UCLs, UPLs, UTLs, and USLs Based Upon Left-Censored Data Sets with NDs: For data sets with NDs, ProUCL computes UCLs, UPLs, UTLs, and USLs based upon mean and sd computed using lognormal ROS (LROS, robust ROS), Gamma ROS (GROS), KM, and DL/2 methods.
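The confidence coefficient actually achieved by a nonparametric UTL, mentioned above, follows from a binomial argument: the r-th smallest of n i.i.d. observations is at or above the p-th population quantile exactly when at most r - 1 observations fall below that quantile. The sketch below evaluates that probability; the function name is hypothetical, a continuous distribution is assumed, and the rule ProUCL uses to pick the order statistic is not described in this section.

```python
from math import comb


def utl_confidence(n, r, coverage):
    """Confidence achieved when the r-th order statistic of n observations
    is used as a UTL with the given coverage (hypothetical helper).

    Equals P(Binomial(n, coverage) <= r - 1); for r = n (the sample
    maximum) this reduces to 1 - coverage**n.
    """
    if not 1 <= r <= n:
        raise ValueError("r must be between 1 and n")
    return sum(
        comb(n, i) * coverage**i * (1.0 - coverage) ** (n - i)
        for i in range(r)
    )


# Using the maximum as a UTL with 95% coverage for a few sample sizes.
for n in (10, 30, 58, 59):
    print(n, round(utl_confidence(n, n, 0.95), 4))
```

With small data sets the achieved confidence for 95% coverage falls well below 0.95 even when the maximum is used, which is one reason an adequately sized background data set matters for nonparametric limits.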
For nonparametric data sets, to adjust for skewness, ProUCL uses bootstrap methods and the Chebyshev inequality to compute UCLs and other limits using estimates of mean and standard deviation obtained using the methods listed
above. ProUCL also uses parametric methods on KM (and ROS) estimates provided detected observations in the left-censored data set follow a parametric distribution. For example, if the detected data follow a gamma distribution, ProUCL uses KM estimates in gamma distribution based equations to compute UCLs, UTLs, and other upper limits. Based upon a Monte Carlo study performed by Singh, Maichle, and Lee (EPA, 2006), ProUCL recommends the use of the Kaplan-Meier (1958) estimates in bootstrap and Chebyshev inequality methods to compute the various decision statistics (e.g., UCL95, UPL, UTL) of interest. ProUCL 5.0 suggests the use of KM-Gamma upper limits when the detected data follow a gamma distribution. ProUCL computes KM estimates directly using left-censored data sets without flipping data and re-flipping decision statistics. The KM method incorporated in ProUCL computes both the sd and the standard error (SE) of the mean. For historical reasons and for comparison and research purposes, the DL/2 substitution method and the H-UCL based upon the LROS method have been retained in ProUCL 5.0. The inclusion of the substitution method in ProUCL should not be inferred as an endorsement of that method by ProUCL software and its developers. The mathematical details of the UCL computation methods for data sets with NDs are given in Chapter 4, and the description of the various other upper limits (UPLs, UTLs, and USLs) for data sets with NDs is given in Chapter 5 of the ProUCL 5.0 Technical Guide. The computations of these limits for data sets consisting of NDs using ProUCL 5.0 are considered in Chapters 10 and 11 of this User Guide.

One-Way ANOVA, OLS Regression and Trend Analysis: The One-way ANOVA module has both classical and nonparametric K-W ANOVA tests as described in EPA guidance documents (e.g., EPA [2006b, 2009]). One-way ANOVA is used to compare means (or medians) of multiple groups, such as comparing mean concentrations of several areas of concern, and performing inter-well comparisons of concentrations of several MWs.
The OLS Regression option computes the classical OLS regression line, and generates graphs displaying the OLS line, confidence bands and prediction bands around the regression line. All statistics of interest including slope, intercept, and correlation coefficient are displayed on the OLS line graph. The Trend Analysis module has two nonparametric trend tests: the MK trend test and the Theil-Sen trend test. Using this option, one can generate trend graphs and time-series graphs displaying the Theil-Sen trend line and all other statistics of interest with associated p-values. In GW monitoring applications, OLS regression, trend tests, and time series plots are often used to identify trends (e.g., upwards, downwards) in constituent concentrations of the various GW monitoring wells over a certain period of time (EPA 2009). The details of One-way ANOVA are given in Chapter 9, and OLS regression line and Trend tests methods are described in Chapter 10 of the ProUCL 5.0 Technical Guide. Chapters 13 and 14 of this User Guide, respectively, illustrate the use of the One-way ANOVA module and the OLS Regression and Trend Analysis module.

BISS Module: At many sites, a large amount of discrete onsite and background data are already available which are not directly comparable to actual field ISM data. In order to provide a tool to compare the existing discrete data with ISM data, the BISS module of ProUCL 5.0 may be used on a large existing discrete data set. The ISM methodology used to develop the BISS module is a relatively new approach; methods incorporated in this BISS module require further investigation. The BISS module has been temporarily blocked for use in ProUCL 5.0 as this module is awaiting adequate guidance for its intended use on discrete background data sets.
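The two nonparametric trend statistics named in the Trend Analysis paragraph above can be sketched in a few lines. The code below computes the Mann-Kendall S statistic (the number of increasing pairs minus the number of decreasing pairs) and the Theil-Sen slope (the median of all pairwise slopes). Function names and the example well data are made up for illustration; p-values, tie corrections, and handling of NDs are deliberately left out, so this is not a reproduction of the ProUCL module.

```python
import statistics
from itertools import combinations


def mann_kendall_s(values):
    """Mann-Kendall S: sum of signs of (later value - earlier value)
    over all pairs, with values given in time order."""
    s = 0
    for earlier, later in combinations(values, 2):
        s += (later > earlier) - (later < earlier)
    return s


def theil_sen_slope(times, values):
    """Theil-Sen slope: median of slopes over all pairs of points
    with distinct sampling times."""
    slopes = [
        (y2 - y1) / (t2 - t1)
        for (t1, y1), (t2, y2) in combinations(zip(times, values), 2)
        if t2 != t1
    ]
    return statistics.median(slopes)


# Hypothetical concentrations from one monitoring well over five events.
events = [1, 2, 3, 4, 5]
concentrations = [2.0, 4.1, 5.9, 8.2, 9.8]
print("Mann-Kendall S:", mann_kendall_s(concentrations))
print("Theil-Sen slope:", round(theil_sen_slope(events, concentrations), 4))
```

A large positive S suggests an upward trend and a large negative S a downward trend; the Theil-Sen slope gives a trend estimate that is less sensitive to a few extreme values than the OLS slope.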
Recommendations and Suggestions in ProUCL: Not much guidance is available in the environmental literature, including the available guidance documents, to compute rigorous UCLs, UPLs, and UTLs for moderately skewed to highly skewed uncensored and left-censored data sets consisting of NDs with multiple DLs, a common occurrence in environmental data sets. For uncensored positively skewed data sets, Singh, Singh, and Iaci (2002) and Singh and Singh (2003) performed extensive simulation
experiments to compare the performances (in terms of coverage probabilities) of several UCL computation methods described in statistical and environmental literature. They noted that the optimal choice of a decision statistic (e.g., UCL95) depends upon the sample size, data distribution and data skewness. Until 2006, not much guidance was available on how to compute a UCL95 of the mean and other upper limits (e.g., UPLs and UTLs) based upon skewed data sets consisting of NDs with multiple DLs. For data sets with NDs, Singh, Maichle, and Lee (EPA 2006) conducted a similar simulation study to compare the performances of the various estimation methods (in terms of bias in the mean estimate), and of some of the UCL computation methods (in terms of coverage provided by a UCL). They concluded that the nonparametric KM estimation method performs well in terms of bias in the estimate of the mean; and that for skewed data sets, the t-statistic, CLT, and percentile bootstrap method based UCLs computed using KM estimates (and ROS estimates) underestimate the population mean. Based upon the findings summarized in Singh, Singh, and Iaci (2002) and Singh, Maichle, and Lee (2006), it is reasonable to assume that the findings of the simulation studies performed on uncensored skewed data sets to compare the performances of the various UCL computation methods can be extended to skewed left-censored data sets. For data sets with and without NDs, ProUCL computes decision statistics including UCLs, UPLs, and UTLs using several parametric and nonparametric methods covering a wide range of sample sizes, data variability and skewness. Using the results and findings summarized in the literature cited above, and based upon the sample size, data distribution, and data skewness, some modules of ProUCL make suggestions about using a decision statistic to estimate population parameters of interest (e.g., EPC).
The recommendations made in ProUCL are based upon the extensive experience of the developers in environmental statistical methods, published literature (e.g., Efron and Tibshirani, 1993; Hall, 1988; Singh, Singh, and Engelhardt 1997; Singh, Singh, and Iaci 2002; and Singh, Maichle, and Lee 2006) and procedures described in the various EPA guidance documents (EPA [1992a, 1992b, 2002a, 2002b, 2006b, 2009, 2009a, 2009b]). Based upon the conceptual site model (CSM) and expert site and regional knowledge, the project team should make the final decision regarding using or not using the suggestions made by ProUCL. If deemed necessary, the project team may want to consult a statistician. Even though ProUCL 5.0 has been developed using limited government funding, for data sets with and without NDs, ProUCL 5.0 provides many statistical and graphical methods described in the EPA documents cited above. However, the range of methods in ProUCL 5.0 should not be compared with that of commercial software packages such as SAS and Minitab 16. For example, trend tests correcting for seasonal/spatial variations are not available in the ProUCL software. For those methods the user is referred to the commercial software packages. As mentioned earlier, it is recommended to supplement test results (e.g., two-sample test) with graphical displays (e.g., Q-Q plots, side-by-side box plots), especially when data sets consist of NDs and outliers. With the inclusion of the BISS module, One-way ANOVA, Regression and Trend tests, and the user-friendly DQOs-based Sample Size determination modules, ProUCL represents a comprehensive statistical software package equipped with statistical methods and graphical tools needed to address many environmental sampling and statistical issues as described in the various CERCLA (EPA 1989a, 1992a, 2002a, 2002b, 2006a, 2006b), MARSSIM (EPA 2000), and RCRA (EPA 1989b, 1992b, 2002c, 2009) guidance documents. Finally, the users of ProUCL are cautioned about the use of methods and suggestions described in some recent environmental literature.
For example, many decision statistics (e.g., UCLs, UPLs, UTLs) computed using the methods (e.g., percentile bootstrap, statistics using KM estimates and t-critical values) described in Helsel (2012) will fail to provide desired coverage to the environmental parameters of interest (mean, upper percentile) of moderately skewed to highly skewed populations; and conclusions
derived based upon those decision statistics may lead to incorrect conclusions which may not be cost-effective or protective of human health and the environment.

ProUCL 5.0 Technical Guide

In addition to this User Guide, a Technical document also accompanies ProUCL, providing technical details of the graphical and statistical methods incorporated in ProUCL. Most of the mathematical algorithms and formulae (with references) used in the development of ProUCL 5.0 are described in the associated Technical Guide.
Chapter 1 Guidance on the Use of Statistical Methods and Associated Minimum Sample Size Requirements for ProUCL Software

Decisions based upon statistics computed using discrete data sets of small sizes (e.g., < 6) cannot be considered reliable enough to make remediation decisions that affect human health and the environment. For example, a background data set of size less than 6 is not large enough to characterize the background population, to compute background threshold value (BTV) estimates, or to perform background versus site comparisons. Several EPA guidance documents (e.g., MARSSIM 2000; EPA [2006a, 2006b]) describe data quality objectives (DQOs) and minimum sample size computations needed to address statistical issues associated with the various environmental applications. In order to obtain reliable results using statistical methods, an adequate amount of data should be collected using desired DQOs (confidence coefficient, decision error rates). The Sample Sizes module of ProUCL computes DQOs-based minimum sample sizes needed to use the statistical methods described in the various guidance documents. In some cases, it may not be possible (e.g., due to resource constraints) to collect the DQOs-based number of samples; under these circumstances one can use the Sample Sizes module to assess the power of the test statistic used in retrospect. Some suggestions about the minimum sample size requirements needed to use statistical methods to estimate environmental parameters of interest such as exposure point concentration (EPC) terms and BTVs, and to compare site data with background data or with some pre-established screening levels (e.g., action levels [ALs], compliance limits [CLs]), are provided in this chapter. It is noted that minimum sample size suggestions similar to those made by ProUCL (EPA 2007, 2009a, 2009b) have been made in some other guidance documents, including the RCRA Guidance Document (EPA 2009).
This chapter also describes the differences between the various statistical upper limits, including upper confidence limits (UCLs) of the mean, upper prediction limits (UPLs) for future observations, and upper tolerance limits (UTLs), often used to estimate the environmental parameters of interest including EPC terms and BTVs. The use of a statistical method depends upon the environmental parameter(s) being estimated or compared with. The measures of central tendency (e.g., means, medians, or their UCLs) are used to compare site mean concentrations with a cleanup standard, Cs, also representing some central tendency measure of a reference area or some other known threshold representing a measure of central tendency. The upper threshold values, such as the CLs, alternative concentration limits (ACL), or not-to-exceed values, are used when individual point-by-point onsite observations are compared with those threshold values. It should be noted that depending upon whether the environmental parameters (e.g., BTVs, not-to-exceed value, or EPC term) are known or unknown, different statistical methods with different data requirements are needed to compare site concentrations with pre-established (known) or estimated (unknown) standards and BTVs. Several upper limits, and single and two-sample hypotheses testing approaches, for both full-uncensored and left-censored data sets are available in the ProUCL software package to perform the comparisons described above.

1.1 Background Data Sets

Based upon the conceptual site model (CSM), the project team familiar with the site selects background or reference areas. Depending upon the site activities and the pollutants, the background area can be site-specific or a general reference area. An appropriate random sample of independent observations (e.g., i.i.d.) should be collected from the background area. A defensible background data set represents a single population, possibly without any outliers. In a background data set, in addition to reporting
and/or laboratory errors, statistical outliers may also be present. A few elevated statistical outliers present in a background data set may actually represent potentially contaminated locations belonging to impacted site areas and/or possibly from other polluted site(s); those elevated outliers may not be coming from the main dominant background population under evaluation. Since the presence of outliers in a data set tends to yield distorted (incorrect and misleading) values of the decision-making statistics (e.g., UCLs, UPLs and UTLs), elevated outliers should not be included in background data sets and estimation of BTVs. The objective here is to compute background statistics based upon the majority of the data set representing the main dominant background population, and not to accommodate a few low probability high outliers (e.g., coming from extreme tails of the data distribution) that may also be present in the background data set. The occurrence of elevated outliers is common when background samples are collected from various onsite areas (e.g., large Federal Facilities). The proper disposition of outliers, to include or not include them in statistical computations, should be decided by the project team. The project team may want to compute decision statistics with and without the outliers to evaluate the influence of outliers on the decision-making statistics. A couple of classical outlier tests (Dixon and Rosner tests) are available in ProUCL. Since both of these classical tests suffer from masking effects (e.g., some extreme outliers may mask the occurrence of other intermediate outliers), it is suggested that these classical outlier tests be supplemented with graphical displays such as a box plot and a Q-Q plot. The use of exploratory graphical displays helps in determining the number of outliers potentially present in a data set. The use of graphical displays also helps in identifying extreme high outliers as well as intermediate and mild outliers.
The use of robust and resistant outlier identification procedures (Singh and Nocerino, 1995; Rousseeuw and Leroy, 1987) is recommended when multiple outliers are present in a data set. Those methods are beyond the scope of ProUCL 5.0. However, several robust outlier identification methods are available in the Scout 2008 version 1.0 software package (EPA 2009). An appropriate background data set of a reasonable size (preferably computed using DQOs processes) is needed to represent a background area, to compute upper limits (e.g., estimates of BTVs) based upon background data sets, and to compare site and background data sets using hypotheses testing approaches. At a minimum, a background data set should have at least 10 observations (more are preferable) to perform background evaluations.

1.2 Site Data Sets

A data set collected from a site population (e.g., area of concern [AOC], exposure area [EA], decision unit [DU], group of monitoring wells [MWs]) should be representative of the site area under investigation. Depending upon the site areas under investigation, different soil depths and soil types may be considered as representing different statistical populations. In such cases, background versus site comparisons may have to be conducted separately for each of those site subpopulations (e.g., surface and subsurface layers of an AOC, clay and sandy site areas). These issues, such as comparing depths and soil types, should also be considered in planning stages when developing sampling designs to collect samples from the various site AOCs. Specifically, the availability of an adequate amount of representative site data is required from each of those site subpopulations/strata defined by sample depths, soil types, and the various other characteristics. For detailed guidance on soil sample collections, the reader is referred to Gerlach and Nocerino (EPA, 2003). Site data collection requirements depend upon the objective(s) of the study. Specifically, in background versus site comparisons, site data are needed to perform:
• point-by-point onsite comparisons with pre-established action levels or estimated BTVs. Typically, this approach is used when only a small number (e.g., < 6) of onsite observations are compared with a BTV or some other not-to-exceed value. If many onsite values need to be compared with a BTV, it is recommended to use a UTL or an upper simultaneous limit (USL) to control the false positive error rate (Type I error rate). Alternatively, one can use hypothesis testing approaches provided enough observations (at least 10, more are preferred) are available.

• single-sample hypotheses tests to compare site data with a pre-established cleanup standard, Cs (e.g., representing a measure of central tendency); proportion test to compare site proportion of exceedances of an AL with a pre-specified allowable proportion, P0. These hypotheses testing approaches are used on site data when enough site observations are available. Specifically, when at least 10 (more are desirable) site observations are available, it is preferable to use hypotheses testing approaches to compare site observations with specified threshold values. The use of hypotheses testing approaches can control both types of error rates (Type I and Type II) more efficiently than the point-by-point individual observation comparisons. This is especially true as the number of point-by-point comparisons increases. This issue is illustrated by the following table summarizing the probabilities of exceedances (false positive error rate) of the BTV (e.g., 95th percentile) by onsite observations, even when the site and background populations have comparable distributions. The probabilities of these chance exceedances increase as the site sample size increases.

Sample Size    Probability of Exceedance

• two-sample hypotheses tests to compare the site data distribution with the background data distribution to determine if the site concentrations are comparable to background concentrations. An adequate amount of data needs to be made available from the site as well as the background populations. It is preferable to collect at least 10 observations from each population under comparison.
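The growth of chance exceedances with the number of point-by-point comparisons can be sketched numerically. The minimal Python example below assumes that the onsite observations are independent, that site and background share the same distribution, and that the BTV equals the true 95th percentile of that distribution; under those assumptions the probability that at least one of n observations exceeds the BTV is 1 - 0.95^n. The function name and the sample sizes printed are illustrative only and are not taken from ProUCL or from the table referenced above.

```python
def prob_at_least_one_exceedance(n, percentile=0.95):
    """Probability that at least one of n independent observations exceeds
    the given population percentile (assumed here to play the role of the BTV)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1.0 - percentile ** n


if __name__ == "__main__":
    # Illustrative sample sizes; not the entries of the guide's table.
    for n in (1, 2, 5, 10, 20, 50):
        p = prob_at_least_one_exceedance(n)
        print(f"n = {n:3d}   P(at least one exceedance) = {p:.3f}")
```

Even with comparable site and background populations, the false positive rate of the point-by-point approach grows quickly with n, which is the motivation for using a UTL, a USL, or a hypothesis test when many site observations are available.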
Notes: From a mathematical point of view, one can perform hypothesis tests on data sets consisting of only 3-4 data values; however, the reliability of the test statistics (and the conclusions derived) thus obtained is questionable. In these situations it is suggested to supplement the test statistics decisions by graphical displays.

1.3 Discrete Samples or Composite Samples?

ProUCL can be used on discrete data sets as well as on composite data sets. However, in a data set (background or site), collected samples should be either all discrete or all composite. In general, both discrete and composite site samples may be used for individual point-by-point site comparisons with a threshold value, and for single and two-sample hypotheses testing applications.

When using a single-sample hypothesis testing approach, site data can be obtained by collecting all discrete or all composite samples. The hypothesis testing approach is used when many (e.g., 10) site observations are available. Details of the single-sample hypothesis approaches are widely available in EPA guidance documents (MARSSIM, 2000; EPA [1989a, 2006b]). Several single-sample hypotheses testing procedures available in ProUCL are described in Chapter 6 of the ProUCL 5.0 Tech Guide.

If a two-sample hypothesis testing approach is used to perform site versus background comparisons, then samples from both of the populations should be either all discrete samples, or all composite samples. The two-sample hypothesis testing approaches are used when many (e.g., at least 10) site, as well as background, observations are available. For better results with higher statistical power, the availability of more observations, perhaps based upon an appropriate DQOs process (EPA 2006a), is desirable. Several two-sample hypotheses tests available in ProUCL 5.0 are described in Chapter 6 of the ProUCL 5.0 Tech Guide.

1.4 Upper Limits and Their Use

The computation and use of statistical limits depend upon their applications and the parameters (e.g., EPC term, BTVs) they are supposed to be estimating. Depending upon the objective of the study, a pre-specified cleanup standard, Cs, can be viewed as representing: 1) an average (or median) constituent concentration, μ0; or 2) a not-to-exceed upper threshold concentration value, A0. These two threshold values, an average value, μ0, and a not-to-exceed value, A0, represent two significantly different parameters, and different statistical methods and limits are used to compare the site data with these two very different threshold values. Statistical limits, such as a UCL of the population mean, a UPL for an independently obtained single observation or for independently obtained k observations (also called future k observations, next k observations, or k different observations), upper percentiles, and UTLs are often used to estimate the environmental parameters: an EPC term (μ0) and a BTV (A0).
A new upper limit, USL, has been included in ProUCL 5.0 which may be used to estimate a BTV based upon a well-established background data set without any outliers.

It is important to understand and note the differences between the uses and numerical values of these statistical limits so that they can be properly used. Specifically, the differences between UCLs and UPLs (or upper percentiles), and UCLs and UTLs should be clearly understood and acknowledged. A UCL with a 95% confidence limit (UCL95) of the mean represents an estimate of the population mean (measure of the central tendency), whereas a UPL95, a UTL95%-95% (UTL95-95), and an upper 95th percentile represent estimates of a threshold from the upper tail of the population distribution such as the 95th percentile. Here, UPL95 represents a 95% upper prediction limit, and UTL95-95 represents a 95% confidence limit of the 95th percentile. For mildly skewed to moderately skewed data sets, the numerical values of these limits tend to follow the order given as follows:

Sample Mean ≤ UCL95 of Mean ≤ Upper 95th Percentile ≤ UPL95 of a Single Observation ≤ UTL95-95

For highly skewed data sets, these limits may not follow the order described above. This is especially true when the upper limits are computed based upon a lognormal distribution (Singh, Singh, and Engelhardt, 1997). It is well known that a lognormal distribution based H-UCL95 (Land's UCL95) often yields unstable and impractically large UCL values. An H-UCL95 often becomes larger than a UPL95 and even larger than a UTL95%-95% and the largest sample value. This is especially true when dealing with skewed data sets of smaller sizes. Moreover, it should also be noted that in some cases, an H-UCL95 becomes smaller than the sample mean, especially when the data are mildly skewed and the sample size is large (e.g., > 50, 100). The differences among the various upper limits discussed above are illustrated by the following example.

Example 1.1. Consider a real background data set collected from a Superfund site (EPA 2002b). The data set has several inorganic COPCs, including aluminum, arsenic, chromium, iron, and lead. Iron concentrations follow a normal distribution. Some upper limits for the iron data set are summarized as follows; the various upper limits do follow the order described above.

Table 1-1. Computation of Upper Limits for Iron (Normally Distributed)
Columns: Mean, Median, Min, Max, UCL95, UPL95 for a Single Observation, UPL95 for 4 Observations, UTL, Upper Percentile

A brief discussion about the differences between the applications and uses of the various statistical limits is provided below.

• A UCL represents an average value that should be compared with a threshold value also representing an average value (pre-established or estimated), such as a mean Cs. For example, a site 95% UCL exceeding a Cs may lead to the conclusion that the Cs has not been attained by the average site area concentration. It should also be noted that UCLs of means are typically computed based upon the site data set.

• A UCL represents a collective measure of central tendency, and it is not appropriate to compare individual site observations with a UCL. Depending upon data availability, single or two-sample hypotheses testing approaches are used to compare a site average or a site median with a specified or pre-established cleanup standard (single-sample hypothesis), or with the background population average or median (two-sample hypothesis).

• A UPL, an upper percentile, or a UTL represents an upper limit to be used for point-by-point individual site observation comparisons. UPLs and UTLs are computed based upon background data sets, and point-by-point onsite observations are compared with those limits. A site observation exceeding a background UTL may lead to the conclusion that the constituent is present at the site at levels greater than the background concentrations level.
• When enough (e.g., at least 10) site observations are available, it is preferable to use hypotheses testing approaches. Specifically, single-sample hypotheses testing (comparing site to a specified threshold) approaches should be used to perform site versus a known threshold comparison; and two-sample hypotheses testing (provided enough background data are also available) approaches should be used to perform site versus background comparison. Several parametric and nonparametric single and two-sample hypotheses testing approaches are available in ProUCL 5.0.

It is re-emphasized that only averages should be compared with averages or UCLs, and individual site observations should be compared with UPLs, upper percentiles, UTLs, or USLs. For example, the comparison of a 95% UCL of one population (e.g., site) with a 90% or 95% upper percentile of another population (e.g., background) cannot be considered fair and reasonable as these limits (e.g., UCL and UPL) estimate and represent different parameters.
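For reference, the normal-theory forms of the limits discussed in this section can be written as below. This is a sketch under the assumption of a normally distributed data set with sample mean \bar{x}, sample standard deviation s, and size n; the symbols t_{0.95,n-1} (Student's t quantile), t'_{0.95,n-1}(\delta) (noncentral t quantile with noncentrality parameter \delta), z_{0.95} (standard normal quantile), and K (tolerance factor) are notation introduced here for illustration. Limits for gamma, lognormal, or nonparametric cases take different forms.

```latex
\begin{align*}
\text{UCL95 of the mean} &= \bar{x} + t_{0.95,\,n-1}\,\frac{s}{\sqrt{n}} \\[4pt]
\text{UPL95 for a single future observation} &= \bar{x} + t_{0.95,\,n-1}\; s\,\sqrt{1 + \frac{1}{n}} \\[4pt]
\text{UTL95-95} &= \bar{x} + K\,s, \qquad
K = \frac{t'_{0.95,\,n-1}\!\left(z_{0.95}\sqrt{n}\right)}{\sqrt{n}}
\end{align*}
```

Because s/\sqrt{n} shrinks as n grows while s\sqrt{1 + 1/n} approaches s, the UCL95 converges to the mean whereas the UPL95 and the UTL95-95 converge to an upper-tail percentile, which is why these limits estimate different parameters and should not be compared with one another.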
1.5 Point-by-Point Comparison of Site Observations with BTVs, Compliance Limits, and Other Threshold Values

The point-by-point observation comparison method is used when a small number (e.g., < 6) of site observations are compared with pre-established or estimated BTVs, screening levels, or preliminary remediation goals (PRGs). Typically, a single exceedance of the BTV by an onsite (or a monitoring well) observation may be considered as an indication of the presence of contamination at the site area under investigation. The conclusion of an exceedance by a site value is sometimes confirmed by resampling (taking a few more collocated samples) that site location (or a monitoring well) exhibiting constituent concentration in excess of the BTV. If all collocated (or collected during the same time period) sample observations collected from the same site location (or well) exceed the BTV or PRG, then it may be concluded that the location (well) requires further investigation (e.g., continuing treatment and monitoring) and cleanup.

When BTV constituent concentrations are not known or pre-established, one has to collect or extract a background data set of an appropriate size that can be considered as representing the site background. Statistical upper limits are computed using the background data set thus obtained, which are used as estimates of BTVs. To compute reasonably reliable estimates of BTVs, enough background observations (minimum of 10) should be collected, perhaps using an appropriate DQOs process as described in EPA (2006a) and MARSSIM (2000). Several statistical limits listed above are used to estimate the BTVs based upon a defensible (free of outliers, representing the background population) background data set of an adequate size.

The point-by-point comparison method is also useful when quick turnaround comparisons are required in real time.
Specifically, when decisions have to be made in real time by a sampling or a screening crew, or when only a few site samples are available, then individual point-by-point site concentrations are compared either with pre-established cleanup goals or with estimated BTVs. The sampling crew can use these comparisons to: 1) screen and identify the contaminants/constituents of potential concern (COPCs), 2) identify the polluted site AOCs, or 3) continue or stop remediation or excavation at an onsite area of concern.

If a larger number of samples (e.g., > 10) are available from the various onsite locations representing the site area under investigation, then the use of hypotheses testing approaches (both single-sample and two-sample) is preferred. The use of hypothesis testing approaches controls the error rates more tightly and efficiently than the individual point-by-point site comparisons.

1.6 Hypothesis Testing Approaches and Their Use

Both single-sample and two-sample hypotheses testing approaches are used to make cleanup decisions at polluted sites, and also to compare constituent concentrations of two (e.g., site versus background) or more populations (e.g., MWs).

Single Sample Hypotheses (Pre-established BTVs and Not-to-Exceed Values are Known)

When pre-established BTVs are used, such as the U.S. Geological Survey (USGS) background values (Shacklette and Boerngen, 1984), or thresholds obtained from similar sites, there is no need to extract, establish, or collect a background data set. When the BTVs and cleanup standards are known, one-sample hypotheses are used to compare site data (provided enough site data are available) with known and pre-established threshold values. It is suggested that the project team determine (e.g., using DQOs) or decide (depending upon resources) the number of site observations that should be collected and compared with the pre-established standards before coming to a conclusion about the status (clean or polluted) of the site AOCs. As mentioned earlier, when the number of available site samples is less than 6, one might perform point-by-point site observation comparisons with a BTV; and when enough site observations (at least 10) are available, it is desirable to use single-sample hypothesis testing approaches. Depending upon the parameter (e.g., the average value, μ0, or a not-to-exceed value, A0) represented by the known threshold value, one can use single-sample hypotheses tests for the population mean or median (t-test, sign test), or use single-sample tests for proportions and percentiles. The details of the single-sample hypotheses testing approaches can be found in the EPA (2006b) guidance document and in Chapter 6 of this Technical Guide.

One-Sample t-test: This test is used to compare the site mean, μ, with some specified cleanup standard, Cs, where the Cs represents an average threshold value, μ0. The Student's t-test (or a UCL of the mean) is used (assuming normality of the site data set or when the sample size is large, such as larger than 30, 50) to verify the attainment of cleanup levels at a polluted site after some remediation activities.

One-Sample Sign Test or Wilcoxon Signed Rank (WSR) Test: These tests are nonparametric tests and can also handle ND observations, provided all NDs (e.g., associated detection limits) fall below the specified threshold value, Cs. These tests are used to compare the site location (e.g., median, mean) with some specified Cs representing a similar location measure.
One-Sample Proportion Test or Percentile Test: When a specified cleanup standard, A0, such as a PRG or a BTV, represents an upper threshold value of a constituent concentration distribution rather than the mean threshold value, μ0, then a test for proportion or a test for percentile (or equivalently a UTL, e.g., UTL95-90) may be used to compare the site proportion (or site percentile) with the specified threshold or action level, A0.

Two-Sample Hypotheses (BTVs and Not-to-Exceed Values are Unknown)

When BTVs, not-to-exceed values, and other cleanup standards are not available, then site data are compared directly with the background data. In such cases, two-sample hypothesis testing approaches are used to perform site versus background comparisons. Note that this approach can be used to compare concentrations of any two populations including two different site areas or two different monitoring wells (MWs). In order to use and perform a two-sample hypothesis testing approach, enough data should be available from each of the two populations. Site and background data requirements (e.g., based upon DQOs) to perform two-sample hypothesis test approaches are described in EPA (2002b, 2006a, 2006b), MARSSIM (2000) and also in Chapter 6 of the ProUCL 5.0 Technical Guide. While collecting site and background data, for better representation of populations under investigation, one may also want to account for the size of the background area (and site area for site samples) in sample size determination. That is, a larger number (e.g., > 15-20) of representative background (and site) samples should be collected from larger background (and site) areas; every effort should be made to collect as many samples as determined by the DQOs based sample sizes.

The two-sample (or more) hypotheses approaches are used when the site parameters (e.g., mean, shape, distribution) are being compared with the background parameters (e.g., mean, shape, distribution). The two-sample hypotheses testing approach is also used when the cleanup standards or screening levels are not known a priori.
Specifically, in environmental applications, two-sample hypotheses testing approaches are used to compare average or median constituent concentrations of two or more populations. To derive reliable conclusions with higher statistical power based upon hypothesis testing approaches, an adequate amount of data (e.g., minimum of 10 samples) should be collected from all of the populations under investigation.

The two-sample hypotheses testing approaches incorporated in ProUCL 5.0 are listed as follows:

1. Student t-test (with equal and unequal variances): parametric test; assumes normality
2. Wilcoxon-Mann-Whitney (WMW) test: nonparametric test; handles data with NDs with one DL; assumes two populations have comparable shapes and variability
3. Gehan test: nonparametric test; handles data sets with NDs and multiple DLs; assumes comparable shapes and variability
4. Tarone-Ware (TW) test: nonparametric test; handles data sets with NDs and multiple DLs; assumes comparable shapes and variability

The Gehan and Tarone-Ware tests are meant to be used on left-censored data sets with multiple detection limits (DLs). For best results, the samples collected from the two (or more) populations should all be of the same type obtained using similar analytical methods and apparatus; the collected site and background samples should be all discrete or all composite (obtained using the same design and pattern), and be collected from the same medium (soil) at similar depths (e.g., all surface samples or all subsurface samples) and time (e.g., during the same quarter in groundwater applications) using comparable (preferably same) analytical methods. Good sample collection methods and sampling strategies are given in EPA (1996, 2003) guidance documents.

Notes: ProUCL 5.0 (and previous versions) has been developed using limited government funding. ProUCL 5.0 is equipped with statistical and graphical methods needed to address many environmental sampling and statistical issues as described in the various CERCLA, MARSSIM, and RCRA documents cited earlier. However, one may not compare the availability of methods in ProUCL 5.0 with methods incorporated in commercial software packages such as SAS and Minitab 16. Not all methods available in the statistical literature are available in ProUCL.
1.7 Minimum Sample Size Requirements and Power Assessment

Due to resource limitations, it may not be possible (nor needed) to sample the entire population (e.g., background area, site area, AOCs, EAs) under study. Statistics is used to draw inference(s) about the populations (clean, dirty) and their known or unknown parameters (e.g., mean, variance, upper threshold values) based upon much smaller data sets (samples) collected from those populations. To determine and establish BTVs and site specific screening levels, defensible data set(s) of appropriate size(s) need to be collected from background areas (e.g., site-specific, general reference area, or historical data). The project team and site experts should decide what represents a site population and what represents a background population. The project team should determine the population area and boundaries based upon all current and future uses, and the objectives of data collection. Using the collected site and background data sets, statistical methods supplemented with graphical displays are used to perform site versus background comparisons. The test results and statistics obtained by performing such site versus background comparisons are used to determine if the site and background level constituent concentrations are comparable; or if the site concentrations exceed the background threshold concentration level; or if an adequate amount of remediation approaching the BTV or some cleanup level has been performed at polluted site AOCs.

To perform these statistical tests, one needs to determine the appropriate sample sizes that need to be collected from the populations (e.g., site and background) under investigation using appropriate DQOs processes (EPA [2006a, 2006b]; MARSSIM, 2000). ProUCL has the Sample Sizes module which can be used to develop DQOs based sampling designs needed to address statistical issues associated with the various polluted sites projects. ProUCL provides user friendly options to enter the desired/pre-specified values of decision parameters (e.g., Type I and Type II error rates) to determine minimum sample sizes for the selected statistical applications including: estimation of mean, single and two-sample hypothesis testing approaches, and acceptance sampling. Sample size determination methods are available for the sampling of continuous characteristics (e.g., lead or Radium 226), as well as for attributes (e.g., proportion of occurrences exceeding a specified threshold). Both parametric (e.g., t-tests) and nonparametric (e.g., Sign test, test for proportions, WRS test) sample size determination methods are available in ProUCL 5.0. ProUCL 5.0 also has sample size determination methods for acceptance sampling of lots of discrete objects such as a lot of drums containing hazardous waste (e.g., RCRA applications, EPA 2002c).

However, due to budget constraints, it may not be possible to collect the same number of samples as determined by using a DQOs process. For example, the data might have already been collected (as is often the case) without using a DQOs process, or due to resource constraints, it may not be possible to collect as many samples as determined by using a DQOs based sample size formula. In practice, the project team and the decision makers may decide not to collect enough background samples. It is suggested to collect at least 10 background observations before using statistical methods to perform background evaluations based upon data collected using discrete samples. The minimum sample size recommendations described here are useful when resources are limited and it may not be possible to collect as many background and site samples as computed using DQOs based sample size determination formulae.
In case data are collected without using a DQOs process, the Sample Sizes module can be used to assess the power of the test statistic in retrospect. Specifically, one can use the standard deviation of the computed test statistic (EPA 2006b) and compute the sample size (e.g., using the Sample Sizes module of ProUCL) needed to meet the desired DQOs. If the computed sample size is greater than the size of the data set used, the project team may want to collect additional samples to meet the desired DQOs.

Notes: From a mathematical point of view, the statistical methods incorporated in ProUCL and described in this guidance document to estimate EPC terms and BTVs, and compare site versus background concentrations, can be performed on small site and background data sets (e.g., of sizes as small as 3). However, those statistics may not be considered representative and reliable enough to make important cleanup and remediation decisions. It is recommended not to use those statistics to draw cleanup and remediation decisions potentially impacting human health and the environment. The minimum sample size recommendation (at least 10 observations) may be used only when data sets of size determined by a DQOs process (EPA, 2006) cannot be collected. Some of the recent guidance documents (e.g., EPA 2009) are also suggesting collecting a minimum of about 10 samples in the circumstance that data cannot be collected using a DQOs based process.

To allow the users to compute decision statistics based upon composite data collected using the Incremental Sampling Methodology (ITRC, 2012), ProUCL 5.0 will compute decision statistics (e.g., UCLs, UPLs, UTLs) based upon samples of sizes as small as 3. The user is referred to the ITRC ISM Tech Reg Guide (2012) to determine which UCL (e.g., Student's t-UCL or Chebyshev UCL) should be used to estimate the EPC term.
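The retrospective check described in this section can be sketched with a short calculation. The example below is a minimal illustration, not the Sample Sizes module itself: it assumes the commonly used normal-approximation formula for a one-sample t-test, n = s^2 (z_{1-alpha} + z_{1-beta})^2 / delta^2 + 0.5 z_{1-alpha}^2, where s is an estimate of the standard deviation, delta is the width of the gray region (the difference to be detected), and alpha and beta are the Type I and Type II error rates. Whether ProUCL applies exactly this form is an assumption, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_t_sample_size(sd, delta, alpha=0.05, beta=0.10):
    """Approximate minimum sample size for a one-sample t-test.

    sd    : estimated standard deviation of the measurements
    delta : difference from the threshold that should be detected
    alpha : Type I (false positive) error rate
    beta  : Type II (false negative) error rate
    """
    if sd <= 0 or delta <= 0:
        raise ValueError("sd and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    n = (sd ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2 + 0.5 * z_alpha ** 2
    return math.ceil(n)  # round up to a whole number of samples


if __name__ == "__main__":
    collected = 12  # hypothetical size of an existing data set
    needed = one_sample_t_sample_size(sd=10.0, delta=5.0)
    print(f"needed = {needed}, collected = {collected}")
    if needed > collected:
        print("Additional samples may be needed to meet the desired DQOs.")
```

Plugging in the standard deviation of an already collected data set gives the sample size that the chosen error rates would have required; comparing it with the number of samples actually collected mirrors the retrospective assessment described above.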
1.7.1 Sample Sizes for Bootstrap Methods

Several nonparametric methods, including bootstrap methods to compute UCL, UTL, and other limits for both full-uncensored data sets and left-censored data sets with NDs, are available in ProUCL 5.0. Bootstrap resampling methods are useful when neither too few (e.g., < 15-20) nor too many observations are available. For bootstrap methods (e.g., percentile method, BCA bootstrap method, bootstrap-t method), a large number (e.g., 1000, 2000) of bootstrap resamples are drawn with replacement from the same data set. Therefore, to obtain bootstrap resamples with at least some distinct values (so that statistics can be computed from each resample), it is suggested that a bootstrap method not be used when dealing with such small data sets. Also, it is not necessary to bootstrap a large data set of size greater than 500 or 1000; that is, when a data set of a large size (e.g., > 500) is available, there is no need to obtain bootstrap resamples to compute statistics of interest (e.g., UCLs). One can simply use a statistical method on the original large data set. Moreover, bootstrapping a large data set of size greater than 500 or 1000 will be time consuming.

1.8 Statistical Analyses by a Group ID

The analyses of data categorized by a group ID variable such as: 1) Surface vs. Subsurface; 2) AOC1 vs. AOC2; 3) Site vs. Background; and 4) Upgradient vs. Downgradient monitoring wells are common in environmental and various other applications. ProUCL 5.0 offers this option for data sets with and without NDs. The Group Option provides a useful tool to perform various statistical tests and methods (including graphical displays) separately for each of the groups (samples from different populations) that may be present in a data set. The graphical displays (e.g., box plots, quantile-quantile [Q-Q] plots) and statistics (e.g., background statistics, UCLs, hypotheses testing approaches) of interest can be computed separately for each group by using this option.
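The resampling idea behind the bootstrap methods of Section 1.7.1 can be sketched for the simplest case, the percentile method for a UCL of the mean. The code below is a minimal illustration using only the Python standard library; it is not ProUCL's implementation, it does not handle NDs, and the function name, the default of 2000 resamples, and the fixed seed are assumptions made for reproducibility of the example.

```python
import math
import random
from statistics import mean


def percentile_bootstrap_ucl(data, conf=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap upper confidence limit of the mean.

    Draws n_resamples resamples (with replacement, same size as data),
    computes the mean of each, and returns the conf-quantile of those means.
    """
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    boot_means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Index of the conf-quantile among the sorted bootstrap means.
    idx = min(n_resamples - 1, math.ceil(conf * n_resamples) - 1)
    return boot_means[idx]


if __name__ == "__main__":
    # Hypothetical, mildly skewed concentrations (n = 20).
    sample = [2.1, 2.4, 2.9, 3.0, 3.3, 3.5, 3.8, 4.0, 4.2, 4.6,
              4.9, 5.1, 5.5, 6.0, 6.4, 7.1, 7.9, 9.2, 11.5, 14.8]
    print(f"mean = {mean(sample):.2f}")
    print(f"percentile bootstrap UCL95 = {percentile_bootstrap_ucl(sample):.2f}")
```

With very small data sets many resamples contain few distinct values, so the bootstrap distribution of the mean becomes coarse; this is the reason for the lower size limit suggested above.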
Moreover, using the Group Option, graphical methods can display multiple graphs (e.g., Q-Q plots) on the same graph, providing graphical comparison of multiple groups. It should be pointed out that it is the user's responsibility to provide an adequate amount of data to perform the group operations. For example, if the user desires to produce a graphical Q-Q plot (e.g., using only detected data) with regression lines displayed, then there should be at least two detected data values (to compute slope, intercept, standard deviation [sd]) in the data set. Similarly, if the graphs are desired for each group specified by the group ID variable, there should be at least two observations in each group specified by the group variable. ProUCL generates a warning message (colored orange) in the lower Log Panel of the ProUCL 5.0 screen.

1.9 Statistical Analyses for Many Constituents/Variables

ProUCL software can process multiple analytes/variables simultaneously in a user friendly manner, an option not available in other software packages such as Minitab 16 (2012) and NADA for R (Helsel, 2013). This option is very useful when one has to process multiple variables and compute decision statistics (e.g., UCLs, UPLs, and UTLs) and test statistics (e.g., ANOVA test, trend test) for those variables. It is the user's responsibility to make sure that each selected variable has an adequate amount of data so that ProUCL can perform the selected statistical method correctly. ProUCL displays warning messages when a selected variable does not have enough data needed to perform the selected statistical method.
1.10 Use of Maximum Detected Value as Estimates of Upper Limits

Some practitioners tend to use the maximum detected value as an estimate of the EPC term. This is especially true when the sample size is small, such as < 5, or when a UCL95 exceeds the maximum detected value (EPA, 1992a). Also, many times in practice, the BTVs and not-to-exceed values are estimated by the maximum detected value (e.g., nonparametric UTLs, USLs).

Use of Maximum Detected Value to Estimate BTVs and Not-to-Exceed Values

BTVs and not-to-exceed values represent upper threshold values from the upper tail of a data distribution; therefore, depending upon the data distribution and sample size, the BTVs and other not-to-exceed values may be estimated by the largest or the second largest detected value. A nonparametric UPL, UTL, and USL are often estimated by higher order statistics such as the maximum value or the second largest value (EPA 1992b, 2009). The use of higher order statistics to estimate the UTLs depends upon the sample size. For example, for data sets of size: 1) 59 to 92 observations, a nonparametric UTL95-95 is given by the maximum detected value; 2) 93 to 123 observations, a nonparametric UTL95-95 is given by the second largest detected value; and 3) 124 to 152 observations, a UTL95-95 is given by the third largest detected value in the sample, and so on.

Use of Maximum Detected Value to Estimate EPC Terms

Some practitioners tend to use the maximum detected value as an estimate of the EPC term. This is especially true when the sample size is small, such as < 5, or when a UCL95 exceeds the maximum detected value (EPA, 1992a). Specifically, the EPA (1992a) document suggests the use of the maximum detected value as a default value to estimate the EPC term when a 95% UCL (e.g., the H-UCL) exceeds the maximum value. ProUCL computes 95% UCLs of the mean using several methods based upon normal, gamma, lognormal, and nondiscernible distributions.
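The sample size ranges quoted above for nonparametric UTL95-95 values follow from a binomial argument, which can be sketched as follows. Assuming independent observations from a continuous distribution, the number of observations that exceed the true 95th percentile is binomial with parameters n and 0.05, and the m-th largest value exceeds that percentile exactly when at least m observations do. The function below (its name is hypothetical, and it is not ProUCL code) returns the largest m for which that probability is at least the requested confidence.

```python
from math import comb


def utl_order_from_top(n, coverage=0.95, confidence=0.95):
    """Return m such that the m-th largest of n observations is a nonparametric
    UTL with the given coverage and at least the given confidence.

    Returns 0 when even the maximum does not reach the confidence level.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    q = 1.0 - coverage   # chance that one observation exceeds the percentile
    tail = 1.0           # P(Y >= 0), where Y ~ Binomial(n, q)
    m = 0
    while m < n:
        # P(Y >= m + 1) = P(Y >= m) - P(Y = m)
        tail -= comb(n, m) * q ** m * (1.0 - q) ** (n - m)
        if tail < confidence:
            break
        m += 1
    return m


if __name__ == "__main__":
    for n in (58, 59, 92, 93, 123, 124):
        print(f"n = {n:3d}  ->  order statistic from the top: {utl_order_from_top(n)}")
```

A return value of 1 corresponds to the maximum, 2 to the second largest value, and so on; for n below 59 the function returns 0, meaning that no order statistic attains 95% confidence for 95% coverage.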
In the past (e.g., EPA 1992), a lognormal distribution was used as the default distribution to model positively skewed environmental data sets; and only two methods were used to estimate the EPC term based upon: 1) normal distribution and Student's t-statistic, and 2) lognormal distribution and Land's H-statistic (1971, 1975). The use of the H-statistic often yields an unstable and impractically large UCL95 of the mean (Singh, Singh, and Engelhardt, 1997; Singh, Singh, and Iaci, 2002). For skewed data sets of smaller sizes (e.g., < 30, < 50,...), the H-UCL often exceeds the maximum detected value. Since the use of a lognormal distribution has been quite common (e.g., suggested as a default model in a risk assessment guidance for Superfund [RAGS] document [EPA, 1992a]), the exceedance of the maximum value by an H-UCL95 is frequent for many skewed data sets of smaller sizes (e.g., < 30, < 50). These occurrences result in the possibility of using the maximum detected value as an estimate of the EPC term.

It should be pointed out that in some cases, the maximum observed value actually might represent an impacted location. Obviously, it is not desirable to use a potential outlier representing an impacted location to estimate the EPC term for an AOC. The EPC term represents the average exposure contracted by an individual over an EA during a long period of time; therefore, the EPC term should be estimated by using an average value (such as an appropriate 95% UCL of the mean) and not by the maximum observed concentration. One needs to compute an average exposure and not the maximum exposure. Singh and Singh (2003) studied the performance of the max test (using the maximum observed value as an estimate of the EPC term) via Monte Carlo simulation experiments. They noted that for skewed data sets of small sizes (e.g., < 10-20), even the max test does not provide the specified 95% coverage to the population mean, and for larger data sets it overestimates the EPC term, which may lead to unnecessary further remediation.
Today, several methods, some of which are described in EPA (2002a), are available in the various versions of ProUCL (e.g., ProUCL [EPA 2004], ProUCL 4.0 [EPA 2007], ProUCL [EPA 2009, 2010]) to estimate the EPC terms. For data sets with NDs, ProUCL 5.0 has some new UCL (and other limits) computation methods which were not available in earlier versions of ProUCL. It is unlikely that the UCLs based upon those methods will exceed the maximum detected value, unless some outliers are present in the data set.

Chebyshev Inequality Based UCL95

ProUCL 5.0 (and its earlier versions) displays a warning message when the suggested 95% UCL (e.g., Hall's or bootstrap-t UCL with outliers) of the mean exceeds the detected maximum concentration. When a 95% UCL does exceed the maximum observed value, ProUCL recommends the use of an alternative UCL computation method based upon the Chebyshev inequality. One may use a 97.5% or 99% Chebyshev UCL to estimate the mean of a highly skewed population. The use of the Chebyshev inequality to compute UCLs tends to yield more conservative (but stable) UCLs than other methods available in ProUCL software. In such cases, when other UCL methods such as the bootstrap-t method yield unrealistically high values due to the presence of outlier(s), one may want to use a 95% Chebyshev UCL or a Chebyshev UCL with a lower confidence coefficient such as 90% as an estimate of the population mean, especially when the sample size is large (e.g., > 100, 150). The detailed recommendations (as functions of sample size and skewness) for the use of those UCLs are summarized in various versions of ProUCL Technical Guides (EPA, 2004, 2007, 2009, and 2010d).

Notes: It is recommended not to use the maximum observed value to estimate the EPC term representing the average exposure contracted by an individual over an EA. For the sake of interested users, ProUCL displays a warning message when the recommended 95% UCL (e.g., Hall's bootstrap UCL) of the mean exceeds the observed maximum concentration.
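A Chebyshev inequality based UCL of the mean can be sketched in a few lines. The example assumes the usual (mean, sd) form, UCL = mean + sqrt(1/alpha - 1) * s / sqrt(n) with alpha = 1 - confidence coefficient, applied to a full data set without NDs; the function name is hypothetical and the code is an illustration rather than ProUCL's implementation.

```python
import math
from statistics import mean, stdev


def chebyshev_ucl(data, conf=0.95):
    """Chebyshev-inequality UCL of the mean using the sample mean and sd."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    if not 0.0 < conf < 1.0:
        raise ValueError("conf must lie strictly between 0 and 1")
    alpha = 1.0 - conf
    multiplier = math.sqrt(1.0 / alpha - 1.0)  # about 4.36 for conf = 0.95
    return mean(data) + multiplier * stdev(data) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical skewed data set.
    sample = [1.2, 1.5, 1.9, 2.3, 2.8, 3.1, 3.9, 4.4, 6.7, 15.2]
    for conf in (0.90, 0.95, 0.975, 0.99):
        print(f"{conf:.1%} Chebyshev UCL = {chebyshev_ucl(sample, conf):.2f}")
```

Because the multiplier does not depend on the shape of the distribution, the resulting UCL is stable but conservative, and it grows quickly with the confidence coefficient; for small data sets it can itself exceed the maximum observed value.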
For such scenarios (when a 95% UCL does exceed the maximum observed value), an alternative 95% UCL computation method based upon the Chebyshev inequality is recommended by the ProUCL software.

Samples with Nondetect Observations

ND observations are inevitable in most environmental data sets. Singh, Maichle, and Lee (EPA, 2006) studied the performances (in terms of coverages) of the various UCL95 computation methods, including the simple substitution methods (such as the DL/2 and DL methods), for data sets with ND observations. They concluded that the UCLs obtained using the substitution methods, including the replacement of NDs by their respective DL/2, do not perform well even when the percentage of ND observations is low, such as less than 5% to 10%. They recommended avoiding the use of substitution methods to compute UCL95 based upon data sets with ND observations.

Avoid the Use of DL/2 Method to Compute UCL95

Based upon the results of the report by Singh, Maichle, and Lee (EPA, 2006), it is recommended to avoid the use of the DL/2 method to perform a GOF test, and to compute the summary statistics and various other limits (e.g., UCL, UPL, UTLs) often used to estimate the EPC terms and BTVs. Until recently, the DL/2 method has been the most commonly used method to compute the various statistics of interest for data sets with NDs. The main reason for this has been the lack of availability of other rigorous methods and associated software programs that can be used to estimate the various environmental parameters of interest. Today, several methods (e.g., using Kaplan-Meier [KM] estimates) including
Chebyshev inequality and bootstrap methods with better performance are available that can be used to compute the various upper limits of interest. Several of those parametric and nonparametric methods are available in ProUCL 4.0 and higher versions. It should be noted that the DL/2 method is included in ProUCL for historical reasons as it had been the most commonly used and recommended method until recently (EPA, 2006b). EPA scientists and several reviewers of the ProUCL software had suggested and requested the inclusion of the DL/2 method in ProUCL for comparison and research purposes.

Notes: Even though the DL/2 method (to compute UCLs, UPLs, and for goodness-of-fit [GOF] tests) has been incorporated in ProUCL, its use is not recommended due to its poor performance. The DL/2 method has been retained in ProUCL 5.0 for historical and comparison purposes. NERL-EPA, Las Vegas strongly recommends avoiding the use of the DL/2 method even when the percentage of NDs is as low as 5% to 10%.

Samples with Low Frequency of Detection

When all of the sampled values are reported as NDs, the EPC term and other statistical limits should also be reported as a ND value, perhaps by the maximum reporting limit (RL) or the maximum RL/2. Statistics (e.g., UCL95) computed based upon only a few detected values (e.g., < 4) cannot be considered reliable enough to estimate the EPC terms having potential impact on human health and the environment. When the number of detected values is small, it is preferable to use ad hoc methods rather than using statistical methods to compute the EPC terms and other upper limits. Specifically, it is suggested that for data sets consisting of fewer than 4 detects and for small data sets (e.g., size < 10) with low detection frequency (e.g., < 10%), the project team and the decision makers together should decide on a site-specific basis on how to estimate the average exposure (EPC term) for the constituent and area under consideration.
For such data sets with low detection frequencies, other measures such as the median or mode represent better estimates (with lesser uncertainty) of the population measure of central tendency. Additionally, it is also suggested that when most (e.g., > 95%) of the observations for a constituent lie below the DLs, the sample median or the sample mode (rather than the sample average) may be used as an estimate of the EPC term. Note that when the majority of the data are NDs, the median and the mode may also be represented by a ND value. The uncertainty associated with such estimates will be high. The statistical properties, such as the bias, accuracy, and precision of such estimates, would remain unknown. In order to be able to compute defensible estimates, it is always desirable to collect more samples.

Some Other Applications of Methods in ProUCL 5.0

In addition to performing background versus site comparisons for CERCLA and RCRA sites, and estimating the EPC terms in exposure and risk evaluation studies, the statistical methods as incorporated in ProUCL can be used to address other issues dealing with environmental investigations that are conducted at Superfund or RCRA sites.

Identification of COPCs

Risk assessors and remedial project managers (RPMs) often use screening levels or BTVs to identify the COPCs during the screening phase of a cleanup project to be conducted at a contaminated site. The screening for the COPCs is performed prior to any characterization and remediation activities that may have to be conducted at the site. This comparison is performed to screen out those constituents that may be present in the site medium of interest at low levels (e.g., at or below the background levels or some pre-established screening levels) and may not pose any threat and concern to human health and the
environment. Those constituents may be eliminated from all future site investigations, and risk assessment and risk management studies. To identify the COPCs, point-by-point site observations are compared with some pre-established soil screening levels (SSL), or estimated BTVs. This is especially true when the comparisons of site concentrations with screening levels or BTVs are conducted in real time by the sampling or cleanup crew onsite. The project team should decide the type of site samples (discrete or composite) and the number of site observations that should be collected and compared with the screening levels or the BTVs. In case BTVs or screening levels are not known, the availability of a defensible site-specific background or reference data set of reasonable size (e.g., at least 10) is required to obtain reliable estimates of BTVs and screening levels. The constituents with concentrations exceeding the respective screening values or BTVs may be considered COPCs, whereas constituents with concentrations (e.g., in all collected samples) lower than the screening values or BTVs may be omitted from all future evaluations.

Identification of Non-Compliance Monitoring Wells

In MW compliance assessment applications, individual (often discrete) constituent concentrations from a MW are compared with some pre-established limits such as an ACL or a maximum concentration limit (MCL). An exceedance of the MCL or the BTV by a MW concentration may be considered an indication of contamination in that MW. In such individual concentration comparisons, the presence of contamination (determined by an exceedance) may have to be confirmed by resampling from that MW. If concentrations of constituents in the original sample and resample(s) exceed the MCL or BTV, then that MW may require further scrutiny, perhaps triggering remediation remedies as determined by the project team.
If the concentration data from a MW for about 4 to 5 continuous quarters (or some other designated time period determined by the project team) are below the MCL or BTV level, then that MW may be considered as complying with (achieving) the pre-established or estimated standards.

Verification of the Attainment of Cleanup Standards, $C_s$

Hypothesis testing approaches are used to verify the attainment of the cleanup standard, $C_s$, at polluted site AOCs after conducting remediation and cleanup at those site AOCs (EPA, 1989a, 1994). In order to assess the attainment of cleanup levels, a representative data set of adequate size, perhaps obtained using the DQOs process (or a minimum of 10 observations should be collected), needs to be made available from the remediated/excavated areas of the site under investigation. The sample size should also account for the size of the remediated site areas, meaning that larger site areas should be sampled more (with more observations) to obtain a representative sample of the remediated site areas under investigation. Typically, the null hypothesis of interest is $H_0$: Site Mean, $\mu_s \ge C_s$ versus the alternative hypothesis, $H_1$: Site Mean, $\mu_s < C_s$, where the cleanup standard, $C_s$, is known a priori.

Using BTVs (Upper Limits) to Identify Hot Spots

The use of upper limits (e.g., UTLs) to identify hot spot(s) has also been mentioned in the Guidance for Comparing Background and Chemical Concentrations in Soil for CERCLA Sites (EPA, 2002b). Point-by-point site observations are compared with a pre-established or estimated BTV. Exceedances of the BTV by site observations may be considered as representing impacted locations with elevated concentrations (hot spots).
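The point-by-point comparisons described above (site observations against a BTV or screening level, and monitoring well results against an MCL over consecutive quarters) can be sketched as follows. This is a minimal illustration under stated assumptions, not a ProUCL feature: the function names, the default of 4 quarters, the location labels, and all numerical values are hypothetical, and the actual time period and limits are decided by the project team.

```python
def find_exceedances(observations, threshold):
    """Return the (location, concentration) pairs whose concentration
    exceeds the threshold (e.g., a BTV or a screening level)."""
    return [(loc, conc) for loc, conc in observations if conc > threshold]


def well_in_compliance(quarterly_results, limit, quarters_required=4):
    """True when the most recent `quarters_required` results are all
    below `limit` (e.g., an MCL or a BTV)."""
    if len(quarterly_results) < quarters_required:
        return False  # not enough quarters to make the comparison
    recent = quarterly_results[-quarters_required:]
    return all(value < limit for value in recent)


# Hypothetical site observations and BTV (illustration only).
site = [("S-01", 12.0), ("S-02", 48.5), ("S-03", 9.3), ("S-04", 71.2)]
btv = 40.0
print("Potential hot spots:", find_exceedances(site, btv))

# Hypothetical quarterly results for one monitoring well and an MCL.
mw_results = [6.1, 4.8, 4.2, 3.9, 3.5]
print("Well complies:", well_in_compliance(mw_results, limit=5.0))
```

An exceedance flagged this way is only a candidate; as noted above, it may have to be confirmed by resampling before any conclusion is drawn.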
1.14 Some General Issues and Recommendations made by ProUCL

Some general issues regarding the handling of multiple detection limits and field duplicates by ProUCL, and recommendations made about various substitution and regression on order statistics (ROS) methods for data sets with NDs, are described in the following sections.

Multiple Detection Limits

ProUCL 5.0 does not make distinctions between method detection limits (MDLs), adjusted MDLs, sample quantitation limits (SQLs), or DLs. Multiple DLs in ProUCL mean different values of the DL. An indicator variable with values of 0 (= nondetect) and 1 (= detect) is assigned to each variable consisting of NDs. All ND observations in ProUCL are identified by the value 0 of the indicator variable used in ProUCL to distinguish between detected (= 1) and nondetected (= 0) observations. It is the user's responsibility to supply correct numerical values for NDs (they should be entered as the reported detection limit or RL values) and not qualifiers (e.g., J, U, B, UJ, ...) for ND observations in the data set.

ProUCL Recommendation about ROS Method and Substitution (DL/2) Method

For data sets with NDs, ProUCL 5.0 can compute point estimates of the population mean and standard deviation using the KM and ROS methods (and also using the DL/2 method). The DL/2 method has been retained in ProUCL for historical and research purposes. ProUCL uses the Chebyshev inequality, bootstrap methods, and normal, gamma, and lognormal distribution based equations on KM (or ROS) estimates to compute the various upper limits (e.g., UCLs, UTLs). The simulation study conducted by Singh, Maichle, and Lee (2006) demonstrated that the KM method yields accurate estimates of the population mean.
They also demonstrated that for moderately skewed to highly skewed data sets, UCLs based upon KM estimates and BCA bootstrap (mild skewness), KM estimates and Chebyshev inequality (moderate to high skewness), and KM estimates and bootstrap-t method (moderate to high skewness) yield better (in terms of coverage probability) estimates of EPC terms than other UCL methods based upon Student's t-statistic on KM estimates, or the percentile bootstrap method on KM or ROS estimates.

The Unofficial User Guide to ProUCL4 (Helsel and Gilroy, 2012)

Several ProUCL users sent inquiries about the validity of the comments made about the ProUCL software in the Unofficial User Guide to ProUCL4 (Helsel and Gilroy, 2012) and in the Practical Stats webinar, "ProUCL v4: The Unofficial User Guide," presented by Dr. Helsel on October 15, 2012 (Helsel 2012a). Their inquiries led us to review comments made about the ProUCL v4 software and its associated guidance documents (EPA 2007, 2009a, 2009b, 2010c, and 2010d) in the Unofficial ProUCL v4 User Guide and in the webinar, "ProUCL v4: The Unofficial User Guide". These two documents collectively are referred to as the Unofficial ProUCL v4 User Guide in this ProUCL document. The pdf document describing the material presented in the Practical Stats Webinar (Helsel, 2012a) was downloaded from the website.

In the "ProUCL v4: The Unofficial User Guide", comments have been made about the software and its guidance documents; therefore, it is appropriate to address those comments in the present ProUCL guidance document. It is necessary to provide the detailed response to comments made in the Unofficial ProUCL v4 User Guide to assure that: 1) rigorous statistical methods are used to compute the decision making statistics; and 2) the methods incorporated in ProUCL software are not misrepresented and misinterpreted. Some general responses and comments about the material presented in the Practical Stats webinar and in the Unofficial User Guide to ProUCLv4 are described as follows. Specific comments and
responses are also considered in the respective chapters of ProUCL 5.0 Technical and User Guides. The detailed responses to the comments made about the ProUCL software in the Unofficial ProUCL v4 User Guide are provided elsewhere.

ProUCL is a freeware software package which has been developed under limited government funding to address statistical issues associated with various environmental site projects. Not all statistical methods (e.g., Levene test) described in the statistical literature have been incorporated in ProUCL. One should not compare ProUCL with the commercial software packages, which are expensive and not as easy to use as the ProUCL software to address environmental statistical issues. The existing and some new statistical methods based upon the research conducted by ORD-NERL, EPA Las Vegas during the last couple of decades have been incorporated in ProUCL to address the statistical needs of the various environmental site projects and research studies. Some of those new methods may not be available in textbooks, in the library of programs written in R script, and in commercial software packages. However, those methods are described in detail in the cited published literature and also in the ProUCL Technical Guides (e.g., EPA [2007, 2009a, 2009b, 2010c and 2010d]). Even though for uncensored data sets, programs to compute gamma distribution based UCLs and UPLs are available in R script, programs to compute a 95% UCL of the mean based upon a gamma distribution on KM estimates are not easily available in commercial software packages and in R script.

In the Unofficial ProUCL v4 User Guide, several statements have been made about percentiles. There are several ways to compute percentiles. Percentiles computed by ProUCL may or may not be identical (they do not have to be) to percentiles computed by NADA for R (Helsel, 2013) or described in Helsel and Gilroy (2012). To address users' requests, ProUCL 4.1 (2010) and its higher versions compute percentiles that are comparable to the percentiles computed by Excel 2003 and higher versions.
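Why two software packages can report different percentiles for the same data is easiest to see with a small example. The sketch below hand-codes two of the sample quantile definitions catalogued by Hyndman and Fan (1996), Types 6 and 7. It is an illustration under stated assumptions and not ProUCL's implementation; the function name and the data are hypothetical.

```python
import math


def percentile(data, p, kind=7):
    """Sample quantile for 0 <= p <= 1 using Hyndman-Fan Type 6 or Type 7.

    Both types interpolate linearly between order statistics; they differ
    only in the fractional position assigned to probability p."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    n = len(x)
    if kind == 7:
        h = (n - 1) * p        # 0-based position, Type 7
    elif kind == 6:
        h = (n + 1) * p - 1    # 0-based position, Type 6
    else:
        raise ValueError("only Types 6 and 7 are sketched here")
    h = min(max(h, 0.0), n - 1.0)   # clamp to the observed range
    lower = math.floor(h)
    upper = min(lower + 1, n - 1)
    return x[lower] + (h - lower) * (x[upper] - x[lower])


# Hypothetical data (illustration only).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for p in (0.25, 0.50, 0.75):
    print(p, percentile(values, p, kind=6), percentile(values, p, kind=7))
```

For these ten values the two definitions agree at the median but differ at the quartiles, which is the kind of discrepancy that can appear when comparing output from different packages.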
The literature search suggests that there are a total of nine (9) known types of percentiles, i.e., 9 different methods of calculating percentiles in the statistics literature (Hyndman and Fan, 1996). The R programming language (R Core Team, 2012) has all of these 9 types, which can be computed using the following statement in R:

    quantile(x, p, type=k)

where p = percentile, and k = integer between 1 and 9. ProUCL computes percentiles using Type 7; Minitab 16 and SPSS compute percentiles using Type 6. It is simply a matter of choice, as there is no 'best' type to use. Many software packages use one type for calculating a percentile, and another for a box plot (Hyndman and Fan, 1996).

An incorrect statement, "By definition, the sample mean has a 50% chance of being below the true population mean," has been made in Helsel and Gilroy (2012) and also in Helsel (2012a). The above statement is not correct for means of skewed distributions (e.g., lognormal or gamma) commonly occurring in environmental applications. Since Helsel (2012) prefers to use a lognormal distribution, the incorrectness of the above statement has been illustrated using a lognormal distribution. The mean and median of a lognormal distribution (details in Chapter 2 of the ProUCL Technical Guide) are given by:

$$\text{mean} = \mu_1 = \exp(\mu + 0.5\sigma^2); \quad \text{median} = M = \exp(\mu)$$

From the above equations, it is clear that the mean of a lognormal distribution is always greater than the median for all positive values of $\sigma$ (sd of the log-transformed variable). Actually the mean is greater
than the $p$-th percentile when $\sigma > 2z_p$. For example, when $p = 0.80$, $z_p = 0.845$, and the mean of a lognormal distribution, $\mu_1$, exceeds $x_{0.80}$, the 80th percentile, when $\sigma > 1.69$. In other words, when $\sigma > 1.69$ the lognormal mean will exceed the 80th percentile of a lognormal distribution. To demonstrate the incorrectness of the above statement, a small simulation study was conducted. The distributions of sample means based upon samples of size 100 were generated from lognormal distributions with µ = 4 and varying skewness. The experiment was performed 10,000 times to generate the distributions of sample means. The probabilities of sample means less than the population means were computed. The following results are noted.

Table 1-2. Probabilities $p(\bar{x} < \mu_1)$ Computed for Lognormal Distributions with µ = 4 and Varying Values of σ. Results are based upon Simulation Runs for Each Lognormal Distribution Considered.

Parameter              µ=4, σ=0.5    µ=4, σ=1    µ=4, σ=1.5    µ=4, σ=2    µ=4, σ=2.5
µ1                     61.86
σ1                     32.97
p(x̄ < µ1)
Mean
Median

The probabilities summarized in the above table demonstrate that the statement about the mean made in Helsel and Gilroy (2012) is incorrect.

Graphical Methods: Graphical methods are available in ProUCL as exploratory tools which can be generated for both uncensored and left-censored data sets. The Unofficial ProUCL Guide makes several comments about box plots and Q-Q plots incorporated in ProUCL. The Unofficial ProUCL Guide states that all graphs with NDs are incorrect. These statements are misleading and incorrect. The intent of the graphical methods in ProUCL is exploratory, to gain information (e.g., outliers, multiple populations, data distribution, patterns, and skewness) present in a data set. Based upon the data displayed (ProUCL displays a message [e.g., as a subtitle] in this regard) on those graphs, all statistics shown on those graphs generated by ProUCL are correct.

Box Plots: In statistical literature, one can find several ways to generate box plots. The practitioners may have their own preferences to use one method over the other.
All box plot methods including the one in ProUCL convey the same information about the data set (outliers, mean, median, symmetry, skewness). ProUCL uses a couple of development tools such as FarPoint Spread (for Excel type input and output operations) and ChartFx (for graphical displays); and ProUCL generates box plots using the built-in box plot feature in ChartFx. For all practical and exploratory purposes, box plots in ProUCL are as good as (if not better than) those available in the various commercial software packages to get an idea about the data distribution (skewed or symmetric), to identify outliers, and to compare multiple groups (main objectives of box plots in ProUCL). As mentioned earlier, it is a matter of choice of using percentiles/quartiles to construct a box plot. There is no 'best' method to construct a box plot. Many software packages use one method (e.g., out of 9 described above) for calculating a percentile, and another for constructing a box plot (Hyndman and Fan, 1996).

Q-Q plots: All Q-Q plots incorporated in ProUCL are correct and of high quality. In addition to identifying outliers, Q-Q plots are also used to assess data distributions. Multiple Q-Q plots are useful to
perform point-by-point comparisons of grouped data sets, unlike box plots based upon the five-point summary statistics. ProUCL has Q-Q plots for normal, lognormal, and gamma distributions; not all of these graphical capabilities are directly available in other software packages such as NADA for R (Helsel, 2013). ProUCL offers several exploratory options to generate Q-Q plots for data sets with NDs. Only detected outlying observations may require additional investigation; therefore, from an exploratory point of view, ProUCL can generate Q-Q plots excluding all NDs (and other options). Under this scenario there is no need to retain placeholders (computing plotting positions used to impute NDs) as the objective is not to impute NDs. To impute NDs, ProUCL uses ROS methods (gamma ROS and log ROS) requiring placeholders; and ProUCL computes plotting positions for all detects and NDs to generate a proper regression model which is used to impute NDs. Also for comparison purposes, ProUCL can be used to generate Q-Q plots on data sets obtained by replacing NDs by their respective DLs or DL/2s. In these cases also, no NDs are imputed, and there is no need to retain placeholders for NDs. On these Q-Q plots, ProUCL displays some relevant statistics which are computed based upon the data displayed on those graphs.

Helsel (2012a) states that the Summary Statistics module does not display KM estimates and that statistics based upon logged data are useless. Typically, estimates computed after processing the data do not represent summary statistics. Therefore, KM and ROS estimates are not displayed in the Summary Statistics module. These statistics are available in several other modules including the UCL and BTV modules. At the request of several users, summary statistics are computed based upon logged data. It is believed that the mean, median, or standard deviation of logged data do provide useful information about data skewness and data variability.
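The role of the log-scale standard deviation can be checked numerically with the lognormal simulation experiment described earlier in this section (samples of size 100 from lognormal distributions with µ = 4 and varying σ). The following is a minimal sketch under stated assumptions, not the code used for the guide: the function names, the seed, and the default of 2,000 runs (fewer than the 10,000 runs reported in the text, to keep the example quick) are choices made for illustration, so the estimated probabilities will not reproduce the guide's table exactly.

```python
import math
import random


def lognormal_mean(mu, sigma):
    """Population mean of a lognormal distribution: exp(mu + 0.5 * sigma^2)."""
    return math.exp(mu + 0.5 * sigma ** 2)


def prob_sample_mean_below_mean(mu, sigma, n=100, runs=2000, seed=12345):
    """Estimate P(sample mean < population mean) by simulation.

    Each run draws n lognormal values with log-scale parameters mu and
    sigma and compares their average with the population mean."""
    rng = random.Random(seed)
    target = lognormal_mean(mu, sigma)
    below = 0
    for _ in range(runs):
        sample_mean = sum(rng.lognormvariate(mu, sigma) for _ in range(n)) / n
        if sample_mean < target:
            below += 1
    return below / runs


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 1.5, 2.0, 2.5):
        p = prob_sample_mean_below_mean(4.0, sigma)
        print(f"sigma = {sigma}: population mean = {lognormal_mean(4.0, sigma):.2f}, "
              f"estimated P(sample mean < population mean) = {p:.3f}")
```

If the sample mean had a 50% chance of falling below the population mean regardless of skewness, the estimated probabilities would stay near 0.5 for every σ; running the sketch shows how they behave as σ increases.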
To test for the equality of variances, the F-test, as incorporated in ProUCL, performs fairly well, and the inclusion of Levene's (1960) test will not add any new capability to the ProUCL software. Therefore, taking budget constraints into consideration, Levene's test has not been incorporated in the ProUCL software. However, although it makes sense to first determine if the two variances are equal or unequal, this is not a requirement to perform a t-test. The t-distribution based confidence interval or test for $\mu_1 - \mu_2$ based on the pooled sample variance does not perform better than the approximate confidence intervals based upon Satterthwaite's test. Hence testing for the equality of variances is not required to perform a two-sample t-test. The use of Welch-Satterthwaite's or Cochran's method is recommended in all situations (see, for example, F. Hayes [2005]).

Helsel (2012a) suggested that imputed NDs should not be made available to the users. The developers of ProUCL and other researchers like to have access to imputed NDs. As a researcher, for exploratory purposes, one may want to have access to imputed NDs to be used by exploratory advanced methods such as multivariate methods including data mining and principal component analysis. It is noted that one cannot easily perform exploratory methods on multivariate data sets with NDs. The availability of imputed NDs makes it possible for researchers to use data mining exploratory methods on multivariate data sets with NDs. The statements summarized above should not be misinterpreted. One should not use parametric hypothesis tests such as a t-test or a classical ANOVA on data sets consisting of imputed NDs. These methods require further investigation as the decision errors associated with such methods remain unquantified. There are other methods, such as the Gehan and Tarone-Ware tests in ProUCL 5.0, which are better suited for data sets with multiple detection limits.
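For the two-sample comparison of means discussed above, the Welch-Satterthwaite approach does not pool the two sample variances. A minimal sketch of the test statistic and its approximate degrees of freedom is given below for uncensored data (it is not meant for data sets containing imputed NDs). The function name and the two samples are hypothetical, and the sketch stops short of a p-value because the t-distribution is not part of the Python standard library.

```python
import math
import statistics


def welch_satterthwaite(sample1, sample2):
    """Return (t, df) for the two-sample t-test with unequal variances.

    t  = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)
    df = (s1^2/n1 + s2^2/n2)^2
         / ((s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1))"""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    v1 = statistics.variance(sample1) / n1   # s1^2 / n1
    v2 = statistics.variance(sample2) / n2   # s2^2 / n2
    if v1 + v2 == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical background and site concentrations (illustration only).
background = [3.1, 2.7, 4.0, 3.6, 2.9, 3.3]
site = [4.8, 9.5, 3.9, 12.1, 6.4, 5.2, 7.7]
t_stat, dof = welch_satterthwaite(site, background)
print(f"t = {t_stat:.3f}, approximate df = {dof:.2f}")
```

The resulting statistic is compared with a critical value from the t-distribution with the (generally non-integer) degrees of freedom returned by the function.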
Outliers: Helsel (2012a) and Helsel and Gilroy (2012) make several comments about outliers. The philosophy (with input from EPA scientists) of the developers of ProUCL about the outliers in environmental applications is that those outliers (unless they represent typographical errors) may potentially represent impacted (site related or otherwise) locations or monitoring wells, and therefore may require further investigation. The presence of outliers in a data set tends to destroy the normality of the data set. In other words, a data set with outliers can seldom follow a normal distribution (perhaps only when the outliers are mild, lying around the border of the central and tail parts of a normal distribution). There are modern robust and resistant outlier identification methods (e.g., Rousseeuw and Leroy, 1987; Singh and Nocerino, 1995) which are better suited to identify outliers present in a data set; several of those robust outlier identification methods are available in the Scout 2008 Version 1.0 (EPA 2009) software package. For both Rosner and Dixon tests, it is the data set (also called the main body of the data set) obtained after removing the outliers (and not the data set with outliers) that needs to follow a normal distribution. Outliers are not known in advance. ProUCL has normal Q-Q plots which can be used to get an idea about potential outliers (or mixture populations) present in a data set. However, since a lognormal model tends to accommodate outliers, a data set with outliers can follow a lognormal distribution; this does not imply that the outlier potentially representing an impacted/unusual location does not exist! In environmental applications, outlier tests should be performed on raw data sets, as the cleanup decisions need to be made based upon values in the raw scale and not in log-scale or some other transformed space. More discussion about outliers can be found in Chapter 7 of the ProUCL Technical Guide.

In Helsel (2012a), it is stated, "Mathematically, the lognormal is simpler and easier to interpret than the gamma (opinion)."
We do agree with the opinion that the lognormal is simpler and easier to use, but the log-transformation is often misunderstood and hence incorrectly used and interpreted. Numerous examples (e.g., Examples 2-1 and 2-2, Chapter 2 of the ProUCL Technical Guide) are provided in the ProUCL guidance documents illustrating the advantages of using a gamma distribution. It is further stated in Helsel (2012a) that ProUCL prefers the gamma distribution because it downplays outliers as compared to the lognormal. This argument can be turned around; in other words, one can say that the lognormal is preferred by practitioners who want to inflate the effect of the outlier. Setting this argument aside, we prefer the gamma distribution as it does not transform the variable, so the results are in the same scale as the collected data set. As mentioned earlier, log-transformation does appear to be simpler but problems arise when practitioners are not aware of the pitfalls (e.g., Singh and Ananda, 2002; Singh, Singh, and Iaci, 2002).

Helsel (2012a) and Helsel and Gilroy (2012) state that "lognormal and gamma are similar, so usually if one is considered possible, so is the other." This is an incorrect and misleading statement. There are significant differences in the two distributions and in their mathematical properties. Based upon the extensive experience in environmental statistics and published literature, for skewed data sets that follow both lognormal and gamma distributions, the developers do favor the use of the gamma distribution over the lognormal distribution. The use of the gamma distribution based decision statistics is preferred to estimate the environmental parameters (mean, upper percentile). A lognormal model tends to hide contamination by accommodating outliers and multiple populations, whereas a gamma distribution tends not to accommodate contamination, as can be seen in Examples 2-1 and 2-2 of Chapter 2 of the ProUCL Technical Guide. The use of the lognormal distribution on a data set with
outliers tends to yield inflated and distorted estimates which may not be protective of human health and the environment; this is especially true for skewed data sets of small sizes.

In the context of computing a UCL95 of the mean, Helsel and Gilroy (2012) and Helsel (2012a) state that GROS and LROS are probably never better than KM. It should be noted that these three estimation methods compute estimates of the mean and standard deviation and not the upper limits used to estimate EPC terms and BTVs. The use of the KM method does yield good estimates of the mean and standard deviation as noted by Singh, Maichle, and Lee (2006). Computing good estimates of the mean and sd based upon left-censored data sets addresses only half of the problem. The main issue is to compute decision statistics (UCL, UPL, UTL) which account for uncertainty and data skewness inherently present in environmental data sets.

Realizing that for skewed data sets, Student's t-UCL, CLT-UCL, and standard and percentile bootstrap UCLs do not provide the specified coverage to the population mean, for uncensored data sets researchers (e.g., Johnson [1978], Chen [1995], Efron and Tibshirani [1993], Hall [1988, 1992], Grice and Bain [1980], Singh, Singh, and Engelhardt [1997], Singh, Singh, and Iaci [2002]) have developed parametric (e.g., gamma distribution) and nonparametric (e.g., bootstrap-t and Hall's bootstrap method, modified-t, and Chebyshev inequality) methods to compute confidence intervals and upper limits which adjust for data skewness. Analytically, it is not feasible to compare the various estimation and UCL computation methods for skewed data sets consisting of nondetect observations. Instead, researchers use simulation experiments to learn about the distributions and performances of the various statistics (e.g., KM-t UCL, KM-percentile bootstrap UCL, KM-bootstrap-t UCL, KM-Gamma UCL).
Based upon the suggestions made in published literature and findings summarized in Singh, Maichle, and Lee (2006), it is reasonable to state and assume that the findings of the simulation studies performed on uncensored skewed data sets to compare the performances of the various UCL computation methods can be extended to skewed left-censored data sets. As for uncensored skewed data sets, for left-censored data sets ProUCL 5.0 has several parametric and nonparametric methods to compute UCLs and other limits which adjust for data skewness. Specifically, ProUCL uses KM estimates in gamma equations, in the bootstrap-t method, and in the Chebyshev inequality to compute upper limits for left-censored skewed data sets.

Helsel (2012a) states that ProUCL 4 is based upon presuppositions. It is emphasized that ProUCL does not make any suppositions in advance. Due to the poor performance of a lognormal model (as demonstrated in the literature and illustrated via examples throughout the ProUCL Technical Guide), the use of a gamma distribution is preferred when a data set can be modeled by a lognormal model and a gamma model. To provide the desired coverage (as close as possible) for the population mean, in earlier versions of ProUCL (version 3.0), in lieu of the H-UCL, the use of the Chebyshev UCL was suggested for moderately and highly skewed data sets. In later versions of ProUCL, depending upon data skewness and data distribution, for gamma distributed data sets, the use of the gamma distribution was suggested to compute the UCL of the mean.

Upper limits (e.g., UCLs, UPLs, UTLs) computed using the Student's t-statistic and percentile bootstrap method (Helsel, 2012; NADA for R, 2013) often fail to provide the desired coverage (e.g., 95% confidence coefficient) to the parameters (mean, percentile) of most of the skewed environmental populations. It is suggested that the practitioners compute the decision making statistics (e.g., UCLs, UTLs) by taking data distribution, data set size, and data skewness into consideration. For uncensored and left-censored data
sets, several such upper limits computation methods have been incorporated in ProUCL 5.0 and its earlier versions. Contrary to the statements made in Helsel and Gilroy (2012), ProUCL software does not favor statistics which yield higher (e.g., nonparametric Chebyshev UCL) or lower (e.g., preferring the use of a gamma distribution to using a lognormal distribution) estimates of the environmental parameters (e.g., EPC and BTVs). The main objective of the ProUCL software funded by USEPA is to compute rigorous decision statistics to help the decision makers and project teams in making correct decisions which are protective of human health and the environment.

Page 75 (Helsel [2012]): One of the reviewers of the ProUCL 5.0 software drew our attention to the following incorrect statement made on page 75 of Helsel (2012): "If there is only 1 reporting limit, the result is that the mean is identical to a substitution of the reporting limit for censored observations." An example left-censored data set consisting of nondetect (ND) observations with one reporting limit of 20 illustrating this issue is described as follows.

Y    D_y

The mean and standard deviation based upon the KM and two substitution methods, DL/2 and DL, are summarized as follows:

Kaplan-Meier (KM) Statistics: Mean 39.4, SD
DL Substitution method (replacing censored values by the reporting limit): Mean 42.7, SD
DL/2 Substitution method (replacing NDs by half of the reporting limit): Mean 39.7, SD
The above example illustrates that the KM mean (when only 1 detection limit is present) is not actually identical to the mean estimate obtained using the DL substitution method. The statement made in Helsel's text holds when all observations reported as detects are greater than the single reporting limit, which is seldom true in environmental data sets consisting of analytical concentrations.
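The relationship between the KM mean and the DL substitution mean for a single reporting limit can be explored with a small sketch. The code below is an illustration under stated assumptions and is not ProUCL's KM implementation, whose conventions may differ: it uses the common "flipping" device (left-censored values are subtracted from the largest value and treated as right-censored), takes the mean as the area under the KM curve up to the largest flipped value, and processes detects before NDs at tied values. The function names and both data sets are hypothetical; NDs are entered as their reporting limit with an indicator of 0, and detects with an indicator of 1.

```python
def substitution_mean(values, detected, fraction=1.0):
    """Mean after replacing each ND by fraction * its reporting limit
    (fraction=1.0 is DL substitution, fraction=0.5 is DL/2)."""
    filled = [v if d else fraction * v for v, d in zip(values, detected)]
    return sum(filled) / len(filled)


def km_mean_left_censored(values, detected):
    """Kaplan-Meier mean for left-censored data via flipping (assumed
    convention): area under the KM curve of the flipped data, flipped back."""
    if not values or len(values) != len(detected):
        raise ValueError("values and detected must be non-empty and equal in length")
    pivot = max(values)
    flipped = sorted((pivot - v, bool(d)) for v, d in zip(values, detected))
    n = len(flipped)
    survival, area, previous, at_risk = 1.0, 0.0, 0.0, n
    i = 0
    while i < n:
        t = flipped[i][0]
        area += survival * (t - previous)       # area accumulated up to time t
        j, events = i, 0
        while j < n and flipped[j][0] == t:     # group all observations tied at t
            events += flipped[j][1]
            j += 1
        if events:
            survival *= 1.0 - events / at_risk  # KM step at an event time
        at_risk -= j - i
        previous = t
        i = j
    return pivot - area


# Hypothetical data, single reporting limit of 20 (illustration only).
# Case 1: every detect is above the reporting limit.
y1, d1 = [20, 20, 30, 50], [0, 0, 1, 1]
# Case 2: one detect (5) lies below the reporting limit.
y2, d2 = [5, 20, 20, 30, 50], [1, 0, 0, 1, 1]

for label, y, d in (("all detects above RL", y1, d1), ("one detect below RL", y2, d2)):
    print(label,
          "KM:", round(km_mean_left_censored(y, d), 3),
          "DL:", round(substitution_mean(y, d, 1.0), 3),
          "DL/2:", round(substitution_mean(y, d, 0.5), 3))
```

Under these assumptions the KM mean coincides with the DL substitution mean in the first case and falls below it in the second, which is the distinction drawn in the text above.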
Chapter 2
Entering and Manipulating Data

2.1 Creating a New Data Set

By executing ProUCL 5.0, the following file options will appear:

By choosing the File > New option, a new worksheet shown below will appear. The user enters variable names and data following the ProUCL input file format requirements described in Section 2.3.

2.2 Opening an Existing Data Set

The user can open an existing worksheet (*.xls, *.xlsx, *.wst, and *.ost) by choosing the File > Open Single File Sheet option. The following drop-down menu will appear:
Choose a file by highlighting the type of file, such as .xls, as shown above. This option can also be used to read in *.wst worksheets and *.ost output sheets generated by earlier versions (e.g., ProUCL 4.1 and older) of ProUCL. By choosing the File > Excel Multiple Sheets option, the user can open an Excel file consisting of multiple sheets. Each sheet will be opened as a separate file to be processed individually by ProUCL 5.0.

Caution: If you are editing a file (e.g., an Excel file using Excel), make sure to close the file before importing the file into ProUCL using the file open option.

2.3 Input File Format

The program can read Excel files. The user can perform typical Cut, Paste, and Copy operations available under the Edit Menu Option as shown below.
67 The first row in all input data files consists of alphanumeric (strings of numbers and characters) names representing the header row. Those header names may represent meaningful variable names such as Arsenic, Chromium, Lead, GroupID, and so on.

The GroupID column holds the labels for the groups (e.g., Background, AOC1, AOC2, 1, 2, 3, a, b, c, Site 1, Site 2,...) that might be present in the data set. Alphanumeric strings (e.g., Surface, Subsurface) can be used to label the various groups. Most of the modules of ProUCL can process data by a group variable.

The data file can have multiple variables (columns) with unequal numbers of observations.

Except for the header row and columns representing the group labels, only numerical values should appear in all other rows.

All alphanumeric strings and characters (e.g., blank, other characters, and strings), and all other values (that do not meet the requirements above) in the data file are treated as missing values and are omitted from statistical evaluations.

Also, a large value denoted by 1E31 (= 1x10^31) can be used to represent missing data values. All entries with this value are ignored in the computations. These values are counted under the number of missing values.

2.4 Number Precision

The user may turn Full Precision on or off by choosing Configure Full Precision On/OFF.

With Full Precision turned off, ProUCL displays numerical values using an appropriate (default) decimal digit option, and all decimal values are rounded to the nearest thousandths place.

The Full Precision on option is especially useful when one is dealing with data sets consisting of small numerical values (e.g., < 1) resulting in small values of the various estimates and test statistics. These values may become so small that they have several leading zeros after the decimal. In such situations, one may want to use the Full Precision on option to see nonzero values after the decimal.
Note: For the purpose of this User Guide, unless noted otherwise, all examples have used the Full Precision off option. This option prints out results up to 3 significant digits after the decimal.
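The missing-value rule of Section 2.3 can be illustrated with a short sketch. This is only an illustration of the stated rule (blanks, strings, and the value 1E31 count as missing), not ProUCL's own parsing code; the names `parse_cell` and `split_column` are hypothetical, and treating non-finite values such as NaN as missing is an added assumption.

```python
import math

MISSING_SENTINEL = 1e31  # the large value the guide reserves for missing data


def parse_cell(cell):
    """Return a float for a usable numeric cell, or None for a missing value."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None  # blank, text, qualifier, or empty cell
    # Assumption: non-finite values are treated as missing as well.
    if not math.isfinite(value) or value == MISSING_SENTINEL:
        return None
    return value


def split_column(cells):
    """Return (valid numeric values, number of missing values) for one column."""
    parsed = [parse_cell(c) for c in cells]
    valid = [v for v in parsed if v is not None]
    return valid, len(parsed) - len(valid)


valid, n_missing = split_column(["12.5", "", "ND", 1e31, "1E31", 3])
```

Here `valid` holds only the two numeric entries, and the remaining four cells are counted as missing.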
68 2.5 Entering and Changing a Header Name

1. The user can change variable names (Header Name) using the following process. Highlight the column whose header name (variable name) you want to change by clicking either the column number or the header as shown below.

2. Right-click and then click Header Name.

3. Change the Header Name.
69 4. Click the OK button to get the following output with the changed variable name.

2.6 Saving Files

The Save option allows the user to save the active window in Excel 2003 or Excel 2007.

The Save As option also allows the user to save the active window. This option follows typical Windows standards, and saves the active window to a file in .xls or .xlsx format. All modified/edited data files, and output screens (excluding graphical displays) generated by the software can be saved as .xls or .xlsx files.
70 2.7 Editing

Click on the Edit menu item to reveal the following drop-down options.

Cut option: similar to a standard Windows Edit option, such as in Excel. It performs standard edit functions on selected highlighted data (similar to a buffer).

Copy option: similar to a standard Windows Edit option, such as in Excel. It performs typical edit functions on selected highlighted data (similar to a buffer).

Paste option: similar to a standard Windows Edit option, such as in Excel. It performs typical edit functions of pasting the selected (highlighted) data to the designated spreadsheet cells or area.

2.8 Handling Nondetect Observations and Generating Files with Nondetects

Several modules of ProUCL (e.g., Statistical Tests, Upper Limits/BTVs, UCLs/EPCs) handle data sets consisting of ND observations with single and multiple DLs.

The user informs the program about the status of a variable consisting of NDs. For a variable with ND observations (e.g., arsenic), the detected values, and the numerical values of the associated detection limits (for less than values) are entered in the appropriate column associated with that variable. No qualifiers or flags (e.g., J, B, U, UJ, X,...) should be entered in data files consisting of ND observations.

Data for variables with ND values are provided in two columns. One column consists of numerical values of detected observations and numerical values of detection limits (or reporting limits) associated with observations reported as NDs; the second column represents their detection status, consisting of only 0 (for ND values) and 1 (for detected values) values. The name of the variable representing the detection status should consist of the prefix D_ or d_ (not case sensitive) followed by the variable name.

The detection status column with variable name starting with a D_ (or a d_) should have only two values: 0 for ND values, and 1 for detected observations.

For example, the header name D_Arsenic is used for the variable Arsenic having ND observations.
The variable D_Arsenic contains a 1 if the corresponding Arsenic value represents a detected entry, and contains a 0 if the corresponding entry represents a ND entry.
71 The user should follow this format; otherwise the program will not recognize that your data set has NDs. An example data set illustrating these points is given as follows.

2.9 Caution

Care should be taken to avoid any misrepresentation of detected and nondetected values. Specifically, it is advised not to have any missing values (blanks, characters) in the D_ column (detection status column). If a missing value is located in the D_ column (and not in the associated variable column), the corresponding value in the variable column is treated as a ND, even if this might not have been the intention of the user.

It is mandatory that the user makes sure that only a 1 or a 0 is entered in the detection status D_ column. If a value other than a 0 or a 1 (such as qualifiers) is entered in the D_ column (the detection column), results may become unreliable, as the software treats any entry other than 0 or 1 as a ND value.

When computing statistics for full uncensored data sets without any ND values, the user should select only those variables (from the list of available variables) that contain no ND observations. Specifically, ND values found in a column chosen for the summary statistics (full-uncensored data set) will be treated as detected values; whatever value (e.g., detection limit) is entered in that column will be used to compute summary statistics for a full-uncensored data set without any ND values.

It is mandatory that the header name of a nondetect column associated with a variable such as XYZ should be D_XYZ (or d_xyz). No other characters or blanks are allowed. However, the header (column) names are not case sensitive. If the nondetect column is not labeled properly, methods to handle nondetect data will not be activated and shown.
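The naming and coding rules above lend themselves to a simple pre-import check. The following is a minimal sketch, assuming the worksheet headers and one status column are already available as Python lists; the helper names are hypothetical and nothing here is part of ProUCL itself.

```python
def find_detection_columns(headers):
    """Map each variable header to its D_ status header (case-insensitive)."""
    by_lower = {h.lower(): h for h in headers}
    pairs = {}
    for h in headers:
        if h.lower().startswith("d_"):
            continue  # this is itself a status column
        status = by_lower.get("d_" + h.lower())
        if status is not None:
            pairs[h] = status
    return pairs


def invalid_status_rows(status_cells):
    """Return 0-based row indices whose status entry is not exactly 0 or 1."""
    bad = []
    for row, cell in enumerate(status_cells):
        try:
            ok = float(cell) in (0.0, 1.0)
        except (TypeError, ValueError):
            ok = False  # blanks and qualifiers such as "U" end up here
        if not ok:
            bad.append(row)
    return bad


pairs = find_detection_columns(
    ["Arsenic", "D_Arsenic", "Mercury", "d_mercury", "Zinc", "Group"])
bad_rows = invalid_status_rows([1, 0, "", "U", 2, "1"])
```

In this hypothetical worksheet Arsenic and Mercury are recognized as variables with NDs, Zinc is not, and rows 2, 3, and 4 of the status column would need to be corrected before the data are used.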
72 Two-Sample Hypotheses: It should be noted that when using two-sample hypotheses tests (WMW test, Gehan test, and Tarone-Ware test) on data sets with NDs, both samples or variables (e.g., siteAs, BackAs) should be specified as having NDs, even though one of the variables may not have any ND observations. This means that a ND column (with 0 = ND, and 1 = detect) should be provided for each variable (here D_siteAs, and D_BackAs) to be used in this comparison. If a variable (e.g., siteAs) does not have any NDs, a column with label D_siteAs should still be included in the data set with all entries = 1 (detected values).

The sample data set given on the previous page illustrates points related to this option and the issues listed above. The data set contains some ND measurements for Arsenic and Mercury. It should be noted that mercury concentrations are used to illustrate the points related to ND observations; arsenic and zinc concentrations are used to illustrate the use of the group variable, Group (Surface, Subsurface).

If for mercury, one computes summary statistics (assuming no ND values) using the Full data set option, then all ND values (with 0 entries in the D_Mercury column) will be treated as detected values, and summary statistics will be computed accordingly.

2.10 Summary Statistics for Data Sets with Nondetect Observations

To compute various statistics of interest (e.g., background statistics, GOF test, UCLs, WMW test) for variables with ND values, one should choose the ND option, With NDs, from the various available menu options such as Stats/Sample Sizes, Graphs, Statistical Tests, Upper Limits/BTVs, and UCLs/EPCs.

The NDs option of these modules gets activated only when your data set consists of NDs.

For data sets with NDs, the Stats/Sample Sizes module of ProUCL 5.0 computes summary statistics and other general statistics such as the KM mean and KM standard deviation based upon raw as well as log-transformed data.

The General Statistics/With NDs option also provides simple statistics (e.g., % NDs, max detect, min detect, mean of detected values) based upon detected values.
The statistics computed in log-scale (e.g., sd of log-transformed detected values) may help a user to determine the degree of skewness (e.g., mild, moderate, high) of a data set based upon detected values. These statistics may also help the user to choose the most appropriate method (e.g., KM bootstrap-t UCL or KM percentile bootstrap UCL) to compute UCLs, UPLs, and other limits used to compute decision statistics.
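The simple detect-based statistics listed in Section 2.10 are straightforward to reproduce by hand, which can be useful for checking an output sheet. The sketch below is an illustration only: the function name, dictionary keys, and data are hypothetical, and the sample (n - 1) form of the standard deviation is an assumption.

```python
import math
import statistics


def nd_summary(values, detected):
    """Simple statistics based on detected values (D_ coding: 1 = detect)."""
    detects = [v for v, d in zip(values, detected) if d == 1]
    if not detects:
        raise ValueError("no detected observations")
    n = len(values)
    summary = {
        "n": n,
        "num_nds": n - len(detects),
        "pct_nds": 100.0 * (n - len(detects)) / n,
        "min_detect": min(detects),
        "max_detect": max(detects),
        "mean_detects": statistics.fmean(detects),
    }
    # Log-scale sd needs at least two detects, all strictly positive.
    if len(detects) >= 2 and all(v > 0 for v in detects):
        logs = [math.log(v) for v in detects]
        summary["sd_log_detects"] = statistics.stdev(logs)
    return summary


# Hypothetical data: two nondetects (reporting limits 2 and 5), three detects.
summary = nd_summary([2, 5, 5, 8, 12], [0, 1, 0, 1, 1])
```

A large value of `sd_log_detects` would point toward a highly skewed set of detected values.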
73 All other parametric and nonparametric statistics and estimates of population mean, variance, and percentiles (e.g., KM and ROS estimates) for variables with ND observations are provided in other menu options such as Upper Limits/BTVs and UCLs/EPCs.

2.11 Warning Messages and Recommendations for Datasets with an Insufficient Amount of Data

ProUCL 5.0 provides warning messages and recommendations for datasets with an insufficient amount of data to calculate meaningful estimates and statistics of interest. For example, it is not desirable to compute an estimate of the EPC term based upon a discrete data set of size less than 5, especially when NDs are also present in the data set.

However, to accommodate the computation of UCLs and other limits based upon ISM data sets, ProUCL 5.0 allows users to compute UCLs, UPLs, and UTLs based upon data sets of sizes as small as 3. The user is advised to follow the guidance provided in the ITRC ISM Technical Regulatory Guidance Document (ITRC, 2012) to select an appropriate UCL95 to estimate the EPC term. Due to lower variability in ISM data, the minimum sample size requirements for statistical methods used on ISM data are lower than the minimum sample size requirements for statistical methods used on discrete data sets.

It is suggested that for discrete data sets, the users should use at least 10 observations to compute UCLs and various other limits.

Some examples of datasets with an insufficient amount of data include datasets with less than 3 distinct observations, datasets with only one detected observation, and datasets consisting of all nondetects.

Some of the warning messages generated by ProUCL 5.0 are shown as follows.
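The sample-size guidance of this section can be turned into a quick screening check before running a UCL or BTV computation. The sketch below encodes only the thresholds stated above (3 for ISM data, 5 and 10 for discrete data, 3 distinct observations); the function name and message wording are hypothetical and do not reproduce ProUCL's actual warning text, and counting distinct values over all entries of the column is an assumption.

```python
def sufficiency_warnings(values, detected, ism=False):
    """Return a list of data-sufficiency warnings (empty if none apply)."""
    warnings = []
    n = len(values)
    n_detects = sum(1 for d in detected if d == 1)
    if n_detects == 0:
        warnings.append("all observations are nondetects")
    elif n_detects == 1:
        warnings.append("only one detected observation")
    # Assumption: distinct values are counted over detects and limits alike.
    if len(set(values)) < 3:
        warnings.append("fewer than 3 distinct observations")
    if ism:
        if n < 3:
            warnings.append("fewer than 3 ISM observations")
    elif n < 5:
        warnings.append("fewer than 5 discrete observations")
    elif n < 10:
        warnings.append(
            "fewer than 10 discrete observations (at least 10 suggested)")
    return warnings


all_nd = sufficiency_warnings([20, 20, 20, 20], [0, 0, 0, 0])
adequate = sufficiency_warnings(list(range(1, 13)), [1] * 12)
ism_small = sufficiency_warnings([1.0, 2.0, 3.0], [1, 1, 1], ism=True)
```

The `ism` flag reflects the lower minimum sample size that the guide allows for ISM data.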
74 2.12 Handling Missing Values

The various modules (e.g., Stats, GOF, UCLs, BTVs, Regression, Trend tests) of ProUCL 5.0 can handle missing values within a data set. Appropriate messages are displayed when deemed necessary.

All blanks, alphanumeric strings (except for group variables), or the specific large value 1e31 are considered as missing values.
75 A group variable (representing two or more groups, populations, MWs) can have alphanumeric values (e.g., MW01, MW02, AOC1, AOC2,...).

ProUCL ignores all missing values in all statistical evaluations it performs. Missing values are therefore not treated as being part of a data set.

Number of Valid Samples or Number of Valid Observations represents the Total Number of Observations minus the Number of Missing Values. If there are no missing values, then number of valid samples = total number of observations.

Valid Samples = Total Number of Observations - Missing Values.

It is important to note, however, that if a missing value (e.g., a blank, or 1e31) not meant to represent a group category is present in a Group variable, ProUCL 5.0 will treat that blank value (or 1e31 value) as a new group. All variables and values that correspond to this missing value will be treated as part of a new group and not with any existing groups. It is therefore important to check the consistency and validity of all data sets before performing statistical evaluations.

ProUCL prints out the number of missing values (if any) and the number of reported values (excluding the missing values) associated with each variable in the data sheet. This information is provided in several output sheets (e.g., General Statistics, BTVs, UCLs, Outliers, OLS, Trend Tests) generated by ProUCL 5.0.

Number of missing values in Regression: The OLS module also handles missing values in the two columns (X and Y) representing independent (X) and dependent (Y) variables. ProUCL provides warning messages for bad data sets (e.g., all identical values) when statistics of interest cannot be computed. However, a bad/extreme data set can occur in numerous different ways, and ProUCL may not cover all of those extreme bad data sets. In such cases, ProUCL may still yield an error message. The user needs to review and fix the data set before performing regression or trend analysis again.
For further clarification of the labeling of missing values, the following example illustrates the terminology used for the number of valid samples and the number of unique and distinct samples on the various output sheets generated by the ProUCL software.

Example: The following example illustrates the notion of Valid Samples, Unique or Distinct Samples, and Missing Values. The data set also has ND values. ProUCL 5.0 computes these numbers and prints them on the UCLs and background statistics output.

x D_x w E anm 0
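The counts reported on the output sheets (valid samples, missing values, and distinct detected values) can be mimicked with a few lines of code. The sketch below assumes that the column has already been parsed so that missing entries are represented by `None`; the function name and data are hypothetical and are not the example data of this guide.

```python
def sample_counts(values, detected):
    """Count total, missing, valid, and distinct detected observations.

    values   -- floats, with None marking a missing value
    detected -- 1 for a detect; any other entry is treated as a nondetect
    """
    total = len(values)
    missing = sum(1 for v in values if v is None)
    distinct_detects = {v for v, d in zip(values, detected)
                        if v is not None and d == 1}
    return {
        "total": total,
        "missing": missing,
        "valid": total - missing,          # valid = total - missing
        "distinct_detects": len(distinct_detects),
    }


# Hypothetical column: two missing entries, one nondetect, a tied detect.
counts = sample_counts([2.0, None, 4.0, 4.0, 7.0, None],
                       [1, 1, 1, 1, 0, 0])
```

A small value of `distinct_detects` is the situation in which bootstrap methods become inadvisable.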
76 Valid Samples: Represents the total number of observations (censored and uncensored inclusive) excluding the missing values. If a data set has no missing value, then the total number of data points equals the number of valid samples.

Missing Values: All values not representing a real numerical number are treated as missing values. Specifically, all alphanumeric values including blanks are considered to be missing values. Big numbers such as 1.0e31 are also treated as missing values and are not considered valid observations.

Unique or Distinct Samples: The number of unique samples or number of distinct samples represents all unique (or distinct) detected values. The number of unique or distinct values is computed for detected values only. This number is especially useful when using bootstrap methods. As is well known, it is not desirable or advisable to use bootstrap methods when the number of unique samples is small.

2.13 User Graphic Display Modification

Advanced users are provided two sets of tools to modify graphics displays. A graphics tool bar is available above the graphics display, and the user can right-click on the desired object within the graphics display, and a drop-down menu will appear. The user can select an item from the drop-down menu list by clicking on that item. This will allow the user to make desired modifications as available for the selected menu item. An illustration is given as follows.

2.13.1 Graphics Tool Bar

The user can change fonts, font sizes, and vertical and horizontal axes, and select new colors for the various features and text. All these actions are generally used to modify the appearance of the graphic display.
77 The user is cautioned that these tools can be unforgiving and may put the user in a situation where the user cannot go back to the original display. Users are on their own in exploring the robustness of these tools. Therefore, less experienced users may not want to use these drop-down menu graphic tools.

2.13.2 Drop-Down Menu Graphics Tools

Graphs can be modified by using the options shown on the two graphs displayed below. These tools allow the user to move the mouse to a specific graphic item like an axis label or a display feature. The user then right-clicks the mouse and a drop-down menu will appear. This menu presents the user with available options for that particular control or graphic object. For example, the user can change colors, title name, axes labels, font size, and resize the graphs. There is less chance of making an unrecoverable error, but that risk is always present. As a cautionary note, the user can always delete the graphics window and redraw the graphical displays by repeating their operations from the datasheet and menu options available in ProUCL. A couple of examples of a drop-down menu obtained by right-clicking the mouse on the background area of the graphics display are given as follows.
78 52
79 Chapter 3 Select Variables Screen

3.1 Select Variables Screen

The Select Variables screen is associated with all modules of ProUCL. Variables need to be selected to perform statistical analyses.

When the user clicks on a drop-down menu for a statistical procedure (e.g., UCLs/EPCs), the following window will appear.

The Options button is available in certain menus. The use of this option leads to another pop-up window such as shown below. This window provides various options associated with the selected statistical method (e.g., BTVs, OLS Regression).
80 ProUCL can process multiple variables simultaneously. ProUCL software can generate graphs, and compute UCLs and background statistics simultaneously for all selected variables shown in the right panel of the screenshot displayed on the previous page.

If the user wants to perform statistical analysis on a variable (e.g., manganese) by a Group variable, click the arrow below the Select Group Column (Optional) to get a drop-down list of available variables from which to select an appropriate group variable. For example, a group variable (e.g., Well ID) can have alphanumeric values such as MW8, MW9, and MW1. Thus in this example, the group variable name, Well ID, takes 3 values: MW1, MW8, and MW9. The selected statistical method (e.g., GOF test) performs computations on data sets for all the groups associated with the selected group variable (e.g., Well ID).
81 The Group variable is useful when data from two or more samples need to be compared.

Any variable can be a group variable. However, for meaningful results, only a variable that really represents a group variable (categories) should be selected as a group variable.

The number of observations in the group variable and the number of observations in the selected variables (to be used in a statistical procedure) should be the same. In the example below, the variable Mercury is not selected because the number of observations for Mercury is 30; in other words, mercury values have not been grouped. The group variable and each of the selected variables have 20 data values.

As mentioned earlier, one should not assign any missing value such as a blank for the group variable. If there is a missing value (represented by blanks, strings, or 1E31) for a group variable, ProUCL will treat those missing values as a new group. As such, data values corresponding to the missing Group will be assigned to a new group.

The Group Option provides a useful tool to perform various statistical tests and methods (including graphical displays) separately for each of the groups (samples from different populations) that may be present in a data set. For example, the same data set may consist of samples from the various groups (populations). The graphical displays (e.g., box plots, Q-Q plots) and statistics of interest can be computed separately for each group by using this option.

Notes: Once again, care should be taken to avoid misrepresentation and improper use of group variables. It is recommended not to assign any missing value for the group variable.
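The way a group variable partitions a data column, including the side effect of a missing group label, can be sketched as follows. This is an illustration of the behavior described above and not ProUCL code; the function name is hypothetical, and representing a missing label by an empty-string key is an assumption made for the sketch.

```python
from collections import defaultdict


def split_by_group(values, labels):
    """Partition values by group label; a blank label forms its own group."""
    if len(values) != len(labels):
        raise ValueError("group column and data column differ in length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        # Assumption: missing labels are collected under the key "".
        key = "" if label is None else str(label).strip()
        groups[key].append(value)
    return dict(groups)


groups = split_by_group(
    [1, 2, 3, 4, 5],
    ["Surface", "Subsurface", "Surface", "", None],
)
```

The two observations with a missing label end up in a third, unintended group, which is why the guide recommends never leaving the group variable blank.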
82 3.1.1 Graphs by Groups

The following options are available to generate graphs by groups.

Individual or multiple graphs (Q-Q plots, box plots, and histograms) can be displayed on a graph by selecting the Group Column (Optional) option shown as follows.

An individual graph for each group (specified by the selected group variable) is produced by selecting the Individual Graph option; multiple graphs (e.g., side-by-side box plots, multiple Q-Q plots on the same graph) are produced by selecting the Group Graph option as shown below. Using the Group Graph option, multiple graphs are displayed for all sub-groups included in the Group variable. This option is used when data are given in the same column and are classified by a group variable.
83 Multiple graphs for selected variables are produced by selecting options: Multiple Box Plots or Multiple Q-Q Plots. Using the Group Graph option, multiple graphs for all selected variables are shown on the same graphical display. This option is useful when data (e.g., site lead and background lead) to be compared are given in different columns.

Notes: It should be noted that it is the user's responsibility to provide an adequate amount of detected data to perform the group operations. For example, if the user desires to produce a graphical Q-Q plot (using only detected data) with regression lines displayed, then there should be at least two detected points (to compute slope, intercept, and sd) in the data set. Similarly, if graphs are desired for each group specified by a Group ID variable, there should be at least two detected observations in each group specified by the Group ID variable. ProUCL displays a warning message (in orange) in the lower Log Panel of the ProUCL screen when not enough data are available to perform a statistical or graphical operation.
84 Chapter 4 General Statistics

The General Statistics option is available under the Stats/Sample Sizes module of ProUCL 5.0. This option is used to compute general statistics including simple summary statistics (e.g., mean, standard deviation) for all selected variables. In addition to simple summary statistics, several other statistics are computed for full uncensored data sets (w/o NDs), and for data sets with nondetect (with NDs) observations (e.g., estimates based upon the KM method). Two Menu options, Full and With NDs, are available.

Full (w/o NDs): This option computes various general statistics for all selected variables.

With NDs: This option computes general statistics including KM method based mean and standard deviations for all selected variables with ND observations.

Each menu option (Full (w/o NDs) and With NDs) has two sub-menu options:

Raw Statistics
Log-Transformed

When computing general statistics for raw data, a message will be displayed for each variable that contains non-numeric values.

The General Statistics option computes log-transformed (natural log) statistics only if all of the data values for the selected variable(s) are positive real numbers. A message will be displayed if non-numeric characters, zero, or negative values are found in the column corresponding to a selected variable.

4.1 General Statistics for Full Data Sets without NDs

1. Click General Statistics Full (w/o NDs)

2. Select either Log-Transformed or Raw Statistics option.

3. The Select Variables screen (see Chapter 3) will appear. Select one or more variables from the Select Variables screen. If statistics are to be computed by a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables from which to select a proper group variable.
85 Click on the OK button to continue or on the Cancel button to cancel the General Statistics option.

Raw Statistics
86 Log-Transformed Statistics

4. The General Statistics screen (and all other output screens generated by other modules) shown above can be saved as an Excel 2003 (.xls) or 2007 (.xlsx) file. Click Save from the file menu.

5. On the output screen shown above, most of the statistics are self-explanatory and described in the ProUCL Technical Guide (EPA 2013). A couple of simple robust statistics (Hoaglin, Mosteller, and Tukey, 1983) included in the above output are described as follows.

MAD = Median absolute deviation
MAD/0.675 = Robust and resistant (to outliers) estimate of variability, population standard deviation

4.2 General Statistics with NDs

1. As above, Click General Statistics With NDs

2. Select either Log-Transformed or Raw Statistics option.
87 3. The Select Variables screen (Chapter 3) will appear. Select variable(s) from the list of variables. Only those variables that have ND values will be shown. The user should make sure that the variables with NDs are defined properly including the column showing the detection status of the various observations. If statistics are to be computed by a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. Select a proper group variable. Click on the OK button to continue or on the Cancel button to cancel the summary statistics operations.

Raw Statistics Data Set with NDs

The Summary Statistics screen shown above can be saved as an Excel 2003 or 2007 file. Click Save from the file menu.
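The two robust statistics named in Section 4.1, MAD and MAD/0.675, can be reproduced with the standard library for a quick cross-check of an output sheet. This is a minimal sketch with a hypothetical function name and hypothetical data; it uses the divisor 0.675 as printed in this guide.

```python
import statistics


def mad_scale(data):
    """Return (MAD, MAD/0.675) for a full data set without nondetects."""
    med = statistics.median(data)
    # Median of the absolute deviations from the median.
    mad = statistics.median([abs(x - med) for x in data])
    return mad, mad / 0.675


# Hypothetical data with one gross outlier.
mad, robust_sd = mad_scale([1, 2, 3, 4, 100])
classical_sd = statistics.stdev([1, 2, 3, 4, 100])
```

Comparing `robust_sd` with `classical_sd` shows how little the outlier moves the MAD-based estimate compared with the usual standard deviation.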
88 Chapter 5 Imputing Nondetects Using ROS Methods

The option to impute NDs using regression on order statistics (ROS) methods is available under the Stats/Sample Sizes module of ProUCL 5.0. This option is provided for advanced users who want to use the detected and imputed NDs data for exploratory and data mining purposes on multivariate data sets. For exploratory methods such as principal component analysis (PCA), cluster analysis, and discriminant analysis, used to gain additional insight into potential structures and patterns present in a multivariate (more than one variable) data set, one may want to use imputed values in graphical displays (line graphs, scatter plots, box plots, etc.) and in exploratory PCA and cluster analysis.

To derive conclusions based upon multivariate data sets consisting of nondetects, the developers suggest the use of the KM method based covariance or correlation matrix to perform principal component and regression analysis. These methods are beyond the scope of the ProUCL software, which deals only with univariate methods. The details of computing an Orthogonalized Kettenring and Gnanadesikan (OKG) positive definite KM matrix can be found in Maronna, Martin, and Yohai (2006) and in Scout 2008 Version 1.0 guidance documents (2009), which can be downloaded from the EPA NERL Site.

One may not use ROS imputed data to perform parametric statistical tests such as the t-test and ANOVA test without further investigation. These issues require further research to evaluate decision errors associated with conclusions derived using such methods.

The ROS methods can be used to impute ND observations using a normal, lognormal, or gamma model. ProUCL has three ROS estimation methods that can be used to impute ND observations. The use of this option generates additional columns consisting of all imputed NDs and detected observations. These columns are appended to the existing open spreadsheet file. The user should save the updated file if they want to use the imputed data for their other application(s) such as PCA or discriminant analysis.
It is not easy to perform multivariate statistical methods on data sets with NDs. The availability of imputed NDs in a data file helps the advanced users who want to use exploratory methods on data sets consisting of ND observations. Like other statistical methods in ProUCL, NDs can also be imputed by a group variable. One can impute NDs using the following steps.

1. Click Imputed NDs using ROS Methods Lognormal ROS

2. The Select Variables screen (Chapter 3) will appear. Select one or more variable(s) from the Select Variables screen; NDs can be imputed using a group variable as shown in the following screenshot.
89 Click on the OK button to continue or on the Cancel button to cancel the option.

Output Screen for ROS Est. NDs (Lognormal ROS) Option

Notes: For grouped data, ProUCL generates a separate column for each group in the data set as shown in the above table. Columns with a similar naming convention are generated for each selected variable and distribution using the ROS option.
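The idea behind lognormal ROS imputation can be sketched for the simplest possible case. The code below is a deliberately simplified illustration and should not be read as the algorithm ProUCL uses: it assumes a single detection limit with every detect above it, uses Blom-type plotting positions (i - 0.375)/(n + 0.25) as an assumption, and fits an ordinary least squares line to the logged detects. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_ros_single_dl(detects, n_nd):
    """Impute n_nd nondetects lying below all detects (simplified ROS)."""
    if len(detects) < 2:
        raise ValueError("at least two detects are needed to fit a line")
    n = len(detects) + n_nd
    # Normal quantiles of assumed plotting positions for ranks 1..n.
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
         for i in range(1, n + 1)]
    y = sorted(math.log(v) for v in detects)
    z_det = z[n_nd:]                      # detects occupy the top ranks
    z_bar = sum(z_det) / len(z_det)
    y_bar = sum(y) / len(y)
    sxx = sum((a - z_bar) ** 2 for a in z_det)
    sxy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z_det, y))
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    # Read the fitted line at the low ranks and back-transform.
    return [math.exp(intercept + slope * a) for a in z[:n_nd]]


# Hypothetical data: five detects and three nondetects reported as < 10.
imputed = lognormal_ros_single_dl([12, 15, 22, 30, 45], 3)
```

Data sets with multiple detection limits, or with detects below a detection limit, need a more careful assignment of plotting positions than this sketch provides.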
90 Chapter 6 Graphical Methods (Graph)

The graphical methods described here are used as exploratory tools to get some idea about data distributions (e.g., skewed, symmetric), potential outliers, and/or multiple populations present in a data set. The following graphical methods are available under the Graphs option of ProUCL 5.0.

All graphical displays listed above can be generated using uncensored full data sets (Full w/o NDs) as well as left-censored data sets with nondetect (With NDs) observations. On box plot graphs for data sets with NDs, a horizontal line is also displayed at the largest RL associated with ND observations.

Q-Q Plots and Histograms: Q-Q plots and histograms can be generated individually as well as by using a Group variable. Graphs generated using the Group Graphs option shown below are useful when data for selected variable(s) are given in the same column (stacked data) categorized by a Group ID.

For data sets with NDs, three options described below are available to draw Q-Q plots and histograms. Specifically, these graphs are displayed only for detected values, or with NDs replaced by ½ DL values, or with NDs replaced by the respective DLs. The statistics displayed on a Q-Q plot (mean, sd, slope, intercept) are computed according to the method used. On Q-Q plots, ND values are displayed using a smaller font.

The exploratory Q-Q plots described here do not require any placeholders for NDs. These graphs are used only to determine the distribution of detected values and to identify potential outliers and/or multiple populations present in a data set.

On histograms, the user can change the number of bins (more bins, fewer bins) used to generate histograms.
91 Do not Display Nondetects: Selection of this option excludes all NDs from a graphical method (Q-Q plots and histograms) and plots only detected values. The statistics shown on Q-Q plots are computed only using the detected data.

Use Reported Detection Limit: Selection of this option treats DLs as detected values associated with the ND values. The graphs are generated using the numerical values of detection limits, and statistics displayed on Q-Q plots are computed accordingly.

Use Detection Limit Divided by 2.0: Selection of this option replaces the DLs with their half values. All Q-Q plots and histograms are generated using the half detection limits and detected values. The statistics displayed on Q-Q plots are computed accordingly.

For data sets in different columns, one can use the Multiple Q-Q Plots option. By default, this option will display multiple Q-Q plots for all selected variables on the same graph. One can also generate multiple Q-Q plots by using a group variable.

Box Plot: Like Q-Q plots, box plots can also be generated by a Group variable. This option is useful when all data are given in the same column (stacked data) categorized by a Group ID variable. ProUCL 5.0 constructs a box plot using all detected and nondetected (using associated DL values) values. On box plots with NDs, a horizontal line is displayed at the largest detection limit level. Box Plots are generated using ChartFx, a software package used in the development of ProUCL 5.0.

Multiple Box Plots: For data in different columns, one can use the Multiple Box Plots option to display multiple box plots for all selected variables on the same graph. One can also generate multiple box plots by using a group variable.

Box Plots have an optional feature, which can be used to draw up to four (4) horizontal lines at pre-established screening levels or at statistical limits (e.g., upper limits of a background data set) computed using a background data set.
This option can be used when box plots are generated using onsite data and one may be interested in comparing onsite data with background threshold values and/or pre-established screening levels. This type of box plot represents a useful visual comparison of site data with background threshold values and/or other action levels. Up to four (4) values can be displayed on a box plot as shown below. If the user inputs a value in the value column, the check box in that row will get activated. For example, the user may want to display horizontal lines at a background UTL95-95 or some pre-established action level(s) on box plots generated using AOCs data.
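The three ways of handling NDs on Q-Q plots and histograms described in this chapter amount to three different data vectors being plotted. The sketch below prepares those vectors and pairs them with normal quantiles; it is not ProUCL code. The option names `"exclude"`, `"dl"`, and `"half_dl"` are hypothetical labels for the three menu choices, and the plotting positions (i - 0.375)/(n + 0.25) are an assumption.

```python
from statistics import NormalDist


def qq_points(values, detected, nd_option="dl"):
    """Return (normal quantile, ordered value) pairs for a Q-Q plot."""
    if nd_option == "exclude":        # Do not Display Nondetects
        data = [v for v, d in zip(values, detected) if d == 1]
    elif nd_option == "dl":           # Use Reported Detection Limit
        data = list(values)
    elif nd_option == "half_dl":      # Use Detection Limit Divided by 2.0
        data = [v if d == 1 else v / 2.0
                for v, d in zip(values, detected)]
    else:
        raise ValueError("unknown nd_option: " + repr(nd_option))
    data.sort()
    n = len(data)
    # Assumed plotting positions; other conventions are possible.
    quantiles = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return list(zip(quantiles, data))


# Hypothetical data: two nondetects reported at a detection limit of 10.
values = [10, 4, 10, 25, 40]
detected = [0, 1, 0, 1, 1]
only_detects = [y for _, y in qq_points(values, detected, "exclude")]
with_dl = [y for _, y in qq_points(values, detected, "dl")]
with_half_dl = [y for _, y in qq_points(values, detected, "half_dl")]
```

Because the plotted vectors differ, the mean, sd, slope, and intercept shown on the plot differ by option as well.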
92 6.1 Box Plot

1. Click Graphs Box Plot

2. The Select Variables screen (Chapter 3) will appear. Select one or more variable(s) from the Select Variables screen. If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select an appropriate variable representing a group variable as shown below.
93 The default option for Graph by Groups is Group Graphs. This option produces side-by-side box plots for all groups included in the selected Group ID Column (e.g., Zone here). The Group Graphs option is used when multiple graphs categorized by a group variable need to be produced on the same graph. The Individual Graphs option generates individual graphs for each selected variable or one box plot for each group for the variable categorized by a Group ID column (variable).

While generating box plots, one can display horizontal lines at specified screening levels or a BTV estimate (e.g., UTL95-95) computed using a background data set. For data sets with NDs, a horizontal line is also displayed at the largest reported DL associated with a ND value. The use of this option may provide information about the analytical methods used to analyze field samples.

Click on the OK button to continue or on the Cancel button to cancel the Box Plot (or other selected graphical) option.

Box Plot Output Screen (Group Graph) Selected options: Label (Screening Level), Value (12)
6.2 Histogram
1. Click Graphs ► Histogram
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select an appropriate variable representing a group variable as shown below.
When the option button is clicked, for data sets with NDs, the following window will be shown. By default, histograms are generated using the RLs for NDs.
The default selection for histograms (and for all other graphs) by a group variable is Group Graphs. This option produces multiple histograms on the same graph. If histograms need to be displayed individually, the user should check the radio button next to Individual Graphs.
Click on the OK button to continue or on the Cancel button to cancel the Histogram (or other selected graphical) option.
Histogram Output Screen
Selected options: Group Graphs
Notes: ProUCL does not perform any GOF tests when generating histograms. Histograms are generated using the development software ChartFx. The Histogram option automatically generates a normal probability density function (pdf) curve irrespective of the data distribution. At this time, ProUCL 5.0 does not display a pdf curve for any other distribution (e.g., gamma) on a histogram. The user can increase or decrease the number of bins to be used in a histogram.
6.3 Q-Q Plots
1. Click Graphs ► Q-Q Plots. When that option button is clicked, the following window will be shown.
2. Q-Q Plots can be generated for data sets with NDs (With NDs) and without NDs [Full (w/o NDs)].
Select either the Full (w/o NDs) or the With NDs option.
The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable as shown below.
Click on the OK button to continue or on the Cancel button to cancel the selected Q-Q plots option. The following options screen appears, providing choices to treat NDs. The default option is to use the reported values for all NDs.
Click on the OK button to continue or on the Cancel button to cancel the selected Q-Q plots option. The following Q-Q plot appears when used on the copper concentrations of two zones: Alluvial Fan and Basin Trough.
Output Screen for Q-Q plots (With NDs)
Selected options: Group Graph, No Best Fit Line
Note: The font size of ND values is smaller than that of the detected values.
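The construction behind a normal Q-Q plot and its optional best fit line can be sketched as follows. This is an illustrative sketch only: the function name and data are hypothetical, and the Blom plotting positions (i - 0.375)/(n + 0.25) are an assumption, since this guide does not state which plotting positions ProUCL uses. The ordered observations are paired with quantiles of the standard normal distribution, and a least-squares line through those pairs gives a slope, an intercept, and a correlation coefficient.

```python
import math
from statistics import NormalDist, mean

def normal_qq(values):
    """Return (theoretical quantiles, ordered data, slope, intercept, correlation)."""
    y = sorted(values)
    n = len(y)
    std_normal = NormalDist(0.0, 1.0)
    # Blom plotting positions (an assumption, see lead-in).
    x = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                 # least-squares slope of the best fit line
    intercept = my - slope * mx       # least-squares intercept
    r = sxy / math.sqrt(sxx * syy)    # correlation between quantiles and data
    return x, y, slope, intercept, r

# Hypothetical concentrations.
data = [4.2, 5.1, 5.8, 6.3, 6.9, 7.4, 8.0, 9.6]
quantiles, ordered, slope, intercept, r = normal_qq(data)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

Plotting the ordered data against the returned quantiles gives the Q-Q display; points that sit well away from the fitted line are the ones a reader would inspect as potential outliers.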
6.4 Multiple Q-Q Plots
Multiple Q-Q plots (Uncensored data sets)
1. Click Graphs ► Multiple Q-Q Plots
2. Multiple Q-Q Plots can be generated for data sets with NDs (With NDs) and without NDs [Full (w/o NDs)]. When that option button is clicked, the following window will be shown.
Select either Full (w/o NDs) or With NDs.
The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable as shown below.
Click the OK button to continue or the Cancel button to cancel the selected Multiple Q-Q Plots option.
Example 6-1: The following graph is generated by using Fisher's (1936) data set for 3 Iris species.
Output Screen for Multiple Q-Q Plots (Full w/o NDs)
Selected options: Group Graph, Best Fit Line
If the user does not want the regression lines shown above, click on Best Fit Line and all regression lines will disappear as shown below.
Notes: For the Q-Q plots and Multiple Q-Q plots options, for both Full data sets and data sets With NDs, the values along the horizontal axis represent quantiles of a standardized normal distribution (normal distribution with mean = 0 and standard deviation = 1). Quantiles for other distributions (e.g., gamma distribution) are used when using the Goodness-of-Fit (GOF, G.O.F.) test option.
6.5 Multiple Box Plots
Multiple Box plots (Uncensored data sets)
1. Click Graphs ► Multiple Box Plots
2. Multiple Box Plots can be generated for data sets with NDs (With NDs) and without NDs [Full (w/o NDs)]. When the option button is clicked, the following window will be shown.
Select either Full (w/o NDs) or With NDs.
The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable as shown below.
Click on the OK button to continue or on the Cancel button to cancel the selected Multiple Box Plots option.
Example 6-1 (continued): The following graph is generated by using the above options on Fisher's (1936) Iris data set collected from 3 species of Iris flower.
Output Screen for Multiple Box Plots (Full w/o NDs)
Selected options: Group Graph
Chapter 7 Classical Outlier Tests
Outliers are inevitable in data sets originating from environmental and various other applications. In addition to informal graphical displays (e.g., Q-Q plots and box plots) and classical outlier tests (Dixon test, Rosner test), there exist several robust outlier identification methods (e.g., biweight, Huber, PROP, MCD) to identify any number of multiple outliers potentially present in data sets of various sizes (Scout 2008; EPA 2009). It is well known that the classical outlier tests (the Dixon test and the Rosner test) suffer from masking effects (e.g., extreme outliers may mask intermediate outliers). The use of robust outlier identification procedures is recommended to identify multiple outliers, especially when dealing with multivariate (having multiple constituents) data sets. However, those preferred and more effective robust outlier identification methods are beyond the scope of ProUCL 5.0. Several robust outlier identification methods (e.g., based upon biweight, Huber, and PROP influence functions, Singh and Nocerino, 1995) are available in the Scout 2008 v1.0 software package (EPA, 2009).
The two classical outlier tests, the Dixon and Rosner tests (EPA 2006a; Gilbert, 1987), are available in the ProUCL software. These tests can be used on data sets with and without ND observations. These tests also require the assumption of normality of the data set without the outliers. It should be noted that in environmental applications, one of the objectives is to identify high outlying observations that might be present in the right tail of a data distribution, as those observations often represent contaminated locations of a polluted site potentially requiring further investigation. Therefore, for data sets with NDs, two options are available in ProUCL to deal with data sets with outliers. These options are: 1) exclude NDs and 2) replace NDs by DL/2 values. These options are used only to identify outliers and not to compute any estimates and limits used in the decision-making process.
To compute the various statistics of interest, ProUCL uses rigorous statistical methods suited for left-censored data sets with multiple DLs.
It is suggested that the outlier identification procedures be supplemented with graphical displays such as normal Q-Q plots and box plots. On a normal Q-Q plot, observations that are well separated from the bulk (central part) of the data typically represent potential outliers needing further investigation. Also, significant and obvious jumps and breaks in a normal Q-Q plot are indications of the presence of more than one population. Data sets exhibiting such behavior of Q-Q plots should be partitioned into component subpopulations before estimating EPC terms or BTVs. Outlier tests in ProUCL 5.0 are available under the Statistical Tests module.
Dixon's Outlier Test (Extreme Value Test): Dixon's test is used to identify statistical outliers when the sample size is 25 or less. This test identifies outliers or extreme values in the left tail (Case 2) and also in the right tail (Case 1) of a data distribution. In environmental data sets, outliers found in the right tail, potentially representing impacted locations, are of interest. The Dixon test assumes that the data without the suspected outlier(s) are normally distributed. If the user wants to perform a normality test on the data set, the outliers should first be removed before performing the normality test. This test tends to suffer from masking in the presence of multiple outliers. This means that if more than one outlier (in either tail) is suspected, this test may fail to identify all of the outliers.
Rosner Outlier Test: This test can be used to identify up to 10 outliers in data sets of sizes 25 and higher. This test also assumes that the data set without the suspected outliers is normally distributed. As with the Dixon test, if the user wants to perform a normality test on the data set, the outliers (which are not known in advance) should first be removed before performing the normality test.
The detailed discussion of these two tests is given in the associated ProUCL Technical Guide. A couple of examples illustrating the identification of outliers in data sets with NDs are described in the following sections.
7.1 Outlier Test for Full Data Set
1. Click Outlier Tests ► Full (w/o NDs) ► Compute
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If the outlier test needs to be performed by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
If at least one of the selected variables (or groups) has 25 or more observations, then click the option button for the Rosner Test.
ProUCL automatically performs the Dixon test for data sets of sizes […]
The default option for the number of suspected outliers is 1. To use the Rosner test, the user has to obtain an initial guess about the number of suspected outliers that may be present in the data set. This can be done by using graphical displays such as a Q-Q plot. On a Q-Q plot, higher observations that are well separated from the rest of the data may be considered as potential or suspected outliers.
Click on the OK button to continue or on the Cancel button to cancel the Outlier Test.
7.2 Outlier Test for Data Sets with NDs
Two options are available in ProUCL to perform outlier tests on data sets with NDs: exclude NDs, or replace NDs by their respective DL/2 values.
1. Click Outlier Tests ► With NDs ► Exclude NDs
Output Screen for Dixon's Outlier Test
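For readers who want to see what a Dixon-type statistic looks like, the following sketch computes the ratio for the largest observation (the right tail, which is the case of interest for impacted locations). It is an illustration under stated assumptions, not ProUCL's implementation: the function name and data are hypothetical, and the choice of ratio by sample size (r10 for n = 3 to 7, r11 for n = 8 to 10, r21 for n = 11 to 13, r22 for n = 14 to 25) follows the convention commonly tabulated for this test. The critical values, which come from published tables, are not reproduced here.

```python
def dixon_statistic_high(values):
    """Dixon ratio statistic for testing the largest observation as an outlier."""
    y = sorted(values)
    n = len(y)
    if not 3 <= n <= 25:
        raise ValueError("this sketch covers sample sizes 3 through 25 only")
    if n <= 7:        # r10
        num, den = y[-1] - y[-2], y[-1] - y[0]
    elif n <= 10:     # r11
        num, den = y[-1] - y[-2], y[-1] - y[1]
    elif n <= 13:     # r21
        num, den = y[-1] - y[-3], y[-1] - y[1]
    else:             # r22
        num, den = y[-1] - y[-3], y[-1] - y[2]
    if den == 0:
        raise ValueError("statistic is undefined when the compared values are equal")
    return num / den

# Hypothetical detected concentrations (NDs already excluded).
detected = [1.0, 2.0, 3.0, 4.0, 10.0]
print(round(dixon_statistic_high(detected), 4))
```

The computed ratio would then be compared with the tabulated critical value for the sample size and significance level; a ratio above the critical value flags the largest observation as a potential outlier.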
The Q-Q plot without nondetect observations is shown as follows.
Example: Rosner's Outlier Test by a Group Variable, Zone
Selected options: Number of Suspected Outliers = 4; NDs excluded from the Rosner Test; Outlier test performed using the Select Group Column (Optional)
Output Screen for Rosner's Outlier Test for Zinc in Zone: Alluvial Fan
Q-Q plot for Zinc Based upon Detected Data (Alluvial Fan)
Output Screen for Rosner's Outlier Test for Zinc in Zone: Basin Trough
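The sequence of statistics behind a Rosner-type test can be sketched as follows. This is an illustrative sketch of the generalized extreme studentized deviate procedure, not ProUCL's code: the function name and the data are hypothetical. For a user-supplied number of suspected outliers k, the most extreme observation is found, its studentized distance from the mean is recorded, the observation is removed, and the step is repeated. The critical values against which each statistic is compared require t-distribution quantiles and are omitted.

```python
import statistics

def rosner_statistics(values, k):
    """Return [(removed value, R_i)] for i = 1..k suspected outliers."""
    data = list(values)
    if k < 1 or len(data) - k < 2:
        raise ValueError("need k >= 1 and at least 2 observations left after removal")
    results = []
    for _ in range(k):
        m = statistics.mean(data)
        s = statistics.stdev(data)
        # Observation farthest from the current mean.
        extreme = max(data, key=lambda v: abs(v - m))
        results.append((extreme, abs(extreme - m) / s))
        data.remove(extreme)
    return results

# Hypothetical detected concentrations for one group (n = 26).
zinc = [10, 12, 11, 13, 12, 11, 10, 14, 12, 11, 13, 12, 11,
        10, 12, 13, 11, 12, 10, 11, 12, 13, 11, 12, 60, 95]
for value, r_stat in rosner_statistics(zinc, k=2):
    print(value, round(r_stat, 3))
```

Because each statistic is recomputed after removing the previous extreme value, the second statistic is not inflated by the first outlier, which is how the sequential procedure handles more than one suspected outlier.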
Chapter 8 Goodness-of-Fit (GOF) Tests for Uncensored and Left-Censored Data Sets
The GOF tests are available under the Statistical Tests module of ProUCL 5.0. Throughout this User Guide and in the ProUCL 5.0 software, "Full" represents uncensored data sets without ND observations. The details and usage of the various GOF tests are described in the associated ProUCL 5.0 Technical Guide.
8.1 Goodness-of-Fit Tests in ProUCL
Several GOF tests for uncensored full (Full (w/o NDs)) and left-censored (With NDs) data sets are available in the ProUCL software.
Full (w/o NDs)
This option is used on uncensored full data sets without any ND observations. This option can be used to determine GOF for a normal, gamma, or lognormal distribution of the variable(s) selected using the Select Variables option.
Like all other methods in ProUCL, GOF tests can also be performed on variables categorized by a Group ID variable.
Based upon the hypothesized distribution (normal, gamma, lognormal), a Q-Q plot displaying all statistics of interest, including the derived conclusion, is also generated.
The GOF Statistics option generates a detailed output log (Excel-type spreadsheet) showing all GOF test statistics (with derived conclusions) available in ProUCL. This option helps a user to determine the distribution of a data set before generating a GOF Q-Q plot for the hypothesized distribution. This option was included at the request of some users of earlier versions of ProUCL.
With NDs
This option performs GOF tests on data sets consisting of both nondetected and detected data values. Several sub-menu items shown below are available for this option.
1. Exclude NDs: tests for normal, gamma, or lognormal distribution of the selected variable(s) using only the detected values.
2. ROS Estimates: tests for normal, gamma, or lognormal distribution of the selected variable(s) using detected values and imputed nondetects. Three ROS methods for normal, lognormal (Log), and gamma distributions are available. This option imputes the NDs based upon the specified distribution and performs the specified GOF test on the data set consisting of detects and imputed nondetects.
3. DL/2 Estimates: tests for normal, gamma, or lognormal distribution of the selected variable(s) using the detected values and the ND values replaced by their respective DL/2 values. This option is included for historical reasons and also for curious users. ProUCL does not make any recommendations based upon this option.
4. G.O.F. Statistics: As for full uncensored data sets, this option generates an output log of all GOF test statistics available in ProUCL for data sets with nondetects. The conclusions about the data distributions for all selected variables are also displayed on the generated output file (Excel-type spreadsheet).
Multiple variables: When multiple variables are selected from the Select Variables screen, one can use one of the following two options:
The Group Graphs option produces multiple GOF Q-Q plots for all selected variables in a single graph. This option may be used when a selected variable has data coming from two or more groups or populations. The relevant statistics (e.g., slope, intercept, correlation, test statistic, and critical value) associated with the selected variables are shown on the right panel of the GOF Q-Q plot. To capture all the graphs and results shown on the window screen, it is preferable to print the graph using the Landscape option. The user may also want to turn off the Navigation Panel and Log Panel.
The Individual Graphs option is used to generate individual GOF Q-Q plots for each of the selected variables, one variable at a time (or for each group individually of the selected variable categorized by a Group ID). This is the most commonly used option to perform GOF tests for the selected variables.
GOF Q-Q plots for hypothesized distributions: ProUCL computes the relevant test statistic and the associated critical value, and prints them on the associated Q-Q plot (called a GOF Q-Q plot). On this GOF Q-Q plot, the program informs the user if the data are gamma, normally, or lognormally distributed.
For all options described above, ProUCL generates GOF Q-Q plots based upon the hypothesized distribution (normal, gamma, lognormal). All GOF Q-Q plots display several statistics of interest, including the derived conclusion.
The linear pattern displayed by a GOF Q-Q plot suggests an approximate GOF for the selected distribution. The program computes the intercept, slope, and the correlation coefficient for the linear pattern displayed by the Q-Q plot. A high value of the correlation coefficient (e.g., > 0.95) is an indication of a good fit for that distribution, provided that the Q-Q plot also exhibits a definite linear pattern without abrupt jumps.
On a GOF Q-Q plot, observations that are well separated from the majority of the data (central part) typically represent potential outliers needing further investigation.
Significant and obvious jumps, breaks, and curves in a Q-Q plot are indications of the presence of more than one population. Data sets exhibiting such behavior of Q-Q plots should be partitioned into component subpopulations before estimating EPC terms or BTVs. It is recommended that both graphical and formal goodness-of-fit tests be used on the same data set to determine the distribution of the data set under study.
Normality or Lognormality Tests: In addition to informal graphical normal and lognormal Q-Q plots, a formal GOF test is also available to test the normality or lognormality of the data set.
Lilliefors Test: a test typically used for samples of size larger than 50 (> 50).
However, the Lilliefors test (generalized Kolmogorov-Smirnov [KS] test) is available for samples of all sizes. There is no applicable upper limit on sample size for the Lilliefors test.
Shapiro-Wilk (SW, S-W) Test: a test used for samples of size smaller than or equal to 2000 (<= 2000). In ProUCL 5.0, the SW test uses the exact SW critical values for samples of size 50 or less. For samples of size greater than 50, the SW test statistic is displayed along with the p-value of the test (Royston, 1982, 1982a).
Notes: As with other statistical tests, sometimes these two tests might lead to different conclusions. The user is advised to exercise caution when interpreting these test results.
GOF test for Gamma Distribution: In addition to the graphical gamma Q-Q plot, two formal empirical distribution function (EDF) procedures are also available to test the gamma distribution of a data set. These tests are the AD test and the KS test.
It is noted that these two tests might lead to different conclusions. Therefore, the user should exercise caution interpreting the results.
These two tests may be used for samples of sizes in the range of […]. Also, for these two tests, the value (known or estimated) of the shape parameter, k (k hat), should lie in the interval [0.01, 100.0]. Consult the associated ProUCL Technical Guide for a detailed description of the gamma distribution and its parameters, including k. Extrapolation beyond these sample sizes and values of k is not recommended.
Notes: Even though the GOF Statistics option prints out all GOF test statistics for all selected variables, it is suggested that the user also look at the graphical Q-Q plot displays to gain extra insight (e.g., outliers, multiple populations) into the data set.
8.2 Goodness-of-Fit Tests for Uncensored Full Data Sets
1. Click Goodness-of-Fit Tests ► Full (w/o NDs)
2. Select the distribution to be tested: Normal, Lognormal, or Gamma
To test for normality, click on Normal from the drop-down menu list.
To test for lognormality, click on Lognormal from the drop-down menu list.
To test for gamma distribution, click on Gamma from the drop-down menu list.
8.2.1 GOF Tests for Normal and Lognormal Distribution
1. Click Goodness-of-Fit Tests ► Full (w/o NDs) ► Normal or Lognormal
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
When the Option button is clicked, the following window will be shown.
The default option for the Confidence Level is 95%.
The default GOF Method is Shapiro-Wilk.
The default option for Graphs by Group is Group Graphs. If you want to see the plots for all selected variables individually, then check the button next to Individual Graphs.
Click the OK button to continue or the Cancel button to cancel the GOF tests.
Notes: This option for Graphs by Group is specifically provided when the user wants to display multiple graphs for a variable by a group variable (e.g., site AOC1, site AOC2, background). This kind of display represents a useful visual comparison of the values of a variable (e.g., concentrations of COPC-Arsenic) collected from two or more groups (e.g., upgradient wells, monitoring wells, residential wells).
Example 8-1a (Superfund Site Data Continued): The lognormal and normal GOF test results on chromium concentrations are shown in the following figures.
Output Screen for Lognormal Distribution (Full (w/o NDs))
Selected options: Shapiro-Wilk
Output Screen for Normal Distribution (Full (w/o NDs))
Selected options: Shapiro-Wilk, Best Fit Line not Displayed
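Section 8.1 describes the Lilliefors test as a generalized Kolmogorov-Smirnov test. A minimal sketch of the corresponding test statistic is given below; it is not ProUCL's code, and the function name and data are hypothetical. The statistic is the largest vertical distance between the empirical distribution function of the data and a normal distribution whose mean and standard deviation are estimated from the same data. Critical values, which come from Lilliefors-specific tables, are not included. To examine lognormality in the same way, the function is applied to the natural logarithms of the data.

```python
import math
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(values):
    """Largest distance between the empirical CDF and the fitted normal CDF."""
    y = sorted(values)
    n = len(y)
    fitted = NormalDist(mean(y), stdev(y))  # parameters estimated from the data
    d = 0.0
    for i, v in enumerate(y, start=1):
        cdf = fitted.cdf(v)
        # The empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered value.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical concentrations.
conc = [1.2, 1.9, 2.3, 2.8, 3.1, 3.6, 4.4, 5.9, 7.5, 12.0]
print("normal:   ", round(lilliefors_statistic(conc), 4))
print("lognormal:", round(lilliefors_statistic([math.log(v) for v in conc]), 4))
```

A smaller statistic indicates closer agreement with the hypothesized distribution, so comparing the two printed values gives a rough indication of which model fits the hypothetical data better.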
8.2.2 GOF Tests for Gamma Distribution
1. Click Goodness-of-Fit Tests ► Full (w/o NDs) ► Gamma
2. The Select Variables screen (described in Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
When the option button is clicked, the following window will be shown.
The default option for the Confidence Coefficient is 95%.
The default GOF method is Anderson-Darling.
The default option for Graph by Groups is Group Graphs. If you want to see individual graphs, then check the radio button next to Individual Graphs.
Click the OK button to continue or the Cancel button to cancel the GOF tests.
Example 8-1b (Superfund Site Data Continued): The gamma GOF test results for the data set of arsenic concentrations are shown in the following G.O.F. Q-Q plot.
Output Screen for Gamma Distribution (Full (w/o NDs))
Selected options: Anderson-Darling with Best Fit Line
8.3 Goodness-of-Fit Tests Excluding NDs
This option is the most important option for a GOF test based upon data sets with ND observations. Based upon the skewness and distribution of detected data, ProUCL computes appropriate decision statistics (UCLs, UPLs, UTLs, and USLs) which accommodate data skewness. Specifically, depending upon the distribution of detected data, ProUCL uses KM estimates in parametric or nonparametric upper limits computation formulae (UCLs, UTLs) to estimate EPC terms and BTV estimates.
1. Click Goodness-of-Fit Tests ► With NDs ► Exclude NDs
2. Select the distribution to be tested: Normal, Gamma, or Lognormal.
To test for normality, click on Normal from the drop-down menu list.
To test for lognormality, click on Lognormal from the drop-down menu list.
To test for gamma distribution, click on Gamma from the drop-down menu list.
Normal and Lognormal Options
1. Click Goodness-of-Fit Tests ► With NDs ► Exclude NDs ► Normal or Lognormal
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
When the option button (Normal or Lognormal) is clicked, the following window is displayed.
The default option for the Confidence Coefficient is 95%.
The default GOF Method is Shapiro-Wilk.
The default option for Graphs by Group is Group Graphs. If you want to see the plots for all selected variables individually, then check the button next to Individual Graphs.
Click the OK button to continue or the Cancel button to cancel the GOF tests.
Example 8-2a. Consider the arsenic Oahu data set with NDs discussed in the literature (e.g., Helsel, 2012; NADA in R [Helsel, 2013]). The normal and lognormal GOF test results based upon the detected data are shown in the following two figures, respectively.
Output Screen for Normal Distribution (Exclude NDs)
Selected options: Shapiro-Wilk with Best Fit Line
Output Result for Lognormal Distribution (Exclude NDs)
Selected options: Lilliefors Test with Best Fit Line
Gamma Distribution Option
1. Click Goodness-of-Fit Tests ► With NDs ► Exclude NDs ► Gamma
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
When the option button (Gamma) is clicked, the following window is shown.
The default option for the Confidence Coefficient is 95%.
The default GOF test Method is the Anderson-Darling test.
The default option for Graph by Groups is Group Graphs. If you want to display all selected variables on separate graphs, check the button next to Individual Graphs.
Click the OK button to continue or the Cancel button to cancel the GOF tests.
Example 8-2b (continued). Consider the arsenic Oahu data set with NDs as discussed in Example 8-2a above. The gamma GOF test results based upon the detected data are shown in the following GOF Q-Q plot.
Output Screen for Gamma Distribution (Exclude NDs)
Selected options: Kolmogorov-Smirnov Test with Best Fit Line
8.4 Goodness-of-Fit Tests with ROS Methods
1. Click Goodness-of-Fit Tests ► With NDs ► Gamma-ROS Estimates or Log-ROS Estimates
2. Select the distribution to be tested: Normal, Lognormal, or Gamma
To test for normal distribution, click on Normal from the drop-down menu list.
To test for gamma distribution, click on Gamma from the drop-down menu list.
To test for lognormal distribution, click on Lognormal from the drop-down menu list.
Normal or Lognormal Distribution (Log-ROS Estimates)
1. Click Goodness-of-Fit Tests ► With NDs ► Log-ROS Estimates ► Normal, Lognormal
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be produced by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
When the option button (Normal or Lognormal) is clicked, the following window appears.
The default option for the Confidence Coefficient is 95%.
The default GOF test Method is Shapiro-Wilk.
The default option for Graphs by Group is Group Graphs. If you want to display graphs for all selected variables individually, check the button next to Individual Graphs.
Click the OK button to continue or the Cancel button to cancel the GOF tests.
Example 8-2c (continued). Consider the arsenic Oahu data set with NDs considered earlier in this chapter. The lognormal GOF test results on LROS data (detected and imputed LROS NDs) are shown in the following GOF Q-Q plot.
Output Screen for Lognormal Distribution (Log-ROS Estimates)
Selected options: Shapiro-Wilk test with Best Fit Line
Note: The font size of ND values is smaller than that of the detected values.
8.4.2 Gamma Distribution (Gamma-ROS Estimates)
1. Click Goodness-of-Fit Tests ► With NDs ► Gamma-ROS Estimates ► Gamma
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be generated by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
When the option button (Gamma) is clicked, the following window will be shown.
The default option for the Confidence Coefficient is 95%.
The default GOF test Method is Anderson-Darling.
The default option for Graph by Groups is Group Graphs. If you want to generate separate graphs for all selected variables, then check the button next to Individual Graphs.
Click the OK button to continue or the Cancel button to cancel the GOF tests.
Example 8-2d (continued). Consider the arsenic Oahu data set with NDs considered earlier. The gamma GOF test results on GROS data (detected and imputed GROS NDs) are shown in the following GOF Q-Q plot.
Output Screen for Gamma Distribution (Gamma-ROS Estimates)
Selected options: Anderson-Darling
Note: The font size of ND values in the above graph (and in all GOF graphs) is smaller than that of detected values.
8.5 Goodness-of-Fit Tests with DL/2 Estimates
1. Click Goodness-of-Fit Tests ► With NDs ► DL/2 Estimates
2. Select the distribution to be tested: Normal, Gamma, or Lognormal
To test for normality, click on Normal from the drop-down menu list.
To test for lognormality, click on Lognormal from the drop-down menu list.
To test for a gamma distribution, click on Gamma from the drop-down menu list.
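The two simplest ways of preparing a data set with NDs before a GOF test (Exclude NDs and DL/2 Estimates) can be sketched as follows. This is an illustration only: the function names and data are hypothetical, and the sketch assumes the 0/1 indicator convention described in Chapter 9, in which an entry of 1 marks a detected value, so that 0 marks a nondetect reported at its detection limit.

```python
def exclude_nds(values, detect_flags):
    """Keep only the detected values (flag == 1)."""
    if len(values) != len(detect_flags):
        raise ValueError("values and detect_flags must have the same length")
    return [v for v, flag in zip(values, detect_flags) if flag == 1]

def substitute_half_dl(values, detect_flags):
    """Replace each nondetect (flag == 0) by half of its reported detection limit."""
    if len(values) != len(detect_flags):
        raise ValueError("values and detect_flags must have the same length")
    return [v if flag == 1 else v / 2 for v, flag in zip(values, detect_flags)]

# Hypothetical arsenic results; for nondetects the value is the reported DL.
arsenic = [2.0, 0.5, 1.0, 3.4, 0.5]
detected = [1, 0, 0, 1, 0]

print(exclude_nds(arsenic, detected))
print(substitute_half_dl(arsenic, detected))
```

Either resulting list could then be passed to a GOF procedure. As noted in Section 8.1, the DL/2 option is included for historical reasons and ProUCL makes no recommendations based upon it.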
8.5.1 Normal or Lognormal Distribution (DL/2 Estimates)
1. Click Goodness-of-Fit Tests ► With NDs ► DL/2 Estimates ► Normal or Lognormal
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
If graphs have to be generated by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
The rest of the process to determine the distribution (normal, lognormal, and gamma) of the data set thus obtained is the same as described in earlier sections.
8.6 Goodness-of-Fit Test Statistics
The G.O.F. option displays all GOF test statistics available in ProUCL. This option is used when the user does not know which GOF test to use to determine the data distribution. Based upon the information provided by the GOF test results, the user can perform an appropriate GOF test to generate a GOF Q-Q plot based upon the hypothesized distribution. This option is available for uncensored as well as left-censored data sets. Input and output screens associated with the G.O.F. Statistics option for data sets with NDs are summarized as follows.
1. Click Goodness-of-Fit ► With NDs ► G.O.F. Statistics
2. The Select Variables screen (Chapter 3) will appear.
Select one or more variable(s) from the Select Variables screen.
When the option button is clicked, the following window will be shown.
The default confidence level is 95%.
Click the OK button to continue or the Cancel button to cancel the option.
Example 8-2e (continued). Consider the arsenic Oahu data set with NDs discussed earlier. Partial GOF test results, obtained using the G.O.F. Statistics option, are summarized in the following table.
Sample Output Screen for G.O.F. Test Statistics on Data Sets with Nondetect Observations
Chapter 9 Single-Sample and Two-Sample Hypotheses Testing Approaches
This chapter illustrates single-sample and two-sample parametric and nonparametric hypotheses testing approaches as incorporated in the ProUCL software. All hypothesis tests are available under the "Statistical Tests" module of ProUCL 5.0. The ProUCL software can perform these hypotheses tests on data sets with and without ND observations. It should be pointed out that, when one wants to use two-sample hypotheses tests on data sets with NDs, ProUCL 5.0 assumes that both of the samples/groups have ND observations. All this means is that an ND column (with 0 or 1 entries only) needs to be provided for the variable in each of the two samples. This has to be done even if one of the samples (e.g., Site) has all detected entries; in this case the associated ND column will have all entries equal to '1.' This will allow the user to compare two groups (e.g., arsenic in background vs. site samples) with one of the groups having some NDs and the other group having all detected data.
9.1 Single-Sample Hypotheses Tests
In many environmental applications, single-sample hypotheses tests are used to compare site data with pre-specified cleanup standards or compliance limits (CLs). The single-sample hypotheses tests are useful when the environmental parameters such as the cleanup standard (Cs), action level, or CLs are known, and the objective is to compare site concentrations with those known pre-established threshold values. Specifically, a t-test (or a sign test) may be used to verify the attainment of cleanup levels at an AOC after a remediation activity; and a test for proportion may be used to verify if the proportion of exceedances of an action level (or a compliance limit) by sample concentrations collected from an AOC (or a MW) exceeds a certain specified proportion (e.g., 1%, 5%, 10%).
ProUCL 5.0 can perform these hypotheses tests on data sets with and without ND observations.
However, it should be noted that for single-sample hypotheses tests (e.g., sign test, proportion test) used to compare the site mean/median concentration level with a Cs or a CL (e.g., proportion test), all NDs (if any) should lie below the cleanup standard, Cs. For proper use of these hypotheses testing approaches, the differences between these tests should be noted and understood. Specifically, a t-test or a WSR test is used to compare the measures of location and central tendency (e.g., mean, median) of a site area (e.g., AOC) to a cleanup standard, Cs, or action level also representing a measure of central tendency (e.g., mean, median); whereas a proportion test determines whether the proportion of site observations from an AOC exceeding a CL exceeds a specified proportion, P0 (e.g., 5%, 10%). ProUCL 5.0 has graphical methods that may be used to visually compare the concentrations of a site AOC with an action level. This can be done using a box plot of site data with horizontal lines displayed at action levels on the same graph. The details of the various single-sample hypotheses testing approaches are provided in the associated ProUCL Technical Guide.
9.1.1 Single-Sample Hypothesis Testing for Full Data without Nondetects

1. Click Single Sample Hypothesis > Full (w/o NDs)
2. Select Full (w/o NDs)

This option is used for full data sets without nondetects.
To perform a t-test, click on t-test from the drop-down menu as shown above.
To perform a Proportion test, click on Proportion from the drop-down menu.
To run a Sign test, click on Sign test from the drop-down menu.
To run a Wilcoxon Signed Rank (WSR) test, click on Wilcoxon Signed Rank from the drop-down menu.

All single-sample hypothesis tests for uncensored and left-censored data sets can be performed by a group variable. The user selects a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable.
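What running a test "by a group variable" amounts to can be sketched in a few lines of Python. This is only an illustration of the data layout, not ProUCL code; the function name, well labels, and concentrations are hypothetical. One column holds the measurements, a second column holds the group label, and the measurements are split by label before each group is tested separately.

```python
from collections import defaultdict


def split_by_group(values, group_ids):
    """Split one column of measurements into per-group lists."""
    if len(values) != len(group_ids):
        raise ValueError("values and group_ids must have equal length")
    groups = defaultdict(list)
    for value, group in zip(values, group_ids):
        groups[group].append(value)
    return dict(groups)


# Hypothetical concentrations with an associated group column.
concentration = [12.0, 15.5, 9.8, 30.1, 28.4]
well_id = ["MW1", "MW1", "MW1", "MW8", "MW8"]

by_well = split_by_group(concentration, well_id)
```

Group labels may be letters, numbers, or alphanumeric strings; the sketch treats them only as dictionary keys.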
Single-Sample t-test

1. Click Single Sample Hypothesis > Full (w/o NDs) > t-test
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level (a default value is provided).
Specify meaningful values for Substantial Difference, S, and the Action Level. The default choice for S is 0.
Select the form of Null Hypothesis; default is Sample Mean <= Action Level (Form 1).
Click on OK button to continue or on Cancel button to cancel the test.
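For readers who want to see the quantity behind the Form 1 null hypothesis (Sample Mean <= Action Level), the following Python sketch computes the standard textbook one-sample t statistic. It is an illustration under stated assumptions, not ProUCL's implementation: the function name and data are hypothetical, the Substantial Difference S is taken as 0, and no p-value is computed because the Python standard library has no t distribution. The statistic would be compared with an upper critical value of the t distribution with n - 1 degrees of freedom taken from a table or from the software output.

```python
import math
import statistics


def one_sample_t_statistic(data, action_level):
    """t = (sample mean - action level) / (sample sd / sqrt(n))."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return (mean - action_level) / (sd / math.sqrt(n))


# Hypothetical site concentrations compared with an action level of 10.
t_stat = one_sample_t_statistic([10.0, 12.0, 14.0], action_level=10.0)
```

Large positive values of the statistic are evidence against the Form 1 null hypothesis.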
Example 9-1a. Consider the WSR data set described in EPA (2006a). One Sample t-test results are summarized as follows.

Output for Single-Sample t-test (Full Data w/o NDs)

Single-Sample Proportion Test

1. Click Single Sample Hypothesis > Full (w/o NDs) > Proportion
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level (a default value is provided).
Specify the Proportion level and a meaningful Action Level.
Select the form of Null Hypothesis; default is Sample 1 Proportion <= P0 (Form 1).
Click on OK button to continue or on Cancel button to cancel the test.
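The logic of the Form 1 proportion test (null hypothesis: proportion of exceedances <= P0) can be sketched with an exact binomial tail probability. The Python below is a minimal sketch, not ProUCL's code; the function name and the numbers are hypothetical, and whether ProUCL uses an exact or an approximate calculation for a given sample size is not assumed here.

```python
import math


def binomial_upper_tail(n, x, p0):
    """P(X >= x) for X ~ Binomial(n, p0): exact one-sided p-value."""
    return sum(
        math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(x, n + 1)
    )


# Hypothetical case: 3 of 10 observations exceed the action level, P0 = 0.10.
p_value = binomial_upper_tail(n=10, x=3, p0=0.10)
```

A small p-value suggests that the true proportion of exceedances is larger than P0.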
Example 9-1b (continued). Consider the WSR data set described in EPA (2006a). One Sample proportion test results are summarized as follows.

Output for Single-Sample Proportion Test (Full Data without NDs)

Single-Sample Sign Test

1. Click Single Sample Hypothesis > Full (w/o NDs) > Sign test
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level (a default value is provided).
Specify meaningful values for Substantial Difference, S, and Action Level.
Select the form of Null Hypothesis; default is Sample Median <= Action Level (Form 1).
Click on OK button to continue or on Cancel button to cancel the test.

Example 9-1c (continued). Consider the WSR data set described in EPA (2006a). The Sign test results are summarized as follows.

Output for Single-Sample Sign Test (Full Data without NDs)
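The Sign test for the Form 1 null hypothesis (Sample Median <= Action Level) can be sketched as follows. This is the usual textbook version and only an illustration: the function name and data are hypothetical, observations equal to the action level are dropped (an assumption about tie handling), and the exact binomial distribution with probability 0.5 is used for the p-value.

```python
import math


def sign_test_upper(data, action_level):
    """Return (B, n_used, p_value) for H0: median <= action level.

    B counts observations above the action level; ties are dropped.
    """
    above = sum(1 for x in data if x > action_level)
    below = sum(1 for x in data if x < action_level)
    n_used = above + below
    p_value = sum(math.comb(n_used, k) for k in range(above, n_used + 1)) / 2**n_used
    return above, n_used, p_value


# Hypothetical concentrations compared with an action level of 10.
b_stat, n_used, p_value = sign_test_upper([12, 15, 9, 11, 14, 10], 10)
```

Here four of the five non-tied observations lie above the action level, so the p-value is the probability of four or more successes in five fair coin tosses.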
Single-Sample Wilcoxon Signed Rank (WSR) Test

1. Click Single Sample Hypothesis > Full (w/o NDs) > Wilcoxon Signed Rank
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level (a default value is provided).
Specify meaningful values for Substantial Difference, S, and Action Level.
Select the form of Null Hypothesis; default is Mean/Median <= Action Level (Form 1).
Click on OK button to continue or on Cancel button to cancel the test.
Example 9-1d (continued). Consider the WSR data set described in EPA (2006a). One Sample WSR test results are summarized as follows.

Output for Single-Sample Wilcoxon Signed Rank Test (Full Data without NDs)

Single-Sample Hypothesis Testing for Data Sets with Nondetects

Most of the one-sample tests such as the Proportion test and the Sign test on data sets with ND values assume that all ND observations lie below the specified action level, A0. These single-sample tests are not performed if ND observations exceed the action levels. Single-sample hypothesis tests for data sets with NDs are shown in the following ProUCL 5.0 screen shot.

1. Click on Single Sample Hypothesis > With NDs
2. Select the With NDs option

To perform a proportion test, click on Proportion from the drop-down menu.
To perform a sign test, click on Sign test from the drop-down menu.
To perform a Wilcoxon Signed Rank test, click on Wilcoxon Signed Rank from the drop-down menu list.

Single Proportion Test on Data Sets with NDs

1. Click Single Sample Hypothesis > With NDs > Proportion
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
If the hypothesis test has to be performed by using a Group variable, then select a group variable by clicking the arrow below the Select Group Column (Optional) button. This will result in a drop-down list of available variables. The user should select and click on an appropriate variable representing a group variable. This option has been used in the following screen shot for the single-sample proportion test.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level (a default value is provided).
Specify meaningful values for Proportion and the Action Level (=15 here).
Select the form of Null Hypothesis; default is Sample 1 Proportion, P <= P0 (Form 1).
Click on OK button to continue or on Cancel button to cancel the test.

Example 9-2a. Consider the copper and zinc data set collected from two zones: Alluvial Fan and Basin Trough discussed in the literature (Helsel, 2012, NADA in R [Helsel, 2013]). This data set is used here to illustrate the one sample proportion test on a data set with NDs. The output sheet generated by ProUCL 5.0 is described as follows.
Output for Single-Sample Proportion Test (with NDs) by Groups: Alluvial Fan and Basin Trough
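The requirement stated earlier, that all NDs lie below the action level before a proportion or sign test is run, can be expressed as a simple pre-check. The Python sketch below is hypothetical and not ProUCL code; it assumes the 0/1 ND-column convention (1 = detected, 0 = nondetect), assumes that each ND is reported at its detection limit, and treats an ND whose detection limit equals the action level as acceptable, which is an assumption about the boundary case.

```python
def count_exceedances(values, detected_flags, action_level):
    """Count detected values above the action level.

    Raises ValueError if any nondetect has a detection limit above the
    action level, since its status relative to the level is unknown.
    """
    for value, detected in zip(values, detected_flags):
        if detected == 0 and value > action_level:
            raise ValueError("a nondetect exceeds the action level")
    exceed = sum(
        1 for value, detected in zip(values, detected_flags)
        if detected == 1 and value > action_level
    )
    return exceed, len(values)


# Hypothetical data; NDs are reported at their detection limits (5 and 10).
values = [2.0, 20.0, 5.0, 17.0, 10.0]
flags = [1, 1, 0, 1, 0]
exceed, n = count_exceedances(values, flags, action_level=15.0)
```

Once the check passes, every ND counts as a non-exceedance, and the resulting count can be used in a proportion test.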
Single-Sample Sign Test with NDs

1. Click Single Sample Hypothesis > With NDs > Sign test
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level (a default value is provided).
Select an Action Level.
Select the form of Null Hypothesis; default is Sample Median <= Action Level (Form 1).
Click on OK button to continue or on Cancel button to cancel the test.

Example 9-2b (continued). Consider the copper and zinc data set collected from two zones: Alluvial Fan and Basin Trough discussed above. This data set is used here to illustrate the Single-Sample Sign test on a data set with NDs. The output sheet generated by ProUCL 5.0 is described as follows.
Output for Single-Sample Sign Test (Data with Nondetects)

Single-Sample Wilcoxon Signed Rank Test with NDs

1. Click Single Sample Hypothesis > With NDs > Wilcoxon Signed Rank
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level (a default value is provided).
Specify an Action Level.
Select the form of Null Hypothesis; default is Sample Mean/Median <= Action Level (Form 1).
Click on OK button to continue or on Cancel button to cancel the test.

Example 9-2c (continued). Consider the copper and zinc data set collected from two zones: Alluvial Fan and Basin Trough discussed earlier in this chapter. This data set is used here to illustrate the one sample Wilcoxon Signed Rank test on a data set with NDs. The output sheet generated by ProUCL 5.0 is provided as follows.

Output for Single-Sample Wilcoxon Signed Rank Test (Data with Nondetects)
9.2 Two-Sample Hypotheses Testing Approaches

The two-sample hypotheses testing approaches available in ProUCL 5.0 are described in this section. Like Single-Sample Hypothesis, the Two-Sample Hypothesis options are available under the "Statistical Tests" module of ProUCL 5.0. These approaches are used to compare the parameters and distributions of the two populations (e.g., Background vs. AOC) based upon data sets collected from those populations. Both forms (Form 1 and Form 2, and Form 2 with Substantial Difference, S) of the two-sample hypothesis testing approaches are available in ProUCL 5.0. The methods are available for full-uncensored data sets as well as for data sets with ND observations with multiple detection limits.

Full (w/o NDs) performs parametric and nonparametric hypothesis tests on uncensored data sets consisting of all detected values. The following tests are available:
Student's t and Satterthwaite tests to compare the means of two populations (e.g., Background versus AOC).
F-test to check the equality of dispersions of two populations.
Two-sample nonparametric Wilcoxon-Mann-Whitney (WMW) test. This test is equivalent to the Wilcoxon Rank Sum (WRS) test.

With NDs performs hypothesis tests on left-censored data sets consisting of detected and ND values. The following tests are available:
Wilcoxon-Mann-Whitney test. All observations (including detected values) below the highest detection limit are treated as ND (less than the highest DL) values.
Gehan's test is useful when multiple detection limits may be present.
Tarone-Ware test is useful when multiple detection limits may be present.

The details of these methods can be found in the ProUCL 5.0 Technical Guide and are also available in EPA (2002b, 2006a, 2009a, 2009b). It is emphasized that the use of informal graphical displays (e.g., side-by-side box plots, multiple Q-Q plots) should always accompany the formal hypothesis testing approaches listed above. This is especially warranted when data sets may consist of NDs with multiple detection limits and observations from multiple populations (e.g., mixture samples collected from various onsite locations) and outliers.

Notes: As mentioned before, when two-sample hypotheses tests are used on data sets with NDs, ProUCL 5.0 assumes that samples from both of the groups have ND observations. This may not be the case, as data from a polluted site may not have any ND observations. ProUCL can handle such data sets; the user will have to provide an ND column (with 0 or 1 entries only) for the selected variable of each of the two samples/groups. Thus when one of the samples (e.g., site arsenic) has no ND value, the user supplies an associated ND column with all entries equal to '1'. This will allow the user to compare two groups (e.g., arsenic in background vs. site samples) with one of the groups having some NDs and the other group having all detected data.

Two-Sample Hypothesis Tests for Full Data

Full (w/o NDs): This option is used to analyze data sets consisting of all detected values. The following two-sample tests are available in ProUCL 5.0.
Student's t and Satterthwaite tests to compare the means of two populations (e.g., Background versus AOC).
F-test is also available to test the equality of dispersions of two populations.
Two-sample nonparametric Wilcoxon-Mann-Whitney (WMW) test.

Student's t-test

Based upon collected data sets, this test is used to compare the mean concentrations of two populations/groups provided the populations are normally distributed.
The data sets are represented by independent random observations, X1, X2,..., Xn collected from one population (e.g., site), and independent random observations, Y1, Y2,..., Ym collected from another (e.g., background) population. The same terminology is used for all other two-sample tests discussed in the following subsections of this section.

Student's t-test also assumes that the spreads (variances) of the two populations are approximately equal. The F-test can be used to check the equality of dispersions of two populations. A couple of other tests (e.g., Levene, 1960) are also available to compare the variances of two populations. Since the F-test performs fairly well, other tests are not included in the ProUCL software. For more details refer to the ProUCL 5.0 Technical Guide.
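The relationship between the pooled (Student's) t statistic, the Satterthwaite statistic, and the variance ratio used in the F-test can be made concrete with a short calculation. The Python sketch below uses the standard textbook formulas and hypothetical data; it is not ProUCL's implementation, the function name is an assumption, and the F ratio is simply formed as the first sample variance over the second. Critical values and p-values are not computed because the standard library has no t or F distribution.

```python
import math
import statistics


def two_sample_statistics(x, y):
    """Return pooled t, Satterthwaite t, Satterthwaite df, and F ratio."""
    n, m = len(x), len(y)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)

    # Student's t: assumes equal population variances (pooled estimate).
    pooled_var = ((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2)
    t_pooled = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n + 1 / m))

    # Satterthwaite: separate variances with approximate degrees of freedom.
    se2_x, se2_y = var_x / n, var_y / m
    t_satt = (mean_x - mean_y) / math.sqrt(se2_x + se2_y)
    df_satt = (se2_x + se2_y) ** 2 / (
        se2_x**2 / (n - 1) + se2_y**2 / (m - 1)
    )

    f_ratio = var_x / var_y
    return t_pooled, t_satt, df_satt, f_ratio


# Hypothetical site (x) and background (y) concentrations.
site = [1.0, 2.0, 3.0, 4.0, 5.0]
background = [2.0, 4.0, 6.0, 8.0, 10.0]
t_pooled, t_satt, df_satt, f_ratio = two_sample_statistics(site, background)
```

With equal sample sizes the two t statistics coincide, but the Satterthwaite degrees of freedom (about 5.9 here) are smaller than the pooled value of 8, which reflects the unequal variances.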
Satterthwaite t-test

This test is used to compare the means of two populations when the variances of those populations may not be equal. As mentioned before, the F-distribution based test can be used to verify the equality of dispersions of the two populations. However, this test alone is a more powerful test to compare the means of two populations (see the ProUCL 5.0 Technical Guide for further details).

Test for Equality of Two Dispersions (F-test)

This test is used to determine whether the true underlying variances of two populations are equal. Usually the F-test is employed as a preliminary test, before conducting the two-sample t-test for testing the equality of means of two populations. The assumptions underlying the F-test are that the two samples represent independent random samples from two normal populations. The F-test for equality of variances is sensitive to departures from normality.

Two-Sample Nonparametric WMW Test

This test is used to determine the comparability of the two continuous data distributions. This test also assumes that the shapes (e.g., as determined by spread, skewness, and graphical displays) of the two populations are roughly equal. The test is often used to determine if the measures of central location (mean, median) of the two populations are significantly different. The Wilcoxon-Mann-Whitney test does not assume that the data are normally or lognormally distributed. For large samples (e.g., ≥ 20), the distribution of the WMW test statistic can be approximated by a normal distribution.

Notes: The use of the tests listed above is not recommended on log-transformed data sets, especially when the parameters of interest are the population means. In practice, the cleanup and remediation decisions have to be made in the original scale based upon statistics and estimates computed in the original scale. The equality of means in log-scale does not necessarily imply the equality of means in the original scale.

1. Click on Two Sample Hypothesis > Full (w/o NDs)
2. Select the Full (w/o NDs) option

To perform a t-test, click on t Test from the drop-down menu.
To perform a Wilcoxon-Mann-Whitney test, click on Wilcoxon-Mann-Whitney from the drop-down menu list.

Two-Sample t-test without NDs

1. Click on Two Sample Hypothesis > Full (w/o NDs) > t Test
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
Without Group Variable: This option is used when the sampled data of the variable (e.g., lead) for the two populations (e.g., site vs. background) are given in separate columns.
With Group Variable: This option is used when sampled data of the variable (e.g., lead) for the two populations (e.g., site vs. background) are given in the same column. The values are separated into different populations (groups) by the values of an associated Group ID Variable. The group variable may represent several populations (e.g., background, surface, subsurface, silt, clay, sand, several AOCs, MWs). The user can compare two groups at a time by using this option.
When the Group option is used, the user then selects a group variable by using the Group Variable. The user should select an appropriate variable representing a group variable. The user can use letters, numbers, or alphanumeric labels for the group names.
When the Options button is clicked, the following window will be shown.
Specify a useful Substantial Difference, S value. The default choice is 0.
Select the Confidence Coefficient. The default choice is 95%.
Select the form of Null Hypothesis. The default is Sample 1 <= Sample 2 (Form 1).
Click on OK button to continue or on Cancel button to cancel the option.
Click on OK button to continue or on Cancel button to cancel the Sample 1 versus Sample 2 Comparison.

Example 9-3. Consider the manganese concentrations data set collected from three wells: MW1, an upgradient well, and MW8 and MW9, two downgradient wells. The two-sample t-test results comparing Mn concentrations in MW8 vs. MW9 are described as follows.
Output for Two-Sample t-test (Full Data without NDs)
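The earlier note that equality of means in the log scale does not imply equality of means in the original scale can be checked numerically. The Python sketch below uses the known mean of a lognormal distribution, exp(mu + sigma^2 / 2), where mu and sigma are the mean and sd of the log-transformed data. The two parameter sets are hypothetical and chosen only to make the point.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean in the original scale of a lognormal with log-scale mu, sigma."""
    return math.exp(mu + sigma**2 / 2)


# Two hypothetical populations with the same log-scale mean (mu = 1.0)
# but different log-scale standard deviations.
mean_a = lognormal_mean(mu=1.0, sigma=0.5)
mean_b = lognormal_mean(mu=1.0, sigma=1.5)
```

A test on the log-transformed data would find no difference in means (both equal 1.0), yet the original-scale mean of the second population is more than twice that of the first.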
Two-Sample Wilcoxon-Mann-Whitney (WMW) Test without NDs

1. Click on Two Sample Hypothesis Testing > Full (w/o NDs) > Wilcoxon-Mann-Whitney
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
Without Group Variable: This option is used when the data values of the variable (arsenic) are given in separate columns.
With Group Variable: This option is used when data of the variable (arsenic) are given in the same column. The values are separated into different samples (groups) by the values of an associated Group Variable.
When the Group option is used, the user then selects a group variable by using the Group Variable. The user should select an appropriate variable representing a group variable. The user can use letters, numbers, or alphanumeric labels for the group names.

Notes: ProUCL 5.0 has been written using environmental terminology such as performing background versus site comparisons. However, all tests and procedures incorporated in ProUCL 5.0 can be used on data sets from any other application. For other applications such as comparing a new treatment drug versus an older treatment drug, the group variable may represent the two groups: Control Drug and New Drug.

When the Options button is clicked, the following window is shown.
Specify a Substantial Difference, S value. The default choice is 0.
Choose the Confidence Coefficient. The default choice is 95%.
Select the form of Null Hypothesis. The default is Sample 1 <= Sample 2 (Form 1).
Click on OK button to continue or on Cancel button to cancel the selected options.
Click on OK to continue or on Cancel to cancel the Sample 1 vs. Sample 2 comparison.

Example 9-4. The two-sample Wilcoxon-Mann-Whitney (WMW) test results on a data set with ties are summarized as follows.

Output for Two-Sample Wilcoxon-Mann-Whitney Test (Full Data with ties)
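The large-sample normal approximation mentioned for the WMW test, including the handling of ties, can be sketched as follows. This Python code is an illustration using the standard rank-sum formulas with a tie-corrected variance and no continuity correction; the function names and data are hypothetical, and it is not claimed that ProUCL computes the statistic in exactly this way.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wmw_normal_approx(sample1, sample2):
    """Return (W, z, upper-tail p) for the rank sum of sample1."""
    n, m = len(sample1), len(sample2)
    total = n + m
    pooled = list(sample1) + list(sample2)
    w = sum(average_ranks(pooled)[:n])
    mean_w = n * (total + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values()) / (total * (total - 1))
    var_w = n * m / 12 * ((total + 1) - tie_term)
    if var_w == 0:
        raise ValueError("all observations are tied")
    z = (w - mean_w) / math.sqrt(var_w)
    return w, z, 1 - NormalDist().cdf(z)


# Hypothetical small samples, used only to show the arithmetic.
w, z, p_upper = wmw_normal_approx([1, 2, 3], [4, 5, 6])
w_ties, z_ties, _ = wmw_normal_approx([1, 2], [2, 3])
```

The upper-tail p-value corresponds to the Form 1 null hypothesis (Sample 1 <= Sample 2). The samples here are far too small for the approximation to be reliable; they only make the ranks easy to verify by hand.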
9.2.2 Two-Sample Hypothesis Testing for Data Sets with Nondetects

1. Click Two Sample Hypothesis > With NDs
2. Select the With NDs option. A list of available tests will appear (shown above).

To perform a Wilcoxon-Mann-Whitney test, click on Wilcoxon-Mann-Whitney from the drop-down menu list.
To perform a Gehan test, click on Gehan from the drop-down menu.
To perform a Tarone-Ware test, click on Tarone-Ware from the drop-down menu.

Two-Sample Wilcoxon-Mann-Whitney Test with Nondetects

1. Click Two Sample Hypothesis > With NDs > Wilcoxon-Mann-Whitney
2. The Select Variables screen shown below will appear.

Select variable(s) from the Select Variables screen.
Without Group Variable: This option is used when the data values of the variable (e.g., TCDD 2,3,7,8) for the site and the background are given in separate columns.
With Group Variable: This option is used when data values of the variable (TCDD 2,3,7,8) are given in the same column. The values are separated into different samples (groups) by the values of an associated Group Variable. When using this option, the user should select an appropriate variable representing groups such as AOC1, AOC2, AOC3, and so on.
When the Options button is clicked, the following window will be shown.
Choose the Confidence Coefficient. The default choice is 95%.
Select the form of Null Hypothesis. The default is Sample 1 <= Sample 2 (Form 1).
Click on OK button to continue or on Cancel button to cancel the selected options.
Click on OK to continue or on Cancel to cancel the Sample 1 vs. Sample 2 comparison.

Example 9-5. Consider a two-sample data set with nondetects and multiple detection limits. Since the data sets have more than one detection limit, it is not recommended to use the WMW test on this data set. However, users sometimes use the WMW test on data sets with multiple detection limits. The WMW test results are summarized as follows:

Output for Two-Sample Wilcoxon-Mann-Whitney Test (with Nondetects)

Notes: In the WMW test, all observations below the largest detection limit are considered as NDs (potentially including some detected values) and hence they all receive the same average rank. This action tends to reduce the associated power of the WMW test considerably. This in turn may lead to an incorrect conclusion.

Two-Sample Gehan Test for Data Sets with Nondetects

1. Click Two Sample Hypothesis > With NDs > Gehan
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
Without Group Variable: This option is used when the data values of the variable (Zinc) for the two data sets are given in separate columns.
With Group Variable: This option is used when data values of the variable (Zinc) for the two data sets are given in the same column. The values are separated into different samples (groups) by the values of an associated Group Variable. When using this option, the user should select a group variable representing groups/populations such as Zone 1, Zone 2, Zone 3, and so on.
When the Options button is clicked, the following window will be shown.
Choose the Confidence Coefficient. The default choice is 95%.
Select the form of Null Hypothesis. The default is Sample 1 <= Sample 2 (Form 1).
Click on OK button to continue or on Cancel button to cancel selected options.
Click on the OK button to continue or on the Cancel button to cancel the Sample 1 vs. Sample 2 Comparison.

Example 9-6a. Consider the copper and zinc data set collected from two zones: Alluvial Fan and Basin Trough discussed in the literature (Helsel, 2012, NADA in R [2013]). This data set is used here to illustrate the Gehan two-sample test. The output sheet generated by ProUCL 5.0 is described as follows.

Output for Two-Sample Gehan Test (with Nondetects)
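The difference between the WMW treatment of NDs and a Gehan-type comparison can be illustrated with a small sketch. The Python below is hypothetical and is not ProUCL's implementation. It assumes each observation is a (value, detected) pair in which an ND carries its detection limit as the value. The first function applies the recoding described in the WMW note above (everything below the largest detection limit becomes an ND at that limit). The second and third functions compute the sum of Gehan-type pairwise scores, in which a pair contributes +1 or -1 only when the ordering of the two true values is unambiguous. The variance and standardization needed for a p-value are omitted.

```python
def recode_for_wmw(observations):
    """Treat every observation below the largest DL as an ND at that DL."""
    dls = [value for value, detected in observations if not detected]
    if not dls:
        return list(observations)
    max_dl = max(dls)
    return [
        (max_dl, False) if (not detected or value < max_dl) else (value, True)
        for value, detected in observations
    ]


def gehan_score(x, y):
    """+1 if x is surely larger than y, -1 if surely smaller, else 0."""
    (xv, xd), (yv, yd) = x, y
    if xd and yd:
        return (xv > yv) - (xv < yv)
    if xd and not yd:  # y is somewhere below its detection limit yv
        return 1 if xv >= yv else 0
    if yd and not xd:  # x is somewhere below its detection limit xv
        return -1 if yv >= xv else 0
    return 0  # two nondetects cannot be ordered


def gehan_sum(sample1, sample2):
    return sum(gehan_score(x, y) for x in sample1 for y in sample2)


# Hypothetical samples with two different detection limits (1 and 4).
sample1 = [(5.0, True), (1.0, False)]
sample2 = [(3.0, True), (4.0, False)]

recoded = recode_for_wmw(sample1 + sample2)
n_nd_after_recoding = sum(1 for _, detected in recoded if not detected)
score_sum = gehan_sum(sample1, sample2)
```

After recoding, three of the four observations are tied NDs, including the detected value 3.0, whereas the pairwise scoring still uses that detected value to order it against the ND reported at 1. This is one way to see why the WMW test loses power when multiple detection limits are present.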
Two-Sample Tarone-Ware Test for Data Sets with Nondetects

The two-sample Tarone-Ware (TW) test (1978) for data sets with NDs is new in ProUCL 5.0.

1. Click Two Sample Hypothesis Testing > Two Sample > With NDs > Tarone-Ware
2. The Select Variables screen will appear.

Select variable(s) from the Select Variables screen.
Without Group Variable: This option is used when the data values of the variable (Cu) for the two data sets are given in separate columns.
With Group Variable: This option is used when data values of the variable (Cu) for the two data sets are given in the same column. The values are separated into different samples (groups) by the values of an associated Group Variable. When using this option, the user should select a group variable by clicking the arrow next to the Group Variable option for a drop-down list of available variables. The user selects an appropriate group variable representing groups.
When the Options button is clicked, the following window will be shown.
Choose the Confidence Coefficient. The default choice is 95%.
Select the form of Null Hypothesis. The default is Sample 1 <= Sample 2 (Form 1).
Click on OK button to continue or on Cancel button to cancel selected options.
Click on the OK button to continue or on the Cancel button to cancel the Sample 1 vs. Sample 2 Comparison.

Example 9-6b (continued). Consider the copper and zinc data set used earlier. The data set is used here to illustrate the TW two-sample test. The output sheet generated by ProUCL 5.0 is described as follows.

Output for Two-Sample Tarone-Ware Test (with Nondetects)
Chapter 10

Computing Upper Limits to Estimate Background Threshold Values Based Upon Full Uncensored Data Sets and Left-Censored Data Sets with Nondetects

This chapter illustrates the computations of the various parametric and nonparametric statistics and upper limits that can be used as estimates of background threshold values (BTVs) and other not-to-exceed values. The BTV estimation methods are available for data sets with and without nondetect (ND) observations. Technical details about the computation of the various limits can be found in the associated ProUCL 5.0 Technical Guide.

For each selected variable, this option computes various upper limits such as upper prediction limits (UPLs), upper tolerance limits (UTLs), upper simultaneous limits (USLs), and upper percentiles to estimate the BTVs that are used in site versus background evaluations. Two choices for data sets are available to compute background statistics:
Full (w/o NDs) computes background statistics for uncensored full data sets without any ND observation.
With NDs computes background statistics for data sets consisting of detected as well as nondetected observations with multiple detection limits.

The user specifies the confidence coefficient (probability) associated with each interval estimate. ProUCL accepts a confidence coefficient value in the interval [0.5, 1). A default value is provided. For data sets with and without NDs, ProUCL 5.0 can compute the following upper limits to estimate BTVs:
Parametric and nonparametric upper percentiles.
Parametric and nonparametric UPLs for a single observation, future or next k (≥ 1) observations, mean of next k observations. Here future k, or next k observations may represent k observations from another population (e.g., site) different from the sampled (background) population.
Parametric and nonparametric UTLs.
Parametric and nonparametric USLs.
Note on Computing Lower Limits: In many environmental applications (e.g., groundwater monitoring), one needs to compute lower limits including: lower prediction limits (LPLs), lower tolerance limits (LTLs), or lower simultaneous limits (LSLs). At present, ProUCL does not directly compute a LPL, LTL, or a LSL. It should be noted that for data sets with and without nondetects, ProUCL outputs several intermediate results and critical values (e.g., khat, nuhat, K, d2max) needed to compute the interval estimates and lower limits. For data sets with and without nondetects, except for the bootstrap methods, the same critical value (e.g., normal z value, Chebyshev critical value, or t-critical value) can be used to compute a parametric LPL, LSL, or a LTL (for samples of size >30 to be able to use Natrella's approximation in LTL) as used in the computation of a UPL, USL, or a UTL (for samples of size >30).
Specifically, to compute a LPL, LSL, and LTL (n>30), the '+' sign used in the computation of the corresponding UPL, USL, and UTL (n>30) needs to be replaced by the '-' sign in the equations used to compute the UPL, USL, and UTL (n>30). For specific details, the user may want to consult a statistician. For data sets without nondetect observations, the user may want to use the Scout 2008 software package (EPA 2009c) to compute the various parametric and nonparametric LPLs, LTLs (all sample sizes), and LSLs.

Background Statistics for Full Data Sets without Nondetects

1. Click Upper Limits/BTVs > Full (w/o NDs)
2. Select Full (w/o NDs)

To compute the background statistics assuming the normal distribution, click on Normal from the drop-down menu list.
To compute the background statistics assuming the gamma distribution, click on Gamma from the drop-down menu list.
To compute the background statistics assuming the lognormal distribution, click on Lognormal from the drop-down menu list.
To compute the background statistics using distribution-free nonparametric methods, click on Non-Parametric from the drop-down menu list.
To compute and see all background statistics available in ProUCL 5.0, click on the All option from the drop-down menu list. ProUCL will display the data distribution and all parametric and nonparametric background statistics in an Excel-type spreadsheet. The user may use this output sheet to select the most appropriate statistic to estimate a BTV.

Normal or Lognormal Distribution

1. Click Upper Limits/BTVs > Full (w/o NDs) > Normal or Lognormal
2. The Select Variables screen (Chapter 3) will appear.

Select a variable(s) from the Select Variables screen.
To compute BTV estimates by a group variable, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of available variables and select an appropriate group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval [0.5, 1). A default value is provided.
Specify the Coverage coefficient (for a percentile) needed to compute UTLs. Coverage represents a number in the interval (0.0, 1). A default value is provided. Remember, a UTL is an upper confidence limit (e.g., with confidence level = 0.95) for an upper percentile (e.g., the 95th percentile, with coverage = 0.95).
Specify the Different or Future K Observations. The default choice is 1. It is noted that when K = 1, the resulting interval will be a UPL for a single future observation. In the example shown above, a value of K = 1 has been used.
Click on OK button to continue or on Cancel button to cancel this option.
Click on OK to continue or on Cancel button to cancel the Upper Limits/BTVs options.

Example 10-1a. Consider the real data set consisting of concentrations of several metals collected from a Superfund site. Aluminum concentrations follow a normal distribution and manganese concentrations follow a lognormal distribution. The normal and lognormal distribution based estimates of BTVs are summarized in the following two tables.

Aluminum - Output Screen for BTV Estimates Based upon a Normal Distribution (Full - Uncensored Data Set)
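The note on lower limits earlier in this chapter says that a parametric LPL uses the same critical value as the UPL with the '+' sign replaced by '-'. For a normal distribution and a single future observation (K = 1), this can be sketched with the standard textbook prediction-limit formula. The Python below is an illustration, not ProUCL code: the function name and data are hypothetical, and the t critical value (the upper quantile of the t distribution with n - 1 degrees of freedom at the chosen confidence level) must be supplied by the user, for example from a t table.

```python
import math
import statistics


def normal_prediction_limits(data, t_critical):
    """Return (LPL, UPL) for one future observation, normal theory.

    UPL = mean + t * sd * sqrt(1 + 1/n); the LPL flips the sign.
    """
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    half_width = t_critical * sd * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


# Hypothetical background data and a hypothetical critical value of 2.0.
lpl, upl = normal_prediction_limits([10.0, 12.0, 14.0], t_critical=2.0)
```

Each limit is one-sided at the stated confidence level; taken together they do not form a two-sided interval at that same level.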
Manganese - Output Screen for BTV Estimates Based upon a Lognormal Distribution (Full-Uncensored Data Set)

Gamma Distribution

1. Click Upper Limits/BTVs > Full (w/o NDs) > Gamma
2. The Select Variables screen (Chapter 3) will appear.

Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval [0.5, 1). A default value is provided.
Specify the Coverage level; a number in the interval (0.0, 1). A default value is provided.
Specify the Future K. The default choice is 1.
Specify the Number of Bootstrap Operations. A default value is provided.
Click on OK button to continue or on Cancel button to cancel the option.
Click on OK to continue or on Cancel button to cancel the Upper Limits/BTVs options.
Example 10-1b (continued). Manganese concentrations also follow a gamma distribution. The gamma distribution based BTV estimates are summarized in the following table generated by ProUCL 5.0. The Gamma GOF test is shown in the following figure.

Gamma GOF Test for Manganese Data Set

Manganese - Output Screen for BTV Estimates Based Upon a Gamma Distribution (Full-Uncensored Data Set)

The mean manganese concentration is 113.8 with sd = 134.5, and the maximum value = 530. The UTL based upon a lognormal distribution is significantly higher than the largest value of 530. By comparing BTV estimates computed using lognormal and gamma distributions, it is noted that the lognormal distribution based upper limits (UTL and UPL) are significantly higher than those based upon a gamma distribution, confirming the statements made earlier that the use of a lognormal distribution tends to yield inflated values of the upper limits used to estimate environmental parameters (e.g., BTVs, EPCs). The comparison table lists the UTL and UPL for the lognormal, Gamma (WH), and Gamma (HW) methods, alongside the mean (113.8) and the maximum value (530).

Nonparametric Methods

1. Click Upper Limits/BTVs > Full (w/o NDs) > Non-Parametric
2. The Select Variables screen (Chapter 3) will appear.

Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval [0.5, 1). A default value is provided.
Specify the Coverage level; a number in the interval (0.0, 1). A default value is provided.
Specify the Number of Bootstrap Operations. A default value is provided.
Click on the OK button to continue or on the Cancel button to cancel the option.
Click OK button to continue or Cancel button to cancel the Upper Limits/BTVs options.

Example. Lead concentrations collected from the same Superfund site as used in Example 10-1 do not follow a discernible distribution. Nonparametric BTV estimates are summarized as follows.

Lead - Output Screen for Nonparametric BTVs Estimates (Full-Uncensored Data Set)

To compute nonparametric upper limits providing the specified coverage (e.g., 0.95), sizes of the data sets should be fairly large (e.g., > 59). For details, consult the associated ProUCL 5.0 Technical Guide. In this example the sample size is only 24, and the confidence coefficient (CC) achieved by the nonparametric UTL is only 0.71, which is significantly lower than the desired CC.
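The achieved confidence coefficient of 0.71 for n = 24 and the sample size of about 59 both follow from a standard order-statistics result, which the short Python sketch below reproduces. It assumes that the nonparametric UTL is taken as the largest observation in the sample, in which case the probability that the maximum of n independent observations exceeds the p-th quantile is 1 - p^n. The function names are hypothetical and this is not ProUCL code.

```python
def achieved_confidence(n, coverage):
    """Confidence that the sample maximum exceeds the coverage quantile."""
    return 1 - coverage**n


def min_sample_size(coverage, confidence):
    """Smallest n for which the sample maximum achieves the confidence."""
    n = 1
    while achieved_confidence(n, coverage) < confidence:
        n += 1
    return n


cc_24 = achieved_confidence(24, 0.95)   # sample size used in the example
n_needed = min_sample_size(0.95, 0.95)  # size needed for a 95%-95% UTL
```

With n = 24 and coverage 0.95 the achieved confidence is about 0.71, matching the value reported in the example, and 59 is the smallest sample size for which the maximum provides 95% confidence with 95% coverage.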
All Statistics Option

1. Click Upper Limits/BTVs > Full (w/o NDs) > All
2. The Select Variables screen (Chapter 3) will appear.

Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval [0.5, 1). A default value is provided.
Specify the Coverage level; a number in the interval (0.0, 1). Default is 0.9.
Specify the Future K. The default choice is 1.
Specify the Number of Bootstrap Operations. A default value is provided.
Click on OK button to continue or on Cancel button to cancel the option.
Click on OK to continue or on Cancel button to cancel the Upper Limits/BTVs options.
Example 10-1c (continued). The various BTV estimates based upon the manganese concentrations computed using the All option of ProUCL are summarized as follows. The All option computes and displays all available parametric and nonparametric BTV estimates. This option also informs the user about the distribution(s) of the data set. This option is specifically useful when one has to process many analytes (variables) without any knowledge about their probability distributions.

Manganese - Output Screen for All BTVs Estimates (Full-Uncensored Data Set)
167 10.2 Background Statistics with NDs
1. Click Upper Limits/BTVs ► With NDs
2. Select the With NDs option.
To compute the background statistics assuming the normal distribution, click on Normal from the drop-down menu list.
To compute the background statistics assuming the gamma distribution, click on Gamma from the drop-down menu list.
To compute the background statistics assuming the lognormal distribution, click on Lognormal from the drop-down menu list.
To compute the background statistics using distribution-free methods, click on Non-Parametric from the drop-down menu list.
To compute all available background statistics in ProUCL 5.0, click on the All option from the drop-down menu list.
168 Normal or Lognormal Distribution
1. Click Upper Limits/BTVs ► With NDs ► Normal or Lognormal
2. The Select Variables screen (Chapter 3) will appear.
Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval (0.5, 1), 0.5 inclusive. The default choice is [...]
Specify the Coverage level; a number in the interval (0.0, 1). Default choice is [...]
Specify the Future K. The default choice is 1.
Specify the Number of Bootstrap Operations. The default choice is [...]
Click on the OK button to continue or on the Cancel button to cancel the option.
Click on OK to continue or on the Cancel button to cancel the Upper Limits/BTVs options.
169 Example 10-3a. Consider a small real TCE data set of size n=12 consisting of 4 ND observations. The detected data set of size 8 follows a normal as well as a lognormal distribution. The BTV estimates using the LROS method, normal and lognormal distributions on KM estimates, and nonparametric Chebyshev inequality and bootstrap methods on KM estimates are summarized in the following two tables. Upper limits including UTL95-95 and UPL95 based upon the robust LROS method yield much higher values than the other methods, including those that use KM estimates in normal and lognormal equations to compute the upper limits. The detected data also follow a gamma distribution. The gamma distribution (of detected data) based BTV estimates are described in the next section.

TCE - Output Screen for BTV Estimates Computed Using Normal Distribution of Detected Data (Left-Censored Data Set with NDs)
170 Output Screen for BTV Estimates Computed Using a Lognormal Distribution of Detected Data (Left-Censored Data Set with NDs)
171 Gamma Distribution
1. Click Upper Limits/BTVs ► With NDs ► Gamma
2. The Select Variables screen (Chapter 3) will appear.
Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval (0.5, 1), 0.5 inclusive. The default choice is [...]
Specify the Coverage level; a number in the interval (0.0, 1). Default choice is [...]
Click on the OK button to continue or on the Cancel button to cancel the option.
Click on OK to continue or on the Cancel button to cancel the Upper Limits/BTVs options.
172 Example 10-3b (continued). The detected TCE data considered in Example 10-3 also follow a gamma distribution. The gamma distribution based upper limits are summarized as follows.

TCE - Output Screen for BTV Estimates Computed Using Gamma Distribution of Detected Data (Left-Censored Data Set with NDs)
173 The detected data set does not follow a normal distribution based upon the SW test, but follows a normal distribution based upon the Lilliefors test. Since the detected data set is of small size (n=8), the normal GOF conclusion is suspect. The detected data follow a gamma distribution. Several NDs are reported with a low detection limit of 0.68; the GROS method may therefore yield infeasible negative imputed values. For this reason, the use of a gamma distribution on KM estimates is preferred to compute the various BTV estimates. The gamma KM UTL95-95 (HW) = 11.34, and gamma KM UTL95-95 (WH) = [...] Either of these two limits can be used to estimate the BTV.

Nonparametric Methods (with NDs)
1. Click Upper Limits/BTVs ► With NDs ► Non-Parametric
2. The Select Variables screen (Chapter 3) will appear.
Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
174 Specify the Confidence Level; a number in the interval (0.5, 1), 0.5 inclusive. The default choice is [...]
Specify the Coverage level; a number in the interval (0.0, 1). Default choice is [...]
Click on the OK button to continue or on the Cancel button to cancel the option.
Click on OK to continue or on the Cancel button to cancel the Upper Limits/BTVs option.

Example 10-3c (continued). The nonparametric upper limits based upon the TCE data considered in Example 10-3 are summarized in the following table.

TCE - Output Screen for Nonparametric BTV Estimates (Left-Censored Data Set with NDs)
175 All Statistics Option
1. Click Upper Limits/BTVs ► With NDs ► All
2. The Select Variables screen (Chapter 3) will appear.
Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval (0.5, 1), 0.5 inclusive. The default choice is [...]
Specify the Coverage level; a number in the interval (0.0, 1). Default choice is [...]
Specify the Future K. The default choice is 1.
Click on the OK button to continue or on the Cancel button to cancel the option.
Click on OK to continue or on the Cancel button to cancel the Upper Limits/BTVs option.
176 Example 10-3d (continued). BTV estimates using the All option for the TCE data are summarized as follows. The detected data set is of small size (n=8) and follows a gamma distribution. The gamma GOF Q-Q plot based upon detected data is shown in the following figure. The relevant statistics have been highlighted in the output table provided after the gamma GOF Q-Q plot.

TCE - Output Screen for All BTV Estimates (Left-Censored Data Set with NDs)
177 Note: Even though the data set failed the Shapiro-Wilk test of normality, it was concluded based upon the Lilliefors test that the data set follows a normal distribution. Therefore, instead of stating that the data set does not follow a normal distribution, ProUCL reports that the data set follows an approximate normal distribution. In practice the two tests can lead to different conclusions, especially when the data set is of small size. In such instances, it is suggested that the user supplement the test results with graphical displays to derive the final conclusion.
178 As noted, detected data follow a gamma as well as a lognormal distribution. The various upper limits using Gamma ROS and Lognormal ROS methods and Gamma and Lognormal distributions on KM estimates are summarized as follows.

Summary of Upper Limits Computed using Gamma and Lognormal Distributions of Detected Data
Sample Size = 12, No. of NDs = 4, % NDs = 33.33, Max Detect = 9.29
Upper Limits | Gamma Distribution: Result, Reference/Method of Calculation | Lognormal Distribution: Result, Reference/Method of Calculation
Mean (KM) Logged Mean (ROS) UPL95 (ROS) 9.79 WH ProUCL (ROS) UTL95-95 (ROS) WH ProUCL (ROS) UPL95 (KM) 6.88 UTL95-95 (KM) WH - ProUCL (KM-Gamma) WH - ProUCL (KM-Gamma) Helsel (2012), EPA (2009) LROS Helsel (2012), EPA (2009) LROS 7.06 KM-Lognormal EPA (2009) KM-Lognormal EPA (2009)

The statistics summarized above demonstrate the merits of using the gamma distribution based upper limits to estimate decision parameters (BTVs) of interest. The results summarized in the above tables suggest that the use of a gamma distribution cannot be dismissed just because it is easier to use a lognormal distribution to model skewed data sets.
180 Chapter 11 Computing Upper Confidence Limits (UCLs) of Mean Based Upon Full-Uncensored Data Sets and Left-Censored Data Sets with Nondetects

Several parametric and nonparametric UCL methods for full-uncensored and left-censored data sets consisting of ND observations with multiple detection limits (DLs) are available in ProUCL 5.0. Methods such as the Kaplan-Meier (KM) and regression on order statistics (ROS) methods incorporated in ProUCL can handle multiple detection limits. For details regarding the goodness-of-fit tests and UCL computation methods available in ProUCL, consult the ProUCL 5.0 Technical Guide; Singh, Singh, and Engelhardt (1997); Singh, Singh, and Iaci (2002); and Singh, Maichle, and Lee (USEPA, 2006).

In ProUCL 5.0, two choices are available to compute UCL statistics:
Full (w/o NDs): Computes UCLs for full-uncensored data sets without any nondetects.
With NDs: Computes UCLs for data sets consisting of ND observations with multiple DLs or reporting limits (RLs).

For full data sets without NDs and also for data sets with NDs, the following options and choices are available to compute UCLs of the population mean.
The user specifies a confidence level; a number in the interval (0.5, 1), 0.5 inclusive. The default choice is [...]
The program computes several nonparametric UCLs using the CLT, adjusted CLT, Chebyshev inequality, jackknife, and bootstrap resampling methods.
For the bootstrap method, the user can select the number of bootstrap runs (resamples). The default choice for the number of bootstrap runs is [...]
The user is responsible for selecting an appropriate choice for the data distribution: normal, gamma, lognormal, or nonparametric. It is desirable that the user determine the data distribution using the Goodness-of-Fit test option prior to using the UCL option. The UCL output sheet also informs the user if the data are normal, gamma, lognormal, or follow a non-discernible distribution.
The program computes statistics depending on the user selection.
For data sets that are not normal, one may try the gamma UCL next. The program will offer advice if the wrong UCL option is chosen.
For data sets that are neither normal nor gamma, one may try the lognormal UCL. The program will offer advice if the wrong UCL option is chosen.
181 Data sets that are not normal, gamma, or lognormal are classified as distribution-free nonparametric data sets. The user may use the nonparametric UCL option for such data sets. The program will offer advice if the wrong UCL option is chosen.
The program also provides the All option. By selecting this option, ProUCL outputs most of the relevant UCLs available in ProUCL. The program informs the user about the distribution of the underlying data set, and offers advice regarding the use of an appropriate UCL.
For lognormal data sets, ProUCL can compute 90%, 95%, 97.5%, and 99% Land's statistic based H-UCL of the mean. For all other methods, ProUCL can compute a UCL for any confidence coefficient (CC) in the interval (0.5, 1.0), 0.5 inclusive.
If a distribution has been selected, ProUCL will provide a recommended UCL method for a 0.95 confidence level. Even though ProUCL can compute UCLs for any confidence coefficient level in the interval (0.5, 1.0), the recommendations are provided only for the 95% UCL, as the EPC term is estimated by a 95% UCL of the mean.

Notes: As with all other methods, it is recommended that the user identify the few low probability outlying observations (coming from extreme tails) that may be present in the data set. Outliers distort statistics of interest including summary statistics, data distributions, test statistics, UCLs and BTVs. Decisions based upon distorted statistics may be misleading and incorrect. The objective is to compute decision statistics based upon the majority of the data set representing the main dominant population. The project team should decide about the disposition (to include or not to include) of outliers before computing estimates of the EPC terms and BTVs. To determine the influence of outliers on UCLs and background statistics, the project team may want to compute statistics twice: once using the data set with outliers, and once using the data set without outliers.

Note on Computing Lower Confidence Limits (LCLs) of Mean: In several environmental applications, one needs to compute a LCL of the population mean.
At present, ProUCL does not directly compute LCLs of the mean. For data sets with and without nondetects, except for the bootstrap methods, the gamma distribution (e.g., samples of sizes <50), and the H-statistic based LCL of the mean, the same critical value (e.g., normal z value, Chebyshev critical value, or t-critical value) is used to compute a LCL of the mean as is used in the computation of the UCL of the mean. Specifically, to compute a LCL, the '+' sign used in the computation of the corresponding UCL needs to be replaced by the '-' sign in the equation used to compute that UCL (excluding gamma, lognormal H-statistic, and bootstrap methods). For specific details, the user may want to consult a statistician. For data sets without nondetect observations, the user may want to use the Scout 2008 software package (EPA 2009c) to directly compute the various parametric and nonparametric LCLs of the mean.
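The sign change described in this note can be illustrated for two of the symmetric methods. The following is a minimal sketch, assuming the CLT-based limit (mean ± z·s/√n) and the Chebyshev inequality limit (mean ± √(1/α − 1)·s/√n); the function names and the data values are hypothetical, and the code is not ProUCL's implementation.

```python
import math
from statistics import NormalDist, mean, stdev


def clt_limits(data, cc=0.95):
    """Return (LCL, UCL) of the mean using the normal z critical value."""
    se = stdev(data) / math.sqrt(len(data))
    z = NormalDist().inv_cdf(cc)
    m = mean(data)
    # Same critical value for both limits; only the sign differs
    return m - z * se, m + z * se


def chebyshev_limits(data, cc=0.95):
    """Return (LCL, UCL) of the mean using the Chebyshev critical value."""
    se = stdev(data) / math.sqrt(len(data))
    k = math.sqrt(1.0 / (1.0 - cc) - 1.0)
    m = mean(data)
    return m - k * se, m + k * se


# Hypothetical concentrations, for illustration only
data = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.6]
print(clt_limits(data))
print(chebyshev_limits(data))
```

As the note states, this shortcut does not apply to the gamma, lognormal H-statistic, or bootstrap methods, whose lower and upper critical values differ.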
182 11.1 UCLs for Full (w/o NDs) Data Sets

Normal Distribution (Full Data Sets without NDs)
1. Click UCLs/EPCs ► Full (w/o NDs) ► Normal
2. The Select Variables screen (Chapter 3) will appear.
Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of available variables, and select a group variable.
When the Option button is clicked, the following window will be shown.
183 Specify the Confidence Level; a number in the interval (0.5, 1), 0.5 inclusive. The default choice is [...]
Click on the OK button to continue or on the Cancel button to cancel the option.
Click on OK to continue or on Cancel to cancel the UCL computation option.

Example. Consider the real data set consisting of concentrations of several metals collected from a Superfund site; vanadium concentrations follow a normal distribution. The normal distribution based 95% UCLs of the mean are summarized in the following table.

Vanadium - Output Screen for Normal Distribution (Full Data w/o NDs)

Gamma, Lognormal, Nonparametric, All Statistics Option (Full Data without NDs)
1. Click UCLs/EPCs ► Full (w/o NDs) ► Gamma, Lognormal, Non-Parametric, or All
184 2. The Select Variables screen (Chapter 3) will appear.
Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of available variables, and select a proper group variable.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval (0.5, 1), 0.5 inclusive.
Specify the Number of Bootstrap Operations (runs). Default choice is [...]
Click on the OK button to continue or on the Cancel button to cancel the UCLs option.
Click on OK to continue or on Cancel to cancel the selected UCL computation option.

Example 11-2: This skewed data set of size n=25 with mean=44.09 was used in Chapter 2 of the Technical Guide. The data follow a lognormal and a gamma distribution. The data are: [...] UCLs based upon the Gamma, Lognormal, Nonparametric, and All options are summarized in the following tables.
185 Output Screen for Gamma Distribution Based UCLs (Full (w/o NDs))
186 Output Screen for Lognormal Distribution Based UCLs (Full (w/o NDs))
Output Screen for Nonparametric UCLs (Full (w/o NDs))
187 Output Screen for All Statistics Option (Full [w/o NDs])
188 Notes: Once again, the statistics summarized above demonstrate the merits of using the gamma distribution based UCL of the mean to estimate EPC terms. The use of a lognormal distribution tends to yield unrealistic UCLs of no practical merit (e.g., Lognormal UCL = [...] and the maximum = [...] in the above example). The results summarized in the above tables suggest that the use of a gamma distribution (when a data set follows a gamma distribution) cannot be dismissed just because it is easier (Helsel and Gilroy, 2012) to use a lognormal distribution to model skewed data sets.
189 The number of valid samples represents the total number of samples minus (-) the missing values (if any). The number of unique or distinct samples simply represents the number of distinct observations. The information about the number of distinct values is useful when using bootstrap methods. Specifically, it is not desirable to use bootstrap methods on data sets with only a few distinct values.

UCL for Left-Censored Data Sets with NDs
1. Click UCLs/EPCs ► With NDs
2. Choose the Normal, Gamma, Lognormal, Non-Parametric, or All option.
3. The Select Variables screen (Chapter 3) will appear.
Select a variable(s) from the Select Variables screen.
If needed, select a group variable by clicking the arrow below the Select Group Column (Optional) to obtain a drop-down list of available variables, and select a proper group variable. The selection of this option will compute the relevant statistics separately for each group that may be present in the data set.
When the Option button is clicked, the following window will be shown.
Specify the Confidence Level; a number in the interval (0.5, 1), 0.5 inclusive. The default choice is [...]
Specify the Number of Bootstrap Operations (runs). Default choice is [...]
Click on the OK button to continue or on the Cancel button to cancel the UCLs option.
Click on OK to continue or on Cancel to cancel the selected UCL computation option.
190 Example. This real data set of size n=55 with 18.18% NDs (=10) is also used in Chapters 4 and 5 of the ProUCL Technical Guide. The minimum detected value is 5.2 and the largest detected value is 79000; the sd of the detected logged data is 2.79, suggesting that the data set is highly skewed. The detected data follow a gamma as well as a lognormal distribution. The GROS data set with imputed values follows a gamma distribution and the LROS data set with imputed values follows a lognormal distribution (results not included). The lognormal Q-Q plot based upon detected data is shown in the following figure. The various UCL output sheets (normal, nonparametric, gamma, and lognormal) generated by ProUCL are summarized in tables following the lognormal Q-Q plot on detected data. The main results have been highlighted in the output screen provided after the lognormal GOF Q-Q plot.

Output Screen for UCLs based upon Normal, Lognormal, and Gamma Distributions (of Detects)
191 GROS Statistics using imputed NDs

Detected data follow a gamma as well as a lognormal distribution. The various upper limits using Gamma ROS and Lognormal ROS methods and Gamma and Lognormal distributions on KM estimates are summarized in the following table.
192 Upper Confidence Limits Computed using Gamma and Lognormal Distributions of Detected Data
Sample Size = 55, No. of NDs = 10, % NDs = 18.18%
Upper Limits | Gamma Distribution: Result, Reference/Method of Calculation | Lognormal Distribution: Result, Reference/Method of Calculation
Min (detects) logged Max (detects) logged Mean (KM) logged Mean (ROS) UCL95 (ROS) ProUCL 5.0 GROS UCL (KM) ProUCL KM-Gamma bootstrap-t on LROS, ProUCL 5.0 percentile bootstrap on LROS, Helsel (2012) H-UCL, KM mean and sd on logged data, EPA (2009)

The results summarized in the above table reiterate that the use of a gamma distribution cannot be dismissed just because it is easier to use a lognormal distribution to model skewed data sets. These results also demonstrate that for skewed data sets, one should use bootstrap methods which adjust for data skewness (e.g., the bootstrap-t method) rather than the percentile bootstrap method.
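The difference between the percentile bootstrap and the bootstrap-t method can be sketched for a full data set without NDs. This is an illustration under stated assumptions, not ProUCL's algorithm: the percentile UCL is taken as the upper quantile of the resampled means, and the bootstrap-t UCL as mean − t*(α)·s/√n, where t*(α) is the lower α quantile of the studentized resampled means. The function name, the data values, the seed, and the number of runs (1000) are hypothetical choices.

```python
import math
import random
from statistics import mean, stdev


def bootstrap_ucls(data, cc=0.95, runs=1000, seed=1):
    """Return (percentile UCL, bootstrap-t UCL) of the mean."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    n = len(data)
    xbar, s = mean(data), stdev(data)
    means, t_stats = [], []
    for _ in range(runs):
        resample = [rng.choice(data) for _ in range(n)]
        m_star, s_star = mean(resample), stdev(resample)
        means.append(m_star)
        if s_star > 0:                 # skip degenerate resamples
            t_stats.append((m_star - xbar) / (s_star / math.sqrt(n)))
    if not t_stats:
        raise ValueError("all resamples were degenerate")
    means.sort()
    t_stats.sort()
    # Percentile method: upper quantile of the resampled means
    pct_ucl = means[min(len(means) - 1, int(cc * len(means)))]
    # Bootstrap-t: lower alpha quantile of the studentized statistic
    t_low = t_stats[int((1.0 - cc) * len(t_stats))]
    boot_t_ucl = xbar - t_low * s / math.sqrt(n)
    return pct_ucl, boot_t_ucl


# Hypothetical right-skewed data, for illustration only
data = [1.2, 0.8, 2.5, 1.1, 0.9, 15.0, 3.3, 1.7, 0.6, 7.4]
print(bootstrap_ucls(data))
```

For right-skewed data the bootstrap-t limit is typically the larger of the two, because the studentized statistic has a long lower tail; this is the skewness adjustment the text refers to.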
194 Chapter 12 Sample Sizes Based Upon User Specified Data Quality Objectives (DQOs) and Power Assessment

One of the most frequent problems in the application of statistical theory to practical applications, including environmental projects, is to determine the minimum number of samples needed for sampling of reference/background areas and survey units (e.g., potentially impacted site areas, areas of concern, decision units) to make cost-effective and defensible decisions about the population parameters based upon the sampled discrete data. The sample size determination formulae for estimation of the population mean (or some other parameters) depend upon certain decision parameters including the confidence coefficient, (1 - α), and the specified error margin (difference), Δ, from the unknown true population mean, µ. Similarly, for hypothesis testing approaches, sample size determination formulae depend upon pre-specified values of the decision parameters selected while describing the data quality objectives (DQOs) associated with an environmental project. The decision parameters associated with hypothesis testing approaches include the Type I (false positive error rate, α) and Type II (false negative error rate, β = 1 - power) error rates, and the allowable width, Δ, of the gray region. For values of the parameter of interest (e.g., mean, proportion) lying in the gray region, the consequences of committing the two types of errors described above are not significant from both the human health and cost-effectiveness points of view.

Both parametric (assuming normality) and nonparametric (distribution free) sample size determination formulae as described in guidance documents (e.g., MARSSIM 2000; EPA [2002c, 2006a]) have been incorporated in the ProUCL software. Specifically, the DQOs Based Sample Sizes module of ProUCL can be used to determine sample sizes to estimate the mean, perform parametric and nonparametric single-sample and two-sample hypothesis tests, and apply acceptance sampling approaches to address project needs of the various CERCLA and RCRA site projects.
The details can be found in Chapter 8 of the ProUCL Technical Guide and in EPA guidance documents (EPA [2006a, 2006b]).

New in ProUCL 5.0: The Sample Size module in ProUCL 5.0 can be used at two different stages of a project. Most of the sample size formulae require some estimate of the population standard deviation (variability). Depending upon the project stage, a standard deviation: 1) represents a preliminary estimate of the population (e.g., study area) variability needed to compute the minimum sample size during the planning and design stage; or 2) represents the sample standard deviation computed using the data collected without considering the DQOs process, which is used to assess the power of the test based upon the collected data. During the power assessment stage, if the computed sample size is larger than the size of the already collected data set, it can be inferred that the size of the collected data set is not large enough to achieve the desired power. The formulae to compute the sample sizes during the planning stage and after performing a statistical test are the same, except that the estimates of standard deviations are computed/estimated differently.

Planning stage before collecting data: Sample size formulae are commonly used during the planning stage of a project to determine the minimum sample sizes needed to address project objectives (estimation, hypothesis testing) with specified values of the decision parameters (e.g., Type I and II errors, width of gray region). During the planning stage, since the data have not yet been collected, a preliminary rough estimate of the population standard deviation (to be expected in sampled data) is obtained from other similar sites, pilot studies, or expert opinions. An estimate of the expected standard deviation along with the specified values of the other decision parameters are used to compute the minimum sample sizes needed to address the project objectives during the sampling planning stage; the project team is expected
195 to collect the number of samples thus obtained. The detailed discussion of the sample size determination approaches during the planning stage can be found in EPA [2006a] and MARSSIM [2000].

Power assessment stage after performing a statistical method: Often, in practice, environmental samples/data sets are collected without taking the DQOs process into consideration. Under this scenario, the project team performs statistical tests on the available already collected data set. However, once a statistical test (e.g., WMW test) has been performed, the project team can assess the power associated with the test in retrospect. That is, for specified DQOs and decision errors (Type I error and power of the test [= 1 - Type II error]) and using the sample standard deviation computed based upon the already collected data, the minimum sample size needed to perform the test for specified values of the decision parameters is computed. If the computed sample size obtained using the sample variance is less than the size of the already collected data set used to perform the test, it may be determined that the power of the test has been achieved. However, if the sample size of the collected data is less than the minimum sample size computed in retrospect, the user may want to collect additional samples to assure that the test achieves the desired power. It should be pointed out that there could be differences in the sample sizes computed in the two different stages due to the differences in the values of the estimated variability. Specifically, the preliminary estimate of the variance computed using information from similar sites could be significantly different from the variance computed using the available data already collected from the study area under investigation, which will yield different values of the sample size.

Sample size determination methods in ProUCL can be used for both stages. The only difference will be in the input value of the standard deviation/variance. It is the user's responsibility to input a correct value for the standard deviation during the two stages.
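The retrospective power assessment described above amounts to recomputing the minimum sample size with the sample standard deviation and comparing it with the number of samples already collected. The sketch below assumes the normal-approximation formula for a single-sample t-test commonly given in EPA guidance, n = s²(z(1-α) + z(1-β))²/Δ² + 0.5·z(1-α)²; whether ProUCL uses exactly this expression is an assumption, and the function names and data values are hypothetical.

```python
import math
from statistics import NormalDist, stdev


def min_n_single_sample_t(alpha, beta, sd, delta):
    """Minimum n for a single-sample t-test (assumed approximation)."""
    z_a = NormalDist().inv_cdf(1.0 - alpha)
    z_b = NormalDist().inv_cdf(1.0 - beta)
    return math.ceil(sd ** 2 * (z_a + z_b) ** 2 / delta ** 2 + 0.5 * z_a ** 2)


def assess_power(data, alpha, beta, delta):
    """Return (required n, True if the collected data set is large enough)."""
    required = min_n_single_sample_t(alpha, beta, stdev(data), delta)
    return required, len(data) >= required


# Hypothetical already-collected measurements, for illustration only
collected = [12, 15, 9, 20, 14, 11, 17, 13]
print(assess_power(collected, alpha=0.05, beta=0.1, delta=2))    # (28, False)
print(assess_power(collected, alpha=0.05, beta=0.1, delta=10))   # (3, True)
```

In the planning stage the same function would be called with a preliminary standard deviation from similar sites or pilot studies instead of the sample standard deviation.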
196 12.1 Estimation of Mean
1. Click Stats/Sample Sizes ► DQOs Based Sample Sizes ► Estimate Mean
2. The following options window is shown.
Specify the Confidence Coefficient. Default is [...]
Specify the Estimate of standard deviation. Default is 3.
Specify the Allowable Error Margin in Mean Estimate. Default is 10.
Click on the OK button to continue or on the Cancel button to cancel the options.

Output Screen for Sample Sizes for Estimation of Mean (CC = 95%, sd = 25, Error Margin = 10)
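A rough check of the sample size for estimating the mean can be made with the two-sided normal approximation n = (z(1-α/2)·sd/Δ)². This is a sketch under that assumption; ProUCL may apply a refinement (for example a t-based adjustment), so its reported value can differ. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_to_estimate_mean(cc, sd, error_margin):
    """Minimum n so that the mean is within error_margin at confidence cc."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - cc) / 2.0)
    return math.ceil((z * sd / error_margin) ** 2)


# Inputs quoted in the output screen caption above
print(n_to_estimate_mean(cc=0.95, sd=25, error_margin=10))   # 25 under this approximation
```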
197 12.2 Sample Sizes for Single-Sample Hypothesis Tests

Sample Size for Single-Sample t-Test
1. Click DQOs Based Sample Sizes ► Hypothesis Tests ► Single Sample Tests ► t Test
2. The following options window is shown.
Specify the False Rejection Rate (Alpha). Default is [...]
Specify the False Acceptance Rate (Beta). Default is 0.1.
Specify the Estimate of standard deviation. Default is 3.
Specify the Width of the Gray Region (Delta). Default is 2.
Click on the OK button to continue or on the Cancel button to cancel the options.
198 Output Screen for Sample Sizes for Single-Sample t-Test (α = 0.05, β = 0.2, sd = 10.41, Δ = 10)
Example from EPA 2006a (page 49)

Sample Size for Single-Sample Proportion Test
1. Click DQOs Based Sample Sizes ► Hypothesis Tests ► Single Sample Tests ► Proportion
2. The following options window is shown.
199 Specify the False Rejection Rate (Alpha). Default is [...]
Specify the False Acceptance Rate (Beta). Default is 0.1.
Specify the Desirable Proportion (P0). Default is 0.3.
Specify the Width of the Gray Region (Delta). Default is [...]
Click on the OK button to continue or on the Cancel button to cancel the options.

Output Screen for Sample Size for Single-Sample Proportion Test (α = 0.05, β = 0.2, P0 = 0.2, Δ = 0.05)
Example from EPA 2006a (page 59)

Sample Size for Single-Sample Sign Test
1. Click DQOs Based Sample Sizes ► Hypothesis Tests ► Single Sample Tests ► Sign Test
200 2. The following options window is shown.
Specify the False Rejection Rate (Alpha). Default is [...]
Specify the False Acceptance Rate (Beta). Default is 0.1.
Specify the Estimate of standard deviation. Default is 3.
Specify the Width of the Gray Region (Delta). Default is 2.
Click on the OK button to continue or on the Cancel button to cancel the options.

Output Screen for Sample Sizes for Single-Sample Sign Test (Default Options)
201 Sample Size for Single-Sample Wilcoxon Signed Rank Test
1. Click DQOs Based Sample Sizes ► Hypothesis Tests ► Single Sample Tests ► Wilcoxon Signed Rank
2. The following options window is shown.
Specify the False Rejection Rate (Alpha). Default is [...]
Specify the False Acceptance Rate (Beta). Default is 0.1.
Specify the Estimate of standard deviation of WSR Test Statistic. Default is 3.
Specify the Width of the Gray Region (Delta). Default is 2.
Click on the OK button to continue or on the Cancel button to cancel the options.
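The options listed above map onto a short sample size calculation. The sketch below assumes the approximation given in EPA guidance for the Wilcoxon Signed Rank test, in which the single-sample t-test sample size is inflated by a factor of 1.16; that ProUCL uses this exact expression is an assumption, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def min_n_wsr(alpha, beta, sd, delta):
    """Minimum n for the single-sample WSR test (assumed approximation)."""
    z_a = NormalDist().inv_cdf(1.0 - alpha)
    z_b = NormalDist().inv_cdf(1.0 - beta)
    n_t = sd ** 2 * (z_a + z_b) ** 2 / delta ** 2 + 0.5 * z_a ** 2
    # 1.16 is the inflation factor assumed for the rank-based test
    return math.ceil(1.16 * n_t)


# Inputs of the EPA 2006a example quoted for this test
print(min_n_wsr(alpha=0.1, beta=0.2, sd=130, delta=100))   # 10
```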
202 Output Screen for Sample Sizes for Single-Sample WSR Test (α = 0.1, β = 0.2, sd = 130, Δ = 100)
Example from EPA 2006a (page 65)

12.3 Sample Sizes for Two-Sample Hypothesis Tests

Sample Size for Two-Sample t-Test
1. Click DQOs Based Sample Sizes ► Hypothesis Tests ► Two Sample Tests ► t Test
2. The following options window is shown.
203 Specify the False Rejection Rate (Alpha). Default is [...]
Specify the False Acceptance Rate (Beta). Default is 0.1.
Specify the Estimate of standard deviation. Default is 3.
Specify the Width of the Gray Region (Delta). Default is 2.
Click on the OK button to continue or on the Cancel button to cancel the options.

Output Screen for Sample Sizes for Two-Sample t-Test (α = 0.05, β = 0.2, s_p = 1.467, Δ = 2.5)
Example from EPA 2006a (page 68)

Sample Size for Two-Sample Wilcoxon-Mann-Whitney Test
1. Click DQOs Based Sample Sizes ► Hypothesis Tests ► Two Sample Tests ► Wilcoxon-Mann-Whitney
204 2. The following options window is shown.
Specify the False Rejection Rate (Alpha). Default is [...]
Specify the False Acceptance Rate (Beta). Default is 0.1.
Specify the Estimate of standard deviation of WMW Test Statistic. Default is 3.
Specify the Width of the Gray Region (Delta). Default is 2.
Click on the OK button to continue or on the Cancel button to cancel the options.

Output Screen for Sample Sizes for Two-Sample WMW Test (Default Options)
205 12.4 Sample Sizes for Acceptance Sampling
1. Click DQOs Based Sample Sizes ► Acceptance Sampling
2. The following options window is shown.
Specify the Confidence Coefficient. Default is [...]
Specify the Proportion [P] of nonconforming items/drums. Default is [...]
Specify the Number of Allowable nonconforming items/drums. Default is 0.
Click on the OK button to continue or on the Cancel button to cancel the options.

Output Screen for Sample Sizes for Acceptance Sampling (Default Options)
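The three inputs above can be related to a sample size through the binomial distribution. The sketch below is one possible interpretation, not ProUCL's documented algorithm: it searches for the smallest n such that observing no more than the allowable number of nonconforming items would be unlikely (probability at most 1 - CC) if the true nonconforming proportion were P. The function names and the search cap are hypothetical.

```python
import math


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(c + 1))


def acceptance_sample_size(cc, p, allowable=0, n_max=100000):
    """Smallest n with P(X <= allowable | n, p) <= 1 - cc."""
    for n in range(allowable + 1, n_max + 1):
        if binom_cdf(allowable, n, p) <= 1.0 - cc:
            return n
    raise ValueError("no sample size found up to n_max")


# Illustrative inputs: CC = 0.95, P = 0.05
print(acceptance_sample_size(0.95, 0.05, allowable=0))   # 59
print(acceptance_sample_size(0.95, 0.05, allowable=1))   # 93
```

Allowing more nonconforming items increases the required sample size, since more evidence is needed to support the same confidence statement.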
207 Chapter 13 Analysis of Variance

One-way Analysis of Variance (ANOVA) is a statistical technique that is used to compare the measures of central tendency (means or medians) of more than two populations/groups. One-way ANOVA is often used to perform inter-well comparisons in groundwater monitoring projects. Classical One-way ANOVA is a generalization of the two-sample t-test (Hogg and Craig, 1995); and nonparametric ANOVA, the Kruskal-Wallis test (Hollander and Wolfe, 1999), is a generalization of the two-sample Wilcoxon-Mann-Whitney test. Theoretical details of One-way ANOVA are given in the ProUCL Technical Guide. One-way ANOVA is available under the Statistical Tests module of ProUCL 5.0. It is advised to use these tests on raw data in the original scale without transforming the data (e.g., using a log-transformation).

Classical One-way ANOVA
1. Click One-way ANOVA ► Classical
The data file used should follow the format as shown below; the data file should consist of a group variable defining the various groups (stacked data) to be evaluated using the One-way ANOVA module. The One-way ANOVA module can process multiple variables simultaneously.
208 2. The Select Variables screen will appear.
Select the variables for testing.
Select a Group variable by using the arrow under the Group Column option.
Click OK to continue or Cancel to cancel the test.

Example 13-1a. Consider Fisher's (1936) 3 species (groups) Iris flower data set. Fisher collected data on sepal length, sepal width, petal length and petal width for each of the 3 species. One-way ANOVA results with conclusions for the variable sepal-width (sp-width) are shown as follows:

Output for a Classical One-way ANOVA
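The stacked data layout and the classical F statistic can be illustrated with a few lines of code. This is a minimal sketch of the textbook one-way ANOVA computation, not ProUCL's implementation; the group labels and values are hypothetical and are not taken from the Iris data set.

```python
from collections import defaultdict
from statistics import mean


def one_way_anova_f(rows):
    """rows: stacked (group, value) pairs. Returns (F, df_between, df_within)."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    all_values = [v for _, v in rows]
    grand_mean = mean(all_values)
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2
                    for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical stacked data: one group column, one measurement column
rows = [("A", 1), ("A", 2), ("A", 3),
        ("B", 2), ("B", 3), ("B", 4),
        ("C", 5), ("C", 6), ("C", 7)]
print(one_way_anova_f(rows))   # F is 13 with (2, 6) degrees of freedom
```

The F statistic would then be compared with the critical value of the F distribution with the returned degrees of freedom to reach a conclusion.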
209 13.2 Nonparametric ANOVA

Nonparametric One-way ANOVA or the Kruskal-Wallis (K-W) test is a generalization of the Mann-Whitney two-sample test. This is a nonparametric test and can be used when data from the various groups are not normally distributed.
1. Click One-way ANOVA ► Nonparametric
Like classical One-way ANOVA, nonparametric ANOVA also requires that the data file used should follow the data format as shown above; the data file should consist of a group variable defining the various groups to be evaluated using the One-way ANOVA module.
2. The Select Variables screen will appear.
Select the variables for testing.
Select the Group variable.
Click OK to continue or Cancel to cancel the test.

Example 13-1b (continued). Nonparametric One-way ANOVA results with conclusion for sepal-length (sp-length) are shown as follows.

Output for a Nonparametric ANOVA
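The rank-based nature of the Kruskal-Wallis test can be seen in a short computation of its H statistic. The sketch below uses the textbook formula H = 12/(N(N+1))·Σ(R_i²/n_i) − 3(N+1) with average ranks for ties and without a tie correction factor; it is an illustration, not ProUCL's implementation, and the data values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic for a list of groups (no tie correction applied)."""
    values = [x for g in groups for x in g]
    ranks = average_ranks(values)
    n_total = len(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical groups, for illustration only
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # approximately 7.2
```

For moderately large groups, H is usually compared with a chi-square critical value with (number of groups − 1) degrees of freedom.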
211 Chapter 14 Ordinary Least Squares of Regression and Trend Analysis

The OLS of regression and trend tests are often used to determine trends potentially present in constituent concentrations at polluted sites, especially in groundwater (GW) monitoring applications. The OLS regression and two nonparametric trend tests, the Mann-Kendall test and the Theil-Sen test, are available under the Statistical Tests module of ProUCL 5.0. The details of these tests can be found in Hollander and Wolfe (1999) and Draper and Smith (1998). Some time series plots, which are useful in comparing trends in analyte concentrations of multiple groups (e.g., monitoring wells), are also available in ProUCL 5.0.

The two nonparametric trend tests, the M-K test and the Theil-Sen test, are meant to identify trends in time series data (data collected over a certain period of time such as daily, monthly, quarterly,...) with distinct values of the time variable (time of sampling events). If multiple observations are collected/reported at a sampling event (time), one or more pairwise slopes used in the computation of the Theil-Sen test cannot be computed (they become infinite). Therefore, it is suggested to use the Theil-Sen test on data sets with one measurement collected at each sampling event. If multiple measurements are collected at a sampling event, the user may want to use the average (or median, mode, minimum or maximum) of those measurements, resulting in a time series with one measurement per sampling time event. The Theil-Sen test in ProUCL 5.0 has an option which can be used to average multiple observations reported for the various sampling events. The use of this option also computes the M-K test statistic and OLS statistics based upon averages of multiple observations collected at the various sampling events.

The trend tests in ProUCL software also assume that the user has entered data in chronological order. If the data are not entered properly in chronological order, the graphical trend displays may be meaningless.
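The averaging of multiple observations per sampling event, the Mann-Kendall S statistic, and the Theil-Sen slope can be sketched as follows. This is an illustration of the textbook definitions (S as the sum of signs of all pairwise differences, the Theil-Sen slope as the median of all pairwise slopes), not ProUCL's implementation; the function names and data values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def average_by_event(times, values):
    """Average repeated observations so each sampling time appears once."""
    grouped = defaultdict(list)
    for t, v in zip(times, values):
        grouped[t].append(v)
    ordered_times = sorted(grouped)            # chronological order
    return ordered_times, [mean(grouped[t]) for t in ordered_times]


def mann_kendall_s(values):
    """Sum of the signs of all pairwise differences (later minus earlier)."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


def theil_sen_slope(times, values):
    """Median of all pairwise slopes; requires distinct sampling times."""
    slopes = [(values[j] - values[i]) / (times[j] - times[i])
              for i in range(len(times) - 1)
              for j in range(i + 1, len(times))]
    return median(slopes)


# Hypothetical series with two observations at the first sampling event
times = [1, 1, 2, 3, 4]
conc = [9, 11, 8, 7, 4]
t, y = average_by_event(times, conc)
print(mann_kendall_s(y))       # -6
print(theil_sen_slope(t, y))   # -2.0
```

Without the averaging step, the pair of observations at time 1 would produce a division by zero in the pairwise slope, which is the situation the text warns about.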
The Trend Analysis and OLS Regression modules handle missing values in both the response variable (e.g., analyte concentrations) and the sampling event variable (called the independent variable in OLS).

14.1 Simple Linear Regression

1. Click Statistical Tests ► OLS Regression.
212 2. The Select Regression Variables screen will appear.
Select the Dependent Variable and the Independent Variable for the regression analysis.
Select a group variable (if any) by using the arrow below the Select Group Column (Optional). The analysis will be performed separately for each group.
When the Options button is clicked, the following options window will appear.
Select Display Intervals to display the confidence limits and the prediction limits of each observation at the specified Confidence Coefficient. The interval estimates will be displayed in the output sheet.
Select Display Regression Table to display Y-hat, residuals, and the standardized residuals in the output sheet.
Select XY Plot to generate a scatter plot display showing the regression line.
213 Select Confidence Interval and Prediction Interval to display the confidence and the prediction bands around the regression line.
Click the OK button to continue or the Cancel button to cancel the option.
Click OK to continue or Cancel to cancel the OLS Regression.

The above options display the following graph on your computer screen. The graph can be copied using Copy Chart (To Clipboard) and pasted into a Microsoft document (e.g., a Word document) using the File ► Paste combination. The above options also generate an Excel-type output sheet. A partial output sheet is shown below, following the OLS Regression Graph.

Example 14-1a. Consider analyte concentrations, X, collected from a groundwater (GW) monitoring well, MW28, over a certain period of time. The objective is to determine whether there is any trend in the GW concentrations, X, of MW28. The OLS regression line, with inference about the slope and intercept, is shown in the following figure. The slope and its associated p-value suggest that there is a significant downward trend in GW concentrations of MW28.

OLS Regression Graph without Regression and Prediction Intervals
214 OLS Regression Graph with Regression and Prediction Intervals

Partial Output of OLS Regression Analysis
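The quantities reported on the OLS output sheet (slope, intercept, Y-hat, residuals) follow from the standard simple linear regression formulas. The Python sketch below is an illustration only; the function name and dictionary keys are hypothetical, and the standardized residual is taken here as the residual divided by the residual standard error, which is one simple convention and may differ from the definition ProUCL uses.

```python
from math import sqrt


def ols_fit(x, y):
    """Simple linear regression of y on x by ordinary least squares."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error with n - 2 degrees of freedom.
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    se_slope = s / sqrt(sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "fitted": fitted,
        "residuals": residuals,
        # Assumption: standardized residual = residual / s.
        "standardized": [r / s if s > 0 else 0.0 for r in residuals],
        # t statistic for testing slope = 0 (n - 2 degrees of freedom).
        "t_slope": slope / se_slope if se_slope > 0 else float("nan"),
    }
```

A negative slope with a large negative `t_slope` corresponds to the kind of downward trend reported in the example; the p-value itself requires a Student's t distribution, which is left out of this sketch.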
215 Verifying Normality of Residuals: As shown in the above partial output, ProUCL displays residuals, including standardized residuals, on the OLS output sheet. Those residuals can be copied and pasted into an Excel file to assess their normality. The parametric trend evaluations based upon the OLS slope (significant slope, confidence interval, and prediction interval) are valid provided the OLS residuals are normally distributed. Therefore, it is suggested that the user assess the normality of the OLS residuals before drawing trend conclusions using a parametric test based upon the OLS slope estimate. When the assumptions are not met, one can use graphical displays and nonparametric trend tests to determine potential trends present in a time series data set.

14.2 Mann-Kendall Test

1. Click Statistical Tests ► Trend Analysis ► Mann-Kendall.

2. The Select Trend Event Variables screen will appear.
Select the Event/Time variable. This variable is optional for performing the Mann-Kendall (MK) test; however, for graphical display it is suggested to provide a valid Event/Time variable (numerical values only). If the user wants to generate a graphical display without providing an Event/Time variable, ProUCL generates an index variable to represent sampling events.
216 Select the Values/Measured Data variable to perform the trend test.
Select a group variable (if any) by using the arrow below the Select Group Column (Optional). When a group variable is chosen, the analysis is performed separately for each group represented by the group variable.
When the Options button is clicked, the following window will be shown.
Specify the Confidence Level: a number in the interval [0.5, 1), that is, 0.5 inclusive. A default value is provided.
Select the trend lines to be displayed: OLS Regression Line and/or Theil-Sen Trend Line. If only Display Graphics is chosen, a time series plot will be generated.
Click the OK button to continue or the Cancel button to cancel the option.
Click OK to continue or Cancel to cancel the Mann-Kendall test.
217 Example 14-1b (continued). The MK test results are shown in the following figure and in the following MK test output sheet. Based upon the MK test, it is concluded that there is a statistically significant downward trend in GW concentrations of MW28.

Mann-Kendall Test Trend Graph Displaying All Selected Options

Mann-Kendall Trend Test Output Sheet
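As a rough cross-check of an MK result, the S statistic and its large-sample normal approximation can be sketched in Python as below. This is an illustration under stated assumptions, not ProUCL's implementation: the function names are hypothetical, the variance formula omits the correction for tied values, and the normal approximation is usually reserved for larger samples (small samples are typically assessed with exact tables). The data are assumed to be in chronological order.

```python
from math import erf, sqrt


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mann_kendall(values):
    """Return (S, Z, one-sided p-value for a downward trend)."""
    n = len(values)
    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = sum(
        sign(values[j] - values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Assumption: variance of S without the tie correction.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    # Standard normal CDF at z: a small value supports a downward trend.
    p_down = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return s, z, p_down
```

A downward trend is indicated when S is negative and `p_down` is below one minus the chosen confidence level.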
218 14.3 Theil-Sen Test

To perform the Theil-Sen test, the user is required to provide numerical values for a sampling event variable as well as values of a characteristic of interest (e.g., analyte concentrations) observed at those sampling events.

1. Click Statistical Tests ► Trend Analysis ► Theil-Sen.

2. The Select Variables screen will appear.
Select an Event/Time Data variable.
Select the Values/Measured Data variable to perform the test.
Select a group variable (if any) by using the arrow below the Select Group Column (Optional). When a group variable is chosen, the analysis is performed separately for each group represented by the group variable.
When the Options button is clicked, the following window will be shown.
219 Specify the Confidence Level: a number in the interval [0.5, 1), that is, 0.5 inclusive. A default value is provided.
Select the trend lines to be displayed: OLS Regression Line and/or Theil-Sen Trend Line.
Click the OK button to continue or the Cancel button to cancel the option.
Click OK to continue or Cancel to cancel the Theil-Sen test.

Example 14-1c (continued). The Theil-Sen test results are shown in the following figure and in the following Theil-Sen test output sheet. It is concluded that there is a statistically significant downward trend in GW concentrations of MW28.

Theil-Sen Test Trend Graph Displaying All Selected Options
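The Theil-Sen trend line is built from the median of all pairwise slopes, which is why repeated sampling times cause trouble. The Python sketch below illustrates the idea; it is not ProUCL's code. The function name is hypothetical, pairs with equal times are skipped here rather than averaged, and the intercept uses the common convention median(y) - slope * median(t), which is an assumption.

```python
from statistics import median


def theil_sen(times, values):
    """Return (slope, intercept) of a Theil-Sen trend line."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(times) - 1)
        for j in range(i + 1, len(times))
        if times[j] != times[i]  # equal times give an undefined slope
    ]
    if not slopes:
        raise ValueError("need at least two distinct sampling times")
    slope = median(slopes)
    # Assumption: intercept passes through the medians of t and y.
    intercept = median(values) - slope * median(times)
    return slope, intercept
```

Because the slope is a median, a single outlying measurement has little effect on it, in contrast to the OLS slope.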
220 Theil-Sen Trend Test Output Sheet

Notes: As with other statistical tests, the trend test statistics (MK test statistic, OLS regression slope, and Theil-Sen slope) may lead to different trend conclusions. In such instances, it is suggested that the user supplement statistical conclusions with graphical displays.

Averaging of Multiple Measurements at Sampling Events: In practice, when multiple observations are collected or reported at one or more sampling events (times), one or more pairwise slopes may become infinite, resulting in a failure to compute the Theil-Sen test statistic. In such cases, the user may want to preprocess the data before using the Theil-Sen test. Specifically, to ensure that only one measurement is available at each sampling event, the user preprocesses the time series data by computing the average, median, mode, minimum, or maximum of the multiple observations collected at those sampling events. The Theil-Sen test in ProUCL 5.0 provides the option of averaging multiple measurements collected at the various sampling events. This option also computes the MK test and OLS regression statistics using the averages of the multiple measurements collected at the various sampling events. The OLS regression and the MK test can be performed on data sets with multiple measurements taken at the various sampling events. However, it is often desirable to use the averages (or medians) of measurements taken at the various sampling events to determine potential trends present in a time series data set.

14.4 Time Series Plots

This option of the Trend Analysis module can be used to determine and compare trends in multiple groups over the same period of time. This option is specifically useful when the user wants to compare the concentrations of multiple groups (wells) and the exact sampling event dates are not available (Data Only option). The user may just want to graphically compare the time series data collected from multiple groups/wells during several quarters (every year, every 5 years, ...).
When the user wants to use this module with the Event/Data option, each group (e.g., well) defined by a group variable must have the same number of observations and should share the same sampling event values. That is, the number of sampling events and their values (e.g., quarter ID, year ID) for each group (well) must be the same for this option to work. However, the exact sampling dates (not needed to use this option) in the various quarters (years) do not have to be the same, as long as the values of the sampling quarters/years (1, 3, 5, 6, 7, 9, ...) used in generating time series plots for the various groups (wells) match. Using the geological and hydrological information, this kind of comparison may help the project team in identifying noncompliance wells (e.g., with upward trends in constituent concentrations) and associated reasons.

221 1. Click Statistical Tests ► Trend Analysis ► Time Series Plots.

2. When the Data Only option is clicked, the following window is shown:
This option is used on the measured data only. The user selects a variable with measured values, which are used in generating a time series plot. The time series plot option is specifically useful when data come from multiple groups (e.g., monitoring wells sampled during the same period of time).
Select a group variable (if any) by using the arrow shown below the Group Column (Optional).
222 When the Options button is clicked, the following window will be shown.
The user can choose to display graphs individually or together for all groups on the same graph by selecting the Group Graphs option. The user can also display the OLS line and/or the Theil-Sen line for all groups displayed on the same graph. The user may pick an initial starting value and an increment value to display the measured data. All statistics will be computed using the data displayed on the graphs (e.g., selected Event values).
Input a starting value for the index of the plot using Set Initial Start Value.
Input the increment steps for the index of the plot using Set Index/Event Increments.
Specify the lines (Regression and/or Theil-Sen) to be displayed on the time series plot.
Select the Plot Graphs Together option to compare the time series trends for more than one group on the same graph. If this option is not selected but a Group Variable is selected, a separate graph will be plotted for each group.
Click the OK button to continue or the Cancel button to cancel the Time Series Plot.
223 3. When the Event/Data option is clicked, the following window is shown:
Select a group variable (if any) by using the arrow shown below the Group Column (Optional).
This option uses both the Measured Data and the Event/Time Data. The user selects two variables: one representing the Event/Time variable and the other representing the Measured Data values, which will be used in generating a time series plot.
When the Options button is clicked, the following window will be shown.
The user can choose to display graphs individually or together for all groups on the same graph by selecting the Plot Graphs Together option. The user can also display the OLS line and/or the Theil-Sen line for all groups displayed on the same graph.
224 Specify the lines (Regression and/or Theil-Sen) to be displayed on the time series plot.
Select the Plot Graphs Together option to compare time series trends for more than one group on the same graph. If this option is not selected but a Group Variable is selected, a separate graph will be plotted for each group.
Click the OK button to continue or the Cancel button to cancel the options.
Click OK to continue or Cancel to cancel the Time Series Plot.

Notes: To use this option, each group (e.g., well) defined by a group variable must have the same number of observations and should share the same sampling event values (if available). That is, the sampling events (e.g., quarter ID, year ID) for each group (well) must be the same for this option to work. Specifically, the exact sampling dates within the various quarters (years) do not have to be the same, as long as the sampling quarters (years) for the various wells match.

Example. The following graph has three (3) time series plots comparing manganese concentrations of three GW monitoring wells (1 upgradient well (MW1) and 2 downgradient wells (MW8 and MW9)) over a period of 4 years (data collected quarterly). Some trend statistics are displayed in the side panel.

Output for a Time Series Plot Event/Data Option by a Group Variable (1, 8, and 9)
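Before using the Event/Data option, the requirement in the Notes above (every group has the same number of observations and the same sampling event values) can be checked with a short script. The Python sketch below is an illustration only; the function name is hypothetical and the inputs are assumed to be two equal-length columns, one of group labels and one of event values.

```python
from collections import defaultdict


def groups_share_events(groups, events):
    """Return True if every group has the same multiset of event values."""
    by_group = defaultdict(list)
    for group, event in zip(groups, events):
        by_group[group].append(event)
    # Sorting makes the comparison independent of row order.
    event_lists = [sorted(values) for values in by_group.values()]
    return all(values == event_lists[0] for values in event_lists)
```

A result of False points to a well with a missing or extra sampling quarter, which would need to be resolved before the groups can be plotted together.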
226 Chapter 15 Background Incremental Sample Simulator (BISS)
Simulating BISS Data from a Large Discrete Background Data

The Background Incremental Sample Simulator (BISS) module has been incorporated in ProUCL 5.0 at the request of the Office of Superfund Remediation and Technology Innovation (OSRTI). However, this module is currently under further investigation and research, and therefore it is not available for general public use. This module may be released in a future version of the ProUCL software, along with strict conditions and guidance for how it is applied. The main text for this chapter is not included in this document for release to the general public. Only a brief placeholder write-up is provided here.

The following scenario describes the Site or project conditions under which the BISS module could be useful:

Suppose there is a long history of soil sample collection at a Site. In addition to a large amount of Site data, a robust background data set (at least 30 samples from verified background locations) has also been collected. Comparison of background data to onsite data has been, and will continue to be, an important part of this project's decision-making strategy. All historical data are from discrete samples, including the background data. There is now a desire to switch to incremental sampling for the Site. However, guidance for incremental sampling makes it clear that it is inappropriate to compare discrete sample results to incremental sample results. That includes comparing a Site's incremental results directly to discrete background results.

One option is to recollect all background data in the form of incremental samples from background decision units (DUs) that are designed to match Site DUs in geology, area, depth, target soil particle size, number of increments, increment sample support, etc.
If project decision-making uses a background threshold value (BTV) strategy to compare Site DU results one at a time against background, then an appropriate number (the default is no less than 10) of background DU incremental samples would need to be collected to determine the BTV for the population of background DUs.

However, if the existing discrete background data show background concentrations to be low (in comparison to Site concentrations) and fairly consistent (relative standard deviation, RSD < 1), there is a second option, described as follows.

When a robust discrete background data set that meets the above conditions already exists, the following is an alternative to automatically recollecting ALL background data as incremental samples.

Step 1. Identify 3 background DUs and collect at least 1 incremental sample from each, for a minimum of 3 background incremental samples.

Step 2. Enter the discrete background data set (n ≥ 30) and the 3 background incremental samples into the BISS module (the BISS module will not run unless both data sets are entered). The BISS module will generate a specified number (default is 7) of simulated incremental samples from the discrete data set. The module will then run a t-test to compare the simulated background incremental data set (e.g., with n = 7) to the actual background incremental data set (n ≥ 3).
227 If the t-test finds no difference between the 2 data sets, the BISS module will combine the 2 data sets and determine the statistical distribution, mean, standard deviation, potential UCLs, and potential BTVs for the combined data set. Only this information will be supplied to the general user. The individual values of the simulated incremental samples will not be provided. If the t-test finds a difference between the actual and simulated data sets, the BISS module will not combine the data sets or provide a BTV. In both cases, the BISS module will report summary statistics for the actual and simulated data sets.

Step 3. If the BISS module reported statistical analyses for the combined data set, select the BTV to use with Site DU incremental sample results. Document the procedure used to generate the BTV in project reports. If the BISS module reported that the simulated and actual data sets were different, the historical discrete data set cannot be used to simulate incremental results. Additional background DU incremental samples will need to be collected to obtain a background DU incremental data set with the number of results appropriate for the intended use of the background data set.

The objective of the BISS module is to take advantage of the information provided by the existing background discrete samples. The availability of a large discrete data set collected from background areas with geological formations and conditions comparable to the Site DU(s) of interest is a requirement for successful application of this module. There are fundamental differences between incremental and discrete samples. For example, the sample supports of discrete and incremental samples are very different. Sample support has a profound effect on sample results, so samples with different sample supports should not be compared directly, or should be compared only with great caution. Since incremental sampling is a relatively new approach, the performance of the BISS module requires further investigation.
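The numerical entry conditions stated above for the second option (at least 30 discrete background results, RSD < 1, and at least 3 background incremental samples) can be screened with a few lines of Python. This sketch is purely illustrative: the function name and parameter names are hypothetical, it does not reproduce the BISS simulation or its t-test (which this guide does not describe), and it does not check the further condition that background concentrations be low relative to Site concentrations.

```python
from statistics import mean, stdev


def biss_inputs_ok(discrete_background, incremental_background,
                   min_discrete=30, min_incremental=3, max_rsd=1.0):
    """Screen the data-set sizes and RSD condition described in the text."""
    if len(discrete_background) < min_discrete:
        return False
    if len(incremental_background) < min_incremental:
        return False
    average = mean(discrete_background)
    if average <= 0:
        # RSD is not meaningful for a non-positive mean.
        return False
    # Relative standard deviation = sample standard deviation / mean.
    rsd = stdev(discrete_background) / average
    return rsd < max_rsd
```

The default thresholds are taken directly from the text; they are exposed as parameters only so that the check can be adapted if project guidance specifies different values.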
If you would like to try this strategy for your project, or if you have questions, contact Deana Crumbling.
229 Chapter 16 Windows

The Windows menu performs typical Windows program options. Click on the Window menu to reveal the dropdown options shown above. The following Window dropdown menu options are available:
Cascade option: arranges windows in a cascade format. This is similar to a typical Windows program option.
Tile option: resizes each window vertically or horizontally and then displays all open windows. This is similar to a typical Windows program option.
The dropdown options list also includes a list of all open windows, with a check mark in front of the active window. Click on any of the windows listed to make that window active. This is especially useful if you have many windows (e.g., >40) open; the navigation panel only holds the first 40 windows.
230 Chapter 17 Handling the Output Screens and Graphs

17.1 Copying and Saving Graphs

Graphs can be copied into Word, Excel, or PowerPoint files in two ways.

1. Click Copy Chart (To Clipboard), shown below; a graph must be present to be copied to the clipboard.
File ► Copy Chart (To Clipboard)
Once the user has clicked Copy Chart (To Clipboard), the graph is ready to be pasted into most Microsoft Office applications (e.g., Word, Excel, and PowerPoint) by clicking the Edit ► Paste option in those applications, as shown below.
231 2. Graphs can be saved as a bitmap file with a .bmp extension using the Save Graph option in the Navigation Panel. The user can import the saved bitmap file into a desired document, such as a Word document or a PowerPoint presentation, by using the Copy and Paste options available in the selected Microsoft application.
File ► Save Graph

17.2 Printing Graphs

1. Click the graph you want to print in the Navigation Panel.
232 2. Click File ► Page Setup.

3. Check the button next to Portrait or Landscape (shown below), and click OK. In some cases, with larger headings and captions, it may be desirable to use the Landscape printing option.

4. Click File ► Print to print the graph, or File ► Print Preview to preview the graph before printing (optional).
233 17.3 Printing Non-graphical Outputs

1. Click (highlight) the output you want to save or print in the Navigation Panel.

2. Click File ► Print, or File ► Print Preview if you wish to see a preview before printing.
234 17.4 Saving Output Screens as Excel Files

ProUCL 5.0 saves output files and data files as Excel files with .xls or .xlsx extensions.

1. Click on the output you want to save in the Navigation Panel List.

2. Click File ► Save or File ► Save As.

3. Enter the desired file name, choose the desired folder using your browser as shown below, and click Save.