Data Quality Assessment: A Reviewer s Guide EPA QA/G-9R

Size: px
Start display at page:

Download "Data Quality Assessment: A Reviewer s Guide EPA QA/G-9R"

Transcription

1 United States Office of Environmental EPA/240/B-06/002 Environmental Protection Information Agency Washington, DC Data Quality Assessment: A Reviewer s Guide EPA QA/G-9R

2

3 FOREWORD This document is the 2006 version of the Data Quality Assessment: A Reviewer s Guide which provides general guidance to organizations on assessing data quality criteria and performance specifications for decision making. The Environmental Protection Agency (EPA) has developed a process for performing the Data Quality Assessment (DQA) Process for project managers and planners to determine whether the type, quantity, and quality of data needed to support Agency decisions have been achieved. This guidance is the culmination of experiences in the design and statistical analyses of environmental data in different Program Offices at the EPA. Many elements of prior guidance, statistics, and scientific planning have been incorporated into this document. This document is one of a series of quality management guidance documents that the EPA Quality Staff has prepared to assist users in implementing the Agency-wide Quality System. Other related documents include: EPA QA/G-4 Guidance on Systematic Planning using the Data Quality Objectives Process EPA QA/G-4D DEFT Software for the Data Quality Objectives Process EPA QA/G-5S Guidance on Choosing a Sampling Design for Environmental Data Collection EPA QA/G-9S Data Quality Assessment: Statistical Methods for Practitioners This document is intended to be a "living document" that will be updated periodically to incorporate new topics and revisions or refinements to existing procedures. Comments received on this 2006 version will be considered for inclusion in subsequent versions. Please send your written comments on the Data Quality Assessment: A Reviewer s Guide to: Quality Staff (2811R) Office of Environmental Information U.S. Environmental Protection Agency 1200 Pennsylvania Avenue N.W. Washington, DC Phone: (202) Fax: (202) EPA QA/G-9R iii

4 EPA QA/G-9R iv

5 TABLE OF CONTENTS Page INTRODUCTION Purpose of this Guidance DQA and the Data Life Cycle The Five Steps of Statistical DQA Intended Audience Organization of this Guidance... 5 STEP 1: REVIEW THE PROJECT S OBJECTIVES AND SAMPLING DESIGN Review Study Objectives Translate Study Objectives into Statistical Terms Developing Limits on Uncertainty Review Sampling Design What Outputs Should a DQA Reviewer Have at the Conclusion of Step 1? STEP 2: CONDUCT A PRELIMINARY DATA REVIEW Review Quality Assurance Reports Calculate Basic Statistical Quantities Graph the Data What Outputs Should a DQA Reviewer Have at the Conclusion of Step 2? STEP 3: SELECT THE STATISTICAL METHOD Choosing Between Alternatives: Hypothesis Testing Estimating a Parameter: Confidence Intervals and Tolerance Intervals What Output Should a DQA Reviewer Have at the Conclusion of Step 3? STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL METHOD Perform Tests of Assumptions Develop an Alternate Plan Corrective Actions What Outputs Should a DQA Reviewer Have at the End of Step 4? STEP 5: DRAW CONCLUSIONS FROM THE DATA Perform the Statistical Method Draw Study Conclusions Hypothesis Tests Confidence Intervals Tolerance Intervals Evaluate Performance of the Sampling Design What Output Should the DQA Reviewer Have at the End of Step 5? INTERPRETING AND COMMUNICATING THE TEST RESULTS Data Interpretation: The Meaning of p-values Data Interpretation: "Accepting" vs. "Failing to Reject" the Null Hypothesis Data Sufficiency: "Proof of Safety" vs. "Proof of Hazard" Data Sufficiency: Quantity vs. Quality of Data Data Sufficiency: Statistical Significance vs. Practical Significance Conclusions REFERENCES EPA QA/G-9R v

6 Page Appendix A: Commonly Used Statistical Quantities Appendix B: Graphical Representation of Data Appendix C: Common Hypothesis Tests Appendix D: Commonly Used Statements of Hypotheses Appendix E: Common Assumptions and Transformations Appendix F: Checklist of Outputs for Data Quality Assessment EPA QA/G-9R vi

7 CHAPTER 0 INTRODUCTION 0.1 Purpose of this Guidance Data Quality Assessment (DQA) is the scientific and statistical evaluation of environmental data to determine if they meet the planning objectives of the project, and thus are of the right type, quality, and quantity to support their intended use. This guidance describes broadly the statistical aspects of DQA in evaluating environmental data sets. A more detailed discussion about DQA graphical and statistical tools may be found in the companion guidance document, Data Quality Assessment: Statistical Methods for Practitioners (Final Draft) (EPA QA/G-9S) (U.S. EPA 2004). This guidance applies to using DQA to support environmental decision-making (e.g., compliance determinations), and to using DQA in estimation problems in which environmental data are used (e.g., monitoring programs). DQA is built on a fundamental premise: data quality is meaningful only when it relates to the intended use of the data. Data quality does not exist in a vacuum, a reviewer needs to know in what context a data set is to be used in order to establish a relevant yardstick for judging whether or not the data is acceptable. By using DQA, a reviewer can answer four important questions: 1. Can a decision (or estimate) be made with the desired level of certainty, given the quality of the data? 2. How well did the sampling design perform? 3. If the same sampling design strategy is used again for a similar study, would the data be expected to support the same intended use with the desired level of certainty? 4. Is it likely that sufficient samples were taken to enable the reviewer to see an effect if it was really present? The first question addresses the reviewer s immediate needs. For example, if the data are being used for decision-making and provide evidence strongly in favor of one course of action over another, then the decision maker can proceed knowing that the decision will be supported by unambiguous data. However, if the data do not show sufficiently strong evidence to favor one alternative, then the data analysis alerts the decision maker to this uncertainty. The decision maker now is in a position to make an informed choice about how to proceed (such as collect more or different data before making the decision, or proceed with the decision despite the relatively high, but tolerable, chance of drawing an erroneous conclusion). The second question addresses how robust this sampling design is with respect to changing conditions. If the design is very sensitive to potentially disturbing influences, then EPA QA/G-9R 1

8 interpretation of the results may be difficult. By addressing the second question the reviewer guards against the possibility of a spurious result arising from a unique set of circumstances. The third question addresses the problem of whether this could be considered a unique situation where the results of this DQA only applies to this situation only and the conclusions cannot be extrapolated to similar situations. It also addresses the suitability of using this data collection design cannot reviewer s potential future needs. For example, if reviewers intend to use a certain sampling design at a different location from where the design was first used, they should determine how well the design can be expected to perform given that the outcomes and environmental conditions of this sampling event will be different from those of the original event. As environmental conditions will vary from one location or one time to another, the adequacy of the sampling design should be evaluated over a broad range of possible outcomes and conditions. The final question addresses the issue of whether sufficient resources were used in the study. For example, in an epidemiological investigation, was it likely the effect of interest could be reliably observed given the limited number of samples actually obtained. 0.2 DQA and the Data Life Cycle The data life cycle (depicted in Figure 0-1) comprises three steps: planning, implementation, and assessment. During the planning phase, a systematic planning procedure (such as the Data Quality Objectives (DQO) Process) is used to define criteria for determining the number, location, and timing of samples (measurements) to be collected in order to produce a result with a desired level of certainty. This information, along with the sampling methods, analytical procedures, and appropriate quality assurance (QA) and quality control procedures, is documented in the QA Project Plan. Data are then collected following the QA Project Plan specifications in the implementation phase. At the outset of the assessment phase, the data are verified and validated to ensure that the sampling and analysis protocols specified in the QA Project Plan were followed, and that the measurement systems were performed in accordance with the criteria specified in the QA Project Plan. Then the statistical component of DQA completes the data life cycle by providing the evaluation needed to determine if the performance and acceptance criteria developed by the DQO planning process were achieved. EPA QA/G-9R 2

9 QUALITY ASSURANCE ASSESSMENT PLANNING Data Quality Objectives Process Quality Assurance Project Plan Development IMPLEMENTATION Field Data Collection and Associated Quality Assurance / Quality Control Activities QC/Performance Routine Data Evaluation Data INPUTS DATA VERIFICATION /VALIDATION Verify measurement performance Verify measurement procedures and reporting specifications OUTPUT VERIFIED /VALIDATED DATA INPUT ASSESSMENT Data Verification/ Validation Data Quality Assessment DATA QUALITY ASSESSMENT Review project objectives and sampling design Conduct preliminary data review Select statistical method Verify assumptions of the method Draw conclusions from the data OUTPUT PROJECT CONCLUSIONS 0.3 The Five Steps of Statistical DQA Figure 0-1. Data Life Cycle The statistical part of DQA involves five steps that begin with a review of the planning documentation and end with an answer to the problem or question posed during the planning phase of the study. These steps roughly parallel the actions of an environmental statistician when analyzing a set of data. The five steps, which are described in more detail in the following chapters of this guidance, are briefly summarized as follows: 1. Review the project s objectives and sampling design: Review the objectives defined during systematic planning to assure that they are still applicable. If objectives have not been developed (e.g., when using existing data independently collected), specify them before evaluating the data for the projects objectives. Review the sampling design and data collection documentation for consistency with the project objectives observing any potential discrepancies. 2. Conduct a preliminary data review: Review QA reports (when possible) for the validation of data, calculate basic statistics, and generate graphs of the data. Use this information to learn about the structure of the data and identify patterns, relationships, or potential anomalies. 3. Select the statistical method: Select the appropriate procedures for summarizing and analyzing the data, based on the review of the performance and acceptance EPA QA/G-9R 3

10 criteria associated with the projects objectives, the sampling design, and the preliminary data review. Identify the key underlying assumptions associated with the statistical test. 4. Verify the assumptions of the statistical method: Evaluate whether the underlying assumptions hold, or whether departures are acceptable, given the actual data and other information about the study. 5. Draw conclusions from the data: Perform the calculations pertinent to the statistical test, and document the conclusions to be drawn as a result of these calculations. If the design is to be used again, evaluate the performance of the sampling design. Although these five steps are presented in a linear sequence, DQA is by its very nature iterative. For example, if the preliminary data review reveals patterns or anomalies in the data set that are inconsistent with the project objectives, then some aspects of the study analysis may have to be reconsidered. Likewise, if the underlying assumptions of the statistical test are not supported by the data, then previous steps of the DQA may have to be revisited. The strength of DQA Process is that it is designed to promote an understanding of how well the data satisfy their intended use by progressing in a logical and efficient manner. Nevertheless, it should be realized that DQA cannot absolutely prove that the objectives set forth in the planning phase of a study have been achieved. This is because the reviewer can never know the true value of the item of interest, only information from a sample. Sample data collection provides the reviewer only with an estimate, not the true value. As an reviewer makes a determination based on the estimated value, there is always the risk of drawing an incorrect conclusion. Use of a well-documented planning process helps reduce this risk to an acceptable level. 0.4 Intended Audience This guidance is written as a general overview of statistical DQA for a broad audience of potential data users, reviewers, data generators and data investigators. Reviewers (such as project managers, risk assessors, or principal investigators who are responsible for making decisions or producing estimates regarding environmental characteristics based on environmental data) should find this guidance useful for understanding and directing the technical work of others who produce and analyze data. Data generators (such as analytical chemists, field sampling specialists, or technical support staff responsible for collecting and analyzing environmental samples and reporting the resulting data values) should find this guidance helpful for understanding how their work will be used. Data investigators (such as technical investigators responsible for evaluating the quality of environmental data) should find this guidance to be a handy summary of DQA-related concepts. Specific information about applying DQA-related graphical and statistical techniques is contained in the companion guidance, Data Quality Assessment: Statistical Methods for Practitioners (Final Draft) (EPA QA/G-9S). EPA QA/G-9R 4

11 0.5 Organization of this Guidance Chapters 1 through 5 of this guidance address the five steps of DQA in turn. Each chapter discusses the activities expected and includes a list of the outputs that should be achieved in that step. Chapter 6 provides additional perspectives on how to interpret data and understand/ communicate the conclusions drawn from data. Finally, Appendices A through E contain nontechnical explanatory material describing some of the statistical concepts used. Appendix F is a checklist that can be used to ensure all steps of the DQA process have been addressed. EPA QA/G-9R 5

12 EPA QA/G-9R 6

13 CHAPTER 1 STEP 1: REVIEW THE PROJECT S OBJECTIVES AND SAMPLING DESIGN DQA begins by reviewing the key outputs from the planning phase of the data life cycle such as the Data Quality Objectives, the QA Project Plan, and any related documents. The study objective provides the context for understanding the purpose of the data collection effort and establishes the qualitative and quantitative basis for assessing the quality of the data set for the intended use. The sampling design (documented in the QA Project Plan) provides important information about how to interpret the data. By studying the sampling design, the reviewer can gain an understanding of the assumptions under which the design was developed, as well as the relationship between these assumptions and the study objective. By reviewing the methods by which the samples were collected, measured, and reported, the reviewer prepares for the preliminary data review and subsequent steps of DQA. Systematic planning improves the representativeness and overall quality of a sampling design, the effectiveness and efficiency with which the sampling and analysis plan is implemented, and the usefulness of subsequent DQA efforts. For systematic planning, the Agency recommends the DQO Process, a logical, systematic planning process based on the scientific method. The DQO Process emphasizes the planning and development of a sampling design to collect the right type, quality, and quantity of data for the intended use. Employing both the DQO Process and DQA will help to ensure that projects are supported by data of adequate quality; the DQO Process does so prospectively and DQA does so retrospectively. Systematic planning, whether the DQO Process or other, can help assure that data are not collected spuriously. The DQO Process is discussed in Guidance on the Data Quality Objectives Process (QA/G-4) (U.S. EPA 2000a). Step 1. State the Problem Define the problem that motivates the study; Identify the planning team; examine budget, schedule. Step 2. Identify the Goal of the Study State how environmental data will be used in solving the problem; identify study questions; define alternative outcomes. Step 3. Identify Information Inputs Identify data and information needed to answer study questions. Step 4. Define the Boundaries of the Study Specify the target population and characteristics of interest; define spatial and temporal limits, scale of inference. Step 5. Develop the Analytic Approach Define the parameter of interest; specify the type of inference and develop logic for drawing conclusions from the findings. Statistical Hypothesis Testing Estimation and other analytical approaches Step 6. Specify Performance or Acceptance Criteria Develop performance criteria for new data being collected, acceptance criteria for data already collected. Step 7. Develop the Detailed Plan for Obtaining Data Select the most resource- effective sampling and analysis plan that satisfies the performance or acceptance criteria. In instances where project objectives have not been developed and documented during the planning phase of the study, it is necessary to recreate some of Figure 1-1. The Data Quality the project objectives prior to conducting the DQA. Objectives Process These are used to make appropriate criteria for evaluating the quality of the data with respect to their intended use. The most important recreations are: hypotheses chosen, level of significance selected (tolerable levels of potential decision errors), statistical method selected, and number of samples collected. The seven steps of the DQO Process are illustrated in Figure 1-1. EPA QA/G-9R 7

14 1.1 Review Study Objectives First, the objectives of the study should be reviewed in order to provide a context for analyzing the data. If a systematic planning process has been implemented before the data are collected, then this step reduces to reviewing the documentation on the study objectives. If no clear planning process was used, the reviewer should: Develop a concise definition of the problem (e.g. DQO Process Step 1) and of the methodology of how the data were collected (e.g. DQO Process Step 2). This should provide the fundamental reason for collecting the environmental data and identify all potential actions that could result from the data analysis. Identify the target population (universe of interest) and determine if any essential information is missing (e.g. DQO Process Step 3). If so, either collect the missing information before proceeding, or select a different approach to resolving the problem. Specify the scale of determination (any subpopulations of interest) and any boundaries on the study (e.g. DQO Process Step 4) based on the sampling design. The scale of determination is the smallest area or time period to which the conclusions of the study will apply. The apparent sampling design and implementation may restrict how small or how large this scale of determination can be. 1.2 Translate Study Objectives into Statistical Terms In this activity, the reviewer's objectives are used to develop a precise statement of how environmental data will be evaluated to generate the study s conclusions. If DQOs were generated during planning, this statement will be found as an output of DQO Process Step 5. In many cases, this activity is best accomplished by the formulation of statistical hypotheses, including a null hypothesis, which is a "baseline condition" that is presumed to be true in the absence of strong evidence to the contrary, as well as an alternative hypothesis, which bears the burden of proof. In other words, the baseline condition will be retained unless the alternative condition (the alternative hypothesis) is thought to be true due to the preponderance of evidence. In general, such hypotheses often consist of the following elements: a population parameter of interest (such as a mean or a median), which describes the feature of the environment that the reviewer is investigating; a numerical value to which the parameter will be compared, such as a regulatory or riskbased threshold or a similar parameter from another place (e.g., comparison to a reference site) or time (e.g., comparison to a prior time); and EPA QA/G-9R 8

15 a relationship (such as "is equal to" or "is greater than") that specifies precisely how the parameter will be compared to the numerical value. Section 3.1 provides additional information on how to develop the statement of hypotheses, and includes a list of commonly encountered hypotheses for environmental projects. Some environmental data collection efforts do not involve the direct comparison of measured values to a fixed value. For instance, for monitoring programs or exploratory studies, the goal may be to develop estimates of values or ranges applicable to given parameters. This is best accomplished by the formulation of confidence intervals or tolerance intervals, which estimate the probability that the true value of a parameter is within a given range. In general, confidence intervals consist of the following elements: a range of values with in which the unknown population parameter of interest (such as the mean or median) is thought to lie; and a probabilistic expression denoting the chance that this range captures the parameter of interest. An example of a confidence interval would be We are 95% confident that the interval 47.3 to 51.8 contains the population mean. Tolerance intervals are used with proportions. Here, we wish to have a certain level of confidence that a certain proportion of the population falls in a certain region. An example of a tolerance interval would be We are 95% confident that at least 80% of the population is above the threshold value. Section 3.2 provides additional information on confidence intervals and tolerance intervals. For discussion of technical issues related to statistical testing using hypotheses or confidence/tolerance intervals, refer to Chapter 3 of Data Quality Assessment: Statistical Methods for Practitioners (Final Draft) (EPA QA/G-9S). 1.3 Developing Limits on Uncertainty The goal of this activity is to develop quantitative statements of the reviewer s tolerance for uncertainty in conclusions drawn from the data and in actions based on those conclusions. These statements are generated during DQO Process Step 6, but they can also be generated retrospectively as part of DQA. If the project has been framed as a hypothesis test, then the uncertainty limits can be expressed as the reviewer's tolerance for committing false rejection (Type I, sometimes called a false positive) or false acceptance (Type II, sometimes called a false negative) decision errors 1. 1 Decision errors occur when the data collected do not adequately represent the population of interest. For example, the limited amount of information collected may have a preponderance of high values that were sampled by pure EPA QA/G-9R 9

16 A false rejection error occurs when the null hypothesis is rejected when it is, in fact, true. A false acceptance error occurs when the null hypothesis is not rejected (often called accepted ) when it is, in fact, false. Other related phrases in common use include "level of significance" which is equal to the Type I error (false rejection) rate, and "power" which is equal to 1 - Type II error (false acceptance) rate. When a hypothesis is being tested, it is convenient to summarize the applicable uncertainty limits by means of a decision performance goal diagram. For detailed information on how to develop false rejection and false acceptance decision error rates, see Chapter 6 of Guidance on the Data Quality Objectives Process (QA/G-4) (U.S. EPA 2000a). If the project has been framed in terms of confidence intervals, then uncertainty is expressed as a combination of two interrelated terms: the width of the interval (smaller intervals correspond to a smaller degree of uncertainty); and the confidence level (typically stated as a percentage) indicating the chance this interval captures the unknown parameter of interest (a 95% confidence level represents a smaller degree of uncertainty than, say, a 90% confidence level). If the project has been framed in terms of tolerance intervals, then uncertainty is expressed as a combination of confidence level and: the proportion of the population that lies in the interval. Note that there is nothing inherently preferable about obtaining a particular probability, such as 95% for the confidence interval. For the same data set, there can be a 95% probability that the parameter lies within a given interval, as well as a 90% probability that it lies within another (smaller) interval, and an 80% probability of being in even a smaller interval. All the intervals are centered on the best estimate of that parameter usually calculated directly from the data (see also Chapter 3.2). 1.4 Review Sampling Design The goal of this activity is to familiarize the reviewer with the main features of the sampling design that was used to generate the environmental data. If DQOs were developed during planning, the sampling design will have been summarized as part of DQO Process Step 7. The design and sampling strategy should be discussed in clear detail in the QA Project Plan or Sampling and Analysis Plan. The overall type of sampling design and the manner in which samples were collected or measurements were taken will place conditions and constraints on how the data can be used and interpreted. chance. A decision maker could possibly draw the conclusion (decision) that the target population was high when, in fact, it was much lower. A similar situation occurs when the data are collected according to a plan that is too limited to reflect the true underlying variability. EPA QA/G-9R 10

17 The key distinction in sampling design is between judgmental (also called authoritative) sampling (in which sample numbers and locations are selected based on expert knowledge of the problem) and probability sampling (in which sample numbers and locations are selected based on randomization, and each member of the target population has a known probability of being included in the sample). Judgmental sampling has some advantages and is appropriate in some cases, but the reviewer should be aware of its limitations and drawbacks. This type of sampling should be considered only when the objectives of the investigation are not of a statistical nature (for example, when the objective of a study is to identify specific locations of leaks, or when the study is focused solely on the sampling locations themselves). Generally, conclusions drawn from judgmental samples apply only to those individual samples; aggregation may result in severe bias due to lack of representativeness and lead to highly erroneous conclusions. Judgmental sampling, although often rapid to implement, precludes the use of the sample for any purpose other than the original one. If the reviewer elects to proceed with judgmental data, then great care should be taken in interpreting any statistical statements concerning the conclusions to be drawn. Using a probabilistic statement with a judgmental sample is incorrect and should be avoided as it gives an illusion of correctness where there is none. The further the judgmental sample is from a truly random sample, the more questionable the conclusions. Probabilistic sampling is often more difficult to implement than judgmental sampling due to the difficulty of locating the random locations of the samples. It does have the advantage of allowing probability statements to be made about the quality of estimates or hypothesis tests that are derived from the resultant data. One common misconception of probability sampling procedures is that these procedures preclude the use of expert knowledge or important prior information about the problem. Indeed, just the opposite is true; an efficient sampling design is one that uses all available prior information to stratify the region (in order to improve the representativeness of the resulting samples) and set appropriate probabilities of selection. Common types of probabilistic sampling designs include the following: Simple random sampling the method of sampling where samples are collected at random times or locations throughout the sampling period or study area. Stratified sampling a sampling method where a population is divided into nonoverlapping sub-populations called strata and sampling locations are selected randomly within each stratum using some sampling design. Systematic sampling a randomly selected unit (in space or time) establishes the starting place of a systematic pattern that is repeated throughout the population. With some important assumptions, can be shown to be equivalent to simple random sampling. EPA QA/G-9R 11

18 Ranked set sampling a field sampling design where expert judgment or an auxiliary measurement method is used in combination with simple random sampling to determine which locations should be sampled. Adaptive cluster sampling a sampling method in which some samples are taken using simple random sampling, and additional samples are taken at locations where measurements exceed some threshold value. Composite sampling a sampling method in which multiple samples are physically mixed into a larger sample and samples for analysis drawn from this larger sample. This technique can be highly cost-effective (but at the expense of variability estimation) and had the advantage it can be used in conjunction with any other sampling design. The document Guidance on Choosing a Sampling Design for Environmental Data Collection (EPA QA/G-5S) (U.S. EPA 2002) provides extensive information on sampling design issues and their implications for data interpretation. Regardless of the type of sampling scheme, the reviewer should review the sampling design documentation and look for design features that support the project s objectives. For example, if the reviewer is interested in making a decision about the mean level of contamination in an effluent stream over time, then composite samples may be an appropriate sampling approach. On the other hand, if the reviewer is looking for hot spots of contamination at a hazardous waste site, compositing should be used with caution, to avoid "averaging away" hot spots. Also, look for potential problems in the implementation of the sampling design. For example, if simple random sampling has been used, can the reviewer be confident this was actually achieved in the actual selection of data point? Small deviations from a sampling plan probably have minimal effect on the conclusions drawn from the data set, but significant or substantial deviations should be flagged and their potential effect carefully considered. The most important point is to verify that the collected data are consistent with how the QA Project Plan, Sampling and Analysis Plan, or overall objectives of the study stated them to be. 1.5 What Outputs Should a DQA Reviewer Have at the Conclusion of Step 1? There are three outputs a DQA reviewer should have documented at the conclusion of Step 1: 1. Well-defined project objectives and criteria, 2. Verification that the hypothesis or estimate chosen is consistent with the project s objective and meets the project s performance and acceptance criteria, and 3. A list of any deviations from the planned sampling design and the potential effects of these deviations. EPA QA/G-9R 12

19 CHAPTER 2 STEP 2: CONDUCT A PRELIMINARY DATA REVIEW The principal goal of this step of the process is to review the calculation of some basic statistical quantities, and review any graphical representations of the data. By reviewing the data both numerically and graphically, one can learn the "structure" of the data and thereby identify appropriate approaches and limitations for using the data. There are two main elements of preliminary data review: (1) basic statistical quantities (summary statistics) and (2) graphical representations of the data. Statistical quantities are functions of the data that numerically describe the data and include the sample mean, sample median, sample percentiles, sample range, and sample standard deviation. These quantities, known as estimates, condense the data and are useful for making inferences concerning the population from which the data were drawn. Graphical representations are used to identify patterns and relationships within the data, confirm or disprove assumptions, and identify potential problems. The preliminary data review step is designed to make the reviewer familiar with the data. The review should identify anomalies that could indicate unexpected events that may influence the analysis of the data. 2.1 Review Quality Assurance Reports When sufficient documentation is present, the first activity is to review any relevant QA reports that describe the data collection and reporting process as it was actually implemented. These QA reports provide valuable information about potential problems or anomalies in the data set. Specific items that may be helpful include: Data verification and validation reports that document the sample collection, handling, analysis, data reduction, and reporting procedures used; Quality control reports from laboratories or field stations that document measurement system performance. These QA reports are useful when investigating data anomalies that may affect critical assumptions made to ensure the validity of the statistical tests. In many cases, such as the evaluation of data cited in a publication, these reports may be unobtainable. Auxiliary questions such as Has this project or data set been peer reviewed?, Were the peer reviewers chosen independently of the data generators?, and Is there evidence to persuade me that the appropriate QA protocols have been observed?, should be asked to assess the integrity of the data. Without some form of positive response to these questions, it is difficult to assess the validity of the data and the resulting conclusions. The purpose of this EPA QA/G-9R 13

20 validity inspection of the data is to assure a firm foundation exists to support the conclusions drawn from the data. 2.2 Calculate Basic Statistical Quantities The basic quantitative characteristics of the data using common statistical quantities ato be expected of almost any quantitative study. It is often useful to prepare a table of descriptive statistics for each population when more than one is being studied (e.g., background compared to a potentially contaminated site) so that obvious differences between the populations can be identified. Commonly used statistical quantities and the differences between them are discussed in Appendix A 2.3 Graph the Data The visual display of data is used to identify patterns and trends in the data that might go unnoticed using purely numerical methods. Graphs can be used to identify these patterns and trends, to quickly confirm or disprove hypotheses, to discover new phenomena, to identify potential problems, and to suggest corrective measures. In addition, some graphical representations can be used to record and store data compactly or to convey information to others. Plots and graphs of the data are very valuable tools for stakeholder interactions and often provide an immediate understanding of the important characteristics of the data. Graphical representations include displays of individual data points, statistical quantities, temporal data, or spatial data. Since no single graphical representation will provide a complete picture of the data set, the reviewer should choose different graphical techniques to illuminate different features of the data. At a minimum, there should be a graphical representation of the individual data points and a graphical representation of the statistical quantities. If the data set consists of more than one variable, each variable should be treated individually before developing graphical representations for the multiple variables. If the sampling plan or suggested analysis methods rely on any critical assumptions, consider whether a particular type of graph might shed light on the validity of that assumption. Usually, graphs should be applied to each group of data separately or each data set should be represented by a different symbol. There are many types of graphical displays that can be applied to environmental data; a variety of data plots are shown in Appendix B. 2.4 What Outputs Should a DQA Reviewer Have at the Conclusion of Step 2? At the conclusion, two main outputs should be present: 1. Basic statistical quantities should have been calculated, and 2. Graphs showing different aspects of the data should have been developed. EPA QA/G-9R 14

Quality. Guidance for Data Quality Assessment. Practical Methods for Data Analysis EPA QA/G-9 QA00 UPDATE. EPA/600/R-96/084 July, 2000

Quality. Guidance for Data Quality Assessment. Practical Methods for Data Analysis EPA QA/G-9 QA00 UPDATE. EPA/600/R-96/084 July, 2000 United States Environmental Protection Agency Office of Environmental Information Washington, DC 060 EPA/600/R-96/08 July, 000 Guidance for Data Quality Assessment Quality Practical Methods for Data Analysis

More information

APPENDIX E THE ASSESSMENT PHASE OF THE DATA LIFE CYCLE

APPENDIX E THE ASSESSMENT PHASE OF THE DATA LIFE CYCLE APPENDIX E THE ASSESSMENT PHASE OF THE DATA LIFE CYCLE The assessment phase of the Data Life Cycle includes verification and validation of the survey data and assessment of quality of the data. Data verification

More information

Organizing Your Approach to a Data Analysis

Organizing Your Approach to a Data Analysis Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

More information

Data Quality Objectives Decision Error Feasibility Trials Software (DEFT) - USER'S GUIDE

Data Quality Objectives Decision Error Feasibility Trials Software (DEFT) - USER'S GUIDE United States Office of Environmental EPA/240/B-01/007 Environmental Protection Information Agency Washington, DC 20460 Data Quality Objectives Decision Error Feasibility Trials Software (DEFT) - USER'S

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

DIVISION OF ENVIRONMENTAL RESPONSE AND REVITALIZATION DATA QUALITY OBJECTIVES PROCESS SUMMARY DERR-00-DI-32

DIVISION OF ENVIRONMENTAL RESPONSE AND REVITALIZATION DATA QUALITY OBJECTIVES PROCESS SUMMARY DERR-00-DI-32 DIVISION OF ENVIRONMENTAL RESPONSE AND REVITALIZATION DATA QUALITY OBJECTIVES PROCESS SUMMARY DERR-00-DI-32 Internal Guidance Document: January 2002 DERR DQO PROCESS SUMMARY Table of Contents Step Page

More information

8 INTERPRETATION OF SURVEY RESULTS

8 INTERPRETATION OF SURVEY RESULTS 8 INTERPRETATION OF SURVEY RESULTS 8.1 Introduction This chapter discusses the interpretation of survey results, primarily those of the final status survey. Interpreting a survey s results is most straightforward

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

APPENDIX N. Data Validation Using Data Descriptors

APPENDIX N. Data Validation Using Data Descriptors APPENDIX N Data Validation Using Data Descriptors Data validation is often defined by six data descriptors: 1) reports to decision maker 2) documentation 3) data sources 4) analytical method and detection

More information

Great Lakes National Program Office and Office of Water Quality Management Training Modules

Great Lakes National Program Office and Office of Water Quality Management Training Modules GLOSSARY Assessment - the evaluation process used to measure the performance or effectiveness of a system and its elements. As used here, assessment is an all-inclusive term used to denote any of the following:

More information

Guidance on Systematic Planning Using the Data Quality Objectives Process EPA QA/G-4. United States Environmental Protection Agency

Guidance on Systematic Planning Using the Data Quality Objectives Process EPA QA/G-4. United States Environmental Protection Agency United States Environmental Protection Agency Office of Environmental Information Washington, DC 20460 EPA/240/B-06/001 February 2006 Guidance on Systematic Planning Using the Data Quality Objectives Process

More information

Visual Sample Plan (VSP): A Tool for Balancing Sampling Requirements Against Decision Error Risk

Visual Sample Plan (VSP): A Tool for Balancing Sampling Requirements Against Decision Error Risk Visual Sample Plan (VSP): A Tool for Balancing Sampling Requirements Against Decision Error Risk B.A. Pulsipher, R.O. Gilbert, and J.E. Wilson Pacific Northwest National Laboratory, Richland, Washington,

More information

SA 530 AUDIT SAMPLING. Contents. (Effective for audits of financial statements for periods beginning on or after April 1, 2009) Paragraph(s)

SA 530 AUDIT SAMPLING. Contents. (Effective for audits of financial statements for periods beginning on or after April 1, 2009) Paragraph(s) SA 530 AUDIT SAMPLING (Effective for audits of financial statements for periods beginning on or after April 1, 2009) Contents Introduction Paragraph(s) Scope of this SA... 1 2 Effective Date... 3 Objective...

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

4 PROJECT PLAN DOCUMENTS

4 PROJECT PLAN DOCUMENTS 1 4 PROJECT PLAN DOCUMENTS 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 4.1 Introduction The project plan documents are a blueprint for how a particular project

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Guidance on Systematic Planning Using the Data Quality Objectives Process EPA QA/G-4. United States Environmental Protection Agency

Guidance on Systematic Planning Using the Data Quality Objectives Process EPA QA/G-4. United States Environmental Protection Agency United States Environmental Protection Agency Office of Environmental Information Washington, DC 20460 EPA/240/B-06/001 February 2006 Guidance on Systematic Planning Using the Data Quality Objectives Process

More information

Measurement Information Model

Measurement Information Model mcgarry02.qxd 9/7/01 1:27 PM Page 13 2 Information Model This chapter describes one of the fundamental measurement concepts of Practical Software, the Information Model. The Information Model provides

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Audit Sampling. HKSA 530 Issued July 2009; revised July 2010

Audit Sampling. HKSA 530 Issued July 2009; revised July 2010 HKSA 530 Issued July 2009; revised July 2010 Effective for audits of financial statements for periods beginning on or after 15 December 2009 Hong Kong Standard on Auditing 530 Audit Sampling COPYRIGHT

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING

INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING INTERNATIONAL STANDARD ON 530 AUDIT SAMPLING (Effective for audits of financial statements for periods beginning on or after December 15, 2009) CONTENTS Paragraph Introduction Scope of this ISA... 1 2

More information

Risk Management Procedure For The Derivation and Use Of Soil Exposure Point Concentrations For Unrestricted Use Determinations

Risk Management Procedure For The Derivation and Use Of Soil Exposure Point Concentrations For Unrestricted Use Determinations Risk Management Procedure For The Derivation and Use Of Soil Exposure Point Concentrations For Unrestricted Use Determinations Hazardous Materials and Waste Management Division (303) 692-3300 First Edition

More information

Statistical Inference and t-tests

Statistical Inference and t-tests 1 Statistical Inference and t-tests Objectives Evaluate the difference between a sample mean and a target value using a one-sample t-test. Evaluate the difference between a sample mean and a target value

More information

ASSURING THE QUALITY OF TEST RESULTS

ASSURING THE QUALITY OF TEST RESULTS Page 1 of 12 Sections Included in this Document and Change History 1. Purpose 2. Scope 3. Responsibilities 4. Background 5. References 6. Procedure/(6. B changed Division of Field Science and DFS to Office

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

Haulsey Engineering, Inc. Quality Management System (QMS) Table of Contents

Haulsey Engineering, Inc. Quality Management System (QMS) Table of Contents Haulsey Engineering, Inc. Quality Management System (QMS) Table of Contents 1.0 Introduction 1.1 Quality Management Policy and Practices 2.0 Quality System Components 2.1 Quality Management Plans 2.2 Quality

More information

MEASURES OF LOCATION AND SPREAD

MEASURES OF LOCATION AND SPREAD Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Guidance for Quality Assurance Project Plans for Modeling

Guidance for Quality Assurance Project Plans for Modeling United States Office of Environmental EPA/240/R-02/007 Environmental Protection Information Agency Washington, DC 20460 Guidance for Quality Assurance Project Plans for Modeling EPA QA/G-5M Quality Quality

More information

How to Write a Successful PhD Dissertation Proposal

How to Write a Successful PhD Dissertation Proposal How to Write a Successful PhD Dissertation Proposal Before considering the "how", we should probably spend a few minutes on the "why." The obvious things certainly apply; i.e.: 1. to develop a roadmap

More information

Defining the Beginning: The Importance of Research Design

Defining the Beginning: The Importance of Research Design Research and Management Techniques for the Conservation of Sea Turtles K. L. Eckert, K. A. Bjorndal, F. A. Abreu-Grobois, M. Donnelly (Editors) IUCN/SSC Marine Turtle Specialist Group Publication No. 4,

More information

CHANCE ENCOUNTERS. Making Sense of Hypothesis Tests. Howard Fincher. Learning Development Tutor. Upgrade Study Advice Service

CHANCE ENCOUNTERS. Making Sense of Hypothesis Tests. Howard Fincher. Learning Development Tutor. Upgrade Study Advice Service CHANCE ENCOUNTERS Making Sense of Hypothesis Tests Howard Fincher Learning Development Tutor Upgrade Study Advice Service Oxford Brookes University Howard Fincher 2008 PREFACE This guide has a restricted

More information

Module 5: Statistical Analysis

Module 5: Statistical Analysis Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the

More information

Chapter 4 DECISION ANALYSIS

Chapter 4 DECISION ANALYSIS ASW/QMB-Ch.04 3/8/01 10:35 AM Page 96 Chapter 4 DECISION ANALYSIS CONTENTS 4.1 PROBLEM FORMULATION Influence Diagrams Payoff Tables Decision Trees 4.2 DECISION MAKING WITHOUT PROBABILITIES Optimistic Approach

More information

Quality Assurance Guidance for Conducting Brownfields Site Assessments

Quality Assurance Guidance for Conducting Brownfields Site Assessments United States Office of Solid Waste and EPA 540-R-98-038 Environmental Protection Emergency Response OSWER 9230.0-83P Agency (5102G) PB98-963307 September 1998 Quality Assurance Guidance for Conducting

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS

INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING (Effective for audits of financial statements for periods beginning on or after December 15, 2004) CONTENTS Paragraph Introduction...

More information

Example G Cost of construction of nuclear power plants

Example G Cost of construction of nuclear power plants 1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor

More information

Common Tools for Displaying and Communicating Data for Process Improvement

Common Tools for Displaying and Communicating Data for Process Improvement Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

INTERNATIONAL STANDARD ON AUDITING (UK AND IRELAND) 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS

INTERNATIONAL STANDARD ON AUDITING (UK AND IRELAND) 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS INTERNATIONAL STANDARD ON AUDITING (UK AND IRELAND) 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS Paragraph Introduction... 1-2 Definitions... 3-12 Audit Evidence... 13-17 Risk Considerations

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics. Probability and Statistics Curriculum Guide. Revised 2010 Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Statistical & Technical Team

Statistical & Technical Team Statistical & Technical Team A Practical Guide to Sampling This guide is brought to you by the Statistical and Technical Team, who form part of the VFM Development Team. They are responsible for advice

More information

Overview of EPA Quality System Requirements

Overview of EPA Quality System Requirements Overview of EPA Quality System Requirements Course Goals At the completion of this course, you will: Understand EPA's quality system requirements Understand the roles and responsibilities in implementing

More information

Final Draft Report. Criteria and Resources for Facility-Based and Post-Rule Assessment

Final Draft Report. Criteria and Resources for Facility-Based and Post-Rule Assessment Final Draft Report Criteria and Resources for Facility-Based and Post-Rule Assessment Disclaimer The SCAQMD staff is releasing this draft report to stakeholders and the AQMD Advisory Group for review and

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

Non-Inferiority Tests for One Mean

Non-Inferiority Tests for One Mean Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random

More information

EPA GUIDANCE FOR QUALITY ASSURANCE PROJECT PLANS EPA QA/G-5

EPA GUIDANCE FOR QUALITY ASSURANCE PROJECT PLANS EPA QA/G-5 United States Office of Research and EPA/600/R-98/018 Environmental Protection Development February 1998 Agency Washington, D.C. 20460 EPA GUIDANCE FOR QUALITY ASSURANCE PROJECT PLANS FOREWORD The U.S.

More information

EXPLORING SPATIAL PATTERNS IN YOUR DATA

EXPLORING SPATIAL PATTERNS IN YOUR DATA EXPLORING SPATIAL PATTERNS IN YOUR DATA OBJECTIVES Learn how to examine your data using the Geostatistical Analysis tools in ArcMap. Learn how to use descriptive statistics in ArcMap and Geoda to analyze

More information

Data Quality. Tips for getting it, keeping it, proving it! Central Coast Ambient Monitoring Program

Data Quality. Tips for getting it, keeping it, proving it! Central Coast Ambient Monitoring Program Data Quality Tips for getting it, keeping it, proving it! Central Coast Ambient Monitoring Program Why do we care? Program goals are key what do you want to do with the data? Data utility increases directly

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Statistics 641 - EXAM II - 1999 through 2003

Statistics 641 - EXAM II - 1999 through 2003 Statistics 641 - EXAM II - 1999 through 2003 December 1, 1999 I. (40 points ) Place the letter of the best answer in the blank to the left of each question. (1) In testing H 0 : µ 5 vs H 1 : µ > 5, the

More information

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST UNDERSTANDING The independent-samples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly

More information

AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

More information

Developing Quality Assurance Project Plans using Data Quality Objectives and other planning tools. Developing QAPPs 5/10/2012 1

Developing Quality Assurance Project Plans using Data Quality Objectives and other planning tools. Developing QAPPs 5/10/2012 1 Developing Quality Assurance Project Plans using Data Quality Objectives and other planning tools 1 Introductions 2 Agenda I. Developing a QAPP II. Systematic Planning using Data Quality Objectives III.

More information

Parametric and Nonparametric: Demystifying the Terms

Parametric and Nonparametric: Demystifying the Terms Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD

More information

Planning sample size for randomized evaluations

Planning sample size for randomized evaluations TRANSLATING RESEARCH INTO ACTION Planning sample size for randomized evaluations Simone Schaner Dartmouth College povertyactionlab.org 1 Course Overview Why evaluate? What is evaluation? Outcomes, indicators

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

STAT355 - Probability & Statistics

STAT355 - Probability & Statistics STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive

More information

Selecting Research Participants

Selecting Research Participants C H A P T E R 6 Selecting Research Participants OBJECTIVES After studying this chapter, students should be able to Define the term sampling frame Describe the difference between random sampling and random

More information

UNIVERSITY OF NAIROBI

UNIVERSITY OF NAIROBI UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER

More information

Testing Hypotheses About Proportions

Testing Hypotheses About Proportions Chapter 11 Testing Hypotheses About Proportions Hypothesis testing method: uses data from a sample to judge whether or not a statement about a population may be true. Steps in Any Hypothesis Test 1. Determine

More information

Glossary of Terms Ability Accommodation Adjusted validity/reliability coefficient Alternate forms Analysis of work Assessment Battery Bias

Glossary of Terms Ability Accommodation Adjusted validity/reliability coefficient Alternate forms Analysis of work Assessment Battery Bias Glossary of Terms Ability A defined domain of cognitive, perceptual, psychomotor, or physical functioning. Accommodation A change in the content, format, and/or administration of a selection procedure

More information

Appendix B Checklist for the Empirical Cycle

Appendix B Checklist for the Empirical Cycle Appendix B Checklist for the Empirical Cycle This checklist can be used to design your research, write a report about it (internal report, published paper, or thesis), and read a research report written

More information

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

More information

Intro to Data Analysis, Economic Statistics and Econometrics

Intro to Data Analysis, Economic Statistics and Econometrics Intro to Data Analysis, Economic Statistics and Econometrics Statistics deals with the techniques for collecting and analyzing data that arise in many different contexts. Econometrics involves the development

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

(Refer Slide Time: 01:52)

(Refer Slide Time: 01:52) Software Engineering Prof. N. L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture - 2 Introduction to Software Engineering Challenges, Process Models etc (Part 2) This

More information

TIPS DATA QUALITY STANDARDS ABOUT TIPS

TIPS DATA QUALITY STANDARDS ABOUT TIPS 2009, NUMBER 12 2 ND EDITION PERFORMANCE MONITORING & EVALUATION TIPS DATA QUALITY STANDARDS ABOUT TIPS These TIPS provide practical advice and suggestions to USAID managers on issues related to performance

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Audit Sampling. AU Section 350 AU 350.05

Audit Sampling. AU Section 350 AU 350.05 Audit Sampling 2067 AU Section 350 Audit Sampling (Supersedes SAS No. 1, sections 320A and 320B.) Source: SAS No. 39; SAS No. 43; SAS No. 45; SAS No. 111. See section 9350 for interpretations of this section.

More information

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

Audit Sampling 101. BY: Christopher L. Mitchell, MBA, CIA, CISA, CCSA Cmitchell@KBAGroupLLP.com

Audit Sampling 101. BY: Christopher L. Mitchell, MBA, CIA, CISA, CCSA Cmitchell@KBAGroupLLP.com Audit Sampling 101 BY: Christopher L. Mitchell, MBA, CIA, CISA, CCSA Cmitchell@KBAGroupLLP.com BIO Principal KBA s Risk Advisory Services Team 15 years of internal controls experience within the following

More information

A Comparison of System Dynamics (SD) and Discrete Event Simulation (DES) Al Sweetser Overview.

A Comparison of System Dynamics (SD) and Discrete Event Simulation (DES) Al Sweetser Overview. A Comparison of System Dynamics (SD) and Discrete Event Simulation (DES) Al Sweetser Andersen Consultng 1600 K Street, N.W., Washington, DC 20006-2873 (202) 862-8080 (voice), (202) 785-4689 (fax) albert.sweetser@ac.com

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

STATISTICAL ANALYSIS AND INTERPRETATION OF DATA COMMONLY USED IN EMPLOYMENT LAW LITIGATION

STATISTICAL ANALYSIS AND INTERPRETATION OF DATA COMMONLY USED IN EMPLOYMENT LAW LITIGATION STATISTICAL ANALYSIS AND INTERPRETATION OF DATA COMMONLY USED IN EMPLOYMENT LAW LITIGATION C. Paul Wazzan Kenneth D. Sulzer ABSTRACT In employment law litigation, statistical analysis of data from surveys,

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study.

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study. Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study Prepared by: Centers for Disease Control and Prevention National

More information

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

A Correlation of. to the. South Carolina Data Analysis and Probability Standards A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards

More information

Statistical Inference: Hypothesis Testing

Statistical Inference: Hypothesis Testing Statistical Inference: Hypothesis Testing Scott Evans, Ph.D. 1 The Big Picture Populations and Samples Sample / Statistics x, s, s 2 Population Parameters μ, σ, σ 2 Scott Evans, Ph.D. 2 Statistical Inference

More information

Basic research methods. Basic research methods. Question: BRM.2. Question: BRM.1

Basic research methods. Basic research methods. Question: BRM.2. Question: BRM.1 BRM.1 The proportion of individuals with a particular disease who die from that condition is called... BRM.2 This study design examines factors that may contribute to a condition by comparing subjects

More information

Develop Project Charter. Develop Project Management Plan

Develop Project Charter. Develop Project Management Plan Develop Charter Develop Charter is the process of developing documentation that formally authorizes a project or a phase. The documentation includes initial requirements that satisfy stakeholder needs

More information

STATISTICAL SIGNIFICANCE AND THE STANDARD OF PROOF IN ANTITRUST DAMAGE QUANTIFICATION

STATISTICAL SIGNIFICANCE AND THE STANDARD OF PROOF IN ANTITRUST DAMAGE QUANTIFICATION Lear Competition Note STATISTICAL SIGNIFICANCE AND THE STANDARD OF PROOF IN ANTITRUST DAMAGE QUANTIFICATION Nov 2013 Econometric techniques can play an important role in damage quantification cases for

More information

3. Data Analysis, Statistics, and Probability

3. Data Analysis, Statistics, and Probability 3. Data Analysis, Statistics, and Probability Data and probability sense provides students with tools to understand information and uncertainty. Students ask questions and gather and use data to answer

More information

Introduction to Hypothesis Testing OPRE 6301

Introduction to Hypothesis Testing OPRE 6301 Introduction to Hypothesis Testing OPRE 6301 Motivation... The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about

More information