Data Quality Assessment: A Reviewer's Guide (EPA QA/G-9R)


United States Environmental Protection Agency, Office of Environmental Information, Washington, DC. EPA/240/B-06/002.
FOREWORD

This document is the 2006 version of Data Quality Assessment: A Reviewer's Guide, which provides general guidance to organizations on assessing data quality criteria and performance specifications for decision making. The Environmental Protection Agency (EPA) has developed the Data Quality Assessment (DQA) Process to help project managers and planners determine whether the type, quantity, and quality of data needed to support Agency decisions have been achieved. This guidance is the culmination of experiences in the design and statistical analysis of environmental data in different Program Offices at the EPA. Many elements of prior guidance, statistics, and scientific planning have been incorporated into this document.

This document is one of a series of quality management guidance documents that the EPA Quality Staff has prepared to assist users in implementing the Agency-wide Quality System. Other related documents include:

EPA QA/G-4   Guidance on Systematic Planning using the Data Quality Objectives Process
EPA QA/G-4D  DEFT Software for the Data Quality Objectives Process
EPA QA/G-5S  Guidance on Choosing a Sampling Design for Environmental Data Collection
EPA QA/G-9S  Data Quality Assessment: Statistical Methods for Practitioners

This document is intended to be a "living document" that will be updated periodically to incorporate new topics and revisions or refinements to existing procedures. Comments received on this 2006 version will be considered for inclusion in subsequent versions. Please send your written comments on Data Quality Assessment: A Reviewer's Guide to:

Quality Staff (2811R)
Office of Environmental Information
U.S. Environmental Protection Agency
1200 Pennsylvania Avenue N.W.
Washington, DC
Phone: (202)
Fax: (202)
TABLE OF CONTENTS

INTRODUCTION
    Purpose of this Guidance
    DQA and the Data Life Cycle
    The Five Steps of Statistical DQA
    Intended Audience
    Organization of this Guidance
STEP 1: REVIEW THE PROJECT'S OBJECTIVES AND SAMPLING DESIGN
    Review Study Objectives
    Translate Study Objectives into Statistical Terms
    Developing Limits on Uncertainty
    Review Sampling Design
    What Outputs Should a DQA Reviewer Have at the Conclusion of Step 1?
STEP 2: CONDUCT A PRELIMINARY DATA REVIEW
    Review Quality Assurance Reports
    Calculate Basic Statistical Quantities
    Graph the Data
    What Outputs Should a DQA Reviewer Have at the Conclusion of Step 2?
STEP 3: SELECT THE STATISTICAL METHOD
    Choosing Between Alternatives: Hypothesis Testing
    Estimating a Parameter: Confidence Intervals and Tolerance Intervals
    What Output Should a DQA Reviewer Have at the Conclusion of Step 3?
STEP 4: VERIFY THE ASSUMPTIONS OF THE STATISTICAL METHOD
    Perform Tests of Assumptions
    Develop an Alternate Plan
    Corrective Actions
    What Outputs Should a DQA Reviewer Have at the End of Step 4?
STEP 5: DRAW CONCLUSIONS FROM THE DATA
    Perform the Statistical Method
    Draw Study Conclusions
    Hypothesis Tests
    Confidence Intervals
    Tolerance Intervals
    Evaluate Performance of the Sampling Design
    What Output Should the DQA Reviewer Have at the End of Step 5?
INTERPRETING AND COMMUNICATING THE TEST RESULTS
    Data Interpretation: The Meaning of p-values
    Data Interpretation: "Accepting" vs. "Failing to Reject" the Null Hypothesis
    Data Sufficiency: "Proof of Safety" vs. "Proof of Hazard"
    Data Sufficiency: Quantity vs. Quality of Data
    Data Sufficiency: Statistical Significance vs. Practical Significance
    Conclusions
REFERENCES
Appendix A: Commonly Used Statistical Quantities
Appendix B: Graphical Representation of Data
Appendix C: Common Hypothesis Tests
Appendix D: Commonly Used Statements of Hypotheses
Appendix E: Common Assumptions and Transformations
Appendix F: Checklist of Outputs for Data Quality Assessment
CHAPTER 0
INTRODUCTION

0.1 Purpose of this Guidance

Data Quality Assessment (DQA) is the scientific and statistical evaluation of environmental data to determine if they meet the planning objectives of the project, and thus are of the right type, quality, and quantity to support their intended use. This guidance broadly describes the statistical aspects of DQA in evaluating environmental data sets. A more detailed discussion of DQA graphical and statistical tools may be found in the companion guidance document, Data Quality Assessment: Statistical Methods for Practitioners (Final Draft) (EPA QA/G-9S) (U.S. EPA 2004). This guidance applies to using DQA to support environmental decision making (e.g., compliance determinations), and to using DQA in estimation problems in which environmental data are used (e.g., monitoring programs).

DQA is built on a fundamental premise: data quality is meaningful only when it relates to the intended use of the data. Data quality does not exist in a vacuum; a reviewer needs to know in what context a data set is to be used in order to establish a relevant yardstick for judging whether or not the data are acceptable. By using DQA, a reviewer can answer four important questions:

1. Can a decision (or estimate) be made with the desired level of certainty, given the quality of the data?
2. How well did the sampling design perform?
3. If the same sampling design strategy is used again for a similar study, would the data be expected to support the same intended use with the desired level of certainty?
4. Is it likely that sufficient samples were taken to enable the reviewer to see an effect if it was really present?

The first question addresses the reviewer's immediate needs. For example, if the data are being used for decision making and provide evidence strongly in favor of one course of action over another, then the decision maker can proceed knowing that the decision will be supported by unambiguous data.
However, if the data do not show sufficiently strong evidence to favor one alternative, then the data analysis alerts the decision maker to this uncertainty. The decision maker now is in a position to make an informed choice about how to proceed (such as collect more or different data before making the decision, or proceed with the decision despite the relatively high, but tolerable, chance of drawing an erroneous conclusion). The second question addresses how robust this sampling design is with respect to changing conditions. If the design is very sensitive to potentially disturbing influences, then
interpretation of the results may be difficult. By addressing the second question the reviewer guards against the possibility of a spurious result arising from a unique set of circumstances.

The third question addresses whether this could be considered a unique situation, where the results of this DQA apply only to this situation and the conclusions cannot be extrapolated to similar situations. It also addresses the suitability of using this data collection design to meet the reviewer's potential future needs. For example, if reviewers intend to use a certain sampling design at a different location from where the design was first used, they should determine how well the design can be expected to perform given that the outcomes and environmental conditions of this sampling event will be different from those of the original event. As environmental conditions will vary from one location or one time to another, the adequacy of the sampling design should be evaluated over a broad range of possible outcomes and conditions.

The final question addresses the issue of whether sufficient resources were used in the study. For example, in an epidemiological investigation, was it likely the effect of interest could be reliably observed given the limited number of samples actually obtained?

0.2 DQA and the Data Life Cycle

The data life cycle (depicted in Figure 0-1) comprises three steps: planning, implementation, and assessment. During the planning phase, a systematic planning procedure (such as the Data Quality Objectives (DQO) Process) is used to define criteria for determining the number, location, and timing of samples (measurements) to be collected in order to produce a result with a desired level of certainty. This information, along with the sampling methods, analytical procedures, and appropriate quality assurance (QA) and quality control (QC) procedures, is documented in the QA Project Plan.
Data are then collected following the QA Project Plan specifications in the implementation phase. At the outset of the assessment phase, the data are verified and validated to ensure that the sampling and analysis protocols specified in the QA Project Plan were followed, and that the measurement systems performed in accordance with the criteria specified in the QA Project Plan. Then the statistical component of DQA completes the data life cycle by providing the evaluation needed to determine if the performance and acceptance criteria developed by the DQO planning process were achieved.
[Figure 0-1. Data Life Cycle. The figure shows the three phases of the data life cycle: Planning (the Data Quality Objectives Process and QA Project Plan development), Implementation (field data collection and associated QA/QC activities, producing routine data and QC/performance evaluation data), and Assessment (data verification/validation, yielding verified/validated data that are input to Data Quality Assessment; the output is the project conclusions).]

0.3 The Five Steps of Statistical DQA

The statistical part of DQA involves five steps that begin with a review of the planning documentation and end with an answer to the problem or question posed during the planning phase of the study. These steps roughly parallel the actions of an environmental statistician when analyzing a set of data. The five steps, which are described in more detail in the following chapters of this guidance, are briefly summarized as follows:

1. Review the project's objectives and sampling design: Review the objectives defined during systematic planning to assure that they are still applicable. If objectives have not been developed (e.g., when using existing data that were independently collected), specify them before evaluating the data against the project's objectives. Review the sampling design and data collection documentation for consistency with the project objectives, noting any potential discrepancies.

2. Conduct a preliminary data review: Review QA reports (when possible) for the validation of data, calculate basic statistics, and generate graphs of the data. Use this information to learn about the structure of the data and identify patterns, relationships, or potential anomalies.

3. Select the statistical method: Select the appropriate procedures for summarizing and analyzing the data, based on the review of the performance and acceptance criteria associated with the project's objectives, the sampling design, and the preliminary data review. Identify the key underlying assumptions associated with the statistical test.

4. Verify the assumptions of the statistical method: Evaluate whether the underlying assumptions hold, or whether departures are acceptable, given the actual data and other information about the study.

5. Draw conclusions from the data: Perform the calculations pertinent to the statistical test, and document the conclusions to be drawn as a result of these calculations. If the design is to be used again, evaluate the performance of the sampling design.

Although these five steps are presented in a linear sequence, DQA is by its very nature iterative. For example, if the preliminary data review reveals patterns or anomalies in the data set that are inconsistent with the project objectives, then some aspects of the study analysis may have to be reconsidered. Likewise, if the underlying assumptions of the statistical test are not supported by the data, then previous steps of the DQA may have to be revisited. The strength of the DQA Process is that it is designed to promote an understanding of how well the data satisfy their intended use by progressing in a logical and efficient manner.

Nevertheless, it should be realized that DQA cannot absolutely prove that the objectives set forth in the planning phase of a study have been achieved. This is because the reviewer can never know the true value of the item of interest, only information from a sample. Sample data collection provides the reviewer only with an estimate, not the true value. As a reviewer makes a determination based on the estimated value, there is always the risk of drawing an incorrect conclusion. Use of a well-documented planning process helps reduce this risk to an acceptable level.
0.4 Intended Audience

This guidance is written as a general overview of statistical DQA for a broad audience of potential data users, reviewers, data generators, and data investigators. Reviewers (such as project managers, risk assessors, or principal investigators who are responsible for making decisions or producing estimates regarding environmental characteristics based on environmental data) should find this guidance useful for understanding and directing the technical work of others who produce and analyze data. Data generators (such as analytical chemists, field sampling specialists, or technical support staff responsible for collecting and analyzing environmental samples and reporting the resulting data values) should find this guidance helpful for understanding how their work will be used. Data investigators (such as technical investigators responsible for evaluating the quality of environmental data) should find this guidance to be a handy summary of DQA-related concepts. Specific information about applying DQA-related graphical and statistical techniques is contained in the companion guidance, Data Quality Assessment: Statistical Methods for Practitioners (Final Draft) (EPA QA/G-9S).
0.5 Organization of this Guidance

Chapters 1 through 5 of this guidance address the five steps of DQA in turn. Each chapter discusses the activities expected and includes a list of the outputs that should be achieved in that step. Chapter 6 provides additional perspectives on how to interpret data and how to understand and communicate the conclusions drawn from data. Finally, Appendices A through E contain nontechnical explanatory material describing some of the statistical concepts used. Appendix F is a checklist that can be used to ensure all steps of the DQA process have been addressed.
CHAPTER 1
STEP 1: REVIEW THE PROJECT'S OBJECTIVES AND SAMPLING DESIGN

DQA begins by reviewing the key outputs from the planning phase of the data life cycle, such as the Data Quality Objectives, the QA Project Plan, and any related documents. The study objective provides the context for understanding the purpose of the data collection effort and establishes the qualitative and quantitative basis for assessing the quality of the data set for the intended use. The sampling design (documented in the QA Project Plan) provides important information about how to interpret the data. By studying the sampling design, the reviewer can gain an understanding of the assumptions under which the design was developed, as well as the relationship between these assumptions and the study objective. By reviewing the methods by which the samples were collected, measured, and reported, the reviewer prepares for the preliminary data review and subsequent steps of DQA.

Systematic planning improves the representativeness and overall quality of a sampling design, the effectiveness and efficiency with which the sampling and analysis plan is implemented, and the usefulness of subsequent DQA efforts. For systematic planning, the Agency recommends the DQO Process, a logical, systematic planning process based on the scientific method. The DQO Process emphasizes the planning and development of a sampling design to collect the right type, quality, and quantity of data for the intended use. Employing both the DQO Process and DQA will help to ensure that projects are supported by data of adequate quality; the DQO Process does so prospectively and DQA does so retrospectively. Systematic planning, whether the DQO Process or another approach, can help assure that data are not collected needlessly. The DQO Process is discussed in Guidance on the Data Quality Objectives Process (QA/G-4) (U.S. EPA 2000a).

[Figure 1-1. The Data Quality Objectives Process. The seven steps are:]
Step 1. State the Problem: Define the problem that motivates the study; identify the planning team; examine budget and schedule.
Step 2. Identify the Goal of the Study: State how environmental data will be used in solving the problem; identify study questions; define alternative outcomes.
Step 3. Identify Information Inputs: Identify data and information needed to answer study questions.
Step 4. Define the Boundaries of the Study: Specify the target population and characteristics of interest; define spatial and temporal limits and the scale of inference.
Step 5. Develop the Analytic Approach: Define the parameter of interest; specify the type of inference (statistical hypothesis testing, estimation, or other analytical approaches) and develop the logic for drawing conclusions from the findings.
Step 6. Specify Performance or Acceptance Criteria: Develop performance criteria for new data being collected, or acceptance criteria for data already collected.
Step 7. Develop the Detailed Plan for Obtaining Data: Select the most resource-effective sampling and analysis plan that satisfies the performance or acceptance criteria.

In instances where project objectives have not been developed and documented during the planning phase of the study, it is necessary to recreate some of the project objectives prior to conducting the DQA. These are used to develop appropriate criteria for evaluating the quality of the data with respect to their intended use. The most important recreations are: the hypotheses chosen, the level of significance selected (tolerable levels of potential decision errors), the statistical method selected, and the number of samples collected. The seven steps of the DQO Process are illustrated in Figure 1-1.
1.1 Review Study Objectives

First, the objectives of the study should be reviewed in order to provide a context for analyzing the data. If a systematic planning process has been implemented before the data are collected, then this step reduces to reviewing the documentation on the study objectives. If no clear planning process was used, the reviewer should:

- Develop a concise definition of the problem (e.g., DQO Process Step 1) and of the methodology of how the data were collected (e.g., DQO Process Step 2). This should provide the fundamental reason for collecting the environmental data and identify all potential actions that could result from the data analysis.
- Identify the target population (universe of interest) and determine if any essential information is missing (e.g., DQO Process Step 3). If so, either collect the missing information before proceeding, or select a different approach to resolving the problem.
- Specify the scale of determination (any subpopulations of interest) and any boundaries on the study (e.g., DQO Process Step 4) based on the sampling design. The scale of determination is the smallest area or time period to which the conclusions of the study will apply. The apparent sampling design and implementation may restrict how small or how large this scale of determination can be.

1.2 Translate Study Objectives into Statistical Terms

In this activity, the reviewer's objectives are used to develop a precise statement of how environmental data will be evaluated to generate the study's conclusions. If DQOs were generated during planning, this statement will be found as an output of DQO Process Step 5. In many cases, this activity is best accomplished by the formulation of statistical hypotheses, including a null hypothesis, which is a "baseline condition" that is presumed to be true in the absence of strong evidence to the contrary, as well as an alternative hypothesis, which bears the burden of proof.
In other words, the baseline condition will be retained unless the alternative condition (the alternative hypothesis) is thought to be true due to the preponderance of evidence. In general, such hypotheses often consist of the following elements:

- a population parameter of interest (such as a mean or a median), which describes the feature of the environment that the reviewer is investigating;
- a numerical value to which the parameter will be compared, such as a regulatory or risk-based threshold or a similar parameter from another place (e.g., comparison to a reference site) or time (e.g., comparison to a prior time); and
- a relationship (such as "is equal to" or "is greater than") that specifies precisely how the parameter will be compared to the numerical value.

Section 3.1 provides additional information on how to develop the statement of hypotheses, and includes a list of commonly encountered hypotheses for environmental projects.

Some environmental data collection efforts do not involve the direct comparison of measured values to a fixed value. For instance, for monitoring programs or exploratory studies, the goal may be to develop estimates of values or ranges applicable to given parameters. This is best accomplished by the formulation of confidence intervals or tolerance intervals, which estimate the probability that the true value of a parameter is within a given range. In general, confidence intervals consist of the following elements:

- a range of values within which the unknown population parameter of interest (such as the mean or median) is thought to lie; and
- a probabilistic expression denoting the chance that this range captures the parameter of interest.

An example of a confidence interval would be: "We are 95% confident that the interval 47.3 to 51.8 contains the population mean." Tolerance intervals are used with proportions. Here, we wish to have a certain level of confidence that a certain proportion of the population falls in a certain region. An example of a tolerance interval would be: "We are 95% confident that at least 80% of the population is above the threshold value." Section 3.2 provides additional information on confidence intervals and tolerance intervals. For discussion of technical issues related to statistical testing using hypotheses or confidence/tolerance intervals, refer to Chapter 3 of Data Quality Assessment: Statistical Methods for Practitioners (Final Draft) (EPA QA/G-9S).
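The two framings above can be illustrated with a short sketch. Everything in the example is hypothetical: the measurements, the action level of 52, and the use of a normal (z) approximation rather than a t-based interval; in a real DQA the method would be selected in Step 3 and its assumptions verified in Step 4. The sketch uses only the Python standard library.

```python
import statistics
from statistics import NormalDist

# Hypothetical measurements (e.g., contaminant concentrations in mg/kg).
data = [48.2, 50.1, 47.9, 52.3, 49.4, 51.0, 48.8, 50.6, 49.9, 50.7,
        47.5, 51.8, 49.2, 50.3, 48.6]

n = len(data)
mean = statistics.mean(data)    # point estimate of the population mean
sd = statistics.stdev(data)     # sample standard deviation
se = sd / n ** 0.5              # standard error of the mean

# Two-sided 95% confidence interval for the mean, using a normal (z)
# approximation; a t-quantile would be more appropriate for n this small.
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean - z * se, mean + z * se
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")

# A hypothesis test against the hypothetical action level:
#   H0: population mean >= 52   (baseline condition, presumed true)
#   Ha: population mean <  52   (alternative, bears the burden of proof)
action_level = 52.0
z_stat = (mean - action_level) / se
p_value = NormalDist().cdf(z_stat)   # one-sided, lower-tail p-value
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: evidence the mean is below the action level.")
else:
    print("Fail to reject H0: evidence is insufficient.")
```

Tolerance intervals are not shown here; they require coverage factors or order statistics, which EPA QA/G-9S discusses in detail.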
1.3 Developing Limits on Uncertainty

The goal of this activity is to develop quantitative statements of the reviewer's tolerance for uncertainty in conclusions drawn from the data and in actions based on those conclusions. These statements are generated during DQO Process Step 6, but they can also be generated retrospectively as part of DQA. If the project has been framed as a hypothesis test, then the uncertainty limits can be expressed as the reviewer's tolerance for committing false rejection (Type I, sometimes called a false positive) or false acceptance (Type II, sometimes called a false negative) decision errors (Footnote 1).

Footnote 1: Decision errors occur when the data collected do not adequately represent the population of interest. For example, the limited amount of information collected may have a preponderance of high values that were sampled by pure
A false rejection error occurs when the null hypothesis is rejected when it is, in fact, true. A false acceptance error occurs when the null hypothesis is not rejected (often called "accepted") when it is, in fact, false. Other related phrases in common use include "level of significance," which is equal to the Type I error (false rejection) rate, and "power," which is equal to 1 minus the Type II error (false acceptance) rate. When a hypothesis is being tested, it is convenient to summarize the applicable uncertainty limits by means of a decision performance goal diagram. For detailed information on how to develop false rejection and false acceptance decision error rates, see Chapter 6 of Guidance on the Data Quality Objectives Process (QA/G-4) (U.S. EPA 2000a).

If the project has been framed in terms of confidence intervals, then uncertainty is expressed as a combination of two interrelated terms:

- the width of the interval (smaller intervals correspond to a smaller degree of uncertainty); and
- the confidence level (typically stated as a percentage) indicating the chance this interval captures the unknown parameter of interest (a 95% confidence level represents a smaller degree of uncertainty than, say, a 90% confidence level).

If the project has been framed in terms of tolerance intervals, then uncertainty is expressed as a combination of the confidence level and:

- the proportion of the population that lies in the interval.

Note that there is nothing inherently preferable about obtaining a particular probability, such as 95%, for the confidence interval. For the same data set, there can be a 95% probability that the parameter lies within a given interval, as well as a 90% probability that it lies within another (smaller) interval, and an 80% probability of its being in an even smaller interval. All the intervals are centered on the best estimate of that parameter, usually calculated directly from the data (see also Section 3.2).
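The trade-off between confidence level and interval width can be seen numerically: for a fixed data set, lowering the confidence level shrinks the interval around the same central estimate. A minimal sketch, assuming a small hypothetical data set and a normal approximation:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample; the best estimate of the mean is the same for every interval.
data = [9.8, 10.4, 10.1, 9.6, 10.7, 10.2, 9.9, 10.3, 10.0, 10.5]
m = mean(data)
se = stdev(data) / len(data) ** 0.5

widths = {}
for conf in (0.80, 0.90, 0.95):
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # two-sided critical value
    low, high = m - z * se, m + z * se
    widths[conf] = high - low
    print(f"{conf:.0%} interval: ({low:.3f}, {high:.3f})  width {high - low:.3f}")

# All three intervals are centered on m; higher confidence costs a wider interval.
```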
1.4 Review Sampling Design

The goal of this activity is to familiarize the reviewer with the main features of the sampling design that was used to generate the environmental data. If DQOs were developed during planning, the sampling design will have been summarized as part of DQO Process Step 7. The design and sampling strategy should be discussed in clear detail in the QA Project Plan or Sampling and Analysis Plan. The overall type of sampling design and the manner in which samples were collected or measurements were taken will place conditions and constraints on how the data can be used and interpreted.

Footnote 1 (continued): chance. A decision maker could possibly draw the conclusion (decision) that the target population was high when, in fact, it was much lower. A similar situation occurs when the data are collected according to a plan that is too limited to reflect the true underlying variability.
The key distinction in sampling design is between judgmental (also called authoritative) sampling, in which sample numbers and locations are selected based on expert knowledge of the problem, and probability sampling, in which sample numbers and locations are selected based on randomization, and each member of the target population has a known probability of being included in the sample.

Judgmental sampling has some advantages and is appropriate in some cases, but the reviewer should be aware of its limitations and drawbacks. This type of sampling should be considered only when the objectives of the investigation are not of a statistical nature (for example, when the objective of a study is to identify specific locations of leaks, or when the study is focused solely on the sampling locations themselves). Generally, conclusions drawn from judgmental samples apply only to those individual samples; aggregation may result in severe bias due to lack of representativeness and lead to highly erroneous conclusions. Judgmental sampling, although often rapid to implement, precludes the use of the sample for any purpose other than the original one. If the reviewer elects to proceed with judgmental data, then great care should be taken in interpreting any statistical statements concerning the conclusions to be drawn. Using a probabilistic statement with a judgmental sample is incorrect and should be avoided, as it gives an illusion of correctness where there is none. The further the judgmental sample is from a truly random sample, the more questionable the conclusions.

Probability sampling is often more difficult to implement than judgmental sampling because of the effort involved in locating the randomly selected sampling points. It does have the advantage of allowing probability statements to be made about the quality of estimates or hypothesis tests that are derived from the resultant data.
One common misconception about probability sampling procedures is that these procedures preclude the use of expert knowledge or important prior information about the problem. Indeed, just the opposite is true; an efficient sampling design is one that uses all available prior information to stratify the region (in order to improve the representativeness of the resulting samples) and set appropriate probabilities of selection. Common types of probabilistic sampling designs include the following:

- Simple random sampling: samples are collected at random times or locations throughout the sampling period or study area.
- Stratified sampling: the population is divided into non-overlapping subpopulations called strata, and sampling locations are selected randomly within each stratum using some sampling design.
- Systematic sampling: a randomly selected unit (in space or time) establishes the starting place of a systematic pattern that is repeated throughout the population. Under some important assumptions, systematic sampling can be shown to be equivalent to simple random sampling.
- Ranked set sampling: a field sampling design in which expert judgment or an auxiliary measurement method is used in combination with simple random sampling to determine which locations should be sampled.
- Adaptive cluster sampling: a sampling method in which some samples are taken using simple random sampling, and additional samples are taken at locations where measurements exceed some threshold value.
- Composite sampling: a sampling method in which multiple samples are physically mixed into a larger sample, and samples for analysis are drawn from this larger sample. This technique can be highly cost-effective (but at the expense of variability estimation) and has the advantage that it can be used in conjunction with any other sampling design.

The document Guidance on Choosing a Sampling Design for Environmental Data Collection (EPA QA/G-5S) (U.S. EPA 2002) provides extensive information on sampling design issues and their implications for data interpretation.

Regardless of the type of sampling scheme, the reviewer should review the sampling design documentation and look for design features that support the project's objectives. For example, if the reviewer is interested in making a decision about the mean level of contamination in an effluent stream over time, then composite samples may be an appropriate sampling approach. On the other hand, if the reviewer is looking for hot spots of contamination at a hazardous waste site, compositing should be used with caution, to avoid "averaging away" hot spots. Also, look for potential problems in the implementation of the sampling design. For example, if simple random sampling has been used, can the reviewer be confident this was actually achieved in the actual selection of data points? Small deviations from a sampling plan probably have minimal effect on the conclusions drawn from the data set, but significant or substantial deviations should be flagged and their potential effect carefully considered.
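For illustration, the selection mechanics of three of these designs can be sketched in a few lines. This is a simplified sketch under stated assumptions: a hypothetical target population of 100 grid cells, a fixed random seed for reproducibility, and a split into two equal hypothetical strata.

```python
import random

random.seed(42)                  # fixed seed so the example is reproducible
population = list(range(100))    # 100 hypothetical grid cells
n = 10                           # desired number of samples

# Simple random sampling: every cell has an equal chance of selection.
srs = sorted(random.sample(population, n))

# Systematic sampling: a random start, then a fixed interval through the population.
interval = len(population) // n
start = random.randrange(interval)
systematic = [start + k * interval for k in range(n)]

# Stratified sampling: divide into two strata, sample randomly within each.
strata = [population[:50], population[50:]]
stratified = sorted(random.sample(strata[0], n // 2)
                    + random.sample(strata[1], n // 2))

print("SRS:       ", srs)
print("Systematic:", systematic)
print("Stratified:", stratified)
```

Note the design consequences: the systematic sample is evenly spaced, and the stratified sample guarantees coverage of both strata, which simple random sampling does not.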
The most important point is to verify that the collected data are consistent with the specifications of the QA Project Plan, the Sampling and Analysis Plan, and the overall objectives of the study.

1.5 What Outputs Should a DQA Reviewer Have at the Conclusion of Step 1?

There are three outputs a DQA reviewer should have documented at the conclusion of Step 1:

1. Well-defined project objectives and criteria,
2. Verification that the hypothesis or estimate chosen is consistent with the project's objective and meets the project's performance and acceptance criteria, and
3. A list of any deviations from the planned sampling design and the potential effects of these deviations.
CHAPTER 2

STEP 2: CONDUCT A PRELIMINARY DATA REVIEW

The principal goal of this step of the process is to review the calculation of some basic statistical quantities and to review any graphical representations of the data. By reviewing the data both numerically and graphically, one can learn the "structure" of the data and thereby identify appropriate approaches and limitations for using the data. There are two main elements of preliminary data review: (1) basic statistical quantities (summary statistics) and (2) graphical representations of the data. Statistical quantities are functions of the data that numerically describe the data and include the sample mean, sample median, sample percentiles, sample range, and sample standard deviation. These quantities, known as estimates, condense the data and are useful for making inferences concerning the population from which the data were drawn. Graphical representations are used to identify patterns and relationships within the data, confirm or disprove assumptions, and identify potential problems. The preliminary data review step is designed to make the reviewer familiar with the data. The review should identify anomalies that could indicate unexpected events that may influence the analysis of the data.

2.1 Review Quality Assurance Reports

When sufficient documentation is present, the first activity is to review any relevant QA reports that describe the data collection and reporting process as it was actually implemented. These QA reports provide valuable information about potential problems or anomalies in the data set. Specific items that may be helpful include:

1. Data verification and validation reports that document the sample collection, handling, analysis, data reduction, and reporting procedures used; and
2. Quality control reports from laboratories or field stations that document measurement system performance.
These QA reports are useful when investigating data anomalies that may affect critical assumptions made to ensure the validity of the statistical tests. In many cases, such as the evaluation of data cited in a publication, these reports may be unobtainable. Auxiliary questions such as "Has this project or data set been peer reviewed?", "Were the peer reviewers chosen independently of the data generators?", and "Is there evidence to persuade me that the appropriate QA protocols have been observed?" should be asked to assess the integrity of the data. Without some form of positive response to these questions, it is difficult to assess the validity of the data and the resulting conclusions. The purpose of this validity inspection of the data is to assure that a firm foundation exists to support the conclusions drawn from the data.

2.2 Calculate Basic Statistical Quantities

Calculation of the basic quantitative characteristics of the data using common statistical quantities is to be expected of almost any quantitative study. It is often useful to prepare a table of descriptive statistics for each population when more than one is being studied (e.g., background compared to a potentially contaminated site) so that obvious differences between the populations can be identified. Commonly used statistical quantities, and the differences between them, are discussed in Appendix A.

2.3 Graph the Data

The visual display of data is used to identify patterns and trends in the data that might go unnoticed using purely numerical methods. Graphs can be used to identify these patterns and trends, to quickly confirm or disprove hypotheses, to discover new phenomena, to identify potential problems, and to suggest corrective measures. In addition, some graphical representations can be used to record and store data compactly or to convey information to others. Plots and graphs of the data are very valuable tools for stakeholder interactions and often provide an immediate understanding of the important characteristics of the data. Graphical representations include displays of individual data points, statistical quantities, temporal data, or spatial data. Since no single graphical representation will provide a complete picture of the data set, the reviewer should choose different graphical techniques to illuminate different features of the data. At a minimum, there should be a graphical representation of the individual data points and a graphical representation of the statistical quantities. If the data set consists of more than one variable, each variable should be treated individually before developing graphical representations for the multiple variables.
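The per-population table of descriptive statistics suggested in Section 2.2 can be sketched with Python's standard statistics module. The background and site data below are purely hypothetical:

```python
import statistics

# Purely hypothetical data: background vs. a potentially contaminated site.
groups = {
    "background": [1.1, 0.9, 1.4, 1.2, 0.8, 1.3, 1.0, 1.1],
    "site":       [2.9, 3.4, 2.2, 5.1, 3.0, 2.6, 4.2, 3.3],
}

# Statistics to tabulate for each population.
stats = {
    "n":      lambda x: float(len(x)),
    "mean":   statistics.mean,
    "median": statistics.median,
    "std":    statistics.stdev,  # sample standard deviation
    "min":    min,
    "max":    max,
}

# Print a side-by-side table so obvious differences stand out.
print(f"{'statistic':<10}" + "".join(f"{name:>12}" for name in groups))
for label, fn in stats.items():
    print(f"{label:<10}" + "".join(f"{fn(v):>12.2f}" for v in groups.values()))
```

Laying the two populations out side by side makes the elevated site mean and larger site spread visible at a glance, before any formal test is chosen.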
If the sampling plan or suggested analysis methods rely on any critical assumptions, consider whether a particular type of graph might shed light on the validity of that assumption. Usually, graphs should be applied to each group of data separately, or each data set should be represented by a different symbol. There are many types of graphical displays that can be applied to environmental data; a variety of data plots are shown in Appendix B.

2.4 What Outputs Should a DQA Reviewer Have at the Conclusion of Step 2?

At the conclusion of Step 2, two main outputs should be present:

1. Basic statistical quantities should have been calculated, and
2. Graphs showing different aspects of the data should have been developed.
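As a toy illustration of the graphical review described in Section 2.3, even a plain-text histogram (standing in for a real plotting package, with hypothetical data) can make an anomaly stand out immediately:

```python
# Hypothetical measurements; one suspiciously high value is included
# to show how a simple display flags potential outliers.
data = [2.1, 2.4, 2.2, 3.1, 2.8, 2.9, 3.5, 2.6, 2.3, 2.7,
        3.0, 2.5, 9.8, 2.2, 2.9]

lo, hi, width = 0.0, 10.0, 1.0
nbins = int((hi - lo) / width)
counts = [0] * nbins
for x in data:
    # Clamp to the last bin so a value equal to hi is not dropped.
    counts[min(int((x - lo) / width), nbins - 1)] += 1

# A text histogram: each '#' is one observation.
for i, c in enumerate(counts):
    print(f"{lo + i * width:4.1f}-{lo + (i + 1) * width:4.1f} | {'#' * c}")
```

The isolated bar near 10 is exactly the kind of anomaly that should be traced back to the QA reports reviewed in Section 2.1 before any statistical test is run.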