Assessing Data Quality for Healthcare Systems Data Used in Clinical Research (Version 1.0)
An NIH Health Care Systems Research Collaboratory Phenotypes, Data Standards, and Data Quality Core White Paper

Meredith N. Zozus, PhD1; W. Ed Hammond, PhD1; Beverly B. Green, MD, MPH2; Michael G. Kahn, MD, PhD3; Rachel L. Richesson, PhD, MPH4; Shelley A. Rusincovitch5; Gregory E. Simon, MD, MPH2; Michelle M. Smerek5

1 Duke Translational Medicine Institute, Duke University School of Medicine, Durham, North Carolina; 2 Group Health Research Institute, Seattle, Washington; 3 University of Colorado, Denver, Denver, Colorado; 4 Duke University School of Nursing, Durham, North Carolina; 5 Duke Clinical Research Institute, Duke University School of Medicine, Durham, North Carolina

This work was supported by a cooperative agreement (U54 AT007748) from the NIH Common Fund for the NIH Health Care Systems Research Collaboratory. The views presented here are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.

Address for correspondence:
Meredith N. Zozus, PhD
Associate Director for Clinical Research Informatics
Duke Translational Medicine Institute
DUMC Box 3850
Durham, NC
Tel: (919)
Table of Contents

Objective
The NIH Health Care Systems Research Collaboratory
Data Quality Assessment Background
Data Quality Assessment Dimensions
  Completeness
  Accuracy
  Consistency
Data Quality Assessment Recommendations for Collaboratory Projects
  Recommendation 1 - Key data quality dimensions
  Recommendation 2 - Description of formal assessments
  Recommendation 3 - Reporting data quality assessment with research results
Use of workflow and data flow diagrams to inform data quality assessment
Concluding Remarks
References
Appendix I
  Defining data quality
  Defining the quality of research data
  Data quality related review criteria
    Criterion 1: Are data collection methods adequately validated?
    Criterion 2: Validated methods for the electronic health record information?
    Criterion 3: Demonstrated quality assurance and harmonization of data elements across healthcare systems/sites?
    Criterion 4: Are plans adequate for data quality control during the UH3 phase?
  References
Appendix II: Data Quality Assessment Plan Inventory
Appendix III: Initial Data Quality Assessment Recommendations for Collaboratory Projects
  Testing the recommendations with the STOP CRC project
  Summary of findings from testing with the STOP CRC project
  References
Objective

Quality assessment of healthcare data used in clinical research is a developing area of inquiry. The methods used to assess healthcare data quality in practice are varied, and evidence-based or consensus best practices have yet to emerge.1 Further, healthcare data have long been criticized for a plethora of quality problems. To establish credibility, studies that use healthcare data are increasingly expected to demonstrate that the quality of the data is adequate to support research conclusions.

Pragmatic clinical trials (PCTs) in healthcare settings rely upon data generated during routine patient care to support the identification of individual research subjects or cohorts as well as outcomes. Knowing whether data are accurate depends on some comparison, e.g., comparison to a source of truth or to an independent source of data. Estimating an error or discrepancy rate, of course, requires a representative sample for the comparison. Assessing variability in the error or discrepancy rates between multiple clinical research sites likewise requires a sufficient sample from each site. In cases where the data used for the comparison are available electronically, the cost of data quality assessment is largely based on the time required for programming and statistical analysis. However, when labor-intensive methods such as manual review of patient charts are used, the cost is considerably higher. The cost of rigorous data quality assessment may in some cases present a barrier to conducting PCTs. For this reason, we seek to highlight the need for more cost-effective methods for assessing data quality.

PRAGMATIC CLINICAL TRIAL (PCT): We use the definition articulated by the Clinical and Translational Science Awards pragmatic clinical trials infrastructure (PCTi) workshop: "A prospective comparison of a community, clinical, or system-level intervention and a relevant comparator in participants who are similar to those affected by the condition(s) under study and in settings that are similar to those in which the condition is typically treated."2

Because of the potential cost implications and the fear of taking the "pragmatic" out of PCTs, we find it difficult to make these recommendations. However, the principles underlying recommendations for applying data quality assessment to research that uses healthcare data are irrefutable. The credibility and reproducibility of research depend on the investigator's demonstration that the data on which conclusions are based are of sufficient quality to support them. Thus, the objective of this document is to provide guidance, based on the best available evidence and practice, for assessing data quality in PCTs conducted through the National Institutes of Health (NIH) Health Care Systems Research Collaboratory.

The NIH Health Care Systems Research Collaboratory

The NIH Health Care Systems Research Collaboratory (or "Collaboratory") is intended to improve the way clinical trials are conducted by creating new approaches, infrastructure, and methods for collaborative research.
To develop and demonstrate these methods, the Collaboratory also supports the design and rapid execution of high-impact PCT Demonstration Projects that 1) address questions of major public health importance and 2) engage healthcare delivery systems in research partnership. Organizationally, the Collaboratory comprises a series of these Demonstration Projects, funded for 1 planning year with competitive renewal to allow transition into actual trial conduct, and a Coordinating Center to provide support for these efforts. Within the Coordinating Center, seven Working Groups/Cores serve to identify, develop, and promote solutions for issues central to conducting PCTs: 1) electronic health record use in research; 2) phenotypes, data standards, and data quality; 3) patient-reported outcomes; 4) healthcare system interactions; 5) regulatory and ethical issues; 6) biostatistics and study design; and 7) stakeholder engagement. The Cores have the bidirectional objectives of promoting the exchange of information on methods and approaches among Demonstration Projects and the Coordinating Center, as well as synthesizing and disseminating best practices derived from Demonstration Project experiences to the larger research community. Supported by the NIH Common Fund, the Collaboratory's ultimate goal is to ensure that healthcare providers and patients can make decisions based on the best available clinical evidence.

The Collaboratory provides an opportunity to observe data quality assessment plans and practices for PCTs conducted in healthcare settings. The Collaboratory's Phenotypes, Data Standards, and Data Quality (PDSDQ) Core3 includes representatives from the Collaboratory Coordinating Center and Demonstration Projects, researchers with related interests, and NIH staff. In keeping with the bidirectional goals of the PDSDQ Core, an action research paradigm was used in which the Core interacted with Demonstration Projects, observed data quality assessment plans and practices, participated where invited, and synthesized experience to generalize information for others embarking on similar research. We report here the observations and iteratively developed (and still-evolving) data quality assessment methodology from the initial planning grant year for the Collaboratory's first seven Demonstration Projects. These results have been vetted by the PDSDQ Core and other Collaboratory participants and represent the experience of this group at the time of development; however, they do not represent official NIH opinions or positions.

Data Quality Assessment Background

Depending on the scientific question posed by a given study, PCTs may rely on data generated during routine care or on data collected prospectively for the study. Therefore, data quality assurance and assessment for such studies necessarily includes methods for both situations: 1) collection of data specifically for a study, where the researcher is able to influence or control the original data collection and 2) use of data generated in routine care, where the researcher has little or no control over the data collection. For the former, significant guidance is available via the Good Clinical Data Management Practices (GCDMP) document,4 and we do not further discuss quality assurance and assessment methods for these types of prospective research data. Instead, this guidance will focus on the use or reuse of data generated from routine patient care, based on the following:
1. Existing literature on data quality assessment for healthcare data that are re-used for research
2. Experience during the first year of the Collaboratory

In this document, we rely on a multidimensional definition of data quality. The dimensions of accuracy and completeness are the most commonly assessed in health-related research.5 A recent review identified five dimensions that have been assessed in electronic health record (EHR) data used for research; they include completeness, correctness, concordance, plausibility, and currency.6 Accuracy, completeness, and consistency (Table 1) most closely affect the capacity of data to support research conclusions and are therefore the focus of our discussion here. A brief review of the literature on defining data quality is provided in Appendix I, and the specific dimensions used here are defined below. Unfortunately, definitions of data quality dimensions are highly variable in the literature. The sections below outline conceptual definitions of these dimensions followed by operational examples.

Table 1. Data Quality Dimensions Determining Fitness for Use of Research Data

Completeness
  Conceptual definition: Presence of the necessary data
  Operational examples: Presence of necessary data elements; percent of missing values for a data element; percent of records with sufficient data to calculate a required variable (e.g., an outcome)

Accuracy
  Conceptual definition: Closeness of agreement between a data value and the true value*
  Operational examples: Percent of data values found to be in error based on a gold standard; percent of physically implausible values; percent of data values that do not conform to range expectations

Consistency
  Conceptual definition: Relevant uniformity in data across clinical investigation sites, facilities, departments, units within a facility, providers, or other assessors
  Operational examples: Comparable proportions of relevant diagnoses across sites; comparable proportions of documented order fulfillment (e.g., returned procedure report for ordered diagnostic tests)

*Consistent with the International Organization for Standardization (ISO) 8000 Part 2 definition of accuracy,7 we replaced "property value" in the ISO 8000 definition with "data value" for consistency with the language used in clinical research.

Based on the literature relevant to data quality assessment in the secondary use of EHR data and our experience thus far with the Collaboratory (described in Appendices II and III), we offer a set of data quality assessment recommendations for Collaboratory projects. First, we summarize important dimensions, common or reported approaches to characterizing them, and characteristics of an ideal operationalization. Next, we present specific recommendations for researchers using data generated from routine care.

Data Quality Assessment Dimensions

Completeness

Conceptually, completeness is the presence of necessary data. The operationalization of completeness presented below was adapted from recent theoretical work by Weiskopf et al.,8 in which a comprehensive assessment of completeness covers four mutually exclusive areas:
1. Data element completeness: The presence of all necessary variables in a candidate dataset; i.e., are the right columns present? Data element completeness is assessed by examining metadata, such as a data dictionary or list of data elements contained in a dataset and their accompanying definitions, and comparing this information against the variables required in the analytic or statistical plan. With adequate data documentation, data element completeness can be assessed without examining any data values.

2. Column data value completeness: The percentage of data values present for each data element. Note, however, that often (as in normalized structures) more than one data element may be stored in a database column. The word "column" is used to help the reader visualize the concept and because normalized data structures are often flattened to a 1-column-per-data-element format to generate and report data quality related statistics. Column data value completeness is assessed by structuring the dataset in a 1-column-per-data-element format and calculating the percentage of non-missing data for each column, with "non-missing" defined as not null and not otherwise coded to a null flavor. Null flavors (e.g., not applicable, not done) are defined in the International Organization for Standardization (ISO)9 and Health Level Seven International (HL7)10 data type definition standards.

3. Ascertainment completeness: The percentage of eligible cases present; i.e., do you have the right rows in the dataset? Ascertainment usually cannot be verified with absolute certainty. Assessment options are typically comparison based and include but are not limited to: 1) chart review in a representative sample and 2) comparison to one or more independent data sources covering the same population or a subset of that population. Ascertainment completeness is affected by data quality problems, by phenotype definition and execution, and by factors that bias membership of a dataset. Other issues commonly evaluated in an ascertainment assessment include the presence and extent of duplicate records and records for patients that do not exist (for example: an error in the medical record number creates a new case; a patient gives a name other than his or her own), or duplicate events such as a single procedure being documented more than once. Ascertainment completeness and phenotype validation significantly overlap in goals and can be accomplished together.

4. Row data value completeness: The percentage of cases/patients with sufficient data values present for a given data use. Row data value presence is assessed using study-specific algorithms programmed to calculate the percentage of cases with all data or with study-relevant combinations of missing and non-missing data (e.g., in the case of body mass index [BMI], the percent missing of either weight OR height might be calculated, because missing either data point renders the case unusable for calculating BMI). A computational sketch of these checks is shown after this list.
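To illustrate, the following is a minimal sketch of data element, column data value, and row data value completeness checks in Python with pandas. The data frame, column names, and null-flavor codes are hypothetical stand-ins; the null-flavor set shown is not the actual ISO/HL7 vocabulary.

import numpy as np
import pandas as pd

NULL_FLAVORS = {"NI", "NA", "NASK"}  # hypothetical null-flavor codes

df = pd.DataFrame({
    "weight_kg": [70.0, np.nan, 82.5, 91.0],
    "height_cm": [175.0, 160.0, np.nan, 180.0],
    "smoking":   ["Y", "NASK", "N", "Y"],
})

# 1. Data element completeness: are the required columns present at all?
required = {"weight_kg", "height_cm", "smoking", "sbp"}
print("missing data elements:", required - set(df.columns))  # {'sbp'}

# 2. Column data value completeness: percent non-missing per column,
#    counting null flavors as missing.
missing = df.isna() | df.isin(NULL_FLAVORS)
print((100 * (1 - missing.mean())).round(1))

# 4. Row data value completeness for BMI: both weight and height needed.
usable = df["weight_kg"].notna() & df["height_cm"].notna()
print(f"rows usable for BMI: {100 * usable.mean():.0f}%")  # 50%

Ascertainment completeness (area 3) is omitted from the sketch because it requires an external comparison such as chart review or an independent data source.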
A comprehensive completeness assessment consists of all four components. In terms of effort, data element completeness is accomplished through a review of data elements available in a data source, and column data value completeness and row data value completeness are straightforward computational activities. Ascertainment completeness, however, can be a resource-intensive task (e.g., chart review on a representative sample; electronic comparison among several data sources). Additional guidance and discussion regarding data completeness in the setting of EHR data extracted for pragmatic clinical research is available elsewhere.11

Completeness, although necessary to establish fitness for use in clinical research, is not sufficient to evaluate the capacity of a dataset to support research conclusions. Assessment of accuracy and consistency are also necessary.

Accuracy

In keeping with ISO 8000 standards,7 we define data accuracy as the property exhibited by a data value when it reflects the true state of the world at the stated or implied point of assessment. It follows that an inaccurate or errant datum does not reflect the true state of the world at the stated or implied point of assessment.12 Data errors are instances of inaccuracy.

DATA ACCURACY: The closeness of agreement between a data value and the true value. (adapted from ISO 8000)

Detection of data errors is accomplished through comparison; for example, comparison of a dataset to some other source of information (Figure 1). The comparison may be between the data value and a source of truth, a known standard, a set of valid values, a redundant measurement, independently collected data for the same concept, an upstream data source, a validated indicator of possible errors, or aggregate statistics. As the source for comparison moves farther from a source of truth, we move from identification of data errors to indications that a datum may be in error.

We use the term "error" to denote any deviation from accuracy regardless of the cause. For example, a programming problem in data transformation that renders an originally accurate value incorrect is considered to have caused a data error. Because data are subject to multiple processing steps, some count the number of errors (consider, for instance, a data value that has sustained two problems that each would have individually caused an error, for a total of two errors). From an outcomes perspective, it is the number of fields in error that matters rather than the number of errors; thus, in data quality assessment, the number of data values in error is counted rather than the number of errors. Different agreement statistics may be applied depending on whether the source of comparison is considered a source of truth or gold standard versus an independent source of information.

INACCURACY/DATA ERROR: Any discrepancy that cannot be explained by documentation.

Operationally, an instance of inaccuracy or data error is any discrepancy identified through such a comparison that cannot be explained by documentation.12,13
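As an illustration of comparison-based discrepancy detection, the sketch below compares a research extract against an independently managed source, field by field, and reports the proportion of data values in discrepancy (counting fields in error rather than errors). The tables, identifiers, and field names are hypothetical, and the record linkage is assumed to be already resolved.

import pandas as pd

extract = pd.DataFrame({"patient_id": [1, 2, 3],
                        "sbp":       [120, 135, 150],
                        "dx_code":   ["I10", "E11", "I10"]})
source = pd.DataFrame({"patient_id": [1, 2, 3],
                       "sbp":       [120, 138, 150],
                       "dx_code":   ["I10", "E11", "I25"]})

merged = extract.merge(source, on="patient_id", suffixes=("_ext", "_src"))
fields = ["sbp", "dx_code"]
# Count data values in discrepancy: one per field, per record.
n_discrepant = sum((merged[f + "_ext"] != merged[f + "_src"]).sum() for f in fields)
n_total = len(merged) * len(fields)
print(f"discrepancy rate: {n_discrepant}/{n_total} = {100 * n_discrepant / n_total:.1f}%")
# Discrepancies become errors only if documentation (e.g., a recorded
# transformation or imputation) cannot explain the difference.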
The caveat "not explained by documentation" is important because efforts to identify data discrepancies (i.e., potential errors) can be undertaken on data at different stages of processing. Such processing sometimes includes transformations of the data, such as imputations, that purposefully change the value. In these cases, a data consumer should expect the changes to be supported by documentation and be traceable through all of the data processing steps.

Accuracy has been described in terms of two basic concepts: 1) representational adequacy/inadequacy, defined as the extent to which an operationalization is consistent with/differs from the desired concept (validity), including but not limited to imprecision or semantic variability hampering interpretation of data, and 2) information loss and degradation, including but not limited to reliability, change over time, and error.14

REPRESENTATIONAL INADEQUACY: The degree to which a data element differs from the desired concept.

Representational inadequacy is the degree to which a data element differs from the desired concept. For example, a researcher seeking obese patients for a study uses BMI to define the obesity phenotype, knowing that a small percentage of bulky but lean bodybuilders may be included. Representational inadequacy is best addressed at the point in research design when data elements and sources are selected. Representational inadequacy can be affected by local work and data flows of data elements used in a study, e.g., differences in local coding practices causing differences in datasets across institutions. Thus, harmonization of data elements across sites is emphasized in NIH review criteria for Collaboratory PCTs (Appendix I). Documenting work and data flows for each data element, from the point of origin to the analysis dataset (traceability), has long been required in regulated research,4 reported as a best practice in the information quality literature, and implemented in healthcare settings.15 Comparisons of data definitions, workflows, and data flows across research sites are as important in assessing representational inadequacy of healthcare data as is the use of validated questionnaires in assessing subjective concepts. Some differences in workflow will not affect representation, while others may; the only way to know is to understand the workflow at each site and evaluate its effect, if any, on representation. Such documentation for data collected in healthcare settings may not be as precise as that for clinical trial data collection processes. For example, it can be difficult to assess the data capture process of patient-reported outcomes (PROs) in healthcare settings due to differences in individual departments, clinics, and hospitals within an individual healthcare organization. The workflow can also vary over time as refinements are made. Results of such assessments for representational inadequacy are often qualitative and used either as formative assessments in research design or to describe limitations in reported results. Comparisons of aggregate or distributional statistics (e.g., marginals), as performed by the Observational Medical Outcomes Partnership (OMOP),16 have also been used to identify representational variations in datasets caused by differences in local practice among the institutions providing data.16 Using both process-oriented and data-based approaches in concert to confirm representational adequacy of data elements is recommended.
A process-oriented approach may be used at the time of site selection; once consistency is confirmed, a data-based approach may be used to monitor consistency during the study.

Information loss or degradation is the loss of information content over time and can arise from errors or purposeful decisions in data collection and processing (for example: data reduction such as interval data collected as ordinal data; separation of data values from contextual data elements; or data values that lose accuracy or relevance over time). Information loss and degradation may be prevented or mitigated by decisions made during research design. Because such errors and omissions are sensitive to many organizational factors (e.g., local clinical documentation practices, mapping decisions made for warehoused data), they should be assessed for any data source. Thus, workflow and data flow documentation also help to assess sources of information loss and degradation.15

Assessing data accuracy, primarily with regard to information loss and degradation, involves comparisons, either of individual values (as is commonly done in clinical trials14 and registries5) or of aggregate or distributional statistics.14

1. Individual value comparisons: At the individual-value level, the comparison could be to the truth (if known), to an independent measurement, to a validated indicator, or to valid (expected, physically plausible, or logically consistent) values.14,17,19 In practice, the options for comparison (Figure 1) represent a continuum from truth to measurements of lesser proximity to the truth, such as an accepted gold standard or valid values. Thus, accuracy assessment usually provides a disagreement rate and, much less often, an actual error rate. Further, in some prospective settings,4 the identification of data discrepancies is done for the purpose of resolving them; in other settings, where data correction is not possible, data discrepancies are identified for the purpose of reporting a data discrepancy rate or to inform statistical analysis.

2. Aggregate and distributional comparisons: Aggregate and distributional comparisons (such as frequency counts or measures of central tendency and dispersion) can be used as surrogate accuracy assessments. For example, differences in aggregate or distributional measures between a research dataset and an independent data source with a similar population may indicate possible data discrepancies, while similar measures would increase confidence in the research data. Differences in central tendency and dispersion measures in age or a socioeconomic status measure may indicate significant differences in the populations in two datasets. Aggregate and distributional comparisons can also be performed within a dataset, between multiple sites in a multicenter study,17,18 or between subgroups as measures of consistency. A sketch of such a comparison follows below.

In the absence of a source of truth, comprehensive accuracy assessment of multisite studies includes use of individual value, aggregate, and distributional measures.17 To emphasize the importance of these within- and between-dataset comparisons, a third dimension, consistency (described below), was added. The difference between the two dimensions here lies not in the measures, but in the purpose of the comparisons and in the choice of data on which to run them.
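The following is a minimal sketch of an aggregate and distributional comparison between a research dataset and an independent source on a single variable. Both samples are simulated stand-ins; a large divergence flags a possible discrepancy to investigate, not a confirmed error.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
research_age = rng.normal(54, 12, 5000)    # research dataset (simulated)
reference_age = rng.normal(55, 12, 8000)   # independent source (simulated)

# Central tendency and dispersion.
for name, x in [("research", research_age), ("reference", reference_age)]:
    print(f"{name}: mean={x.mean():.1f}, sd={x.std():.1f}")

# Two-sample Kolmogorov-Smirnov test compares the full distributions.
ks = stats.ks_2samp(research_age, reference_age)
print(f"KS statistic={ks.statistic:.3f}, p={ks.pvalue:.3g}")
# A large statistic indicates the populations (or their data) differ;
# it does not by itself identify which values are in error.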
An accuracy assessment requires selecting a source for comparison, making the comparison, and then quantifying the results. In Figure 1, sources for comparison are listed in descending order of their proximity to truth. If there are multiple options, sources for comparison toward the top of the list in Figure 1 are preferred because they are closer to the truth. Thus, sources for comparison toward the top provide quantitative assessments of accuracy, whereas sources for comparison in the middle provide partial measures of accuracy and, depending on the data source used for the comparison, may enable identification of errors or may only indicate discrepancies. Sources for comparison toward the bottom identify only data discrepancies, i.e., items that may or may not represent an actual error. For example, if it has been shown that a percentage of missing values is inversely correlated with data accuracy, then percent missing may be an indicator of lower accuracy.

The hierarchy of sources for comparison shown in Figure 1 provides a list of possible comparisons ranging (from bottom to top) from those that are achievable in every situation but provide less information about true data accuracy, to the ideal but rarely achievable case that provides an actual data error rate. This hierarchy simplifies the selection of sources for comparison: where more than one source for comparison exists, the highest practical comparison in the list should be used (a sketch of this selection rule follows the figure).

Figure 1. Data Accuracy Assessment Comparison Hierarchy. Sources for comparison, in descending order of proximity to truth:
1. Comparison to a source of truth
2. Comparison to an independent measurement
3. Comparison to independently managed data
4. Comparison to an upstream data source
5. Comparison to a known standard
6. Comparison to valid values
7. Comparison to validated indicators
8. Comparison to aggregate statistics
(The figure bands these sources, from top to bottom, as accuracy, partial accuracy, discrepancy detection, and gestalt.) Comparison of data to sources listed above the top line provides full assessment of data accuracy; sources listed below the top line provide only partial assessments of accuracy. Sources above the bottom line can be used to detect actual data discrepancies, whereas sources below the bottom line can only indicate that discrepancies may exist. Items at the top of the list identify actual errors, whereas items in the middle only identify discrepancies that may or may not in fact be an error. Items toward the bottom merely indicate that discrepancies may exist.
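As a small illustration of the selection rule, the sketch below encodes the Figure 1 ordering and returns the highest available comparison. The source labels are informal shorthand, not a standard vocabulary.

# Minimal sketch of the "highest practical comparison" rule from Figure 1.
HIERARCHY = [
    "source of truth",
    "independent measurement",
    "independently managed data",
    "upstream data source",
    "known standard",
    "valid values",
    "validated indicators",
    "aggregate statistics",
]

def best_comparison(available):
    """Return the available source for comparison closest to truth."""
    for source in HIERARCHY:
        if source in available:
            return source
    raise ValueError("no source for comparison available")

# Example: a site offers range checks and linked claims data.
print(best_comparison({"valid values", "independently managed data"}))
# -> independently managed data (preferred over valid-value checks)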
The strength of the accuracy assessment depends not only on the proximity to truth of the source for comparison, but also on the importance and number of data elements for which accuracy can be assessed. Accuracy assessments are often performed on subsets of data elements or subsets of the population, rather than across the whole dataset. Common subsets assessed include data elements used in subject or cohort identification, data elements used to derive clinical outcomes, and patients for whom an independent source of data (such as registry or Medicare claims data) is readily available for comparison. Accuracy assessments should be done for cohort identification data elements, outcome data elements, and covariates. Accuracy assessments for a given study may use different sources for comparison.

Comparisons for data accuracy assessments will likely differ based on the underlying nature of the phenomena about which the data values were collected. Examples of different phenomena include anatomic or pathologic phenomena, physiologic or functional phenomena, imaging or laboratory findings, patients' symptomatic experiences, and patients' behaviors or functioning. The data values collected about these phenomena may be the result of inherently different processes, including but not limited to measurement of a physical quantity, direct observation, clinical interpretation of available information, asking patients directly, or psychometric measurements. These are not complete lists, and we do not provide a deterministic map of phenomena and measurement processes to associated error sources. We simply note that different phenomena and measurement or collection processes are sometimes characteristically prone to different sources of error. Such associations should be considered when data elements and comparisons for data quality assessment are chosen.

Consistency

CONSISTENCY: Relevant uniformity in data across clinical investigation sites, facilities, departments, units within a facility, providers, or other assessors.

Consistency is defined here as relevant uniformity in data across clinical investigation sites, facilities, departments, units within a facility, providers, or other assessors. Inconsistencies, therefore, are instances of difference. In other frameworks,18,20,21 the label "consistency" is used for several different things, such as uniformity of data over time or conformance of data values to other values in the dataset (e.g., gender correctness of gender-specific diagnoses and procedures, procedure dates before discharge date). Here, we view these valid value comparisons as surrogate indicators of accuracy (Figure 1).

There are many ways that data can be inconsistent; for example, clinical documentation policies or practices may vary over time within a facility, between facilities, or between individuals in a facility. Consider a study where the outcome measure is whether or not patient behavior regarding medication-taking changes. If some sites document filled prescriptions from pharmacy data sources while others rely on patient reporting, the outcome measure would be inconsistent between the sites. Actions should be taken to improve similarity in documentation or to use other documentation that is common across all sites. Otherwise, such inconsistencies may introduce bias and affect the capacity of the data to support study conclusions. Thus, the consistency dimension comes into play particularly in multisite or multifacility studies and when such differences may exist in clinical documentation, data collection, or data handling within a study. A sketch of a simple between-site consistency check follows.
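The following is a minimal sketch of one such check: comparing the proportion of an outcome-relevant diagnosis across sites with a chi-square test of homogeneity. The site counts are hypothetical.

import numpy as np
from scipy.stats import chi2_contingency

# rows = sites, columns = [patients with diagnosis, patients without]
counts = np.array([
    [120, 880],   # site A
    [135, 865],   # site B
    [ 60, 940],   # site C: notably lower proportion; worth investigating
])

chi2, p, dof, _ = chi2_contingency(counts)
proportions = counts[:, 0] / counts.sum(axis=1)
print("per-site proportions:", np.round(proportions, 3))
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.3g}")
# A small p-value flags inconsistency (e.g., differing documentation
# practices), not necessarily a data error at any one site.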
Comparisons of multisite data over time to examine expected and unexpected changes in aggregate or distributional data can also be useful. For example, changes in EHR systems, such as new data being captured, data no longer being captured, or even implementation of a new system, are commonplace and affect data. Assessing consistency during a study (data quality monitoring) is the only way to ensure that such changes will be detected.

Targeted consistency assessments are important during the feasibility-assessment phase of study planning. For example, to ascertain whether data are sufficiently consistent across facilities to support a proposed study, consistency assessments may be operationalized by qualitative assessments such as review of clinical documentation policies and procedures, interviews with facilities covering clinical documentation procedures and practice, or direct observation of workflow. Initial consistency checks can also be established using aggregate or distributional statistics. Once data collection has started, consistency should be monitored over time or across individuals, units, or facilities by aggregate or distributional statistics. OMOP16 and Mini-Sentinel18 both provide publicly available consistency checks that are executable against the OMOP and Mini-Sentinel common data models, respectively. Although PCTs are less likely to utilize a common data model, the OMOP and Mini-Sentinel programs provide excellent examples of checks that can be used to evaluate consistency across investigational sites, facilities, departments, clinical units, providers, or assessors in PCTs. As with accuracy assessments, consistency assessments should be conducted for data elements used in subject or cohort identification, outcome data elements, and covariates.

Data Quality Assessment Recommendations for Collaboratory Projects

We have defined critical components of data quality assessment for research using data generated in healthcare settings that we consider to be necessary in demonstrating the capacity of data to support research conclusions. Our recommendations below for data quality assessment for Collaboratory research projects are based on these key components:

Recommendation 1 - Key data quality dimensions

We recommend that accuracy, completeness, and consistency be formally assessed for data elements used in subject identification, outcome measures, and important covariates.

Recommendation 2 - Description of formal assessments

1. Completeness assessment recommendation: Use of a four-part completeness assessment. The same column and data value completeness measures can be employed for monitoring completeness throughout the project. The completeness assessment applies to both prospectively collected and secondary use data. Additional requirements suggested by the GCDMP, such as on-screen prompts for missing data where appropriate, apply to data collected prospectively for a study.

2. Accuracy assessment recommendation: Identification and conduct of project-specific accuracy assessments for subject/cohort identification data elements, outcome data elements, and covariates.
The highest practical accuracy assessment in the hierarchy shown in Figure 1 should be used. The same measures may be applicable for monitoring data accuracy throughout the project. Additional requirements suggested by the GCDMP, such as on-screen prompts for inconsistent data where appropriate, apply to prospectively collected data.

3. Consistency assessment recommendation: Identification of: a) areas where differences in clinical documentation, data collection, or data handling may exist between individuals, units, facilities, sites, or assessors, or over time, and b) measures to assess consistency and monitor it throughout the project. A systematic approach to identifying candidate consistency assessments should be used. Such an approach will likely be based on review of available data sources, accompanied by an approach for systematically identifying and evaluating the likelihood and impact of possible inconsistencies. This recommendation applies to both prospectively collected data and secondary use data.

4. Impact assessment recommendation: Use of completeness, accuracy, and consistency assessment results by the project statistician to test sensitivity of the analyses to anticipated or identified data quality problems, including a plan for reassessing based on results of data quality monitoring throughout the project.

Recommendation 3 - Reporting data quality assessment with research results

As recommended elsewhere, results of data quality assessments should be reported with research results.1,22 Data quality assessments are the only way to demonstrate that data quality is sufficient to support the research conclusions. Thus, data quality assessment results must be accessible to consumers of research.

Use of workflow and data flow diagrams to inform data quality assessment

In our initial recommendations (Appendix III), we encouraged the creation and use of data flow and workflow diagrams to aid in identifying accuracy and in conducting consistency assessments; however, this strategy has both advantages and disadvantages. Among the advantages is that the diagrams are helpful in other aspects of operationalizing a research project and in managing institutional information architecture. Thus, they may already exist, and if not, they will likely be used for other purposes. Understanding workflow around clinical documentation of cohort identifiers, outcomes data, and covariates is necessary for assessing potential inconsistencies between sites. Workflow knowledge is also required in cases where the clinical workflow will be modified for the research, e.g., collecting study-specific data within clinical processes or using routine clinical data to trigger research activities. In the Collaboratory STOP CRC Demonstration Project, documentation of a patient's colonoscopy turns off further fecal occult blood test screening interventions for a period of time. Logic decisions similar to these would be clearly documented in the workflow and data flow analysis. On our test project, the process of creating and reviewing the diagrams prompted discussion of potential data quality issues as well as strategies for prevention or mitigation of problems.
Alternatively, if workflow diagrams do not exist for a facility, creation of these diagrams solely for the purpose of such an analysis may not be feasible. Consider a study with 30 small participating investigational sites from different institutions. Creation of workflow and data flow diagrams de novo for a study would consume significant resources. In such cases, where the effort associated with creating and reviewing such diagrams is not practical, we offer the following set of questions that could be reviewed with personnel at each facility. These questions were developed based on our experience with the testing of the initial recommendations.

1. Talk through each of the data elements used for cohort identification. Can you explain how and where each one is documented in the clinic/on the unit (i.e., what information system, what screen, at what point in the clinical process, and by whom)?

2. When you send us the data or connect data to a federated system, what data store will you create/use? Importantly, please describe all data transformation between the source system and the data store used for this research.

3. For each data element used in the cohort identification, do you know of any difference in data capture or clinical documentation practices across clinics at your site or for different subsets of your population?

4. For each data element used in cohort identification, do you know of any subsets of data that may be documented differently, such as data from specialist or hospital reports external to your group versus data from your practice, or internal laboratory data from analyzers on site versus those that you receive from external clinical laboratories?

The four questions above should also be applied to other important data elements such as outcome measures and covariates.

Concluding Remarks

Moving forward, attention to data quality will be critical and increasingly expected, as in the case of the data validation review criteria for the Collaboratory. Although generalized computational approaches have shown great promise in large national initiatives such as Mini-Sentinel and OMOP, they are currently dependent on the existence of a common data model. However, as healthcare institutions across the country embark upon data governance initiatives, and as standard data elements become a reality for healthcare and health-related research, more and better machine-readable metadata are becoming available. Ongoing research in this arena will work toward leveraging this information to increase automation of data quality assessment and create metadata-driven, next-generation approaches to computational data quality assessment.
References

1. Brown JS, Kahn M, Toh S. Data quality assessment for comparative effectiveness research in distributed data networks. Med Care 2013;51(8 suppl 3):S22-S29. doi:10.1097/MLR.0b013e31829b1e2c.
2. Saltz J. Report on Pragmatic Clinical Trials Infrastructure Workshop.
3. Richesson RL, Hammond WE, Nahm M, et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc 2013;20:e226-e231.
4. Society for Clinical Data Management. Good Clinical Data Management Practices (GCDMP).
5. Arts DG, De Keizer NF, Scheffer GJ. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J Am Med Inform Assoc 2002;9:600-611.
6. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013;20:144-151.
7. International Organization for Standardization. ISO 8000-2:2012(E) Data Quality Part 2: Vocabulary. 1st ed. June 15, 2012.
8. Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform 2013;46:830-836.
9. International Organization for Standardization. ISO 21090 Health Informatics Harmonized Data Types for Information Interchange.
10. Health Level Seven International. HL7 Data Type Definition Standards.
11. NIH Health Care Systems Collaboratory Biostatistics and Study Design Core. Key issues in extracting usable data from electronic health records for pragmatic clinical trials. Version 1.0 (June 26, 2014).
12. Nahm M, Bonner J, Reed PL, Howard K. Determinants of accuracy in the context of clinical study data. International Conference on Information Quality (ICIQ), Paris, France, November 2012.
13. Nahm ML, Pieper CF, Cunningham MM. Quantifying data quality for clinical trials using electronic data capture. PLoS ONE 2008;3:e3049. doi:10.1371/journal.pone.0003049.
14. Tcheng J, Nahm M, Fendt K. Data quality issues and the electronic health record. Drug Information Association Global Forum 2010;2.
15. Davidson B, Lee WY, Wang R. Developing data production maps: meeting patient discharge data submission requirements. International Journal of Healthcare Technology and Management 2004;6.
16. Observational Medical Outcomes Partnership. Generalized Review of OSCAR Unified Checking.
17. Kahn MG, Raebel MA, Glanz JM, Riedlinger K, Steiner JF. A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Med Care 2012;50(suppl):S21-S29.
18. Mini-Sentinel Operations Center. Mini-Sentinel Common Data Model Data Quality Review and Characterization Process and Programs.
19. Brown PJ, Warmington V. Data quality probes - exploiting and improving the quality of electronic patient record data and patient care. Int J Med Inform 2002;68:91-98.
20. Weiskopf NG. Enabling the Reuse of Electronic Health Record Data through Data Quality Assessment and Transparency. Doctoral dissertation, Columbia University.
21. Sebastian-Coleman L. Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework. Waltham, MA: Morgan Kaufmann (Elsevier); 2013.
22. Kahn MG, Brown J, Chun A, et al. A consensus-based data quality reporting framework for observational healthcare data. Submitted to eGEMs Journal.
Appendix I

Defining data quality

The ISO 8000 series of standards focuses on data quality.1 Quality is defined as "the degree to which a set of inherent characteristics fulfills requirements."2 Thus, data quality is the degree to which a set of inherent characteristics of the data fulfills requirements for the data. Describing data quality in terms of characteristics inherent to data means that we subscribe to a multidimensional conceptualization of data quality.3 Briefly, these inherent characteristics, also called dimensions of data quality, include concepts such as accuracy, relevance, accessibility, contemporaneity, timeliness, and completeness. The initial work establishing the multidimensional conceptualization of data quality identified over 200 dimensions in use across surveyed organizations from different industries.4 For most data uses, only a handful of dimensions are deemed important enough to formally measure and assess. The dimensions measured in data quality assessment should be those necessary to indicate fitness of the data for a particular use. In summary, data quality is assessed by identifying important dimensions and measuring them.

Defining the quality of research data

The Collaboratory has embraced the definition of quality data from the 1999 Institute of Medicine Workshop Report titled "Assuring Data Quality and Validity in Clinical Trials for Regulatory Decision Making,"5 in which fitness for use (i.e., quality data) in clinical research is defined as data that sufficiently support conclusions and interpretations equivalent to those derived from error-free data.5 The job, then, of assessing the quality of research data begins with identifying those aspects of data that bear most heavily on the capacity of the data to support conclusions drawn from the research.

Immediately prior to the April 2013 Collaboratory Steering Committee meeting, the program office released the review criteria for Demonstration Projects applying for funds for trial conduct:

Criterion 1: Are data collection methods adequately validated?
Criterion 2: Validated methods for the electronic health record information?
Criterion 3: Demonstrated quality assurance and harmonization of data elements across healthcare systems/sites?
Criterion 4: Plans adequate for data quality control during the UH3 (trial conduct) phase?

In keeping with the Institute of Medicine definition of quality data, the goal of these requirements is to provide reasonable assurance that data used for Collaboratory Demonstration Projects are capable of supporting the research conclusions.
The requirements were not further defined at the time of release. To aid in operationalizing data quality assessment, the PDSDQ Core drafted definitions for each criterion. These draft definitions (provided below) reflect the consensus of the Core and do not necessarily represent the opinions or official positions of the NIH. Briefly, Criterion 1 pertains to data prospectively collected for research only (i.e., in addition to data generated in routine care). Criterion 2 applies to data generated in routine care. Criterion 3 pertains to a priori activities to assure consistency in data collection and clinical documentation across clinical sites. Criterion 4 requires plans to assess and control data quality throughout trial conduct. The criteria can be decomposed into data quality activities and data sources to which they apply (Figure A1). The third axis of consideration is the data quality dimensions important for a given study.

Figure A1. Graphic Representation of Review Criteria. The figure crosses the data quality activities (validation of data collection methods, data quality assurance, harmonization of data elements, and data quality control) with the data sources to which they apply (data from routine care and data collected solely for the research).

Historically, in clinical trials conducted for regulatory review for marketing authorization, identification of data discrepancies was followed by a communication back to the source of the data in an attempt to ascertain the correct value.6 This process of identifying and resolving data discrepancies is a type of data cleaning. Correction of data discrepancies is best applied to prospective trials with prospectively collected data. As described above, some trials conducted in healthcare settings will collect add-on data (i.e., data necessary for the research that are not captured in routine care). Our initial Demonstration Project data quality assessment inventory (data available upon request) confirmed that multiple Demonstration Projects are collecting prospective data. Five Demonstration Projects planned to collect PROs, and one added screens in the local EHR to capture study-specific data. All projects also used routine care data and administrative data. Details of the Demonstration Project data quality assessment inventory are provided in Appendix II.

Data quality related review criteria

The following four UH3 review criteria (February 12, 2013 UH3 Transition Criteria Draft) were provided by the National Center for Complementary and Alternative Medicine. The PDSDQ Core has defined the criteria as outlined below.
Criterion 1: Are data collection methods adequately validated?

Scope: This criterion applies to data collected prospectively for the project (i.e., collected outside of routine clinical documentation).

Purpose: The purpose of this criterion is to provide assurance that project-specific data collection tools, systems, and processes produce data that can support the intended analysis and ultimately the research conclusions.

Data collection methods: The processes used to measure, observe, or otherwise obtain and document study assessments.

Adequate: Evidence that the error rate has been characterized and will not likely impact the intended analysis and ultimately the conclusions.

Validated: Shown to consistently represent and record the intended concept. For questionnaires and rating scales, this refers to evidence that the tool measures the intended concept in the intended population. With respect to measurement of physical quantities or observation of phenomena, this refers to the ability of the measurement or observation to consistently and accurately capture the actual state of the patient. With respect to data processing, this refers to evidence of fidelity in operations performed on the data.

Criterion 2: Validated methods for the electronic health record information?

Scope: This criterion applies to data collected during routine care (i.e., during or associated with a clinical encounter or assessment). It applies to patient-reported data collected in conjunction with routine care (e.g., intake forms, questionnaires, or rating scales used in routine care and collected through healthcare information systems such as patient portals or EHRs). NOTE: Questionnaires administered through stand-alone systems created for a research study are not included in this criterion.

Purpose: The purpose of this criterion is to provide assurance that health system data used for the project can support the intended analysis and ultimately the research conclusions. See definition of "validated" above.

EHR information: For our purposes, this definition encompasses data from information systems used in patient care and self-monitoring; this includes such data obtained through organizational data warehouses.

Criterion 3: Demonstrated quality assurance and harmonization of data elements across healthcare systems/sites?

Scope: This criterion applies to data elements collected for the project, including both those collected through healthcare systems and those collected through add-on systems for the study.
Purpose: The purpose of this criterion is to provide assurance that the meaning and format of data are consistent across facilities and that the methods of measurement, observation, and collection uphold the intended consistency.

Quality assurance (within this criterion): All the planned and systematic activities implemented within the quality system that can be demonstrated to provide confidence that a product or service will fulfill requirements for quality. Here, quality assurance pertains to activities undertaken to 1) assess existence of and potential for inconsistent data across participating facilities and 2) establish technical, managerial, or procedural controls to maintain consistency throughout the UH3 phase. NOTE: The U.S. Food and Drug Administration has defined quality assurance as independent.

Harmonization of data elements across health systems/sites: Use of, or mapping organizational data to, common data elements.

Common data elements: Data elements with the same semantics and representation as defined by the ISO 11179 standard.7

Data element: As defined by the ISO 11179 standard, a data element is a pairing of a concept and a set of valid values.7

Criterion 4: Are plans adequate for data quality control during the UH3 phase?

Scope: This criterion applies to data collected for the project, including both those collected through healthcare systems and those collected through add-on systems for the study.

Purpose: The purpose of this criterion is to provide assurance that data quality monitoring and control processes are in place to maintain the desired quality levels and consistency between data collection facilities/sites.

Quality control: The operational techniques and activities used to fulfill requirements for quality. Quality control activities are usually thought of as those activities performed as part of routine operations to measure, monitor, and take corrective action necessary to maintain the desired quality levels within acceptable variance (e.g., re-abstracting a sample of charts on a quarterly basis to measure inter-rater reliability and provide feedback to abstractors).

References

1. International Organization for Standardization. ISO 8000-2:2012(E) Data Quality Part 2: Vocabulary. 1st ed. June 15, 2012.
2. International Organization for Standardization. ISO 9000:2005, definition 3.1.1, ISO 9000:2005(E) Quality management systems Fundamentals and vocabulary. 3rd ed. September 15, 2005.
3. Lee YW, Pipino LL, Funk JD, Wang RY. Journey to Data Quality. Cambridge, MA: MIT Press; 2006.
4. Wang R, Strong D. Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems 1996;12:4-34.
5. Institute of Medicine Roundtable on Research and Development of Drugs, Biologics, and Medical Devices. Davis JR, Nolan VP, Woodcock J, Estabrook RW, eds. Assuring Data Quality and Validity in Clinical Trials for Regulatory Decision Making. Workshop Report. Washington, DC: National Academies Press; 1999.
6. Society for Clinical Data Management. Good Clinical Data Management Practices (GCDMP).
7. International Organization for Standardization. ISO/IEC 11179 Information Technology Metadata registries (MDR).
Appendix II: Data Quality Assessment Plan Inventory

The initial plan of the PDSDQ Core was to inventory planned data quality assessment practice from the UH2 applications, review the statements of data quality assessment plans with the Demonstration Projects, summarize the plans in the context of the existing literature, and support the projects as needed in following the existing plans or in formulating and undertaking new plans if desired. The PDSDQ Core conducted a data quality assessment inventory to characterize data quality assessment plans across the initial seven UH2-funded Demonstration Projects. The data quality assessment inventory was conducted in March 2013 and reported at the April 29-30, 2013 Collaboratory Steering Committee meeting (data available upon request).

Given the Collaboratory's focus on PCTs in its Demonstration Projects, we expected variability in the data sources used as well as the extent to which any project relied upon any one data source. To characterize their use, data sources commonly used by the Demonstration Projects were classified into five categories (Table A1):

1. External PRO: PRO or other questionnaire data collected outside of an EHR, such as those using a separate personal health record system, interviews, or paper questionnaires.
2. PRO in EHR: PRO or other questionnaire data collected using an EHR system.
3. Research-specific EHR screens: Data collection fields, modules, or screens rendered for users as if they were a part of the EHR system.
4. Clinical data from an institutional data warehouse: Medications, reports from laboratory and diagnostic tests, clinical notes, and structured clinical data such as vital signs originating from a patient care-facing system that are accessed through an institutional clinical data warehouse rather than directly from the transactional system.
5. Administrative data from an institutional data warehouse: Coded diagnoses and procedures used for reimbursement.

There was also variability in the extent to which Demonstration Projects relied on existing versus prospectively collected data. Five of seven projects were identified as collecting research data in addition to routine care data, four projects are using PRO data (one of which includes patient and staff interviews), and one project is adding data collection screens to the EHR.
Table A1. Data Source Summary

Data source categories (table columns): external PRO; PRO in EHR; research-specific EHR screens; clinical data from a data warehouse/EHR; administrative data from a data warehouse.

Projects (table rows):
1. Collaborative care for chronic pain in primary care*
2. Nighttime dosing of antihypertensive medications: a pragmatic clinical trial
3. Decreasing bioburden to reduce healthcare-associated infections and readmissions*,†
4. Strategies and opportunities to stop colon cancer in priority populations*
5. A pragmatic trial of lumbar image reporting with epidemiology (LIRE)
6. Pragmatic trial of population-based programs to prevent suicide attempt
7. Pragmatic trials in maintenance hemodialysis

External PRO data were collected on paper, through interviews, or through a separate personal health record system.

*Includes staff interview data. †Includes data from external laboratory. ‡Includes externally enhanced data.

Variation in data quality assessment methods corresponds with variation in data sources. To characterize data quality assessment practices across the Demonstration Projects, initial applications were reviewed, and each project provided updates describing its planned data quality assessment practices (Table A2). Because data quality assessment depends on the type of data and on the sources available for comparison, opportunistic assessments that leverage whatever comparison sources are available should be expected, rather than uniformity in the comparisons made.
Table A2. Data Quality Assessment Activities

For each of the seven Demonstration Projects listed in Table A1, the table summarized planned activities in four areas:

1. Collection control: procedural and/or technical controls (e.g., standardized abstraction forms).
2. Completeness, ascertainment: for example, a 100-case chart review with AC PPV/NPV of 90%; comparison to independent data; percent chart review; assessment of site-to-site variability in completeness.
3. Completeness, column: for example, checks performed as part of ETL into the data warehouse; assessment on n = 1000 cases with AC <5% per DE; monthly monitoring.
4. Accuracy, individual and aggregate: for example, checks performed as part of ETL into the data warehouse; comparison to patient self-report; comparison to the NDI; an IRR threshold for abstraction; endpoint review; out-of-range and valid-values checks; call audits; percent chart review; health-system-validated data; assessment of site-to-site variability.

AC, ascertainment completeness; DE, data error; ETL, extract-transform-load; IRR, interrater reliability; NDI, National Death Index; NPV, negative predictive value; PPV, positive predictive value.

In most cases, data quality assurance and control activities for data collected de novo were not described in detail. This inventory was reported to the Collaboratory in the context of the existing literature.
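Several of the measures named in Table A2 are simple to compute once a comparison source is in hand. The sketch below, an illustration with invented records and field names rather than any project's actual data, shows column completeness and the PPV/NPV of an EHR-derived binary outcome against a record-linked chart-review sample.

```python
# Minimal sketch of two measures from Table A2: column completeness and
# PPV/NPV of an EHR-derived binary outcome against a chart-review sample.
# The records and field names are hypothetical.

ehr_records = [
    {"id": 1, "sbp": 132, "outcome": True},
    {"id": 2, "sbp": None, "outcome": False},
    {"id": 3, "sbp": 145, "outcome": True},
    {"id": 4, "sbp": 128, "outcome": False},
]

def column_completeness(records, fieldname):
    """Fraction of records with a non-missing value for the field."""
    present = sum(1 for r in records if r.get(fieldname) is not None)
    return present / len(records)

# Chart review (assumed gold standard) for the same patients, keyed by id
chart_review = {1: True, 2: False, 3: False, 4: False}

def ppv_npv(records, gold):
    """Predictive values of the EHR outcome versus chart review."""
    tp = sum(1 for r in records if r["outcome"] and gold[r["id"]])
    fp = sum(1 for r in records if r["outcome"] and not gold[r["id"]])
    tn = sum(1 for r in records if not r["outcome"] and not gold[r["id"]])
    fn = sum(1 for r in records if not r["outcome"] and gold[r["id"]])
    return tp / (tp + fp), tn / (tn + fn)

print(column_completeness(ehr_records, "sbp"))  # 0.75
print(ppv_npv(ehr_records, chart_review))       # (0.5, 1.0)
```

An IRR statistic such as Cohen's kappa for re-abstracted charts could be computed analogously from paired abstractor judgments.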
Appendix III: Initial Data Quality Assessment Recommendations for Collaboratory Projects

At the April 2013 Collaboratory Steering Committee meeting, an approach for addressing the data validation requirements was presented (data available upon request). This approach included:

1. Completeness assessment: a four-dimensional completeness assessment that could be conducted by all Demonstration Projects.
2. Accuracy assessment: identification and conduct of project-specific data quality (accuracy) assessments.
3. Impact assessment: use of the completeness and accuracy assessment results by the project statistician to test the sensitivity of the analyses to anticipated data quality problems (a sketch of one such check appears below).

A Total Data Quality Management approach 1,2 was applied to identify and prioritize project-specific data quality needs (step 2 above). The following project information was reviewed:

1. Data elements collected and used for the project's statistical analysis
2. Workflow diagrams for clinic processes that generate data used in the study
3. Data flow diagrams for data elements used in the study

Higher priority was to be given to cohort identification and outcome data elements. The initial data element list from each project application was reviewed and updated where needed, and the source system for each data element was specified. The workflow and data flow diagrams concentrated on the processes used to generate data used in the study, without regard to whether those processes were part of routine clinic practice or specific to the study. The development and discussion of the diagrams were used to surface potential sources of inconsistency or data error.

The Core proposed working one-on-one with each Demonstration Project team, or supporting individual work by each team, to determine the type of accuracy assessment attainable and the targeted data validation assessments most valuable for each project.

Testing the recommendations with the STOP CRC project

At the Steering Committee meeting, one project, Strategies and Opportunities to Stop Colon Cancer in Priority Populations (STOP CRC), came forward to work through the proposed approach with the Core. A series of calls was conducted over 2 months as the project and the Core worked through the approach above. The calls were attended by a co-principal investigator of the STOP CRC trial, the project informatician overseeing the study's multi-facility EHR implementation, the first author of this report, and two informaticians from the Coordinating Center. As planned, the development and discussion of the diagrams were used to surface potential sources of inconsistency or data error.
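The impact assessment (step 3 above) was deliberately left to each project statistician. One generic form it can take is sketched below: perturb the analysis dataset at the error rates estimated from the completeness and accuracy assessments and observe how far the effect estimate moves. The dataset, error rates, and effect sizes here are invented for illustration.

```python
# Minimal sketch of an impact (sensitivity) assessment: flip the recorded
# binary outcome at the misclassification rate estimated from chart review
# and watch how the estimated risk difference between arms moves.
# The toy dataset and all rates are invented for illustration.
import random

random.seed(42)

# Toy trial data: (arm, outcome); in practice this is the analysis dataset.
data = [("intervention", random.random() < 0.30) for _ in range(5000)] + \
       [("control", random.random() < 0.40) for _ in range(5000)]

def risk_difference(rows):
    """Event-rate difference between intervention and control arms."""
    by_arm = {"intervention": [], "control": []}
    for arm, y in rows:
        by_arm[arm].append(y)
    rate = {a: sum(v) / len(v) for a, v in by_arm.items()}
    return rate["intervention"] - rate["control"]

def perturb(rows, error_rate):
    """Randomly flip outcomes at the estimated misclassification rate."""
    return [(arm, (not y) if random.random() < error_rate else y)
            for arm, y in rows]

print(f"observed effect: {risk_difference(data):+.3f}")
for err in (0.01, 0.05, 0.10):   # plausible error rates from the assessment
    effects = [risk_difference(perturb(data, err)) for _ in range(200)]
    print(f"error rate {err:.2f}: mean effect {sum(effects)/len(effects):+.3f}")
```

Because nondifferential misclassification of a binary outcome attenuates the effect estimate toward the null, the practical question is whether the anticipated error rates could plausibly obscure the effect the trial is powered to detect.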
The data quality assessment work was reported on monthly calls and was summarized both in writing and in a template for reporting a data quality assessment.

Summary of findings from testing with the STOP CRC project

A workflow diagram already existed and was contributed by the STOP CRC project team, as was an updated data dictionary. A Coordinating Center informatician conducted interviews with the STOP CRC co-principal investigator and informatician to understand the workflow and complete the data flow diagram. The majority of the time on the calls was spent 1) educating the Collaboratory Coordinating Center informaticians about the study and the local data policies, procedures, and systems; 2) discussing data sources and reviewing the workflow and data flow diagrams; 3) discussing possible data quality problems, based on the co-principal investigator's experience with a similar project, as well as potential solutions; and 4) creating a plan for initial and ongoing assessments of data quality and completeness. The STOP CRC statistician attended the final call to discuss the data validation plan results, impact assessments, and plans for ongoing data quality assurance during the multisite trial.

Because data quality assessment plans existed for each project and had been deemed acceptable through grant review, the systematic approach to designing a data quality assessment plan was offered on a voluntary basis. Given the inclusion of the data validation criteria in the review criteria for the trial conduct funding decision, we anticipated that most if not all Demonstration Projects would show interest in the offered approach and support. However, only one project engaged the Core for a systematic assessment. No other projects reported making use of the draft review criteria definitions or of the data quality assessment information, plans, or templates produced.

References
1. Davidson B, Lee WY, Wang R. Developing data production maps: meeting patient discharge data submission requirements. International Journal of Healthcare Technology and Management 2004;6:
2. Lee YW, Pipino LL, Funk JD, Wang RY. Journey to Data Quality. Cambridge, MA: MIT Press; 2006.