Data Quality in Context
A new study reveals businesses are defining data quality with the consumer in mind.

Diane M. Strong, Yang W. Lee, and Richard Y. Wang

Data-quality (DQ) problems are increasingly evident, particularly in organizational databases. Indeed, 50% to 80% of computerized criminal records in the U.S. were found to be inaccurate, incomplete, or ambiguous. The social and economic impact of poor-quality data costs billions of dollars [5-7, 10]. Organizational databases, however, reside in the larger context of information systems (IS). Within this larger context, data is collected from multiple data sources and stored in databases. From this stored data, useful information¹ is generated for organizational decision-making.

¹ For consistency, we use the term data throughout this article to refer to both data and information. This avoids switching between terms as we switch between production and use of data.

COMMUNICATIONS OF THE ACM May 1997/Vol. 40, No
DQ problems may arise anywhere in this larger IS context.² Thus, we argue for a conceptualization of data quality that includes this context.

² The term information system is sometimes used to mean a database or a computer system (including hardware and software). Our use of the phrase larger information systems context covers the organizational processes, procedures, and roles employed in collecting, processing, distributing, and using data.

Database research aims at ensuring the quality of data in databases. In the DQ area, existing research investigates DQ definitions [8, 11], modeling [1, 2], and control [6]. With few exceptions, however, DQ is treated as an intrinsic concept, independent of the context in which data is produced and used. This focus on intrinsic DQ problems in stored data fails to solve complex organizational problems. We attribute this failure, in part, to the lack of a broader DQ conceptualization. When quality problems are defined as errors in stored data, IS professionals may not recognize, and thus may not solve, the most critical DQ problems in organizations.

In contrast to this intrinsic view, it is well accepted in the quality literature that quality cannot be assessed independent of the consumers who choose and use products [4]. Similarly, the quality of data cannot be assessed independent of the people who use data: data consumers. Data consumers' assessments of DQ are increasingly important because consumers now have more choices and control over their computing environment and the data they use.

To solve organizational DQ problems, therefore, one must consider DQ beyond the intrinsic view. Moreover, one must move beyond stored data to include data in production and utilization processes. Using qualitative analysis, we examined DQ projects from three leading-edge organizations and identified common patterns of quality problems. These patterns emerged because we used a broader conceptualization of DQ. Based on these patterns, we developed recommendations for IS professionals to improve DQ from the perspective of data consumers.

Definitions and Methods in Context

Production and storage of data has been conceptualized as a data manufacturing system [3, 9]. Central to this is the concept of a data production process that transforms data into information useful to data consumers. We identify three roles within data manufacturing systems: data producers (people, groups, or other sources who generate data); data custodians (people who provide and manage computing resources for storing and processing data); and data consumers (people or groups who use data). Each role is associated with a process or task: data producers are associated with data-production processes; data custodians with data storage, maintenance, and security; and data consumers with data-utilization processes, which may involve additional data aggregation and integration.

We define high-quality data as data that is fit for use by data consumers, a widely adopted criterion. This means that usefulness and usability are important aspects of quality. Using this definition, the characteristics of high-quality data (Table 1) consist of four categories: intrinsic, accessibility, contextual, and representational aspects. This data consumers' perspective is a broader conceptualization of DQ than the conventional intrinsic view.

Table 1. DQ categories and dimensions
DQ Category | DQ Dimensions
Intrinsic DQ | Accuracy, Objectivity, Believability, Reputation
Accessibility DQ | Accessibility, Access security
Contextual DQ | Relevancy, Value-Added, Timeliness, Completeness, Amount of data
Representational DQ | Interpretability, Ease of understanding, Concise representation, Consistent representation

We define a DQ problem as any difficulty encountered along one or more quality dimensions that renders data completely or largely unfit for use.
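The categories and dimensions of Table 1 can be held in a small lookup structure. The sketch below is only an illustration of that idea, not something the study itself provides: the names DQ_DIMENSIONS, category_of, and DQProblem are hypothetical, and the example problem description is made up.

```python
from __future__ import annotations

from dataclasses import dataclass

# Categories and dimensions transcribed from Table 1.
DQ_DIMENSIONS = {
    "Intrinsic DQ": ["Accuracy", "Objectivity", "Believability", "Reputation"],
    "Accessibility DQ": ["Accessibility", "Access security"],
    "Contextual DQ": ["Relevancy", "Value-Added", "Timeliness",
                      "Completeness", "Amount of data"],
    "Representational DQ": ["Interpretability", "Ease of understanding",
                            "Concise representation",
                            "Consistent representation"],
}


def category_of(dimension: str) -> str:
    """Return the Table 1 category that a dimension belongs to."""
    wanted = dimension.strip().lower()
    for category, dimensions in DQ_DIMENSIONS.items():
        if wanted in (d.lower() for d in dimensions):
            return category
    raise KeyError(f"unknown DQ dimension: {dimension!r}")


@dataclass
class DQProblem:
    """Hypothetical record of one reported difficulty and its dimensions."""
    description: str
    dimensions: list[str]

    def categories(self) -> set[str]:
        # A problem may touch one or more dimensions, hence a set of categories.
        return {category_of(d) for d in self.dimensions}
```

For instance, DQProblem("System counts disagree with warehouse counts", ["Accuracy", "Believability"]).categories() evaluates to {"Intrinsic DQ"}, while a problem coded with both Timeliness and Accessibility would span two categories.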
We define a DQ project as organizational actions taken to address a DQ problem given some recognition of poor DQ by the organization. We intentionally include projects initiated for purposes other than improving DQ. For example, during conversion of data to a client/server system, poor DQ may be recognized and an improvement initiated.

To examine DQ problems in practice, we studied 42 DQ projects from three data-rich organizations: GoldenAir, an international airline; BetterCare, a hospital; and HyCare, a Health Maintenance Organization (HMO). In terms of industry position, attention to DQ, and information systems, these three firms are leaders, yet they exhibit sufficient variation for investigating data projects (Table 2). All have identified significant DQ problems and are actively attending to them. This contrasts with many organizations that fail to address their quality problems.

Table 2. Site characteristics
Site Name* and Industry | Attention to DQ | IS Organization | Hardware and Software Environment
GoldenAir, Airline | IS Development | IS is essentially a service bureau. | IBM-compatible mainframe with IMS databases and MMS.
BetterCare, Hospital | DQ Administrator | Centralized IS organization reporting to finance VP in a centralized, functional firm. | PC-based client server environment with TRACE, a MUMPS-based database system.
HyCare, HMO | Total Quality Management (TQM) Initiatives | Powerful, centralized IS organization in a decentralized, divisional firm. | Heterogeneous hardware and software across divisions.
*All names are fictitious.

This research employed qualitative data collection and analysis techniques. We collected data about these projects via interviews of data producers, custodians, consumers, and managers. We organized each DQ project in terms of three problem-solving steps: problem finding (how the organization identified a DQ problem), problem analysis (what the organization determined the cause to be), and problem resolution, which includes changing processes (changing the procedures for producing, storing, or using data) and changing data (updating the data value). Each project was analyzed using the DQ dimensions as content analysis codes. From the coded projects, we identified common patterns and sequences of dimensions attended to during DQ projects (Table 3).³

Table 3. DQ patterns in DQ projects
DQ Category | DQ Dimensions
Intrinsic DQ | Accuracy, Objectivity, Believability, Reputation
Accessibility DQ | Accessibility, Access security
Contextual DQ | Relevancy, Value-Added, Timeliness, Completeness, Amount of data
Representational DQ | Interpretability, Ease of understanding, Concise representation, Consistent representation

³ An appendix containing method details and example projects is posted at

Intrinsic DQ Pattern

Mismatches among sources of the same data are a common cause of intrinsic DQ concerns. Initially, data consumers do not know the source to which quality problems should be attributed; they know only that data is conflicting. Thus, these concerns initially appear as believability⁴ problems. Over time, information about the causes of mismatches accumulates from evaluations of the accuracy of different sources, which leads to a poor reputation for less accurate sources. (A reputation for poor quality can also develop with little factual basis.) As a reputation for poor-quality data becomes common knowledge, these data sources are viewed as having little added value for the organization, resulting in reduced use (Figure 1, subpattern 1).

⁴ The italics signify that believability is a DQ dimension. This convention will be used to highlight the interaction of DQ dimensions in a DQ project.

Judgment or subjectivity in the data production process is another common cause (subpattern 2). For example, coded or interpreted data is considered to be of lower quality than raw, uninterpreted data. Initially, only those with knowledge of data production processes are aware of these potential problems, which appear as concerns about data objectivity. Over time, information about the subjective nature of data production accumulates, resulting in data of questionable believability and reputation and thus of little added value to data consumers. The overall result is reduced use of this suspect data.

Figure 1. Intrinsic DQ problem pattern. Subpattern 1: multiple sources of same data (mismatches exist), questionable believability, poor reputation, little added value, data not used. Subpattern 2: judgment involved in data production (data production process viewed as subjective), questionable objectivity, then the same path to data not used.

Intrinsic DQ subpattern 1 was exhibited at all three research sites. GoldenAir has a history of mismatches between their inventory system data and physical warehouse counts. Warehouse counts serve as a standard against which to measure the accuracy of system data; that is, the system data source is inaccurate and not believable, and is adjusted periodically to match actual warehouse counts. The system data gradually develops mismatches, however, and its reputation gradually worsens until the data is not used for decision-making.

At BetterCare, this subpattern occurred between TRACE⁵ and STATUS.⁶ Some data, like daily hospital bed utilization, is available from both systems. Nevertheless, the two systems frequently have different values. Over time, TRACE has developed a reputation as an accurate source, and the use of STATUS has declined.

⁵ TRACE is a database containing historical data extracted from the hospital's information and control system for use by managers making longer-term decisions and by medical researchers.
⁶ STATUS is an operational system that records a snapshot of daily hospital resources.

At HyCare, inconsistent data values occur between internal HMO patient records and bills submitted by hospitals for reimbursement. For example, when the HMO is billed for coronary bypass surgery, the HMO patient record should indicate active, serious heart problems. Mismatches occur in both directions: hospital claims without HMO records of problems, and HMO records of problems without corresponding hospital claims. Initially, HyCare assumed the external (hospital) data was wrong; HMO staff perceived their data to be more believable and to have a better reputation than those of hospitals. This general sense of the quality of sources, however, was not based on factual analysis.

Figure 2. Accessibility DQ problem pattern. Causes shown: lack of computing resources, privacy and confidentiality, computerizing and data analyzing. Dimensions involved: (1) poor accessibility, (2) access security, (3) interpretability and understandability, (4) concise and consistent representation, (5) amount of data and timeliness. Outcome: barriers to data accessibility.

Subpattern 2 occurred at both BetterCare and HyCare. Using doctors' and nurses' notes about patients, BetterCare's medical record coders designate diagnosis and procedure codes and corresponding diagnosis-related group (DRG) codes for billing. Although coders are highly trained, some subjectivity remains. Thus, this data is considered to be less objective than raw data.

Data-production forms also contribute to reduced objectivity of data. At HyCare, doctors using preprinted forms with check boxes for specifying procedure codes generated a reduced range of procedures performed, as compared to doctors using free-form input. This variance affects the believability of this data.

The three organizations developed the following solutions for handling subpattern 1:
- GoldenAir continues their cycle of physically counting inventory and adjusting system values whenever the mismatch becomes unacceptably large.
- BetterCare is rewriting STATUS. They are also designating single data production points for data items and improving computerized support for data production.
- HyCare's analysis of the causes of mismatches between hospital and internal data found problems with both sources. They fixed an edit check problem with their internal computer systems, fixed a data production problem in doctors' designation of active, serious problems for internal HMO records, and initiated joint DQ projects with associated hospitals.

These solutions manifest two different approaches to problem resolution: changing the systems or changing the production processes. GoldenAir focused on computer systems as the solution and ignored their data production processes. As a result, their processes continue to produce poor-quality data that increases data inaccuracies. In contrast, BetterCare's and HyCare's solutions involve both data production processes and computer systems, resulting in long-term DQ improvements.

BetterCare's efforts to designate single data production points deserve further discussion. Systems developed for different purposes sometimes require the same data, such as an indicator of patient severity in intensive care units in both STATUS and HICS. For HICS, a specialist examines the patient immediately before intensive care. For STATUS, an intensive-care nurse observes the patient during intensive care. These two observations can be different. To designate a single source, definitions and indicators of severity were agreed upon and both systems were changed to support this single data production source.

BetterCare's decision to rewrite STATUS illustrates reputation development. Like accounting systems that prohibit changes once the accounting period is closed, STATUS prohibits changes to the official daily record. STATUS's data is consistent across time, whereas TRACE's data is accurate because it is updated as needed. Although both systems are viewed as containing the correct data, TRACE developed a reputation as the system with high-quality data, whereas STATUS's data was considered to be suspect. As a result, STATUS is being rewritten with update routines.

Figure 3. Contextual DQ problem pattern. Causes shown: (1) operational data production problems, (2) changing data consumers' needs, (3) distributed computing. Dimensions involved: incomplete data, poor relevancy, inconsistent representation, little value-added. Outcome: data utilization difficulty (inability to integrate or aggregate data results in poor contextual DQ, that is, data with little value-added or relevancy to data consumers' tasks).

Accessibility DQ Pattern

Accessibility DQ problems were characterized by underlying concerns about technical accessibility (Figure 2, subpatterns 1-2), data-representation issues interpreted by data consumers as accessibility problems (subpatterns 3-4), and data-volume issues interpreted as accessibility problems (subpattern 5).

GoldenAir provides a simple example of subpattern 1. When GoldenAir moved to its new airport, its computing operations remained at the old airport, with access to data via unreliable data communications lines. Since reservations had priority, the unreliable lines resulted in inventory data accessibility problems. This, in turn, contributed to GoldenAir's inventory accuracy problems because updating took lower priority than other data-related tasks.
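GoldenAir's inventory accuracy problems, measured earlier against physical warehouse counts, suggest a simple reconciliation between two sources of the same data. The following is a minimal sketch under stated assumptions, not a description of GoldenAir's systems: the function names, the part numbers, and the tolerance parameter are all hypothetical, and the physical count is treated as the standard, as in the text.

```python
def reconcile(system_counts, physical_counts, tolerance=0):
    """Return parts whose system quantity disagrees with the physical count.

    Both arguments map a part number to a quantity. A part present in only
    one source is always reported, with None for the missing side.
    """
    mismatches = {}
    for part in sorted(set(system_counts) | set(physical_counts)):
        system_qty = system_counts.get(part)      # None: absent from the system
        physical_qty = physical_counts.get(part)  # None: absent from the count
        if (system_qty is None or physical_qty is None
                or abs(system_qty - physical_qty) > tolerance):
            mismatches[part] = (system_qty, physical_qty)
    return mismatches


def mismatch_rate(system_counts, physical_counts, tolerance=0):
    """Share of all known parts that fail reconciliation (0.0 if no parts)."""
    parts = set(system_counts) | set(physical_counts)
    if not parts:
        return 0.0
    return len(reconcile(system_counts, physical_counts, tolerance)) / len(parts)


# Hypothetical data: three of the four known parts disagree.
system = {"P-100": 12, "P-200": 5, "P-300": 0}
physical = {"P-100": 12, "P-200": 3, "P-400": 7}
print(reconcile(system, physical))
print(mismatch_rate(system, physical))
```

Tracking the mismatch rate over successive counts would give a factual basis for judging a source, in place of the reputation-driven judgments the study observed. Note that the sketch only detects mismatches; as the GoldenAir case shows, adjusting values without changing the production process does not stop new mismatches from developing.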
BetterCare had an accessibility DQ concern related to the confidential nature of patient records (subpattern 2). Data consumers realized the importance of access security for patient records, but they also perceived the permissions as barriers to accessibility. This, in turn, affects the overall reputation and value of this data. In addition, data custodians became barriers to accessibility because they could not provide data access without approval.

Subpattern 3 addresses concerns about interpretability and understandability of data. Coding systems for physician and hospital activities at BetterCare and HyCare are necessary for summarizing and grouping common diagnoses and procedures. The expertise required to interpret codes, however, becomes a barrier to accessibility; these codes are not understandable to most doctors and analysts. At HyCare, analyzing and interpreting across physician groups is a problem because the groups use different coding systems.

Medical data in text or image form also presents an interpretability problem (subpattern 4). Medical records include text written by doctors and nurses and images produced by medical equipment, such as X-rays and EKGs. This data is difficult to analyze across time for individual patients. Furthermore, analyzing trends across patients is difficult. Thus, data representation becomes a barrier to data accessibility. This data is inaccessible to data consumers because it is not in a representation that permits analysis.

Subpattern 5 addresses providing relevant data that adds value to tasks in a timely manner. For example, HyCare serves hundreds of thousands of patients, resulting in several million patient records tracking medical history. Analyses of patient records usually require a weekend data extraction. In addition, companies purchasing HMO options are increasingly demanding evaluations of medical practices, resulting in an increased need for these analyses at HyCare.
This pattern, in which a large amount of data leads to timeliness problems, is interpreted as an accessibility problem.

Subpattern 1 has straightforward, though possibly costly, solutions. For example, GoldenAir is moving its computing facility to the new airport to avoid unreliable data communication lines. Subpattern 5 is also relatively easy to solve. For example, BetterCare's HICS generates 40GB of data per year. From this, TRACE extracts the most relevant data (totaling 5GB over 12 years) for historical and cross-patient analyses.

Subpatterns 3 and 4 are more difficult to solve. Although HyCare completely automated its medical records, including text and image data, to solve accessibility problems for individual patients, problems with analyzing data across patients persist. At BetterCare, data consumers and custodians believe that an automated representation of text and image data would not solve their analyzability problems; thus, they only partially automated their patient records.

Contextual DQ Pattern

We observed three underlying causes for data consumers' complaints that available data does not support their tasks: missing (incomplete) data, inadequately defined or measured data, and data that could not be appropriately aggregated. To solve these contextual DQ problems, specific projects were initiated to provide relevant data that adds value to the tasks of data consumers.

Subpattern 1 in Figure 3 addresses incomplete data due to operational problems. At GoldenAir, incomplete data in inventory transactions contributed to inventory data accuracy problems. For example, mechanics sometimes failed to record part numbers on their work activity forms. Because transaction data was incomplete, the inventory database could not be updated, which in turn produced inaccurate records. According to one supervisor, this was tolerated because the primary job of mechanics is to service aircraft in a timely manner, not to fill out forms.
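The link between incomplete transactions and inaccurate inventory records can be made concrete with a completeness check at the point of data production. This is a sketch only: the field names in REQUIRED_FIELDS and the sample records are assumptions made for illustration, since the article does not describe the layout of GoldenAir's work activity forms.

```python
# Hypothetical required fields for one inventory transaction.
REQUIRED_FIELDS = ("part_number", "quantity", "work_order")


def missing_fields(record, required=REQUIRED_FIELDS):
    """List the required fields that are absent or blank in a record."""
    return [name for name in required if record.get(name) in (None, "")]


def split_by_completeness(records, required=REQUIRED_FIELDS):
    """Separate usable records from those that cannot update the database."""
    complete, incomplete = [], []
    for record in records:
        gaps = missing_fields(record, required)
        if gaps:
            incomplete.append((record, gaps))  # keep the gaps for follow-up
        else:
            complete.append(record)
    return complete, incomplete


def completeness_rate(records, required=REQUIRED_FIELDS):
    """Fraction of records with every required field filled in."""
    if not records:
        return 1.0
    complete, _ = split_by_completeness(records, required)
    return len(complete) / len(records)
```

A quantity of 0 counts as filled in here, because only None and the empty string are treated as missing. Reporting the gaps per record, rather than silently dropping incomplete transactions, keeps the production problem visible to the people who can change the process.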
BetterCare's data was incomplete by design (subpattern 2), whereas GoldenAir's data was incomplete due to operational problems. By design, the amount of data in BetterCare's TRACE database is small enough to be accessible but complete enough to be relevant and add value to data consumers' tasks. As a result, data consumers occasionally complained about incomplete data.

Subpattern 3 addresses problems caused by integrating data across distributed systems. At HyCare, data consumers complained about inconsistent definitions and data representations across divisions, such as DRG codes stored with decimal points in one division and without in another. Furthermore, basic utilization measures, such as hospital days per thousand patients, were defined differently across divisions. These problems were caused by autonomous design decisions in each division.

GoldenAir is considering bar code readers as data input mechanisms (subpattern 1). BetterCare's decision about the data to include in TRACE is reassessed as data consumers request additional data (subpattern 2); for example, healthcare proxy and living will information were added.
This reassessment of TRACE data in the context of its relevance and value to data consumers goes beyond missing data. As healthcare reimbursement systems move from payments for procedures performed (fee for service) to payments for diagnosed diseases (prospective payment) to possibly payments for yearly care of patients (capitated payment), the basic unit of analysis for managerial decision-making in hospitals has changed from procedures, to hospital visits, to patients. When BetterCare tracked data by procedures, for example, they could answer questions about costs of blood tests, but not costs of treating heart attacks. Such analyses became necessary when hospital reimbursement changed to a fixed amount for treating each disease.

TRACE was developed in response to this anticipated change to prospective payments. Such a reimbursement system began in 1983 for Medicare. At that time, TRACE had the capability to aggregate across patient visits for similar diagnoses. Currently, BetterCare anticipates a need to aggregate across all in- and out-patient medical services delivered to each patient per year. Thus, TRACE is being extended with out-patient data and quality indicators.

HyCare initiated DQ projects to develop common data definitions and representations for cross-divisional data (subpattern 3). The comprehensive data dictionary and corresponding data warehouse are their next steps.

Implications for IS Professionals

Our findings provide generalizable implications for IS professionals about solving intrinsic, accessibility, and contextual DQ problems. Conventional DQ approaches employ control techniques (like edit checks, database integrity constraints, and program control of database updates) to ensure data quality. These approaches have improved intrinsic DQ substantially, especially the accuracy dimension.
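An edit check of the kind listed above can be sketched for the representation mismatch reported at HyCare, where one division stored codes with decimal points and another without. The code format used here is an assumption made purely for illustration (digits with at most one embedded decimal point); it is not the actual DRG format or HyCare's rule, and normalize_code and codes_match are hypothetical names.

```python
import re

# Assumed shape of a code: one or more digits, optionally followed by a
# decimal point and more digits. This is an illustrative rule only.
_DIGITS_WITH_OPTIONAL_POINT = re.compile(r"\d+(\.\d+)?")


def normalize_code(raw: str) -> str:
    """Edit check plus normalization to a common, point-free representation."""
    text = raw.strip()
    if not _DIGITS_WITH_OPTIONAL_POINT.fullmatch(text):
        # Reject at entry time instead of storing a malformed value.
        raise ValueError(f"code fails edit check: {raw!r}")
    return text.replace(".", "")


def codes_match(code_a: str, code_b: str) -> bool:
    """Compare two codes that may come from divisions with different formats."""
    return normalize_code(code_a) == normalize_code(code_b)
```

With this rule, codes_match("127.0", " 1270 ") is True, and normalize_code("12A") raises ValueError. Dropping the point is lossy ("12.70" and "127.0" collapse to the same string), so the sketch is only safe if both divisions agree on where the point belongs, which is the kind of common definition HyCare's data dictionary is meant to settle.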
Attention to accuracy alone, however, does not correspond to data consumers' broader DQ concerns. Furthermore, controls on data storage are necessary but not sufficient. IS professionals also need to apply process-oriented techniques, like IS auditing [12], to the processes that produce this data.

Data consumers perceive any access barriers as accessibility problems. Conventional approaches treat accessibility as a technical, computer systems issue, not a DQ concern. That is, data custodians have provided access if data is technically accessible (such as when terminals and lines are connected and available, access permission is granted, and access methods are installed). To data consumers, however, accessibility goes beyond technical accessibility; it includes the ease with which they can manipulate this data to suit their needs.

These contrasting accessibility views are evident in our study. For example, advanced forms of data (medical image data) can now be stored as binary large objects (blobs). Although data custodians provide technical methods for accessing this new form of data, data consumers continue to experience this data as inaccessible. They need to analyze this data the way they analyze traditional record-oriented data. Other examples of differing views of accessibility include:

- Data combined across autonomous systems is technically accessible, but data consumers view it as inaccessible because similar data items are defined, measured, or represented differently.
- Coded medical data is technically accessible as text, but data consumers view it as inaccessible because they cannot interpret the codes.
- Large volumes of data are technically accessible, but data consumers view them as inaccessible because of excessive access time.

IS professionals must understand the difference between the technical accessibility they supply and the broad accessibility concerns of data consumers.
Once this difference is clarified, technologies such as data warehouses can provide a smaller amount of more relevant data, and graphical interfaces can improve ease of access.

Data consumers evaluate DQ relative to their tasks. At any time, the same data may be needed for multiple tasks that require different quality characteristics. Furthermore, these quality characteristics will change over time as work requirements change. Therefore, providing high-quality data implies tracking an ever-moving target. Conventional approaches handle contextual DQ through techniques such as user requirements analysis and relational database query capabilities. They do not explicitly incorporate the changing nature of data consumers' task context.

Because data consumers perform many different tasks and the data requirements for these tasks change, contextual DQ means much more than good data requirements specification. Providing high-quality data along the dimensions of value and usefulness relative to data consumers' task contexts places a premium on designing flexible systems with data that can be easily aggregated and manipulated.
The alternative is constant maintenance of data and systems to meet changing data requirements.

Concluding Remarks

Existing research focuses on intrinsic aspects of DQ. It fails to address the broader concerns of data consumers. While intrinsic DQ aspects are important, organizations also initiate projects to address accessibility and contextual DQ issues. Accessibility DQ includes concerns about the ease of access and ease of understanding data. Contextual DQ includes concerns about how well data matches task contexts.

This research adopts a data-consumer perspective. The results confirm the importance of the quality categories and dimensions in our previous research [11]. They also enrich our understanding of how organizations experience DQ problems and which dimensions comprise these problems. For example, this research discovered that representational DQ dimensions are underlying causes of accessibility DQ problem patterns.

Some might argue our research findings can be attributed to poor management or poor IS organizations at our field sites. We reject such a claim. The organizations we studied are competent and address their DQ problems effectively. They are at the forefront of DQ practice. Others may agree with our findings, but argue that accessibility and contextual DQ fall outside the domain. We also reject such a view. To solve organizational DQ problems, IS professionals must attend to the entire range of concerns of data consumers.

The results of this research may be used as an empirical basis for building DQ theories about the nature of organizational DQ problems and their solutions. Given our results, new DQ theories will incorporate the task context of users and the processes by which users access and manipulate data to meet their task requirements. For example, a theory based on consumer marketing research could investigate when and how data consumers apply various DQ dimensions in choosing data for their tasks.
Studies that focus on accessibility issues exemplify this approach.

The three patterns for how intrinsic, accessibility, and contextual DQ problems develop in organizations provide an empirical basis for studying organizational choices and actions about DQ improvement. For example, organizational theories can be applied to understand how organizations find and choose to solve DQ problems. Following a time-dependent decision processes perspective, solutions to DQ problems are found, implemented, learned, and improved through adaptation over time. Following a perspective of organizational routines as sources of performance, TQM procedures and DQ administrators can establish organizational routines that improve DQ. Theories in information economics could also be applied to understanding organizational decisions about improvement.

In addition to theory building, studies of DQ solutions could use the DQ-problem patterns identified in this research as solution objectives. For example, known DQ problems will focus the search for organizational mechanisms that solve these problems. Finally, this research should be replicated in organizations such as financial firms, where data is their primary product.

References

1. Ballou, D. P. and Pazer, H. L. Modeling data and process quality in multi-input, multi-output information systems. Manage. Sci. 31, 2 (1985).
2. Ballou, D. P. and Tayi, K. G. Methodology for allocating resources for data quality enhancement. Commun. ACM 32, 3 (1989).
3. Ballou, D. P., Wang, R. Y., Pazer, H., and Tayi, K. G. Modeling information manufacturing systems to determine information product quality. Manage. Sci. (accepted for publication, 1996).
4. Deming, E. W. Out of the Crisis. MIT Center for Advanced Engineering Study. Cambridge, Mass.
5. Laudon, K. C. Data quality and due process in large interorganizational record systems. Commun. ACM 29, 1 (1986).
6. Liepins, G. E. and Uppuluri, V. R. R., Eds. Data Quality Control: Theory and Pragmatics. D. B. Owen, (1990), Marcel Dekker, New York, N.Y.
7. Morey, R. C. Estimating and improving the quality of information in MIS. Commun. ACM 25, 5 (1982).
8. Wand, Y. and Wang, R. Y. Anchoring data quality dimensions in ontological foundations. Commun. ACM 39, 11 (1996).
9. Wang, R. Y. and Kon, H. B. Towards Total Data Quality Management (TDQM). Information Technology in Action: Trends and Perspectives. R. Y. Wang, Ed. Prentice Hall, Englewood Cliffs, NJ.
10. Wang, R. Y., Storey, V. C. and Firth, C. P. A framework for analysis of data quality research. IEEE Trans. Know. Data Eng. 7, 4 (1995).
11. Wang, R. Y. and Strong, D. M. Beyond accuracy: What data quality means to data consumers. J. Manage. Info. Syst. 12, 4 (1996).
12. Weber, R. EDP Auditing: Conceptual Foundations and Practices. G. B. Davis, Ed. McGraw-Hill, New York, N.Y.

Diane M. Strong is an assistant professor in the management department at Worcester Polytechnic Institute, Worcester, Mass.
Yang W. Lee is associate director for the Total Data Quality Management Research Program at MIT, Cambridge, Mass., and president and CEO of Cambridge Research Group.
Richard Y. Wang is co-director for the Total Data Quality Management Research Program and associate professor at MIT Sloan School of Management, Cambridge, Mass.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. ACM /97/0500 $

May 1997/Vol. 40, No. 5 COMMUNICATIONS OF THE ACM
