Managing Data Quality Successfully


BI Consultants – Business Intelligence White Paper

Data quality is essential: it is the key to user acceptance of IT solutions. And poor data is expensive. Companies that have recognised this launch central initiatives to guarantee a high level of data quality, often in conjunction with data governance projects. Admittedly, this does not come for free, as Kurt Häusermann and Marcus Pilz explain in the following text.

About the authors: Kurt Häusermann is the founder of BI Consultants GmbH in Zurich. He has worked for more than 20 years in data management, analytics and business intelligence, especially for life science companies. Marcus Pilz is a board member of the Data Warehousing Institute. He has worked as a project leader in the BI environment for approximately 20 years, is an experienced speaker at international BI symposiums, and serves as technical adviser and evaluator for the technical magazine BI-Spektrum.

Corrupted, incomplete and inconsistent data, and the information derived from it, lead to problems in every business process in which the data is used. Invalid data delays daily work and causes additional effort and therefore costs. It can become critical for a company when a strategic decision rests on an inadequate data foundation caused by a lack of data quality. Furthermore, the significance of data quality continues to rise due to increasing requirements in the area of compliance.

The negative influence of bad data quality on companies has been investigated in several studies. Thomas Redman, a well-known authority on data quality, estimates the effect of bad data quality at 8 to 12 percent of revenue (Redman, 1996). The consequences of bad data quality are often obscure, are rarely quantified, and are frequently accepted by managers as a normal cost of doing business (English, 1999). Flawed data in operative systems is often not even regarded as erroneous, because such data errors play only a minor role in the operational business process. Later, however, when the data is transferred to a data warehouse and analysed, the bad data quality shows itself immediately. Categories then appear in reports that clearly should not exist, or that appear repeatedly under differing designations. The result: incorrect aggregation of the data. Such problems undermine acceptance by business users, who do not want to base important business decisions on erroneous data. A laissez-faire attitude towards data quality therefore causes direct and indirect costs for the company, and these should be taken seriously.
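To make the aggregation problem concrete: the short sketch below is our own illustration (the category values and the use of pandas are assumptions, not part of the original text). It shows how one category recorded under differing designations silently splits the aggregated figures, and how a simple normalisation step restores the correct totals.

```python
# Illustrative only: the same product category captured under differing
# designations splits the aggregated revenue across several report rows.
import pandas as pd

sales = pd.DataFrame({
    "category": ["Beverages", "beverages ", "Bevrages", "Snacks"],
    "revenue":  [1200.0, 800.0, 150.0, 500.0],
})

# Naive aggregation: three separate "beverage" rows instead of one.
print(sales.groupby("category")["revenue"].sum())

# A simple cleansing step: trim, lower-case and map known misspellings
# to a reference designation before aggregating.
corrections = {"bevrages": "beverages"}  # hypothetical correction table
normalized = sales["category"].str.strip().str.lower().replace(corrections)
print(sales.groupby(normalized)["revenue"].sum())  # beverages 2150.0, snacks 500.0
```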

The cost of data quality

Prof. Martin Eppler and Markus Helfert, both from the University of St. Gallen, designed a cost model for data quality in 2004. First, they determine the costs of bad data quality. These comprise direct costs, such as the costs of verifying the data, of correcting invalid data and of dealing with the consequences, for example clients who cannot be reached because of incorrect address details, as well as indirect costs such as expenditure caused by incorrect decisions, missed opportunities, loss of image or customer dissatisfaction due to wrong deliveries. They then determine the costs of improving, or more precisely of guaranteeing, a sufficient level of data quality. These include the costs of prevention, detection and data cleansing. Prevention costs cover the measures needed so that fewer errors occur in the first place, such as improving data capture through plausibility tests, documented standards, better training of personnel or better coordination between sub-processes. Detection costs cover the measures that lead to the discovery of errors already present in the data, such as analysing existing databases with the aid of rules in order to detect invalid or inconsistent data. The repair costs include all activities necessary to correct the detected errors in the databases.

Figure: Classification of data quality costs (after Eppler, Helfert: A Framework for the Classification of Data Quality and an Analysis of their Progression)
- Costs caused by low data quality
  - Direct: verification costs, re-entry costs, compensation costs
  - Indirect: costs based on lower reputation, costs based on wrong decisions or actions, sunk investment costs
- Costs of improving or assuring data quality
  - Prevention: training, monitoring, standard development and deployment costs
  - Detection: analysis and reporting costs
  - Repair: repair planning and repair implementation costs

Now the costs of improving and guaranteeing data quality can be compared with the costs of bad data quality. The main purpose is to find an optimum at which the costs of bad data quality are reduced without the cost of data quality improvement itself becoming too large a cost factor.
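The trade-off between these two cost blocks can be stated compactly. The formalisation below is only a sketch in our own notation (the symbols q, C_low and C_assure do not appear in Eppler and Helfert's framework):

```latex
% Let q in [0,1] denote the achieved level of data quality.
% C_low(q):    costs caused by low data quality, decreasing in q.
% C_assure(q): costs of improving or assuring data quality, increasing in q.
\[
  C_{\mathrm{total}}(q) \;=\; C_{\mathrm{low}}(q) + C_{\mathrm{assure}}(q),
  \qquad
  q^{*} \;=\; \arg\min_{q \in [0,1]} C_{\mathrm{total}}(q).
\]
% At an interior optimum the marginal savings from fewer data defects just
% offset the marginal cost of further quality assurance:
\[
  -\,C_{\mathrm{low}}'(q^{*}) \;=\; C_{\mathrm{assure}}'(q^{*}).
\]
```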

The search for the optimal data quality

The illustration below shows that there is an optimum for data quality which must be found in practice. Notably, aiming for too high a level of data quality can itself lead to higher costs. However, economic arguments are not the only justification for better data quality: in order to attain compliance, an effort that lies far above the economic optimum may be necessary, but it must nevertheless be made. Ultimately, Joseph Juran's well-known formulation "fitness for use" also applies to data quality. It means that quality must be sufficient for the purpose of the application. Consequently quality, and thus also data quality, must be aligned with the requirements. This applies equally to the rationales for data quality cited above.

Figure: Cost curves and the optimal level of data quality (after Eppler, Helfert: A Framework for the Classification of Data Quality and an Analysis of their Progression)

Causes of bad data quality

Bad data quality begins with the very first capture of the data. Data is entered incorrectly and is checked incompletely or not at all by the system. The personnel responsible for capturing the data often have little training, and standards for data capture are missing or only rudimentary. In addition, there is frequently no awareness of the consequences of data errors, because the personnel do not understand what the data will be used for later. An invoice amount may seem important, but what significance do department names have, for example? Such errors only show up much later, during reporting, and those who captured the data usually never learn of them. In practice a feedback loop, in which the data capturers are informed about data errors, is often missing. In systems with less structured entries, ambiguities and misinterpretations persist during data capture.
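The sketch below illustrates the kind of plausibility test mentioned above that could run directly at data capture; the field names and the reference list of departments are assumptions made for the example, not part of the paper.

```python
# A minimal sketch of plausibility tests applied at data capture, so that
# obviously implausible entries never reach the operative system.
from datetime import date

VALID_DEPARTMENTS = {"SALES", "FINANCE", "LOGISTICS"}  # hypothetical reference data

def plausibility_errors(record: dict) -> list:
    """Return the list of rule violations for one captured record."""
    errors = []
    if record.get("invoice_amount", 0) <= 0:
        errors.append("invoice_amount must be positive")
    if record.get("invoice_date") is not None and record["invoice_date"] > date.today():
        errors.append("invoice_date lies in the future")
    if record.get("department", "").strip().upper() not in VALID_DEPARTMENTS:
        errors.append("unknown department designation")
    return errors

# Usage: the capturing application rejects or flags the entry instead of storing
# it, so the feedback reaches the person entering the data immediately.
entry = {"invoice_amount": 990.0, "invoice_date": date(2031, 1, 1), "department": "Slaes"}
print(plausibility_errors(entry))
# -> ['invoice_date lies in the future', 'unknown department designation']
```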

Business processes change, but operative systems cannot always be adapted at the same pace. Fields are therefore, for the sake of simplicity, repurposed so that the operative system can be kept running. What is not considered are the consequences this casualness may have later on in downstream systems and in the data warehouse. Further sources of bad data quality lie in the inadequate data architecture of the source systems. These often originated autonomously and embody differing views of the company; the resulting data representations are correspondingly diverse, which can greatly impede integration. If a source system has already been migrated once, there is a high risk that migration errors are present which have so far gone unnoticed in operative activities. Finally, when data stocks are integrated, there is the danger that the data contents are not accurately defined or that the accompanying documentation no longer reflects the current state. In such cases, data of differing semantics is brought together, which can lead to a systematic falsification of the data. Certain data stocks can also simply be forgotten during integration: missing data from offshore branches or subsidiaries can cause aggregations at company level to be calculated incorrectly.

Data quality dimensions

Before improving data quality, one should define the dimensions along which quality will be measured. Richard Wang, who has been researching data quality at the Massachusetts Institute of Technology (MIT) for 20 years, pointed out in his widely adopted article "Beyond accuracy: What data quality means to data consumers" (1996) that data quality is not only about correctness and accuracy, but comprises other dimensions as well. Most authors assume a 360-degree business-user point of view. The following list shows a selection of possible data quality dimensions, with definitions after T. Redman (2001):

Accuracy: degree of agreement between a data value, or a collection of data values, and a source agreed to be correct.
Consistency: degree to which a set of data satisfies business rules.
Completeness (attribute level): degree to which data values are present for required attributes, or the degree to which required data records are present.
Timeliness: degree to which an information chain or process is completed within a prespecified date or time.
Relevance: degree to which data are relevant to a particular task or decision.
Clear definition: a datum is clearly defined if it is unambiguously defined using simple terms.
Identifiability: a good data model calls for each distinct entity to be uniquely identified.

Which dimensions are meaningful for a particular purpose, and how they should be measured, depends on the concrete goals. What matters is that the quality dimensions are oriented towards the objectives, and not towards what a particular tool happens to be able to do.
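How such dimensions can be operationalised is shown in the short sketch below; it is our own illustration, and the column names and the business rule are assumed for the example.

```python
# Measuring two of the dimensions above on a toy data set with pandas.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email":       ["a@example.com", None, "c@example.com", None],
    "birth_date":  ["1980-05-01", "1975-11-23", "2090-01-01", "1969-03-14"],
})

# Completeness (attribute level): share of required values that are present.
completeness_email = customers["email"].notna().mean()

# Consistency: share of records satisfying a business rule
# (here: birth_date must not lie in the future).
birth = pd.to_datetime(customers["birth_date"])
consistency_birth = (birth <= pd.Timestamp.today()).mean()

print(f"email completeness:     {completeness_email:.0%}")   # 50%
print(f"birth_date consistency: {consistency_birth:.0%}")    # 75%
```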

Data Quality Improvement

Basically, it can be assumed that the operative systems continually produce a certain share of erroneous data, and that a large amount of information, some of it faulty, already exists in key databases. A strategy to improve data quality must address both areas: on the one hand, the constant further accrual of flawed data must be prevented; on the other hand, the existing data must be cleansed, either in the source databases themselves or in a step prior to loading into the data warehouse. Projects have shown that responsibility for the data is often not clearly regulated. The creation of explicit responsibilities for data sources is therefore an important first step in data quality projects. Beyond that, it is sensible to create the role of data steward. Since data quality cannot be measured without an accurate description of the data semantics, a review of the metadata and a check of the conformity between the metadata and actual usage belong to the standard preparation for a data quality undertaking. On this basis, the dimensions relevant to the undertaking can be determined and quality policies defined to which the data must conform. These predominantly concern the completeness, correctness and consistency of the data.

Data Profiling

An important approach is data profiling: a systematic, largely automated methodology for analysing and technically assessing data in order to derive corrective measures. Ideally, data profiling is carried out at the beginning of a project, or performed regularly and asynchronously to the data warehouse processes on an independent hardware infrastructure. For this, the complete data portfolio is extracted into a separate environment, because data profiling puts a heavy load on runtime owing to the large data volumes, and because consistent data sets must be kept available over a longer period for the analyses. Modern tools support the data profiling team in the analysis; the team should be a small group of about three people combining interdisciplinary IT and business skills. The analysis initially consists of conditioning the results so that they can be evaluated with the business in workshops and captured in the form of business rules. As a result of the profiling process, the user receives a list of possible problem areas in the data in use and can assess whether a correction is required and what effort must be planned for it.

A variety of data profiling tools is available on the market. They can be used as an alternative to in-house development and offer the advantage of being quickly deployable, supporting many data formats and delivering consistent results. In-house developments offer the advantage of better integration into ETL processes: profiling test routines should be configured in such a way that they can later be extended into inspection and approval steps within the ETL process. For this, the profiling methods and the target values are stored in the metadata, and the ETL process is enhanced with inspection and approval steps. Under operating conditions, a time-series analysis to forecast record counts or value ranges then becomes possible. Notification chains to those responsible for data quality or to the business can be implemented via SMS or e-mail, allowing prompt clarification, initiation of corrections and avoidance of faulty loads.
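The sketch below (our own illustration, not a particular commercial tool; table and column names are assumed) shows the kind of per-column profile such a largely automated run could produce. The resulting figures are exactly what would be reviewed in the workshops and, later, compared against target values stored in the metadata when the same checks are embedded in the ETL process.

```python
# A compact data profiling routine: record counts, missing-value rates,
# distinct values and value ranges per column of an extracted data set.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one profiling row per column."""
    rows = []
    for col in df.columns:
        s = df[col]
        numeric_like = s.dtype.kind in "ifM"   # int, float or datetime columns
        rows.append({
            "column": col,
            "records": len(s),
            "missing_pct": round(100 * s.isna().mean(), 1),
            "distinct": s.nunique(dropna=True),
            "min": s.min(skipna=True) if numeric_like else None,
            "max": s.max(skipna=True) if numeric_like else None,
        })
    return pd.DataFrame(rows)

orders = pd.DataFrame({
    "order_id": [100, 101, 102, 103],
    "amount":   [250.0, -3.0, None, 410.0],   # the negative and missing values
    "country":  ["CH", "DE", "DE", None],      # would surface as problem areas
})
print(profile(orders))
```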

References

Apel, D. et al.: Datenqualität erfolgreich steuern. Hanser, München, 2009
English, L.: Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. Wiley, New York, 1999
Eppler, M. and Helfert, M.: A Framework for the Classification of Data Quality and an Analysis of their Progression. http://www.computing.dcu.ie/~mhelfert/Research/publication/2004/EpplerHelfert_ICIQ2004.pdf
Lee, Y.W. et al.: Journey to Data Quality. The MIT Press, Cambridge, 2006
Redman, T.: Data Quality for the Information Age. Artech, Boston, 1996
Redman, T.: Data Quality: The Field Guide. Digital Press, Boston, 2001

BI Consultants GmbH
Hadlaubstrasse 124
CH-8006 Zürich, Switzerland
tel +41 44 350 40 51
mob +41 79 332 87 15
info@bi-consultants.ch
www.bi-consultants.ch