Data Quality Management: The Most Critical Initiative You Can Implement
    SUGI 29, Montreal, May 2004
    Claudia Imhoff, President, Intelligent Solutions, Inc. (CImhoff@Intelsols.com, www.intelsols.com)
    Jonathan G. Geiger, Executive Vice President, Intelligent Solutions, Inc. (JGeiger@Intelsols.com, www.intelsols.com)
    Copyright 2004 Intelligent Solutions, Inc., All Rights Reserved
Topics
    What is Data Quality Management?
    Data Quality Management Challenges
    Data Quality Definition
    Four Pillars of Data Quality Management
    Getting Started
Data is an Asset
    Other corporate assets include:
        People
        Capital (Money)
        Property
        Materials
    Assigning value is difficult
    Establishing ROI for Data Quality Management efforts is also difficult
What is Data Quality Management?
    Establishment and deployment of roles, responsibilities, policies and procedures concerning the acquisition, maintenance, dissemination and disposition of data
    Viability of business decisions is contingent on good data
    Good data is contingent on an effective approach to Data Quality Management
Data Quality Management Responsibilities
    Business Responsibilities
        Business rules governing data
        Data quality verification
    Information Technology Responsibilities
        Manage environment for acquiring, maintaining, disseminating, and disposing of electronic data
            Architecture
            Infrastructure
            Systems
            Databases
Data Quality Management Roles
    Program Manager and Project Leader
    Organization Change Agent
    Business Analyst and Data Analyst
    Data Steward
Data Quality Management Components
    Reactive: addresses problems that already exist
        Deal with inherent data problems, integration issues, merger and acquisition challenges
    Proactive: diminishes the potential for new problems to arise
        Governance, roles and responsibilities, quality expectations, supporting business practices, specialized tools
    Both are needed
Data Quality Management Importance
    Companies often realize the importance too late
        Only after several documented problems with the data do they recognize the need to improve its quality
    Billions of dollars are lost annually due to data quality problems
    Additional estimates have shown that 15-20% of the data in a typical organization is erroneous or otherwise unusable
    The importance of Data Quality Management should be evident, so why aren't companies addressing it more aggressively?
Data Quality Management Challenges: Responsibility
    No single business unit is responsible for enterprise data
    Once captured in operational system, business unit washes hands of further responsibility
    Savvy corporations adopt data stewardship approach
    Leaders not focused on data issues
Data Quality Management Challenges: Cross Functionality
    Horizontal alignment in a vertical world
    Data Quality Management crosses organizational boundaries
    Compromise is often necessary
Data Quality Management Challenges: Problem Recognition
    Corporation must recognize that it HAS a Data Quality Management problem
    Is your company in denial?
    Getting money for an unrecognized problem is difficult at best
Data Quality Management Challenges: Discipline
    Downstream impacts must be understood and considered in decisions
    Corporation must define and assign responsibilities
        In job descriptions
    Formal procedures must be created
Data Quality Management Challenges: Investment
    Time, funding and resources are all needed to overcome unquality
    Examples of unquality
        Duplicate materials to the same customer or prospect
        Exclusion of viable prospect from mailing list
Data Quality Management Challenges: On-Going Effort
    This is not a one-time effort
    Data Quality Management staffing is required
        Should reduce staffing requirements elsewhere
    Governance is the name of the game
    Customizable tools needed
Data Quality Management Challenges: Return on Investment
    What is the cost of unquality?
        Work-arounds absorbed into daily processes
    How do you determine an ROI on it?
Quality - Definition
    Quality is conformance to requirements
        Whose requirements?
        How are requirements set?
        What degree of conformance?
Quality - Definition
    Quality is not (necessarily) zero defects
    [Chart: defect rate over time, shown against a target]
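The idea that quality means meeting an agreed target, not reaching zero defects, can be sketched in a few lines. This is a minimal illustration only: the function names, the sample batch and the two target values are hypothetical and do not come from the presentation.

```python
def defect_rate(records, is_valid):
    """Return the fraction of records that fail the validity check."""
    records = list(records)
    if not records:
        return 0.0  # an empty batch has no defects to report
    defects = sum(1 for record in records if not is_valid(record))
    return defects / len(records)


def meets_target(rate, target):
    """Quality is conformance to requirements: the rate only has to reach the target."""
    return rate <= target


# Hypothetical batch: one of four records is missing its postal code.
batch = [
    {"postal_code": "H3B 1A7"},
    {"postal_code": ""},
    {"postal_code": "H2Y 1C6"},
    {"postal_code": "H3A 0G4"},
]
rate = defect_rate(batch, lambda record: bool(record["postal_code"].strip()))

loose_ok = meets_target(rate, 0.30)   # assumed early, lenient target
strict_ok = meets_target(rate, 0.10)  # assumed later, tighter target
```

The same batch passes the lenient target and fails the tighter one, which is why the target has to be negotiated with the business and not assumed to be zero.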
Quality - Definition
    To the user, the data warehouse is the source
    Data model provides basis for data collection
        Definitions
        Validation rules
        Relationship rules
    Actual data must also be examined
        Operational business process implications
        Abuse of defined fields
        Undocumented business rules
        Impact of system changes
Quality Management
    [Figure: completeness and accuracy axes, each running to 100%, with cost ($) marked. Labels on the figure:]
        Complete but with errors: very dangerous
        Perfect data: expensive
        Incomplete but accurate
        May be a prototype only
    From Imhoff and Geiger, April 1996, Data Management Review
Four Types of Error Correction
    Reject the error
    Accept the error
    Correct the error
    Use default value for data in error
Reject the Error!
    Better to have missing data than inaccurate data
    Reject the complete record
    Correct at the source and re-extract the data
Accept the Error!
    Data error is within tolerance limits
    Correct data at the source
    If not correctable, provide meta data on the error
Correct the Error!
    Data essential for completeness
    Correction is required
        Use temporary file
        Correct data prior to load
        May correct at source
Use Default Value for Data in Error!
    Data needed for completeness
    Data is unusable as is
    Data value is replaced with a default value
    Meta data must be used to explain when and how the default is used
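The four correction options can be sketched as a single dispatch at load time. This is an illustrative sketch only, not the authors' implementation: the `Action` enum, the `handle_error` function and the wording of the meta data notes are hypothetical names chosen for the example.

```python
from enum import Enum


class Action(Enum):
    REJECT = "reject"
    ACCEPT = "accept"
    CORRECT = "correct"
    DEFAULT = "default"


def handle_error(record, field, action, corrected_value=None, default_value=None):
    """Apply one of the four options; return (record or None, meta data note)."""
    if action is Action.REJECT:
        # Reject the complete record; it is corrected at the source and re-extracted.
        return None, f"rejected: {field} in error, awaiting re-extract"
    if action is Action.ACCEPT:
        # Error is within tolerance: load as is, but document it.
        return dict(record), f"accepted: {field} in error, within tolerance"
    if action is Action.CORRECT:
        # Data is essential: fix it prior to load.
        fixed = dict(record)
        fixed[field] = corrected_value
        return fixed, f"corrected: {field} fixed prior to load"
    if action is Action.DEFAULT:
        # Data is unusable as is: substitute a default and explain it in meta data.
        fixed = dict(record)
        fixed[field] = default_value
        return fixed, f"defaulted: {field} replaced with {default_value!r}"
    raise ValueError(f"unknown action: {action!r}")


# Hypothetical record with an invalid state code.
bad = {"id": 1, "state": "ZZ"}
defaulted, note = handle_error(bad, "state", Action.DEFAULT, default_value="UNKNOWN")
rejected, _ = handle_error(bad, "state", Action.REJECT)
```

Each branch returns a copy, so the extracted record stays untouched for auditing, and every branch emits a note so the choice is visible downstream as meta data.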
Four Pillars of Data Quality Management
    Data Profiling: gaining an understanding of existing data relative to quality specifications
        This is your starting point from which improvement (and ROI) is measured
        Is the data complete?
        Is the data accurate?
    Data Quality: gaining an understanding of the causes of quality problems
        Heavy usage of data profiling technology
        Analysis of the root causes of data quality problems and inconsistencies
        Choose one of four options to fix the problem
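The two profiling questions, "Is the data complete?" and "Is the data accurate?", can be turned into baseline numbers with very little code. The sketch below is a simplified assumption of what a profiling pass might compute. The `profile` function, the sample rows and the age range used as the accuracy rule are all hypothetical; commercial profiling tools do far more.

```python
def profile(rows, field, is_accurate):
    """Return completeness and accuracy ratios for one field."""
    total = len(rows)
    # Completeness: share of rows where the field is populated at all.
    populated = [row[field] for row in rows if row.get(field) not in (None, "")]
    # Accuracy: share of populated values that satisfy the quality specification.
    accurate = [value for value in populated if is_accurate(value)]
    return {
        "completeness": len(populated) / total if total else 0.0,
        "accuracy": len(accurate) / len(populated) if populated else 0.0,
    }


# Hypothetical customer rows: one missing age, one implausible age.
customers = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 212},
    {"id": 4, "age": 51},
]
baseline = profile(customers, "age", lambda age: 0 <= age <= 120)
```

Recording these ratios before any cleanup gives the starting point against which later improvement, and therefore ROI, can be measured.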
Four Pillars of Data Quality Management
    Data Integration: collapsing disparate versions of data into a single one
        Recognition that same data exists in multiple locations with variable content
        Standardize the multiple versions (e.g., customers, products, geographies) to a single version
    Data Augmentation: incorporation of additional external data to gain insight
        Combine internal customer data with third party data to increase understanding of the customer
        External data: competitor, customer demographic or credit history, total industry sales data
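Collapsing multiple versions of the same customer into a single version can be sketched as standardize, match, then merge. This is a deliberately naive illustration: the `match_key` and `integrate` functions, the matching rule (normalized name plus postal code) and the sample records are assumptions for the example, and real integration tools rely on much richer matching.

```python
def match_key(record):
    """Standardize name and postal code so variants of one customer compare equal."""
    name = record["name"].lower().replace(".", "").replace(",", "")
    name = " ".join(name.split())  # collapse repeated whitespace
    postal = record["postal_code"].replace(" ", "").upper()
    return (name, postal)


def integrate(*sources):
    """Merge records from several systems into one version per customer."""
    merged = {}
    for source in sources:
        for record in source:
            single = merged.setdefault(match_key(record), {})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field.
                if value and not single.get(field):
                    single[field] = value
    return list(merged.values())


# Hypothetical records for the same customer held in two systems.
billing = [{"name": "Acme Corp.", "postal_code": "h3b 1a7", "phone": ""}]
crm = [{"name": "ACME  corp", "postal_code": "H3B1A7", "phone": "555-0100"}]
customers = integrate(billing, crm)
```

The same merge step could fold in third-party records for augmentation, provided the external data carries fields that can be standardized into the same match key.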
Getting Started
    Education
    Stewardship Program
    Partnerships & Environment
    Four-Phase Program
    Technology Support
Education
    Involve key data warehouse effort participants
        Business users
        Developers
        Influencing people
    Better chance of getting commitment
    Involves various techniques
        Facilitated sessions
        Interviews
        Group-ware
    Need to avoid analysis paralysis
Stewardship - Definition
    Webster's Dictionary: A steward is one who is called upon to exercise responsible care over possessions entrusted to him/her
    The steward does not own the possessions
    The steward has a responsibility affecting the processes that impact the possessions
    The steward may be a business unit or de facto steward
Data Steward Responsibilities
    We need to approach this in an organized manner
    Responsibilities of data stewardship include:
        Data acquisition
            Processes
            System roles
            Update authority
            Validation rules
            Business rules
            Quality
        Data management
            Data models
            Demographics
            Naming standards
            Meta data requirements
            Storage redundancy
            Backup & recovery
            Archival & restoration
        Dissemination
            Access security
            Standard queries and reports
            Capabilities
            System use
            Quality
            Meta data provided
        Disposal
            Retention
            Erasure
Partnerships & Environment
    [Figure: partnerships between Business Units and Information Technology, shown with Executive Management and Middle Management]
Partnerships & Environment
    Address quality issues explicitly
    Address known quality problems
        Business processes
        Operational data
    Ensure environment supports quality
        Properly train and equip team
        Check development, test and production environments
    Build quality into process
        Provide quality review points
Partnerships & Environment
    Quality expectations must be:
        Understood
        Negotiated
        Communicated
        Met
    Quality is a business issue -- NOT just a technical issue
    Quality is not an issue for one business unit -- it is a horizontal activity
        Quality Committee
        Data Stewardship
Four-Phase Program
    [Diagram of the four phases not reproduced]
Technology Support
    Data Quality Management companies like DataFlux are available to help you get started. They can:
        Help you determine your Data Quality Management needs
        Develop a plan to help meet your needs
        Provide the technology, methodology and services to execute your plan
Summary
    Data Quality Management is not a luxury; it is essential
    The first step is to recognize that you have data unquality
    A sound program consists of four pillars
    Getting started requires commitment and dedication in all corners of the enterprise