Seeking Data Quality Using Agile Methods to Test a Data Warehouse Copyright Ideaca 2008
Agenda
Seeking Data Quality
- Data Warehouse Overview
- The Value of a Data Warehouse
- Agile as Business Value Driver
- Test Strategy
- Test Techniques
- Test Results
- Conclusions

What is a Data Warehouse?
- A non-transactional data repository
- Integrates data from multiple sources
- Organized around relevant subjects
- Queryable by business users
- Used for reporting
- Used for analysis

The Structure of a Data Warehouse
[Figure: Kimball's Star Schema]

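A minimal sketch of what a star schema looks like in practice, using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_date, dim_product) and the sample rows are hypothetical and do not come from the presentation; they only illustrate one fact table keyed to its dimensions.

```python
import sqlite3

# Hypothetical star: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date TEXT NOT NULL
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity    INTEGER NOT NULL,
        amount      REAL NOT NULL
    );
    INSERT INTO dim_date VALUES (1, '2008-01-01'), (2, '2008-01-02');
    INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO fact_sales VALUES
        (1, 10, 2, 20.0), (1, 20, 1, 15.0), (2, 10, 3, 30.0);
""")

# A typical business query: join the fact to a dimension and aggregate.
sales_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(sales_by_product)
```

The fact table holds the measures (quantity, amount) and the dimension tables hold the descriptive attributes that business users filter and group by.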
The Flow of Data
[Figure: typical data flow]

The Value of a Data Warehouse
- To provide information that will help people make better choices
- This information is a solution to the problem of making choices in a complex environment
- The benefit of the information is that it reduces risk by providing an accurate representation of the state of the world
- This comes at the cost of building and maintaining the data warehouse now and into the future

Data Value Drivers
Our research led us to these value drivers:
- The more accurate the data is, the more useful it is, and therefore the more valuable it is
- The value of data increases when combined with other data
- The value of data increases with its use; in fact, it only has value when people use it
Focus on high-risk problems using limited resources
Emphasis on Data Quality
- Relevance
- Completeness
- Correctness
- Consistency

Agile Principles as Guides
- Testing is a process of investigation and evaluation
- Customer involved in deciding test relevance
- Customer involved in deciding test priority
- Communication of test goals and approach
- Simple and lightweight test scripts
- Avoid effort on low value tasks

Test Strategy Outline
- Data Warehouse Test Targets
  - Stars are the business view of a data warehouse
  - Stars are composed of a Fact and its Dimensions
  - Fact and Dimension tables are loaded through ETLs
- Each target had a similar test approach
- The test backlog was a prioritized list of these tests
- Detailed test scripts are expensive to produce
- Our scripts outlined a guided exploration
- Progress could be measured through a burndown chart
- Regulatory requirements needed to be met

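The backlog and burndown ideas above can be sketched roughly as follows. This is an illustration only: the BacklogItem class, the target names, the priorities, and the per-iteration counts are all hypothetical and are not taken from the project described in the slides.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str          # hypothetical test target (a star, fact, dimension, or ETL)
    priority: int      # 1 = highest priority, as agreed with the customer
    done: bool = False


def prioritized(backlog):
    """Return the open test targets, highest priority first."""
    return sorted((item for item in backlog if not item.done),
                  key=lambda item: item.priority)


def burndown(total, completed_per_iteration):
    """Return the number of targets remaining at the start and after each iteration."""
    remaining = [total]
    for completed in completed_per_iteration:
        remaining.append(remaining[-1] - completed)
    return remaining


backlog = [
    BacklogItem("Sales fact ETL", priority=2),
    BacklogItem("Customer dimension", priority=1),
    BacklogItem("Date dimension", priority=3, done=True),
]
next_up = [item.name for item in prioritized(backlog)]
chart = burndown(total=12, completed_per_iteration=[3, 4, 2])
print(next_up)
print(chart)
```

Plotting the burndown values against iteration number gives the chart used to report progress.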
Business View of a Data Warehouse
Testing progress reported on the basis of stars

Tests
- We tested for completeness
  - No missing records
  - No missing fields
- We tested for correctness
  - Correct keys
  - Correct calculations
  - Correct aggregations
  - Correct data type/size
- We tested for consistency
  - Consistent aggregations
  - Consistent calculations
  - Consistent data type/size
  - Consistent granularity
  - Consistent business rules
  - Consistent use of nulls and defaults
  - Consistent formatting

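A few of these checks can be expressed as short queries. The sketch below uses Python's sqlite3 module with hypothetical tables (src_orders, dim_customer, fact_orders) and deliberately flawed sample data; it shows one possible shape for such tests and is not the scripts the team actually used.

```python
import sqlite3

# Hypothetical source table, dimension, and fact table with seeded defects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY);
    CREATE TABLE fact_orders (order_id INTEGER, customer_key INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 100, 10.0), (2, 100, 20.0), (3, 200, 5.0);
    INSERT INTO dim_customer VALUES (100);
    INSERT INTO fact_orders VALUES (1, 100, 10.0), (3, 200, NULL);
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Completeness: source records that never arrived in the fact table.
missing_records = scalar("""
    SELECT COUNT(*) FROM src_orders AS s
    LEFT JOIN fact_orders AS f ON f.order_id = s.order_id
    WHERE f.order_id IS NULL
""")

# Completeness: loaded rows with an unpopulated field.
missing_fields = scalar("SELECT COUNT(*) FROM fact_orders WHERE amount IS NULL")

# Correctness: fact rows whose key has no matching dimension row.
orphan_keys = scalar("""
    SELECT COUNT(*) FROM fact_orders AS f
    LEFT JOIN dim_customer AS d ON d.customer_key = f.customer_key
    WHERE d.customer_key IS NULL
""")

# Consistency: the same aggregate computed from source and from target.
source_total = scalar("SELECT SUM(amount) FROM src_orders")
target_total = scalar("SELECT SUM(amount) FROM fact_orders")
totals_agree = source_total == target_total

print(missing_records, missing_fields, orphan_keys, totals_agree)
```

Each check returns a count or a flag, so a non-zero count or a False flag points the tester at a specific area to explore further.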
Test Points
Test every ETL, Fact, and Dimension

Test Results
- Greater than 99.99995% data accuracy
- Testing was less than 20% of development effort
- Common scripts, common understanding

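To put the accuracy figure in perspective, the threshold can be converted into an error count. The sketch below assumes accuracy means the fraction of records without a defect, which the slide does not state; the max_errors function and the record count of ten million are hypothetical.

```python
from fractions import Fraction


def max_errors(records, accuracy_percent):
    """Number of defective records that corresponds to a given accuracy percentage.

    accuracy_percent is passed as a decimal string so the arithmetic stays exact.
    """
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return records * error_rate


threshold = max_errors(10_000_000, "99.99995")
print(threshold)
```

Under that assumption, accuracy greater than 99.99995% corresponds to fewer than this many defective records per ten million.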
Root Cause Analysis
Defects classified by root cause

Cause                          Defect %
Development standards issues   23%
Implementation errors          22%
ETL errors                     21%
Database issues                13%
Design issues                  9%
Other issues                   12%

Defect Root Causes
Causes:
- Development standards issues
- Implementation errors
- ETL errors
Cause breakdown:
- Naming conventions
- Design standards
- Documentation standards
- Metadata
- Primary/foreign key problems
- Inconsistent field lengths
- Field types
- Bad data
- Missing data
- Counts off
- Totals off
- Failed calculations
- Failed conversions
- Unpopulated fields

Defect Root Causes (continued)
Causes:
- Database errors
- Design issues
- All other issues
Cause breakdown:
- Performance
- Indexes
- Partitions
- Tablespace
- Missing fields
- Extra fields
- Missing dimensions
- Mapping problems
- Miscellaneous

Conclusions
- A value-based approach focused our test efforts on finding more serious problems sooner
- Applying agile principles allowed us to minimize wasted time and effort
- Testing identified the development process changes that had the greatest impact on data quality
- New regulatory requirements mean that the ability to test is now a design issue

Summary: Contrasting Test Styles
Old Approach
- Focus on tool: database, data warehouse
- Focus on process: tables, views, stored procedures
- Test plans
- Test cases
- Detailed scripts for instructions
- No special emphasis on team communication
New Approach
- Focus on value: data usage in business context
- Focus on outcome: stars/dimensions/facts
- Test backlogs
- Test targets
- Light scripts as guides for exploration
- Team communication is vital