Seeking Data Quality. Using Agile Methods to Test a Data Warehouse




Seeking Data Quality: Using Agile Methods to Test a Data Warehouse
Copyright Ideaca 2008

Agenda: Seeking Data Quality
- Data Warehouse Overview
- The Value of a Data Warehouse
- Agile as Business Value Driver
- Test Strategy
- Test Techniques
- Test Results
- Conclusions

What is a Data Warehouse?
- A non-transactional data repository
- Integrates data from multiple sources
- Organized around relevant subjects
- Queryable by business users
- Used for reporting
- Used for analysis

The Structure of a Data Warehouse
[Figure: Kimball's star schema]

The Flow of Data
[Figure: typical data flow]

The Value of a Data Warehouse
- To provide information that will help people make better choices
- This information is a solution to the problem of making choices in a complex environment
- The benefit of the information is that it reduces risk by providing an accurate representation of the state of the world
- This comes at the cost of building and maintaining the data warehouse now and into the future

Data Value Drivers
Our research led us to these value drivers:
- The more accurate the data is, the more useful it is, and therefore the more valuable it is
- The value of data increases when combined with other data
- The value of data increases with its use; in fact, it only has value when people use it
- Focus on high-risk problems using limited resources
- Emphasis on Data Quality
  - Relevance
  - Completeness
  - Correctness
  - Consistency

Agile Principles as Guides
- Testing is a process of investigation and evaluation
- Customer involved in deciding test relevance
- Customer involved in deciding test priority
- Communication of test goals and approach
- Simple and lightweight test scripts
- Avoid effort on low-value tasks

Test Strategy Outline
- Data Warehouse Test Targets
  - Stars are the business view of a data warehouse
  - Stars are comprised of a Fact and its Dimensions
  - Fact and Dimension tables are loaded through ETLs
- Each target had a similar test approach
- The test backlog was a prioritized list of these tests
- Detailed test scripts are expensive to produce
- Our scripts outlined a guided exploration
- Progress could be measured through a burndown chart
- Regulatory requirements needed to be met
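The backlog and burndown ideas on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the team's actual tooling: the star and table names, the priorities, and the `BacklogItem`, `prioritized`, and `burndown` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    """One test target in the backlog (hypothetical structure)."""
    star: str        # business-facing star the target belongs to
    table: str       # fact, dimension, or ETL under test
    priority: int    # 1 = highest business risk, as agreed with the customer
    done: bool = False


def prioritized(backlog):
    """Return the open targets, highest priority first."""
    return sorted((item for item in backlog if not item.done),
                  key=lambda item: item.priority)


def burndown(total, completed_per_iteration):
    """Remaining test targets after each iteration, starting from the total."""
    remaining = [total]
    for completed in completed_per_iteration:
        remaining.append(max(remaining[-1] - completed, 0))
    return remaining


# Made-up backlog for illustration only.
backlog = [
    BacklogItem("Sales", "dim_customer", priority=2),
    BacklogItem("Sales", "fact_sales", priority=1),
    BacklogItem("Inventory", "fact_stock", priority=3),
]
next_up = [item.table for item in prioritized(backlog)]
remaining_by_iteration = burndown(len(backlog), [1, 1, 1])
```

Because each target gets a similar test approach, counting finished targets is enough to plot the burndown; no per-script effort estimate is assumed here.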

Business View of a Data Warehouse
- Testing progress reported on the basis of stars

Tests
- We tested for completeness
  - No missing records
  - No missing fields
- We tested for correctness
  - Correct keys
  - Correct calculations
  - Correct aggregations
  - Correct data type/size
- We tested for consistency
  - Consistent aggregations
  - Consistent calculations
  - Consistent data type/size
  - Consistent granularity
  - Consistent business rules
  - Consistent use of nulls and defaults
  - Consistent formatting
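A few of the checks listed above (missing records, missing fields, correct keys) might look like the following when run against a small star. This is a minimal sketch using an in-memory SQLite database; the table names, columns, and sample rows are assumptions made for illustration and do not come from the project described in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE fact_sales (order_id INTEGER, customer_key INTEGER, amount REAL);

    INSERT INTO src_orders VALUES (1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0);
    INSERT INTO dim_customer VALUES (100, 10), (101, 11);
    INSERT INTO fact_sales VALUES (1, 100, 25.0), (2, 999, NULL);
""")
# Seeded defects in the hypothetical data: order 3 was never loaded,
# and order 2 has a NULL amount and a customer key with no dimension row.


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Completeness: no missing records (source rows absent from the fact table).
missing_records = scalar("""
    SELECT COUNT(*) FROM src_orders s
    LEFT JOIN fact_sales f ON f.order_id = s.order_id
    WHERE f.order_id IS NULL
""")

# Completeness: no missing fields (a required column left NULL).
missing_fields = scalar("SELECT COUNT(*) FROM fact_sales WHERE amount IS NULL")

# Correctness: correct keys (fact rows whose dimension key has no match).
orphan_keys = scalar("""
    SELECT COUNT(*) FROM fact_sales f
    LEFT JOIN dim_customer d ON d.customer_key = f.customer_key
    WHERE d.customer_key IS NULL
""")

results = {
    "missing_records": missing_records,
    "missing_fields": missing_fields,
    "orphan_keys": orphan_keys,
}
```

Each check returns a count of offending rows, so a clean load yields zero for every entry in `results`.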

Test Points
- Test every ETL, Fact, and Dimension
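One way to apply the same check at every test point is to compare row counts and totals captured at each stage of the load against the source. The sketch below assumes three hypothetical test points (`source`, `staging`, `fact`) and made-up amounts; the slides do not specify which stages or measures were reconciled.

```python
from decimal import Decimal

# Hypothetical amounts captured at three test points along the load path.
test_points = {
    "source": [Decimal("25.00"), Decimal("40.00"), Decimal("15.00")],
    "staging": [Decimal("25.00"), Decimal("40.00"), Decimal("15.00")],
    "fact": [Decimal("25.00"), Decimal("40.00")],  # one row lost in the load
}


def profile(amounts):
    """Summarize a test point by row count and total."""
    return {"rows": len(amounts), "total": sum(amounts, Decimal("0"))}


def reconcile(points, baseline="source"):
    """Return the test points whose count or total differs from the baseline."""
    expected = profile(points[baseline])
    mismatches = {}
    for name, amounts in points.items():
        actual = profile(amounts)
        if actual != expected:
            mismatches[name] = actual
    return mismatches


mismatches = reconcile(test_points)
```

`Decimal` is used rather than floating point so that totals compare exactly; a mismatch here corresponds to a count or total being off somewhere along the load path.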

Test Results
- Greater than 99.99995% data accuracy
- Testing less than 20% of development effort
- Common scripts, common understanding
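To put the accuracy figure in perspective, the arithmetic below converts it to an error rate. The slide does not say whether accuracy was measured per record or per field, so the unit is left generic; this is a worked calculation, not a description of how the team measured accuracy.

```python
from decimal import Decimal

accuracy_pct = Decimal("99.99995")                # figure reported on the slide
error_rate = 1 - accuracy_pct / Decimal("100")    # fraction of units in error
units_per_error = int(1 / error_rate)             # units checked per one error
errors_per_10_million = int(error_rate * 10_000_000)
```

Since the reported accuracy is "greater than" this figure, the result is an upper bound on the error rate: at most about one error in two million units, or fewer than five per ten million.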

Root Cause Analysis
Defects classified by root cause

Cause                          Defect %
Development Standards Issues   23%
Implementation Errors          22%
ETL Errors                     21%
Database Issues                13%
Design Issues                  9%
Other Issues                   12%
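Ranking the causes and accumulating their shares shows how concentrated the defects are. The percentages below are taken from the table; the cumulative (Pareto-style) view is an added illustration, not something the slides present.

```python
# Defect share by root cause, in percent, as reported in the table.
defect_share = {
    "Development Standards Issues": 23,
    "Implementation Errors": 22,
    "ETL Errors": 21,
    "Database Issues": 13,
    "Design Issues": 9,
    "Other Issues": 12,
}

# Rank causes from largest to smallest share.
ranked = sorted(defect_share.items(), key=lambda item: item[1], reverse=True)

# Build (cause, share, cumulative share) rows.
cumulative = []
running_total = 0
for cause, pct in ranked:
    running_total += pct
    cumulative.append((cause, pct, running_total))

top_three_share = cumulative[2][2]
```

The three largest causes (development standards, implementation, and ETL) together account for 66% of the defects, and the six shares sum to 100%.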

Defect Root Causes
Cause:
- Development standards issues
- Implementation errors
- ETL errors
Cause Breakdown:
- Naming conventions
- Design standards
- Documentation standards
- Metadata
- Primary/foreign key problems
- Inconsistent field lengths
- Field types
- Bad data
- Missing data
- Counts off
- Totals off
- Failed calculations
- Failed conversions
- Unpopulated fields

Defect Root Causes - continued
Cause:
- Database errors
- Design issues
- All other issues
Cause Breakdown:
- Performance
- Indexes
- Partitions
- Tablespace
- Missing fields
- Extra fields
- Missing dimensions
- Mapping problems
- Miscellaneous

Conclusions
- A value-based approach focused our test efforts to find more serious problems sooner
- Applying agile principles allowed us to minimize wasted time and effort
- Testing identified development process changes that had the greatest impact on data quality
- New regulatory requirements mean that the ability to test is now a design issue

Summary: Contrasting Test Styles
Old Approach
- Focus on tool: database, data warehouse
- Focus on process: tables, views, stored procedures
- Test plans
- Test cases
- Detailed scripts for instructions
- No special emphasis on team communication
New Approach
- Focus on value: data usage in business context
- Focus on outcome: stars/dimensions/facts
- Test backlogs
- Test targets
- Light scripts as guides for exploration
- Team communication is vital