


Similar documents
A Comprehensive Approach to Data Warehouse Testing

A Comprehensive Approach to Master Data Management Testing

Data Warehouse Design


Data Warehouse: Introduction

Data warehouse life-cycle and design

My Favorite Issues in Data Warehouse Modeling

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Data Warehouse (DW) Maturity Assessment Questionnaire

Keywords : Data Warehouse, Data Warehouse Testing, Lifecycle based Testing

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

LEARNING SOLUTIONS website milner.com/learning phone

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

Advanced Data Management Technologies

Testing Big data is one of the biggest

Data Warehousing Systems: Foundations and Architectures

INTEROPERABILITY IN DATA WAREHOUSES

Metadata Management for Data Warehouse Projects

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD

Oracle Insurance Policy Administration System Quality Assurance Testing Methodology. An Oracle White Paper August 2008

Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days

Keywords: Data Warehouse, Data Warehouse testing, Lifecycle based testing, performance testing.

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

OLAP Theory-English version

Mario Guarracino. Data warehousing

Bringing agility to Business Intelligence Metadata as key to Agile Data Warehousing.

Foundations of Business Intelligence: Databases and Information Management

BUILDING OLAP TOOLS OVER LARGE DATABASES

Data warehouse design

Oracle BI 11g R1: Build Repositories

POLAR IT SERVICES. Business Intelligence Project Methodology

Dimensional Modeling for Data Warehouse

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Multidimensional Modeling - Stocks

Proven Testing Techniques in Large Data Warehousing Projects

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Index Selection Techniques in Data Warehouse Systems

A McKnight Associates, Inc. White Paper: Effective Data Warehouse Organizational Roles and Responsibilities

Data Warehouse and OLAP Week 5

B.Sc (Computer Science) Database Management Systems UNIT-V

MDM and Data Warehousing Complement Each Other

INFORMATION TECHNOLOGY STANDARD

A Hybrid Model Driven Development Framework for the Multidimensional Modeling of Data Warehouses

DEVELOPING REQUIREMENTS FOR DATA WAREHOUSE SYSTEMS WITH USE CASES

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Rational Reporting. Module 3: IBM Rational Insight and IBM Cognos Data Manager

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Establish and maintain Center of Excellence (CoE) around Data Architecture

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

A Service-oriented Architecture for Business Intelligence

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Course MIS. Foundations of Business Intelligence

OLAP AND DATA WAREHOUSE BY W. H. Inmon

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Agile Business Intelligence Data Lake Architecture

Data Warehouses & OLAP

Microsoft Data Warehouse in Depth

AN INTEGRATION APPROACH FOR THE STATISTICAL INFORMATION SYSTEM OF ISTAT USING SDMX STANDARDS

Basics of Dimensional Modeling

Comparative Analysis of Data warehouse Design Approaches from Security Perspectives

Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives

DATA WAREHOUSING AND OLAP TECHNOLOGY

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

Data warehouse Architectures and processes

Designing Business Intelligence Solutions with Microsoft SQL Server 2012

DATA WAREHOUSING - OLAP

The Benefits of Data Modeling in Data Warehousing

B. 3 essay questions. Samples of potential questions are available in part IV. This list is not exhaustive it is just a sample.

Data Warehousing and OLAP Technology for Knowledge Discovery

The Role of the Analyst in Business Analytics. Neil Foshay Schwartz School of Business St Francis Xavier U

SQL Server 2012 Business Intelligence Boot Camp

MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Bringing Value to the Organization with Performance Testing

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Week 3 lecture slides

OBIEE 11g Data Modeling Best Practices

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Data Warehouses in the Path from Databases to Archives

Extensibility of Oracle BI Applications

Business Intelligence System Using Goal-Ontology Approach: A Case Study in Universiti Utara Malaysia

Data Modeling Master Class Steve Hoberman s Best Practices Approach to Developing a Competency in Data Modeling

SENG 520, Experience with a high-level programming language. (304) , Jeff.Edgell@comcast.net

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

From Business Intelligence to Location Intelligence with the Lily Library

14. Data Warehousing & Data Mining

Software testing. Objectives

Designing Self-Service Business Intelligence and Big Data Solutions

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

IST722 Data Warehousing

Essbase Integration Services Release 7.1 New Features

Transcription:

A Comprehensive Approach to Data Warehouse Testing
Matteo Golfarelli & Stefano Rizzi, DEIS, University of Bologna (DOLAP '09)

Agenda: 1. DW testing specificities 2. The methodological framework 3. What & how should be tested 4. Testing phases 5. Test plan 6. Conclusions

DW testing
Testing is undoubtedly an essential part of the DW life-cycle, but it has received little attention compared to the other design phases. DW testing specificities:
- Software testing is predominantly focused on program code, while DW testing is directed at data and information: DW testing focuses on the correctness and usefulness of the information delivered to users.
- Unlike generic software systems, DW testing involves a huge data volume, which significantly impacts performance and productivity.
- DW systems are aimed at supporting virtually any view of data, so the number of possible use scenarios is practically infinite and only a few of them are known from the beginning.
- It is almost impossible to predict all the possible types of errors that will be encountered in real operational data.

Methodological framework
- Requirements are elicited from users and represented either informally by means of proper glossaries or formally (e.g., by means of goal-oriented diagrams).

- Data sources are inspected, normalized, and integrated to obtain a reconciled schema.
- A conceptual schema for the data mart is designed considering both the user requirements and the data available in the reconciled schema.

- The preliminary workload expressed by users is refined and user profiles are singled out, using for instance UML use case diagrams.
- A logical schema for the data mart is obtained by properly translating the conceptual schema.

- ETL procedures are designed considering the source schemata, the reconciled schema, and the data mart logical schema.
- Physical design includes index selection, schema fragmentation, and all other issues related to physical allocation.

- Implementation includes the development of ETL procedures and the creation of front-end reports.

What & how is tested
- Data quality: entails an accurate check on the correctness of the data loaded by ETL procedures and accessed by front-end tools (a minimal sketch of such checks follows below).
- Design quality: implies verifying that the user requirements are well expressed by the conceptual schema and by the logical schema.
- The items under test are the conceptual schema, the logical schema, the ETL procedures, the database, and the front-end.
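As an illustration of the kind of data-quality checks mentioned above, the following minimal Python sketch runs a few assertions against a loaded data mart. The table and column names (fact_sale, dim_product, qty_sold, ...) and the specific rules are assumptions made for this example, not checks prescribed in the talk.

```python
# Illustrative data-quality checks on a loaded data mart (hypothetical schema).
import sqlite3

def data_quality_checks(mart: sqlite3.Connection) -> list[str]:
    """Return the names of the checks that failed; an empty list means all passed."""
    checks = {
        "orphan product keys":
            "SELECT COUNT(*) FROM fact_sale f LEFT JOIN dim_product p "
            "ON f.product_key = p.product_key WHERE p.product_key IS NULL",
        "negative quantities":
            "SELECT COUNT(*) FROM fact_sale WHERE qty_sold < 0",
        "duplicate fact rows":
            "SELECT COUNT(*) FROM (SELECT product_key, store_key, date_key "
            "FROM fact_sale GROUP BY 1, 2, 3 HAVING COUNT(*) > 1)",
    }
    return [name for name, sql in checks.items()
            if mart.execute(sql).fetchone()[0] > 0]
```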

What & how is tested
- Functional test: verifies that the item is compliant with its specified business requirements.
- Usability test: evaluates the item by letting users interact with it, in order to verify that the item is easy to use and comprehensible.
- Performance test: checks that the item's performance is satisfactory under typical workload conditions.
- Stress test: shows how well the item performs with peak loads of data and very heavy workloads.
- Recovery test: checks how well the item is able to recover from crashes, hardware failures, and other similar problems.
- Security test: checks that the item protects data and maintains functionality as intended.
- Regression test: checks that the item still functions correctly after a change has occurred.

What vs. how is tested: each test type (functional, usability, performance, stress, recovery, security, regression) is crossed with each item under test (conceptual schema, logical schema, ETL procedures, database, front-end) in a what-vs-how matrix.

Testing the conceptual schema
- Functional test (fact test): verifies that the workload preliminarily expressed by users during requirement analysis is actually supported by the conceptual schema. It can be quantitatively measured as the number of non-supported queries.
- Functional test (conformity test): aimed at assessing how well conformed hierarchies have been designed. This test can be carried out by measuring the sparseness of the bus matrix (sketched below).
  - Sparse bus matrix: the designer probably failed to recognize the semantic and structural similarities between apparently different hierarchies.
  - Dense bus matrix: the designer probably failed to recognize the semantic and structural similarities between apparently different facts.
[Slide example: two apparently distinct facts, SALE CLASSIC and SALE TRENDY, share the product (manufacturer, type, category), store (city, state, sales manager), and week/month hierarchies, as well as the measures qty sold and returns/qty sold; the bus matrix crosses these facts (together with a MANUFACTURE fact) with the conformed dimensions.]
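For illustration only, the bus-matrix sparseness used by the conformity test can be computed in a few lines of Python once the matrix is written down as facts vs. conformed dimensions. The fact and dimension names below loosely echo the slide example, and the sparseness formula (share of empty cells) is an assumption of this sketch, not a definition from the talk.

```python
# Illustrative sketch: measuring bus-matrix sparseness for the conformity test.
bus_matrix = {
    "SALE CLASSIC": {"Product", "Store", "Week"},
    "SALE TRENDY":  {"Product", "Store", "Week"},
    "MANUFACTURE":  {"Product", "Week", "Phase", "Quality", "Line"},
}

dimensions = sorted(set().union(*bus_matrix.values()))
cells = len(bus_matrix) * len(dimensions)
filled = sum(len(dims) for dims in bus_matrix.values())
sparseness = 1 - filled / cells   # 0 = fully dense, 1 = fully sparse

print(f"{len(bus_matrix)} facts x {len(dimensions)} dimensions, "
      f"sparseness = {sparseness:.2f}")
```

Read against the slide: a high value would hint at hierarchies that were not conformed, while a value close to zero for near-identical facts would hint at facts that should have been merged.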

Testing the logical schema
- Functional test (star test): consists in verifying that a sample of queries in the preliminary workload can be correctly formulated in SQL on the logical schema. Priority should be given to complex queries (including many-to-many associations, complex aggregations, and non-standard temporal scenarios). It can be quantitatively measured as the number of non-supported queries.
- Usability test: consists in verifying how easy it is for a user to understand the schema. Several metrics have been developed to evaluate this point: the number of dimensional attributes in a star schema, the number of dimension tables in a star schema, and the number of complex constructs (e.g., many-to-many associations). Works by M. Piattini et al. show a high correlation between a high number of dimensional attributes or dimension tables and the time required by a user to understand the schema.

Testing ETL procedures
- Functional testing is probably the most complex and critical testing phase, because it directly affects the quality of data.
- Unit test: a white-box test that each developer carries out on the units he or she developed (see the sketch after this list). Unit tests allow the testing complexity to be broken down, and they also enable more detailed reports on the project progress to be produced.
- Integration test: a black-box test that allows the correctness of data flows in ETL procedures to be checked.
- Forced-error test: designed to force ETL procedures into error conditions, aimed at verifying that the system can deal with faulty data as planned during requirement analysis.
- Since ETL is heavily code-based, most standard metrics for generic software system testing can be reused here.
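The sketch below illustrates, under assumed table names and a made-up transformation, what an ETL unit test and a black-box integration check might look like when written with pytest; it is an example of the test types described above, not the authors' code.

```python
# Illustrative ETL tests (pytest style); schema, transformation, and queries
# are hypothetical examples, not taken from the talk.
import sqlite3

def clean_qty(raw) -> int | None:
    """Unit under test: a tiny ETL transformation normalizing a quantity field."""
    text = str(raw).strip()
    return max(int(text), 0) if text.lstrip("-").isdigit() else None

def test_clean_qty_unit():
    # White-box unit test covering the transformation's decision points.
    assert clean_qty(" 7 ") == 7
    assert clean_qty("-3") == 0        # negative quantities are clamped
    assert clean_qty("n/a") is None    # faulty data handled as specified

def test_etl_integration(source: sqlite3.Connection, mart: sqlite3.Connection):
    # Black-box integration test: the ETL flow neither loses nor duplicates quantities.
    # 'source' and 'mart' are assumed to be pytest fixtures opening the two databases.
    src = source.execute("SELECT SUM(qty) FROM src_sales").fetchone()[0]
    dwh = mart.execute("SELECT SUM(qty_sold) FROM fact_sale").fetchone()[0]
    assert src == dwh, "qty sold lost or duplicated by the ETL flow"
```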

Testing the front-end
- Functional testing of the front-end must necessarily involve the end users, since they can more easily detect abnormalities in the data caused by faulty ETL procedures, by incorrect data aggregations or selections in front-end tools, or by poor data quality of the source database.
- An alternative approach to front-end functional testing consists in comparing the results of OLAP analyses with those obtained by directly querying the source databases (a minimal sketch of this comparison is given below, after the performance notes).
- Usability tests are carried out by measuring the number of misunderstandings of the users about the real meaning of the data when they analyze the reports.

Testing performance
- Performance should be tested separately for:
  - the database: requires the definition of a reference workload in terms of number of concurrent users, types of queries, and data volume;
  - the front-end: requires the definition of a reference workload in terms of number of concurrent users, types of queries, and data volume;
  - the ETL procedures: require the definition of a reference data volume for the operational data to be loaded.
- Stress test: simulates an extraordinary workload due to a significantly larger amount of data and queries. The variables to be considered to stress the system are, for the database and the front-end, the number of concurrent users, the types of queries, and the data volume; for the ETL procedures, the data volume.
- The expected quality level can be expressed in terms of response time.
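As a sketch of the alternative front-end test mentioned above, the function below compares a monthly aggregate taken from the data mart with the same aggregate computed directly on the source database. The queries, the table names, the assumption that the month labels coincide, and the 0.1% tolerance are all assumptions made for this example.

```python
# Illustrative reconciliation of an OLAP result against the source database.
import sqlite3

def check_report_against_source(mart: sqlite3.Connection,
                                source: sqlite3.Connection,
                                tolerance: float = 0.001):
    olap = mart.execute(
        "SELECT d.month, SUM(f.qty_sold) FROM fact_sale f "
        "JOIN dim_date d ON f.date_key = d.date_key GROUP BY d.month"
    ).fetchall()
    src = dict(source.execute(
        "SELECT strftime('%Y-%m', sale_date), SUM(qty) FROM src_sales GROUP BY 1"
    ).fetchall())
    # An empty result means the report and the source agree within the tolerance.
    return [(month, value, src.get(month)) for month, value in olap
            if src.get(month) is None
            or abs(value - src[month]) > tolerance * abs(src[month])]
```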

Regression test and test automation
- Regression test is aimed at making sure that any change applied to the system does not jeopardize the quality of preexisting, already tested features. In regression testing it is often possible to validate test results by just checking that they are consistent with those obtained at the previous iteration (sketched below, after the coverage criteria).
- Impact analysis is aimed at determining which other application objects are affected by a change in a single application object. Remarkably, some ETL tool vendors already provide impact analysis functionalities.
- Automating testing activities plays a basic role in reducing testing costs. Test automation is practically possible only for a limited number of test types:
  - Implementation-related testing activities can be automated using commercial tools (e.g., QACenter) that simulate specific workloads and analysis sessions, or that can measure the outcome of an ETL process.
  - Design-related testing activities require solutions capable of accessing metadata repositories and the DBMS catalog.

Coverage criteria
- Measuring the coverage of tests is necessary to assess the overall system reliability. This requires the definition of a suitable coverage criterion; coverage criteria are chosen by trading off test effectiveness and efficiency.
- Fact test: coverage criterion: each information need expressed by users during requirement analysis must be tested; measurement: percentage of queries in the preliminary workload that are supported by the conceptual schema; expected coverage: partial, depending on the extent of the preliminary workload.
- Conformity test: coverage criterion: all data mart dimensions must be tested; measurement: bus matrix sparseness; expected coverage: total.
- Usability test of the conceptual schema: coverage criterion: all facts, dimensions, and measures must be tested; measurement: conceptual metrics; expected coverage: total.
- ETL unit test: coverage criterion: all decision points must be tested; measurement: correct loading of the test data sets; expected coverage: total.
- ETL forced-error test: coverage criterion: all error types specified by users must be tested; measurement: correct loading of the faulty data sets; expected coverage: total.
- Front-end unit test: coverage criterion: at least one group-by set for each attribute in the multidimensional lattice of each fact must be tested; measurement: correct analysis result on a real data set; expected coverage: total.
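A minimal sketch of the regression-validation idea described above: rerun a fixed workload and check that the results are consistent with those saved at the previous iteration. The workload query, file name, and JSON format are assumptions of this example.

```python
# Illustrative regression check against a baseline recorded at the previous iteration.
import json
import sqlite3
from pathlib import Path

BASELINE = Path("regression_baseline.json")

WORKLOAD = {
    "monthly_qty": "SELECT d.month, SUM(f.qty_sold) FROM fact_sale f "
                   "JOIN dim_date d ON f.date_key = d.date_key GROUP BY d.month",
}

def regression_check(mart: sqlite3.Connection) -> list[str]:
    """Return the names of the workload queries whose results changed."""
    current = {name: [list(row) for row in mart.execute(sql).fetchall()]
               for name, sql in WORKLOAD.items()}
    if not BASELINE.exists():                 # first run: record the baseline
        BASELINE.write_text(json.dumps(current))
        return []
    previous = json.loads(BASELINE.read_text())
    return [name for name, rows in previous.items() if current.get(name) != rows]
```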

A test plan
1. Create a test plan describing the tests that must be performed.
2. Prepare test cases, detailing the testing steps and their expected results, the reference databases, and a set of representative workloads (an illustrative record layout is sketched after the conclusions).
3. Execute the tests.

Conclusions and lessons learnt
- The chance of performing an effective test depends on the completeness and accuracy of the documentation, in terms of collected requirements and project description.
- The test phase is part of the data warehouse life-cycle; it should be planned and arranged at the beginning of the project.
- Testing is not a one-man activity. The testing team should include testers, developers, designers, database administrators, and end users, and it should be set up during the project planning phase.
- Testing of data warehouse systems is largely based on data: accurately preparing the right data sets is one of the most critical activities to be carried out during test planning.
- While testing must come to an end someday, data quality certification is an everlasting process. The borderline between testing and certification clearly depends on how precisely the requirements were stated and on the contract that regulates the project.
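Purely as an illustration of step 2 of the test plan, a test case could be captured as a small structured record. The fields below mirror the elements listed in that step (steps, expected result, reference database, representative workload), but the concrete layout and the example values are assumptions, not a template from the talk.

```python
# Illustrative test-case record for step 2 of the test plan (hypothetical layout).
from dataclasses import dataclass, field

@dataclass
class TestCase:
    name: str
    item_under_test: str                    # e.g. "ETL procedures", "front-end"
    test_type: str                          # e.g. "functional", "forced-error"
    steps: list[str]                        # the testing steps
    expected_result: str
    reference_database: str                 # which prepared data set to load
    workload: list[str] = field(default_factory=list)   # representative queries

example = TestCase(
    name="ETL forced-error: negative quantities",
    item_under_test="ETL procedures",
    test_type="forced-error",
    steps=["Load the faulty data set", "Run the sales loading procedure"],
    expected_result="Rows with negative quantities are rejected and logged",
    reference_database="faulty_sales_v1",
)
```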

Future work
We are currently supporting a professional design team engaged in a large data warehouse project, which will help us to:
- better focus on relevant issues such as test coverage and test documentation;
- understand the trade-off between the extra effort due to testing activities on the one hand, and the savings in post-deployment error correction and the gains in data and design quality on the other;
- validate the quantitative metrics proposed and identify proper thresholds.
We are also working on a revised version of our testing approach for cases where an evolutionary/iterative design methodology is adopted.