A Comprehensive Approach to Master Data Management Testing

Abstract

Testing plays an important role in the SDLC of any Software Product, and it is vital in Data Warehousing Projects because of the criticality of the data made available to end users. MDM data warehouse testing has not yet received substantial attention. It differs from generic Software Testing in that its focus is Data and Information, whereas generic Software Testing focuses on Program Code. In this paper I introduce the testing activities for a Data Warehouse built using the technology Master Data Management, commonly known as MDM, along with the What & How of those Testing Activities.
How is MDM Testing different from Generic Testing?

Data warehouse testing involves huge data volumes, unlike generic testing, and this significantly impacts performance and productivity. In Generic System Testing the testable combinations of scenarios are limited, whereas in MDM Data warehouse testing the valid scenarios are unlimited, so the system is not completely testable. Data Validation is one of the main goals of MDM Data warehouse testing because of the significance of the data delivered to end users. Unlike Generic Testing, MDM Data warehouse testing continues after the System Release: Regression Testing is an integral part of it, since it is very difficult to anticipate future requirements, and hence the errors that may be encountered in the live system.

Before getting into the What & How of Testing MDM, I will briefly explain what MDM is and why it is needed.

What is MDM?

MDM comprises a set of rules, criteria, procedures and tools that define and manage the data of the Organisation. MDM is used to analyse information across different source systems in an Organisation, resolve data discrepancies, and derive master data for end users. The resulting records are also referred to as Golden Records or True Records.

Why MDM?

- It provides a single source for consistent and accurate Master Data.
- It reduces overall data maintenance costs by preventing multiple processing in different systems.
- It ensures data consistency and accuracy, which reduces the error-processing costs caused by inconsistent master data.
- It can effectively manage Master Data within Companies that have heterogeneous system landscapes containing both SAP and non-SAP systems.
- It has automated or scheduled processes for data import, creation, update and distribution using Workflow.
- It offers rich Master Data content management for Catalog and Web publication (including PDFs/Images).
- It provides record/attribute and Field-Level role-based security.
- One MDM server can store multiple Master Data Repositories.
Business Purpose

MDM is used to build a Master Data hub to analyse information across different source systems, resolve data discrepancies and derive master data. It builds an integration framework so that the master data can be shared across the organisation. The final records generated can be referred to as Golden Records/True Records. In the solution described here, IBM Master Data Management Server is used to store client data, with IBM Initiate MDS performing identity resolution of records. Once duplicate parties have been identified in MDS, a soft link is created in MDM, and the client's survivorship rules are used to generate the virtual golden record on the fly.
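To make the soft-link idea concrete, here is a minimal sketch: suspected duplicates are detected on a normalized match key and linked by record id rather than physically merged. This is only an illustration; the real Initiate MDS uses probabilistic matching, and the field names and matching rule here are hypothetical.

```python
from collections import defaultdict

def match_key(record):
    """Normalize the attributes used for duplicate detection (hypothetical rule)."""
    return (record["name"].strip().lower(), record["dob"])

def soft_link(records):
    """Group record ids that share a match key; each group of two or more
    ids represents a soft link between suspected duplicate parties."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

clients = [
    {"id": 1, "name": "Jane Doe ", "dob": "1980-04-01"},
    {"id": 2, "name": "jane doe", "dob": "1980-04-01"},
    {"id": 3, "name": "John Roe", "dob": "1975-11-20"},
]
links = soft_link(clients)  # [[1, 2]]: records 1 and 2 are linked, not merged
```

Because the duplicates remain separate physical records, the golden record can be assembled "on the fly" from a linked group whenever it is requested.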
Testing Activities

Accurate advance Test Planning is one of the major keys to the success of a System, as the earlier an error is detected in the SDLC, the lower the cost of correcting it. From an Organisational point of view, several roles are involved in the Testing of a system:

- Analysts: responsible for the conceptual schema, which is used by the testers to understand user requirements.
- Designers: responsible for the logical schema of data repositories and data flows, which are tested for robustness and competence.
- Testers: responsible for developing and executing Test Plans and Test Cases.
- Developers: responsible for Unit Testing.
- Database Administrators: responsible for Stress and Performance Testing, and for setting up Test Environments.
- Users: responsible for performing functional testing on the GUI.

The testing activities are divided into two parts below: what is tested, and how it is tested.

What is tested?

Testing data quality is the core of MDM Testing. MDM Data warehouse projects mainly involve checking the correctness of the data loaded by ETL procedures and accessed by front-end tools. However, the complexity of MDM data warehouse projects means that testing the design quality is equally significant. The following items are tested:

- Conceptual Design: describes the facts, measures, and hierarchies of the DataMart from an implementation-independent point of view.
- Logical Design: describes the arrangement of the data repository at the core of the DataMart.
- ETL Procedures: the complex procedures used for feeding the DataMart from the sources.
- Database: the repository where the data is stored.
- Front End: the end-user applications used for analysing results and generating reports.

How it is tested?

The following tests are carried out in MDM data warehouse testing:

- Functional Test: verifies that the Business Requirements are fully and correctly met by the item.
- Usability Test: users interact with the item to verify that it is easily usable and understandable.
- Performance Test: checks the item's performance under typical workload conditions.
- Stress Test: checks how well the item performs under peak and heavy data loads.
- Recovery Test: checks how well the item recovers from crashes, hardware failures and other similar problems.
- Security Test: checks that the data is secure and the intended functionality is maintained.
- Regression Test: checks that the item still functions correctly after a change has occurred.
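A minimal sketch of the Regression Test idea applied to a warehouse: key query results are baselined once, and after any change the same queries are re-run and compared, so unintended drift is caught automatically. The query names and result rows here are illustrative stand-ins, not real project queries.

```python
def regression_check(baseline, current):
    """Return the names of checks whose current results differ from the
    recorded baseline (or that disappeared entirely after a change)."""
    return sorted(name for name, rows in baseline.items()
                  if current.get(name) != rows)

# Baseline captured before the change (hypothetical query results)
baseline = {
    "active_clients": [(12045,)],
    "clients_per_region": [("EAST", 7100), ("WEST", 4945)],
}
# Results of re-running the same queries after the change
current = {
    "active_clients": [(12045,)],
    "clients_per_region": [("EAST", 7090), ("WEST", 4945)],  # drifted
}
failures = regression_check(baseline, current)  # ["clients_per_region"]
```

In practice the baseline would be refreshed deliberately whenever a result change is intended, so only unexpected differences surface as failures.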
What Vs. How in Testing?

             |  Analysis & Design  |       Implementation
             | Conceptual  Logical |  ETL   Database  Frontend
Functional   |    Yes        Yes   |  Yes             Yes
Usability    |    Yes        Yes   |                  Yes
Performance  |               Yes   |  Yes     Yes     Yes
Stress       |                     |  Yes     Yes     Yes
Recovery     |                     |  Yes     Yes
Security     |                     |  Yes     Yes     Yes
Regression   |    Yes        Yes   |  Yes     Yes     Yes

Test Coverage

Testing can minimise the probability of a system fault but cannot remove it completely. Measuring the coverage of tests is required to assess overall system consistency. The first thing needed to measure test coverage is an appropriate definition of the Coverage Criteria. Different Coverage Criteria, such as statement coverage, decision coverage, and path coverage, are well established in the scope of code testing. The choice of one criterion or another deeply affects the test length and cost, as well as the achievable coverage, so the coverage criteria are chosen by trading off test effectiveness against efficiency. Examples of coverage criteria that we propose for some of the testing activities described above are given below; the expected coverage is expressed with reference to the coverage criterion.

Coverage Criteria for Testing Activities

- Fact Test. Criterion: all information needs expressed must be tested. Measurement: percentage of queries in the workload supported by the conceptual schema. Expected coverage: Partial.
- Conformity Test. Criterion: all data mart dimensions must be tested. Measurement: bus matrix sparseness. Expected coverage: Total.
- Conceptual Schema Test. Criterion: all facts, dimensions and measures must be tested. Measurement: conceptual metrics. Expected coverage: Total.
- ETL Unit Test. Criterion: all decision points must be tested. Measurement: correct loading of the test data sets. Expected coverage: Total.
- ETL Forced-Error Test. Criterion: all errors specified by users must be tested. Measurement: correct loading of the faulty data sets. Expected coverage: Total.
- Frontend Unit Test. Criterion: a minimum of one grouping set must be tested for each attribute. Measurement: correct analysis of a real data set. Expected coverage: Total.

Timeline for Testing

From an organisational point of view, the three main phases of testing are:
- Creating a Test Strategy: the Test Strategy describes the tests that must be executed and their expected impact on System Requirements.
- Preparing Test Scripts: Test Scripts enable the execution of the test strategy by detailing the testing steps together with their expected results. The reference databases for testing should be prepared during this phase, and a comprehensive set of workloads should be well defined.
- Executing Test Scripts: a test execution log tracks each test along with its results.

Below is a case study of a Health Care Project: the MDM Client Intelligence Program.

Situation

The Company's Client Information is stored in multiple proprietary source applications. As the same Client Information is stored across different source systems, this leads to the following issues:

1. Inconsistent and inaccurate data.
2. High maintenance costs due to multiple processing.
3. High costs due to inconsistent data.
4. Duplication of data.
5. Data discrepancy issues.

Information Source

Project MDM Client Intelligence Program (CIP).

Solution Implemented

The MDM Server solution for CIP consists of the integration of IBM Initiate MDS and IBM MDM Server with the hub connector. The solution provides organisation-wide master data that has enabled the company to have a single trusted view of all clients, and to share that single trusted view across multiple applications and systems. It is a foundation for building a person/contract/product master data hub in the future.

Approach

The Initial Load Data from the source systems to the Client Intelligence Program MDM Server is loaded using the Rails Process, with Data Stage moving data from the different source Applications to the SDS, CCD and MDM Tables. The required format conversion from the source system model to the SDS, CCD and MDM Server table formats is done by the ETL Team. This process also applies Address Cleansing, Business Rules and Survivorship Rules before the data is loaded into the Temp Tables in MDM.
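A check on one step of such a load can be sketched as follows: verify that address cleansing actually normalized each source record before it landed in the target tables. The cleansing rules and field names below are hypothetical simplifications, not the project's actual Data Stage jobs.

```python
import re

def cleanse_address(raw):
    """Apply simple, illustrative normalization rules to a raw address:
    collapse whitespace, uppercase, and standardize street suffixes."""
    addr = re.sub(r"\s+", " ", raw.strip()).upper()
    return addr.replace("STREET", "ST").replace("AVENUE", "AVE")

def check_cleansing(source_rows, target_rows):
    """Recompute the expected cleansed form of each source address and
    return the ids of records that were loaded without cleansing."""
    expected = {rid: cleanse_address(a) for rid, a in source_rows.items()}
    return sorted(rid for rid, a in target_rows.items() if a != expected[rid])

source_rows = {1: " 12  main   street ", 2: "5 Oak Avenue"}
target_rows = {1: "12 MAIN ST", 2: "5 Oak Avenue"}  # record 2 loaded raw
bad_ids = check_cleansing(source_rows, target_rows)  # [2]
```

The same recompute-and-compare pattern extends to the business and survivorship rules: re-derive the expected target value from the source and flag any record where the loaded value disagrees.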
All the composite transactions are used for the initial load, and the integrity of the initially loaded data is verified before any attempt at the delta load. All the sources included in MDM CIP are assigned a ranking for the different attributes used to store Client Information. The customised survivorship rules work on the basis of the source rank and the last update date. The last update date is provided only by the MDM Server, and since it is not meaningful during the initial load, the order in which the source systems are loaded is used instead.
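The survivorship rule just described can be sketched as follows: for each attribute, the value from the highest-ranked source wins; among equally ranked sources, the last update date breaks ties on delta loads, and the source load order breaks ties on the initial load. The ranks, dates and values are illustrative, not the project's real configuration.

```python
def survive(values, initial_load=False):
    """Pick the surviving attribute value. Each candidate is a dict with
    'value', 'source_rank' (1 = highest), 'last_updated' (ISO date string),
    and 'load_order' (position of the source in the initial load)."""
    best_rank = min(v["source_rank"] for v in values)
    candidates = [v for v in values if v["source_rank"] == best_rank]
    if initial_load:
        # last_updated is meaningless on the initial load, so the source
        # load order breaks ties instead (earlier-loaded source wins)
        return min(candidates, key=lambda v: v["load_order"])["value"]
    # on delta loads, the most recently updated equally ranked value wins
    return max(candidates, key=lambda v: v["last_updated"])["value"]

email_values = [
    {"value": "jane.d@old.example", "source_rank": 2,
     "last_updated": "2021-06-01", "load_order": 1},
    {"value": "jane@new.example", "source_rank": 1,
     "last_updated": "2020-01-15", "load_order": 2},
]
winner = survive(email_values)  # rank 1 beats a newer update from rank 2
```

Survivorship Rules Testing then amounts to feeding such candidate sets through the rule and comparing the result with the value actually stored in the golden record.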
Testing Scope

The following types of tests are performed in the MDM CIP Project:

1. Create/Update Testing: new records are created and existing records are updated to verify that the records are correctly loaded and reflected from the Source to Physical MDM, taking into account the different rules such as Address Cleansing, Business Rules and Survivorship Rules.
2. Survivorship Rules Testing: the rules are tested to show that they have been applied correctly to the records available in the MDM Temp Tables and in Physical MDM.
3. Link/Unlink Testing: the Survivorship Rules are tested when records are linked/unlinked.
4. MDS Initiate UI Testing: the Virtual MDM UI Application is tested to verify the UI and to check that a user can add/update records successfully.
5. Data Stewardship UI Testing: the Physical MDM is tested to verify that the records are available in Physical MDM with all the rules applied to them.
6. Data Mapping Testing: data is verified in the different Databases (SDS, CCD and MDM) to check that it has been loaded correctly.
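Data Mapping Testing across the layered tables can be sketched as a reconciliation check: every source record id should survive into each downstream layer. The table contents below are illustrative; a real check would issue count and key queries against each of the SDS, CCD and MDM databases.

```python
def reconcile(source_ids, layers):
    """Compare the source record ids against each downstream layer.
    layers maps a layer name to the set of record ids found there;
    the result maps each deficient layer to its missing ids."""
    src = set(source_ids)
    return {name: sorted(src - ids) for name, ids in layers.items() if src - ids}

source = [1, 2, 3, 4]
layers = {
    "SDS": {1, 2, 3, 4},
    "CCD": {1, 2, 3, 4},
    "MDM": {1, 2, 4},   # record 3 was dropped somewhere in the load
}
gaps = reconcile(source, layers)  # {"MDM": [3]}
```

Records legitimately merged by survivorship would need to be excluded from the expected set before the comparison, otherwise they would show up as false positives.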