Test Data Management: A Process Framework

Test Data Management, a service that caters to the various data needs of application development, enhancement, maintenance and testing, plays a vital role in the IT system of any organization. An increasing number of organizations are requesting Test Data Management as a managed, centralized IT service from their vendors, mainly with the objectives of realizing cost benefits, reduced time-to-market and improved quality of the end product. However, Test Data Management is not as simple as it seems at first glance and comes with its own set of challenges. The intent of this paper is to discuss the challenges and typical practices in providing Test Data Management services, and to share a process solution that addresses those challenges.
About the Authors

Dinesh Nittur
Dinesh Nittur has 11 years of experience in the IT industry. He holds a Bachelor's degree in Mechanical Engineering from Bangalore University. He has worked in various domains such as insurance, banking and capital markets, with a major focus on QA and testing.

Tithiparna Sengupta
Tithiparna Sengupta is a Test Strategy Consultant at Tata Consultancy Services Ltd. She has 9 years of experience in the IT industry, with 6+ years in different areas of software testing. She has worked on testing-related engagements for TCS clients in the Financial Services domain. She holds a Bachelor's degree in Chemical Engineering from Jadavpur University, Kolkata, and a postgraduate degree in the same discipline from the Indian Institute of Technology, Kanpur.
Table of Contents
1. Introduction
2. Challenges
3. Solution
4. Typical Practices
5. On-the-job Observations
6. Process Framework
7. Benefits
8. References/Future Study
9. Acknowledgments
Introduction

In today's competitive market, organizations change their business trends and strategies frequently to sustain growth. These changes result in the development of new IT applications or changes to existing ones, which in turn create various data needs in sub-production environments to facilitate development, enhancement and maintenance of the IT applications and, of course, testing. Test Data Management thus plays a vital role in the IT system of any organization. Ineffective test data management may lead to:

- Inadequate testing and thus poor quality of the product
- Increased time-to-market
- Increased costs from redundant operations and rework
- Non-compliance with regulatory norms on data confidentiality

In view of the above, an increasing number of organizations are requesting Test Data Management as a managed, centralized IT service from their vendors. The major benefits that any organization seeks from such a service are:

- Reduced cost of test data management
- Improved data quality, which contributes to improved testing and thus improved quality of the end product
- Timely data delivery, which contributes to faster test execution and thus reduced time-to-market
- Strict adherence to regulatory norms on data confidentiality (more than a benefit, this is a must-have)
- Freeing up the bandwidth of developers and testers for their core work, that is, code development and testing

A well-defined process framework, along with effective use of tools for test data creation and masking, can enable fast realization of the above benefits.
Challenges

At first glance, Test Data Management may seem simple, but closer involvement reveals that it is quite a tricky job that comes with the following challenges:

- Complexity of data requirements: each project usually involves multiple application teams requiring data to be synchronized between applications; each application is simultaneously involved in multiple projects, resulting in contention for environments, and so on (Ref: Figure 1).
- Analysis of data requirements: due to lack of information on the existing data in most organizations, the production and sub-production environments are not profiled, and thus gap analysis between the 'as-is' and the 'to-be' can prove to be a major challenge.
- Scope creep: there are often frequent changes in data requirements due to changes in business requirements, and handling these scope creeps, particularly the late ones, can be a major challenge.
- Sudden and immediate requests for test data during test execution: catering to these requests requires a lot of agility, since the allowed turnaround time is very short.
- Adherence to regulatory compliance such as data confidentiality: data in the test environments cannot be a direct copy of production; confidential information must be masked before loading data. Despite the use of tools for this purpose, data masking still adds some time to the overall cycle time for servicing any request that involves an environment refresh with production data.
- Assurance of data safety or security: in the absence of a well-defined policy on test data storage with version control, access security and back-up mechanisms, there is always a risk of crisis situations resulting from unanticipated loss of a database.
- Reliance on production data for loading to the test environment: this is always a challenge due to the huge volume of production data and the chance of disruption to production systems from repeated data requests.
- Coordination: the test data management team has to coordinate with application teams, the infrastructure team, Database Administrators (DBAs) and so on. Coordination with multiple stakeholders can at times be quite a challenging task.
- Lack of a proper process framework to manage the activities related to Test Data Management.
- Ensuring proper data distribution so as to prevent:
  - Data contention between multiple projects
  - Redundant or unused data in any region
- Ensuring data reuse.
- Managing the impact of a data refresh on ongoing projects.
- Identification of the right region that caters to the needs of all the applications within a project.

Figure 1 is a diagrammatic representation of the complexity of data requests spanning multiple projects, applications and environments: a development environment and several test environments, each hosting multiple applications, with each application serving multiple projects.

Figure 1: Complexity of Test Data Requests
Solution

Typically, Test Data Management services can be segregated into the following four categories:

1. Initial test data set-up and/or synchronization of test data across applications. This is a one-time job that is executed by the Test Data Management team right after it is established.
2. Servicing data requirements for projects. Projects fall into two categories:
   a. Development of a new application, which may require test data creation from scratch
   b. Enhancement or maintenance of existing applications
3. Regular maintenance or support, servicing:
   a. Simple data requests
   b. Change Requests (CRs), that is, changes in data requirements
   c. Problem Reports (PRs), that is, problems reported in data delivered
4. Perfective maintenance: scheduled maintenance of test beds at an annual frequency

Here we discuss a process framework that takes care of all the services listed above. The process framework is derived from the typical practices that are followed for test data management and some on-the-job observations. Through this process framework, we have tried to address all the challenges mentioned above.

Typical Practices

These are the typical best practices followed in Test Data Management:

- As part of initial data set-up and/or data synchronization across applications, a data profiling exercise is carried out for all the environments (production and sub-production). Data profiling allows us to understand what is in our production data as well as what currently exists in our test data; it is the process of collecting information and documenting the characteristics of the data in terms of data source, data attributes, relationships, values, dependencies and domains. After the first exercise, data profiling is typically repeated as part of scheduled perfective maintenance.
- Test data requests are captured in a standardized test data request form so as to avoid gaps in the requirements provided.
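The data profiling exercise described above can be sketched in a few lines of code. The sketch below is purely illustrative (the table and column names are hypothetical, and real profiling tools capture far more, such as cross-table relationships and dependencies); it records, per column, the basic characteristics a Data Profile document would hold: null count, distinct-value count and observed value domain.

```python
# Minimal data-profiling sketch using an in-memory SQLite store
# standing in for one data store of a test or production region.
import sqlite3

def profile_table(conn, table):
    """Return {column: characteristics} for one table."""
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
    profile = {}
    for col in cols:
        total, non_null, distinct, lo, hi = conn.execute(
            f"SELECT COUNT(*), COUNT({col}), COUNT(DISTINCT {col}), "
            f"MIN({col}), MAX({col}) FROM {table}"
        ).fetchone()
        profile[col] = {
            "nulls": total - non_null,   # missing values
            "distinct": distinct,        # cardinality
            "domain": (lo, hi),          # observed value range
        }
    return profile

# Hypothetical sample data standing in for a production extract.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_no INTEGER, status TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1001, "OPEN"), (1002, "CLOSED"), (1003, None)])
p = profile_table(conn, "accounts")
```

Comparing such profiles taken from production and from a test region is what makes the 'as-is' versus 'to-be' gap analysis mentioned earlier concrete.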
- The most current data profiles are used to analyze the gaps between the current environments and the data requirements. All data requests are analyzed for gaps with current environments and for impact on other projects.
- Ideally, project teams should finalize their test data requirements and put in their data requests at the end of the Design phase of the SDLC itself.
- The Test Data Manager (TDM) holds calls, meetings and formal reviews of documents to ensure alignment of all teams on test data requirements and the data set-up strategy. Both the test data requirements and the data set-up strategy are signed off by key stakeholders, which include the project teams.
- A data refresh should ideally be the team's last option, resorted to only after exploring possibilities such as:
  - Use of an existing test bed
  - Sharing of a test bed with another project (this minimizes redundant and unused data in any test bed)
  - Creation of partial (only the missing) data in the test bed, and so on
- If going for a data refresh, instead of refreshing with production data or data slices, test data management teams often first explore the possibility of refreshing with a data dump from other test regions. All applications may or may not be able to access all test regions within the test environment. Based on data needs from projects, test data are pulled from different test regions, synchronized and then loaded to the target test region. This practice ensures minimum disruption to production systems from repeated data requests.
- For loading production data or data slices to the test environment, tools are used to first mask the data prior to loading. Data are masked both in database tables and in files.
- For creating data, it is common to use tools like shell scripts, SQL procedures and so on.
- Test data management teams should ideally maintain a Data Distribution Log. This helps to prevent data contention issues and also to quickly identify the data that is available for distribution from one project to another.
- Data version control and data cataloging: after data set-up, the test data management team should store away copies/back-ups of the data (both the files and the databases), assign a version number to the data and then make an entry of the version number along with the data details in the Data Catalog. This allows restoration of an environment to its original condition with minimum effort, as and when required.
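One desirable property of the masking step described above is referential consistency: the same real value should always map to the same masked value, so that joins between tables still work after masking. The sketch below is an illustrative, simplified take on this idea (the field names are hypothetical, and real masking tools such as the ones referenced later in this paper do much more, e.g. format-preserving masking); it uses a deterministic hash-derived surrogate.

```python
# Illustrative masking sketch: confidential columns are replaced with
# deterministic surrogates so the same real value always maps to the
# same masked value, while real values never reach the test region.
import hashlib

def mask_value(value, field):
    """Deterministic surrogate for one confidential value."""
    digest = hashlib.sha256(f"{field}:{value}".encode()).hexdigest()
    return f"{field.upper()}_{digest[:8]}"

def mask_rows(rows, confidential_fields):
    masked = []
    for row in rows:
        clean = dict(row)  # copy; never mutate the source extract
        for field in confidential_fields:
            if clean.get(field) is not None:
                clean[field] = mask_value(clean[field], field)
        masked.append(clean)
    return masked

# Hypothetical production rows; acct_no is a key and is left unmasked.
rows = [{"acct_no": 1001, "name": "John Doe", "ssn": "123-45-6789"},
        {"acct_no": 1002, "name": "John Doe", "ssn": "987-65-4321"}]
out = mask_rows(rows, ["name", "ssn"])
```

Because the surrogate depends only on the field and the original value, two occurrences of the same customer name mask to the same surrogate, which keeps cross-table and cross-file references intact.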
- The test data management team should implement a well-documented set of data security policies (data version control, back-up, storage and access policies).
- Regular maintenance of test beds should be carried out on an annual (or other defined) frequency so as to preserve the synchronization between the data of the different applications in the same region, and to facilitate reusability of data that have once been set up.
- The test data management team should build standardized templates for project or application teams to raise data requests as well as CRs and PRs, and also standardized templates for all their deliverables as well as internal-use documents, for example, Data Requirements, Data Profile, Analysis Report, Data Strategy, Data Distribution Log and Data Catalog.
- In many organizations, the Data Strategy is made a part of the Master Test Plan. In such cases, the Master Test Plan template includes a very detailed section called the Test Data Plan, with sub-sections on the strategy for test data set-up and the plan for the various test data set-up activities (list of activities, timelines, estimated efforts and so on), risks and a communication plan. The Test Data Management team, based on the data requirements and subsequent analysis of the same, prepares the Data Plan, which in turn is incorporated into the Master Test Plan and then signed off. This also helps the Test Manager to review and monitor the test data management activities.
- While loading production data to the test environment, instead of using the whole data dump, a data sampling technique can be used to derive a production slice. Sampling allows extraction of a small representative set of data from production for use in testing. While adoption of the data sampling practice calls for a certain level of technical maturity from the Test Data Management team, it enables teams to work with much smaller volumes of production data without compromising on data requirements coverage.
- Instead of reverting the environment to the previous version of the data, tools can be used to restore used data to its original unused condition. In many organizations, existing GUI-based functional automation scripts are used for this purpose. For instance, accounts that have been closed during test execution can be reopened from the GUI through test automation scripts, to make the account numbers reusable.

On-the-job Observations

These are the key on-the-job observations of the practitioners:

- The key to completing the actual data set-up process is effective coordination with all the teams involved and prior communication of all activities.
- Lack of proper connectivity with test regions often hinders test data set-up and/or subsequent validation of the set-up. Connectivity with the environment should be verified early on, before proceeding with the actual data set-up.
- Almost every test region has data redundancy or unused data, and distributing unused data from one project to another is a very quick and effective way of handling sudden data requests.
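The data sampling practice mentioned earlier, deriving a small but representative production slice, can be sketched as a stratified sample: take a capped number of rows from every distinct value of a key attribute, so that even rare data scenarios in production survive into the slice. The code below is a simplified illustration only (the attribute names and volumes are hypothetical); real slicing must also follow referential relationships across tables.

```python
# Stratified-sampling sketch for deriving a production slice:
# every stratum (distinct value of the stratify key) contributes rows,
# so the small slice still covers each data scenario in production.
import itertools
import random

def sample_slice(rows, stratify_key, per_stratum=2, seed=42):
    rng = random.Random(seed)  # fixed seed makes the slice repeatable
    rows = sorted(rows, key=lambda r: r[stratify_key])
    slice_ = []
    for _, group in itertools.groupby(rows, key=lambda r: r[stratify_key]):
        group = list(group)
        slice_.extend(rng.sample(group, min(per_stratum, len(group))))
    return slice_

# Hypothetical production extract: FROZEN is a rare scenario that a
# naive random sample of 6 rows would very likely miss entirely.
production = [{"id": i, "status": s} for i, s in
              enumerate(["OPEN"] * 50 + ["CLOSED"] * 30 + ["FROZEN"] * 2)]
slice_ = sample_slice(production, "status")
```

Here the 82-row extract shrinks to 6 rows while still containing open, closed and frozen accounts, which is the coverage guarantee that plain random sampling cannot give for rare strata.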
Process Framework

We have segregated the Test Data Management practices into various project phases (Planning-cum-Kick-off, Analysis, Design, Build and Maintenance) and defined an ETVX (Entry, Task, Validation, Exit) model for each of the phases, as described below.

Phase: Planning-cum-Kick-off

Entry Criteria:
- Test Data Manager (TDM) identified for test data management services

Tasks:
- Acquire an initial understanding of the test data landscape in the organization through meetings, questionnaires and so on.
- Build templates for:
  1. Data Request form
  2. Data Profile
  3. Data Requirements
  4. Analysis Report
  5. Data Strategy Document or Data Plan (can be made a subset of the Master Test Plan)
  6. Data Distribution Log (log of request, fulfillment details, region, characteristics of data provided and so on)
  7. Data Catalog
- Identify SPOCs from each application team, DBAs and so on. Conduct meetings with the SPOCs.
- Prepare a list of ongoing projects with start and end dates, applications involved, test regions being used by each project or application, and so on.
- Prepare the Test Data Landscape: a single-stop document containing information like the list of test regions, applications, types of data stores (files or database), typical frequency of data requests for each application, and so on.
- Establish SLAs for delivering test data management services.
- Set up a team for Test Data Management (can be a virtual team comprising members from different application teams and DBAs).

Validation:
- Review Test Data Landscape
- Review templates
- Review list of ongoing projects

Exit Criteria:
- Signed-off templates
- Signed-off Test Data Landscape

Work Items: Questionnaire, Test Data Landscape, Templates

Phase: Analysis

Entry Criteria:
- Templates are available
- Signed-off Test Data Landscape is available

Tasks:
- Carry out a data profiling exercise for each of the individual data stores across the enterprise, for ALL regions, including Production (optional; applicable only if it is an initial data set-up or synchronization exercise).
- Assign version numbers to the existing data in all environments and enter them in the Data Catalog (optional; applicable only for initial data set-up or synchronization; otherwise, this activity is done only during the Build or Maintenance phase, after setting up the test data and validating its correctness).
- Collect test data requirements (for specific projects, or for initial data set-up and synchronization across applications) through calls, meetings, data request forms and so on. Consolidate the requirements provided in the Data Request forms into the Data Requirements document.
- Update the list of ongoing projects.
- Analyze the data requirements, the latest Data Distribution Log (if present) and the existing test bed to identify:
  1. Gaps between requirements and current test beds
  2. Gaps between requirements and current data profiles (Is the data requirement similar to the production data profile? Have some very typical data scenarios in production been missed out? Is the data requirement similar to the data profile of any test environment?)
  3. Impact of data modification on ongoing projects
- Prepare the Analysis Report.
- Define data security policies: version control, back-up, storage and access policies (one-time activity).

Validation:
- Review Data Profile (optional; initial data set-up or synchronization only)
- Review Data Catalog (optional; initial data set-up or synchronization only)
- Review data requirements (through calls, meetings and formal review of the requirements document)
- Review updated list of projects
- Review Analysis Report
- Review Data Security Policy (one-time)

Exit Criteria:
- Reviewed Data Profile documents (optional; initial data set-up or synchronization only)
- Reviewed Data Catalog (optional; initial data set-up or synchronization only)
- Signed-off test data requirements
- Reviewed Analysis Report
- Signed-off Data Security Policy (one-time only)

Work Items: Data Request form(s), Data Requirements, Data Profile (optional), Data Catalog (optional), Analysis Report, Data Distribution Log, Data Security Policy

Phase: Design

Entry Criteria:
- Signed-off test data requirements are available
- Signed-off Analysis Report is available

Tasks:
- Decide the strategy for data preparation:
  - Identify the test region(s) where data need to be loaded or refreshed
  - Identify the methods to be used, for example:
    - Obtain partial data (only the missing data) from another region
    - Restore used data to its original unused state
    - Refresh with data dumps (production slice or other regions)
    - Refresh with old back-ups
    - Mask data
    - Create new data
    - Distribute unused data from other projects
  - Identify data sources and providers
  - Identify tools for data extraction, masking, creation, loading and so on
  - Identify the data distribution plan (e.g., tagging account numbers to projects)
  - Identify the coordination and communication plan
- Plan the test data design activities: list of activities, timelines, efforts, risks, communication plan and so on.
- Create the Data Strategy Document or Data Plan (containing both the strategy and the planning details).

Validation:
- Review and align the strategy through calls and meetings
- Review Test Data Plan

Exit Criteria:
- Signed-off Data Strategy Document or Data Plan

Work Items: Data Requirements, Data Distribution Log, Analysis Report, Data Strategy or Data Plan

Phase: Build

Entry Criteria:
- Signed-off test data plan is available
- Environment for data set-up is available, and connectivity with it is established and validated
- Identified tools are available

Tasks:
- Execute the activities identified in the test data plan, for example:
  - Take a back-up of data from the source region (Production or other sub-production regions)
  - Carry out masking (optional)
  - Load the data dump (masked or unmasked) to the target region
  - Create data (manually or using tools) in the target region
- Communicate data readiness, along with data distribution details, and request validation of environment and data readiness.
- Take back-ups (for future reuse) of the new data (both databases and files) once the data is set up. Assign a version number to the back-up and catalog it with a proper description.
- Update the Data Distribution Log.
- Update the Data Profile, if required. (Especially applicable for initial data set-up or synchronization, wherein at the end of the Build phase the gaps between production and test environments are closed, thus updating the data profile of the test environments.)

Validation:
- Validate data set-up correctness and readiness
- Review updated Data Distribution Log
- Review data version and updated Data Catalog
- Review Data Profile (optional; applicable in specific cases only)

Exit Criteria:
- Signed-off test environment
- Reviewed updated Data Distribution Log
- Reviewed updated Data Catalog
- Reviewed updated Data Profile

Work Items: Data Plan, Data Catalog, Data Distribution Log, Data Profile (optional; applicable in specific cases only)

Phase: Maintenance

Entry Criteria:
- Simple data requests, late changes in data requirements (to be captured through a CR), sudden or unplanned data requirements during test execution, or scheduled perfective maintenance

Tasks (support of change requests, unplanned data needs, and problem reports or incidents on data delivered):
- Create or update the Data Requirements.
- Assign a priority to the request, in case of multiple requests.
- Analyze the requirements to evaluate:
  - Whether the data requirements can be furnished from the existing test bed by data reuse, modification and other means
  - Whether the required data can be shared from another project with the same data characteristics
- Perform the necessary steps, for example:
  - Identify data that may be reused or shared
  - Modify data
- Communicate data readiness, along with data distribution details, and request validation of environment and data readiness.
- Take a back-up (for future reuse) of the new data once the data is fixed or set up. Assign a version number to the back-up and catalog it with a proper description.
- Update the Data Distribution Log.
- If necessary (that is, where quick request resolution is deemed unfeasible), redirect the request to be serviced through the elaborate Analyze -> Design -> Build phases.

Tasks (scheduled perfective maintenance):
- Review the status of ongoing projects and decide, upon analysis, the schedule for maintenance. Communicate the schedule of maintenance to all project or application teams.
- Carry out a data profiling exercise for each of the individual data stores across the enterprise, for ALL regions, including Production.
- Assess gaps in the test bed and close them.
- If required, refresh the test region with an old back-up of the test region so as to restore it to its original state, where all data are usable as-is and data for all applications are in sync.
- Update the Data Distribution Log with details of the refresh (region(s) refreshed, date of refresh, date of the dump with which the environment has been refreshed, and so on).

Validation:
- Validate data set-up correctness and readiness
- Review updated Data Distribution Log
- Review updated Data Profile (applicable only for perfective maintenance)
- Review data version and Data Catalog

Exit Criteria:
- Signed-off test environment
- Updated Data Distribution Log
- Reviewed Data Catalog
- Reviewed updated Data Profile document (perfective maintenance only)

Work Items: Data Request form, Data Requirements, Data Distribution Log, Data Profile (perfective maintenance only)
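The Data Catalog and Data Distribution Log bookkeeping that recurs through the Build and Maintenance phases can be illustrated with a toy sketch. Everything below is a simplified assumption of how such records might be structured (the class and field names are hypothetical, not a prescribed schema): every back-up receives the next version number and a catalog entry, and every hand-over of data is logged against a project so that contention can be checked before distributing data.

```python
# Toy sketch of Data Catalog (versioned back-ups) and Data Distribution
# Log (who holds what data, where) record-keeping.
import datetime

class DataCatalog:
    def __init__(self):
        self.entries = []

    def register(self, region, description):
        """Assign the next version number to a back-up and catalog it."""
        version = len(self.entries) + 1
        self.entries.append({
            "version": version,
            "region": region,
            "description": description,
            "date": datetime.date.today().isoformat(),
        })
        return version

class DistributionLog:
    def __init__(self):
        self.records = []

    def record(self, project, region, characteristics):
        """Log one fulfillment: data handed to a project in a region."""
        self.records.append({"project": project, "region": region,
                             "characteristics": characteristics})

    def holders_of(self, region):
        """Which projects hold data in a region (contention check)."""
        return [r["project"] for r in self.records if r["region"] == region]

catalog = DataCatalog()
v = catalog.register("Test Env 1", "Post-build back-up, applications in sync")
log = DistributionLog()
log.record("Prj 1", "Test Env 1", "open accounts 1001-1100")
```

A quick `holders_of` lookup before servicing a new request is exactly the check that prevents two projects from contending for the same data, and the version number returned by `register` is what the Maintenance phase uses to restore a region to a known state.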
Benefits

The deployment of the process framework described above has been seen to yield the following benefits:

- Reduction in cost to the organization by as much as 30-40%, from:
  - Reduced data set-up effort due to streamlined processes
  - Reduced rework due to first-time-correct deliveries
  - Increased use of tools for data creation
  - Improved data reuse (approximately 80% data reuse) using previously backed-up data versions
  - Reduced data redundancy through environment sharing between projects
  - Reduced number of environment refreshes due to exploration of other options like data sharing, creating only the missing data, and so on
  - Reduced disruption to production services due to fewer requests for production data dumps
- Reduced time-to-market; at the least, no delays due to delay in test data delivery
- Improved data quality, owing to thorough analysis of data requirements, and thus improved testing
- Increased data security and recoverability from well-defined data security policies

References/Future Study

[1] Discussions with practitioners across TCS, that is, Test Data Managers and members of multiple teams that provide Test Data Management services.
[2] Masketeer™, the TCS tool for test data masking: http://www.tcs-trddc.com/tecs'09/masketeer.pdf
[3] Testify, the TCS tool for test data creation.

Acknowledgments

We acknowledge all the people in TCS, Vinita M and Chaitra Puttaswamy in particular, who helped us write this white paper by providing valuable information and suggestions.
About BFS Industry Practice
We have enviably strong BFS industry expertise and experience, gained through the numerous mission-critical projects that have been successfully delivered to our eminent BFS clients across the globe. This is strengthened by the BFS Industry Practice, which has professionals who have served Global, Regional and National Financial Institutions in various lines of BFS business and operations. Our global focus, deep industry knowledge and commitment to understanding and satisfying client needs have been critical to our successes. The BFS Industry Practice is organised to deliver value to our clients across the multiple BFS Industry Solution Units, the Insurance ISU, New Growth Markets and Emerging Market IOUs.

About Tata Consultancy Services (TCS)
Tata Consultancy Services is an IT services, business solutions and outsourcing organization that delivers real results to global businesses, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled services delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has over 143,000 of the world's best-trained IT consultants in 42 countries. The company generated consolidated revenues of US $6 billion for the fiscal year ended 31 March 2009 and is listed on the National Stock Exchange and the Bombay Stock Exchange in India. For more information, visit us at www.tcs.com

Contact: bfs.marketing@tcs.com

Subscribe to TCS White Papers
TCS.com RSS: http://www.tcs.com/rss_feeds/pages/feed.aspx?f=w
Feedburner: http://feeds2.feedburner.com/tcswhitepapers

All content/information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content/information contained here is correct at the time of publishing.
No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content/information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Copyright © 2009 Tata Consultancy Services Limited
www.tcs.com