Test Data Management
The Best Practices in TDM

Abhik Kar, Independent Validation Solutions, Infosys Technologies Limited, Florida, USA
Debdatta Lahiri, Independent Validation Solutions, Infosys Technologies Limited, Chennai, India

Abstract

With increasing demand to fulfil customer expectations, industries are forced to deliver applications with more features and functionality. The integration of isolated and legacy applications has increased across industries, and applications now talk to a number of interfaces. The volume of data flowing across applications has grown enormously. Variation in data and interaction with multiple interfaces have made a number of scenarios highly dependent on test data. Effective TDM reduces testing cycle time and the risk of defect slippage, increases reusability, and brings consistency to test results. This paper presents our research results and shows effective ways of managing test data in a challenging environment. It describes the benefits of TDM best practices for testers and the other stakeholders involved. It is supported by a real-life case study showing that effective TDM can reduce testing effort by a significant margin.

Index Terms: Best Practices, Challenges, Cost Savings, Effort Savings, Test Data Management (TDM), Solution

INTRODUCTION

Over time, IT applications have become increasingly complex, voluminous, and difficult to maintain. In any testing project there is a very high volume of data flowing across the applications and modules that interface with each other. Test data plays a vital role in any project where an application is being tested. Managing such high volumes of data without putting considerable effort into it becomes a challenge.
Testers not only have to follow test plans, build test scenarios, and script tests; they also have to manage the test data effectively and efficiently. Effective Test Data Management (TDM) reduces rework effort considerably and makes the entire testing process more consistent, planned, and organized. The whole testing process becomes more reliable when the data is known to be managed and maintained correctly. Without proper TDM it is very difficult to achieve a high level of accuracy in any form of testing, and testing is a critical process in which nothing can be left to assumption and everything has to be validated with proper data.

Across the projects we have seen, tasks related to test data preparation and organization take up about 50-60% of the total effort. Reducing the effort spent on creating and generating test data is critical for any project. With proper TDM this effort can be channelled into other useful work in the project, such as execution and ad hoc testing.

In this paper we examine the process of effective test data management, which reduces the effort put into maintaining the data so that it can be used elsewhere in the project. We have also included a case study showing how we implemented effective TDM in a live project and how it reduced our effort and time by a large amount.
METHODOLOGY

- Understanding the challenges and constraints behind managing and maintaining high volumes of test data
- Analysis of the best practices that can be followed to ensure effective maintenance of test data
- Limitations of the solutions being suggested
- Recommendations on the future scope of improvement in Test Data Management

We take live examples from our projects to show how effectively Test Data Management is being implemented and the benefits we are getting from following such best practices.

TESTING LIFECYCLE AND ITS DEPENDENCE ON TEST DATA

Let us look at the stages of the testing lifecycle and which of them deal with TDM.

- Requirements Analysis
- Test Planning
- Test Development
- Test Execution
- Test Reporting
- Test Result Analysis
- Defect Retesting
- Regression Testing
- Test Closure

Of the above, the stages from Test Development to Test Closure all deal with TDM. In test development, if proper test data is not available, the entire process of test case creation, scripting, and scenario development becomes highly unreliable and ineffective. Rework effort goes up, leading to schedule creep, which in turn increases the cost to the client. The execution, reporting, result analysis, defect retesting, and regression testing stages also require accurate data to test the particular functionalities, specifications, and requirements. If that data is not available, the entire purpose is lost and the results become extremely unreliable and unrealistic.

- Always proceed on a step-by-step basis as far as testing is concerned.
- Always keep a backup of the test data generated.
- Document the entire process for future reference.
- Conduct internal audits of the test data maintenance process in the project.

SOLUTIONS

A. Selecting the Accurate Test Data with Less Effort is the Sole Objective of Test Data Management

To increase test efficiency it is essential to reduce test effort by using new processes, tools, or techniques. Test data management pays off when it selects the right data at the right time with less effort and without compromising quality. This in turn enhances test efficiency. Test data selection, and therefore Test Data Management, is the LEVER of the testing life cycle.

CHALLENGES FOR EFFECTIVE TDM

- Extremely high volumes of data are used for most applications.
- Different modules of the same application in the project may require different sets of data.
- In most scenarios many applications interface with each other, which increases their dependency on the same sets of data.
  This also increases the volume of data to be generated.
- Considerable effort is required to create data that has to be used across multiple releases and cycles.
- A centralized database is generally used for many applications, so maintenance becomes another issue.
- The defect injection ratio goes up due to manual errors.
- Faulty selection of test data.
- Unreliable test results due to improper TDM.
- Defects may be missed because proper test data is unavailable.
- Rework effort increases as a result of the lack of proper TDM.
- Cost to the client increases as rework effort increases.
- Dissatisfied customers/clients.

BEST PRACTICES FOLLOWED IN THE INDUSTRY FOR EFFECTIVE TDM

The first step is to thoroughly understand the requirements for which the data is to be generated.

Fig 1: Interrelationship between Test Scope, Test Data and Test Effort

A lever is a simple machine that moves a load by turning on a pivot, or fulcrum. The closer the fulcrum is to the load, the less force is needed to move a heavy load. The lever analogy also fits data-driven software testing. Our research across a number of data-driven testing projects has shown the following:

- Accurate test data selection is crucial.
- The closer the test data is to real production data, the less test effort is required.
- The closer the test data is to the actual test scope, the less test effort is needed.

1) Accurate Test Data Selection is Crucial: It is easy to say "Give us the right test data and we can test anything." But is it easy to select the right test data?

"Give me a place to stand, and I shall move the earth with a lever" - Archimedes

The accuracy of the test data is extremely important for successful testing. Selecting the right test data with minimum effort is therefore essential. In the lever analogy the force bar is the measure of the accuracy of the test data. More accurate test data gives the right position of the fulcrum,
which in turn determines the force, or test effort, required. We have included a case study from one of our data-driven testing projects to showcase the effectiveness of accurate test data selection.

2) The Closer the Test Data is to Real Production Data, the Less Test Effort is Required: Preparing test data for the test environment involves a lot of thought and analysis. The major activity here is analyzing the requirements and understanding the functionalities. The better we understand the functionalities and create test data that replicates production scenarios, the more effective the testing becomes. Our study has shown that test data close to real production data reduces the time taken in the test cycle and increases test efficiency.

3) The Closer the Test Data is to the Actual Test Scope, the Less Test Effort is Needed: The test scope includes all the business functionalities and rules that need to be tested. Preparing test data that ensures maximum coverage of the test scope requires complete tracking. The more completely the test data covers the testing scope, the more confidence grows and the fewer testing cycles are needed.

B. Automate the Process of Test Data Generation as Far as Possible to Reduce Human Errors

The first step to effective TDM is automating the process of test data creation by using a proper tool. The tool has to be simple and user friendly, and it should be easy to understand how to operate it. It must reduce the effort taken to create test data from scratch. It should also be reusable across releases and across different types of applications. In most projects huge volumes of test data need to be generated from the database. If the entire process is manual, the chance of missing important data or injecting defects becomes much higher.
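As a minimal sketch of points 3) and B above, assume a hypothetical test scope expressed as named dimensions. The `SCOPE` dictionary, its field names, and both helper functions are illustrative and are not taken from any project tool. Test records can then be generated automatically, one per combination of scope values, and checked for gaps in coverage:

```python
from itertools import product

# Hypothetical test scope: every value of every dimension must be exercised.
SCOPE = {
    "account_type": ["savings", "checking"],
    "risk_band": ["low", "medium", "high"],
}


def generate_records(scope):
    """Build one test record per combination of scope values."""
    keys = sorted(scope)
    combos = product(*(scope[k] for k in keys))
    return [dict(zip(keys, combo), record_id=i)
            for i, combo in enumerate(combos, start=1)]


def uncovered(scope, records):
    """Return the scope combinations that no record exercises."""
    keys = sorted(scope)
    wanted = set(product(*(scope[k] for k in keys)))
    seen = {tuple(r[k] for k in keys) for r in records}
    return sorted(wanted - seen)


records = generate_records(SCOPE)
missing = uncovered(SCOPE, records)
print(f"{len(records)} records generated, {len(missing)} combinations uncovered")
```

Because the records are derived from the scope definition rather than typed by hand, adding a new value to a dimension regenerates the data set, and `uncovered` reports any combination that a hand-edited data set has lost.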
To make the test data more reliable and consistent we should always try to use tools for its creation. This also brings down the time required to prepare the data, and testers can use that time for other productive activities such as test scripting and test case creation.

Example: During our engagement on a project with a leading US bank, we automated the entire process of test data selection with the help of a tool called CST (Customer Selection Tool). Test data selection involved selecting customers from a huge database provided to us by the client. The customers were selected based on their zip code preference; their zip code determined which US credit bureau held their credit history. When we automated this process of customer selection, the accuracy of the test data went up and the effort required to generate it went down considerably. Further details are provided in the case study.

C. Create Reusable Data

Test data should be created so that it can be reused over all the releases. This helps reduce effort as well. Enable the reuse of test data with the help of data warehousing within the project. Data should be retrievable as and when required, based on the environment to be tested. Testers should always try to create generic test data that can be applied to a number of applications and releases.

Example: During our project with a leading US bank we created test data that could be reused over three successive releases. This reduced the test data preparation time to almost half and also increased the consistency of the data.

D. Maintain Version Tracking

Maintain the versions of the test data prepared, for the purpose of future tracking.

E.
The Project Should Always Have a Proper Configuration Management Plan

An effective CM plan ensures that the test data is maintained properly and in accordance with the security and confidentiality agreements. It makes sure that data from all the releases is present in the repository for future reference and reuse. Not all data should be accessible to everyone in the project. Even where data is accessible, it should not be editable by those who have no use for it. This helps prevent unwanted tampering with the test data.

BENEFITS OF THE SOLUTION

The benefits of TDM include:

- Fewer defects are injected
- Considerable reduction in effort
- Improved defect detection efficiency due to correctness of data
- Better customer satisfaction due to dollar savings
- Reliable and efficient testing process
- Better quality
- Streamlined efforts
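The version tracking and configuration management practices in sections D and E could be sketched as follows. This is an illustrative in-memory repository, not a tool used in the project; the class name `DataSetRepository` and its methods are assumptions. Each committed data set is stored as a serialized snapshot identified by a content hash, and checkouts return copies so that stored versions cannot be edited by accident:

```python
import hashlib
import json


class DataSetRepository:
    """Hypothetical in-memory store of versioned test data sets."""

    def __init__(self):
        # name -> list of (sha256 digest, serialized snapshot)
        self._history = {}

    def commit(self, name, records):
        """Store a snapshot and return its 1-based version number.

        Committing content identical to the latest version does not
        create a new version.
        """
        payload = json.dumps(records, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        versions = self._history.setdefault(name, [])
        if not versions or versions[-1][0] != digest:
            versions.append((digest, payload))
        return len(versions)

    def checkout(self, name, version):
        """Return a fresh copy of the requested version."""
        _digest, payload = self._history[name][version - 1]
        return json.loads(payload)


repo = DataSetRepository()
release_1 = repo.commit("customers", [{"id": 1, "zip": "33101"}])
release_2 = repo.commit("customers", [{"id": 1, "zip": "33101"},
                                      {"id": 2, "zip": "90001"}])
print(release_1, release_2, len(repo.checkout("customers", release_1)))
```

Storing the serialized form rather than the live objects is the design choice that gives the read-only behaviour: a tester who modifies a checked-out copy changes only that copy, and the data used for an earlier release can always be retrieved unchanged.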
CASE STUDY

A. Customer Selection Tool

We were involved in a project with a leading US bank in which we had to prepare huge amounts of test data.

1) Problem Statement - Challenges while Preparing the Test Data Manually:

- A huge volume of data was provided by the client for the purpose of test data preparation.
- The data included customers of all three credit bureaus in the US (TransUnion, Experian, and Equifax).
- Many financial parameters were specified by the bank for generating the custom score for each individual customer in the database.
- A huge amount of time was taken to prepare the test data from the database manually.
- As the entire process was manual and involved many financial parameters, the scope for error also increased.
- Defects were injected while preparing the test data due to faulty test data selection.

2) Process of Manually Selecting the Test Data:

- A huge database of customers was provided by the client.
- We had to select the test data manually from that database.
- Customers were selected based on their zip code preference.
- Based on their zip code preference, their credit history would be present in the concerned US credit bureau.
- From the credit bureau details, a custom score is generated for each customer based on the different financial parameters given by the bank.
- From the custom score and the FICO score (present in the database), the customers are grouped into categories: Bad Customers -> High Risk -> Medium Risk -> Low Risk.

3) Solution: Due to all the above-mentioned challenges and constraints, we decided to automate the process of test data selection with the help of a tool called the CUSTOMER SELECTION TOOL (CST).

a) Features of the CST:

- Capable of selecting customers from the Credit Bureau File (the database provided by the client) based on the right zip code preference.
- Capable of generating the custom score for each customer based on the different financial parameters provided by the bank.
- Capable of grading the customers into the various risk categories based on criteria specified by the client.

b) Benefits of using the CST:

- Dynamic in customer selection and capable of handling large volumes of test data.
- Data independent and completely reusable.
- Eliminates a lot of manual effort in selecting the test data.
- Takes into consideration all the financial parameters specified by the bank.
- Provides the output in the desired form.
- Completely reusable across releases and across different projects of the client.

4) Average Time (in minutes) for Selecting Customer Data in Each Project:

Fig 2: Time to prepare test data traditionally vs. using CST

5) Savings Quantified:

Test data preparation time, measured per customer record:

- Traditional approach (manual): 12 minutes on average
- With CST: 2 minutes on average
- 83% of time and effort saved using CST: (12 - 2) / 12 = 0.83
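The paper does not give the internal rules of the CST, so the following is only a sketch of the three steps it describes: select customers by zip code, compute a custom score, and grade customers into risk categories. The zip-prefix-to-bureau mapping, the scoring formula, the thresholds, and all names are hypothetical placeholders; only the four risk categories (from the case study) and the standard 300-850 FICO range are real.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Customer:
    customer_id: str
    zip_code: str
    fico_score: int        # standard FICO range: 300-850
    annual_income: float   # hypothetical financial parameter
    total_debt: float      # hypothetical financial parameter


# Hypothetical mapping from first zip digit to credit bureau.
BUREAU_BY_ZIP_PREFIX = {"3": "Equifax", "7": "Experian", "9": "TransUnion"}


def bureau_for(zip_code: str) -> Optional[str]:
    """Return the bureau assumed to hold this zip code's credit history."""
    return BUREAU_BY_ZIP_PREFIX.get(zip_code[:1])


def custom_score(c: Customer) -> float:
    """Hypothetical 0-100 score: lower debt-to-income gives a higher score."""
    if c.annual_income <= 0:
        return 0.0
    ratio = min(c.total_debt / c.annual_income, 1.0)
    return 100.0 * (1.0 - ratio)


def risk_category(c: Customer) -> str:
    """Grade a customer from the custom score and FICO (hypothetical cut-offs)."""
    fico_scaled = 100.0 * (c.fico_score - 300) / 550
    combined = (custom_score(c) + fico_scaled) / 2
    if combined < 40:
        return "Bad Customer"
    if combined < 60:
        return "High Risk"
    if combined < 80:
        return "Medium Risk"
    return "Low Risk"


def select_customers(customers: List[Customer],
                     bureau: str) -> List[Tuple[str, str]]:
    """Select customers for one bureau and attach their risk category."""
    return [(c.customer_id, risk_category(c))
            for c in customers if bureau_for(c.zip_code) == bureau]


sample = [
    Customer("C1", "33101", 740, 80000.0, 20000.0),
    Customer("C2", "90001", 520, 30000.0, 27000.0),
]
for bureau in ("Equifax", "TransUnion"):
    print(bureau, select_customers(sample, bureau))
```

Keeping the bureau mapping, the score, and the grading as separate functions mirrors the three capabilities listed for the CST and lets the bank's actual parameters replace the placeholders without touching the selection logic.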
ACKNOWLEDGEMENT

We would like to thank Mrs. Yogita Sachdeva, Senior Project Manager - IVS, Infosys, for her help and encouragement during the drafting of this paper. Without her enthusiastic support we would not have been able to complete it, and we sincerely acknowledge her guidance. We would also like to thank Mrs. Alice Thankachan, Infosys, for the tremendous encouragement she gave us when we decided to write this paper. She helped us in every possible way in the course of writing it and gave us the necessary guidance. We are sincerely grateful to both of them.

REFERENCES

[1] http://www.solix.com/secure_clone.htm
[2] http://www.grid-tools.com/
[3] http://www.ibm.com/developerworks/rational/library/06/1107_davis/
[4] http://www.geekinterview.com

BIOGRAPHIES

Abhik Kar is a Process & Domain Consultant at Infosys Technologies Limited. He has led many large testing projects, especially in the banking sector. His major areas of interest include test process improvement and test management.

Debdatta Lahiri is a Process & Domain Consultant at Infosys Technologies Limited. She has been associated with testing projects, especially in the banking sector, working with leading banks in the industry. Her areas of interest mainly include social networking and communication.