Datamaker - the complete Test Data Management solution

Improve software application quality whilst reducing time-to-market

Executive Summary

Whether building and testing new applications, re-engineering systems or migrating applications to the Cloud, modern IT needs to be able to respond quickly to changing requirements by provisioning fit for purpose test data to the right place, at the right time, to accelerate and improve test cycles. Test data management (TDM) is the practice of applying a structured and centralised approach to the management of test data at an enterprise level, in order to reduce cost whilst increasing efficiency and quality in development and testing. It is, therefore, perhaps no surprise that more and more organisations are looking to implement test data management solutions that provide a systematic approach to creating and provisioning fit for purpose test data, on demand, throughout the SDLC. Datamaker is the complete, end-to-end test data management solution from Grid-Tools, providing the flexibility and functionality required to secure, manage, find, make and design fit for purpose test data, shift left testing, mitigate risk and reduce the time-to-market of high quality, valuable software.

Challenge

Modern IT departments are under increasing pressure to align themselves more closely with the needs of the business: improving efficiency, reducing costs and accelerating the delivery of valuable, quality software. Yet, for many organisations faced with the constraints of large production databases, data protection and constantly shifting requirements, creating the fit for purpose test data needed to shift left testing and deliver data to the right place, at the right time, remains a challenge.

Solution

Described by Bloor Research (2011) as the most extensive and complete test data management solution, Datamaker can revolutionise an organisation's approach to provisioning test data. Moving beyond compliance, the ability to provision fit for purpose test data that matches requirements, on demand, across the organisation and throughout the software development lifecycle (SDLC) affords greater agility in responding to the demands of the business, whilst improving the quality of your delivered software and reducing cost.

Benefits

Datamaker facilitates compliance with data protection legislation, whilst significantly reducing infrastructure costs, by extracting small, secure, intelligent subsets from production. However, it is the ability to quickly design, find and make fit for purpose test data, or to synthetically create data where none exists, providing the maximum functional coverage in the minimum number of test cases, that makes Datamaker stand out from the crowd. Datamaker enables organisations to shift left high quality, efficient, cost-effective test cycles and deliver valuable software to market earlier.

Challenge: Creating and Managing Fit for Purpose Test Data

Creating and managing test data for use in software development has long challenged IT departments. However, the pressure to align more closely with the business has created an ever-present need for the operational agility required to respond to market demands, whilst also meeting data protection compliance in the face of demands to reduce costs through the outsourcing of development and testing effort to globally distributed teams, and managing the problems presented by large databases. In the face of these acute new challenges, organisations are re-evaluating the efficacy of traditional practices for provisioning test data that is fit for purpose. In short, can organisations meet their short- and long-term goals by making copies of production data sources for use in development and testing?

20% of the average software development lifecycle is wasted due to delays in provisioning downstream teams with the right data. In most cases, this is because individual developers and testers are required to manually and laboriously search for, manipulate, hack and create the test data needed; an inefficient, error-prone process which compromises operational agility and increases both the cost and the time-to-market. It also reduces the value of the test data provisioning effort, as manually created test data can rarely be shared or re-used by multiple teams across multiple projects.

The Problem of Production

Meeting Data Protection Compliance

Traditionally, most organisations used full copies of production databases to provision data for development and testing. This practice is no longer viable under current data protection legislation (e.g. HIPAA, PCI DSS, EU Data Protection Directive 95/46/EC and the UK Data Protection Act). Weaker security measures in non-production environments also increase exposure to the risk of a data breach. The organisational risk is considerable: according to the Ponemon Institute (Cost of Data Breach Study: Global Analysis, 2013), the average consolidated cost of a data breach to US and UK companies in 2012 was $5.4m and £2m (approx. $3.1m) respectively. Once the long-term costs of customer defections and bad publicity are factored in, the risk of using production data becomes untenable.

Data masking (also known as obfuscation, de-identification or desensitization) provides a popular solution to this problem. For all organisations, particularly those who have chosen to outsource their development and testing effort, this is now a minimum standard. However, in large, complex modern IT architectures, with sensitive data stored across multiple, disparate data sources, manual, ad hoc masking processes are too expensive, slow and error-prone to provide the security or performance required. For more, please see our recent article on creating a successful roadmap for data masking.

Managing Large Production Databases

In recent years, the rapid growth of production databases has posed significant challenges for IT departments, particularly those which use full copies of production in development and testing environments.

To start with, migrating and maintaining numerous copies of production is a prohibitively slow and expensive process, which requires considerable investment in network and hardware infrastructure. Migrating large volumes of data also increases the risk of exposing sensitive data. In most cases, developers and testers only require small sets of test data containing specific criteria or scenarios. This makes data subsetting a particularly useful solution, allowing organisations to extract small, relevant, consistent and secure slices of data from multiple, disparate production data sources. However, the overarching challenge of using production data remains: production data typically provides only 10-20% of the code and functional coverage needed to fully test quality software.

Poor Test Coverage and Data Quality

Performing high quality, efficient test cycles requires small, rich sets of test data, which provide the maximum functional coverage in the minimum volume of test data. However, much of the data in production is similar (drawn from common transactions, for example) and, by its very nature, sanitized. This means that production data typically only covers BAU (Business as Usual) scenarios. Consequently, negative paths (where the system receives bad or incorrect data) and scenarios for which no data currently exists are not properly tested. Often, it is these paths that cause software applications to fall down. This is particularly problematic when testing virtualized services that do not yet exist (prototypes): with no data in production data sources, significant manual effort is required to ensure adequate testing is carried out.

Manually Creating Test Data

In most cases, it falls upon individual developers and testers to search for, and then manually create, the missing data, or to enhance the quality of existing data. This is a slow, unsystematic process, which produces little return on investment for the time and resources required. At best, manual data creation improves functional coverage to approximately 50%, whilst the delays encountered in doing so are amplified downstream. This compromises agility and increases the cost and time-to-market. Manually created test data is also difficult to leverage elsewhere in the software development lifecycle. As it is created in isolation, it is often made to meet a specific need and rarely reflects changes in requirements. The data is also usually burned in testing. Therefore, downstream teams derive little benefit from earlier effort, and exacerbate any delays by having to manually create their own data.

Managing the Test Data

In traditional waterfall software development lifecycles, test data typically has to flow downstream through various teams. This lack of agility can create considerable idle time as teams wait for test data. It also renders teams unable to react to constantly changing requirements; for example, by the time the data filters down to a team, it may no longer be fit for purpose based on current requirements. These constraints are exacerbated by the difficulty of provisioning the right test data when using production data. As discussed above, manually manipulating, hacking and editing the data creates long, inefficient test cycles that result in critical delays downstream. It is also rarely possible to share and re-use manually created data between teams; either it is burned in testing or, by the time it is available, it no longer meets requirements.

This inability to leverage earlier effort creates extra work, and forces organisations to make a straight and undesirable choice: reduce cost and time-to-market, or deliver valuable software applications with all of the required functionality.

Solution: Fit for Purpose Data in the Right Place, at the Right Time

Datamaker delivers an end-to-end, component-based test data management solution, which can revolutionize an organisation's approach to provisioning test data. Including the world's most powerful data generation engine, Datamaker provides the flexibility and functionality needed to secure, manage, find, make and design fit for purpose test data in the back-end. Delivering the right data to the right place, at the right time, enables organisations to shift left testing, mitigate risk, minimize defect creation and reduce the cost and time-to-market of valuable software applications.

Figure 1: Datamaker Solution Architecture

Secure

Current data protection legislation requires, as a minimum standard, that organisations desensitise (mask or obfuscate) all production data used in development and testing environments. Using mathematically-based algorithms, Datamaker can quickly and systematically discover all of the potentially sensitive data stored in multiple, disparate data sources, right across the entire enterprise. Sensitive data and message requests/responses can then be masked with consistent, realistic alternative values, either in-situ or in-flight, using a range of high-performance masking functions (randomisation, substitution, seed data etc.) on one of four powerful, optimised data masking engines. Datamaker calls out to native database and Mainframe utilities when masking the data, eliminating the need to perform Extract, Mask and Load (EML) processes and copy the data into a different proprietary format to be treated. This reduces the risk of unauthorised staff handling production data.
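The value of this style of masking lies in consistency: the same production value should always be replaced by the same realistic substitute, so that masked data remains referentially intact across tables, systems and masking runs. The snippet below is a purely illustrative Python sketch of that idea, not the Grid-Tools masking engine; the seed list, sample tables and the mask_value helper are hypothetical.

    import hashlib

    # Hypothetical seed list of realistic replacement surnames (illustrative only).
    SEED_SURNAMES = ["Archer", "Baxter", "Carver", "Dalton", "Ellison", "Fletcher"]

    def mask_value(value, seed_list, salt="tdm-demo"):
        # Deterministic substitution: hashing the original value means the same
        # input always yields the same substitute, so joins between tables survive
        # masking without storing a lookup of real values.
        digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
        return seed_list[int(digest, 16) % len(seed_list)]

    # Two "tables" that share the surname as a linking attribute.
    customers = [{"id": 1, "surname": "Smith"}, {"id": 2, "surname": "Jones"}]
    orders = [{"order_id": 10, "surname": "Smith"}]

    for row in customers + orders:
        row["surname"] = mask_value(row["surname"], SEED_SURNAMES)

    # The masked surname for "Smith" is identical in both tables,
    # so referential consistency is preserved by the masking pass.
    print(customers)
    print(orders)

Because the substitute is derived from the original value rather than assigned at random, re-running the process over different data sources produces consistent results, which is the property a production-grade masking engine has to guarantee at scale.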

Figure 2: Test Data Warehouse Architecture Find Most automated tests fail because of data errors or unstable test keys. Datamaker provides powerful, innovative test matching functionality, which identifies, mines and matches data with specific, unique attributes from multiple, disparate data sources to the appropriate test, thereby ensuring that the data is in the correct state to perform successful automated test cycles. Datamaker uses sophisticated data mining techniques - based on stable test criteria rather than unstable keys in spread sheets to discover and extract the appropriate data from multiple, disparate data sources. The rarest sets of data are prioritized, ensuring that the most difficult tests are covered and performed first. Mined data can also be attached to a single key, enabling various tests to be run against a specific employee, for example. As well as stabilizing the data used for automated testing, the test matching functionality can detect which tests will fail, due to data errors, before the run. All matched tests are stored in the central Test Data Warehouse. This enables them to be quickly enhanced with synthetic data and manipulated prior to the automated run. The tests can also be shared between teams and re-used in multiple versions and releases of the application. Storing tests in the Test Data Warehouse also allows them to be easily fed directly into any major automation tool. Make As discussed above, production data typically offers around 10-20% of the functional coverage needed to fully test software applications, or in some cases - such as when building new applications, testing virtualized prototype systems or performing negative testing production data does not exist at all. Using intelligent data profiling techniques, Datamaker is able to take an accurate picture of how the data is related, and from the existing data model, generate small, rich, secure blocks of synthetic test data, which contains all the characteristics of live data without any of the sensitive content. These
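In essence, test matching treats each test's data requirements as a set of stable business criteria, searches the available data for rows that satisfy them, schedules the tests with the scarcest data first, and flags before the run any test for which no data exists. The snippet below is a minimal, hypothetical sketch of that idea; it is not the Datamaker implementation, and the TestCase structure and example account data are invented for illustration.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class TestCase:
        name: str
        criteria: Callable[[dict], bool]  # stable business criteria, not row keys

    accounts = [
        {"id": 1, "status": "active", "balance": 250.0},
        {"id": 2, "status": "frozen", "balance": 0.0},
        {"id": 3, "status": "active", "balance": -40.0},
    ]

    tests = [
        TestCase("charge active account", lambda r: r["status"] == "active"),
        TestCase("negative balance alert", lambda r: r["balance"] < 0),
        TestCase("reopen closed account", lambda r: r["status"] == "closed"),
    ]

    # Match each test to every row that satisfies its criteria.
    matches = {t.name: [r for r in accounts if t.criteria(r)] for t in tests}

    # Rarest data first: the fewer candidate rows, the earlier the test is scheduled,
    # and tests with no matching data are reported before the run starts.
    for t in sorted(tests, key=lambda t: len(matches[t.name])):
        rows = matches[t.name]
        if not rows:
            print(f"WILL FAIL before the run: no data satisfies '{t.name}'")
        else:
            print(f"{t.name}: allocated account id {rows[0]['id']}")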

Make

As discussed above, production data typically offers around 10-20% of the functional coverage needed to fully test software applications; in some cases (such as when building new applications, testing virtualized prototype systems or performing negative testing) production data does not exist at all. Using intelligent data profiling techniques, Datamaker is able to take an accurate picture of how the data is related and, from the existing data model, generate small, rich, secure blocks of synthetic test data which contain all the characteristics of live data without any of the sensitive content. These blocks can be used to enhance existing sets of data, or to create more complex blocks where no data currently exists, providing the maximum functional coverage in the minimum volume of data.

Synthetic test data can be generated, in combination with requirements, during the design phase of the software development lifecycle. This ensures that each block of synthetically created test data is fit for purpose. All blocks are stored in the central Test Data Warehouse, and can be quickly manipulated in line with changing requirements. Changes then ripple up and down different versions and releases, meaning that teams are always working with the right test data, at the right time.
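The underlying pattern is profile-then-generate: learn the shape of the existing data (value ranges, categories, relationships) and emit new rows that follow that shape without reusing any real values. Below is a deliberately simple, hypothetical Python sketch of that pattern, not the Datamaker generation engine; the sample rows and the profile() and generate() helpers are invented for illustration.

    import random

    # A tiny stand-in for profiled production data (illustrative only).
    production_sample = [
        {"age": 34, "country": "UK", "plan": "gold"},
        {"age": 52, "country": "US", "plan": "silver"},
        {"age": 29, "country": "UK", "plan": "gold"},
    ]

    def profile(rows):
        # Capture simple per-column characteristics: numeric ranges and category sets.
        model = {}
        for col in rows[0]:
            values = [r[col] for r in rows]
            if all(isinstance(v, (int, float)) for v in values):
                model[col] = ("range", min(values), max(values))
            else:
                model[col] = ("categories", sorted(set(values)))
        return model

    def generate(model, n):
        # Emit n synthetic rows that follow the profiled shape of the data
        # without copying any individual production record.
        rows = []
        for _ in range(n):
            row = {}
            for col, spec in model.items():
                if spec[0] == "range":
                    row[col] = random.randint(int(spec[1]), int(spec[2]))
                else:
                    row[col] = random.choice(spec[1])
            rows.append(row)
        return rows

    synthetic_block = generate(profile(production_sample), n=5)
    print(synthetic_block)

A real engine also has to honour referential integrity, business rules and requirement-driven scenarios, but the principle is the same: the generated block carries the characteristics of live data with none of its sensitive content.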

Design

To perform high quality, efficient test cycles, test teams need to be working with test data that offers >90% functional coverage. Creating a set of visual, unambiguous requirements enables teams to fully understand what functionality needs to be tested, and what data is needed to fulfil those test cases. Datamaker uses a range of coverage techniques (for example, All-Pairs Combinatorial, Constrained All-Pairs and Cause-and-Effect, using Visual Test Flow) to automatically create the minimum optimal set of test cases that provides the maximum functional coverage. This enables test teams to cover all happy and unhappy paths through an application in a manageable, quantifiable and optimized set of test cases.

Designing test cases, and removing ambiguities in requirements, at the start of a project makes it possible to better quantify testing effort, understand when testing is done and remove redundant test cases. It also enables Data Architects and Test Data Engineers to understand earlier what data is needed. This, in combination with powerful data profiling functionality, makes it possible to know where the gaps in the existing/production data are, guiding any synthetic data generation effort.
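All-Pairs (pairwise) combination is the simplest of the coverage techniques mentioned above: rather than executing every combination of parameter values, it selects a much smaller suite in which every pair of values still appears together at least once. The following is a generic, greedy illustration of the technique in Python, not Datamaker's own algorithm; the parameters shown are hypothetical.

    from itertools import combinations, product

    # Hypothetical test parameters and their possible values.
    parameters = {
        "browser": ["Chrome", "Firefox", "Safari"],
        "account": ["new", "existing", "suspended"],
        "payment": ["card", "paypal"],
    }

    names = list(parameters)

    def pairs(case):
        # All parameter-value pairs exercised by a single test case.
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    # Every pair that must be covered at least once.
    uncovered = set().union(*(pairs(dict(zip(names, combo)))
                              for combo in product(*parameters.values())))

    suite = []
    while uncovered:
        # Greedily pick the full combination that covers the most remaining pairs.
        best = max((dict(zip(names, combo)) for combo in product(*parameters.values())),
                   key=lambda case: len(pairs(case) & uncovered))
        suite.append(best)
        uncovered -= pairs(best)

    print(f"{len(suite)} pairwise test cases instead of "
          f"{len(list(product(*parameters.values())))} exhaustive combinations")
    for case in suite:
        print(case)

For the three parameters shown, the greedy suite covers every pair in far fewer cases than the eighteen exhaustive combinations, which is the kind of reduction that makes a test pack manageable and quantifiable.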

Benefits: Improve software quality and reduce time-to-market

Datamaker is the complete, end-to-end, component-based test data management solution, providing organisations with all of the tools needed to overcome challenges in provisioning fit for purpose test data and reduce the time-to-market of high quality, valuable software applications. Datamaker provides the flexibility and functionality needed to facilitate compliance with current data protection legislation, reduce infrastructure costs, shift left testing and minimize defects.

Reduction/Elimination of Risk

Compliance with current data protection legislation has become mandatory, both for auditors and for the business, particularly when outsourcing development and/or testing to third parties. Further analysis of the Ponemon Institute's 2013 Cost of Data Breach Study shows that the average cost of a data breach to US, UK and Australian companies was $5.4m, $3.1m and $4.1m (in US dollars) respectively. This risk was considerably increased by outsourcing to third-party vendors; the study found that third-party errors cost an extra $43 (US), $26 (UK) and $12 (AUS) per record. To put this into context, the average number of records breached in these three countries was 23,000-35,000. Additionally, the EU is currently reviewing data protection legislation that, if enacted, will result in the maximum fine levied for a data breach being equivalent to 2% of global turnover.

Datamaker provides organisations with an innovative solution to facilitate compliance and mitigate the risk of a costly data breach, by providing powerful data discovery and masking functionality to quickly identify and secure potentially sensitive records. By calling out to native database and Mainframe utilities, Datamaker is able to offer the best possible masking performance and eliminate the need to expose sensitive content through expensive EML processes and copying data into different formats.

Reduce Infrastructure Costs

Production databases are growing rapidly, and as a result, so are the infrastructure costs associated with migrating, storing and maintaining multiple copies of production. Migrating such large volumes of data also increases the risk of exposing sensitive data and, in most cases, provides development and testing teams with unnecessary volumes of redundant data, which provide insufficient coverage. Datamaker allows organisations to realise savings of up to $50k per database by creating smaller, more intelligent, referentially intact subsets of production, based on specific testing requirements. By using native database or Mainframe utilities to migrate the data, Datamaker offers the best possible performance without the cost, time, risk and additional hardware associated with EML processes.

Using Datamaker to create intelligent subsets also reduces ongoing costs, enabling DBAs to provision consistent, sophisticated slices of data, from multiple systems, which provide an even spread of data types in more manageable blocks of data. These are stored in the Test Data Warehouse and can be used to build more complex sets of data, or be shared and (re)used throughout the entire organisation.

Introducing a Visionary Approach

Data masking and data subsetting provide an excellent start to provisioning fit for purpose test data, enabling users to deliver smaller, more meaningful sets of secure test data more quickly and cost-effectively. However, what makes Datamaker stand out from the crowd is the ability both to eliminate the risk of data breach and to improve the quality of the data through the visionary use of synthetic data. Built from an accurate picture of the existing data model, synthetic data contains all of the characteristics of production, but none of the sensitive content. As a result, fit for purpose test data can be generated according to requirements, whilst removing the risk of exposing sensitive records.

Removing Bottlenecks across the SDLC

In order to shift left testing, organisations need to be able to provide test teams with fit for purpose test data in the right place, at the right time. Yet as much as 20% of the average software development lifecycle is lost by teams sitting idle, waiting for the right data to flow downstream. Based on the concept of a central Test Data Warehouse, Datamaker enables the same core test data to be shared and reused, in parallel, across multiple teams, projects and test cycles at the same time, minimising the time downstream teams spend waiting for data. This also ensures that testing teams are working with data that meets the current requirements, and that the data is not burnt in testing.

Our Test Data on Demand self-service portal provides a controlled, web-based service layer, which creates a separation between the provision and consumption of test data, free from the constraints of cross-system dependencies. Test Data on Demand is also fully compatible with major test automation tools, such as HP QTP and IBM Rational. This allows users to link fit for purpose test data directly to existing test cases and defect management systems, enabling them to shift left.

Accelerate Delivery

As much as 50% of development and testing time is spent manipulating, searching for or manually creating the right data to meet test case requirements. Automating this process, using the Test Matching functionality within Datamaker, allows you to mine the appropriate data from multiple, disparate back-end systems and match it to specific test cases, reducing the time taken to find and provision the right data by 95% when compared to manual processes. It also enables automated test teams to improve success rates by knowing, prior to the test run, which tests will fail due to data issues.

In addition, using Datamaker to generate synthetic test data eliminates the risk of exposing sensitive content to unauthorised staff, removes the bottlenecks created by manually hacking and creating data, and allows users to quickly enhance the functional coverage of existing data, or to create data where none currently exists. This allows teams to shift left testing by detecting defects early and reducing rework.

Improve Quality

Testing accounts for approximately 40% of the average software development lifecycle. Therefore, the ability to shorten test cycles and shift left testing can considerably reduce the time-to-market of valuable, high quality software applications. Designing the optimal set of test cases, i.e. the minimum number of tests that provide the maximum functional coverage, is integral to maximising the value of testing. Over half (56%) of production defects can be traced back to poor, incomplete or ambiguous requirements. Removing ambiguities from requirements enables Test Data Engineers to create and provision exactly the right test data to cover all test cases earlier in the software development lifecycle, minimises delays caused by expensive rework and miscommunication, and can realise savings of $50k per defect.

Visual Test Flow allows users to map any process to a clear, visual path through all of the logic gates in your requirements, eliminating the ambiguities that arise from written specifications. With just a simple click, users are provided with multiple methods of optimal test path design, allowing you to cover all happy and unhappy paths in the minimum number of test cases. The right data can then be found or made and linked directly to each logic gate and process, directly within the visual flow.

This makes testing effort quantifiable; eliminating redundant test cases and knowing when you are done can reduce test cycles by as much as 30%. All of this significantly reduces the bottlenecks caused by detecting defects late, shortens test cycles and improves their quality, and reduces time-to-market.

For more information about how Grid-Tools products and solutions can benefit you, contact us:

UK: +44 (0)1865 884 600
US: +1 866 563 3120
E: sales@grid-tools.com
www.grid-tools.com

Subscribe to our blog | Find us on Facebook | Follow us on Twitter | Connect with us on LinkedIn