IM02 How to manage your Test Data on zEnterprise
18-20 September 2012, IBM Forum Brussels
Notices
This information was developed for products and services offered in the U.S.A. Note to U.S. Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
This presentation contains trademarked IBM products and technologies. Refer to the following Web site: http://www.ibm.com/legal/copytrade.shtml
Agenda
- Data Governance: focus on test data creation and masking
- Obtaining test data: some common practices
- Obtaining test data: best practices
- InfoSphere Optim Test Data Management / Data Privacy
- Q & A
A Smarter Planet harnesses today's information explosion for business benefit, creating a need for better Information Governance
A Smarter Planet is Instrumented, Interconnected and Intelligent. Better Information Governance supports:
- Streamlining processes to manage business growth with consistency
- Ensuring compliance with policies, laws and regulations
- Controlling costs and optimizing infrastructure
Success requires governance across the Information Supply Chain
Diagram: the Information Supply Chain. Sources (transactional & collaborative applications, external information sources, content, streaming information) are managed (master data, big data, cubes, data warehouses), integrated and analyzed (content analytics, business analytics applications). Information Governance spans the whole chain: Quality, Lifecycle, Security & Privacy, Standards.
Requirements for managing data across its lifecycle
Information Governance Core Disciplines: Lifecycle Management
- Discover & Define: discover where data resides; classify & define data and relationships; define policies
- Develop & Test: develop & test database structures/code; create & refresh test data; capture & replay production workloads
- Optimize & Archive: enhance performance; manage data growth; report & retrieve archived data
- Consolidate & Retire: integrate into single data source; move only the needed information; enable compliance with retention & e-discovery
Organizations continue to be challenged with building quality applications
- Increasing Costs: defects are caught late in the cycle
- Increasing Risk: mandatory to protect data and comply with regulations
- Time to Market: lack of realistic test data and inadequate environments
Organizations continue to be challenged with building quality applications
Increasing Costs
- $300 billion: annual costs of software-related downtime (d)
- 32%: low success rate for software projects (e)
Increasing Risk
- 45,000+: number of sensitive records exposed to a 3rd party during testing (c)
- 70%: companies that use actual customer data to test applications (a)
Time to Market
- 37%: satisfied with speed of software development (f)
- 30-50%: time testing teams spend on setting up test environments, instead of testing (b)
Sources:
a. The Ponemon Institute. The Insecurity of Test Data: The Unseen Crisis
b. NIST, Planning Report. The Economic Impacts of Inadequate Infrastructure for Software Testing
c. Federal Aviation Administration: Exposes unprotected test data to a third party. http://fcw.com/articles/2009/02/10/faa-data-breach.aspx
d. The Standish Group, Comparative Economic Normalization Technology Study, CHAOS Chronicles v12.3.9, June 30, 2008
e. The Standish Group, Chaos Report, April 2009
f. Forrester Research, "Corporate Software Development Fails To Satisfy On Speed Or Quality", 2005
Vulnerable non-production environments at risk
Most ignore security in non-production environments
- 70% of organizations surveyed use live customer data in non-production environments (testing, Q/A, development). Source: Database Trends and Applications, Ensuring Protection for Sensitive Test Data
- $194 per record: cost of a data breach. Source: The Ponemon Institute, 2012 Cost of Data Breach Study
- 50% of organizations surveyed have no way of knowing if data used in test was compromised. Source: The Ponemon Institute, The Insecurity of Test Data: The Unseen Crisis
- 52% of surveyed organizations outsource development. Source: The Ponemon Institute, The Insecurity of Test Data: The Unseen Crisis
Impact Across the Enterprise: the benefits of a TDM strategy
CIO
- Align application performance to business processes
- Ensure business continuity
- Respond quickly and accurately to audit and discovery requests
- Leverage existing investments in applications, databases and storage
- Reduce resource requirements for key IT operations
Business
- Profit from superior application performance and availability
- Provision resources to meet priority business needs
- Automate data retention to support compliance initiatives
- Eliminate budget variances
IT
- Streamline application and database upgrades
- Speed disaster recovery
- Simplify database administration
- Reclaim underutilized capacity
- Protect data privacy, integrity, and security
Test data creation is often accomplished through cloning
Positives
- Simple to do
- Requires little knowledge of the data model or infrastructure
- Creates an exact duplicate of production
Negatives
- Uses significant storage
  - Much more than the team needs
  - Often done once and not for each team member
- Data is a copy of production and therefore a privacy risk
- Takes significant amounts of time to create
- No way to compare to original after test is complete
- Cannot span multiple data sources/applications
- Developer/tester downtime when sharing data accessibility
Diagram: Production Database -> Clone -> Test Database
The Data Multiplier Effect
- Production: 500 GB
- Development: 500 GB
- Test: 500 GB
- User Acceptance: 500 GB
- Backup: 500 GB
- Disaster Recovery: 500 GB
- Total: 3000 GB
Actual data burden = size of production database + all replicated clones
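The arithmetic behind the slide can be checked with a few lines. This is only a worked illustration of the formula above, using the sizes from the slide; the function name is made up for the example.

```python
# Sizes from the slide, in GB. Every copy is a full clone of production.
PRODUCTION_GB = 500
clones = ["Development", "Test", "User Acceptance", "Backup", "Disaster Recovery"]

def data_burden_gb(production_gb: int, clone_count: int) -> int:
    """Actual data burden = production size + all replicated clones."""
    return production_gb + production_gb * clone_count

total = data_burden_gb(PRODUCTION_GB, len(clones))
print(total)  # 3000
```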
Generating synthetic test data
Positives
- Safe
Negatives
- Resource-intensive:
  - Huge commitment from DBA
  - Deep knowledge of database schema
- Tedious: DBAs must intentionally include errors to ensure a robust testing process. Created data does not always reflect the integrity of the original data.
- Time-consuming: process is slower and can be error-prone.
Diagram: generated data -> Test Database
Test data creation by writing SQL
Positives
- ?
Negatives
- Write and maintain SQL. Complex and subject to change.
- Referential integrity? Right data?
- Expensive, dedicated staff.
- Cannot span multiple data sources/applications.
- Developer/tester downtime when sharing data accessibility.
Diagram: Production Database -> SQL -> Test Database
Test Data Management Concepts
Test Data Management (TDM) refers to the need to manage data used in various pre-production environments and is a vital part of Application Quality & Delivery.
- Extract production data into referentially intact data subsets to be used to support application data in other environments.
- De-identify (mask) extracted production data to protect privacy.
- Compare before and after images of test data.
- Speed application quality and delivery.
Test Data Management Best Practices
Diagram: production data is subset and privatized into right-sized environments (labels on the slide: Unit, Function, Integration, Performance, UAT, Production; sizes shown: 50 GB, 50 GB, 2 TB, 3 TB, 4 TB).
Cycle shown on the slide:
- Subset and privatize production data
- Inspect/browse the data and seed test cases
- Run the test
- Compare before and after results (Compare Process: Source 1 vs. Source 2)
- If errors are found: correct them, refresh the test data and run the test again
- If there are no errors: promote to production
The Test Data Management Process
Diagram: Gold Copy -> TDM Processes -> Subset and Privatized Test Data, all inside a secured lock-down environment.
TDM Processes:
- Extract & Subset
- Convert & Mask
- Compare & Audit
- Load & Distribute
Inputs:
- Subset Criteria
- Masking Rules
- Source & Target DB List & Auth
Refresh test data
Diagram: testers and developers refresh their own test environments.
- Test Database: 50 GB
- Training Database: 75 GB
- Dev Database: 25 GB
Example process
Speed delivery, reduce costs and improve quality while reducing risk and increasing compliance with Test Data Management
Without Test Data Management
- Tester: submits request for test data
- Request sits in queue for days
- DBA: creates test data, which takes several days
- Tester: uses test data in testing, then requests a data refresh
- Refresh sits in queue for days and takes several days to create
With Test Data Management
- Tester: submits request for test data
- DBA: creates or refreshes test data, which takes hours
- Tester: uses test data in testing and refreshes test data
Agile development relies on agile testing; agile testing relies on continuous access to test data
- Organization: an agile test organization and its testers need continuous access to test data
- Process: with agile development you have continuous integration and delivery and have to test often
- Technology: test data management software needs to support the agile method by streamlining access to test data and giving insight into test data
Enterprise Data Governance for System z
Data Governance disciplines and products shown on the slide:
- Manage Data Lifecycle (Data Retention, Data Retirement): Optim Data Growth Solution
- Secure (Prevent Access, Restrict Access, Monitor Access): DB2/RACF Security, Tivoli zSecure
- Audit (Audit Privileges, Audit Users, Audit Access): Guardium for z
- Protect & Privacy (Mask Data, Encrypt Data): Data Encryption for IMS / DB2, Optim TDM and DP
Goals:
- Archive inactive data and reduce the amount of data exposed and requiring protection
- Reduce risk from security breaches
- Comply with regulatory requirements
- Protect sensitive customer data and employee data
IBM is the only solution provider with an end-to-end, comprehensive solution
IBM InfoSphere Optim supports the heterogeneous enterprise Discover Manage Test Data Capture & Replay Archive Partner-delivered Solutions Single, scalable, heterogeneous information lifecycle management solution provides a central point to deploy policies to extract, archive, subset, and protect application data records from creation to deletion
OPTIM Server / Repository
Diagram: business data (Orders, Products, Customers, Payments, Employee, Payroll) held in IMS (native access), VSAM / SEQ files (native access) and DB2 (DB2 access) is reached by the Optim Server on z/OS. The server comprises the Optim Directory, Repository Services, Data Access Services, Subsetting Services, Archiving Services, Data Privacy Services, Open Data Management and Security. Storage holds metadata, data, index and artifacts in extract and archive files (storage-independent archive), with ODBC/JDBC access.
ISPF workbench: software running under z/OS, utilized to design, test and deploy projects to the OPTIM Server. The ISPF workbench software enables either online or batch (JCL) execution.
OPTIM Server / Repository: components
Data sources (business data: Orders, Products, Customers, Payments, Employee, Payroll)
- IMS Native Access: utilizes IMS-provided drivers to access data. Metadata is captured via copybook imports (COBOL or PL/1).
- VSAM / SEQ files Native Access: utilizes VSAM native access to access data. Metadata is captured via copybook imports (COBOL or PL/1).
- DB2 Access: utilizes SQL to access data. Provides a process to capture metadata from the DB2 catalog.
Server services (z/OS, Optim Directory, ISPF workstation)
- Repository Services: store and retrieve metadata information, project information and the archive catalog in the Optim Directory.
- Data Access Services: access source or destination databases via access-specific drivers per file type.
- Subsetting Services: extract and restore relationally intact Business Objects across multiple DB2 databases, IMS and VSAM (e.g. Orders from DB2, Customers from VSAM and Credit Cards from IMS).
- Archiving Services: store, retrieve, restore, delete and compress data, metadata and artifacts (e.g. external documents as BLOBs).
- Data Privacy Services: consistently and predictably mask and propagate data related to Business Objects for the purpose of test data management with data compliance.
- Open Data Management (ODM): enable relational access to archived data via ODBC/JDBC and SQL-92. Can be used in conjunction with remote ODM services and to integrate with enterprise data access for business intelligence.
- Security: provide functional and object security to separate product and data access by role and responsibilities using RACF.
Storage: metadata, data, index and artifacts in extract and archive files (storage-independent archive).
Optim captures the complete Business Object
Optim Captures the Complete Business Object
- Application view: application-level business rules for data relationships
- DBA view: referentially intact subset of data
- Business Object: represents an application data record (payment, invoice, customer). It is a referentially intact subset of data across related tables and applications and includes metadata, DDL, reference and transaction data.
- Benefit: referential integrity. Ensures data is captured and masked consistently.
Diagram: data from IMS, DB2, VSAM, LUW and related files or documents is captured as complete data with RI preserved: OS independent, DB independent, ODBC accessible, with federated access to data and metadata.
Test Data Management: IBM InfoSphere Optim Test Data Management Solution
Create right-size, production-like environments for application testing
Diagram: a 2 TB production database or production clone is subset and masked into smaller environments (25 GB Unit Test, 100 GB Integration Test, 25 GB Development, 50 GB Training), which are then compared and refreshed.
Requirements
- Create referentially intact, right-sized test databases
- Automate test result comparisons to identify hidden errors
- Protect confidential data used in test, training & development
- Shorten iterative testing cycles and accelerate time to market
Benefits
- Deploy new functionality more quickly and with improved quality
- Easily refresh & maintain test environments
- Protect sensitive information from misuse & fraud with data masking
- Accelerate delivery of test data through refresh
InfoSphere Optim TDM supports data on distributed platforms (LUW) and z/OS, with out-of-the-box subset support for packaged ERP/CRM applications as well as others.
InfoSphere Optim Test Data Management: standard methodology
Optim fits with your testing methodology: Extract, Subset, Privatize, Load, Edit, Compare. All or part of the process can be automated.
Process flow:
1) Extract / privatize production data into an extract file
2) Load the test database
3) Edit data
4) Test the application
5) Compare results
6) If the results are unsuccessful, refresh and retest; if successful, deploy the application
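The "compare results" step can be pictured as a diff between a before image and an after image of the same table. The following is a minimal sketch of that idea in Python, not Optim's Compare process: rows are assumed to be held in dictionaries keyed by primary key, and the function name and sample rows are made up for illustration.

```python
def compare_rows(before, after):
    """Compare two images of a table, each a dict keyed by primary key.

    Returns the keys that were inserted, deleted or changed by the test run.
    """
    inserted = sorted(k for k in after if k not in before)
    deleted = sorted(k for k in before if k not in after)
    changed = sorted(k for k in before if k in after and before[k] != after[k])
    return {"inserted": inserted, "deleted": deleted, "changed": changed}

# Hypothetical before/after images of a small table: key -> (name, balance).
before = {1: ("Alice", 100), 2: ("Carl", 250), 3: ("Elliot", 75)}
after = {1: ("Alice", 100), 2: ("Carl", 300), 4: ("Dana", 10)}

diff = compare_rows(before, after)
print(diff)  # {'inserted': [4], 'deleted': [3], 'changed': [2]}
```

An unexpected entry in any of the three lists is a candidate hidden error: the application touched a row the test case did not intend to touch.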
Enterprise Testing Solution with Rational and InfoSphere Optim
Building better quality applications: a comprehensive software quality process to minimize cost and shorten development cycles
- Manage test labs
- Create realistic test environments from production data
- Ensure protection of sensitive data
- Manage unit, functional and performance testing and quality test cases
- Streamline your test data management processes and deliver your project sooner and with fewer defects
Process flow:
1) Design & manage test campaign
2) Initiate data extract scripts
3) Subset & mask production data for testing (InfoSphere Optim)
4) Refresh masked test data (InfoSphere Optim)
5) Browse & edit test data (InfoSphere Optim)
6) Execute automated test routines
7) Compare before & after data (InfoSphere Optim)
8) On failure, repeat the cycle; otherwise go to production
The process: Access Definition
Legacy tables are VSAM files, sequential files or IMS segments
Types of Relationships
- DB2: relationship defined in DB2
- OPT: relationship defined with Optim
Selection Criteria Only records where State = GA
Extract Process
Diagram: PRODDB (CUSTOMERS, ORDERS, DETAILS) -> EXTRACT -> Extract File, with a Process Report
- Extract from source tables
- Extract data and/or object definitions
- Point & Shoot
- Use BROWSE to verify extracted data
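To make "referentially intact subset" concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Optim or DB2. The schema and rows are invented for the example; only the table names (CUSTOMERS, ORDERS, DETAILS) and the State = GA selection criterion come from the slides. The point is that the selection criterion is applied to the start table only, and child rows are pulled in by following the relationships.

```python
import sqlite3

# Hypothetical stand-in for PRODDB with three related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     cust_id INTEGER REFERENCES customers(cust_id));
CREATE TABLE details (order_id INTEGER REFERENCES orders(order_id), item TEXT);
INSERT INTO customers VALUES (1, 'A', 'GA'), (2, 'B', 'NY'), (3, 'C', 'GA');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 3);
INSERT INTO details VALUES (10, 'x'), (11, 'y'), (12, 'z'), (13, 'w'), (13, 'v');
""")

def extract_subset(conn, state):
    """Select start-table rows by state, then follow parent-to-child joins."""
    customers = conn.execute(
        "SELECT * FROM customers WHERE state = ? ORDER BY cust_id",
        (state,)).fetchall()
    orders = conn.execute(
        "SELECT o.* FROM orders o JOIN customers c ON c.cust_id = o.cust_id "
        "WHERE c.state = ? ORDER BY o.order_id", (state,)).fetchall()
    details = conn.execute(
        "SELECT d.* FROM details d JOIN orders o ON o.order_id = d.order_id "
        "JOIN customers c ON c.cust_id = o.cust_id "
        "WHERE c.state = ? ORDER BY d.order_id, d.item", (state,)).fetchall()
    return {"customers": customers, "orders": orders, "details": details}

subset = extract_subset(conn, "GA")
print(subset["orders"])  # [(10, 1), (12, 3), (13, 3)]
```

Every extracted order belongs to an extracted customer and every detail row to an extracted order, so the subset can be loaded into a test database without orphan rows.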
The Extract Report
Browse extract file with join
Insert Process: Populate Destination Tables
Table Map
- Table names need not match
- Change qualifier and/or table name
- Can be saved in PST Directory
Insert Process: Populate Destination Tables
Column Map
- Map unlike column names
- Transform/mask sensitive data
- Datatype conversions
- Column-level date aging
- Literals
- Special registers
- Expressions
- Default values
- User exits
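As a rough mental model, a table map renames the destination table and a column map says how each destination column is derived from the source row. The sketch below expresses both as plain Python dictionaries; it does not use Optim's map syntax, and the table names, column names and the 365-day aging amount are all hypothetical.

```python
from datetime import date, timedelta

# Table map: source table -> destination table (names need not match).
table_map = {"PROD.CUSTOMERS": "TEST.CUST"}

# Column map: destination column -> function of the source row.
column_map = {
    "CUST_NO": lambda row: row["CUST_ID"],            # unlike column names
    "NAME": lambda row: "MASKED",                     # literal
    "BALANCE": lambda row: float(row["BALANCE"]),     # datatype conversion
    "OPEN_DATE": lambda row: row["OPEN_DATE"] + timedelta(days=365),  # date aging
}

def apply_maps(source_table, row):
    """Return the destination table name and the transformed row."""
    dest_table = table_map[source_table]
    dest_row = {col: fn(row) for col, fn in column_map.items()}
    return dest_table, dest_row

dest_table, dest_row = apply_maps(
    "PROD.CUSTOMERS",
    {"CUST_ID": 8054, "NAME": "Alice Bennett", "BALANCE": "120.50",
     "OPEN_DATE": date(2010, 3, 1)},
)
print(dest_table, dest_row["OPEN_DATE"])  # TEST.CUST 2011-03-01
```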
Edit / Browse: Traditional vs. Relational Tools
Single Table Editors
- One table/view at a time
- No edit of related data from multiple tables
The Relational Editor
- Simultaneous browse/edit of related data from multiple tables
Diagram: three separate single-table editor sessions (CUSTOMER, ORDERS, DETAILS) compared with one relational editor session showing CUSTOMERS, ORDERS and DETAILS together.
Editing Data Edit data to: Insert Rows Delete Rows Update Rows
Relationally Joined Data Browse or edit related rows Scroll of higher-level table automatically synchronizes all lower-joined tables
Commit/Restore Commits are automatically made to the database when you move your pointer to a different row Each instance of a commit counts as an undo level Restore changes to a row, table or fetch set
Backing Out Changes - Row Level Undo removes last change made to the current row Undo brings up a row list and lets you select how far back you want to restore the current row Undo All removes all changes made to the current row since the last fetch
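The row-level undo behaviour described above (each change is one undo level, Undo steps back one level, Undo All returns to the image at the last fetch) can be modelled with a simple history stack. This is a toy model for illustration only, with invented class and method names; it says nothing about how the Optim editor is implemented.

```python
class EditableRow:
    """Toy model of a row with per-change undo levels."""

    def __init__(self, values):
        self._fetched = dict(values)   # image at the last fetch
        self._current = dict(values)
        self._history = []             # one saved image per change

    def update(self, column, value):
        self._history.append(dict(self._current))
        self._current[column] = value

    def undo(self):
        """Remove the last change made to this row, if any."""
        if self._history:
            self._current = self._history.pop()

    def undo_all(self):
        """Remove all changes made since the last fetch."""
        self._current = dict(self._fetched)
        self._history.clear()

    @property
    def values(self):
        return dict(self._current)

row = EditableRow({"NAME": "Alice", "STATE": "GA"})
row.update("STATE", "NY")
row.update("NAME", "Carl")
row.undo()
print(row.values)  # {'NAME': 'Alice', 'STATE': 'NY'}
row.undo_all()
print(row.values)  # {'NAME': 'Alice', 'STATE': 'GA'}
```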
Challenges of Enterprise Data Privacy Multi-platforms Relational database applications in the enterprise Complex data model Multiple databases Legacy data components Interconnected applications Distributed work teams Employees and contractors Global 24 x 7 operations
What is data masking?
- Definition: a method for creating a structurally similar but inauthentic version of an organization's data. The purpose is to protect the actual data while having a functional substitute for occasions when the real data is not required.
- Requirement: effective data masking requires data to be altered in such a way that the actual values cannot be determined or reverse-engineered, while functional appearance is maintained.
- Other terms used: obfuscation, scrambling, data de-identification
- Commonly masked data types: name, address, telephone, SSN/national identity number, credit card number
Methods
- Static masking: extracts rows from production databases, obfuscating data values that ultimately get stored in the columns in the test databases
- Dynamic masking: masks specific data elements on the fly without touching applications or the physical production data store
Statically mask data in non-production databases
Field | Original | Statically masked
Patient No | 123456 | 112233
SSN | 333-22-4444 | 123-45-6789
Name | Erica Schafer | Amanda Winters
Address | 12 Murray Court | 40 Bayberry Drive
City | Austin | Elgin
State | TX | IL
Zip | 78704 | 60123
- Mask data in non-production databases such as test and development
- Improve security of non-production environments
- Facilitate faster testing processes with accurate test data
- Support referential integrity
- Mask custom and packaged ERP/CRM applications
Optim Data Privacy Solution
Diagram: Production (VSAM, IMS, DB2) -> contextual, application-aware, persistent data masking -> Test (DB2, IMS, VSAM)
- Substitute confidential information with fictionalized data
- Deploy multiple masking algorithms
- Provide consistency across environments and iterations
- Enable off-shore testing
- Protect private data in non-production environments
Consistent mapping across the enterprise
Example: Client Billing Application
Source | Original SSNs | Masked SSNs
IMS | 157342266, 132009824 | 134235489, 323457245
DB2 | 157342266, 132009824 | 134235489, 323457245
Data is masked, and masked fields are consistent across data sources.
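One common way to get the consistency shown above is to derive the masked value deterministically from the source value and a secret key, so that the same SSN masks to the same substitute no matter which data source it is read from. The sketch below uses a keyed hash (HMAC-SHA256) from Python's standard library. It is an assumption chosen for illustration, not the algorithm Optim uses, and it does not guarantee uniqueness or SSN validity rules; the key shown is a placeholder.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-change-me"  # hypothetical; never hard-code a real key

def mask_ssn(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Map a 9-digit SSN to a repeatable 9-digit substitute."""
    digest = hmac.new(key, ssn.encode("ascii"), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % 10**9
    return f"{number:09d}"

# The same source values read from two different data sources.
ims_masked = [mask_ssn(s) for s in ("157342266", "132009824")]
db2_masked = [mask_ssn(s) for s in ("157342266", "132009824")]
print(ims_masked == db2_masked)  # True
```

Because no lookup table of original-to-masked pairs is stored, the mapping can be reproduced in any environment that holds the key, which is also why the key needs to be protected.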
De-identify test data
When masking can be applied:
- During the Extract process: Production Data -> Extract and Convert -> Masked Test Data
- Or in a standalone Convert process
- Or during the Insert/Load process
Transform or replace sensitive data using:
- Standard mapping rules: literals, special registers, expressions, default values, look-up tables
- Complex mapping rules: user exits
Optim Data Privacy in Application Testing
Diagram: extract a relationally intact subset (CUSTOMERS, ORDERS, DETAILS) from production database(s) into an Extract File; create the destination tables (CUST, ORD, DETL) in NewDB; transform / mask sensitive data while populating TESTDB through INSERT/UPDATE or QADB through Load Files and LOAD.
- Extract data and/or object definitions
- Define a new set of test tables
- Apply masking during population process
- Extract file may be reused but contains un-masked data
- Good practice for testing masks
Optim Data Privacy in Application Testing
Diagram: extract a relationally intact subset (CUSTOMERS, ORDERS, DETAILS) from production database(s) into an Extract File; create the destination tables in NewDB; transform / mask sensitive data into a Masked Extract File; populate TESTDB (CUST, ORD, DETL) through INSERT/UPDATE or QADB (CUST, ORD, DETL) through Load Files and LOAD.
- Extract data and/or object definitions in pre-masked file
- Use pre-masked extract file to create new set of tables
- Convert pre-masked extract file data into second, masked extract file
- Share masked extract file to be reused for population step
- Good practice for testing masks using COMPARE
Optim Data Privacy in Application Testing: most secure approach
Diagram: only users authorized to see private data extract a relationally intact subset (CUSTOMERS, ORDERS, DETAILS) from production database(s); sensitive data is transformed / masked during the extract, so the Extract File is already masked; TESTDB (CUST, ORD, DETL) is populated through INSERT/UPDATE or QADB (CUST, ORD, DETL) through Load Files and LOAD.
- Extract data only
- Convert during extract
- Extract file already contains masked data
- Can be shared with testers to reuse
Transformation Techniques
- String literal values
- Character substrings
- Random or sequential numbers
- Arithmetic expressions
- Concatenated expressions
- Date aging
- Lookup values
- Intelligence
IBM InfoSphere Optim Data Masking
De-identify sensitive information with realistic but fictional data
Example: JASON MICHAELS -> ROBERT SMITH. Personal identifiable information is masked with realistic but fictional data.
Requirements
- Protect confidential data used in test, training & development systems
- Mask data on screen in applications
- Implement proven data masking techniques
- Support compliance with privacy regulations
- Solution supports custom & packaged ERP applications
Benefits
- Protect sensitive information from misuse and fraud
- Prevent data breaches and associated fines
- Achieve better information governance
Contextually accurate masked data facilitates business processes
Goals: satisfy privacy regulations, reduce risk of data breaches, maintain value of test data
Masking techniques: string literal values; character substrings & concatenation; random or sequential numbers; arithmetic expressions; lookup values; business data types (CCN, NID); generic mask; dates; user defined
Data is masked with contextually correct data to preserve integrity of data.
Patient Information
Field | Original | Masked
Patient No. | 123456 | 112233
SSN | 333-22-4444 | 123-45-6789
Name | Erica Schafer | Amanda Winters
Address | 12 Murray Court | 40 Bayberry Drive
City | Austin | Elgin
State | TX | IL
Zip | 78704 | 60123
Referential integrity is maintained with key propagation.
Personal Info Table
PersNbr (original -> masked) | FirstName | LastName
08054 -> 10000 | Alice -> Jeanne | Bennett -> Renoir
19101 -> 10001 | Carl -> Claude | Davis -> Monet
27645 -> 10002 | Elliot -> Pablo | Flynn -> Picasso
Event Table
PersNbr (original -> masked) | FstNEvtOwn | LstNEvtOwn
27645 -> 10002 | Elliot -> Pablo | Flynn -> Picasso
27645 -> 10002 | Elliot -> Pablo | Flynn -> Picasso
Street Address/City/State/Zip Code Data Sets
1) Client is a bank that wishes to mask its assets by location
2) Optim provides corresponding Street Address/City/State/Zip Codes for masking
3) Leverage Multiple Column Replacement. The entire address row can be masked with a valid Coding Accuracy Support System (CASS) address using the enhanced random lookup function
Original data
Total Assets | Customers | Street | City | State | Zip Code
$534,674,233 | 54,999 | 12 Buttercup Ln | Cleveland | OH | 44101
$8,777,733,811 | 105,333 | 6767 Rte 10 S | Princeton | NJ | 08594
Address Lookup Table
Street | City | State | Zip Code
288 Helm St | Milwaukee | WI | 53201
12 Roden Dr | Los Angeles | CA | 90001
3526 Diamond Rd | Seattle | WA | 98101
12 Street Road | Las Vegas | NV | 89101
2 Applegarth Ln | Brunswick | ME | 04011
New table with masked data
Total Assets | Customers | Street | City | State | Zip Code
$534,674,233 | 54,999 | 3526 Diamond Rd | Seattle | WA | 98101
$8,777,733,811 | 105,333 | 21 Street Rd | Las Vegas | NV | 89101
First Names and Last Names Data Sets
1) Client is a university that wishes to mask the first and last name fields in its admissions database
2) Optim now has a first name lookup table with over 5,000 male/female names and a last name lookup table with over 80,000 names
3) Use lookup tables to randomly replace first and last names
Production Database
First Name | Last Name | GPA | High School | Advisor | State
Paul | Smith | 3.2 | Princeton | Johnson | NJ
Kate | Jones | 2.7 | Albany | Kline | NY
First Name Lookup Table: John, Bob, Danielle, Dave, Stacey
Last Name Lookup Table: Newton, Nelson, Kline, Howell, Reese
Test Database
First Name | Last Name | GPA | High School | Advisor | State
Dave | Nelson | 3.2 | Princeton | Johnson | NJ
Stacey | Reese | 2.7 | Albany | Kline | NY
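A lookup-based replacement can be sketched in a few lines. The slide describes a random replacement; the version below instead picks the lookup entry from a hash of the source value, a swapped-in technique chosen so that the same source name always yields the same substitute. The lookup lists are the short samples from the slide, and the helper names are hypothetical.

```python
import hashlib

FIRST_NAMES = ["John", "Bob", "Danielle", "Dave", "Stacey"]
LAST_NAMES = ["Newton", "Nelson", "Kline", "Howell", "Reese"]

def pick(lookup, source_value, salt):
    """Choose a lookup entry from a hash of the source value (repeatable)."""
    digest = hashlib.sha256(f"{salt}:{source_value}".encode("utf-8")).digest()
    return lookup[int.from_bytes(digest[:4], "big") % len(lookup)]

def mask_name(first, last):
    return pick(FIRST_NAMES, first, "first"), pick(LAST_NAMES, last, "last")

row = {"first": "Paul", "last": "Smith", "gpa": 3.2, "state": "NJ"}
masked_first, masked_last = mask_name(row["first"], row["last"])
# Only the name columns change; GPA, state and other columns pass through.
masked_row = {**row, "first": masked_first, "last": masked_last}
print(masked_row)
```

With lookup tables as small as these samples, many source names collapse onto the same substitute; the larger tables mentioned on the slide reduce that effect.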
Data Privacy Transformation Library Functions
- TRANS SSN: generates a valid and unique U.S. Social Security Number (SSN). By default, algorithmically generates a consistently altered destination SSN based on the source SSN. Can also generate a random SSN when the source data does not have an SSN value or when there is no need for transforming the source SSN in a consistent manner.
- TRANS CCN: generates a valid and unique credit card number (CCN). By default, randomizes the entire string; can also randomize parts of the credit card number (for example, preserving the card type).
- TRANS EML: generates a random e-mail address. An e-mail address consists of two parts, a user name followed by a domain name, separated by @. For example, user@domain.com.
Example: JASON MICHAELS -> ROBERT SMITH
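For credit card numbers, "valid" usually means at least that the number passes the Luhn check-digit test. The sketch below shows how a masked number can keep the leading digits (which identify the card type and issuer) while randomizing the rest and recomputing the check digit. It is an illustration of the idea only, not the TRANS CCN implementation, and unlike the library function it makes no attempt to guarantee uniqueness; the function names and the six-digit prefix are assumptions.

```python
import random

def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a string of digits."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number: str) -> bool:
    return luhn_check_digit(number[:-1]) == number[-1]

def mask_ccn(ccn: str, keep_prefix: int = 6, rng=None) -> str:
    """Keep the leading digits, randomize the rest, fix the check digit."""
    rng = rng or random.Random()
    body_len = len(ccn) - keep_prefix - 1
    body = "".join(str(rng.randrange(10)) for _ in range(body_len))
    partial = ccn[:keep_prefix] + body
    return partial + luhn_check_digit(partial)

# 4111111111111111 is a widely used test card number, not a real account.
masked = mask_ccn("4111111111111111", rng=random.Random(0))
print(masked[:6], luhn_valid(masked))  # 411111 True
```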
Without Key Propagation
Original data
Customers Table
Cust ID | Name | Street
08054 | Alice Bennett | 2 Park Blvd
19101 | Carl Davis | 258 Main
27645 | Elliot Flynn | 96 Avenue
Orders Table
Cust ID | Item # | Order Date
27645 | 80-2382 | 20 June 2004
27645 | 86-4538 | 10 October 2005
Masked without key propagation
Customers Table
Cust ID | Name | Street
10000 | Auguste Smith | Mars23
10001 | Claude Jones | Venus24
10002 | Pablo Adams | Saturn25
Orders Table (now these are orphans!)
Cust ID | Item # | Order Date
27645 | 80-2382 | 20 June 2004
27645 | 86-4538 | 10 October 2005
Masking with Key Propagation
Original data
Customers Table
Cust ID | Name | Street
08054 | Alice Bennett | 2 Park Blvd
19101 | Carl Davis | 258 Main
27645 | Elliot Flynn | 96 Avenue
Orders Table
Cust ID | Item # | Order Date
27645 | 80-2382 | 20 June 2004
27645 | 86-4538 | 10 October 2005
De-identified data (referential integrity is maintained)
Customers Table
Cust ID | Name | Street
10000 | Auguste Smith | Mars23
10001 | Claude Jones | Venus24
10002 | Pablo Adams | Saturn25
Orders Table
Cust ID | Item # | Order Date
10002 | 80-2382 | 20 June 2004
10002 | 86-4538 | 10 October 2005
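Key propagation amounts to building the old-key to new-key mapping once, on the parent table, and applying that same mapping to every child row. The sketch below reproduces the Cust ID values from the tables above with plain Python dictionaries; it masks only the key column (names and streets are left out for brevity) and is not Optim's propagation mechanism.

```python
customers = [
    {"cust_id": "08054", "name": "Alice Bennett"},
    {"cust_id": "19101", "name": "Carl Davis"},
    {"cust_id": "27645", "name": "Elliot Flynn"},
]
orders = [
    {"cust_id": "27645", "item": "80-2382"},
    {"cust_id": "27645", "item": "86-4538"},
]

# Build the old-key -> new-key mapping once, from the parent table.
key_map = {c["cust_id"]: str(10000 + i) for i, c in enumerate(customers)}

masked_customers = [{**c, "cust_id": key_map[c["cust_id"]]} for c in customers]
# Propagate the same mapping to every child row.
masked_orders = [{**o, "cust_id": key_map[o["cust_id"]]} for o in orders]

# An orphan is a child row whose key has no matching parent row.
parent_keys = {c["cust_id"] for c in masked_customers}
orphans = [o for o in masked_orders if o["cust_id"] not in parent_keys]
print(masked_orders[0]["cust_id"], len(orphans))  # 10002 0
```

Masking the parent key without applying `key_map` to the orders would leave both order rows pointing at 27645, which is the orphan situation shown on the previous slide.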
Using Custom Masking Exits
- Apply complex data transformation algorithms and populate the resulting value to the destination column
- Selectively include or exclude rows and apply logic to the masking process
- Valuable where the desired transformation is beyond the scope of supplied Column Map functions
- Example: generate a value for CUST_ID based on customer location, average account balance, and volume of transaction activity
Participate in the System z Expert and Superhero contest!
Fill in your answer to the question below on the scorecard and deposit your card in the box!
Which IBM software is key for delivering a completely masked test data subset?
A. Combination of answer B & D
B. InfoSphere Optim Test Data Management
C. InfoSphere Guardium
D. InfoSphere Optim Data Privacy
More information on zEnterprise
- IBM zEnterprise / System z Redbooks Portal: http://www.redbooks.ibm.com/portals/systemz
- IBM zEnterprise Announcement Landing Page: ibm.com/systems/zenterprise196
- IBM zEnterprise HW Landing Page: ibm.com/systems/zenterprise196
- IBM zEnterprise Events Landing Page: ibm.com/systems/breakthrough
- IBM Software: ibm.com/software/os/systemz/announcements
- IBM System Storage: ibm.com/systems/storage/product/z.html
- IBM Global Financing: ibm.com/financing/us/lifecycle/acquire/zenterprise/
- Global Technology Services: http://www.ibm.com/services/us/index.wss/offerfamily/gts/a1027714