Data Domain Discovery in Test Data Management

© 1993-2016 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.

Abstract

You can run profiles to discover primary keys, entities, and data domains in Test Data Management (TDM). This article documents the steps to perform data domain discovery in TDM.

Supported Versions

- Test Data Management 9.6.0
- Persistent Data Masking and Data Subset 9.5.2 and HotFixes

Table of Contents

Overview
Profiling in TDM
Example Scenario
Prerequisites
Overview of the Steps
Step 1. Create a Project and Add a Data Source to the Project
Step 2. Create a Data Domain
Step 3. Add the Data Domain to a Policy
Step 4. Add the Policy to the Project
Step 5. Create and Run a Profile
Step 6. Monitor the Profile Job from the Monitor View
Step 7. Review and Apply the Profile Results
Apply the Profile Results to Source Data

Overview

Run a profile in TDM to understand your data before you perform subset or masking operations. Use the results to apply masking rules to multiple columns at a time, or use them instead of manually configuring data subset entities in a subset operation.

This document describes how to run a profile on source data to discover data domains. A data domain profile identifies source columns that contain similar data and assigns the same data domain to the columns. A data domain contains a regular expression that defines patterns in the data or patterns in column names. When you run the profile, the profile finds the columns that match the criteria in the regular expression contained in the data domain.

When you configure a profile for data domain discovery, select the tables to search and the data domains to search for in those tables. You can select policies that contain data domains instead of selecting each data domain individually.

After you run the profile for data discovery, you can view the profile results. The profile results assign source columns to data domains. You can choose which profile results to use when you mask the data in TDM.
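The matching that a data domain profile performs can be pictured with a short Python sketch. This is an illustration only, not how TDM is implemented: the DataDomain class, the discover function, the patterns, the sample values, and the rule that a column matches on data when a minimum percentage of sampled values conform are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class DataDomain:
    """Hypothetical stand-in for a TDM data domain; not a TDM API."""
    name: str
    data_pattern: str      # regular expression tested against column values
    metadata_pattern: str  # regular expression tested against column names


def discover(columns, domain, min_conformance=80.0, max_rows=1000):
    """Return the columns that match a data domain.

    columns maps a column name to a list of sampled values. A column is
    reported if its name matches the metadata pattern, or if at least
    min_conformance percent of the sampled values match the data pattern.
    """
    name_re = re.compile(domain.metadata_pattern)
    data_re = re.compile(domain.data_pattern)
    matches = {}
    for column_name, values in columns.items():
        sample = values[:max_rows]
        name_match = name_re.search(column_name) is not None
        hits = sum(1 for value in sample if data_re.search(str(value)))
        conformance = 100.0 * hits / len(sample) if sample else 0.0
        if name_match or conformance >= min_conformance:
            matches[column_name] = {
                "name_match": name_match,
                "conformance": conformance,
            }
    return matches


# Illustrative patterns and sample values, invented for this sketch.
domain = DataDomain("DD_Example", r"^\d{8}$", r"(?i)acct")
columns = {
    "ACCOUNT_ID": ["10023001", "10023002", "10023003", "1002300X"],
    "ACCT_REF": ["A-1", "A-2"],
    "NAME": ["Pune", "Mumbai"],
}
result = discover(columns, domain, min_conformance=70.0)
print(result)
```

In this sketch, ACCOUNT_ID is reported because three of its four sampled values match the data pattern, ACCT_REF is reported because of its name alone, and NAME is not reported.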

Profiling in TDM

Understanding the data before you use it in a masking or subset operation can help you mask similar data with similar rules or discover entities that you can use in a subset operation. You can use the data discovery feature in TDM to run profiles to understand the data. You can run data domain profiles, entity profiles, and primary key profiles in TDM.

In TDM, you can run profiles only on relational sources. You can run profiles on the following relational source types:

- Oracle
- Microsoft SQL Server
- IBM DB2
- Sybase
- Teradata

Use the ODBC connection type to connect to Sybase and Teradata sources.

You can also create and run profiles in PowerCenter and Informatica Data Quality, export the results, and then import the profile results into TDM. In these products, you can run profiles on both relational and nonrelational sources. To view the results of a profile on nonrelational sources in the TDM UI, run the profile in PowerCenter or Informatica Data Quality and import the results into the TDM repository.

TDM supports certain features of profiling in Informatica Data Quality. You can import the results of the profiles that use supported features. TDM supports the Enterprise Discovery Profile and Profile options. You can import the results of data discovery performed with mapplets created using simple regular expressions. You cannot import results if the mapplets use labeler, tokenizer, content set, or reference tables.

You can import and view domain discovery results of profiles run in the project. You can import the results of profiles created in folders within the project, but you cannot view the results in TDM.

Tables that you use in a profile must have the same connection as when the table was imported into the repository. If you use a different connection in the profile, you might encounter unexpected results.

You cannot use two tables with the same name in a profile. If a project contains more than one table with the same name, you must run a separate profile for each of the tables.

Example Scenario

Company X provides data analytics services to its customers. It works with large volumes of data, including sensitive data, received from its customers. Company X uses TDM to mask the sensitive data before using the data on its systems.

Applying masking rules to individual columns of data can be tedious, given the volume of data. Multiple columns in different tables might contain data that needs to be masked with the same rules, so it would help to apply the masking rules to multiple columns at a time. Because all sensitive data must be masked, Company X must first identify every column that contains sensitive data.

Company X uses TDM to run a data domain profile to understand which columns in which tables have similar data and what kind of data they contain. The profile results list columns that belong to a data domain, based on either column title or column content. Company X can analyze the results and apply masking to all columns in a data domain.

In this example, we perform data domain discovery on the following tables:

Tables: ACCOUNT, CITY, BRANCHES, BANKACCOUNTS

Columns: ACCOUNT_ID, ACCOUNT_TYPE, BRANCH_ID, MIN_BALANCE, CITY_ID, NAME, STATE_ID, TYPE, BRANCH_ID, CITY_ID, ACCOUNT_ID, EMP_ID, SNO

The objective is to find all columns that have sensitive data like names and account numbers. We can then apply the same masking rules to all columns with names and to all columns with account numbers at the same time.

To achieve this, perform the following tasks:

- Identify masking rules for names and account numbers.
- Add the rules to a data domain.
- Add the data domain to a policy in the project.
- Perform data domain discovery on the data.
- Analyze the results and directly apply the masking rules in the policy to the columns in the data domain.

Prerequisites

Before you run a profile in TDM, ensure that the following prerequisites are in place:

- Install and configure the compatible version of Informatica services. TDM 9.5.2 and hotfixes work with Informatica services 9.5.1. TDM 9.6.0 works with Informatica services 9.6.0. If you have installed Informatica services 9.5.1, you must install EBF 12070.
- Install and configure TDM.
- Ensure that data discovery is enabled in TDM.
- Create the required connections in TDM.

For more information about product requirements and supported platforms, see the Product Availability Matrix on Informatica Network: https://network.informatica.com/community/informatica-network/product-availability-matrices/overview

Overview of the Steps

To discover data domains in source data, perform the following tasks:

1. Create a project and add a data source to the project.
2. Create a data domain.
3. Add the data domain to a policy.
4. Add the policy to the project.
5. Create and run a profile.
6. Monitor the job from the Monitor view.
7. Review and apply the profile results.

Step 1. Create a Project and Add a Data Source to the Project

Create a project PROJECT_PROFILE_DD and add the data sources.

1. In Test Data Manager, click Projects. A list of projects appears.
2. Click Actions > New.
3. In the Create Project dialog box, enter project properties. The following table describes the project properties:

   Option: Description
   Name: The name of the project. Enter PROJECT_PROFILE_DD.
   Description: The description of the project.
   PowerCenter Repository: The name of the PowerCenter repository to store the repository folder.
   Folder: The name of the project folder in the repository. Default is the project name. You can choose another folder in the repository.
   Owner: The name of the user that owns the folder. The folder owner has all permissions on the folder. The default owner is the user who created the folder. You can select another user as the folder owner.

4. Click OK. The properties of the project appear in Test Data Manager.
5. Click Actions > Import Metadata. The Import Metadata window appears.
6. Choose Datasource Connection and select the database connection from the list. This imports metadata from a database connection.
7. Click Next.
8. Select the schema to import. You can filter schemas by schema name.
9. Click Next.
10. Select the tables to import. You can filter the tables by data source, table name, or table description.
11. Click Next.
12. Choose Import Now. This imports the data source immediately.
13. Click Finish.

You can view the progress of the import job in the Monitor view. After the job completes, you can access the imported metadata through the Data Sources details view.

Step 2. Create a Data Domain

A data domain is an object that represents the functional meaning of a column based on the name of the column or the data the column contains. When you create a data domain, you create a data expression that describes the data format of the column that you want to mask. You can also create multiple metadata expressions that describe probable column names. After you define a data domain, add masking rules to the data domain.

Identify masking rules for names and account numbers.

1. Click Policies from the home page. The Policies view shows a list of policies, data domains, and rules in the TDM repository.
2. Click Actions > New > Data Domain.
3. Enter the name, sensitivity level, and description for the data domain. For this example, name the data domain DD_Names. Click Next.
4. Enter a regular expression to filter columns by the pattern of the column data. Enter the following data pattern:
   ^(\d){8}$
5. Click Next to add regular expressions that filter columns by column name. You can add multiple expressions.
6. Enter regular expressions to filter columns by column name. Enter the following metadata pattern:
   (?i)(a(cct|ccount)_?(number|num|nbr|no))
7. Click Next to apply preferred masking rules to the data domain.
8. To add preferred masking rules to the data domain, click Add Rules. The Add Rules dialog box appears.
9. Select the data masking rules to add. Add the two rules identified for names and account numbers to the data domain.
10. Click OK.
11. Enable one rule as the default rule.
12. Click Finish. The data domain properties page appears. View the data and metadata patterns and the rules included in the data domain.

Step 3. Add the Data Domain to a Policy

You cannot add a data domain directly to a project. Add the data domain to an existing policy that you can add to a project, or create a new policy to add the data domain. In this example, create a new policy Policy_Names and add the data domain created in the previous step to the policy.

1. In the Policies view, click Actions > New > Policy. The New Policy dialog box appears.
2. Enter a name and optional description for the policy and click Next. For this example, name the policy Policy_Names.
3. To add data domains to the policy, click Add Data Domains.
4. From the list, select the data domain DD_Names.
5. Click Finish. The policy appears in the Policies view.
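Before you enter the Step 2 patterns in Test Data Manager, you might try them against sample values. The following Python sketch does this with the standard re module. It assumes that the metadata pattern separates its options with | alternation and that the regular expression engine in TDM treats these constructs the same way Python does. The column names and values are invented for the example.

```python
import re

# Patterns from Step 2. The "|" alternation in the metadata pattern is
# an assumption about what the original expression contained.
DATA_PATTERN = re.compile(r"^(\d){8}$")
METADATA_PATTERN = re.compile(r"(?i)(a(cct|ccount)_?(number|num|nbr|no))")


def matches_data(value):
    """True if a column value is exactly eight digits."""
    return DATA_PATTERN.search(value) is not None


def matches_name(column_name):
    """True if a column name contains an account-number style token."""
    return METADATA_PATTERN.search(column_name) is not None


# Hypothetical column names and values used only to exercise the patterns.
name_checks = {
    name: matches_name(name)
    for name in ["ACCT_NO", "account_number", "AccountNum", "ACCOUNT_ID", "BRANCH_ID"]
}
data_checks = {
    value: matches_data(value)
    for value in ["12345678", "1234567", "1234-5678"]
}

for name, ok in name_checks.items():
    print(f"{name:15} name match: {ok}")
for value, ok in data_checks.items():
    print(f"{value:15} data match: {ok}")
```

In this sketch, ACCT_NO, account_number, and AccountNum match the metadata pattern, while ACCOUNT_ID and BRANCH_ID do not. Under these assumptions, a column named ACCOUNT_ID would be found only through the data pattern.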

Step 4. Add the Policy to the Project

To use a policy in a profile, you must add the policy to the project in which you create and run the profile. In this example, add the policy Policy_Names to the project PROJECT_PROFILE_DD.

1. To access the projects, click Projects. A list of projects appears.
2. Select the project that you want to edit. Open the project PROJECT_PROFILE_DD. The project opens in a separate tab with project properties and details about data sources.
3. In the Policies view, click Actions > Add Policies.
4. In the Add Policies page, browse to and select the policy, and click OK.

The policy is added to the project. You can view the data domains and the preferred rules that are added to the policy.
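The relationship between projects, policies, data domains, and rules can be modeled as nested containers. The Python sketch below is a hypothetical model, not a TDM API: the class names, the available_data_domains method, and the rule names Rule_A and Rule_B are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataDomain:
    """A data domain with its preferred masking rules (hypothetical model)."""
    name: str
    preferred_rules: List[str] = field(default_factory=list)
    default_rule: Optional[str] = None


@dataclass
class Policy:
    """A policy groups data domains so that they can be added to a project."""
    name: str
    data_domains: List[DataDomain] = field(default_factory=list)


@dataclass
class Project:
    """A project reaches data domains only through its policies."""
    name: str
    policies: List[Policy] = field(default_factory=list)

    def available_data_domains(self):
        """Names of the data domains a profile in this project could use."""
        return [d.name for p in self.policies for d in p.data_domains]


# Rule names are invented for illustration.
dd_names = DataDomain("DD_Names", ["Rule_A", "Rule_B"], default_rule="Rule_A")
policy = Policy("Policy_Names", [dd_names])
project = Project("PROJECT_PROFILE_DD")

before = project.available_data_domains()  # no policy added yet
project.policies.append(policy)            # corresponds to Step 4
after = project.available_data_domains()
print(before, after)
```

The model reflects why Step 3 and Step 4 are both needed: until a policy that contains the data domain is added to the project, the project has no data domains for a profile to search for.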

Step 5. Create and Run a Profile

After you create the required data domain and add its policy to the project, create and run a data domain profile in the project. For this example, create a profile Profile_DD_Names in the project PROJECT_PROFILE_DD.

A project must contain policies before you can create a data domain profile. The policies contain the data domains that you can use in a profile for data discovery. Before you perform this step, ensure that you have added the policy to the project.

Perform data domain discovery on source data to identify data that matches the data format defined in the data domain. Then apply masking rules in the data domain to the columns that match the data domain.

1. Open the project and click the Discover view.
2. Click the Profile view. The Profile view shows a list of profiles in the project.
3. Click Actions > New Profile to create a new profile.
4. In the New Profile dialog box, enter the profile name and description. To create a data domain profile, select the Data Domain check box.
5. Click Add Tables to add tables to the profile.
6. Select the tables that you want to add and click OK. Select tables ACCOUNT, BANKACCOUNTS, BRANCHES, and CITY.
7. Click Next.
8. In the Select Sampling Options pane, choose whether to add policies or data domains to the profile. When you select a policy, Test Data Manager includes all the data domains in the policy. Test Data Manager returns a list of policies or data domains in the pane.
9. Select the policies or the data domains to profile. Select the policy Policy_Names.
10. In the Sampling panel, select whether to run data discovery on the source data, the column name, or the data and the column name. You can run a profile for column metadata and then run it again for the source data. For this example, run the profile on the data and column name.
11. Enter the maximum number of rows to include in the profile.
12. Enter the minimum conformance percent. Not all rows might conform to the data domain expression pattern, so you can enter the minimum percentage of rows that must conform.
13. Click Save.
14. Click Actions > Execute.
15. Select the connection for the data source. Use the same connection that you used to import the table in the repository.
16. Click Execute.

Step 6. Monitor the Profile Job from the Monitor View

After you run the profile, you can check the status of the profile job.

1. Open the project and click Monitor. The list of jobs for the project appears.
2. Select the job to view the job details in the Properties pane. The status updates when the job finishes.
3. Click the job ID to view the logs on the job log page. The TDM server generates a log file that you can view to debug problems when a TDM job fails.

Step 7. Review and Apply the Profile Results

The data domain profile results show a list of source columns and valid data domains to assign to the columns. You can select which data domain candidates to use for data masking from the profile results.

1. Close the profile and open it again.
2. Click the Data Domain view.
3. Select a column and click the Data Preview tab to view the source data of the selected column. The data viewer displays the first 200 records of the columns returned by the data domain profile.
4. Verify the data domain suggested in the Profiled Data Domain column in the results.
5. Select Approve or Reject from the Status column and click Save to approve or reject the data domain.
6. Repeat this for all rows. You can assign rules in the data domain to each column after you approve the suggested data domain.
7. Mark the data domain classification as completed after you finish approving all the results. Click Actions > Mark domain classification as completed. Completing the data domain classification does not affect any process. Use this method to verify that you reviewed all the results.

Apply the Profile Results to Source Data

You can assign the rules in the data domain to each column after you approve the suggested data domain in the profile results. Assign rules in the Define Data Masking view. The preferred rules for the data domain appear at the top of the list in the Masking Rule column. You can apply the default data domain rules to multiple columns at a time.

In the project, click Define Data Masking to access the Data Masking view.

To assign rules to one column at a time, perform the following steps:

1. Select a column to assign a masking rule.
2. If the Domain value is blank for the column, click the Policy column and choose a policy that contains the data masking rule that you want to assign.
3. Click inside the Masking Rule column to view the list of available rules. The data domain preferred rules appear at the top of the list. The other rules in the policy appear at the bottom of the list.
4. Select a masking rule.
5. Click Save for each column that you update.

To assign default data domain rules to multiple columns at the same time, perform the following steps:

1. Select the columns to which you want to assign default values.
2. Click Rule Assignment. The Rule Assignments dialog box appears.
3. Select the columns to which you want to apply the default values. You can select the check box at the top of the dialog box to select all rows.
4. Click Default Assignments. Test Data Manager updates each column with the default rule.
5. Click Save.
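The effect of Default Assignments can be sketched as a lookup from each approved column to the default rule of its data domain. The Python function below is a hypothetical illustration, not TDM code: the record layout, the status values, the column-to-domain results, and the rule name Rule_A are assumptions.

```python
def assign_default_rules(profile_results, default_rules):
    """Map each approved column to the default rule of its data domain.

    profile_results is a list of dicts with the keys "column",
    "data_domain", and "status". default_rules maps a data domain name
    to the name of its default masking rule. Columns that were not
    approved, or whose domain has no default rule, are skipped.
    """
    assignments = {}
    for row in profile_results:
        if row["status"] != "Approved":
            continue
        rule = default_rules.get(row["data_domain"])
        if rule is not None:
            assignments[row["column"]] = rule
    return assignments


# Invented review outcome for three columns of the example tables.
profile_results = [
    {"column": "ACCOUNT_ID", "data_domain": "DD_Names", "status": "Approved"},
    {"column": "EMP_ID", "data_domain": "DD_Names", "status": "Approved"},
    {"column": "CITY_ID", "data_domain": "DD_Names", "status": "Rejected"},
]
default_rules = {"DD_Names": "Rule_A"}

assignments = assign_default_rules(profile_results, default_rules)
print(assignments)
```

Only the approved columns receive the default rule in this sketch, which mirrors the requirement to approve a suggested data domain before you assign its rules.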

Author

Sadhana Kamath
Senior Technical Writer

Acknowledgements

The author would like to acknowledge Praveen Parupudi, Senior Software Engineer, for his technical assistance.