Integrated Data Management: Discovering what you may not know




Eric Naiburg (ericnaiburg@us.ibm.com)

Agenda
- Discovering existing data assets is hard
- What is discovery?
- Discovery and archiving
- Discovery, test data management and data privacy
- Discovery and application consolidation and retirement
- Summary

Data management must drive competitive advantage
- 75% of CIOs believe they can strengthen their competitive advantage by better using and managing enterprise data.
- 78% of CIOs want to improve the way they use and manage their data, but only 15% believe their data is currently comprehensively well managed.
Source: Accenture CIO Data Management Survey, 2007 (n = 167 CIOs).

Through 2009, IT leaders and information architects must develop a vision for their future information architecture for technologies related to data management.*
*Source: Gartner Research, The Gartner Data Management and Integration Vendor Guide, 2009; Regina Casonato, Mark A. Beyer, Ted Friedman; April 24, 2009.

Innovation comes through integration
Information is related across the enterprise.
[Figure: channels, business units, providers, employers, partners and functions such as finance, administration, health plans, sales and marketing, contact centers, Internet, care management, ancillary services and new business development, each running its own applications and data stores (DB, CRM, DW, ODS, CIF, ERP and core systems).]

IBM Solutions for Integrated Data Management
An integrated, modular environment to manage enterprise application data and optimize data-driven applications, from requirements to retirement, across heterogeneous environments.

Optim is a platform for Integrated Data Management
[Figure: the solutions below span production databases and test and development databases.]

IBM InfoSphere Discovery
Value: automates analysis of data and data relationships for a complete understanding of data assets.
- Define the business objects for archiving and subsetting
- Identify all instances of private data so that they can be fully protected
- Discover undocumented business rules used to transform data from existing systems
- Prototype and test new transformations for the target system

IBM Optim Test Data Management Solution
Value: speed application delivery.
- Create realistic and manageable test environments
- Speed application delivery
- Improve test coverage
- Improve quality

IBM Optim Data Privacy Solution
Value: risk management.
- Protect personally identifiable information (PII)
- Apply a single data masking solution
- Leverage realistic data

IBM Optim Application Retirement Solution
Value: reduce infrastructure cost and support compliance.
- Decommission redundant or obsolete applications
- Retain access to historical data

IBM Optim Data Growth Solution
Value: improve application performance, reduce infrastructure costs and improve compliance.
- Retain only needed data and move the rest to archives
- Deploy tiered storage strategies
- Retain data according to its value
- Simplify infrastructure

Supporting enterprise environments
[Figure: Discovery, Test Data Management, Data Privacy, Data Growth and Application Retirement applied across the enterprise.]
Organizations' environments are diverse yet interrelated; therefore, whatever you use to manage the data MUST support your entire environment.

You can't manage what you don't understand
Distributed data landscape:
- Data is highly distributed over multiple applications, databases and platforms
- Data relationships are complex and poorly documented
Typical questions: Which clients are eligible for the new sales promotion? Which version of the data should we use for the ERP consolidation?
Relationships are not understood because:
- Corporate memory is poor
- Documentation is poor or nonexistent
- Logical relationships (enforced through application logic or business rules) are hidden

Impact of NOT understanding core information assets
- 83% of data integration projects either overrun or fail
- Inaccurate or incomplete data is a leading cause of failure in business intelligence and CRM projects
- 25% of time is spent clarifying bad data
- Low data quality costs companies $611 billion annually
- Defects that go undetected cost 10 to 100 times as much to fix later in the process
Consequences: scrap and rework, increased costs, lack of consumer confidence and lost opportunities.

Understand your distributed data landscape
IBM InfoSphere Discovery automates analysis of data and data relationships for a complete understanding of data assets:
- Identifies the relationships that link data elements into a business object within a source (for example, customer, counterparty or invoice)
- Identifies the complex logic that relates business objects across multiple sources

Automation accelerates time to deployment
Discovery is the first phase of information-centric projects.
- Data growth management: automates discovery of referential integrity and business objects
- Data consolidation, integration and migration: discovers transformation and business logic between data sources, and prototypes empty targets from the combination of many data sources
- Data privacy: discovers hidden sensitive data
What is unique:
- Analyzes data values and patterns and produces actionable results
- Discovers complex relationships within and between data sources
[Figure: the discovery phase feeding data growth, consolidation, transformation rule discovery and data privacy projects.]

InfoSphere Discovery
Requirements:
- Accelerate project deployment by automating discovery of your distributed data landscape
- Define business objects for archival and test data applications
- Discover data transformation rules and heterogeneous relationships
- Identify hidden sensitive data for privacy
Benefits:
- Automation of manual activities accelerates time to value
- Business insight into data relationships reduces project risk
- Provides consistency across information agenda projects

Re-use shareable business objects
- Group related tables into logical business objects
- Create a consistent sample set across business objects with a single click
- Re-use them as shared objects in InfoSphere Data Architect and Optim
[Figure: shared business objects used across enterprise projects: test data generation, application consolidation, data de-identification, data quality, data integration, data archival, master data management and data warehousing.]

Discovery for Data Archiving

Uncontrolled data growth impacts cost
Environment    Size
Production     500 GB
Training       500 GB
Unit Test      500 GB
Integration    500 GB
System Test    500 GB
UAT            500 GB
Total          3 TB

Optim Data Growth Solution mitigates cost
Environment    Size
Production     200 GB (current production data)
Training       200 GB
Unit Test      200 GB
Integration    200 GB
System Test    200 GB
UAT            200 GB
Total          1.2 TB (storage reduced by 60%)

Complete business objects are critical for data archiving
- A business object represents an application data record (for example, a payment, invoice or customer)
- It is a referentially intact subset of data across related tables and applications, and it includes metadata
- It provides a historical reference snapshot of business activity
- Federated extract support across enterprise data stores
[Figure: example "Payments" business object.]
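A minimal sketch of what "referentially intact subset" means in practice, assuming in-memory tables (lists of dicts) and already-known key relationships. It starts from chosen root rows, walks down to every dependent row, then walks up so that no foreign key in the subset points at a missing row. The function `extract_subset`, the table names and the data are hypothetical; this is an illustration of the idea, not Optim's extract logic.

```python
from collections import deque

# Hypothetical relationship format: (child_table, fk_column, parent_table).
# `pks` maps each table name to its primary-key column.


def extract_subset(tables, pks, relationships, root_table, root_ids):
    """Return a referentially intact subset starting from the chosen root rows."""
    index = {t: {row[pks[t]]: row for row in rows} for t, rows in tables.items()}
    selected = {t: set() for t in tables}

    # Phase 1: walk down from the root rows to every row that depends on them.
    queue = deque((root_table, key) for key in root_ids)
    while queue:
        table, key = queue.popleft()
        if key in selected[table] or key not in index[table]:
            continue
        selected[table].add(key)
        for child, fk, parent in relationships:
            if parent == table:
                for row in tables[child]:
                    if row[fk] == key:
                        queue.append((child, row[pks[child]]))

    # Phase 2: walk up so every foreign key in the subset points at a row that
    # is also in the subset. Parents added here are not expanded downward.
    changed = True
    while changed:
        changed = False
        for child, fk, parent in relationships:
            for key in list(selected[child]):
                ref = index[child][key][fk]
                if ref in index[parent] and ref not in selected[parent]:
                    selected[parent].add(ref)
                    changed = True

    return {t: [index[t][k] for k in sorted(keys)] for t, keys in selected.items()}


# Invented sample data: customers -> invoices -> payments.
tables = {
    "customer": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}],
    "invoice": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2},
                {"id": 12, "customer_id": 1}],
    "payment": [{"id": 100, "invoice_id": 10}, {"id": 101, "invoice_id": 11}],
}
pks = {"customer": "id", "invoice": "id", "payment": "id"}
relationships = [("invoice", "customer_id", "customer"),
                 ("payment", "invoice_id", "invoice")]

subset = extract_subset(tables, pks, relationships, "customer", [1])
print(subset)
```

Starting from invoice 11 instead pulls in customer 2 during the second phase, because otherwise the subset would contain an invoice whose customer is missing.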

Complete business object: the challenge
Where are they? What are they? How do I find them?

Complete business object: automated discovery solution
Automated discovery of primary and foreign keys.
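One common way to propose primary/foreign keys from data values alone is an inclusion test: a column is a candidate foreign key if all of its non-null values appear in a column that is unique and non-null elsewhere. The sketch below assumes small in-memory tables with invented names and data; it illustrates the general technique and is not a description of how InfoSphere Discovery works. Small integer columns can match another key by coincidence, so results are candidates to review, not facts.

```python
# Hypothetical sketch: propose primary/foreign-key pairs from data values alone,
# using a simple inclusion test. Tables are lists of dicts with the same columns.


def candidate_keys(rows):
    """Columns with no nulls and no duplicates: possible primary keys."""
    keys = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        if None not in values and len(set(values)) == len(values):
            keys.append(col)
    return keys


def find_foreign_keys(tables, min_distinct=2):
    """Return (child, column, parent, key) where the child column's values are a
    subset of the parent key's values."""
    key_values = {
        (table, key): {row[key] for row in rows}
        for table, rows in tables.items()
        for key in candidate_keys(rows)
    }
    found = []
    for child, rows in tables.items():
        for col in rows[0]:
            values = {row[col] for row in rows if row[col] is not None}
            if len(values) < min_distinct:
                continue  # too little evidence to propose a relationship
            for (parent, key), parent_values in key_values.items():
                if (parent, key) != (child, col) and values <= parent_values:
                    found.append((child, col, parent, key))
    return found


tables = {
    "customer": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}],
    "invoice": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1},
                {"id": 12, "customer_id": 3}],
    "payment": [{"id": 100, "invoice_id": 10}, {"id": 101, "invoice_id": 12}],
}
for pair in find_foreign_keys(tables):
    print(pair)
```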

Complete business object: automated discovery solution
Automated grouping of tables into business entities (for example, Payments). Optim automatically generates service definitions and requests based on these entities.
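A naive way to group tables into business entities, assuming the key pairs are already known: treat each discovered foreign key as an edge, take each connected group of tables as one candidate object, and call the tables that reference nothing else its roots. The function name and table names are hypothetical. A real tool has to do more than this; for example, a shared lookup table referenced by everything would merge unrelated objects in this simple version.

```python
from collections import defaultdict


def group_business_objects(tables, fk_pairs):
    """Group tables into candidate business objects.

    fk_pairs: (child_table, parent_table) pairs, e.g. from key discovery.
    Each connected group of tables becomes one object; its roots are the tables
    in the group that do not reference any other table.
    """
    neighbours = defaultdict(set)
    for child, parent in fk_pairs:
        neighbours[child].add(parent)
        neighbours[parent].add(child)
    children = {child for child, _ in fk_pairs}

    seen, objects = set(), []
    for table in tables:
        if table in seen:
            continue
        component, stack = set(), [table]
        while stack:  # depth-first walk over the undirected key graph
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(neighbours[current] - component)
        seen |= component
        objects.append({
            "roots": sorted(t for t in component if t not in children),
            "tables": sorted(component),
        })
    return objects


tables = ["customer", "invoice", "payment", "product", "price_history", "audit_log"]
fk_pairs = [("invoice", "customer"), ("payment", "invoice"), ("price_history", "product")]
for obj in group_business_objects(tables, fk_pairs):
    print(obj)
```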

InfoSphere Discovery for data archiving projects
- Analyze one or more data sources simultaneously
- Perform column analysis
- Identify primary and foreign keys
- Identify business objects
- Export business objects to Optim for archiving
Other:
- Generate referentially consistent sample sets
- Identify critical data elements and overlaps across data sources
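Column analysis usually starts with a few per-column statistics: row and null counts, distinct values, whether the column is unique (a key candidate), and the most common value patterns. The sketch below assumes string values and uses a common pattern convention (digits become 9, letters become A); the function names and the exact statistics are illustrative, not the product's output format. Mixed formats, such as SSNs stored with and without dashes, show up immediately in the patterns.

```python
from collections import Counter


def value_pattern(value):
    """Generalize a value to a pattern: digits -> 9, letters -> A, others kept."""
    return "".join("9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value)


def profile_column(values, top=2):
    """Basic column statistics of the kind used in column analysis."""
    present = [v for v in values if v not in (None, "")]
    patterns = Counter(value_pattern(v) for v in present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "unique": len(set(present)) == len(present),
        "top_patterns": patterns.most_common(top),
    }


print(profile_column(["123-45-6789", "138-27-1604", None, "154864196"]))
```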

Discovery for Data Privacy and Test Data Management

Optim Test Data Management mitigates cost
Environment    Size
Production     200 GB (current production data)
Training       25 GB
Unit Test      25 GB
Integration    25 GB
System Test    200 GB
UAT            25 GB
Total          500 GB (infrastructure reduced by 83%)
Creating right-sized, targeted test environments saves storage costs and speeds testing.
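The percentages on the storage slides follow from simple arithmetic over the six environment sizes. The check below assumes 1 TB = 1,000 GB, as the slide totals imply, and reads the test data management slide as keeping Production and System Test at 200 GB with the other four environments at 25 GB each, which is the reading that sums to the stated 500 GB. Both reductions are measured against the 3 TB uncontrolled baseline.

```python
ENVIRONMENTS = ["Production", "Training", "Unit Test", "Integration", "System Test", "UAT"]


def total_gb(sizes):
    """Total storage across all six environments, in GB."""
    return sum(sizes[env] for env in ENVIRONMENTS)


def reduction_pct(before_gb, after_gb):
    """Percentage reduction, rounded to a whole percent as on the slides."""
    return round(100 * (before_gb - after_gb) / before_gb)


uncontrolled = {env: 500 for env in ENVIRONMENTS}
data_growth = {env: 200 for env in ENVIRONMENTS}
test_data_mgmt = {"Production": 200, "System Test": 200,
                  "Training": 25, "Unit Test": 25, "Integration": 25, "UAT": 25}

baseline = total_gb(uncontrolled)  # 3000 GB, i.e. 3 TB
print(total_gb(data_growth), reduction_pct(baseline, total_gb(data_growth)))        # 1200 60
print(total_gb(test_data_mgmt), reduction_pct(baseline, total_gb(test_data_mgmt)))  # 500 83
```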

Rendering data unusable to protect privacy: masking
- Remove, mask or transform elements that could be used to identify an individual: name, address, telephone number, SSN or national identity number, credit card number
- Masked data must be appropriate to the context: within the permissible range of values, and application-aware
- Other names you may see for masking: obfuscation, scrambling, data de-identification, privacy
Example (credit card):
Before masking: 4212 5454 6565 7780, GOOD THRU 12/09, EUGENE V. WHEATLEY
After masking:  4536 6382 9896 5200, GOOD THRU 12/09, SANFORD P. BRIGGS
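One way to make a masked card number "appropriate to the context" is to preserve its length and grouping, keep it passing the Luhn checksum that applications commonly validate, and make it repeatable, so the same real number always masks to the same fictional number. The sketch assumes HMAC-SHA256 with a secret key as the pseudorandom source; the function names, the key and the choice to keep the first digit are illustrative assumptions, not Optim's masking algorithm. Taking `byte % 10` is slightly biased and collisions are not handled, which is acceptable for a test-data illustration but not for a production scheme. Name masking is not covered.

```python
import hashlib
import hmac


def luhn_check_digit(payload):
    """Digit that makes `payload + digit` pass the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # doubled once the check digit is appended on the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number):
    digits = "".join(c for c in number if c.isdigit())
    return luhn_check_digit(digits[:-1]) == digits[-1]


def mask_card(card, key, keep_prefix=1):
    """Deterministically replace a card number with a fictional, Luhn-valid one.

    Keeps the first `keep_prefix` digits and the original spacing; the other
    digits come from HMAC-SHA256(key, real number), so the same input and key
    always give the same output.
    """
    digits = [c for c in card if c.isdigit()]
    stream = hmac.new(key, "".join(digits).encode(), hashlib.sha256).digest()
    n_random = len(digits) - keep_prefix - 1  # last digit is the check digit
    body = digits[:keep_prefix] + [str(b % 10) for b in stream[:n_random]]
    masked = iter(body + [luhn_check_digit("".join(body))])
    return "".join(next(masked) if c.isdigit() else c for c in card)


key = b"example-secret-key"  # hypothetical; a real key belongs in a secrets store
print(mask_card("4212 5454 6565 7780", key))
```

Because the output depends only on the input and the key, running the same masking job against every test environment gives consistent values across tables and iterations.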

Optim Test Data Management & Data Privacy solutions
[Figure: data flows from production (PeopleSoft/DB2, Siebel/Oracle, custom applications on any DBMS) through subset, mask and propagate steps into test environments on the same platforms, with a validate-and-compare step.]
- Automate creation of a complete test environment
- De-identify for privacy protection:
  - Deploy multiple masking algorithms
  - Substitute real data with fictionalized yet contextually accurate data
  - Provide consistency across environments and iterations
  - No value to hackers
  - Enable off-shore testing
- Compare results to identify defects early

Using discovery to identify confidential data
Some instances of sensitive data are easy to recognize, but others are hidden:
- Compounded with other data elements in a row
- Broken apart and spread across multiple columns
- Buried within comment or text fields
Hidden instances of private data represent a potential compliance risk.

Sensitive data discovery
[Figure: known sensitive data held in a sensitive data repository is compared with the data below.]
Row      Member      SSN          Age  Phone           Sex
1        595846226   123-45-6789  15   (123) 456-7890  M
2        567472596   138-27-1604  8    (138) 271-6037  F
3        540450091   154-86-4196  22   (154) 864-1961  M
4        514714372   173-44-7900  55   (173) 447-8996  F
5        490204164   194-26-1648  4    (194) 261-6476  F
6        466861109   217-57-3046  66   (217) 573-0453  M
...
987,623  444629628   243-68-1812  25   (243) 681-8107  F
987,624  423456789   272-92-3629  87   (272) 923-6280  M
- Finding sensitive data elements (SDEs) in each system can take days
- Whole and partial SDEs can be found in hundreds of tables and fields

InfoSphere Discovery for sensitive data
- Analyze multiple data sources simultaneously
- Discover sensitive data at the push of a button by comparing known sensitive data with data in a wide variety of systems
- Export identified sensitive data elements (SDEs) to Optim for masking
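The comparison of known sensitive data with other systems can be pictured as a value-overlap test: normalize formatting, then flag any column where a large enough share of values appear in the repository of known values. The known SSNs below come from the sample table; the other tables, column names and the 0.5 threshold are invented for illustration, and this is a sketch of the general idea rather than the product's matching algorithm.

```python
import re


def normalize(value):
    """Strip formatting so '123-45-6789' and '123456789' compare equal."""
    return re.sub(r"\D", "", str(value))


def find_sensitive_columns(known_values, tables, threshold=0.5):
    """Flag (table, column, match_ratio) where enough values are known SDEs."""
    known = {normalize(v) for v in known_values}
    hits = []
    for table, rows in tables.items():
        for col in rows[0]:
            values = [normalize(r[col]) for r in rows if r[col] not in (None, "")]
            if not values:
                continue
            ratio = sum(v in known for v in values) / len(values)
            if ratio >= threshold:
                hits.append((table, col, round(ratio, 2)))
    return hits


known_ssns = ["123-45-6789", "138-27-1604", "154-86-4196"]  # the repository
tables = {
    "claims": [{"claim_id": "C1", "member_ssn": "123456789"},
               {"claim_id": "C2", "member_ssn": "138271604"}],
    "hr": [{"emp": "E1", "tax_id": "154-86-4196", "note": "ok"},
           {"emp": "E2", "tax_id": "999-99-9999", "note": "n/a"}],
}
print(find_sensitive_columns(known_ssns, tables))
```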

InfoSphere Discovery for hidden sensitive data
- Automates discovery of complex business rules between data sources
- Finds sensitive data hidden within longer fields (for example, an SSN hidden in a 46-digit routing number)
- Finds sensitive data that has been divided across multiple columns (for example, an SSN split into three separate columns)
- Finds sensitive data that has been transformed (for example, items converted into codes)
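Two of the hidden cases above can be sketched with brute force: search the digits of a longer field for known SSNs, and try ordered combinations of columns to see whether their concatenation reproduces a known SSN in every row. The field contents, column names and function names are invented; the transformed (coded) case needs rule discovery and is not shown. Substring matching on long digit strings can produce false positives, and the permutation search grows quickly with the number of columns, so a real tool would need to prune far more aggressively.

```python
import re
from itertools import permutations


def embedded_matches(field, known_ssns):
    """Known SSNs that appear inside a longer field's digits."""
    digits = re.sub(r"\D", "", field)
    return sorted(s for s in known_ssns if s in digits)


def split_ssn_columns(rows, columns, known_ssns, parts=3):
    """Column orderings whose concatenated values form a known SSN in every row."""
    return [
        combo
        for combo in permutations(columns, parts)
        if all("".join(str(row[c]) for c in combo) in known_ssns for row in rows)
    ]


known_ssns = {"123456789", "138271604"}
print(embedded_matches("ROUTE 0012345678900441", known_ssns))

rows = [
    {"area": "123", "group": "45", "serial": "6789", "zip": "10001"},
    {"area": "138", "group": "27", "serial": "1604", "zip": "20002"},
]
print(split_ssn_columns(rows, ["area", "group", "serial", "zip"], known_ssns))
```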

Discovery for Application Retirement and Data Migration

Keep data available
Common scenarios:
- Consolidate multiple applications into a single instance and retire unused applications
- Move from a home-grown to a packaged system (for example, a custom-built general ledger to PeopleSoft Financials)
- Consolidate similar systems after mergers and acquisitions
- Consolidate an independent business process with others: move its automation capabilities into a single system and retire the independent application
- Move an application from an old to a new architecture: not all data is relevant for the move, but it must be retained
- Shut down a legacy system without a replacement
In almost ALL cases, access to legacy data MUST be retained while the application and database are eliminated.

Before application retirement and consolidation: what you must know
[Figure: legacy application data flows to an archive and to the new application, together with data from other applications.]
- What business objects and data structures are needed for intelligent archiving?
- How does the legacy data map to the new application's data structures?
- How do other related applications map to the new application?

Discover the business objects
[Figure: legacy application data, archive, data from other applications, new application.]
Discovery automates the identification of referential integrity and business objects to accelerate time to deployment for archiving.

Map the legacy data to the consolidated application
[Figure: legacy application data, archive, data from other applications, new application; this slide focuses on how the legacy data maps to the new application's data structures.]

Data migration and consolidation is extremely difficult
- What is in each data source?
- What are the matching keys used to align the rows?
- Which sources do you trust?
- How do you combine the columns in the new application?

InfoSphere Discovery for unified schema prototypes
Prototype the migration of one or more sources into a new target application:
- Align columns: map sources to the new schema
- Align rows: analyze matching keys
- Match and merge: analyze conflict detection and resolution rules, identify trusted sources, and generate matched and merged prototypes
- Generate actionable rules for migrating data to the new application (SQL and FastTrack)
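The row-alignment and match-and-merge steps can be sketched in a few lines, assuming the columns have already been mapped to the target schema: rows are aligned on a matching key, columns whose sources disagree are reported as conflicts, and each value is taken from the most trusted source that has one. The source names, columns and precedence order are invented for illustration; real conflict-resolution rules are usually richer than a single precedence list.

```python
def match_and_merge(sources, key, precedence):
    """Align rows on `key`, report conflicting values, keep the most trusted one.

    sources: {source_name: [row_dict, ...]} already mapped to the target columns.
    precedence: source names, most trusted first (must list every source).
    """
    by_source = {name: {row[key]: row for row in rows} for name, rows in sources.items()}
    all_keys = sorted({k for rows in by_source.values() for k in rows})
    merged, conflicts = {}, []
    for k in all_keys:
        record = {key: k}
        columns = sorted({c for rows in by_source.values() for c in rows.get(k, {}) if c != key})
        for col in columns:
            # Non-null values for this column from every source that has the row.
            candidates = {name: rows[k][col] for name, rows in by_source.items()
                          if k in rows and rows[k].get(col) is not None}
            if len(set(candidates.values())) > 1:
                conflicts.append((k, col, candidates))
            for name in precedence:
                if name in candidates:
                    record[col] = candidates[name]
                    break
        merged[k] = record
    return merged, conflicts


sources = {
    "crm": [{"cust_id": 1, "email": "a@x.com", "phone": None},
            {"cust_id": 2, "email": "b@x.com", "phone": "555-0102"}],
    "billing": [{"cust_id": 1, "email": "a@old.com", "phone": "555-0101"},
                {"cust_id": 3, "email": "c@x.com", "phone": "555-0103"}],
}
merged, conflicts = match_and_merge(sources, "cust_id", ["crm", "billing"])
print(merged)
print(conflicts)
```

Customer 1 takes its email from the trusted CRM source but its phone from billing, because CRM has no phone value for that row.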

Map other applications to the new application
[Figure: legacy application data, archive, data from other applications, new application; this slide focuses on how other related applications map to the new application.]

Mapping data is very difficult
- How will we get data from our other applications into the new application?
- How do I know I have the same transaction across applications?
- What is the matching key that will align the rows across applications?
- What happens if the data formats and structures are different?
- What transformation logic do we need to map the new application to existing applications?

InfoSphere Discovery Transformation Analyzer automates data mapping
Example rule discovered across distributed enterprise structured data:
  If age < 18 and Sex = M then Demo1 = 0
  If age < 18 and Sex = F then Demo1 = 1
  If age >= 18 and Sex = M then Demo1 = 2
  If age >= 18 and Sex = F then Demo1 = 3
What is unique: it discovers cross-system business rules, transformations and data exceptions by examining data values.
The Transformation Analyzer automates:
- Discovery of cross-system business rules and transformations
- Discovery of data inconsistencies
- Detailed data mapping between two data sources
- Discrepancy discovery
- A cross-source troubleshooting workbench
Applicability:
- Map legacy applications to newly deployed applications
- Discover cross-source rules for data consolidation
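A heavily simplified sketch of discovering the Demo1 rule from data values: given aligned (age, sex, Demo1) rows, take the majority Demo1 value for each (age band, sex) group as the candidate rule, then list the rows that disagree as data exceptions. Here the input columns and the age cutoff of 18 are supplied up front, whereas a discovery tool would have to search for them; the function names and the Demo1 values in the sample rows are invented (the ages are borrowed from the sample table).

```python
from collections import Counter, defaultdict


def infer_rule(pairs, age_cutoff=18):
    """Majority Demo1 value for each (age band, sex) group in aligned rows."""
    votes = defaultdict(Counter)
    for age, sex, demo1 in pairs:
        votes[(age < age_cutoff, sex)][demo1] += 1
    return {group: counts.most_common(1)[0][0] for group, counts in votes.items()}


def find_exceptions(pairs, rule, age_cutoff=18):
    """Aligned rows whose target value disagrees with the inferred rule."""
    return [(age, sex, demo1) for age, sex, demo1 in pairs
            if rule.get((age < age_cutoff, sex)) != demo1]


# (age, sex) from the source system, Demo1 from the target, aligned by a matching key.
pairs = [(15, "M", 0), (12, "M", 0), (8, "F", 1), (4, "F", 1), (22, "M", 2),
         (66, "M", 2), (55, "F", 3), (87, "F", 3), (25, "F", 1)]
rule = infer_rule(pairs)
print(rule)
print(find_exceptions(pairs, rule))
```

The single exception, a 25-year-old female coded 1 instead of 3, is the kind of cross-system inconsistency the slide describes.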

IBM solutions manage costs, speed success and reduce risk
Reported results: 10-20x time savings identifying data objects; 30-40% storage savings; 40-75% performance boost; 96% time savings; 2x the data protected.
- InfoSphere Discovery automates analysis of data and data relationships for a complete understanding of data assets, identifying the relationships that link data elements into a business object within a source and discovering sensitive data.
- Optim Data Growth Solution reduces the size of production databases, improving application performance, reducing hardware and software costs, and maintaining adherence to data governance regulations and policies.
- Optim Test Data Management Solution creates right-sized test environments to reduce data propagation and related hardware and software costs, while increasing team efficiency by significantly speeding the creation of test environments.
- Optim Data Privacy Solution protects the confidentiality of data in non-production environments such as test through intelligent de-identification (masking), making data worthless if lost or stolen.

Summary
- You don't know what you don't know, and that is usually what will hurt you.
- Data-centric projects require extensive knowledge of existing systems, and the most cost- and time-effective way to achieve it is through automation.
- IBM InfoSphere Discovery automates analysis of data and data relationships for a complete understanding of data assets, to speed time to project success.
