IBM InfoSphere Discovery: The Power of Smarter Data Discovery

Similar documents

Integrated Data Management: Discovering what you may not know

Integrating Netezza into your existing IT landscape

IBM InfoSphere Information Server Ready to Launch for SAP Applications

IBM InfoSphere Optim Test Data Management

Beyond the Single View with IBM InfoSphere

Enterprise Information Management Services Managing Your Company Data Along Its Lifecycle

Whitepaper Data Governance Roadmap for IT Executives Valeh Nazemoff

IBM Software Five steps to successful application consolidation and retirement

JOURNAL OF OBJECT TECHNOLOGY

Master Your Data and Your Business Using Informatica MDM. Ravi Shankar Sr. Director, MDM Product Marketing

MDM and Data Warehousing Complement Each Other

IBM InfoSphere Optim Test Data Management Solution

IBM Data Warehousing and Analytics Portfolio Summary

appmdmtm MASTER DATA MANAGEMENT

Getting started with a data quality program

Service Oriented Data Management

IBM Software A Journey to Adaptive MDM

Key New Capabilities Complete, Open, Integrated. Oracle Identity Analytics 11g: Identity Intelligence and Governance

Test Data Management in the New Era of Computing

Master Data Management

Enabling Data Quality

Microsoft. Course 20463C: Implementing a Data Warehouse with Microsoft SQL Server

What s New with Informatica Data Services & PowerCenter Data Virtualization Edition

Course Outline. Module 1: Introduction to Data Warehousing

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015

Continuing the MDM journey

Implementing and Executing Data Governance & Quality Strategy at Northern Trust

Industry Models and Information Server

DATA GOVERNANCE AND DATA QUALITY

Implementing a Data Warehouse with Microsoft SQL Server MOC 20463

COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

iway Roadmap Michael Corcoran Sr. VP Corporate Marketing

Master Data Management What is it? Why do I Care? What are the Solutions?

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

DECOMMISSIONING CASE : SIEBEL UCM TO INFORMATICA MDM HUB

Introduction. Architecture Re-engineering. Systems Consolidation. Data Acquisition. Data Integration. Database Technology

Harness the value of information throughout the enterprise. IBM InfoSphere Master Data Management Server. Overview

SAP BusinessObjects Information Steward

Implementing a Data Warehouse with Microsoft SQL Server

Using SAP Master Data Technologies to Enable Key Business Capabilities in Johnson & Johnson Consumer

<Insert Picture Here> Master Data Management

IBM Solution Framework for Lifecycle Management of Research Data IBM Corporation

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

THOMAS RAVN PRACTICE DIRECTOR An Effective Approach to Master Data Management. March 4 th 2010, Reykjavik

Implementing a Data Warehouse with Microsoft SQL Server

Informatica PowerCenter Data Virtualization Edition

Implementing a SQL Data Warehouse 2016

Bringing agility to Business Intelligence Metadata as key to Agile Data Warehousing. 1 P a g e.

Adopting the DMBOK. Mike Beauchamp Member of the TELUS team Enterprise Data World 16 March 2010

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Introduction to Glossary Business

Test Data Management Concepts

Business-driven governance: Managing policies for data retention

An RCG White Paper The Data Governance Maturity Model

A WHITE PAPER By Silwood Technology Limited

Data Virtualization and ETL. Denodo Technologies Architecture Brief

Capability Statement. Enterprise Data Management with the Datalynx Data Management Suite

Implementing a Data Warehouse with Microsoft SQL Server

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

IBM Analytics Make sense of your data

James Serra Data Warehouse/BI/MDM Architect JamesSerra.com

<Insert Picture Here> Oracle Master Data Management Strategy

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

Data Integration Checklist

Luncheon Webinar Series May 13, 2013

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

SAP Data Services 4.X. An Enterprise Information management Solution

DATA GOVERNANCE AT UPMC. A Summary of UPMC s Data Governance Program Foundation, Roles, and Services

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

A discussion of information integration solutions November Deploying a Center of Excellence for data integration.

The Enterprise Data Hub and The Modern Information Architecture

InfoSphere Governance Solutions Maximizing your Information Supply Chain

Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012

Big Data and Trusted Information

Big Data and Big Data Governance

Principal MDM Components and Capabilities

Introduction. Automated Discovery of IT assets

Cloud First Does Not Have to Mean Cloud Exclusively. Digital Government Institute s Cloud Computing & Data Center Conference, September 2014

Implementing a Data Warehouse with Microsoft SQL Server 2012

Delivering information you can trust. IBM InfoSphere Master Data Management Server 9.0. Producing better business outcomes with trusted data

Agenda. 1 Enterprise Systems & Customer 2 Customer Hub: Overview 3. 4 Customer Hub Implementation Methodology

Compunnel. Business Intelligence, Master Data Management & Compliance (Healthcare) Largest Health Insurance Company in New Jersey.

Data Virtualization. Paul Moxon Denodo Technologies. Alberta Data Architecture Community January 22 nd, Denodo Technologies

Data virtualization: Delivering on-demand access to information throughout the enterprise

Melissa Coates. Tools & Techniques for Implementing Corporate and Self-Service BI. Triad SQL BI User Group 6/25/2013. BI Architect, Intellinet

Logical Modeling for an Enterprise MDM Initiative

Integrated Solution Offerings for Insurance IBM Insurance Framework

Management Update: The Cornerstones of Business Intelligence Excellence

IBM Software Integrating and governing big data

IBM Tivoli Netcool network management solutions for enterprise

Transcription:

IBM InfoSphere Discovery: The Power of Smarter Data Discovery Gerald Johnson IBM Client Technical Professional gwjohnson@us.ibm.com 2010 IBM Corporation

Objectives To obtain a basic understanding of the IBM InfoSphere Discovery solution through discussion and demonstration 2

Agenda IBM Software Introduction InfoSphere Discovery Overview and Foundational Capabilities Summary InfoSphere Discovery Demo: Business Objects and Relationship Discovery 3

You Can t Manage What You Don t Understand Distributed Data Situation: Grows exponentially Increasingly distributed Poorly understood Problems: Poor understanding = Poor IT agility Poor understanding = Poor data governance Poor understanding = Bad data = Bad business decisions Business Applications CRM ERP Financials Risk Mgmt BI Reports Distributed Data Landscape 4

Poor Understanding = Unpredictable Project Deployment Discovery is is the the first first phase of of information-centric projects Data Growth Consolidate Discovery Phase Privacy 5

Automate Discovery and Accelerate Information Understanding Significant Acceleration of Information Agenda projects Data Growth Management Test Data Management Sensitive Data De-identification Application/Data Consolidation, Migration and Retirement Master Data Management and Data Warehousing Why is this different Data-based discovery Automate discovery of business entities, cross-source business rules and transformation logic Evaluate multiple data sources simultaneously Identify and remediate cross-system rules and inconsistencies 6

IBM InfoSphere Discovery Components Cross-Profiler Basic profiling plus automated primary-foreign key, business entity and cross-source overlaps discovery Transformation Analyzer: Discover complex business rules and transformation logic between two data sources Discover Validate Remediate Unified Schema Builder: Prototype empty targets from the combination of many data sources 7

Value to Data Management Accelerate understanding your existing distributed data landscape for: Data Archiving Test Data management Sensitive Data Application and Data Consolidation, Migration and Retirement Master Data Management and Data Warehousing Dramatic reduction in risk, time and effort for the discovery phase projects Automated discovery of business entities, cross-source business rules and anomalies Increased repeatability Verifiable results Manual Data Analysis and Validation (Time, Risk and $$) Automated data discovery, audit and remediation 8

IBM Software How Information Management Products Work Together Automated relationship discovery Accelerate and improve data quality monitoring and management Information Analyzer Assess, Monitor, Manage Data Quality Consolidation prototyping capabilities DataStage / QualityStage MDM Server Build a better quality solution the first time ETL / Data Quality, Master Data Management Discovery Understand Data Relationships Transformation rules discovery Save time finding implied logic in legacy applications FastTrack Capture Design Specification Automated business object identification and confidential data location Achieve regulatory compliance faster Optim Data Privacy, Test Data Management Data Growth & Application Retirement 2011 IBM Corporation IBM InfoSphere Discovery: The Power of Smarter Data Discovery 9

InfoSphere Foundation Tools Govern your information assets Business Glossary Term Pack Manage Business Terms Design Enterprise Models Monitor Data Flows Business Glossary Data Architect Metadata Workbench Target Model Shared Metadata Leverage Industry Practices IBM Industry Models Understand Data Relationships Discovery Capture Design Specifications FastTrack Assess, Monitor, Manage Data Quality Information Analyzer 10 10

Extending the Foundation Tools Enterprise Projects Discovery Extends and Completes Discover Phase Test Data Generation Application Retirement & Consolidation Data Archival Data De-identification Data Quality Data Integration Master Data Management InfoSphere Foundation Tools Data Warehousing Manage Business Terms Business Glossary Discover Discover Data Relationships New Discovery Design Enterprise Models Data Architect Design Capture Design Specifications FastTrack Assess, Monitor, Manage Data Quality Information Analyzer Govern Monitor Data Flows Metadata Workbench 11 11

Complete Data Analysis & Quality Assessment Solution Discovery Automating the process of understanding how data is related across heterogeneous systems Technical Users 360º data relationship discovery Automated discovery methodologies Discover business objects Identify confidential data Rapidly prototype data migration and consolidation Core Profiling Information Analyzer Data quality assessment, analysis and continuous monitoring with exception management. Business Users Assess, monitor, manage data quality Across heterogeneous global systems Business friendly interface Re-usable rules library Collaborative exception management 12 12

IBM Solutions for your Information Agenda Projects Analysis\Discovery Operation\Run Time Optim Data Growth, Decommissioning, Privacy, Test Data Management InfoSphere Discovery Pay me nts InfoSphere Information Analyzer and Validator Data Quality and Audit, Monitoring and Validation InfoSphere Information Data Transformation and movement, Application Server migration InfoSphere MDM Server MDM / Data Warehousing and Data Warehouse MetaData Server 13 13

IBM InfoSphere Discovery Components Cross-Profiler Basic profiling plus automated primary-foreign key, business entity and cross-source overlaps discovery Transformation Analyzer: Discover complex business rules and transformation logic between two data sources Discover Validate Remediate Unified Schema Builder: Prototype empty targets from the combination of many data sources 14

InfoSphere Discovery Cross-Profiler Distributed Enterprise Structured Data Data profiling and cross-system overlap analysis of up to 20 systems simultaneously Automated PF Key and Business Object discovery Extremely easy to install and use What is unique Only solution on the market that automatically discovers primary foreign keys, business entities, and performs cross-source analysis Applicable to Data Archiving Test Data Management Sensitive Data Discovery 15

Challenges in Building an Archiving Solution* Identify the data source that needs growth management Subject Discovery : rough inventory of candidate data sources, tables and files. Find information on business subjects such as purchase orders, portfolios. Figure out which area of the data need to be controlled and the central tables in these areas. Identify and Document Referential Integrity (RI). Typical data source has large schema and implicit relationships that are not declared or documented. It is difficult to find these RIs without tooling support. Identify business objects (BOs) and document Boundary of business objects is often not documented at all. It is very difficult to establish business objects without tooling support. In Optim, develop and configure archiving templates by reading RI and BO documents *Same Challenges apply for Application Retirement 16

Building Archiving/Retirement Solutions Using Discovery and Optim Identify the data source that needs growth management Take rough inventory of candidate data sources, tables and files. Use PFkey Discovery to discover and establish RIs Use InfoSphere Discovery to discover critical artifacts needed when archiving or retiring systems Use Data Object Discovery to discover, review, and establish Business Objects. Export discovered artifacts to Optim In Optim, automatically configure archiving templates based on above discovered relationships and business objects. 17

Building a Test Data Solution Without Tooling Support Identify the data source that will provide test data Compliance team develops sensitive data mandate Identify the subject tables that support the test cases Find information on test subjects such as purchase orders, portfolios. Identify subject data to be extracted to support testing Identify existence of the sensitive data mandated. Document the location. The types of sensitive data that must be masked in test data are known. Where these data reside in a large data set, is not known. Manually scanning for these sensitive data, is difficult without tooling support. Identify and Document Referential Integrity Identify business objects and document Typical data source has large schema and relationships that are not declared or documented. It is difficult to find these RIs without tooling support. Boundary of business objects is often not documented at all. It is very difficult to establish business objects without tooling support. Use SQL/custom masking functions to create test data extract based on above documentation Audit and approve the extract with privatization. 18

Building TDM Solutions Using Discovery and Optim Identify the data source that will provide test data Compliance team develops sensitive data mandate Identify the subject tables that support the test cases Primary-Foreign Key Discovery Business Object Discovery Use InfoSphere Discovery to discover critical artifacts needed for creating test data with privatization Sensitive Data Discovery Export all discovered artifacts to Optim including sensitive data classification In Optim, semi-automatically configure archiving templates based on above discovered RIs, business objects, and sensitive data elements. Audit and approve the extract with privatization. 19

Schema with no declared/documented keys 20

Schema with discovered key relationships 21

Data Object Integration with IBM Optim Data Model The same data object imported into IBM Optim Designer and ready for: Archiving Test data Masking 22

Basic Discovery Summary Analyze up to 20 data sources simultaneously Discover column level statistics Automatically discover primaryforeign keys Discover business objects Generate referentially consistent sample sets Identify critical data elements and overlaps across data sources 23

IBM InfoSphere Discovery Components Cross-Profiler Basic profiling plus automated primary-foreign key, business entity and cross-source overlaps discovery Transformation Analyzer: Discover complex business rules and transformation logic between two data sources Discover Validate Remediate Unified Schema Builder: Prototype empty targets from the combination of many data sources 24

InfoSphere Discovery Transformation Analyzer Distributed Enterprise Structured Data If age<18 and Sex=M then 0 If age<18 and Sex=F then 1 If age>=18 and Sex=M then 2 If age>=18 and Sex=F then 3 = Example What is unique Discovers cross-system business rules, transformations and data exceptions by examining data values Automates discovery of: Cross-system business rules and transformations Data inconsistencies Detailed data mapping between two data sources Discrepancy discovery Cross source trouble-shooting workbench Applicability Discover cross-source rules for data consolidation Add new sources to existing MDM hub or EDW Map an MDM hub to consuming applications 25

IBM InfoSphere Discovery Components Cross-Profiler Basic profiling plus automated primary-foreign key, business entity and cross-source overlaps discovery Transformation Analyzer: Discover complex business rules and transformation logic between two data sources Discover Validate Remediate Unified Schema Builder: Prototype empty targets from the combination of many data sources 26

Master Data Management Project Community Goal: build a new master from multiple sources: What data is in each source CRM How do you combine columns together What are the matching keys used to align the rows What is the trust precedence for each attribute Region 27

Deploying a Master Data Management System Without Tooling Support* Identify data for MDM What are the data to be moved, where are they Common schema Known models: how do I know they will fit No models, how do I build one Map sources How to map How do I make sure all maps are compatible Cleansing, enrichment Match Match keys are hard to come by Merge Not sure which source is more trustworthy or why *IMPORTANT Similar use cases when: -Migrating multiple sources to new operational system -Deploying a data warehouse 28

Deploying a Master Data Management System Using InfoSphere Discovery Identify Data for MDM Use Discovery to discover and establish Data Inventory: Column Statistics, Relationship Analysis, Data Overlaps* InfoSphere Discovery Foundational Steps Use Profiling results to inventory Critical Data Elements InfoSphere Discovery Unified Schema Builder Unified modeling -- Build new model using CDEs -- Assess completeness of a unified model Match key discovery and analysis -- Discover match keys -- Analyze the strength of a match key Unified profiling of maps and domains -- prototype maps using data views and stats -- Unified profiling to align maps cross-source Conflict detection and resolution analysis -- Trusted source analysis -- conflict resolution rules analysis prototyping 29

InfoSphere Discovery - Unified Schema Builder Unified Schema Builder: Workbench for prototyping Master Data Management (MDM) Identify critical data elements Draw asset overlap topology Build unified model Map sources, validate across Prototype match & merge New Application Data Warehouse MDM Hub Complete methodology for MDM data discovery Test it out before you roll it out What is unique Prototypes empty targets from existing source data (MDM, EDW, data migration) and populates prototype with consolidated source data 30 30

Why is it such an easy button It is not. Discovery provides a rich environment for user to guide the discovery process Confirm discovered relationships Assess/establish known/heard relationships The more review you do, the more accurate it will get but, it is kind of an easy button because Discovery guides you through a well-defined workflow to get to the end results. 31

Reminder: Value of InfoSphere Discovery Accelerate understanding and validation of your existing distributed data landscape for: Data Archiving Test Data management Application/Data Consolidation, Migration and Retirement Sensitive Data Master Data Management and Data Warehousing Automated data discovery Dramatic reduction in risk, time and effort for the discovery phase of your project Automated discovery of business entities, crosssource business rules & anomalies Increased repeatability Verifiable results Manual Data Analysis and Validation (Time, Risk and $$) 32

33