MASTER DATA MANAGEMENT IN THE AGE OF BIG DATA



Similar documents
How To Manage Data And Data Management

Big Data and Advanced Analytics Technologies for the Smart Grid

DataFlux Data Management Studio

MASTER DATA MANAGEMENT: APPLICATIONS FOR CLINICAL TRIAL DATA

Bullet-Proof MDM: Designing a World-Class Development Environment

<Insert Picture Here> Master Data Management

Building the Bullet-Proof MDM Program

SAS MDM 4.1. User s Guide Second Edition. SAS Documentation

dxhub Denologix MDM Solution Page 1

Multi-Domain Master Data Management. Subhash Ramachandran VP, Product Management

SAS MDM 4.2. User s Guide. SAS Documentation

MDM and Data Warehousing Complement Each Other

NEEDLE STACKS & BIG DATA: USING EVENT STREAM PROCESSING FOR RISK, SURVEILLANCE & SECURITY ANALYTICS IN CAPITAL MARKETS

Master Data Management What is it? Why do I Care? What are the Solutions?

Thank you for attending the MDM for the Enterprise Seminar Series!

What's New in SAS Data Management

SAS Business Data Network 3.1

Knowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success

Product to Customer. through MDM. Presented by Luminita Vollmer, MBA, CDMP, CBIP

Luncheon Webinar Series July 29, 2010

Thank you for attending the MDM for the Enterprise Seminar Series!

Introduction to Glossary Business

Why is Master Data Management getting both Business and IT Attention in Today s Challenging Economic Environment?

Architecting for the Internet of Things & Big Data

James Serra Data Warehouse/BI/MDM Architect JamesSerra.com

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

Corralling Data for Business Insights. The difference data relationship management can make. Part of the Rolta Managed Services Series

Building a Data Warehouse

Integrating MDM and Business Intelligence

Jeremy Kashel Tim Kent Martyn Bullerwell. Chapter No.2 "Master Data Services Overview"

Oracle Siebel Marketing and Oracle B2B Cross- Channel Marketing Integration Guide ORACLE WHITE PAPER AUGUST 2014

Data Governance Maturity Model Guiding Questions for each Component-Dimension

Whitepaper Data Governance Roadmap for IT Executives Valeh Nazemoff

IRMAC SAS INFORMATION MANAGEMENT, TRANSFORMING AN ANALYTICS CULTURE. Copyright 2012, SAS Institute Inc. All rights reserved.

and NoSQL Data Governance for Regulated Industries Using Hadoop Justin Makeig, Director Product Management, MarkLogic October 2013

POTENTIAL DHH TECHNICAL ARCHITECTURE

SQL Server Master Data Services A Point of View

Salesforce Certified Data Architecture and Management Designer. Study Guide. Summer 16 TRAINING & CERTIFICATION

IBM Analytics Prepare and maintain your data

David Chou. Architect Microsoft

Master Data Management: More than a single view of the enterprise? Tony Fisher President and CEO

Reporting component for templates, reports and documents. Formerly XML Publisher.

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis

RS MDM. Integration Guide. Riversand

SQLSaturday#393 Redmond 16 May, End-to-End SQL Server Master Data Services

Category: Business Process and Integration Solution for Small Business and the Enterprise

XpoLog Center Suite Data Sheet

An Introduction to Master Data Management (MDM)

Integrity 10. Curriculum Guide

Course 10977A: Updating Your SQL Server Skills to Microsoft SQL Server 2014

MDM that Works. A Real World Guide to Making Data Quality a Successful Element of Your Cloud Strategy. Presented to Pervasive Metamorphosis Conference

SAP BusinessObjects Information Steward

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015

DATA QUALITY MATURITY

Master Your Data and Your Business Using Informatica MDM. Ravi Shankar Sr. Director, MDM Product Marketing

Evolutionary Multi-Domain MDM and Governance in an Oracle Ecosystem

Mastering Data Management. Mark Cheaney Regional Sales Manager, DataFlux

An Oracle White Paper June, Strategies for Scalable, Smarter Monitoring using Oracle Enterprise Manager Cloud Control 12c

Harness the value of information throughout the enterprise. IBM InfoSphere Master Data Management Server. Overview

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Master Data Governance & SAP Information Steward Integration. Jens Sauer, SAP Switzerland September 11 th, 2013

Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D.

Business User driven Scorecards to measure Data Quality using SAP BusinessObjects Information Steward

SAP HANA SPS 09 - What s New? HANA IM Services: SDI and SDQ

SAS Financial Intelligence. What s new. Copyright 2006, SAS Institute Inc. All rights reserved.

Extraction Transformation Loading ETL Get data out of sources and load into the DW

SAS BI Course Content; Introduction to DWH / BI Concepts

Managing Third Party Databases and Building Your Data Warehouse

A WHITE PAPER By Silwood Technology Limited

SAP HANA implementation on SLT with a Non SAP source. Poornima Ramachandra

DECOMMISSIONING CASE : SIEBEL UCM TO INFORMATICA MDM HUB

Business Intelligence: Using Data for More Than Analytics

SAP Sybase Replication Server What s New in SP100. Bill Zhang, Product Management, SAP HANA Lisa Spagnolie, Director of Product Marketing

Multi-Domain Master Data Management. Subhash Ramachandran VP, Product Management webmethods Product Line

Sisense. Product Highlights.

Master big data to optimize the oil and gas lifecycle

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package Data Federation Administration Tool Guide

Reporting MDM Data Attribute Inconsistencies for the Enterprise Using DataFlux

<Insert Picture Here> Oracle Master Data Management Strategy

SAS Data Management Technologies Supporting a Data Governance Process. Dave Smith, SAS UK & I

Ganzheitliches Datenmanagement

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Key New Capabilities Complete, Open, Integrated. Oracle Identity Analytics 11g: Identity Intelligence and Governance

Integrating Netezza into your existing IT landscape

Introduction to TIBCO MDM

Task definition PROJECT SCENARIOS. The comprehensive approach to data integration

Reporting Services. White Paper. Published: August 2007 Updated: July 2008

Customer Case Studies on MDM Driving Real Business Value

Beyond the Single View with IBM InfoSphere

Reference Architecture, Requirements, Gaps, Roles

ER/Studio Enterprise Portal User Guide

Implementing a Data Warehouse with Microsoft SQL Server

Data Integration Checklist

Modernizing Your Data Strategy

The Top Challenges in Big Data and Analytics

SAS Enterprise Data Integration Server - A Complete Solution Designed To Meet the Full Spectrum of Enterprise Data Integration Needs

Enabling Data Quality

ORACLE SUPPLIER MANAGEMENT: SUPPLIER HUB & SUPPLIER LIFECYCLE MANAGEMENT

DataFlux Web Studio 2.5. User s Guide

Transcription:

MASTER DATA MANAGEMENT IN THE AGE OF BIG DATA PRESENTED TO IRMAC MAY 15, 2013 STEVE PAPAGIANNIS STEVE.PAPAGIANNIS@SAS.COM 416 307 4620

DEFINITIONS WHAT ARE MASTER AND BIG DATA??? Master data is information that is key to the operation of a business. It is the primary focus of the Information Technology (IT) discipline of Master Data Management (MDM), and can include reference data. This key business information may include data about customers, products, employees, materials, suppliers, and the like. Big Data. When the volume, velocity and variety of data exceed an organization's storage or compute capacity for accurate and timely decision making. Source: SAS Website Source: Wikipedia

DATA MANAGEMENT MASTER DATA AND BIG DATA AS SUB-SPECIALTIES Master Data is the high value information that is used most frequently across an enterprise. Example: Customers, Members, Groups, Organizations, Patients, Vendors, Providers, Citizens, Employees, Services, Accounts, Materials, Locations, Assets, etc. Big Data is the low value density information generated by systems Example: Security Logs, Call Detail Records, Sensor Information

WHY NOW??? Master Data is getting richer Big Data is being heavily explored Costs to achieve this are dropping Hype cycle

DATA SIZE THRIVING IN THE BIG DATA ERA VOLUME VARIETY VELOCITY VALUE TODAY THE FUTURE

Hwy 410, Brampton, Acme Construction Data Profiling Metadata Discovery Business Rule Definition Entity Definition Highway 410, Peel Region Contract Center Scattered Data Hwy 410, Brampton, Peel Region, Kilometer 10, January, $75MM, Acme Construction, resurfacing Budgeting Engineering Highway 410, $75MM Data Integration Data Quality Data Model Business Services Stewardship Console Data Governance Identity Management Reporting Hwy 410, January, Kilometer 10-12, resurfacing

SURVIVING CONTRIBUTORS INTO MASTER RECORD CRM ConsumerID 30391-244 First Name William Middle James Last Name Crown DOB 04/12/39 SSN Address 563-49-1234123 Oak St., Eves, IL 30319 ERP Member ID 30391244 First Name William Middle J. Last Name Crowne DOB 4-12-39 SSN 563491234 Address 123 Oak St., Eves, IL ONLINE Person ID 14239 First Name Bubba Middle J. Last Name DOB April 12 SSN Address BubbaJ@bubbagroup.com DW ID 3721B First Name Willaim Middle James Last Name Corp. DOB April 12 SSN 56349123 Address 3224 Pkwy G, Los Osos SFA ID 30391-244 First Name William Middle James Last Name Crown DOB 04/12/1939 SSN Address 563-49-1234123 Oak St., Eves, IL 30319 EID Source Keys Survived Fields 1001 30391-244 30391244 14239 3721B 30391-244 William James Crowne 04/12/1939 563491234 123 Oak Street Eves CA 91403

EVOLVING FROM DM TO MDM Data Management Data Integration and/or Data Quality Initiative Enrich: Data Augmentation and Enrichment Survive: Entity Resolution & Surviving Record Analysis Persist: Create a Master data record Synchronize: Tie master data record into existing source systems Surface: Provide access to other systems in real time via SOA Master Data Management Full-blown MDM Initiative

SAMPLE MDM HIGH LEVEL PROJECT PLAN Phase 1: Define / Discover Project Definition, Requirements Assessment Installation Source Analysis/Profiling Define Entities, ETL, Relationships Data Connectivity Phase 2: Apply Data Quality Define and Implement DQ, Matching, Survivorship Rules Data Enrichment, Verification, Validation Phase 3: Perform Initial Load Stage Data, Load Hub Integrate with Source Systems Define and Implement Services Performance Testing Phase 4: QA / Knowledge Transfer UAT Build Reports UI Enhancements Support/ Knowledge Transfer

WHAT IS THE INTERSECTION? Big Data MDM 1. Event Stream Processing 2. High Performance Analytics 3. In Database Processing Governance

SAS INFORMATION MANAGEMENT SUPPORT FOR ENTIRE INFORMATION MANAGEMENT CONTINUUM Strategy STRATEGY & VISION TO EXECUTE Governance INFORMATION GOVERNANCE Capabilities DATA MANAGEMENT DECISION MANAGEMENT ANALYTICS MANAGEMENT Provides unified data management capabilities that include data governance, data integration, data quality and MDM Provides decision services that include business rules and workflow that facilitates integration of the information services into the business systems Provides complete analytics management that includes model management, deployment, monitoring and governance of the analytics information asset

DATA GOVERNANCE FRAMEWORK IGNORE AT YOUR PERIL! P R O G R A M O V E R S I G H T Corporate Drivers Business Framework Process & Policy Data Management Data Governance Execution Process

WHERE IS THE DATA MORE VARIETY THAN BEFORE

HOW IS IT PROCESSED MORE VARIETY THAN BEFORE

BUSINESS DATA NETWORK

BUSINESS DATA NETWORK Enables data governance Common language improves communication & supports compliance regulations Represent and expose business relationships Track history of changes Accountability and responsibility Document and communicate ownership Notify interested parties on changes Supports better collaboration Capture and share annotations between team members Greater understanding of the context of information Use and reuse of trusted information

USE CASE: BUSINESS AND IT COLLABORATION Achieve a common vocabulary between business & technical users Database = ORACLE Schema = NAACCT Table = DLYTRANS Column = TAXVL data type = Decimal(14,2) Derivation: SUM(TRNTXAMT) Business Data Network Category: Costs Term: Tax Value Description: Tax to be paid on Gross Income. (John Walsh is responsible for updates. 90% reliable source) Status: UNDER REVIEW

ENABLES DATA GOVERNANCE Common language improves communication & supports compliance regulations

ENABLES DATA GOVERNANCE Represent and expose business relationships

ENABLES DATA GOVERNANCE Track history of changes

ACCOUNTABILITY AND RESPONSIBILITY Document and communicate ownership Notify interested parties on changes

COLLABORATION Capture and share Notes between team members

COLLABORATION (CON T) Understand the context of information

DataFlux - Windows Internet Explorer DataFlux Relationship Data Diagram Management Business Web Data Studio Network Parent Term DataFlux Web Studio Filter User Account Link Report Tag Child Term Information Map Synonymous Term Parent Term Collection SAS Table 2 Domain Task Rule Profile Field Transformation 2 Transformation 1 Process Job Data Job DataFlux Table SAS Table Legend: Dependency Parent Child Inclusion Association Synonymous Equivalent

DataFlux - Windows Internet Explorer DataFlux Relationship Data Diagram Management Business Web Data Studio Network Parent Term DataFlux Web Studio Filter: Object types: Rules, Terms Grandparent Term Parent Term Rule Child Term Filter All object types 1 Collections Contacts Data jobs Domains Fields Links Process jobs Profiles Rules Tables Tags Tasks Terms All relationship types Dependency Parent Child Inclusion Association Synonymous Equivalent Legend: Dependency Parent Child Inclusion Association Synonymous Equivalent

EVENT STREAM PROCESSING

EVENT STREAM PROCESSING (ESP) ESP is a subcategory of Complex Event Processing (CEP) focused on analyzing/processing events in motion called Event Streams.* The SAS ESP is an embeddable engine that can be integrated into or front-end solutions. * This is the definition provided by the Event Processing Technical Society

Continuous queries on data in motion (with incrementally updated results) TYPICAL CHARACTERISTICS OF EVENT STREAM PROCESSING APPLICATIONS: Very low (max) event processing latencies (i.e., Usecs-msecs) High volumes (>100k events/sec) Derived event windows with retention policies Memory constrained for performance (i.e., Bounded state) Predetermined data mining, decision making, alerting, position management, scoring, profiling, Event out-of-order handling to ensure ordered source streams

EVENT STREAM PROCESSING (ESP) VS. RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS) ESP RDBMS ESPs store the queries and continuously stream data through the queries Databases store the data and periodically run queries against the stored data EVENTS INCREMENTAL RESULTS QUERIES RESULTS

EVENT STREAM PROCESSING WHAT PROBLEMS ARE WE TRYING TO SOLVE? Capture value otherwise lost through information lag Enable new opportunities through producing actionable intelligence with lower latencies Continuously analyze events as they occur Eliminate storage latencies Incrementally update intelligence as new events occur Enable new analysis & processing models to be developed and modified quickly to increase the opportunity windows and reduce costs.

ESP SAMPLE USE CASES Portfolio Equity Position Management Continuous Change Data Management Capital Markets Liquidity Portfolio positions of interest, e.g., market equity positions by account, trader, department, location, region, and venue. Get master data changes as they occur and provide a consolidated up-to-themoment view across silo systems. Example is a consolidated customer view for telco services across home services, wireless, Consolidated L1 & L2 order books for selected instruments across selected venues. This can be used for algorithmic trading, trade execution, dark pool analysis. Credit Card Fraud Prevention Telco Prepaid Call Authorization Maintain account-based usage signatures and known fraud signatures to enable credit card purchase scoring for authorization requests. Correlate call authorization requests to account & account call plans to determine the call duration based on current balance. Maintain account balances upon call completions.

CHANGE DATA MANAGEMENT USE CASE CONTINUOUS DATA INTEGRATION FOR SINGLE CUSTOMER VIEW ACROSS SILO SYSTEMS Event Stream Sources/ Publishers Sybase ASE Oracle Customer Broadband, Digital, Phone, TV Services Customer Wireless Services Event Stream Processing Server Normalized & Cleansed Home Services (join) Normalized & Cleansed Wireless Services (join) Holistic Customer Views (join) Data Flow Model: 1. Customer services information is captured from the various DBs of silo systems as a query snapshot followed by change log deltas. 2. Customer account maps are used to normalize the various account numbers across disparate systems. 3. The normalized service data is cleansed for accuracy and completeness. 4. All customer services are consolidated into one holistic view of each customer, which is used by Customer Care. Customer Support Cases Normalized & Cleansed Support Cases (join) MS SQL Server Customer Account Map Codes Consolidated Customer Views My SQL Customer Care

MDM WALK THROUGH

ENTITY TYPE SUPPORT Administrator Activity DataFlux qmdm provides multientity support. Many teams (admins, stewards, business users) can work together to define various elements related to each entity like attribute lists, match rules, data quality rules, security roles, and so on. Entities are not live in the system until they have been published in Master Data Manager and can t accept data from other systems until the jobs/services to support each entity type have been generated using Master Data Manager.

ENTITY TYPE INHERITANCE Administrator Activity Entity types support inheritance. This allows entities derived from others to inherit attributes, clustering rules, roles, and other properties from their parent entity. Inheritance will in some cases render properties read-only on the child entity if the property was inherited from a parent. Modifications would need to made at the higher level. This entity type inherits from the Party entity type. Entity types can also be designated as abstract, which means they can carry attributes, roles, and so on but they won t be used to store actual instance data.

CLUSTER CONDITIONS Administrator Activity In this administrative area, clustering conditions can be designed for each entity type. By virtue of inheritance, clustering rules can be accessed by entities that take others as their parent. Job generation functionality will take these rules and other properties defined for each entity and will create ready-touse data jobs for batch loads and updates, real-time queries, and real-time add, modify, and retire services. Entities will not be active until published by an administrator.

DEFINING RELATIONSHIP TYPES Administrator Activity Entities can be related to other entities through defined relationships. Here a relationship between companies and parts has been established. Notice the labels and descriptions. These will appear throughout Master Data Manager to show the kind of relationship that is being displayed. Relationships are defined by a set of match rules that link the entity instance data. Here, if the match code for the company name in one entity matches the supplier match code in the other, a relationship will be made.

Stewardship Activity SEARCH Master Data Manager provides several ways to find the specific instance data you are looking for. This can be done through workflow tasks or reports but typically searching is done in the Master Data area. Searches can be done on entities and hierarchies. You can use any number of search fields or use an advanced search feature for more granular control. Entities that inherit from a common parent will have search results displayed together on the same screen. Fields available for searching are configured by an administrator and can differ across entity types.

AUTHORING ENTITIES Stewardship Activity New entity instance data can be created directly from Master Data Manager. The fields available in the edit form are configured by an administrator for each entity type. Fields can be marked as required or read-only and they can be constrained by regular expressions to enforce data standards.

Stewardship Activity CLUSTER MEMBERS At the heart of DataFlux qmdm is the notion of match clusters. These represent the group of records from different sources that have been deemed to be representations of the same person, place or thing. Every match cluster has a best record that is constructed through business rules from the values held by one or more contributing records. Best records can be authored and edited using Master Data Manager. Cluster Members (Best Record + Contributors) Best Record (in bold) New Best Record (if saved)

Stewardship Activity CLUSTER COMPARE Cluster compare functionality allows data stewards to look across contributing records in a cluster and quickly see, through highlighted fields, what is different. Use these controls to cycle through contributing records in the right column. The left column shows your potential new best record. Like the main Cluster Members view, you can tag this entity for a workflow, create a new best record, retire the cluster, or split off contributors into new clusters. You can double-click field values on the right to populate your potential new best record on the left. Selecting this option will filter the entity attributes to just those that are different.

VIEWING HISTORICAL INFORMATION Stewardship Activity Every change to entity data is captured in the qmdm database. Modifications to best records result in regenerated best records and older versions are retired. Entire match clusters can also be retired, making the data inactive but available for queries of the system s history. In this view, previous versions of the best record are shown in gray while active information is shown normally. Users can toggle the history view by choosing to show or hide retired records. Use the Show Retired menu item to view the history of changes to the match cluster.

ENTITY PROPERTIES WITH RELATIONSHIPS Stewardship Activity Every relationship definition that involves an entity type will be shown as new windows in the entity Properties area. Master Data Manager will query for discovered relationships for the first relationship type but you ll have to expand each additional relationship window to see all of them. Discovered relationships will appear automatically; however, you can also manually add and retire relationships here.

ENTITY RELATIONSHIP DIAGRAM Stewardship Activity The relationship diagram is a visual representation of an entity and all of its relationships with other entities. The area at the right sets the action for expanding related items (it s configurable since some queries can take more than a few second to run). A double-click will expand the relationship diagram to the next set of related entities. The window at the bottom either shows attributes for the selected entity or, if a grouptype node is selected, all of the entities that comprise the group These can be selectively added to the diagram. Control+Click rotates the diagram. Shift+Click pans the diagram.

ENTITY HIERARCHIES Stewardship Activity DataFlux qmdm supports the concept of entity hierarchies. There is no limit on the number of unique hierarchies or entity types per hierarchy. Hierarchies relate entities at the best record level and respond accordingly to splits and merges of match clusters. Entities can participate in several hierarchies simultaneously and appear in the same hierarchy more than once outside of the same level.

Stewardship Activity MASTER DATA DASHBOARD The master data dashboard dynamically updates as new entity types are added to DataFlux qmdm. It shows summary statistics by source and entity type for: Record counts Volume growth Number of contributors and survivors Data analysis for sparsity and max/min values Source system contribution ratios Batch load history

Stewardship Activity CUSTOM REPORTS Dynamic and batch reports are supported through the Reports area. Both are specially designed DataFlux data jobs interpreted as reports by Master Data Manager. Dynamic reports can accept user input as parameters and the output can be linked directly to entity data elsewhere in Master Data Manager for easy access. Batch jobs can be initiated from this location and if their output contains an HTML file, that data can be displayed here.

Stewardship Activity WORKFLOW TASKS Master Data Manager supports the concept of workflows. Two are available with a basic installation: Use the Tag menu item to make a cluster ready for review. Tag this is an ad hoc workflow that can be used to raise awareness of issues to other qmdm users. It is enabled be default Entity Lifecycle this can be used to route new records and edits to existing clusters to a group of users for review. It is available but not enabled by default. Here an entity has been tagged for review.

Stewardship Activity WORKFLOW TASKS Once a user chooses to act on a workflow task (they have already chosen to accept the task in the Actions item), the area below the Status information will contain entity attributes that they can view and modify. The entity attributes values shown are what was captured at the time the workflow was created. The actual entity attributes in the system may have changed after the workflow task was initiated. If so, any submission of changes from this location will be integrated with changes made in the interim. The Notes field can be used to communicate issues to other team members.

Stewardship Activity WORKFLOW DASHBOARD The workflows dashboard provides a snapshot of workflow activity generated in DataFlux qmdm. Among the statistics it reports are the following: Workflows by status Workflows by priority Workflows by entity type Workflows by workflow type Workflows by submitter Workflows by modification date Workflows by start date

Stewardship Activity DATA EXPORT Data from most panels in Master Data Manager can be exported to CSV, PDF, or Excel formats.