Data Warehousing Metadata Management




Data Warehousing Metadata Management Spring Term 2014 Dr. Andreas Geppert Credit Suisse geppert@acm.org Spring 2014 Slide 1

Outline of the Course
- Introduction
- DWH Architecture
- DWH Design and multi-dimensional data models
- Extract, Transform, Load (ETL)
- Metadata
- Data Quality
- Analytic Applications and Business Intelligence
- Implementation and Performance

Content
1. Motivation
2. Metadata
3. Metadata Creation and Usage
4. Metadata Management System Architecture
5. Appendix: DWH Metadata Standards

Motivation
Metadata management is the prerequisite for two major objectives:
1. Enable and minimize the effort for the development and operations of a data warehouse
2. Enable optimal and correct information extraction
   - data analysis requires known and agreed semantics of data
   - uniform terminology
   - data quality: quality definition, quality checks, correction rules, quality requirements

Metadata
- "data about data": data that describe other data
  - meaning/semantics, syntax, etc.
- "any kind of information that is needed for the design, development and use of an information system" (Bauer & Günzel)
- metadata management: gathering, storage, administration, and provision of metadata
- storage in a dedicated information system -> repository

Instance and Metadata
- Level 3: meta meta models
- Level 2: meta models
  - define the metadata schema
  - e.g., the concepts "table" and "attribute"
- Level 1: metadata (data / object model)
  - represent schemas and types
  - e.g., type/class "Customer" with attribute "address"
- Level 0: instance data, object data
  - represent elements of the UoD (universe of discourse) and their properties
  - e.g., customer "John Doe" with address "Main Street"
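
The lower three levels can be made concrete with a small sketch. The class names `Table` and `Attribute` and the helper `validate` are illustrative assumptions, not part of the slides; level 3 is left out because it would only describe how `Table` and `Attribute` themselves are defined.

```python
from dataclasses import dataclass, field


# Level 2 (meta model): the concepts used to describe schemas.
@dataclass
class Attribute:
    name: str
    data_type: str


@dataclass
class Table:
    name: str
    attributes: list[Attribute] = field(default_factory=list)

    def validate(self, row: dict) -> bool:
        """Check that an instance row uses exactly the declared attributes."""
        return set(row) == {a.name for a in self.attributes}


# Level 1 (metadata): a concrete schema expressed with the level-2 concepts.
customer = Table("Customer", [Attribute("name", "str"), Attribute("address", "str")])

# Level 0 (instance data): one element of the universe of discourse.
john = {"name": "John Doe", "address": "Main Street"}

print(customer.validate(john))  # True
```

Each level is an instance of the level above it: `john` conforms to `customer`, and `customer` is an instance of `Table`.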

Metadata: Examples in the DWH Architecture
[Figure: the layered DWH architecture (data sources; landing zone and staging area; subject matter areas for integration and historization; reusable measures & dimensions areas for reusable selection, aggregation, calculation; data marts for selection, aggregation, calculation; reporting, OLAP, data mining; web/app servers; GUI), with metadata management spanning all layers. Example metadata shown per layer: source data models, ownership, data models, quality rules, staging structures, transformations, calculations, analysis logic, terminology.]

Requirements for Metadata Management
- Availability and existence
  - metadata must exist (e.g., data models of the sources and the DWH)
- Automated capture
  - metadata must (whenever possible) be captured and updated in the course of DWH development
  - documentation after the fact ("Nachdokumentation"), i.e., development followed by subsequent documentation in the metadata repository, does not work: metadata and code become inconsistent very quickly
- Integration
  - metadata must be integrated and provide an end-to-end view
  - relationships between metadata (of different types) must be captured, managed, and used
  - for instance, transformation rules are not independent metadata; they describe the mapping of one schema onto another
  - without seamlessly integrated metadata it is impossible to support impact analysis and data lineage

Requirements for Metadata Management (2)
- User access
  - metadata support for all users and user groups
  - adequate user interfaces, languages, etc.
  - tool support
- Interoperability
  - metadata exchange between different systems should be possible

Metadata: Classification
Creation and usage time
- design metadata: schema definitions/data models, transformation rules, data quality rules, terms and definitions
- runtime metadata: log files, statistics, quality check results
- usage metadata: usage frequencies, access patterns

Metadata: Classification (2)
User
- technical metadata
  - used by developers, administrators (IT staff in general)
  - data dictionaries, database schemas, transformation rule code
- business metadata
  - definitions of terms, calculation formulas
Type
- metadata about primary data
  - metadata about data in source systems, data warehouse, data marts
- process metadata
  - rules and transformations of ETL processes
  - logs, execution plans

Metadata: Classification (3)
Origin
- tools (schema designer, ETL tool, ...)
- sources
- users
Abstraction
- conceptual: abstract description, implementation-independent, understandable for users
- logical: description in a formal language, e.g., database schema, formulas
- physical: implementation, e.g., SQL code
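
One way to make these classification dimensions usable in a repository is to attach them to every metadata item as tags. The following is only a sketch: the record layout, the enum names, and the sample items are assumptions chosen for illustration, and the "type" dimension is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Time(Enum):
    DESIGN = "design"
    RUNTIME = "runtime"
    USAGE = "usage"


class Audience(Enum):
    TECHNICAL = "technical"
    BUSINESS = "business"


class Abstraction(Enum):
    CONCEPTUAL = "conceptual"
    LOGICAL = "logical"
    PHYSICAL = "physical"


@dataclass(frozen=True)
class MetadataItem:
    name: str
    time: Time
    audience: Audience
    origin: str  # e.g. "schema designer", "ETL tool", "source", "user"
    abstraction: Optional[Abstraction] = None  # not every item has one


# Hypothetical repository content.
items = [
    MetadataItem("customer table DDL", Time.DESIGN, Audience.TECHNICAL,
                 "schema designer", Abstraction.PHYSICAL),
    MetadataItem("definition of 'net revenue'", Time.DESIGN, Audience.BUSINESS,
                 "user", Abstraction.CONCEPTUAL),
    MetadataItem("nightly load log", Time.RUNTIME, Audience.TECHNICAL, "ETL tool"),
]

# Select what a business user should see.
business_view = [i.name for i in items if i.audience is Audience.BUSINESS]
print(business_view)
```

Treating the dimensions as independent tags lets a repository filter the same items differently for developers, administrators, and business users.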


Metadata Usage
- passive use: documentation of various aspects of a DWH; used by users, developers, administrators
- active use: interpretation of metadata by tools (transformation rules, quality rules); metadata-driven processes
- semi-active use: use of metadata by tools in order to check something (e.g., schema definitions)
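
The difference between active and semi-active use can be illustrated with a minimal sketch. The rule format, the operation names, and the column names are hypothetical; a real ETL tool would define its own.

```python
# Transformation rules stored as metadata (hypothetical rule format).
RULES = [
    {"target": "full_name", "sources": ["first_name", "last_name"], "op": "concat"},
    {"target": "country", "sources": ["country_code"], "op": "upper"},
]

# Operations the interpreter knows how to execute.
OPS = {
    "concat": lambda values: " ".join(values),
    "upper": lambda values: values[0].upper(),
}


def transform(row, rules=RULES):
    """Active use: build the target row by interpreting the rules."""
    return {r["target"]: OPS[r["op"]]([row[s] for s in r["sources"]]) for r in rules}


def check_schema(row, rules=RULES):
    """Semi-active use: list source columns the rules need but the row lacks."""
    needed = {s for r in rules for s in r["sources"]}
    return sorted(needed - set(row))


source_row = {"first_name": "John", "last_name": "Doe", "country_code": "ch"}
print(check_schema(source_row))  # []
print(transform(source_row))     # {'full_name': 'John Doe', 'country': 'CH'}
```

Changing a mapping then means changing an entry in `RULES`, not the code, which is what keeps metadata and implementation consistent in a metadata-driven process.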

Metadata Use: Build Time
[Figure: two development processes and their steps]
- analysis application development process: analyze information requirements, derive data requirements, determine reuse potential, determine technology, model data mart, model mappings, design logical schema, design mappings, implement data mart, implement ETL process, develop reports
- integration application development process: analyze data requirements, analyze sources, define incoming interfaces, extend staging area, model SMA, model mappings, define quality checks, design SMA, design ETL, implement SMA, implement ETL, define outgoing interfaces, implement interfaces

Metadata Use: Build Time (2)
Integration application: development step and the metadata it uses or produces
- analysis of required data: requirements
- source analysis: requirements, source metadata (data models)
- definition of "incoming" interfaces: interface contracts
- extension of staging area: logical data model
- modeling of SMA structures: conceptual data model
- modeling of mappings: conceptual ETL model
- definition of quality rules: quality rules
- logical modeling of SMA structures, SMA and ETL implementation: logical and physical data and ETL models, job nets
- definition of "outgoing" interfaces: interface contracts

Metadata Use: Build Time (3)
Analysis application: development step and the metadata it uses or produces
- analysis of information requirements: requirements, business metadata/glossaries, ownership
- derivation of required data: requirements, analysis model
- identification of reuse potential of dimensions and measures: application portfolio, business intelligence (application) strategy
- data mart modeling: conceptual data model, roles, access rules, data quality requirements and rules
- modeling of mappings: conceptual ETL model
- logical modeling and implementation of data mart and transformations: logical and physical data and process models, job nets
- report development: semantic layer, business metadata/glossaries

End-to-end Metadata Use: Build Time
- impact analysis determines the impact of a change in one place on downstream components
- it thus requires an end-to-end view of metadata
  - in particular, data models and transformations must be complete
- development techniques such as generating or deriving lower-level artifacts from higher-level (more abstract) ones foster impact analysis
- development techniques that do not treat metadata integrally (in particular, stored procedures) compromise impact analysis
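
Given complete dependency metadata, impact analysis is a forward traversal of the dependency graph, and data lineage is the same traversal in the reverse direction. The sketch below assumes a hypothetical set of artifacts and a simple "feeds" relation; the artifact names are invented for illustration.

```python
from collections import deque

# Hypothetical end-to-end dependency metadata: artifact -> artifacts derived from it.
FEEDS = {
    "src.customer": ["stg.customer"],
    "stg.customer": ["sma.party"],
    "sma.party": ["mart.dim_customer"],
    "mart.dim_customer": ["report.revenue_by_customer"],
    "src.orders": ["stg.orders"],
    "stg.orders": ["mart.fact_sales"],
    "mart.fact_sales": ["report.revenue_by_customer"],
}


def reachable(start, edges):
    """Breadth-first search: every node reachable from start via edges."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact(artifact):
    """Downstream components affected by a change to artifact."""
    return reachable(artifact, FEEDS)


def lineage(artifact):
    """Upstream components that artifact is derived from."""
    reverse = {}
    for src, targets in FEEDS.items():
        for target in targets:
            reverse.setdefault(target, []).append(src)
    return reachable(artifact, reverse)


print(sorted(impact("stg.orders")))
print(sorted(lineage("mart.dim_customer")))
```

If a transformation is hidden in a stored procedure and therefore missing from `FEEDS`, the traversal silently stops at that point, which is why incomplete metadata compromises impact analysis.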

End-to-end Metadata Use: Impact Analysis
[Figure: impact analysis across the DWH layers (data sources, landing zone, staging area, subject matter areas, reusable measures & dimensions areas, reporting/OLAP/data mining, web/app servers, GUI, metadata management); components are annotated with percentages: 19%, 25%, 11%, 10%, 10%, 9%, 8%, 1%, 3%, 4%.]

Metadata Use: Runtime
Metadata use depends on the kind of user access:
- ad-hoc querying: the user must first formulate a query
  - use metadata to determine the data to analyze and query (data models, semantic layer)
  - possibly use a metadata search engine to determine the data to query
- reporting: users consume generated reports
  - use metadata about reports (report metadata, semantic layer)
- OLAP: querying and analyzing multidimensional data structures
  - data models, KPI definitions

Metadata Use: Runtime (2)
Report on various aspects of the DWH (metadata reporting):
- DWH content: DWH data models, tables, table rows, transformations, reports, ...
- DWH processing: report on DWH load processes (number of load jobs, number of feeder files processed, number of rows loaded, etc.)
- DWH usage: number of reports consumed, (number of) queries, the n most complex queries, etc.
- DWH security: number of users, roles, privileges, number of logins, number of failed login attempts, ...
- data quality reports: number of DQ rules, checks, and DQ issues
- DWH performance: database size, response times, storage and server utilization, ...
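
A processing report of this kind is essentially an aggregation over runtime metadata. The sketch below assumes a hypothetical load log with one record per job execution; the field names and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical runtime metadata: one record per load job execution.
LOAD_LOG = [
    {"job": "load_customer", "rows": 1200, "status": "ok"},
    {"job": "load_orders", "rows": 54000, "status": "ok"},
    {"job": "load_orders", "rows": 0, "status": "failed"},
]


def processing_report(log):
    """Summarize load processing: runs, failed runs, and rows loaded per job."""
    rows_per_job = defaultdict(int)
    for entry in log:
        if entry["status"] == "ok":
            rows_per_job[entry["job"]] += entry["rows"]
    return {
        "runs": len(log),
        "failed_runs": sum(e["status"] == "failed" for e in log),
        "rows_loaded": dict(rows_per_job),
    }


print(processing_report(LOAD_LOG))
```

The same pattern applies to the usage, security, and data quality reports: each one aggregates a different kind of runtime or usage metadata.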


Metadata Management Architecture: Centralized
- single, centralized metadata management (system)
- usually only feasible if all components are from the same vendor (vs. best of breed)
[Figure: development tool, administration tool, ETL tool, and analysis tool all connected to one metadata manager with a single repository.]

Metadata Management Architecture: Decentralized
- multiple metadata management systems (e.g., tool-specific)
- bilateral metadata exchange (max. n*(n-1)/2 interfaces)
[Figure: development tool, administration tool, ETL tool, and analysis tool, each with its own metadata manager (MDM) and repository.]

Metadata Management Architecture: DWH-like
- "DWH approach": autonomous metadata management systems and repositories are responsible for local tasks
- metadata are integrated in a "global" metadata DWH, which is responsible for all tasks requiring integrated metadata
[Figure: development tool, administration tool, ETL tool, and analysis tool, each with its own metadata manager (MDM) and repository, all feeding one global metadata manager with its own repository.]
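
The interface counts behind the decentralized and the DWH-like architecture can be compared with a short calculation. The bound n*(n-1)/2 for bilateral exchange is taken from the decentralized slide; the hub count of one interface per local repository is an assumption made here for illustration.

```python
def bilateral_interfaces(n: int) -> int:
    """Maximum number of pairwise exchange interfaces among n repositories."""
    return n * (n - 1) // 2


def hub_interfaces(n: int) -> int:
    """Assumed count for the DWH-like approach: one feed per local repository."""
    return n


for n in (4, 10):
    print(n, bilateral_interfaces(n), hub_interfaces(n))
```

For the four tools shown in the figures, bilateral exchange needs up to 6 interfaces versus 4 feeds into a global repository; with 10 tools the gap widens to 45 versus 10.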

Metadata Management Architecture: DWH-like (2)
[Figure: the metadata DWH mirrors the layered DWH architecture (landing zone, staging area, subject matter areas, reusable measures & dimensions areas, reporting/OLAP/data mining, web/app servers, GUI). Metadata sources are the design/data modeling tool, the ETL tool, and the DBS; the subject matter areas hold data models, ETL, data quality, and business terms; the top layer provides metadata reports and analysis, e.g., number of data models, data model mismatches, stability.]

Example Source Metadata: Schema of a DB Repository
- the database catalog contains database metadata
- the schema of the catalog defines metadata structures (meta meta model)
- DB2: schema syscat, contains ~70 catalog views
- Oracle: schema of user sys, owns > 3000 views
[Figure: catalog schema with the entities table, column, constraint, index, and trigger.]
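
Reading source metadata from a database catalog can be tried out without a DB2 or Oracle installation. The sketch below uses SQLite from the Python standard library as a stand-in; its catalog (`sqlite_master` and `PRAGMA table_info`) is much smaller than syscat or the Oracle dictionary, and the sample table and index are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")

# The catalog table sqlite_master lists the schema objects (tables, indexes, ...).
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(objects)
print(columns)
conn.close()
```

A metadata repository would extract such catalog rows automatically and store them as source data models, in line with the automated-capture requirement.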

Metadata about Metadata: Schema of a Repository
[Figure: schema of a metadata repository; see Marco 2000.]

Summary
- Metadata are created and used in all phases and layers of data warehousing
- Complete and adequate metadata management is a success factor
- Integrated metadata management can be a challenge in a heterogeneous environment (i.e., best of breed)


Metadata Standards
- Open Information Model (OIM)
  - version 1.0 defined in 1999 by the Metadata Coalition (MDC)
- Common Warehouse Metamodel (CWM)
  - first version defined in 1999 by the Object Management Group (OMG)
  - simple exchange of DWH metadata between tools and repositories
  - modularity, so that only the relevant parts of the model need to be implemented

CWM: Structure
- CWM allows the representation of metadata about ...
  - sources, targets, and transformations
  - analyses
  - processes and operations that create and manage warehouse data and enable lineage of its use
- CWM is based on UML
  - extensive reuse of the Object Model (part of UML)
- CWM uses UML packages and a hierarchical package structure for reasons of complexity, comprehensibility, and reusability

CWM: Structure (2)
- each package can reference packages of the same layer or of lower layers

CWM: Layer "Foundation"
"Foundation" offers CWM-specific services to other packages on higher layers:
- Data Types: classes and associations for the definition of data types
- Expressions: classes and associations for the representation of expression trees
- Keys and Indexes: classes and associations that represent keys and indexes
- Software Deployment: classes and associations that represent how software is deployed in a DWH
- Type Mapping: classes and associations for the mapping of data types between different systems

CWM: Layer "Resource"
- Relational: metadata of relational systems
- Record: metadata of record-oriented systems
- Multidimensional: metadata of multidimensional systems
- XML: metadata of XML data

CWM: Layer "Analysis"
- Transformation: metadata about transformations (from transformation tools)
- OLAP: metadata from OLAP tools
- Data Mining: metadata from data mining tools
- Information Visualization: metadata from information visualization tools
- Business Nomenclature: metadata about business taxonomies and glossaries

CWM: Layer "Management"
- Warehouse Process: metadata about DWH processes
- Warehouse Operation: metadata about DWH operations (results)
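
The layering rule stated earlier (a package may reference packages of the same layer or of lower layers) can be expressed as a simple check. This is a sketch only: the layer order follows the order in which the slides present the layers, only a subset of the packages is listed, and the function name `may_reference` is an illustrative assumption rather than part of CWM.

```python
# Layers from lowest to highest, in the order the slides present them.
LAYERS = ["Foundation", "Resource", "Analysis", "Management"]

# A subset of the CWM packages named on the slides, with their layer.
PACKAGE_LAYER = {
    "Data Types": "Foundation",
    "Expressions": "Foundation",
    "Keys and Indexes": "Foundation",
    "Relational": "Resource",
    "Multidimensional": "Resource",
    "Transformation": "Analysis",
    "OLAP": "Analysis",
    "Warehouse Process": "Management",
    "Warehouse Operation": "Management",
}


def may_reference(source: str, target: str) -> bool:
    """True if the target package is on the same layer as source or a lower one."""
    rank = LAYERS.index
    return rank(PACKAGE_LAYER[target]) <= rank(PACKAGE_LAYER[source])


print(may_reference("OLAP", "Multidimensional"))  # True: Analysis -> Resource
print(may_reference("Relational", "OLAP"))        # False: Resource -> Analysis
```

Because dependencies only point sideways or downward, a tool can implement a lower layer such as "Resource" without the packages above it, which is what makes the modular adoption of CWM possible.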

CWM: "Multidimensional" Package
- generic representation of a multidimensional database

CWM: "OLAP" Package
- generic representation of OLAP concepts