Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling



Similar documents
Data Warehousing Systems: Foundations and Architectures

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann

IST722 Data Warehousing

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Data Vault and The Truth about the Enterprise Data Warehouse

Data Vault at work. Does Data Vault fulfill its promise? GDF SUEZ Energie Nederland

Understanding Data Warehousing. [by Alex Kriegel]

Data Warehouse Overview. Srini Rengarajan

SENG 520, Experience with a high-level programming language. (304) , Jeff.Edgell@comcast.net

Introduction to Data Vault Modeling

SAS BI Course Content; Introduction to DWH / BI Concepts

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

Designing a Dimensional Model

Pentaho Kettle Solutions. Building Open Source ETL Solutions with Pentaho Data Integration

Presented by: Jose Chinchilla, MCITP

East Asia Network Sdn Bhd

Microsoft Data Warehouse in Depth

Part 22. Data Warehousing

A Service-oriented Architecture for Business Intelligence

Data Warehousing and Data Mining

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Lection 3-4 WAREHOUSING

Establish and maintain Center of Excellence (CoE) around Data Architecture

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Master Data Management. Zahra Mansoori

MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Implementing a Data Warehouse with Microsoft SQL Server

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

B. 3 essay questions. Samples of potential questions are available in part IV. This list is not exhaustive it is just a sample.

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Master Data Management and Data Warehousing. Zahra Mansoori

Data Testing on Business Intelligence & Data Warehouse Projects

SAS Business Intelligence Online Training

Data Warehousing with Oracle

資 料 倉 儲 (Data Warehousing)

Chapter 5. Learning Objectives. DW Development and ETL

Building an Effective Data Warehouse Architecture James Serra

Implementing a Data Warehouse with Microsoft SQL Server

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

MDM and Data Warehousing Complement Each Other

Implementing a Data Warehouse with Microsoft SQL Server MOC 20463

COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

Traditional BI vs. Business Data Lake A comparison

When to consider OLAP?

Research on Airport Data Warehouse Architecture

6NF CONCEPTUAL MODELS AND DATA WAREHOUSING 2.0

Data Warehouse Modeling Industry Models

Class News. Basic Elements of the Data Warehouse" 1/22/13. CSPP 53017: Data Warehousing Winter 2013" Lecture 2" Svetlozar Nestorov" "

Implementing a Data Warehouse with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

THE TECHNOLOGY OF USING A DATA WAREHOUSE TO SUPPORT DECISION-MAKING IN HEALTH CARE

Microsoft. Course 20463C: Implementing a Data Warehouse with Microsoft SQL Server

Data Search. Searching and Finding information in Unstructured and Structured Data Sources

Week 3 lecture slides

The Data Warehouse ETL Toolkit

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998

Data Warehouse: Introduction

Data Vault & Pentaho in Healthcare. Kasper de Graaf, Aly Hollander

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days

POLAR IT SERVICES. Business Intelligence Project Methodology

Course Outline. Module 1: Introduction to Data Warehousing

Monitoring Genebanks using Datamarts based in an Open Source Tool

What is Management Reporting from a Data Warehouse and What Does It Have to Do with Institutional Research?

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Data warehouses. Data Mining. Abraham Otero. Data Mining. Agenda

Implementing a Data Warehouse with Microsoft SQL Server 2012

Data warehouse Architectures and processes

Course 20463:Implementing a Data Warehouse with Microsoft SQL Server

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

Dimensional Modeling for Data Warehouse

SQL Server 2012 Business Intelligence Boot Camp

Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463)

Data Warehouse (DW) Maturity Assessment Questionnaire

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Data Warehousing and Data Mining in Business Applications

Reflections on Agile DW by a Business Analytics Practitioner. Werner Engelen Principal Business Analytics Architect

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Data Warehousing. SQL Server 2008 R2, Denali

Data Warehousing Concepts

14. Data Warehousing & Data Mining

INFORMATION TECHNOLOGY STANDARD

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University

OBIEE 11g Data Modeling Best Practices

Implementing a Data Warehouse with Microsoft SQL Server 2012

Transcription:

Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling

Thanks for Attending! Roland Bouman, Leiden the Netherlands MySQL AB, Sun, Strukton, Pentaho (1 nov) Web- and Business Intelligence Developer author: Pentaho Solutions Pentaho Kettle Solutions Http://rpbouman.blogspot.com/ Twitter: @rolandbouman

Data Warehouse (DWH) Support Business Intelligence (BI) Reporting Analysis Data mining General Requirements Integrate disparate data sources Maintain History Calculate Derived data Data delivery to BI applications

DWH Architectures Categories Traditional Hybrid Modern Aspects Modelling Data logistics

DWH Architectures Traditional Information Factory (Bill Inmon) Enterprise Bus (Ralph Kimball) Hybrid Modern

DWH Architectures Traditional Hybrid Modern Hub-and-Spoke

DWH Architectures Traditional Hybrid Modern Data Vault (Dan Linstedt) Anchor Modeling (Lars Rönnbäck)

Inmon DWH (Traditional): Corporate Information Factory A source of data that is subject oriented, integrated, nonvolatile and time variant for the purpose of management's decision processes. Bill Inmon (the Data Warehouse Toolkit) http://www.inmoncif.com/home/

Inmon DWH (Traditional): Corporate Information Factory Enterprise or Corporate DWH, DWH 2.0 Focus on backroom data integration Central information model Single version of the truth Data delivery Disposable data marts Bottom-up

Data logistics of the Corporate Information Factory OLTP DB Files Staging Extract Transform Load Enterprise Data Warehouse Extract Transform Load OLAP DB Cube Source Data Marts BI Apps

Data Modeling for the CIF Enterprise DWH Normalized, typically 3NF Organized in subject areas Series of related tables Example: Customer, Product, Transaction Common key

Data Modeling for the CIF Enterprise DWH History PK includes a date/timepart Contains both detail and aggregate data Multiple levels of aggregation

Kimball DWH (Traditional): Dimensional Model and DWH Bus Architecture The data warehouse is the conglomeration of an organization's staging and presentation areas, where operational data is specifically structured for query and analysis performance and ease of use. Ralph Kimball (the Data Warehouse Toolkit) http://www.kimballgroup.com/

Kimball DWH (Traditional): DWH Bus Architecture Focus on data delivery Integration at the data mart level Top-down

Data logistics of the DWH Bus Architecture OLTP DB Files Staging Extract Transform Load (Enterprise) Data Warehouse OLAP DB Cube Source EDW is a collection Data Marts BI Apps

Data Modeling for the DWH Bus Architecture Dimensional Modeling Star schemas Organized in: Fact tables Dimension tables

Data Modeling for the DWH Bus Architecture Fact tables Highly normalized Additive metrics Dimension tables Highly denormalized Descriptive labels Shared across fact tables

Data Modeling for the DWH Bus Architecture History Slowly changing dimensions (versioning) Fact links to Date and/or Time dimensions Detailed, not aggregated

Sakila Rental Star Schema

Sakila DWH Bus Architecture dim_film dim_date fact_inventory fact_rental fact_payment dim_store dim_staff dim_customer

Problems with traditional DWH architectures General Problems Lack of flexibility and resilience to change Loading (ETL) Complexity Problems with Inmon Centralization requires upfront investment Single version of whose truth, when? Problems with Kimball Dimensional Model anomalies

Dimensional Modeling Anomalies Snowflaking (dimension normalization) Monster dimensions Outriggers Ex: Customer Demographics Hierarchical data Bridge table (closure table) Ex: Employee/Boss, Multi-valued dimensions Bridge table Ex: Account/Customer bridge table

Hybrid DWH: Hub-and-Spoke Inmon back-end (hub) Kimball front-end (satellites)

Modern: Data Vault The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that supports one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. Dan Linstedt (Data Vault Overview) http://danlinstedt.com/

Data Vault Focus on Data Integration Traceability and Auditability Resilience to change Single version of the facts Rather than single version of the truth All of the data, all of the time No upfront cleansing and conforming Bottom-up

Data Vault Modelling Hubs Links Satellites

Data Vault Modelling: Hubs Hubs Model Entities Contains business keys PK in absence of surrogate key Metadata: Record source Load date/time Optional surrogate key Used as PK if present No foreign keys!

Data Vault Modelling: Links Links model relationships Intersection table (M:n relationship) Foreign keys to related hubs or links Form natural key (business key) of the link Metadata: Record source Load date/time Optional surrogate key

Data Vault Modelling: Satellites Satellites model a group of attributes Foreign key to a Hub or Link Metadata: Record source Load date/time

Sakila Data Vault Example

Data Vault tools and Example Kettle Data Vault Example Sakila Data Vault Chapter 19 Kasper van de Graaf http://www.dikw-academy.nl Quipu Data Vault Generator Kettle templates Johannes van den Bosch http://www.datawarehousemanagement.org/

Modern: Anchor model Anchor Modeling is an agile information modeling technique that offers nondestructive extensibility mechanisms enabling robust and flexible management of changes. A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions, not modications. Lars Rönnbäck (Agile Information Modeling in Evolving Data Environments) http://www.anchormodeling.com/

Anchor Modelling Focus on Resilience to change Agility Extensibility History tracking Bottom-up

Anchor Modelling 6NF (Date, Darwen, Lorentzos) Table features no non-trivial join dependencies at all Translation: A 6NF table cannot be decomposed losslessly Translation Temporal Data

Anchor Modelling Constructs Anchors Attributes Ties Knots

Anchor Modelling: Anchors Entities are modeled as Anchors Relationships may be modeled as Anchors m:n relationships having properties Only a surrogate key

Anchor Modelling: Ties Ties model relationships 1:n relationships m:n relationships without properties Static vs Historized History tracked using date/time May be Knotted Knot holds set of association types Two or more anchor roles Relationships may be broken into several ties having only mandatory anchors

Anchor Modelling: Attributes Models properties of an Anchor Static vs Historized History tracked using date/time May or not be Knotted Knot holds set of valid attribute values

Anchor Modelling: Knots Reference table Fairly small set of distinct values Dictionary lookup to qualify Attributes Ties Knotted Attributes and Ties

Anchor Model Diagram anchor knot Static attribute Historized attribute Static tie Historized tie http://www.anchormodeling.com/modeler/latest/

Aknowledgements Kasper de Graaf Twitter: @kdgraaf http://www.dikw-academy.nl Jos van Dongen Twitter: @josvandongen http://www.tholis.com/