Data Vault Modeling The Next Generation DW Approach



Similar documents
Business Analytics For All

Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Modeling: Operational, Data Warehousing & Data Marts

Main Memory Data Warehouses

Understanding Data Warehousing. [by Alex Kriegel]

Data Warehouse: Introduction

Data Vault and The Truth about the Enterprise Data Warehouse

Data Warehouse Overview. Srini Rengarajan

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann

Lection 3-4 WAREHOUSING

Tiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley

IST722 Data Warehousing

EMC CUSTOMER UPDATE. 31 mei 2011 Fort Voordorp. Bart Sjerps. Greenplum Data Warehouse. Copyright 2011 EMC Corporation. All rights reserved.

2009 Oracle Corporation 1

SQL Server PDW. Artur Vieira Premier Field Engineer

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Introduction to Data Vault Modeling

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Indexing Techniques for Data Warehouses Queries. Abstract

Big Data Technologies Compared June 2014

Data Vault at work. Does Data Vault fulfill its promise? GDF SUEZ Energie Nederland

Dimensional Modeling 101. Presented by: Michael Davis CEO OmegaSoft,LLC

Getting Value from Big Data with Analytics

Data Warehousing Systems: Foundations and Architectures

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998

Data Search. Searching and Finding information in Unstructured and Structured Data Sources

Data Vault Modeling in a Day

Master Data Management and Data Warehousing. Zahra Mansoori

SAP Real-time Data Platform. April 2013

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

<Insert Picture Here> Oracle Exadata Database Machine Overview

How To Build An Exadata Database Machine X2-8 Full Rack For A Large Database Server

Building an Effective Data Warehouse Architecture James Serra

Concept and Applications of Data Mining. Week 1

INFORMATICA WORLD TOUR August 2012

Big Data and Its Impact on the Data Warehousing Architecture

Focus on the business, not the business of data warehousing!

Is Business Intelligence an Oxymoron?

Presented by: Jose Chinchilla, MCITP

Inge Os Sales Consulting Manager Oracle Norway

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

Data Warehouse (DW) Maturity Assessment Questionnaire

Endeca Introduction to Big Data Analytics

Data warehousing with PostgreSQL

Data Warehousing Fundamentals for IT Professionals. 2nd Edition

Why Business Intelligence

Oracle Database 12c Plug In. Switch On. Get SMART.

Implementing a SQL Data Warehouse 2016

Business Intelligence: How to Unleash the Real Power of Microsoft s Fast Track and PDW Appliances

Designing a Dimensional Model

<Insert Picture Here> Real-Time Data Integration for BI and Data Warehousing

Introducing Oracle Exalytics In-Memory Machine

Business Intelligence. 1. Introduction September, 2013.

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

Introduction to Database as a Service

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

Data Warehousing with Oracle

Building a Cube in Analysis Services Step by Step and Best Practices

MICHAEL SCHMITZ NOVEMBER 20-22, 2006 NOVEMBER 23-24, 2006 RESIDENZA DI RIPETTA - VIA DI RIPETTA, 231 ROME (ITALY)

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

BIG DATA APPLIANCES. July 23, TDWI. R Sathyanarayana. Enterprise Information Management & Analytics Practice EMC Consulting

BUILDING OLAP TOOLS OVER LARGE DATABASES

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Master Data Management. Zahra Mansoori

Real Life Performance of In-Memory Database Systems for BI

SUN ORACLE DATABASE MACHINE

BIPM H6001: Bus Intel & Process Modelling

資 料 倉 儲 (Data Warehousing)

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

Real-Time Data Warehousing & Fraud Detection with Oracle 11gR2. Dr.-Ing. Holger Friedrich

Cost-Effective Business Intelligence with Red Hat and Open Source

BIG DATA & the Data Warehouse

IBM WebSphere DataStage Online training from Yes-M Systems

Data warehouse Architectures and processes

Data Warehousing and Data Mining

The Technology Evaluator s Cheat Sheets. Business Intelligence & Analy:cs

Driving Peak Performance IBM Corporation

Exadata in the Retail Sector

Data Warehouse Modeling Industry Models

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

Dimensional Modeling for Data Warehouse

Business Intelligence, Data warehousing Concept and artifacts

Oracle Architecture, Concepts & Facilities

Life Cycle of a Data Warehousing Project in Healthcare

Lecture Data Warehouse Systems

The Data Webhouse. Toolkit. Building the Web-Enabled Data Warehouse WILEY COMPUTER PUBLISHING

Data Warehousing. SQL Server 2008 R2, Denali

Aaron Werman.

Establish and maintain Center of Excellence (CoE) around Data Architecture

Data warehouse design

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

Introducing Oracle Data Integrator and Oracle GoldenGate Marco Ragogna

2010 Ingres Corporation. Interactive BI for Large Data Volumes Silicon India BI Conference, 2011, Mumbai Vivek Bhatnagar, Ingres Corporation

Data Warehouse Architecture Overview

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Transcription:

Data Vault Modeling The Next Generation DW Approach Presented by: Siva Govindarajan http://www.globytes.com siva02@globytes.com 1

Data Vault Modeling Agenda Introduction Data Vault Place in Evolution Data Warehousing Architecture Data Vault Components Case Study Typical 3NF / Star Schema Data Vault Approach Temporal Data + Auditability Implementation of Data Vault Conclusion 2

Introduction Data Vault concept originally conceived by Dan Linstedt Enterprise Data Warehouse Modeling approach. Hybrid approach - 3NF & Star Schema Adaptable to changes Auditable Timed right with Technology 3

Data Vault Place in Evolution 1960s - Codd, Date et. al Normal Forms 1980s - Normal Forms adapted to DWs 1985+ - Star Schema for OLAP 1990s - Data Vault concept developed Dan Linstedt 2000+ - Data Vault concept published by Dan Linstedt 4

Data Warehousing Architecture Bill Inmon s Data Warehouse Architecture OLTP Systems Staging & ODS Enterprise DW Staging (optional) Data Mart Data Mart Data Marts Data Marts Adaptive Marts Kimball s Data Warehouse Bus Architecture Data Mart Data Mart OLTP Systems Staging Data Mart Data Mart 5

Data Vault Components Hub Entities Candidate Keys + Load Time + Source Link Entities FKs from Hub + Load Time + Source Satellite Entities Descriptive Data + Load Time + Source + End Time Dimension = Hub+ Satellite Fact = Satellite + Link [+ Hub ] 6

Case Study Modeling Approaches Typical 3NF / Star Schema Data Vault approach Scenario Typical Fact and Dimension New Dimension data Reference Data change - 1:M to M:M Additional attributes to Dimension 7

Typical 3 F / Star Schema 1. Typical Fact and Dimension TR ORDER tr_order_id dim_product_id system order id system root order id global root order id order quantity total executed qty event timestamp DIM PRODUCT dim_product_id product_id product_code product desc product_name le ngth 8

Typical 3 F / Star Schema 2. ew Dimension Data TR ORDER tr_order_id dim_product_id system order id system root order id global root order id order quantity total executed qty event timestamp DIM PRODUCT dim_product_id product_id product_code product desc product_name product_category_id product_category_name product_category_code product_category_desc is_strategy REF PRODUCT CAT EGORY product_category_id product_category_name product_category_code product_category_desc is_strategy 9

Typical 3 F / Star Schema 2. ew Dimension Data TR ORDER tr_order_id dim_product_id system order id system root order id global root order id order quantity total executed qty event timestamp d im_p ro d uct_ca te g o ry_id DIM PRODUCT dim_product_id product_id product_code product desc product_name le ng th DIM PRODUCT CAT EGORY d im_p ro d uct_ca te g o ry_id product_category_code product_category_name product_category_desc is_strategy namespace REF PRODUCT CAT EGORY product_category_id product_category_name product_category_code product_category_desc is_strategy 10

Typical 3 F / Star Schema 3. Reference 1:M to M:M Change TR ORDER tr_order_id dim_product_id system order id system root order id global root order id order quantity total executed qty event timestamp d im_p ro d uct_ca te g o ry_id order owner firm id? DIM PRODUCT dim_product_id product_id product_code product desc product_name le ng th DIM PRODUCT CAT EGORY d im_p ro d uct_ca te g o ry_id product_category_code product_category_name product_category_desc is_strategy namespace? REF PRODUCT CAT EGORY product_category_id product_category_name product_category_code product_category_d esc is_strategy 11

Typical 3 F / Star Schema 4. Additional attributes to Dimension TR ORDER tr_order_id dim_product_id system order id system root order id global root order id order quantity total executed qty event timestamp dim_p rod uct_ca teg ory_id order owner firm id DIM PRODUCT dim_product_id product_id product_code product desc product_name le ng th he ig ht we ig ht p a cka ging co lo r size 12

Data Vault Approach 1. Typical Hub, Satellite and Link (Replacing Fact/Dimension) LNK_ORDER lnk_o rd e r_id lnk_lo a d _d a te lnk_re co rd _so urce hub _p ro d uct_id hub _o rd e r_id HUB_PRODUCT hub_p ro duct_id hub_re co rd _so urce hub_lo a d _d a te product_id product_code HUB_ORDER hub _o rd e r_id hub _reco rd_so urce hub _lo a d _da te tr_order_id system order id system root order id global root order id SAT _PRODUCT 01 sa t_pro d uct01_id hub _p ro d uct_id sa t_lo a d _d a te sa t_end _da te sa t_re co rd _so urce product desc product_name symbol SAT ORDER01 sa t_o rd er01_id hub _o rd e r_id sa t_lo a d _d a te sa t_e nd_d a te sa t_re co rd _so urce order quantity total executed qty event timestamp 13

Data Vault Approach 2. ew Dimension Data LNK_ORDER lnk_order_id lnk_load_d ate lnk_reco rd_source hub_pro duct_id hub_ord er_id LNK_PRODUCT _X_CAT EGORY lnk_p rod uct_x_ca te gory_id lnk_loa d_date lnk_record _source hub_product_id hub_ca te gory_id HUB_PRODUCT hub_product_id hub_reco rd_source hub_load_d ate product_id product_code HUB_ORDER hub _o rde r_id hub _record_source hub _loa d_date tr_order_id system order id system root order id global root order id HUB_CAT EGORY hub_categ ory_id hub_record_source hub_lo ad_date product_category_id product_category_code SAT_PRODUCT01 sat_product01_id hub_pro duct_id sat_load_da te sat_end_date sat_re cord_so urce product desc product_name symbol SAT ORDER01 sat_o rd er01_id hub _o rde r_id sat_lo ad_date sat_e nd _d ate sat_record_source order quantity total executed qty event timestamp SAT _CAT EGORY01 sa t_category01_id hub_ca te gory_id sa t_load_da te sa t_end_date sa t_re co rd_source product_category_name product_category_desc is_strategy 14

LNK_ORDER lnk_order_id lnk_load_d ate lnk_reco rd_source hub_pro duct_id hub_ord er_id Data Vault Approach 3. Reference 1:M to M:M Change HUB_PRODUCT hub_product_id hub_reco rd_source hub_load_d ate product_id product_code HUB_ORDER hub _o rde r_id hub _record_source hub _loa d_date tr_order_id system order id system root order id global root order id SAT_PRODUCT01 sat_product01_id hub_pro duct_id sat_load_da te sat_end_date sat_re cord_so urce product desc product_name symbol SAT ORDER01 sat_o rd er01_id hub _o rde r_id sat_lo ad_date sat_e nd _d ate sat_record_source order quantity total executed qty event timestamp LNK_PRODUCT _X_CAT EGORY lnk_p rod uct_x_ca te gory_id lnk_loa d_date lnk_record _source hub_product_id hub_ca te gory_id HUB_CAT EGORY hub_categ ory_id hub_record_source hub_lo ad_date product_category_id product_category_code SAT _CAT EGORY01 sa t_category01_id hub_ca te gory_id sa t_load_da te sa t_end_date sa t_re co rd_source product_category_name product_category_desc is_strategy 15

Data Vault Approach 4. Additional attributes to Dimension SAT_PRODUCT01 sat_product01_id HUB_PRODUCT hub _p roduct_id hub _re co rd _source hub _lo ad_date product_id product_code hub _product_id sat_load_date sat_end _d ate sat_re cord_source product desc product_name SAT _PRODUCT 02 sat_pro duct02_id hub_product_id sat_load_d ate sat_end _date sat_record_so urce length height weight packaging color 16

Temporal Data + Auditability SAT _PRODUCT01 sa t_p ro duct01_id HUB_PRODUCT hub_prod uct_id sa t_loa d_da te sa t_e nd_d ate sa t_record_source product desc product_name hub_p ro duct_id hub_record_source hub_loa d_da te product_id product_code SAT _PRODUCT 02 sa t_p ro duct02_id hub_prod uct_id sa t_loa d_da te sa t_e nd_d ate sa t_record_source length height weight packaging color LNK_PRODUCT_X_CATEGORY lnk_prod uct_x_catego ry_id lnk_load_date lnk_record_source hub _product_id hub _catego ry_id LNK_ORDER lnk_o rd er_id lnk_load _date lnk_reco rd _source hub_orde r_id hub_prod uct_id hub_categ ory_id HUB_ORDER hub_orde r_id hub_reco rd _source hub_load _date tr_order_id system order id system root order id global root order id SAT ORDER01 sa t_o rd er01_id hub_orde r_id sa t_loa d_da te sa t_e nd_d ate sa t_record_source order quantity total executed qty event timestamp source_system SAT _CAT EGORY01 HUB_CATEGORY sat_catego ry01_id hub_cate gory_id hub_record_source hub_loa d_da te product_category_id product_category_code hub_cate gory_id sat_lo ad_d ate sat_end _date sat_record_source product_category_name product_category_desc is_strategy 17

Implementation of Data Vault Make it Simple View for Dimension View For Fact Data Loading Take advantage of Technology DW Appliances High-throughput storage devices RDBMS Features 18

Make it Simple Create views for Dimension From our Demo model, Join Product related Hubs and Satellites to build View for Product Dimension as shown below : HUB_PRODUCT SAT_PRODUCT01 SAT_PRODUCT02 19

Make it Simple Create Views For Fact Join Order related Hubs, Satellites and Links to build View for ORDER Fact using ORDER related Hubs/ Satellites and Links. HUB_ORDER SAT_ORDER01 LNK_ORDER 20

Data Loading Typically data loading for a Data Vault is in the following sequence. Hubs for Dimensions Links for Dimensions Satellites for Dimensions Hubs for Fact (if any ) Links for Fact Satellites for Fact Refer to Data Vault Series article 5 of Linstedt for further details 21

DW Appliances Next wave of IT Data Warehouse hardware solutions are the Data Warehouse Appliances. Take advantage of the technology where possible. Few DW Appliances in the market : Oracle Exadata Netezza Teradata RedBrick GreenPlum 22

High-throughput storage devices Utilize Storage devices designed for DW / OLAP applications. Following are some examples only and would change over time. Please do your research based on your sizing requirements. EMC CLARiiON Storage family Texas Memory systems RamSan Solid State devices Hitachi USP / VSP Storage family Other Hybrid solutions with High performance storage and Solid State devices 23

RDBMS Features Some Oracle Features catering DW / OLAP applications: Exadata Smart Scans OLAP Based Materialized views Partitioning Reference partition Advanced data compression Automatic degree of parallelism (ADOP). Star query optimization Oracle OLAP/DWH Features. 24

Conclusion In this presentation we have addressed high level overview of Data Vault Architecture. Topics discussed are Data Vault concept and architecture Data Vault Components such as Hubs, Satellites and Link tables Typical modeling challenges with traditional modeling approaches How those challenges could be handled using Data Vault Modeling Approach. Auditing and Temporal data capture using DV Approach. And finally, some implementation details If interested in learning more, try Dan Linstedt's special coaching area at: http://danlinstedt.com/my-coach. 25

Questions?? 26

References Data Vault Series articles by Dan E. Linstedt -- http://www.tdan.com Referred few articles from Genesee Academy -- http://geneseeacademy.com Articles and books by Ralph Kimball and Bill Inmon from multiple sources. Technical Documents from http://www.oracle.com/technetwork 27

Special Thanks to Dan Linstedt for review and corrections to the article. My colleague Mark Bruscke for his assistance in preparing the article. 28

The End Contact Email : siva02@globytes.com 29