Designing a Dimensional Model



Similar documents
BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

Optimizing Your Data Warehouse Design for Superior Performance

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

Multidimensional Modeling - Stocks

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Presented by: Jose Chinchilla, MCITP

Dimensional Data Modeling for the Data Warehouse

Kimball Dimensional Modeling Techniques

Business Intelligence, Data warehousing Concept and artifacts

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Fluency With Information Technology CSE100/IMT100

Data W a Ware r house house and and OLAP II Week 6 1

Data Warehousing and Data Mining

DATA WAREHOUSING AND OLAP TECHNOLOGY

Microsoft Data Warehouse in Depth

When to consider OLAP?

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

Data Warehousing, OLAP, and Data Mining

DIMENSIONAL MODELLING

This tutorial is designed for those who want to learn the basics of OBIEE and take advantage of its features to develop quality BI reports.

The Benefits of Data Modeling in Data Warehousing

University of Gaziantep, Department of Business Administration

Avoiding Common Analysis Services Mistakes. Craig Utley

BUILDING A WEB-ENABLED DATA WAREHOUSE FOR DECISION SUPPORT IN CONSTRUCTION EQUIPMENT MANAGEMENT

DATA WAREHOUSING - OLAP

Analysis Services Step by Step

Data Warehousing Systems: Foundations and Architectures

F4: DW Architecture and Lifecycle. Erik Perjons, DSV, SU/KTH Data warehouse

Data Warehousing. SQL Server 2008 R2, Denali

Business Intelligence & Product Analytics

IST722 Data Warehousing

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

The Microsoft Business Intelligence 2010 Stack Course 50511A; 5 Days, Instructor-led

CHAPTER 4 Data Warehouse Architecture

<Insert Picture Here> Oracle Retail Data Model Overview

Basics of Dimensional Modeling

Monitoring Genebanks using Datamarts based in an Open Source Tool

Part 22. Data Warehousing

Dimensional Modeling 101. Presented by: Michael Davis CEO OmegaSoft,LLC

Understanding Data Warehousing. [by Alex Kriegel]

Building Scalable & High Performance Datamarts with MySQL

Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage

Business Intelligence, Analytics & Reporting: Glossary of Terms

THE TECHNOLOGY OF USING A DATA WAREHOUSE TO SUPPORT DECISION-MAKING IN HEALTH CARE

Week 13: Data Warehousing. Warehousing

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Building an Effective Data Warehouse Architecture James Serra

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

Course chapter 9: Data Warehousing Dimensional modeling. F. Radulescu - Data warehousing - Dimensional modeling 1

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling

SAS BI Course Content; Introduction to DWH / BI Concepts

TRANSFORMING YOUR BUSINESS

DATA WAREHOUSE BUSINESS INTELLIGENCE FOR MICROSOFT DYNAMICS NAV

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Speeding ETL Processing in Data Warehouses White Paper

Business Benefits From Microsoft SQL Server Business Intelligence Solutions How Can Business Intelligence Help You? PTR Associates Limited

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days

Anwendersoftware Anwendungssoftwares a. Data-Warehouse-, Data-Mining- and OLAP-Technologies. Online Analytic Processing

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

Module 1: Introduction to Data Warehousing and OLAP

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days

SAP BO Course Details

Together we can build something great

Business Intelligence for SUPRA. WHITE PAPER Cincom In-depth Analysis and Review

Technology-Driven Demand and e- Customer Relationship Management e-crm

The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija,

LEARNING SOLUTIONS website milner.com/learning phone

Week 3 lecture slides

Sterling Business Intelligence

Data Testing on Business Intelligence & Data Warehouse Projects

Lection 3-4 WAREHOUSING

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities

By Makesh Kannaiyan 8/27/2011 1

INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option

Justice Data Warehousing and Court Business Intelligence. Technical Introduction. Harris County Courts

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Transcription:

Designing a Dimensional Model Erik Veerman Atlanta MDF member SQL Server MVP, Microsoft MCT Mentor, Solid Quality Learning Definitions Data Warehousing A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of decision-making process. Data Marts R. Kimball - a data mart is a flexible set of data, ideally based on the most atomic (granular) data possible to extract from operational source, and presented in a symmetric (dimensional) model that is resilient when faced with unexpected user queries in its most simplistic form a data mart represent data from a single business process Business process = purchase order, store inventory, etc 1

Definitions OLAP = On-line Analytical Processing A reporting system designed to allow different flexible analysis in real time, on-line, with data structures designed for fast retrieval, with redundancy included to support performance. Note: On-line doesn t indicate data from on-line systems, rather on-the-fly OLAP Basics Drill-down - decreasing the level of aggregation Drill-up/Roll-up - increasing the level of aggregation Drill-across - move between different star-join schemas using conformed dimensions and joins Slicing and dicing ability to look at the database from different views, e.g. one slice shows all sales of product type within regions, another slice shows all sales by sales channel within each product type Pivoting - e.g. change columns to rows, rows to columns Ranking - sorting Definitions Business Intelligence (BI) Forrester definition: A process of transforming data into information and making it available to users in time to make a difference Strategic BI (Examples: Balance scorecard, Strategic Planning) Who: strategic leaders What: formulate strategy and monitor corporate performance Operational BI (Examples: Budgeting, Sales forcasting) Who: operational managers What: execution of strategy againts objectives Analytical BI (Examples: Financial and Sales Analysis, Customer Segmentation, Clickstream analysis) Who: analysts, knowledge worker, controller What: ad-hoc analysis 2

THE Definition Dimensional Modeling The process and outcome of designing logical database schemas created to support OLAP and Data Warehousing solutions Transactional System Emphasis OLTP = On Line Transactional Processing Each transaction must be written to the database in real time, i.e. on line Data structures must enable consistent and fast writing The best consistency and speed can be achieved if each piece of data is written only once Normalized (netlike) relational schema suits these requests 3

Reporting Challenges with OLTP Schema doesn t clearly call out subjects, objects, events, states... Difficult to prepare reports and analysis views Requires multiple joins Indexes not optimized for reporting Models business process, not information Levels show only current state, history is not tracked Data Warehousing, The Solution Schema designed with reporting and analysis in mind With redundant data, specially prepared for analysis, we can do more: Prepare data over time Prepare aggregates Add data from other sources, not only OLTP Sales value shows much more if we know also market capacity and our market share 4

Business Intelligence Architecture Store Costs & Sales (SQL Server) Reporting Applications Warehouse Orders & Costs (Oracle) Star-Schema (SQL Server) BI Solutions Corporate HR & Budget (DB2 & Excel) Cubes (Analysis Services) Corporate Data Warehouse Dimensional Modeling Used by most contemporary BI solutions Right mix of normalization and denormalization often called Dimensional Normalization Some use for full data warehouse design Others use for data mart designs Consists of two primary types of tables Dimension tables Fact tables 5

Dimensional Modeling Dimensional normalization Logical design technique that presents data in an intuitive way allowing high-performance access Targets decision support information Focused on easy user navigation and high performance design (vs.) Transactional normalization Logical design technique to eliminate data redundancy, to keep data consistency, and storage efficiency Makes transactions simple and deterministic ER models for enterprise are usually complex often containing hundreds, or even thousands, of entities/tables Dimension Tables Contain attributes related to business entities Customers, vendors, employees Products, materials, even invoices (attributes!) Dates and sometimes time (hours, mins, etc.) Often employ surrogate keys Defined within the dimensional model Not the same as source system primary, alternate, or business keys Not uncommon to have many, many columns Highly de-normalized to reduce joins 6

Fact Tables Contain numbers and other business metrics Define the basic measures users want to analyze Numbers are then aggregated according to related dimensions Fact tables contain dimension keys Defines relationship between measures and dimensions using surrogate keys Typically narrow tables, but often very large Highly normalized structure Dimensional Model Example 7

Why Dimensional Modeling Logical model is easy to understand Standard framework and business model for end user apps Model can be done (mostly) independent of expected queries Handle changes easy such as adding new dimensional attributes Optimized for performance High performance browsing across the attributes Strategy to handling aggregates, leveraging summary tables or OLAP aggregation technologies Logical redundant with base table to enhance query performance OLAP engines can make strong assumptions on how to optimize Historical tracking of information Strategies for handling changing dimensions Fact design allows high volume snapshots and transaction tracking Dimension Tables! 8

Dimensions Organized hierarchies of categories, levels, and members Used to slice and query within a cube Business perspective from which data is looked upon Collection of text attributes that are highly correlated (e.g. Product, Store, Time) Shared with multiple fact relationships to provides data correlation Dimension Details Attributes Descriptive characteristics of an entity Building blocks of dimensions, describe each instance Usually text fields, with discrete values e.g., the flavor of a product, the size of a product Dimension Keys Surrogate Keys Candidate Business Keys Dimension Granularity Granularity in general is the level of detail of data contained in an entity A dimensions granularity is the lowest level object which uniquely identifies a member Typically the identifying name of a dimension 9

Hierarchies Dimension Keys Dimension Business Key Column or columns that identify a unique instance of the business record (not necessarily a unique record in the dimension table) Used in the ETL process to tie fact records with dimension members Dimension Record Surrogate Key Defines the dimension s primary key Relates to the fact table foreign key field Numeric data type, typically integer (2,4,8 byte) 10

Dimension Surrogate Keys Surrogate Key Usage Consolidates multi-value business keys Allows tracking of dimension history Standardizes dimension tables Limits fact table width for optimization Surrogate Key Design Practices Avoid smart keys Avoid production keys (may change!) the company may acquire a competitor and thereby change the key building rules changed record, but deliberately not changed key Narrow as possible Dimensions Types Basic Dimension Types Advanced Types Standard Star dimension Degenerate Snowflake dimension Profile or Junk dimensions Parent-Child dimensions Role-playing and Outriggers Snowflake Employee The Board SteveB BillG JimAll PaulMa BobMu TodN DavidV Manager <None> The Board The Board SteveB SteveB SteveB PaulMa PaulMa Parent-Child PaulFle DavidV 11

Role-Playing and Time Dimensions Changing Dimensions Problem known as Slowly Changing Dimensions (Ralph Kimball) Common changing types: 0: No change. 1: Not interested in the previous state. Overwrites value. 2: Slow changes. Adds new row. 3: Fast changes. Adds new column 12

SCD Type-1 UPDATE In-Place update Restates history, cannot query old value Locking and contention possible Simple Customer Source Customer ID Customer Name Customer Type Product Line Year Opened AW014 Bike Mart Business, Warehouse All 1996 Customer SK Customer ID Population Reseller Type Reseller Dim Product Line (Type-1) Year Opened 27 AW014 Bike Mart Warehouse All Mountain 1996 SCD Type-2 UPDATE Product Dim Product Source Product SK 462 477 Product ID BK-M82S Product ID BK-M82S BK-M82S Name Yukon Special Name Yukon Special Yukon Special Size 44 Size 44 44 Model All Terrain Model (Type 2) Mountain All Terrain Class Light Frame Class (Type 2) Light Frame Light Frame Track History (versioning) Surrogate Keys required! Start Time End Time 1/1/2003 NULL 6/7/2005 6/7/2005 NULL 13

Where to Start for Dimensions.. Understand dimension hierarchies and drill-paths Confirm historically tracked attribute requirements Check source data integrity, cleanliness, and completeness Review current reports for summarization, roll-up, and grouping Fact Tables! 14

The fact itself Facts The measure that is being tracked Quantity, count, amount, percent Mostly numerical, continuous values e.g., price of a product, quantity sold, number of products in inventory, budget value, count of customers, count of sales, account balance Facts (or measures) can be classified by Numerical data type Aggregation type Additive nature Granularity Facts Fact tables Capture measures/facts Association with dimensions Some tracking information included Different types Transactional Snapshot or inventory Factless Fact Table Granularity The level of detail of data contained in the fact table The description of a single instance (a record) of the fact Typically includes a time level and a distinct combinations of other dimensions e.g. Daily item totals by product, by store, Weekly snapshot of store inventory by product 15

Additive Nature Additive: Facts that can be summed up/aggregated across all of the dimensions in the fact table (e.g., discrete numerical measures of activity, i.e., quantity sold, dollars sold) Semi-Additive: Facts that can be summed up for some of the dimensions in the fact table, but not the others (e.g., account balances, inventory level, distinct counts) Non-Additive: Facts that cannot be summed up for any of the dimensions present in the fact table (e.g., measurement of room temperature) Advanced: Custom Rollups Financial Statement Profit + Net sales + Gross sales - VAT - Discount - Expenses + Infrastructure + Labor + Financing Standard 13000 8300 7000 800 500 4700 1500 2500 700 Custom 1000 5700 7000 800 500 4700 1500 2500 700 Possible, but requires blending of data with meta data in the data warehouse 16

Aggregations (Aggs) A summarization of base-level fact table records Aggregations need to account for the additive nature of the measures, created on-the-fly or by pre-aggregation Common aggregation scenarios Category product aggs by store by day District store aggs by product by day Monthly sales aggs by product by store Category product aggs by store district by day Common aggregations = Sum, Count, Distinct Count, Max, Min, Average, Semi-additive (Last Child, Last Non-empty Child) Fact Table Types Different types of fact tables Transactional Additive facts tracking events over time Snapshot or inventory Pictures in time of levels or balances Factless Dimensionality relationships Combinations! 17

Transactional Fact Tables Most common type of fact table Track the occurrence of events, each detailed event is captured into a row in the fact table Measures are typically additive across all dimensions Common transactional fact table types Sales, Visits, Web-page hits, Account transactions Snapshot Fact Tables Known as inventory level fact tables Snapshot in time of quantities in stock or balances of accounts Time dimension used to identify grain Non additive measures across time, but typically additive across all other dimensions Common transactional fact table types Inventory levels, Event booking levels, Chart of account balance levels 18

Factless Fact Tables No measured facts! Are useful to describe events and coverage Information that something has or has not happened Often used to represent many-to-many relationships Contain only dimension keys Common factless fact tables: Class attendance, Event tracking, Coverage tables, Promotion or campaign facts Design with Additive in Mind Think dimensionally! Complex requirements don t need to be designed with complex queries Many times new fact tables can be designed that can answer specific questions, such as date attributes and ranges 19

Where to start with Fact Tables Identify high-value business process to model (orders, invoices, shipments, inventory) Identify reporting grain of the business process Identify dimensions that apply to each fact table Identify measures that will populate the fact tables Example business questions to listen for: How much total business did my newly remodeled stores do compared with the chain average? How did leather goods items costing less than $5 do with my most frequent shoppers? What was the revenue comparison of non-holiday weekend days to holiday weekend days? Resources The Data Warehouse Toolkit (Kimball) The Data Warehouse Lifecycle Toolkit (Kimball) The Microsoft Data Warehouse Toolkit (Kimball) OLAP Solutions (Thomsen) Microsoft OLAP Solutions (Thomsen, Spofford, Chase) 20

Simple as that! What we didn t have time to cover: Meta Data in Data Warehouses Designing tables with SSAS in mind Indexing strategies Disk optimization for Data Warehouses Fact table partitioning strategies Aggregation tables ETL Considerations Data Warehouse Project Planning 21