Building a Data Warehouse Andrew Couch UK Access User Group. Andrew Couch 2016, All Rights Reserved



Your experiences?
How many of you have NOT:
- Used SQL Server?
- Used SSIS (SQL Server Integration Services)?
- Built a DW (Data Warehouse)?

Presentation Contents
- Introduction
- Why build a Data Warehouse (DW)?
- DW Terminology and Layers
- SSIS Simple Demo
- Data Warehouse Layers
- Staging Layer (Demo)
- ETL (Extract, Transform and Load) Framework
- Type II SCD Dimensions
- Enterprise Layer
- Facts & Kimball Methodology
- Data Mart Layer

Why build a Data Warehouse?
Limitations of existing systems:
- Business users find it difficult to piece together all the links in a relational database schema
- Objects such as tables and fields often don't have the best business-oriented names for data items
- RDBMSs are great at showing operational state, but do not easily capture how data changes over time
- Operational systems may suffer performance issues when extensive analysis is run against a live system
- A full analysis often needs data from other systems/sources

Data Warehouse Terminology
- Dimensional analysis: the process of restructuring data into a form more suitable for a business analyst
- Fact or Measure (data): contains measures and counts (transactional) and references dimensions
- Dimension (data): contains attributes, which can vary over time, tied to the facts or possibly to other dimensions
- Dimension types: some dimensions hold slowly changing data, which captures changes over time and supports more accurate reporting
- Grain of a fact: this could be a transaction, or another combination of key values; the grain is very important, because it defines what one fact row represents
- Star schema: the aim is to construct a very simple data model, with a fact in the centre and relationships to several dimensions; it looks like a star
- Snowflake schema: dimensions linked to other dimensions; to be avoided when possible
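A minimal star schema sketch, using SQLite in place of SQL Server. The table and column names (FactSales, DimCustomer, DimDate) are hypothetical; the point is that the fact sits in the centre at a declared grain, and every dimension is exactly one join away from it.

```python
import sqlite3

def build_star(conn: sqlite3.Connection) -> None:
    """Create a tiny star schema: one fact table in the centre, two dimensions around it."""
    conn.executescript("""
        CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT, Region TEXT);
        CREATE TABLE DimDate     (DateKey INTEGER PRIMARY KEY, CalendarYear INTEGER);
        -- Grain: one row per sales transaction.
        CREATE TABLE FactSales (
            CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
            DateKey     INTEGER REFERENCES DimDate(DateKey),
            Amount      REAL
        );
        INSERT INTO DimCustomer VALUES (1, 'John', 'NW'), (2, 'Mary', 'SW');
        INSERT INTO DimDate     VALUES (20150301, 2015), (20160301, 2016);
        INSERT INTO FactSales   VALUES (1, 20150301, 100.0), (2, 20150301, 50.0),
                                       (1, 20160301, 25.0);
    """)

def sales_by_region_and_year(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    # Each dimension joins directly to the fact; no dimension-to-dimension joins (no snowflake).
    return conn.execute("""
        SELECT c.Region, d.CalendarYear, SUM(f.Amount)
        FROM FactSales f
        JOIN DimCustomer c ON c.CustomerKey = f.CustomerKey
        JOIN DimDate d     ON d.DateKey = f.DateKey
        GROUP BY c.Region, d.CalendarYear
        ORDER BY c.Region, d.CalendarYear
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
result = sales_by_region_and_year(conn)
```

If a DimRegion table were split out of DimCustomer, the same query would need an extra join through the customer dimension; that is the snowflake the slide advises avoiding.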

Physical Overview of a Data Warehouse
[Diagram: Live system, Staging, Enterprise or History, Data Mart or Transformation, connected by SSIS packages]

Data Warehouse Layers
- Staging: bring the data into a set of tables that mirror the structure of the incoming data; the data is replenished by emptying and re-populating the tables
- Enterprise or History: data accumulates, with only minor restructuring; data is only added when it is new or has changed, so we need to detect when data has changed
- Transformation or Data Mart: data is presented in a simplified star schema for use by business analysts, or to feed into cubes/data mining for further analysis
  - Some data marts accumulate data; others can be emptied and re-populated
  - Not in 3NF; often highly de-normalised

Example of an OLTP System

Need Theory, Practice and Tools
Donald Ervin Knuth (born 10 January 1938) is an American computer scientist, Professor Emeritus at Stanford University, and winner of the 1974 Turing Award.
"If you find that you're spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you're spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice." (Donald Knuth)

SSIS Demo (see how easy ETL is!)
Steps:
1. Create a Visual Studio (VS) SSIS project; by default this creates an initial package
2. Add a source connection
3. Add a target connection
4. Add a script task to empty the target table
5. Add a data flow to populate the target table
Strategies:
- Use the GUI components
- Use your T-SQL and stored procedures
- Write VB.NET or C# for non-standard file formats
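The demo package's steps can be sketched outside SSIS as below. This is not SSIS code: two in-memory SQLite connections stand in for the source and target connection managers, and the Customers table is hypothetical. It shows why emptying the target first makes the package safe to rerun.

```python
import sqlite3

# Stand-ins for the SSIS source and target connections (hypothetical data).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, Name TEXT, Region TEXT);
    INSERT INTO Customers VALUES (1, 'John', 'NW'), (2, 'Mary', 'SW');
""")
target.execute("CREATE TABLE Customers (CustomerID INTEGER, Name TEXT, Region TEXT)")

def empty_target(conn: sqlite3.Connection, table: str) -> None:
    # The "empty the target table" step; SQLite has no TRUNCATE, so DELETE is used.
    conn.execute(f"DELETE FROM {table}")

def data_flow(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    # The data flow step: read every source row and write it to the target.
    rows = src.execute(f"SELECT CustomerID, Name, Region FROM {table}").fetchall()
    dst.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)

def run_package() -> int:
    empty_target(target, "Customers")
    return data_flow(source, target, "Customers")

run_package()
loaded = run_package()  # a rerun reloads the same rows instead of duplicating them
```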

My ETL Framework
- We have 3 layers in the DW, so create one Visual Studio SSIS project for each layer
- Each project has multiple packages:
  - Pre-processing package: get everything ready for the processing
  - Dimension packages: process the dimensions; up to 8 or 10 processing sequences, each sequence processing one table
  - Fact packages: process the facts; up to 8 or 10 processing sequences, each sequence processing one table
  - Post-processing package: finish up and prepare the next layer
- Everything is driven and recorded in the framework system tables
- The framework supports:
  - Easy development
  - Restarting from a failed sequence (big feature!)
  - Robust error handling
  - Easy extensibility
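The restart feature can be illustrated with a small worklist runner. This is a sketch under assumptions, not the author's framework: the WorkList table and its Status values are hypothetical, SQLite stands in for the framework system tables, and Python callables stand in for the processing sequences. Each sequence's outcome is recorded, so a rerun skips sequences that already succeeded and resumes at the failed one.

```python
import sqlite3
from typing import Callable

def init_framework(conn: sqlite3.Connection, sequences: list[str]) -> None:
    # Hypothetical framework system table: one row per processing sequence.
    conn.execute(
        "CREATE TABLE WorkList (SeqNo INTEGER PRIMARY KEY, SeqName TEXT, "
        "Status TEXT DEFAULT 'Pending', ErrorMessage TEXT)"
    )
    conn.executemany(
        "INSERT INTO WorkList (SeqNo, SeqName) VALUES (?, ?)",
        list(enumerate(sequences, start=1)),
    )
    conn.commit()

def run_worklist(conn: sqlite3.Connection, steps: dict[str, Callable[[], None]]) -> bool:
    """Run every sequence that has not yet succeeded, in order; stop at the first failure."""
    pending = conn.execute(
        "SELECT SeqNo, SeqName FROM WorkList WHERE Status <> 'Succeeded' ORDER BY SeqNo"
    ).fetchall()
    for seq_no, name in pending:
        try:
            steps[name]()
        except Exception as exc:
            # Record the failure so that a rerun resumes from this sequence.
            conn.execute(
                "UPDATE WorkList SET Status = 'Failed', ErrorMessage = ? WHERE SeqNo = ?",
                (str(exc), seq_no),
            )
            conn.commit()
            return False
        conn.execute(
            "UPDATE WorkList SET Status = 'Succeeded', ErrorMessage = NULL WHERE SeqNo = ?",
            (seq_no,),
        )
        conn.commit()
    return True

# Usage: the DimProduct sequence fails once, then succeeds on the rerun.
conn = sqlite3.connect(":memory:")
names = ["DimCustomer", "DimProduct", "FactSales"]
init_framework(conn, names)
broken = {"DimProduct"}
calls: list[str] = []

def make_step(name: str) -> Callable[[], None]:
    def step() -> None:
        if name in broken:
            broken.discard(name)  # pretend the problem is fixed before the rerun
            raise RuntimeError(f"{name} failed")
        calls.append(name)
    return step

steps = {n: make_step(n) for n in names}
first_run = run_worklist(conn, steps)   # stops at DimProduct
second_run = run_worklist(conn, steps)  # resumes at DimProduct; DimCustomer is not rerun
```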

My Framework Concepts
[Figures: Figure 2: basic framework components; Figure 3: worklist processing; Figure 8: general layout of package structure]

Staging Layer
Key features:
- No keys
- No relationships
- Remove identity properties
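A sketch of what "no keys, no relationships, no identity" means for a staging table, again with SQLite standing in for SQL Server (AUTOINCREMENT plays the role of an identity column). The tables are hypothetical. Our reading, not stated on the slide, is that a staging load should copy source values as-is and never fail on a constraint; validation happens in later layers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    -- OLTP-style tables: identity-style keys, primary keys and a relationship.
    CREATE TABLE Region (RegionID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT);
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name TEXT,
        RegionID INTEGER REFERENCES Region(RegionID));
    -- Staging copy: same columns, but no keys, no relationships and no identity.
    CREATE TABLE stg_Customer (CustomerID INTEGER, Name TEXT, RegionID INTEGER);
""")

incoming = [(42, "John", 7), (42, "John", 7)]  # a duplicate row with an unknown RegionID

try:
    conn.execute("INSERT INTO Customer VALUES (?, ?, ?)", incoming[0])
    oltp_accepted = True
except sqlite3.IntegrityError:
    oltp_accepted = False  # rejected: RegionID 7 does not exist in Region

# The staging table accepts both rows and keeps the source CustomerID values.
conn.executemany("INSERT INTO stg_Customer VALUES (?, ?, ?)", incoming)
staged_ids = [r[0] for r in conn.execute("SELECT CustomerID FROM stg_Customer")]
```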

Demo: Staging Data Using SSIS

Slowly Changing Dimensions: SCD Type II
- John works for the sales division in the Northwest; he then moves down south to the Southwest sales division
- All revenue from before he moved needs to be credited to the Northwest business unit, and all revenue after the move to the Southwest business unit
- The data is called slowly changing because it changes occasionally over time, rather than with every transaction
- The data is called a dimension (think lookup)
- John needs two personnel records, each with a different key, reflecting the changes in his attributes over time
- We need a start and end date for when the dimension record is applicable to the related fact
- We need surrogate keys, because the dimension holds several records for John that vary over time
- A fact (for example, a sale) needs to be tied to the appropriate surrogate key
Example:

Dimension (SCD)                Fact (Transactional)
Id = 1, John in Northwest      Sale 6: Id = 1; Sale 7: Id = 1
Id = 2, John in Southwest      Sale 8: Id = 2; Sale 9: Id = 2

Id   Name   Region   SCD_StartDate   SCD_EndDate
--   ----   ------   -------------   -----------
1    John   NW       1/1/2015        1/1/2016
2    John   SW       2/1/2016        ????? (could be either NULL or a high date, 31/12/9999)

(The slide shows UK dates, dd/mm/yyyy.)
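The John example can be sketched as a Type II update followed by a fact lookup. Assumptions: SQLite in place of SQL Server, a hypothetical DimSalesPerson table with business key EmployeeID 101, ISO dates (sortable as text) instead of the slide's UK format, the high-date option for the open end date, and Region as the only tracked attribute. As on the slide, the old version ends the day before the new one starts.

```python
import sqlite3
from datetime import date, timedelta

HIGH_DATE = "9999-12-31"  # open end date; the slide notes NULL as the alternative

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimSalesPerson ("
    " SalesPersonKey INTEGER PRIMARY KEY,"  # surrogate key, assigned by the warehouse
    " EmployeeID INTEGER,"                  # business key from the OLTP system
    " Name TEXT, Region TEXT, SCD_StartDate TEXT, SCD_EndDate TEXT)"
)

def apply_scd2(employee_id: int, name: str, region: str, effective: date) -> None:
    """Create a new version of the row when the tracked attribute (Region) changes."""
    current = conn.execute(
        "SELECT SalesPersonKey, Region FROM DimSalesPerson "
        "WHERE EmployeeID = ? AND SCD_EndDate = ?",
        (employee_id, HIGH_DATE),
    ).fetchone()
    if current is not None and current[1] == region:
        return  # no change, so no new version
    if current is not None:
        # Close the old version the day before the new one starts.
        conn.execute(
            "UPDATE DimSalesPerson SET SCD_EndDate = ? WHERE SalesPersonKey = ?",
            ((effective - timedelta(days=1)).isoformat(), current[0]),
        )
    conn.execute(
        "INSERT INTO DimSalesPerson (EmployeeID, Name, Region, SCD_StartDate, SCD_EndDate) "
        "VALUES (?, ?, ?, ?, ?)",
        (employee_id, name, region, effective.isoformat(), HIGH_DATE),
    )

def surrogate_key_for(employee_id: int, fact_date: date) -> int:
    """Return the surrogate key of the version that was valid on the fact's date."""
    d = fact_date.isoformat()
    (key,) = conn.execute(
        "SELECT SalesPersonKey FROM DimSalesPerson "
        "WHERE EmployeeID = ? AND SCD_StartDate <= ? AND SCD_EndDate >= ?",
        (employee_id, d, d),
    ).fetchone()
    return key

apply_scd2(101, "John", "NW", date(2015, 1, 1))
apply_scd2(101, "John", "SW", date(2016, 1, 2))  # John moves to the Southwest
sale_before = surrogate_key_for(101, date(2015, 6, 30))  # credited to the Northwest version
sale_after = surrogate_key_for(101, date(2016, 3, 1))    # credited to the Southwest version
```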

Enterprise/History Layer
Key features:
- Surrogate keys
- New foreign keys
- SCD_StartDate and SCD_EndDate
- Null value records
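One common way to implement null value records, assuming that is what the slide means, is a reserved dimension row (surrogate key 0 or -1 are typical choices) that facts reference when the source value is missing or unmatched, so the fact's foreign key is never NULL. The table, the 'Unknown' label and UNKNOWN_KEY = 0 are hypothetical choices for this sketch.

```python
import sqlite3
from typing import Optional

UNKNOWN_KEY = 0  # hypothetical reserved surrogate key for the null value record

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimRegion (RegionKey INTEGER PRIMARY KEY, RegionCode TEXT, RegionName TEXT);
    -- Null value record: a real row that facts can reference when the source value is missing.
    INSERT INTO DimRegion VALUES (0, NULL, 'Unknown');
    INSERT INTO DimRegion VALUES (1, 'NW', 'Northwest'), (2, 'SW', 'Southwest');
""")

def region_key(region_code: Optional[str]) -> int:
    """Map a business key to a surrogate key, falling back to the null value record."""
    if region_code is None:
        return UNKNOWN_KEY
    row = conn.execute(
        "SELECT RegionKey FROM DimRegion WHERE RegionCode = ?", (region_code,)
    ).fetchone()
    return row[0] if row is not None else UNKNOWN_KEY

# A missing code and an unmatched code ("XX") both point at the null value record.
fact_keys = [region_key(code) for code in ["NW", None, "XX", "SW"]]
```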

Enterprise: SCD with SSIS
[Screenshot: SCD data flow, automatically created with a simple wizard]

Enterprise Layer Notes
- SCD can be implemented using several different techniques:
  - The SSIS SCD component; if you use the built-in SSIS SCD component, you cannot have VARCHAR(MAX), NVARCHAR(MAX), TEXT or VARBINARY(MAX) columns
  - Stored procedures that return counts and flags to SSIS
  - Other SSIS flows and logic
- Checksum calculations can be used to determine when a row has changed and needs to be versioned
- In SQL Server, don't mix Unicode (NVARCHAR) and non-Unicode (VARCHAR) data types: SSIS will not implicitly convert a VARCHAR(10) column in Staging to the mapped NVARCHAR(10) column in Enterprise; world of pain!
- Keep the data in this layer close to the OLTP structure, with a minimum of calculated values, in case you change the rules for performing calculations
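Checksum-based change detection can be sketched as follows. This uses Python's hashlib (SHA-256) over the tracked columns; inside SQL Server the analogous built-ins are CHECKSUM and HASHBYTES, where CHECKSUM's 32-bit result makes collisions more likely. The column names and business keys are hypothetical. A staging row is versioned when it is new or its checksum differs from the one stored for the current Enterprise row.

```python
import hashlib

def row_checksum(row: dict, tracked_columns: list[str]) -> str:
    """Hash the tracked columns of a row; a different hash means the row has changed."""
    # Use a separator unlikely to appear in the data and mark NULLs explicitly,
    # so that ("ab", "c") and ("a", "bc") hash differently.
    parts = ["\x00NULL" if row[c] is None else str(row[c]) for c in tracked_columns]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def rows_to_version(staging_rows: list[dict], enterprise_checksums: dict,
                    key: str, tracked_columns: list[str]) -> list:
    """Return the business keys whose staging row is new or differs from the Enterprise copy."""
    changed = []
    for row in staging_rows:
        checksum = row_checksum(row, tracked_columns)
        if enterprise_checksums.get(row[key]) != checksum:
            changed.append(row[key])
    return changed

tracked = ["Name", "Region"]
enterprise = {
    101: row_checksum({"Name": "John", "Region": "NW"}, tracked),
    102: row_checksum({"Name": "Mary", "Region": "SW"}, tracked),
}
staging = [
    {"EmployeeID": 101, "Name": "John", "Region": "SW"},  # moved: changed
    {"EmployeeID": 102, "Name": "Mary", "Region": "SW"},  # unchanged
    {"EmployeeID": 103, "Name": "Ann", "Region": "NW"},   # new
]
changed = rows_to_version(staging, enterprise, "EmployeeID", tracked)
```

Storing the checksum alongside each current Enterprise row means only one column has to be compared per row, however many attributes are tracked.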

Facts: Paraphrasing Kimball (my interpretation)
- Some facts are purely transactional; for example, in an accounting system we do not edit historical transactions, but make subsequent transactions to correct a balance: Transactional Fact
- Some facts, like a share price, change on a periodic basis: Periodic Snapshot Fact
- Some facts can change or evolve. Maybe attributes need to move through a pipeline before becoming immutable, or maybe someone just edits them! Accumulating Snapshot Fact
  - Delete and replace the fact?
  - Update fact attributes?
  - Other business-specific strategies
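The difference between a transactional fact and a periodic snapshot fact can be shown with the accounting example. In this sketch (hypothetical account and amounts) the transactional fact is append-only, so an error is fixed by a later correcting entry. The periodic snapshot is derived from it as one closing balance per account per month. For brevity, months with no movements are skipped here; a real periodic snapshot would normally carry the balance forward into every period.

```python
from collections import defaultdict
from datetime import date

# Transactional fact: append-only; a mistake is fixed by a later correcting entry.
transactions = [
    (date(2016, 1, 5), "ACC1", 100.0),
    (date(2016, 1, 20), "ACC1", -30.0),
    (date(2016, 2, 3), "ACC1", 50.0),
    (date(2016, 2, 10), "ACC1", -50.0),  # correction reversing the 3 Feb entry
]

def month_end_snapshot(txns: list[tuple[date, str, float]]) -> list[tuple[str, int, int, float]]:
    """Periodic snapshot fact: one row per account per month, holding the closing balance."""
    movements: dict[tuple[str, int, int], float] = defaultdict(float)
    for day, account, amount in txns:
        movements[(account, day.year, day.month)] += amount
    snapshot = []
    running: dict[str, float] = defaultdict(float)
    for account, year, month in sorted(movements):  # chronological within each account
        running[account] += movements[(account, year, month)]
        snapshot.append((account, year, month, running[account]))
    return snapshot

snapshot = month_end_snapshot(transactions)
```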

Ralph Kimball (born 1944) is an author on the subject of data warehousing and business intelligence. He is one of the original architects of data warehousing and is known for long-term convictions that data warehouses must be designed to be understandable and fast. His methodology, also known as dimensional modeling or the Kimball methodology, has become the de facto standard in the area of decision support.

A row in an accumulating snapshot fact table summarizes the measurement events occurring at predictable steps between the beginning and the end of a process. Pipeline or workflow processes, such as order fulfillment or claim processing, that have a defined start point, standard intermediate steps, and defined end point can be modeled with this type of fact table. There is a date foreign key in the fact table for each critical milestone in the process. An individual row in an accumulating snapshot fact table, corresponding for instance to a line on an order, is initially inserted when the order line is created. As pipeline progress occurs, the accumulating fact table row is revisited and updated. This consistent updating of accumulating snapshot fact rows is unique among the three types of fact tables. In addition to the date foreign keys associated with each critical process step, accumulating snapshot fact tables contain foreign keys for other dimensions and optionally contain degenerate dimensions. They often include numeric lag measurements consistent with the grain, along with milestone completion counters.
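An accumulating snapshot for order fulfilment, following the description above: the row is inserted when the order line is created, then revisited and updated at each milestone, filling in a milestone date, a lag measurement and a completion counter. This is a sketch with hypothetical table and column names; for brevity it stores ISO dates directly, where a real design would hold keys into a date dimension and foreign keys to other dimensions.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE FactOrderFulfilment (
    OrderNumber TEXT, OrderLine INTEGER,        -- degenerate dimensions
    OrderDate TEXT, ShipDate TEXT, DeliveryDate TEXT,
    DaysToShip INTEGER, DaysToDeliver INTEGER,  -- lag measurements from OrderDate
    ShippedCount INTEGER DEFAULT 0, DeliveredCount INTEGER DEFAULT 0,  -- milestone counters
    PRIMARY KEY (OrderNumber, OrderLine))""")

# Milestone -> (date column, lag column, counter column); fixed names, safe to interpolate.
MILESTONES = {
    "Ship": ("ShipDate", "DaysToShip", "ShippedCount"),
    "Deliver": ("DeliveryDate", "DaysToDeliver", "DeliveredCount"),
}

def order_line_created(order: str, line: int, order_date: str) -> None:
    """Insert the row once, when the order line is created; later milestones stay NULL."""
    conn.execute(
        "INSERT INTO FactOrderFulfilment (OrderNumber, OrderLine, OrderDate) VALUES (?, ?, ?)",
        (order, line, order_date))

def reach_milestone(order: str, line: int, milestone: str, when: str) -> None:
    """Revisit the existing row: set the milestone date, its lag and its counter."""
    date_col, lag_col, count_col = MILESTONES[milestone]
    (order_date,) = conn.execute(
        "SELECT OrderDate FROM FactOrderFulfilment WHERE OrderNumber = ? AND OrderLine = ?",
        (order, line)).fetchone()
    lag = (date.fromisoformat(when) - date.fromisoformat(order_date)).days
    conn.execute(
        f"UPDATE FactOrderFulfilment SET {date_col} = ?, {lag_col} = ?, {count_col} = 1 "
        "WHERE OrderNumber = ? AND OrderLine = ?", (when, lag, order, line))

order_line_created("SO1", 1, "2016-03-01")
reach_milestone("SO1", 1, "Ship", "2016-03-04")
reach_milestone("SO1", 1, "Deliver", "2016-03-07")
row = conn.execute(
    "SELECT ShipDate, DeliveryDate, DaysToShip, DaysToDeliver, ShippedCount, DeliveredCount "
    "FROM FactOrderFulfilment").fetchone()
```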

Example DataMart
Key features:
- Star schema
- Avoid snowflakes
- Bring in only the desired columns
- Can use views for the data
- No NULL values in fields, for SSAS
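A data mart dimension exposed as a view, combining three of the points above: bring in only the desired columns, use a view, and remove NULLs before the data reaches SSAS. The tables, columns and the 'Unknown' placeholder are hypothetical choices for this SQLite sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Enterprise-layer table with more columns than the mart needs.
    CREATE TABLE ent_Customer (CustomerKey INTEGER, Name TEXT, Region TEXT,
                               Phone TEXT, LastUpdatedBy TEXT);
    INSERT INTO ent_Customer VALUES (1, 'John', 'SW', NULL, 'etl'),
                                    (2, 'Mary', NULL, '555-0100', 'etl');
    -- Data mart dimension as a view: only the desired columns, with NULLs replaced.
    CREATE VIEW dm_DimCustomer AS
    SELECT CustomerKey,
           COALESCE(Name, 'Unknown')   AS Name,
           COALESCE(Region, 'Unknown') AS Region
    FROM ent_Customer;
""")

rows = conn.execute("SELECT * FROM dm_DimCustomer ORDER BY CustomerKey").fetchall()
```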

Sample Databases for my SSIS Framework
http://www.ascassociates.biz/ssispart1.aspx
- Part I: Building an SSIS ETL Framework: Concepts
- Part II: SSIS Staging Sample Framework
- Part III: SSIS Enterprise Sample Framework
- Part IV: SSIS Transformation Sample Framework
The code can be used and modified for commercial use, with no restrictions.
Some consultancy is available.