Building an Effective Data Warehouse Architecture James Serra



Similar documents
James Serra Sr BI Architect

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

LEARNING SOLUTIONS website milner.com/learning phone

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days

Microsoft Data Warehouse in Depth

MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Presented by: Jose Chinchilla, MCITP

Business Intelligence, Data warehousing Concept and artifacts

Data Warehouse Overview. Srini Rengarajan

Business Intelligence and Healthcare

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

SQL Server 2012 Business Intelligence Boot Camp

End to End Microsoft BI with SQL 2008 R2 and SharePoint 2010

MS 50511A The Microsoft Business Intelligence 2010 Stack

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

The Microsoft Business Intelligence 2010 Stack Course 50511A; 5 Days, Instructor-led

Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Parallel Data Warehouse

IST722 Data Warehousing

Reflections on Agile DW by a Business Analytics Practitioner. Werner Engelen Principal Business Analytics Architect

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Data Warehousing Systems: Foundations and Architectures

Course Outline. Module 1: Introduction to Data Warehousing

ETL Overview. Extract, Transform, Load (ETL) Refreshment Workflow. The ETL Process. General ETL issues. MS Integration Services

Data Warehousing and Data Mining

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Designing Self-Service Business Intelligence and Big Data Solutions

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

MDM and Data Warehousing Complement Each Other

When to consider OLAP?

Designing a Dimensional Model

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

Course 20463:Implementing a Data Warehouse with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server MOC 20463

COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

COURSE SYLLABUS COURSE TITLE:

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Implementing a Data Warehouse with Microsoft SQL Server

Business Intelligence: How to Unleash the Real Power of Microsoft s Fast Track and PDW Appliances

Melissa Coates. Tools & Techniques for Implementing Corporate and Self-Service BI. Triad SQL BI User Group 6/25/2013. BI Architect, Intellinet

Les journées SQL Server 2013

Microsoft BI Platform Overview

SQL Server Analysis Services Complete Practical & Real-time Training

Agile Data Warehousing. Christina Knotts Associate Consultant Eli Lilly & Company

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Implementing a Data Warehouse with Microsoft SQL Server

Business Intelligence in SharePoint 2013

DATA WAREHOUSING AND OLAP TECHNOLOGY

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Combined Knowledge Business Intelligence with SharePoint 2013 and SQL 2012 Course

Data Integration and ETL with Oracle Warehouse Builder NEW

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Creating BI solutions with BISM Tabular. Written By: Dan Clark

SAS BI Course Content; Introduction to DWH / BI Concepts

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Chapter 5. Learning Objectives. DW Development and ETL

Microsoft End to End Business Intelligence Boot Camp

East Asia Network Sdn Bhd

Microsoft Analytics Platform System. Solution Brief

Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012

SQL Server 2012 End-to-End Business Intelligence Workshop

LearnFromGuru Polish your knowledge

Using The Best Tools For Your Business Intelligence Implementation

A Survey on Data Warehouse Architecture

Implementing a Data Warehouse with Microsoft SQL Server 2012

Managing the PowerPivot for SharePoint Environment

How to Enhance Traditional BI Architecture to Leverage Big Data

Implementing Data Models and Reports with Microsoft SQL Server

Understanding Microsoft s BI Tools

Dimodelo Solutions Data Warehousing and Business Intelligence Concepts

DATA WAREHOUSING - OLAP

OLAP Theory-English version

Traditional BI vs. Business Data Lake A comparison

Implementing a Data Warehouse with Microsoft SQL Server 2012

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

MICROSOFT DATA WAREHOUSE IN DEPTH

Implementing a Data Warehouse with Microsoft SQL Server 2012

The Changing World of Business Intelligence: Leading with Microsoft Excel

IAF Business Intelligence Solutions Make the Most of Your Business Intelligence. White Paper November 2002

SQL Server 2012 Performance White Paper

Implementing a Data Warehouse with Microsoft SQL Server

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

Data Warehousing. SQL Server 2008 R2, Denali

Trivadis White Paper. Comparison of Data Modeling Methods for a Core Data Warehouse. Dani Schnider Adriano Martino Maren Eschermann

Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463)

Deductive Data Warehouses and Aggregate (Derived) Tables

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

Lection 3-4 WAREHOUSING

Data Warehouse (DW) Maturity Assessment Questionnaire

Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Week 13: Data Warehousing. Warehousing

Transcription:

Building an Effective Data Warehouse Architecture James Serra Global Sponsors:

About Me Business Intelligence Consultant, in IT for 28 years Owner of Serra Consulting Services, specializing in end-to-end Business Intelligence and Data Warehouse solutions using the Microsoft BI stack Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer Been perm, contractor, consultant, business owner Presenter at PASS Business Analytics Conference and PASS Summit MCSE for SQL Server 2012: Data Platform and BI SME for SQL Server 2012 certs Contributing writer for SQL Server Pro magazine Blog at JamesSerra.com 2

Agenda Why use a Data Warehouse? Fast Track Data Warehouse (FTDW) Appliances Data Warehouse vs Data Mart Kimball and Inmon Methodologies Populating a Data Warehouse ETL vs ELT Surrogate Keys SSAS Cubes SQL Server 2012 Tabular Model End-User Microsoft BI Tools 3

Why use a Data Warehouse? All these reasons are for data warehouses only (not OLTP): Reduce stress on production system Optimized for read access, sequential disk scans Integrate many sources of data Keep historical records (no need to save hardcopy reports) Restructure/rename tables and fields Use Master Data Management, including hierarchies No IT involvement needed for users to create reports Improve data quality and plugs holes in source systems One version of the truth Easy to create BI solutions on top of it (i.e. SSAS Cubes) 4

Why use a Data Warehouse? Legacy applications + databases = chaos Enterprise data warehouse = order Produc'on Control MRP Inventory Control Parts Management Logis'cs Shipping Raw Goods Order Control Purchasing Finance Marke'ng Sales Accoun'ng Management Repor'ng Engineering Actuarial Human Resources Enterprise Data Warehouse Every ques'on = decision Con'nuity Consolida'on Control Compliance Collabora'on Single version of the truth 5

Hardware Solutions Fast Track Data Warehouse - A reference configuration optimized for data warehousing. This saves an organization from having to commit resources to configure and build the server hardware. Fast Track Data Warehouse hardware is tested for data warehousing which eliminates guesswork and is designed to save you months of configuration, setup, testing and tuning. You just need to install the OS and SQL Server Appliances - Microsoft has made available SQL Server appliances that allow customers to deploy data warehouse (DW), business intelligence (BI) and database consolidation solutions in a very short time, with all the components preconfigured and pre-optimized. These appliances include all the hardware, software and services for a complete, ready-to-run, out-of-the-box, high performance, energy-efficient solutions 6

Fast Track Data Warehouse FT Version 4.0 Benefits: - Pre-Balanced Architectures - Choice of HW platforms - Lower TCO - High Scale - Reduced Risk 7

Appliances 8 HP Business Data Warehouse Appliance (FT 3.0, 5TB) HP Business Decision Appliance (BI, SharePoint 2010, SQL Server 2008R2, PP) HP Database Consolidation Appliance (virtual environment, Windows2008R2) HP Enterprise Data Warehouse Appliance (1 st PDW, SQL2008R2, 610TB) Dell Quickstart Data Warehouse Appliance 1000 (FT 4.0, 5TB) Dell Quickstart Data Warehouse Appliance 2000 (FT 4.0, 12TB) Dell Parallel Data Warehouse Appliance (2 nd PDW, SQL2008R2, 600TB) IBM Fast Track Data Warehouse (FT 4.0, 3 versions: 24TB, 60TB, 112TB) V2 of PDW by HP released, uses SQL Server 2012, quarter-rack (75TB)

Appliances - Future HP Business Data Warehouse Appliance (FT 4.0, 2 nd half 2013, likely with flash or SSD) HP Business Decision Appliance (BI, SharePoint 2013, SQL Server 2012, PP, 2 nd quarter 2013) HP Database Consolidation Appliance (virtual environment, Windows Server 2012, System Center 2012 SP1) 9

Data Warehouse vs Data Mart Data Warehouse: A single organizational repository of enterprise wide data across many or all subject areas Holds multiple subject areas Holds very detailed information Works to integrate all data sources Feeds data mart Data Mart: Subset of the data warehouse that is usually oriented to specific subject (finance, marketing, sales) The logical combination of all the data marts is a data warehouse In short, a data warehouse as contains many subject areas, and a data mart contains just one of those subject areas 10

Kimball and Inmon Methodologies Two approaches for building data warehouses 11

Kimball and Inmon Myths Myth: Kimball is a bottom-up approach Really top-down: BUS matrix (choose business processes/data sources), conformed dimensions, MDM Myth: Inmon requires a ton of up-front design that takes a long time Inmon says to build DW iteratively, not big bang approach (p. 91 BDW, p. 21 Imhoff) Myth: Star schema data marts are not allowed in Inmon s model Inmon says they are good for direct end-user access of data (p. 365 BDW), good for data marts (p. 12 TTA) Myth: Very few companies use the Inmon method Survey had 39% Inmon vs 26% Kimball. Many have a EDW Myth: The Kimball and Inmon architectures are incompatible They can work together to provide a better solution 12

Kimball and Inmon Methodologies Relational (Inmon) vs Dimensional (Kimball) Relational Modeling: Entity-Relationship (ER) model Normalization rules Many tables using joins History tables, natural keys Good for indirect end-user access of data Dimensional Modeling: Facts and dimensions, star schema Less tables but have duplicate data (de-normalized) Easier for user to understand (but strange for IT people used to relational) Slowly changing dimensions, surrogate keys Good for direct end-user access of data 13

Relational Model vs Dimensional Model Relational Model Dimensional Model 14

Kimball and Inmon Methodologies 15 Kimball: Logical data warehouse (BUS), made up of data marts Business Driven, users have active participation Decentralized atomic tables Independent dimensional data marts optimized for reporting/ analytics Integrated via Conformed Dimensions (provides consistency across data sources) 2-tier, less ETL, no data duplication Inmon: Enterprise data model (CIF) that is a enterprise data warehouse (EDW) IT Driven, users have passive participation Centralized atomic normalized tables (off limit to end users) Later create dependent data marts that can be used for multiple purpose Integration via enterprise data model

16 Kimball Model

17 Inmon Model

Reasons to add a Enterprise Data Warehouse Single version of the truth May make building dimensions easier using lightly denormalized tables in EDW instead of going directly from the OLTP source Normalized EDW results in enterprise-wide consistency which makes it easier to spawn-off the data marts at the expense of duplicated data Less daily ETL refresh and reconciliation if have many sources and many data marts Easier to add a new data mart Reason not to: If have a few sources that need reporting quickly 18

Which model to use? Models are not that different, having become similar over the years, and can compliment each other Boils down to Inmon creates a normalized DW before creating a dimensional data mart and Kimball skips the normalized DW With tweaks to each model, they look very similar (adding a normalized EDW to Kimball, dimensionally structured data marts to Inmon) Bottom line: Understand both approaches and pick parts from both for your situation no need to just choose just one approach BUT, no solution will be effective unless you possess soft skills (leadership, communication, planning, and interpersonal relationships) 19

Hybrid Model Use SQL Server Views to interface between each level in the model 20

Kimball Methodology Kimball defines a development lifecycle, where Inmon is just about the data warehouse (not how used) 21

Populating a Data Warehouse 22 Frequency of data pull Full Extraction All data Incremental Extraction Only data changed from last run Determine data that has changed Timestamp - Last Updated Change Data Capture (CDC) Partitioning Triggers MERGE Column DEFAULT value Online Extraction Data from source. First create copy of source: Replication Database Snapshot Availability Groups Offline Extraction Data from flat file

ETL vs ELT Extract, Transform, and Load (ETL) Transform while hitting source system No staging tables Processing done by ETL tools (SSIS) Extract, Load, Transform (ELT) Uses staging tables Processing done by target database engine (SSIS: Execute T-SQL Statement task instead of Data Flow Transform tasks) Use for big volumes of data Use when source and target databases are the same Use with Parallel Data Warehouse (PDW) ELT is better since database engine is more efficient than SSIS Best use of database engine: Transformations Best use of SSIS: Data pipeline and workflow management 23

Surrogate Keys Surrogate Keys Unique identifier not derived from source system Embedded in fact tables as foreign keys to dimension tables Allows integrating data from multiple source systems Protect from source key changes in the source system Allows for slowly changing dimensions Allows you to create rows in the dimension that don t exist in the source (-1 in fact table for unassigned) Improves performance (joins) and database size by using integer type instead of text 24

SSAS Cubes Reasons to use instead of data warehouse: Aggregating (Summarizing) the data for performance Multidimensional analysis slice, dice, drilldown Hierarchies Advanced time-calculations i.e. 12-month rolling average Easily use Excel to view data via Pivot Tables Slowly Changing Dimensions (SCD) 25

26 Data Warehouse Architecture

SQL Server 2012 Tabular Model New xvelocity in-memory database in SSAS Build model in Power Pivot or SQL Server Data Tools (SSDT) Uses existing relational model No star schema, no extra SSIS Uses DAX (Excel-like formula language to create custom calculations) Faster and easier to use than multidimensional model In Hybrid Model, DW Bus Architecture goes away, but only if you don t need surrogate keys, SCD, conformed dimensions, and all reporting will be off of the cubes 27

End-User Microsoft BI Tools Many options on Microsoft tools to use to go against the DW/BI: Excel PivotTables (Cube, 3NF) SQL Server Reporting Services (SSRS) (Cube, 3NF) Report Builder (Cube, 3NF) PowerPivot (3NF) PerformancePoint Services (PPS) (Cube) Power View (Cube) Data Mining (Cube) 28

Resources Data Warehouse Architecture Kimball and Inmon methodologies: http://bit.ly/srznhy SQL Server 2012: Multidimensional vs tabular: http://bit.ly/srzx1x Data Warehouse vs Data Mart: http://bit.ly/srai4p Fast Track Data Warehouse Reference Guide for SQL Server 2012: http://bit.ly/srawsj Complex reporting off a SSAS cube: http://bit.ly/sraeyw Surrogate Keys: http://bit.ly/srairp Normalizing Your Database: http://bit.ly/srahnc Difference between ETL and ELT: http://bit.ly/srakqa Microsoft s Data Warehouse offerings: http://bit.ly/xazy9h Microsoft SQL Server Reference Architecture and Appliances: http://bit.ly/y7bxy5 Methods for populating a data warehouse: http://bit.ly/sraruz Great white paper: Microsoft EDW Architecture, Guidance and Deployment Best Practices: http://bit.ly/srazug End-User Microsoft BI Tools Clearing up the confusion: http://bit.ly/srbmlt Microsoft Appliances: http://bit.ly/yqixzm 29

Questions? Global sponsors James Serra, Business Intelligence Consultant Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/jamesserra Visit my blog at: JamesSerra.com Alliance sponsors

Thank You for Attending Global sponsors Alliance sponsors