Data Warehousing Fundamentals Student Guide




Data Warehousing Fundamentals
Student Guide
D16310GC10, Edition 1.0, September 2002, D37302

Authors: Nikos Psomas; Padmaja Mitravinda, Kolachalam

Technical Contributors and Reviewers: Kasturi Shekhar; Vidya Nagaraj; Sudip Majumber; Robert Stackowiak; Joel Barkin; Adam Laro-Bashford; M. Lea Shaw; Richard Green; Jean-Pierre Dijcks; Mark Van de Wiel; James Spiller; John Haydu; Maribel Renau; Marcelo Manzano; Sarah Spicer; Rosita Hanoman

Publisher: Nita K. Brozowski

Copyright Oracle Corporation, 2002. All rights reserved.

This documentation contains proprietary information of Oracle Corporation. It is provided under a license agreement containing restrictions on use and disclosure and is also protected by copyright law. Reverse engineering of the software is prohibited. If this documentation is delivered to a U.S. Government Agency of the Department of Defense, then it is delivered with Restricted Rights and the following legend is applicable:

Restricted Rights Legend

Use, duplication or disclosure by the Government is subject to restrictions for commercial computer software and shall be deemed to be Restricted Rights software under Federal law, as set forth in subparagraph (c)(1)(ii) of DFARS 252.227-7013, Rights in Technical Data and Computer Software (October 1988).

This material or any portion of it may not be copied in any form or by any means without the express prior written permission of Oracle Corporation. Any other copying is a violation of copyright law and may result in civil and/or criminal penalties.

If this documentation is delivered to a U.S. Government Agency not within the Department of Defense, then it is delivered with Restricted Rights, as defined in FAR 52.227-14, Rights in Data-General, including Alternate III (June 1987).

The information in this document is subject to change without notice. If you find any problems in the documentation, please report them in writing to Education Products, Oracle Corporation, 500 Oracle Parkway, Box SB-6, Redwood Shores, CA 94065. Oracle Corporation does not warrant that this document is error-free.
Express, Express Analyzer, Express Objects, Express Server, Personal Express, and Oracle are trademarks or registered trademarks of Oracle Corporation. All other products or company names are used for identification purposes only, and may be trademarks of their respective owners.

Contents

Preface

1 Business Intelligence and Data Warehousing
  Introductions
  Course Objectives
  Lessons
  Let's Get Started
  Lesson 1 Objectives
  What Is Business Intelligence?
  Purpose of Business Intelligence
  Evolution of BI
  Early Management Information Systems
  Analyzing Data from Operational Systems
  Why OLTP Is Not Suitable for Analytical Reporting
  Data Extract Processing
  Management Issues with Data Extract Programs
  Productivity Issues with Extract Processing
  Data Quality Issues with Extract Processing
  Data Warehousing and Business Intelligence
  Advantages of Warehouse Processing Environments
  Success Factors for a Dynamic Business Environment
  Business Drivers for Data Warehouses
  Technological Advances Enabling Data Warehousing
  Oracle9i Business Intelligence
  Oracle's Business Intelligence and Data Warehousing Products
  Summary
  Practice 1-1 Overview

2 Defining Data Warehouse Concepts and Terminology
  Objectives
  Definition of a Data Warehouse
  Data Warehouse Properties
  Subject-Oriented
  Integrated
  Time-Variant
  Nonvolatile
  Changing Warehouse Data
  Data Warehouse Versus OLTP
  Usage Curves
  User Expectations
  Enterprisewide Warehouse
  Data Warehouses Versus Data Marts
  Dependent Data Mart
  Independent Data Mart
  Typical Data Warehouse Components
  Warehouse Development Approaches
  Big Bang Approach
  Top-Down Approach
  Bottom-Up Approach
  Incremental Approach to Warehouse Development
  Data Warehousing Process Components
  Methodology
  Architecture
  Extraction, Transformation, and Load (ETL)
  Implementation
  Operation and Support
  Phases of the Incremental Approach
  Strategy Phase Deliverables
  Summary
  Practice 2-1 Overview

3 Planning and Managing the Data Warehouse Project
  Objectives
  Managing Financial Issues
  ROI and Associated Costs
  Computing ROI: Benefits
  Computing ROI: Typical Costs
  Computing ROI: Example
  Funding the Project
  Obtaining Business Commitment
  Data Warehouse Champion
  Steering Committee
  Warehouse Data Ownership
  Setting Expectations
  Managing Expectations
  Assembling the Project Team
  Recognizing Critical Success Factors
  Business User Requirements
  Techniques for Uncovering Requirements
  User Requirements Checklist
  Gathering User Requirements: Possible Obstacles
  Data Access Strategy
  Data Access Tool Requirements
  User Query Progression
  Query Efficiency
  Query Scheduling and Monitoring
  Query Access Architectures
  Web Access
  Security
  Fine-Grained Access Control in Oracle8i and Oracle9i
  Implementation Requirements
  Data Acquisition
  Data Quality
  Documentation
  Testing
  Training
  Training Needs
  Post-Implementation Support
  Summary
  Practice 3-1 Overview

4 Modeling the Data Warehouse
  Objectives
  Data Warehouse Modeling Issues
  Data Warehouse Environment Data Structures
  Star Schema Model
  Snowflake Schema Model
  Data Warehouse Database Design Phases
  Phase 1: Defining the Business Model
  Performing Strategic Analysis
  Creating the Business Model
  Business Requirements Drive the Design Process
  Identifying Measures and Dimensions
  Using a Business Process Matrix
  Determining Granularity
  Identifying Business Rules
  Documenting Metadata
  Metadata Documentation Approaches
  Phase 2: Defining the Dimensional Model
  Star Dimensional Modeling
  Fact Table Characteristics
  Dimension Table Characteristics
  Star Dimensional Model Characteristics
  Using Time in the Data Warehouse
  The Time Dimension
  Using Data Modeling Tools
  Phase 3: Defining the Physical Model
  Physical Model Design Tasks
  Database Object Naming Conventions
  Architectural Requirements
  Strategy for Architecture Definition
  Hardware Requirements
  Making the Right Choice
  Storage and Performance Considerations
  Database Sizing
  Test Load Sampling
  Oracle9i Database Architectural Advantages
  Data Partitioning
  Horizontal Partitioning
  Vertical Partitioning
  Partitioning Methods
  Indexing
  B-Tree Index
  Bitmap Indexes
  Bitmap Join Indexes
  Star Query Optimization
  Star Transformation
  Parallelism
  Using Summary Data
  Query Rewrite with Oracle9i
  Summary
  Practice 4-1 Overview

5 Building the Data Warehouse: Extracting Data
  Objectives
  Extraction, Transformation, Loading (ETL) Processes
  ETL: Tasks, Importance, and Cost
  Extracting Data
  Examining Data Sources
  Production Data
  Archive Data
  Internal Data
  External Data
  Mapping Data
  Extraction Techniques
  Extraction Methods
  Designing Extraction Processes
  Maintaining Extraction Metadata
  Extraction Tools
  Selection Criteria
  Possible ETL Failures
  Maintaining ETL Quality
  Oracle's Solution for ETL
  Oracle's Solution for ETL: Oracle9i Streams, Replication, and Message Queuing
  Frontier Airways: A Business Scenario
  Summary
  Practice 5-1 Overview

6 Building the Data Warehouse: Transforming Data
  Objectives
  Transformation
  Possible Staging Models
  Remote Staging Model
  On-site Staging Model
  Data Anomalies
  Transformation Routines
  Transforming Data: Problems and Solutions
  Multipart Keys Problem
  Multiple Local Standards Problem
  Multiple Files Problem
  Missing Values Problem
  Duplicate Values Problem
  Element Names Problem
  Element Meaning Problem
  Input Format Problem
  Referential Integrity Problem
  Name and Address Problem
  Name and Address Processing in Oracle9i Warehouse Builder
  Quality Data: Importance and Benefits
  Quality: Standards and Improvements
  Data Quality Guidelines
  Data Quality: Solutions and Management
  Transformation Techniques
  Merging Data
  Adding a Date Stamp
  Adding a Date Stamp: Fact Tables and Dimensions
  Adding Keys to Data
  Summarizing Data
  Maintaining Transformation Metadata
  Data Ownership and Responsibilities
  Transformation Timing and Location
  Choosing a Transformation Point
  Monitoring and Tracking
  Designing Transformation Processes
  Transformation Tools
  Oracle's Enhanced Features for Transformation
  Summary
  Practice 6-1 Overview

7 Building the Data Warehouse: Loading Warehouse Data
  Objectives
  Loading Data into the Warehouse
  Initial Load and Refresh
  Data Refresh Models: Extract Processing Environment
  Data Refresh Models: Warehouse Processing Environment
  Building the Loading Process
  Data Granularity
  Loading Techniques
  Loading Technique Considerations
  Loading Techniques Provided by Oracle: SQL*Loader
  Loading Techniques Provided by Oracle
  Transportable Tablespaces
  Post-Processing of Loaded Data
  Indexing Data
  Unique Indexes
  Creating Derived Keys
  Summary Management
  Summary Management in Oracle9i
  Filtering Data
  Verifying Data Integrity
  Steps for Verifying Data Integrity
  Standard Quality Assurance Checks
  Summary
  Practice 7-1 Overview

8 Refreshing Warehouse Data
  Objectives
  Developing a Refresh Strategy for Capturing Changed Data
  User Requirements and Assistance
  Load Window Requirements
  Planning the Load Window
  Scheduling the Load Window
  Capturing Changed Data for Refresh
  Wholesale Data Replacement
  Comparison of Database Instances
  Time and Date Stamping
  Database Triggers
  Using a Database Log
  Choosing a Method for Change Data Capture
  Change Data Capture Mechanism in Oracle9i
  Refresh Mechanisms in Oracle9i
  Applying the Changes to Data
  Overwriting a Record
  Adding a New Record
  Adding a Current Field
  Limitations of Methods for Applying Changes
  Maintaining History: Techniques
  Versioning
  Preserve Complete History
  Purging and Archiving Data
  Oracle Supported Techniques for Purging Data
  Oracle Supported Techniques for Archiving Data
  Final Tasks
  Publishing Data
  ETL Tools: Selection Criteria
  ETL Tool Selection Criteria
  Summary
  Practice 8-1 Overview

9 Leaving a Metadata Trail
  Objectives
  Defining Warehouse Metadata
  Metadata Users
  Types of Metadata
  Examining Types of Metadata
  Examining Metadata: ETL Metadata
  Extraction Metadata
  Transformation Metadata
  Loading Metadata
  Examining Metadata: End-User Metadata
  End-User Metadata: Context
  Example of End-User Metadata
  Historic Context of Data
  Types of Context
  Developing a Metadata Strategy
  Defining Metadata Goals and Intended Usage
  Identifying Target Metadata Users
  Choosing Metadata Tools and Techniques
  Choosing the Metadata Location
  Managing the Metadata
  Integrating Multiple Sets of Metadata
  Managing Changes to Metadata
  Additional Metadata Content and Considerations
  Common Warehouse Metamodel
  Oracle Warehouse Builder: Compliance with OMG-CWM
  Summary
  Practice 9-1 Overview

10 Managing and Maintaining the Data Warehouse
  Objectives
  Managing the Transition to Production
  Promoting Support for the Data Warehouse
  Choosing Between Pilot and Large-Scale Implementation
  The Warehouse Pilot
  Piloting the Warehouse
  Documentation
  Testing the Warehouse
  Training
  Post-Implementation Support
  Monitoring the Success of the Data Warehouse
  Measuring the Success of the Data Warehouse
  Managing Growth
  Expansion and Adjustment
  Controlling Expansion
  Sizing Storage
  Estimating Storage
  Objects That Need Space
  Other Considerations and Techniques
  Space Management
  Archiving Data
  Purging Data
  Identifying Data Warehouse Performance Issues
  Review and Revise
  Secret of Success
  Course Summary

A Practice Solutions

B Oracle Warehouse Builder

Glossary