Data Warehousing Fundamentals Student Guide D16310GC10 Edition 1.0 September 2002 D37302
Authors: Nikos Psomas Padmaja Mitravinda, Kolachalam

Technical Contributors and Reviewers: Kasturi Shekhar Vidya Nagaraj Sudip Majumber Robert Stackowiak Joel Barkin Adam Laro-Bashford M. Lea Shaw Richard Green Jean-Pierre Dijcks Mark Van de Wiel James Spiller John Haydu Maribel Renau Marcelo Manzano Sarah Spicer Rosita Hanoman

Publisher: Nita K. Brozowski

Copyright Oracle Corporation, 2002. All rights reserved.

This documentation contains proprietary information of Oracle Corporation. It is provided under a license agreement containing restrictions on use and disclosure and is also protected by copyright law. Reverse engineering of the software is prohibited. If this documentation is delivered to a U.S. Government Agency of the Department of Defense, then it is delivered with Restricted Rights and the following legend is applicable:

Restricted Rights Legend

Use, duplication or disclosure by the Government is subject to restrictions for commercial computer software and shall be deemed to be Restricted Rights software under Federal law, as set forth in subparagraph (c)(1)(ii) of DFARS 252.227-7013, Rights in Technical Data and Computer Software (October 1988).

This material or any portion of it may not be copied in any form or by any means without the express prior written permission of Oracle Corporation. Any other copying is a violation of copyright law and may result in civil and/or criminal penalties.

If this documentation is delivered to a U.S. Government Agency not within the Department of Defense, then it is delivered with Restricted Rights, as defined in FAR 52.227-14, Rights in Data-General, including Alternate III (June 1987).

The information in this document is subject to change without notice. If you find any problems in the documentation, please report them in writing to Education Products, Oracle Corporation, 500 Oracle Parkway, Box SB-6, Redwood Shores, CA 94065. Oracle Corporation does not warrant that this document is error-free.

Express, Express Analyzer, Express Objects, Express Server, Personal Express, and Oracle are trademarks or registered trademarks of Oracle Corporation. All other products or company names are used for identification purposes only, and may be trademarks of their respective owners.

Contents
Preface
1 Business Intelligence and Data Warehousing
Introductions 1-2 Course Objectives 1-3 Lessons 1-5 Lessons 1-6 Let's Get Started 1-7 Lesson 1 Objectives 1-8 What Is Business Intelligence? 1-9 Purpose of Business Intelligence 1-10 Evolution of BI 1-11 Early Management Information Systems 1-12 Analyzing Data from Operational Systems 1-13 Why OLTP Is Not Suitable for Analytical Reporting 1-14 Data Extract Processing 1-15 Management Issues with Data Extract Programs 1-16 Productivity Issues with Extract Processing 1-17 Data Quality Issues with Extract Processing 1-18 Data Warehousing and Business Intelligence 1-19 Advantages of Warehouse Processing Environments 1-20 Success Factors for a Dynamic Business Environment 1-22 Business Drivers for Data Warehouses 1-23 Technological Advances Enabling Data Warehousing 1-24 Oracle9i Business Intelligence 1-26 Oracle's Business Intelligence and Data Warehousing Products 1-27 Summary 1-32 Practice 1-1 Overview 1-33
2 Defining Data Warehouse Concepts and Terminology
Objectives 2-2 Definition of a Data Warehouse 2-3 Data Warehouse Properties 2-5 Subject-Oriented 2-6 Integrated 2-7 Time-Variant 2-9 Nonvolatile 2-10 Changing Warehouse Data 2-11 Data Warehouse Versus OLTP 2-12 Usage Curves 2-14 User Expectations 2-15 Enterprisewide Warehouse 2-16 Data Warehouses Versus Data Marts 2-17 Dependent Data Mart 2-19 Independent Data Mart 2-20 Typical Data Warehouse Components 2-21 Warehouse Development Approaches 2-23
Big Bang Approach 2-24 Top-Down Approach 2-26 Bottom-Up Approach 2-27 Incremental Approach to Warehouse Development 2-29 Data Warehousing Process Components 2-30 Methodology 2-31 Architecture 2-32 Extraction, Transformation, and Load (ETL) 2-33 Implementation 2-34 Operation and Support 2-35 Phases of the Incremental Approach 2-36 Strategy Phase Deliverables 2-38 Summary 2-40 Practice 2-1 Overview 2-41
3 Lesson 3: Planning and Managing the Data Warehouse Project
Objectives 3-2 Managing Financial Issues 3-3 ROI and Associated Costs 3-5 Computing ROI: Benefits 3-6 Computing ROI: Typical Costs 3-7 Computing ROI: Example 3-8 Funding the Project 3-9 Obtaining Business Commitment 3-10 Data Warehouse Champion 3-11 Steering Committee 3-12 Warehouse Data Ownership 3-13 Setting Expectations 3-14 Managing Expectations 3-15 Assembling the Project Team 3-16 Recognizing Critical Success Factors 3-18 Business User Requirements 3-19 Techniques for Uncovering Requirements 3-20 User Requirements Checklist 3-22 Gathering User Requirements: Possible Obstacles 3-23 Data Access Strategy 3-24 Data Access Tool Requirements 3-25 User Query Progression 3-26 Query Efficiency 3-27 Query Scheduling and Monitoring 3-29 Query Access Architectures 3-31 Web Access 3-32 Security 3-33 Fine-Grained Access Control in Oracle8i and Oracle9i 3-34
Implementation Requirements 3-35 Data Acquisition 3-36 Data Quality 3-37 Documentation 3-38 Testing 3-39 Training 3-40 Training Needs 3-41 Post-Implementation Support 3-43 Summary 3-44 Practice 3-1 Overview 3-45
4 Modeling the Data Warehouse
Objectives 4-2 Data Warehouse Modeling Issues 4-3 Data Warehouse Environment Data Structures 4-5 Star Schema Model 4-6 Snowflake Schema Model 4-7 Data Warehouse Database Design Phases 4-9 Phase 1: Defining the Business Model 4-10 Performing Strategic Analysis 4-11 Creating the Business Model 4-12 Business Requirements Drive the Design Process 4-13 Identifying Measures and Dimensions 4-15 Using a Business Process Matrix 4-17 Determining Granularity 4-18 Identifying Business Rules 4-19 Documenting Metadata 4-20 Metadata Documentation Approaches 4-21 Phase 2: Defining the Dimensional Model 4-22 Star Dimensional Modeling 4-23 Fact Table Characteristics 4-24 Dimension Table Characteristics 4-25 Star Dimensional Model Characteristics 4-26 Using Time in the Data Warehouse 4-27 The Time Dimension 4-28 Using Data Modeling Tools 4-29 Phase 3: Defining the Physical Model 4-31 Physical Model Design Tasks 4-32 Database Object Naming Conventions 4-33 Architectural Requirements 4-34 Strategy for Architecture Definition 4-35 Hardware Requirements 4-36 Making the Right Choice 4-37 Storage and Performance Considerations 4-38 Database Sizing 4-39
Test Load Sampling 4-40 Oracle9i Database Architectural Advantages 4-41 Data Partitioning 4-42 Horizontal Partitioning 4-44 Vertical Partitioning 4-45 Partitioning Methods 4-46 Indexing 4-48 B-Tree Index 4-49 Bitmap Indexes 4-50 Bitmap Join Indexes 4-51 Star Query Optimization 4-52 Star Transformation 4-53 Parallelism 4-55 Using Summary Data 4-56 Query Rewrite with Oracle9i 4-57 Summary 4-58 Practice 4-1 Overview 4-59
5 Building the Data Warehouse: Extracting Data
Objectives 5-2 Extraction, Transformation, Loading (ETL) Processes 5-3 ETL: Tasks, Importance, and Cost 5-5 Extracting Data 5-7 Examining Data Sources 5-8 Production Data 5-9 Archive Data 5-10 Internal Data 5-11 External Data 5-12 Mapping Data 5-14 Extraction Techniques 5-15 Extraction Methods 5-17 Designing Extraction Processes 5-19 Maintaining Extraction Metadata 5-21 Extraction Tools 5-22 Selection Criteria 5-23 Possible ETL Failures 5-24 Maintaining ETL Quality 5-26 Oracle's Solution for ETL 5-27 Oracle's Solution for ETL: Oracle9i Streams, Replication, and Message Queuing 5-29 Frontier Airways: A Business Scenario 5-31 Summary 5-33 Practice 5-1 Overview 5-34
6 Building the Data Warehouse: Transforming Data
Objectives 6-2 Transformation 6-3 Possible Staging Models 6-4 Remote Staging Model 6-5 On-site Staging Model 6-6 Data Anomalies 6-7 Transformation Routines 6-8 Transforming Data: Problems and Solutions 6-9 Multipart Keys Problem 6-10 Multiple Local Standards Problem 6-12 Multiple Files Problem 6-13 Missing Values Problem 6-14 Duplicate Values Problem 6-15 Element Names Problem 6-16 Element Meaning Problem 6-17 Input Format Problem 6-18 Referential Integrity Problem 6-19 Name and Address Problem 6-20 Name and Address Processing in Oracle9i Warehouse Builder 6-22 Quality Data: Importance and Benefits 6-24 Quality: Standards and Improvements 6-26 Data Quality Guidelines 6-28 Data Quality: Solutions and Management 6-30 Transformation Techniques 6-31 Merging Data 6-32 Merging Data 6-33 Adding a Date Stamp 6-34 Adding a Date Stamp: Fact Tables and Dimensions 6-36 Adding Keys to Data 6-38 Summarizing Data 6-39 Maintaining Transformation Metadata 6-41 Data Ownership and Responsibilities 6-43 Transformation Timing and Location 6-45 Choosing a Transformation Point 6-47 Monitoring and Tracking 6-48 Designing Transformation Processes 6-49 Transformation Tools 6-50 Oracle's Enhanced Features for Transformation 6-51 Summary 6-57 Practice 6-1 Overview 6-58
7 Building the Data Warehouse: Loading Warehouse Data
Objectives 7-2 Loading Data into the Warehouse 7-3 Initial Load and Refresh 7-5 Data Refresh Models: Extract Processing Environment 7-7 Data Refresh Models: Warehouse Processing Environment 7-8 Building the Loading Process 7-9 Data Granularity 7-11 Loading Techniques 7-12 Loading Technique Considerations 7-14 Loading Techniques Provided by Oracle: SQL*Loader 7-16 Loading Techniques Provided by Oracle 7-18 Transportable Tablespaces 7-20 Post-Processing of Loaded Data 7-21 Indexing Data 7-23 Unique Indexes 7-24 Creating Derived Keys 7-25 Summary Management 7-27 Summary Management in Oracle9i 7-28 Filtering Data 7-30 Verifying Data Integrity 7-31 Steps for Verifying Data Integrity 7-32 Standard Quality Assurance Checks 7-34 Summary 7-35 Practice 7-1 Overview 7-36
8 Refreshing Warehouse Data
Objectives 8-2 Developing a Refresh Strategy for Capturing Changed Data 8-3 User Requirements and Assistance 8-4 Load Window Requirements 8-5 Planning the Load Window 8-6 Scheduling the Load Window 8-7 Capturing Changed Data for Refresh 8-11 Wholesale Data Replacement 8-13 Comparison of Database Instances 8-14 Time and Date Stamping 8-15 Database Triggers 8-16 Using a Database Log 8-17 Choosing a Method for Change Data Capture 8-18 Change Data Capture Mechanism in Oracle9i 8-19 Refresh Mechanisms in Oracle9i 8-22 Applying the Changes to Data 8-24 Overwriting a Record 8-25 Adding a New Record 8-26
Adding a Current Field 8-27 Limitations of Methods for Applying Changes 8-28 Maintaining History: Techniques 8-30 Versioning 8-33 Preserve Complete History 8-34 Purging and Archiving Data 8-35 Oracle Supported Techniques for Purging Data 8-36 Oracle Supported Techniques for Archiving Data 8-38 Final Tasks 8-39 Publishing Data 8-41 ETL Tools: Selection Criteria 8-42 ETL Tool Selection Criteria 8-44 Summary 8-45 Practice 8-1 Overview 8-46
9 Leaving a Metadata Trail
Objectives 9-2 Defining Warehouse Metadata 9-3 Metadata Users 9-5 Types of Metadata 9-6 Examining Types of Metadata 9-7 Examining Metadata: ETL Metadata 9-8 Extraction Metadata 9-10 Transformation Metadata 9-12 Loading Metadata 9-13 Examining Metadata: End-User Metadata 9-14 End-User Metadata: Context 9-15 Example of End-User Metadata 9-16 Historic Context of Data 9-17 Types of Context 9-18 Developing a Metadata Strategy 9-19 Defining Metadata Goals and Intended Usage 9-20 Identifying Target Metadata Users 9-21 Choosing Metadata Tools and Techniques 9-22 Choosing the Metadata Location 9-24 Managing the Metadata 9-25 Integrating Multiple Sets of Metadata 9-26 Managing Changes to Metadata 9-27 Additional Metadata Content and Considerations 9-28 Common Warehouse Metamodel 9-30 Oracle Warehouse Builder: Compliance with OMG-CWM 9-31 Summary 9-33 Practice 9-1 Overview 9-34
10 Managing and Maintaining the Data Warehouse
Objectives 10-2 Managing the Transition to Production 10-3 Promoting Support for the Data Warehouse 10-4 Choosing Between Pilot and Large-Scale Implementation 10-5 The Warehouse Pilot 10-6 Piloting the Warehouse 10-7 Documentation 10-9 Testing the Warehouse 10-10 Training 10-11 Post-Implementation Support 10-13 Monitoring the Success of the Data Warehouse 10-14 Measuring the Success of the Data Warehouse 10-15 Managing Growth 10-16 Expansion and Adjustment 10-17 Controlling Expansion 10-18 Sizing Storage 10-19 Estimating Storage 10-21 Objects That Need Space 10-22 Other Considerations and Techniques 10-23 Space Management 10-24 Archiving Data 10-25 Purging Data 10-26 Identifying Data Warehouse Performance Issues 10-27 Review and Revise 10-28 Secret of Success 10-29 Course Summary 10-30
A Practice Solutions
B Oracle Warehouse Builder
Glossary