The Data Warehouse Challenge




The Data Warehouse Challenge: Taming Data Chaos
Michael H. Brackett
WILEY COMPUTER PUBLISHING
John Wiley & Sons, Inc.
New York, Chichester, Brisbane, Toronto, Singapore

Contents

About the Author
Foreword by William H. Inmon
Acknowledgments
Preface

Chapter 1 Data Crisis
  Information Demand
  Dynamic Environment
  Business Changes
  Business Information Demand
  Data Situation
  Disparate Data
  Disparate Data Cycle
  Data Dilemma
  Technology Trends
  Client/Server Architecture
  Data Warehouse Systems
  Geographic Information Systems
  Other Trends
  Metadata Demand
  Summary
  Questions

Chapter 2 Data Challenge
  The Realities
  Basic Problem
  Data Awareness
  Data Understanding
  Data Variability
  Data Redundancy
  Data Access
  Tools
  Standards
  Hidden Resource
  Disparate Data Shock
  Meeting the Challenge
  Data Resource Initiative
  Data Resource Strategies
  Identify Data
  Understand Data
  Integrate Data
  Aggregate Data
  Deploy Data
  Opportunity for Change
  Approaches
  Justification
  Summary
  Questions

Chapter 3 Data Vision
  Integrated Data Resource
  Principles
  Subject-Oriented
  Business Survival-Oriented
  Real World Perspective
  Robust Resource
  Sharable Resource
  Development
  A Formal Data Resource
  Data Resource Library
  Information Engineering Support
  Data
  Data Engineering
  Summary
  Questions

Chapter 4 Data Architecture
  Formal Architecture
  Information Technology Infrastructure
  Data Resource Framework
  Data Architecture
  Common Data Architecture
  Formal Approach
  Data Architecture Perspective
  Data Model Perspective
  Data Unit Perspective
  Objects and Events
  Features
  Existences and Occurrences
  Coded Data Values
  Data Megatype Perspective
  Summary
  Questions

Chapter 5 Data Description
  Data Names
  Data Naming Conventions
  Data Naming Taxonomy
  A Structural Taxonomy
  Original Taxonomy Components
  Enhanced Taxonomy Components
  Data Naming Vocabulary
  Aligning Naming Conventions
  Forming Data Names
  Data Site Names
  Data Occurrence Selections Names
  Data Subject Names
  Data Code Set Names
  Data Characteristic Names
  Data Characteristic Variation Names
  Data Characteristic Substitution Names
  Data Code Names
  Data Version Name
  Data Name Abbreviations
  Short Data Names
  Defining Data
  Data Definition Criteria
  Data Definition Common Words
  Summary
  Questions

Chapter 6 Data Structure
  Data Structure Concept
  Common Data Structure
  Data Sets
  Data Relations
  Common Notation
  Data Relation Types
  Data Relation Diagrams
  Entity Relation Diagrams
  Subject Relation Diagrams
  File Relation Diagrams
  Multiple Perspectives
  Data Subject Hierarchy
  Presenting Ideas
  Data Keys
  Primary Keys
  Multiple Primary Keys
  Primary Key Intelligence
  Dual Primary Keys
  Foreign Keys
  Subject Structure Chart
  Coded Data
  Code Tables
  Data Code Set
  Coded Data Trends
  Data Group Trends
  Data Classification
  Data Classification Scheme
  Data Themes
  Data Segments
  Data Clusters
  Summary
  Questions

Chapter 7 Data Quality
  Disparate Data Quality
  Data Integrity
  Data Value Integrity
  Conditional Data Value Integrity
  Data Domains
  Default Data Values

  Data Structure Integrity
  Conditional Data Structure Integrity
  Referential Integrity
  Data Retention Integrity
  Data Derivation Integrity
  Derived Data
  Redundant Data
  Replicated Data
  Data Accuracy
  Scope
  Data Currentness
  Data Lineage and Heritage
  Temporal Data
  Data Versions
  Multiple Source Updates
  Proactive and Retroactive Updates
  Data Completeness
  Managing Data Quality
  Data Quality Improvement
  Data Quality Criteria
  Data Quality Techniques
  Data Quality Process
  Realizing Disparate Data Quality
  Understanding Existing Data Quality
  Determine Desired Data Quality
  Adjusting Data Quality
  Tracking Data Quality
  Summary
  Questions

Chapter 8 Metadata
  Metadata Situation
  Disparate Metadata
  Disparate Metadata Cycle
  Metadata Dilemma
  Metadata Shock
  A New Perspective
  Metadata Types
  Common Metadata
  Metadata Warehouse
  Metadata Warehouse Concept

  Metadata Warehouse Architecture
  Metadata Warehouse Components
  Data Naming Lexicon
  Data Dictionary
  Data Structure
  Data Integrity
  Data Thesaurus
  Data Glossary
  Data Product Reference
  Data Directory
  Data Translation Schemes
  Data Clearinghouse
  Managing Metadata
  Metadata Quality
  Metadata Versions
  Summary
  Questions

Chapter 9 Data Refining
  Data Refining Concept
  Data Refining Approach
  Data Product Concept
  Data Product Names
  Data Naming Taxonomy
  Data Products
  Data Product Groups
  Data Product Units
  Data Product Codes
  Data Product Definitions
  Data Product Structure
  File Relation Diagram
  File Structure Chart
  Entity Relation Diagram
  Entity Structure Chart
  Data Product Quality
  Data Product Integrity
  Data Product Accuracy
  Data Cross-Reference
  Data Cross-Reference Approach
  Data Product Group
  Data Product Unit
  Data Product Code
  Data Product Inventory

  Data Variability
  Primary Key Variability
  Data Subject Variability
  Data Characteristic Variability
  Data Code Value Variability
  Official Data Variations
  Official Primary Key
  Official Data Characteristic Variations
  Official Data Domains
  Official Data Codes
  Data Translation Schemes
  Data Characteristic Translation
  Data Code Translation
  Disparate Data Integration
  Integration Scope
  Official Data Source
  Integration Table
  Physical Integration
  Summary
  Questions

Chapter 10 Evaluational Data
  Data Warehouse System Concept
  Decision Support
  Data Resource Support
  Data Warehouse System Definition
  Dual Database Concept
  A New Perspective
  Evaluation Data
  Data Architecture
  Data Dimensions
  Evaluation Data Perspective
  Evaluation Data Description
  Data Subjects
  Data Subject Names
  Data Characteristic Names
  Data Selection
  Data Versions
  Data Definitions
  Evaluation Data Structure
  Primary Keys
  Subject Relation Diagram
  Summary Data Subject Matrices

  Evaluation Data Integrity
  Data Relations
  Data Normalization
  Data Summarization
  Data Summarization Levels
  Maintaining Evaluation Data
  Data Addition
  Data Removal
  Data Rederivation
  Data Version
  Data Perspectives
  Metadata
  Data Exploration and Mining
  Summary
  Questions

Chapter 11 Data Transformation
  Data Transformation Concept
  Data Transformation Perspective
  Data Transformation Routes
  Data Transformation Matrix
  Data Transformation Steps
  Identify Target Data
  Identify Source Data
  Extract Source Data
  Reconstruct Historical Data
  Translate Data
  Recast Data
  Restructure Data
  Summarize Data
  Load Data
  Review Data
  Summary
  Questions

Chapter 12 Spatial Data
  A Data Perspective
  Decision Support
  Data Situation
  Common Data Architecture
  Spatial Data Definitions

  Spatial Data Description
  Data Layers
  Spatial Data Layer Names
  Spatial Data Definition
  Spatial Data Structure
  Data Relations
  Primary Keys
  Spatial Data Quality
  Datums
  Linear Referencing Systems
  Linear Addressing Systems
  Geographic Areas
  Linear Object Segmentation
  Metadata
  Managing Spatial Data
  Spatial Data Tiers
  Spatial Data Themes
  Seen Areas
  Duplicate Data Layers
  Data Layer Extents
  Time-Variant Spatial Data
  Data Layer Aggregation
  Three-Dimensional Aggregation
  Spatial Data Scale
  Integrating Tabular and Spatial Data
  Spatial Data Referencing
  Descriptive Spatial Referencing
  Nondescriptive Georeferencing
  Indirect Spatial Referencing
  Summary
  Questions

Chapter 13 Distributing Data
  Data Distribution Concept
  Data Distribution
  Data Distribution Dilemma
  Common Data Architecture
  Official Data
  Replicating Data
  Distributed Data Description
  Distributed Data Names
  Distributed Data Definitions

  Distributed Data Structure
  Logical Data Structure
  Distributed Data Structure
  Physical Data Structure
  Distributed Data Diagram
  Data Partitioning
  Data Subject Partitioning
  Data Occurrence Partitioning
  Data Characteristic Partitioning
  Dual Data Partitioning
  Distributing Data
  Data Distribution Driver
  Distributing Tabular Data
  Distributing Evaluational Data
  Distributing Spatial Data
  Distributing Metadata
  Data Marts
  Redistributing Data
  Distributed Data Quality
  Data Origination
  Data Tracking
  Data Concurrency
  Distributed Data Quality Principles
  Summary
  Questions

Chapter 14 Common Data Model
  The Data Schema Concept
  Two-Schema Concept
  Three-Schema Concept
  Four-Schema Concept
  Five-Schema Concept
  Abstract Schema Concept
  Framework for Information Systems
  Five-Schema and the Framework
  Common Data Modeling
  Data Modeling Perceptions
  Data Modeling Problems
  Common Data Architecture
  Common Data Modeling Concept
  Forward Data Modeling
  Reverse Data Modeling
  Vertical Data Modeling

  Common Data Modeling Method
  Basic Data Modeling Components
  An Integrated Data Resource
  Modeling Logical Schema
  Developing New Data
  Refining Disparate Data
  Developing Evaluational Data
  Distributing Data
  Changing Operating Environments
  Integrating Data
  Data Model Interfaces
  Data Subject Hierarchies
  Common Person
  Grouped Code Tables
  Archive and History Data
  Summary
  Questions

Chapter 15 Resolving the Dilemma
  Data Issues
  Increasing Data Disparity
  Knowledge Loss
  Millennium Data Problem
  Client Data Access
  Acquired Applications
  Conflicting Data Standards
  Standards and Guidelines
  Rapid Development
  Multiple Common Data Architectures
  Legacy Systems
  Stabilizing Variables
  Business Improvement
  Resolution Initiative
  Recognition
  Vision
  Orientation
  Strategy
  Evaluation
  Summary
  Questions

Glossary

Appendix A Common Words
  Common Data Site Words
  Common Data Subject Words
  Common Data Characteristic Words
  Common Data Characteristic Variation Words
  Common Data Version Words
  Common Data Definition Words

Appendix B Short Data Names
  Parent Elimination Notation
  Subordinate Inclusion Notation
  Subordinate Substitution Notation
  Parent Substitution Notation
  Summary Data Subject Notation
  Program Name Notation

Appendix C Data Definition Examples
  Data Sites
  Data Occurrence Groups
  Data Subjects
  Data Characteristics
  Data Characteristic Variations
  Data Codes
  Data Versions

Appendix D Metadata Explanation

Appendix E Cross-Reference Example
  Original Data Definitions
  Data Quality Information
  Cross-References
  Cross-References by Common Data Name
  Cross-References by Product Data Name
  Subject Relation Diagram
  Data Definitions
  Geospatial Dataset
  Geospatial Dataset Attribute Accuracy
  Geospatial Dataset Horizontal Accuracy
  Geospatial Dataset Process
  Geospatial Dataset Source
  Geospatial Dataset Vertical Accuracy

Appendix F Evaluation Data Example
  Operational Subject Relation Diagram
  Evaluation Subject Relation Diagram
  Primary Key Matrix
  Data Characteristic Matrix

Bibliography

Index