Part 22 Data Warehousing
The Decision Support System (DSS)
  - Tools to assist decision-making
  - Used at all levels in the organization
  - Sometimes focused on a single area
  - Sometimes focused on a single problem
  - Interactive
  - Needs ad hoc query tools
Copyright 1971-2002 Thomas P. Sturm Data Warehousing Part 22, Page 2
Components of a DSS
  - Data Store
      - Business Data (internal and external)
      - Business Model Data (generated from algorithms or "mined")
  - Data Extraction
  - Data Filtering
  - End-user Query Tool (ad hoc query tool)
  - End-user Presentation Tool
Data Collection
  - Conversion
      - From manual records
      - From machine-readable records
      - Via CORBA
  - Data Purification
      - In a large database, anything that can occur will
      - Data must not contain anomalies
  - Data could be in read-and-append-only format
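The purification step can be illustrated with a small validation pass. This is only a sketch under stated assumptions: the record fields (`customer`, `quantity`, `sold_on`), the rules, and the function name `find_anomalies` are hypothetical and not taken from the slides.

```python
from datetime import date

def find_anomalies(record, today):
    """Return a list of problems found in one (hypothetical) sales record."""
    problems = []
    if not record.get("customer"):
        problems.append("missing customer")
    quantity = record.get("quantity")
    if quantity is None or quantity <= 0:
        problems.append("missing or non-positive quantity")
    sold_on = record.get("sold_on")
    if sold_on is None or sold_on > today:
        problems.append("missing or future sale date")
    return problems

# Illustrative input: in a large data set, every kind of bad record shows up.
records = [
    {"customer": "Acme", "quantity": 3, "sold_on": date(2002, 3, 1)},
    {"customer": "", "quantity": 5, "sold_on": date(2002, 3, 2)},
    {"customer": "Zenith", "quantity": -1, "sold_on": date(2031, 1, 1)},
]

today = date(2002, 6, 30)  # assumed load date for the example

# Only records with no detected anomalies are accepted for loading.
accepted = [r for r in records if not find_anomalies(r, today)]
rejected = [(r, find_anomalies(r, today)) for r in records if find_anomalies(r, today)]
```

Keeping the rejected records together with their reasons, rather than silently dropping them, makes it possible to correct them at the source.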
Characteristics of DSS Data
  - Time span
      - Not just the current data, but covers a long time span
  - Granularity
      - Not (necessarily) every detail of every transaction, but totals, summaries, and derived data
  - Dimensionality
      - Data related in as many ways as might be relevant to the application area or problem
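Granularity and dimensionality can be illustrated by rolling the same transaction-level records up along different dimensions. The sketch below uses made-up data; the dimension names and the `summarize` helper are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical transaction-level records: (year, region, product, amount).
transactions = [
    (2001, "East", "widget", 120.0),
    (2001, "East", "gadget", 80.0),
    (2001, "West", "widget", 50.0),
    (2002, "East", "widget", 200.0),
    (2002, "West", "gadget", 70.0),
]

DIMENSIONS = ("year", "region", "product")

def summarize(rows, by):
    """Total the amount column, grouped by the named dimensions."""
    idx = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)

# Coarse granularity along one dimension (time).
by_year = summarize(transactions, ["year"])

# The same data viewed along two other dimensions.
by_region_product = summarize(transactions, ["region", "product"])
```

The detail rows never change; only the choice of grouping dimensions determines how summarized the result is.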
Differences Between Operational Data and DSS Data

Attribute         Operational Data                  DSS Data
Alternate Name    On-line transaction processing    On-line analytical processing
Acronym           OLTP                              OLAP
Characteristic    Operational processing            Informational processing
Orientation       Transaction                       Analysis
Timeframe         Current                           Historical
Update            On-line                           Batch
Level of Detail   Detailed                          Summarized
Normalization     Full                              Not required
Transactions      Updates                           Queries
Query scope       Narrow                            Broad
Data volume       Gigabyte                          Terabyte
Users             Clerks, database professionals    Knowledge workers
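The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. The `orders` table and its contents are hypothetical; a single in-memory database stands in for both systems purely to show the two access patterns side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, year INTEGER, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 2001, "East", 120.0),
        (2, 2001, "West", 80.0),
        (3, 2002, "East", 200.0),
    ],
)

# OLTP-style access: a narrow update that touches one current row.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (95.0, 2))

# OLAP-style access: a broad, read-only query that summarizes across history.
totals_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```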
Data Warehouse
  - Integrated
      - Centralized
      - Consolidated
      - Standardized
  - Subject-Oriented
      - Organized by topic
      - Summarized by topic
      - Multiple subjects of interest
  - Historical or Time-Variant
      - Time is a variable
      - Multiple values with different time stamps
  - Non-Volatile
      - Data added, but never removed
      - Always growing
      - Batch update via appending
      - Summaries may change
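The time-variant and non-volatile properties can be sketched as an append-only store of time-stamped values. The class and field names (`Snapshot`, `AppendOnlyStore`, `value_as_of`) are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Snapshot:
    subject: str   # the topic the value describes
    as_of: date    # time stamp: time is a variable
    value: float

class AppendOnlyStore:
    """Rows are added in batches and never updated or removed."""

    def __init__(self):
        self._rows = []

    def append_batch(self, snapshots):
        self._rows.extend(snapshots)

    def history(self, subject):
        """All values recorded for a subject, oldest first."""
        matching = (s for s in self._rows if s.subject == subject)
        return sorted(matching, key=lambda s: s.as_of)

    def value_as_of(self, subject, when):
        """Most recent value recorded on or before the given date, if any."""
        earlier = [s for s in self.history(subject) if s.as_of <= when]
        return earlier[-1].value if earlier else None

store = AppendOnlyStore()
store.append_batch([Snapshot("sales_east", date(2002, 1, 31), 100.0)])
store.append_batch([Snapshot("sales_east", date(2002, 2, 28), 140.0)])

mid_february = store.value_as_of("sales_east", date(2002, 2, 15))
early_march = store.value_as_of("sales_east", date(2002, 3, 1))
```

Because earlier snapshots are kept, the same subject has multiple values with different time stamps, and any summary can be recomputed after each batch.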
Building the Data Warehouse
  - Data Extraction and Collection
      - From existing operational data and external sources
  - Data Filtering and Reduction
      - To remove extraneous fields (such as SSN)
      - To collect a sample when not all instances needed
  - Data Cleaning and Scrubbing
      - Consistent units of measure
      - Consistent intervals of time
      - Consistent accounting methods
      - Consistent definitions
  - Data Transformation and Coding
      - Code to numerical from categorical
      - Categorize numerical ranges
      - Everything should ideally reduce to numbers
  - Aggregation and Summarization
      - Generate subtotals and totals
      - Generate across various dimensions
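The build steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the field names, the placeholder SSN values, the region codes, and the 5 kg band boundary are invented, and extraction is replaced by a hard-coded list.

```python
from collections import defaultdict

LB_TO_KG = 0.45359237  # exact definition of the international pound

# Stand-in for extracted operational rows (hypothetical data).
raw_rows = [
    {"ssn": "000-00-0001", "region": "East", "weight": 2.0, "unit": "kg", "amount": 120.0},
    {"ssn": "000-00-0002", "region": "West", "weight": 10.0, "unit": "lb", "amount": 80.0},
    {"ssn": "000-00-0003", "region": "East", "weight": 12.0, "unit": "kg", "amount": 200.0},
]

REGION_CODES = {"East": 0, "West": 1}  # categorical -> numerical coding

def filter_fields(row):
    """Filtering: drop extraneous fields such as the SSN."""
    return {k: v for k, v in row.items() if k != "ssn"}

def clean_units(row):
    """Cleaning: express every weight in kilograms."""
    row = dict(row)
    if row["unit"] == "lb":
        row["weight"] = row["weight"] * LB_TO_KG
    row["unit"] = "kg"
    return row

def transform(row):
    """Transformation: code the region and categorize the weight range."""
    return {
        "region_code": REGION_CODES[row["region"]],
        "weight_band": 0 if row["weight"] < 5.0 else 1,  # 0 = under 5 kg
        "amount": row["amount"],
    }

def aggregate(rows, key):
    """Aggregation: total the amounts along one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

warehouse_rows = [transform(clean_units(filter_fields(r))) for r in raw_rows]
totals_by_region = aggregate(warehouse_rows, "region_code")
totals_by_band = aggregate(warehouse_rows, "weight_band")
```

After transformation, each warehouse row contains only numbers, which is the "everything should ideally reduce to numbers" goal from the slide.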
Twelve Rules of Data Warehousing (Inmon and Kelley)
   1. Data Warehouse separate from operational data
   2. Data Warehouse is integrated
   3. Data Warehouse contains historical data
   4. Data Warehouse time components are a series of snapshots
   5. Data Warehouse is subject-oriented
   6. Data Warehouse is read-only except for periodic batch updates
   7. Data Warehouse development is data driven
   8. Data Warehouse contains multiple levels of detail, from operational detail to highly summarized
   9. Data Warehouse transactions are read-only against large data sets
  10. Data Warehouse traces data from source through transformations
  11. Data Warehouse contains metadata
  12. Data Warehouse has charge-back
Data Warehouse Architectures
  - Multidimensional Data Model
      - Data cube
      - Star
      - Snowflake
      - Constellation
  - Implementation
      - ROLAP (relational)
      - MOLAP (multidimensional)
      - HOLAP (hybrid)
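As one possible illustration of the star model in a relational (ROLAP-style) implementation, the sketch below builds a central fact table surrounded by two dimension tables using Python's built-in `sqlite3` module. The table names, columns, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL);
""")

conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2001, 4), (2, 2002, 1)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "widget", "hardware"), (2, "manual", "books")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 120.0), (1, 2, 30.0), (2, 1, 200.0), (2, 1, 50.0)])

# Summarize the facts along two dimensions by joining out to the star points.
sales_by_year_category = conn.execute("""
    SELECT t.year, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY t.year, p.category
    ORDER BY t.year, p.category
""").fetchall()
```

A snowflake model would further normalize the dimension tables (for example, splitting `category` into its own table), and a constellation would let several fact tables share the same dimensions.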