What Is a Data Warehouse?

Similar documents
Overview of Data Warehousing and OLAP

Data W a Ware r house house and and OLAP Week 5 1

Data Warehousing & OLAP

Chapter 3, Data Warehouse and OLAP Operations

Data Warehousing & OLAP

Data Warehousing and OLAP Technology

Data Warehouse. MIT-652 Data Mining Applications. Thimaporn Phetkaew. School of Informatics, Walailak University. MIT-652: DM 2: Data Warehouse 1

Data Mining for Knowledge Management. Data Warehouses

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.

Lecture 2 Data warehousing

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Data Warehousing and Online Analytical Processing

2 Data Warehouse and OLAP Technology for Data Mining What is a data warehouse? Amultidimensional data model... 6


BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Data Mining. Session 4 Main Theme Data Warehousing and OLAP. Dr. Jean-Claude Franchitti

TIES443. Lecture 3: Data Warehousing. Lecture 3. Data Warehousing. Course webpage:

Part 22. Data Warehousing

Data Warehousing and OLAP.

Data Warehousing and elements of Data Mining

Data Warehousing & OLAP

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

OLAP OLAP. Data Warehouse. OLAP Data Model: the Data Cube S e s s io n

Lecture Data Warehouse Systems

Data Warehousing & OLAP

Indexing Techniques for Data Warehouses Queries. Abstract

14. Data Warehousing & Data Mining

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

What is OLAP - On-line analytical processing

Week 13: Data Warehousing. Warehousing

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Improved Query Performance with Variant Indexes

Data Warehouse Technology And The MSD Databases

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

Data warehouse design

Improved Query Performance with Variant Indexes

OLAP Systems and Multidimensional Expressions I

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato)

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

DATA WAREHOUSING AND OLAP TECHNOLOGY

Optimizing Your Data Warehouse Design for Superior Performance

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing Concepts

Data Warehousing and OLAP

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Chapter 7 Multidimensional Data Modeling (MDDM)

A Critical Review of Data Warehouse

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Module Title: Business Intelligence

Chapter 3 Data Warehouse - technological growth

Improve Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive Calculations

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Designing a Dimensional Model

Lecture Data Warehouse Systems

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

Mario Guarracino. Data warehousing

CHAPTER 3. Data Warehouses and OLAP

Basics of Dimensional Modeling

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Fluency With Information Technology CSE100/IMT100

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

(Week 10) A04. Information System for CRM. Electronic Commerce Marketing

DATA WAREHOUSING - OLAP

Data Warehouses and NoSQL Sharing Administra6ve Informa6on

IST722 Data Warehousing

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Evaluation of Bitmap Index Compression using Data Pump in Oracle Database

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Data Warehouse Design

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments.

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

DATA WAREHOUSING II. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 23

CS54100: Database Systems

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses 1

Multi-dimensional index structures Part I: motivation

Lection 3-4 WAREHOUSING

Week 3 lecture slides

Multi-Dimensional Database Allocation for Parallel Data Warehouses

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

When to consider OLAP?

Advanced Data Management Technologies

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Data Warehousing and OLAP Technology for Knowledge Discovery

Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses

Evaluation of Expressions

Bitmap Index Design and Evaluation

Data Mining and Data Warehousing Henryk Maciejewski Data Warehousing and OLAP

Chapter 23. Database Security. Security Issues. Database Security

Bitmap Indices for Data Warehouses

1960s 1970s 1980s 1990s. Slow access to

Turkish Journal of Engineering, Science and Technology

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

IT2305 Database Systems I (Compulsory)

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Data Warehousing, OLAP, and Data Mining

Transcription:

What Is a Data Warehouse? A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management s decision-making process. W. H. Inmon Data warehousing: the process of constructing and using data warehouses Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 1

https://classconnection.s3.amazonaws.com/922/flashcards/636922/png/image061351738272741.png OLAP Operations Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 2

Data Warehouse Schema Design Query answering efficiency Subject orientation Integration Tradeoff between time and space Universal table versus fully normalized schema Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 3

Star Schema time time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Measures Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales item item_key item_name brand type supplier_type location location_key street city state_or_province country Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 4

Snowflake Schema time item time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Measures Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales item_key item_name brand type supplier_key location location_key street city_key supplier supplier_key supplier_type city city_key city state_or_province country Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 5

Fact Constellation Shipping Fact Table time_key time time_key day day_of_the_week month quarter year Sales Fact Table time_key item_key branch_key item item_key item_name brand type supplier_type item_key shipper_key from_location to_location dollars_cost branch branch_key branch_name branch_type Measures location_key units_sold dollars_sold avg_sales location location_key street city province_or_state country units_shipped shipper shipper_key shipper_name location_key shipper_type Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 6

(Good) Aggregate Functions Distributive: there is a function G() such that F({X i,j }) = G({F({X i,j i=1,...,l j }) j=1, n}) Examples: COUNT(), MIN(), MAX(), SUM() G=SUM() for COUNT() Algebraic: there is an M-tuple valued function G() and a function H() such that F({Xi,j}) = H({G({Xi,j i=1,.., I}) j=1,..., n }) Examples: AVG(), standard deviation, MaxN(), MinN() For AVG(), G() records sum and count, H() adds these two components and divides to produce the global average Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 7

Holistic Aggregate Functions There is no constant bound on the size of the storage needed to describe a subaggregate. There is no constant M, such that an M-tuple characterizes the computation F({Xi,j i=1,...,i}). Examples: Median(), MostFrequent() (also called the Mode()), and Rank() Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 8

Index Requirements in OLAP Data is read only (Almost) no insertion or deletion Query types Point query: looking up one specific tuple (rare) Range query: returning the aggregate of a (large) set of tuples, with group by Complex queries: need specific algorithms and index structures, will be discussed later Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 9

OLAP Query Example In table (cust, gender, ), find the total number of male customers Method 1: scan the table once Method 2: build a B+ tree index on attribute gender, still need to access all tuples of male customers Can we get the count without scanning many tuples, even not all tuples of male customers? Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 10

Bitmap Index For n tuples, a bitmap index has n bits and can be packed into!n /8" bytes and!n /32" words From a bit to the row-id: the j-th bit of the p- th byte! row-id = p*8 +j cust gender 1 0 0 Jack M Cathy F Nancy F Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 11

Using Bitmap to Count Shcount[] contains the number of bits in the entry subscript Example: shcount[01100101]=4 count = 0; for (i = 0; i < SHNUM; i++) count += shcount[b[i]]; Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 12

Advantages of Bitmap Index Efficient in space Ready for logic composition C = C1 AND C2 Bitmap operations can be used Bitmap index only works for categorical data with low cardinality Naively, we need 50 bits per entry to represent the state of a customer in US How to represent a sale in dollars? Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 13

Bit-Sliced Index A sale amount can be written as an integer number of pennies, and then be represented as a binary number of N bits 24 bits is good for up to $167,772.15, appropriate for many stores A bit-sliced index is N bitmaps Tuple j sets in bitmap k if the k-th bit in its binary representation is on The space costs of bit-sliced index is the same as storing the data directly Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 14

Using Indexes SELECT SUM(sales) FROM Sales WHERE C; Tuples satisfying C is identified by a bitmap B Direct access to rows to calculate SUM: scan the whole table once B+ tree: find the tuples from the tree Projection index: only scan attribute sales Bit-sliced index: get the sum from (B AND B k )*2 k Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 15

Cost Comparison Traditional value-list index (B+ tree) is costly in both I/O and CPU time Not good for OLAP Bit-sliced index is efficient in I/O Other case studies in [O Neil and Quass, SIGMOD 97] Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 16

Horizontal or Vertical Storage A fact table for data warehousing is often fat Tens of even hundreds of dimensions/attributes A query is often about only a few attributes Horizontal storage: tuples are stored one by one Vertical storage: tuples are stored by attributes A 1 A 2 A 100 x 1 x 2 x 100 z 1 z 2 z 100 A 1 A 2 A 100 x 1 x 2 x 100 z 1 z 2 z 100 Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 17

Horizontal Versus Vertical Find the information of tuple t Typical in OLTP Horizontal storage: get the whole tuple in one search Vertical storage: search 100 lists Find SUM(a 100 ) GROUP BY {a 22, a 83 } Typical in OLAP Horizontal storage (no index): search all tuples O(100n), where n is the number of tuples Vertical storage: search 3 lists O(3n), 3% of the horizontal storage method Projection index: vertical storage Jian Pei: CMPT 741/459 Data Warehousing and OLAP (2) 18