Data W a Ware r house house and and OLAP Week 5 1



Similar documents
Chapter 3, Data Warehouse and OLAP Operations

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Overview of Data Warehousing and OLAP

Data Warehousing and Online Analytical Processing

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Data Warehousing and OLAP Technology

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.

2 Data Warehouse and OLAP Technology for Data Mining What is a data warehouse? Amultidimensional data model... 6

Data Warehouse. MIT-652 Data Mining Applications. Thimaporn Phetkaew. School of Informatics, Walailak University. MIT-652: DM 2: Data Warehouse 1

Data Mining for Knowledge Management. Data Warehouses

Lecture 2 Data warehousing


Data Warehousing & OLAP

CHAPTER 3. Data Warehouses and OLAP

Data Warehousing and elements of Data Mining

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

TIES443. Lecture 3: Data Warehousing. Lecture 3. Data Warehousing. Course webpage:

Part 22. Data Warehousing

Introduction to Data Warehousing. Ms Swapnil Shrivastava

DATA WAREHOUSING AND OLAP TECHNOLOGY

Data Warehouse Technology And The MSD Databases

Lection 3-4 WAREHOUSING

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

14. Data Warehousing & Data Mining

DATA WAREHOUSING - OLAP

IST722 Data Warehousing

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Module 1: Introduction to Data Warehousing and OLAP

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Data Warehousing and OLAP Technology for Knowledge Discovery

Building Data Cubes and Mining Them. Jelena Jovanovic

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Week 3 lecture slides

Fluency With Information Technology CSE100/IMT100

Chapter 7 Multidimensional Data Modeling (MDDM)

New Approach of Computing Data Cubes in Data Warehousing

Turkish Journal of Engineering, Science and Technology

Concept and Applications of Data Mining. Week 1

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

Data Warehousing Systems: Foundations and Architectures

A Design and implementation of a data warehouse for research administration universities

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Data Warehousing and Data Mining in Business Applications

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Business Intelligence. 1. Introduction September, 2013.

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

B.Sc (Computer Science) Database Management Systems UNIT-V

Data Warehousing and Data Mining

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Week 13: Data Warehousing. Warehousing

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Basics of Dimensional Modeling

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

When to consider OLAP?

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

A Critical Review of Data Warehouse

THE TECHNOLOGY OF USING A DATA WAREHOUSE TO SUPPORT DECISION-MAKING IN HEALTH CARE

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

OLAP and Data Warehousing! Introduction!

Chapter 3 Data Warehouse - technological growth

Learning Objectives. Definition of OLAP Data cubes OLAP operations MDX OLAP servers

Data Warehousing and OLAP

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

OLAP Theory-English version

INTERACTIVE DECISION SUPPORT SYSTEM BASED ON ANALYSIS AND SYNTHESIS OF DATA - DATA WAREHOUSE

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Dimensional Modeling for Data Warehouse

Data Warehouse: Introduction

SAS BI Course Content; Introduction to DWH / BI Concepts

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

Data warehouses. Data Mining. Abraham Otero. Data Mining. Agenda

CASE PROJECTS IN DATA WAREHOUSING AND DATA MINING

Data Mining - Introduction

CS54100: Database Systems

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days

Transcription:

Data Warehouse and OLAP Week 5 1

Midterm I Friday, March 4 Scope Homework assignments 1 4 Open book

Team Homework Assignment #7 Read pp. 121 139, 146 150 of the text book. Do Examples 3.8, 3.10 and Exercise 3.4 (b) and (c). Prepare for the results of the homework assignment. Due date beginning of the lecture on Friday March11 th.

Topics Definition of data warehouse Multidimensional data model Data warehouse architecture From data warehousing to data mining

What is Data Warehouse? (1) A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and that usually resides at a single site A data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise need to make strategic decisions

What is Data Warehouse? (2) Data warehouses provide on line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitate effective data generalization and data mining Many other data mining functions, such as association, classification, prediction, and clustering, can be integrated t with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction

What is Data Warehouse? (3) A decision support database that is maintained separately from the organization s operational database A data warehouse is a subject oriented, integrated, time variant, and nonvolatile collection of data in support of management s decision making process [Inm96]. W. H. Inmon

Data Warehouse Framework data mining Figure 1.7 Typical framework of a data warehouse for AllElectronics 8

Data Warehouse is Subject-Oriented Oi Organized around major subjects, such as customer, product, sales, etc. Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Data Warehouse is Integrated t Constructed by integrating multiple, heterogeneous data sources relational databases, flat files, on line transaction records Data cleaning and data integration techniques are applied. Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources E.g., Hotel price: currency, tax, breakfast covered, etc.

Data Warehouse is Time Variant The time horizon for the data warehouse is significantly longer than that of operational systems Operational database: current value data Data warehouse data: provide information from a historical perspective (e.g., past 5 10 years) Every key structure in the data warehouse Contains an element of time, explicitly or implicitly

Data Warehouse is Nonvolatile A physically separate store of data transformed from the operational environment Operational update of data does not occur in the data warehouse environment Does not require transaction processing, recovery, and concurrency control mechanisms Requires only two operations in dt data accessing: initial loading of data and access of data

OLTP vs. OLAP Table 3.1 Comparison between OLTP and OLAP 13

Why Separate is Data Warehouse Needed? (1) Why not perform on line analytical processing directly on operational databases insteadof spendingadditional time and resources to construct a separate data warehouse?

Why Separate is Data Warehouse Needed? d?( (2) High performance for both systems DBMS tuned for OLTP: searching for particular records, indexing, hashing, concurrency control, recovery Warehouse tuned for OLAP: complex OLAP queries, multidimensional view, consolidation (summarization and aggregation)

Topics Definition of data warehouse Multidimensional data model Data warehouse architecture From data warehousing to data mining

From Tables and Spreadsheets to Data Cubes A data warehouse is based on a multidimensional data model This model views data in the form of a data cube A data cube allows data to be modeled and viewed in multiple dimensions

From Tables and Spreadsheets to Data Cubes (1) A data cube is defined by facts and dimensions Facts are data which data warehouse focus on Fact tables contain numeric measures (such as dollars_sold) and keys to each of the related dimension tables Dimensions are perspectives with respect to fact Dimension tables describe the dimension with attributes. For example, item (item_name, brand, type), or time(day, week, month, quarter, year)

19 Figure 1.6. Fragments of relations from a relational database for AllElectronics

From Tables and Spreadsheets to Data Cubes (2) dimensions Facts (numerical measures) Table 3.2 A 2-D view of sales data for AllElectronics according to the dimensions i time and item, where the sales are from branches located in the city of Vancouver. The measure displayed is dollar_sold (in thousands). 20

From Tables and Spreadsheets to Data Cubes (3) Table 3.3 A 3-D view of sales data for AllElectronics according to the dimensions time, item, and location. The measure displayed is dollar_sold (in thousands). 21

From Tables and Spreadsheets to Data Cubes (4) Figure 3.1 A 3-D data cube representation of the data in Table 3.3, according to the dimensions time, item, and location. The measure displayed is dollar_sold (in thousands). 22

From Tables and Spreadsheets to Data Cubes (5) Figure 3.2 A 4-D data cube representation, according to the dimensions time, item, location, and supplier. The measure displayed is dollar_sold (in thousands). 23

Cuboid A data cube is a lattice of cuboids The total number of cuboids The apex cuboid The base cuboid 24

Figure 3.14 Lattice of cuboids, making up a 3-D data cube. Each cuboid represents a different group-by. The base cuboid contains the three dimensions city, item, and year. 25

The Curse of Dimensionality How many cuboids are there in a n dimensional data cube? How many cuboids are there in a n dimensional data cube and each dimension (i) has the number of level, (L i )? 26

Conceptual Modeling of Data Warehouses Modeling dt data warehouses: dimensions i & measures Star schema: A fact table in the middle connected to a set of dimension tables Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake Fact constellations: tllti Multiple l fact ttbl tables share dimension i tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

Star Schema time time_ key day day_of_the_week month quarter year branch branch_key branch_name branch_type Sales Fact Table time_key item_key branch_key location_key dollars_sold unit_sold item item_key item_name brand type supplier_type location location_key street city province_or_street country Figure 3.4 Star schema of a data warehouse for sales. 28

time time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Snowflake Schema Sales Fact Table time_key item_key branch_key location_key dollars_sold units_sold item item_key item_name brand type supplier_key location location_key street city supplier supplier_key supplier_type city city_key city province_or_street country Figure 3.4 Snowflake schema of a data warehouse for sales. 29

time time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Fact Constellation Sales Fact Table time_key item_key branch_key location_key dollars_sold unit_sold item item_key item_name brand type supplier_type location location_key street city shipper province_or_street country Shipping Fact Table item_key time_key shipper_key from_location to_location dollars_sold unit_ shipped shipper shipper_key shipper_name location_key shipper_type Figure 3.5 Fact constellation schema of a data warehouse for sales and shipping. 30

Exercise Exercise 35(a) 3.5 page 153 31