Overview of Data Warehousing and OLAP



Similar documents
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Data W a Ware r house house and and OLAP Week 5 1

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.

Chapter 3, Data Warehouse and OLAP Operations

Data Warehousing & OLAP

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Data Warehouse. MIT-652 Data Mining Applications. Thimaporn Phetkaew. School of Informatics, Walailak University. MIT-652: DM 2: Data Warehouse 1

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Data Warehousing and OLAP Technology

Data Mining for Knowledge Management. Data Warehouses

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

DATA WAREHOUSING - OLAP

Week 13: Data Warehousing. Warehousing

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

DATA WAREHOUSING AND OLAP TECHNOLOGY

Lecture 2 Data warehousing

Data Warehousing and OLAP

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Data Warehousing and Online Analytical Processing

Learning Objectives. Definition of OLAP Data cubes OLAP operations MDX OLAP servers

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Anwendersoftware Anwendungssoftwares a. Data-Warehouse-, Data-Mining- and OLAP-Technologies. Online Analytic Processing


Week 3 lecture slides

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

2 Data Warehouse and OLAP Technology for Data Mining What is a data warehouse? Amultidimensional data model... 6

Data Warehousing and elements of Data Mining

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Part 22. Data Warehousing

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

TIES443. Lecture 3: Data Warehousing. Lecture 3. Data Warehousing. Course webpage:

Data Warehousing, OLAP, and Data Mining

Data W a Ware r house house and and OLAP II Week 6 1

Lecture 2: Introduction to Business Intelligence. Introduction to Business Intelligence

Data Warehousing and OLAP Technology for Knowledge Discovery

What is OLAP - On-line analytical processing

BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ

IST722 Data Warehousing

Lection 3-4 WAREHOUSING

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

OLAP and Data Warehousing! Introduction!

Building Data Cubes and Mining Them. Jelena Jovanovic

When to consider OLAP?

CHAPTER 4 Data Warehouse Architecture

14. Data Warehousing & Data Mining

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

CHAPTER 3. Data Warehouses and OLAP

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Unit -3. Learning Objective. Demand for Online analytical processing Major features and functions OLAP models and implementation considerations

CS2032 Data warehousing and Data Mining Unit II Page 1

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

UNIT-3 OLAP in Data Warehouse

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Why Business Intelligence

Data Warehousing Systems: Foundations and Architectures

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Basics of Dimensional Modeling

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Data Warehousing and Data Mining

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

Data Warehouse. The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way:

A Comparative Study on Operational Database, Data Warehouse and Hadoop File System T.Jalaja 1, M.Shailaja 2

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Data Warehousing OLAP

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

Multi-dimensional index structures Part I: motivation

A Critical Review of Data Warehouse

A Technical Review on On-Line Analytical Processing (OLAP)

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

Data Warehouse: Introduction

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

Turkish Journal of Engineering, Science and Technology

The strategic importance of OLAP and multidimensional analysis A COGNOS WHITE PAPER

OLAP Systems and Multidimensional Expressions I

Designing a Dimensional Model

B.Sc (Computer Science) Database Management Systems UNIT-V

Fluency With Information Technology CSE100/IMT100

New Approach of Computing Data Cubes in Data Warehousing

Transcription:

Overview of Data Warehousing and OLAP Chapter 28 March 24, 2008 ADBS: DW 1

Chapter Outline What is a data warehouse (DW) Conceptual structure of DW Why separate DW Data modeling for DW Online Analytical Processing (OLAP) Building A Data Warehouse Warehouse vs. Data Views March 24, 2008 ADBS: DW 2

- What is Data Warehouse? Defined in many different ways, but not rigorously. A decision support database that is maintained separately from the organization s operational database Support information processing by providing a solid platform of consolidated, historical data for analysis. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management s decision-making process. W. H. Inmon Data warehousing: The process of constructing and using data warehouses March 24, 2008 ADBS: DW 3

-- Data Warehouse Subject-Oriented Organized around major subjects, such as customer, product, sales. Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process. March 24, 2008 ADBS: DW 4

-- Data Warehouse Integrated Constructed by integrating multiple, heterogeneous data sources relational databases, flat files, on-line transaction records Data cleaning and data integration techniques are applied. Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources E.g., Hotel price: currency, tax, breakfast covered, etc. When data is moved to the warehouse, it is converted. March 24, 2008 ADBS: DW 5

-- Data Warehouse Time Variant The time horizon for the data warehouse is significantly longer than that of operational systems. Operational database: current value data. Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) Every key structure in the data warehouse Contains an element of time, explicitly or implicitly But the key of operational data may or may not contain time element. March 24, 2008 ADBS: DW 6

-- Data Warehouse Non-Volatile A physically separate store of data transformed from the operational environment. Operational update of data does not occur in the data warehouse environment. Does not require transaction processing, recovery, and concurrency control mechanisms Requires only two operations in data accessing: Initial loading of data and access of data. March 24, 2008 ADBS: DW 7

Decision support tools Direct Query Reporting tools OLAP Data Mining Crystal reports Essbase Intelligent Miner Merge Clean Summarize Data warehouse RDBMS e.g. Teradata Detailed transactional data Dubai branch Doha branch Riyad branch Oracle Operational data IMS GIS data Census data SAS March 24, 2008 ADBS: DW 8

- Conceptual Structure of Data Warehouse Back Flushing Data Warehouse Databases Cleaning Reformatting Data Metadata OLAP DSSI EIS Data Mining Other Data Inputs Updates/New Data March 24, 2008 ADBS: DW 9

- Why Separate Data Warehouse? High performance for both systems DBMS tuned for Online Transaction Processing (OLTP): access methods, indexing, concurrency control, recovery Warehouse tuned for Online Analytical Processing (OLAP): complex OLAP queries, multidimensional view, consolidation. Different functions and different data: missing data: Decision support requires historical data which operational DBs do not typically maintain data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled March 24, 2008 ADBS: DW 10

- Data Modeling for Data Warehouses Example of Two- Dimensional vs. Multi- Dimensional P r o d u c t P123 P124 P125 P126 : : Three dimensional data cube Fiscal Quarter Qtr 1 Qtr 2 Qtr 3 Qtr 4 Reg 1 Region Reg 2 Reg 3 March 24, 2008 ADBS: DW 11

- Data Modeling for Data Warehouses Advantages of a multi-dimensional model Multi-dimensional models lend themselves readily to hierarchical views in what is known as roll-up display and drilldown display. The data can be directly queried in any combination of dimensions, bypassing complex database queries. March 24, 2008 ADBS: DW 12

-- Multi-dimensional Schemas Multi-dimensional schemas are specified using: Dimension table It consists of tuples of attributes of the dimension. Fact table Each tuple is a recorded fact. This fact contains some measured or observed variable (s) and identifies it with pointers to dimension tables. The fact table contains the data, and the dimensions to identify each tuple in the data March 24, 2008 ADBS: DW 13

-- Multi-dimensional Schemas Modeling data warehouses: dimensions & measures Star schema: A fact table in the middle connected to a set of dimension tables Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation March 24, 2008 ADBS: DW 14

--- Example of Star Schema time time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Measures Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales item item_key item_name brand type supplier_type location location_key street city province_or_street country March 24, 2008 ADBS: DW 15

--- Example of Snowflake Schema time time_key day day_of_the_week month quarter year Sales Fact Table time_key item_key item item_key item_name brand type supplier_key supplier supplier_key supplier_type branch branch_key branch_name branch_type Measures branch_key location_key units_sold dollars_sold avg_sales location location_key street city_key city city_key city province_or_street country March 24, 2008 ADBS: DW 16

--- Example of Fact Constellation time time_key day day_of_the_week month quarter year Sales Fact Table time_key item_key branch_key item item_key item_name brand type supplier_type Shipping Fact Table time_key item_key shipper_key from_location branch branch_key branch_name branch_type Measures location_key units_sold dollars_sold avg_sales location location_key street city province_or_street country to_location dollars_cost units_shipped shipper shipper_key shipper_name location_key shipper_type March 24, 2008 ADBS: DW 17

-OLAP Fast, interactive answers to large aggregate queries. Multidimensional model: dimensions with hierarchies Dim 1: Bank location: branch-->city-->state Dim 2: Customer: sub profession --> profession Dim 3: Time: month --> quarter --> year Measures: loan amount, #transactions, balance March 24, 2008 ADBS: DW 18

-- Multidimensional Data Sales volume as a function of product, month, and region Region Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product Product City Month Week Office Day Month March 24, 2008 ADBS: DW 19

-- Data Cubes Fact relation sale Product Client Amt p1 c1 12 p2 c1 11 p1 c3 50 p2 c2 8 Two-dimensional cube c1 c2 c3 p1 12 50 p2 11 8 March 24, 2008 ADBS: DW 20

-- Data Cubes Fact relation 3-dimensional cube sale Product Client Date Amt p1 c1 1 12 p2 c1 1 11 p1 c3 1 50 p2 c2 1 8 p1 c1 2 44 p1 c2 2 4 day 2 day 1 c1 c2 c3 p1 44 4 p2 c1 c2 c3 p1 12 50 p2 11 8 March 24, 2008 ADBS: DW 21

... -- Data Cubes day 2 day 1 c1 c2 c3 p1 44 4 p2 c1 c2 c3 p1 12 50 p2 11 8 Example: computing sums... c1 c2 c3 p1 56 4 50 p2 11 8 c1 c2 c3 sum 67 12 50 sum p1 110 p2 19 129 March 24, 2008 ADBS: DW 22

-- Data Cubes... In multidimensional data model together with measure values usually we store summarizing information (aggregates) c1 c2 c3 Sum p1 56 4 50 110 p2 11 8 19 Sum 67 12 50 129 March 24, 2008 ADBS: DW 23

- A Sample Data Cube TV PC VCR sum Product Date 1Qtr 2Qtr 3Qtr 4Qtr sum Total annual sales of TV in K. S. A. K.S.A U.A.E. Bahrain Country sum March 24, 2008 ADBS: DW 24

-- The Cube Operator day 2 day 1 c1 c2 c3 p1 44 4 p2 c1 c2 c3 p1 12 50 p2 11 8... sale(c1,*,*) c1 c2 c3 p1 56 4 50 p2 11 8 sale(c2,p2,*) c1 c2 c3 sum 67 12 50 sum p1 110 p2 19 129 sale(*,*,*) March 24, 2008 ADBS: DW 25

-- The Cube Operator * c1 c2 c3 * p1 56 4 50 110 p2 11 8 19 day 2 c1* c267 c3 12 * 50 129 p1 44 4 48 day 1 c1 p2 c2 c3 * p1 12* 44 50 4 62 48 p2 11 8 19 * 23 8 50 81 sale(*,p2,*) March 24, 2008 ADBS: DW 26

-- Aggregation Using Hierarchies day 2 day 1 c1 c2 c3 p1 44 4 p2 c1 c2 c3 p1 12 50 p2 11 8 customer region country region A region B p1 12 50 p2 11 8 (customer c1 in Region A; customers c2, c3 in Region B) March 24, 2008 ADBS: DW 27

-- OLAP Servers Relational OLAP (ROLAP) Extended relational DBMS that maps operations on multidimensional data to standard relations operations. Store all information, including fact tables, as relations Multidimensional OLAP (MOLAP) Special purpose server that directly implements multidimensional data and operations Store multidimensional datasets as arrays. March 24, 2008 ADBS: DW 28

-OLAP Queries Roll up (drill-up): summarize data by climbing up hierarchy or by dimension reduction Drill down (roll down): reverse of roll-up from higher level summary to lower level summary or detailed data, or introducing new dimensions Slice and dice: project and select Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes. Other operations drill across: involving (across) more than one fact table drill through: through the bottom level of the cube to its back-end relational tables (using SQL) March 24, 2008 ADBS: DW 29

-- OLAP Queries: Roll Up Summarizes data along dimension. client Dammam Riyadh c1 c2 c3 c4 10 12 11 12 3 5 7 11 21 9 7 15 Date of sale city region video Camera CD Roll up aggregation with respect to city Video Camera CD Dammam 22 8 30 Riyadh 23 18 22 March 24, 2008 ADBS: DW 30

-- OLAP Queries: Drill Down Roll down, drill down: go from higher level summary to lower level summary or detailed data For a particular product category, find the detailed sales data for each salesperson by date Given total sales by state, we can ask for sales per city, or just sales by city for a selected state March 24, 2008 ADBS: DW 31

-- OLAP Queries: Drill down day 1 day 2 c1 c2 c3 p1 44 4 p2 c1 c2 c3 p1 12 50 p2 11 8 c1 c2 c3 p1 56 4 50 p2 11 8 rollup drill-down c1 c2 c3 sum 67 12 50 sum p1 110 p2 19 129 March 24, 2008 ADBS: DW 32

-- Other OLAP Queries Slice and dice: select and project Sales of video in USA over the last 6 months Slicing and dicing reduce the number of dimensions Pivot: reorient cube The result of pivoting is called a cross-tabulation If we pivot the Sales cube on the Client and Product dimensions, we obtain a table for each client for each product value March 24, 2008 ADBS: DW 33

Other OLAP Queries Pivoting can be combined with aggregation sale prodid clientid date amt p1 c1 1 12 p2 c1 1 11 p1 c3 1 50 p2 c2 1 8 p1 c1 2 44 p1 c2 2 4 day 2 day 1 c1 c2 c3 p1 44 4 p2 c1 c2 c3 p1 12 50 p2 11 8 c1 c2 c3 Sum 1 23 8 50 81 2 44 4 48 Sum 67 12 50 129 c1 c2 c3 Sum p1 56 4 50 110 p2 11 8 19 Sum 67 12 50 129 March 24, 2008 ADBS: DW 34

-- Cube Implementations Data cubes are implemented by materialized views A materialized view is the result of some query, which we chose to store its output table in the database. For the data cube, the views we would choose to materialize will typically be aggregations of the full data cube. Lattice of views are created for performance reasons All Weeks Days Lattice Years Quarters Months March 24, 2008 ADBS: DW 35

-- OLTP vs. OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date detailed, flat relational isolated usage repetitive ad-hoc access read/write lots of scans index/hash on prim. key unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB historical, summarized, multidimensional integrated, consolidated metric transaction throughput query throughput, response March 24, 2008 ADBS: DW 36

- Building A Data Warehouse The builders of Data warehouse should take a broad view of the anticipated use of the warehouse. The design should support ad-hoc querying An appropriate schema should be chosen that reflects the anticipated usage. March 24, 2008 ADBS: DW 37

- Building A Data Warehouse The Design of a Data Warehouse involves following steps Acquisition of data for the warehouse. Ensuring that Data Storage meets the query requirements efficiently. Giving full consideration to the environment in which the data warehouse resides. March 24, 2008 ADBS: DW 38

- Building A Data Warehouse Acquisition of data for the warehouse The data must be extracted from multiple, heterogeneous sources. Data must be formatted for consistency within the warehouse. The data must be cleaned to ensure validity. Difficult to automate cleaning process. Back flushing, upgrading the data with cleaned data. March 24, 2008 ADBS: DW 39

- Building A Data Warehouse Acquisition of data for the warehouse The data must be fitted into the data model of the warehouse. The data must be loaded into the warehouse. Proper design for refresh policy should be considered. March 24, 2008 ADBS: DW 40

- Building A Data Warehouse Storing the data according to the data model of the warehouse Creating and maintaining required data structures. Creating and maintaining appropriate access paths. Providing for time-variant data as new data are added. Supporting the updating of warehouse data. Refreshing the data. Purging data. March 24, 2008 ADBS: DW 41

- Building A Data Warehouse Usage projections The fit of the data model Characteristics of available resources Design of the metadata component Modular component design Design for manageability and change Considerations of distributed and parallel architecture Distributed vs. federated warehouses. March 24, 2008 ADBS: DW 42

- Warehouse vs. Data Views Views and data warehouses are alike in that they both have readonly extracts from the databases. However, data warehouses are different from views in the following ways: Data Warehouses exist as persistent storage instead of being materialized on demand. Data Warehouses are not usually relational, but rather multidimensional. Data Warehouses can be indexed for optimization. Data Warehouses provide specific support of functionality. Data Warehouses deals huge volumes of data that is contained generally in more than one database. March 24, 2008 ADBS: DW 43

- Difficulties of implementing Data Warehouses Lead time is huge in building a data warehouse. Potentially it takes years to build and efficiently maintain a data warehouse. Both quality and consistency of data are major concerns. Revising the usage projections regularly to meet the current requirements. The data warehouse should be designed to accommodate addition and attrition of data sources without major redesign Administration of data warehouse would require far broader skills than are needed for a traditional database. March 24, 2008 ADBS: DW 44

END March 24, 2008 ADBS: DW 45