Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi kishorejaladi@yahoo.com



Similar documents
Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Part 22. Data Warehousing

Week 13: Data Warehousing. Warehousing

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

DATA WAREHOUSING - OLAP

Data Warehousing and OLAP

DATA WAREHOUSING AND OLAP TECHNOLOGY

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

CS2032 Data warehousing and Data Mining Unit II Page 1

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Data W a Ware r house house and and OLAP II Week 6 1

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Learning Objectives. Definition of OLAP Data cubes OLAP operations MDX OLAP servers

When to consider OLAP?

CHAPTER 4 Data Warehouse Architecture

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Week 3 lecture slides

BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

Fluency With Information Technology CSE100/IMT100

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

Data Warehousing and OLAP Technology for Knowledge Discovery

Anwendersoftware Anwendungssoftwares a. Data-Warehouse-, Data-Mining- and OLAP-Technologies. Online Analytic Processing

Overview of Data Warehousing and OLAP

A Critical Review of Data Warehouse

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Monitoring Genebanks using Datamarts based in an Open Source Tool

Hybrid OLAP, An Introduction


Data Warehousing. Paper

Data Warehouse: Introduction

IST722 Data Warehousing

Data Warehousing Systems: Foundations and Architectures

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

INFO 321, Database Systems, Semester

14. Data Warehousing & Data Mining

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

Data Warehousing and Online Analytical Processing

Designing a Dimensional Model

1960s 1970s 1980s 1990s. Slow access to

Turkish Journal of Engineering, Science and Technology

(Week 10) A04. Information System for CRM. Electronic Commerce Marketing

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Data Warehousing Overview

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Module 1: Introduction to Data Warehousing and OLAP

Data Warehousing, OLAP, and Data Mining

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

Data Mining and Data Warehousing Henryk Maciejewski Data Warehousing and OLAP

Data Warehousing & OLAP

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

University of Gaziantep, Department of Business Administration

Building Cubes and Analyzing Data using Oracle OLAP 11g

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.

Distance Learning and Examining Systems

A Technical Review on On-Line Analytical Processing (OLAP)

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

PREFACE INTRODUCTION MULTI-DIMENSIONAL MODEL. Chris Claterbos, Vlamis Software Solutions, Inc.

OLAP and Data Warehousing! Introduction!

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

Data Warehousing and Data Mining

Data Warehouse design

Data Warehousing Concepts

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Chapter 3, Data Warehouse and OLAP Operations

Vlamis Software Solutions, Inc Copyright 2008, Vlamis Software Solutions, Inc.

Data Warehousing & OLAP

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration

Data Warehouse. The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way:

OLAP Systems and Multidimensional Expressions I

Life Cycle of a Data Warehousing Project in Healthcare

Oracle OLAP What's All This About?

The Cubetree Storage Organization

OLAP OLAP. Data Warehouse. OLAP Data Model: the Data Cube S e s s io n

Chapter 14: Databases and Database Management Systems

Main Memory & Near Main Memory OLAP Databases. Wo Shun Luk Professor of Computing Science Simon Fraser University

Lecture Data Warehouse Systems

The Art of Designing HOLAP Databases Mark Moorman, SAS Institute Inc., Cary NC

Basics of Dimensional Modeling

OLAP, Relational, and Multidimensional Database Systems

Using Business Intelligence with Oracle s E-Business Suite

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

Budgeting and Planning with Microsoft Excel and Oracle OLAP

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Dimensional Modeling for Data Warehouse

Transcription:

Data Warehousing: Data Models and OLAP operations By Kishore Jaladi kishorejaladi@yahoo.com

Topics Covered 1. Understanding the term Data Warehousing 2. Three-tier Decision Support Systems 3. Approaches to OLAP servers 4. Multi-dimensional data model 5. ROLAP 6. MOLAP 7. HOLAP 8. Which to choose: Compare and Contrast 9. Conclusion

Understanding the term Data Warehousing Data Warehouse: The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". He defined the terms in the sentence as follows: Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. Time-variant: All data in the data warehouse is identified with a particular time period. Non-volatile Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

Data Warehouse Architecture

Other important terminology Enterprise Data warehouse collects all information about subjects (customers,products,sales,assets, personnel) that span the entire organization Data Mart Departmental subsets that focus on selected subjects Decision Support System (DSS) Information technology to help the knowledge worker (executive, manager, analyst) make faster & better decisions Online Analytical Processing (OLAP) an element of decision support systems (DSS)

Three-Tier Decision Support Systems Warehouse database server Almost always a relational DBMS, rarely flat files OLAP servers Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations Clients Query and reporting tools Analysis tools Data mining tools

The Complete Decision Support System Information Sources Semistructured Sources Data Warehouse Server (Tier 1) Data Warehouse OLAP Servers (Tier 2) e.g., MOLAP serve Clients (Tier 3) OLAP extract transform load refresh etc. serve e.g., ROLAP Query/Reporting Operational DB s serve Data Mining Data Marts

Approaches to OLAP Servers Three possibilities for OLAP servers (1) Relational OLAP (ROLAP) Relational and specialized relational DBMS to store and manage warehouse data OLAP middleware to support missing pieces (2) Multidimensional OLAP (MOLAP) Array-based storage structures Direct access to array data structures (3) Hybrid OLAP (HOLAP) Storing detailed data in RDBMS Storing aggregated data in MDBMS User access via MOLAP tools

The Multi-Dimensional Data Model Sales by product line over the past six months Sales by store between 1990 and 1995 Store Info Key columns joining fact table to dimension tables Numerical Measures Prod Code Time Code Store Code Sales Qty Product Info Fact table for measures Dimension tables Time Info...

ROLAP: Dimensional Modeling Using Relational DBMS Special schema design: star, snowflake Special indexes: bitmap, multi-table join Proven technology (relational model, DBMS), tend to outperform specialized MDDB especially on large data sets Products IBM DB2, Oracle, Sybase IQ, RedBrick, Informix

Star Schema (in RDBMS)

Star Schema Example

The Classic Star Schema Store Dimension STORE KEY Store Description City State District ID District Desc. Region_ID Region Desc. Regional Mgr. Level Fact Table STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Product Dimension PRODUCT KEY Product Desc. Brand Color Size Manufacturer Level Time Dimension PERIOD KEY Period Desc Year Quarter Month Day Current Flag Resolution Sequence A single fact table, with detail and summary data Fact table primary key has only one key column per dimension Each key is generated Each dimension is a single table, highly denormalized Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low maintenance, very simple metadata

Star Schema with Sample Data

The Snowflake Schema Store Dimension STORE KEY Store Description City State District ID Region_ID Regional Mgr. District_ID District Desc. Region_ID Store Fact Table STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Region_ID Region Desc. Regional Mgr.

Aggregation in a Single Fact Table Store Dimension STORE KEY Store Description City State District ID District Desc. Region_ID Region Desc. Regional Mgr. Level Fact Table STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Product Dimension PRODUCT KEY Product Desc. Brand Color Size Manufacturer Level Time Dimension PERIOD KEY Period Desc Year Quarter Month Day Current Flag Resolution Sequence Drawbacks: Summary data in the fact table yields poorer performance for summary levels, huge dimension tables a problem

The Fact Constellation Schema Store Dimension STORE KEY Store Description City State District ID District Desc. Region_ID Region Desc. Regional Mgr. Fact Table STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Product Dimension PRODUCT KEY Product Desc. Brand Color Size Manufacturer Time Dimension PERIOD KEY Period Desc Year Quarter Month Day Current Flag Sequence District Fact Table District_ID PRODUCT_KEY PERIOD_KEY Dollars Units Price Region Fact Table Region_ID PRODUCT_KEY PERIOD_KEY Dollars Units Price

Aggregations using Snowflake Schema and Multiple Fact Tables St ore Dimension STORE KEY Store Descript ion City State District ID District Desc. Region_ ID Region Desc. Regional Mgr. District_ID District Desc. Region_ ID Store Fact Table STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Region_ ID Region Desc. Regional Mgr. District Fact Table District_ID PRODUCT_KEY PERIOD_KEY Dollars Unit s Price RegionFact Table Region_ID PRODUCT_KEY PERIOD_KEY Dollars Unit s Price No LEVEL in dimension tables Dimension tables are normalized by decomposing at the attribute level Each dimension table has one key for each level of the dimensionís hierarchy The lowest level key joins the dimension table to both the fact table and the lower level attribute table How does it work? The best way is for the query to be built by understanding which summary levels exist, and finding the proper snowflaked attribute tables, constraining there for keys, then selecting from the fact table.

Aggregation Contd St ore Dimension STORE KEY Store Descript ion City State District ID District Desc. Region_ ID Region Desc. Regional Mgr. District_ID District Desc. Region_ ID Store Fact Table STORE KEY PRODUCT KEY PERIOD KEY Dollars Units Price Region_ ID Region Desc. Regional Mgr. District Fact Table District_ID PRODUCT_KEY PERIOD_KEY Dollars Unit s Price RegionFact Table Region_ID PRODUCT_KEY PERIOD_KEY Dollars Unit s Price Advantage: Best performance when queries involve aggregation Disadvantage: Complicated maintenance and metadata, explosion in the number of tables in the database

Aggregates Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 sale prodid storeid date amt p1 s1 1 12 p2 s1 1 11 p1 s3 1 50 p2 s2 1 8 p1 s1 2 44 p1 s2 2 4 81

Aggregates Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date sale prodid storeid date amt p1 s1 1 12 p2 s1 1 11 p1 s3 1 50 p2 s2 1 8 p1 s1 2 44 p1 s2 2 4 ans date sum 1 81 2 48

Another Example Add up amounts by day, product SQL: SELECT prodid, date, sum(amt) FROM SALE GROUP BY date, prodid sale prodid storeid date amt p1 s1 1 12 p2 s1 1 11 p1 s3 1 50 p2 s2 1 8 p1 s1 2 44 p1 s2 2 4 sale prodid date amt p1 1 62 p2 1 19 p1 2 48 rollup drill-down

Points to be noticed about ROLAP Defines complex, multi-dimensional data with simple model Reduces the number of joins a query has to process Allows the data warehouse to evolve with rel. low maintenance Can contain both detailed and summarized data. ROLAP is based on familiar, proven, and already selected technologies. BUT!!! SQL for multi-dimensional manipulation of calculations.

MOLAP: Dimensional Modeling Using the Multi Dimensional Model MDDB: a special-purpose data model Facts stored in multi-dimensional arrays Dimensions used to index array Sometimes on top of relational DB Products Pilot, Arbor Essbase, Gentia

The MOLAP Cube Fact table view: sale prodid storeid amt p1 s1 12 p2 s1 11 p1 s3 50 p2 s2 8 Multi-dimensional cube: s1 s2 s3 p1 12 50 p2 11 8 dimensions = 2

3-D Cube Fact table view: Multi-dimensional cube: sale prodid storeid date amt p1 s1 1 12 p2 s1 1 11 p1 s3 1 50 p2 s2 1 8 p1 s1 2 44 p1 s2 2 4 day 2 day 1 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8 dimensions = 3

Product Example NY SF LA Juice 10 Milk 34 Coke 56 Cream 32 Soap 12 Bread 56 M T W Th F S S Time 56 units of bread sold in LA on M roll-up to region roll-up to brand roll-up to week Dimensions: Time, Product, Store Attributes: Product (upc, price, ) Store Hierarchies: Product Brand Day Week Quarter Store Region Country

Cube Aggregation: Roll-up day 2 day 1 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8 Example: computing sums... s1 s2 s3 p1 56 4 50 p2 11 8 rollup drill-down s1 s2 s3 sum 67 12 50 sum p1 110 p2 19 129

Cube Operators for Roll-up day 2 day 1 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8... sale(s1,*,*) s1 s2 s3 p1 56 4 50 p2 11 8 sale(s2,p2,*) s1 s2 s3 sum 67 12 50 sum p1 110 p2 19 129 sale(*,*,*)

Extended Cube * s1 s2 s3 * p1 56 4 50 110 p2 11 8 19 day 2 s1 * s2 67 s3 12 * 50 129 p1 44 4 48 day 1 p2 s1 s2 s3 * * 44 4 p1 12 50 62 48 p2 11 8 19 * 23 8 50 81 sale(*,p2,*)

Aggregation Using Hierarchies day 2 day 1 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8 store region country region A region B p1 56 54 p2 11 8 (store s1 in Region A; stores s2, s3 in Region B)

Points to be noticed about MOLAP Pre-calculating or pre-consolidating transactional data improves speed. BUT Fully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GB MDDs are great candidates for the <50GB department data marts. Rolling up and Drilling down through aggregate data. With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.

Hybrid OLAP (HOLAP) HOLAP = Hybrid OLAP: Best of both worlds Storing detailed data in RDBMS Storing aggregated data in MDBMS User access via MOLAP tools

Data Flow in HOLAP RDBMS Server MDBMS Server Client User data Meta data Derived data SQL-Read SQL-Reach Through Multidimensional data Multidimensional access Multidimensional Viewer Relational Viewer SQL-Read

When deciding which technology to go for, consider: 1) Performance: How fast will the system appear to the end-user? MDD server vendors believe this is a key point in their favor. 2) Data volume and scalability: While MDD servers can handle up to 50GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.

An experiment with Relational and the Multidimensional models on a data set The analysis of the author s example illustrates the following differences between the best Relational alternative and the Multidimensional approach. relational Multidimensional Improvement Disk space requirement (Gigabytes) Retrieve the corporate measures Actual Vs Budget, by month (I/O s) Calculation of Variance Budget/Actual for the whole database (I/O time in hours) 17 10 1.7 240 1 240 237 2* 110* * This may include the calculation of many other derived data without any additional I/O. Reference: http://dimlab.usc.edu/csci599/fall2002/paper/i2_p064.pdf

What-if analysis IF THEN IF THEN IF THEN A. You require write access B. Your data is under 50 GB C. Your timetable to implement is 60-90 days D. Lowest level already aggregated E. Data access on aggregated level F. You re developing a general-purpose application for inventory movement or assets management Consider an MDD /MOLAP solution for your data mart A. Your data is over 100 GB B. You have a "read-only" requirement C. Historical data at the lowest level of granularity D. Detailed access, long-running queries E. Data assigned to lowest level elements Consider an RDBMS/ROLAP solution for your data mart. A. OLAP on aggregated and detailed data B. Different user groups C. Ease of use and detailed data Consider an HOLAP for your data mart

Examples ROLAP Telecommunication startup: call data records (CDRs) ECommerce Site Credit Card Company MOLAP Analysis and budgeting in a financial department Sales analysis HOLAP Sales department of a multi-national company Banks and Financial Service Providers

Tools available ROLAP: ORACLE 8i ORACLE Reports; ORACLE Discoverer ORACLE Warehouse Builder Arbors Software s Essbase MOLAP: ORACLE Express Server ORACLE Express Clients (C/S and Web) MicroStrategy s DSS server Platinum Technologies Plantinum InfoBeacon HOLAP: ORACLE 8i ORACLE Express Serve ORACLE Relational Access Manager ORACLE Express Clients (C/S and Web)

Conclusion ROLAP: RDBMS -> star/snowflake schema MOLAP: MDD -> Cube structures ROLAP or MOLAP: Data models used play major role in performance differences MOLAP: for summarized and relatively lesser volumes of data (10-50GB) ROLAP: for detailed and larger volumes of data Both storage methods have strengths and weaknesses The choice is requirement specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.

References http://dimlab.usc.edu/csci599/fall2002/paper/i2_p064.pdf OLAP, Relational, and Multidimensional Database Systems, by George Colliat, Arbor Software Corporation http://www.donmeyer.com/art3.html Data warehousing Services, Data Mining & Analysis, LLC http://www.cs.man.ac.uk/~franconi/teaching/2001/cs636/cs636- olap.ppt Data Warehouse Models and OLAP Operations, by Enrico Franconi http://www.promatis.com/mediacenter/papers - ROLAP, MOLAP, HOLAP: How to determine which to technology is appropriate, by Holger Frietch, PROMATIS Corporation