Database Applications. Advanced Querying. Transaction Processing. Data Warehouse. Decision Support.




Database Applications: Advanced Querying
  Transaction processing
    Online setting
    Supports day-to-day operation of business
  OLAP, Data Warehousing, Decision support
    Offline setting
    Strategic planning (statistics)

Transaction Processing
  Operational setting
  Up-to-date = critical
  Simple data
  Simple queries
  Example: flight reservations
    ticket sales: do not sell a seat twice
    data: reservation, date, name
    queries: Give flight details of X; List flights to Y
  Database must support
    simple data: tables
    simple queries: select from where
    consistency & integrity: CRITICAL
    concurrency
  Relational databases, Object-Oriented, Object-Relational

Decision Support / Data Warehouse
  Decision support
    Off-line setting
    «Historical» data
    Summarized data
    Different databases
    Statistical queries
  Example: flight company
    Evaluate ROI of flights
    Flights of last year
    # passengers on line L
    Passengers, fuel costs, maintenance info
    Average % of seats sold/month/destination
  Data warehouse: a decision support DB that is maintained separately from the organization's operational databases.

Why Separate Data Warehouse?
  High performance for both systems
    DBMS tuned for OLTP: access methods, indexing, concurrency control, recovery
    Warehouse tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
  Different functions and different data
    Missing data: decision support requires historical data, which operational DBs do not typically maintain
    Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
    Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled

Three-Tier Architecture
  [Figure: three-tier architecture, from left to right]
  Data Sources: operational DBs, other sources
  Extract, Transform, Load, Refresh; Monitor & Integrator; Metadata
  Data Storage: data warehouse, data marts
  OLAP Engine: OLAP server, ROLAP server
  Front-End Tools: analysis, query/reporting, data mining

OLAP
  OLAP = OnLine Analytical Processing
  Online = no waiting for answers
  OLAP system = system that supports analytical queries that are dimensional in nature

This Lecture
  Examples of decision support queries
  Data Cubes
    Conceptual data model
    Typical operations
  Implementation
    ROLAP vs MOLAP
    Indexing structures
  SQL:1999 support for OLAP

Examples of Queries
  Flight company: evaluate ticket sales
  Give total, average, minimal, maximal amount
    per date: week, month, year
    by destination/source port/country/continent
    by ticket type
    by # of connections

Characteristics
  One special attribute: amount -> measure
  Other attributes: select relevant regions -> dimensions
  Different levels of generality (month, year, ...) -> hierarchies
  Measure data is summarized: sum, min, max, average -> aggregations

Supermarket example
  Evaluate the sales of products
  Measure: product cost in $
  Dimensions:
    Customer: ID, city, state, country, ...
    Store: chain, size, location, ...
    Product: brand, type, ...
  What are the measure and dimensional attributes, and where are the hierarchies?
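The flight-company queries above (total, average, minimal and maximal amount per date level and destination) can be illustrated with a small sketch. The rows, the two-level date hierarchy (month, year) and the helper name `summarize` are assumptions made up for this example; they are not taken from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ticket sales: (sale date, destination country, amount).
SALES = [
    (date(2006, 6, 3), "NL", 120.0),
    (date(2006, 6, 17), "NL", 80.0),
    (date(2006, 7, 1), "NL", 100.0),
    (date(2006, 6, 9), "FR", 200.0),
]

def summarize(rows, level):
    """Group the measure (amount) by (date at `level`, destination) and aggregate it."""
    groups = defaultdict(list)
    for day, dest, amount in rows:
        # Pick a level of the date hierarchy: "month" or, otherwise, year.
        period = (day.year, day.month) if level == "month" else (day.year,)
        groups[(period, dest)].append(amount)
    return {
        key: {"total": sum(v), "avg": sum(v) / len(v), "min": min(v), "max": max(v)}
        for key, v in groups.items()
    }

by_month = summarize(SALES, "month")
by_year = summarize(SALES, "year")
print(by_month[((2006, 6), "NL")])
print(by_year[((2006,), "NL")])
```

Here `amount` plays the role of the measure, date and destination are the dimensions, and switching `level` moves along the date hierarchy.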

Why dimensions?
  Multidimensional view on the data

Cross Tabulation
  Cross-tabulations are highly useful
  Sales of clothes, June-August 2006 (cost in $); rows: product color, columns: month

    Color     June   July   August   Total
    Blue        51     58       65     174
    Red         25     20       22      67
    Orange     158    120       51     329
    Total      234    198      138     570

Data Cubes
  Extension of cross-tables to multiple dimensions
  Conceptual notion

    Month     Blue    Red   Orange   Total
    June        51     25      158     234
    July        58     20      120     198
    August      65     22       51     138
    Total      174     67      329     570

  Inner cells: data points / 1st level of aggregation
  Total column (234, 198, 138): aggregated w.r.t. X-dim
  Total row (174, 67, 329): aggregated w.r.t. Y-dim
  Grand total (570): aggregated w.r.t. X and Y
  [Figure: 3-D cube with dimensions Product (TV, PC, VCR, sum), Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum) and Country (Ireland, France, Germany, sum)]

Data Cubes
  Base cuboid = n-dimensional cube with n number of dimensions
  The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid
  The lattice of cuboids forms a data cube

Lattice of Cuboids
  0-D (apex): all
  1-D: product; date; country
  2-D: product, date; product, country; date, country
  3-D (base): product, date, country
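As a rough illustration of the lattice of cuboids and of the cross-tab totals, the sketch below enumerates every subset of the three dimensions and recomputes the marginal totals of the clothes-sales table. The function name `cuboids` and the dictionary layout are choices made for this example only.

```python
from itertools import combinations

DIMENSIONS = ("product", "date", "country")

def cuboids(dims):
    """Return every subset of `dims`, from the apex (0-D) to the base cuboid."""
    return [combo for k in range(len(dims) + 1) for combo in combinations(dims, k)]

lattice = cuboids(DIMENSIONS)
for c in lattice:
    print(f"{len(c)}-D: {', '.join(c) or 'all'}")

# Cross-tab from the slide: SALES[color][month], cost in $.
MONTHS = ("June", "July", "August")
SALES = {
    "Blue": {"June": 51, "July": 58, "August": 65},
    "Red": {"June": 25, "July": 20, "August": 22},
    "Orange": {"June": 158, "July": 120, "August": 51},
}

# Aggregate away one dimension at a time, then both.
per_color = {color: sum(by_month.values()) for color, by_month in SALES.items()}
per_month = {m: sum(SALES[color][m] for color in SALES) for m in MONTHS}
grand_total = sum(per_color.values())
print(per_color, per_month, grand_total)
```

With n dimensions (and no hierarchies) the lattice has 2^n cuboids, which is why three dimensions give eight entries.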

Operations with Data Cubes
  Scenario: before starting the analysis task
    What data?
      select a few relevant dimensions
      define hierarchy
      aggregation functions of interest
    Pre-materialize
      load data
      compute counts, max, min, avg, ... beforehand

Operations with Data Cubes
  What operations can you think of that an analyst might find useful? (e.g., store)
    only look at stores in the Netherlands
    look at cities instead of individual stores
    look at the cross-table for product-date
    restrict analysis to 2006, product O1
    go back to a finer granularity at the store level

Roll-Up
  Move in one dimension from a lower granularity to a higher one
    store -> city
    cities -> country
    product -> product type

Drill-down
  Move in one dimension from a higher granularity to a lower one
    city -> store
    country -> cities
    product type -> product

Pivoting
  Change the dimensions that are displayed; select a cross-tab
    look at the cross-table for product-date
    display cross-table for date-customer

Drill-through
  Go back to the original, individual data records
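A minimal sketch of roll-up and pivoting on a two-dimensional cube, assuming a made-up store -> city -> country hierarchy and a summed sales measure. All store names, products and values are hypothetical; drill-down is simply a return to the finer-grained cells, which therefore have to be kept.

```python
from collections import defaultdict

# Hypothetical hierarchy: store -> city -> country.
STORE_CITY = {"s1": "Eindhoven", "s2": "Eindhoven", "s3": "Antwerp"}
CITY_COUNTRY = {"Eindhoven": "Netherlands", "Antwerp": "Belgium"}

# Base cells keyed by (store, product), holding a summed sales measure.
BASE = {("s1", "p1"): 10, ("s2", "p1"): 5, ("s3", "p1"): 7, ("s1", "p2"): 3}

def roll_up(cells, parent_of):
    """Replace the first dimension by its parent level and re-aggregate with sum."""
    out = defaultdict(int)
    for (member, product), value in cells.items():
        out[(parent_of[member], product)] += value
    return dict(out)

def pivot(cells):
    """Swap the two displayed dimensions of a cross-tab."""
    return {(b, a): v for (a, b), v in cells.items()}

by_city = roll_up(BASE, STORE_CITY)          # store -> city
by_country = roll_up(by_city, CITY_COUNTRY)  # city -> country
print(by_city)
print(by_country)
print(pivot(by_city))
```

Rolling up twice works here because sum is decomposable: the country totals can be computed from the city totals without revisiting the store-level cells.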

Slice & dice
  Select a part of the cube by restricting one or more dimensions
    restrict analysis to city = Eindhoven

Summary of Concepts
  Cube: multidimensional view on data
    dimensional attributes
    measure attribute
  Operations:
    roll-up/drill-down
    pivoting
    slice and dice

Implementation
  To make query answering more efficient: consolidate (materialize) aggregations
  Obvious implementation: multidimensional array
  Fast lookup of cell(prod. p, date d, prom. pr): look up the index of p, the index of d and the index of pr, then
    index = (p × D × PR) + (d × PR) + pr
  where D and PR are the number of dates and promotions

Implementation
  Multidimensional array: obvious problem is sparse data
  Can easily be solved, though. Examples:
    binary search tree, keyed on index
    hash table

Implementation
  However: very quickly people were confronted with the Data Explosion Problem
  Consolidating the summaries blows up the data enormously!
  Reasons are often misunderstood and confusing

Why?
  Suppose: n dimensions, every dimension has d values
    d^n possible tuples
    Number of cells in the cube: (d+1)^n
  So, this is not the problem
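The array lookup formula from the Implementation slide can be sketched as follows. The dimension sizes `P`, `D`, `PR` and the stored value are arbitrary assumptions; the dictionary at the end stands in for the hash-table remedy for sparse data mentioned on the slides.

```python
def cell_index(p, d, pr, D, PR):
    """Linearize (product, date, promotion) indices into one array offset."""
    return (p * D * PR) + (d * PR) + pr

# Assumed dimension sizes: 3 products, 4 dates, 2 promotions.
P, D, PR = 3, 4, 2

# Dense storage: one slot per possible cell, most of them possibly unused.
dense = [0] * (P * D * PR)
dense[cell_index(2, 1, 0, D, PR)] = 42

# Sparse storage: a hash table keyed on the same index holds only non-empty cells.
sparse = {cell_index(2, 1, 0, D, PR): 42}

print(cell_index(2, 1, 0, D, PR))
print(len(dense), len(sparse))
```

The mapping is a bijection between index triples and array offsets, so both storage variants can share it.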

Why?
  Suppose n dimensions, every dimension has d values
  Every dimension has a hierarchy
    most extreme case: binary tree
    2d possibilities per dimension
    2^n × d^n cells
  Only a partial explanation (the factor 2^n comes from an extremely pathological case)

Why?
  The problem is that most data is not dense, but sparse
  Hence, not all d^n combinations are possible
  Example: 10 dimensions with 10 values
    10 000 000 000 possibilities
    Suppose «only» 1 000 000 are present
    Every tuple increases the count of 2^10 cells!
  With hierarchies: effect even worse!
    If every hierarchy has 5 items: 5^10 = 9 765 625 cells!

View Selection Problem
  It suffices to precompute some aggregates, and compute others on demand
    Compute the aggregate on (item-name, color) from an aggregate on (item-name, color, size)
    Works for all but a few non-decomposable aggregates such as median
  Several optimizations for computing multiple aggregates
    Compute the aggregate on (item-name, color) from an aggregate on (item-name, color, size)
    Compute the aggregates on (item-name, color, size), (item-name, color) and (item-name) in a single DB sort

View Selection Problem
  [Figure: lattice of cuboids: all; product; date; country; product, date; product, country; date, country; product, date, country]
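The sparsity argument can be checked on a small scale. The sketch below is an illustration under assumed parameters (4 dimensions, 10 values each, 50 randomly drawn base tuples, no hierarchies); it materializes every cube cell that a base tuple contributes to and compares the result with the dense bound (d+1)^n. The helper name `cube_cells` is made up for this example.

```python
from itertools import product
import random

ALL = "all"

def cube_cells(tuples):
    """Return the set of non-empty cube cells produced by the base tuples."""
    cells = set()
    for t in tuples:
        # Each dimension is either kept or generalized to ALL: 2^n cells per tuple.
        for mask in product((False, True), repeat=len(t)):
            cells.add(tuple(ALL if generalize else v for v, generalize in zip(t, mask)))
    return cells

n, d = 4, 10
random.seed(0)
base = {tuple(random.randrange(d) for _ in range(n)) for _ in range(50)}
cells = cube_cells(base)

print(len(base), "base tuples ->", len(cells), "non-empty cube cells")
print("dense bound (d+1)^n =", (d + 1) ** n)
print("cells touched by one tuple:", 2 ** n)
```

Because the base data is sparse, few tuples share aggregated cells, so the materialized cube is many times larger than the base data even though it stays far below the dense bound.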

View Selection Problem
  [Figure: lattice of cuboids: all; product; date; country; product, date; product, country; date, country; product, date, country]
  Which views to select: hard research problem!

Implementation
  Nowadays systems can be divided into three categories:
    ROLAP (Relational OLAP): OLAP supported on top of a relational database
    MOLAP (Multi-Dimensional OLAP): use of special multi-dimensional data structures
    HOLAP (Hybrid): combination of the previous two

ROLAP
  Cubes can easily be represented in relational tables: special value "all"

    Month   Prod.   Cust.       Price
    Jan     p1      c1             10
    Jan     p2      c1              8
    Jan     p1      c2             10
    Feb     p1      c1              9
    all     p1      c1            102
    Jan     all     c1             18
    Jan     p1      all         1 230
    all     all     c1          4 235
    all     all     all     1 253 458

ROLAP
  Typical database scheme: star schema
    fact table is central
    links to dimensional tables
  Extensions:
    snowflake schema: dimensions have hierarchy/extra information attached
    star constellation: multiple star schemas sharing dimensions

Example of a Star Schema
  Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, Key, Name, Quantity, Total Price
  Order: Order No, Order
  Customer: Customer No, Customer Name, Customer Address
  Salesperson: SalespersonID, SalespersonName, Quota
  Product: ProductNO, ProdName, ProdDescr, Category, CategoryDescription, UnitPrice
  Remaining dimension attributes: Key; Name, State, Country

Example of a Snowflake Schema
  Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, Key, Name, Quantity, Total Price
  Order: Order No, Order
  Customer: Customer No, Customer Name, Customer Address
  Salesperson: SalespersonID, SalespersonName, Quota
  Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
  Category: CategoryName, CategoryDescr
  Month: Month, Year
  Year: Year
  State: StateName, Country
  Remaining dimension attributes: Key, Month; Name, State, Country
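The relational cube table with the special value "all" can be approximated with a union of GROUP BY queries. The sketch below uses Python's built-in sqlite3 module and only the four base rows shown on the ROLAP slide, so the aggregated prices differ from the larger totals on the slide, which presumably come from rows not shown. The table name `sales` and the choice of which cuboids to materialize are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, prod TEXT, cust TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Jan", "p1", "c1", 10), ("Jan", "p2", "c1", 8),
     ("Jan", "p1", "c2", 10), ("Feb", "p1", "c1", 9)],
)

# Materialize a selection of cuboids, marking aggregated-away dimensions as 'all'.
# Written as plain GROUP BYs so it does not rely on CUBE/ROLLUP support.
CUBE_SQL = """
SELECT month, prod, cust, SUM(price) FROM sales GROUP BY month, prod, cust
UNION ALL
SELECT month, 'all', cust, SUM(price) FROM sales GROUP BY month, cust
UNION ALL
SELECT 'all', 'all', cust, SUM(price) FROM sales GROUP BY cust
UNION ALL
SELECT 'all', 'all', 'all', SUM(price) FROM sales
"""

cube = {(m, p, c): total for m, p, c, total in conn.execute(CUBE_SQL)}
for key in sorted(cube):
    print(key, cube[key])
```

Only four of the eight possible cuboids are materialized here, which mirrors the view selection idea: the remaining ones could be computed on demand from the stored aggregates.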

Example of Fact Constellation
  Multiple fact tables share dimension tables
  Sales Fact Table: Time_key, Item_key, Branch_key, Location_key; measures: Unit_sold, Euros_sold, Avg_sales
  Shipping Fact Table: Time_key, Item_key, shipper_key, from_location, to_location; measures: Euros_sold, unit_shipped
  Time: time_key, day, day_of_the_week, month, quarter, year
  Item: item_key, item_name, brand, type, supplier_key
  Branch: branch_key, branch_name, branch_type
  Location: location_key, street, city, province/state, country
  Shipper: shipper_key, shipper_name, location_key, shipper_type

SQL:1999 support for OLAP
  See other set of slides