OLAP Systems and Multidimensional Queries I


Krzysztof Dembczyński
Intelligent Decision Support Systems Laboratory (IDSS)
Poznań University of Technology, Poland

Software Development Technologies
Master studies, second semester
Academic year 2014/15 (winter course)

Review of the previous lectures

- Mining of massive datasets.
- Evolution of database systems: operational vs. analytical systems.
- Dimensional modeling.
- Extraction, transformation and load of data.

Outline

1 Motivation
2 OLAP Servers
3 ROLAP
4 SQL
5 Summary


OLAP systems

The next step is to provide solutions for querying and reporting on multidimensional analytical data. The goal is an efficient physical representation of these data and efficient processing of queries over them.

Multidimensional reports

OLAP servers provide an effective solution for accessing and processing large volumes of high-dimensional data. OLAP systems also provide tools for multidimensional reporting.


Multidimensional cube

The natural data model for multidimensional reporting is itself multidimensional: the data cube.

Operators in the multidimensional data model

- Roll up: summarize data along a dimension hierarchy.
- Drill down: go from a higher-level summary to a lower-level summary or to detailed data.
- Slice and dice: correspond to selection and projection.
- Pivot: reorient the cube.
- Ranking, time functions, etc.

Lattice of cuboids

Different degrees of summarization are presented as a lattice of cuboids. Example dimensions: time, product, location, supplier. Using this structure, one can easily show the roll-up and drill-down operations.
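The lattice for the four example dimensions can be enumerated directly. The following is a minimal sketch, not part of the lecture: the function names cuboid_lattice and roll_up_targets are illustrative, and dimension hierarchies are ignored, so each cuboid is simply a subset of the dimensions.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Return the cuboids grouped by the number of dimensions they keep."""
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions), -1, -1)
    }

def roll_up_targets(cuboid):
    """Cuboids reachable by one roll-up step: drop exactly one dimension."""
    return [tuple(d for d in cuboid if d != dropped) for dropped in cuboid]

dims = ("time", "product", "location", "supplier")
lattice = cuboid_lattice(dims)
for k, cuboids in lattice.items():
    # Level 4 is the base cuboid, level 0 is the apex cuboid ().
    print(k, cuboids)
```

Moving down one level in this dictionary corresponds to a roll-up, moving up one level to a drill-down.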

Total number of cuboids

For an n-dimensional data cube, the total number of cuboids that can be generated is

$T = \prod_{i=1}^{n} (L_i + 1)$,

where $L_i$ is the number of levels associated with dimension $i$ (excluding the virtual top level "all", since generalizing to "all" is equivalent to removing the dimension). For example, if the cube has 10 dimensions and each dimension has 4 levels, the total number of cuboids that can be generated is $T = 5^{10} \approx 9.8 \times 10^6$.
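The count is easy to check numerically. This small sketch assumes the number of levels per dimension is given as a list; the helper name total_cuboids is illustrative.

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """Product of (L_i + 1) over all dimensions; the +1 is the level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)

# 10 dimensions with 4 levels each
print(total_cuboids([4] * 10))  # 9765625, i.e. about 9.8 * 10**6
```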


Total number of cuboids

Example: Consider a simple database with two dimensions:

- Columns in the Date dimension: day, month, year.
- Columns in the Localization dimension: street, city, country.

Without any information about hierarchies, the number of all possible group-bys is $2^6 = 64$: any subset of the Date columns can be combined with any subset of the Localization columns. The non-empty subsets within each dimension are:

Date               Localization
day                street
month              city
year               country
day, month         street, city
day, year          street, country
month, year        city, country
day, month, year   street, city, country


Total number of cuboids

Example: Consider the same relations but with defined hierarchies:

- day → month → year
- street → city → country

Many combinations of columns can be excluded, e.g. group by day, year, street, country. The number of group-bys is then $4^2 = 16$: each dimension is either dropped entirely or grouped at one of its three levels:

Date               Localization
year               country
month, year        city, country
day, month, year   street, city, country
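Both counts from the Date/Localization example can be reproduced by enumeration. In this sketch (helper names are illustrative) a hierarchy is modelled as a list ordered from the finest to the coarsest level, and a valid grouping keeps a level together with all coarser levels.

```python
from itertools import combinations, product

date = ["day", "month", "year"]            # finest to coarsest
location = ["street", "city", "country"]   # finest to coarsest

def all_subsets(columns):
    """Every subset of the columns, including the empty one."""
    return [c for k in range(len(columns) + 1) for c in combinations(columns, k)]

def hierarchy_groupings(hierarchy):
    """A level plus all coarser levels (suffixes), and () for the level 'all'."""
    return [tuple(hierarchy[i:]) for i in range(len(hierarchy) + 1)]

without_hierarchies = all_subsets(date + location)
with_hierarchies = [
    d + l
    for d, l in product(hierarchy_groupings(date), hierarchy_groupings(location))
]
print(len(without_hierarchies), len(with_hierarchies))  # 64 16
```

The excluded combination mentioned above, (day, year, street, country), appears in the first list but not in the second.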

Three types of aggregate functions

- distributive: count(), sum(), max(), min(),
- algebraic: avg(), std_dev(),
- holistic: median(), mode(), rank().
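The distinction matters when an aggregate has to be combined from partial results. The sketch below uses made-up partitions of sale values: a distributive function is combined from the partial values themselves, an algebraic one from a fixed-size summary (here sum and count), while a holistic one such as the median generally cannot be combined this way.

```python
from statistics import median

# Hypothetical partitions of one measure column
partitions = [[105, 135, 50], [150, 175], [100, 70, 75]]
everything = [x for part in partitions for x in part]

# Distributive: the sum of partial sums equals the sum over all data.
total_from_parts = sum(sum(part) for part in partitions)

# Algebraic: avg is derived from two distributive values, sum and count.
summaries = [(sum(part), len(part)) for part in partitions]
avg_from_parts = sum(s for s, _ in summaries) / sum(n for _, n in summaries)

# Holistic: the median of partial medians is not the true median in general.
median_of_medians = median(median(part) for part in partitions)
true_median = median(everything)

print(total_from_parts, avg_from_parts)  # 860 107.5
print(median_of_medians, true_median)    # 105 102.5
```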

OLAP servers

- Relational OLAP (ROLAP),
- Multidimensional OLAP (MOLAP),
- Hybrid OLAP (HOLAP).


ROLAP

- ROLAP servers use a relational or post-relational database management system to store and manage warehouse data.
- ROLAP systems use SQL and its OLAP extensions.
- Optimization techniques:
  - Denormalization,
  - Materialized views,
  - Partitioning,
  - Joins,
  - Indexes,
  - Query processing.

ROLAP

Advantages of ROLAP servers:

- Scalable with respect to the number of dimensions,
- Scalable with respect to the size of data,
- Sparsity is not a problem (fact tables contain only facts),
- Mature and well-developed technology.

Disadvantages of ROLAP servers:

- Worse performance than MOLAP,
- Additional data structures and optimization techniques are needed to improve the performance.


Grouping

Group-by is usually performed in the following way:

- Partition tuples on the grouping attributes: tuples in the same group are placed together, and tuples in different groups are separated.
- Scan the tuples in each partition and compute the aggregate expressions.

Two techniques for partitioning:

- Sorting:
  - Sort by the grouping attributes.
  - All tuples with the same grouping attributes will appear together in the sorted list.
- Hashing:
  - Hash by the grouping attributes.
  - All tuples with the same grouping attributes will hash to the same bucket.
  - Sort or re-hash within each bucket to resolve collisions.

In OLAP queries, intermediate results are used to compute more general group-bys.
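The two partitioning techniques can be sketched in a few lines. This is an illustration only, not how a DBMS implements them: the rows are the sales rows from the lecture's sorting example, the function names are made up, and Python's dict stands in for the hash table (it resolves collisions internally).

```python
from collections import defaultdict
from itertools import groupby

rows = [
    ("March", "Poznań", 105), ("March", "Warszawa", 135), ("March", "Poznań", 50),
    ("April", "Poznań", 150), ("April", "Kraków", 175),
    ("May", "Warszawa", 100), ("May", "Poznań", 70), ("May", "Warszawa", 75),
]

def grouping_key(row):
    """Grouping attributes (Month, City)."""
    return row[0], row[1]

def group_by_sorting(rows):
    """Sort on the grouping attributes, then scan each run of equal keys."""
    result = {}
    # Note: the sort is lexicographic on month names, not calendar order.
    for key, run in groupby(sorted(rows, key=grouping_key), key=grouping_key):
        result[key] = sum(sale for _, _, sale in run)
    return result

def group_by_hashing(rows):
    """Hash each tuple on the grouping attributes and aggregate per bucket."""
    totals = defaultdict(int)
    for row in rows:
        totals[grouping_key(row)] += row[2]
    return dict(totals)

print(group_by_sorting(rows) == group_by_hashing(rows))  # True
```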


Grouping

Example: Grouping by sorting (Month, City).

Input:

Month  City      Sale
March  Poznań    105
March  Warszawa  135
March  Poznań    50
April  Poznań    150
April  Kraków    175
May    Warszawa  100
May    Poznań    70
May    Warszawa  75

Sorted by (Month, City):

Month  City      Sale
March  Poznań    105
March  Poznań    50
March  Warszawa  135
April  Kraków    175
April  Poznań    150
May    Poznań    70
May    Warszawa  100
May    Warszawa  75

Aggregated:

Month  City      Sale
March  Poznań    155
March  Warszawa  135
April  Kraków    175
April  Poznań    150
May    Poznań    70
May    Warszawa  175


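Grouping

Example: Grouping by sorting (Month; City; Month, City). The coarser group-bys are computed from the (Month, City) result rather than from the raw data.

Result of grouping by (Month, City):

Month  City      Sale
March  Poznań    155
March  Warszawa  135
April  Kraków    175
April  Poznań    150
May    Poznań    70
May    Warszawa  175

Projected on City and sorted:

City      Sale
Kraków    175
Poznań    155
Poznań    150
Poznań    70
Warszawa  135
Warszawa  175

Aggregated by City:

City      Sale
Kraków    175
Poznań    375
Warszawa  310

Projected on Month and sorted:

Month  Sale
March  155
March  135
April  150
April  175
May    175
May    70

Aggregated by Month:

Month  Sale
March  290
April  325
May    245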
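Reusing an intermediate result works here because sum() is distributive. A minimal sketch, assuming the (Month, City) totals are held in a dictionary and using an illustrative helper roll_up:

```python
from collections import defaultdict

# Intermediate result: totals already grouped by (Month, City)
month_city = {
    ("March", "Poznań"): 155, ("March", "Warszawa"): 135,
    ("April", "Kraków"): 175, ("April", "Poznań"): 150,
    ("May", "Poznań"): 70, ("May", "Warszawa"): 175,
}

def roll_up(cuboid, keep):
    """Aggregate a finer cuboid to a coarser one, keeping key position `keep`."""
    totals = defaultdict(int)
    for key, sale in cuboid.items():
        totals[key[keep]] += sale
    return dict(totals)

by_month = roll_up(month_city, keep=0)   # group by Month
by_city = roll_up(month_city, keep=1)    # group by City
print(by_month)
print(by_city)
```

Only six pre-aggregated rows are scanned for each coarser group-by instead of the eight input rows; on real fact tables the saving is far larger.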


Querying the star schema

SQL queries

SQL group by

    SELECT Name, AVG(Grade)
    FROM Students_grades G, Student S
    WHERE G.Student = S.ID
    GROUP BY Name;

Name     AVG(Grade)
Inmon    4.8
Kimball  4.7
Gates    4.0
Todman   4.5

SQL group by

    SELECT Academic_year, Name, AVG(Grade)
    FROM Students_grades G, Academic_year A, Professor P
    WHERE G.Professor = P.ID AND G.Academic_year = A.ID
    GROUP BY Academic_year, Name;

Academic_year  Name         AVG(Grade)
2001/2         Stefanowski  4.2
2002/3         Stefanowski  4.0
2003/4         Stefanowski  3.9
2001/2         Słowiński    4.1
2002/3         Słowiński    3.8
2003/4         Słowiński    3.6
2003/4         Dembczyński  4.8

OLAP extensions in SQL:

- GROUP BY ROLLUP, GROUP BY CUBE, GROUP BY GROUPING SETS
- GROUPING and DECODE/CASE
- OVER
- Ranking functions

GROUP BY CUBE

    SELECT Time, Product, Location, Supplier, SUM(Gain)
    FROM Sales
    GROUP BY CUBE (Time, Product, Location, Supplier);

GROUP BY CUBE

Equivalent formulation with UNION ALL, one group-by per subset of the four columns (NULL marks a column that has been aggregated out):

    SELECT Time, Product, Location, Supplier, SUM(Gain)
    FROM Sales GROUP BY Time, Product, Location, Supplier
    UNION ALL
    SELECT Time, Product, Location, NULL, SUM(Gain)
    FROM Sales GROUP BY Time, Product, Location
    UNION ALL
    SELECT Time, Product, NULL, Supplier, SUM(Gain)
    FROM Sales GROUP BY Time, Product, Supplier
    UNION ALL
    ...
    UNION ALL
    SELECT NULL, NULL, NULL, NULL, SUM(Gain)
    FROM Sales;

GROUP BY CUBE

    SELECT Academic_year, Name, AVG(Grade)
    FROM Students_grades
    GROUP BY CUBE (Academic_year, Name);

Academic_year  Name         AVG(Grade)
2001/2         Stefanowski  4.2
2001/2         Słowiński    4.1
2002/3         Stefanowski  4.0
2002/3         Słowiński    3.8
2003/4         Stefanowski  3.9
2003/4         Słowiński    3.6
2003/4         Dembczyński  4.8
2001/2         NULL         4.15
2002/3         NULL         3.85
2003/4         NULL         3.8
NULL           Stefanowski  3.9
NULL           Słowiński    3.6
NULL           Dembczyński  4.8
NULL           NULL         3.95

GROUP BY ROLLUP

    SELECT Time, Product, Location, Supplier, SUM(Gain)
    FROM Sales
    GROUP BY ROLLUP (Time, Product, Location, Supplier);

GROUP BY ROLLUP

Equivalent formulation with UNION ALL, one group-by per prefix of the column list (NULL marks a column that has been aggregated out):

    SELECT Time, Product, Location, Supplier, SUM(Gain)
    FROM Sales GROUP BY Time, Product, Location, Supplier
    UNION ALL
    SELECT Time, Product, Location, NULL, SUM(Gain)
    FROM Sales GROUP BY Time, Product, Location
    UNION ALL
    SELECT Time, Product, NULL, NULL, SUM(Gain)
    FROM Sales GROUP BY Time, Product
    UNION ALL
    SELECT Time, NULL, NULL, NULL, SUM(Gain)
    FROM Sales GROUP BY Time
    UNION ALL
    SELECT NULL, NULL, NULL, NULL, SUM(Gain)
    FROM Sales;

GROUP BY ROLLUP

    SELECT Academic_year, Name, AVG(Grade)
    FROM Students_grades G
    GROUP BY ROLLUP (Academic_year, Name);

Academic_year  Name         AVG(Grade)
2001/2         Stefanowski  4.2
2001/2         Słowiński    4.1
2002/3         Stefanowski  4.0
2002/3         Słowiński    3.8
2003/4         Stefanowski  3.9
2003/4         Słowiński    3.6
2003/4         Dembczyński  4.8
2001/2         NULL         4.15
2002/3         NULL         3.85
2003/4         NULL         3.8
NULL           NULL         3.95

GROUP BY GROUPING SETS

    SELECT Time, Product, Location, Supplier, SUM(Gain)
    FROM Sales
    GROUP BY GROUPING SETS ((Time), (Product), (Location), (Supplier));

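GROUP BY GROUPING SETS

Equivalent formulation with UNION ALL, one group-by per listed grouping set (NULL marks a column that has been aggregated out):

    SELECT Time, NULL, NULL, NULL, SUM(Gain)
    FROM Sales GROUP BY Time
    UNION ALL
    SELECT NULL, Product, NULL, NULL, SUM(Gain)
    FROM Sales GROUP BY Product
    UNION ALL
    SELECT NULL, NULL, Location, NULL, SUM(Gain)
    FROM Sales GROUP BY Location
    UNION ALL
    SELECT NULL, NULL, NULL, Supplier, SUM(Gain)
    FROM Sales GROUP BY Supplier;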

GROUP BY GROUPING SETS

    SELECT Academic_year, Name, AVG(Grade)
    FROM Students_grades
    GROUP BY GROUPING SETS ((Academic_year), (Name), ());

Academic_year  Name         AVG(Grade)
2001/2         NULL         4.15
2002/3         NULL         3.85
2003/4         NULL         3.8
NULL           Stefanowski  3.9
NULL           Słowiński    3.6
NULL           Dembczyński  4.8
NULL           NULL         3.95
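CUBE and ROLLUP can both be read as shorthand for particular lists of grouping sets: all subsets of the columns for CUBE, all prefixes for ROLLUP. The sketch below emulates this in plain Python. The sales rows and the helper names (cube_sets, rollup_sets, aggregate) are assumptions made for illustration, and None plays the role of the NULL placed in aggregated-out columns.

```python
from collections import defaultdict
from itertools import combinations

def cube_sets(columns):
    """Grouping sets generated by CUBE: every subset of the columns."""
    return [c for k in range(len(columns), -1, -1) for c in combinations(columns, k)]

def rollup_sets(columns):
    """Grouping sets generated by ROLLUP: every prefix of the column list."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]

def aggregate(rows, columns, grouping_sets, measure):
    """Emulate SUM(measure) ... GROUP BY GROUPING SETS (...)."""
    output = []
    for grouping in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            # Columns outside the current grouping set are replaced by None.
            key = tuple(row[c] if c in grouping else None for c in columns)
            totals[key] += row[measure]
        output.extend(key + (total,) for key, total in totals.items())
    return output

# Hypothetical fact rows
sales = [
    {"Time": 2014, "Product": "A", "Gain": 10},
    {"Time": 2014, "Product": "B", "Gain": 20},
    {"Time": 2015, "Product": "A", "Gain": 5},
]
columns = ("Time", "Product")
cube_rows = aggregate(sales, columns, cube_sets(columns), "Gain")
rollup_rows = aggregate(sales, columns, rollup_sets(columns), "Gain")
print(len(cube_rows), len(rollup_rows))  # 8 6
```

The per-product totals such as (None, "A", 15) appear only in the CUBE result, because (Product) is not a prefix of (Time, Product).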

GROUPING(<column expression>)

- Returns 1 if the value of the expression in the row is a NULL representing the set of all values, and 0 otherwise.
- <column expression> is a column, or an expression that contains a column, in the GROUP BY clause.
- GROUPING is used to distinguish the NULL values returned by ROLLUP, CUBE or GROUPING SETS from standard NULL values.
- The NULL returned as the result of a ROLLUP, CUBE or GROUPING SETS operation is a special use of NULL.

GROUPING(<column expression>)

    SELECT Extra_scholarship, AVG(Grade),
           GROUPING(Extra_scholarship) AS Grouping
    FROM Students_grades
    GROUP BY ROLLUP (Extra_scholarship);

Extra_scholarship  AVG(Grade)  Grouping
Yes                4.15        0
No                 3.61        0
NULL               4.03        0
NULL               3.89        1

DECODE(expression, search, result [, search, result]... [, default])

- If the value of expression is equal to a search value, the corresponding result is returned; otherwise default is returned.
- The functionality is similar to the CASE expression.
- The result of GROUPING() can be passed into the DECODE function or a CASE expression.

DECODE(expression, search, result [, search, result]... [, default])

    SELECT DECODE(GROUPING(Extra_scholarship), 1, 'Total Average',
                  Extra_scholarship) AS Extra_scholarship,
           AVG(Grade)
    FROM Students_grades
    GROUP BY ROLLUP (Extra_scholarship);

Extra_scholarship  AVG(Grade)
Yes                4.15
No                 3.61
NULL               4.03
Total Average      3.89
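The role of the grouping flag can be mimicked outside SQL. In this sketch the grade rows are invented, None models a genuine NULL in the Extra_scholarship column, and the helper label stands in for DECODE(GROUPING(...), 1, 'Total Average', ...); none of this is the DBMS implementation.

```python
from collections import defaultdict

# Hypothetical (Extra_scholarship, Grade) rows; None is a genuine NULL.
grades = [("Yes", 4.5), ("Yes", 4.0), ("No", 3.5), (None, 4.0)]

def rollup_with_grouping(rows):
    """Rows of (value, average, grouping_flag), like GROUP BY ROLLUP(column)."""
    groups = defaultdict(list)
    for value, grade in rows:
        groups[value].append(grade)
    result = [(value, sum(g) / len(g), 0) for value, g in groups.items()]
    all_grades = [grade for _, grade in rows]
    # Super-aggregate row: its None means "all values", so the flag is 1.
    result.append((None, sum(all_grades) / len(all_grades), 1))
    return result

def label(value, grouping_flag):
    """Replace the roll-up NULL by a readable label, keep real NULLs."""
    return "Total Average" if grouping_flag == 1 else value

report = [(label(v, flag), avg) for v, avg, flag in rollup_with_grouping(grades)]
for line in report:
    print(line)
```

Without the flag, the group of students whose scholarship status is unknown and the grand-total row would both show None in the first column.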


OVER()

- Determines the partitioning and ordering of a rowset before the associated window function is applied.
- The OVER clause defines a window, i.e. a user-specified set of rows within a query result set.
- A window function then computes a value for each row in the window.
- The OVER clause can be used with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or top-N-per-group results.

Syntax:

    OVER ( [ <PARTITION BY clause> ] [ <ORDER BY clause> ] [ <ROWS or RANGE clause> ] )


OVER()

- PARTITION BY: Divides the query result set into partitions. The window function is applied to each partition separately, and computation restarts for each partition.
- ORDER BY: Defines the logical order of the rows within each partition of the result set, i.e., it specifies the logical order in which the window function calculation is performed.
- ROWS | RANGE: Further limits the rows within the partition by specifying start and end points within the partition. This is done by specifying a range of rows with respect to the current row, either by logical association or by physical association.
  - The ROWS clause limits the rows within a partition by specifying a fixed number of rows preceding or following the current row.
  - The RANGE clause logically limits the rows within a partition by specifying a range of values with respect to the value in the current row.
  - Preceding and following rows are defined based on the ordering in the ORDER BY clause.
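How the three parts of OVER interact can be traced with a small emulation of AVG over a ROWS frame. This is a sketch under stated assumptions: the rows are invented, window_avg is an illustrative helper, only frames that end at the current row are covered, and RANGE frames are not modelled.

```python
from collections import defaultdict

def window_avg(rows, partition_by, order_by, value, preceding=None):
    """AVG(value) OVER (PARTITION BY .. ORDER BY .. ROWS <n> PRECEDING).

    preceding=None models ROWS UNBOUNDED PRECEDING.
    """
    partitions = defaultdict(list)
    for row in rows:                       # PARTITION BY
        partitions[row[partition_by]].append(row)
    output = []
    for part_key, part in partitions.items():
        ordered = sorted(part, key=lambda r: r[order_by])   # ORDER BY
        for i, row in enumerate(ordered):
            # ROWS frame: a physical number of rows before the current one
            start = 0 if preceding is None else max(0, i - preceding)
            frame = [r[value] for r in ordered[start:i + 1]]
            output.append((part_key, row[order_by], sum(frame) / len(frame)))
    return output

# Hypothetical grades
rows = [
    {"student": "A", "year": 2001, "grade": 3.0},
    {"student": "A", "year": 2002, "grade": 4.0},
    {"student": "A", "year": 2003, "grade": 5.0},
    {"student": "B", "year": 2001, "grade": 3.0},
    {"student": "B", "year": 2002, "grade": 4.0},
]
running = window_avg(rows, "student", "year", "grade")               # cumulative
moving = window_avg(rows, "student", "year", "grade", preceding=1)   # 2-row window
print(running[2], moving[2])  # ('A', 2003, 4.0) ('A', 2003, 4.5)
```

The computation restarts for student B, which is the effect of PARTITION BY; the difference between the two results for 2003 is the effect of the frame clause.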


Ranking functions:

- RANK() OVER: Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.
- DENSE_RANK() OVER: Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.
- NTILE(integer expression) OVER: Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
- ROW_NUMBER() OVER: Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
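The four functions differ only in how they treat ties and group sizes, which a short emulation makes visible. The sketch below works on a single partition ordered in descending order; the input values and the function name are illustrative, and for NTILE it assumes that larger groups come first when the rows do not divide evenly.

```python
def ranking_functions(values, tiles=2):
    """ROW_NUMBER, RANK, DENSE_RANK and NTILE(tiles) for one partition."""
    ordered = sorted(values, reverse=True)        # ORDER BY value DESC
    base, extra = divmod(len(ordered), tiles)     # first `extra` tiles get one more row
    rows = []
    rank = dense = 0
    for i, v in enumerate(ordered):
        if i == 0 or v != ordered[i - 1]:
            rank = i + 1      # one plus the number of rows ranked before
            dense += 1        # one plus the number of distinct values before
        if i < extra * (base + 1):
            tile = i // (base + 1) + 1
        else:
            tile = extra + (i - extra * (base + 1)) // base + 1
        rows.append({"value": v, "row_number": i + 1, "rank": rank,
                     "dense_rank": dense, "ntile": tile})
    return rows

result = ranking_functions([4.2, 3.9, 4.8, 4.2, 3.6], tiles=2)
for row in result:
    print(row)
```

The two rows with value 4.2 share rank 2; RANK then continues with 4, while DENSE_RANK continues with 3.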

Examples:

Ranking of the students:

    SELECT Student, AVG(Grade),
           RANK() OVER (ORDER BY AVG(Grade) DESC)
    FROM Students_grades
    GROUP BY Student;

To sort according to rank, we need to order the resulting relation:

    SELECT Student, AVG(Grade),
           RANK() OVER (ORDER BY AVG(Grade) DESC) AS rank_of_grades
    FROM Students_grades
    GROUP BY Student
    ORDER BY rank_of_grades;

Examples:

Ranking of students partitioned by instructors:

    SELECT Instructor_Name, Student, AVG(Grade),
           RANK() OVER (PARTITION BY Instructor_Name
                        ORDER BY AVG(Grade) DESC) AS rank_1
    FROM Students_grades
    GROUP BY Student, Instructor_Name
    ORDER BY Instructor_Name, rank_1;

Moving (cumulative) average of a student's grades; ROWS UNBOUNDED PRECEDING makes the window run from the first row of the partition to the current row:

    SELECT Student, Academic_year,
           AVG(Grade) OVER (PARTITION BY Student
                            ORDER BY Academic_year DESC
                            ROWS UNBOUNDED PRECEDING)
    FROM Students_grades
    ORDER BY Student, Academic_year;


Summary

- OLAP systems: Relational OLAP.
- SQL for analytical queries.
