DB2 for i SQL OLAP Functions

Similar documents
DB2 for i. Analysis and Tuning. Mike Cain IBM DB2 for i Center of Excellence. mcain@us.ibm.com

DB2 V8 Performance Opportunities

Querying Microsoft SQL Server

Introducing Microsoft SQL Server 2012 Getting Started with SQL Server Management Studio

Course ID#: W 35 Hrs. Course Content

Querying Microsoft SQL Server 20461C; 5 days

Querying Microsoft SQL Server Course M Day(s) 30:00 Hours

Querying Microsoft SQL Server 2012

Course 10774A: Querying Microsoft SQL Server 2012

Course 10774A: Querying Microsoft SQL Server 2012 Length: 5 Days Published: May 25, 2012 Language(s): English Audience(s): IT Professionals

MOC QUERYING MICROSOFT SQL SERVER

Course 20461C: Querying Microsoft SQL Server Duration: 35 hours

Oracle Database 10g: Introduction to SQL

To increase performaces SQL has been extended in order to have some new operations avaiable. They are:

Netezza SQL Class Outline

Querying Microsoft SQL Server 2012

SQL Optimization & Access Paths: What s Old & New Part 1

OLAP Systems and Multidimensional Expressions I

Data warehousing in Oracle. SQL extensions for data warehouse analysis. Available OLAP functions. Physical aggregation example

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Saskatoon Business College Corporate Training Centre

DBMS / Business Intelligence, SQL Server

The Arts & Science of Tuning HANA models for Performance. Abani Pattanayak, SAP HANA CoE Nov 12, 2015

Top 25+ DB2 SQL Tuning Tips for Developers. Presented by Tony Andrews, Themis Inc.

Guide to Performance and Tuning: Query Performance and Sampled Selectivity

Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.

The Benefits of Data Modeling in Business Intelligence

Implementing Data Models and Reports with Microsoft SQL Server

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

AV-005: Administering and Implementing a Data Warehouse with SQL Server 2014

DB2 for i5/os: Tuning for Performance

Implementing Data Models and Reports with Microsoft SQL Server 2012 MOC 10778

MOC 20461C: Querying Microsoft SQL Server. Course Overview

SQL Server Administrator Introduction - 3 Days Objectives

David L. Fuston, Vlamis Software Solutions, Inc.,

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

2. Metadata Modeling Best Practices with Cognos Framework Manager

Basics of Dimensional Modeling

Querying Microsoft SQL Server Querying Microsoft SQL Server D. Course 10774A: Course Det ails. Co urse Outline

Useful Business Analytics SQL operators and more Ajaykumar Gupte IBM

Advanced SQL. Jim Mason. Web solutions for iseries engineer, build, deploy, support, train

Chapter 9 Joining Data from Multiple Tables. Oracle 10g: SQL

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

SQL the natural language for analysis ORACLE WHITE PAPER JUNE 2015

How To Model Data For Business Intelligence (Bi)

Unit 3. Retrieving Data from Multiple Tables

An Oracle White Paper June Creating an Oracle BI Presentation Layer from Imported Oracle OLAP Cubes

New Approach of Computing Data Cubes in Data Warehousing

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

CA Performance Handbook. for DB2 for z/os

Optimizing Your Data Warehouse Design for Superior Performance

When to consider OLAP?

Oracle OLAP. Describing Data Validation Plug-in for Analytic Workspace Manager. Product Support

Multi-dimensional index structures Part I: motivation

Inside the PostgreSQL Query Optimizer

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

SQL Server for developers. murach's TRAINING & REFERENCE. Bryan Syverson. Mike Murach & Associates, Inc. Joel Murach

Introduction to Querying & Reporting with SQL Server

Data Warehousing With DB2 for z/os... Again!

TIM 50 - Business Information Systems

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option

Building Advanced Data Models with SAP HANA. Werner Steyn Customer Solution Adoption, SAP Labs, LLC.

AN INTRODUCTION TO THE SQL PROCEDURE Chris Yindra, C. Y. Associates

CS54100: Database Systems

Understanding BEx Query Designer: Part-2 Structures, Selections and Formulas

The Benefits of Data Modeling in Business Intelligence.

LearnFromGuru Polish your knowledge

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

MS SQL Performance (Tuning) Best Practices:

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

Advanced Query for Query Developers

Safe Harbor Statement

Microsoft Excel: Pivot Tables

The Unconstrained Primary Key

How To Use A Microsoft Sql Server 2005

Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Accessing multidimensional Data Types in Oracle 9i Release 2

SQL Query Evaluation. Winter Lecture 23

DATA WAREHOUSING - OLAP

The strategic importance of OLAP and multidimensional analysis A COGNOS WHITE PAPER

Introduction and Overview for Oracle 11G 4 days Weekends

About PivotTable reports

Business Intelligence & Product Analytics

Using Multiple Operations. Implementing Table Operations Using Structured Query Language (SQL)

Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days

Cognos 8 Best Practices

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

IBM TRIRIGA Application Platform Version Reporting: Creating Cross-Tab Reports in BIRT

Essbase Integration Services Release 7.1 New Features

CIS 631 Database Management Systems Sample Final Exam

Module 1: Getting Started with Databases and Transact-SQL in SQL Server 2008

CUBE ORGANIZED MATERIALIZED VIEWS, DO THEY DELIVER?

ICAB4136B Use structured query language to create database structures and manipulate data

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

ETL TESTING TRAINING

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Database Design Patterns. Winter Lecture 24

Relational Division and SQL

Transcription:

IBM Systems and Technology Group DB2 for i SQL OLAP Functions Mike Cain DB2 for i Center of Excellence Rochester, MN USA mcain@us.ibm.com 2007 IBM Corporation 8 Copyright IBM Corporation, 2008. All Rights Reserved. This publication may refer to products that are not currently available in your country. IBM makes 2009 no IBM commitment Corporation to make available any products referred to herein.

OLAP On Line Analytical Processing Think: building and delivering information via SQL 2

Review of the SQL OLAP functions RANK DENSE_RANK V5R4 ROW_NUMBER ROLLUP CUBE GROUPING SETS 6.1 GROUPING 3

RANK and DENSE_RANK Specifies that the ordinal rank of a row within the window is computed Rows that are not distinct with respect to the ordering within their window are assigned the same rank The results of ranking may be defined with or without gaps in the numbers resulting from duplicate values RANK specifies that the rank of a row is defined as 1 plus the number of rows that strictly precede the row Thus, if two or more rows are not distinct with respect to the ordering, then there will be one or more gaps in the sequential rank numbering DENSE_RANK specifies that the rank of a row is defined as 1 plus the number of preceding rows that are distinct with respect to the ordering Thus, there will be no gaps in the sequential rank numbering 4

RANK and DENSE_RANK RANK and DENSE_RANK for highlighting data attribute independent of the result set sorting SELECT empno, lastname, salary+bonus AS TOTAL_SALARY, RANK() OVER (ORDER BY salary+bonus DESC) AS Salary_Rank FROM employee WHERE salary + bonus > 30000 ORDER BY lastname Dense_Rank() Output 5

ROW_NUMBER Specifies that a sequential row number is computed for the row within the window defined by the ordering, starting with 1 for the first row If the ORDER BY clause is not specified in the window, the row numbers are assigned to the rows in arbitrary order, as returned by the subselect I.e. not according to any ORDER BY clause in the selectstatement 6

ROW_NUMBER ROW_NUMBER can be used to arbitrarily assign a number to query results SELECT ROW_NUMBER() OVER (ORDER BY workdept, lastname) AS Nbr, lastname, salary FROM employee ORDER BY workdept, lastname SELECT workdept, lastname, hiredate, ROW_NUMBER() OVER (PARTITION BY workdept ORDER BY hiredate) AS Nbr FROM employee ORDER BY workdept, hiredate 7

ROW_NUMBER /* Below is an example of an SQL statement using the ROW_NUMBER support. */ /* Note this particular technique works best when a table contains a single column primary key. */ WITH cte_row_number as ( SELECT emp_mast_pk, ROW_NUMBER () OVER (ORDER BY lastname, firstnme, midinit) AS cte_rrn FROM emp_mast WHERE workdept =? ) SELECT empno, lastname, firstnme, midinit FROM emp_mast JOIN cte_row_number USING (emp_mast_pk) WHERE cte_rrn BETWEEN? AND? ORDER BY cte_rrn; Common Table Expression (emp_mast_pk, cte_rrn) Final query to return a range of results /* The dynamic statement uses the ROW_NUMBER function to assign a row number to the result set of the derived table cte_row_number. */ /* The OVER clause defines the order of the data. */ /* The outer query filters based on the assigned row number. */ /* The final result set is ordered by the generated row number (cte_rrn) which is in ascending sequence based on the ROW_NUMBER OVER clause. */ 8

Grouping: What s in a name? Industry refers to Grouping Sets and Super Groups (Rollup and Cube) collectively as Grouping Sets Grouping Sets and Super Groups are sometimes referred to as Analytical SQL Grouping Functions The phrase Grouping Functions can be used to collectively describing Grouping Sets, Rollups, and Cubes 9

Grouping Sets and Super Groups ROLLUP and CUBE Many BI applications and OLAP tools involve hierarchical, multidimensional aggregate views of transaction data Users need to view results at multiple levels Users need to view result data from different perspective Current grouping support only allows aggregation data of along a SINGLE dimension EXAMPLE: SELECT country, region, store, product, SUM(sales) FROM trans GROUP BY country, region, store, product Limitations result in extra coding for programmers 1 level The 6.1 grouping and OLAP capabilities allow data to be grouped in multiple ways with a single SQL request ROLLUP CUBE GROUPING SETS Less Coding for Developers! 10

ROLLUP An extension to the GROUP BY clause that produces a result set containing subtotal rows in addition to the regular grouped rows Subtotal rows are superaggregate rows that contain further aggregates whose values are derived by applying the same column functions that were used to obtain the grouped rows ROLLUP on the GROUP BY clause results in DB2 returning aggregates for each level of the hierarchy implicitly represented in the grouping columns 11

ROLLUP ROLLUP(Country, Region) will result in the data being summarized at the following levels (Country, Region) (Country) ( ) << represents Grand Total Example Query: SELECT country, region, SUM(sales) FROM trans GROUP BY ROLLUP (country, region) 12

ROLLUP Output Example SELECT country, region, SUM(sales) FROM trans GROUP BY ROLLUP (country, region) GROUP BY country,null Country Region Sum(Sales) Canada 100,000 Canada NW 100,000 3,250,000 NE 450,000 NW 940,000 SE 550,000 GROUP BY NULL, NULL SW 1,310,000 3,350,000 13

CUBE An extension to the GROUP BY clause that produces a result set that contains all the rows of a ROLLUP aggregation, plus contains crosstabulation rows Crosstabulation rows are additional superaggregate rows that are not part of an aggregation with subtotals CUBE on the GROUP BY clause results in DB2 returning aggregates for all possible distinct combinations represented by the grouping columns 14

CUBE CUBE(Country, Region) will result in the data being summarized at the following levels (Country, Region) (Country) (Region) ( ) << represents Grand Total Returns results at multiple intersection points Example Query: SELECT country, region, SUM(sales) FROM trans GROUP BY CUBE(country, region) 15

CUBE Output Example SELECT country,region, SUM(sales) FROM trans GROUP BY CUBE (country, region) GROUP BY NULL, region Country Region NE Sum(Sales) 450000 NW 1040000 SE 550000 GROUP BY NULL, NULL SW 1310000 3350000 GROUP BY country, NULL Canada 100000 3250000 Canada NW 100000 NE 450000 NW 940000 SE 550000 SW 1310000 16

GROUPING SETS Allows multiple grouping clauses to be specified in a single statement This can be thought of as the union of two or more groups of rows into a single result set GROUPING SET on the GROUP BY clause enables DB2 to return aggregates for multiple sets of grouping columns 17

GROUPING SETS GROUPING SETS((Country, Region), (Country, Store)) will result in the data being summarized at the following levels (Country, Region) (Country, Store) CUBE and ROLLUP can be used in combination with Grouping Sets CAUTION: These types of combinations can result in an exponential growth in the number of grouping sets returned by a query, combine carefully Example Query: SELECT country, region, SUM(sales) FROM trans GROUP BY GROUPING SETS((country, region), (country, store)) 18

GROUPING SETS Output Example SELECT country, region, store, SUM(sales) FROM trans GROUP BY GROUPING SETS ((country, region), (country, store)) Country Region Store Sum(Sales) Canada NW 100,000 GROUP BY COUNTRY, REGION NE NW 450,000 940,000 SE 550,000 SW 1,310,000 Canada Dougs 100,000 Mariahs 350,000 GROUP BY COUNTRY, STORE KMs Jennas Adrians 770,000 400,000 500,000 Joshs 300,000 TZs 200,000 Maddies 210,000 19

GROUPING The GROUPING function can be used to determine if null values are from underlying user data or DB2 aggregate processing Function returns 1 if grouping column contains NULL value produced by grouping set or super group processing Function returns 0 if grouping column contains real GROUP BY value EXAMPLE: SELECT country,region, store, GROUPING(store), SUM(sales) FROM trans WHERE transyear = 2006 GROUP BY GROUPING SETS ((country, region),(country, store)) 20

Multiplicative vs. Additive CUBE and ROLLUP can appear as part of a GROUP BY or GROUPING SETS list When CUBE and ROLLUP expressions are combined in one set, they operate like multipliers, forming additional set entries according to the definition of CUBE, ROLLUP When the CUBE or ROLLUP are separate sets, they operate in an additive fashion The use of parenthesis and the GROUPING SETS vs. GROUP BY clause will dictate how CUBE and ROLLUP expressions interact with each other 21

Multiplicative vs. Additive Examples EXAMPLE 1: Set 1 Set 2 Set 3 GROUP BY GROUPING SETS ((A, B, C, D), (A, B, C), (A, B)) Additive grouping sets. GROUPING SETS without ROLLUP or CUBE are always additive. The parenthesis in red are required EXAMPLE 2: GROUP BY GROUPING SETS (CUBE(A,B), ROLLUP(C,D), E) GROUPING SETS clause used in both. One set of parenthesis around all sets specifies an additive association. Parenthesizing each set individually also specifies an additive association GROUPING SETS clause with single parenthesis is Additive CUBE(A,B) expands to GROUPING SETS ((A,B), (A), (B), ()) ROLLUP(C,D) expands to GROUPING SETS ((C,D), (C), ()) GROUP BY occurs in an additive fashion to form: GROUPING SETS ((A, B), (A), (B), (), (C, D), (C), (), (E)) 22

Multiplicative vs. Additive Examples EXAMPLE 3: GROUP BY A, B, ROLLUP(C,D) Or GROUP BY GROUPING SETS ((A, B, ROLLUP(C,D))) Parenthesis in red specifies that those elements are multiplicative in nature if a ROLLUP or CUBE exists. GROUPING SETS clause with double parenthesis is multiplicative GROUP BY clause with ROLLUP or CUBE is multiplicative ROLLUP(C,D) expands to GROUPING SETS ((C,D), (C), ()) GROUP BY occurs in a multiplicative fashion to form: GROUP BY GROUPING SETS ((a, b, c, d), (a, b, c), (a, b)) 23

Multiplicative vs. Additive Examples EXAMPLE 4: GROUP BY CUBE(A, B), ROLLUP(C, D), E OR GROUP BY GROUPING SETS ((CUBE(A,B), ROLLUP(C,D), E)) Parenthesis in red specifies that those elements are multiplicative in nature if a ROLLUP or CUBE exists CUBE(A,B) expands to GROUPING SETS ((A,B), (A), (B), ()) ROLLUP(C,D) expands to GROUPING SETS ((C,D), (C), ()) GROUP BY occurs in a multiplicative fashion to form: GROUPING SETS ((A, B, C, D, E), (A, B, C, E), (A, B, E), (A, C, D, E), (A, C, E), (A, E), (B, C, D, E), (B, C, E), (B, E), (C, D, E), (C, E), (E)) 12 Sets of results! 24

Grouping Sets and Super Groups Considerations The new grouping technology can result in an exponential growth in the number of results returned by a query Performance Considerations SQE query optimizer contains patented and unique technology allowing DB2 for i to internally compute multiple aggregates in single pass of data Assist the optimizer by creating indexes that cover all of the grouping columns in addition to any local, equal selection predicates Best Index keys for previous sample query (transyear, country, region, store) Index Advisor enhanced to support new grouping capabilities too! Do more with SQL, and do it faster and more efficiently! Only available in 6.1, and when using SQL queries via SQE 25

Performance Tuning Opportunities Optimal access method(s) in the feeder nodes Sound Indexing and statistics Strategy Create radix indexes for common Grouping Set and ROLLUP sets Run Visual Explain to view query rewrites Use index advised to find holes in indexing strategy and temporary index creates Sorted Aggregate Hash Table Creations Could be replaced by a permanent index if index encapsulates predicate selection, grouping and aggregation columns SMP / DB Parallelism can greatly improve populate time of the hash table Temporary Index Creations Could be replaced by a permanent index if index encapsulates predicate selection, grouping and aggregation columns Can imply that the environment is memory constrained Memory Constaints Grouping function queries are generally more memory intensive as they often union multiple trees that create and interrogate temporary objects Use feedback tools to determine the Optimizer s fair share of memory 26

Grouping Set Index Advised Example SELECT REGION, COUNTRY, COUNT(TERRITORY) FROM CUST_DIM WHERE CONTINENT = 'AMERICA' GROUP BY GROUPING SETS ((REGION),(COUNTRY)) 27

Grouping Set Index Advised Example 28

Grouping Set Index Advised Example 29

Grouping Set Index Advised Example SELECT REGION, COUNTRY, COUNT(TERRITORY) FROM CUST_DIM GROUP BY GROUPING SETS ((REGION),(COUNTRY)) 30

Grouping Set Index Advised Example 31

Grouping Set Index Advised Example 32

Q & A 33

Thank You 34