Fallacies of the Cost Based Optimizer

Similar documents

Anatomy of a SQL Tuning Session

D B M G Data Base and Data Mining Group of Politecnico di Torino

TWO TYPES OF HISTOGRAMS

Oracle Database 11g: SQL Tuning Workshop

SQL Tuning Proven Methodologies

Oracle Database 11g: SQL Tuning Workshop Release 2

SQL Query Evaluation. Winter Lecture 23

The Sins of SQL Programming that send the DB to Upgrade Purgatory Abel Macias. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Advanced Oracle SQL Tuning

An Oracle White Paper May The Oracle Optimizer Explain the Explain Plan

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?

Best Practices for DB2 on z/os Performance

Oracle InMemory Database

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

CA Performance Handbook. for DB2 for z/os

Chapter 9 Joining Data from Multiple Tables. Oracle 10g: SQL

An Oracle White Paper May Guide for Developing High-Performance Database Applications

Oracle DBA Course Contents

SQL Server Query Tuning

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

Lab # 5. Retreiving Data from Multiple Tables. Eng. Alaa O Shama

Guide to Performance and Tuning: Query Performance and Sampled Selectivity

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Top 25+ DB2 SQL Tuning Tips for Developers. Presented by Tony Andrews, Themis Inc.

Instant SQL Programming

D B M G Data Base and Data Mining Group of Politecnico di Torino

Query Processing C H A P T E R12. Practice Exercises

Chapter 13: Query Processing. Basic Steps in Query Processing

SQL Performance for a Big Data 22 Billion row data warehouse

Query tuning by eliminating throwaway

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

CONFIGURING AND OPERATING STREAMED PROCESSING IN PEOPLESOFT GLOBAL PAYROLL IN PEOPLETOOLS 8.48/9

Performing Queries Using PROC SQL (1)

Partitioning under the hood in MySQL 5.5

SQL Performance and Tuning. DB2 Relational Database

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

SQL Query Performance Tuning: Tips and Best Practices

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

Performance Implications of Various Cursor Types in Microsoft SQL Server. By: Edward Whalen Performance Tuning Corporation

Part 5: More Data Structures for Relations

SQL Server Query Tuning

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at

Database Administration with MySQL

Tune That SQL for Supercharged DB2 Performance! Craig S. Mullins, Corporate Technologist, NEON Enterprise Software, Inc.

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

PERFORMANCE TUNING FOR PEOPLESOFT APPLICATIONS

Displaying Data from Multiple Tables. Copyright 2004, Oracle. All rights reserved.

Oracle Database In-Memory The Next Big Thing

A basic create statement for a simple student table would look like the following.

Oracle Database: SQL and PL/SQL Fundamentals NEW

Performance Tuning for the Teradata Database

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.

Execution plans and tools

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

SQL Server. 1. What is RDBMS?

Oracle Database: SQL and PL/SQL Fundamentals

SQL Performance Hero and OMG Method vs the Anti-patterns. Jeff Jacobs Jeffrey Jacobs & Associates jmjacobs@jeffreyjacobs.com

SQL SELECT Query: Intermediate

FHE DEFINITIVE GUIDE. ^phihri^^lv JEFFREY GARBUS. Joe Celko. Alvin Chang. PLAMEN ratchev JONES & BARTLETT LEARN IN G. y ti rvrrtuttnrr i t i r

2874CD1EssentialSQL.qxd 6/25/01 3:06 PM Page 1 Essential SQL Copyright 2001 SYBEX, Inc., Alameda, CA

DBMS / Business Intelligence, SQL Server

Query Optimization in a Memory-Resident Domain Relational Calculus Database System

Exploring Query Optimization Techniques in Relational Databases

Programming with SQL

Exadata for Oracle DBAs. Longtime Oracle DBA

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Who am I? Copyright 2014, Oracle and/or its affiliates. All rights reserved. 3

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC

MOC 20461C: Querying Microsoft SQL Server. Course Overview

Introduction. Part I: Finding Bottlenecks when Something s Wrong. Chapter 1: Performance Tuning 3

ICAB4136B Use structured query language to create database structures and manipulate data

Safe Harbor Statement

database abstraction layer database abstraction layers in PHP Lukas Smith BackendMedia

Introducing Cost Based Optimizer to Apache Hive

Unit Storage Structures 1. Storage Structures. Unit 4.3

Optimizing Your Data Warehouse Design for Superior Performance

Contents at a Glance. Contents... v About the Authors... xiii About the Technical Reviewer... xiv Acknowledgments... xv

Wolfgang Breitling, Centrex Consulting Corporation THE EVENT DISCLAIMER HOW TO SET EVENT 10053

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

DB2 Developers Guide to Optimum SQL Performance

Normal Form vs. Non-First Normal Form

news from Tom Bacon about Monday's lecture

Writing SQL. PegaRULES Process Commander

Oracle Database: SQL and PL/SQL Fundamentals

CIS 631 Database Management Systems Sample Final Exam

Displaying Data from Multiple Tables. Copyright 2004, Oracle. All rights reserved.

Transcription:

Fallacies of the Cost Based Optimizer Wolfgang Breitling breitliw@centrexcc.com

Who am I Independent consultant since 1996 specializing in Oracle and Peoplesoft setup, administration, and performance tuning 25+ years in database management DL/1, IMS, ADABAS, SQL/DS, DB2, Oracle OCP certified DBA - 7, 8, 8i, 9i Oracle since 1993 (7.0.12) Mathematics major at University of Stuttgart 2

Who are YOU? DBA Developer Management Oracle 9 R1 / R2 Oracle 8 Oracle 7?? Oracle 10?? 3

Cost vs. Performance Correlation between cost and performance? Why not? 4

Assumptions Uniform Distribution Assumption Uniform Distribution over Blocks Uniform Distribution over Rows Uniform Distribution over Range of Values Predicate Independence Assumption Join Uniformity Assumption 5

Selectivity and Cardinality Selectivity = FF = card est / card base card est = FF * card base 6

The Makeup of Plan Costs The base table access cost is dependent on estimated # of blocks accessed which is - directly or indirectly - a function of the estimated row cardinality: Table scan nblks / k Unique scan blevel + 1 Fast full scan Index-only Range scan leaf_blocks / k blevel + FF * leaf_blocks blevel + FF * leaf_blocks + FF * clustering_factor 7

The Makeup of Plan Costs Join cost is dependent on cardinality of row sources Nested Loop $ outer + card outer * $ inner Sort-Merge $ outer + $sort outer + $ inner + $sort inner Hash $ outer + $ inner + $ hash 8

Plan Costs Recap Estimated cardinality = selectivity * base cardinality The cost of an access plan is a function of the estimated cardinalities of its components. Incorrect estimates lead to incorrect plan component costs and sub-optimal or wrong access plans. This is why accurate cardinality estimates are so important. 9

Distribution over blocks uniform clustered 7 35 6 30 5 25 4 20 3 15 2 10 1 5 0 0 10

Distribution of Value Frequencies for an equality predicate (last_name = 'Smith') the selectivity is set to the reciprocal of the number of distinct values of last_name, because the query selects rows that all contain one out of N distinct values. * * Oracle 9i Performance Tuning Guide and Reference 11

Distribution of Value Frequencies Power distribution 40,000 35,000 30,000 25,000 y = 36641x -0.5879 20,000 15,000 10,000 5,000 0 12

Distribution of Value Frequencies column NDV density -------------- ---------- ------------ EMPLID 10,000 1.0000E-04... COMPANY 200 5.0000E-03 Select emplid, jobcode, salary from ps_job5 b where b.company = 'B01' explain plan card operation ---- --------------------------------- 50 SELECT STATEMENT 50 TABLE ACCESS BY INDEX ROWID PS_JOB5 50 INDEX RANGE SCAN PSBJOB5 execution plan card operation ------ ------------------------------- 530 SELECT STATEMENT 530 TABLE ACCESS BY INDEX ROWID PS_JOB5 531 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'PSBJOB5' (NON-UNIQUE) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.47 0.47 21 359 5 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 37 0.48 0.47 420 567 0 530 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 39 0.95 0.94 441 926 5 530 13

Distribution of Value Frequencies column NDV density -------------- ---------- ------------ EMPLID 10,000 1.0000E-04... COMPANY 200 5.0000E-03 600 500 400 300 B01 C02 A03 B04 COM COUNT(0) --- ---------- B01 530 C02 350 A03 274 B04 231 C05 202 A06 181 B07 165 C08 152 C00 28 Select emplid, jobcode, salary from ps_job5 b where b.company = 'B01' explain plan card operation ---- --------------------------------- 50 SELECT STATEMENT 50 TABLE ACCESS BY INDEX ROWID PS_JOB5 50 INDEX RANGE SCAN PSBJOB5 200 100 14 0

Distribution of Value Frequencies With Histogram on company Analyze table ps_job5 compute statistics for columns company [ size 75 ]; column NDV density -------------- ---------- ------------ EMPLID 10,000 1.0000E-04... COMPANY 200 6.0644E-03 Select emplid, jobcode, salary from ps_job5 b where b.company = 'B01' explain plan execution plan card operation ------ ------------------------------- 530 SELECT STATEMENT GOAL: CHOOSE 530 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'PS_JOB5' card operation ---- --------------------------------- 534 SELECT STATEMENT 534 TABLE ACCESS FULL PS_JOB5 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.17 0.15 25 424 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 37 0.24 0.22 912 943 15 530 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 39 0.41 0.37 937 1367 15 530 15

Distribution of Value Frequencies With Histogram and bind Variable on company column NDV density -------------- ---------- ------------ EMPLID 10,000 1.0000E-04... COMPANY 200 6.0644E-03 Select emplid, jobcode, salary from ps_job5 b where b.company = :b1 explain plan card operation ---- --------------------------------- 61 SELECT STATEMENT 61 TABLE ACCESS BY INDEX ROWID PS_JOB5 61 INDEX RANGE SCAN PSBJOB5 10,000 * 6.0644 e-3 = 60.644 rounded up to 61. 16

Distribution of Value Frequencies With Histogram and bind Variable on company Analyze table ps_job5 compute statistics for columns company size 10; column NDV density -------------- ---------- ------------ EMPLID 10,000 1.0000E-04... COMPANY 200 1.0870E-02 Select emplid, jobcode, salary from ps_job5 b where b.company = :b1 explain plan card operation ---- --------------------------------- 109 SELECT STATEMENT 109 TABLE ACCESS FULL PS_JOB5 17

Distribution over Range of Values The optimizer assumes that employee_id values are distributed evenly in the range between the lowest value and highest value. * * Oracle 9i Performance Tuning Guide and Reference 18

Distribution over Range of Values table column NDV density lo hi PS_LEDGER ACCOUNTING_PERIOD 15 6.6667E-02 0 999 Period 0 holds opening balances, periods 1-12 hold the ledger entries for the months, and periods 998 and 999 are used for special processing. 19

Distribution over Range of Values table column NDV density lo hi PS_LEDGER ACCOUNTING_PERIOD 15 6.6667E-02 0 999 accounting_period = n [ n ε {1.. 12} ] selectivity = 1/ndv = 1/15 = 6.6667e -2 accounting_period between 1 and 12 selectivity = 12/(999-0) + 1/15 = 7.8679e -2 accounting_period < 12 selectivity = (12-0)/(999-0) = 1.2012e -2 20

Distribution over Range of Values Adjusting the high-value statistic select sum(posted_total_amt) from ps_ledger where accounting_period between 1 and 12 Column: ACCOUNTING Col#: 11 Table: PS_LEDGER Alias: PS_LEDGER NDV: 15 NULLS: 0 DENS: 6.6667e-002 LO: 0 HI: 999 TABLE: PS_LEDGER ORIG CDN: 745198 CMPTD CDN: 58632 Column: ACCOUNTING Col#: 11 Table: PS_LEDGER Alias: PS_LEDGER NDV: 15 NULLS: 0 DENS: 6.6667e-002 LO: 0 HI: 14 TABLE: PS_LEDGER ORIG CDN: 745198 CMPTD CDN: 684873 select sum(posted_total_amt) from ps_ledger where accounting_period < 12 Column: ACCOUNTING Col#: 11 Table: PS_LEDGER Alias: PS_LEDGER NDV: 15 NULLS: 0 DENS: 6.6667e-002 LO: 0 HI: 999 TABLE: PS_LEDGER ORIG CDN: 745198 CMPTD CDN: 49680 Column: ACCOUNTING Col#: 11 Table: PS_LEDGER Alias: PS_LEDGER NDV: 15 NULLS: 0 DENS: 6.6667e-002 LO: 0 HI: 14 TABLE: PS_LEDGER ORIG CDN: 745198 CMPTD CDN: 638742 21

Predicate Independence Assumption P1 AND P2 P1 OR P2 S(P1&P2) = S(P1) * S(P2) S(P1 P2) = S(P1) + S(P2) -[S(P1) * S(P2)] select emplid, jobcode, salary from ps_job1 b where b.company = 'CCC' and b.paygroup = 'FGH'; 250 rows selected. Explain Plan card operation 251 SELECT STATEMENT 251 TABLE ACCESS BY INDEX ROWID PS_JOB1 251 INDEX RANGE SCAN PSBJOB1 select emplid, jobcode, salary from ps_job2 b where b.company = 'CCC' and b.paygroup = 'FGH'; 2500 rows selected. Explain Plan card operation 251 SELECT STATEMENT 251 TABLE ACCESS BY INDEX ROWID PS_JOB2 251 INDEX RANGE SCAN PSBJOB2 22

Predicate Independence Assumption table rows blks empty chain avg_rl PS_JOB1 50,000 4,547 3 0 317 table column NDV density bkts PS_JOB1 EMPLID 10,000 1.0000E-04 1 PS_JOB1 JOBCODE 198 5.0505E-03 1 PS_JOB1 COMPANY 10 1.0000E-01 1 PS_JOB1 PAYGROUP 20 5.0000E-02 1 PS_JOB1 SALARY 49,597 2.0163E-05 1 table rows blks empty chain avg_rl PS_JOB2 50,000 4,547 3 0 317 table column NDV density bkts PS_JOB2 EMPLID 10,000 1.0000E-04 1 PS_JOB2 JOBCODE 199 5.0251E-03 1 PS_JOB2 COMPANY 10 1.0000E-01 1 PS_JOB2 PAYGROUP 20 5.0000E-02 1 PS_JOB2 SALARY 49,848 2.0061E-05 1 card est = card base * sel (company AND paygroup) = sel(company) * sel(paygroup) = 50000 * 1.0000e -01 * 5.0000e -02 = 250 index column NDV _#LB PSBJOB1 200 400 COMPANY 10 PAYGROUP 20 index column NDV _#LB PSBJOB2 20 449 COMPANY 10 PAYGROUP 20 23

Join Uniformity Assumption join cardinality = card A * card B * join selectivity join selectivity = 1/max(ndv A, ndv B ) principle of inclusion, i.e. each value of the smaller domain has a match in the larger domain which is frequently true for joins between foreign keys and primary keys. 24

Join Uniformity Assumption SQL> select 'A-' a.n1, 'B-' b.n1 2 from t1 a, t1 b 3 where a.n1 = b.n1; 10 SELECT STATEMENT 10 HASH JOIN 10 TABLE ACCESS FULL T1 10 TABLE ACCESS FULL T1 A-0 B-0 A-1 B-1 A-2 B-2 A-3 B-3 A-4 B-4 A-5 B-5 A-6 B-6 A-7 B-7 A-8 B-8 A-9 B-9 10 rows selected. 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 Join cardinality = card A * card B * join selectivity = card A * card B * 1/max(ndv a, ndv b ) = 10 * 10 * 1/max(10, 10 ) = 10 25

Join Uniformity Assumption SQL> select 'A-' a.n1, 'B-' b.n1 2 from t1 a, t2 b 3 where a.n1 = b.n1; 10 SELECT STATEMENT 10 HASH JOIN 10 TABLE ACCESS FULL T2 10 TABLE ACCESS FULL T2 A-5 B-5 A-6 B-6 A-7 B-7 A-8 B-8 A-9 B-9 5 rows selected. 0 1 2 3 4 5 6 7 10 11 12 13 14 5 6 7 8 8 9 9 Join cardinality = card A * card B * join selectivity = card A * card B * 1/max(ndv a, ndv b ) = 10 * 10 * 1/max(10, 10 ) = 10 26

Join Uniformity Assumption SQL> select 'A-' a.n1, 'B-' b.n1 2 from t2 a, t2 b 3 where a.n1 = b.n1; 20 SELECT STATEMENT 20 HASH JOIN 10 TABLE ACCESS FULL T2 10 TABLE ACCESS FULL T2 A-0 B-0 A-0 B-0 A-0 B-0 A-0 B-0 A-1 B-1 A-2 B-2 A-3 B-3 A-4 B-4 0 0 0 0 0 0 1 2 0 0 0 0 0 0 1 2 40 rows selected. 3 3 4 4 Join cardinality = card A * card B * join selectivity = card A * card B * 1/max(ndv a, ndv b ) = 10 * 10 * 1/max(5, 5 ) = 20 27

Join Selectivity and Cardinality insert into t1(n1,n2) select mod(rownum,10),mod(rownum,5) from dba_objects where rownum <= 50; column NDV density N1 10 1.0000E-01 N2 5 2.0000E-01 select 'A.' A.n1 '-B.' B.n1 from t1 a, t2 b where a.n1 = b.n1; Execution Plan Explain Plan card operation 250 SELECT STATEMENT 250 HASH JOIN 50 TABLE ACCESS FULL T1 50 TABLE ACCESS FULL T2 Rows Execution Plan 0 SELECT STATEMENT GOAL: CHOOSE 250 HASH JOIN 50 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T1' 50 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T2' 28

Join Selectivity and Cardinality select 'A.' A.n1 '-B.' B.n1 from t1 a, t2 b where a.n1 = b.n1 and a.n2 = 5; Explain Plan card operation 50 SELECT STATEMENT 50 HASH JOIN 10 TABLE ACCESS FULL T1 50 TABLE ACCESS FULL T2 Execution Plan Rows Execution Plan 0 SELECT STATEMENT GOAL: CHOOSE 50 HASH JOIN 10 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T1' 50 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T2' 29

Join Selectivity and Cardinality select 'A.' A.n1 '-B.' B.n1 from t1 a, t2 b where a.n1 = b.n1 and a.n1 = 5; Explain Plan card operation 5 SELECT STATEMENT 5 HASH JOIN 5 TABLE ACCESS FULL T1 5 TABLE ACCESS FULL T2 Execution Plan Rows Execution Plan 0 SELECT STATEMENT GOAL: CHOOSE 25 HASH JOIN 5 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T1' 5 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T2' 30

Join Selectivity and Cardinality Oracle 9i (9.2.0.4) and 10g select 'A.' A.n1 '-B.' B.n1 from t1 a, t2 b where a.n1 = b.n1 and a.n2 = 5; Card Plan 9i & 10g 50 SELECT STATEMENT (all_rows) (Cost 5) 50 HASH JOIN (Cost 5) 10 TABLE ACCESS (analyzed) TABLE SCOTT T1 (full) (Cost 2) 50 TABLE ACCESS (analyzed) TABLE SCOTT T2 (full) (Cost 2) select 'A.' A.n1 '-B.' B.n1 from t1 a, t2 b where a.n1 = b.n1 and a.n1 = 5; Card Plan 9i 10g 25 SELECT STATEMENT (all_rows) (Cost 12) (Cost 4) 25 MERGE JOIN (cartesian) (Cost 12) (Cost 4) 5 TABLE ACCESS (analyzed) TABLE SCOTT T1 (full) (Cost 2) (Cost 2) 5 BUFFER (sort) (Cost 10) (Cost 2) 5 TABLE ACCESS (analyzed) TABLE SCOTT T2 (full) (Cost 2) (Cost 0) 31

References Oracle 9i Performance Tuning Guide and Reference Note 10626.1 Note 35934.1 Note 212809.1 Note 46234.1 Note 68992.1 Cost Based Optimizer (CBO) Overview Cost Based Optimizer - Common Misconceptions and Issues Limitations of the Oracle Cost Based Optimizer Interpreting Explain plan Predicate Selectivity Steve Adams Ixora News - April 2001. www.ixora.com.au/newsletter/2001_04.htm Cary Millsap When to Use an Index. www.hotsos.com 32

References G. Piatetsky-Shapiro, C. Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. Proceedings of the ACM SIGMOD International Conference on Management of Data. June 1984. p. 256-275. C. A. Lynch: Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values. Proceedings of the International Conference on Very Large Data Bases. August, 1988. p. 240-251. Y. E. Ioannidis, S. Christodoulakis: On the Propagation of Errors in the Size of Join Results. Proceedings of the ACM SIGMOD International Conference on Management of Data. May, 1991. p. 268-277. A. N. Swami, K. B. Schiefer: On the Estimation of Join Result Sizes. Proceedings of the International Conference on Extending Database Technology. March, 1994. p. 287-300. M. Stillger, G. Lohman, V. Markl, M. Kandil: LEO DB2 s LEarning Optimizer. Proceedings of the International Conference on Very Large Data Bases. September 2001. Rome, Italy. p. 19-28 33

Wolfgang Breitling Centrex Consulting Corporation breitliw@centrexcc.com www.centrexcc.com