Analyzing Plan Diagrams of Data Query Optimizers

Similar documents
Business Intelligence Extensions for SPARQL

Advanced Oracle SQL Tuning

SQL SELECT Query: Intermediate

Look Out The Window Functions

HTSQL is a comprehensive navigational query language for relational databases.

Copyright 2015 NTT corp. All Rights Reserved.

Basics on Geodatabases

How To Create A Large Data Storage System

Trafodion Operational SQL-on-Hadoop

DBMS / Business Intelligence, SQL Server

SQL Server 2008 Core Skills. Gary Young 2011

When a variable is assigned as a Process Initialization variable its value is provided at the beginning of the process.

Partitioning under the hood in MySQL 5.5

1 Structured Query Language: Again. 2 Joining Tables

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

Power BI Performance. Tips and Techniques

Beginning C# 5.0. Databases. Vidya Vrat Agarwal. Second Edition

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Introduction. Introduction: Database management system. Introduction: DBS concepts & architecture. Introduction: DBS versus File system

MySQL for Beginners Ed 3

Introduction: Database management system

Copying data from SQL Server database to an Oracle Schema. White Paper

Databases in Engineering / Lab-1 (MS-Access/SQL)

Oracle SQL. Course Summary. Duration. Objectives

Our Raison d'être. Identify major choice decision points. Leverage Analytical Tools and Techniques to solve problems hindering these decision points

Financial Data Access with SQL, Excel & VBA

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD METAMARKETS

Monitoring PostgreSQL database with Verax NMS


Analyzing & Optimizing T-SQL Query Performance Part1: using SET and DBCC. Kevin Kline Senior Product Architect for SQL Server Quest Software

Development Best Practices

Creating Reports Crystal Clear

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski

Big Data Technology CS , Technion, Spring 2013

Agile Business Suite: a 4GL environment for.net developers DEVELOPMENT, MAINTENANCE AND DEPLOYMENT OF LARGE, COMPLEX BACK-OFFICE APPLICATIONS

T-SQL STANDARD ELEMENTS

Chapter 13: Query Optimization

Data Mining: An Overview. David Madigan

PSU SQL: Introduction. SQL: Introduction. Relational Databases. Activity 1 Examining Tables and Diagrams

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

DB2 for i5/os: Tuning for Performance

SQL Server Parallel Data Warehouse: Architecture Overview. José Blakeley Database Systems Group, Microsoft Corporation

Chapter 6 - Condor finder

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

Explaining the Postgres Query Optimizer

SDMX technical standards Data validation and other major enhancements

Query Processing in Encrypted Cloud Databases

Lab # 5. Retreiving Data from Multiple Tables. Eng. Alaa O Shama

DB2 for i. Analysis and Tuning. Mike Cain IBM DB2 for i Center of Excellence. mcain@us.ibm.com

Database Management. Technology Briefing. Modern organizations are said to be drowning in data but starving for information p.

Karl Lum Partner, LabKey Software Evolution of Connectivity in LabKey Server

Inside the PostgreSQL Query Optimizer

OBIEE 11g Data Modeling Best Practices

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

Sharding with postgres_fdw

Clustering through Decision Tree Construction in Geology

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

Business Application Services Testing

1 Energy Data Problem Domain. 2 Getting Started with ESPER. 2.1 Experimental Setup. Diogo Anjos José Cavalheiro Paulo Carreira

Oracle and Sybase, Concepts and Contrasts

Relational Database Basics Review

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

database abstraction layer database abstraction layers in PHP Lukas Smith BackendMedia

Databases and SQL. The Bioinformatics Lab SS Wiki topic 10. Tikira Temu. 04. June 2013

Introduction to SQL for Data Scientists

Overview of Databases On MacOS. Karl Kuehn Automation Engineer RethinkDB

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

Evaluation of Expressions

IT2304: Database Systems 1 (DBS 1)

BCA. Database Management System

D B M G Data Base and Data Mining Group of Politecnico di Torino

Principles of Data Mining by Hand&Mannila&Smyth

Excel Power Tools. David Onder and Alison Joseph. NCAIR 2015 Conference

Discovering SQL. Wiley Publishing, Inc. A HANDS-ON GUIDE FOR BEGINNERS. Alex Kriegel WILEY

ARIZONA DEPARTMENT OF TRANSPORTATION. Presented by Lonnie D. Hendrix, P.E. Assistant State Engineer, Maintenance

Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.

Relational Database: Additional Operations on Relations; SQL

HBase Schema Design. NoSQL Ma4ers, Cologne, April Lars George Director EMEA Services

Open source framework for data-flow visual analytic tools for large databases

Oracle Database 11g: SQL Tuning Workshop

Exploring Query Optimization Techniques in Relational Databases

Integrating Open Sources and Relational Data with SPARQL

SQL Server 2014 Performance Tuning and Optimization 55144; 5 Days; Instructor-led

The Dynamics of Software Project Staffing: A System Dynamics Based Simulation Approach

Safe Harbor Statement

Course Outline. SQL Server 2014 Performance Tuning and Optimization Course 55144: 5 days Instructor Led

SQL SERVER GRAPH DATABASE PDF

Using Database Performance Warehouse to Monitor Microsoft SQL Server Report Content

SQL Query Evaluation. Winter Lecture 23

Exploratory Data Analysis for Ecological Modelling and Decision Support

Database Design and Database Programming with SQL - 5 Day In Class Event Day 1 Activity Start Time Length

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC

Toad for Data Analysts, Tips n Tricks

Transcription:

Analyzing Plan Diagrams of Data Query Optimizers INSTITUT FÜR PROGRAMMSTRUKTUREN UND DATENORGANISATION (IPD), FAKULTY OF INFORMATICS 1 KIT University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association Name of Institute, Faculty, Department www.kit.edu

Paper 2 Analyzing Plan Diagrams of Database Query Optimizers written in 2005 Naveen Reddy, Jayant R. Haritsa Database Systems Lab, SERC/CSA Indian Institute of Science, Bangalore 560012, INDIA

Relational Database Management System Where is a Query Optimizer located? Relational Database Management System (RDBMS) Client SQL Query Optimizer Parser Result... Database 1 Database 2 Database 3 Database System 3

Query Optimizer How does a Query Optimizer work? Parsed Query (from Parser) subquery unnesting, query rewrite with materialized views,... Query Transformer Transformed Query Estimator Dictionary Query and estimates Plan Generator Query Plan (to Row Source Generator) 4

Query Optimizer 5 Identifies the most efficent strategy to execute SQL queries Returns query plan Part of the RDBMS Based on estimated number of rows of relevant relations Relevant join conditions are given in the query

Query Plan A strategy to execute SQL queries Query Plan does not affect the result Example: (id is FK of Aid) select o,p from A inner join B on (A.id=B.Aid) where B.p=0; Plan P1: AB A=3, B=6/3=2 => cost(ab)=6 id 1 2 3 o X Y Z Plan P2: BA 6 A B id 1 2 3 4 5 6 Aid 1 1 2 2 3 3 p 0 1 2 0 0 1 B=3 (using where), A=1 (primary key) => cost(ba)=3

How to analyze a query optimizer? 1. Send SQL statement EXPLAIN SELECT FROM WHERE 2. Optimizer returns query plan 3. This can be repeated by varying the where condition(s) systematically until the selectivity space is covered 4. Corresponding plan and cost for each set of conditions can be visualized Customers 4711 Selectivity Space 13 7 Orders

Query plan costs A id 1 2 3 o X Y Z B id 1 2 3 4 5 6 Aid 1 1 2 2 3 3 p 0 1 2 0 0 1 test=# explain select o,p from A inner join B on (A.id=B.Aid) where B.p=0; QUERY PLAN ----------------------------------------------------------Hash Join (cost=1.07..2.18 rows=3 width=6) Hash Cond: (b.aid = a.id) -> Seq Scan on b (cost=0.00..1.07 rows=3 width=8) Filter: (p = 0) -> Hash (cost=1.03..1.03 rows=3 width=6) -> Seq Scan on a (cost=0.00..1.03 rows=3 width=6) 8

TPC-H Decision support benchmark database schema, data queries for testing (Q1 through Q22) We use same data set as the authors of the paper: relation part.tbl nation.tbl customer.tbl region.tbl partsupp.tbl supplier.tbl orders.tbl lineitem.tbl 9 #rows 200000 25 150000 5 800000 10000 1500000 6001215

Picasso 10 Picasso Sends explain queries against the database Stores and visualizes query plan and costs Picasso Server Connects to Database (SQL-Server, Oracle, Sybase, postgresql,...) Multiuser Picasso Client Connects to PICASSO Server GUI for PICASSO Server

Run test with Picasso 11

Run test with Picasso 12 Picasso varies WHERE conditions according to plot resolution: and o_totalprice :varies and c_acctbal :varies 2 varying conditions 2-dimensional selectivity space (2D) Picasso runs EXPLAIN-Statements to determine the query plan chosen by the optimizer. 2D and Resolution=100 100² = 10,000 EXPLAIN-Statements

Diagram creation process Create EXPLAIN SELECT... SQL-Statement with condition(s) SQL Optimizer RDBMS Add new point to the Plan Diagram Query Plan Output Diagram 13 explain select name, price from customer, order where [ ] and o_totalprice <= 98679 and c_acctbal <= 5809; result: P4

Plan Diagram and Cost Diagram Plan Coverage of selectivity space in % Plan Diagram: 2D-visualization of execution plans 14 Cost Diagram: 3D-visualization of estimated costs. Estimation is done by optimizer.

Cost domination Principle Monotonic cost behaviour: Increasing selectivity space increases estimated costs Non-monotonic cost behaviour: Increasing selectivity space does not always increase estimated costs Cost domination principle: When increasing selectivity space, estimated costs must increase monotonically. 15

Estimated costs vs. execution costs SQL Statement for each point in diagram Measurement Creation time for a 10x10 diagram Monotonicity Estimated costs Execution costs explain select... select... #rows,... execution time 10s 4h 10min Estimated cost diagram should always be monotonic Execution cost diagram might not be monotonic, e.g. if estimations were not precise => Even if we have Monotonic estimated cost behaviour, we want to reduce #Plans and #Segments. 16

Reduced Plan Diagram Plan Diagram Reduced Plan Diagram Maximum allowed Cost Increase: 5% 10% Reduced Plan Diagram: Try to reduce #Plans without increasing estimated costs more than X% in each point (X% is called Costgreedy Reduction or plan optimality tolerance threshold.) 17

TPC-H Query 7 Plan Diagram 4 Plans P4 covers 1,78% only Many small segments Not smooth 18 Cost Diagram monotonic Reduced Plan Diagram 2 Plans only Max Increase 6.73% Avg Increase 0.72%

TPC-H Query 8 Plan Diagram 9 Plans 80% covered by 3 Plans Many small segments Not smooth 19 Cost Diagram monotonic Reduced Plan Diagram 4 Plans only Max Increase 9.8% Avg Increase 0.41%

TPC-H Query 9 Plan Diagram 22 Plans 80% covered by 3 Plans Many small segments Not smooth Highly instable 20 Cost Diagram monotonic Reduced Plan Diagram 10 Plans Max Increase 9.15% Avg Increase 0.15%

Clarifications 21 #Plans does not necessarily correspond with execution time of one query. We do not rate performance! But: We consider the stability of an optimizer.

Hypothesis 22 Modern query optimizers make extremely fine-grained plan choices smaller plans occupy less than 1% of selectivity space Optimizers are over-sophisticated Processing overheads with query optimization could be lowered

Parametric query optimization (PQO) PQO at run time, use plan corresponding to selectivity parameter(s). PQO Assumptions: Plan Convexity But: today's optimizers can look like that: If Plan P is optimal at point A and at point B, it is also optimal at all points joining the two points Plan Uniqueness An optimal plan P appears at only one contiguous region in the entire space Plan Homogenity 23 apriori identify the optimal set of plans at compile time An optimal plan P is optimal within the entire region enclosed by its plan boundaries. Better: reduce #plans for PQO

Conclusion Often highly intricate diagrams Typically 80% of space covered by 20% of plans (postgresql: 37%) 24 Cardinality of plan diagram can be reduced, without materially affecting the query cost Assumptions of parametric query optimization literature do not hold in practice. (Reduction needed) Needed: Mechanism for pruning the plan search space Directly reduce #plans by optimizer

Discussion Plan Diagram Cost Diagram Do you have any questions? 25 Reduced Plan Diagram?

Comparison of Database Analyzers TPC-H Query 2 5 7 8 9 10 18 21 postgresql OptA OptB OptC 26 %plans for 80% 100,00% 37,50% 50,00% 33,33% 13,64% 22,22% 14,29% 25,00% 37,00% Gini Index 0,47 0,65 0,57 0,68 0,76 0,48 0,24 0,28 0,52 28,7 24,5 28,8 17,00% 23,00% 16,00% 0,79 0,72 0,8 Measurement of statistical dispersion OptX: Commercial DBs tested by IIS #Plans 2 8 4 9 22 9 7 4 8,13 postgresql: Same dataset, similar reference system postgresql uses smaller #Plans to cover space. postgresql needs higher percentage of plans to cover 80% of selectivity space. postgresql's Gini Index is close to 0.5 which is better than 0.7 or 0.8.

Generation of Reduced Plan Diagram 27 P1, P4 are upper bounds of P2 We move point to P1 which has lowest cost (90) If all Points of a plan are swit

Query plan costs 28 The order of the operations in a query plan is important: cost(ab) cost(abc)!= cost(a) * cost(bc) cost(abc)!= cost(ab) * cost(c)!= cost(ba)

Complex patterns 29 Rapidly alternating choices not created on postgresql