Execution Strategies for SQL Subqueries



Similar documents
SQL Query Evaluation. Winter Lecture 23

Chapter 13: Query Processing. Basic Steps in Query Processing

Introducing Microsoft SQL Server 2012 Getting Started with SQL Server Management Studio

Querying Microsoft SQL Server

Course 20461C: Querying Microsoft SQL Server Duration: 35 hours

Querying Microsoft SQL Server 2012

Querying Microsoft SQL Server 2012

Course ID#: W 35 Hrs. Course Content

Course 10774A: Querying Microsoft SQL Server 2012

Course 10774A: Querying Microsoft SQL Server 2012 Length: 5 Days Published: May 25, 2012 Language(s): English Audience(s): IT Professionals

Querying Microsoft SQL Server 20461C; 5 days

Chapter 13: Query Optimization

Inside the PostgreSQL Query Optimizer

MOC QUERYING MICROSOFT SQL SERVER

FHE DEFINITIVE GUIDE. ^phihri^^lv JEFFREY GARBUS. Joe Celko. Alvin Chang. PLAMEN ratchev JONES & BARTLETT LEARN IN G. y ti rvrrtuttnrr i t i r

Business Intelligence Extensions for SPARQL

MOC 20461C: Querying Microsoft SQL Server. Course Overview

Querying Microsoft SQL Server Course M Day(s) 30:00 Hours

Relational Databases

Translating SQL into the Relational Algebra

Querying Microsoft SQL Server (20461) H8N61S

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server

Querying Microsoft SQL Server 2012

SQL Server for developers. murach's TRAINING & REFERENCE. Bryan Syverson. Mike Murach & Associates, Inc. Joel Murach

Chapter 14: Query Optimization

AV-005: Administering and Implementing a Data Warehouse with SQL Server 2014

Querying Microsoft SQL Server Querying Microsoft SQL Server D. Course 10774A: Course Det ails. Co urse Outline

Parallel & Distributed Data Management

Introduction to Querying & Reporting with SQL Server

SQL Query Performance Tuning: Tips and Best Practices

Oracle Database 11g: SQL Tuning Workshop Release 2

SQL Server Query Tuning

Advanced Oracle SQL Tuning

Evaluation of Expressions

Define terms Write single and multiple table SQL queries Write noncorrelated and correlated subqueries Define and use three types of joins

Advanced Query for Query Developers

Instant SQL Programming

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals NEW

Saskatoon Business College Corporate Training Centre

Oracle Database 11g: SQL Tuning Workshop

DBMS / Business Intelligence, SQL Server

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

Apache MRQL (incubating): Advanced Query Processing for Complex, Large-Scale Data Analysis

SQL SELECT Query: Intermediate

Efficient Data Access and Data Integration Using Information Objects Mica J. Block

Chapter 5: Overview of Query Processing

SUBQUERIES AND VIEWS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 6

Microsoft Access 3: Understanding and Creating Queries

Oracle Database: SQL and PL/SQL Fundamentals

Oracle SQL. Course Summary. Duration. Objectives

SQL Server Query Tuning

Join Example. Join Example Cart Prod Comprehensive Consulting Solutions, Inc.All rights reserved.

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

Query Processing C H A P T E R12. Practice Exercises

Data Structure: Relational Model. Programming Interface: JDBC/ODBC. SQL Queries: The Basic From

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package Data Federation Administration Tool Guide

Useful Business Analytics SQL operators and more Ajaykumar Gupte IBM

Oracle Database 12c: Introduction to SQL Ed 1.1

Introduction to tuple calculus Tore Risch

COMP 5138 Relational Database Management Systems. Week 5 : Basic SQL. Today s Agenda. Overview. Basic SQL Queries. Joins Queries

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

6.830 Lecture PS1 Due Next Time (Tuesday!) Lab 1 Out today start early! Relational Model Continued, and Schema Design and Normalization

Query Optimization in Microsoft SQL Server PDW The article done by: Srinath Shankar, Rimma Nehme, Josep Aguilar-Saborit, Andrew Chung, Mostafa

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

LearnFromGuru Polish your knowledge

Chapter 9 Joining Data from Multiple Tables. Oracle 10g: SQL

Query tuning by eliminating throwaway

Relational Division and SQL

Query Optimization in Cloud Environment

Oracle White Paper June Optimizer with Oracle Database 12c

More on SQL. Juliana Freire. Some slides adapted from J. Ullman, L. Delcambre, R. Ramakrishnan, G. Lindstrom and Silberschatz, Korth and Sudarshan

Database Programming with PL/SQL: Learning Objectives

Oracle 10g PL/SQL Training

[Refer Slide Time: 05:10]

ICAB4136B Use structured query language to create database structures and manipulate data

Physical DB design and tuning: outline

SQL Performance and Tuning. DB2 Relational Database

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

Introduction to database design

An Oracle White Paper May The Oracle Optimizer Explain the Explain Plan

Exploring Query Optimization Techniques in Relational Databases

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

SQL SERVER DEVELOPER Available Features and Tools New Capabilities SQL Services Product Licensing Product Editions Will teach in class room

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

MTCache: Mid-Tier Database Caching for SQL Server

Effective Use of SQL in SAS Programming

Introduction to SQL for Data Scientists

SQL Server and MicroStrategy: Functional Overview Including Recommendations for Performance Optimization. MicroStrategy World 2016

BCA. Database Management System

D B M G Data Base and Data Mining Group of Politecnico di Torino

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

WRITING EFFICIENT SQL. By Selene Bainum

Transcription:

Execution Strategies for SQL Subqueries Mostafa Elhemali, César Galindo- Legaria, Torsten Grabs, Milind Joshi Microsoft Corp With additional slides from material in paper, added by S. Sudarshan 1 Motivation Optimization of subqueries has been studied for some time Challenges Mixing scalar and relational expressions Appropriate abstractions for correct and efficient processing Integration of special techniques in complete system This talk presents the approach followed in SQL Server Framework where specific optimizations can be plugged Framework applies also to nested loops languages 2 1

Outline Query optimizer context Subquery processing framework Subquery disjunctions 3 Algebraic query representation Relational operator trees Not SQL-block-focused SELECT SUM(T.a) FROM T, R WHERE T.b = R.b AND R.c = 5 GROUP BY T.c GroupBy T.c, sum(t.a) Select (T.b=R.b and R.c = 5) Cross product T R GroupBy T.c, sum(t.a) Join (T.b=R.b) T Select (R.c = 5) R algebrize transform 4 2

Operator tree transformations Select (A.x = 5) Join GropBy A.x, B.k, sum(a.y) Join Join A B A B A B Join Join Hash-Join Select (A.x = 5) B GropBy A.x, sum(a.y) B A B A A Simplification / normalization Exploration Implementation 5 SQL Server Optimization process T0 (input) simplify T1 cost-based optimization pool of alternatives use simplification / normalization rules search(0) search(1) search(2) use exploration and implementation rules, cost alternatives T2 (output) 6 3

Plan Generation Overview 7 Outline Query optimizer context Subquery processing framework Subquery disjunctions 8 4

SQL Subquery A relational expression where you expect a scalar Existential test, e.g. NOT EXISTS(SELECT ) Quantified comparison, e.g. T.a =ANY (SELECT ) Scalar-valued, e.g. T.a = (SELECT ) + (SELECT ) Convenient and widely used by query generators 9 Algebrization select * from customer where 100,000 < (select sum(o_totalprice) from orders where o_custkey = c_custkey) SELECT CUSTOMER < 1000000 SUBQUERY(X) ScalarGb SELECT ORDERS = O_CUSTKEY C_CUSTKEY X:= SUM O_TOTALPRICE Subqueries: relational operators with scalar parents Commonly correlated, i.e. they have outer-references 10 5

Subquery removal Executing subquery requires mutual recursion between scalar engine and relational engine Subquery removal: Transform tree to remove relational operators from under scalar operators Preserve special semantics of using a relational expression in a scalar, e.g. at-most-one-row 11 The Apply operator R Apply E(r) For each row r of R, execute function E on r Return union: {r 1 } X E(r 1 ) U {r 2 } X E(r 2 ) U Abstracts for each and relational function invocation Also known as d-join and tuple-substitution join Variants: left outer join, semi-join, anti-join Exposed in SQL Server (FROM clause) LATERAL clause in SQL standard Useful to invoke table-valued functions 12 6

Subquery removal SELECT CUSTOMER 1000000 < SUBQUERY(X) SELECT(1000000<X) APPLY(bind:C_CUSTKEY) ScalarGb CUSTOMER SGb(X=SUM(O_TOTALPRICE)) SELECT X:= SELECT(O_CUSTKEY=C_CUSTKEY) ORDERS = SUM ORDERS O_CUSTKEY C_CUSTKEY O_TOTALPRICE 13 Algebraization of SubQueries SQL Query: SELECT *, (SELECT C_NAME FROM CUSTOMER WHERE C_CUSTKEY = O_CUSTKEY) FROM ORDERS Translated to ORDERS Apply OJ (π [C_NAME] σ [C_CUSTKEY = O_CUSTKEY] CUSTOMER) In general: R Apply OJ max1row(e(r)) Subqueries with exists/not exists become R Apply SJ E(r) R Apply ASJ E(r) 14 7

Conditional Scalar Execution Expression CASE WHEN EXISTS(E1(r)) THEN E2(r) ELSE 0 END Translated to π CASE WHEN p = 1 THEN e2 ELSE 0 END ( (R Apply[semijoin, probe as p] E1(r)) Apply[outerjoin, pass-through p=1] max1row(e2(r)) as e2) 15 Disjunction of SubQueries WHERE p(r) OR EXISTS( E1(r)) OR EXISTS(E2(r)) R Apply SJ ((σ p(r) CT(1) UA E1(r) UA E2(r)) CT(1): Constant Table returning 1 UA: Union All Can also translate to apply with passthrough 16 8

Quantification and NULLs Consider predicate to <>ALL 5 NOT IN S which is equivalent The result of this predicate is as follows, for various cases of set S: 1. If S = {} then p is TRUE. 2. If S = {1} then p is TRUE. 3. If S = {5} then p is FALSE. 4. If S = {NULL, 5} then p is FALSE. 5. If S = {NULL, 1} then p is UNKNOWN. (FOR ALL s in S: p) = (NOT EXISTS s in S: NOT p): But only without nulls In general predicate A cmp B is translated as A <cmp > B OR A IS NULL OR B IS NULL where cmp is the complement of cmp 17 Apply removal Executing Apply forces nested loops execution into the subquery Apply removal: Transform tree to remove Apply operator The crux of efficient processing Not specific to SQL subqueries Can go by unnesting, decorrelation, unrolling loops Get joins, outerjoin, semijoins, as a result 18 9

Apply removal SELECT(1000000<X) SELECT (1000000 < X) APPLY(bind:C_CUSTKEY) Gb[C_CUSTKEY] X = SUM (O_TOTALPRICE) CUSTOMER SGb(X=SUM(O_TOTALPRICE) LEFT OUTERJOIN (O_CUSTKEY = C_CUSTKEY) SELECT(O_CUSTKEY=C_CUSTKEY) CUSTOMER ORDERS ORDERS Apply does not add expressive power to relational algebra Removal rules exist for different operators 19 Why remove Apply? Goal is NOT to avoid nested loops execution, but to normalize the query Queries formulated using for each surface may be executed more efficiently using set-oriented algorithms and queries formulated using declarative join syntax may be executed more efficiently using nested loop, for each algorithms 20 10

Removing Apply Cont. Apply removal that preserves the size of the expression. With Apply ORDERS Apply OJ (σ[c_custkey = O_CUSTKEY] CUSTOMER) Removing apply ORDERS OJ [C_CUSTKEY = O_CUSTKEY] CUSTOMER Apply removal that duplicates subexpressions. Apply removal not always possible max1row/pass-through predicates, opaque functions 21 Magic Sets Originally formulated for recursive query processing Special case for non-recursive queries 22 11

Magic Sets with Group By Other options B: Pull groupby above join C: Segmented execution, when R and S are the same E.g. Select all students with the highest mark 23 Reordering Semijoins and Antijoins Pushing down semi/anti joins Converting semi-join to join (to allow reordering) How about anti-joins? 24 12

Subquery Disjunctions generates an antijoin with predicate which can be rewritten using Another useful rule 25 Categories of execution strategies select from customer where exists( orders ) and customer semijoin orders normalized logical tree apply hash / merge join apply customer orders lookup customer orders orders customer lookup forward lookup set oriented reverse lookup 26 13

Forward lookup APPLY[semijoin](bind:C_CUSTKEY) CUSTOMER ORDERS Lkup(O_CUSTKEY=C_CUSTKEY) The natural form of subquery execution Early termination due to semijoin pull execution model Best alternative if few CUSTOMERs and index on ORDER exists 27 Reverse lookup DISTINCT on C_CUSTKEY APPLY(bind:O_CUSTKEY) ORDERS CUSTOMERS Lkup(C_CUSTKEY=O_CUSTKEY) APPLY(bind:O_CUSTKEY) DISTINCT on O_CUSTKEY CUSTOMERS Lkup(C_CUSTKEY=O_CUSTKEY) ORDERS Mind the duplicates Consider reordering GroupBy (DISTINCT) around join 28 14

Subquery processing overview SQL without subquery relational expr without Apply logical reordering set-oriented execution nested loops languages SQL with subquery Removal of Apply relational expr with Apply Removal of Subquery physical optimizations navigational, nested loops execution Parsing and normalization Cost-based optimization 29 The fine print Can you always remove subqueries? Yes, but you need a quirky Conditional Apply Subqueries in CASE WHEN expressions Can you always remove Apply? Not Conditional Apply Not with opaque table-valued functions Beyond yes/no answer: Apply removal can explode size of original relational expression 30 15

Outline Query optimizer context Subquery processing framework Subquery disjunctions 31 Subquery disjunctions select * from customer where c_catgory = preferred or exists(select * from nation where n_nation = c_nation and ) or exists(select * from orders where o_custkey = c_custkey and ) APPLY[semijoin](bind:C_CUSTKEY, C_NATION, C_CATEGORY) CUSTOMER UNION ALL SELECT(C_CATEGORY = preferred ) SELECT SELECT 1 NATION ORDERS Natural forward lookup plan Union All with early termination short-circuits OR computation 32 16

Apply removal on Union UNION (DISTINCT) SELECT(C_CATEGORY = preferred ) SEMIJOIN CUSTOMER SEMIJOIN CUSTOMER ORDERS CUSTOMER NATION Distributivity replicates outer expression Allows set-oriented and reverse lookup plan This form of Apply removal done in cost-based optimization, not simplification 33 Optimizing Apply Caching of results from earlier calls Trivial if no correlation variables In-memory if few distinct values/small results May or may no be worthwhile if large results Asynchronous IO Batch Sort 34 17

Asynchronous IO Ask OS to prefetch data, continue doing other things while prefetch is happening better use of resources, esp with multiple disks/cpus SELECT <blah blah> FROM PART natural join SUPPLIER natural join PARTSUPP WHERE <restrictive selections> AND PS_SUPPLYCOST = (SELECT MIN(PS_SUPPLYCOST) FROM PARTSUPP, SUPPLIER WHERE P_PARTKEY = PS_PARTKEY AND S_SUPPKEY = PS_SUPPKEY) Plan used by SQL Server (how the hell did it come up with this?) where 35 Batch Sort Sort order of parameters can help inner query But sorting outer query can be time consuming esp if we stop after a few answers So: batch a group of outer parameters, sort them, then invoke inner in sorted order Batch size increased step-wise so first few answers are fast at cost of more IO, later ones optimize IO more but with some delay 36 18

Summary Presentation focused on overall framework for processing SQL subqueries and for each constructs Many optimization pockets within such framework you can read in the paper: Optimizations for semijoin, antijoin, outerjoin Magic subquery decorrelation technique Optimizations for general Apply Goal of decorrelation is not set-oriented execution, but to normalize and open up execution alternatives 37 A question of costing Fwd-lookup 10ms to 3 days Bwd-lookup 10ms to 3 days, cases opposite to fwd-lkp Optimizer that picks the right strategy for you priceless Set-oriented execution 2 to 3 hours 38 19

Execution Strategies for Semijoin Outer query on orders, exists subquery on lineitem (Section 6.1) 39 Execution Strategies for Antijoin Outer query uses only orders, exists subquery on lineitem 40 20

Strategies for Subquery Disjunction Section 7.1: One disjunction is a select, other is an exists on a subquery 41 Execution Optimization for Apply 42 21