Query Processing. Q Query Plan. Example: Select B,D From R,S Where R.A = c S.E = 2 R.C=S.C. Himanshu Gupta CSE 532 Query Proc. 1



Similar documents
Evaluation of Expressions

Chapter 14: Query Optimization

Chapter 13: Query Optimization

Chapter 13: Query Processing. Basic Steps in Query Processing

MapReduce examples. CSE 344 section 8 worksheet. May 19, 2011

SQL Query Evaluation. Winter Lecture 23

Relational Databases

Translating SQL into the Relational Algebra

Chapter 5: Overview of Query Processing

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

Introducing Cost Based Optimizer to Apache Hive

Lecture 6: Query optimization, query tuning. Rasmus Pagh

Unique column combinations

Inside the PostgreSQL Query Optimizer

Rethinking SIMD Vectorization for In-Memory Databases

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

CSE 562 Database Systems

Execution Strategies for SQL Subqueries

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : Hertford College Supervisor: Dr.

Software Engineering

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

SQL: Queries, Programming, Triggers

Schema Mappings and Data Exchange

Relational Algebra and SQL

Query Processing C H A P T E R12. Practice Exercises

The University of British Columbia

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Chapter One Introduction to Programming

Advanced Oracle SQL Tuning

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

Chapter 6: Episode discovery process

Big Data & Scripting Part II Streaming Algorithms

DATABASE DESIGN - 1DL400

In-Memory Database: Query Optimisation. S S Kausik ( ) Aamod Kore ( ) Mehul Goyal ( ) Nisheeth Lahoti ( )

6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, Class 4 Nancy Lynch

Implementation of Recursively Enumerable Languages using Universal Turing Machine in JFLAP

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

Multi-dimensional index structures Part I: motivation

The Volcano Optimizer Generator: Extensibility and Efficient Search

Graham Kemp (telephone , room 6475 EDIT) The examiner will visit the exam room at 15:00 and 17:00.

HOW TO USE MINITAB: DESIGN OF EXPERIMENTS. Noelle M. Richard 08/27/14

Software Pipelining - Modulo Scheduling

Automata and Computability. Solutions to Exercises

Cloud and Big Data Summer School, Stockholm, Aug Jeffrey D. Ullman

Oracle Database 11g: SQL Tuning Workshop

Big Data and Scripting map/reduce in Hadoop

Database Constraints and Design

Session 7 Fractions and Decimals

4. SQL. Contents. Example Database. CUSTOMERS(FName, LName, CAddress, Account) PRODUCTS(Prodname, Category) SUPPLIERS(SName, SAddress, Chain)

Automating SQL Injection Exploits

Basics of Counting. The product rule. Product rule example. 22C:19, Chapter 6 Hantao Zhang. Sample question. Total is 18 * 325 = 5850

Lecture Notes on Database Normalization

SQL Database queries and their equivalence to predicate calculus

Parquet. Columnar storage for the people

Oracle Database 11g: SQL Tuning Workshop Release 2

Classification and Prediction

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

AI: A Modern Approach, Chpts. 3-4 Russell and Norvig

Web Data Extraction: 1 o Semestre 2007/2008

SDMX technical standards Data validation and other major enhancements

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?

Introduction to database design

3. Relational Model and Relational Algebra

Bounded Treewidth in Knowledge Representation and Reasoning 1

Language Modeling. Chapter Introduction

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Relational Algebra. Module 3, Lecture 1. Database Management Systems, R. Ramakrishnan 1

Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows

MapReduce: Algorithm Design Patterns

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Distributed Databases in a Nutshell

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

1 Sufficient statistics

CS 3719 (Theory of Computation and Algorithms) Lecture 4

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Partitioning under the hood in MySQL 5.5

DBMS / Business Intelligence, SQL Server

Dynamic Programming. Lecture Overview Introduction

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Big Data Technology Pig: Query Language atop Map-Reduce

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

Lecture 1. Basic Concepts of Set Theory, Functions and Relations

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

1 Introduction. 2 An Interpreter. 2.1 Handling Source Code

Transcription:

Q Query Plan Query Processing Example: Select B,D From R,S Where R.A = c S.E = 2 R.C=S.C Himanshu Gupta CSE 532 Query Proc. 1

How do we execute query? One idea - Do Cartesian product - Select tuples - Do projection Himanshu Gupta CSE 532 Query Proc. 2

Relational Algebra - can be used to describe plans... Plan I Π B,D σ R.A= c S.E=2 R.C=S.C X R S OR: Π B,D [ σ R.A= c S.E=2 R.C = S.C (RXS)] Himanshu Gupta CSE 532 Query Proc. 3

Another idea: Plan II Π B,D natural join σ R.A = c σ S.E = 2 R S Himanshu Gupta CSE 532 Query Proc. 4

R S A B C σ (R) σ(s) C D E a 1 10 A B C C D E 10 x 2 b 1 20 c 2 10 10 x 2 20 y 2 c 2 10 20 y 2 30 z 2 d 2 35 30 z 2 40 x 1 e 3 45 50 y 3 Himanshu Gupta CSE 532 Query Proc. 5

Plan III Use R.A and S.C Indexes (1) Use R.A index to select R tuples with R.A = c (2) For each R.C value found, use S.C index to find matching tuples (3) Eliminate S tuples S.E 2 (4) Join matching R,S tuples, project B,D attributes and place in result Himanshu Gupta CSE 532 Query Proc. 6

R S A = c C A B C C D E I1 a 1 10 10 x 2 b 1 20 <c,2,10> <10,x,2> 20 y 2 c 2 10 check=2? 30 z 2 d 2 35 output: <2,x> 40 x 1 e 3 45 50 y 3 next tuple: <c,7,15> I2 Himanshu Gupta CSE 532 Query Proc. 7

SQL query parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes More-Improved l.q.p Query Processing statistics {(P1,C1),(P2,C2)...} answer execute Pi pick best estimate costs Enumerate physical plans {P1,P2,..} Himanshu Gupta CSE 532 Query Proc. 8

Key Steps in Query Processing à à Logical query plan(s) Physical query plans Best Physical plan. Improving logical/physical plans requires cost estimation which in turn requires size estimation of intermediate results. Size estimation techniques: Keep data statistics, and use certain models (later slides). Himanshu Gupta CSE 532 Query Proc. 9

Example SELECT title FROM StarsIn WHERE starname IN (SELECT name FROM MovieStar WHERE birthdate LIKE %1960 ); (Find the movies with stars born in 1960) Himanshu Gupta CSE 532 Query Proc. 10

Example: Parse Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starname <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthdate %1960 Himanshu Gupta CSE 532 Query Proc. 11

Example: Generating Relational Algebra Πtitle σ StarsIn <condition> <tuple> <attribute> IN Πname σbirthdate LIKE %1960 starname MovieStar Himanshu Gupta CSE 532 Query Proc. 12

Example: Logical Query Plan Πtitle σstarname=name StarsIn Πname σbirthdate LIKE %1960 MovieStar Himanshu Gupta CSE 532 Query Proc. 13

Example: Improved Logical Query Plan Πtitle starname=name Question: Push project to StarsIn? StarsIn Πname σbirthdate LIKE %1960 MovieStar Himanshu Gupta CSE 532 Query Proc. 14

Example: Estimate Result Sizes Need expected size StarsIn Π σ MovieStar Himanshu Gupta CSE 532 Query Proc. 15

Example: One Physical Plan Hash join Parameters: join order, memory size,... SEQ scan index scan Parameters: Select Condition,... StarsIn MovieStar Himanshu Gupta CSE 532 Query Proc. 16

Example: Estimate costs L.Q.P P1 P2. Pn C1 C2. Cn Pick best! Himanshu Gupta CSE 532 Query Proc. 17

SQL query parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes More-Improved l.q.p Query Processing statistics {(P1,C1),(P2,C2)...} answer execute Pi pick best estimate costs Enumerate physical plans {P1,P2,..} Himanshu Gupta CSE 532 Query Proc. 18

Rules: Joins, products, union R S = S R (R S) T = R (S T) R x S = S x R (R x S) x T = R x (S x T) R U S = S U R R U (S U T) = (R U S) U T Himanshu Gupta CSE 532 Query Proc. 19

Rules: Selects σ p1 p2 (R) = σ p1vp2 (R) = σ p1 [ σ p2 (R)] [ σ p1 (R)] U [ σ p2 (R)] Himanshu Gupta CSE 532 Query Proc. 20

Rules: σ + combined Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with only R,S attribs σ p (R S) = [σ p (R)] S σ q (R S) = R [σ q (S)] Himanshu Gupta CSE 532 Query Proc. 21

Derived Rules: σ + combined σp q (R S) = [σp (R)] [σq (S)] σp q m (R S) = σm [(σp R) (σq S)] σpvq (R S) = [(σp R) S] U [R (σq S)] Himanshu Gupta CSE 532 Query Proc. 22

Rules: π, σ combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) πxz πx[σp (R) ] = πx {σp [ πx (R) ]} Himanshu Gupta CSE 532 Query Proc. 23

Rules: σ, U combined σp(r U S) = σp(r) U σp(s) σp(r - S) = σp(r) - S = σp(r) - σp(s) Himanshu Gupta CSE 532 Query Proc. 24

Which are good transformations? σp1 p2 (R) σp1 [σp2 (R)] σp (R S) [σp (R)] S R S S R πx [σp (R)] πx {σp [πxz (R)]} Himanshu Gupta CSE 532 Query Proc. 25

Bottom line: No transformation is always good Usually good: early selections Himanshu Gupta CSE 532 Query Proc. 26

SQL query parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes More-Improved l.q.p Query Processing statistics {(P1,C1),(P2,C2)...} answer execute Pi pick best estimate costs Enumerate physical plans {P1,P2,..} Himanshu Gupta CSE 532 Query Proc. 27

Estimating result size Keep statistics for relation R T(R) : # tuples in R S(R) : # bytes in each R tuple B(R): # blocks to hold all R tuples V(R, A) : # distinct values in R for attribute A Himanshu Gupta CSE 532 Query Proc. 28

Example R A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d A: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string T(R) = 5 S(R) = 37 V(R,A) = 3 V(R,C) = 5 V(R,B) = 1 V(R,D) = 4 Himanshu Gupta CSE 532 Query Proc. 29

Size estimates for W = R1 x R2 T(W) = S(W) = T(R1) T(R2) S(R1) + S(R2) Himanshu Gupta CSE 532 Query Proc. 30

Size estimate for W = σa=a (R) S(W) = S(R) T(W) =? Himanshu Gupta CSE 532 Query Proc. 31

Example A B C D R V(R,A)=3 cat 1 10 a cat 1 20 b V(R,B)=1 dog 1 30 a V(R,C)=5 dog 1 40 c V(R,D)=4 bat 1 50 d W = σz=val(r); T(W) = T(R) V(R,Z) (under assumption of uniform distribution of values of Z over V(R,Z)) Himanshu Gupta CSE 532 Query Proc. 32

W = σ z val (R ). T(W)? R Z Min=1 V(R,Z)=10 W= σ z 15 (R) Max=20 f = 20-15+1 = 6 20 20 (fraction of range) T(W) = f T(R) Himanshu Gupta CSE 532 Query Proc. 33

Size estimate for W = R1 R2 Let x = attributes of R1 y = attributes of R2 Case 1 X Y = Same as R1 x R2 Himanshu Gupta CSE 532 Query Proc. 34

Case 2 W = R1 R2 X Y = A R1 A B C R2 A D Assumption (containment of values): V(R1,A) V(R2,A) Every A value in R1 is in R2 V(R2,A) V(R1,A) Every A value in R2 is in R1 Himanshu Gupta CSE 532 Query Proc. 35

Computing T(W) when V(R1,A) V(R2,A) R1 A B C R2 A D Take 1 tuple Match 1 tuple matches with T(R2) tuples... V(R2,A) so T(W) = T(R2) T(R1) V(R2, A) Himanshu Gupta CSE 532 Query Proc. 36

V(R1,A) V(R2,A) T(W) = T(R2) T(R1) V(R2,A) V(R2,A) V(R1,A) T(W) = T(R2) T(R1) V(R1,A) [A is common attribute] Himanshu Gupta CSE 532 Query Proc. 37

In general, W = R1 R2 T(W) = T(R2) T(R1) max{ V(R1,A), V(R2,A) } Himanshu Gupta CSE 532 Query Proc. 38

In all cases: S(W) = S(R1) + S(R2) - S(A) size of attribute A Himanshu Gupta CSE 532 Query Proc. 39

Similarly, we can estimate sizes of: ΠAB (R); σa=a B=b (R); R S with common attributes. A,B,C; Union, intersection, diff,. Sec. 16.4.7 Himanshu Gupta CSE 532 Query Proc. 40

Note: for complex expressions, need intermediate T,S,V results. E.g. W = [σa=a (R1) ] R2 Treat as relation U T(U) = T(R1)/V(R1,A) S(U) = S(R1) V(U,*) =?? (Needed for T(W)) Himanshu Gupta CSE 532 Query Proc. 41

To estimate Vs E.g., U = σa=a (R1) Say R1 has attributes A,B,C,D V(U, A) = 1 V(U, B) = V(R1, B) V(U, C) = V(R1,C) V(U, D) = V(R1,D) (exact) (guess) (guess) (guess) Himanshu Gupta CSE 532 Query Proc. 42

For Joins U = R1(A,B) R2(A,C) V(U,A) = min { V(R1, A), V(R2, A) } V(U,B) = V(R1, B) V(U,C) = V(R2, C) [called preservation of value sets assumption] Himanshu Gupta CSE 532 Query Proc. 43

Example: Z = R1(A,B) R2(B,C) R3(C,D) R1 R2 R3 T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 T(R3) = 3000 V(R3,C)=90 V(R3,D)=500 Himanshu Gupta CSE 532 Query Proc. 44

Partial Result: U = R S T(U) = 1000 2000 V(U,A) = 50 200 V(U,B) = 100 V(U,C) = 300 Himanshu Gupta CSE 532 Query Proc. 45

Z = U R3 T(Z) = 1000 2000 3000 V(Z,A) = 50 200 300 V(Z,B) = 100 V(Z,C) = 90 V(Z,D) = 500 Himanshu Gupta CSE 532 Query Proc. 46

Recap Estimating size of results is an art Computing Statistics: Scan, Sample. Also, statistics must be kept up to date Other Statistics: Histograms (equi-height, equi-width, most-freq.) Next: Using size estimates to improve an LQP. Himanshu Gupta CSE 532 Query Proc. 47

Improve LPQ with Size Size estimates can also help improve the LQP. But, need some heuristic to estimate costs. Good heuristic: Cost = Sum of intermediate-result sizes. Next Slide: Example. Himanshu Gupta CSE 532 Query Proc. 48

Ex: Improve LQP with Sizes R(a,b): T(R)=5000, V(R,a)=50, V(R,b)=100 S(b,c): T(S) = 2000, V(S,b)=200, V(S,c)=100. δ(σ_p(r JOIN S)). Where p is (R.a=20). Two choices: 1. δ(σ_p(r)) JOIN (δ (S)) 2. δ(σ_p(r) JOIN S) Cost Model = Sum of the sizes of intermediate results. [1100 vs 1150] Himanshu Gupta CSE 532 Query Proc. 49

SQL query parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes More-Improved l.q.p Query Processing statistics {(P1,C1),(P2,C2)...} answer execute Pi pick best estimate costs Enumerate physical plans {P1,P2,..} Himanshu Gupta CSE 532 Query Proc. 50

SQL query parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes More-Improved l.q.p statistics Chapter 15 {(P1,C1),(P2,C2)...} answer execute Pi pick best estimate costs Enumerate physical plans {P1,P2,..} Himanshu Gupta CSE 532 Query Proc. 51

SQL query parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes More-Improved l.q.p Done (previous slides) statistics {(P1,C1),(P2,C2)...} Enumerate physical plans {P1,P2,..} answer execute Pi pick best estimate costs Himanshu Gupta CSE 532 Query Proc. 52

SQL query parse parse tree convert logical query plan apply laws improved l.q.p estimate result sizes l.q.p. +sizes More-Improved l.q.p Next few slides statistics {(P1,C1),(P2,C2)...} Enumerate physical plans {P1,P2,..} answer execute Pi pick best estimate costs Himanshu Gupta CSE 532 Query Proc. 53

LQP vs. PQP LQP is a high-level expression-tree. PQP is (LQP + more execution details such as): Order/grouping of join, union, intersection, etc. Choice of join algorithm, etc. Additional operators such as scanning, sorting, etc. Intermediate steps between two operations. E.g., sorting, storage, pipelining, etc. Himanshu Gupta CSE 532 -Query Opt-54

LQP à PQP Generating and comparing physical plans Query Generate Plans Pruning x x Estimate Cost Select Cost Pick Min Himanshu Gupta CSE 532 -Query Opt-55

PQP Enumeration Heuristics 1. Various techniques (details skipped): Top-down Bottom-up Heuristics Branch and Bound Hill Climbing Dynamic Programming Selinger-style Optimization (Improved DP) Himanshu Gupta CSE 532 Query Proc. 56

Join Ordering R1 x R2 x R3 x.. Rn (join of n tables). How many possible orderings? Join-selectivity. Left-deep, Right-deep trees. Dynamic Programming approach. Himanshu Gupta CSE 532 Query Proc. 57

PQP Selection Example Pipelining vs. Materialization (R join S) join U 5000, 10000, 10000 blocks respectively. (R join S) is of size k (we ll consider different k) We ll only use hash-joins Memory size: 101 blocks. Himanshu Gupta CSE 532 Query Proc. 58

(R join S) join U. R join S 100 buckets at most. Example (contd) R-buckets have 50 tuples each. So, need 51 buffers for the second pass. Pipeline: Use remaining 50 buffers to join the result with U. If k < 49, then keep (R join S) in memory, and read U one block at a time. Himanshu Gupta CSE 532 Query Proc. 59

Example (continued) If k < 49, then read U one block at a time, and do everything in main memory. Cost = 55k. If k > 49, but < 5000: First, hash all tables. Hash U with 50 buckets. When doing R join S, use the remaining 50 buffers to hash (R join S) result into the 50 buckets. Each bucket of (R join S) is of size 100. Then, do two-pass hash join of (R join S) with U. Cost = 50000 + 15000 + (k + (10000 + k)) Himanshu Gupta CSE 532 Query Proc. 60

If k > 5000: Example contd. Last step will require a 3-pass join, since each bucket of (R join S) as well as U is more than 100 blocks (buckets of U are 200 blocks each). Better: No pipelining. Write (R join S). Then, do 2-pass hash-join with U (now the buckets of U are 100 blocks each). Cost = 45000 + k + 3(10000 + k) = 75000 + 4k Himanshu Gupta CSE 532 Query Proc. 61