QUERY OPTIMIZATION. CS 564- Fall ACKs: Jeff Naughton, Jignesh Patel, AnHai Doan

Similar documents
Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

Chapter 13: Query Processing. Basic Steps in Query Processing

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

Evaluation of Expressions

Lecture 6: Query optimization, query tuning. Rasmus Pagh

Advanced Oracle SQL Tuning

SQL Query Evaluation. Winter Lecture 23

Computer Security: Principles and Practice

Inside the PostgreSQL Query Optimizer

Access Path Selection in a Relational Database Management System

Ch.5 Database Security. Ch.5 Database Security Review

Chapter 5: Overview of Query Processing

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : Hertford College Supervisor: Dr.

There are five fields or columns, with names and types as shown above.

From Databases to Natural Language: The Unusual Direction

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Oracle Database 11g: SQL Tuning Workshop

Query Processing C H A P T E R12. Practice Exercises

Overview of Storage and Indexing

Chapter 14: Query Optimization

B2.2-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Assignment 3: SQL Queries. Questions 1. The following relations keep track of airline flight information:

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

University of Wisconsin. Imagine yourself standing in front of an exquisite buæet ælled with numerous delicacies.

Data Models and Database Management Systems (DBMSs) Dr. Philip Cannata

Introduction to tuple calculus Tore Risch

MapReduce examples. CSE 344 section 8 worksheet. May 19, 2011

Relational Algebra. Query Languages Review. Operators. Select (σ), Project (π), Union ( ), Difference (-), Join: Natural (*) and Theta ( )

System R: Relational Approach to Database Management

Chapter 13: Query Optimization

University of Aarhus. Databases IBM Corporation

Oracle Database 11g: SQL Tuning Workshop Release 2

Run$me Query Op$miza$on

APP INVENTOR. Test Review

Query Optimization in a Memory-Resident Domain Relational Calculus Database System

SQL: QUERIES, CONSTRAINTS, TRIGGERS

DATABASE DESIGN - 1DL400

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Index Selection Techniques in Data Warehouse Systems

Query Processing. Q Query Plan. Example: Select B,D From R,S Where R.A = c S.E = 2 R.C=S.C. Himanshu Gupta CSE 532 Query Proc. 1

Physical Database Design and Tuning

Topics in basic DBMS course

Pattern Insight Clone Detection

1. Physical Database Design in Relational Databases (1)

Architectures for Big Data Analytics A database perspective

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

Introduction to Parallel Programming and MapReduce

In This Lecture. Physical Design. RAID Arrays. RAID Level 0. RAID Level 1. Physical DB Issues, Indexes, Query Optimisation. Physical DB Issues

Parquet. Columnar storage for the people

Access Path Selection in a Relational Database Management System. P. Griffiths Selinger M. M. Astrahan D. D. Chamberlin,: It. A. Lorie.

Distributed Databases in a Nutshell

Object-Relational Query Processing

Reducing the Braking Distance of an SQL Query Engine

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

Efficient Processing of Joins on Set-valued Attributes

Access Path Selection in a Relational Database Management System. P. Griffiths Selinger M. M. Astrahan D. D. Chamberlin,: It. A. Lorie.

Fallacies of the Cost Based Optimizer

In-Memory Database: Query Optimisation. S S Kausik ( ) Aamod Kore ( ) Mehul Goyal ( ) Nisheeth Lahoti ( )

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Analysis and Modeling of MapReduce s Performance on Hadoop YARN

Adaptive Virtual Partitioning for OLAP Query Processing in a Database Cluster

DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop

2 Associating Facts with Time

Cloud Computing at Google. Architecture

Big Data With Hadoop

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

The SQL Query Language. Creating Relations in SQL. Referential Integrity in SQL. Basic SQL Query. Primary and Candidate Keys in SQL

ASTERIX: An Open Source System for Big Data Management and Analysis (Demo) :: Presenter :: Yassmeen Abu Hasson

SQL Tuning Proven Methodologies

CSE 562 Database Systems

Rethinking SIMD Vectorization for In-Memory Databases

SQL Server Query Tuning

Physical DB design and tuning: outline

Relational Algebra 2. Chapter 5.2 V3.0. Napier University Dr Gordon Russell

Query Optimization Over Web Services Using A Mixed Approach

Fragmentation and Data Allocation in the Distributed Environments

Scheme G. Sample Test Paper-I

First, here are our three tables in a fictional sales application:

Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.

Aggregating Data Using Group Functions

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server

Recovering Business Rules from Legacy Source Code for System Modernization

Generating code for holistic query evaluation Konstantinos Krikellas 1, Stratis D. Viglas 2, Marcelo Cintra 3

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

MapReduce. from the paper. MapReduce: Simplified Data Processing on Large Clusters (2004)

Introduction to Database Systems. Chapter 1 Introduction. Chapter 1 Introduction

Graph Mining on Big Data System. Presented by Hefu Chai, Rui Zhang, Jian Fang

Storage in Database Systems. CMPSCI 445 Fall 2010

Tertiary Storage and Data Mining queries

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

High-performance XML Storage/Retrieval System

Outline. Distributed DBMS

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari

Efficient Grouping and Flexibility of a NiagaraCQ System

Transcription:

QUERY OPTIMIZATION CS 564- Fall 2016 ACKs: Jeff Naughton, Jignesh Patel, AnHai Doan

EXAMPLE QUERY EMP(ssn, ename, addr, sal, did) 10000 tuples, 1000 pages DEPT(did, dname, floor, mgr) 500 tuples, 50 pages SELECT DISTINCT ename FROM Emp E, Dept D WHERE E.did = D.did AND D.dname = Toy ; 2

EVALUATION PLAN (1) SELECT DISTINCT ename FROM Emp E, Dept D WHERE E.did = D.did AND D.dname = Toy ; π ename σ dname = Toy σ EMP.did = DEPT.did X EMP DEPT 3

EVALUATION PLAN (2) SELECT DISTINCT ename FROM Emp E, Dept D WHERE E.did = D.did AND D.dname = Toy ; π ename σ dname = Toy Nested Loop Join EMP DEPT 4

EVALUATION PLAN (3) SELECT DISTINCT ename FROM Emp E, Dept D WHERE E.did = D.did AND D.dname = Toy ; π ename σ dname = Toy buffer size B= 50 Sort Merge Join EMP DEPT 5

EVALUATION PLAN (4) SELECT DISTINCT ename FROM Emp E, Dept D WHERE E.did = D.did AND D.dname = Toy ; π ename Sort Merge Join buffer size B= 50 index on dname σ dname = Toy DEPT EMP 6

PIPELINED EVALUATION Instead of materializing the temporary relation to disk, we can instead pipeline to the next operator in memory By using pipelining we benefit from: no reading/writing to disk of the temporary relation overlapping execution of operators Pipelining is not always possible! 7

QUERY OPTIMIZATION The query optimizer 1. identifies candidate equivalent trees 2. for each tree it finds the best annotated version (using any available indexes): this is called a plan 3. chooses the best overall plan by estimating the cost of each plan 8

ARCHITECTURE OF AN OPTIMIZER query Query Parser parsed query Query Optimizer Plan generator Plan cost estimator System Catalog evaluation plan 9

QUERY OPTIMIZATION query plan: annotated Relational Algebra tree iterator interface: open() /getnext() /close() can be pipelined or materialized The optimizer must solve two main issues: What is the space of possible query plans? How can we estimate the cost of each plan? Ideally: best plan! Practically: avoid worst plans + look at a subset of all plans 10

COST ESTIMATION Estimating the cost of a query plan involves: estimating the cost of each operation in the plan depends on input cardinalities algorithm cost (we know this!) estimating the size of intermediate results we need statistics about input relations for selections and joins, we typically assume independence of predicates 11

COST ESTIMATION Statistics are stored in the system catalog: number of tuples (cardinality) size in pages # distinct keys (when there is an index on the attribute) range (for numeric values) The system catalog is updated periodically Commercial systems use additional statistics, which provide more accurate estimates: histograms wavelets 12

EVALUATION PLANS The space of possible query plans is typically huge and it is hard to navigate through The RA formalism provides us with mathematical rules that transform one RA expression to an equivalent one: for example push selections down reorder joins This way we can construct many equivalent alternative query plans 13

RA EQUIVALENCE (1) Commutativity of σ σ "# (σ "& (R)) σ "& (σ "# (R)) Cascading of σ σ "# " & ", (R) σ "# (σ "& ( σ ", (R))) Cascading of π π /# (R) π /# (π /& ( π /, (R) )) when a 1 a 134 We can evaluate selections in any order! 14

RA EQUIVALENCE (2) Commutativity of join R S S R Associativity of join R S T R (S T) We can reorder the computation of joins in any way (exponentially many orders)! 15

RA EQUIVALENCE (3) Selections + Projections σ 8 (π 9 (R)) π 9 (σ " (R)) (if the selection involves attributes that remain after projection) Selections + Joins σ 8 R S σ 8 (R) S (if the selection involves attributes only in R) We can push selections down the plan tree! 16

EVALUATION PLANS Single relation plan (no joins): file scan index scan(s): clustered or non-clustered more than one index may match predicates The optimizer chooses one with the least estimated cost We can also merge or pipeline selection and projection (and aggregate when there is no group by) 17

EVALUATION PLANS Multiple relation plan joins can be evaluated in any order selections can be combined into the join operator selections and projections can be pushed down the plan tree using the RA equivalence transformations 18

JOIN REORDERING Consider the following join: R S T U Most DBMSs consider left-deep join plans These allow for fully pipelined evaluation U T R S 19