Datenbanksysteme II: Query Execution. Ulf Leser

Similar documents
Datenbanksysteme II: Implementation of Database Systems Query Execution

Chapter 13: Query Processing. Basic Steps in Query Processing

SQL Query Evaluation. Winter Lecture 23

Query Processing. Q Query Plan. Example: Select B,D From R,S Where R.A = c S.E = 2 R.C=S.C. Himanshu Gupta CSE 532 Query Proc. 1

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

Query Processing C H A P T E R12. Practice Exercises

Oracle Database 11g: SQL Tuning Workshop

Inside the PostgreSQL Query Optimizer

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

Translating SQL into the Relational Algebra

Evaluation of Expressions

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Chapter 13: Query Optimization

Oracle Database 11g: SQL Tuning Workshop Release 2

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Instant SQL Programming

Data Warehousing und Data Mining

Partitioning under the hood in MySQL 5.5

2) What is the structure of an organization? Explain how IT support at different organizational levels.

Chapter 14: Query Optimization

MOC 20461C: Querying Microsoft SQL Server. Course Overview

University of Aarhus. Databases IBM Corporation

Relational Database: Additional Operations on Relations; SQL

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Execution Strategies for SQL Subqueries

Physical Database Design and Tuning

Relational Databases

Performance Tuning for the Teradata Database

1. Physical Database Design in Relational Databases (1)

Oracle Database: SQL and PL/SQL Fundamentals

SQL Query Performance Tuning: Tips and Best Practices

Oracle Database: SQL and PL/SQL Fundamentals NEW

Database Programming with PL/SQL: Learning Objectives

Oracle Database: SQL and PL/SQL Fundamentals

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

Netezza SQL Class Outline

SQL Server Query Tuning

MapReduce examples. CSE 344 section 8 worksheet. May 19, 2011

COMP 5138 Relational Database Management Systems. Week 5 : Basic SQL. Today s Agenda. Overview. Basic SQL Queries. Joins Queries

Introducing Cost Based Optimizer to Apache Hive

Oracle SQL. Course Summary. Duration. Objectives

FHE DEFINITIVE GUIDE. ^phihri^^lv JEFFREY GARBUS. Joe Celko. Alvin Chang. PLAMEN ratchev JONES & BARTLETT LEARN IN G. y ti rvrrtuttnrr i t i r

DBMS / Business Intelligence, SQL Server

Oracle Database: SQL and PL/SQL Fundamentals NEW

Advanced Oracle SQL Tuning

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

1 Structured Query Language: Again. 2 Joining Tables

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

Lecture 6: Query optimization, query tuning. Rasmus Pagh

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods

CIS 631 Database Management Systems Sample Final Exam


Useful Business Analytics SQL operators and more Ajaykumar Gupte IBM

Physical DB design and tuning: outline

Oracle Database 10g: Introduction to SQL

Optimization of SQL Queries in Main-Memory Databases

Big Data Technology Pig: Query Language atop Map-Reduce

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

Oracle EXAM - 1Z Oracle Database 11g Release 2: SQL Tuning. Buy Full Product.

1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle

Unit Storage Structures 1. Storage Structures. Unit 4.3

a presentation by Kirk Paul Lafler SAS Consultant, Author, and Trainer

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

Oracle 10g PL/SQL Training

Optimizing Your Data Warehouse Design for Superior Performance

SQL: Queries, Programming, Triggers

Querying Microsoft SQL Server

CSCE-608 Database Systems COURSE PROJECT #2

Example Instances. SQL: Queries, Programming, Triggers. Conceptual Evaluation Strategy. Basic SQL Query. A Note on Range Variables

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server

Rethinking SIMD Vectorization for In-Memory Databases

CSE 562 Database Systems

MS SQL Performance (Tuning) Best Practices:

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

COS 318: Operating Systems

Big Data Processing with Google s MapReduce. Alexandru Costan

Chapter 5. SQL: Queries, Constraints, Triggers

Toad for Oracle 8.6 SQL Tuning

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

Data storage Tree indexes

Symbol Tables. Introduction

BIG DATA HANDS-ON WORKSHOP Data Manipulation with Hive and Pig

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

IT2305 Database Systems I (Compulsory)

Course ID#: W 35 Hrs. Course Content

SUBQUERIES AND VIEWS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 6

Parallel Data Preparation with the DS2 Programming Language

Querying Microsoft SQL Server 20461C; 5 days

In-Memory Data Management for Enterprise Applications

14:440:127 Introduction to Computers for Engineers. Notes for Lecture 06

Topics Advanced PL/SQL, Integration with PROIV SuperLayer and use within Glovia

Oracle Database 11g SQL

Big Data and Scripting map/reduce in Hadoop

Best Practices for DB2 on z/os Performance

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2012/13

Transcription:

Datenbanksysteme II: Query Execution Ulf Leser

Content of this Lecture Relational operations Physical query plan operators Implementing (some) relational operators Execution models Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 2

5 Layer Architecture Data Model We are here Logical Access Data Structures Buffer Management Operating System Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 3

Query Execution We have Structured Query Language SQL Relational algebra How to access tuples in many ways (scan, index, ) Now Given a SQL query Find a fast way and order of accessing tuples such that the answer to the query is computed Usually, we won t find the best way, but avoid the worst Use knowledge about value distributions, access paths, query operations, IO cost, Compile a declarative query in an optimal executable program Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 4

Steps (Sketch) Translate query in a relational algebra expression Logical rewriting: Each expression can be rewritten in many other, semantically equivalent expressions Physical optimization: For each relational operation, we have multiple possible implementations Table access: scan, indexes, sorted access through index, Joins: Nested loop, sort-merge, hash, Query execution: Execute the best query plan found Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 5

SQL query parse convert estimate result sizes consider physical plans Exemplary Workflow parse tree logical query plan improved l.q.p l.q.p. +sizes statistics / adaptation {P1,P2,..} execute pick best {(P1,C1),(P2,C2)...} answer Pi estimate costs Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 6

Example SQL query SELECT title FROM starsin WHERE starname IN ( SELECT name FROM moviestar WHERE birthdate LIKE %1960 ); (Find all movies with stars born in 1960) Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 7

Parse Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starname <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthdate %1960 Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 8

Logical Query Plan Πtitle σstarname=name StarsIn Πname σbirthdate LIKE %1960 MovieStar Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 9

Improved Logical Query Plan Πtitle σstarname=name Πtitle starname=name Question: Push projection to StarsIn? StarsIn Πname StarsIn Πname σbirthdate LIKE %1960 σbirthdate LIKE %1960 MovieStar MovieStar Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 10

Physical Plan Hash join Parameters: Join order, selectivity, memory size, size of attributes, sequential scan StarsIn index scan MovieStar Parameters: Selectivity, fragmentation of data file, size of tuples,,... Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 11

Content of this Lecture Relational operations Physical query plan operators Implementing (some) relational operators Execution models Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 12

Relational Operations: One Table In the following: Table means table or intermediate result Selection σ Read table and filter tuples based on condition Possibility: Use index to access only the qualifying tuples Specified in WHERE clause of SQL query Projection π Read table and manipulate columns (usually remove columns) In SET semantic, also duplicates must be filtered Possibility: Push projections down to base tables Projection usually decreases size of table When not? Specified in SELECT clause of SQL query Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 13

Relational Operations: One Table cont d Grouping Read table and put all tuples with equal values in all grouping attributes into bag; output one tuple per bag by aggregating values Possibility: Reuse order in table SQL GROUP-BY clause Duplicate elimination Read table and remove all duplicate tuples Possibility: Reuse order in table Specified in SELECT clause Sorting Not an operation in relational algebra, but important in practice Possibility: Reuse order in table SQL ORDER-BY clause Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 14

Relational Operations: Two Tables Cartesian product x Read two tables and build all combinations of tuples Usually avoided combine product and selection to join Products in a plan are hints to wrong queries Specified implicitly by FROM clause Derived operation: Join Read two tables and combine matching tuples Natural join, theta join, equi join, semi join, outer join Possibility: Join-order; nested-loop join, sort-merge join, hash join, index join, Specified in WHERE clause or in FROM clause Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 15

Relational Operations: Two Queries Union Read two tables and build union of all tuples Proper SQL command for two queries Intersection Read two tables and build intersection of tuples Same as join over all attributes SQL command for two queries Minus Subtract tuples of one table from tuples from the other Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 16

Content of this Lecture Relational operations Physical query plan operators Implementing (some) relational operators Execution models Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 17

Select versus Update We do not discuss update, delete, insert Update and delete have queries normal optimization But: data tuples must be loaded (and locked and changed and written) Some tricks don t work any more (e.g. oversized index) Insert may have query Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 18

Implementing Operations Most single table operations are rather straight-forward See book by Garcia-Molina, Ullmann, Widom for detailed discussion Joins are more complicated later Sorting, especially for large tables, is important External sorting we have seen Merge-Sort We sketch three single table operations Scanning a table Duplicate elimination Group By Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 19

Scanning a Table At the bottom of each operator tree are relations Accessing them implies a table scan If table T has b blocks, this costs b IO Often better: Combine with next operation in plan SELECT t.a, t.b FROM t WHERE A=5 Selection: If index on T.A available, perform index scan Assume T =n, A =a different values, z=n/a tuples with T.A=a Index has height log k (n) Accessing z tuples from T costs (worst-case) z IO Especially effective if A is a key: Only one tuple selected Projection: Integrate into table scan Read tuples, but only pass-on attributes that are needed Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 20

Scanning a Table 2 Conditions can be complex SELECT t.a, t.b FROM t WHERE A=5 AND (B<4 OR B>9) AND C= müller Approach Compute conjunctive normal form With indexes: Compute TID lists for each conjunct and intersect Alternatives? Without indexes: Scan table and evaluate condition for each tuple For complex conditions and small tables, linear scanning might be faster Depends on expected result size Cost-based optimization required Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 21

Duplicate Elimination Option 1: Sorting Sort input table (or intermediate result) on DISTINCT columns Can be skipped if table is already sorted Scan sorted table and output only unique tuples Generates output in sorted order Pipeline breaker (see later) Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 22

Duplicate Elimination Option 2: Use internal sorting/hashing Scan input table and build internal data structure holding each unique tuple once Binary tree some cost for balancing, robust Hash table might be faster, needs good hash function When reading a tuple, check if it has already been seen If not: insert tuple and copy it to the output; else: skip tuple No pipeline breaker Does not sort result (but existing sorting would remain) Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 23

Performance Assumptions Main memory: m blocks Table: b blocks Using external sorting If table is sorted, we need b IO If table not sorted, we need 2*b*ceiling(log m (b))-b IO How? Using internal sorting If all distinct values fit into m, we need b IO Estimate from statistics Otherwise use two pass algorithms (e.g. hash-join like; later) Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 24

Grouping and Aggregation SELECT T.day_id, sum(amount*price) FROM sales S GROUP BY T.day_id SELECT must contain only GROUP BY attributes and aggregate functions Partition result of inner query by GROUP BY attributes For each partition, compute one result tuple: GROUP BY attributes and aggregate function applied on all values of other attributes in this partition Note: Depending on the aggregate function, we might need to buffer more than one value per partition examples? Inner query Partition Aggregate HAVING clause Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 25

Implementing GROUP BY Proceed like duplicate elimination But we also need to compute the aggregated columns No problem: SUM, COUNT, MIN, MAX, ANY What to do for AVG? Top-5? What to do for Median? Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 26

Computing Median Option 1: Partition table into k partitions Scan table Build (hash) table for first k different GROUP BY values When reading one of first k, add value to (sorted) list When reading other GROUP value, discard When scan finished, output median of k groups Iterate next k groups Option 2: Sort table on GROUP BY and Median attribute Then scan sorted data Buffer all values per group When next group is reached, output middle value What if we cannot buffer all values of a group? Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 27

Content of this Lecture Relational operations Physical query plan operators Implementing (some) relational operators Execution models Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 28

Query Execution Operator implementations need to call each other to pass tuples up the tree Iterator concept: Open, next, close Each operator implementation needs these three methods Two modes Blocked Pipelined Work usually done in open (if blocking) or in next (if pipelined) Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 29

Example Blocked (Sketch) Πtitle starname=name StarsIn Πname σbirthdate LIKE %1960 MovieStar p = projection.open(); while p.next(t) output t.title; p.close(); class projection { open() { j = join.open(); while j.next(t) tmp[i++]=t; j.close(); cnt:=0; } next( t) { if (cnt<max) t = tmp[cnt++]; return true; else return false; } close() { discard( tmp); } } class join { open() { l = table.open(); while l.next(tl) r = projection.open() while r.next(tr) if tl.starname=tr.name tmp[i++]=tl tr; r.close(); end while; l.close(); cnt:=0; } next( t) { if (cnt<max) t = tmp[cnt++]; return true; else return false; } close() { discard( tmp); Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 30

Example Pipelined (Sketch) Πtitle starname=name StarsIn Πname σbirthdate LIKE %1960 MovieStar p = projection.open(); while p.next(t) output t.title; p.close(); class projection { open() { j = join.open(); } next(t) { return j.next( t); } close() { j.close(); } } class join { open() { l = table.open(); r = projection.open() l.next( tl); } next( t) { if r.next(tr) if tl.starname=tr.name t=tl tr; return true; else if l.next( tl) r.reset(); return next( t); else return false; } close() { l.close(); r.close(); Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 31

Pipelined versus Blocked Pipelining is in general advantageous Very little demand for buffering When intermediate results are large, buffers need to be stored on disk Operations can be distributed to different threads or CPUs Results come early and continuously Pipeline breaker Some operations cannot be pipelined Sorting: next() can be executed only after entire table was read Exception: When input is sorted Grouping and aggregation Depending on implementation Minus, intersection... R......... S...... T Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 32

Pipelined versus Blocked Projection with duplicate elimination Need not be a pipeline breaker Recall implementation without sorting next() can return early But we need to keep track of all values already returned requires large buffer Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 33

Bag and Set Semantic Relational algebra has SET semantic All relations are duplicate-free Result of each query is duplicate-free Result of each intermediate result is duplicate-free SQL databases use BAG semantic More practical in applications PKs are used to prevent existence of real duplicates But: Duplicate elimination remains an important task Explicit DISTINCT clause What else? Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 34