In order to minimize the execution cost is possible: . minimize the size of intermediates results.

Similar documents
Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

SQL Query Evaluation. Winter Lecture 23

Chapter 13: Query Processing. Basic Steps in Query Processing

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

Introduction to tuple calculus Tore Risch

Inside the PostgreSQL Query Optimizer

DATABASE DESIGN - 1DL400

SQL Server Query Tuning

Query Processing C H A P T E R12. Practice Exercises

Chapter 1: Introduction. Database Management System (DBMS) University Database Example


Oracle Database 11g: SQL Tuning Workshop

D B M G Data Base and Data Mining Group of Politecnico di Torino

Chapter 1: Introduction

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur

Relational Databases

CIS 631 Database Management Systems Sample Final Exam

Data Warehousing und Data Mining

Symbol Tables. Introduction

Guide to SQL Programming: SQL:1999 and Oracle Rdb V7.1

Database Programming with PL/SQL: Learning Objectives

Efficiently Identifying Inclusion Dependencies in RDBMS

D B M G Data Base and Data Mining Group of Politecnico di Torino

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at

Rethinking SIMD Vectorization for In-Memory Databases

Database System Concepts

Performance Evaluation of Natural and Surrogate Key Database Architectures

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Chapter 9 Joining Data from Multiple Tables. Oracle 10g: SQL

Physical DB design and tuning: outline

Chapter 13: Query Optimization

Unit Storage Structures 1. Storage Structures. Unit 4.3

Relational Algebra. Query Languages Review. Operators. Select (σ), Project (π), Union ( ), Difference (-), Join: Natural (*) and Theta ( )

Oracle Database 11g: SQL Tuning Workshop Release 2

Evaluation of Expressions

Execution Strategies for SQL Subqueries

Physical Data Organization

DATA STRUCTURES USING C

Database Design Patterns. Winter Lecture 24

Performance Tuning for the Teradata Database

Evaluation of view maintenance with complex joins in a data warehouse environment (HS-IDA-MD )

Relational Foundations for Functorial Data Migration

Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?

Matrix Multiplication

Data exchange. L. Libkin 1 Data Integration and Exchange

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

Lecture 6: Query optimization, query tuning. Rasmus Pagh

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

Relational Algebra 2. Chapter 5.2 V3.0. Napier University Dr Gordon Russell

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

MapReduce for Data Warehouses

Chapter 14: Query Optimization

Access Path Selection in a Relational Database Management System

DATABASE MANAGEMENT SYSTEMS. Question Bank:

University of Wisconsin. Imagine yourself standing in front of an exquisite buæet ælled with numerous delicacies.

Lesson 8: Introduction to Databases E-R Data Modeling

Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

MapReduce examples. CSE 344 section 8 worksheet. May 19, 2011

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

Principles of Database Management Systems. Overview. Principles of Data Layout. Topic for today. "Executive Summary": here.

Data Structures for Databases

Parallel Processing of JOIN Queries in OGSA-DAI

Course -Oracle 10g SQL (Exam Code IZ0-047) Session number Module Topics 1 Retrieving Data Using the SQL SELECT Statement

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

Graham Kemp (telephone , room 6475 EDIT) The examiner will visit the exam room at 15:00 and 17:00.

2 Associating Facts with Time

C H A P T E R 1 Introducing Data Relationships, Techniques for Data Manipulation, and Access Methods

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

The SQL Query Language. Creating Relations in SQL. Referential Integrity in SQL. Basic SQL Query. Primary and Candidate Keys in SQL

AVOIDANCE OF CYCLICAL REFERENCE OF FOREIGN KEYS IN DATA MODELING USING THE ENTITY-RELATIONSHIP MODEL

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

Chapter 8. SQL-99: SchemaDefinition, Constraints, and Queries and Views

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

Objectives. Oracle SQL and SQL*PLus. Database Objects. What is a Sequence?

Best Practices for DB2 on z/os Performance

Distributed Databases in a Nutshell

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]

Review: Participation Constraints

ETL as a Necessity for Business Architectures

Chapter 5: Overview of Query Processing

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

a presentation by Kirk Paul Lafler SAS Consultant, Author, and Trainer

In-Memory Databases MemSQL

IT2305 Database Systems I (Compulsory)

Lecture 1: Data Storage & Index

University of Aarhus. Databases IBM Corporation

Wee Keong Ng. Web Data Management. A Warehouse Approach. With 106 Illustrations. Springer

Introducing Cost Based Optimizer to Apache Hive

Module: Software Instruction Scheduling Part I

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Active database systems. Triggers. Triggers. Active database systems.

Advanced Oracle SQL Tuning

TYPICAL QUESTIONS & ANSWERS

Database Systems. National Chiao Tung University Chun-Jen Tsai 05/30/2012

Useful Business Analytics SQL operators and more Ajaykumar Gupte IBM

Transcription:

Query Optimization The query optimizer is the most important block in DBMS: it generates the strategy called execution plan. This module also defines how reads are performed:. directly;. with indices. In order to minimize the execution cost is possible:. minimize the number of reads;. minimize the size of intermediates results. All possibles plans are evaluated statistically: to have better performances strategies are periodically update. Modules SQL QUERY ANALYSIS RELETIONAL ALGEBRA ALGEBRIC OPTIMIZER QUERY TREE COST BASED OPTIMIZATION 1

Execution modes If query is executed and the access program is immediatly generated is used compile & go mode; it is very useful when database and user interact frequently. When the access program is not immedialty generated, many alternatives can be evaluated in order to provide the best result: this is compile & store mode. Analysis Analysis block exploits:. lexical control;. syntactic control;. semantic control in order to translate query received in relational algebra. Algebraic Optimization This module generates query tree from relational algebra indipendently by data but, probably the cost is different. Most important properties are:. intermediates results are reduced before being stored in memory;. an expression for the application who reduced intermediates results is prepared. Transformations with examples In order to exploits examples let consider the two tables:. employ: (emp n, ename, dept n, salary);. department: (dept n, dname). Atomization of selection In general: In the example: σ F1 F 2 (T) σ F1 (σ F2 (T)) σ F2 (σ F1 (T)) σ name= rossi salary>1000() σ name (σ salary ()) σ name (σ salary ()) 2

Cascading projections In general: π x (T) = π x (π x,y (T)) In the example: π emp n () = π emp n (π emp n,ename ()) Anticipation of selection with respect to join In general: σ F (T 1 T 2 ) = T 1 (σ F (T 2 )) In the example, consider the join of: p Dep where: This operation: p : emp.dept n = dept.dept n σ dname=delen ( p Dep) = p (σ dname=delen (Dep)) Anticipation of projection with respect to join In general: In the example: π x (T 1 T 2 ) = π x ((π x,y (T 1 ) (π x,y (T 2 ))) π dname,ename ( p Dep) = π dname,ename ((π dept n,ename () p p (π dept n,dname (Dep))) Join derivation from Cartesian product In general: σ F (T 1 T 2 ) = T 1 F T 2 Distribution of selection with respect to union In general: σ F (T 1 T 2 ) = (σ F (T 1 )) (σ F (T 2 )) Notice that, in orderto have a positive result, two tables mustbeomogenous (same schema). Distribution of selection with respect to difference In general: σ F (T 1 T 2 ) = (σ F (T 1 )) (σ F (T 2 )) 3

Distribution of projection with respect to union In general: π x (T 1 T 2 ) = (π x (T 1 )) (π x (T 2 )) Distribution of projection with respect to difference In general: π x (T 1 T 2 ) = (π x (T 1 )) (π x (T 2 )) As selection, projection can have a distribution with respect to the difference too, only if x includes the primary key or a set of attributes with same properties of the primary key such as:. being unique;. being not null. Example Using same tables described previously consider the following SQL query: 1 SELECT DISTINCT Dname 2 FROM, 3 WHERE.dept_n=.dept_n 4 AND Salary > 1000; The canonical form is:.dept n=.dept n The first transformation is done separating the two conditions of selection: σ.dept n=.dept n The second transformation change the cartesian product in a join operation: 4

p where: p : emp.dept n = dept.dept n The third transformation is anticipate selection: p Last two tranformations are done in order to anticipate projection; first the projection is decomposed:,dept n p then projection is anticipate: p π dept n,dept n 5

Cost Based Optimization This module has to choose between all canonical query trees processed by the Algebraic Optimizer. It is based on two parts:. data profiles which are statistical information describing tables and expressions, for example number and size (in bytes) of tuples, number and size (in bytes too) of attributes; this information, located in data dictionary, is periodically updated and is called table profiles;. cost of access operations. Access Operators The function provided is select an option in the leaf node of query trees that optimizer has built. Full scan This approach is a sequential scan of the table: is very useful when there are no indices or when indices does not reduce the reading cost. Otherwise there are operations who can reduce the size of data to read. Predicate evaluation Access to table is realized through indices: the predicate must have a good selectivity expecially for unclustered indices otherwise the memory block does not fit in main memory. Is possible that, if a predicate has bad selectivity, the best operation is a full scan read. Conjunction of predicates The condition is: A i = v 1 A j = v 2 The predicate who has the best selectivity is first applied and then the second one is applied. It is also possible to build a composed index using both predicates: pointers are sorted on more fields, but complexity increases a lot. Disjunction of predicates The condition is: A i = v 1 A j = v 2 Indices, if all predicates are supported by an index, can be used otherwise is better a full scan because both attributes compare. 6

Join operation This operation is very critical because connect two tables is based on values and, typically all results (intermediates and final) are larger than initial tables. There are three different algorithms:. nested loop;. merge scan join;. hash join. Nested loop This algorithm distinguish tables in:. inner table;. outer table. Join operation is realized reading once the outer table and, for each tuple, read the entire inner table in order to find matches. This approach is also called brute force. Theefficiency is dueto the size of the innertable: if it is small or indexed the cost is not so expensive. For these reasons the inner table is chosen as the smaller of the two: the technique is not symmetric because there will be different cost changing options. Merge scan With this algorithm the selection of inner and outer table does not have effecto to the cost of execution: first of all both tables are sorted on the join attributes, then they are read in parallel and tuples present in both tables are generated on correspondig values. Matching tuples are located in a few countinous memory blocks and since there is only one reading this approach is faster than nested loop. The disadvantage is sorting required on both tables, but if the physical structure is one already sorted on join attributes the drawback is not relevant. Hash Join This is the fastest technique because perform hash function is fast than sort or read operations. The algorithm apply the same hash function to the join attribute in both tables so tuples witch have the same value will be put in the same memory block. It is important notice that in each bucket tuples must be ordered so a local sort is required. Group by The group by close can be performed in two ways:. sorted based by attributes, then aggregate on groups;. hashbasedwhenanhashfunctionisappliedonthegroupofattributes. 7

Execution plan selection To find the optimal execution plan the following dimensions are:. how data is read;. the execution order of all operations;. the technique in witch each operator is implemented.. how often sorts are performed. It is possible that if a join operation is performed with a merge scan the best choice is having the group by implemented with sort; another example is join donewith hash approach, so group by will be donewith hash method. The previous operation can support others later operation. The optimizer builds alternatives in a tree, where:. intermediate nodes makes decision on a variable;. leaf nodes are the final query execution plan. 8