Query Evaluation Techniques for Large Databases. GOETZ GRAEFE. Presented by Long Zhang, Dept. of Computer Science, UBC


Query Evaluation Techniques for Large Databases. GOETZ GRAEFE. Presented by Long Zhang, Dept. of Computer Science, UBC. zhlong73@cs.ubc.ca

Purpose: to survey efficient algorithms and the software architecture of query execution engines for executing complex queries over large databases.

Restrictions. Restrictive data definition and manipulation languages can make application development and maintenance unbearably cumbersome. Data volumes might be so large or complex that the real or perceived performance advantage of file systems is considered more important than all other criteria, e.g., the higher levels of abstraction and programmer productivity typically achieved with database management systems.

Topics: algorithms and their costs; sorting versus hashing; parallelism; resource allocation and scheduling; general performance-enhancing techniques.

Which parts are we concerned with? The DBMS stack: User Interface, Database Query Language, Query Optimizer, Query Execution Engine, Files and Indices, I/O Buffer, Disk.

Query Execution Engine. What is it? A collection of query execution operators and mechanisms for operator communication and synchronization. The query execution engine defines the space of possible plans that can be chosen by the query optimizer.

The SQL query is translated into a query tree in the logical algebra. The logical query tree is translated into a physical plan: the optimizer expands the search space and finds the best plan. The optimal physical plan is copied out of the optimizer's memory structure and sent to the query execution engine. The query execution engine executes the plan using relations in the database as input and produces the output.

Discussion. Fact: on the first page, the author states that DBMSs have not been used in many areas for two reasons. (1) Application development and maintenance are difficult: restrictive data definition and manipulation languages can make development cumbersome. (2) The data in those areas is so big that speed trumps all, and people would rather hand-code.

The questions are: Do you think that in today's world databases are popular for most applications? If no, go to (1); if yes, go to (2). (1) Why do you think databases aren't used more? Why don't you use them on your data? (2) How do database systems' needs for dealing with large amounts of data differ from those of other applications or systems, such as an OS? Should we depend only on databases? If yes, what should be changed in the architecture of databases or in data abstraction?

1. Architecture of Query Execution Engines. Focus on useful mechanisms for processing sets of items: records, tuples, entities, objects.

Logical algebra vs. physical algebra. Logical algebra, e.g., relational algebra, is more closely related to the data model and defines what queries can be expressed in the data model. Physical algebra is system-specific: for example, nested-loops join, merge-join, hash join, etc. Systems may implement the same data model and logical algebra with different physical algebras.

Logical algebra vs. physical algebra. Specific algorithms, and therefore cost functions, are associated only with physical operators, not with logical algebra operators. Mapping logical to physical operators is non-trivial: it involves algorithm choices. Logical and physical operators are not mapped one-to-one; some operators in the physical algebra may implement multiple logical operators, etc.

To sum up: query optimization is the mapping from logical to physical operations, and the query execution engine is the implementation of operations on physical representation types and of mechanisms for coordination and cooperation among multiple such operations in complex queries.

Implementation issues. Each operator provides three procedures: Open (prepare the operator for producing data), Next (produce one item), and Close (perform final housekeeping).

Iterators. Two important features of operators: they can be combined into arbitrarily complex evaluation plans, and any number of operators can schedule and execute each other in a single process without assistance from the underlying OS.

Observations. The entire query plan is executed within a single process. Operators produce one item at a time, on request. Items never wait in a temporary file or buffer (pipelining). Efficient in time-space-product memory cost. Iterators can schedule any shape of tree, including bushy trees. No operator is affected by the complexity of the whole plan.
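The open/next/close protocol described on the previous slides can be sketched as follows. This is a minimal illustration, not Graefe's implementation; the operator names (TableScan, Filter) and the in-memory "relation" are made up for the example.

```python
# Sketch of the open/next/close iterator protocol: each operator pulls
# one item at a time from its input, so the whole plan runs in one
# process with no temporary files (pipelining).

class TableScan:
    """Leaf operator: produces rows from an in-memory 'relation'."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None                    # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        pass

class Filter:
    """Applies a predicate, pulling one row at a time from its child."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate
    def open(self):
        self.child.open()
    def next(self):
        while True:
            row = self.child.next()
            if row is None:
                return None
            if self.predicate(row):
                return row
    def close(self):
        self.child.close()

def run(plan):
    """Drive a plan to completion: open, pull rows until None, close."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out

plan = Filter(TableScan([1, 5, 2, 8, 3]), lambda r: r > 2)
```

Because every operator exposes the same three procedures, operators compose into arbitrarily deep plans, and no operator needs to know the complexity of the plan around it.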

Sorting & Hashing. The purpose of many query-processing algorithms is to perform some kind of matching, i.e., bringing items that are alike together and performing some operation on them. There are two basic approaches used for this purpose: sorting and hashing. These are the basis for many join algorithms.

Design Issues. Sorting should be implemented as an iterator so that the sort module interfaces well with the other operators (e.g., file scan or merge-join). The input to the sort module must be an iterator, and sort uses the open, next, and close procedures to request its input; therefore, sort input can come from a scan or a complex query plan, and the sort operator can be inserted into a query plan at any place, or at several places.

Out of Memory. If the input is larger than main memory, the open-sort procedure creates sorted runs and merges them until only one final merge phase is left: dividing, then combining.

More on Sorting. For sorting large data sets there are two distinct sub-algorithms: one for sorting within main memory, and one for managing subsets of the data set on disk. For practical reasons, e.g., ensuring that a run fits into main memory, the disk-management algorithm typically uses physical dividing and logical combining (merging). A point of practical importance is the fan-in, or degree of merging, but this is a parameter rather than a defining algorithm property.
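The division into memory-sized runs and their subsequent merging can be sketched as follows. This is a toy illustration of the structure only: runs are kept as in-memory lists, whereas a real system would write each run to a file on disk.

```python
# Minimal external-sort sketch: physical division into memory-sized
# sorted runs (level 0), then logical combination by merging up to
# fan_in runs at a time until a single run remains.
import heapq

def create_runs(items, memory_size):
    """Level-0 runs: load a memory-sized chunk, sort it, emit it as a run."""
    return [sorted(items[i:i + memory_size])
            for i in range(0, len(items), memory_size)]

def merge_runs(runs, fan_in):
    """Repeatedly merge groups of fan_in runs until one run remains."""
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0] if runs else []

data = [9, 3, 7, 1, 8, 2, 6, 5, 4]
runs = create_runs(data, memory_size=3)   # three sorted runs of size 3
result = merge_runs(runs, fan_in=2)       # two merge levels
```

The fan-in is exactly the parameter the slide mentions: raising it reduces the number of merge levels but requires more input buffers in memory.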

Creating level-0 runs. In-memory sort, e.g., quicksort: each run has the size of the allocated memory. Replacement selection: uses a priority heap, which supports insert and remove-smallest operations; the main purpose is to generate the longest possible initial ordered sequences.
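Replacement selection can be sketched with a priority heap as follows. This is a simplified sketch, not the paper's algorithm verbatim: an input item smaller than the key just written cannot extend the current run, so it is tagged for the next run. With the numbers from the slide's example and a memory of three items, a single run longer than memory is produced.

```python
# Replacement-selection sketch: keys are held in a priority heap tagged
# with a run number. The smallest key of the current run is removed and
# written; a new input key smaller than it must wait for the next run.
import heapq

def replacement_selection(stream, memory_size):
    """Yield sorted runs; on random input, runs average ~2x memory_size."""
    heap = [(0, x) for x in stream[:memory_size]]   # (run_no, key) pairs
    heapq.heapify(heap)
    rest = iter(stream[memory_size:])
    runs, current, cur_run = [], [], 0
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no != cur_run:            # current run is finished
            runs.append(current)
            current, cur_run = [], run_no
        current.append(key)              # "write" key to the current run
        nxt = next(rest, None)
        if nxt is not None:
            # A key smaller than the one just written cannot join the
            # current run; defer it to the next run.
            heapq.heappush(heap, (run_no if nxt >= key else run_no + 1, nxt))
    if current:
        runs.append(current)
    return runs

runs = replacement_selection([12, 20, 10, 19, 31, 25, 21, 56, 40], 3)
```

This shows why replacement selection is attractive: fewer, longer initial runs mean fewer merge levels later.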

Replacement Selection Sort. [Diagram: data flows Input File -> Input Buffer -> RAM -> Output Buffer -> Output File; the input buffer, output buffer, and RAM are the same size.]

Replacement Selection Sort. [Worked example: 12 is in memory; the next inputs are 20 10 19 31 25 21 56 40. We do not need to care when a run grows larger than the memory.]

Merging. [Diagram: runs 1, 2, and 3 on disk are merged in memory into an output buffer.]

Merging. Scans are faster if read-ahead and write-behind are used. Using large cluster sizes for the run files is very beneficial. The number of runs W is typically not a power of the fan-in.

Merge as many runs as possible instead of always merging runs from the same level.

Quicksort vs. replacement selection. Run files in RS are typically larger than memory, as opposed to QS, where they are the size of the memory. QS results in bursts of reads and writes of entire memory loads from the input file to the initial run files, while RS alternates between individual reads and writes. In RS, memory management is more complex. The advantage of having fewer runs must be balanced against the different I/O pattern and the disadvantage of more complex memory management.

2.2 Hashing. An alternative to sorting: the expected complexity of hashing algorithms is O(N), rather than O(N log N) as for sorting. Hash-based query-processing algorithms use an in-memory hash table of database objects to perform their matching task.

Hash Table Overflow. When the hash table is larger than memory, hash table overflow occurs and must be dealt with. The input is divided into multiple partition files such that the partitions can be processed independently of one another. The concatenation of the results of all partitions is the result of the entire operation.
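The overflow scheme above can be sketched with duplicate removal as the matching task. This is an illustrative toy, not the paper's algorithm: partition "files" are plain lists, keys are assumed to be non-negative integers, and each recursion level partitions on a different bit field of the key so that recursive partitioning always makes progress.

```python
# Sketch of hash-table overflow handling: if the input fits in "memory",
# match it with one in-memory hash table; otherwise partition it, solve
# each partition independently, and concatenate the results.

NUM_PARTITIONS = 4

def partition(items, level):
    """Divide the input so each partition can be processed independently."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for x in items:
        parts[(x >> (2 * level)) % NUM_PARTITIONS].append(x)
    return parts

def distinct(items, memory_limit=4, level=0):
    if len(items) <= memory_limit:       # fits: one in-memory hash table
        seen, out = set(), []
        for x in items:
            if x not in seen:
                seen.add(x)
                out.append(x)
        return out
    # Overflow: partition, recurse on each partition, concatenate.
    result = []
    for part in partition(items, level):
        result.extend(distinct(part, memory_limit, level + 1))
    return result

unique = distinct([3, 1, 3, 2, 5, 1, 2, 5, 3, 7])
```

Because equal keys always hash to the same partition, no matches are lost by processing partitions independently, which is exactly why concatenating the per-partition results is correct.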

Discussion 2. Fact 1: nowadays, data is much larger than before. Fact 2: memory overflow was a serious problem for both sorting and hashing at the time the paper was written. Fact 3: machines are more powerful than at the time of this paper, e.g., they have larger memories.

The question is: for query execution in large database systems, do you think that in the time since then the issues have gotten better (because memories have gotten larger) or worse (because there is a bigger gap between the time it takes to access memory and the time it takes to access things on disk)? Do you think the increasing power of machines can handle large data processing, or is it still worth making a great effort in this field, as this paper did? Are there any other new issues worth considering?

3. Disk Access: file scans; associative access using indices; buffer management.

3.1 File Scans. To make them fast, use read-ahead, which requires contiguous file allocation.

Associative Access Using Indices Goal: To reduce the number of accesses to secondary storage How? By employing associative search techniques in the form of indices Indices map key or attribute values to locator information with which database objects can be retrieved.

Some Index Structures. Clustered & un-clustered. Clustered: the order or organization of index entries determines the order of items on disk. Sparse & dense. Sparse: the index does not contain an entry for each data item in the primary file, but only one entry for each page of the primary file. Dense: there are the same number of entries in the index as there are items in the primary file. Non-clustering indices must always be dense.
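A sparse index only works when the file itself is sorted (clustered), which is why non-clustering indices must be dense. A minimal sketch, with made-up pages and keys:

```python
# Sketch of a sparse index over a sorted, clustered file: one index
# entry per page, holding the page's lowest key. A lookup binary-searches
# the index, then scans only the one page that can contain the key.
import bisect

file_pages = [[2, 5, 7, 9], [12, 15, 18, 20], [23, 27, 31, 35]]
sparse_index = [page[0] for page in file_pages]   # lowest key of each page

def lookup(key):
    """Return key if present in the file, else None."""
    i = bisect.bisect_right(sparse_index, key) - 1
    if i < 0:
        return None                  # key precedes the first page
    return key if key in file_pages[i] else None
```

The index here has 3 entries for 12 items; a dense index would need all 12, but would also work on an unsorted file, since it points at every item individually.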

Buffer Management. Goal: reduce I/O cost by caching data in an I/O buffer. Design issues: recovery; replacement policy; performance effects of buffer allocation; interactions of index retrieval and buffer management. Implementation issues: the interface provided (fixing and unfixing pages); intermediate results kept in a separate buffer.

5. Binary Matching Operations. A number of database operations combine information from two inputs, files, or sets. A group of operators that all perform basically the same task is called the one-to-one match operations.

5.1 Nested-Loops Join. For each item in one input (called the outer input), scan the entire other input (called the inner input) and find matches. A temporary file is used to store the inner input. Shortcoming: the inner input is scanned very often.

Improvements for Nested-Loops. If a single match carries all necessary information (semi-join), the scan of the inner input can be terminated after the first match. Instead of scanning the inner input for each item, scan it once for each page (block nested-loops join), trading more in-memory computation for fewer inner scans. Scan the inner input alternately forward and backward, reusing the last page.
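The block nested-loops idea can be sketched as follows. This is a minimal illustration with made-up relations; "blocks" are list slices standing in for memory loads of the outer input.

```python
# Block nested-loops join sketch: load a memory-sized block of the outer
# input, then scan the inner input once per block instead of once per
# outer item, cutting the number of inner scans by the block size.

def block_nested_loops_join(outer, inner, outer_key, inner_key, block_size):
    result = []
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]   # one "memory load"
        for inner_row in inner:                   # one inner scan per block
            for outer_row in block:               # in-memory comparisons
                if outer_key(outer_row) == inner_key(inner_row):
                    result.append((outer_row, inner_row))
    return result

emps = [("alice", 10), ("bob", 20), ("carol", 10)]
depts = [(10, "db"), (20, "os")]
joined = block_nested_loops_join(emps, depts,
                                 outer_key=lambda e: e[1],
                                 inner_key=lambda d: d[0],
                                 block_size=2)
```

With a block size of B outer items, the inner input is scanned len(outer)/B times instead of len(outer) times, which is the whole point of the improvement.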

5.2 Merge-Join Algorithms. Require that both inputs be sorted on the join attribute; similar to the merge process used in sorting. Heap-filter merge-join: a combination of nested-loops join and merge-join. Hybrid join (used by IBM in DB2): uses elements from index nested-loops join and merge-join, plus techniques for joining sorted lists of index leaf entries.
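The basic merge step can be sketched as follows. This is a simplified sketch assuming the join keys are unique in both inputs; a real merge-join needs mark/restore logic to handle duplicate keys on both sides.

```python
# Merge-join sketch for inputs sorted on the join key: walk both inputs
# in step, always advancing the side with the smaller key, emitting a
# pair on equality. Assumes unique join keys in both inputs.

def merge_join(left, right, left_key, right_key):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = left_key(left[i]), right_key(right[j])
        if kl < kr:
            i += 1                 # left key too small: advance left
        elif kl > kr:
            j += 1                 # right key too small: advance right
        else:                      # match: emit, advance both (unique keys)
            result.append((left[i], right[j]))
            i += 1
            j += 1
    return result

left = [(1, "a"), (3, "b"), (5, "c")]     # both sorted on the first field
right = [(3, "x"), (4, "y"), (5, "z")]
joined = merge_join(left, right, lambda r: r[0], lambda r: r[0])
```

Each input is read exactly once, which is why merge-join is attractive when sorted inputs are already available, e.g., from indices or earlier sort operators.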

5.3 Hash Join Algorithms. Based on the idea of building an in-memory hash table on one input and then probing this hash table with items from the other input. Good news: very fast, with no temporary files, if the build input does indeed fit into memory.

5.3 Hash Join Algorithms. The build and probe inputs are partitioned using the same partitioning function. The final join result is the concatenation of the join results of pairs of partition files. Recursive partitioning is used when a build partition file is larger than memory.
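The partitioned scheme above can be sketched as follows. This is an illustrative sketch, not a production hash join: partition "files" are in-memory lists, the relations are made up, and the recursive-partitioning case is omitted for brevity.

```python
# Partitioned hash join sketch: both inputs are split with the same
# partitioning function, then each pair of partitions is joined with an
# in-memory build-and-probe pass; the results are concatenated.

def hash_join(build, probe, build_key, probe_key, num_partitions=4):
    build_parts = [[] for _ in range(num_partitions)]
    probe_parts = [[] for _ in range(num_partitions)]
    for row in build:                    # same partitioning function...
        build_parts[hash(build_key(row)) % num_partitions].append(row)
    for row in probe:                    # ...applied to both inputs
        probe_parts[hash(probe_key(row)) % num_partitions].append(row)
    result = []
    for bp, pp in zip(build_parts, probe_parts):
        table = {}                       # in-memory hash table per partition
        for row in bp:
            table.setdefault(build_key(row), []).append(row)
        for row in pp:                   # matches can only be in the
            for match in table.get(probe_key(row), []):   # same partition
                result.append((match, row))
    return result

depts = [(10, "db"), (20, "os")]         # smaller input as the build side
emps = [("alice", 10), ("bob", 20), ("carol", 10)]
joined = hash_join(depts, emps,
                   build_key=lambda d: d[0], probe_key=lambda e: e[1])
```

Because equal keys land in the same partition on both sides, joining partition pairs independently loses no matches; a build partition still too large for memory would simply be partitioned again with a different function (recursive partitioning).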

Discussion The author mentions that many commercial database systems use only the nested loop and merge joins due to the results of the System R project, which did not consider hash join algorithms. He states that hash join is today regarded as the most efficient in many cases. Should hash joins have been considered for use in the System R project? Why? What factors might have made the System R team not consider hash join?

7. Duality of Sort- and Hash-Based Query Processing Algorithms. Sort-based algorithms: a large data set is divided into subsets using a physical rule, namely into chunks as large as memory. Hash-based algorithms: large inputs are cut into subsets by hash value.

Several dualities: I/O capabilities (random vs. sequential); memory limitations (fan-in vs. fan-out); read-ahead in merging vs. write-behind in partitioning; etc. There exist many dualities between sorting using multilevel merging and recursive hash-table overflow management.

Two special cases. If the two join inputs are of different sizes (and the query optimizer can reliably predict this difference), hybrid hash join outperforms merge-join. If the hash function is very poor, merge-join performs better than hash join.

8. Execution of Complex Query Plans. Optimal scheduling of multiple operators and the division and allocation of resources in a complex plan are important issues: operators share physical resources, i.e., memory and disk bandwidth.

Earlier relational execution engines vs. today. Earlier: concurrent execution of multiple subplans in a single query was not possible; writing temporary files impeded concurrency and created the need for resource allocation. Now: engines consider more join algorithms and more complex query plans, support more concurrent users, and use parallel-processing capabilities.

For bushy plans, several issues arise: stop points (all data in temporary files and no intermediate result data in memory) can be used to switch between subplans; which subplan is initiated first; how memory is partitioned among active concurrent operators; and recursion levels should be executed level by level in recursive hybrid hash join.

Other I/O issues, e.g., disk bandwidth and disk arms for seeking in partitioning and merging, need to be considered. Scheduling bushy trees in multiprocessor systems is not entirely understood yet.

Discussion 4. When tuning your database system for performance, how much weight would you give to the common queries versus the computationally complex queries? To the queries that touch a large amount of data versus the queries that touch a small amount of data?

It's over.