Database System Architecture and Implementation
Kristin Tufte
Execution Costs
Orientation
[Figure: DBMS architecture diagram. Labels: Web Forms, Applications, SQL Interface; SQL Commands; Executor, Parser, Operator Evaluator, Optimizer; Transaction Manager, Lock Manager, Recovery Manager; Files and Index Structures, Buffer Manager, Disk Space Manager; Index and Data Files, Catalog; Database. The current topic is marked "We are here!".]
Figure Credit: Raghu Ramakrishnan and Johannes Gehrke: Database Management Systems, McGraw-Hill, 2003.
Recall Heap Files
Heap files provide just enough structure to maintain a collection of records (of a table).
The heap file supports sequential scans (openscan( )) over the collection.
SQL query leading to a sequential scan: SELECT A, B FROM R
No other operations get specific support from heap files.
Systematic File Organization
SQL queries calling for systematic file organization:
SELECT A, B FROM R WHERE C > 45
SELECT A, B FROM R ORDER BY C ASC
For the above queries, it would definitely be helpful if the SQL query processor could rely on a particular file organization of the records in the file for table R.
Exercise: Which organization of records in the file for table R could speed up the evaluation of both queries above?
Answer: Allocate records of table R in ascending order of attribute C values. Place records in neighboring pages. (Only include columns A, B, and C in the records.)
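The effect of the sorted organization can be sketched in a few lines of Python. This is a minimal illustration, assuming table R is an in-memory list of (A, B, C) tuples kept in ascending order of C; the sample records and the helper names `range_query` and `ordered_query` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical table R as (A, B, C) tuples, kept sorted ascending on C.
R = sorted(
    [(1, "x", 50), (2, "y", 12), (3, "z", 45), (4, "w", 99), (5, "v", 46)],
    key=lambda rec: rec[2],
)
C_VALUES = [rec[2] for rec in R]  # the sort keys, in file order


def range_query(lower):
    """SELECT A, B FROM R WHERE C > lower: one binary search, then a contiguous slice."""
    start = bisect_right(C_VALUES, lower)  # first position whose C exceeds lower
    return [(a, b) for a, b, _ in R[start:]]


def ordered_query():
    """SELECT A, B FROM R ORDER BY C ASC: the file order already is the result order."""
    return [(a, b) for a, b, _ in R]
```

With the records stored this way, the range query touches only the tail of the file and the ORDER BY query needs no sorting step.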
Module Overview
Three different file organizations
  1. files containing randomly ordered records (heap files)
  2. files sorted on one or more record fields
  3. files hashed on one or more record fields
Comparison of file organizations
  simple cost model
  application of cost model to file operations
Introduction to index concept
  clustered vs. unclustered indexes
  dense vs. sparse indexes
Comparison of File Organizations
Competition of three file organizations in five disciplines
  1. scan: read all records in a given file
  2. search with equality test
  3. search with range selection (upper or lower bound may be unspecified)
  4. insert a given record in the file, respecting the file's organization
  5. delete a record (identified by its rid), maintain the file's organization
SQL queries calling for equality test and range selection support:
SELECT * FROM R WHERE C = 45
SELECT * FROM R WHERE A > 0 AND A < 100
Simple Cost Model
A cost model is used to analyze the execution time of a given database operation
  block I/O operations are typically a major cost factor
  CPU time accounts for searching inside a page, comparing a record field to a selection constant, etc.
To estimate the execution time of the five database operations, we introduce a coarse cost model that
  omits the cost of network access
  does not consider cache effects
  neglects burst I/O
Cost models play an important role in query optimization
Simple Cost Model
Simple cost model parameters

Parameter  Description
b          number of pages in the file
r          number of records on a page
D          time to read/write a disk page
C          CPU time needed to process a record (e.g., compare a field value)
H          CPU time taken to apply a function to a record (e.g., a comparison or hash function)

Some typical values
  D: 15 ms
  C and H: 0.1 µs
Back to the Future
A simple hash function
A hashed file uses a hash function h to map a given record onto a specific page of the file.
Example: h uses the lower 3 bits of the first field (of type INTEGER) of the record to compute the corresponding page number.
  h(42, true, 'dog') = 2      (42 = 101010 in binary)
  h(14, true, 'cat') = 6      (14 = 1110 in binary)
  h(26, false, 'mouse') = 2   (26 = 11010 in binary)
The hash function determines the page number only; record placement inside a page is not prescribed.
If a page p is filled to capacity, a chain of overflow pages is maintained to store additional records with h( ) = p.
To avoid immediate overflow when a new record is inserted, pages are typically filled to only 80% when a heap file is initially (re)organized into a hashed file.
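The example hash function can be written down directly. The sketch below is illustrative only: it models records as Python tuples and the file as a dictionary from page number to record list (the name `pages` is hypothetical), and it ignores page capacity and overflow chains.

```python
def h(record):
    """Page number = lower 3 bits of the record's first (INTEGER) field."""
    return record[0] & 0b111


# Hypothetical in-memory stand-in for the hashed file: page number -> records.
pages = {}
for rec in [(42, True, "dog"), (14, True, "cat"), (26, False, "mouse")]:
    # Only the page is determined by h; placement inside the page is arbitrary.
    pages.setdefault(h(rec), []).append(rec)
```

Records 42 and 26 share their lower three bits (010), so both land on page 2, while record 14 goes to page 6.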
Cost of Scan
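A rough sketch of how the cost model parameters might combine for a scan, assuming every page costs one read (D) and every record one CPU step (C). The function name `scan_cost`, the sample file size, and the reading of the 80% fill factor as "the same records spread over b / 0.8 pages" are assumptions made for illustration, not formulas taken from the slides.

```python
D = 15e-3   # seconds to read/write one disk page (typical value from the parameter slide)
C = 0.1e-6  # seconds of CPU time to process one record (typical value from the parameter slide)


def scan_cost(b, r):
    """Estimated seconds to scan a file of b pages holding r records each."""
    return b * (D + r * C)


# Hypothetical file: 1000 full pages with 100 records per page.
heap_scan = scan_cost(1000, 100)
sorted_scan = scan_cost(1000, 100)  # a sorted file is read page by page as well

# Assumption: the same records stored in a hashed file at 80% occupancy
# need 1000 / 0.8 = 1250 pages with 80 records each (overflow chains ignored).
hashed_scan = scan_cost(1000 / 0.8, 100 * 0.8)
```

Under these assumptions the I/O term dominates by far: the CPU share of the heap scan is about 0.01 s out of roughly 15 s.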
Hashed File
Exercise (scanning a hashed file): In which order does a scan of a hashed file retrieve its records?
Cost of Search with Equality Test
Exercise: Nevertheless, no DBMS will implement binary search for value lookup. Why?
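For orientation, the following sketch estimates what a binary search over a sorted file would cost under the simple cost model, next to an equality search in a heap file. Both formulas are assumptions in the usual textbook style, not taken from the slides: the heap estimate presumes a single match found after reading half the file on average, and the function names and file sizes are hypothetical.

```python
import math

D = 15e-3   # seconds per page read/write (typical value from the parameter slide)
C = 0.1e-6  # seconds of CPU time per record (typical value from the parameter slide)


def heap_equality_search(b, r):
    """Assumed average case: one match at a random position, so half the pages are read."""
    return 0.5 * b * (D + r * C)


def sorted_binary_search(b, r):
    """Assumed: binary search over the b pages, then binary search inside one page."""
    return math.log2(b) * D + math.log2(r) * C


# Hypothetical file: 1024 pages with 128 records per page.
heap_estimate = heap_equality_search(1024, 128)
sorted_estimate = sorted_binary_search(1024, 128)
```

Every page read is charged the same D here, so the sketch cannot distinguish the scattered page accesses of a binary search from the sequential reads of a scan; recall that the cost model neglects burst I/O.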
Cost of Search with Equality Test
Cost of Search with Range Selection
Cost of Insert
Cost of Delete
Performance Comparison
Performance of range selections for files of increasing size (D = 15 ms, C = 0.1 µs, r = 100, n = 10)
[Figure: performance graph]
Figure Credit: Marc H. Scholl, University of Konstanz, Germany
Performance Comparison
Performance of deletions for files of increasing size (D = 15 ms, C = 0.1 µs, r = 100, n = 1)
[Figure: performance graph]
Figure Credit: Marc H. Scholl, University of Konstanz, Germany
And the Winner Is
There is no single file organization that responds equally fast to all five operations.
This is a dilemma because more advanced file organizations can make a real difference in speed (see previous slides).
There exist index structures that offer all advantages of a sorted file and support insertions/deletions efficiently (at the cost of a modest space overhead): B+ trees.
Before discussing B+ trees in detail, the following introduces the index concept in general.