Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor in user satisfaction - depends on Disk access - efficient data structures for data representation - efficiency of system operation on those structures - one of the most critical factors in performance - main memory is in general not big enough for entire DB - recovery problem with main memory DB - disk contains data files and system files including data dictionary and index files Physical-1
Storage Media Hierarchy Storage medium: primary storage and secondary storage - database is stored physically on some some storage medium - primary storage: can be operated directly by CPU --- main memory & cache - secondary storage: larger capacity, lower cost, slower access; cannot be operated directly by CPU; must be copied to primary Hierarchy - access speed, cost per unit of data, reliability - cache: fastest and most costly - main memory - flash memory: limited number of writes (also slow) non-volatile: disk-substitute in embedded systems - magnetic disk and optical disk (CD-ROM) - tape storage: sequential access; for backup and archival Physical-2
Disk Access and Buffer Management Disk - direct access storage device (not sequential) - arm movement involves seek time and latency time - goal is to reduce # of disk access and seek time - a block need not to be transferred every time - buffer blocks: closely related with concurrency control and recovery strategy of the database system Buffer management - goal is to increase hit ratio - similar to virtual memory management in OS - differences: forced writing for recovery and MRU (most recently used first) replacement algorithm - priority-based replacement: data dictionary and index blocks have high priority Physical-3
RAID Redundant arrays of independent disks - motivation: large # of small disks might be cost effective; higher reliability and higher performance Higher reliability by redundancy - mirroring/shadowing: a logical disk consists of two physical disks --- write on both Higher performance by parallelism - data striping: splitting data across multiple disks - bit-level or block-level striping - with n disks, block i will go to disk (i mod n) + 1 RAID levels - to provide redundancy at lower cost using disk striping combined with error-correcting bits, instead of mirroring Physical-4
File Organization File - a sequence of records mapped unto disk blocks - block: unit of data transfer between disk and memory - block size ranges from 512 bytes to few Kbytes - fixed-length records vs variable-length records Fixed-length records - size of each field is declared - when delete, mark it to be ignored: searching for deleted free space may not be efficient - use pointer for free space: danger of dangling pointer which no longer points to the desired record - problem of interblock records: needs 2 accesses... block i) (block i+1... ---------- record j Physical-5
Variable-length Records When such situations occur? - multiple record types in one file - record type allows variable length fields - repeating groups (multiple values) Methods to deal with them - byte string representation: special end-of-record symbol ( ) at the end of each record - each record is a string of consecutive bytes - difficulty in reusing the space of deleted record - fixed-length representation: 1) reserved space for expected maximum length - useful only if most are close to max. length 2) a list of fixed-length records chained by pointers 3) anchor block (first record of the chain) and overflow block (all the others) chained by pointers Physical-6
Mapping Data to Files Relational database - straight-forward - in most cases, each relation in a separate file File organization - how to organize a given set of records in files - heap file: any record can be placed anywhere (no ordering) - sequential file: records are stored in a sequential order - hashing file: hash function computes the specific block for the record based some attribute value - clustering file: records of different relations stored on the same file/block for efficient processing - related records can be read by one block read - may be inefficient for other operations Physical-7
Efficient Searching Additional structures help searching - associated with files to make the search for records based on certain field more efficient - for direct data locating w/o sequential search - two approaches: indexing and hashing Sequential file - records are chained together by pointers for fast retrieval in search key order - records are stored physically in search key order to minimize the number of block accesses - difficult to maintain the physical sequential order as records are inserted and deleted - binary search for files can be done on the blocks rather than on the records, if block address are available in the file header Physical-8
Index Structures Index file - index is usually defined on a single field of a record (index field) - index file is for fast random access Dense index - one index record for every search-key value - faster access but higher overhead Sparse index - index records for only some of the records - less faster but less overhead (Brighton) (record: Brighton,..) (Brighton) (Downhill) (record: Downhill,..) (Marinion) (record: Marinion,..) (Marinion) dense index sparse index Physical-9
Index Structures Hierarchy of index - multi-level index for a large index file - index tree (search tree) Primary and secondary index - primary index is the one whose search key specifies the sequential order of the file - secondary index: index other than primary one - secondary index improves the performance of queries that use keys other than the primary search key - modifying DB imposes a serious overhead on secondary index (compared to the primary index) - dense index is desirable than sparse index for secondary index, since the file is not ordered physically according to the secondary index Physical-10
Clustering Index Clustering field - a non-key field that does not have a distinct value for each record, on which records of a file are physically ordered Clustering index - clustering index is to speed up retrieval of records that have the same value for the clustering field - differs from primary index which requires that ordering field should have a distinct value for each record Physical-11
Index File Index file size - index file for a primary index need substantially fewer blocks than the data file - why? - fewer index entries: an entry exists for each block of data file rather than for each record - index entry is smaller in size than a data record: only two fields (key value and block pointer) Blocking factor (bfr) - savings in disk block accesses - bfr = block size (B) / record length (R) Physical-12
Index File: Example An ordered file with 30,000 records, B = 1 Kb, R = 100 bytes - bfr = 10; data file needs 3000 blocks - binary search would require (log 2 Blocks) = 12 accesses - with ordering key field of 9 bytes and block pointer of 6 bytes, size of primary index entry = 15 bytes - bfr = block size (B) / record length (R) = 68 - total # of index entries: 3000 - # of blocks needed for the index = (3000/68) = 45 - binary search on index file would require (log 2 B i ) = (log 2 45) = 6 accesses - search for a record using the primary index 6 (for index) + 1 (for data) = 7 accesses Physical-13
Search Tree Disadvantage of indexed sequential file organization - performance degradation as file grows - file reorganization can avoid this performance degradation with its own overhead Search tree - a special type of tree used to guide the search for a record given the value of one of its fields - in a search tree of order p, each node contains at most p 1 search values and p pointers in the order <P 1, K 1,..., P q 1, K q 1, P q >, where q p P i : pointer to a child node or null pointer K i : search key value from some ordered set of values (all search key values are assumed to be unique) for all values X in the subtree pointed by P i, we have K i 1 <X<K i for 1<i<q, X<K i for i=1, and K i 1 <X for i=q Physical-14
B-tree Index Files B-tree (balanced tree) - a search tree with some additional constraints for efficient insertion and deletion - number of access is fixed Formal definition A B-tree of order n is a search tree that satisfies 1) the root has at least two children 2) all nodes other than root have at least n/2 children 3) all leaf nodes are at the same level (balanced) Insertion and deletion - insertion may need split when a node becomes too large (more than n children) - deletion may need combining if a node becomes too small (less than n/2 pointers) - balance property must be maintained Physical-15
B-tree and B+-tree Node structure of B-tree <P 1, (K 1, Pr 1 ), P 2,..., (K q 1, Pr q 1 ), P q > P i : tree pointer to point another node K i : search key value Pr i : data pointer to point record whose search key field value is K i (or the data block containing it) - within each node, K 1 < K 2 <.. <K q 1 - for all values X in the subtree pointed by P i, we have K i 1 <X<K i for 1<i<q, X<K i for i=1, and K i 1 <X for i=q - a node with q tree pointers, q p, has q 1 search key field values, and hence q 1 data pointers B + -tree: a variation of B-tree data structure - most widely used multi-level index implementation <P 1, K 1,..., P q 1, K q 1, P q >, where q p - at leaf node, it is <K 1, Pr 1,..., K q 1, Pr q 1, P next > where P next points to the next leaf node of the tree Physical-16
B+-tree Requirements for maintaining B + -tree - every node must contain at least n/2 pointers except for the root (which should have at least 2) - balanced: for ensuring good performance Searching for key field value K 1) visit the root node, looking for the smallest key value greater than K. Suppose the value is K i. 2) follow pointer P i to another node - if K < K 1, then follow P 1 - if K > K max, then follow P max 3) repeat step 2 until reaching a leaf node Physical-17
Differences of B+-tree from B-tree 1. In B + -tree, data pointers are stored only at the leaf nodes - more entires can be packed into internal (non-leaf) nodes of a B + -tree than for a similar B-tree - for the same block (node) size, the order p will be larger for the B + -tree than for the B-tree --- improved search time - B-tree eliminates redundant storage of search key values - faster search in some cases to find desired search key values before reading a leaf node in B-tree 2. Leaf and non-leaf nodes are of the same size in B + -tree, while in B-tree, non-leaf nodes are larger - complication in storage management for index structures 3. Deletion in B-tree is more complicated - in B + -tree, deleted entry always appears in a leaf - in B-tree, it can be a non-leaf node, requiring replacement by the proper value from the subtree of the node containing the deleted entry Physical-18