Chapter 14. Indexing Structures for Files. Chapter Outline. Indexes as Access Paths. Primary Indexes Clustering Indexes Secondary Indexes

Similar documents
Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques

DATABASE DESIGN - 1DL400

Physical Data Organization

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

File Management. Chapter 12

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Overview of Storage and Indexing

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Analysis of Algorithms I: Binary Search Trees

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING

CSE 326: Data Structures B-Trees and B+ Trees

DATABASDESIGN FÖR INGENJÖRER - 1DL124

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Full and Complete Binary Trees

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

Record Storage and Primary File Organization

Big Data and Scripting. Part 4: Memory Hierarchies

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

Data Warehousing und Data Mining

File System Management

Lecture 1: Data Storage & Index

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Merkle Hash Trees for Distributed Audit Logs

Unit Storage Structures 1. Storage Structures. Unit 4.3

root node level: internal node edge leaf node Data Structures & Algorithms McQuain

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

Performance Evaluation of Natural and Surrogate Key Database Architectures

Chapter 12 File Management

Chapter 12 File Management. Roadmap

Data Structures Fibonacci Heaps, Amortized Analysis

Chapter 13: Query Processing. Basic Steps in Query Processing

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Persistent Binary Search Trees

Patel Group of Institutions Advance Database Management System MCA SEM V UNIT -1

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n)

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

Chapter 10: Virtual Memory. Lesson 03: Page tables and address translation process using page tables

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

Two Parts. Filesystem Interface. Filesystem design. Interface the user sees. Implementing the interface

SQL Query Evaluation. Winter Lecture 23

Symbol Tables. Introduction

Buffer Management 5. Buffer Management

Multi-dimensional index structures Part I: motivation

Converting a Number from Decimal to Binary

SMALL INDEX LARGE INDEX (SILT)

CIS 631 Database Management Systems Sample Final Exam

Binary Search Trees CMPSC 122

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online

Lecture 2 February 12, 2003

Algorithms Chapter 12 Binary Search Trees

A Deduplication-based Data Archiving System

Chapter 12 File Management

The Advantages and Disadvantages of Network Computing Nodes

Operating Systems CSE 410, Spring File Management. Stephen Wagner Michigan State University

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Binary Search Trees 3/20/14

Project Group High- performance Flexible File System 2010 / 2011

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Physical Database Design and Tuning

Sample Questions Csci 1112 A. Bellaachia

1. Physical Database Design in Relational Databases (1)

Original-page small file oriented EXT3 file storage system

Storage Management for Files of Dynamic Records

File Management. Chapter 12

Lecture 18: Applications of Dynamic Programming Steven Skiena. Department of Computer Science State University of New York Stony Brook, NY

ibigtable: Practical Data Integrity for BigTable in Public Cloud

MAC Sublayer. Abusayeed Saifullah. CS 5600 Computer Networks. These slides are adapted from Kurose and Ross

External Memory Geometric Data Structures

Binary Heaps. CSE 373 Data Structures

Abstract Data Type. EECS 281: Data Structures and Algorithms. The Foundation: Data Structures and Abstract Data Types

M-way Trees and B-Trees

Parallelization: Binary Tree Traversal

In-Memory Databases MemSQL

File-System Implementation

Scalable Prefix Matching for Internet Packet Forwarding

Inline Deduplication

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

How To Create A Tree From A Tree In Runtime (For A Tree)

Data Structure with C

Review of Hashing: Integer Keys

Algorithms and Data Structures

History-Independent Cuckoo Hashing

Data storage Tree indexes

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

Introduction Advantages and Disadvantages Algorithm TIME COMPLEXITY. Splay Tree. Cheruku Ravi Teja. November 14, 2011

Transcription:

Chapter 14 Indexing Structures for Files Chapter Outline Types of Single-level Ordered Indexes Primary Indexes Clustering Indexes Secondary Indexes Multilevel Indexes Dynamic Multilevel Indexes Using B-Trees and B+-Trees Indexes on Multiple Keys 2 Indexes as Access Paths A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file. The index is usually specified on one field of the file (although it could be specified on several fields) One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value The index is called an access path on the field. 3

4 Indexes as Access Paths The index file usually occupies considerably less disk blocks than the data file because its entries are much smaller A binary search on the index yields a pointer to the file record Indexes can also be characterized as dense or sparse. A dense index has an index entry for every search key value (and hence every record) in the data file. A sparse (or nondense) index, on the other hand, has index entries for only some of the search values Indexes as Access Paths Example: Given the following data file: EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL,... ) Suppose that: record size R=150 bytes block size B=512 bytes r=30000 records Then, we get: blocking factor Bfr= B div R= 512 div 150= 3 records/block number of file blocks b= (r/bfr)= (30000/3)= 10000 blocks For an index on the SSN field, assume the field size V SSN =9 bytes, assume the record pointer size P R =7 bytes. Then: index entry size R I =(V SSN + P R )=(9+7)=16 bytes index blocking factor Bfr I = B div R I = 512 div 16= 32 entries/block number of index blocks b= (r/ Bfr I )= (30000/32)= 938 blocks binary search needs log 2 bi= log 2 938= 10 block accesses This is compared to an average linear search cost of: (b/2)= 30000/2= 15000 block accesses If the file records are ordered, the binary search cost would be: log 2 b= log 2 30000= 15 block accesses 5 Types of Single Level indexes Primary Index Clustering Index Secondary Index 6

7 Primary Index Defined on an ordered data file The data file is ordered on a key field Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor A similar scheme can use the last record in a block. A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value. Primary Index on the Ordering Key Field 8 Clustering Index Defined on an ordered data file The data file is ordered on a non-key field unlike primary index, which requires that the ordering field of the data file have a distinct value for each record. Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value. It is another example of nondense index Insertion and Deletion is relatively straightforward with a clustering index. 9

10 A Clustering Index on a Non-key Field Clustering index with a separate block cluster for each group of records that share the same value for the clustering field. 11 Secondary Index A secondary index provides a secondary means of accessing a file for which some primary access already exists. The secondary index may be on a field which is a candidate key and has a unique value in every record, or a nonkey with duplicate values. The index is an ordered file with two fields. The first field is of the same data type as some nonordering field of the data file that is an indexing field. The second field is either a block pointer or a record pointer. There can be many secondary indexes (and hence, indexing fields) for the same file. Includes one entry for each record in the data file; hence, it is a dense index 12

13 A dense secondary index (with block pointers) on a nonordering key field of a file A secondary index (with record pointers) on a non-key field implemented using one level of indirection so that index entries are of fixed length and have unique field values 14 15

16 Multi-Level Indexes Because a single-level index is an ordered file, we can create a primary index to the index itself ; in this case, the original index file is called the first-level index and the index to the index is called the second-level index. We can repeat the process, creating a third, fourth,..., top level until all entries of the top level fit in one disk block A multi-level index can be created for any type of firstlevel index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block A two-level Primary Index 17 Multi-Level Indexes A multi-level index is a form of search tree ; however, insertion and deletion of new index entries is a severe problem because every level of the index is an ordered file. 18

19 A node in a search tree with pointers to subtrees below it A search tree of order p = 3 20 Dynamic Multilevel Indexes Using B-Trees and B+-Trees Because of the insertion and deletion problem, most multilevel indexes use B-tree or B+-tree data structures, which leave space in each tree node (disk block) to allow for new index entries These data structures are variations of search trees that allow efficient insertion and deletion of new search values. In B-Tree and B+-Tree data structures, each node corresponds to a disk block Each node is kept between half-full and completely full 21

22 Dynamic Multilevel Indexes Using B-Trees and B+-Trees An insertion into a node that is not full is quite efficient; if a node is full the insertion causes a split into two nodes Splitting may propagate to other tree levels A deletion is quite efficient if a node does not become less than half full If a deletion causes a node to become less than half full, it must be merged with neighboring nodes Difference between B-tree and B+-tree In a B-tree, pointers to data records exist at all levels of the tree In a B+-tree, all pointers to data records exist at the leaflevel nodes A B+-tree can have less levels (or higher capacity of search values) than the corresponding B-tree 23 B-tree structures. (a) A node in a B-tree with q 1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6. 24

25 The nodes of a B+-tree. (a) Internal node of a B+-tree with q 1 search values. (b) Leaf node of a B+-tree with q 1 search values and q 1 data pointers. Indexes on Multiple Keys Situations arise in which a search on multiple keys is necessary If this kind of search is frequent, it may be prudent to create indexes on combinations of the keys involved 26 Hashing on Multiple Keys The hash function is constructed using multiple keys (usually to construct different pieces of the hash key) Suitable only for exact-value searches 27