Lecture 1: Data Storage & Index


R&G Chapters 8-11

(Figure: the DBMS architecture. Layers: Query Execution and Optimization, Relational Operators, File & Access Methods, Buffer Management, Disk Space Management. Shown alongside: Concurrency Control and the Recovery Manager.)

Where are we?
(Figure: the DBMS architecture again: Concurrency Control, Query Execution and Optimization, Relational Operators, File & Access Methods, Buffer Management, Disk Space Management, Recovery Manager.)

Magnetic Disk
- Data is read, written, and transferred in blocks (pages).
(Figure courtesy of R. Burns.)

(Figure: a real disk, image from Seagate Technology Corporation, with the arm, platter, actuator, and spindle labeled.)

Data access in a disk
Access time = seek time + rotational delay + transfer time
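To make the formula concrete, here is a small worked example in Python. The drive parameters (9 ms average seek, 7200 RPM, 100 MB/s, 8 KB pages) are illustrative assumptions and do not come from the lecture; the average rotational delay is taken as half a revolution.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, page_kb):
    """Estimated time to read one page: seek + rotational delay + transfer."""
    # On average the platter has to turn half a revolution.
    rotational_ms = 0.5 * 60_000 / rpm
    # Time to move page_kb kilobytes at the sustained transfer rate.
    transfer_ms = page_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotational_ms + transfer_ms

# Hypothetical drive, chosen only for illustration.
t = access_time_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=100.0, page_kb=8)
print(f"{t:.2f} ms per random page read")
```

With these numbers the transfer itself takes well under a tenth of a millisecond, so almost all of the time goes to the seek and the rotational delay. This is why reading pages that are adjacent on disk is so much cheaper than random reads.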

Disk space manager
- Allocates and de-allocates pages on the disk.
- Provides the abstraction of pages to the higher layers.
- Maintains the list of free blocks.
Basic interface:
- allocate_page: allocate one or more free pages and remove them from the list of free pages.
- deallocate_page: de-allocate one or more pages and put them back into the list of free pages.
- read_page
- write_page
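The interface can be sketched as a small Python class. This is only an in-memory stand-in under stated assumptions: the "disk" is a Python list, the page size of 4096 bytes is arbitrary, and allocate_page hands out one page at a time rather than several.

```python
class DiskSpaceManager:
    """In-memory stand-in for a disk: fixed-size pages plus a free list."""

    PAGE_SIZE = 4096  # assumed page size in bytes

    def __init__(self):
        self.pages = []       # page_id -> bytearray
        self.free_list = []   # ids of pages available for reuse

    def allocate_page(self):
        # Reuse a freed page if there is one, otherwise grow the "disk".
        if self.free_list:
            return self.free_list.pop()
        self.pages.append(bytearray(self.PAGE_SIZE))
        return len(self.pages) - 1

    def deallocate_page(self, page_id):
        # Put the page back into the list of free pages.
        self.free_list.append(page_id)

    def read_page(self, page_id):
        return bytes(self.pages[page_id])

    def write_page(self, page_id, data):
        assert len(data) == self.PAGE_SIZE
        self.pages[page_id][:] = data


dsm = DiskSpaceManager()
p0, p1 = dsm.allocate_page(), dsm.allocate_page()
dsm.write_page(p1, b"hello".ljust(DiskSpaceManager.PAGE_SIZE, b"\0"))
dsm.deallocate_page(p0)
p2 = dsm.allocate_page()   # reuses the page that was just freed
```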


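Buffer pool
- To avoid always reading and writing pages from disk, use the available memory as a buffer pool.
- The buffer pool is divided into frames, which hold pages from the disk.
- Page read/write requests from the higher layers go to the buffer pool, which in turn reads from and writes to the disk.
- Note: data have to be in RAM for the DBMS to operate on them.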

Page maintenance in a buffer pool
- Pin a frame when its page is requested (pin_count++), e.g. pin_count goes from 0 to 1.
- Unpin a frame when its page is released (pin_count--), e.g. pin_count goes from 1 to 0.
- A page is dirty if it has been modified but not yet written back to the disk.

How to process a request for page p?
1. Is p already in a frame f_i? If yes, increment the pin_count of f_i and return f_i.
2. If no, is there an unused frame f_j? If not, choose a frame f_j for replacement (this choice is critical to performance).
3. Is f_j dirty? If yes, write the page in f_j to disk.
4. Read page p into f_j and return f_j.
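The request-handling flow can be sketched in Python as follows. The class and method names (Frame, BufferPool, request_page, release_page) are made up for illustration, the disk is modeled as a plain dict, and the victim selection is deliberately naive (any unpinned frame), standing in for a real replacement policy.

```python
class Frame:
    def __init__(self):
        self.page_id = None   # None means the frame is unused
        self.data = None
        self.pin_count = 0
        self.dirty = False


class BufferPool:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.disk = disk       # dict: page_id -> contents (stand-in for the disk)
        self.page_table = {}   # page_id -> Frame currently holding that page

    def request_page(self, page_id):
        frame = self.page_table.get(page_id)
        if frame is None:                         # not in the pool yet
            frame = self._find_frame()
            if frame.dirty:                       # write back before reuse
                self.disk[frame.page_id] = frame.data
            if frame.page_id is not None:
                del self.page_table[frame.page_id]
            frame.page_id, frame.data = page_id, self.disk[page_id]
            frame.dirty = False
            self.page_table[page_id] = frame
        frame.pin_count += 1
        return frame

    def release_page(self, page_id, dirty=False):
        frame = self.page_table[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _find_frame(self):
        # Prefer an unused frame; otherwise take any frame with pin_count == 0.
        for f in self.frames:
            if f.page_id is None:
                return f
        for f in self.frames:
            if f.pin_count == 0:
                return f
        raise RuntimeError("all frames are pinned")


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(2, disk)
f = pool.request_page(1)
f.data = "A"
pool.release_page(1, dirty=True)   # page 1 is now unpinned and dirty
pool.request_page(2)               # takes the second, unused frame
pool.request_page(3)               # evicts page 1 and writes "A" back first
```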

A page replacement policy determines which frame is replaced
- General rule: keep the pages that are likely to be accessed in the near future.
- A frame is considered for replacement only if its pin_count == 0.
LRU (Least Recently Used) policy:
- Choose the frame that hasn't been used for the longest time.
- Implemented as a queue of pages with pin_count == 0:
  - insert: a frame whose pin_count has just gone to 0 enters the queue;
  - remove: a frame whose pin_count goes above 0 leaves the queue;
  - the frame at the LRU end of the queue is chosen for replacement.
- What is the assumption of LRU?
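A minimal sketch of the LRU queue, assuming frames are identified by integer ids. The class and method names are hypothetical; an OrderedDict serves as a queue that also supports removal from the middle.

```python
from collections import OrderedDict


class LRUQueue:
    """Replacement candidates: frames with pin_count == 0, oldest first."""

    def __init__(self):
        self.queue = OrderedDict()   # frame_id -> None, in order of unpinning

    def unpinned(self, frame_id):
        # pin_count just dropped to 0: insert at the tail of the queue.
        self.queue[frame_id] = None
        self.queue.move_to_end(frame_id)

    def pinned(self, frame_id):
        # pin_count rose above 0: the frame is no longer a candidate.
        self.queue.pop(frame_id, None)

    def choose_victim(self):
        # The head of the queue has been unused for the longest time.
        if not self.queue:
            raise RuntimeError("no frame with pin_count == 0")
        frame_id, _ = self.queue.popitem(last=False)
        return frame_id


q = LRUQueue()
for fid in (1, 2, 3):
    q.unpinned(fid)
q.pinned(1)        # frame 1 is requested again
q.unpinned(1)      # ... and released: it is now the most recently used
victim = q.choose_victim()
```

Every pin and unpin touches the queue, which is the bookkeeping cost that cheaper approximations of LRU try to avoid.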

Clock policy approximates LRU
- Every frame is associated with a reference bit (R).
- R is set to 1 when a frame's pin_count goes down to 0.
On a replacement request:
1. Advance the pointer.
2. If R == 0 and pin_count == 0, choose the frame.
3. Otherwise, set R to 0 if R == 1, and go back to step 1.
Clock has a lower cost than LRU. (Why?)
(Figure: frames A to L arranged in a circle, with the clock pointer sweeping over them.)
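The three steps can be sketched in Python as below. The class name and the bound of two sweeps are choices made for this sketch: one sweep may only clear reference bits, so a second sweep is enough to find a victim whenever an unpinned frame exists.

```python
class ClockPolicy:
    def __init__(self, num_frames):
        self.ref = [0] * num_frames         # reference bit R per frame
        self.pin_count = [0] * num_frames
        self.hand = -1                      # the clock pointer

    def pin(self, i):
        self.pin_count[i] += 1

    def unpin(self, i):
        self.pin_count[i] -= 1
        if self.pin_count[i] == 0:
            self.ref[i] = 1                 # R is set when pin_count drops to 0

    def choose_victim(self):
        n = len(self.ref)
        for _ in range(2 * n):
            self.hand = (self.hand + 1) % n            # step 1: advance
            i = self.hand
            if self.pin_count[i] == 0 and self.ref[i] == 0:
                return i                               # step 2: choose the frame
            if self.ref[i] == 1:
                self.ref[i] = 0                        # step 3: second chance
        raise RuntimeError("all frames are pinned")


clock = ClockPolicy(3)
for i in range(3):
    clock.pin(i)
clock.unpin(0)
clock.unpin(2)                   # frame 1 stays pinned
victim = clock.choose_victim()   # first sweep clears R, second sweep picks a frame
```

Unlike the LRU queue, pinning and unpinning only touch a counter and one bit; all the searching happens at replacement time.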


Data are abstracted as files of records for the higher-level DBMS components
- A relation (table) is represented as a file of records, which is stored as pages.
How to keep track of
- pages in a file?
- free space in each page?
- records in each page?

Directory format: use a directory to indicate the data pages used by a file
(Figure: the header page and the DIRECTORY, whose entries point to Data Page 1, Data Page 2, ..., Data Page N.)
- Free space within a data page can be indicated in the directory entry.
- Where to find the header page? In the system catalog!

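How are records organized within a page?
(Figure: Page i holds the records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,n), followed by FREE SPACE. At the end of the page sits the SLOT DIRECTORY with one entry per slot (N, ..., 2, 1; example entries 20, 16, 24), the number of slots, and a pointer to the start of free space.)
How to identify a record? Record id (RID) = <Page id, slot id>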
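A simplified slotted page can be sketched in Python as follows. For brevity the slot directory is kept in a Python list rather than packed into the last bytes of the page, slot ids start at 0 instead of 1, and space freed by deletions is not reclaimed; all names are illustrative.

```python
class SlottedPage:
    """Records grow from the front of the page; slots record where they are."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0   # pointer to the start of free space
        self.slots = []       # slot id -> (offset, length), or None if deleted

    def insert(self, record: bytes):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)   # the RID

    def get(self, rid):
        page_id, slot = rid
        assert page_id == self.page_id
        entry = self.slots[slot]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, rid):
        # Keep the slot itself so that the RIDs of other records stay valid.
        self.slots[rid[1]] = None


page = SlottedPage(page_id=7)
r1 = page.insert(b"alice")
r2 = page.insert(b"bob")
page.delete(r1)
```

Because a record is always reached through its slot, the page is free to move records around internally (for example to compact free space) without changing any RID.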

How are the fields organized in a record?
- Fields with fixed size: just store them contiguously.
  Field 1 | Field 2 | Field 3 | Field 4
- Fields with variable size: use a special character to delimit each field.
  Field1 $ Field2 $ Field3 $ Field4 $
- Fields with variable size, alternative: again, a directory!
  (Figure: Field1 Field2 Field3 Field4 stored together with a directory that locates each field.)
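One common way to realize the directory variant is an array of field offsets at the start of the record. The sketch below assumes that layout (n+1 two-byte little-endian offsets followed by the field bytes); the slide does not specify the exact format, so treat the details as an assumption.

```python
import struct


def encode_record(fields):
    """Layout: n+1 two-byte offsets, then the field bytes back to back."""
    header_len = 2 * (len(fields) + 1)
    offsets, pos = [], header_len
    for f in fields:
        offsets.append(pos)
        pos += len(f)
    offsets.append(pos)   # end of the last field
    return struct.pack(f"<{len(offsets)}H", *offsets) + b"".join(fields)


def get_field(record, i):
    # Two adjacent offsets give the field boundaries; no scanning is needed.
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return record[start:end]


rec = encode_record([b"Ann", b"", b"IMADA", b"4$2"])
```

Compared with delimiters, this gives direct access to the i-th field, handles empty fields naturally, and places no restriction on the characters a field may contain (the last field here contains a `$`).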

Summary
How do disks work?
- Disks read, write, and transfer data in units of pages.
- Transferring data between disk and memory is the dominant cost of data access.
How to reduce disk I/O?
- Keep pages that will be accessed in the future in memory.
- Replacement policies: LRU, Clock, MRU, etc.
How to organize the data on a disk?
- Data are abstracted as files of records.
- A directory can be used to
  - locate the pages of a file on a disk,
  - locate the records in a page,
  - locate the fields in a record.

The heap file abstraction enables retrieving records by their RID or scanning records sequentially
- Record id (RID) = <Page id, slot id>: find the page, then the record within the page.
- Sequential scan: look up the header page of the file in the catalog, then read each record in each page sequentially.
(Figure: header page, DIRECTORY, Data Page 1, Data Page 2, ..., Data Page N.)

What if we want to look up records by their values?
Examples:
- Find all students in IMADA.
- Find all students with a score > 10.
Solution 1: sequential scan, checking the values of each record.
- Needs to read all the pages: slow!
Solution 2: organize the data in the file by their values.
- Sorted file (sorted on one field).
- Use binary search to speed up the lookup.
- How about searching by the value of another field? Sequential search again!
- High cost when data are updated!

An index is a data structure used to speed up value-based search of records
- Input: conditions on the values of one or more fields.
- Output: the records, or the locations of the records, satisfying the conditions.
- An index contains a collection of data entries and a data structure to search for the data entries matching the search key:
  - Tree (B+ tree index): supports both equality and range search.
  - Hash table (hash index): supports only equality search.
- An index is stored as a file.
- An index supports search on one or more fields, which are called the search key of the index.

Alternatives for Data Entry k* in Index
Three alternatives:
1. Actual data record (with key value k)
2. <k, rid of matching data record>
3. <k, list of rids of matching data records>
- The choice is orthogonal to the indexing technique.
  - Examples of indexing techniques: B+ trees, hash-based structures, R trees, ...
  - Typically, the index contains auxiliary information that directs searches to the desired data entries.
- Can have multiple (different) indexes per file, e.g. a file sorted by age, with a hash index on salary and a B+ tree index on name.

Alternatives for Data Entries (Contd.)
Alternative 1: Actual data record (with key value k)
- If this is used, the index structure is a file organization for data records (like heap files or sorted files).
- At most one index on a given collection of data records can use Alternative 1.
- This alternative saves pointer lookups but can be expensive to maintain with insertions and deletions.

Alternatives for Data Entries (Contd.)
Alternative 2 <k, rid of matching data record> and Alternative 3 <k, list of rids of matching data records>
- Easier to maintain than Alternative 1.
- If more than one index is required on a given file, at most one index can use Alternative 1; the rest must use Alternative 2 or 3.
- Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length.
- Even worse, for large rid lists the data entry would have to span multiple blocks!

Index Classification
Clustered vs. unclustered: if the order of data records is the same as, or 'close to', the order of index data entries, then the index is called a clustered index.
- A file can be clustered on at most one search key.
- The cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!
- Alternative 1 implies clustered, but not vice versa.

Clustered vs. Unclustered Index
- Suppose that Alternative (2) is used for data entries, and that the data records are stored in a heap file.
- To build a clustered index, first sort the heap file (leaving some free space on each block for future inserts).
- Overflow blocks may be needed for inserts. (Thus, the order of data records is 'close to', but not identical to, the sort order.)
(Figure: CLUSTERED and UNCLUSTERED side by side. In both, index entries direct the search for data entries in the index file, and the data entries point to data records in the data file.)

Unclustered vs. Clustered Indexes
What are the tradeoffs?
- Clustered, pros: efficient for range searches.
- Clustered, cons: expensive to maintain (either on the fly, or sloppily with later reorganization).

B+ Tree is a balanced tree structure
- Each node in the tree occupies a page.
- Entries in non-leaf nodes are called index entries: <key value, page_id>
- Entries in leaf nodes are called data entries: <key value, RID> OR <key value, list of RIDs> OR <key value, data record>
Example tree:
  Root:   [13 | 17 | 24 | 30]
  Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Search for 5*, 15*, or all data entries >= 24*
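The three searches on the example tree can be sketched in Python. Two assumptions are made beyond the slide: leaves are linked to their right sibling so that a range scan can continue along the leaf level, and nodes are plain Python objects rather than pages.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # None for a leaf
        self.next = None            # right sibling (used by leaves only)


def find_leaf(node, key):
    # In an index node, child i covers the keys in [keys[i-1], keys[i]).
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    return key in find_leaf(root, key).keys


def range_from(root, low):
    # Descend once, then follow the leaf chain to the right.
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        out.extend(k for k in leaf.keys if k >= low)
        leaf = leaf.next
    return out


# The example tree from the slide.
leaf_keys = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
leaves = [Node(k) for k in leaf_keys]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([13, 17, 24, 30], children=leaves)

found_5 = search(root, 5)        # 5* is in the first leaf
found_15 = search(root, 15)      # the leaf [14* 16*] is reached, but 15* is absent
at_least_24 = range_from(root, 24)
```

An equality search reads one node per level; the range search pays that cost once and then reads only the leaves that contain qualifying entries.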

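Insert 8*
- Go to the correct leaf.
- Do recursively: if the node is not full, insert the entry; else split the node and copy/push up the middle key to the parent node.
Tree before the insertion (8* belongs in the first leaf, which is full):
  Root:   [13 | 17 | 24 | 30]
  Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]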

Insert 8* (continued): the full leaf is split and the middle key 5 is copied up to the parent, which now holds one key too many.
  Root:   [5 | 13 | 17 | 24 | 30]
  Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

Insert 8* (continued): the overfull root is split and the middle key 17 is pushed up into a new root.
  Root:        [17]
  Index nodes: [5 | 13] [24 | 30]
  Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Note the difference between copy up and push up.
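The difference between copy up and push up shows in how the two kinds of split treat the middle key. The helpers below are a sketch on bare key lists (child pointers and RIDs are left out, and the function names are made up): in a leaf split the middle key stays in the right leaf, because every data entry must remain at the leaf level; in an index split it moves up and appears in neither half.

```python
def split_leaf(entries):
    """Split an overfull leaf; the middle key is COPIED up and stays in the right leaf."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right[0], right


def split_index(keys):
    """Split an overfull index node; the middle key is PUSHED up and leaves both halves."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


# The two splits caused by inserting 8* into the example tree.
leaf_left, copied_up, leaf_right = split_leaf([2, 3, 5, 7, 8])
index_left, pushed_up, index_right = split_index([5, 13, 17, 24, 30])
```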

Delete 19* and 20*
- Go to the correct leaf and delete the entry.
- If the leaf is then less than half full, redistribute entries with the sibling; if the sibling doesn't have enough entries, merge with the sibling.
- Keep each page at least half full, except the root.
Tree before the deletions:
  Root:        [17]
  Index nodes: [5 | 13] [24 | 30]
  Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

19* and 20* deleted; now delete 24*
- Deleting 20* left the leaf [22*] less than half full, so entries were redistributed with its sibling [24* 27* 29*]. Note the copy up of the middle key 27.
- The same rule applies to the next deletion: redistribute with the sibling if possible, otherwise merge with it.
Tree after deleting 19* and 20*:
  Root:        [17]
  Index nodes: [5 | 13] [27 | 30]
  Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

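24* deleted: merge with sibling
After the leaves [22*] and [27* 29*] are merged (note the deletion of key 27 from the parent):
  Root:        [17]
  Index nodes: [5 | 13] [30]
  Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
A merge could cause re-distribution or a merge of ancestor nodes. Here the two index nodes are merged and the tree loses one level:
  Root:   [5 | 13 | 17 | 30]
  Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]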

Rethink the cost of accessing all the records with index key 24
- If the records are in many different pages: high cost.
- Clustered index: the real data records are stored in an order close to the order of the data entries in the index.
(Figure: the example B+ tree with root [13 | 17 | 24 | 30] and leaves 2* ... 39*.)

Hash-based index: use a hash function to look for the data entries
- The hash function h(key) outputs an integer; h(key) = (a * key + b) usually works well.
- A static hash index uses N primary pages (buckets 0, 1, ..., N-1).
- Data entries are stored at the page h(key) mod N.
- If a primary page is full, add an overflow page.
- Problem: too many overflow pages.
(Figure: key -> h -> h(key) mod N selects one of the primary bucket pages 0 to N-1, each with a chain of overflow pages.)
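A static hash index with overflow chains can be sketched as follows. The constants a = 31 and b = 7, the page capacity, and the class name are arbitrary choices for illustration; pages are modeled as Python lists.

```python
class StaticHashIndex:
    def __init__(self, num_buckets, page_capacity, a=31, b=7):
        self.n = num_buckets
        self.capacity = page_capacity
        self.a, self.b = a, b
        # Each bucket is a chain of pages: the primary page, then overflow pages.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        return (self.a * key + self.b) % self.n     # h(key) mod N

    def insert(self, key, rid):
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])                        # last page is full: add an overflow page
        chain[-1].append((key, rid))

    def lookup(self, key):
        # Returns the matching rids and the number of pages that had to be read.
        chain = self.buckets[self._bucket(key)]
        rids = [r for page in chain for k, r in page if k == key]
        return rids, len(chain)


idx = StaticHashIndex(num_buckets=4, page_capacity=2)
for k in range(0, 40, 4):         # ten keys that all hash to the same bucket
    idx.insert(k, ("page", k))
rids, pages_read = idx.lookup(8)
```

With this skewed key set a single lookup reads five pages instead of one, which is the overflow-chain problem named above.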

Increase the number of buckets when overflow occurs
- How about simply increasing the number of buckets of a static hash index? That requires reading and writing all the pages of the index!
- Can we split only the overflowed bucket instead of all of them?

Extendible hashing
- Use a directory with one entry for each bucket, which points to the primary page of the bucket.
- If a bucket overflows, split it into two.
- Double the directory if needed.

Insert 20*, h(key) = 20 (binary 10100)
Before (directory entries 00, 01, 10, 11):
  00 -> Bucket A: 4* 12* 32* 16*
  01 -> Bucket B: 1* 5* 21* 13*
  10 -> Bucket C: 10*
  11 -> Bucket D: 15* 7* 19*
After (Bucket A was full, so it is split and the directory is doubled to 000, ..., 111):
  000 -> Bucket A: 32* 16*
  001, 101 -> Bucket B: 1* 5* 21* 13*
  010, 110 -> Bucket C: 10*
  011, 111 -> Bucket D: 15* 7* 19*
  100 -> Bucket A2 ('split image' of Bucket A): 4* 12* 20*

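Insert 20*, h(key) = 20 (binary 10100): doubling the directory
(Figure: each directory entry 00, 01, 10, 11 is duplicated with a new leading bit, giving 0 00, 0 01, 0 10, 0 11, 1 00, 1 01, 1 10, 1 11. Bucket A (4* 12* 32* 16*) is full; its 'split image' Bucket A2 receives 4* 12* 20*. Bucket B (1* 5* 21* 13*), Bucket C (10*), and Bucket D (15* 7* 19*) are unchanged.)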
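The example can be reproduced with a compact sketch of extendible hashing. It assumes h(key) = key, uses the last global_depth bits of the hash as the directory index (as in the figure), and assumes distinct keys; the class names and the capacity of four entries per bucket are illustrative.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth: number of hash bits this bucket depends on
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _index(self, key):
        # Last global_depth bits of h(key), with h(key) = key.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.depth == self.global_depth:
            # Double the directory: entries i and i + 2^d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the overflowed bucket, on its next hash bit.
        bit = 1 << bucket.depth
        bucket.depth += 1
        image = Bucket(bucket.depth)               # the split image
        old, bucket.keys = bucket.keys, []
        for k in old:
            (image if k & bit else bucket).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image
        self.insert(key)                           # retry; may split again


h = ExtendibleHash(capacity=4)
for k in [4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19]:
    h.insert(k)
depth_before = h.global_depth     # directory 00, 01, 10, 11
h.insert(20)                      # Bucket A is full: split it and double the directory
```

After the last insert, directory entry 000 holds 32* and 16* and entry 100 holds 4*, 12*, and 20*, matching the figure. Only the overflowed bucket's entries were rehashed, and the other directory entries simply share their old buckets.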

Summary
- An index can speed up search by values.
- A B+ tree index is good for range search; it maintains balance on insert and delete.
- A hash index is good for equality search.
  - Static hashing suffers from long overflow chains.
  - Extendible hashing avoids long overflow chains by splitting the overflowed bucket, doubling the directory when needed.
  - Linear hashing avoids a directory by splitting buckets round-robin and using overflow pages.