Lecture 1: Data Storage & Index

Size: px
Start display at page:

Download "Lecture 1: Data Storage & Index"

Transcription

1 Lecture 1: Data Storage & Index R&G Chapter 8-11 Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager 1

2 Where are we? Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager 2

3 Magnetic Disk Read/write/transfer in blocks (pages) Courtesy to R. Burns 3

4 A real disk image from Seagate Technology Corporation Arm Platter Actuator Spindle 4

5 Data access in a disk Access time = seek time + rotational delay + transfer time 5

6 Disk space manager allocate or de-allocate pages in the disk Abstraction of pages Maintains free blocks Basic Interface: allocate_page, allocate one or more new free pages, remove them from the list of free pages. deallocate_page, de-allocate one or more pages, put them into the list of free pages. Read_page Write_page 6

7 Where are we? Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager 7

8 To avoid always reading/wrting pages from disk use the available memory as buffer pool Divided into frames which contains pages from the disk Buffer Pool Page read/write requests Note: data have to be in RAM for the DBMS to operate on them Disk 8

9 Page maintenance in a buffer pool (pin_count = 0) 1) Pin a frame when its page is requested (pin_count++) (pin_count = 1) 0) Unpin a frame when its page is released (pin_count--) A page is dirty if it has been modified but not updated on the disk yet 9

10 How to process a page request? No Already in a frame f i? Yes Increment the pin_count of f i and return f i Exist a non-used frame f j No Choose a frame f j for replacement Yes No Is f j dirty? critical to the performance Yes Read page p into f j and return f j Write the page in f j to disk 10

11 A page replacement policy determines which frame to be replaced General Rule: keep those pages that might be accessed soon in the future A frame is considered for replacement only if its pin_count == 0. LRU (Least Recently Used) policy: - Choose the one that hasn t been used for the longest time - Implemented as a queue of pages with pin_count == 0 Frame chosen for replacement LRU insert Frame whose pin_count just goes to 0 What is the assumption of LRU? remove Frame whose pin_count goes above 0 11

12 Clock policy approximates LRU Every frame is associated with a Reference Bit (R). - R is set to 1 when a frame s pin_count goes down to 0. L A B On replacement request: 1. Advance the pointer. 2. If R == 0 and pin_count==0, choose the frame. 3. Else if R == 1, set R to 0 and goes to step 1. J K I C E D Clock has a lower cost than LRU. (Why?) H G F 12

13 Where are we Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager 13

14 Data are abstracted as files of records for higher level DBMS components Relation (Table) Represented as File of Records Stored as How to keep track of - pages in a file? - free space in each page? - records in each page? Pages 14

15 Directory format: use a directory to indicate the data pages used by a file Header Page Data Page 1 Data Page 2 DIRECTORY Data Page N Free space within a data page can be indicated in the directory entry. Where to find the header page? System Catalog! 16

16 How are records organized within a page? Rid = (i,n) Page i Rid = (i,2) Rid = (i,1) FREE SPACE N N # slots SLOT DIRECTORY Pointer to start of free space How to identify a record? Record id (RID) = <Page id, slot id> 18

17 How the fields are organized in a record? Field 1 Field 2 Field 3 Field 4 Fields with fixed size: just store them contiguously Field1 $ Field2 $ Field3 $ Field4 $ Fields with variable size: use special characters to delimit each field. Field1 Field2 Field3 Field4 Again, directory! 19

18 Summary How do disks work? Disks read/write/transfer data in the unit of page. Data transfer is the dominant cost of data access. How to reduce disk I/O Keep pages that will be accessed in the future in the memory Replacement policies: LRU, Clock, MRU, and etc. How to organize the data in a disk? Abstracted as file of records Directory can be used to locate pages of a file in a disk locate records in a page locate fields in a record 20

19 Heap file abstraction enables retrieving records by their RID or scanning records sequentially Record id (RID) = <Page id, slot id> Pages Page Record sequential scan: Look for the header page of a file in the catalog Header Page DIRECTORY Data Page 1 Data Page 2 Data Page N Read each record in each page sequentially 21

20 What if we want to look up records by their values Example: o Find all students in IMADA o Find all students with a Scores > 10 Solution 1: sequential scan and check the values of each record. o need to read all the pages slow! Solution 2: organize the data in the file by their values: o sorted file (sorted on one field) o use binary search to speed up o How about searching by the value of another field? sequential search again! o High cost when data are updated! 22

21 Index is a data structure used to speeds up valuebased search of records conditions of the values on one or more fields input Index output the records or locations of the records satisfying the conditions An index contains a collection of data entries. And a data structure to search the data entries matching the search key. o Tree B+ Tree index both equality and range search o Hash table Hash index only equality search An index is stored as a File An index supports the search of one or more fields, which is called the search key of the index 23

22 Alternatives for Data Entry k* in Index Three alternatives: 1. Actual data record (with key value k) 2. <k, rid of matching data record> 3. <k, list of rids of matching data records> Choice is orthogonal to the indexing technique. Examples of indexing techniques: B+ trees, hashbased structures, R trees, Typically, index contains auxiliary information that directs searches to the desired data entries Can have multiple (different) indexes per file. E.g. file sorted by age, with a hash index on salary and a B+tree index on name. 24

23 Alternatives for Data Entries (Contd.) Alternative 1: Actual data record (with key value k) If this is used, index structure is a file organization for data records (like Heap files or sorted files). At most one index on a given collection of data records can use Alternative 1. This alternative saves pointer lookups but can be expensive to maintain with insertions and deletions. 25

24 Alternatives for Data Entries (Contd.) Alternative 2 <k, rid of matching data record> and Alternative 3 <k, list of rids of matching data records> Easier to maintain than Alt 1. If more than one index is required on a given file, at most one index can use Alternative 1; rest must use Alternatives 2 or 3. Alternative 3 more compact than Alternative 2, but leads to variable sized data entries even if search keys are of fixed length. Even worse, for large rid lists the data entry would have to span multiple blocks! 26

25 Index Classification Clustered vs. unclustered: If order of data records is the same as, or `close to, order of index data entries, then called clustered index. A file can be clustered on at most one search key. Cost of retrieving data records through index varies greatly based on whether index is clustered or not! Alternative 1 implies clustered, but not vice-versa. 27

26 Clustered vs. Unclustered Index Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file. To build clustered index, first sort the Heap file (with some free space on each block for future inserts). Overflow blocks may be needed for inserts. (Thus, order of data recs is `close to, but not identical to, the sort order.) CLUSTERED Index entries direct search for data entries UNCLUSTERED Data entries Data entries (Index File) (Data file) Data Records Data Records 28

27 Unclustered vs. Clustered Indexes What are the tradeoffs???? Clustered Pros Efficient for range searches Clustered Cons Expensive to maintain (on the fly or sloppy with reorganization)

28 B+ Tree is a balanced tree structure Each node in the tree occupies a page Entries in non-leave nodes à called index entries: <key value, page_id> Entries in leaf nodes à called data entries: <key value, RID> OR <key value, list of RID> OR <key value, data record> Root * 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Search for 5*, 15*, or all data entries >= 24* 32

29 Insert 8* Go to the correct leave Do recursively: If non-full then else insert the entry split and copy/push up the middle key to the parent node Root * 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 8* 33

30 Insert 8* Go to the correct leave Do recursively: If non-full then else insert the entry split and copy/push up the middle key to the parent node Root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 34

31 Insert 8* Go to the correct leave Do recursively: If non-full then else insert the entry split and copy/push up the middle key to the parent node Root 17 Note the difference between copy up and push up * 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 35

32 Delete 19* and 20* Go to the correct leave and delete the entry If not at least half full then redistribute with the sibling; if the sibling doesn t have enough entries then merge with the sibling; Root Keep each page at least half full except the root * 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 36

33 19* and 20* deleted now delete 24* Go to the correct leave and delete the entry If not at least half full then redistribute with the sibling; if the sibling doesn t have enough entries then merge with the sibling; Root note the copy up of middle key * 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* 37

34 24* deleted Merge with sibling Root 17 note the deletion of key * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* Merge could cause re-distribution or merge of ancestor nodes Root * 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* 38

35 Rethink the cost of accessing all the records with index key 24 If the records are in many different pages à high cost L Clustered Index: the real data records are stored in an order close to the order of data entries in the index. Root * 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 39

36 Hash-based Index use hash function to look for the data entries Hash function H(key) outputs an integer h(key) = (a * key + b) usually works well. Static hash index uses N primary pages Data entries are stored at the page H(key) mod N If a primary page is full, add an overflow page h(key) mod N key h 0 1 Problem: too many overflow pages N-1 Primary bucket pages Overflow pages 40

37 Increase the number of buckets when overflow occurs How about simply increase the number of buckets of a static hash index? requires read and write all the pages of the index! Can we only split the overflowed bucket instead of all of them? 41

38 Extendible hashing Use an directory one entry for each bucket, which points to the primary page of the bucket If a bucket is overflowed split it into two double the directory if needed 42

39 insert h(key) = 20 (10100) 4*" 12*"32*"16*" Bucket A" 32*"16*" Bucket A" 00" 01" 10" 11" 1*" 5*" 21*"13*" Bucket B" 10*" Bucket C" 000" 001" 010" 011" 1*" 5*" 21*"13*" 10*" Bucket B" Bucket C" 100" 15*"7*" 19*" Bucket D" 101" 110" 15*"7*" 19*" Bucket D" 4*" 12*" 20*" Bucket A2" (`split image'" of Bucket A)" 111" 4*" 12*" 20*" '" Bucket A2"

40 insert h(key) = 20 (10100) 4*" 12*"32*"16*" Bucket A" 0" 00" 0" 01" 0" 10" 0" 11" 1" 00" 1" 01" 1" 10" 1" 11" 1*" 5*" 21*"13*" Bucket B" 10*" 15*"7*" 19*" 4*" 12*" 20*" Bucket C" Bucket D" Bucket A2" (`split image'" of Bucket A)"

41 Summary Index can speed up search by values B+ Tree index is good for range search maintain balance on insert/delete Hash index is good for equality search Static hashing suffers from long overflow chains Extendible hashing avoids bucket overflow by doubling the directory Linear hashing avoids directory by splitting buckets round-robin, and using overflow pages. 48

Storage in Database Systems. CMPSCI 445 Fall 2010

Storage in Database Systems. CMPSCI 445 Fall 2010 Storage in Database Systems CMPSCI 445 Fall 2010 1 Storage Topics Architecture and Overview Disks Buffer management Files of records 2 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query

More information

Overview of Storage and Indexing

Overview of Storage and Indexing Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan

More information

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7 Storing : Disks and Files Chapter 7 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet base Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Disks and

More information

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1 Slide 13-1 Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible

More information

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible Hashing

More information

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8 Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan

More information

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Chapter 13 Disk Storage, Basic File Structures, and Hashing. Chapter 13 Disk Storage, Basic File Structures, and Hashing. Copyright 2004 Pearson Education, Inc. Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files

More information

Physical Data Organization

Physical Data Organization Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor

More information

6. Storage and File Structures

6. Storage and File Structures ECS-165A WQ 11 110 6. Storage and File Structures Goals Understand the basic concepts underlying different storage media, buffer management, files structures, and organization of records in files. Contents

More information

Chapter 8: Structures for Files. Truong Quynh Chi tqchi@cse.hcmut.edu.vn. Spring- 2013

Chapter 8: Structures for Files. Truong Quynh Chi tqchi@cse.hcmut.edu.vn. Spring- 2013 Chapter 8: Data Storage, Indexing Structures for Files Truong Quynh Chi tqchi@cse.hcmut.edu.vn Spring- 2013 Overview of Database Design Process 2 Outline Data Storage Disk Storage Devices Files of Records

More information

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium. Chapter 4: Record Storage and Primary File Organization 1 Record Storage and Primary File Organization INTRODUCTION The collection of data that makes up a computerized database must be stored physically

More information

Storage and File Structure

Storage and File Structure Storage and File Structure Chapter 10: Storage and File Structure Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files

More information

Record Storage and Primary File Organization

Record Storage and Primary File Organization Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records

More information

DATABASE DESIGN - 1DL400

DATABASE DESIGN - 1DL400 DATABASE DESIGN - 1DL400 Spring 2015 A course on modern database systems!! http://www.it.uu.se/research/group/udbl/kurser/dbii_vt15/ Kjell Orsborn! Uppsala Database Laboratory! Department of Information

More information

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers B+ Tree and Hashing B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers B+ Tree Properties Balanced Tree Same height for paths

More information

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Databases and Information Systems 1 Part 3: Storage Structures and Indices bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: - database buffer -

More information

Storing Data: Disks and Files

Storing Data: Disks and Files Storing Data: Disks and Files (From Chapter 9 of textbook) Storing and Retrieving Data Database Management Systems need to: Store large volumes of data Store data reliably (so that data is not lost!) Retrieve

More information

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13 External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing

More information

The Classical Architecture. Storage 1 / 36

The Classical Architecture. Storage 1 / 36 1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage

More information

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing

More information

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING Chapter 13: Disk Storage, Basic File Structures, and Hashing 1 CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING Answers to Selected Exercises 13.23 Consider a disk with the following characteristics

More information

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n)

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n) Hash Tables Tables so far set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n) Worst O(lg n) O(lg n) O(lg n) Table naïve array implementation

More information

Big Data and Scripting. Part 4: Memory Hierarchies

Big Data and Scripting. Part 4: Memory Hierarchies 1, Big Data and Scripting Part 4: Memory Hierarchies 2, Model and Definitions memory size: M machine words total storage (on disk) of N elements (N is very large) disk size unlimited (for our considerations)

More information

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University

Operating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University Operating Systems CSE 410, Spring 2004 File Management Stephen Wagner Michigan State University File Management File management system has traditionally been considered part of the operating system. Applications

More information

Data Warehousing und Data Mining

Data Warehousing und Data Mining Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

More information

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques Database Systems Session 8 Main Theme Physical Database Design, Query Execution Concepts and Database Programming Techniques Dr. Jean-Claude Franchitti New York University Computer Science Department Courant

More information

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C Tutorial#1 Q 1:- Explain the terms data, elementary item, entity, primary key, domain, attribute and information? Also give examples in support of your answer? Q 2:- What is a Data Type? Differentiate

More information

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 14: DATA STORAGE AND REPRESENTATION Data Storage Memory Hierarchy Disks Fields, Records, Blocks Variable-length

More information

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Query Processing C H A P T E R12. Practice Exercises

Query Processing C H A P T E R12. Practice Exercises C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass

More information

DATABASDESIGN FÖR INGENJÖRER - 1DL124

DATABASDESIGN FÖR INGENJÖRER - 1DL124 1 DATABASDESIGN FÖR INGENJÖRER - 1DL124 Sommar 2005 En introduktionskurs i databassystem http://user.it.uu.se/~udbl/dbt-sommar05/ alt. http://www.it.uu.se/edu/course/homepage/dbdesign/st05/ Kjell Orsborn

More information

File Management. Chapter 12

File Management. Chapter 12 Chapter 12 File Management File is the basic element of most of the applications, since the input to an application, as well as its output, is usually a file. They also typically outlive the execution

More information

CIS 631 Database Management Systems Sample Final Exam

CIS 631 Database Management Systems Sample Final Exam CIS 631 Database Management Systems Sample Final Exam 1. (25 points) Match the items from the left column with those in the right and place the letters in the empty slots. k 1. Single-level index files

More information

Principles of Database Management Systems. Overview. Principles of Data Layout. Topic for today. "Executive Summary": here.

Principles of Database Management Systems. Overview. Principles of Data Layout. Topic for today. Executive Summary: here. Topic for today Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) How to represent data on disk

More information

File Management Chapters 10, 11, 12

File Management Chapters 10, 11, 12 File Management Chapters 10, 11, 12 Requirements For long-term storage: possible to store large amount of info. info must survive termination of processes multiple processes must be able to access concurrently

More information

Data storage Tree indexes

Data storage Tree indexes Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating

More information

Chapter 10: Storage and File Structure

Chapter 10: Storage and File Structure Chapter 10: Storage and File Structure Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture I: Storage Storage Part I of this course Uni Freiburg, WS 2014/15 Systems Infrastructure for Data Science 3 The

More information

Review of Hashing: Integer Keys

Review of Hashing: Integer Keys CSE 326 Lecture 13: Much ado about Hashing Today s munchies to munch on: Review of Hashing Collision Resolution by: Separate Chaining Open Addressing $ Linear/Quadratic Probing $ Double Hashing Rehashing

More information

DATA STRUCTURES USING C

DATA STRUCTURES USING C DATA STRUCTURES USING C QUESTION BANK UNIT I 1. Define data. 2. Define Entity. 3. Define information. 4. Define Array. 5. Define data structure. 6. Give any two applications of data structures. 7. Give

More information

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

More information

1 File Management. 1.1 Naming. COMP 242 Class Notes Section 6: File Management

1 File Management. 1.1 Naming. COMP 242 Class Notes Section 6: File Management COMP 242 Class Notes Section 6: File Management 1 File Management We shall now examine how an operating system provides file management. We shall define a file to be a collection of permanent data with

More information

COS 318: Operating Systems. File Layout and Directories. Topics. File System Components. Steps to Open A File

COS 318: Operating Systems. File Layout and Directories. Topics. File System Components. Steps to Open A File Topics COS 318: Operating Systems File Layout and Directories File system structure Disk allocation and i-nodes Directory and link implementations Physical layout for performance 2 File System Components

More information

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao CMPSCI 445 Midterm Practice Questions NAME: LOGIN: Write all of your answers directly on this paper. Be sure to clearly

More information

Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3

Unit 4.3 - Storage Structures 1. Storage Structures. Unit 4.3 Storage Structures Unit 4.3 Unit 4.3 - Storage Structures 1 The Physical Store Storage Capacity Medium Transfer Rate Seek Time Main Memory 800 MB/s 500 MB Instant Hard Drive 10 MB/s 120 GB 10 ms CD-ROM

More information

Database 2 Lecture I. Alessandro Artale

Database 2 Lecture I. Alessandro Artale Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (1) Database 2 Lecture I Alessandro Artale Faculty of Computer Science Free University of Bolzano Room: 221 artale@inf.unibz.it http://www.inf.unibz.it/

More information

The Database is Slow

The Database is Slow The Database is Slow SQL Server Performance Tuning Starter Kit Calgary PASS Chapter, 19 August 2015 Randolph West, Born SQL Email: r@ndolph.ca Twitter: @rabryst Basic Internals Data File Transaction Log

More information

File System Management

File System Management Lecture 7: Storage Management File System Management Contents Non volatile memory Tape, HDD, SSD Files & File System Interface Directories & their Organization File System Implementation Disk Space Allocation

More information

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2) Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse

More information

FAWN - a Fast Array of Wimpy Nodes

FAWN - a Fast Array of Wimpy Nodes University of Warsaw January 12, 2011 Outline Introduction 1 Introduction 2 3 4 5 Key issues Introduction Growing CPU vs. I/O gap Contemporary systems must serve millions of users Electricity consumed

More information

Lecture 2 February 12, 2003

Lecture 2 February 12, 2003 6.897: Advanced Data Structures Spring 003 Prof. Erik Demaine Lecture February, 003 Scribe: Jeff Lindy Overview In the last lecture we considered the successor problem for a bounded universe of size u.

More information

COS 318: Operating Systems

COS 318: Operating Systems COS 318: Operating Systems File Performance and Reliability Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Topics File buffer cache

More information

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ Answer the following 1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ 2) Which data structure is needed to convert infix notations to postfix notations? Stack 3) The

More information

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems 1 Any modern computer system will incorporate (at least) two levels of storage: primary storage: typical capacity cost per MB $3. typical access time burst transfer rate?? secondary storage: typical capacity

More information

& Data Processing 2. Exercise 2: File Systems. Dipl.-Ing. Bogdan Marin. Universität Duisburg-Essen

& Data Processing 2. Exercise 2: File Systems. Dipl.-Ing. Bogdan Marin. Universität Duisburg-Essen Folie a: Name & Data Processing 2 2: File Systems Dipl.-Ing. Bogdan Marin Fakultät für Ingenieurwissenschaften Abteilung Elektro-und Informationstechnik -Technische Informatik- Objectives File System Concept

More information

Database Management Systems

Database Management Systems 4411 Database Management Systems Acknowledgements and copyrights: these slides are a result of combination of notes and slides with contributions from: Michael Kiffer, Arthur Bernstein, Philip Lewis, Anestis

More information

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs. Phases of database design Application requirements Conceptual design Database Management Systems Conceptual schema Logical design ER or UML Physical Design Relational tables Logical schema Physical design

More information

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS Chapter 1 File Organization 1.0 Objectives 1.1 Introduction 1.2 Storage Devices Characteristics 1.3 File Organization 1.3.1 Sequential Files 1.3.2 Indexing and Methods of Indexing 1.3.3 Hash Files 1.4

More information

Raima Database Manager Version 14.0 In-memory Database Engine

Raima Database Manager Version 14.0 In-memory Database Engine + Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized

More information

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (http://www.cs.princeton.edu/courses/cos318/)

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (http://www.cs.princeton.edu/courses/cos318/) COS 318: Operating Systems Storage Devices Kai Li Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Today s Topics Magnetic disks Magnetic disk performance

More information

Bigdata High Availability (HA) Architecture

Bigdata High Availability (HA) Architecture Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources

More information

Chapter 12 File Management

Chapter 12 File Management Operating Systems: Internals and Design Principles Chapter 12 File Management Eighth Edition By William Stallings Files Data collections created by users The File System is one of the most important parts

More information

Project Group High- performance Flexible File System 2010 / 2011

Project Group High- performance Flexible File System 2010 / 2011 Project Group High- performance Flexible File System 2010 / 2011 Lecture 1 File Systems André Brinkmann Task Use disk drives to store huge amounts of data Files as logical resources A file can contain

More information

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases

More information

University of Dublin Trinity College. Storage Hardware. Owen.Conlan@cs.tcd.ie

University of Dublin Trinity College. Storage Hardware. Owen.Conlan@cs.tcd.ie University of Dublin Trinity College Storage Hardware Owen.Conlan@cs.tcd.ie Hardware Issues Hard Disk/SSD CPU Cache Main Memory CD ROM/RW DVD ROM/RW Tapes Primary Storage Floppy Disk/ Memory Stick Secondary

More information

CSE 326: Data Structures B-Trees and B+ Trees

CSE 326: Data Structures B-Trees and B+ Trees Announcements (4//08) CSE 26: Data Structures B-Trees and B+ Trees Brian Curless Spring 2008 Midterm on Friday Special office hour: 4:-5: Thursday in Jaech Gallery (6 th floor of CSE building) This is

More information

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m. Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we

More information

File Management. Chapter 12

File Management. Chapter 12 File Management Chapter 12 File Management File management system is considered part of the operating system Input to applications is by means of a file Output is saved in a file for long-term storage

More information

SMALL INDEX LARGE INDEX (SILT)

SMALL INDEX LARGE INDEX (SILT) Wayne State University ECE 7650: Scalable and Secure Internet Services and Architecture SMALL INDEX LARGE INDEX (SILT) A Memory Efficient High Performance Key Value Store QA REPORT Instructor: Dr. Song

More information

Chapter 12 File Management

Chapter 12 File Management Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Roadmap Overview File organisation and Access

More information

Chapter 12 File Management. Roadmap

Chapter 12 File Management. Roadmap Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Overview Roadmap File organisation and Access

More information

Data Structures for Databases

Data Structures for Databases 60 Data Structures for Databases Joachim Hammer University of Florida Markus Schneider University of Florida 60.1 Overview of the Functionality of a Database Management System..................................

More information

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University COS 318: Operating Systems Storage Devices Kai Li and Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall13/cos318/ Today s Topics! Magnetic disks!

More information

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level) Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster

More information

Binary Heap Algorithms

Binary Heap Algorithms CS Data Structures and Algorithms Lecture Slides Wednesday, April 5, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks CHAPPELLG@member.ams.org 2005 2009 Glenn G. Chappell

More information

File Systems Management and Examples

File Systems Management and Examples File Systems Management and Examples Today! Efficiency, performance, recovery! Examples Next! Distributed systems Disk space management! Once decided to store a file as sequence of blocks What s the size

More information

10CS35: Data Structures Using C

10CS35: Data Structures Using C CS35: Data Structures Using C QUESTION BANK REVIEW OF STRUCTURES AND POINTERS, INTRODUCTION TO SPECIAL FEATURES OF C OBJECTIVE: Learn : Usage of structures, unions - a conventional tool for handling a

More information

Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall 2010 1

Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall 2010 1 Concurrency Control Chapter 17 Comp 521 Files and Databases Fall 2010 1 Conflict Serializable Schedules Recall conflicts (WR, RW, WW) were the cause of sequential inconsistency Two schedules are conflict

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

Vector storage and access; algorithms in GIS. This is lecture 6

Vector storage and access; algorithms in GIS. This is lecture 6 Vector storage and access; algorithms in GIS This is lecture 6 Vector data storage and access Vectors are built from points, line and areas. (x,y) Surface: (x,y,z) Vector data access Access to vector

More information

Physical DB design and tuning: outline

Physical DB design and tuning: outline Physical DB design and tuning: outline Designing the Physical Database Schema Tables, indexes, logical schema Database Tuning Index Tuning Query Tuning Transaction Tuning Logical Schema Tuning DBMS Tuning

More information

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System CS341: Operating System Lect 36: 1 st Nov 2014 Dr. A. Sahu Dept of Comp. Sc. & Engg. Indian Institute of Technology Guwahati File System & Device Drive Mass Storage Disk Structure Disk Arm Scheduling RAID

More information

Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

More information

Architecture and Implementation of Database Management Systems

Architecture and Implementation of Database Management Systems Architecture and Implementation of Database Management Systems Prof. Dr. Marc H. Scholl Summer 2004 University of Konstanz, Dept. of Computer & Information Science www.inf.uni-konstanz.de/dbis/teaching/ss04/architektur-von-dbms

More information

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation Dynamic Storage Allocation CS 44 Operating Systems Fall 5 Presented By Vibha Prasad Memory Allocation Static Allocation (fixed in size) Sometimes we create data structures that are fixed and don t need

More information

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division Answer Key UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division CS186 Fall 2003 Eben Haber Midterm Midterm Exam: Introduction to Database Systems This exam has

More information

Binary Heaps. CSE 373 Data Structures

Binary Heaps. CSE 373 Data Structures Binary Heaps CSE Data Structures Readings Chapter Section. Binary Heaps BST implementation of a Priority Queue Worst case (degenerate tree) FindMin, DeleteMin and Insert (k) are all O(n) Best case (completely

More information

Linux Driver Devices. Why, When, Which, How?

Linux Driver Devices. Why, When, Which, How? Bertrand Mermet Sylvain Ract Linux Driver Devices. Why, When, Which, How? Since its creation in the early 1990 s Linux has been installed on millions of computers or embedded systems. These systems may

More information

SQL Query Evaluation. Winter 2006-2007 Lecture 23

SQL Query Evaluation. Winter 2006-2007 Lecture 23 SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution

More information

2) What is the structure of an organization? Explain how IT support at different organizational levels.

2) What is the structure of an organization? Explain how IT support at different organizational levels. (PGDIT 01) Paper - I : BASICS OF INFORMATION TECHNOLOGY 1) What is an information technology? Why you need to know about IT. 2) What is the structure of an organization? Explain how IT support at different

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems Chapter 13 File and Database Systems Outline 13.1 Introduction 13.2 Data Hierarchy 13.3 Files 13.4 File Systems 13.4.1 Directories 13.4. Metadata 13.4. Mounting 13.5 File Organization 13.6 File Allocation

More information

Buffer Management 5. Buffer Management

Buffer Management 5. Buffer Management 5 Buffer Management Copyright 2004, Binnur Kurt A journey of a byte Buffer Management Content 156 A journey of a byte Suppose in our program we wrote: outfile

More information

Normalisation and Data Storage Devices

Normalisation and Data Storage Devices Unit 4 Normalisation and Data Storage Devices Structure 4.1 Introduction 4.2 Functional Dependency 4.3 Normalisation 4.3.1 Why do we Normalize a Relation? 4.3.2 Second Normal Form Relation 4.3.3 Third

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 11 Memory Management Computer Architecture Part 11 page 1 of 44 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin

More information

Lecture 16: Storage Devices

Lecture 16: Storage Devices CS 422/522 Design & Implementation of Operating Systems Lecture 16: Storage Devices Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions of

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information