Indexing. The Problem

Indexing

Topics: The Problem. Terminology. The Design Space. Fixed and Variable Length Records. B-trees. Hashing.

Learning Objectives:
- Describe the problem we face in creating keys atop a conventional disk-based storage system.
- Design a scheme to implement records on top of files.
- Explain what B-trees are.
- Distinguish the different B-tree variants and explain the trade-offs among them.
- Explain hashing as applied to database indexes.

2/2/10

The Problem
- We want to store key/data pairs (aka records).
- Today's file systems store files.
- Old IBM file systems were record-based: the operating system/file system knew about record formats, keys, etc. In fact, some disks knew about keyed structure and lookup. Take CS261 to learn more about this.
- We need to store records in files flexibly and efficiently:
  - Efficient in terms of space.
  - Efficient in terms of look-up time.
  - Flexible in terms of record size (fixed and variable).
  - Flexible in terms of the number of indices, types of indices, etc.

Isn't this a solved problem?
- There are lots of data structures, like trees, with gazillions of algorithms that operate on them efficiently. Why don't these algorithms and data structures translate directly into on-disk structures?
- Pointers work nicely in main memory; how do you represent pointers on disk?
- Data structures can be arbitrarily sized, but disk blocks are fixed size (and are larger than many objects).
- Files typically grow only at the end; they don't support inserting into the middle.

Storage Technology
- There are a variety of media on which to store data. Each type has a different set of trade-offs, and the trade-offs change over time.
- It is important to understand the principles and roles rather than particular implementations.
- What would an ideal storage technology look like?
  - Infinitely fast: adds no latency beyond the cost of the processor operations.
  - 100% reliable.
  - 100% available.
  - Infinitely large (or as large as your largest data set).
  - Cheap.

Sample Storage Technologies

Technology          Speed      Stability  Writable  Cost/bit     Random access
RAM (main memory)   Very fast  Poor       Yes       High         Yes
Flash               Fast       OK         Yes       Medium/High  Yes
Disk                Slow       OK         Yes       Mid          Yes (but slower)
Tape                Slow       Good       Yes       Low          No

Abstracting the Disk
- The disk interface is clunky: read and write one or more sectors, where a sector is traditionally 512 bytes. This is not a terribly elegant interface for implementing key/data pairs.
- Operating systems provide files. Are they much better?
  - Byte-stream interface: position a pointer and read/write some number of bytes to/from that location.
  - It is still not terribly easy to do things like keys.
- Problems:
  - There is no structure within disk blocks or byte streams.
  - The unit of transfer between disk and file system is a page (1K, 4K, 1M).
  - Placing one object per file breaks nearly all file systems.
  - If you put multiple objects into a file, you still need a way to locate them.

Terminology
- Interchangeable terms: key/data pairs, records, tuples.
- Primary index: the index sort order corresponds to the data layout.
- Secondary index: the sort order is independent of the layout.
- Internal index: keys and data are both stored in the index.
- External index: the index holds keys and references to the data; the data is stored elsewhere.
- Meta-data: data that describes the data or its structure, not user data.

The Design Space
- Fixed versus variable length records.
  - Fixed-length records are easier and faster.
  - Most data are not fixed length.
  - Fixed-length records usually waste space: you have to allocate enough space for the largest objects. (If you don't allocate enough space, what do you do when you get a big object?)
- Internal or external indices.
  - External indices separate indexing from the data.
  - Internal indices make consistency easier.
  - Internal indices provide clustering.

B-tree versus Hash Indexing
- High-order difference: ordered versus unordered.
- In theory, hash indices require fewer disk accesses. In practice, this is often not the case:
  - Practically all database systems maintain a cache, which should be large enough to hold all the internal nodes of a B-tree. If it is, visiting internal nodes costs memory accesses, not disk accesses.
  - Items in hash buckets are not necessarily sorted, so if there are many items per bucket, locating an item within a bucket can be expensive. Because the keys within a B-tree page are kept in order, searching a page can be faster.
  - Accesses are rarely random, so B-tree clustering is often a win.
- When are hash tables a big win?
  - The database is huge (the internal pages do not fit in memory).
  - Access is really random.

Implementation: Fixed Length Records (Naïve Approach)
- Allocate records contiguously, one right after the other.
- Let s be the record size. To access record n, compute s * n, seek to that offset, and read s bytes.

[Figure: Records 0 through 4 stored back to back, each s bytes long; record n begins at offset s * n.]
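
The seek-and-read arithmetic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 16-byte record layout (a 4-byte integer key plus a 12-byte name) is hypothetical, and an in-memory io.BytesIO stands in for a file opened in binary mode.

```python
import io
import struct

# Hypothetical record layout: a 4-byte little-endian int key followed by a
# 12-byte NUL-padded name, so every record is exactly s bytes long.
RECORD_FORMAT = "<i12s"
s = struct.calcsize(RECORD_FORMAT)  # record size in bytes: 16


def write_record(f, n, key, name):
    """Write record n at byte offset s * n."""
    f.seek(s * n)
    f.write(struct.pack(RECORD_FORMAT, key, name.encode()))


def read_record(f, n):
    """Read record n: seek to offset s * n and read exactly s bytes."""
    f.seek(s * n)
    key, raw = struct.unpack(RECORD_FORMAT, f.read(s))
    return key, raw.rstrip(b"\0").decode()


f = io.BytesIO()  # an in-memory stand-in for a file opened in binary mode
for n, name in enumerate(["cat", "dog", "mouse", "emu", "koala"]):
    write_record(f, n, 100 + n, name)

print(read_record(f, 3))  # (103, 'emu')
```

Nothing in this sketch looks at page boundaries. With s = 16 the records happen to divide a 4096-byte page evenly, but with s = 24, record 170 would start at byte 4080 and run past the end of the first page, which is the weakness the next slide points out.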

Naïve Approach: Problems
- Obvious, simple, and wrong: records are laid out without regard to disk block boundaries, so a record can span two disk blocks.

[Figure: Records 0 through 4 laid out contiguously, with a disk block boundary falling in the middle of a record.]

- Other problems? How do you delete a record? How do you add a record between records 2 and 3?

Fixed Length Records: Take 2
- Add meta-data: let's add 1 bit to each record that indicates whether the record is present or not present.

[Figure: Records 0 through 4, each carrying a presence bit.]

- Pros: can delete records; can reuse space.
- Cons: difficult to find free space; records still span pages.
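
A small sketch of Take 2 makes both the benefit and the cost visible. It assumes a hypothetical 5-byte record (a 1-byte presence flag plus a 4-byte integer key) and uses a bytearray in place of the file; the class and method names are illustrative only.

```python
import struct

# Hypothetical layout: a 1-byte presence flag followed by a 4-byte int key.
RECORD_FORMAT = "<?i"
s = struct.calcsize(RECORD_FORMAT)  # 5 bytes per record


class FlaggedFile:
    """Fixed-length records, each carrying a present/not-present flag."""

    def __init__(self):
        self.data = bytearray()  # stands in for the file's bytes

    def _nslots(self):
        return len(self.data) // s

    def insert(self, key):
        # Finding free space means scanning every record for a cleared flag.
        for n in range(self._nslots()):
            present, _ = struct.unpack_from(RECORD_FORMAT, self.data, n * s)
            if not present:
                struct.pack_into(RECORD_FORMAT, self.data, n * s, True, key)
                return n
        self.data += struct.pack(RECORD_FORMAT, True, key)  # no hole found: append
        return self._nslots() - 1

    def delete(self, n):
        # Deleting just clears the flag; the slot can be reused later.
        struct.pack_into(RECORD_FORMAT, self.data, n * s, False, 0)

    def get(self, n):
        present, key = struct.unpack_from(RECORD_FORMAT, self.data, n * s)
        return key if present else None


f = FlaggedFile()
for key in (10, 20, 30):
    f.insert(key)
f.delete(1)
print(f.get(1))      # None
print(f.insert(40))  # 1: the deleted slot is reused
```

The linear scan in insert is the "difficult to find free space" drawback: every insert may have to read the whole file before it finds a hole.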

Fixed Length Records: Take 3
- Add more meta-data: a header page.
- The header points to the beginning of a free list; each deleted record points to the next deleted record.

[Figure: A header page followed by Records 0 through 4, with the deleted records chained into a free list.]

- Pros: free space can now be found easily.
- Cons: records still span pages.

Fixed Length Records: Take 4
- Put records on pages explicitly.
- Calculate how many records fit on a page; call this f (the fill factor).
- To find record n, compute the page P = n / f (integer division) and the slot within that page O = n % f.
- Keep the free space management from the previous approach.

[Figure: A header page followed by pages holding Records 0 through 3.]

- Pros: records no longer span pages.
- Cons: the space left over at the end of each page is wasted; this is called internal fragmentation.
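
Takes 3 and 4 combine into a short sketch: page/slot arithmetic for locating a record, plus a free list threaded through deleted records. The 4096-byte page and 100-byte record sizes are assumptions, page numbers here count data pages only (the header page is not included), and a Python dict stands in for the "next deleted record" pointer stored inside each deleted record.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
RECORD_SIZE = 100  # assumed fixed record size in bytes
NIL = -1           # end-of-free-list marker

f = PAGE_SIZE // RECORD_SIZE          # records per page: 40
wasted = PAGE_SIZE - f * RECORD_SIZE  # internal fragmentation: 96 bytes per page


def locate(n):
    """Map record number n to (data page, byte offset within that page).
    Data pages are numbered from 0; the header page is not counted."""
    page, slot = divmod(n, f)         # P = n / f, O = n % f
    return page, slot * RECORD_SIZE


class FreeList:
    """The header holds the first free slot, and each deleted record
    holds the number of the next deleted record."""

    def __init__(self):
        self.head = NIL   # lives in the header page
        self.link = {}    # stands in for the pointer written into each deleted record
        self.count = 0    # record slots allocated so far

    def delete(self, n):
        self.link[n] = self.head
        self.head = n

    def allocate(self):
        if self.head == NIL:          # no holes: extend the file
            self.count += 1
            return self.count - 1
        n = self.head
        self.head = self.link.pop(n)
        return n


print(locate(85))  # (2, 500)
fl = FreeList()
for _ in range(5):
    fl.allocate()  # slots 0..4
fl.delete(1)
fl.delete(3)
print(fl.allocate(), fl.allocate(), fl.allocate())  # 3 1 5
```

Because f rounds down, no record ever crosses a page boundary; the price is the 96 bytes left over at the end of every page under these assumed sizes.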

Variable Length Records
- Now assume records have different lengths.
- Problem: you can no longer compute a fill factor.
- You must have some kind of meta-data: a per-record length, an end-of-record delimiter, or a directory.
- Record lengths or delimiters:
  - Abandon record numbers and assume records are identified by a page number and offset.
  - Before each record, include its length, OR
  - After each record, place a special symbol.

Lengths and Delimiters (1)

[Figure: Records P/0, P/1, and P/2 on page P, laid out once using record lengths and once using delimiters.]
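
Both encodings fit in a few lines of Python. This sketch assumes a 2-byte little-endian length prefix and the ASCII record-separator byte 0x1E as the delimiter; both choices are illustrative, not prescribed by the slides.

```python
import struct

DELIM = b"\x1e"  # assumed delimiter (ASCII record separator); must not occur in any record


def pack_with_lengths(records):
    """Precede each record with its length as a 2-byte little-endian integer."""
    return b"".join(struct.pack("<H", len(r)) + r for r in records)


def unpack_with_lengths(page):
    records, pos = [], 0
    while pos < len(page):
        (length,) = struct.unpack_from("<H", page, pos)  # read the length...
        pos += 2
        records.append(page[pos:pos + length])           # ...then take exactly that many bytes
        pos += length
    return records


def pack_with_delimiters(records):
    """Follow each record with the delimiter byte."""
    if any(DELIM in r for r in records):
        raise ValueError("a record contains the delimiter byte")
    return b"".join(r + DELIM for r in records)


def unpack_with_delimiters(page):
    # split() leaves an empty piece after the final delimiter; drop it.
    return page.split(DELIM)[:-1]


records = [b"cat", b"hippopotamus", b"emu"]
print(pack_with_lengths(records))
print(pack_with_delimiters(records))
```

A length prefix lets a reader jump forward over a record without examining its bytes, but gives no way to step backward; a delimiter can be found by scanning in either direction, but must never appear inside the data.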

Lengths and Delimiters (2)
- Pros/Cons?
  - With lengths, how do you read backwards within a page?
  - How do you delete? How do you insert?
- Add the meta-data back in (from fixed-length records): add a deleted bit, add a header and free list, and chain empty records together.
- Why is this harder than the fixed-length case? You are performing dynamic memory allocation:
  - Pick an algorithm: first fit, best fit, etc.
  - You may have to coalesce space.

Variable Length: Take 2
- Use the page/offset identification scheme.
- Place a directory at the top of each page; the directory points to where records begin.
- Grow the directory from the top; allocate record space from the bottom.
- The header can be expanded to include things like deleted bits.

[Figure: A page whose directory at the top holds offsets off 1, off 2, ..., off f, pointing to records 1 through f stored from the bottom of the page upward.]
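
Take 2 is commonly known as a slotted page. The sketch below assumes a deliberately tiny 256-byte page, 2-byte header fields and directory entries, and a 2-byte length stored in front of each record; all of these sizes are assumptions, and deletion and compaction are left out.

```python
import struct

PAGE_SIZE = 256  # deliberately tiny page (assumed) to keep the example small
U16 = "<H"       # every header field, directory entry, and length is 2 bytes
HDR_SIZE = 2     # the header stores the number of directory slots
SLOT_SIZE = 2    # each directory slot stores a record's offset within the page
LEN_SIZE = 2     # each record is preceded by its length (an assumption of this sketch)


class SlottedPage:
    """The directory grows down from the top; record bytes grow up from the bottom."""

    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        self.nslots = 0
        self.free_end = PAGE_SIZE  # records occupy buf[free_end:]

    def _dir_end(self):
        return HDR_SIZE + self.nslots * SLOT_SIZE

    def insert(self, record: bytes) -> int:
        need = LEN_SIZE + len(record)
        if self._dir_end() + SLOT_SIZE + need > self.free_end:
            raise ValueError("page full")
        self.free_end -= need  # allocate record space from the bottom
        struct.pack_into(U16, self.buf, self.free_end, len(record))
        start = self.free_end + LEN_SIZE
        self.buf[start:start + len(record)] = record
        struct.pack_into(U16, self.buf, self._dir_end(), self.free_end)  # new slot at the top
        self.nslots += 1
        struct.pack_into(U16, self.buf, 0, self.nslots)
        return self.nslots - 1  # the record id is (page number, slot number)

    def get(self, slot: int) -> bytes:
        (off,) = struct.unpack_from(U16, self.buf, HDR_SIZE + slot * SLOT_SIZE)
        (length,) = struct.unpack_from(U16, self.buf, off)
        return bytes(self.buf[off + LEN_SIZE:off + LEN_SIZE + length])


page = SlottedPage()
a = page.insert(b"record 1")
b = page.insert(b"rec 2")
print(page.get(a), page.get(b), page.free_end)  # b'record 1' b'rec 2' 239
```

Because callers hold (page, slot) rather than a raw byte offset, a record can later be moved within its page, for example when compacting free space, by rewriting only its directory entry.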

A note about objects
- How do objects fit into this? Objects are just variable-sized records.
- Objects may contain references to other objects. These references must be translated between their persistent form and their memory representation; this translation is called swizzling.
  - On disk, a reference might use the page/offset record number.
  - In memory, you probably want an actual pointer.
- Some objects are large (bigger than a page). The previous designs assumed that objects fit on a single page; life gets more complicated when this is not the case.

B-trees: Balanced Trees
- B-trees were designed to balance the time taken to retrieve a page from disk against the time taken to search within a page.
- We will build a tree from nodes, where nodes correspond to disk pages (a few KB).
- Each (internal) node stores N keys and N + 1 pointers to other nodes.
- On leaves, keys can be paired with their data (internal index) or with record numbers (external index).
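
Swizzling in both directions can be illustrated with a toy object graph. The on-disk format below (a dict keyed by hypothetical (page, offset) record ids) is an assumption made for the example; a real system would read these from pages.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    refs: list = field(default_factory=list)  # in memory: direct references to other Obj instances


# Hypothetical on-disk form: each object is stored under a (page, offset) record id,
# and its references are record ids rather than pointers.
disk = {
    (0, 0): {"name": "root", "refs": [(0, 1), (3, 0)]},
    (0, 1): {"name": "left", "refs": []},
    (3, 0): {"name": "right", "refs": [(0, 1)]},
}


def swizzle(rid, disk, loaded):
    """Load the object at record id rid, turning each stored record id into a pointer."""
    if rid in loaded:  # already in memory: reuse it so shared references stay shared
        return loaded[rid]
    stored = disk[rid]
    obj = loaded[rid] = Obj(stored["name"])  # register before recursing (handles cycles)
    obj.refs = [swizzle(r, disk, loaded) for r in stored["refs"]]
    return obj


def unswizzle(obj, rid_of):
    """Write-back direction: turn pointers back into record ids."""
    return {"name": obj.name, "refs": [rid_of[id(r)] for r in obj.refs]}


loaded = {}
root = swizzle((0, 0), disk, loaded)
print(root.refs[1].refs[0] is root.refs[0])  # True: both point at the same "left" object
rid_of = {id(o): rid for rid, o in loaded.items()}
print(unswizzle(root, rid_of) == disk[(0, 0)])  # True
```

The loaded table is what keeps two on-disk references to the same record id from turning into two separate in-memory copies.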

B-Tree Diagram

[Figure: A three-level B-tree. Root: mouse. Second level: (eagle, koala) and (rat, tiger). Leaves, left to right: (bat, cat, dog), (emu, frog, goat), (lemur, llama, mite), (muskrat, ostrich, parrot), (rhino, shrimp), (vole, whale).]

B-tree vs B+tree
- B-tree: both leaves and internal pages contain data.
- B+tree: all data lives at the leaves.
- What are the trade-offs between B-trees and B+trees?
  - B-tree: no duplication of keys.
  - B+tree: all data is at the leaves, so iterating over keys is easier.
  - B+tree: internal nodes are more compact (better fanout).
  - B-tree: some lookups are faster (a search can stop at an internal node).
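
A lookup in the tree from the B-Tree Diagram slide can be sketched as follows, reading the diagram as a B-tree in which keys on internal pages are records too. Each Node stands in for one page, and bisect performs the binary search within a page; the function returns how many pages were read.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys stored on this page (N of them)
        self.children = children or []  # N + 1 child pages, or none for a leaf


def search(node, key, pages_read=1):
    """Return how many pages were read to find key, or None if it is absent."""
    i = bisect_left(node.keys, key)     # binary search within the page
    if i < len(node.keys) and node.keys[i] == key:
        return pages_read               # keys on internal pages carry data, so stop here
    if not node.children:
        return None
    return search(node.children[i], key, pages_read + 1)


tree = Node(["mouse"], [
    Node(["eagle", "koala"], [
        Node(["bat", "cat", "dog"]),
        Node(["emu", "frog", "goat"]),
        Node(["lemur", "llama", "mite"]),
    ]),
    Node(["rat", "tiger"], [
        Node(["muskrat", "ostrich", "parrot"]),
        Node(["rhino", "shrimp"]),
        Node(["vole", "whale"]),
    ]),
])

print(search(tree, "mouse"), search(tree, "koala"), search(tree, "frog"))  # 1 2 3
print(search(tree, "zebra"))  # None
```

Finding mouse after reading only the root is the "some lookups are faster" advantage of B-trees; in a B+tree every successful search reads all the way down to a leaf.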

Maintaining your Tree
- Splits
  - What do you do when you are trying to insert and an item doesn't fit? Split the page in half, pick a key that distinguishes the two pages, and insert it into the parent page.
  - What happens if that key doesn't fit in the parent? Split the parent, potentially recursively all the way up to the root.
- Reverse splits (merges, coalescing)
  - On delete, you might empty a page. Coalesce the page with its sibling and remove a key from the parent.
  - Like splits, reverse splits can propagate to the root.

Other B-Tree Variants
- B-link: leaf pages are linked together to provide fast sequential scans. Almost everyone does this. It is straightforward until you introduce cursors and concurrency (stay tuned).
- B*: all nodes are kept at least 2/3 full (by redistributing keys).
  - Splitting becomes more complicated because you may have to move keys among siblings to maintain the 2/3 property.
  - Reverse splits happen before pages become empty: you must consider a coalescing operation whenever a node drops below 2/3 full.
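
The split procedure can be sketched as a small B+tree insert in Python. Assumptions: an unrealistically small page capacity of three keys so that splits happen quickly; leaf splits copy the first key of the new right page up to the parent, while internal splits move their middle key up; merges on delete are not shown.

```python
from bisect import bisect_right, insort

MAX_KEYS = 3  # assumed (tiny) page capacity so that splits happen quickly


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children  # None for a leaf page


def _insert(node, key):
    """Insert key under node; return (separator, new right page) if node split."""
    if node.children is None:                # leaf: insert in sorted order
        insort(node.keys, key)
    else:                                    # internal: descend, then absorb a child's split
        i = bisect_right(node.keys, key)
        split = _insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2                # overflow: split the page in half
    if node.children is None:
        right = Node(node.keys[mid:])        # leaf: copy the first right key up
        sep = right.keys[0]
    else:
        sep = node.keys[mid]                 # internal: move the middle key up
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.children = node.children[:mid + 1]
    node.keys = node.keys[:mid]
    return sep, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])        # the split reached the root: grow a level


def leaves(node):
    if node.children is None:
        return [node.keys]
    return [leaf for child in node.children for leaf in leaves(child)]


root = Node()
for k in ["mouse", "eagle", "koala", "rat", "tiger", "bat", "emu", "lemur"]:
    root = insert(root, k)
print(root.keys)     # ['emu', 'mouse']
print(leaves(root))  # [['bat', 'eagle'], ['emu', 'koala', 'lemur'], ['mouse', 'rat', 'tiger']]
```

The recursion unwinds exactly as the slide describes: each level either absorbs its child's separator or splits in turn, and only a split of the root makes the tree taller.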

Practical Considerations (1)
- Key lengths may vary, so you may not be able to maintain the same number of keys (and therefore pointers) per page. What are the implications of this?
- Nodes must map to pages (either disk pages or file pages), so pointers are page numbers.
- How do you handle keys (or data) larger than a page?
- What is the minimum number of keys you must have on a page? What if they don't fit?
- Key compression: if you store entire keys, you may be storing a lot of repeated data (e.g., misdemeanor, misplace, mistake). Store only the differences instead (store "mis" once and store the suffixes as keys).

Practical Considerations (2)
- What if you have multiple data items for the same key?
  - Do you allow it?
  - How do you store them?
  - What if you have so many duplicates that you have to split the page on which they reside: what key do you promote?
- Standard solutions:
  - Disallow duplicates.
  - Store a few duplicates on a node, but if you get too many, create a special duplicate page (perhaps even an entire tree).
  - Store multiple data items as one (encoded) data item.
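
One simple form of key compression stores the prefix shared by every key on a page once, followed by each key's suffix. This sketch assumes page-wide prefix sharing; it uses os.path.commonprefix, which compares strings character by character rather than by path component.

```python
import os.path


def compress(keys):
    """Store the prefix shared by every key on the page once, plus each key's suffix."""
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]


def decompress(prefix, suffixes):
    return [prefix + suffix for suffix in suffixes]


keys = ["misdemeanor", "misplace", "mistake"]
prefix, suffixes = compress(keys)
print(prefix, suffixes)  # mis ['demeanor', 'place', 'take']
```

For the slide's three keys this stores 20 characters instead of 26; the saving grows with the number of keys on the page that share a long prefix.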

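A Cute Hack
- Sometimes you want real record numbers, not page/offset identifiers. And sometimes you want to insert new records between two adjacent records.
- You can hack B+trees to do that: with each pointer, store the number of records that appear beneath that pointer.
  - You can easily find the record with record number n.
  - This facilitates insertion: record numbers are derived from the counts, so inserting a record renumbers everything after it while only the counts along one root-to-leaf path change.

Cute Hack: Demo

[Figure: A B+tree with separator keys mouse (root), eagle and koala, and rat and tiger, and with the records bat, cat, dog, emu, frog, goat, lemur, llama, mite, muskrat, ostrich, parrot, rhino, shrimp, vole, whale in the leaves; the root's two pointers are annotated with the counts 9 and 7.]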
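
Finding record number n with the per-pointer counts can be sketched like this, using the demo tree. Assumptions: record numbers are 0-based, internal nodes carry one count per child pointer, and the leaf/internal helper functions are hypothetical conveniences for building the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # leaf: records; internal: separator keys
    children: list = field(default_factory=list)  # empty for a leaf
    counts: list = field(default_factory=list)    # records beneath each child pointer


def size(node):
    return sum(node.counts) if node.children else len(node.keys)


def leaf(*records):
    return Node(list(records))


def internal(keys, children):
    return Node(keys, children, [size(c) for c in children])


def nth(node, n):
    """Return the record with 0-based record number n, guided by the counts."""
    if n < 0:
        raise IndexError("record number out of range")
    while node.children:
        for child, count in zip(node.children, node.counts):
            if n < count:          # the record lies beneath this pointer
                node = child
                break
            n -= count             # skip all records beneath this pointer
        else:
            raise IndexError("record number out of range")
    return node.keys[n]


tree = internal(["mouse"], [
    internal(["eagle", "koala"], [leaf("bat", "cat", "dog"), leaf("emu", "frog", "goat"),
                                  leaf("lemur", "llama", "mite")]),
    internal(["rat", "tiger"], [leaf("muskrat", "ostrich", "parrot"), leaf("rhino", "shrimp"),
                                leaf("vole", "whale")]),
])
print(tree.counts)    # [9, 7]
print(nth(tree, 12))  # rhino
```

Inserting at record number n would follow the same path and add 1 to each count along it; a page split would additionally need to recompute the counts for the two halves, which this sketch does not show.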

Cursors
- Cursors mark a position in the tree (used when iterating over a file).
- A cursor cannot lose its position in the face of a delete, and subsequent inserts must happen in the right spot. This requires retaining the key value.
- With multiple deletes and multiple cursors, you have to maintain the positioning between cursors.

[Figure: Keys cat, dog, elephant, mouse with a cursor. The operations c_get elephant, delete current, insert kangaroo, insert eagle raise the question: where is the cursor now?]

Semi-Digression: Famous Last Words (Thanks to Mark Day)
- Databases seem so complicated. We can just do this with shared text files.
- OK then, we'll just write a B-tree package.
- B-tree code seems hard to get right for all the corner cases. No textbook exists that actually explains all the intricate details of B-tree manipulation; they present the first 80% in wonderful simplicity.
- One of the reasons Berkeley DB exists is that B-tree implementations are simply hard.
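
The idea of retaining the key value can be sketched with a sorted Python list standing in for the leaf pages. Assumptions: keys are unique and the Cursor class is hypothetical; the sequence of operations is the one from the cursor figure.

```python
from bisect import bisect_left, bisect_right, insort


class Cursor:
    """A cursor over a sorted list of unique keys that remembers the key it is on
    (not a list index), so it can re-find its place after deletes and inserts."""

    def __init__(self, keys, key):
        self.keys = keys
        self.key = key  # retained key value

    def current(self):
        i = bisect_left(self.keys, self.key)
        if i < len(self.keys) and self.keys[i] == self.key:
            return self.key
        return None  # the record under the cursor has been deleted

    def next(self):
        # Step to the first key strictly greater than the remembered one; this
        # works whether or not the remembered key still exists.
        i = bisect_right(self.keys, self.key)
        if i == len(self.keys):
            return None
        self.key = self.keys[i]
        return self.key


keys = ["cat", "dog", "elephant", "mouse"]
cursor = Cursor(keys, "elephant")  # c_get elephant
keys.remove("elephant")            # delete current
insort(keys, "kangaroo")           # insert kangaroo
insort(keys, "eagle")              # insert eagle
print(cursor.current())            # None
print(cursor.next())               # kangaroo
```

A cursor that had remembered only index 2 would now find eagle there, a key that sorts before the cursor's logical position; remembering the key avoids that.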

Hashing
- Your index is a collection of buckets (bucket = page).
- Define a hash function, h, that maps a key to a bucket. Store the corresponding data in that bucket.
- Collisions
  - Multiple keys hash to the same bucket.
  - Store multiple keys in the same bucket.
  - What do you do when buckets fill?
    - Chaining: link new pages (overflow pages) off the bucket.
    - Open addressing: look in the next bucket.
- Chaining versus open addressing: open addressing does not support deletion well, because emptying a bucket can cut off the probe sequence for keys that were placed past it.

Hash Example
- Assume: h(cat) = 0, h(dog) = 1, h(mouse) = 0.
- Operations:
  1. Insert cat
  2. Insert dog
  3. Insert mouse
  4. Delete dog
  5. Lookup mouse

[Figure: The buckets holding cat, dog, and mouse for this example.]
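
Running the example's five operations against both collision strategies shows why deletion is the sore point for "look in the next bucket" (usually called open addressing, here with linear probing). Assumptions: four buckets, one key per bucket for open addressing, and the slide's hash values hard-coded. The tombstone marker is a common remedy that the slides do not mention.

```python
NBUCKETS = 4                          # assumed table size
H = {"cat": 0, "dog": 1, "mouse": 0}  # the hash values assumed in the example
TOMBSTONE = object()                  # marker for "a key was deleted here"


class OpenAddressing:
    """One key per bucket (assumed); on a collision, try the next bucket."""

    def __init__(self, use_tombstones):
        self.buckets = [None] * NBUCKETS
        self.use_tombstones = use_tombstones

    def _probe(self, key):
        for i in range(NBUCKETS):
            yield (H[key] + i) % NBUCKETS

    def insert(self, key):
        for b in self._probe(key):
            if self.buckets[b] is None or self.buckets[b] is TOMBSTONE:
                self.buckets[b] = key
                return
        raise RuntimeError("table full")

    def lookup(self, key):
        for b in self._probe(key):
            if self.buckets[b] is None:
                return None  # an empty bucket ends the probe sequence
            if self.buckets[b] == key:
                return b
        return None

    def delete(self, key):
        b = self.lookup(key)
        if b is not None:
            self.buckets[b] = TOMBSTONE if self.use_tombstones else None


class Chaining:
    """Each bucket holds a list standing in for its page plus any overflow pages."""

    def __init__(self):
        self.buckets = [[] for _ in range(NBUCKETS)]

    def insert(self, key):
        self.buckets[H[key]].append(key)

    def lookup(self, key):
        return H[key] if key in self.buckets[H[key]] else None

    def delete(self, key):
        if key in self.buckets[H[key]]:
            self.buckets[H[key]].remove(key)


def run(table):
    """Apply the example's five operations and return where mouse was found."""
    for key in ("cat", "dog", "mouse"):
        table.insert(key)
    table.delete("dog")
    return table.lookup("mouse")


print(run(Chaining()))                            # 0
print(run(OpenAddressing(use_tombstones=False)))  # None: mouse is now unreachable
print(run(OpenAddressing(use_tombstones=True)))   # 2
```

With plain open addressing, mouse lands in bucket 2 after probing past cat and dog; deleting dog leaves bucket 1 empty, so the lookup stops there. A tombstone keeps the probe sequence intact at the cost of slots that must eventually be cleaned up.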

Static vs Dynamic Hashing
- Static: the number of buckets is predefined and never changes. Either overflow chains grow very long, OR a lot of space is wasted in unused buckets.
- Dynamic: the number of buckets changes over time, so the hash function must adapt. Usually, you start revealing more bits of the hash value as the table grows.

Practical Hashing (1)
- Buckets map to pages; you must be able to translate directly from a bucket number to a page number.
- Where do you store overflow pages? If the number of buckets is fixed (static hashing), store overflow buckets after the regular buckets and use a free list to manage them.
- Static hashing isn't very practical for databases: databases change in size fairly substantially, and if you have to preallocate, you often waste space.

Practical Hashing (2)
- Dynamic hash implementation: periodically double the size of the hash table and rehash every key into the new table.
- Dynamic Linear Hashing (Litwin):
  - Grow the table one bucket at a time.
  - Split buckets sequentially; rehash just the bucket being split.
  - Maintain overflow buckets as necessary.
  - Keep track of the maximum bucket number to identify the correct number of bits to consider in the hash value.
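
Linear hashing can be sketched compactly in Python. Assumptions: a primary page holds two keys (extra keys in a bucket's list model its overflow pages), a split is triggered whenever an insert overflows a primary page (a load-factor threshold is another common trigger), and Python's built-in hash() supplies the hash value. The level and split pointer together identify the maximum bucket, 2 ** level + split - 1, and therefore how many bits to use.

```python
BUCKET_CAPACITY = 2  # keys per primary page (assumed); extra keys model overflow pages


class LinearHashTable:
    def __init__(self, level=1):
        self.level = level                  # use `level` low-order bits of the hash...
        self.split = 0                      # ...except for buckets below the split pointer
        self.buckets = [[] for _ in range(2 ** level)]

    def _bucket(self, key):
        h = hash(key)
        b = h % 2 ** self.level
        if b < self.split:                  # bucket already split this round: use one more bit
            b = h % 2 ** (self.level + 1)
        return b

    def insert(self, key):
        bucket = self.buckets[self._bucket(key)]
        bucket.append(key)
        if len(bucket) > BUCKET_CAPACITY:   # an overflow page was needed, so split the
            self._split_next()              # next bucket in sequence (maybe a different one)

    def _split_next(self):
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])             # the table grows by exactly one bucket
        self.split += 1
        for key in old:                     # rehash only the bucket being split
            self.buckets[self._bucket(key)].append(key)
        if self.split == 2 ** self.level:   # every bucket in this round has split
            self.level += 1
            self.split = 0

    def lookup(self, key):
        return key in self.buckets[self._bucket(key)]


table = LinearHashTable()
for k in range(20):
    table.insert(k)
print(all(table.lookup(k) for k in range(20)))               # True
print(len(table.buckets) == 2 ** table.level + table.split)  # True
```

Unlike the doubling approach, each growth step touches a single bucket, so the cost of rehashing is spread evenly over many inserts instead of arriving all at once.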


Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation Dynamic Storage Allocation CS 44 Operating Systems Fall 5 Presented By Vibha Prasad Memory Allocation Static Allocation (fixed in size) Sometimes we create data structures that are fixed and don t need

More information

Big Data and Scripting. Part 4: Memory Hierarchies

Big Data and Scripting. Part 4: Memory Hierarchies 1, Big Data and Scripting Part 4: Memory Hierarchies 2, Model and Definitions memory size: M machine words total storage (on disk) of N elements (N is very large) disk size unlimited (for our considerations)

More information

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13 External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing

More information

Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

More information

In-Memory Databases MemSQL

In-Memory Databases MemSQL IT4BI - Université Libre de Bruxelles In-Memory Databases MemSQL Gabby Nikolova Thao Ha Contents I. In-memory Databases...4 1. Concept:...4 2. Indexing:...4 a. b. c. d. AVL Tree:...4 B-Tree and B+ Tree:...5

More information

File Management Chapters 10, 11, 12

File Management Chapters 10, 11, 12 File Management Chapters 10, 11, 12 Requirements For long-term storage: possible to store large amount of info. info must survive termination of processes multiple processes must be able to access concurrently

More information

DATABASDESIGN FÖR INGENJÖRER - 1DL124

DATABASDESIGN FÖR INGENJÖRER - 1DL124 1 DATABASDESIGN FÖR INGENJÖRER - 1DL124 Sommar 2005 En introduktionskurs i databassystem http://user.it.uu.se/~udbl/dbt-sommar05/ alt. http://www.it.uu.se/edu/course/homepage/dbdesign/st05/ Kjell Orsborn

More information

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ Answer the following 1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ 2) Which data structure is needed to convert infix notations to postfix notations? Stack 3) The

More information

Two Parts. Filesystem Interface. Filesystem design. Interface the user sees. Implementing the interface

Two Parts. Filesystem Interface. Filesystem design. Interface the user sees. Implementing the interface File Management Two Parts Filesystem Interface Interface the user sees Organization of the files as seen by the user Operations defined on files Properties that can be read/modified Filesystem design Implementing

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions. Linda Shapiro Spring 2016

CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions. Linda Shapiro Spring 2016 CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions Linda Shapiro Spring 2016 Announcements Friday: Review List and go over answers to Practice Problems 2 Hash Tables: Review Aim for constant-time

More information

Rethinking SIMD Vectorization for In-Memory Databases

Rethinking SIMD Vectorization for In-Memory Databases SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

More information

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè. CMPT-354-Han-95.3 Lecture Notes September 10, 1995 Chapter 1 Introduction 1.0 Database Management Systems 1. A database management system èdbmsè, or simply a database system èdbsè, consists of æ A collection

More information

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7 Storing : Disks and Files Chapter 7 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet base Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Disks and

More information

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8 Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan

More information

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited Hash Tables Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited We ve considered several data structures that allow us to store and search for data

More information

Database 2 Lecture I. Alessandro Artale

Database 2 Lecture I. Alessandro Artale Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (1) Database 2 Lecture I Alessandro Artale Faculty of Computer Science Free University of Bolzano Room: 221 artale@inf.unibz.it http://www.inf.unibz.it/

More information

Query Processing C H A P T E R12. Practice Exercises

Query Processing C H A P T E R12. Practice Exercises C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass

More information

COS 318: Operating Systems

COS 318: Operating Systems COS 318: Operating Systems File Performance and Reliability Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Topics File buffer cache

More information

Algorithms. Margaret M. Fleck. 18 October 2010

Algorithms. Margaret M. Fleck. 18 October 2010 Algorithms Margaret M. Fleck 18 October 2010 These notes cover how to analyze the running time of algorithms (sections 3.1, 3.3, 4.4, and 7.1 of Rosen). 1 Introduction The main reason for studying big-o

More information

Topics in Computer System Performance and Reliability: Storage Systems!

Topics in Computer System Performance and Reliability: Storage Systems! CSC 2233: Topics in Computer System Performance and Reliability: Storage Systems! Note: some of the slides in today s lecture are borrowed from a course taught by Greg Ganger and Garth Gibson at Carnegie

More information

SQL Query Evaluation. Winter 2006-2007 Lecture 23

SQL Query Evaluation. Winter 2006-2007 Lecture 23 SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution

More information

The Classical Architecture. Storage 1 / 36

The Classical Architecture. Storage 1 / 36 1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage

More information

CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions

CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions CS 2112 Spring 2014 Assignment 3 Data Structures and Web Filtering Due: March 4, 2014 11:59 PM Implementing spam blacklists and web filters requires matching candidate domain names and URLs very rapidly

More information

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing

More information

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS Chapter 1 File Organization 1.0 Objectives 1.1 Introduction 1.2 Storage Devices Characteristics 1.3 File Organization 1.3.1 Sequential Files 1.3.2 Indexing and Methods of Indexing 1.3.3 Hash Files 1.4

More information

Storage and File Structure

Storage and File Structure Storage and File Structure Chapter 10: Storage and File Structure Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files

More information

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level) Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster

More information

Part III Storage Management. Chapter 11: File System Implementation

Part III Storage Management. Chapter 11: File System Implementation Part III Storage Management Chapter 11: File System Implementation 1 Layered File System 2 Overview: 1/4 A file system has on-disk and in-memory information. A disk may contain the following for implementing

More information

10CS35: Data Structures Using C

10CS35: Data Structures Using C CS35: Data Structures Using C QUESTION BANK REVIEW OF STRUCTURES AND POINTERS, INTRODUCTION TO SPECIAL FEATURES OF C OBJECTIVE: Learn : Usage of structures, unions - a conventional tool for handling a

More information

The What, Why and How of the Pure Storage Enterprise Flash Array

The What, Why and How of the Pure Storage Enterprise Flash Array The What, Why and How of the Pure Storage Enterprise Flash Array Ethan L. Miller (and a cast of dozens at Pure Storage) What is an enterprise storage array? Enterprise storage array: store data blocks

More information

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771 ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced

More information

FAT32 vs. NTFS Jason Capriotti CS384, Section 1 Winter 1999-2000 Dr. Barnicki January 28, 2000

FAT32 vs. NTFS Jason Capriotti CS384, Section 1 Winter 1999-2000 Dr. Barnicki January 28, 2000 FAT32 vs. NTFS Jason Capriotti CS384, Section 1 Winter 1999-2000 Dr. Barnicki January 28, 2000 Table of Contents List of Figures... iv Introduction...1 The Physical Disk...1 File System Basics...3 File

More information

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011 the Availability Digest Data Deduplication February 2011 What is Data Deduplication? Data deduplication is a technology that can reduce disk storage-capacity requirements and replication bandwidth requirements

More information

Binary Trees and Huffman Encoding Binary Search Trees

Binary Trees and Huffman Encoding Binary Search Trees Binary Trees and Huffman Encoding Binary Search Trees Computer Science E119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary

More information

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES

EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com

More information

Quick Guide. Passports in Microsoft PowerPoint. Getting Started with PowerPoint. Locating the PowerPoint Folder (PC) Locating PowerPoint (Mac)

Quick Guide. Passports in Microsoft PowerPoint. Getting Started with PowerPoint. Locating the PowerPoint Folder (PC) Locating PowerPoint (Mac) Passports in Microsoft PowerPoint Quick Guide Created Updated PowerPoint is a very versatile tool. It is usually used to create multimedia presentations and printed handouts but it is an almost perfect

More information

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging Memory Management Outline Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging 1 Background Memory is a large array of bytes memory and registers are only storage CPU can

More information

Data Management for Portable Media Players

Data Management for Portable Media Players Data Management for Portable Media Players Table of Contents Introduction...2 The New Role of Database...3 Design Considerations...3 Hardware Limitations...3 Value of a Lightweight Relational Database...4

More information

Symbol Tables. Introduction

Symbol Tables. Introduction Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The

More information