Indexing. The Problem
- Ann Atkins
- 7 years ago
Transcription
Indexing
Topics: The Problem. Terminology. The Design Space. Fixed and Variable Length Records. B-trees. Hashing.
Learning Objectives:
- Describe the problem we face in creating keys atop a conventional disk-based storage system.
- Design a scheme to implement records on top of files.
- Explain what B-trees are.
- Distinguish the different B-tree variants and explain the trade-offs among them.
- Explain hashing as applied to database indexes.
2/2/10

The Problem
- We want to store key/data pairs (aka records).
- Today's file systems store files.
- Old IBM file systems were record-based: the operating system/file system knew about record formats, keys, etc. In fact, some disks knew about keyed structure and lookup. Take CS261 to learn more about this.
- We need to flexibly and efficiently store records in files:
  - Efficient in terms of space.
  - Efficient in terms of look-up time.
  - Flexible in terms of record size (fixed and variable).
  - Flexible in terms of the number of indices, types of indices, etc.
Isn't this a solved problem?
- There are lots of data structures like trees and things with gazillions of algorithms that operate on them efficiently.
- Why don't these algorithms and data structures translate directly into disk-space structures?
  - Pointers work nicely in main memory -- how do you represent pointers on disk?
  - Data structures can be arbitrarily sized, but disk blocks are fixed size (and are larger than many objects).
  - Files typically only grow at the end -- they don't support insert into the middle.

Storage Technology
- There are a variety of media on which to store data. Each type has a different set of tradeoffs, and the tradeoffs change over time. It is important to understand the principles and roles rather than particular implementations.
- What would an ideal storage technology look like?
  - Infinitely fast: adds no additional latency to the cost of the processor operations.
  - 100% reliable.
  - 100% available.
  - Infinitely large (or as large as your largest data set).
  - Cheap.
Sample Storage Technologies

Technology         Speed      Stability  Writable  Cost/bit     Random access
RAM (main memory)  Very fast  Poor       Yes       High         Yes
Flash              Fast       OK         Yes       Medium/High  Yes
Disk               Slow       OK         Yes       Mid          Yes (but slower)
Tape               Slow       Good       Yes       Low          No

Abstracting the Disk
- The disk interface is clunky:
  - Read and write one or more sectors. A sector is 512 bytes.
  - Not a terribly elegant interface for implementing key/data pairs.
- Operating systems provide files -- are they much better?
  - Byte stream interface: position a pointer and read/write some number of bytes to/from that location.
  - Still not terribly easy to do things like keys.
- Problems:
  - No structure within disk blocks or byte streams.
  - The unit of transfer between disk and file system is the page (1K, 4K, 1M).
  - Placing one object per file breaks nearly all file systems.
  - If you put multiple objects into a file, you still need a way to locate them.
Terminology
- Interchangeable terms: key/data pairs, record, tuple.
- Primary index: index sort order corresponds to layout.
- Secondary index: sort order independent of layout.
- Internal index: keys and data both stored in the index.
- External index: keys and references to data stored in the index; the data is stored elsewhere.
- Meta-data: data that describes the data or its structure; not user data.

The Design Space
- Fixed versus variable length records.
  - Fixed are easier.
  - Fixed are faster.
  - Most data are not fixed length.
  - Fixed usually waste space -- you have to allocate enough space for the largest objects (if you don't allocate enough space, what do you do when you get a big object?).
- Internal or external indices.
  - External indices separate indexing from the data.
  - Internal indices make consistency easier.
  - Internal indices provide clustering.
B-tree versus Hash Indexing
- High-order difference: ordered versus unordered.
- In theory, hash indices require fewer disk accesses.
- In practice, this is often not the case:
  - Practically all database systems maintain a cache.
  - The cache should be large enough to hold all the internal nodes of a B-tree. If so, internal nodes are memory accesses, not disk accesses.
  - Items in hash buckets are not necessarily sorted. If there are many items per bucket, locating an item in a bucket can be expensive. Since B-trees must be ordered, searching them may be faster.
  - Accesses are rarely random, so B-tree clustering is often a win.
- When are hash tables a big win?
  - The database is huge (internal pages do not fit in memory).
  - Access is really random.

Implementation: Fixed Length Records (Naïve Approach)
- Allocate records contiguously, one right after the other.
- Let s be the record size.
- To access record n, compute s * n, seek to that offset, and read s bytes.
[Figure: Records 0 to 4 laid out contiguously, each s bytes long; record n begins at offset s * n.]
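
The naïve scheme above can be sketched in a few lines. This is only an illustration under assumptions not in the slides: a record size of 16 bytes, zero-byte padding, and an in-memory `io.BytesIO` standing in for a file. The helper names `write_record` and `read_record` are hypothetical.

```python
import io

RECORD_SIZE = 16  # s: assumed record size in bytes


def write_record(f, n, payload):
    """Store payload as record n, padded with zero bytes to RECORD_SIZE."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than record size")
    f.seek(RECORD_SIZE * n)
    f.write(payload.ljust(RECORD_SIZE, b"\x00"))


def read_record(f, n):
    """Seek to offset s * n and read s bytes, dropping the padding."""
    f.seek(RECORD_SIZE * n)
    return f.read(RECORD_SIZE).rstrip(b"\x00")


# An in-memory byte stream stands in for a real file.
f = io.BytesIO()
for i, name in enumerate([b"bat", b"cat", b"dog"]):
    write_record(f, i, name)
```

Note that nothing here knows about page boundaries, and there is no way to mark a record as deleted; those are the problems the following slides address.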
Naïve Approach: Problems
- Obvious, simple, and wrong.
[Figure: Records 0 to 4 laid out contiguously across disk blocks.]
- Other problems?
  - How do you delete a record?
  - How do you add a record between records 2 and 3?

Fixed Length Records: Take 2
- Add meta-data.
- Let's add 1 bit in each record that indicates if the record is present/not present.
[Figure: Records 0 to 4, each carrying a present/not-present bit.]
- Pros/Cons?
  + Can delete records
  + Can reuse space
  - Difficult to find free space
  - Records still span pages
Fixed Length Records: Take 3
- Add more meta-data: a header page.
- The header points to the beginning of a free list.
- Each deleted record points to the next deleted record.
[Figure: Header followed by Records 0 to 4, with deleted records chained into a free list.]
- Pros/Cons?
  + Can now find free space easily
  - Records still span pages

Fixed Length Records: Take 4
- Put records on pages explicitly.
- Calculate how many records fit on a page. Call this f (fill factor).
- To find record n, compute P(age) = n / f (integer division); O(ffset) = n % f.
- Keep free space management from the previous approach.
[Figure: Header followed by Records 0 to 3 placed on pages.]
- Pros/Cons?
  + Records no longer span pages
  - Wasted space, called internal fragmentation
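
As a rough illustration of the Take 4 arithmetic, the sketch below computes the fill factor and locates a record. The page size (4096 bytes) and record size (100 bytes) are assumed values chosen for the example, and `locate` is a hypothetical helper; the header page and free list are left out.

```python
PAGE_SIZE = 4096    # assumed page size in bytes
RECORD_SIZE = 100   # assumed fixed record size in bytes

# f: how many whole records fit on one page
FILL_FACTOR = PAGE_SIZE // RECORD_SIZE


def locate(n):
    """Return (page, slot, byte offset in file) for record number n."""
    page = n // FILL_FACTOR
    slot = n % FILL_FACTOR
    return page, slot, page * PAGE_SIZE + slot * RECORD_SIZE


# Bytes left unused at the end of every page (internal fragmentation).
WASTED_PER_PAGE = PAGE_SIZE - FILL_FACTOR * RECORD_SIZE
```

With these numbers 40 records fit on a page and 96 bytes per page go unused, which is the internal fragmentation the slide mentions.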
Variable Length Records
- Now assume records have different lengths.
- Problem: we can no longer compute a fill factor.
- Must have some kind of meta-data:
  - Per-record length
  - End-of-record delimiter
  - Directory
- Record lengths or delimiters:
  - Abandon record numbers and assume records are identified by a page number and offset.
  - Before each record, include its length, OR
  - After each record, place a special symbol.

Lengths and Delimiters (1)
[Figure: two page layouts holding Records P/0, P/1, and P/2, one with a length (for example 100) stored before each record and one with a delimiter after each record.]
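
The two options can be compared with a small sketch. The details are assumptions made for illustration: a 2-byte big-endian length prefix, and a zero byte as the delimiter (which only works if that byte never occurs inside a record). All function names are hypothetical.

```python
import struct


def pack_with_lengths(records):
    """Prefix each record with a 2-byte big-endian length."""
    return b"".join(struct.pack(">H", len(r)) + r for r in records)


def unpack_with_lengths(buf):
    """Walk the buffer, reading a length and then that many bytes."""
    records, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        records.append(buf[pos:pos + length])
        pos += length
    return records


DELIMITER = b"\x00"  # assumed never to occur inside a record


def pack_with_delimiters(records):
    """Place the delimiter after each record."""
    return b"".join(r + DELIMITER for r in records)


def unpack_with_delimiters(buf):
    """Split on the delimiter; the final piece after the last one is empty."""
    return buf.split(DELIMITER)[:-1]
```

With lengths, a reader can skip a record without scanning it, but can only walk forwards; with delimiters, every byte must be examined and the delimiter value must be kept out of the data.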
Lengths and Delimiters (2)
- Pros/Cons?
  - How do you read backwards within a page (lengths)?
  - How do you delete?
  - How do you insert?
- Add meta-data back in (from fixed length records):
  - Add a deleted bit.
  - Add a header and freelist.
  - Chain empty records together.
- Why is this harder than the fixed length case?
  - You are performing dynamic memory allocation.
  - Pick an algorithm: first fit, best fit, etc.
  - May have to coalesce space.

Variable Length: Take 2
- Use the page/offset identification scheme.
- Place a directory at the top of each page.
- The directory points to where records begin.
- Grow the directory from the top.
- Allocate record space from the bottom.
- Can expand the header to include things like deleted bits.
[Figure: a page with directory entries off 1, off 2, ..., off f at the top and records f, f-1, ..., 3, 2, 1 filling the page from the bottom.]
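
A minimal in-memory model of such a page might look like the following. It is a sketch under assumptions: the directory is kept as a Python list rather than serialized into the page, each directory entry is charged 4 bytes, and deleted space is not compacted. The class and method names are hypothetical.

```python
class SlottedPage:
    """One page: the directory grows from the top, records from the bottom."""

    SLOT_BYTES = 4  # assumed on-disk cost of one directory entry

    def __init__(self, size=4096):
        self.size = size
        self.data = bytearray(size)
        self.slots = []        # directory: (offset, length), or None if deleted
        self.free_end = size   # records occupy data[free_end:]

    def free_space(self):
        """Bytes left between the directory and the record area."""
        return self.free_end - self.SLOT_BYTES * len(self.slots)

    def insert(self, record):
        """Store record and return its slot number within the page."""
        if len(record) + self.SLOT_BYTES > self.free_space():
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            raise KeyError("record deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        # Mark the slot deleted; the space is not reclaimed in this sketch.
        self.slots[slot] = None
```

One consequence of the directory is that a record can be identified by page number and slot number, so the record's bytes could be moved within the page without changing its identifier.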
A note about objects
- How do objects fit into this? Objects are just variable sized records.
- Objects may contain references to other objects.
  - Must translate these references between persistent form and memory representation; this translation is called swizzling.
  - On disk, a reference might use the page/offset record number.
  - In memory, you probably want an actual pointer.
- Some objects are large (greater than a page).
  - Previous designs assumed that objects fit on a single page; life gets more complicated when this is not the case.

B-trees: Balanced Trees
- B-trees were designed to balance the time taken to retrieve a page from disk and the time to search within a page.
- We will build a tree from nodes, where nodes correspond to disk pages (a few KB).
- Each (internal) node stores N keys and N + 1 pointers to other nodes.
- On leaves, keys can be paired with their data (internal index) or they can contain record numbers (external index).
B-Tree Diagram
[Figure: a three-level B-tree.
 Root: mouse
 Internal nodes: (eagle, koala) and (rat, tiger)
 Leaves: (bat, cat, dog) (emu, frog, goat) (lemur, llama, mite) (muskrat, ostrich, parrot) (rhino, shrimp) (vole, whale)]

B-tree vs B+tree
- B-tree: both leaves and internal pages contain data.
- B+tree: all data lives at the leaves.
- What are the trade-offs between B-trees and B+trees?
  - B-tree: no duplication of keys.
  - B+tree: all data at the leaves; iterating over keys is easier.
  - B+tree: internal nodes are more compact (better fanout).
  - B-tree: some lookups are faster.
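
To make the "some lookups are faster" point concrete, here is a small sketch of a B-tree search over the animal keys from the diagram. The grouping of keys into nodes is a reading of the flattened figure and should be treated as an assumption; `Node` and `search` are hypothetical names, and nodes are plain Python objects rather than disk pages.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys stored in this node
        self.children = children or []  # empty for leaves, else len(keys) + 1


def search(node, key, depth=0):
    """Return the depth at which key is found, or None if it is absent."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return depth            # B-tree: a hit can occur in an internal node
    if not node.children:
        return None             # reached a leaf without finding the key
    return search(node.children[i], key, depth + 1)


root = Node(["mouse"], [
    Node(["eagle", "koala"], [
        Node(["bat", "cat", "dog"]),
        Node(["emu", "frog", "goat"]),
        Node(["lemur", "llama", "mite"]),
    ]),
    Node(["rat", "tiger"], [
        Node(["muskrat", "ostrich", "parrot"]),
        Node(["rhino", "shrimp"]),
        Node(["vole", "whale"]),
    ]),
])
```

Here a lookup for "mouse" ends at the root, while "whale" needs to visit all three levels. In a B+tree every lookup would continue to a leaf, but internal nodes would hold only keys and so could fit more of them per page.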
Maintaining your Tree
- Splits
  - What do you do when you are trying to insert and an item doesn't fit?
  - Split the page in half; pick a key that distinguishes the pages and insert it into the parent page.
  - What happens if that key doesn't fit in the parent? Split the parent; this is potentially recursive up to the root.
- Reverse splits (merges, coalescing)
  - On delete, you might empty a page.
  - Coalesce the page with its sibling and remove a key from the parent.
  - Like splits, reverse splits can propagate to the root.

Other B-Tree Variants
- B-link: Leaf pages are linked together to provide fast sequential scan.
  - Almost everyone does this.
  - Straightforward until you introduce cursors and concurrency (stay tuned).
- B*: All nodes are kept at least 2/3 full (by redistributing keys).
  - Splitting becomes more complicated because you may have to move keys among siblings to maintain the 2/3 property.
  - Reverse splits happen before pages become empty; you must consider a coalescing operation whenever you drop below 2/3 full.
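
The split step described under "Maintaining your Tree" can be sketched for a single leaf. The capacity of 4 keys per page is an assumed value, `insert_into_leaf` is a hypothetical helper, and the separator is chosen B+tree style (a copy of the first key on the right page); propagating the separator into the parent is not shown.

```python
from bisect import insort

MAX_KEYS = 4  # assumed page capacity, in keys


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf.

    Returns (leaf, None, None) if the key fits, otherwise
    (left, separator, right) after splitting the page in half.
    The caller would insert separator into the parent page, which
    may itself need to split (not shown).
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The separator distinguishes the two pages: keys < separator go left.
    return left, right[0], right
```

For example, inserting "cow" into a full leaf holding bat, cat, dog, emu yields a left page (bat, cat), a right page (cow, dog, emu), and "cow" as the key to push into the parent.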
Practical Considerations (1)
- Key lengths may vary, so you may not be able to maintain the same number of keys (and therefore pointers) per page. What are the implications of this?
- Nodes must map to pages (either disk or file). Pointers are therefore page numbers.
- How do you handle keys (or data) larger than a page?
  - What is the minimum number of keys you must have on a page?
  - What if they don't fit?
- Key compression
  - If you store entire keys, you may be storing a lot of repeated data (e.g., misdemeanor, misplace, mistake, etc.).
  - Store the minimum difference keys instead (store "mis" once and store the suffixes as keys).

Practical Considerations (2)
- What if you have multiple data items for the same key?
  - Do you allow it?
  - How do you store them?
  - What if you have so many duplicates that you have to split the page on which they reside? What key do you promote?
- Standard solutions:
  - Disallow duplicates.
  - Store a few duplicates on a node, but if you get too many, create a special duplicate page (perhaps even an entire tree).
  - Store multiple data items as one (encoded) data item.
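
The key compression idea from Practical Considerations (1) can be illustrated with the slide's own example keys. This sketch stores one shared prefix per page plus the suffixes; the function names are hypothetical, and real systems may compress differently (for example, relative to the previous key rather than to a page-wide prefix).

```python
import os


def compress_keys(keys):
    """Return (shared prefix, list of suffixes) for the keys on one page."""
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on file paths.
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]


def expand_keys(prefix, suffixes):
    """Rebuild the full keys from the stored prefix and suffixes."""
    return [prefix + s for s in suffixes]
```

For misdemeanor, misplace, and mistake, the page would store "mis" once and the suffixes demeanor, place, and take.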
A Cute Hack
- Sometimes you want real record numbers, not page/offset identifiers.
- And sometimes you want to insert new records between two adjacent records.
- You can hack B+trees to do that:
  - With each pointer, store the number of records that appear beneath that pointer.
  - Can easily find the record with record number n.
  - Facilitates insertion and.

Cute Hack: Demo
[Figure: the animal tree with record counts on the root's pointers.
 Root: mouse, with counts 9 (left pointer) and 7 (right pointer)
 Internal nodes: (eagle, koala) and (rat, tiger)
 Leaf records: bat, cat, dog, emu, frog, goat, lemur, llama, mite (9 records) and muskrat, ostrich, parrot, rhino, shrimp, vole, whale (7 records)]
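
A sketch of how the stored counts make record-number lookup work, using the leaf records and the 9/7 split from the demo. The grouping of records into individual leaves is an assumption, record numbers are taken to be 0-based, and the class and function names are hypothetical. Keeping the counts up to date on insert and delete is not shown.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys

    def count(self):
        return len(self.keys)


class Internal:
    def __init__(self, children):
        self.children = children
        # Number of records beneath each child pointer.
        self.counts = [c.count() for c in children]

    def count(self):
        return sum(self.counts)


def nth_record(node, n):
    """Return the record with (0-based) record number n."""
    while isinstance(node, Internal):
        for child, c in zip(node.children, node.counts):
            if n < c:
                node = child   # the record lies beneath this pointer
                break
            n -= c             # skip all records beneath this pointer
        else:
            raise IndexError("record number out of range")
    return node.keys[n]


left = Internal([Leaf(["bat", "cat", "dog"]),
                 Leaf(["emu", "frog", "goat"]),
                 Leaf(["lemur", "llama", "mite"])])
right = Internal([Leaf(["muskrat", "ostrich", "parrot"]),
                  Leaf(["rhino", "shrimp"]),
                  Leaf(["vole", "whale"])])
root = Internal([left, right])
```

Each level is resolved by subtracting counts, so finding record n touches one node per level, the same as a key lookup.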
Cursors
- They mark a position in the tree (used in iterating over a file).
- Cannot lose the position in the face of a delete.
  - Subsequent inserts must happen in the right spot.
  - Requires retaining the key value.
  - With multiple deletes and multiple cursors, you have to maintain positioning between cursors.
- Example, on a tree holding cat, dog, elephant, mouse:
  - c_get elephant
  - delete current (where is the cursor?)
  - insert kangaroo
  - insert eagle

Semi-Digression: Famous Last Words (Thanks to Mark Day)
- Databases seem so complicated. We can just do this with shared text files.
- OK then, we'll just write a B-tree package.
- B-tree code seems hard to get right for all the corner cases.
  - No textbook exists that actually explains all the intricate details of B-tree manipulation; they present the first 80% in wonderful simplicity.
  - One of the reasons Berkeley DB exists is that B-tree implementations are simply hard.
Hashing
- Your index is a collection of buckets (bucket = page).
- Define a hash function, h, that maps a key to a bucket.
- Store the corresponding data in that bucket.
- Collisions
  - Multiple keys hash to the same bucket.
  - Store multiple keys in the same bucket.
- What do you do when buckets fill?
  - Chaining: link new pages (overflow pages) off the bucket.
  - Open-hashing: look in the next bucket.
- Chaining versus open-hashing
  - Open-hashing does not support deletion well.

Hash Example
- Assume: H(cat) = 0, H(dog) = 1, H(mouse) = 0
- Operations:
  1. Insert cat
  2. Insert dog
  3. Insert mouse
  4. Delete dog
  5. Lookup mouse
[Figure: hash buckets holding cat, dog, and mouse.]
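
The five operations in the Hash Example show why "look in the next bucket" does not support deletion well. The sketch below replays them under assumptions not stated in the slides: four buckets that each hold one key, and linear probing to the next bucket on overflow. The tombstone marker is one common workaround, swapped in here for illustration rather than taken from the lecture.

```python
H = {"cat": 0, "dog": 1, "mouse": 0}  # hash values from the example
NUM_BUCKETS = 4                        # assumed; each bucket holds one key
EMPTY = None
DELETED = object()                     # tombstone: "something used to be here"


def insert(table, key):
    for step in range(NUM_BUCKETS):
        b = (H[key] + step) % NUM_BUCKETS
        if table[b] is EMPTY or table[b] is DELETED:
            table[b] = key
            return b
    raise RuntimeError("table full")


def lookup(table, key):
    for step in range(NUM_BUCKETS):
        b = (H[key] + step) % NUM_BUCKETS
        if table[b] is EMPTY:
            return None        # an empty bucket ends the probe sequence
        if table[b] == key:
            return b
    return None


def delete(table, key, use_tombstone):
    b = lookup(table, key)
    if b is not None:
        table[b] = DELETED if use_tombstone else EMPTY


def run(use_tombstone):
    """Replay the five operations; return the bucket where mouse is found."""
    table = [EMPTY] * NUM_BUCKETS
    for key in ("cat", "dog", "mouse"):
        insert(table, key)
    delete(table, "dog", use_tombstone)
    return lookup(table, "mouse")
```

Without tombstones, deleting dog empties bucket 1, so the probe for mouse stops there and never reaches bucket 2, where mouse actually lives. With chaining, mouse would sit on an overflow page linked off bucket 0 and the delete of dog would not affect it.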
Static vs Dynamic Hashing
- Static: the number of buckets is predefined and never changes.
  - Either overflow chains grow very long, OR
  - a lot of space is wasted in unused buckets.
- Dynamic: the number of buckets changes over time.
  - The hash function must adapt.
  - Usually, start revealing more bits of the hash value as the table grows.

Practical Hashing (1)
- Buckets map to pages.
  - Must be able to directly translate from a bucket number to a page number.
  - Where do you store overflow pages?
  - If the number of buckets is fixed (static hashing), store overflow buckets after regular buckets.
  - Use a free list to manage overflow buckets.
- Static hashing isn't very practical for databases.
  - Databases change in size fairly substantially.
  - If you have to preallocate, you often waste space.
Practical Hashing (2)
- Dynamic hash implementation.
  - Periodically double the size of the database.
  - Rehash every key into the new table.
- Dynamic Linear Hashing (Litwin)
  - Grow the table one bucket at a time.
  - Split buckets sequentially; rehash just the splitting bucket.
  - Maintain overflow buckets as necessary.
  - Keep track of the max bucket to identify the correct number of bits to consider in the hash value.
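
A compact, in-memory sketch of linear hashing along the lines described above. Several details are assumptions made for illustration: Python's built-in `hash` as the hash function, a split triggered whenever the bucket just inserted into exceeds two keys, and a Python list that simply grows in place of overflow pages. The class name and its methods are hypothetical.

```python
class LinearHash:
    def __init__(self, initial_buckets=2, max_per_bucket=2):
        self.n0 = initial_buckets     # number of buckets at level 0
        self.level = 0                # number of completed doublings
        self.next_split = 0           # next bucket to split, in sequence
        self.max_per_bucket = max_per_bucket
        self.buckets = [[] for _ in range(initial_buckets)]

    def _bucket_for(self, key):
        h = hash(key)
        b = h % (self.n0 << self.level)
        if b < self.next_split:
            # This bucket was already split in this round: use one more bit.
            b = h % (self.n0 << (self.level + 1))
        return b

    def insert(self, key):
        bucket = self.buckets[self._bucket_for(key)]
        bucket.append(key)            # a long list stands in for overflow pages
        if len(bucket) > self.max_per_bucket:
            self._split()

    def _split(self):
        # Grow by one bucket and rehash only the bucket at next_split,
        # which is not necessarily the bucket that overflowed.
        old = self.buckets[self.next_split]
        self.buckets[self.next_split] = []
        self.buckets.append([])
        self.next_split += 1
        if self.next_split == self.n0 << self.level:
            self.level += 1           # every bucket of this round has split
            self.next_split = 0
        for key in old:
            self.buckets[self._bucket_for(key)].append(key)

    def contains(self, key):
        return key in self.buckets[self._bucket_for(key)]
```

The pair (`level`, `next_split`) plays the role of "keeping track of the max bucket": together they determine how many bits of the hash value apply to a given key.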
More informationStoring Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7
Storing : Disks and Files Chapter 7 Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet base Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Disks and
More informationOverview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8
Overview of Storage and Indexing Chapter 8 How index-learning turns no student pale Yet holds the eel of science by the tail. -- Alexander Pope (1688-1744) Database Management Systems 3ed, R. Ramakrishnan
More informationHash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited
Hash Tables Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited We ve considered several data structures that allow us to store and search for data
More informationDatabase 2 Lecture I. Alessandro Artale
Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (1) Database 2 Lecture I Alessandro Artale Faculty of Computer Science Free University of Bolzano Room: 221 artale@inf.unibz.it http://www.inf.unibz.it/
More informationQuery Processing C H A P T E R12. Practice Exercises
C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass
More informationCOS 318: Operating Systems
COS 318: Operating Systems File Performance and Reliability Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Topics File buffer cache
More informationAlgorithms. Margaret M. Fleck. 18 October 2010
Algorithms Margaret M. Fleck 18 October 2010 These notes cover how to analyze the running time of algorithms (sections 3.1, 3.3, 4.4, and 7.1 of Rosen). 1 Introduction The main reason for studying big-o
More informationTopics in Computer System Performance and Reliability: Storage Systems!
CSC 2233: Topics in Computer System Performance and Reliability: Storage Systems! Note: some of the slides in today s lecture are borrowed from a course taught by Greg Ganger and Garth Gibson at Carnegie
More informationSQL Query Evaluation. Winter 2006-2007 Lecture 23
SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution
More informationThe Classical Architecture. Storage 1 / 36
1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage
More informationCS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions
CS 2112 Spring 2014 Assignment 3 Data Structures and Web Filtering Due: March 4, 2014 11:59 PM Implementing spam blacklists and web filters requires matching candidate domain names and URLs very rapidly
More informationExternal Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing
More informationChapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS
Chapter 1 File Organization 1.0 Objectives 1.1 Introduction 1.2 Storage Devices Characteristics 1.3 File Organization 1.3.1 Sequential Files 1.3.2 Indexing and Methods of Indexing 1.3.3 Hash Files 1.4
More informationStorage and File Structure
Storage and File Structure Chapter 10: Storage and File Structure Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files
More informationComp 5311 Database Management Systems. 16. Review 2 (Physical Level)
Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster
More informationPart III Storage Management. Chapter 11: File System Implementation
Part III Storage Management Chapter 11: File System Implementation 1 Layered File System 2 Overview: 1/4 A file system has on-disk and in-memory information. A disk may contain the following for implementing
More information10CS35: Data Structures Using C
CS35: Data Structures Using C QUESTION BANK REVIEW OF STRUCTURES AND POINTERS, INTRODUCTION TO SPECIAL FEATURES OF C OBJECTIVE: Learn : Usage of structures, unions - a conventional tool for handling a
More informationThe What, Why and How of the Pure Storage Enterprise Flash Array
The What, Why and How of the Pure Storage Enterprise Flash Array Ethan L. Miller (and a cast of dozens at Pure Storage) What is an enterprise storage array? Enterprise storage array: store data blocks
More informationENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771
ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced
More informationFAT32 vs. NTFS Jason Capriotti CS384, Section 1 Winter 1999-2000 Dr. Barnicki January 28, 2000
FAT32 vs. NTFS Jason Capriotti CS384, Section 1 Winter 1999-2000 Dr. Barnicki January 28, 2000 Table of Contents List of Figures... iv Introduction...1 The Physical Disk...1 File System Basics...3 File
More informationAvailability Digest. www.availabilitydigest.com. Data Deduplication February 2011
the Availability Digest Data Deduplication February 2011 What is Data Deduplication? Data deduplication is a technology that can reduce disk storage-capacity requirements and replication bandwidth requirements
More informationBinary Trees and Huffman Encoding Binary Search Trees
Binary Trees and Huffman Encoding Binary Search Trees Computer Science E119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary
More informationEFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES
ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com
More informationQuick Guide. Passports in Microsoft PowerPoint. Getting Started with PowerPoint. Locating the PowerPoint Folder (PC) Locating PowerPoint (Mac)
Passports in Microsoft PowerPoint Quick Guide Created Updated PowerPoint is a very versatile tool. It is usually used to create multimedia presentations and printed handouts but it is an almost perfect
More informationMemory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging
Memory Management Outline Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging 1 Background Memory is a large array of bytes memory and registers are only storage CPU can
More informationData Management for Portable Media Players
Data Management for Portable Media Players Table of Contents Introduction...2 The New Role of Database...3 Design Considerations...3 Hardware Limitations...3 Value of a Lightweight Relational Database...4
More informationSymbol Tables. Introduction
Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The
More information