Principles of Database Management Systems. Notes 3: Representing Data




Principles of Database Management Systems
Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)
DBMS 2001 Notes 3: Representing Data

Topic for today
- How to represent data on disk and in memory
- "Executive Summary": Principles are rather simple, but there are lots of variations in the details

Principles of Data Layout
- Attributes of relational tuples (or objects) are represented by sequences of bytes called fields
- Fields are grouped together into records, the representation of tuples or objects
- Records are stored in blocks
- File: a collection of blocks that forms a relation (or the extent of an object class)

Overview
- Data Items (here)
- Records
- Blocks
- Files
- Memory

What are the data items we want to store?
- a salary
- a name
- a date
- a picture
What we have available: bytes (8 bits each)

To represent:
- Integer (short): 2 bytes (about -32,000 to +32,000), e.g., 35 is 00000000 00100011
- Integer (long): 4 bytes (about -2 x 10^9 to +2 x 10^9)
- Real, floating point (SQL FLOAT): 4 or 8 bytes, arithmetic interpretation by hardware
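The integer and floating-point encodings listed above can be reproduced with Python's standard `struct` module. This is only a sketch: big-endian byte order is an assumption chosen so that the output matches the bit pattern on the slide, and the helper name `bits` is made up for this example.

```python
import struct

# Big-endian (">") is an assumption; the slides do not fix a byte order.
short_35 = struct.pack(">h", 35)    # 2-byte signed integer
long_35 = struct.pack(">i", 35)     # 4-byte signed integer
float_8 = struct.pack(">d", 35.0)   # 8-byte floating point


def bits(data: bytes) -> str:
    """Render bytes as space-separated groups of 8 bits."""
    return " ".join(f"{b:08b}" for b in data)


print(bits(short_35))               # 00000000 00100011
print(len(long_35), len(float_8))   # 4 8
```

The exact range of a 2-byte two's-complement integer is -32768 to 32767, which the slide rounds to about 32,000 in each direction.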

To represent:
- Characters: various coding schemes suggested, most popular is ASCII
  Example (8-bit ASCII):
    A:  01000001
    a:  01100001
    5:  00110101
    LF: 00001010

To represent:
- Boolean, e.g., TRUE = 1111 1111, FALSE = 0000 0000
- Application specific, e.g., RED = 1, GREEN = 2, BLUE = 3, YELLOW = 4
Can we use less than 1 byte/code? Yes, but only if desperate...

To represent:
- Dates, e.g.:
  - Integer: # days since Jan 1, 1900
  - 8 chars: YYYYMMDD
  - 7 chars: YYYYDDD
  - 10 chars: YYYY-MM-DD (SQL2)
  (not YYMMDD! Why?)
- Time, e.g.:
  - Integer: seconds since midnight
  - chars: HH:MM:SS[.FF...] (SQL2)

To represent:
- String of characters
  - Null terminated, e.g., c a t followed by a null byte
  - Length given, e.g., 3 c a t
  - Fixed length

To represent:
- Bag of bits: Length | Bits

Key Point
- Fixed length items
- Variable length items: usually length given at beginning
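The three string representations and the integer date encoding above can be sketched as follows. The function names, the one-byte length prefix, and the use of spaces as padding are assumptions made for illustration, not something the slides prescribe.

```python
from datetime import date


def null_terminated(s: str) -> bytes:
    """String followed by a single zero byte."""
    return s.encode("ascii") + b"\x00"


def length_prefixed(s: str) -> bytes:
    """One length byte (assumed width), then the characters."""
    data = s.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix holds at most 255")
    return bytes([len(data)]) + data


def fixed_length(s: str, width: int) -> bytes:
    """String padded with spaces (assumed pad character) to a fixed width."""
    data = s.encode("ascii")
    if len(data) > width:
        raise ValueError("string longer than field")
    return data.ljust(width, b" ")


def days_since_1900(d: date) -> int:
    """Date as an integer: number of days since Jan 1, 1900."""
    return (d - date(1900, 1, 1)).days


print(null_terminated("cat"))             # b'cat\x00'
print(length_prefixed("cat"))             # b'\x03cat'
print(fixed_length("cat", 5))             # b'cat  '
print(days_since_1900(date(1900, 1, 2)))  # 1
```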

Also
- Type of an item: tells us how to interpret it (plus size if fixed)
- Normally indicated in the record schema (see in a moment)

Record: collection of related fields
E.g., Employee record: name field, salary field, date-of-hire field, ...

Types of records
Main choices:
- FIXED vs VARIABLE FORMAT
- FIXED vs VARIABLE LENGTH

Fixed format
A SCHEMA (not record) contains following information
- number of fields
- type of each field
- order in record
- meaning of each field

Example: fixed format and length
Employee record schema:
(1) E#, 2 byte integer
(2) E.name, 10 char.
(3) Dept, 2 byte code
Records:
  46 | F o r d   | 02
  83 | J o n e s | 01

Variable format
Record itself contains format; "Self Describing"
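The fixed-format Employee record above maps directly onto a `struct` format string, with the schema living in the program rather than in the record. A minimal sketch, assuming big-endian integers and zero-byte padding for short names (the slides specify neither); `EMPLOYEE`, `pack_employee` and `unpack_employee` are hypothetical names.

```python
import struct

# Schema from the slide: E# (2-byte integer), E.name (10 chars), Dept (2-byte code).
# ">" means big-endian without alignment padding; this byte order is an assumption.
EMPLOYEE = struct.Struct(">h10sh")


def pack_employee(eno: int, name: str, dept: int) -> bytes:
    # struct pads a short name with zero bytes and truncates one longer than 10.
    return EMPLOYEE.pack(eno, name.encode("ascii"), dept)


def unpack_employee(record: bytes):
    eno, name, dept = EMPLOYEE.unpack(record)
    return eno, name.rstrip(b"\x00").decode("ascii"), dept


rec = pack_employee(46, "Ford", 2)
print(EMPLOYEE.size)         # 14
print(unpack_employee(rec))  # (46, 'Ford', 2)
```

Every record has the same 14 bytes, so the i-th record of a block can be located by multiplication alone.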

Example: variable format and length
  2 | 5 | I | 46 | 4 | S | 4 | F o r d
- 2: # of fields
- 5: code identifying field as E#
- I: integer type
- 4: code for Ename
- S: string type
- 4: length of str.
Field name codes could also be strings, i.e. tags (-> XML as a data interchange format)

Variable format useful for:
- sparse records, e.g., patient records with thousands of possible tests
- repeating fields
- information integration from heterogeneous sources

EXAMPLE: var format record with repeating fields
Employee with one or more children
  3 | E_name: Fred | Child: Sally | Child: Tom

Note: Repeating fields does not imply
- variable format, nor
- variable size
  Fred | Sally | Tom | --
Key is to allocate maximum number of repeating fields (if not used, NULL)

Many variants between fixed and variable format:
Ex. #1: Include record type in record
  5 | 27 | ....
- 5: record type, tells me what to expect (i.e. points to schema)
- 27: record length

Record header: data at beginning that describes record
May contain:
- indication of record type
- record length
- time stamp
- other stuff...
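The self-describing record from the variable-format example (2 fields: code 5 as integer 46, code 4 as string "Ford") can be encoded and decoded without any external schema. The field widths here (one byte for the count, code and string length, two bytes for an integer value) are assumptions for the sketch, as are the function names.

```python
import struct


def encode(fields):
    """fields: list of (field_code, value) pairs, each value an int or a str."""
    out = bytearray([len(fields)])           # number of fields
    for code, value in fields:
        if isinstance(value, int):
            out += bytes([code]) + b"I" + struct.pack(">h", value)
        else:
            data = value.encode("ascii")
            out += bytes([code]) + b"S" + bytes([len(data)]) + data
    return bytes(out)


def decode(record):
    """Rebuild the (field_code, value) pairs using only the tags in the record."""
    count, pos, fields = record[0], 1, []
    for _ in range(count):
        code, tag = record[pos], record[pos + 1:pos + 2]
        pos += 2
        if tag == b"I":
            (value,) = struct.unpack_from(">h", record, pos)
            pos += 2
        elif tag == b"S":
            length = record[pos]
            value = record[pos + 1:pos + 1 + length].decode("ascii")
            pos += 1 + length
        else:
            raise ValueError(f"unknown type tag {tag!r}")
        fields.append((code, value))
    return fields


rec = encode([(5, 46), (4, "Ford")])
print(decode(rec))  # [(5, 46), (4, 'Ford')]
```

A sparse record simply omits the fields it does not have, which is why this format suits the patient-record case mentioned above.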

Ex. #2 of variant between FIXED/VAR format
Hybrid format: one part is fixed, other variable
E.g.: All employees have E#, name, dept; other fields vary.
  25 | Smith | Toy | 2 | Hobby:chess | retired
- 2: # of var fields

Also, many variations in internal organization of record
Just to show one:
(a) Length stored before each field (* marks a length of field):
      3 | 10* | F1 | 5* | F2 | 12* | F3
(b) Total size and offsets stored at the beginning:
      3 | 32 | 5 | 15 | 20 | F1 | F2 | F3
    byte positions: 0, 1, 2, 3, 4 for the leading entries; F1 at 5, F2 at 15, F3 at 20; end at 32
    32 = total size; 5, 15, 20 = offsets

Next: placing records into blocks
  blocks ... a file
- assume fixed length blocks
- assume a single relation (for now)

Issues in storing records in blocks:
(1) separating records
(2) spanned vs. unspanned
(3) mixed record types, clustering
(4) split records
(5) sequencing
(6) addressing records

(1) Separating records
  Block: R1 | R2 | ...
(a) fixed size recs. -> no need to separate
(b) special marker
(c) give record lengths (or offsets)
  - within each record
  - in block header (see later)

(2) Spanned vs. Unspanned
Unspanned: records are within one block
  block 1: R1 | R2            block 2: R3 | R4 | R5
Spanned: records span block boundaries
  block 1: R1 | R2 | R3 (a)   block 2: R3 (b) | R4 | R5 | R6 | R7 (a) ...
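The record layout with offsets at the beginning (3, 32, 5, 15, 20 on this page) can be built and read as sketched below. The sketch assumes that the leading 3 is the field count and that every header entry is one byte wide; `build_record` and `get_field` are hypothetical helper names.

```python
def build_record(fields):
    """fields: list of bytes. Header: count, total size, then one offset per field."""
    header_len = 2 + len(fields)
    offsets, pos = [], header_len
    for f in fields:
        offsets.append(pos)
        pos += len(f)
    if pos > 255:
        raise ValueError("record too large for one-byte header entries")
    return bytes([len(fields), pos] + offsets) + b"".join(fields)


def get_field(record, i):
    """Fetch field i directly from its offset, without scanning earlier fields."""
    count, total = record[0], record[1]
    start = record[2 + i]
    end = record[3 + i] if i + 1 < count else total
    return record[start:end]


# Field lengths 10, 5 and 12, as in the slide.
rec = build_record([b"A" * 10, b"B" * 5, b"C" * 12])
print(list(rec[:5]))      # [3, 32, 5, 15, 20]
print(get_field(rec, 1))  # b'BBBBB'
```

Compared with storing a length before each field, the offset header lets any field be reached in constant time.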

With spanned records:
  block 1: R1 | R2 | R3 (a)   block 2: R3 (b) | R4 | R5 | R6 | R7 (a)
- need indication of partial record, "pointer" to rest
- need indication of continuation (+ from where?)

Spanned vs. unspanned:
- Unspanned is much simpler, but may waste space
- Spanned necessary if record size > block size (e.g., fields containing large "BLOB"s for, say, MPEG video clips)

Example (of unspanned records)
- 10^6 records, each of size 2,050 bytes (fixed)
- block size = 4096 bytes
  block 1: R1 (2050 bytes) | wasted (2046 bytes)
  block 2: R2 (2050 bytes) | wasted (2046 bytes)
- Space used about 4 x 10^9 B, about half wasted

(3) Mixed record types
Mixed: records of different types (e.g. DEPT, EMPLOYEE) allowed in same block
e.g., a block: Dep d1 | Emp e1 | Emp e2

Why do we want to mix?
Answer: CLUSTERING
Records that are frequently accessed together should be in the same block

Example
Q1: select DEPT.Name, EMP.Name, ...
    from DEPT, EMP
    where DEPT.Name = EMP.DeptName
a block: DEPT,Name=Toy,... | EMP,DeptName=Toy,... | EMP,DeptName=Toy,...
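The space figures in the unspanned example on this page follow from a few lines of arithmetic. The spanned figure at the end is an added comparison that ignores any per-fragment header overhead, which is an assumption not made in the slides.

```python
records = 10**6
record_size = 2050      # bytes, fixed
block_size = 4096       # bytes

# Unspanned: a record may not cross a block boundary.
per_block = block_size // record_size          # only 1 record fits
blocks = -(-records // per_block)              # ceiling division
space = blocks * block_size
wasted = space - records * record_size

print(per_block)                  # 1
print(space)                      # 4096000000
print(wasted)                     # 2046000000
print(round(wasted / space, 3))   # 0.5

# Spanned (header overhead ignored): records are packed back to back.
spanned_blocks = -(-(records * record_size) // block_size)
print(spanned_blocks)             # 500489
```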

If Q1 frequent, clustering is good
But consider Q2: SELECT * FROM DEPT
If Q2 is frequent, clustering is counterproductive

(4) Split records
Typically for hybrid format
- Fixed part in one block
- Variable part in another block

  Block with fixed recs.:     R1 (a) | R2 (a)
  Blocks with variable recs.: R1 (b) | R2 (b) | R2 (c)

(5) Sequencing
Ordering records in file (and block) by some key value
-> Sequential file (sequenced)

Why sequencing?
Typically to make it possible to efficiently read records in order (e.g., to do a merge-join, discussed later)

Sequencing Options
(a) Next record physically contiguous
      R1 | Next (R1) | ...
(b) Linked
      R1 -> Next (R1)

Sequencing Options
(c) Overflow area
      Records in sequence: header | R1 | R2 | R3 | R4 | R5
      Overflow area: R2.1 | R1.3 | R4.7

(6) Addressing records
How does one refer to records? (Rx)
Many options, ranging from Physical to Indirect

Purely Physical
E.g., Record Address or ID =
- Block ID, made up of:
  - Device ID
  - Cylinder #
  - Track #
  - Block #
- Offset in block

Fully Indirect
E.g., Record ID is arbitrary byte string (say, an object ID)
  map table: rec ID r -> physical address a

Tradeoff
Flexibility to move records (for deletions, insertions) vs. cost of indirection
  Physical ... Indirect: many options in between
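The fully indirect scheme above can be modeled as a dictionary from record IDs to physical addresses. This is a toy sketch: `PhysAddr` and `MapTable` are hypothetical names, and a real map table would itself live on disk.

```python
from typing import NamedTuple


class PhysAddr(NamedTuple):
    block: int    # stands in for the full block ID (device, cylinder, track, block #)
    offset: int   # offset within the block


class MapTable:
    """Record ID (arbitrary byte string) -> physical address."""

    def __init__(self):
        self._loc = {}

    def insert(self, rec_id: bytes, addr: PhysAddr) -> None:
        self._loc[rec_id] = addr

    def lookup(self, rec_id: bytes) -> PhysAddr:
        return self._loc[rec_id]          # the extra step is the cost of indirection

    def move(self, rec_id: bytes, new_addr: PhysAddr) -> None:
        # Only this entry changes; every stored reference to rec_id stays valid.
        self._loc[rec_id] = new_addr


table = MapTable()
table.insert(b"emp-46", PhysAddr(block=7, offset=128))
table.move(b"emp-46", PhysAddr(block=9, offset=0))
print(table.lookup(b"emp-46"))  # PhysAddr(block=9, offset=0)
```

With purely physical addresses, the same move would require finding and rewriting every pointer to the record.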

Ex #1: Indirection in block
  A block: Header | R4 | R2 | R1 | Free space

Block header: data at beginning that describes block
May contain:
- File ID (or RELATION or DB ID); the ID of this block
- Record directory; pointer to free space
- Type of block (e.g. contains records of type 4; is overflow, ...)
- Pointer to other blocks "like it" (say, if part of an index structure)
- Timestamp ...

Issues in storing records in blocks
(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types, clustering
(4) Split records
(5) Sequencing
(6) Addressing records (here)

Other Topics
(1) Insertion/Deletion
(2) Pointer Swizzling
(3) Comparison of Schemes

(1) Deletion
  Block: Rx
Options:
(a) Immediately reclaim space
(b) Mark deleted
  - May need chain of deleted records (for re-use)
  - If there are pointers to the record, either need to set them to NULL or indicate the record space reclaimed
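Indirection inside a block, with a record directory and a free-space pointer in the header, can be sketched as a small in-memory model. The class below is an illustration under simplifying assumptions: the directory is kept as a Python list rather than serialized into the block, header space is not charged against the block size, and all names are hypothetical.

```python
class Block:
    def __init__(self, size: int = 64):
        self.data = bytearray(size)
        self.directory = []       # slot number -> (offset, length), or None if deleted
        self.free_end = size      # records are placed from the end toward the header

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:   # header size ignored in this sketch
            raise ValueError("no room in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.directory.append((self.free_end, len(record)))
        return len(self.directory) - 1    # the slot number is the record's address

    def read(self, slot: int) -> bytes:
        entry = self.directory[slot]
        if entry is None:
            raise KeyError(f"slot {slot} was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.directory[slot] = None

    def compact(self) -> None:
        """Repack live records at the end of the block; slot numbers stay valid."""
        live = [(s, self.read(s)) for s, e in enumerate(self.directory) if e is not None]
        self.free_end = len(self.data)
        for slot, rec in live:
            self.free_end -= len(rec)
            self.data[self.free_end:self.free_end + len(rec)] = rec
            self.directory[slot] = (self.free_end, len(rec))


block = Block()
s1, s2, s3 = block.insert(b"R1"), block.insert(b"R2data"), block.insert(b"R3x")
block.delete(s2)
block.compact()
print(block.read(s1), block.read(s3), block.free_end)  # b'R1' b'R3x' 59
```

Because outside references hold only (block ID, slot number), records can be moved within the block without touching any pointer.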

As usual, many tradeoffs...
- How expensive is it to move a valid record to free space for immediate reclaim?
- How much space is wasted? e.g., deleted records, deleted fields, free space chains, ...

Concern with deletions
Dangling pointers
  pointer -> R1 ?

A solution: Tombstones
(a) Leave "MARK" in old location with physical IDs:
  A block: [ MARK: this space never re-used ] [ this space can be re-used ]
(b) Leave "MARK" in map table with logical IDs:
  Map table (ID -> LOC): the entry for ID 7788 holds the MARK
  Never reuse ID 7788 nor space in map...

(1) Insert
Easy case: records not in sequence
-> Insert new record at end of file, or in any block with space for it
Hard case: records in sequence
-> If free space "close by", not too bad
  - can "slide" records to preceding/succeeding blocks
  - pointers to moved records require forwarding addresses
-> Or use additional blocks as an overflow area
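Tombstones in a map table, option (b) above, can be sketched as follows. The sentinel object, the counter-based ID allocation, and the class name are assumptions for illustration; the property taken from the slide is that a deleted ID and its map entry are never reused.

```python
TOMBSTONE = object()   # marker left in the map table for a deleted record


class TombstoneMap:
    def __init__(self):
        self._loc = {}
        self._next_id = 0             # IDs are handed out once and never reused

    def insert(self, location) -> int:
        rec_id = self._next_id
        self._next_id += 1
        self._loc[rec_id] = location
        return rec_id

    def delete(self, rec_id: int) -> None:
        if self._loc.get(rec_id, TOMBSTONE) is TOMBSTONE:
            raise KeyError(f"record {rec_id} does not exist or is already deleted")
        self._loc[rec_id] = TOMBSTONE  # the record's own space can now be re-used

    def lookup(self, rec_id: int):
        """Return the location, or None if the pointer refers to a deleted record."""
        loc = self._loc[rec_id]
        return None if loc is TOMBSTONE else loc


m = TombstoneMap()
a = m.insert(("block 3", 40))
m.delete(a)
b = m.insert(("block 3", 40))         # same space, but a fresh ID
print(m.lookup(a), m.lookup(b), a != b)  # None ('block 3', 40) True
```

A stale pointer to the deleted record therefore resolves to "deleted" rather than silently reaching whatever record now occupies the old space.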

Interesting problems:
- How much free space to leave in each block, track, cylinder?
- How often to reorganize file + overflow?

(2) Pointer Swizzling
Two kinds of record addresses:
- DB address: physical location on secondary storage
- Memory address: record location when loaded into (main or virtual) memory
Which one to use, and when?

Pointer Swizzling
  Memory: block 1, block 2 (Rec A)     Disk: block 1, block 2 (Rec A)

One Option: Translation Table
  DB Addr | Mem Addr
  Rec-A   | Rec-A-inMem
For all records copied to memory

Another Option: In memory pointers, need "type" bit
  pointer marked M: to memory; otherwise: to disk

Swizzling
- Automatic ("eager")
- On-demand ("lazy")
- No swizzling / program control
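The translation-table option above, combined with on-demand loading, can be sketched like this. A dictionary stands in for the disk, and the class and attribute names are hypothetical; the sketch shows the table lookup only, not the rewriting of pointers stored inside records.

```python
class TranslationTable:
    def __init__(self, disk):
        self.disk = disk      # DB address -> record contents (stand-in for disk blocks)
        self.table = {}       # DB address -> in-memory copy, for all records copied to memory
        self.loads = 0        # number of simulated disk reads

    def deref(self, db_addr):
        """Follow a DB address, loading the record on first use only."""
        obj = self.table.get(db_addr)
        if obj is None:
            obj = dict(self.disk[db_addr])   # simulate copying the record into memory
            self.table[db_addr] = obj
            self.loads += 1
        return obj


disk = {"Rec-A": {"name": "Ford", "next": "Rec-B"}, "Rec-B": {"name": "Jones", "next": None}}
mem = TranslationTable(disk)
first = mem.deref("Rec-A")
again = mem.deref("Rec-A")
print(first is again, mem.loads)  # True 1
```

An eager scheme would instead translate every pointer in a block as soon as the block is loaded, trading up-front work for cheaper later dereferences.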

Comparison
- There are about 10,000,000 ways to organize my data on disk
- Which one is right for me?
Issues: Flexibility, Space Utilization, Complexity, Performance

To evaluate a given strategy, estimate following parameters:
-> space used for expected data
-> expected time to
  - fetch record given its key
  - fetch record with next key
  - insert/delete/update record
  - scan all records in the file
  - reorganize file

Example
How would you design Megatron 3000 storage system? (for a relational DB, low end)
- Variable length records?
- Spanned?
- What data types?
- Fixed format?
- Record IDs?
- Sequencing?
- How to handle deletions?

Summary
How to lay out data on disk
  Data Items -> Records -> Blocks -> Files -> Memory -> DBMS

Next
How to find a record quickly, given a key