Database 2 Lecture I. Alessandro Artale



Similar documents
CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

6. Storage and File Structures

Principles of Database Management Systems. Overview. Principles of Data Layout. Topic for today. "Executive Summary": here.

Storage and File Structure

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed

Physical Data Organization

Storage in Database Systems. CMPSCI 445 Fall 2010

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7

University of Dublin Trinity College. Storage Hardware.

Storing Data: Disks and Files

Record Storage and Primary File Organization

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Chapter 10: Storage and File Structure

Data Storage - I: Memory Hierarchies & Disks

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Chapter 1 File Organization 1.0 OBJECTIVES 1.1 INTRODUCTION 1.2 STORAGE DEVICES CHARACTERISTICS

How To Create A Table In Sql (Ahem)

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems

File System Management

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

File Management. Chapter 12

Operating Systems CSE 410, Spring File Management. Stephen Wagner Michigan State University

SQL DATA DEFINITION: KEY CONSTRAINTS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 7

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques

Chapter 2 Data Storage

Lecture 1: Data Storage & Index

Unit Storage Structures 1. Storage Structures. Unit 4.3

Database Fundamentals

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

4 Logical Design : RDM Schema Definition with SQL / DDL

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems

CHAPTER 17: File Management

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Database Systems Introduction Dr P Sreenivasa Kumar

File Systems Management and Examples

Lecture 16: Storage Devices

System Architecture. CS143: Disks and Files. Magnetic disk vs SSD. Structure of a Platter CPU. Disk Controller...

In This Lecture. SQL Data Definition SQL SQL. Notes. Non-Procedural Programming. Database Systems Lecture 5 Natasha Alechina

Chapter 12 File Management

The Classical Architecture. Storage 1 / 36

Outline. CS 245: Database System Principles. Notes 02: Hardware. Hardware DBMS Data Storage

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Classification of Physical Storage Media. Chapter 11: Storage and File Structure. Physical Storage Media (Cont.) Physical Storage Media

Chapter 11: File System Implementation. Operating System Concepts with Java 8 th Edition

IT2304: Database Systems 1 (DBS 1)

Raima Database Manager Version 14.0 In-memory Database Engine

Introduction to Databases

Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

Chapter 12 File Management

CSE 562 Database Systems

Chapter 12 File Management. Roadmap

Database Management Systems

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

Data Storage - II: Efficient Usage & Errors

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Two Parts. Filesystem Interface. Filesystem design. Interface the user sees. Implementing the interface

Database Design and Programming

Network Model APPENDIXD. D.1 Basic Concepts

Main Memory & Backing Store. Main memory backing storage devices

How To Manage Data In A Database System

File-System Implementation

Chapter 1: Introduction

Lecture 18: Reliable Storage

Chapter 10: Mass-Storage Systems

Services. Relational. Databases & JDBC. Today. Relational. Databases SQL JDBC. Next Time. Services. Relational. Databases & JDBC. Today.

Overview. Introduction to Database Systems. Motivation... Motivation: how do we store lots of data?

Relational Database Systems 2 1. System Architecture

The Linux Virtual Filesystem

Operating Systems. Virtual Memory

An overview of FAT12

IT2305 Database Systems I (Compulsory)

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

FAWN - a Fast Array of Wimpy Nodes

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

A Brief Introduction to MySQL

SQL Server Table Design - Best Practices

5.1 Database Schema Schema Generation in SQL

DATABASE DESIGN - 1DL400

Hypertable Architecture Overview

EECS 647: Introduction to Database Systems

Outline. mass storage hash functions. logical key values nested tables. storing information between executions using DBM files

Fundamentals of Database Design

AQA GCSE in Computer Science Computer Science Microsoft IT Academy Mapping

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Retrieving Data Using the SQL SELECT Statement. Copyright 2006, Oracle. All rights reserved.

COS 318: Operating Systems. File Layout and Directories. Topics. File System Components. Steps to Open A File

Introduction to Database Systems. Chapter 1 Introduction. Chapter 1 Introduction

Introduction to Databases

Data Storage Solutions

Embedded SQL programming

Mass Storage Structure

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING

Transcription:

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (1) Database 2 Lecture I Alessandro Artale Faculty of Computer Science Free University of Bolzano Room: 221 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ 2003/2004 First Semester

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (2) Course Overview Techniques of record storage on disk; Indexing techniques for efficient records retrieval; Query processing and optimization; Transaction management Concurrency control Error recovery techniques Transaction management in Distributed Databases

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (3) Readings Database Systems: The Complete Book, Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom, Prentice-Hall, 2002. Fundamentals of Database Systems, R. Elmasri and S. B. Navathe, Addison Wesley, Edition, 2003. Database System Concepts, A.Silberschatz, H.F.Korth, S.Sudarshan, Mc Graw Hill, 2002. edition,

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (4) Summary of Lecture I Introduction to Databases. Storing a Database. Storing a toy database; An overview of secondary storage devices; Physical layout of data; Efficient techniques for storing DB Relations.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (5) What is a Database? A Database is a collection of data with a correlated meaning which are stored and manipulated. A database represents some aspect of the real world called the mini-world or Universe of Discourse (UoD).

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (6) What is a Database Management System? A Database Management System (DBMS) is a general-purpose software which facilitates the process of creating, storing and manipulating databases in an efficient way (both in space and time). DBMS main features: 1. Data must persist over long period of time; 2. Supports efficient access to very large amounts of data (gigabytes or more); 3. Support for concurrent access to data: Many different processes (transactions) can access the Database at once; 4. Recovery from system failures; 5. Security: Users must be authenticated.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (7) A Database Management System

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (8) Summary Introduction to Databases. Storing a Database. Storing a toy database; An overview of secondary storage devices; Physical layout of data; Efficient techniques for storing DB Relations.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (9) Storing a Database Databases are generally stored on Disk rather than in main memory. Size. Databases for enterprise contain giga or tera-byte of data. Volatility. Data must be preserved and main memory is volatile.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (10) Storing a Toy Database NAME ID DEPT Smith 123 CS Johnson 522 ES NAME CS ES OFFICE F10 L12 DB schema stored in a special file /usr/db/schema: Students#name#STR#id#INT#dept#STR Depts#name#STR#office#STR Relations stored in files, e.g., relation Students in /usr/db/students: Smith#123#CS Johnson#522#ES

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (11) Query Execution: Selection SELECT * FROM R WHERE Condition 1. Read the schema file to determine attributes of R and their types; 2. Read the R file and for each line: (a) Check the condition; (b) Display the line if the condition is true.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (12) Query Execution: Join SELECT office FROM Students, Depts WHERE Students.name = Smith AND Students.dept = Depts.name. 1. Read the schema file to determine attributes of Students and Depts and their types; 2. Read in turn each line/tuple from files Students and Depts, and: (a) Check whether the name of the student is Smith, and if true (b) Check whether the current pair of tuples represent the same department, and if true (c) Display the office.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (13) What is Wrong 1. No flexibility for DB modifications: For each change the entire file has to be rewritten. 2. Search Expensive: Selection: Always read entire relation; Join: It is not necessary to look at all pairs of tuples more efficient join operations. 3. No buffered data: Everything comes off the disk all the time. 4. No concurrency control: Several users can modify a file at the same time with unpredictable results. 5. No reliability: We can lose data in a crash. 6. No security: File system protection too coarse.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (14) Summary Introduction to Databases. Storing a Database. Storing a toy database; An overview of secondary storage devices; Physical layout of data; Efficient techniques for storing DB Relations.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (15) The Memory Hierarchy! Database System Implementation, H. Garcia-Molina, J. Ullman, and J. Widom, Prentice-Hall, 2000.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (16) The Memory Hierarchy (cont.) Cache: fastest. Access time: nanoseconds ( ), size: perhaps 1Mb ( ). Main memory. Access time: under a microsecond ( 128Mb. ), size: perhaps Secondary storage: Typically magnetic disks. Access times in milliseconds (unit of read/write = block or page, typically 4Kb). Capacities in gigabytes ( ). Tertiary storage. Access times in seconds, capacities in terabytes ( ). Most common: racks of tapes, CD-ROMs.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (17) Hard Disk Physical Structure Database System Implementation, H. Garcia-Molina, J. Ullman, and J. Widom, Prentice-Hall, 2000.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (18) Hard Disk Logical Structure Database System Implementation, H. Garcia-Molina, J. Ullman, and J. Widom, Prentice-Hall, 2000. Surfaces covered with concentric Tracks. Tracks at a common radius = Cylinder. Important because all data of a cylinder can be read quickly, without moving the heads. Floppy disk: 40 cylinders; typical magnetic disk: 10,000 cylinders (optical disk: several times that).

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (19) Hard Disk Logical Structure (cont.) Tracks divided into Sectors by unmagnetized gaps. Sector is indivisible unit for read/write and error detection. Usually one sector = Bytes. A bad sector is cancelled by the disk controller, so it is never used. Sectors are grouped into Blocks: logical units for read/write. Typical one block Bytes, i.e., sectors.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (20) Accessing Disk Blocks The following events occur when an application requests a particular tuple : 1. The DBMS determines which of the blocks contains ; 2. The block is read into a cache buffer (if not already there); 3. ready to be read from the application. In particular, modifying a block requires: 1. Read the block into a cache buffer; 2. Modify the block there; 3. Write the block back to disk.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (21) Efficient Use of Disk Data in DBMS applications do not fit in main memory. Algorithms should limit the number of disk accesses. Disk I/O. Read or write of a block is very expensive compared with what is likely to be done with the block once it arrives in main memory. Perhaps 1,000,000 machine instructions in the same time a disk I/O is performed. I/O Model of Computation. The number of block reads and writes is a good approximation of the running time of the algorithm and should be minimized.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (22) I/O Strategies Here are some techniques that make secondary-memory algorithms more efficient. Group related blocks by cylinder. If we are going to read or write blocks in a known order, place them by cylinder, so once we reach that cylinder we can read block after block. Prefetching. Load in cache buffers the blocks that are needed in advance. Similarly, keep output blocks in cache buffers until it is convenient to write them to disk. LRU (Least Recently Used) technique: The block that has not been referenced for the longest time is written back to the disk.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (23) Summary Introduction to Databases. Storing a Database. Storing a toy database; An overview of secondary storage devices; Physical layout of data; Efficient techniques for storing DB Relations.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (24) Physical Layout of Data Tuples and relations are stored in secondary storages following the block model. MOVIE TITLE YEAR LENGTH FILMTYPE STUDIONAME Star Wars 1977 124 color Fox Mighty Ducks 1991 104 color Disney Wayne s World 1992 95 color Paramount Attribute. Sequence of bytes representing the value of an attribute in a tuple, called Field. Tuple. Sequence of bytes divided into fields, called Record. Relation. A collection of records that form the extent of a relation is stored in a File as a collection of blocks.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (25) Physical Layout of Data (Cont.) We need to discuss the following issues: 1. How do we represent datatypes as fields? 2. How do we represent tuples as records? 3. How do we represent collections of records in blocks of memory? 4. How do we cope with record sizes that maybe different for different tuples? 5. How do we represent relations as collections of blocks? 6. What happens if the size of a record changes because of an update?

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (26) Attributes: How we Represent Datatypes as Fields (1) The representation of an attribute depends by the associated SQL datatype. In general, datatypes can be represented by a fixed length field made by a sequence of bytes. Integers, Reals: 2-8 byte fields. Fixed-length Character Strings, CHAR(n): array of bytes (possibly filled with pad characters.) Ex. of a CHAR(10) and cat as a string: cat Variable-length Character Strings, VARCHAR(n): array of regardless of how effectively long the string is. Two methods: bytes, Lenght plus Content. Ex. of a CHAR(10) and cat as a string: cat Null Terminated String. Ex. of a CHAR(10) and cat as a string: cat and the remaining positions are irrelevant.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (27) Attributes (Cont.) Date: array of fixed length (SQL2 uses arrays of 10 bytes). Sequence of bits, BIT(n): each put to. bits are packed in byte. Unused bits are

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (28) Fixed-Length Records: How we Represent Tuples as Records (2) Tuples are represented by records. When all fields of the record have a fixed length we simply concatenate the fields. CREATE TABLE EMPLOYEE ( INITIAL CHAR, LAST-NAME VARCHAR(15), ADDRESS VARCHAR(255), SSN CHAR(5), BDATE DATE, EMP-ID INT, SALARY DECIMAL(10,2) ); Each record of type Employee takes bytes. To locate a single record s field the associated schema file is consulted.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (29) Storing Collections of Fixed-Length Records: How we Represent Collections of Records in Blocks of Memory (3) Fixed-Length Records representing tuples of a relation are packed in blocks following the block header. The Bloch Header contains the following information: Info about wich relation the records of this block belong to; Pointers to other blocks storing tuples of the same relation; Offset for each record in the block; Timestamp indicating the block s last update; etc... Block Header record 1 record 2 record n

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (30) Variable-Length Records 1. Fields that vary in length. We can choose to allocate exact space for storing a VARCHAR string. 2. Very large fields. Fields can contain pictures, movies or sounds: These fields are very large and they do not fit within blocks.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (31) Records with Variable-Length Fields Put the Fixed-Length fields before the Variable-Length fields. The record header contains additional information: 1. The length of the whole record; 2. Pointers (i.e., an offset table) to the beginning of each Variable-Length field (excluding the first of them). Example. EMPLOYEE(INITIAL,LAST-NAME,ADDRESS,SSN,BDATE,EMP-ID,SALARY) Record Header 42 J 12345 5/7/1964 543 2235.00 Smith Queen Rd. Space allocated for this record with variable-length fields: bytes (instead of bytes).

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (32) Records with Large Fields Problem: The record does not fit in one block. Solution: Spanned Records, records are split between two or more blocks. Record Fragment: portion of the spanned record allocated in one block. The record (fragment) header contains additional information: 1. A bit indicating if the record is a fragment; 2. If it is a fragment, then two pointers for the next and previous fragment for the same record (the first fragment has a null previous pointer, while the last fragment has a null successive pointer). Spanned Records are also used for storing records slightly larger then half a block for saving space.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (33) Relations with Variable-Length Records: How we Cope with Records with Different Sizes (4) The header of the block contains an Offset Table holding the offset of the records within the block. The records are stored starting from the end of the block, while the offset table grows from the front end. header off. table unused record 3record 2record 1 Since we don t know in advance how many records the block will hold this solution allows to dynamically allocate space for the offset table.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (34) Summary Introduction to Databases. Storing a Database. Storing a toy database; An overview of secondary storage devices; Physical layout of data; Efficient techniques for storing DB Relations.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (35) Storing DB Relations: How we Represent Relations as Collections of Blocks (5) Each Relation is stored in a separate file of records. We use the term Data File to refer to the file of records which stores a given Relation. Data files can have the following structures: 1. Heap File Organization. Records are stored anywhere in the file where there is space. 2. Sequential File Organization. Records are stored in sequential order, according to the value of some field(s). NOTE: In the following we consider the Heap File Organization. The Sequential File Organization will be discussed in the next Lecture on Indexing Techniques.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (36) Update, Deletion and Insertion in Heap Files with Fixed-Length Records Updating an existing record is harmless for Fixed-Length Records. We need to optimize both Time and Space for Deletion/Insertion into a heap file. Technique 1. When a record is deleted the last record is moved into the space occupied by the deleted record. Technique 2. Usually, insertion is more frequent than deletion: 1. Wait for a new insertion to occupy the deleted space. 2. Maintain a list of pointers to the free space. Each block contains a Block header with a pointer to the first free space, while this first record contains the pointer to the next free space, and so on. 3. A new record is inserted in the free space pointed to by the header. The header pointer is changed to point to the next free space. 4. If the header pointer is null the record is simply appended.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (37) Deletion and Insertion in Heap Files. An Example Account Branch Balance header record 0 record 1 record 2 record 3 record 4 record 5 record 6 record 7 record 8 A-101 Downtown 500 A-110 Downtown 600 A-201 Perryridge 900 A-217 Brighton 750 A-222 Redwood 700 A-305 Round Hill 350

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (38) Deletion and Insertion in Heap Files with Variable-Length Records Deletion: Free the space for both the record and its pointer, compact both the remaining records andt the offset table so that there is always a unique unused region in the center. Insertion: Find a block with enough space, or get a new block, and put the record there introducing a new pointer in the offset table.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (39) Updates in Heap Files with Variable-Length Records We have problems similar to both insertion and deletion. 1. If the updated record is longer than the old version we require to either sliding records around the block or moving the record into another block. In both cases the offset table must be updated. 2. If the updated record shrinks we have the opportunity to free some space as with the deletion case.

Free University of Bolzano Database 2. Lecture I, 2003/2004 A.Artale (40) Summary of Lecture I Introduction to Databases. Storing a Database. Storing a toy database; An overview of secondary storage devices; Physical layout of data; Efficient techniques for storing DB Relations.