The Classical Architecture: Storage 1 / 36




The Problem Where does the application's data actually end up? (Diagram: Application → Filesystem → Logical Drive → Physical Drive) 2 / 36

Requirements There are different classes of requirements:
- Data Independence
  - the application is shielded from physical storage
  - physical storage can be reorganized
  - hardware can be changed
- Scalability
  - must scale to (nearly) arbitrary data sizes
  - fast retrieval
  - efficient access to individual data items
  - updating arbitrary data
- Reliability
  - data must never be lost
  - must cope with hardware and software failures
- ...
3 / 36

Layer Architecture
- implementing all these requirements on the bare metal is hard and not desirable
- a DBMS must be maintainable and extensible
Instead: use a layer architecture
- the DBMS logic is split into levels of functionality
- each level is implemented by a specific layer
- each layer interacts only with the next lower layer
- this simplifies and modularizes the code
4 / 36

A Simple Layer Architecture (Diagram: three layers between declarative queries and the DB)

Layer          Purpose                               Access Granularity
query layer    query translation and optimization    declarative queries
access layer   managing records and access paths     sets of records
storage layer  DB buffer and hardware interface      records / pages

5 / 36

A Simple Layer Architecture (2)
- layers can be characterized by the data items they manipulate
- each lower layer offers functionality for the next higher layer
- this keeps the complexity of individual layers reasonable
- rough structure: from the high (declarative) level down to the low (physical) level
This is a reasonable architecture, but simplified. A more detailed architecture is needed for a complete DBMS. 6 / 36

A More Detailed Architecture (Diagram: the classical layered architecture, from the application down to external storage; each interface is annotated with its operations, granularity, and data structures)

Interface          Operations                            Granularity                Data Structures
Query Interface    SQL, ...                              relation, view, ...        logical schema, integrity constraints
Record Interface   FIND NEXT record, STORE record        logical record, key, ...   access path, physical schema, ...
Record Access      write record, insert in B-tree, ...   physical record, ...       free space inventory, page indexes, ...
DB Buffer          access page j, release page j         page, segment              page table, block map, ...
File Interface     read block k, write block k           block, file                free space inventory, extent table, ...
Device Interface                                         track, cylinder, ...       DB (external storage)

The layers implement, from top to bottom: logical data, access paths, physical data, page structure, storage allocation, external storage. 7 / 36

A More Detailed Architecture (2) A few pieces are still missing:
- transaction isolation
- recovery
but otherwise it is a reasonable architecture. Some systems deviate slightly from this classical architecture:
- most DBMSs nowadays delegate drive access to the OS
- some DBMSs delegate buffer management to the OS (tricky, though)
- a few DBMSs allow for direct logical record access
- ...
8 / 36

Influence of Hardware We must take hardware into account when designing a storage system. For a long time, development was dominated by Moore's Law: the number of transistors on a chip doubles every 18 months. Indirectly, this drove a number of other parameters:
- main memory size
- CPU speed (no longer true!)
- HD capacity (starting to get problematic, too: density is very high, but only capacity improves, not access time)
Later we will look at these again. 9 / 36

Memory Hierarchy

Level                       Capacity    Latency
register                    bytes       1ns
cache                       K-M bytes   <10ns
main memory                 G bytes     <100ns
external storage (online)   T bytes     ms
archive storage (nearline)  T bytes     sec
archive storage (offline)   T-P bytes   sec-min

10 / 36

Memory Hierarchy (2) There are huge gaps between hierarchy levels:
- traditionally, main memory vs. disk is the most important one
- but memory vs. cache etc. is also relevant
The DBMS must aim to maximize locality. 11 / 36

Hard Disk Access Hard disks are still the dominant external storage:
- rotating platters, mechanical effects
- transfer rate: ca. 150MB/s
- seek time: ca. 3ms
- huge imbalance between random and sequential I/O!
The DBMS must take these effects into account:
- sequential access is much more efficient
- the gap is growing instead of shrinking
- even SSDs are slightly asymmetric (and have other problems)
12 / 36

Hard Disk Access (2) Techniques to speed up disk access:
- do not move the head for every single tuple
- instead, load larger chunks
- typical granularity: one page
- page size varies: traditionally 4KB, nowadays often 16KB and more
- page size is a trade-off
(Diagram: a sequence of numbered pages laid out contiguously on disk) 13 / 36

Hard Disk Access (3) The page structure is very prominent within the DBMS:
- granularity of I/O
- granularity of buffering/memory management
- granularity of recovery
A page is still too small to hide random I/O, though:
- sequential page access is important
- DBMSs use read-ahead techniques
- asynchronous write-back
14 / 36

Buffer Management Some pages are accessed very frequently:
- reduce I/O by buffering/caching
- the buffer manager keeps active pages in memory
- limited size, so it discards or writes back pages when needed
- coupled with recovery, in particular logging
Basic interface: 1. FIX(pageNo,shared) 2. UNFIX(pageNo,dirty) Pages can only be accessed (or modified) while they are fixed. 15 / 36
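The FIX/UNFIX protocol above can be sketched as a minimal in-memory model. This is an illustration, not any real system's code: the names (BufferManager, Frame, fix, unfix) are made up, and disk I/O, latching, and log forcing are reduced to comments.

```python
class Frame:
    def __init__(self, page_no):
        self.page_no = page_no
        self.pin_count = 0            # number of active FIX calls
        self.dirty = False
        self.data = bytearray(4096)   # one 4KB page

class BufferManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = {}              # hash table: page_no -> Frame

    def fix(self, page_no, shared=True):
        frame = self.frames.get(page_no)
        if frame is None:
            if len(self.frames) >= self.capacity:
                self._evict()
            frame = Frame(page_no)    # a real system would read the page from disk here
            self.frames[page_no] = frame
        frame.pin_count += 1          # a fixed page may not be replaced
        return frame

    def unfix(self, frame, dirty=False):
        frame.dirty = frame.dirty or dirty
        frame.pin_count -= 1          # page becomes a replacement candidate again

    def _evict(self):
        # pick any unfixed frame; dirty pages would be written back
        # (after forcing the log) before being dropped
        for page_no, frame in self.frames.items():
            if frame.pin_count == 0:
                del self.frames[page_no]
                return
        raise RuntimeError("all pages are fixed")
```

The pin count is what enforces the rule on the slide: a page can only be accessed while fixed, and only unfixed pages are eligible for replacement.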

Buffer Management (Diagram: a hash table pointing to buffer frames; each frame holds PageNo, Latch, LSN, State, and Data) The buffer manager itself is protected by one or more latches. 16 / 36

Buffer Frame Maintains the state of a certain page within the buffer:
- pageno: the page number
- latch: a reader/writer lock to protect the page (note: must not block unrelated pages!)
- LSN: LSN of the last change, for recovery (the buffer manager must force the log before writing)
- state: clean/dirty/newly created etc.
- data: the actual data contained on the page
A frame will usually contain extra information for buffer replacement. Frames are usually kept in a hash table. 17 / 36

Buffer Replacement When memory is full, some buffer pages have to be replaced:
- clean pages can simply be discarded
- dirty pages have to be written back first
- discarded pages are replaced with new pages
(Graph: page fault rate vs. buffer size for different replacement strategies, from RANDOM down to OPT) 18 / 36

Buffer Replacement - FIFO First In - First Out:
- a simple replacement strategy
- buffer frames are kept in a linked list
- inserted at the end, removed from the head
- old pages are removed first
Does not take locality into account. 19 / 36

Buffer Replacement - LRU Least Recently Used:
- similar to FIFO, buffer frames are kept in a doubly-linked list
- remove from the head
- when a frame is unfixed, move it to the end of the list
- hot pages are kept in the buffer
A very popular strategy. Latching requires some care. 20 / 36
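The LRU list can be sketched with an ordered dictionary standing in for the doubly-linked list (a minimal single-threaded model; the class and method names are illustrative, and the latching the slide warns about is ignored):

```python
from collections import OrderedDict

class LRUList:
    """Frames ordered from coldest (head) to hottest (tail)."""
    def __init__(self):
        self.frames = OrderedDict()            # page_no -> frame

    def on_unfix(self, page_no, frame):
        self.frames[page_no] = frame
        self.frames.move_to_end(page_no)       # recently used pages migrate to the tail

    def pick_victim(self):
        page_no = next(iter(self.frames))      # head = least recently used
        del self.frames[page_no]
        return page_no
```

Because every UNFIX touches the head/tail pointers of this shared list, the list itself becomes a hot spot under concurrency, which is exactly what motivates the second chance strategy below.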

Buffer Replacement - LFU Least Frequently Used:
- remembers the number of accesses per page
- infrequently used pages are removed first
- maintains a priority queue of pages
Sounds plausible, but too expensive in practice. 21 / 36

Buffer Replacement - Second Chance LRU is nice, but the LRU list is a hot spot. Idea: use a simpler mechanism to simulate LRU:
- one bit per page
- the bit is set when the page is unfixed
- when replacing, replace pages whose bit is unset
- set bits are cleared during the search
This strategy is called second chance or clock. Easy to implement, but a bit crude. 22 / 36
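A sketch of the clock mechanism just described (illustrative names, fixed page set, no concurrency): the hand sweeps over the frames, clearing set bits until it finds a page whose bit is already unset.

```python
class Clock:
    def __init__(self, page_nos):
        self.pages = list(page_nos)
        self.ref = {p: False for p in self.pages}   # one bit per page
        self.hand = 0

    def on_unfix(self, page_no):
        self.ref[page_no] = True                    # grant a second chance

    def pick_victim(self):
        while True:
            p = self.pages[self.hand]
            self.hand = (self.hand + 1) % len(self.pages)
            if self.ref[p]:
                self.ref[p] = False                 # use up the second chance
            else:
                return p                            # bit unset: evict this page
```

Setting a bit on UNFIX is a single write with no shared list manipulation, which is why this is so much cheaper than true LRU.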

Buffer Replacement - 2Q Maintain not one queue but two:
- many pages are referenced only once
- some pages are hot and referenced frequently
- maintain a separate list for those
1. maintain all pages in a FIFO queue
2. when a page that is currently in the FIFO queue is referenced again, move it into an LRU queue
3. prefer evicting from the FIFO queue
Hot pages are in LRU, read-once pages in FIFO. A good strategy for common DBMS operations. 23 / 36
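The three numbered steps can be sketched as follows (a simplified model of the 2Q idea from the slide, not the full published algorithm; names are illustrative):

```python
from collections import OrderedDict

class TwoQ:
    def __init__(self):
        self.fifo = OrderedDict()   # pages seen once, in arrival order
        self.lru = OrderedDict()    # pages referenced again: hot

    def on_reference(self, page_no):
        if page_no in self.fifo:               # second reference while still in FIFO:
            del self.fifo[page_no]             # promote to the LRU queue
            self.lru[page_no] = True
        elif page_no in self.lru:
            self.lru.move_to_end(page_no)      # normal LRU maintenance
        else:
            self.fifo[page_no] = True          # first reference: possibly read-once

    def pick_victim(self):
        queue = self.fifo if self.fifo else self.lru   # prefer evicting from FIFO
        page_no, _ = queue.popitem(last=False)
        return page_no
```

A sequential scan touches each page once, so its pages stay in the FIFO queue and are evicted first, without displacing the hot pages in the LRU queue.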

Buffer Replacement - Hints Application knowledge can help buffer replacement:
- 2Q tries to recognize read-once pages
- these occur when scanning over data
- but the DBMS knows this anyway!
- it could therefore give hints when unfixing
- e.g., will-need or will-not-need (changes the placement in the queue)
24 / 36

Segments While page granularity is fine for I/O, it is somewhat unwieldy:
- most structures within a DBMS span multiple pages
- relations, indexes, free space management, etc.
- it is convenient to treat these as one entity
- all DBMS pages are therefore partitioned into sets of pages
Such a set of pages is called a segment. Conceptually similar to a file or virtual memory. 25 / 36

Segments (2) A segment offers a virtual address space within the DBMS:
- can allocate and release new pages
- can iterate over all pages
- can drop the whole segment
- optionally offers a linear address space
Greatly simplifies the logic of higher layers. 26 / 36

Block Allocation (Diagram: three catalog-based allocation schemes: static file-mapping, dynamic extent-mapping, dynamic block-mapping) 27 / 36

Block Allocation All approaches have pros and cons:
- static file-mapping: very simple, low overhead; but resizing is difficult
- dynamic block-mapping: maximum flexibility; but administrative overhead and an additional indirection
- dynamic extent-mapping: can handle growth; only slight overhead
In most cases extent-based mapping is preferable. 28 / 36

Block Allocation (5) Dynamic extent-mapping:
- grows by adding a new extent
- should grow exponentially (e.g., by a factor of 1.25)
- bounds the number of extents
- reduces both complexity and space consumption
- and helps with sequential I/O!
29 / 36
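The effect of exponential growth is easy to see numerically. In this sketch (the function name, the initial extent size of 16 pages, and the growth factor 1.25 are all illustrative assumptions), a segment of 10,000 pages needs only a couple of dozen extents, because extent sizes form a geometric series:

```python
def extent_sizes(total_pages, first=16, factor=1.25):
    """Extent sizes needed to cover total_pages with geometric growth."""
    sizes, allocated, size = [], 0, first
    while allocated < total_pages:
        sizes.append(size)
        allocated += size
        size = max(size + 1, int(size * factor))  # grow by at least one page
    return sizes
```

Since the number of extents is logarithmic in the segment size, the extent list in the catalog stays small, and each extent is a contiguous run of blocks, which is what helps sequential I/O.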

Segment Types Segments can be classified into types:
- private vs. public
- permanent vs. temporary
- automatic vs. manual
- with recovery vs. without recovery
These differ in complexity and required effort. 30 / 36

Standard Segments Most DBMSs will need at least two low-level segments:
- segment inventory: keeps track of all pages allocated to segments; keeps extent lists or page tables or ...
- free space inventory: keeps track of free pages; maintains bitmaps or free extents or ...
High-level segments are built upon these. Common high-level segments:
- schema
- relations
- temp segments (created and discarded on demand)
- ...
31 / 36

Update Strategies DBMSs have different update behavior along two dimensions: force vs. no-force, and steal vs. no-steal. Usually one prefers steal, no-force:
- but then pages contain dirty data
- when using update-in-place, dirty data is on disk
- this complicates recovery
- transactions could see dirty data
32 / 36

Shadow Paging (Diagram: two page tables over physical pages 1-11: the shadow copy maps logical pages 1-6 to physical pages 1-6, while the current table redirects the modified pages 3 and 6 to physical pages 8 and 7) Uses a page table; dirty pages are stored via a shadow copy. 33 / 36
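The page-table mechanism can be sketched as follows (a minimal single-transaction model; the class and method names are illustrative, and the page tables are plain dictionaries rather than on-disk structures): the first write to a logical page redirects it to a fresh physical block, commit atomically adopts the current table as the new shadow, and abort simply reverts to the shadow table.

```python
class ShadowPagedStore:
    def __init__(self, num_pages):
        self.storage = {i: b"" for i in range(num_pages)}   # physical block -> data
        self.shadow = {i: i for i in range(num_pages)}      # committed page table
        self.current = dict(self.shadow)                    # working page table
        self.next_free = num_pages

    def write(self, logical_page, data):
        if self.current[logical_page] == self.shadow[logical_page]:
            # first write: redirect to a fresh block, keep the clean copy intact
            self.current[logical_page] = self.next_free
            self.next_free += 1
        self.storage[self.current[logical_page]] = data

    def read(self, logical_page):
        return self.storage.get(self.current[logical_page], b"")

    def commit(self):
        self.shadow = dict(self.current)    # atomic page-table swap

    def abort(self):
        self.current = dict(self.shadow)    # dirty blocks are simply dropped
```

Note how the clean data is never overwritten, which is the recovery advantage, and how redirected pages end up scattered over the physical blocks, which is the locality disadvantage discussed next.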

Shadow Paging (2) Advantages:
- the clean data is always available on disk
- greatly simplified recovery
- can be used for transaction isolation, too
Disadvantages:
- complicates the page access logic
- destroys data locality
Nowadays rarely used in disk-based systems. 34 / 36

Delta Files Similar idea to shadow paging:
- on change, pages are copied to a separate file
- a copied page can then be changed in place
- on commit, discard the file; on abort, copy back
Can be implemented in two flavors:
- store a clean copy in the delta
- store the dirty data in the delta
Both have pros and cons. 35 / 36

Delta Files (2) Delta files have some advantages over shadow paging:
- they preserve data locality
- no mixture of clean and dirty pages
Disadvantages:
- cause more I/O
- abort (or commit) becomes expensive
- keeping track of delta pages is non-trivial
Still, often preferable over shadow paging. 36 / 36