Lecture 16: Storage Devices

Similar documents
COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

Sistemas Operativos: Input/Output Disks

Devices and Device Controllers

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Disks and RAID. Profs. Bracy and Van Renesse. based on slides by Prof. Sirer

Introduction to I/O and Disk Management

Chapter 10: Mass-Storage Systems

Price/performance Modern Memory Hierarchy

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

Storage and File Systems. Chester Rebeiro IIT Madras

File System Management

Chapter 12: Mass-Storage Systems

Lecture 18: Reliable Storage

Operating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師

Mass Storage Structure

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

1 File Management. 1.1 Naming. COMP 242 Class Notes Section 6: File Management

Chapter 11 I/O Management and Disk Scheduling

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Chapter 12: Secondary-Storage Structure

University of Dublin Trinity College. Storage Hardware.

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

I/O Management. General Computer Architecture. Goals for I/O. Levels of I/O. Naming. I/O Management. COMP755 Advanced Operating Systems 1

Data Storage - I: Memory Hierarchies & Disks

Disk Storage & Dependability

4.2: Multimedia File Systems Traditional File Systems. Multimedia File Systems. Multimedia File Systems. Disk Scheduling

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed

Nasir Memon Polytechnic Institute of NYU

Q & A From Hitachi Data Systems WebTech Presentation:

Outline. CS 245: Database System Principles. Notes 02: Hardware. Hardware DBMS Data Storage

File Systems Management and Examples

Solid State Drive (SSD) FAQ

The Classical Architecture. Storage 1 / 36

Data Distribution Algorithms for Reliable. Reliable Parallel Storage on Flash Memories

Secondary Storage. Any modern computer system will incorporate (at least) two levels of storage: magnetic disk/optical devices/tape systems

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Storage in Database Systems. CMPSCI 445 Fall 2010

Storing Data: Disks and Files

Chapter 11 I/O Management and Disk Scheduling

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

William Stallings Computer Organization and Architecture 7 th Edition. Chapter 6 External Memory

Computer Architecture

TELE 301 Lecture 7: Linux/Unix file

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Robert Wagner

Two Parts. Filesystem Interface. Filesystem design. Interface the user sees. Implementing the interface

Data Storage - II: Efficient Usage & Errors

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

Indexing on Solid State Drives based on Flash Memory

Data Storage and Backup. Sanjay Goel School of Business University at Albany, SUNY

Solid State Drive Architecture

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

System Architecture. CS143: Disks and Files. Magnetic disk vs SSD. Structure of a Platter CPU. Disk Controller...

OS OBJECTIVE QUESTIONS

UBIFS file system. Adrian Hunter (Адриан Хантер) Artem Bityutskiy (Битюцкий Артём)

Outline. Failure Types

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1

I/O. Input/Output. Types of devices. Interface. Computer hardware

A Deduplication File System & Course Review

Taking Linux File and Storage Systems into the Future. Ric Wheeler Director Kernel File and Storage Team Red Hat, Incorporated

UNIX File Management (continued)

Linux flash file systems JFFS2 vs UBIFS

Chapter 9: Peripheral Devices: Magnetic Disks

1 Storage Devices Summary

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

How To Write On A Flash Memory Flash Memory (Mlc) On A Solid State Drive (Samsung)

How To Improve Performance On A Single Chip Computer

Project Group High- performance Flexible File System 2010 / 2011

File Systems for Flash Memories. Marcela Zuluaga Sebastian Isaza Dante Rodriguez

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Speeding Up Cloud/Server Applications Using Flash Memory

NAND Flash Architecture and Specification Trends

Chapter 13. Disk Storage, Basic File Structures, and Hashing

William Stallings Computer Organization and Architecture 8 th Edition. External Memory

Topics in Computer System Performance and Reliability: Storage Systems!

COS 318: Operating Systems. Virtual Memory and Address Translation

CSCA0201 FUNDAMENTALS OF COMPUTING. Chapter 5 Storage Devices

Operating Systems, 6 th ed. Test Bank Chapter 7

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Virtuoso and Database Scalability

With respect to the way of data access we can classify memories as:

1 / 25. CS 137: File Systems. Persistent Solid-State Storage

Timing of a Disk I/O Transfer

Outline: Operating Systems

Record Storage and Primary File Organization

Lecture 9: Memory and Storage Technologies

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7

Flash Memory. Jian-Jia Chen (Slides are based on Yuan-Hao Chang) TU Dortmund Informatik 12 Germany 2015 年 01 月 27 日. technische universität dortmund

CSCA0102 IT & Business Applications. Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global

Distribution One Server Requirements

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

HARD DRIVE CHARACTERISTICS REFRESHER

Database 2 Lecture I. Alessandro Artale

Survey of Filesystems for Embedded Linux. Presented by Gene Sally CELF

Platter. Track. Index Mark. Disk Storage. PHY 406F - Microprocessor Interfacing Techniques

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Communicating with devices

Transcription:

CS 422/522 Design & Implementation of Operating Systems Lecture 16: Storage Devices Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions of the CS422/522 lectures taught by Prof. Bryan Ford and Dr. David Wolinsky, and also from the official set of slides accompanying the OSPP textbook by Anderson and Dahlin. The big picture Lectures before the fall break: Management of CPU & concurrency Management of main memory & virtual memory Current topics --- Management of I/O devices Last lecture: I/O devices & device drivers This lecture: storage devices Next week: file systems * File system structure * Naming and directories * Efficiency and performance * Reliability and protection 1

Main points File systems Useful abstractions on top of physical devices Storage hardware characteristics Disks and flash memory File system usage patterns File systems Abstraction on top of persistent storage Magnetic disk Flash memory (e.g., USB thumb drive) Devices provide Storage that (usually) survives across machine crashes Block level (random) access Large capacity at low cost Relatively slow performance * Magnetic disk read takes 10-20M processor instructions 2

File system as illusionist: hide limitations of physical storage Persistence of data stored in file system: Even if crash happens during an update Even if disk block becomes corrupted Even if flash memory wears out Naming: Named data instead of disk block numbers Directories instead of flat storage Byte addressable data even though devices are block-oriented Performance: Cached data Data placement and data structure organization Controlled access to shared data File system abstraction File system Persistent, named data Hierarchical organization (directories, subdirectories) Access control on data File: named collection of data Linear sequence of bytes (or a set of sequences) Read/write or memory mapped Crash and storage error tolerance Operating system crashes (and disk errors) leave file system in a valid state Performance Achieve close to the hardware limit in the average case 3

Storage devices Magnetic disks Storage that rarely becomes corrupted Large capacity at low cost Block level random access Slow performance for random access Better performance for streaming access Flash memory Storage that rarely becomes corrupted Capacity at intermediate cost (50x disk) Block level random access Good performance for reads; worse for random writes A typical disk controller External connection IDE / ATA, SATA SCSI, SCSI-2, Ultra SCSI, Ultra-160 SCSI, Ultra-320 SCSI Fibre channel (FC) Cache Buffer data between disk and the I/O bus Controller Details of read/write Cache replacement algorithm Failure detection and recovery External connection Interface DRAM cache Controller Disk Disk controller firmware 4

Caching inside a disk controller Method Disk controller has DRAM to cache recently accessed blocks * Hitachi disk has 16MB * Some of the RAM space stores firmware (an embedded OS) Blocks are replaced usually in an LRU order Pros Good for reads if accesses have locality Cons Expensive Need to deal with reliable writes Magnetic disk 5

Disk organization Disk surface Circular disk coated with magnetic material Tracks Concentric rings around disk surface, bits laid out serially along each track Sectors Each track is split into arc of track (min unit of transfer) sectors Disk tracks ~ 1 micron wide Wavelength of light is ~ 0.5 micron Resolution of human eye: 50 microns 100K tracks on a typical 2.5 disk Separated by unused guard regions Reduces likelihood neighboring tracks are corrupted during writes (still a small non-zero chance) Track length varies across disk Outside: More sectors per track, higher bandwidth Disk is organized into regions of tracks with same # of sectors/track Only outer half of radius is used * Most of the disk area in the outer regions of the disk 6

Sectors Sectors contain sophisticated error correcting codes Disk head magnet has a field wider than track Hide corruptions due to neighboring track writes Sector sparing Remap bad sectors transparently to spare sectors on the same surface Slip sparing Remap all sectors (when there is a bad sector) to preserve sequential behavior Track skewing Sector numbers offset from one track to the next, to allow for disk head movement for sequential ops Moving-head disk mechanism 7

Disk cylinder and arm CD s and floppies come individually, but magnetic disks come organized in a disk pack Cylinder Certain track of the platter Disk arm A disk arm carries disk heads Read/write operation Disk controller receives a command with <track#, sector#> Seek the right cylinder (tracks) Wait until the right sector comes Perform read/write seek a cylinder Disk performance Disk Latency = Seek Time + Rotation Time + Transfer Time Seek Time: time to move disk arm over track (1-20ms) Fine-grained position adjustment necessary for head to settle Head switch time ~ track switch time (on modern disks) Rotation Time: time to wait for disk to rotate under disk head Disk rotation: 4 15ms (depending on price of disk) On average, only need to wait half a rotation Transfer Time: time to transfer data onto/off of disk Disk head transfer rate: 50-100MB/s (5-10 usec/sector) Host transfer rate dependent on I/O connector (USB, SATA, ) 8

Toshiba disk (2008) Question How long to complete 500 random disk reads, in FIFO order? 9

Question How long to complete 500 random disk reads, in FIFO order? Seek: average 10.5 msec Rotation: average 4.15 msec Transfer: 5-10 usec 500 * (10.5 + 4.15 + 0.01)/1000 = 7.3 seconds Question How long to complete 500 sequential disk reads? 10

Question How long to complete 500 sequential disk reads? Seek Time: 10.5 ms (to reach first sector) Rotation Time: 4.15 ms (to reach first sector) Transfer Time: (outer track) 500 sectors * 512 bytes / 128MB/sec = 2ms Total: 10.5 + 4.15 + 2 = 16.7 ms Might need an extra head or track switch (+1ms) Track buffer may allow some sectors to be read off disk out of order (-2ms) Question How large a transfer is needed to achieve 80% of the max disk transfer rate? 11

Question How large a transfer is needed to achieve 80% of the max disk transfer rate? Assume x rotations are needed, then solve for x: 0.8 (10.5 ms + (1ms + 8.5ms) x) = 8.5ms x Total: x = 9.1 rotations, 9.8MB Disk scheduling FIFO Schedule disk operations in order they arrive Downsides? 12

FIFO (FCFS) order Method First come first serve Pros Fairness among requests In the order applications expect Cons Arrival may be on random spots on the disk (long seeks) Wild swing can happen 0 53 199 98, 183, 37, 122, 14, 124, 65, 67 SSTF (Shortest Seek Time First) Method Pick the one closest on disk Rotational delay is in calculation Pros Try to minimize seek time Cons Starvation Question Is SSTF optimal? Can we avoid the starvation? 0 53 199 98, 183, 37, 122, 14, 124, 65, 67 (65, 67, 37, 14, 98, 122, 124, 183) 13

Disk scheduling SCAN: move disk arm in one direction, until all requests satisfied, then reverse direction Also called elevator scheduling 2 Disk Arm 4 3 1 5 6 7 Elevator (SCAN) Method Take the closest request in the direction of travel Real implementations do not go to the end (called LOOK) Pros Bounded time for each request Cons Request at the other end will take a while 0 53 199 98, 183, 37, 122, 14, 124, 65, 67 (37, 14, 65, 67, 98, 122, 124, 183) 14

Disk scheduling CSCAN: move disk arm in one direction, 2 until all requests satisfied, then start 4 3 again from farthest request 1 Disk Arm 7 6 5 C-SCAN (Circular SCAN) Method Like SCAN But, wrap around Real implementation doesn t go to the end (C-LOOK) Pros Uniform service time Cons Do nothing on the return 0 53 199 98, 183, 37, 122, 14, 124, 65, 67 (65, 67, 98, 122, 124, 183, 14, 37) 15

Disk scheduling R-CSCAN: CSCAN but take into account that short track switch is < rotational delay 3 Disk Arm 4 2 1 7 6 5 Question How long to complete 500 random disk reads, in any order? 16

Question How long to complete 500 random disk reads, in any order? Disk seek: 1ms (most will be short) Rotation: 4.15ms Transfer: 5-10usec Total: 500 * (1 + 4.15 + 0.01) = 2.2 seconds Would be a bit shorter with R-CSCAN vs. 7.3 seconds if FIFO order Question How long to read all of the bytes off of a disk? 17

Question How long to read all of the bytes off of a disk? Disk capacity: 320GB Disk bandwidth: 54-128MB/s Transfer time = Disk capacity / average disk bandwidth ~ 3500 seconds (1 hour) Flash memory Source Control Drain Control Gate Floating Gate Source Drain 18

Flash memory Writes must be to clean cells; no update in place Large block erasure required before write Erasure block: 128 512 KB Erasure time: Several milliseconds Write/read page (2-4KB) 50-100 usec Flash drive (2011) 19

Question Why are random writes so slow? Random write: 2000/sec Random read: 38500/sec Flash translation layer Flash device firmware maps logical page # to a physical location Garbage collect erasure block by copying live pages to new location, then erase * More efficient if blocks stored at same time are deleted at same time (e.g., keep blocks of a file together) Wear-levelling: only write each physical page a limited number of times Remap pages that no longer work (sector sparing) Transparent to the device user 20

File system flash How does Flash device know which blocks are live? Live blocks must be remapped to a new location during erasure TRIM command File system tells device when blocks are no longer in use File system workload File sizes Are most files small or large? Which accounts for more total storage: small or large files? 21

File system workload File sizes Are most files small or large? * SMALL Which accounts for more total storage: small or large files? * LARGE File system workload File access Are most accesses to small or large files? Which accounts for more total I/O bytes: small or large files? 22

File system workload File access Are most accesses to small or large files? * SMALL Which accounts for more total I/O bytes: small or large files? * LARGE File system workload How are files used? Most files are read/written sequentially Some files are read/written randomly * Ex: database files, swap files Some files have a pre-defined size at creation Some files start small and grow over time * Ex: program stdout, system logs 23

File system design For small files: Small blocks for storage efficiency Concurrent ops more efficient than sequential Files used together should be stored together For large files: Storage efficient (large blocks) Contiguous allocation for sequential access Efficient lookup for random access May not know at file creation Whether file will become small or large Whether file is persistent or temporary Whether file will be used sequentially or randomly File system abstraction Directory Group of named files or subdirectories Mapping from file name to file metadata location Path String that uniquely identifies file or directory Ex: /cse/www/education/courses/cse451/12au Links Hard link: link from name to metadata location Soft link: link from name to alternate name Mount Mapping from name in one file system to root of another 24

UNIX file system API create, link, unlink, createdir, rmdir Create file, link to file, remove link Create directory, remove directory open, close, read, write, seek Open/close a file for reading/writing Seek resets current position fsync File modifications can be cached fsync forces modifications to disk (like a memory barrier) File system interface UNIX file open is a Swiss Army knife: Open the file, return file descriptor Options: * if file doesn t exist, return an error * If file doesn t exist, create file and open it * If file does exist, return an error * If file does exist, open file * If file exists but isn t empty, nix it then open * If file exists but isn t empty, return an error * 25

Interface design question Why not separate syscalls for open/create/exists? Would be more modular! if (!exists(name)) create(name); // can create fail? fd = open(name); // does the file exist? 26