Sistemas Operativos: Input/Output Disks




Pedro F. Souto (pfs@fe.up.pt)
April 28, 2012

Topics
  Magnetic Disks
  RAID
  Solid State Disks


Magnetic Disk Construction
[Figure: disk with surfaces 0 to 7, one read/write head per surface, and the direction of arm motion]
Track: A concentric ring on a platter surface
Sector: An arc of a track with a fixed number of bytes that is individually addressable
Cylinder: The set of tracks of all platters under the heads at a given position

Disks: Physical vs. Logical Geometry
[Figure: two disk layouts with numbered sectors; the left one has two zones]
Modern disks have several zones (2 in the left figure), each with a fixed number of sectors per track
Nowadays, disks use Logical Block Addressing (LBA), a scheme where sectors are numbered starting at 0
  It is up to the disk controller to map the LBA sector number to the physical sector on the disk
Earlier, some disks would advertise a (logical) geometry (CHS) that might be different from the physical geometry
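The relation between an advertised CHS geometry and LBA numbering can be sketched with the conventional conversion formula. This is an illustrative sketch only: the function names and the 16-head, 63-sector geometry in the usage note are assumptions, not values from the slides, and the mapping from LBA to the real (zoned) physical layout stays inside the disk controller.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a logical CHS address to an LBA sector number.

    Conventional CHS numbering is assumed: cylinders and heads start at 0,
    sectors start at 1.
    """
    if cylinder < 0 or not 0 <= head < heads_per_cylinder:
        raise ValueError("cylinder or head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: LBA sector number back to a logical (C, H, S) triple."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1
```

For example, with an assumed logical geometry of 16 heads and 63 sectors per track, `chs_to_lba(0, 0, 1, 16, 63)` is sector 0 and `lba_to_chs` undoes the conversion.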

Modern Disk Specs: Seagate
Model: Cheetah 15K.7 | Barracuda
Class: Enterprise | Business
Capacity
  Formatted capacity (GB): 600 | 3000
  Discs: 4 | 3
  Heads: 8 | 6
  Sector size (B): 512 | 4096
Performance
  External interface: 6 Gbit/s Serial Attached SCSI | 6 Gbit/s SATA
  Rotational speed (rpm): 15,000 | 7,200
  Average latency (ms): 2.0 | 4.17
  Seek time, rd/wr (ms): 3.4/3.9 | 8.5/9.5
  Sustained transfer rate (MB/s): 122 to 204 | < 210
  Cache size (MB): 16 | 64
Reliability
  Non-recoverable read errors: 1 sector per 1E16 | 1 sector per 1E14
  MTBF (hours): 1,600,000 | -
  Annualized Failure Rate (AFR): 0.55% | 1%
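The MTBF and AFR figures of the Cheetah are two views of the same quantity. A minimal sketch of the usual conversion follows; it assumes that the MTBF is given in hours and that failures follow a constant-rate (exponential) model, neither of which is stated on the slide, and the function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days of continuous operation (assumption)


def afr_from_mtbf(mtbf_hours):
    """Probability that a drive fails within one year of continuous use,
    under an assumed constant failure rate of 1 / MTBF."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


# An MTBF of 1,600,000 hours gives an AFR of roughly half a percent.
cheetah_afr = afr_from_mtbf(1_600_000)
```

Under these assumptions `cheetah_afr` rounds to 0.55%, the value quoted in the spec sheet.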

Disk Performance Times
Seek time: Time required to position the head over the track with the sector to access
  Read seeks are shorter than write seeks (see above). Why?
  Typically between 3 and 10 ms
Rotational latency: Time required for the desired sector to rotate under the head
  On average, half of the rotation time, which depends on the rotational speed (2 ms / 4.17 ms / 5.56 ms for 15,000 / 7,200 / 5,400 rpm)
Transfer time: Time required to transfer the data, always a multiple of a sector
  Sustained transfer bandwidth ranges from 40 to 200 MB/s. For 40 MB/s:
    Block Size (B) | Transfer Time (ms)
    512 | 0.013
    4096 | 0.103
    1.? M | 25.013
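The three components add up to the time of one disk access. The sketch below is a rough model, assuming decimal megabytes (1 MB = 10^6 bytes) and ignoring controller overhead and caching; the function names are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one full rotation, in ms."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(size_bytes, bandwidth_mb_per_s):
    """Time to transfer size_bytes at a sustained bandwidth given in MB/s."""
    return size_bytes / (bandwidth_mb_per_s * 1e6) * 1e3


def access_time_ms(seek_ms, rpm, size_bytes, bandwidth_mb_per_s):
    """Seek time + average rotational latency + transfer time, in ms."""
    return (seek_ms
            + rotational_latency_ms(rpm)
            + transfer_time_ms(size_bytes, bandwidth_mb_per_s))
```

With these definitions, `rotational_latency_ms(7200)` is about 4.17 ms, and a 4096-byte read on a disk with a 3.4 ms seek at 15,000 rpm and 40 MB/s takes about 5.5 ms, almost all of it mechanical positioning rather than data transfer.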

Disk Scheduling Algorithms and FCFS
Observation: Seek time is one of the main factors in disk performance
Idea: Order the servicing of disk access requests so as to minimize seek time
FCFS
  Idea: Process requests in the order they are submitted
  Pros: Simple and fair
  Cons: Unnecessarily long seeks, with wild arm swings
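A small simulation makes the cost of FCFS concrete. This is a sketch under simplifying assumptions: seek cost is measured as the number of cylinders crossed, all requests are already queued, and the request list in the usage note is a hypothetical example, not data from the slides.

```python
def fcfs(start, requests):
    """First-come, first-served: serve requests in arrival order.

    Returns the service order and the total number of cylinders crossed.
    """
    position, total = start, 0
    for cylinder in requests:
        total += abs(cylinder - position)  # arm movement for this request
        position = cylinder
    return list(requests), total
```

For a head at cylinder 11 and the hypothetical queue `[1, 36, 16, 34, 9, 12]`, `fcfs` crosses 111 cylinders, swinging the arm back and forth across the disk.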

Shortest Seek Time First: SSTF
[Figure: sequence of seeks over time; x-axis: cylinder (0 to 35), showing the initial position and the pending requests]
Idea: Process the request that requires the shortest seek time
Pros: Tries to minimize seek time...
Cons: ... but it is not optimal
  May lead to starvation
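SSTF is a greedy choice repeated until the queue is empty. The sketch below assumes a static queue and measures seeks in cylinders crossed; the request list in the usage note is a hypothetical example, not the one in the slide's figure.

```python
def sstf(start, requests):
    """Shortest seek time first: always serve the pending request that is
    closest to the current head position.

    Returns the service order and the total number of cylinders crossed.
    """
    pending = list(requests)
    position, total, order = start, 0, []
    while pending:
        nearest = min(pending, key=lambda cylinder: abs(cylinder - position))
        pending.remove(nearest)
        total += abs(nearest - position)
        position = nearest
        order.append(nearest)
    return order, total
```

For a head at cylinder 11 and the queue `[1, 36, 16, 34, 9, 12]`, the order is 12, 9, 16, 1, 34, 36 with 61 cylinders crossed. If new requests near the head kept arriving, the requests for cylinders 34 and 36 could wait indefinitely, which is the starvation problem listed above.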

Elevator (SCAN)
[Figure: sequence of seeks over time; x-axis: cylinder (0 to 35), showing the initial position and the pending requests]
Idea: Use an algorithm similar to that used in elevators
  There is no need to go all the way to the end (LOOK)
Pros: No starvation
Cons: Requests on the wrong end may take too much time
  Rotational latency is of the same order as seek time
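The LOOK variant of the elevator algorithm can be sketched as two sorted sweeps. The code assumes a static queue and cylinder-count seek costs; the `upward` flag for the initial direction and the request list in the usage note are illustrative assumptions.

```python
def look(start, requests, upward=True):
    """Elevator algorithm, LOOK variant: sweep in one direction up to the
    last pending request, then reverse and serve the remaining ones.

    Returns the service order and the total number of cylinders crossed.
    """
    if upward:
        first = sorted(c for c in requests if c >= start)
        second = sorted((c for c in requests if c < start), reverse=True)
    else:
        first = sorted((c for c in requests if c <= start), reverse=True)
        second = sorted(c for c in requests if c > start)
    order = first + second
    position, total = start, 0
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return order, total
```

For a head at cylinder 11 moving upward and the queue `[1, 36, 16, 34, 9, 12]`, the order is 12, 16, 34, 36, 9, 1 with 60 cylinders crossed. The requests at 9 and 1 are close to the start but are served last, which illustrates the "wrong end" drawback.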

Circular SCAN (C-SCAN)
[Figure: sequence of seeks over time; x-axis: cylinder (0 to 35), showing the initial position and the pending requests]
Idea: Like SCAN, but service requests in only one direction
  There is no need to go all the way to the end (C-LOOK)
Pros: No starvation
  Equal treatment independent of the track
Cons: Does nothing on the return arm movement
Disk space management may also be important
  But with LBA the driver does not really know enough to make the best decisions
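The C-LOOK variant differs from the elevator only in how the second batch of requests is ordered. The sketch assumes service in the upward direction, a static queue, and that the return movement is counted as a seek; the request list in the usage note is a hypothetical example.

```python
def c_look(start, requests):
    """Circular LOOK: serve requests in increasing cylinder order, then jump
    back to the lowest pending request and continue upward.

    Returns the service order and the total number of cylinders crossed,
    including the idle return movement.
    """
    ahead = sorted(c for c in requests if c >= start)
    behind = sorted(c for c in requests if c < start)
    order = ahead + behind
    position, total = start, 0
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return order, total
```

For a head at cylinder 11 and the queue `[1, 36, 16, 34, 9, 12]`, the order is 12, 16, 34, 36, 1, 9 with 68 cylinders crossed, 35 of which are the return movement from 36 to 1 during which no request is served.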


Redundant Array of Independent Disks
The "I" stood for Inexpensive in the original proposal, which dates back to the late 1980s
Idea: Store the data in a disk array so as to improve
  Performance: by executing several disk operations in parallel on different disks
  Reliability: by storing redundant information, so that if one disk fails, its content can be recovered
Transparency: The RAID controller interfaces to the OS just like a single-disk controller
  Most RAID controllers are SCSI controllers, which allow the attachment of up to 7/15 devices
Cons: Some
  Controller complexity: OK!
  Cost: Technological breakthroughs made larger disks much more cost effective than smaller disks

RAID Levels 0 & 1
[Figure: RAID levels 0 through 5. (a) RAID level 0: strips 0 to 11 placed round-robin on four drives; (b) RAID level 1: the same four drives plus four mirror drives; (c) to (f): RAID levels 2 to 5]
Strip: A set of k consecutive sectors, for some fixed k
  Access to strips that are on different disks can be done in parallel
RAID 0
  Higher performance for large I/O requests, or smaller concurrent I/O requests as long as...
  Reliability is worse than for a single disk, because...
RAID 1: RAID 0 with mirroring
  Read load can be distributed over all disks with the desired data
  Highly reliable, and recovery from a disk failure is straightforward
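The round-robin placement of strips in the RAID level 0 figure can be written as a pair of integer divisions. This is a sketch with hypothetical function names; `k` is the number of sectors per strip, as in the definition above.

```python
def locate_strip(strip, n_disks):
    """Map a strip number to (disk index, strip row on that disk)
    for round-robin striping over n_disks drives."""
    row, disk = divmod(strip, n_disks)
    return disk, row


def locate_sector(lba, n_disks, k):
    """Map a logical sector number to (disk index, sector on that disk),
    with k consecutive sectors per strip."""
    strip, offset = divmod(lba, k)
    disk, row = locate_strip(strip, n_disks)
    return disk, row * k + offset
```

With four drives, `locate_strip(5, 4)` is `(1, 1)`: strip 5 sits on the second drive, in the second row, as in the figure. Consecutive strips land on different drives, which is why a large request can be served by several drives in parallel.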

RAID Levels 2 & 3
[Figure: (c) RAID level 2: bits 1 to 7 on seven drives; (d) RAID level 3: bits 1 to 4 plus a parity bit on five drives]
RAID 2: More space efficient than RAID 1
  Splits the stream in chunks of fixed size (nibbles in the fig.)
  Each chunk is stored using a Hamming code ((7,4) in the fig.), 1 bit per drive
  Can recover from the failure of one disk
RAID 3: Uses just a parity bit rather than a 7-bit Hamming code
  Lower cost at the expense of lower reliability
  Still able to recover damaged disk content, as long as one can identify it
Both 2 and 3
  Higher throughput than RAID 0 or 1
  But do not support concurrent I/O operations
  Require synchronized disks (hard)
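Why a single parity bit is enough once the failed drive has been identified can be shown with XOR. The sketch below uses even parity over one nibble, one bit per drive as in the RAID level 3 figure; the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """Even parity: XOR of all the bits."""
    return reduce(xor, bits, 0)


def recover_bit(bits, failed_index, parity_bit):
    """Rebuild the bit of the drive at failed_index from the surviving
    data bits and the parity bit. The value stored at failed_index in
    bits is ignored, since that drive is assumed to be unreadable."""
    surviving = [b for i, b in enumerate(bits) if i != failed_index]
    return parity(surviving) ^ parity_bit
```

For the nibble `[1, 0, 1, 1]` the parity bit is 1, and `recover_bit` returns the original bit for whichever drive is marked as failed. Without knowing which drive failed, the parity bit can only signal that something is wrong, not where.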

RAID Levels (3/3)
[Figure: (e) RAID level 4: strips 0 to 11 on four drives, parity strips P0-3, P4-7, P8-11 on a fifth drive; (f) RAID level 5: strips 0 to 19 with parity strips P0-3 to P16-19 rotated over the five drives]
RAID 4: RAID 0 with one additional drive to store parity strips
  The additional drive allows rebuilding one crashed drive
  The parity drive may be a bottleneck
    A write to a strip requires reading from and writing to at least two drives
RAID 5: RAID 4 with the parity strips distributed over all the drives
RAID 6: General term used to refer to a RAID scheme that is able to tolerate two simultaneous disk failures
  The method used to achieve that is not prescribed
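The reason a small write touches at least two drives, and the way RAID 5 spreads that load, can be sketched as follows. The parity update is the standard XOR identity; the rotation rule in `raid5_parity_disk` is read off the RAID level 5 figure (parity on the last drive for the first stripe, moving one drive to the left per stripe) and is only one of several layouts used in practice. Function names are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equally long strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data, old_parity, new_data):
    """New parity strip after overwriting a single data strip.

    Only the old data strip and the old parity strip have to be read;
    the other data strips of the stripe are not touched.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def raid5_parity_disk(stripe, n_disks):
    """Index of the drive holding the parity strip of a given stripe,
    for the rotation shown in the figure (assumed layout)."""
    return (n_disks - 1 - stripe) % n_disks
```

A small write therefore costs two reads (old data, old parity) and two writes (new data, new parity). In RAID 4 every such write hits the same parity drive, whereas with five drives `raid5_parity_disk` yields drives 4, 3, 2, 1, 0 for stripes 0 to 4.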


Solid State Disks: FLASH memory
Non-volatile RAM
Relies on Moore's law for increasing chip density
In 2012, SSDs reached the magic cost of 1 USD/GB
  This was the cost of HDDs about 10 years earlier
  Will the cost of SSDs follow the same evolution as that of HDDs?

SSD vs. Magnetic Disks
+ No moving parts
    More reliable mechanically
    More shock-resistant
+ Faster access than disk
- 20 times more expensive than disk (see chart)

SSD Organization
Pages
  Access (read/write) unit
  Typical size: 512-4096 bytes
Blocks
  Set of pages
  Erasing unit
    Rewriting a page requires erasing its block
    Can write only 0's
    Require an erase (all 1's) before writing
  Typical size: 16-256 KB
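The page/block asymmetry can be illustrated with a toy model. This is a sketch, not how any real flash translation layer is implemented: the class name, the tiny page and block sizes, and the naive `rewrite` strategy (save, erase, reprogram in place) are all assumptions chosen for illustration.

```python
class FlashBlock:
    """Toy model of one flash block: pages are programmed individually,
    but only the whole block can be erased."""

    def __init__(self, pages_per_block=4, page_size=4):
        # An erased block contains only 1 bits.
        self.pages = [bytearray(b"\xff" * page_size) for _ in range(pages_per_block)]
        self.erase_count = 0

    def erase(self):
        """Reset every page of the block to all 1's."""
        for page in self.pages:
            page[:] = b"\xff" * len(page)
        self.erase_count += 1

    def program(self, page_no, data):
        """Write a page. Programming can only turn 1 bits into 0 bits."""
        page = self.pages[page_no]
        if len(data) != len(page):
            raise ValueError("data must be exactly one page long")
        if any((old & new) != new for old, new in zip(page, data)):
            raise ValueError("page needs a block erase before this write")
        page[:] = data

    def rewrite(self, page_no, data):
        """Naive rewrite: save all pages, erase the block, program again."""
        if len(data) != len(self.pages[page_no]):
            raise ValueError("data must be exactly one page long")
        saved = [bytes(page) for page in self.pages]
        saved[page_no] = bytes(data)
        self.erase()
        for index, content in enumerate(saved):
            self.program(index, content)
```

Programming `b"\x0f\x0f\x0f\x0f"` into an erased page succeeds, but overwriting it with `b"\xf0\xf0\xf0\xf0"` raises an error because some bits would have to go from 0 back to 1; `rewrite` handles this at the cost of one block erase and of reprogramming every page in the block.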

Modern SSD Specs: Intel
Model: 710 Series | 520 Series | 320 Series
Capacity
  Launch quarter: Q3'11 | Q1'12 | Q1'11
  Max formatted capacity (GB): 300 | 480 | 300
Performance
  External interface (SATA): 3 Gbit/s | 6 Gbit/s | 3 Gbit/s
  Latency time, rd/wr (µs): 75/85 | 80/85 | 75/90
  Sequential rd/wr (MB/s): 270/210 | 550/520 | 270/205
  Random access (IOPS)
    8GB span: 50,000/42,000 | 39,500/39,500
    100% span: 38,500/2,000 | 23,000/400
Reliability
  Non-recoverable read errors: 1 sect./1e16 | 1 sect./1e16 | 1 sect./1e16
  MTBF: 2,000,000 | 1,200,000 | 1,200,000

SSD Technical Challenges and Solutions
Limited lifetime
  Number of writes is limited to a few tens of thousands
  By spreading the writes evenly, these problems can be mitigated, but the number of blocks is much smaller than the number of pages
Rewriting performance limitations
  Must erase the block
Solution: The SSD controller can mitigate some of these problems
  Thus, this is mostly transparent to the OS
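Spreading the writes evenly (wear leveling) can be sketched as always picking the least-worn block. This is a deliberately simplified illustration with hypothetical names: real SSD controllers also remap logical to physical pages, move rarely changed data, and garbage-collect stale pages, none of which is modeled here.

```python
class WearLeveler:
    """Toy wear-leveling policy: each new write goes to the block that has
    been erased the fewest times so far."""

    def __init__(self, n_blocks):
        self.erase_counts = [0] * n_blocks

    def pick_block(self):
        """Index of the least-erased block (lowest index wins ties)."""
        return min(range(len(self.erase_counts)),
                   key=self.erase_counts.__getitem__)

    def erase_and_write(self):
        """Account for one erase/write cycle and return the block used."""
        block = self.pick_block()
        self.erase_counts[block] += 1
        return block
```

After ten write cycles over four blocks, the erase counts differ by at most one, so no single block reaches its write limit long before the others.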