Disk Storage & Dependability

Similar documents
Disk Storage & Dependability

Price/performance Modern Memory Hierarchy

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1

CS 6290 I/O and Storage. Milos Prvulovic

1 Storage Devices Summary

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

Big Picture. IC220 Set #11: Storage and I/O I/O. Outline. Important but neglected

Case for storage. Outline. Magnetic disks. CS2410: Computer Architecture. Storage systems. Sangyeun Cho

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

How To Improve Performance On A Single Chip Computer

Chapter 9: Peripheral Devices: Magnetic Disks

Chapter 10: Mass-Storage Systems

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

Sistemas Operativos: Input/Output Disks

CS 61C: Great Ideas in Computer Architecture. Dependability: Parity, RAID, ECC

Data Storage - II: Efficient Usage & Errors

William Stallings Computer Organization and Architecture 7 th Edition. Chapter 6 External Memory

Disks and RAID. Profs. Bracy and Van Renesse. based on slides by Prof. Sirer

Reliability and Fault Tolerance in Storage

Input/output (I/O) I/O devices. Performance aspects. CS/COE1541: Intro. to Computer Architecture. Input/output subsystem.

Solid State Drive Architecture

Chapter 6 External Memory. Dr. Mohamed H. Al-Meer

Introduction to I/O and Disk Management

High Performance Computing. Course Notes High Performance Storage

Chapter 2: Computer-System Structures. Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture

Outline. CS 245: Database System Principles. Notes 02: Hardware. Hardware DBMS Data Storage

Computer Systems Structure Main Memory Organization

Chapter 12: Mass-Storage Systems

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

HARD DRIVE CHARACTERISTICS REFRESHER

Hard Disk Drives and RAID

COMP 7970 Storage Systems

Storing Data: Disks and Files

Lecture 36: Chapter 6

William Stallings Computer Organization and Architecture 8 th Edition. External Memory

System Architecture. CS143: Disks and Files. Magnetic disk vs SSD. Structure of a Platter CPU. Disk Controller...

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Lecture 16: Storage Devices

RAID Performance Analysis

Striped Set, Advantages and Disadvantages of Using RAID

Fault Tolerance & Reliability CDA Chapter 3 RAID & Sample Commercial FT Systems

Dependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Robert Wagner

ES-1 Elettronica dei Sistemi 1 Computer Architecture

Operating Systems. RAID Redundant Array of Independent Disks. Submitted by Ankur Niyogi 2003EE20367

Storage. The text highlighted in green in these slides contain external hyperlinks. 1 / 14

DELL RAID PRIMER DELL PERC RAID CONTROLLERS. Joe H. Trickey III. Dell Storage RAID Product Marketing. John Seward. Dell Storage RAID Engineering

CS420: Operating Systems

Operating System Concepts. Operating System 資 訊 工 程 學 系 袁 賢 銘 老 師

RAID 5 rebuild performance in ProLiant

CS161: Operating Systems

Chapter 2 Data Storage

technology brief RAID Levels March 1997 Introduction Characteristics of RAID Levels

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if you have questions! Lab #2 Today. Quiz #1 Tomorrow (Lectures 1-7)

HP Smart Array Controllers and basic RAID performance factors

Chapter 12: Secondary-Storage Structure

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

Homework # 2. Solutions. 4.1 What are the differences among sequential access, direct access, and random access?

RAID: Redundant Arrays of Independent Disks

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

DELL SOLID STATE DISK (SSD) DRIVES

An Introduction to RAID. Giovanni Stracquadanio

Communicating with devices

Definition of RAID Levels

Chapter 2 - Computer Organization

CSCA0102 IT & Business Applications. Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global

Understanding Flash SSD Performance

Database Management Systems

Version : 1.0. SR3620-2S-SB2 User Manual. SOHORAID Series

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

A+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7

How To Write A Disk Array

RAID HARDWARE. On board SATA RAID controller. RAID drive caddy (hot swappable) SATA RAID controller card. Anne Watson 1

CHAPTER 4 RAID. Section Goals. Upon completion of this section you should be able to:

CS 153 Design of Operating Systems Spring 2015

760 Veterans Circle, Warminster, PA Technical Proposal. Submitted by: ACT/Technico 760 Veterans Circle Warminster, PA

QuickSpecs. HP Smart Array 5312 Controller. Overview

HP Smart Array 5i Plus Controller and Battery Backed Write Cache (BBWC) Enabler

Data Storage and Backup. Sanjay Goel School of Business University at Albany, SUNY

Distribution One Server Requirements

PIONEER RESEARCH & DEVELOPMENT GROUP

Enterprise-class versus Desktop-class Hard Drives

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

Difference between Enterprise SATA HDDs and Desktop HDDs. Difference between Enterprise Class HDD & Desktop HDD

RAID Technology Overview

Devices and Device Controllers

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Read this before starting!

Summer Student Project Report

VTrak SATA RAID Storage System

Outline. Database Management and Tuning. Overview. Hardware Tuning. Johann Gamper. Unit 12

IBM ^ xseries ServeRAID Technology

Benefits of Intel Matrix Storage Technology

Intel Rapid Storage Technology

Transcription:

Disk Storage & Dependability Computer Organization Architectures for Embedded Computing Tuesday, 10 December 13 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK and from Prof. Mary Jane Irwin, PSU

Summary Previous Class IO System Today: Disk Storage Dependability Computer Organization / Architectures for Embedded Computing 2

Review: Major Components of a Computer Processor Devices Control Memory Output Datapath Input Secondary Memory (Disk) Main Memory Cache Computer Organization / Architectures for Embedded Computing 3

Disk Storage Nonvolatile, rotating magnetic storage Computer Organization / Architectures for Embedded Computing 4

Magnetic Disk Purpose Sector Long term, nonvolatile storage Lowest level in the memory hierarchy slow, large, inexpensive General structure Track A rotating platter coated with a magnetic surface A moveable read/write head to access the information on the disk Typical numbers 1 to 4 platters (each with 2 recordable surfaces) per disk of 2.5cm to 9.5cm in diameter Rotational speeds of 5,400 to 15,000 RPM 10,000 to 50,000 tracks per surface cylinder - all the tracks under the head at a given point on all surfaces 100 to 500 sectors per track the smallest unit that can be read/written (typically 512B) Computer Organization / Architectures for Embedded Computing 5

Magnetic Disk Characteristic Disk read/write components 1. Seek time: position the head over the proper track (3 to 13 ms avg) Controller + Cache Track Sector Cylinder 2. Rotational latency: wait for the desired sector to rotate under the head (½ of 1/RPM converted to ms) 0.5/5400RPM = 5.6ms to 0.5/15000RPM = 2.0ms Head Platter 3. Transfer time: transfer a block of bits (one or more sectors) under the head to the disk controller s cache (70 to 125 MB/s are typical disk transfer rates in 2008) the disk controller s cache takes advantage of spatial locality in disk accesses 4. Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically <.2 ms) Computer Organization / Architectures for Embedded Computing 6

Disk Performance Issues Manufacturers quote average seek time Based on all possible seeks Locality and OS scheduling lead to smaller actual average seek times Smart disk controller allocate physical sectors on disk Present logical sector interface to host SCSI, ATA, SATA Disk drives include caches Prefetch sectors in anticipation of access Avoid seek and rotational delay Computer Organization / Architectures for Embedded Computing 7

Latency & Bandwidth Improvements In the time that the disk bandwidth doubles the latency improves by a factor of only 1.2 to 1.4 Bandwidth (MB/s) Latency (msec) Year of Introduction Disk latency is one average seek time plus the rotational latency. Disk bandwidth is the peak transfer time of formatted data from the media (not from the cache). Computer Organization / Architectures for Embedded Computing 8

Magnetic Disk Examples (www.seagate.com) Feature Seagate ST31000340NS Seagate ST973451SS Seagate ST9160821AS Disk diameter (inches) 3.5 2.5 2.5 Capacity (GB) 1000 73 160 # of surfaces (heads) 4 2 2 Rotation speed (RPM) 7,200 15,000 5,400 Transfer rate (MB/sec) 105 79-112 44 Minimum seek (ms) 0.8r-1.0w 0.2r-0.4w 1.5r-2.0w Average seek (ms) 8.5r-9.5w 2.9r-3.3w 12.5r-13.0w MTTF (hours@25 o C) 1,200,000 1,600,000?? Dim (inches); Weight (lbs) 1x4x5.8; 1.4 0.6x2.8x3.9;0.5 0.4x2.8x3.9; 0.2 GB/cu.inch, GB/watt 43, 91 11, 9 37, 84 Power: op/idle/sb (watts) 11/8/1 8/5.8/- 1.9/0.6/0.2 Price in 2008, $/GB ~$0.3/GB ~$5/GB ~$0.6/GB Computer Organization / Architectures for Embedded Computing 9

Flash Storage Nonvolatile semiconductor storage 100x to 1000x faster than disk Smaller, lower power, more robust But more $/GB (between disk and DRAM) Feature Kingston Transcend RiDATA Capacity (GB) 8 16 32 Bytes/sector 512 512 512 Transfer rates (MB/sec) 4 20r-18w 68r-50w MTTF (hours) >1,000,000 >1,000,000 >4,000,000 Price (2008) ~ $30 ~ $70 ~ $300 Computer Organization / Architectures for Embedded Computing 10

Flash Types NOR flash: bit cell like a NOR gate Random read/write access Used for instruction memory in embedded systems NAND flash: bit cell like a NAND gate Denser (bits/area), but block-at-a-time access Cheaper per GB Used for USB keys, media storage, Flash bits wears out after 1000 s of accesses Not suitable for direct RAM or disk replacement Wear levelling: remap data to less used blocks Computer Organization / Architectures for Embedded Computing 11

Fallacy: Disk Dependability If a disk manufacturer quotes MTTF as 1,200,000hr (140yr) A disk will work that long Wrong: this is the mean time to failure What is the distribution of failures? What if you have 1000 disks How many will fail per year? Failed Disks = 1000 disks 8760 hrs / disk 1200000 hrs / failure = 7.3 Annual Failure Rate (AFR) = 8760 hrs / disk 1200000 hrs / failure = 0.73% Computer Organization / Architectures for Embedded Computing 12

Fallacies Disk failure rates are as specified Studies of failure rates in the field Schroeder and Gibson: 2% to 4% vs. 0.6% to 0.8% Pinheiro, et al.: 1.7% (first year) to 8.6% (third year) vs. 1.5% Why? A 1GB/s interconnect transfers 1GB in one sec But what s a GB? For bandwidth, use 1GB = 10 9 B For storage, use 1GB = 2 30 B = 1.075 10 9 B So 1GB/sec is 0.93GB in one second About 7% error Computer Organization / Architectures for Embedded Computing 13

Fallacy: Disk Scheduling Best to let the OS schedule disk accesses But modern drives deal with logical block addresses Map to physical track, cylinder, sector locations Also, blocks are cached by the drive OS is unaware of physical locations Reordering can reduce performance Depending on placement and caching Computer Organization / Architectures for Embedded Computing 14

Pitfall: Backing Up to Tape Magnetic tape used to have advantages Removable, high capacity Advantages eroded by disk technology developments Makes better sense to replicate data E.g, RAID, remote mirroring Computer Organization / Architectures for Embedded Computing 15

Dependability Service accomplishment Service delivered as specified Restoration Failure Fault: failure of a component May or may not lead to system failure Service interruption Deviation from specified service Computer Organization / Architectures for Embedded Computing 16

Dependability Measures Reliability: mean time to failure (MTTF) Service interruption: mean time to repair (MTTR) Mean time between failures MTBF = MTTF + MTTR Availability = MTTF / (MTTF + MTTR) Improving Availability Increase MTTF: fault avoidance, fault tolerance, fault forecasting Reduce MTTR: improved tools and processes for diagnosis and repair To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components 1. Fault avoidance: preventing fault occurrence by construction 2. Fault tolerance: using redundancy to correct or bypass faulty components (hardware) Fault detection versus fault correction Permanent faults versus transient faults Computer Organization / Architectures for Embedded Computing 17

RAID 0 &1 & 2 RAID 0: Parallelism No data replication or redundancy RAID 1: Mirroring N + N disks, replicate data Write data to both data disk and mirror disk On disk failure, read from mirror RAID 2: Error correcting code (ECC) N + E disks (e.g., 10 + 4) Split data at bit level across N disks Generate E-bit ECC Can tolerate limited disk failure, since the data can be reconstructed Too complex, not used in practice Computer Organization / Architectures for Embedded Computing 18

RAID 3: Bit-Interleaved Parity N + 1 disks Data striped across N disks at byte level Redundant disk stores parity Read access Read all disks Write access Generate new parity and update all disks On failure Use parity to reconstruct missing data Not widely used Computer Organization / Architectures for Embedded Computing 19

RAID 4: Block-Interleaved Parity N + 1 disks Data striped across N disks at block level Redundant disk stores parity for a group of blocks Read access Read only the disk holding the required block Write access Just read disk containing modified block, and parity disk Calculate new parity, update data disk and parity disk On failure Use parity to reconstruct missing data Not widely used Computer Organization / Architectures for Embedded Computing 20

RAID 3 vs RAID 4 3 reads and 2 writes involving all the disks 2 reads and 2 writes involving just two disks Computer Organization / Architectures for Embedded Computing 21

RAID 5: Distributed Parity N + 1 disks Like RAID 4, but parity blocks distributed across disks Avoids parity disk being a bottleneck Writes can be performed in parallel Widely used Computer Organization / Architectures for Embedded Computing 22

RAID 6: P + Q Redundancy N + 2 disks Like RAID 5, but two lots of parity Greater fault tolerance through more redundancy Multiple RAID or Nested RAID More advanced systems give similar fault tolerance with better performance RAID 01, RAID 10, Computer Organization / Architectures for Embedded Computing 23

RAID Summary RAID can improve performance and availability High availability requires hot swapping Assumes independent disk failures Too bad if the building burns down! Computer Organization / Architectures for Embedded Computing 24

Error Detection/Correction Codes Data Storage CDs and DVDs RAID ECC memory Paper bar codes UPS (MaxiCode) QR Code Communications Cellphones Satellites / Space Codes are all around us Computer Organization / Architectures for Embedded Computing 25

Encoding: Error Detection with Parity Bit m 1 m 2 m k m 1 m 2 m k p k+1 where p k+1 = m 1 m 2 m k Detects one-bit error since this gives odd parity Cannot be used to correct 1-bit error since any odd-parity word is equal distance Δ to k+1 valid codewords. Computer Organization / Architectures for Embedded Computing 26

Hamming Distance The Hamming distance between two words is the number of differences between corresponding bits. The Hamming distance d(000, 011) is 2: 000 011 = 011 (two 1s) The Hamming distance d(10101, 11110) is 3: 10101 11110 = 01011 (three 1s) Computer Organization / Architectures for Embedded Computing 27

Error Correction To guarantee correction of up to t errors in all cases, the minimum Hamming distance in a block code must be d min = 2t + 1 Computer Organization / Architectures for Embedded Computing 28

Hamming Codes Example: Uses multiple parity bits, each p applied to a different subset 0 of data bits Data bits Parity bits d 3 d 2 d 1 d 0 p 2 p 1 p 0 d 0 Encoding: 3 XOR networks to form parity bits d 2 d 1 Checking: 3 XOR p 1 networks p 2 to verify parities d 3 Decoding: Trivial (separable code) Redundancy: 3 check bits for 4 data bits Unimpressive, but gets better with more data bits (7, 4); (15, 11); (31, 26); (63, 57); (127, 120) Capability: Corrects any single-bit error s 2 = d 3 d 2 d 1 p 2 s 1 = d 3 d 1 d 0 p 1 s 0 = d 2 d 1 d 0 p 0 s 2 s 1 s 0 Syndrome s 2 s 1 s 0 Error 0 0 0 None 0 0 1 p 0 0 1 0 p 1 0 1 1 d 0 1 0 0 p 2 1 0 1 d 2 1 1 0 d 3 1 1 1 d 1 Computer Organization / Architectures for Embedded Computing 29

Reed-Muller Code Encoding contains more redundant information to increase the number of errors that can be corrected if needed Uses Hadamard matrices for encoding and decoding stronger error-correcting codes. Computer Organization / Architectures for Embedded Computing 30

Hadamard Matrices Each row is a possible code Each matrix has a Hamming distance d Can fix (d 1) / 2 errors Computer Organization / Architectures for Embedded Computing 31

Encoding Example Input Hadamard Matrix Output 000 00101011 001 10100101 010 00010111 011 + = 11000011 100 01110001 101 10011001 110 01001101 111 11111111 3 bits 8 bits Hamming Distance = 4 (4 1) / 2 = 1 (fixable error) Example: 110 à 01001101 Computer Organization / Architectures for Embedded Computing 32

Decoding Example Example: 0101 1101 à? Mapped Hadamard Matrix Possible Values Compare Differences 000 0010 1011 0101 1101 4 001 1010 0101 0101 1101 5 010 0001 0111 0101 1101 3 011 : 1100 0011-0101 1101 = 5 100 0111 0001 0101 1101 3 101 1001 1001 0101 1101 3 110 0100 1101 0101 1101 1 111 1111 1111 0101 1101 3 Result: 0101 1101 à 0100 1101 à 110 Computer Organization / Architectures for Embedded Computing 33

Summary Four components of disk access time: Seek Time, Rotational Latency, Transfer Time, Controller Time RAIDS can be used to improve availability and performance RAID 1 and RAID 5 widely used in servers, one estimate is that 80% of disks in servers are RAIDs RAID 0+1 (mirroring) EMC, Tandem, IBM RAID 3 Storage Concepts RAID 4 Network Appliance RAIDS have enough redundancy to allow continuous operation, but not hot swapping Assumes independent disk failures Too bad if the building burns down! Computer Organization / Architectures for Embedded Computing 34

Next Class Analog / Digital Interfaces Computer Organization / Architectures for Embedded Computing 35

Disk Storage & Dependability Computer Organization Architectures for Embedded Computing Tuesday, 10 December 13 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK and from Prof. Mary Jane Irwin, PSU

Disk Sectors and Access Each sector records Sector ID Data (512 bytes, 4096 bytes proposed) Error correcting code (ECC) Used to hide defects and recording errors Synchronization fields and gaps Access to a sector involves Queuing delay if other accesses are pending Seek: move the heads Rotational latency Data transfer Controller overhead Computer Organization / Architectures for Embedded Computing 37

Disk Interface Standards Higher-level disk interfaces have a microprocessor disk controller that can lead to performance optimizations ATA (Advanced Technology Attachment ) An interface standard for the connection of storage devices such as hard disks, solid-state drives, and CD-ROM drives. Parallel ATA has been largely replaced by serial ATA. SCSI (Small Computer Systems Interface) A set of standards (commands, protocols, and electrical and optical interfaces) for physically connecting and transferring data between computers and peripheral devices. Most commonly used for hard disks and tape drives. In particular, disk controllers have SRAM disk caches which support fast access to data that was recently read and often also include prefetch algorithms to try to anticipate demand Computer Organization / Architectures for Embedded Computing 38