Disk Storage & Dependability



Similar documents
Price/performance Modern Memory Hierarchy

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

CS 6290 I/O and Storage. Milos Prvulovic

COS 318: Operating Systems. Storage Devices. Kai Li and Andy Bavier Computer Science Department Princeton University

Disks and RAID. Profs. Bracy and Van Renesse. based on slides by Prof. Sirer

1 Storage Devices Summary

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Data Storage - II: Efficient Usage & Errors

Sistemas Operativos: Input/Output Disks

Storing Data: Disks and Files

Reliability and Fault Tolerance in Storage

CS 61C: Great Ideas in Computer Architecture. Dependability: Parity, RAID, ECC

Big Picture. IC220 Set #11: Storage and I/O I/O. Outline. Important but neglected

Lecture 36: Chapter 6

How To Improve Performance On A Single Chip Computer

Solid State Drive Architecture

Chapter 10: Mass-Storage Systems

Introduction to I/O and Disk Management

Input/output (I/O) I/O devices. Performance aspects. CS/COE1541: Intro. to Computer Architecture. Input/output subsystem.

Outline. CS 245: Database System Principles. Notes 02: Hardware. Hardware DBMS Data Storage

Operating Systems. RAID Redundant Array of Independent Disks. Submitted by Ankur Niyogi 2003EE20367

Chapter 9: Peripheral Devices: Magnetic Disks

Chapter 2: Computer-System Structures. Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

William Stallings Computer Organization and Architecture 7 th Edition. Chapter 6 External Memory

System Architecture. CS143: Disks and Files. Magnetic disk vs SSD. Structure of a Platter CPU. Disk Controller...

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Fault Tolerance & Reliability CDA Chapter 3 RAID & Sample Commercial FT Systems

HARD DRIVE CHARACTERISTICS REFRESHER

Storing Data: Disks and Files. Disks and Files. Why Not Store Everything in Main Memory? Chapter 7

Chapter 2 Data Storage

Chapter 12: Mass-Storage Systems

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

William Stallings Computer Organization and Architecture 8 th Edition. External Memory

CS420: Operating Systems

Lecture 16: Storage Devices

Chapter 6 External Memory. Dr. Mohamed H. Al-Meer

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Robert Wagner

HP Smart Array Controllers and basic RAID performance factors

SSDs and RAID: What s the right strategy. Paul Goodwin VP Product Development Avant Technology

Version : 1.0. SR3620-2S-SB2 User Manual. SOHORAID Series

Storage Class Memory and the data center of the future

Dependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05

Outline. Principles of Database Management Systems. Memory Hierarchy: Capacities and access times. CPU vs. Disk Speed

Storage. The text highlighted in green in these slides contain external hyperlinks. 1 / 14

Database Management Systems

RAID Performance Analysis

Hard Disk Drives and RAID

Summer Student Project Report

RAID. Storage-centric computing, cloud computing. Benefits:

RAID HARDWARE. On board SATA RAID controller. RAID drive caddy (hot swappable) SATA RAID controller card. Anne Watson 1

The Technologies & Architectures. President, Demartek

RAID: Redundant Arrays of Independent Disks

PIONEER RESEARCH & DEVELOPMENT GROUP

Architecting High-Speed Data Streaming Systems. Sujit Basu

Understanding Flash SSD Performance

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

CS 153 Design of Operating Systems Spring 2015

RAID 5 rebuild performance in ProLiant

Chapter 2 - Computer Organization

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Data Distribution Algorithms for Reliable. Reliable Parallel Storage on Flash Memories

CS161: Operating Systems

COMP 7970 Storage Systems

Writing Assignment #2 due Today (5:00pm) - Post on your CSC101 webpage - Ask if you have questions! Lab #2 Today. Quiz #1 Tomorrow (Lectures 1-7)

An Introduction to RAID. Giovanni Stracquadanio

California Software Labs

Indexing on Solid State Drives based on Flash Memory

Computer Systems Structure Main Memory Organization

Communicating with devices

DELL SOLID STATE DISK (SSD) DRIVES

Distribution One Server Requirements

High-Performance SSD-Based RAID Storage. Madhukar Gunjan Chakhaiyar Product Test Architect

QuickSpecs. HP Smart Array 5312 Controller. Overview

How To Write A Disk Array

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Definition of RAID Levels

Why disk arrays? CPUs speeds increase faster than disks. - Time won t really help workloads where disk in bottleneck

RAID Storage, Network File Systems, and DropBox

HP 128 GB Solid State Drive (SSD) *Not available in all regions.

HP Smart Array 5i Plus Controller and Battery Backed Write Cache (BBWC) Enabler

Striped Set, Advantages and Disadvantages of Using RAID

technology brief RAID Levels March 1997 Introduction Characteristics of RAID Levels

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

TECHNOLOGY BRIEF. Compaq RAID on a Chip Technology EXECUTIVE SUMMARY CONTENTS

A Deduplication File System & Course Review

Best Practices RAID Implementations for Snap Servers and JBOD Expansion

Exploring RAID Configurations

A Comparison of Client and Enterprise SSD Data Path Protection

Devices and Device Controllers

Transcription:

Disk Storage & Dependability Computer Organization Architectures for Embedded Computing Wednesday 19 November 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK and from Prof. Mary Jane Irwin, PSU

Summary Previous Class IO System Today: Disk Storage Dependability Computer Organization / Architectures for Embedded Computing 2

Review: Major Components of a Computer Processor Devices Control Memory Output Datapath Input Secondary Memory (Disk) Main Memory Cache Computer Organization / Architectures for Embedded Computing 3

Disk Storage Nonvolatile, rotating magnetic storage Computer Organization / Architectures for Embedded Computing 4

Purpose Long term, nonvolatile storage Magnetic Disk Lowest level in the memory hierarchy slow, large, inexpensive General structure A rotating platter coated with a magnetic surface A moveable read/write head to access the information on the disk Typical numbers 1 to 4 platters (each with 2 recordable surfaces) per disk of 2.5cm to 9.5cm in diameter Rotational speeds of 5,400 to 15,000 RPM 10,000 to 50,000 tracks per surface cylinder - all the tracks under the head at a given point on all surfaces 100 to 500 sectors per track Sector Track the smallest unit that can be read/written (typically 512B) Computer Organization / Architectures for Embedded Computing 5

Magnetic Disk Characteristics Disk read/write components 1. Seek time: position the head over the proper track (3 to 13 ms avg) 2. Rotational latency: wait for the desired sector to rotate under the head (½ of 1/RPM converted to ms) Controller + Cache 0.5/5400RPM = 5.6ms to 0.5/15000RPM = 2.0ms Head Track Sector Cylinder Platter 3. Transfer time: transfer a block of bits (one or more sectors) under the head to the disk controller s cache (70 to 125 MB/s are typical disk transfer rates in 2008) 4. Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically <.2 ms) the disk controller s cache takes advantage of spatial locality in disk accesses Computer Organization / Architectures for Embedded Computing 6

Disk Performance Issues Manufacturers quote average seek time Based on all possible seeks Locality and OS scheduling lead to smaller actual average seek times Smart disk controller allocate physical sectors on disk Present logical sector interface to host SCSI, ATA, SATA Disk drives include caches Prefetch sectors in anticipation of access Avoid seek and rotational delay Computer Organization / Architectures for Embedded Computing 7

Latency & Bandwidth Improvements In the time that the disk bandwidth doubles the latency improves by a factor of only 1.2 to 1.4 Bandwidth (MB/s) Latency (msec) Year of Introduction Disk latency is one average seek time plus the rotational latency. Disk bandwidth is the peak transfer time of formatted data from the media (not from the cache). Computer Organization / Architectures for Embedded Computing 8

Magnetic Disk Examples (www.seagate.com) Feature Seagate ST31000340NS Seagate ST973451SS Seagate ST9160821AS Disk diameter (inches) 3.5 2.5 2.5 Capacity (GB) 1000 73 160 # of surfaces (heads) 4 2 2 Rotation speed (RPM) 7,200 15,000 5,400 Transfer rate (MB/sec) 105 79-112 44 Minimum seek (ms) 0.8r-1.0w 0.2r-0.4w 1.5r-2.0w Average seek (ms) 8.5r-9.5w 2.9r-3.3w 12.5r-13.0w MTTF (hours@25 o C) 1,200,000 1,600,000?? Dim (inches); Weight (lbs) 1x4x5.8; 1.4 0.6x2.8x3.9;0.5 0.4x2.8x3.9; 0.2 GB/cu.inch, GB/watt 43, 91 11, 9 37, 84 Power: op/idle/sb (watts) 11/8/1 8/5.8/- 1.9/0.6/0.2 Price in 2008, $/GB ~$0.3/GB ~$5/GB ~$0.6/GB Computer Organization / Architectures for Embedded Computing 9

Flash Storage Nonvolatile semiconductor storage 100x to 1000x faster than disk Smaller, lower power, more robust But more $/GB (between disk and DRAM) Feature Kingston Transcend RiDATA Capacity (GB) 8 16 32 Bytes/sector 512 512 512 Transfer rates (MB/sec) 4 20r-18w 68r-50w MTTF (hours) >1,000,000 >1,000,000 >4,000,000 Price (2008) ~ $30 ~ $70 ~ $300 Computer Organization / Architectures for Embedded Computing 10

Flash Types NOR flash: bit cell like a NOR gate Random read/write access Used for instruction memory in embedded systems NAND flash: bit cell like a NAND gate Denser (bits/area), but block-at-a-time access Cheaper per GB Used for USB keys, media storage, Traditional flash wears out after 1000 s of accesses Not suitable for direct RAM or disk replacement Wear levelling: remap data to less used blocks Computer Organization / Architectures for Embedded Computing 11

Flash Storage in Hard Drives Solid State Disc (SSD) Up to 512 GB (300 ), 1TB (800 ) Up to 520MB/s for reading and 400MB/s for writing Lower energy consumtion in idle and active mode Than traditional hard drives (HDD) Near 1.000.000 writes for each cell Hybrid Disc Nonvolatile buffer for write accesses Or used as permanent cache controlled by the OS Computer Organization / Architectures for Embedded Computing 12

Fallacy: Disk Dependability If a disk manufacturer quotes MTTF as 1,200,000hr (140yr) A disk will work that long Wrong: this is the mean time to failure What is the distribution of failures? What if you have 1000 disks How many will fail per year? Annual Failure Rate (AFR) = 8760 hrs / disk 1200000 hrs / failure = 0.73% Failed Disks = 1000 disks 8760 hrs / disk 1200000 hrs / failure = 7.3 Computer Organization / Architectures for Embedded Computing 13

Fallacies Disk failure rates are as specified Studies of failure rates in the field Schroeder and Gibson: 2% to 4% vs. 0.6% to 0.8% Pinheiro, et al.: 1.7% (first year) to 8.6% (third year) vs. 1.5% Why? A 1GB/s interconnect transfers 1GB in one sec But what s a GB? For bandwidth, use 1GB = 10 9 B For storage, use 1GB = 2 30 B = 1.075 10 9 B So 1GB/sec is 0.93GB in one second About 7% error Computer Organization / Architectures for Embedded Computing 14

Fallacy: Disk Scheduling Best to let the OS schedule disk accesses But modern drives deal with logical block addresses Map to physical track, cylinder, sector locations Also, blocks are cached by the drive OS is unaware of physical locations Reordering can reduce performance Depending on placement and caching Computer Organization / Architectures for Embedded Computing 15

Pitfall: Backing Up to Tape Magnetic tape used to have advantages Removable, high capacity Advantages eroded by disk technology developments Makes better sense to replicate data E.g, RAID, remote mirroring Computer Organization / Architectures for Embedded Computing 16

Dependability Service accomplishment Service delivered as specified Restoration Failure Fault: failure of a component May or may not lead to system failure Service interruption Deviation from specified service Computer Organization / Architectures for Embedded Computing 17

Dependability Measures Reliability: mean time to failure (MTTF) Service interruption: mean time to repair (MTTR) Mean time between failures MTBF = MTTF + MTTR Availability = MTTF / (MTTF + MTTR) Improving Availability Increase MTTF: fault avoidance, fault tolerance, fault forecasting Reduce MTTR: improved tools and processes for diagnosis and repair To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components 1. Fault avoidance: preventing fault occurrence by construction 2. Fault tolerance: using redundancy to correct or bypass faulty components (hardware) Fault detection versus fault correction Permanent faults versus transient faults Computer Organization / Architectures for Embedded Computing 18

RAID 0 & 1 & 2 RAID 0: Parallelism No data replication or redundancy RAID 1: Mirroring N + N disks, replicate data Write data to both data disk and mirror disk On disk failure, read from mirror RAID 2: Error correcting code (ECC) N + E disks (e.g., 10 + 4) Split data at bit level across N disks Generate E-bit ECC Can tolerate limited disk failure, since the data can be reconstructed Too complex, not used in practice Computer Organization / Architectures for Embedded Computing 19

RAID 3: Bit-Interleaved Parity N + 1 disks Data striped across N disks at byte level Redundant disk stores parity Read access Read all disks Write access Generate new parity and update all disks On failure Use parity to reconstruct missing data Not widely used Computer Organization / Architectures for Embedded Computing 20

RAID 4: Block-Interleaved Parity N + 1 disks Data striped across N disks at block level Redundant disk stores parity for a group of blocks Read access Read only the disk holding the required block Write access Just read disk containing modified block, and parity disk Calculate new parity, update data disk and parity disk On failure Use parity to reconstruct missing data Not widely used Computer Organization / Architectures for Embedded Computing 21

RAID 3 vs RAID 4 3 reads and 2 writes involving all the disks 2 reads and 2 writes involving just two disks Computer Organization / Architectures for Embedded Computing 22

N + 1 disks RAID 5: Distributed Parity Like RAID 4, but parity blocks distributed across disks Avoids parity disk being a bottleneck Writes can be performed in parallel Widely used Computer Organization / Architectures for Embedded Computing 23

RAID 6: P + Q Redundancy N + 2 disks Like RAID 5, but two lots of parity Greater fault tolerance through more redundancy Multiple RAID or Nested RAID More advanced systems give similar fault tolerance with better performance RAID 01, RAID 10, Computer Organization / Architectures for Embedded Computing 24

RAID Summary RAID can improve performance and availability High availability requires hot swapping Assumes independent disk failures Too bad if the building burns down! Computer Organization / Architectures for Embedded Computing 25

Error Detection/Correction Codes Data Storage CDs and DVDs RAID ECC memory Paper bar codes UPS (MaxiCode) QR Code Communications Cellphones Satellites / Space Codes are all around us Computer Organization / Architectures for Embedded Computing 26

Encoding: Error Detection with Parity Bit m 1 m 2 m k m 1 m 2 m k p k+1 where p k+1 = m 1 m 2 m k Detects one-bit error since this gives odd parity Cannot be used to correct 1-bit error since any odd-parity word is equal distance Δ to k+1 valid codewords. Computer Organization / Architectures for Embedded Computing 27

Hamming Distance The Hamming distance between two words is the number of differences between corresponding bits. The Hamming distance d(000, 011) is 2: 000 011 = 011 (two 1s) The Hamming distance d(10101, 11110) is 3: 10101 11110 = 01011 (three 1s) Computer Organization / Architectures for Embedded Computing 28

Error Correction To guarantee correction of up to t errors in all cases, the minimum Hamming distance in a block code must be d min = 2t + 1 Computer Organization / Architectures for Embedded Computing 29

Hamming Codes Example: Uses multiple parity bits, each p applied to a different subset 0 of data bits Data bits Parity bits d 3 d 2 d 1 d 0 p 2 p 1 p 0 d 0 Encoding: 3 XOR networks to form parity bits d 2 d 1 Checking: 3 XOR p 1 networks p 2 to verify parities d 3 Decoding: Trivial (separable code) Redundancy: 3 check bits for 4 data bits Unimpressive, but gets better with more data bits (7, 4); (15, 11); (31, 26); (63, 57); (127, 120) Capability: Corrects any single-bit error s 2 = d 3 d 2 d 1 p 2 s 1 = d 3 d 1 d 0 p 1 s 0 = d 2 d 1 d 0 p 0 s 2 s 1 s 0 Syndrome s 2 s 1 s 0 Error 0 0 0 None 0 0 1 p 0 0 1 0 p 1 0 1 1 d 0 1 0 0 p 2 1 0 1 d 2 1 1 0 d 3 1 1 1 d 1 Computer Organization / Architectures for Embedded Computing 30

Reed-Muller Code Encoding contains more redundant information to increase the number of errors that can be corrected if needed Uses Hadamard matrices for encoding and decoding stronger error-correcting codes. Computer Organization / Architectures for Embedded Computing 31

Hadamard Matrices Each row is a possible code Each row in the matrix has a Hamming distance d Can fix (d 1) / 2 errors Computer Organization / Architectures for Embedded Computing 32

Encoding Example Input Hadamard Matrix Output 000 00101011 001 10100101 010 00010111 011 + = 11000011 100 01110001 101 10011001 110 01001101 111 11111111 3 bits 8 bits Hamming Distance = 4 (4 1) / 2 = 1 (fixable error) Example: 110 à 01001101 Computer Organization / Architectures for Embedded Computing 33

Decoding Example Example: 0101 1101 à? Mapped Hadamard Matrix Possible Values Compare Differences 000 0010 1011 0101 1101 4 001 1010 0101 0101 1101 5 010 0001 0111 0101 1101 3 011 : 1100 0011-0101 1101 = 5 100 0111 0001 0101 1101 3 101 1001 1001 0101 1101 3 110 0100 1101 0101 1101 1 111 1111 1111 0101 1101 3 Result: 0101 1101 à 0100 1101 à 110 Computer Organization / Architectures for Embedded Computing 34

Summary Four components of disk access time: Seek Time, Rotational Latency, Transfer Time, Controller Time RAIDS can be used to improve availability and performance RAID 1 and RAID 5 widely used in servers, one estimate is that 80% of disks in servers are RAIDs RAID 0+1 (mirroring) EMC, Tandem, IBM RAID 3 Storage Concepts RAID 4 Network Appliance RAIDS have enough redundancy to allow continuous operation, but not hot swapping Assumes independent disk failures Too bad if the building burns down! Computer Organization / Architectures for Embedded Computing 35

Next Class Analog / Digital Interfaces Computer Organization / Architectures for Embedded Computing 36

Disk Storage & Dependability Computer Organization Architectures for Embedded Computing Wednesday 19 November 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK and from Prof. Mary Jane Irwin, PSU