Antoniu Pop 1 The University of Manchester COMP 25212 Storage Technologies - 2 Antoniu Pop antoniu.pop@manchester.ac.uk 16 April 2015
Antoniu Pop 2 The University of Manchester COMP 25212 Storage Technologies - 2 Learning Objectives - Storage 2 Motivate RAID Understand the principles of RAID configurations Understand how RAID impacts performance and reliability Understand failure & recovery constraints
Antoniu Pop 3 The University of Manchester Technology Trends and Drivers 1956-1980s Mainframe/Server Disk Drives High capacity, large formats (e.g., 14 ) Expensive, low volume market Somewhat slow evolution PCs used mostly floppy disks (Early) 1990s Still two markets: Server & PC drives Most PCs use hard disks PC hard disk sales explode high volume market Drives costs lower Drives disk technology faster How to use PC disks to build server-class storage?
ntoniu Pop 4 The University of Manchester COMP 25212 Storage Technologies - 2 RAID Redundant Array of Independent Disks Compensate for loss of reliability, capacity, performance Use lots of cheap(er), (disposable?) disks
Antoniu Pop 5 The University of Manchester COMP 25212 Storage Technologies - 2 Disk Problems and Solutions (recap) Disks are too small Fixed: use multiple disks (possibly striped) Disks are too slow Fixed: disk striping (RAID 0) Disks are unreliable Fixed: disk mirroring (RAID 1) Data redundancy
Antoniu Pop 6 The University of Manchester RAID 0 Striping RAID 0 A4 A8 A5 A9 A6 A7 Disk 0 Disk 1 Disk 2 Disk 3 Number of disks n 4 Read (short... long) 1... n 1... 4 Write (short... long) 1... n 1... 4 Failure tolerance 0 disks 0 disks Capacity efficiency 1 100%
Antoniu Pop 7 The University of Manchester RAID 1 Mirroring RAID 1 Disk 0 Disk 1 Disk 2 Disk 3 Number of disks n 4 Read (short... long) 1... n 1... 4 Write (short... long) 1... 1 1... 1 Failure tolerance n 1 disks 3 disks Capacity efficiency 1/n 25%
RAID 2 Bit-Striping + Hamming Code RAID 2 Ap1 Ap2 Ap3 B0 C0 B1 C1 B2 C2 B3 C3 Bp1 Cp1 Bp2 Cp2 Bp3 Cp3 Disk 0 Disk 1 Disk 2 Disk 3 Disk 4 Disk 5 Disk 6 Number of disks 2 k 1 k = 3 = 7 Read (short... long) 1... 2 k k 1 1... 4 Write (short... long) 1... 2 k k 1 1... 4 Failure tolerance 1..2 disks 1..2 disks 2 Capacity efficiency k 1 2 k 1 4/7 = 57% Antoniu Pop 8 The University of Manchester
Antoniu Pop 9 The University of Manchester RAID 3 Byte-Striping + Parity RAID 3 B0 C0 B1 C1 B2 C2 Ap Bp Cp Disk 0 Disk 1 Disk 2 Disk 3 Number of disks n + 1 4 Read (short... long) 1... n 1... 3 Write (short... long) 1... n 1... 3 Failure tolerance 1 disks 1 disks Capacity efficiency n/(n + 1) 75%
Antoniu Pop 10 The University of Manchester RAID 4 Block-Striping + Parity RAID 4 B0 C0 B1 C1 B2 C2 Ap Bp Cp Disk 0 Disk 1 Disk 2 Disk 3 Number of disks n + 1 4 Read (short... long) 1... n 1... 3 Write (short... long) 0.5 (RMW)... n 0.5... 3 Failure tolerance 1 disks 1 disks Capacity efficiency n/(n + 1) 75%
Antoniu Pop 11 The University of Manchester RAID 5 Block-Striping + Distrib. Parity RAID 5 Bp C0 B0 B1 B2 Cp Ap C1 C2 D0 D1 Dp D2 Disk 0 Disk 1 Disk 2 Disk 3 Number of disks n + 1 4 Read (short... long) 1... n + 1 1... 4 Write (short... long) 0.5 (RMW)... n 0.5... 3 Failure tolerance 1 disks 1 disks Capacity efficiency n/(n + 1) 75%
Antoniu Pop 12 The University of Manchester RAID 6 Double Distributed Parity RAID 6 Bq Cp D2 B0 B1 B2 Cq Dp C0 Dq Ap Aq Disk 0 Disk 1 Disk 2 Disk 3 Disk 4 Bp C1 C2 D0 D1 E1 E2 Ep Eq E0 Number of disks n + 2 5 Read (short... long) 1... n + 2 1... 5 Write (short... long) 0.5 (RMW)... n 0.5... 3 Failure tolerance 2 disks 2 disks Capacity efficiency n/(n + 2) 3/5 = 60%
Antoniu Pop 13 The University of Manchester COMP 25212 Storage Technologies - 2 Nested RAID Each raid configuration comes with tradeoffs Combine RAID configurations to alleviate shortcomings Multiple RAID layers
Antoniu Pop 14 The University of Manchester RAID 1+0 (aka. RAID 10) RAID 0 RAID 1 RAID 1 A4 A4 Disk 0 Disk 1 Disk 2 Disk 3 Number of disks m [RAID 1] n [RAID 0] 2 2 = 4 Read (short... long) 1... n 1... 2 Write (short... long) 1... n 1... 2 Failure tolerance m 1 disks 1 disks Capacity efficiency n/(m n) = 1/m 1/2 = 50%
Antoniu Pop 15 The University of Manchester RAID 01 RAID 1 RAID 0 RAID 0 A4 A5 A4 A5 Disk 0 Disk 1 Disk 2 Disk 3 Number of disks m [RAID 1] n [RAID 0] 2 2 = 4 Read (short... long) 1... n 1... 2 Write (short... long) 1... n 1... 2 Failure tolerance m 1 disks 1 disks Capacity efficiency n/(m n) = 1/m 1/2 = 50%
Antoniu Pop 16 The University of Manchester RAID 10 vs. 01 different? RAID 1 RAID 0 RAID 0 A6 A4 A7 A5 A8 A6 A4 A7 A5 A8 RAID 1 RAID 0 RAID 1 A6 A6 A4 A7 A4 A7 A5 A8 A5 A8
Antoniu Pop 17 The University of Manchester RAID 50 RAID 5 RAID 0 RAID 5 Bp B0 Ap B1 Bp B2 Ap B3 C0 Cp C1 C2 Cp C3 D0 D1 Dp D2 D3 Dp Ep E0 E1 E2 Ep E3
Antoniu Pop 18 The University of Manchester RAID 160 RAID 1 RAID 6 RAID 0 RAID 1 RAID 1 RAID 1 Ap Ap Aq Aq Bq Bq B0 B0 B1 B1 Bp Bp Cp Cp Cq Cq C0 C0 C1 C1 D0 D0 Dp Dp Dq Dq D1 D1 E0 E0 E1 E1 Ep Ep Eq Eq RAID 6 RAID 1 RAID 1 RAID 1 RAID 1 Ap Ap Aq Aq Bq Bq B2 B2 B3 B3 Bp Bp Cp Cp Cq Cq C2 C2 C3 C3 D2 D2 Dp Dp Dq Dq D3 D3 E2 E2 E3 E3 Ep Ep Eq Eq
Antoniu Pop 19 The University of Manchester RAID Failure Mode Operation What happens when a disk fails? RAID 0 Lose all data (hope there s more than one RAID layer) RAID 1 Business as usual, hot-swap the failed disk RAID 2-6 Operate in degraded mode If data drive failed, every read must be reconstructed If parity drive failed, low performance impact Replace drive (hot-swapping: the system continues running) Rebuild the array (re-constitute the state of the lost drive)
Antoniu Pop 20 The University of Manchester RAID Recovery Limitations Rebuilding a degraded array Sequentially How long? On live system? Risk of failure during recovery Statistical distortion: higher risk of multiple failures within a narrow time frame RAID 5 risk advisory notice: Do not use for business-critical data! [Dell]
Antoniu Pop 21 The University of Manchester COMP 25212 Storage Technologies - 2 Where to Implement RAID? Software Operating System Most OS now provide software RAID E.g. Linux md (multiple devices) supports RAID 0, 1, 4, 5, 6 plus nestings Software File System E.g., ZFS Dedicated hardware (RAID controller)
Antoniu Pop 22 The University of Manchester COMP 25212 Storage Technologies - 2 Array Failure Rates (full data loss) Failure rate of a disk drive: r (with some assumptions!) RAID 0 1 (1 r) n RAID 1 r n RAID 2 It s complicated RAID 3-5 1 (1 r) n RAID 6 1 (1 r) n ( ) n r 1 1 (1 r) n 1 ( ) n r 1 (1 r) n 1 1 ( ) n r 2 (1 r) n 2 2 Example drive 1%/year failure rate with RAID 6 (3+2 drives): 1 0.99 5 5 0.01 0.99 4 4 5 2 0.012 0.99 3 = 0.000009851
Antoniu Pop 23 The University of Manchester Hamming Codes Ap1 Ap2 Ap3 RAID 2 no longer used, but... Hamming code error correction ECC
Antoniu Pop 24 The University of Manchester Parity Old idea: first tape drive (1951) had a parity track Transverse redundancy check How does it really work?