RAID: Redundant Arrays of Independent Disks
Dependable Systems
Dr.-Ing. Jan Richling
Kommunikations- und Betriebssysteme, TU Berlin
Winter 2012/2013
RAID: Introduction

- Redundant array of inexpensive disks, or: redundant array of independent disks
- Redundancy by spreading data across several disks
- Tolerates disk failures; needs at least two disks
- I/O performance increased by parallel access
  - Caution: relation between disks and controllers
- Originally by Patterson et al. 1988: 6 types
- Here: the original types and commonly used variations
- n always denotes the number of disks necessary to get n times the capacity of a single disk
RAID 0 (I)

    Disk 1   Disk 2   Disk 3
      A        B        C
      D        E        F
      G        H
RAID 0 (II)

Striped array with no fault tolerance; requires n disks.

Advantages:
- High performance (theoretically n times that of a single disk)
- Simple design

Disadvantages:
- No fault tolerance
- Reliability lower than for a single disk: failure of one disk leads to complete data loss

Use cases:
- Storing temporary data with need for very high bandwidth
- Video production
- Compilation of large software projects
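The striping scheme above can be sketched as a simple address mapping (a hypothetical helper, not from the slides): logical block b of an n-disk stripe set lands on disk b mod n, at offset b div n.

```python
def raid0_map(logical_block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks

# With 3 disks, consecutive blocks alternate across disks,
# so up to n transfers can run in parallel:
assert raid0_map(0, 3) == (0, 0)   # block A -> disk 1, row 1
assert raid0_map(1, 3) == (1, 0)   # block B -> disk 2, row 1
assert raid0_map(4, 3) == (1, 1)   # block E -> disk 2, row 2
```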
RAID 1 (I)

    Disk 1   Disk 2
      A        A
      B        B
RAID 1 (II)

Mirroring of data; requires 2n disks.

Advantages:
- One write or two reads in parallel
- Increased read performance (similar to RAID 0)
- Simple design
- Tolerates failure of one disk; no reconstruction of data necessary

Disadvantages:
- High overhead (100%)

Use cases:
- Applications requiring very high reliability
- Usually used for smaller amounts of data
RAID 2 (I)

    Disk 1   Disk 2   Disk 3   Disk 4   Disk 5      Disk 6      Disk 7
      A        B        C        D      ECC A-D 1   ECC A-D 2   ECC A-D 3
      E        F        G        H      ECC E-H 1   ECC E-H 2   ECC E-H 3
RAID 2 (II)

Uses a Hamming-code ECC to spread data among disks; word-wise ECC. The number of disks depends on the code.

Advantages:
- Design simpler than RAID 5
- Correction of a single disk failure
- High data rates

Disadvantages:
- High overhead (decreases with number of disks)
- High entry cost
- No commercial implementation

Use cases:
- Not used in industry
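The word-wise Hamming code can be illustrated at the bit level. A minimal sketch, assuming a Hamming(7,4) code matching the figure (4 data disks, 3 ECC disks); the helper names are made up for illustration:

```python
def hamming74_ecc(d1, d2, d3, d4):
    """Three check bits, each covering a subset of the data bits
    (stored on the three ECC disks)."""
    return d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4

def locate_error(bits, ecc):
    """Recompute the check bits; the syndrome encodes the position of a
    single flipped bit (0 means no error)."""
    s1, s2, s3 = (a ^ b for a, b in zip(hamming74_ecc(*bits), ecc))
    return s1 + 2 * s2 + 4 * s3   # data bits sit at positions 3, 5, 6, 7

ecc = hamming74_ecc(1, 0, 1, 1)
assert locate_error((1, 0, 1, 1), ecc) == 0   # no error
assert locate_error((1, 1, 1, 1), ecc) == 5   # bit d2 (position 5) flipped
```

This is why the controller can tell which disk failed: the syndrome points directly at the faulty position.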
RAID 3 (I)

Bytewise parity generation:

    Disk 1   Disk 2   Disk 3   Disk 4
      A        B        C      Parity A-C
      D        E        F      Parity D-F
RAID 3 (II)

Stripe set with an additional disk for bytewise parity; requires n + 1 disks.

Advantages:
- High data rates
- Low overhead
- Disk failure has low impact on performance

Disadvantages:
- Complex design
- Transaction rate equals that of a single disk
- Unequal distribution of accesses to disks (the parity disk is accessed much more often)

Use cases:
- Applications requiring high reliability
- Rarely used today
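The parity used by RAID 3/4/5 is a plain XOR over the data blocks, and a lost block is just the XOR of the parity with the surviving blocks. A minimal sketch (helper names are made up):

```python
from functools import reduce

def xor_parity(blocks):
    """Bytewise XOR over all blocks of one stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(surviving, parity):
    """Rebuild a lost block: XOR of the parity block and all surviving blocks."""
    return xor_parity(list(surviving) + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]     # blocks on disks 1-3
p = xor_parity(data)                   # parity block on disk 4
# Disk 2 fails; its block is recoverable from the rest:
assert reconstruct([data[0], data[2]], p) == data[1]
```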
RAID 4 (I)

Blockwise parity generation:

    Disk 1   Disk 2   Disk 3   Disk 4
      A        B        C      Parity A-C
      D        E        F      Parity D-F
RAID 4 (II)

Similar to RAID 3, but using blockwise parity instead of byte- or wordwise parity; requires n + 1 disks.

Advantages:
- Similar to RAID 3

Disadvantages:
- Similar to RAID 3

Use cases:
- Applications requiring high reliability
- Rarely used today
RAID 5 (I)

Blockwise parity generation:

    Disk 1       Disk 2       Disk 3   Disk 4
      A            B            C      Parity A-C
    Parity D-F     D            E        F
      G          Parity G-I     H        I
RAID 5 (II)

Striping with distributed (interleaved) parity: the parity is spread over all disks. Requires n + 1 disks.

Advantages:
- High data rates
- Low overhead
- Equally distributed accesses

Disadvantages:
- Disk failure has medium impact on performance
- Complex design
- Complex rebuild

Use cases:
- Applications requiring high reliability
- Most versatile RAID level; most often used today
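The parity rotation in the figure can be written as a formula (one possible rotation; real controllers use several different orders): the parity of stripe s sits on disk (s - 1) mod n.

```python
def raid5_parity_disk(stripe: int, n_disks: int) -> int:
    """Disk index (0-based) holding the parity block of a given stripe,
    for one possible rotation order."""
    return (stripe - 1) % n_disks

# 4 disks, 3 stripes, matching the figure:
# parity on disk 4, then disk 1, then disk 2.
assert [raid5_parity_disk(s, 4) for s in range(3)] == [3, 0, 1]
```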
RAID 6 (I)

Two blockwise parity generations (parity 1 and parity 2):

    Disk 1         Disk 2         Disk 3         Disk 4
      A              B          Parity 1 A-B   Parity 2 A-B
    Parity 2 C-D     C              D          Parity 1 C-D
    Parity 1 E-F   Parity 2 E-F     E              F
RAID 6 (II)

Similar to RAID 5, but using two different parity algorithms to generate two sets of parity blocks; the parity is spread over the disks. Requires n + 2 disks.

Advantages:
- Higher reliability than RAID 5
- High data rates
- Low overhead
- Equally distributed accesses

Disadvantages:
- Lower write performance
- Higher overhead
- Complex design
- High controller overhead

Use cases:
- Applications requiring high reliability
- Beginning to replace RAID 5 in many applications
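A common choice for the two parity algorithms (used, for example, by the Linux RAID 6 driver) is P = plain XOR and a second syndrome Q computed over the Galois field GF(2^8). A compact sketch, assuming the standard reduction polynomial 0x11d and generator 2; because P and Q are independent equations, any two lost blocks of a stripe can be solved for:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8), reduction polynomial 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def pq_parity(blocks):
    """P = XOR of all blocks; Q = XOR of g^i * block_i with g = 2."""
    p, q = bytearray(len(blocks[0])), bytearray(len(blocks[0]))
    g = 1
    for blk in blocks:
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(g, byte)
        g = gf_mul(g, 2)
    return bytes(p), bytes(q)

data = [b"AB", b"CD"]
p, q = pq_parity(data)
# A single lost block is still recovered via P alone, exactly as in RAID 5:
assert bytes(a ^ b for a, b in zip(data[0], p)) == data[1]
```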
Variations and Other Aspects

- Nested RAID levels: RAID 1+0, RAID 0+1
- Implementation issues: hardware RAID, software RAID, RAID implementations found on some mainboards
- Performance considerations
RAID 1+0 (I)

    Mirror 1          Mirror 2
    Disk 1   Disk 2   Disk 3   Disk 4
      A        A        B        B
      C        C        D        D
RAID 1+0 (II)

Stripe set of mirrors, sometimes called RAID 10; requires 2n disks.

Advantages:
- Combines the advantages of RAID 1 (redundancy) and RAID 0 (performance)
- Simple design
- High data rates

Disadvantages:
- High overhead

Use cases:
- Applications requiring high performance and high reliability
RAID 0+1 (I)

    Stripe set 1      Stripe set 2
    Disk 1   Disk 2   Disk 3   Disk 4
      A        B        A        B
      C        D        C        D
RAID 0+1 (II)

Mirrored stripe sets, sometimes called RAID 01; requires 2n disks.

Advantages:
- High performance
- Tolerates a single disk failure (the array degrades to RAID 0)
- Simple design

Disadvantages:
- High overhead

Use cases:
- Applications requiring high performance and some reliability
RAID 1+0 vs. RAID 0+1

- In case of four disks, which one is better? Goal: high reliability
- Modelling using reliability block diagrams
- Assumption: all disks have the same reliability
- Calculation: exercise!
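The exercise can be checked numerically from the reliability block diagrams: RAID 1+0 is two mirrors in series, RAID 0+1 is two stripe sets in parallel (assuming independent disks with equal reliability r):

```python
def r_raid10(r: float) -> float:
    """Series of two mirrors; a mirror fails only if both of its disks fail."""
    return (1 - (1 - r) ** 2) ** 2

def r_raid01(r: float) -> float:
    """Parallel pair of stripe sets; a stripe set needs both of its disks."""
    return 1 - (1 - r ** 2) ** 2

r = 0.9
assert abs(r_raid10(r) - 0.9801) < 1e-9
assert abs(r_raid01(r) - 0.9639) < 1e-9
assert r_raid10(r) > r_raid01(r)   # the mirror-first layout wins
```

Intuitively, RAID 1+0 survives more two-disk failure combinations: any pair except the two disks of the same mirror, while RAID 0+1 only survives a pair inside the same stripe set.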
Hardware-RAID vs. Software-RAID

Hardware-RAID:
- RAID algorithm is calculated by the controller or an external device
- Operating system sees a single disk device
- No impact on performance of the main processor
- RAID array usually only usable with the original controller or a compatible one
- Hardware design optimized for RAID usage
- Usually operates on whole disks

Software-RAID:
- RAID algorithm executed by the OS (or one of its drivers) using the main processor
- Performance of the main processor is affected by RAID
- RAID driver creates the array out of several devices
- RAID array can be used with other controllers that use the same drive geometry, as long as the driver is compatible
- Hardware usually not optimized for RAID usage (number of disks per controller, disks sharing an I/O channel, ...)
- Usually operates on partitions
Mainboard-RAIDs

- Many modern mainboards feature RAID controllers
- Usually not hardware RAID: the RAID algorithms are executed on the main processor by a driver using specialized hardware
- How much the specialized hardware accelerates depends on the implementation; the extreme cases are pure software (cheap) and pure hardware implementations (expensive)
- Caution: the array usually cannot be used by mainboards of other types
Performance Considerations (I)

Performance of a RAID depends on all components along the data path:
- Disk
- Controller and the topology of disks (shared bus vs. point-to-point)
- Bus system of the computer (topology, bandwidth)
- I/O performance of the system
Performance depends on the weakest part!

Example of bad design:
- Four EIDE disks delivering 80 MB/s each: overall throughput 320 MB/s
- Each pair shares a 100 MB/s PATA channel: overall throughput 200 MB/s
- Each PATA channel is connected to a PCI controller: overall throughput 133 MB/s (the PCI bus limits the theoretical peak throughput)
- Reality: much slower (overhead, other devices use PCI too, ...)
- Only about one third of the disk bandwidth is usable in parallel operation
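The "weakest part" rule is just a minimum over the stages of the data path; a small sketch reproducing the bad-design example (numbers from the slide, helper name made up):

```python
def path_throughput(disk_mb_s, n_disks, channel_mb_s, disks_per_channel,
                    bus_mb_s):
    """Aggregate throughput is capped at every stage of the data path."""
    disks = n_disks * disk_mb_s                               # 4 * 80 = 320
    channels = (n_disks // disks_per_channel) * channel_mb_s  # 2 * 100 = 200
    return min(disks, channels, bus_mb_s)                     # PCI bus: 133

# Four 80 MB/s EIDE disks, pairs on 100 MB/s PATA channels, 133 MB/s PCI bus:
assert path_throughput(80, 4, 100, 2, 133) == 133
```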
Performance Considerations (II)

Example of good design:
- Four SATA disks delivering 80 MB/s each: overall throughput 320 MB/s
- Each device uses a 1.5 Gbit/s SATA 1 link: overall throughput 600 MB/s
- Two devices share a PCI-e x1 controller: overall throughput 500 MB/s
- In parallel operation the total disk bandwidth can be used

Software-RAID: performance of the main processor matters; trade-off between the performance needed for RAID and for the rest of the system.

Hardware-RAID: is the interface fast enough for the maximum bandwidth of the chosen RAID configuration?
Spare disks

Problem:
- Most RAID implementations have very low reliability in degraded mode
- Fast repair is necessary (human activity)
- Hardware-RAID and some Software-RAIDs: hot swap is possible
- But: achieving a low MTTR leads to high cost

Solution: spare disks
- Additional disk(s) that are not part of the array
- In case of a failure, the controller uses a spare disk instead of the failed disk and starts reconstruction of the array
- Results in an intact array; the spare disk is replaced later
- But: using a RAID level with higher fault tolerance is better, due to the critical phase while reconstructing (e.g., RAID 5 + spare vs. RAID 6)
RAID and Reliability (I)

Calculation based on a reliability block diagram. Assumption (fault model):
- A disk fault results in a read error
- The controller recognizes the read error and locates the faulty disk
- The RAID continues to work in degraded mode

Problem:
- In reality, disk faults often result in errors only while accessing the data
- Faults on seldom-used parts of a disk may exist for a long time without resulting in an error
- Result: the risk of data loss increases, as the next error forces rebuilding of the RAID and the hidden fault results in a second error

Solution: disk scrubbing
Disk Scrubbing

- Idea: access all sectors of all disks of the RAID system at regular intervals
- Implemented by forced rebuilding of the array
- Detects hidden faults (they result in an error while accessing a disk); the administrator replaces the disk
- Reduces the probability of a double disk error
- Good practice: once per month
- Linux Software-RAID: echo check > /sys/block/mdX/md/sync_action
RAID and Reliability (II)

Reliability does not depend only on the disks; other components are important too:
- Controller
- Power supply
- Cabling
- Cooling

Problem: single points of failure
- Detection requires detailed knowledge of the configuration
- Hidden interfaces have to be considered