Technical White Paper

RAID Protection and Drive Failure Fast Recovery

RAID protection is a key part of all ETERNUS Storage Array products. Choosing the RAID level that meets customer application requirements involves evaluating a number of aspects of the application demands. Those aspects are addressed in this paper, including operation in degraded mode, the time to recover to a protected state, and the time to restore the fully configured state. The introduction of the Fast Recovery feature changes some long-held assumptions about the use of the different RAID levels and provides recovery to a protected state in a small fraction of the time of the other choices.

Table of Contents
1 Introduction
2 Definitions
3 Failure / Protection Relationships
3.1 RAID1 and RAID10 Failed Drive Operations
3.2 RAID5 Failed Drive Operations
3.3 RAID6 Failed Drive Operations
3.4 Fast Recovery Failed Drive Operations
4 The Cost of Protection
5 Recovery of Protection
5.1 Copy-back Full Configuration Restore Considerations
6 Conclusions
7 Minimum and Nominal Rebuild Time Charts

List of Figures
Figure 1 - Minimum Rebuild Times with very low host traffic
Figure 2 - Nominal Rebuild Times with host traffic

List of Tables
Table 1 - Valid RAID Group Combinations
Table 2 - Normalized Relative Rebuild Rates
Table 3 - Usability / Protection Relationships

1 Introduction

RAID has been the standard means of protecting against loss of data in storage arrays for many years, with several organizational forms available within the ETERNUS Storage Array product family. There are three key aspects of protecting against data loss when drives fail: first, operation when the failure is recognized; second, recovery to a protected state; and third, how the array handles the replacement operation when the failed drive is replaced. The introduction of the Fast Recovery feature in the ETERNUS Storage Arrays has changed the environment of data protection and recovery. This paper provides some insight into the value of this feature and helps the reader understand the impact of the different RAID protection mechanisms on the effective deployment of the storage array products.

This paper is not intended to be a tutorial on RAID technology, as the technology is well covered in many existing documents, both within Fujitsu and in publicly available papers.

2 Definitions

In the context of this paper, there are several definitions the reader needs to understand to get the full meaning of the information provided.

Fully Protected: the state of a RAID Group when all of the protection capability offered by that particular RAID Group organization is in force.

Degraded Mode: the state of a RAID Group when the data is still accessible by the applications, but there is either no protection or a lower level of protection than the RAID Group offers when Fully Protected.

Rebuild Mode: the state of a RAID Group when the data is accessible, but a failure has occurred and operations are underway to restore the RAID Group to a Fully Protected state.
Copy-back Mode: the state of a RAID Group when the protection data is being restored to a replacement drive while the RAID Group is in a Fully Protected state.

Copy-back-less Mode: a mode of operation in which a replacement drive does not assume the role of the drive it is replacing, but leaves the protection data in the rebuilt target location(s).

Global Hot Spare (GHS): one or more drives within an array that can be used in any of several RAID Groups to replace a failed drive through the rebuild process appropriate for that RAID organization.

Dedicated Hot Spare: a drive that is part of a RAID Group but does not hold active data; it is available to replace a failed drive within just that group.

Fast Recovery Group: a special form of RAID6 Group that includes a Dedicated Hot Spare-like drive, with all drives active within the normal operation of the group. The Hot Spare space is distributed across all of the drives in the group. There are specific valid member disk combinations, where xD represents a number of Data drives, 2P designates two Parity drives, and 1HS indicates one Dedicated Hot Spare drive.

Organization (Ordered by Total Drives per RAID Group)   User Drives per RAID Group   Total Drives per RAID Group
(3D+2P)x2+1HS       6    11
(6D+2P)x2+1HS      12    17
(9D+2P)x2+1HS      18    23
(12D+2P)x2+1HS     24    29
(5D+2P)x4+1HS      20    29
(13D+2P)x2+1HS     26    31
(3D+2P)x6+1HS      24    31
Table 1 - Valid RAID Group Combinations

Usability Factor: the portion of the total space in the drives of a group that can be used to hold user data (see Table 3). The higher the usability factor, the lower the cost in number of drives for a given amount of user storage; likewise, the lower the usability factor, the greater the cost in number of drives for a given amount of user storage.

3 Failure / Protection Relationships

When disk drives fail within a RAID-protected set, a change in activity takes place. With the data space that was held on the failed drive no longer available, accesses require special processing, depending upon the organization of the RAID Group.

Failure Probability: the probability that a device will fail, in this case that there will be a failure in a disk drive. There are a number of different failures that can occur within a disk drive that are important to consider in choosing a RAID Group organization.

Degraded Operation Time: the period of time during which the RAID Group must reconstruct portions of the data that were held on the failed drive.

Recover Protection Time: the period of time during which the RAID Group has less than the expected level of fault protection. This is the time during which a Hot Spare may be used to rebuild the content of the failed drive. During this time, host accesses will experience greater than normal response times.

Restore Protection Time: the period of time required to fully restore the RAID Group to the planned configuration. This includes the time required to obtain the replacement drive, install it in the array, and restore it to its planned configuration role.

3.1 RAID1 and RAID10 Failed Drive Operations

In the case of RAID1 or RAID10, instead of balancing Read operations across the two mirrored drives, all of the reads must be serviced by the remaining drive of the pair. Instead of writes being directed to both drives of the pair, only one write can be supported. Although a RAID10 group is protected against the failure of another drive, if a failing drive is mated with an already failed drive, the second failure will result in data loss.
If there is a suitable Hot Spare drive, either dedicated or global, then rebuilding can begin right away to restore protection to the group. Rebuilding a RAID1 or RAID10 member involves copying the data from the surviving mate of the pair to the replacement drive (either the Hot Spare or a new replacement drive). The maximum rate of the rebuild is determined by the Write rate of the single drive in the copy operation.

If the group is configured as a Copy-back Mode group, a copy-back operation will be initiated when the failed drive is replaced. Again this involves copying the content of the used Hot Spare drive to the new drive, and it is limited by the maximum Write rate of the single drive in the copy operation.
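For a rough sense of scale, the lower bound on a mirror rebuild or copy-back time is simply the drive capacity divided by the sustained Write rate of the single target drive. The following sketch illustrates that arithmetic; the drive capacities, Write rates, and the 0.5 derating factor for host traffic are illustrative assumptions, not ETERNUS measurements.

# Illustrative only: lower bound on RAID1/RAID10 rebuild or copy-back time,
# which is limited by the Write rate of the single target drive.

def rebuild_hours(capacity_tb, write_mb_s, traffic_derate=1.0):
    """Hours to copy one full drive at the given sustained write rate.
    traffic_derate < 1.0 models the slowdown when host I/O has priority."""
    capacity_mb = capacity_tb * 1_000_000          # decimal TB -> MB
    return capacity_mb / (write_mb_s * traffic_derate) / 3600

# Assumed drive profiles: (capacity in TB, sustained write rate in MB/s).
drives = {
    "600 GB 10k SAS":   (0.6, 150),
    "4 TB 7.2k NL-SAS": (4.0, 120),
}

for name, (tb, mbs) in drives.items():
    print(f"{name}: idle {rebuild_hours(tb, mbs):.1f} h, "
          f"busy {rebuild_hours(tb, mbs, traffic_derate=0.5):.1f} h")

Even this simple bound shows why large NL-SAS drives dominate the recovery picture discussed in the following sections.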
3.2 RAID5 Failed Drive Operations

In the case of RAID5, Read accesses to data that was on the failed drive require recovery of the data from the surviving drives within the RAID Group. Consider a RAID5(4D+1P) Group, where four drives must be read to reconstruct the data for an access to the failed drive. Write operations involve reading from all of the surviving drives and may require writing back to one or two of them, depending upon where the data is located within the stripe.

If there is a suitable Hot Spare drive, either dedicated or global, then rebuilding can begin right away to restore protection to the group. Rebuilding a RAID5 group involves reading from all of the surviving drives and writing to the replacement drive. If another drive fails before the rebuilding process is completed, the failure will result in data loss.

If the RAID5 group is configured in Copy-back Mode, a copy-back operation will be initiated when the failed drive is replaced. This involves copying the content of the used Hot Spare drive to the new drive, and it is limited by the maximum Write rate of the single drive in the copy operation.

3.3 RAID6 Failed Drive Operations

As in the case of RAID5, Read accesses in RAID6 to data that was on the failed drive require recovery of the data from the surviving drives within the RAID Group. In the case of a RAID6(4D+2P) Group, five drives must be read to reconstruct the data for an access to the failed drive. Likewise, Write operations involve reading all of the surviving drives and writing to two or three drives, depending upon where the data is located within the stripe.

If there is a suitable Hot Spare drive, either dedicated or global, then rebuilding can begin right away to restore full protection to the group. Rebuilding a RAID6 group involves reading from all of the surviving drives and writing to the replacement drive. If another drive fails while the rebuilding process is active, the rebuild can still complete without data loss, but the additional failed drive will need to be handled as well. The maximum rate of the rebuild is limited by the Write rate of the single target drive.

If the RAID6 group is configured in Copy-back Mode and a Hot Spare drive was used for the initial recovery, a copy-back operation will be initiated when the failed drive is replaced. This involves copying the content of the used Hot Spare drive to the new drive and is limited by the maximum Write rate of the single drive in the copy operation.

With large NL-SAS drives, both rebuild and copy-back can take many hours, and with host traffic active, the very large drives can easily take more than a day to complete the rebuild or copy-back. Given the protection against a second drive failure afforded by RAID6, and with a maintenance contract that provides a replacement drive within a day, it has been recommended not to use Hot Spare drives for these RAID Groups. The failed drive can usually be replaced before the first rebuild completes, but the rebuild must complete before the copy-back can begin, and therefore the system would incur a very long period of reduced capability while the rebuild and copy-back are active.

3.4 Fast Recovery Failed Drive Operations

The Fast Recovery feature introduces a new form of RAID6 Group, which includes an extra drive in support of two or more RAID6 sub-groups. The equivalent space of one drive is provided through reserved space on all of the drives in the set. Data spans all of the drives, along with dual parity protection over the sub-groups within the set, using a rotating assignment scheme.
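To make the idea of spare space reserved across all of the drives more concrete, the sketch below prints a few stripes of a (3D+2P)x2+1HS set using a simple per-stripe rotation of data (D), parity (P), and hot-spare (HS) strips. The rotation rule is purely illustrative; the actual ETERNUS strip mapping is not described here and should be assumed to differ.

# Illustrative layout of an (xD+2P)xN+1HS style set. Each stripe holds N
# sub-groups of x data (D) + 2 parity (P) strips plus one hot-spare (HS)
# strip, and the assignment rotates by one drive per stripe so that spare
# space (and parity) is spread across every drive in the set.

def layout(x=3, n=2, stripes=6):
    """Print a few stripes of an (xD+2P)xN+1HS set with a simple rotation."""
    roles = (["D"] * x + ["P"] * 2) * n + ["HS"]   # strips in one stripe
    drives = len(roles)                            # (x+2)*n + 1, e.g. 11
    for s in range(stripes):
        row = [roles[(d - s) % drives] for d in range(drives)]
        print(f"stripe {s}: " + " ".join(row))

layout()

The output makes the key property visible: every drive carries some of the reserved spare space, so no single drive is the sole target when a rebuild is required.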
When a drive in a Fast Recovery group fails, rebuild begins immediately to the reserved space on all of the surviving drives. The rebuild proceeds more rapidly than in the other RAID organizations because the rebuild rate is not limited by the Write rate of a single drive. Table 2 shows the relative rebuild rates for the standard RAID6 organization and most of the Fast Recovery organizations. The key reason the rebuild rates are much higher is that all of the surviving drives in the RAID Group provide space for the recovered data, eliminating the bottleneck of the Write rate of a single drive. This results in much shorter periods during which the group is exposed to a lesser degree of protection than planned in the configuration.

The relative rebuild rates are normalized to the RAID6(3D+2P) rate, measured in MB/s; that rate is determined by the Write rate of the target drive for the rebuild. The time taken to rebuild depends upon the type of drive, the size of the drive, and the amount of traffic in the system while the rebuild is running.

RAID Organization (Ordered by Rebuild Rate)   Rate with No Host Traffic   Rate with Host Traffic
RAID6(3D+2P)        1.0    0.5
(3D+2P)x2+1HS       7.4    3.8
(6D+2P)x2+1HS      12.5    4.2
(9D+2P)x2+1HS      17.7    5.6
(13D+2P)x2+1HS     22.2    6.3
(3D+2P)x6+1HS      48.0   13.3
Table 2 - Normalized Relative Rebuild Rates

Certainly the rebuild rate is reduced when there is host traffic, as the rebuild is normally performed at low priority, giving preference to servicing user demands. Still, it is clear that the higher rebuild rates of the Fast Recovery organizations significantly reduce the time during which the array is exposed at a level below the planned protection level.
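Once a base rebuild rate is assumed for RAID6(3D+2P), the normalized values in Table 2 translate directly into rough rebuild-time estimates. In the sketch below, the 100 MB/s base rate and the 4 TB drive size are assumptions chosen only to show the arithmetic; both columns use the same base because the table is normalized to the no-traffic RAID6(3D+2P) rate.

# Rough rebuild-time estimates from the normalized rates in Table 2.
# The base rate and drive size are assumptions for illustration only;
# actual rates depend on drive type, firmware, and workload.

BASE_RATE_MB_S = 100   # assumed rebuild rate for RAID6(3D+2P) with no host traffic
DRIVE_MB = 4_000_000   # assumed 4 TB (decimal) NL-SAS drive

# (relative rate with no host traffic, relative rate with host traffic) from Table 2
relative = {
    "RAID6(3D+2P)":   (1.0,  0.5),
    "(3D+2P)x2+1HS":  (7.4,  3.8),
    "(6D+2P)x2+1HS":  (12.5, 4.2),
    "(9D+2P)x2+1HS":  (17.7, 5.6),
    "(13D+2P)x2+1HS": (22.2, 6.3),
    "(3D+2P)x6+1HS":  (48.0, 13.3),
}

for org, (idle, busy) in relative.items():
    hours_idle = DRIVE_MB / (BASE_RATE_MB_S * idle) / 3600
    hours_busy = DRIVE_MB / (BASE_RATE_MB_S * busy) / 3600
    print(f"{org:15} no traffic {hours_idle:6.1f} h   with traffic {hours_busy:6.1f} h")

Whatever base rate actually applies, the ratios remain those of Table 2, which is the point of interest for comparing exposure times.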
4 The Cost of Protection

Protecting against data loss does not come for free; there is a cost for the various levels of protection offered by the different RAID organizations. One way to look at the cost of protection is to consider what portion of the total space offered by the drives is available for user data. At a high level this is the ratio of user drives to total drives. This cost needs to be weighed against the cost of lost data when a drive fails.

Table 3 shows the levels of usability for approximately the same number of user drives and the associated protection level afforded by the different RAID organizations. Protect Level 0 indicates that there is data loss when any drive fails; the data is not protected at all. Protect Level 1 indicates that one drive in a group can fail without loss of data, but a second drive failure will cause loss. Protect Level 2 indicates that any two drives in a group can fail without loss of data.

RAID Organization (Ordered by Usability Factor)   # RAID Groups   User Drives   Total Drives   Usability Factor   Protect Level
RAID0(4D)            6    24    24    1.00    0
(13D+2P)x2+HS        1    26    31    0.84    2
(12D+2P)x2+HS        1    24    29    0.83    2
(9D+2P)x2+HS         1    18    23    0.78    2
(3D+2P)x6+HS         1    24    31    0.77    2
RAID5(4D+1P)+GHS     6    24    31    0.77    1
(6D+2P)x2+HS         2    24    34    0.71    2
(5D+2P)x4+HS         1    20    29    0.69    2
RAID6(4D+2P)+GHS     6    24    37    0.65    2
(3D+2P)x2+HS         4    24    44    0.55    2
RAID10(4+4)+GHS      6    24    49    0.49    1*
Table 3 - Usability / Protection Relationships
(1* indicates that in a RAID10 group there may be protection against another drive failure, provided it is not the mate of the first failed drive.)

Note that Global Hot Spares are commonly used with some of the RAID organizations to reduce the number of drives and improve the Usability Factor. When the first drive failure occurs in any group that is protected by a Global Hot Spare, the rebuild operation can begin right away. If some other group then encounters a failure, there is no spare left to use, so its rebuild will be delayed, exposing that group to data loss if another drive fails before the Hot Spare has been replaced.

5 Recovery of Protection

A key aspect of any protection mechanism is the time of exposure to additional failures and the time required to recover a degree of protection closer to that planned in the configuration. It should be clear that until the failed drive is replaced, the protection level is not at that planned for the configuration. Recovery of the primary level of protection should be completed as soon as possible to protect against data loss from any subsequent failures.

The amount of time to complete the rebuild after a drive fault varies quite widely depending upon several factors:

RAID Organization: most RAID organizations (RAID1, RAID10, RAID5, and RAID6) require rebuilding to a single replacement or Hot Spare drive, which limits the rebuild rate to the Write rate of a single drive. Fast Recovery is able to use all of the surviving drives in the group during the rebuild process, rebuilding at a much faster rate and reducing the exposure time.

Drive Size and Speed: the size and speed of the failed drive determine the rate of the rebuild and the time it takes to complete; with larger, slower drives, rebuilding can take a long time.

Host Traffic: the level of host traffic on the system also impacts the rebuild rate, as the rebuild is normally conducted at a lower priority than host demands, so with heavy host traffic the exposure time is extended as well.
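Before looking at how these factors combine, note that the cost side of the trade-off reduces to simple drive-count arithmetic: the Usability Factor column of Table 3 can be reproduced directly. In the sketch below, the single shared Global Hot Spare assumed for the +GHS rows is inferred from the table's drive totals, and only a subset of the rows is shown.

# Reproduce part of the Usability Factor column of Table 3 from drive counts.
# Assumption for illustration: one Global Hot Spare shared across the six
# conventional groups, which matches the drive totals shown in Table 3.

GHS_DRIVES = 1

# organization: (user drives per group, total drives per group, groups, protect level)
orgs = {
    "RAID0(4D)":        (4, 4, 6, 0),
    "(13D+2P)x2+HS":    (26, 31, 1, 2),
    "(9D+2P)x2+HS":     (18, 23, 1, 2),
    "RAID5(4D+1P)+GHS": (4, 5, 6, 1),
    "RAID6(4D+2P)+GHS": (4, 6, 6, 2),
    "RAID10(4+4)+GHS":  (4, 8, 6, 1),
}

for name, (user, total, groups, level) in orgs.items():
    user_total = user * groups
    drive_total = total * groups + (GHS_DRIVES if name.endswith("+GHS") else 0)
    print(f"{name:18} user {user_total:2d}  total {drive_total:2d}  "
          f"usability {user_total / drive_total:.2f}  protect level {level}")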
The fastest rebuild rate, and therefore the minimum exposure time, occurs when there is very little host traffic. In that case the RAID organization and drive type determine the exposure time.

5.1 Copy-back Full Configuration Restore Considerations

It is important to recognize that any failed drive must be replaced in a timely manner, and Fast Recovery is no exception. Fast Recovery provides full protection against additional failures, but when the failed drive is replaced, it must be integrated into the group to complete the restore operation. This operation requires rebuilding the content of the single drive that is being reintroduced into the group. Unfortunately, the rate of this operation is limited by the Write rate of the single drive being reintroduced. The time is very much the same as the rebuild time for the RAID5 group shown in the charts, taking much longer when there is host traffic than when there is little host traffic. Fortunately, this operation can be deferred to a portion of the day when host traffic is light and the rebuild rate will be fastest. One of the most valuable aspects of Fast Recovery is that the group is fully protected while the re-introduction process is underway, so any failure encountered during that time can be recovered without data loss.
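The difference in exposure is easiest to see by walking the two recovery sequences end to end. The hour values in the sketch below are placeholders chosen only to reflect the relative behavior described above (single-drive Write-rate-bound phases versus the distributed Fast Recovery rebuild), not measured times.

# Compare the two recovery sequences with assumed, illustrative durations.
# Phases: (description, assumed hours, True if the group runs with reduced protection)

conventional = [
    ("rebuild to Global Hot Spare", 30, True),    # limited by one drive's Write rate
    ("copy-back to replacement",    30, False),   # protection already restored
]
fast_recovery = [
    ("rebuild to distributed spare space", 3, True),   # survivors share the rebuild writes
    ("reintegrate replacement drive",      30, False), # single-drive Write rate, but fully protected
]

def summarize(name, phases):
    exposed = sum(h for _, h, e in phases if e)
    total = sum(h for _, h, _ in phases)
    print(f"{name}: {exposed} h at reduced protection, {total} h to full configuration")

summarize("RAID6 + Global Hot Spare", conventional)
summarize("Fast Recovery           ", fast_recovery)

The point of the comparison is not the specific numbers but where the reduced-protection window falls: with Fast Recovery, only the short distributed rebuild is exposed, while the long single-drive reintegration runs with the group fully protected.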
6 Conclusions

This paper has shown that the Fast Recovery feature reduces the recovery time for the first disk failure to approximately one tenth of the time that other RAID organizations require. This reduces the possibility of data loss from a second disk failure. By reducing the recovery time, normal host response time performance returns much more quickly than with conventional recovery procedures. In addition, Fast Recovery provides full protection during the drive replacement process, further protecting against data loss.

It is noted, however, that when there is heavy host traffic, the recovery takes longer; even so, the Fast Recovery time with traffic is still much less than with the other RAID organizations. As is always the case, recovery time is directly a function of the size and speed of the drives making up the RAID Group. Information has been provided for the reader to use in choosing an effective balance of recovery time and cost of storage for the specific demands of the application environment.

It is quite clear that the use of Fast Recovery with larger NL-SAS drives brings the recovery time much more in line with that of the much higher speed drives in the smaller sizes. This ensures that the data is protected as much of the time as possible when drive failures are encountered.

Refer to the ETERNUS DX S3 Performance Guide (Advanced) P3AM-7932, the ETERNUS DX100 S3/DX200 S3 Product Notes P3AM-7682, and the ETERNUS DX500 S3/DX600 S3 Product Notes P3AM-7762 for further information on details and limitations.
7 Minimum and Nominal Rebuild Time Charts

Figure 1 - Minimum Rebuild Times with very low host traffic

Figure 2 - Nominal Rebuild Times with host traffic

Notice that with host activity, which represents the more normal situation, the rebuild times are approximately double the best times achievable when there is no host activity. The key benefit to realize with the Fast Recovery organization is that the rebuild times are only about 10% of what they are for the standard forms of RAID organization. This reduces the exposure time to additional failures to a much shorter period and provides protection against second failures long before a replacement drive can be installed, even when one is available on site.
About Fujitsu America

Fujitsu America, Inc., is a leading ICT solutions provider for organizations in the U.S., Canada and the Caribbean. Fujitsu enables clients to meet their business objectives through integrated offerings and solutions, including consulting, systems integration, managed services, outsourcing and cloud services for infrastructure, platforms and applications; data center and field services; and server, storage, software and mobile/tablet technologies. For more information, please visit: http://solutions./ and http://twitter.com/fujitsuamerica

FUJITSU AMERICA, INC.
Address: 1250 East Arques Avenue, Sunnyvale, CA 94085-3470, U.S.A.
Telephone: 800 831 3183 or 408 746 6000
Website: http://solutions.
Contact Form: http://solutions./contact
Have a question? Email us at: AskFujitsu@

Fujitsu, the Fujitsu logo and ETERNUS are trademarks or registered trademarks of Fujitsu Limited in the United States and other countries. All other trademarks referenced herein are the property of their respective owners. The statements provided herein are for informational purposes only and may be amended or altered by Fujitsu America, Inc. without notice or liability. Product description data represents Fujitsu design objectives and is provided for comparative purposes; actual results may vary based on a variety of factors. Specifications are subject to change without notice.

Copyright 2015 Fujitsu America, Inc. All rights reserved.
FPC65-7381-01 03/15 14.0957