STORAGE SOLUTIONS WHITE PAPER
Best Practices: RAID Implementations for Snap Servers and JBOD Expansion

Contents
Introduction
Planning for the End Result
Availability Considerations
Drive Reliability
Supported Drive Types and Capacities
Disk Drives and RAID Levels
Disk Drive Reliability and Failure Probabilities
Drive Spares and Hot Sparing
Instant Capacity Expansion (I.C.E.)
RAID 5 Rebuild Times
Sample Configurations
RAID 5
RAID 1
RAID 0
RAID Planning Summary Checklist
Taking the Snap Server 550 to the Max

Introduction

This document is a best practices paper for the successful integration of Snap Servers using expansion storage. It presents the initial criteria for configuring RAID on the system. With this guide, you can choose configuration options for a range of capacity and availability needs and best meet your specific requirements. The general guidelines in this paper apply to the new Snap Server 500 Series, the Snap Server 18000 and its expansion, as well as the legacy Snap Server 4500 and the Snap Disk 30 and Snap Disk 10 expansion arrays.

Planning for the End Result

Effective storage implementations require good configuration and capacity planning. It is important to know your existing storage requirements and also to anticipate what those requirements will be in a year or two, to ensure that the solution being applied today will meet your storage projections for that timeframe. A significant benefit of the Snap Server modular storage design is that you do not have to buy all of your storage at once. You can buy enough capacity to meet your current needs and gradually scale your solution over time to accommodate increasing storage demands.

Availability Considerations

The SANbloc S50 expansion array provides a universal expansion solution for the Snap Server 520, 550, 4500, and 18000. The Snap Disk 10 provides expandability for the Snap Server 4200 and 4500.
You may also be configuring legacy Snap Disk 30 expansion for the Snap Server 4500 or 18000. No matter which storage configuration or expansion option you are using, several factors affect the amount of usable storage and depend on the level of fault tolerance required: specifically, the type of drives, the RAID level, the disk layout, and the number of hot spares chosen.

Drive Reliability

The standard methods for calculating disk drive Mean Time To Failure (MTTF) and related statistics are well documented. Ultimately, the Snap Server strategy for maximizing reliability is to use the highest-quality components available and to simplify the component count necessary to serve a broad set of solutions. Across a broad set of applications, RAID 5 has become the preferred data protection standard, affording the best compromise between usable capacity and availability. The reliability of the disk drives chosen for these configurations also affects the overall availability of the solution. Higher-reliability drives minimize the probability of a disk failure, which extends the availability protection RAID 5 affords. In general terms, the chart below shows the spectrum of reliability by drive type. The Snap Server 500 Series and expansion now allow you to utilize both high-reliability SAS drives for primary applications and higher-volume SATA drives for secondary, nearline applications.

[Enterprise Storage Continuum chart: Fibre Channel (FC) at the high end, followed by Serial Attached SCSI (SAS), then Serial ATA (SATA) at the low end.]
Supported Drive Types and Capacities

The SANbloc S50 supports either SAS or SATA drives, in various capacities and speeds:

    SAS:  36GB (15,000/10,000 RPM), 74GB (15,000/10,000 RPM),
          146GB (15,000/10,000 RPM), 300GB (10,000 RPM)
    SATA: 250GB (7,200 RPM), 500GB (7,200 RPM)

Additional disks may be added as new capacities become available. Periodically check the standard price list for updates.

Disk Drives and RAID Levels

RAID sets are created for three reasons: (1) to maximize the available disk I/O bandwidth; (2) to maximize the available capacity for data; and (3) to reduce the probability that multi-drive failures within the same RAID set will result in data loss.

RAID 0 (striping) provides the best performance and uses the capacity of all the available disk drives in the RAID set, but it has no resiliency: if a drive fails, all data is lost. However, if the data can be easily reloaded, RAID 0 could be your best performance choice.

RAID 1 (mirroring) provides good performance and has the best resiliency, because the data is mirrored to another disk drive. However, your total available capacity for data is one-half the total capacity of the available disk drives. The Snap Server can create an N-way mirror, meaning you can assign any number of additional disk drives to mirror the data drive.

RAID 5 (striping with interleaved parity) provides the best compromise between usable capacity and available I/O bandwidth. It has good failure resiliency as well, because parity is generated and written with each stripe. Should a disk drive fail, the parity is used to reconstruct the missing data segment on the fly. However, the equivalent capacity of one disk drive is reserved for storing the parity. For example, a 4-drive RAID 5 RAID set only has the available capacity of 3 drives.

For the SANbloc S50 JBOD expansion chassis, we recommend a RAID group size of no more than 12 disks for RAID groups that contain 36GB drives.
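The usable-capacity arithmetic for the three RAID levels above can be sketched in a few lines of Python (a hypothetical illustration for planning purposes, not part of GuardianOS or any Snap Server tooling; drive counts and sizes are examples only):

```python
def usable_capacity(raid_level, drive_count, drive_gb):
    """Usable data capacity (GB) of a RAID set, assuming equal-size drives."""
    if raid_level == 0:        # striping: every drive holds user data
        return drive_count * drive_gb
    if raid_level == 1:        # two-way mirror: half the raw capacity
        return drive_count * drive_gb // 2
    if raid_level == 5:        # one drive's worth reserved for parity
        return (drive_count - 1) * drive_gb
    raise ValueError("unsupported RAID level")

# A 4-drive RAID 5 set of 250GB drives yields 3 drives of user capacity:
print(usable_capacity(5, 4, 250))   # 750
```

The same helper shows why RAID 1 is the most expensive per gigabyte: a 4-drive mirror of the same 250GB drives yields only 500GB of usable space.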
If you are using drives larger than 36GB, we recommend that you select no more than 8 drives per RAID set. Within a RAID set, all drives MUST be of the same drive type and capacity. These recommendations are based on balancing the time it takes to reconstruct a failed disk against the performance characteristics of a RAID group. Generally, smaller RAID sets rebuild faster than larger ones.

Disk Drive Reliability and Failure Probabilities

Up to the limit of the available I/O bandwidth, the larger the RAID stripe, the more performance you will get. However, if you were to build your storage solution by combining, for example, all 16 drives (4 in the head unit and 12 in the expansion JBOD) into a single RAID set, there would now be, from a failure probability perspective, a total of 16 chances for a drive failure. Generally, the more disk drives you add to a RAID set, the greater the chances of having multiple drive failures within that set. A single drive failure in a RAID 5 configuration, while not catastrophic, causes degraded performance while the missing data segment is recreated from parity. While the RAID set is in degraded mode, including during the rebuild to a replacement drive, should another disk drive fail, your data would be lost. Careful consideration should be given to the size of the RAID set and the time to rebuild the replacement disk, so the RAID 5 set can return to optimal status in minimum time. Rebuild time is directly related to the capacity of the disk drive and the amount of activity on the system at the time of rebuild, and somewhat related to the number of disks in that particular RAID set. The larger the disks, the more time it will take to rebuild the replacement RAID member, which also increases the risk of an additional drive failure during that rebuild time.
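To see why larger RAID sets raise the odds of losing data during a rebuild, here is a rough back-of-the-envelope model in Python. It is purely illustrative: it assumes independent drive failures and an invented annual failure rate, while real-world failures are often correlated, so treat the result as an optimistic lower bound.

```python
def p_second_failure(surviving_drives, rebuild_hours, annual_failure_rate=0.03):
    """Probability that at least one surviving drive in a degraded RAID 5
    set fails during the rebuild window, assuming independent failures
    spread uniformly over the year (an illustrative simplification)."""
    p_one = annual_failure_rate * rebuild_hours / (24 * 365)
    return 1 - (1 - p_one) ** surviving_drives

# Roughly doubling the set size roughly doubles the exposure:
print(p_second_failure(7, 2))    # 8-drive set, 2-hour rebuild window
print(p_second_failure(15, 2))   # 16-drive set, same window
```

The absolute numbers are tiny on an idle system, but the exposure scales with both the number of surviving drives and the length of the rebuild window, which is the paper's argument for small RAID sets and hot spares.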
For these reasons, we recommend creating RAID 5 arrays using a maximum of 8 drives, which provides a good balance of RAID resiliency, performance, and rebuild times. GuardianOS does support RAID sets of up to 24 drives; however, we do not recommend creating RAID 5 arrays with more than 8 drives.

Drive Spares and Hot Sparing

You can further improve the mean time to rebuild a RAID 5 array by assigning a hot spare to the configuration. When a hot spare is defined, should a single-drive failure occur, the spare drive is immediately used to rebuild the failed RAID member. This requires no operator intervention and improves MTTR (mean time to recovery), allowing for the immediate, unattended repair of any failed drive in an array. Drives can be defined either as local spares, which are assigned to a specific RAID set, or as global spares, which can repair any RAID set that has suffered a drive failure, provided the spare drive meets the requirements of the RAID set that has lost a member. Regardless of which sparing scheme is used, the repair process is automated, so the RAID set is rebuilt as soon as possible. This minimizes exposure to multi-drive failures, which can be devastating in a RAID 5 environment. Adding a hot spare to the configuration costs you the capacity of a single disk, but it allows for automatic rebuilding of the RAID set to an optimal state and affords the narrowest rebuild window. The expansion chassis can support both SAS and SATA drives, which come in varying capacities and rotational speeds (RPM). In general, SATA drives should not be
spares for SAS drives, due to reliability, capacity, and RPM differences. It is a good practice to have spares for each drive model and capacity installed in a system. If local hot spares are assigned per expansion chassis, they need to be of like model and capacity to the other drives in the chassis. For disk drive spares, a good rule of thumb used in the storage industry is to keep two spare drives per year of expected system life per 100 drives of each drive model and capacity.

Instant Capacity Expansion (I.C.E.)

You may prefer to add storage capacity as it's needed, instead of all at one time. To accommodate this, the Snap Server I.C.E. feature allows existing volumes to easily grow into the free space of SANbloc S50 expansion arrays added after the volume has been defined. Administrators can attach a new expansion array to an existing Snap Server 520, 550, 4500, or 18000, define and build the desired physical RAID sets, and then dynamically assign the new RAID set to the existing volumes that need the extra capacity. The GuardianOS architecture makes this provisioning simple and instantaneous, offering an easy way to add capacity to existing volumes by spanning them across multiple expansion units that are added over time. It is important to understand that the data protection will only be as effective as the underlying RAID assigned for that storage. For example, it would not be appropriate to use I.C.E. to add RAID 0 storage to an existing RAID 5 volume, since part of the volume would then reside on an unprotected RAID 0 stripe. These issues can be avoided with some basic planning before the configuration addition or change, so that the scalability and data integrity of the Snap Server are maintained. The assignment of a newly created RAID set to an existing RAID set is accomplished through what we call RAID grouping. RAID groups are created by concatenating the newly created RAID set with an existing RAID set.
Any existing volumes associated with the original RAID set then become associated with the RAID group, which is now the collection of two or more RAID sets. Only two RAID sets can be grouped at a time, but you can create groups of groups, which allows you to expand an existing volume to up to eight times the original volume size.

RAID 5 Rebuild Times

As one data point, an 8-drive RAID 5 array of 300GB SAS drives rebuilds on an idle system at an average rate of approximately 2.6GB/minute. Drive sizes, rotational speed, the number of drives in the RAID set, and I/O activity all affect RAID rebuild times. Using this example, it would take at least 2 hours to complete a drive rebuild of this RAID set on an idle system. Times will increase if the rebuild is done during a normal workload. During that time, any additional drive failure will cause catastrophic data loss. As previously mentioned, RAID 5 by definition cannot sustain two drive failures, and the more drives in the RAID set, the higher the probability of another drive failing in that same RAID set. Keeping RAID sets as small as practical lessens the probability of a dual-drive failure during the critical rebuild window. It is always a good practice to ensure you have a backup strategy for your Snap Servers. You have several choices:

1. You can implement local backups using the embedded BakBone NetVault backup software.
2. You can implement enterprise backup using our embedded NDMP capability.
3. You can implement site-to-site backups using our licensed Snap Enterprise Data Replicator remote replication suite.

Sample Configurations

Snap Servers, by default, ship with a four-drive RAID 5 without a configured hot spare. If you are planning to add a SANbloc S50 JBOD expansion chassis to your configuration, we recommend you define the RAID on the S50 with a global hot spare. This way, the hot spare will be available to any expansion chassis in the configuration.
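The rebuild-time data point from the RAID 5 Rebuild Times section above (a 300GB drive rebuilt at roughly 2.6GB/minute on an idle system) can be checked with a quick sketch. This is illustrative Python assuming a constant rebuild rate; real rebuilds slow further under workload, as the text notes.

```python
def rebuild_minutes(drive_gb, rate_gb_per_min=2.6):
    """Approximate idle-system rebuild time for one failed RAID 5 member,
    at a constant rebuild rate (an idealized assumption)."""
    return drive_gb / rate_gb_per_min

minutes = rebuild_minutes(300)
print(f"{minutes:.0f} minutes (~{minutes / 60:.1f} hours)")
```

At 2.6GB/minute, a 300GB member takes a bit under two hours on an idle system, consistent with the "at least 2 hours" figure once any real-world overhead is added.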
Here are several example configurations that illustrate the configuration flexibility of the Snap Server 500 Series (Snap Server 520 and 550) for organizing your RAID sets.

RAID 5

RAID 5 is one of the higher-availability configurations. With all of the options for configuring a Snap Server 500 Series system, especially with the use of hot spares, which we strongly recommend, it is possible to create a higher-availability solution. The diagrams below show ways to design an implementation to maximize the usable capacity or to tailor the system to be even more resilient to disk drive failures.

Physical Layout of Disks and RAID Sets

The Snap Server can support a number of JBOD expansion chassis, depending on the model. As an example, the 520 can support up to 4 expansion chassis; the 550 can support up to 7 expansion chassis per SAS HBA. Each expansion chassis can hold up to 12 disk drives. When configuring RAID sets and RAID groups, it is always a good practice to physically group the drives in each RAID set either vertically or horizontally, so that each group is easily identified when looking at the front of the chassis. Standardizing on one of the layouts below for each RAID group, as well as having a standard placement for local/global hot spares, will help with ongoing support. Figure 1 shows the recommended drive layouts, depending on your level of resiliency and drive failure tolerance:
[Figure 1. Recommended RAID layouts for a Snap Server head (single 4-drive RAID 5 set) and an expansion chassis; assumes all drives in each group are of the same model and capacity. Expansion chassis options: three 4-drive RAID 5 sets (good performance, most tolerant of multi-drive failures); two 6-drive RAID 5 sets (good performance, less tolerant of multi-drive failures); or one 8-drive plus one 4-drive RAID 5 set (best overall performance, least tolerant of multi-drive failures).]

Recommended RAID 5 Configurations

[Figure 2. Snap Server head with a single 4-drive RAID 5 set; expansion chassis with one 7-drive RAID 5 set, one 4-drive RAID 5 set, and one global hot spare (GHS). Best overall performance.]

Figure 2 represents our recommended configuration for a Snap Server with a single expansion chassis. The Snap Server head ships configured as a 4-drive RAID 5. This sample configuration assumes the expansion JBOD chassis was subsequently added. It is recommended that the SANbloc S50 be configured with one 7-drive RAID 5, one 4-drive RAID 5, and one global hot spare. All RAID sets share a commonly available global hot spare (G.H.S.) drive in the expansion chassis; for this global hot spare to cover all drives in both the head and the expansion chassis, the drives would have to be of the same model and capacity. If a drive failed in any of the RAID 5 sets, the global hot spare would be brought online automatically to recover the missing RAID member. If no hot spare is configured, user intervention will be required to replace the failed drive and start the RAID recovery rebuild onto a replacement drive.
If the Snap Server 520 or 550 head unit was previously configured as a 4-drive RAID 5, the expansion chassis could be added later, and all of the RAID 5 arrays could be grouped together and their total capacity added to the pre-existing volumes.

[Figure 3. Snap Server head with one 3-drive RAID 5 set and one global hot spare (GHS); expansion chassis with three 4-drive RAID 5 sets. Best resiliency.]

Figure 3 represents an alternative configuration. Here, the Snap Server 500 Series head unit is configured as a 3-drive RAID 5 plus a global hot spare, and the SANbloc S50 is configured with three 4-drive RAID 5 RAID sets. Both the head and the SANbloc S50 share the commonly available global hot spare (G.H.S.) drive in the Snap Server head. For a global hot spare, all drives in both the head and the expansion chassis would have to be of the same drive model and capacity. As in the configuration in Figure 2, if a drive failed in any of the RAID sets, the global hot spare would be brought online automatically to reconstruct the missing RAID member's data and parity information onto the replacement drive. If no hot spare is configured, user intervention will be required to replace the failed drive and start the RAID recovery rebuild. Usable capacity for this configuration depends on the drive type and capacity used. Preferably, the global hot spare should be configured in the head unit, but this requires the administrator to reconfigure the default RAID 5 RAID set that ships with the head unit. If the head unit and expansion chassis are to be added together, it is preferable to reconfigure the head for a hot spare and configure the expansion chassis RAID sets all at installation time.
General considerations for using RAID 5:

Keep the RAID sets in the head separate from those in the expansion chassis, as illustrated in Figures 1 and 2. This provides the flexibility to swap out or upgrade the head unit later without affecting the underlying RAID sets on the expansion chassis.

The more drives you add to a RAID set, up to the maximum recommended limit of 8 drives per set, the greater the statistical probability of having more than one drive failure in the same RAID set. Since a two-drive failure in the same RAID set at the same time is catastrophic to RAID 5, it is best to limit the number of drives in a RAID set: ideally 4 drives, and at most 8 drives, per RAID set. If you choose to make your RAID sets bigger, be aware of the risk, and be sure to keep regular backups.

RAID 5 rebuild times are a function of the drive capacity, rotational speed (RPM), and system activity. The larger the drive capacities, the longer the RAID rebuild time.

RAID 1

RAID 1, shown in Figure 4, has typically been used for mission-critical applications. Requiring the highest number of redundant disks, these are the most expensive cost-per-gigabyte solutions. At its simplest, RAID 1 requires each user data drive to have at least one other drive mirroring that same data, so storing 1TB of data requires 2TB of disk space. The most critical data may require more than one mirror disk for each data disk, a technique called N-way mirroring. Supported by Snap Servers, this allows two or more drives to mirror the original data drive. It also makes the cost of storage extraordinarily high and dramatically reduces the amount of usable storage. There are many RAID 1 permutations; Figure 4 shows a good general guideline. In this configuration, using two expansion chassis, if an entire chassis fails, the data will still be available from the remaining primary or secondary mirrored member. With the I.C.E.
feature, RAID sets can be grouped together into a single volume of any size, up to the limits of our OS.

[Figure 4. RAID 1: 14 4-drive RAID 1 redundant, mirrored RAID sets, using 300GB SAS or 500GB SATA drives. Provides optimal performance and fault tolerance; drives must be the same type and capacity within each RAID 1 RAID set.]

RAID 0

RAID 0 is the highest-performance RAID solution but offers no fault tolerance, and it is primarily for users who are not concerned about loss of data. There are situations where non-critical or temporary data storage may be appropriate for a RAID 0 configuration, and in those environments this configuration is a very high-performance, maximum-capacity solution. Since no parity information or redundant disks are used, all disks defined into the RAID 0 set are available for storage. There is also no processing overhead in generating or checking parity, so throughput is maximized. The maximum data would be lost if the RAID 0 is set up as a single 16-drive stripe (Figure 5, Option 1). To minimize data loss, it would be simple to define a total of four 4-drive RAID 0 stripes, so that the maximum data loss for a given drive failure would be reduced to 1TB (Figure 5, Option 2). If you need to expand an existing volume, use I.C.E. to group the new RAID 0 RAID set with the already existing RAID set. Lastly, if you need to make the RAID 0 stripe larger, the maximum number of drives supported in any RAID set is 24.

[Figure 5. Snap Server head with SANbloc S50 JBOD expansion, using 300GB SAS or 500GB SATA drives. Option 1: one contiguous 16-drive RAID 0 RAID set; assumes all drives in both the head and expansion chassis are of the same model and capacity. Option 2: four 4-drive RAID 0 RAID sets; drives in the head and expansion chassis do not have to be the same capacity.]
RAID 0 provides optimal performance but NO fault tolerance. As in Figure 5, Option 2, constraining each RAID 0 array to 4 drives limits data loss, in the event of a catastrophic drive failure, to that single RAID set. Four drives per RAID 0 array is a good performance compromise; using 8 drives provides optimal performance.
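The data-loss trade-off between the two RAID 0 options can be quantified with a small sketch (illustrative Python; assumes 250GB drives so that a 4-drive stripe holds 1TB, consistent with the text):

```python
def raid0_loss_exposure(total_drives, drives_per_set, drive_gb):
    """GB of data lost on any single drive failure when total_drives are
    split into independent RAID 0 sets of drives_per_set drives each:
    a failure destroys only the set containing the failed drive."""
    assert total_drives % drives_per_set == 0, "sets must divide evenly"
    return drives_per_set * drive_gb

# 16 drives of 250GB each:
print(raid0_loss_exposure(16, 16, 250))  # one 16-drive stripe
print(raid0_loss_exposure(16, 4, 250))   # four 4-drive stripes
```

A single 16-drive stripe puts all 4TB at risk from one failure, while four independent 4-drive stripes cap the exposure at 1TB per failure, at some cost in peak sequential throughput.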
RAID Planning Summary Checklist

Here is a summary list of dos and don'ts to ensure you have the most appropriate RAID configuration for storing your data.

RAID 5

Do use RAID 5 for your higher-availability needs with minimal redundant storage.
Do remember that each RAID 5 RAID set utilizes the equivalent of one disk for parity, which means one drive less of usable capacity on that RAID set. A 4-drive RAID 5 RAID set = 3 drives of capacity for user data.
Do limit the number of drives in your RAID sets to no more than 8 drives.
Do define at least one hot spare. For a local hot spare, assign one drive per RAID set of the same drive type and size as that set. Global hot spares MUST be of the same drive type and size for the entire configuration.
Do define a regular backup strategy for backing up your data. Snap Servers provide three methods for data protection: local backup, enterprise backup, and remote data replication.
Do put the global hot spare in the head unit if installing the head unit and expansion chassis at the same time at initial install.
Don't define RAID sets greater than 8 drives.
Don't group RAID sets of different drive types (SAS/SATA) or capacities.
Don't forget to define a hot spare.
Don't span head and expansion units for members of the same RAID set.

RAID 1

Do consider using RAID 1 for your highest-availability needs.
Do remember that each RAID 1 RAID set requires a minimum of two drives but only provides one drive of usable capacity for user data; the other drive is a mirror of the first.
Do remember that hot spares can also be defined for RAID 1 RAID sets. The same rules apply as in RAID 5.
Do define a regular backup strategy for backing up your data. Snap Servers provide three methods for data protection: local backup, enterprise backup, and remote data replication.
Don't group RAID sets of different drive types (SAS/SATA).

RAID 0

Do consider using RAID 0 only if you are using your Snap Server for transitory data, meaning data that is not stored on the Snap Server for long periods of time and is also stored elsewhere and easily recoverable in the event of a drive failure. If a single member of a RAID 0 RAID set is lost, ALL data on that RAID set will be lost.
Do consider increasing the number of disk drives, up to the maximum limit of 24 drives per RAID set, if peak performance is needed.
Do define a regular backup strategy for backing up your data. Snap Servers provide three methods for data protection: local backup, enterprise backup, and remote data replication.
Don't use RAID 0 for storing your critical data.
Don't group RAID sets of different drive types (SAS/SATA).
Taking the Snap Server 550 to the Max

To this point, we have shown configurations that typically involve a single Snap Server model 520 or 550 head and a single expansion chassis. One of the key benefits of the Snap Server 500 architecture is its ability to scale to support very large storage requirements. Here is an example that demonstrates the type of maximum expansion that is possible, with the best balance between chassis symmetry, usable capacity, and drive failure resiliency. The configuration in Figure 6 shows a large RAID 5 configuration using the Snap Server 550. The head is configured with a 2-drive RAID 1 mirror and two centralized G.H.S. devices. Each expansion array is configured as two identical 6-drive RAID 5 arrays used for user data. Each subsequent array added to the configuration can be grouped with existing RAID sets to increase the overall capacity of any or all existing volumes. Usable capacity is slightly less in this configuration, because the equivalent of two RAID 5 parity drives is consumed in each chassis, leaving only ten drives available for user data. This configuration also provides symmetry within a single chassis boundary, has good performance, and has better resiliency because fewer drives are used per RAID set. The configuration in Figure 7 shows another alternative large RAID 5 configuration. The Snap Server 550 head is configured with a 2-drive RAID 1 mirror and two centralized G.H.S. devices, as in Figure 6. In this configuration, SANbloc S50 JBODs are added in pairs, with each pair encompassing three RAID 5 RAID sets of 8 drives each used for user data. Each subsequent pair of arrays added to the configuration can be grouped with existing RAID sets to increase the overall capacity of any or all existing volumes.
Usable capacity is slightly higher in this configuration, as more data drives are available because fewer parity drives are needed overall. This configuration provides optimal performance and capacity for RAID 5, using the maximum recommended number of drives per RAID set for resiliency.

[Figure 6. Snap Server 550 head (slots 1 and 2 as a 2-drive RAID 1 mirror; slots 3 and 4 as global hot spares) with up to seven expansion chassis, each configured as two 6-drive RAID 5 sets. Best chassis symmetry and drive failure resiliency.]

[Figure 7. Snap Server 550 head (slots 1 and 2 as a 2-drive RAID 1 mirror; slots 3 and 4 as global hot spares) with expansion chassis added in pairs, each pair configured as three 8-drive RAID 5 sets. Best overall performance and usable capacity.]
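The usable-capacity difference between the Figure 6 and Figure 7 layouts can be verified with a quick sketch (illustrative Python, comparing one pair of fully populated 12-drive expansion chassis under each scheme):

```python
def data_drives(raid5_sets):
    """Data (non-parity) drives across RAID 5 sets: each n-drive set
    contributes n - 1 drives of user capacity."""
    return sum(n - 1 for n in raid5_sets)

# Per pair of 12-drive expansion chassis (24 drives total):
print(data_drives([6, 6, 6, 6]))  # Figure 6 layout: four 6-drive sets
print(data_drives([8, 8, 8]))     # Figure 7 layout: three 8-drive sets
```

Per 24 drives, the Figure 6 layout spends four drives on parity (20 data drives) while the Figure 7 layout spends three (21 data drives), which is exactly the modest capacity edge the text attributes to the larger RAID sets.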
Adaptec, Inc.
691 South Milpitas Boulevard, Milpitas, California 95035
Tel: (408) 945-8600  Fax: (408) 262-2533
Literature Requests: US and Canada: 1 (800) 442-7274 or (408) 957-7274
World Wide Web: http://www.adaptec.com
Pre-Sales Support: US and Canada: 1 (800) 442-7274 or (408) 957-7274
Pre-Sales Support: Europe: Tel: (44) 1276-854528 or Fax: (44) 1276-854505

Copyright 2006 Adaptec, Inc. All rights reserved. Adaptec, the Adaptec logo, Snap Appliance, the Snap Appliance logo, Snap Server, Snap Disk, GuardianOS, SnapOS, and Storage Manager are trademarks of Adaptec, Inc., which may be registered in some jurisdictions. Microsoft and Windows are registered trademarks of Microsoft Corporation, used under license. All other trademarks used are owned by their respective owners. Information supplied by Adaptec, Inc., is believed to be accurate and reliable at the time of printing, but Adaptec, Inc., assumes no responsibility for any errors that may appear in this document. Adaptec, Inc., reserves the right, without notice, to make changes in product design or specifications. Information is subject to change without notice.

P/N: 666928-011  Printed in U.S.A. 03/06  4340_1.1