
The IntelliMagic White Paper: Green Storage: Reduce Power not Performance

December 2010

Summary: This white paper provides techniques to configure the disk drives in your storage system such that they use the least amount of power while still providing good performance. Minimizing power usage is not so much about finding disk drives with a lower power usage, but rather about selecting a disk drive configuration that closely matches your workload needs.

This white paper was prepared by:

IntelliMagic BV
Lokhorststraat 16
2311 TA Leiden, The Netherlands
Phone: +31 71 579 6000

IntelliMagic Inc
558 Silicon Dr. Ste 101
Southlake, TX, USA
Phone: +1 214 432 7920

Email: info@intellimagic.net
Web: www.intellimagic.net

Disclaimer

This white paper discusses disk subsystem performance and capabilities in general terms, to the best of our knowledge. Any decisions based on this paper and its recommendations remain the responsibility of the reader. IntelliMagic products analyze measurement data and provide estimates for workload parameters based on this information. However, IntelliMagic does not guarantee the correctness of these numbers, and therefore any sizing based on the results remains the responsibility of the user.

Support

Please direct requests for information to info@intellimagic.net.

Trademarks

All trademarks and registered trademarks are the property of their respective owners.

2010 IntelliMagic BV

Table of Contents

Preface
Drive Type Power Requirements
Capacity or Throughput?
  Cost of Capacity Requirements
  Cost of Throughput Requirements
Watts per GB or Watts per IO/sec
Flash Drives
Access Density
  Front-end Access Density
  Back-end Access Density
  Relation between Front-end and Back-end Access Density
  Disk Response Time by Access Density
Picking a RAID Scheme
  Read Miss Throughputs
  Random and Sequential Writes
  Mixed Workloads
Over-configuring
  Example
  Migration Options
Conclusion

Preface

This white paper provides techniques to configure the disk drives in your storage system such that they use the least amount of power while still providing good performance. Minimizing power usage is not so much about finding disk drives with a lower power usage, but rather about selecting a disk drive configuration that closely matches your workload needs.

The key objective in selecting disk drives is to find a configuration that meets your performance needs at the lowest drive count, i.e. with the highest drive capacity possible without the drives getting too busy. It is important to realize that this is not only a matter of selecting a drive type: the selection of the most suitable RAID type is equally important, as we will discuss in detail.

Leiden, February 2009
Dr. Gilbert Houtekamer
Els Das

Second Edition, July 2010

In July of 2010 IntelliMagic renamed the end-user version of Disk Magic to IntelliMagic Direction, to distinguish it from the IBM version. Additionally, RMF Magic was renamed to IntelliMagic Vision, to better reflect its expanded scope, which now includes support for open systems storage environments. These changes were the main reasons for this second edition.

Southlake, TX, USA, July 2010
Brett Allison

Drive Type Power Requirements

The topic of this white paper is how to look at power usage and cost of operation for disk drives within high-end disk subsystems or disk arrays. Why would we only look at the drives themselves, and not at the other disk subsystem components that use power? Within a disk subsystem, the drives themselves take up a large portion of the total power consumption: for a full box, typically two-thirds of the power consumption is used by the drives alone. Moreover, the drives are a place where you can make choices that influence the power consumption a great deal; for many of the other components you have no configuration options.

We will start by showing the base power usage numbers for the common drive types that large companies use in their disk subsystems. Figure 1 shows the approximate power requirements per drive for the three product families most relevant to high-end disk subsystems: SATA 7.2K RPM, Fibre 10K RPM and Fibre 15K RPM. Note that the power usage per drive for the least energy-efficient drive is only 1.7 times the power consumption of the most energy-efficient drive. Thus the number of drives is the first rough approximation if you want to assess the total power consumption of a configuration.

[Figure 1: Power Usage per Drive Type. Watts per drive as a function of drive capacity (GB), for SATA 7.2K RPM, Fibre 10K RPM and Fibre 15K RPM drives.]

As you can imagine, a higher RPM means higher power requirements, because the disk spins faster. Larger drives also use more power than smaller drives of the same technology, because the platters and the read/write head assembly constitute more mass that must be moved.

Note that the power usage of a drive also depends on its activity: more I/Os per second mean more head movements and therefore more power usage. This relationship is almost linear and roughly constant across all disk types, at about 0.03 Watt per I/O per second. The I/O-related power consumption therefore has little influence on the choices that you can make in the disk configuration, and we will not discuss it further in this white paper.

If you looked solely at the energy consumption per disk drive ("arm"), the most economical drive to run would be the 73 GB 10,000 RPM Fibre drive in the chart above. However, to assess which drive is the most economical to run in your storage systems, the power usage per disk arm is not the number to use. In the next sections we will show different ways to look at it.

Capacity or Throughput?

When reviewing the energy usage for disk drives, it is important to realize that there are two very different but equally important requirements that determine the number of disk drives. One requirement is the amount of net GBs used by the applications; the other is the throughput in I/Os per second required by the applications. If you configure your drives based solely on the net capacity needed, you can express the power consumption of the different options in Watts per GB. However, configurations with a higher activity level need to take minimum throughput requirements into account when selecting a disk configuration. If you need to configure the drives based on the required throughput rather than on GBs alone, you can express the energy cost of your options in Watts per I/O per second.

Cost of Capacity Requirements

As we stated in the first section, the drive count (number of arms) is a good first-order approximation when you compare the power usage of different potential configurations. The energy consumption to provide the application with a certain amount of net GBs obviously becomes lower if you use higher-capacity drives, because you need fewer drives. Additionally, the energy cost of a lower-RPM drive is lower than that of an equal-capacity higher-RPM drive. So if you look at the energy cost to configure a certain net capacity, you can be sure that the larger the disks and the lower the RPM, the better. In fact, the lowest Watts per GB number is achieved with non-moving magnetic storage: disks that are switched off, and tape cartridges. For enterprise disk subsystem usage, SATA drives provide the highest capacity and lowest RPM and therefore the lowest energy cost for the configured capacity. However, sizing your disk configuration on capacity requirements alone may not give you the performance and throughput that you need.

Cost of Throughput Requirements

Because of the slower mechanical components, the slower rotation speed, and the simpler ATA controller, the peak I/O capability of one SATA disk is much lower than that of one Fibre disk. This means that you may need to buy more SATA drives than the net capacity requires if your applications have high throughput requirements. If you need to configure many extra arms to satisfy these throughput requirements, SATA drives do not provide a very low-cost alternative. In fact, for a given throughput you need fewer Fibre drives than SATA drives, so for most database or otherwise active workloads, smaller and faster Fibre drives will provide a more energy-efficient solution than SATA drives.
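As an illustration of this capacity-versus-throughput trade-off, the small Python sketch below sizes a configuration as the larger of the capacity-driven and the throughput-driven drive count. It is only a sketch: the drive figures are illustrative values taken from Table 1 in the next section, RAID overhead is ignored, and the function names are ours.

# Sketch: size a configuration by both capacity and throughput, then
# compare drive power. Values are illustrative (cf. Table 1 below); the
# ~0.03 W per I/O activity term is ignored since it is nearly the same
# for all drive types.
import math

DRIVES = {
    # name: (capacity GB, max random I/O per arm at 50% busy, Watts per arm)
    "SATA 7.2K 1000GB": (1000, 35, 13.6),
    "Fibre 15K 300GB":  (300,  82, 19.0),
    "Fibre 15K 73GB":   (73,   82, 14.0),
}

def size_config(drive, net_gb, io_per_sec):
    """Drives needed: the larger of the capacity- and throughput-driven counts."""
    cap_gb, max_io, watts = DRIVES[drive]
    n = max(math.ceil(net_gb / cap_gb), math.ceil(io_per_sec / max_io))
    return n, n * watts  # (drive count, total drive Watts)

# 20 TB net at low activity: SATA needs by far the fewest Watts...
# ...but at 20,000 I/O/s the small, fast Fibre drives win.
for io_rate in (700, 20000):
    for d in DRIVES:
        print(io_rate, d, size_config(d, 20000, io_rate))

Note that the sketch also ignores the reduced duty cycle recommended for SATA drives, which is discussed further below.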

So for high-throughput workloads, the GBs required are not the limiting factor in choosing the disk configuration, and the energy cost of the different configurations should be expressed in Watts per I/O per second instead.

Watts per GB or Watts per IO/sec

Table 1 below shows the metrics discussed, for both SATA and Fibre Channel drives. In this table we set the maximum I/O rate per arm to the I/O rate corresponding to 50% HDD busy. As you can see, the lowest energy consumption to satisfy capacity requirements is achieved with SATA (the top three rows), and the lowest energy consumption to satisfy throughput requirements is achieved with Fibre (the remaining rows). A 73 GB 15K RPM drive is the fastest drive available and therefore provides the highest number of I/Os per second per disk arm, which translates to the lowest energy consumption for configurations with very high throughput requirements.

Table 1: Energy per GB compared to Energy per IO

                                                           Capacity limited:    Throughput limited:
  Technology                 Capacity  I/O rate   Watt/    Watt/GB  Ratio       Watt/IO/s  Ratio
                             (GB)      per arm *  arm               (vs best)              (vs best)
  SATA (Ultrastar A7K1000)   1000      35         13.6     0.014    1           0.389      2.3
                             750       35         12.8     0.017    1.3         0.366      2.1
                             500       35         11.9     0.024    1.8         0.340      2.0
  Fibre 15K (4 Gb/s)         300       82         19       0.063    2.7         0.232      1.4
                             147       82         15.7     0.107    4.5         0.191      1.1
                             73        82         14       0.192    8.1         0.171      1
  Fibre 10K (4 Gb/s)         300       62         15.5     0.052    2.2         0.250      1.5
                             147       62         12.4     0.084    3.5         0.200      1.2
                             73        62         11.3     0.155    6.5         0.182      1.1

  * Random disk I/O rate corresponding to 50% busy
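The two metrics in Table 1 follow directly from the per-drive figures, as the short sketch below shows: Watt/GB is Watts per arm divided by capacity, and Watt/IO/s is Watts per arm divided by the 50%-busy I/O rate. A three-drive subset is used for brevity.

# Sketch: derive the Table 1 metrics from the per-drive figures.
DRIVES = [
    # (name, capacity in GB, random I/O per arm at 50% busy, Watts per arm)
    ("SATA 1000 GB",     1000, 35, 13.6),
    ("Fibre 15K 73 GB",    73, 82, 14.0),
    ("Fibre 10K 147 GB",  147, 62, 12.4),
]

for name, gb, io, watts in DRIVES:
    print(f"{name}: {watts / gb:.3f} W/GB, {watts / io:.3f} W per I/O/s")

# SATA 1000 GB:    0.014 W/GB (lowest of the three), 0.389 W per I/O/s
# Fibre 15K 73 GB: 0.192 W/GB, 0.171 W per I/O/s (lowest of the three)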

An additional type of over-provisioning may be needed for SATA and FATA drives. Because of their lower-cost components, they are not recommended for the high sustained utilizations at which you can run Fibre drives. The vendors use terms like "lower duty cycle" or "reduced duty cycle" to describe the storage environments in which SATA and FATA drives should be used. This translates to 20% - 30% HDD busy, instead of the 50% - 80% HDD busy that you could use for Fibre drives. This means that you would need to configure even more SATA/FATA drives to get the necessary throughput without loading the HDDs beyond what the vendor recommends.

Flash Drives

It should be noted that when even higher throughputs are required, flash drives become a viable alternative. Not many workloads, however, need that level of performance and throughput. Still, by moving some of your most active workload to flash drives, you may be able to put the remaining workload on high-capacity Fibre or FATA drives. With the rapidly dropping costs of flash drives, it is only a matter of time before they become commonly used in large storage environments, as well as in your laptop.

Access Density

Because it would be a complex, labor-intensive effort to review the workload for each individual logical or physical drive, a common concept used to describe the intensity of the I/O workload is the access density, defined as the average number of I/O operations per second per GB of data stored. When configuring disk subsystems, it is very important to distinguish between front-end and back-end access density.

Front-end Access Density

The front-end access density is determined by the number of host I/O operations to a disk subsystem or logical volume. Example access density at the front-end: 20 x 54 GB logical volumes handling 500 I/Os per second yield 0.46 I/Os per second per net GB. This is the access density that we normally talk about and that is relevant to the performance of the disk subsystem as a whole. However, when you want to look at the performance and throughput of the disk drives specifically, you need to use the back-end access density as defined below.

Back-end Access Density

Back-end access density is the number of operations to the hard disk drives per GB of hard disk capacity. This is the number that is crucial to HDD utilization, throughput and performance. Example access density at the back-end:

An array group of 8 x 146 GB drives (= 1168 GB) handling 250 HDD accesses per second yields 0.21 accesses per second per physical GB of hard disk.

What we need to know to compute the back-end access density for a particular workload is how many of the front-end I/Os result in disk accesses. Read hit I/Os do not result in a back-end access at all. Random read misses result in synchronous disk accesses, and sequential reads result in asynchronous disk accesses because of the prestaging. Destaged writes result in asynchronous disk accesses, typically multiple per write operation, depending on the RAID scheme used. The back-end I/O rate is not available from measurement tools directly; it must be computed or estimated from the front-end I/O rates and cache statistics, as well as from the RAID scheme used. Applications like IntelliMagic Direction and IntelliMagic Vision do this for you; a simplified sketch of the computation follows at the end of this section.

Relation between Front-end and Back-end Access Density

To illustrate the potential differences between front-end and back-end I/O rates, let us look at the charts in Figure 2. These charts show the access densities for five disk subsystems (DMX #1 through DMX #5) over the course of a day. In the left-hand chart you can clearly see a break-up of the five disk subsystems into two categories: two are relatively inactive on the front-end; the other three are much more active. The right-hand chart shows the back-end HDD reads and writes. Note that this second chart shows a very different picture: the subsystems are more or less equally loaded in terms of back-end activity.

[Figure 2: Front-end and Back-end Access Densities for the same workload and hardware. Left: front-end access densities for five disk subsystems (DMX #1 - DMX #5) over 24 hours; right: back-end access densities for the same disk subsystems. Data and charts from IntelliMagic Vision.]

It is this back-end access density where the selection of the drive type really matters, both in terms of speed and in terms of size.
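The sketch below illustrates this computation under assumed cache statistics. The parameter names are ours, sequential prestaging is ignored for brevity, and the per-write multipliers (two HDD writes per random write for RAID-10, four HDD operations for RAID-5) anticipate the RAID discussion later in this paper.

# Sketch: estimate the back-end HDD access rate from front-end I/O rates
# and cache statistics, as described above. Parameter names illustrative;
# sequential prestaging omitted for brevity.

def backend_rate(read_ios, read_hit_pct, write_ios, destage_pct, raid="RAID-5"):
    read_misses = read_ios * (1 - read_hit_pct / 100.0)   # hits cost no HDD access
    destaged_writes = write_ios * destage_pct / 100.0     # only destages hit the HDDs
    per_write = {"RAID-10": 2, "RAID-5": 4}[raid]         # random-write multiplier
    return read_misses + destaged_writes * per_write

def access_density(io_rate, gigabytes):
    """I/Os per second per GB; works for front-end and back-end alike."""
    return io_rate / gigabytes

# Front-end example from the text: 20 x 54 GB volumes at 500 I/O/s -> 0.46
print(round(access_density(500, 20 * 54), 2))
# Back-end example: 8 x 146 GB drives handling 250 HDD accesses -> 0.21
print(round(access_density(250, 8 * 146), 2))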

Disk Response Time by Access Density

When you combine the access densities with a response time curve, it becomes clear that the drive capacity makes a large difference in terms of performance. For example, when the back-end I/O density is 1 I/O per second per GB, a 73 GB drive has to handle 73 I/Os per second, whereas a 144 GB drive for the same workload must handle 144 I/Os per second. Since the HDD response time increases when there are more I/Os per disk arm, the response time for such a workload on a 144 GB drive will be much higher.

[Figure 3: Physical Disk Response Time as a function of Back-end Access Density. HDD response time (ms) versus back-end density (HDD I/O per GB) for 73 GB, 146 GB and 300 GB Fibre drives at 10K and 15K RPM.]

Looking at the chart in Figure 3, you can see that the 15K RPM drives all start at a low response time for low access densities, only around 6 ms. However, the response time for the 300 GB 15K RPM drive quickly exceeds the response time for the 73 GB 10K RPM drive when the access density increases.

You can use this chart to create recommendations for the maximum I/O density supported by each drive, given a certain minimum performance requirement of, say, 15 ms per back-end access. As you can see from Table 2, the maximum recommended access densities are vastly different for the different types of Fibre drives. Note that using a universal 15 ms cut-off point may not be fair to the 10K drives because of their higher base service times; on the other hand, application users do not care what the drives are capable of in terms of base service time, they just care about the response times that they are experiencing.

Table 2: Access Density Recommendations

  Drive             Max. Back-end Access Density
  73 GB, 15K RPM    1.4
  73 GB, 10K RPM    0.8
  146 GB, 15K RPM   0.7
  146 GB, 10K RPM   0.4
  300 GB, 15K RPM   0.3
  300 GB, 10K RPM   0.2
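As a small illustration, the Table 2 recommendations can be turned into a simple selection rule: pick the largest, and therefore most power-efficient, drive whose recommended maximum density still covers the measured back-end density. This is a sketch using only the 15K RPM subset of Table 2; the function name is ours.

# Sketch: choose the largest drive whose recommended maximum back-end
# access density (Table 2, 15K RPM subset) is not exceeded.

RECOMMENDED_MAX = [            # ordered from largest to smallest capacity
    ("300 GB, 15K RPM", 0.3),
    ("146 GB, 15K RPM", 0.7),
    ("73 GB, 15K RPM", 1.4),
]

def pick_drive(backend_density):
    for drive, max_density in RECOMMENDED_MAX:
        if backend_density <= max_density:
            return drive
    return None  # no single drive type meets the 15 ms target; add arms or flash

print(pick_drive(0.21))  # the back-end example above: 300 GB 15K RPM is fine
print(pick_drive(1.0))   # a hot workload: only the 73 GB 15K drives qualify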

The back-end access densities for the various disk types, and the resulting HDD busy and front-end response times, can be easily modeled using a software tool like IntelliMagic Direction.

Picking a RAID Scheme

When comparing RAID-10 to RAID-5, it is immediately clear that a RAID-10 scheme for the same net capacity uses more power, as there are as many mirror disks as data disks, while a RAID-5 implementation typically uses one parity disk for every 7 data disks. So a RAID-5 scheme uses just over half the power of a RAID-10 scheme when computed on a per-GB basis.

Read Miss Throughputs

As we have seen, however, power per GB is not always the most important metric. Performance and throughput limitations are factors that need to be evaluated too. For a lightly loaded system, RAID-5 and RAID-10 will give the same response time, as the only host operation that requires a synchronous disk access is a read miss, and both RAID-5 and RAID-10 systems need to read the record. In a RAID-10 system you configure more drives to get the same net capacity, which automatically gives you more arms to spread the work over. For that reason a RAID-10 scheme will be able to process more read-miss I/Os per second per net GB than a RAID-5 scheme with the same net capacity. For high access densities, these extra arms may be just what you need to get the required throughput. However, this higher read-miss throughput could also be obtained by simply over-configuring the number of RAID-5 RAID groups such that you have as many drives as a RAID-10 setup would have. So how do you decide whether to use RAID-10 or RAID-5? For that we need to look at the writes, not just the reads: RAID-10 and RAID-5 behave very differently for write operations.

Random and Sequential Writes

Random write operations, as in a database workload, result in two disk writes for RAID-10 (primary and mirror copy), but in four disk operations for RAID-5. This higher number of RAID-5 operations is needed because, to compute the new parity, the old data and old parity need to be read first. After computing the parity, the data and parity are written, hence the four operations. Therefore, RAID-10 supports a higher throughput per RAID rank for random write operations: double the RAID-5 random write throughput.

For sequential workloads the RAID-5 scheme is more efficient, as all data and parity information in a RAID group is written in a single logical operation, without having to read the existing data and parity first. So the overhead of the RAID-5 scheme for sequential writes is one parity block written for every 7 blocks of data, an overhead of 1/7th. RAID-10 still requires two writes for every application write, whether random or sequential. Therefore, RAID-5 supports a higher throughput for sequential write operations per RAID group: almost double the RAID-10 sequential write throughput.
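A short sketch makes the write arithmetic concrete. It applies the rules above to a mixed write stream: two HDD writes per destaged write for RAID-10, regardless of access pattern, versus four HDD operations per random write but only 8/7 per sequential write for RAID-5 (one parity block per 7 data blocks). The function name and example rates are ours.

# Sketch: HDD operations generated by a mixed destaged-write stream,
# applying the RAID-10 and RAID-5 write rules described above.

def hdd_write_ops(write_rate, random_fraction, raid):
    random_w = write_rate * random_fraction
    seq_w = write_rate - random_w
    if raid == "RAID-10":
        return 2 * (random_w + seq_w)          # mirror: 2 writes, always
    if raid == "RAID-5":
        return 4 * random_w + (8.0 / 7.0) * seq_w  # read-modify-write vs full stripe
    raise ValueError(raid)

# A heavily random write stream favors RAID-10 (~3714 vs 2000 HDD ops)...
print(hdd_write_ops(1000, 0.9, "RAID-5"), hdd_write_ops(1000, 0.9, "RAID-10"))
# ...while a mostly sequential one favors RAID-5 (~1429 vs 2000 HDD ops).
print(hdd_write_ops(1000, 0.1, "RAID-5"), hdd_write_ops(1000, 0.1, "RAID-10"))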

Mixed Workloads

Real workloads are a varying mix of reads and writes, cache hits and misses, random and sequential I/O, so the number of back-end I/O operations to the disks can go either way: more for a RAID-5 implementation, more for a RAID-10 implementation, or almost the same for both. The figures below show the number of disk operations for a RAID-5 versus a RAID-10 implementation of a measured workload on two existing disk subsystems, over the course of a day.

For the disk subsystem in Figure 4, the red line for RAID-10 has higher peaks than the blue line that represents a RAID-5 implementation. This shows that for this disk subsystem, a RAID-10 implementation results in a higher back-end rate. Of course a RAID-10 implementation also has a higher minimum number of disks to get the required net capacity. So for this workload, a RAID-5 implementation will be more economical, both from the point of view of the capacity requirements and from the point of view of the throughput requirements.

[Figure 4: HDD accesses for disk subsystem 1 on RAID-10 vs. RAID-5. Back-end HDD rate over 24 hours for a RAID-10 and a RAID-5 implementation of the same workload.]

Figure 5 shows much higher peaks for the RAID-5 HDD rate. The random write content of this workload is so high that, at the highest peak, the HDD rate on RAID-5 is almost twice as high as for a RAID-10 scheme. This means that RAID-10 may be the more economical choice, if the access density is high enough that more disks are needed than for the capacity alone.

[Figure 5: HDD accesses for disk subsystem 2 on RAID-10 vs. RAID-5. Back-end HDD rate over 24 hours for a RAID-10 and a RAID-5 implementation of the same workload.]

Over-configuring

If we want to know which RAID type is more cost-efficient for our workload, we need to take into account how many disks are minimally needed to satisfy the capacity requirements, as well as how many drives are needed to support the peak back-end throughputs for RAID-5 and RAID-10. For some workloads with a large random write fraction and a large throughput, you would need to over-configure the number of RAID-5 groups so much that you actually need more drives than for a RAID-10 implementation. It is a very big effort to compute these types of scenarios by hand; a tool like IntelliMagic Direction will help you with this.

Example

Let us define a workload profile as all the ratios and percentages that are relevant to computing the back-end disk accesses: the read/write ratio, the read hit %, the destage % (or write efficiency), and the random/sequential write ratio. For each workload profile and for each RAID and disk type you can then draw a picture that shows the number of required RAID groups as a function of the number of front-end I/Os or, equivalently, of the front-end access density; a sketch of this computation follows below.

For very low access densities, the required number of RAID groups is simply the number of RAID groups needed to configure the net capacity of application data. In a picture that shows the needed number of RAID groups as a function of access density, this number shows up as a horizontal line, one that is higher for RAID-10 than for RAID-5. However, when the access density gets higher, more RAID groups need to be configured to make sure that the HDD busy does not go beyond what the drives support. At this point, where the throughput requirements take over from the capacity requirements, the horizontal line changes into a growing step function. Depending on the exact workload profile, the two curves for RAID-5 and RAID-10 cross at a different point or do not cross at all, showing that for some workload profiles RAID-5 uses fewer drives than RAID-10 regardless of the access density, whereas for other workloads high access densities make RAID-10 more economical.
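The following sketch implements this step-function logic for 8-disk RAID groups: the required number of groups is the larger of the capacity-driven count and the throughput-driven count. The workload-profile parameters, drive figures and helper names are illustrative assumptions, not measured values.

# Sketch: required 8-disk RAID groups as the larger of the capacity-driven
# and throughput-driven counts, per the step-function logic described above.
import math

def required_raid_groups(net_gb, frontend_io, profile, raid,
                         drive_gb=300, max_io_per_drive=82):
    # Net capacity per 8-disk group: 4 data disks for RAID-10, 7 for RAID-5 (7+1).
    data_disks = {"RAID-10": 4, "RAID-5": 7}[raid]
    cap_groups = math.ceil(net_gb / (data_disks * drive_gb))

    # Back-end HDD rate, using the read-miss and write rules described earlier.
    reads = frontend_io * profile["read_fraction"]
    misses = reads * (1 - profile["read_hit"])
    destages = (frontend_io - reads) * profile["destage"]
    rand_w = destages * profile["random_write_fraction"]
    seq_w = destages - rand_w
    if raid == "RAID-10":
        backend = misses + 2 * destages
    else:
        backend = misses + 4 * rand_w + (8.0 / 7.0) * seq_w
    thr_groups = math.ceil(backend / (8 * max_io_per_drive))

    return max(cap_groups, thr_groups)

# 50% reads, 50% of the writes sequential, 300 GB 15K drives (cf. Figure 6).
profile = {"read_fraction": 0.5, "read_hit": 0.7,
           "destage": 1.0, "random_write_fraction": 0.5}
for io in (1000, 3000, 5000):   # RAID-5 wins at low rates, RAID-10 at high rates
    print(io, required_raid_groups(4000, io, profile, "RAID-5"),
          required_raid_groups(4000, io, profile, "RAID-10"))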

Figure 6 shows such a picture for RAID-5 and RAID-10, for a synthetic 50% read / 50% write workload of which 50% of the writes are sequential. The drives used for this picture are 300 GB 15K drives. In this picture you can see that for this workload profile, the higher the access density, the more economical a RAID-10 configuration becomes.

[Figure 6: Required number of 8-disk RAID groups for growing access density. Required RAID-5 and RAID-10 RAID groups versus access density (0.0 - 1.4).]

Migration Options

In most cases you are not designing a disk configuration from scratch; there is an existing configuration that will be replaced. A very common case is a migration from 10K RPM drives to twice-as-big 15K RPM drives with the same RAID scheme. This will result in better response times at a low level of activity, but also in a somewhat lower maximum throughput when the drives are the limiting factor. This potential throughput reduction may be mitigated to some extent by a faster, next-generation disk subsystem. The power savings for a migration like this are less than 50% (which is the drive count reduction), because 15K RPM drives use more power than 10K RPM drives. Migrating from 73 GB 10K RPM drives to 146 GB 15K RPM drives provides a power saving of around 30%. When migrating from 146 GB 10K RPM drives to 300 GB 15K RPM drives, the savings are about 25%.
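These percentages can be checked with a little arithmetic, using the Watts-per-arm values from Table 1; the function and the 0.5 count ratio (half as many drives at double the capacity) are ours.

# Sketch: HDD power saving for a drive migration, using the Table 1
# Watts-per-arm values. count_ratio = new drive count / old drive count.

WATTS = {"73GB 10K": 11.3, "146GB 10K": 12.4,
         "146GB 15K": 15.7, "300GB 15K": 19.0}

def migration_saving(old, new, count_ratio):
    """Fraction of HDD power saved by the migration."""
    return 1 - count_ratio * WATTS[new] / WATTS[old]

print(migration_saving("73GB 10K", "146GB 15K", 0.5))   # ~0.31, i.e. around 30%
print(migration_saving("146GB 10K", "300GB 15K", 0.5))  # ~0.23, close to the 25% above

With a count ratio of (8/7)/2, roughly 0.57, for equal net capacity, the same arithmetic also reproduces the approximately 30% and 60% savings for the RAID-10 to RAID-5 conversions discussed next.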

If you convert from RAID-10 to RAID-5, there will also be power savings because of the reduced drive count, but the downside is that this configuration can result in a lower maximum throughput. As in the previous example, this effect can be mitigated with faster drives. Migrating from RAID-10 with 73 GB 10K RPM drives to RAID-5 with 73 GB 15K RPM drives will save about 30% in HDD power requirements. Migrating to 146 GB 15K RPM drives will result in approximately 60% power savings, but the maximum throughput will also be lower, potentially making this migration not viable for disk-intensive workloads. Again, a next-generation disk subsystem can offset this lower throughput to some extent.

Conclusion

When you look at reducing energy consumption for the disk drive configuration in your disk subsystem, it may seem at first that the greatest reduction potential comes from decreasing the number of drives, by increasing the drive capacity or by choosing RAID-5 over RAID-10. However, for high I/O rates you may need extra drives to get the throughput needed, and depending on the read-to-write ratio and the random-to-sequential ratio, RAID-10 may become more effective. So for some workload profiles smaller drives are the more economical choice, and for some workloads RAID-10 is a more economical choice than RAID-5, whereas for other workloads simply over-configuring the number of RAID-5 groups may be sufficient. IntelliMagic Direction can help you decide which scheme is the best choice for your workloads.