The IntelliMagic White Paper on: Storage Performance Analysis for an IBM SAN Volume Controller (SVC) (IBM V7000)

IntelliMagic, Inc.
558 Silicon Drive Ste 101
Southlake, Texas 76092 USA
Tel: 214-432-7920
www.intellimagic.net
sales@intellimagic.net

IntelliMagic 2011

Summary: This document describes how to analyze performance on an IBM SVC.
This white paper was prepared by:
IntelliMagic B.V., Leiden, The Netherlands, Phone: +31 71 579 6000
IntelliMagic, Inc., Texas, USA, Phone: +1 214 432 7920
Email: info@intellimagic.net
Web: www.intellimagic.net

Disclaimer
This document discusses storage performance analysis for IBM SVC storage systems. IntelliMagic products can be used to support all phases of the storage performance management process. Appropriate usage and interpretation of the results of IntelliMagic products are the responsibility of the user.

Support
Please direct support requests to support@intellimagic.net and requests for general information to info@intellimagic.net.

Trademarks
All trademarks and registered trademarks are the property of their respective owners.

2011 IntelliMagic B.V.
Table of Contents
Section 1. Introduction
  1.1 I/O Path
Section 2. IBM SVC Architectural Overview and Measurements
  2.1 SVC Architecture Overview
  2.2 I/O Groups
  2.3 Nodes
  2.4 Managed Disk
  2.5 Storage Pool
  2.6 Volume
Section 3. Case Studies
  3.1 Case Study: Identifying Root Cause of High Response Times
Section 4. Conclusion
Glossary
Additional Resources
Section 1. Introduction

The purpose of this paper is to provide a practical guide for conducting performance analysis on an IBM SAN Volume Controller (SVC). This paper will discuss the end-to-end I/O path, the SVC architecture, and the key SVC measurements. In addition, it will provide guidance in diagnosing and resolving performance issues using IntelliMagic products such as IntelliMagic Vision and IntelliMagic Direction. IntelliMagic Vision and IntelliMagic Direction are part of the IntelliMagic Storage Performance Management Suite. For additional information on these software products, please refer to http://www.intellimagic.net/intellimagic/products.

1.1 I/O Path

Figure 1: End-to-End View illustrates how I/Os traverse the I/O path. At a very basic level, any host-initiated I/O request is either a read or a write. The host driver instructs the host bus adapter (HBA) to initiate communication with the SVC's fibre channel ports. The connectivity equipment, such as the SAN switches and directors, confirms access and sends the packet to the destination fibre ports on the SVC. If the data requested by the host resides within the SVC's cache, the data is sent back across the fabric to the host. If it does not, the SVC requests it from the back-end storage array. The read and write paths for different types of I/O are discussed in detail in the cache section.

Figure 1: End-to-End View
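The basic read flow just described can be sketched in a few lines of Python. This is purely illustrative; the dictionaries stand in for the SVC cache and the back-end array, and none of the names reflect an actual SVC interface.

```python
def serve_read(block, svc_cache, backend):
    """Illustrative read path: SVC cache hit vs. miss (not an SVC API)."""
    if block in svc_cache:       # read cache hit: served from SVC memory
        return svc_cache[block]
    data = backend[block]        # read cache miss: fetch from back-end array
    svc_cache[block] = data      # stage into cache, then return to host
    return data

cache = {}
backend = {"lba0": b"payload"}
first = serve_read("lba0", cache, backend)   # miss: staged from the back end
second = serve_read("lba0", cache, backend)  # hit: served from cache
```

The same request is a miss the first time and a hit the second, which is exactly the distinction the cache measurements later in this paper quantify.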
Section 2. IBM SVC Architectural Overview and Measurements

This section contains a brief overview of the SVC components and their relevant measurements. For an in-depth discussion of the SVC architecture and performance considerations, see the references in the Additional Resources section of this paper.

2.1 SVC Architecture Overview

The SVC architecture differs significantly from a traditional storage array. In addition to providing block-level storage from internally managed drives, it virtualizes external block-level storage. Block-level storage virtualization can be implemented in the switch, in the array, or as an appliance. The SVC appliance resides in the I/O path between the host and the back-end storage, as illustrated in Figure 1: End-to-End View. The virtualization of back-end storage arrays facilitates centralized migrations, provisioning, and replication. The virtualization and pooling of traditional storage may also lead to improved capacity utilization, as well as performance improvements, as I/O can be easily balanced across the back-end resources.

An SVC cluster consists of up to four I/O groups. Each I/O group consists of two redundant, clustered nodes. Each node consists of an off-the-shelf processor, memory, and four fibre channel ports. It also supports two Ethernet ports, and is capable of supporting four internal Solid State Drives (SSDs). The physical components of the SVC are commodities; the uniqueness of the SVC is in its software and the logical components that are discussed in the remainder of this section.

Figure 2: SVC Components
Figure 2: SVC Components illustrates the SVC logical components. Working from the bottom of the diagram to the top, the storage-system-provisioned LUNs are called managed disks (mdisks). Each mdisk consists of a number of extents of a specified size. The mdisks are grouped together to form storage pools. From the storage pools, the extents are then coalesced to form volumes. Hosts are zoned to the nodes within an I/O group. The I/O groups have access to all the volumes within the storage pools. The volumes are then assigned to the hosts. The individual components, how they relate to each other, and their associated measurements are discussed in more detail in the remainder of this section.

Measurement Overview

Both front-end and back-end measurements are available for read response times, write response times, read I/O rate, write I/O rate, read throughput, and write throughput. Front-end-only metrics include write cache delays and read hit percentage for the cluster, I/O groups, and nodes. Back-end-only metrics include read queue time and write queue time.

2.2 I/O Groups

An I/O group is a logical entity that refers to a pair of redundant clustered nodes. If one node fails within an I/O group, its workload is transferred to the other node. Hosts should be zoned to both nodes within an I/O group so that a node failure does not cause a host's volumes to become inaccessible. Volumes should be balanced across the I/O groups.

There are two approaches to balancing I/O across the I/O groups. The first approach is to zone every host to every I/O group and round-robin the volume creation across the I/O groups. In large environments with many hosts and a fully configured eight-node SVC cluster, this may not be possible due to host path, SVC, and zoning limitations. The second approach is to zone each host to a particular I/O group.
With this approach the goal is to distribute the I/O workload evenly across the I/O groups by assigning the hosts to the I/O groups in round-robin fashion. The drawback to this approach is that not all hosts have the same I/O workload profiles, so it is extremely important to monitor each I/O group's performance to ensure that the workloads are balanced. If an I/O group is significantly over-utilized, it may be necessary to move one or more hosts from the over-utilized I/O group to a less utilized I/O group. This process is disruptive to the host as it requires zoning changes.

Within an I/O group, a volume is associated with a preferred node. In normal operations, the preferred node services all the I/Os for a given volume. The preferred node can be selected at volume creation. By default the SVC attempts to distribute the volumes evenly across the nodes. As with host workloads, volume workloads can vary dramatically. If one node is more heavily utilized than the other, the preferred node can be manually changed for a specific volume.

Measurements

A number of the node and volume measurements can be aggregated to the I/O group level. In particular, the response times, throughput, and I/O rates should be observed. The read response
time and write response time for the volumes associated with an I/O group provide an average of the amount of time required for the SVC to service read and write I/O requests, respectively. Acceptable I/O response times will vary depending on application and user requirements, SVC hardware, SVC firmware, and back-end storage hardware and configuration. In addition to monitoring the I/O response times, the read and write I/O rates and throughput should be monitored to understand whether the I/O is balanced across the I/O groups and the nodes.

2.3 Nodes

The SVC nodes are the only physical components within the SVC cluster. The nodes run the SVC software and provide I/O processing, memory buffering, and connectivity. The SVC nodes utilize commodity processors, memory, and fibre channel ports. The processors provide the compute power for processing all the I/O operations. The memory serves as both a read and write I/O cache. The fibre ports provide connectivity and communication between the attached hosts, the SVC, and the back-end storage systems. They can also facilitate communication between peer SVC clusters for the purpose of replication activities.

Measurements

SVC provides a robust set of measurements for both the front-end and back-end, as well as individual node CPU utilization. Perhaps the only shortcoming of the SVC node metrics is the lack of visibility into internal bandwidth utilization; however, IntelliMagic Direction can be used to estimate this metric as shown in Figure 3: IntelliMagic Direction Internal Components.

Figure 3: IntelliMagic Direction Internal Components

Data throughput and I/O rate per port are required to understand the port utilizations.

Tip #1: When planning connectivity requirements, ensure that there is adequate bandwidth on the SVC ports to handle all the read hits, writes, and read misses.
The read miss payload and the write payload must be staged into SVC cache from the back-end storage controller and sent out to the host over the same ports, effectively doubling the bandwidth requirement for read miss and write workloads.

Cache

The primary objective of cache is to reduce the host I/O response time. This is achieved by providing cache hits for read requests and fast cache writes. The entire cache on an SVC node can be used for read activity. On the latest model at the time of this writing, the CG8, only half of an SVC node's cache can be used for write activity. This allows unwritten write data to be drained to internal storage in the case of a power outage. A side effect of this behavior is that it also prevents a large sequential write application from completely saturating any node's cache and negatively impacting other applications.
Cache on the SVC is segmented into 4 KB segments, or pages. A track describes the unit of locking and destage granularity; there are up to eight 4 KB segments in a track. The SVC attempts to coalesce multiple segments into a single track write when the segments are within the same track.

Read I/O requests are either read cache hits or read cache misses. If the requested data is resident in cache, the data is immediately transferred to the requestor. This is called a read cache hit. If the requested data is not resident in cache, the data is requested from the back-end storage systems. This is called a read cache miss. After the data is loaded into cache from the back-end storage system, the front-end adapter sends it to the host that initiated the request.

For write I/O requests, the data is written to cache on the host's preferred node. It is then mirrored to the partner node. Subsequently, the preferred node sends an acknowledgement to the host that the I/O has been completed. This is called a fast write (FW). At some point after the acknowledged completion of the I/O, the write is de-staged to the back-end storage system, releasing the cache tracks that it occupies. Assuming there is sufficient cache on the back-end storage system, the writes to the back-end systems should themselves complete as fast writes.

In the event of a node failure, all the writes from the remaining node will be drained to the back-end storage system. The behavior of write I/Os in this scenario changes to write-through mode: the acknowledgement from the SVC to the host that a write has been completed will only be sent upon confirmation that the write has been completed by the back-end storage system.

Tip #2: Consider running your SVC nodes and their associated components at no more than 50% utilization during online periods. That way, if you have a node failure, your SVC cluster will not severely impact the performance of your online applications.
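Tip #2's headroom rule can be expressed as a simple check. This is a sketch only: the utilization figures would come from your own monitoring data, and the function name is illustrative.

```python
def failover_safe(node_a_util_pct, node_b_util_pct, ceiling_pct=100.0):
    """Return True if either node could absorb its partner's load.

    Per Tip #2, keeping each node at or below 50% means the surviving
    node of an I/O group stays at or below 100% after a failover.
    """
    return node_a_util_pct + node_b_util_pct <= ceiling_pct

ok = failover_safe(45.0, 50.0)       # survivor would run at 95%
risky = failover_safe(60.0, 55.0)    # survivor would need 115%: saturated
```

A pair running at 45% and 50% survives a node loss; a pair at 60% and 55% does not, which is exactly the exposure the 50% guideline is meant to avoid.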
Cache is also managed at the storage pool level; this is referred to as partitioning. There is an upper cache limit set for each of the storage pools. If the cache limit is reached for I/Os to a particular storage pool, write I/Os will behave in write-through mode. This behavior continues only while the amount of cache consumed for write I/Os to the storage pool exceeds its upper limit. As discussed previously, this has the effect of requiring write I/Os to be acknowledged by the back-end storage system prior to informing the initiator that the write has been completed. It is rare to encounter this behavior unless there is a problem draining writes to a back-end storage system due to over-utilization, or a problem within the storage system.

Cache Measurements

From a performance analysis perspective, it is important to understand the effectiveness of the cache management. Key cache measurements include the read cache hit ratios and the write cache delays. Consistently low read cache hit ratios may indicate that the storage system has insufficient cache for the workload. These measurements are available at the SVC cluster, I/O group, node, and volume level.
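As a sketch, the read hit ratio behind these measurements is simply hits over total reads; the field names here are hypothetical, not SVC statistics names.

```python
def read_hit_ratio(read_hits, read_misses):
    """Read cache hit ratio as a percentage; None when there were no reads."""
    total = read_hits + read_misses
    if total == 0:
        return None
    return 100.0 * read_hits / total

# 750 hits and 250 misses in an interval is a 75% hit ratio; for open
# systems workloads an average above 50% is commonly targeted.
ratio = read_hit_ratio(750, 250)
```

The None guard matters in practice: idle intervals would otherwise divide by zero and skew averages.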
Tip #3: For open systems workloads, an average cache hit ratio greater than 50% is desired. IntelliMagic Direction can help identify whether additional cache will improve the response time or throughput for a particular workload.

The number of write cache delays per second indicates whether or not the SVC is able to de-stage I/Os quickly enough to the back-end storage system. A write delay event indicates that cache is full, and that new write I/O operations can only be completed after a physical de-stage to disk. This typically only happens when the back-end storage system is saturated. The storage pool cache impacts only write behavior.

Tip #4: Even a small number of write delays can significantly increase write response times to hosts.

2.4 Managed Disk

A managed disk has a one-to-one relationship with a back-end storage system LUN or volume. When the storage system provides RAID-formatted disk groups, as in the case of the IBM DS8000 series, the entire RAID group should be configured as a single LUN. Prior to SVC 6.1, the SVC limited the maximum mdisk size to 2 TB. This led to two LUNs being created per eight-member RAID group when the physical drives in the RAID group exceeded 300 GB.

Managed Disk Measurements

You can monitor the performance of your managed disks using the managed disk statistics, which include the back-end read and write response times. These statistics measure the amount of time it takes to perform a read or write operation from the SVC to the back-end storage system. Read and write queue times provide a measure of how long read or write operations wait to be sent to the back-end storage system.

Tip #5: Average back-end queue times should be less than a couple of milliseconds, as the SVC queue times only measure the amount of time an I/O request spends waiting to be sent to the back-end storage system. If this is over 2 ms, it is a sign of contention on the fabric or on the back-end storage system.
Tip #6: For increases in front-end response time that do not correlate with increases in the back-end response times, the performance issue is with the path from the host to the SVC, or with the SVC's ports or processors.

Tip #7: Keeping in mind that enterprise disks can service I/Os in less than 6.0 ms when there is no queue, and that cache hit response times are typically less than 1.0 ms, the average back-end response times for an SVC managed disk should be significantly less than 7.0 ms (6 ms for each read miss and 1 ms for the data transfer). Average response times greater than 10.0 ms indicate some sort of constraint from the SVC to the back-end storage systems, or within the back-end storage system.
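The blended averages behind Tip #7 can be reproduced with a short calculation. The workload numbers below are hypothetical (the same arithmetic appears in Example 1 later in this section), and the default service times are the rule-of-thumb figures from Tip #7, not measured values.

```python
def average_response_time(read_ios, read_hit_pct, write_ios,
                          hit_ms=1.0, miss_ms=6.0, write_ms=1.0):
    """Weighted average front-end response time in ms per I/O.

    Assumes 100% write hits and the hit/miss service times given above.
    """
    hits = read_ios * read_hit_pct / 100.0
    misses = read_ios - hits
    total_ms = hits * hit_ms + misses * miss_ms + write_ios * write_ms
    return total_ms / (read_ios + write_ios)

# 800 reads/s at a 50% hit ratio plus 200 writes/s: 3.0 ms per I/O.
avg = average_response_time(800, 50, 200)
```

Note how quickly the miss term dominates: the same workload at a 25% hit ratio averages 4.0 ms, which is why low hit ratios show up so visibly in front-end response times.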
In order to understand whether high back-end response times on the SVC are the result of saturated back-end disk RAID group sets or some other storage system component, you will need visibility into the storage system's components. Independent of the root cause, the back-end response times provide an excellent way of understanding whether downstream saturation exists. When vendors describe I/O response times, they typically mean averages similar to what is described in Example 1: Average Response Time.

Example 1: Average Response Time
  Workload Type: OLTP
  Read/Write Ratio: 80/20
  Read Hit Ratio: 50%
  Write Hit Ratio: 100%
  Write Hit Response Time: 1.0 ms
  Read Hit Response Time: 1.0 ms
  Read Miss Response Time: 6.0 ms
  Read I/Os per Second: 800
  Write I/Os per Second: 200

  Average Response Time = ((400 * 1.0) + (200 * 1.0) + (400 * 6.0)) / 1000 = 3.0 ms per I/O

2.5 Storage Pool

A storage pool is a grouping of more than one managed disk. When planning a storage pool, it is important to remember that the failure of a single managed disk brings the entire storage pool offline. As a result, one of the primary design goals is to limit the hardware failure boundaries. General performance guidelines should also be addressed as part of the design, and are discussed in this section. With these thoughts in mind, there are several best practices to consider when creating a storage pool, as enumerated in Table 1: Storage Pool Best Practices.
Table 1: Storage Pool Best Practices

- A storage pool should utilize managed disks from one storage system. (Availability)
- Each storage system should provide managed disks to a single SVC cluster. (Availability)
- Each back-end RAID group must be included in only one storage pool. (Availability)
- Do not utilize more than ten managed disks per storage pool. (Performance)
- Rather than adding capacity to an existing storage pool, create a new storage pool. (Availability, Performance)
- Implement striped volumes for all workloads except 100% sequential. (Performance)
- Utilize storage pools with four or eight managed disks; testing by IBM has shown very little difference in performance between four and eight managed disks in a storage pool. (Performance)
- Select an extent size that balances cluster capacity and volume granularity. Testing has shown good results with 128 MB and 256 MB extents. (Performance)

Storage Pool Measurements

The storage pool measurements provide a good means of monitoring the performance of both the front-end and back-end storage. They provide a combination of measurements aggregated from both the front-end volume operations and the back-end managed disk operations. On the front-end, the response times, I/O rates, and I/O throughput provide an excellent means of understanding whether the storage pools are responsive and balanced. The response times include the read cache hits as well as the read cache misses. On a system with a reasonable read cache hit percentage, the average front-end read response time will understandably be lower than the back-end read response times for the same storage pool. The front-end write response times should be 1.0 ms or less, as they only measure the amount of time required to write to the preferred node's cache and mirror to the secondary node.

2.6 Volume

A volume is a discrete grouping of storage extents that can be made addressable to a host.
The size of the extent can be selected during storage pool creation; typical sizes are 128 or 256 MB. When a volume is created, its capacity and data layout are selected. For SVC managed disks, the data layout can be striped or sequential. In all cases but 100% sequential workloads, the volumes should be created as striped volumes. Striped volumes consist of extents that are spread across the managed disks in a round-robin fashion. Non-managed SVC volumes, or image-mode disks, are not covered in this paper as they are not typically part of steady-state configurations.

Measurements

In order to detect whether performance problems exist, it is important to ignore inactive volumes, since those volumes have an insignificant impact on the overall performance of the storage system. It is common for some of the volumes with the highest response times to have little I/O activity. Sorting the volumes based on their response time alone, therefore, isn't a good
way to find the volumes that cause bad overall performance. A good way to find the volumes that have the biggest impact on the overall performance is to quantify the I/O Intensity, which is the product of the I/O rate and the response time.

Response times are provided from the view of the front-end of the storage system, so they include both cache hits and cache misses. On a well-balanced system that is not overloaded, 100% of writes should be satisfied from cache. The response time for writes satisfied from cache should be no more than the time required to write to the preferred node's cache, mirror to the secondary node, and send the host an acknowledgement. Read hits should also take very little time to process and transfer the data. Read misses require physical disk access and are significantly slower than cache hits.

Tip #8: Average response times greater than 10.0 ms indicate some sort of constraint on the path to the SVC, within the SVC, or within its associated back-end storage systems.
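The I/O Intensity ranking described above can be sketched as follows; the volume names and figures are made up for illustration.

```python
def rank_by_io_intensity(volumes):
    """Sort volumes by I/O intensity = I/O rate (IO/s) x response time (ms).

    Volumes are (name, io_rate, response_ms) tuples; highest impact first.
    """
    return sorted(volumes, key=lambda v: v[1] * v[2], reverse=True)

volumes = [
    ("VOL_A", 2500, 12.0),  # busy and slow: intensity 30000
    ("VOL_B",   10, 40.0),  # slow but nearly idle: intensity 400
    ("VOL_C",  900,  2.0),  # busy but fast: intensity 1800
]
ranked = rank_by_io_intensity(volumes)
```

Sorting by response time alone would put the nearly idle VOL_B first; weighting by I/O rate correctly surfaces VOL_A as the volume with the biggest overall impact.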
Section 3. Case Studies

The large number of components in the I/O path of an enterprise storage virtualization appliance like the IBM SVC makes the diagnosis of performance issues a complicated task. IntelliMagic Vision simplifies root cause analysis by providing an intuitive interface with drilldown capabilities for all the key components. It uses a data collector based on the Storage Management Initiative Specification (SMI-S, see http://snia.org/smis) standard to collect configuration information and performance data, and leverages the native SVC interface to supplement the performance measurements. The configuration information exposes the relationships between the various storage components. IntelliMagic Vision uses these relationships to provide drilldown paths as appropriate, and to complement the performance data by calculating or estimating values for metrics and components that are not directly supported by the hardware or the vendor's SMI-S provider software. By leveraging deep knowledge of storage system performance and configuration, IntelliMagic Vision provides a comprehensive solution for managing your storage virtualization environment.

3.1 Case Study

The remainder of this section discusses a storage performance case study conducted using IntelliMagic Vision.

Identifying Root Cause of High Response Times

Our case study describes an IBM SVC storage environment that contains four CF8 nodes running SVC firmware 4.3.1, with a DS5100 and a DS8100 supporting the back end. The customer reported seeing extremely high response times, as illustrated in Figure 4: Response Time for SVC.

Figure 4: Response Time for SVC
Figure 5: RAID Group Set Response Time illustrates a Service Level Attainment (SLA) chart for response times to the SVC's storage pools. IntelliMagic Vision uses the term RAID Group Set interchangeably with Storage Pool. The response times shown represent measurements observed at the front end of the SVC. The response time for almost all I/Os to all the storage pools exceeds the 10 ms red threshold. MDG1, MDG2, and MDG3 are associated with managed disks from the DS8100, and MDG5 and MDG6 are associated with managed disks from the DS5100. Since the response times are high for storage pools associated with managed disks on both back-end storage systems, it is likely there are multiple issues.

Figure 5: RAID Group Set Response Time

Tip #9: IntelliMagic Vision's SLA report provides an efficient way to track disk storage system health. In a single color-coded view the user can see whether there are any exception conditions that need attention. The color coding supports three colors: green, amber, and red. The rules for applying those colors are automatically adapted to the type of disk storage system component and metric being evaluated. In most cases the user can drill down from a higher level object, such as the overall system, to lower level components, such as the RAID Group Sets, to isolate problems.

Figure 6: Storage Pool Throughput illustrates the total throughput by storage pool. Most of the throughput for the DS8100 storage pools is associated with MDG2, and most of the throughput for the DS5100 storage pools is associated with MDG5. This provides us with some areas of focus for the rest of the analysis.
Figure 6: Storage Pool Throughput

Figure 7: MDG2 Volume Throughput illustrates that on MDG2, volume SQLRP has significantly more I/O throughput than any other volume. This breakdown gives us an idea of how we might re-balance or isolate volumes to address over-utilized back-end resources.

Figure 7: MDG2 Volume Throughput

Figure 8: MDG5 Volume Throughput illustrates that on MDG5, the volumes named SQLWH and WFM_DB contribute the most significant amount of throughput.
Figure 8: MDG5 Volume Throughput

The next step is to look at the back-end queue times and response times to see if there are any obvious issues on the back-end storage systems. Focusing on read I/Os first, the front-end response times for read misses include both the read queue time on the SVC controller and the storage system external read response times. Figure 9: Storage Pool External Read Response Time provides a breakdown by storage pool of the latter component. It shows that most of the storage pools have a significant percentage of their external read response times greater than 10 ms.

Figure 9: Storage Pool External Read Response Time
The other component of the front-end read miss response time is the SVC queue time. This includes only the time an I/O spends within the SVC waiting to be sent to the back-end storage system. Figure 10: Storage Pool External Read Queue Time illustrates that storage pools MDG2, MDG3, and MDG5 all have significant read queue time. For these storage pools there is a significant amount of back-end storage system contention.

Figure 10: Storage Pool External Read Queue Time

For write I/Os, Figure 11: Storage Pool External Write Response Time shows the external write response time SLAs for each of the storage pools. Storage pools MDG1, MDG2, and MDG3, all on the DS8100, have poor external write response times. This indicates queuing on the front-end of the DS8100 or non-volatile storage delays. MDG5 on the DS5100 also has poor external write response times, indicating contention within the DS5100.
Figure 11: Storage Pool External Write Response Time

Figure 12: MDG2 mdisk External Write Queue Time illustrates the external queue time for writes to the back-end storage systems. This is the amount of time spent waiting within the SVC for a write to be sent downstream, and does not include the response time of the back-end storage system. All of the mdisks in MDG2 have significant write queue time.

Figure 12: MDG2 mdisk External Write Queue Time

Since high external read and write response times exist for storage pools associated with both the DS8100 and the DS5100, the next step is to explore the back-end storage systems to identify front-end and back-end bottlenecks.

Back-end DS8100

Figure 13: MDisk Read Queue Time illustrates the read queue time SLA chart for MDG2. This is a drilldown from Figure 10: Storage Pool External Read Queue Time. All of the mdisks in this
storage pool have very high queue time, which indicates that all of the mdisks within the storage pool have contention. Recall from Figure 6: Storage Pool Throughput that MDG2 had the majority of the I/O for storage pools associated with the DS8100.

Figure 13: MDisk Read Queue Time

Figure 14: MDG2 Back-end RAID Group HDD I/O Rate illustrates the HDD I/O rate including RAID overhead. Many of the RAID groups associated with MDG2 have as many as 2,000 I/Os per second. MDG2 consists of RAID 5 groups of eight drives configured as 7+Parity or 6+Parity+Spare. For those with the 6+P+S configuration, each drive services as many as 285 I/Os per second. Since a single 15K RPM disk cannot sustain this many I/Os per second, these RAID groups are saturated.
Figure 14: MDG2 Back-end RAID Group HDD I/O Rate

Back-end DS5100

Figure 15: DS5100 Storage Pool MDG5 Managed Disk Queue Time illustrates that the queue time is poor on all the managed disks associated with MDG5.

Figure 15: DS5100 Storage Pool MDG5 Managed Disk Queue Time
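The per-drive arithmetic used in this back-end analysis can be generalized into a small helper. This is a sketch only: a fuller model would also account for the RAID 5 write penalty and the read/write mix, which this simple average ignores.

```python
def per_drive_io_rate(raid_group_io_rate, active_drives):
    """Average back-end I/Os per second per active drive in a RAID group."""
    return raid_group_io_rate / active_drives

# DS8100 MDG2: 2,000 IO/s across a 6+P+S group (7 active drives) -> ~285 IO/s
mdg2_per_drive = per_drive_io_rate(2000, 7)
# DS5100 MDG5: 800 IO/s across a 4+P group (5 drives) -> 160 IO/s
mdg5_per_drive = per_drive_io_rate(800, 5)
```

Both figures sit well above what a single spinning drive can sustain, which is how the saturation conclusions in this case study were reached.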
Figure 16: DS5100 MDG5 Back-end I/O Rate illustrates the I/O rates for the back-end RAID groups supporting MDG5. Many of the RAID groups have over 800 I/Os per second. These RAID groups are configured as 4+P RAID 5 groups, so for a single disk drive that averages out to 160 I/Os per second. At this I/O rate, these disks are over-utilized.

Figure 16: DS5100 MDG5 Back-end I/O Rate

Front-end DS8100

Figure 17: Port Response Time for Reads shows that the disk storage system port read response times for IBM-75, the DS8100, are exceeding 6 ms. This is the amount of time required to read data from the SVC. Inspecting the port IDs, we can see that there are only two host adapters in this configuration, HA 023 (ports 0230-0233) and HA 030 (ports 0300-0303). For the DS8000, this measurement indicates the amount of time that is spent waiting on the fabric or host adapter prior to being serviced; it typically indicates contention on the host adapter. It should be noted that the DS5100 (IBM-S) does not provide a port read response time measurement at the firmware level this machine was on, so the host adapter and port utilization cannot be evaluated with this metric.
Figure 17: Port Response Time for Reads

Figure 18: Port Write Response Time illustrates the SLA chart for the port write response time. The write response time is the amount of time it takes to send writes back to the SVC. The DS8100 (IBM-75) shows a high percentage of intervals with high write response time. Since both port read and write response times are high, it is likely that the front-end ports are saturated on the DS8100.

Figure 18: Port Write Response Time

Summary:

In this case study we identified significant bottlenecks using IntelliMagic Vision. IntelliMagic Vision easily identified front-end port response time elongation on the DS8100 and back-end disk contention for managed disks associated with both MDG2 on the DS8100 and MDG5 on the DS5100. In the case of the front-end adapter contention, additional host adapters will likely be
required to resolve the issue. For the back-end contention, some of the workload running on MDG2 and MDG5 will need to be migrated to less utilized storage pools. Additionally, the heavily accessed volumes on these storage pools could be isolated in their own storage pools to keep disk contention and high queue times away from other, unrelated application volumes. The volume throughput breakdowns can be used to identify those volumes that are good candidates for isolation or load balancing. Using IntelliMagic Vision to proactively monitor the performance of any environment will result in better SLA attainment, less downtime, and more satisfied customers.
Performance Analysis Offer For Your SVC Environment

IntelliMagic will often do a no-fee performance assessment of your SVC / V7000 environment as a way of demonstrating the capabilities of IntelliMagic Vision. If you are interested in such an engagement, please fill out the Performance Assessment request form located on this web page: http://go.intellimagic.net/content/im_home_fpa

Section 4. Conclusion

In recent years, technology improvements have offered the prospect of significant storage performance gains. For a variety of reasons, most of these improvements have yet to be implemented on a wide scale. Some of these technology improvements, primarily capacity utilization improvements, have also introduced significant performance challenges. As a result, there continues to be a need for professionals who can identify root cause, recommend problem resolutions, and provide follow-on monitoring for storage performance issues.

Performance analysis requires an understanding of storage architectural components and measurements, as well as of the end-to-end I/O path from the host to the storage system. In this paper we highlighted some of the architectural components of the IBM SVC and discussed some of their associated measurements. Finally, we utilized IntelliMagic Vision to examine several IBM SVC storage system performance issues, and were able to easily identify the root cause of each. We realize that many subjects were greatly simplified, and understand that becoming an expert in storage performance management requires significant real-world experience that cannot be obtained by reading a white paper. Here at IntelliMagic, we strive to make storage performance management easier by providing world-class solutions, support, training, and services.
Next time you need some guidance on storage performance issues, feel free to contact us for a free performance analysis at sales@intellimagic.net.
Glossary

Table 2: IBM SVC and IntelliMagic Vision terminology

SVC Concept                        | IntelliMagic Vision           | Comments
Cluster                            | Disk Storage System           | All nodes that make up the cluster are grouped into one DSS.
Node                               | Host Adapter                  |
Vdisk                              | Logical Volume                | Vdisks are the LUNs that are defined to the servers.
Storage Pool / Managed Disk Group  | Storage Pool / RAID Group Set |
Mdisk / Managed Disk               | RAID Group                    |
Additional Resources

SAN Volume Controller Best Practices and Performance Guidelines, IBM Redbook SG24-7521
Implementing the IBM System Storage SAN Volume Controller V6.1, IBM Redbook SG24-7933
IntelliMagic, Inc.: 558 Silicon Drive Ste. 101, Southlake, Texas 76092, (214) 432-7920
Corporate Headquarters: Leiden, The Netherlands, +31 (0)71 579 6000
www.intellimagic.net
sales@intellimagic.net