Performance Impact on Exchange Latencies During EMC CLARiiON CX4 RAID Rebuild and Rebalancing Processes



Applied Technology

Abstract

This white paper discusses the results of tests conducted in a Microsoft Exchange 2007 environment. These tests examined the effects of single- and multiple-drive failures on Exchange's performance when using RAID 5 or RAID 6 technology.

March 2010

Copyright 2010 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.

Part Number h6944

Table of Contents

Executive summary
Introduction
Audience
Terminology
Overview of test results
Overview of tests
  About the tests
  Testing steps with no load
  Testing steps with load
  RAID group and LUN configuration
  Database RAID groups
  Testing plan and schedule
  Storage processor events
Test results for baseline RAID 5
  Test 001: RAID 5 with no activity and one drive pull
  Test 001: RAID 5 rebalance with one drive replacement
  Test 002: RAID 5, eight-hour Jetstress, with one drive pull
  Test 002: RAID 5 rebalance after Jetstress with one drive replacement
Test results for RAID 6
  Test 004: RAID 6 with no activity and two drive pulls
  Test 004: RAID 6 rebalance with two drive replacements
  Test 005: RAID 6, eight-hour Jetstress, with two drive pulls
  Test 005: RAID 6 rebalance after Jetstress with two drive replacements
Jetstress comparison between the RAID 6 baseline and RAID 6 during hot spare rebuild
Jetstress comparison between RAID 5 and RAID 6 during hot spare rebuild
Conclusion
References

Executive summary

The Total Customer Experience (TCE) program, which is driven by Lean Six Sigma methodologies, demonstrates EMC's commitment to maintaining and improving the quality of EMC's products. In keeping with this philosophy, EMC designed Customer Integration Labs in its Global Solutions Centers and Partner Engineering Labs where we conduct rigorous tests that reflect real-world environments. In these tests, we design and execute TCE use cases and carefully measure performance. These TCE use cases provide us with insight into the challenges currently facing our customers, allowing us to provide the highest quality products.

This white paper describes how the EMC CLARiiON implementations of RAID 5 and RAID 6 were tested. These tests determined the impact that RAID rebuilding (due to drive failures) and rebalancing have on the performance of a CLARiiON CX4-480, when the CX4-480 is operating with or without a load. The results of these tests demonstrated the superior ability of CLARiiON's RAID 5 and RAID 6 technologies to fail over to a hot spare when faced with a drive failure.

Introduction

This white paper summarizes the results of tests of RAID 5 and RAID 6 sets with hot spare disks on an EMC CLARiiON CX4-480. These tests were conducted with and without application activity. This paper discusses the test use-case scenarios, objectives, expected results, and actual results.

The configurations for these tests included:
- CLARiiON CX4-480 storage configuration for a RAID 5 4+1 RAID group set
- CLARiiON CX4-480 storage configuration for a RAID 6 4+2 RAID group set
- CLARiiON CX4-480 storage configuration for hot spare disks and a RAID group

These tests did not include:
- Installation and configuration of Microsoft Exchange 2007
- Installation and configuration of Microsoft Jetstress 2007
- Installation and configuration of Microsoft Windows 2008
- Creation of RAID 5 or RAID 6 CLARiiON RAID groups

Audience

The intended audience for this white paper is:
- Internal EMC personnel
- EMC Partners

The audience should have a firm understanding of the following:
- CLARiiON CX4 RAID technology
- Navisphere UI or NaviSECCLI for the creation of RAID 5 and/or RAID 6 RAID groups
- Navisphere UI or NaviSECCLI for the creation of RAID 5 and/or RAID 6 LUNs
- Navisphere UI or NaviSECCLI for the creation of RAID 5 and/or RAID 6 metaLUNs
- Navisphere UI or NaviSECCLI for the creation of hot spare RAID groups/hot spare disk(s)

Terminology

RAID: Redundant Array of Independent Disks.

RAID 5: RAID 5 uses block-level striping with parity data distributed across all member disks. RAID 5 can recover from a single drive failure, while allowing the set to remain functional until the failed drive is replaced. Usable capacity in RAID 5 is (n-1) drives, where n is the total number of drives; the equivalent of one drive is given up to parity. RAID 5 will lose data upon the loss of more than one drive.

RAID 6: RAID 6 extends RAID 5 by adding a second parity block, using block-level striping with two parity blocks distributed across all member disks. Unlike RAID 5, RAID 6 can remain online while recovering from two drive failures, giving it an added level of fault tolerance over RAID 5. Usable capacity in RAID 6 is (n-2) drives, where n is the total number of drives; the equivalent of two drives is given up to parity. The loss of capacity is offset by the added fault tolerance: three drives must fail before the set fails entirely.

Hot spare: The CLARiiON CX4 allows you to create hot spare disks; hot spare disks are used for the short-term replacement of failed disks. Hot spares allow additional time for user/vendor replacement of failed drives. The use of hot spares reduces the possibility of total failure of RAID 5 and RAID 6 sets by replacing failed drive(s) without administrator/user intervention. Hot spare drives are not intended to be a complete replacement for failed drives; failed drives are expected to be replaced as soon as possible. A rule of thumb on CLARiiON arrays is one hot spare for every 30 drives.

Rebuild: This process occurs when a hard drive fails (or is marked bad) and a hot spare drive is available. The parity and data on the good drives are used to rebuild the failed drive onto the hot spare. The rebuild process has a greater impact on response times than the disk-replacement rebalancing process. During the rebuild process, each good drive services eight 512 KB reads (4 MB) in its queue, and 4 MB of data is written to the hot spare drive (as eight 512 KB writes) through the back-end bus.

Rebalance: This process occurs when a failed hard drive is replaced within a RAID group. Data is copied from the hot spare drive to the replacement drive through the back-end bus.

Overview of test results

During testing, all databases remained online without corruption. For both RAID 5 and RAID 6, database latencies rose slightly during the rebuild-to-hot-spare process. Array utilization barely registered above normal during the rebalancing process. (During the rebalancing process, data is copied from the hot spare back to the replaced drive.)

The rebuild priority can be changed based on application requirements; these settings affect the performance of the storage processor. During testing, the rebuild priority was left at its default setting of High. Increasing it from High to ASAP reduces the time for the hot spare rebuild but increases the impact on storage processor performance. Reducing the setting to Medium or Low lowers the impact on storage processor performance during the hot spare rebuild but increases the amount of time it takes to rebuild to the hot spare drive. Customers wishing to minimize the possibility of a second drive failure during the longer rebuild that results from a lower setting may wish to consider RAID 6, which allows for the loss of two drives within the same RAID group.
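For readers who want the capacity arithmetic above in one place, the following short Python sketch restates the (n-1) and (n-2) rules. It is illustrative only; the 300 GB drive size is a hypothetical example, not the drive model used in these tests.

```python
# Illustrative sketch of the RAID capacity and fault-tolerance rules described
# in the Terminology section. The 300 GB drive size is a hypothetical example.

PARITY_DRIVES = {5: 1, 6: 2}   # RAID 5 gives up one drive to parity, RAID 6 two

def usable_capacity_gb(raid_level: int, total_drives: int, drive_gb: int) -> int:
    """Usable capacity: (n - 1) drives for RAID 5, (n - 2) drives for RAID 6."""
    return (total_drives - PARITY_DRIVES[raid_level]) * drive_gb

def failures_tolerated(raid_level: int) -> int:
    """RAID 5 survives one drive failure, RAID 6 survives two."""
    return PARITY_DRIVES[raid_level]

if __name__ == "__main__":
    drive_gb = 300                                   # hypothetical drive size
    for raid_level, drives in ((5, 5), (6, 6)):      # the 4+1 and 4+2 groups tested
        print(f"RAID {raid_level} ({drives} drives): "
              f"~{usable_capacity_gb(raid_level, drives, drive_gb)} GB usable, "
              f"tolerates {failures_tolerated(raid_level)} drive failure(s)")
```

Note that the 4+1 and 4+2 groups yield the same usable capacity; RAID 6 spends the extra drive on a second failure's worth of fault tolerance.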

Figure 1. Snapshot of the Navisphere UI setting for Rebuild Priority options

Overview of tests

EMC conducted the following tests to measure the performance impact of RAID 5 and RAID 6 hot spare drive replacement and rebalancing, with and without a load. We used Microsoft Jetstress to simulate real-life activity. We did this to obtain data points (for customers) that show how Microsoft Exchange will react and how their user base will be affected during rebuild and rebalance operations.

RAID 5 was tested by removing a single drive; the drive was physically pulled from the array without preparation on the array or server. Hot spare drives were set up on the array according to EMC recommendations. This test measured the impact a failed drive had on the storage processors and Exchange servers, as well as the impact during the hot spare rebuild. It also showed what happened when the (simulated) repaired drive was placed back into the array, which was not done until the storage processor event logs showed that the array had stabilized and the rebuild/sync functions were complete.

RAID 6 was tested the same way, except that in this test we removed two drives at the same time to prove that this technology can function without disruption with two failed drives. In this case, databases remained online. The testing was not done to see if Jetstress would fail; instead it showed how database latencies increase during the rebuild and/or rebalance processes.

We conducted baseline Jetstress testing on the RAID 5 and RAID 6 configurations. These results were compared to the results during the drive failure tests to determine the impact of rebuild and rebalance operations.

About the tests

Testing steps with no load

1. Clear all logging to ensure that only current test data is within saved logs:
   - Navisphere Analyzer (NAR)
   - Navisphere event logs on each storage processor
   - Windows event logs (system/application)
2. Start all logging:
   - Navisphere Analyzer
   - Windows Performance Monitor (PerfMon)
3. Wait 15 minutes.
4. Remove a predetermined drive in a predetermined RAID group homing Exchange database LUNs. For example: 1_0_1 for RAID 5; 1_0_0 and 1_0_1 for RAID 6.
5. Monitor event logs on the storage processors until events show that:
   - All rebuilds for a FRU have completed.
   - CRU unit rebuild is complete.
6. Monitor Navisphere Analyzer to confirm that SP utilization has normalized.
7. Stop all monitoring.
8. Save all logs:

   - NAR
   - Storage processor event logs
   - PerfMon
   - Windows event logs (system/application)

Testing steps with load

1. Clear all logging to ensure that only current test data is within saved logs:
   - Navisphere Analyzer (NAR)
   - Navisphere event logs on each storage processor
   - Windows event logs (system/application)
2. Start all logging:
   - Navisphere Analyzer
   - Windows Performance Monitor (PerfMon)
3. Start Microsoft Jetstress.
4. Wait 15 minutes (this is to get information in the logs before the failure).
5. Remove a predetermined drive in a predetermined RAID group homing Exchange database LUNs, for example 1_0_1 in the RAID 5 configuration, and 1_0_0 and 1_0_1 in the RAID 6 configuration.
6. Monitor event logs on the storage processors until events show that all rebuilds for a FRU have completed:
   - CRU unit rebuild complete
   - Monitor Navisphere Analyzer to confirm that SP utilization has normalized
   - Stop all monitoring
7. Save all logs:
   - NAR
   - Storage processor event logs
   - Windows event logs (system/application)

RAID group and LUN configuration

For both the RAID 5 and RAID 6 testing, the capacity yield is the total capacity of four disks (n=4).

Database RAID groups

This configuration resulted in a total of eight DB metaLUNs and eight log file metaLUNs for eight ESGs:
- RAID 5: two RAID groups at 4+1, 16 LUNs in each RAID group; 8 metaLUNs were created from the 16 component LUNs.
- RAID 6: two RAID groups at 4+2, 16 LUNs in each RAID group; 8 metaLUNs were created from the 16 component LUNs.

Log file RAID groups

- RAID 10 2+2, 16 LUNs in each RAID group; 8 metaLUNs were created from the 16 component LUNs.

Testing plan and schedule

Please note that:
- The test plan shown in Table 1 outlines the test, RAID configuration, length of time, activity, and description of each test.
- Baseline testing using Microsoft Jetstress was also done for each of the RAID configurations with no drive loss/pull.

- Test 001 and Test 004 show a length of time of to-be-determined (TBD). These tests determined how long it took to rebuild a RAID group to a hot spare.

Table 1. Test plan

Baseline RAID 5 - RAID type: RAID 5; Hot spares: 1; Length: TBD; Activity: Jetstress
  Description: Run Jetstress with no failures to gather data on I/O and latencies for future tests. Gather all logs upon completion.
Test 001 - RAID type: RAID 5; Hot spares: 1; Length: TBD; Activity: None
  Description: Pull 1 drive. Allow the hot spare to rebuild. Monitor the event viewer on the SP for completion. Gather all logs upon completion.
Test 002 - RAID type: RAID 5; Hot spares: 1; Length: 8h; Activity: Jetstress
  Description: Start an 8-hour Jetstress performance test. Pull 1 drive. Allow the hot spare to rebuild. Monitor the event viewer on the SP for completion. Gather all logs upon completion.
Test 004 - RAID type: RAID 6; Hot spares: 2; Length: TBD; Activity: None
  Description: Pull 2 drives. Allow the hot spares to rebuild. Monitor the event viewer on the SP for completion. Gather all logs upon completion.
Test 005 - RAID type: RAID 6; Hot spares: 2; Length: 8h; Activity: Jetstress
  Description: Start an 8-hour Jetstress performance test. Pull 2 drives. Allow the hot spares to rebuild. Monitor the event viewer on the SP for completion. Gather all logs upon completion.

Storage processor events

The messages in the SP event logs were used to determine the following:
- Exactly when the drive failure was noted, through the event: "Disk(Bus 1 Enclosure 0 Disk 1) failed or was physically removed"
- Exactly when the hot spare(s) began replacing the failed drive(s): "Hot Spare is now replacing a failed drive."
- When the rebuild had completed: "All rebuilds for a FRU have completed"
- When the failed drive was replaced: "Drive was physically inserted into the Slot"
- When the replaced drive had synced and the hot spare was no longer in use: "Hot Spare is no longer replacing a failed drive"
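Because these messages mark the start and end of each phase, they can be pulled out of an exported SP event log to time the rebuild. The Python sketch below is a minimal example of this; the file name and the timestamp format it expects (MM/DD/YYYY HH:MM:SS at the start of each line) are assumptions about the export, not a documented Navisphere format, so the parsing would need to be adjusted to the actual log layout.

```python
# Minimal sketch: scan an exported SP event log for the rebuild milestones
# quoted above and report when each first occurred. The timestamp format and
# the export file name are assumptions; adjust them to the real export.
import re
from datetime import datetime

MILESTONES = [
    "failed or was physically removed",
    "Hot Spare is now replacing a failed drive",
    "All rebuilds for a FRU have completed",
    "was physically inserted into the Slot",
    "Hot Spare is no longer replacing a failed drive",
]

TIMESTAMP = re.compile(r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})")

def scan_sp_log(path: str) -> dict:
    """Return the first timestamp observed for each milestone message."""
    found = {}
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for message in MILESTONES:
                if message in line and message not in found:
                    match = TIMESTAMP.search(line)
                    if match:
                        found[message] = datetime.strptime(
                            match.group(1), "%m/%d/%Y %H:%M:%S")
    return found

if __name__ == "__main__":
    events = scan_sp_log("sp_a_eventlog.txt")  # hypothetical export file
    for message, when in events.items():
        print(f"{when}  {message}")
    start = events.get("Hot Spare is now replacing a failed drive")
    done = events.get("All rebuilds for a FRU have completed")
    if start and done:
        print(f"Rebuild to hot spare took {done - start}")
```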

Test results for baseline RAID 5

The Jetstress report shows average DB read latencies of about 16 ms and 1476 IOPS achieved in the RAID 5 4+1 configuration:

Test 001: RAID 5 with no activity and one drive pull

Utilization

This chart shows RAID 5 utilization with no activity and one drive pull:

Response time

RAID group response time remained a steady 30 ms for RAID Group 20 (the RAID group affected by the drive pull):

Storage processor utilization

Storage processor utilization increased to approximately 10 percent throughout the rebuild:

This read size (KB) chart shows the rebuild process reading 512 KB from all disks in the RAID group:

The read bandwidth (measured in MB/s) held at approximately 48 MB/s:

Test 001: RAID 5 rebalance with one drive replacement

Utilization

The Navisphere chart shows that RAID group utilization rose to 8-10 percent during the rebalance to the replaced drive; the rebalance took 6.7 hours.

Response time

Storage processor utilization

Read size was, as expected, 512 KB for the single drive accessed during the rebalance copy process. The image below shows a single drive being read from for this process. The read bandwidth in this test was 13 MB/s.
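The two figures reported for this rebalance can be cross-checked with simple arithmetic, as in the sketch below. The implied ~300 GB copied is an inference from the reported rate and duration (the paper does not state the drive capacity), and the 12 MB/s write-rate estimate for the earlier rebuild is likewise an inference from the ~48 MB/s aggregate read rate across the four surviving data drives.

```python
# Cross-checks the figures reported for Test 001. The rebalance copy ran at
# about 13 MB/s and took about 6.7 hours, which implies roughly 300 GB copied
# from the hot spare back to the replaced drive (the drive capacity itself is
# not stated in the paper; this is an inference).

def implied_data_gb(bandwidth_mb_s: float, hours: float) -> float:
    """Data moved, in GB, for a sustained bandwidth over a duration."""
    return bandwidth_mb_s * hours * 3600 / 1024

if __name__ == "__main__":
    copied = implied_data_gb(13, 6.7)
    print(f"Rebalance: ~{copied:.0f} GB at 13 MB/s over 6.7 hours")

    # The earlier rebuild read about 48 MB/s in aggregate from the four
    # surviving data drives; rebuilding from four sources implies roughly
    # 48 / 4 = 12 MB/s written to the hot spare, which would be consistent
    # with the rebuild and the rebalance finishing in about the same time.
    print(f"Implied hot spare write rate during rebuild: ~{48 / 4:.0f} MB/s")
```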

Test 002: RAID 5, eight-hour Jetstress, with one drive pull

Utilization

This chart shows several characteristics of utilization during the RAID rebuild:
- When the drive was pulled, the difference in utilization between the affected RAID group (which was not at its maximum) and the other RAID group was 30 percent.
- As the rebuild continued to completion, the second RAID group's utilization increased.
- Utilization for both RAID groups was identical until the drive was pulled. Utilization for the affected RAID group then increased by almost 40 percent for the rebuild, while the second RAID group's utilization increased slightly in the beginning and dropped back to the same levels (about 48 percent) for the rest of the test.

Response time

This chart shows that before the rebuild started, the storage processors were about 7 percent active, with RG20 increasing as expected during the rebuild, but by only 5 percent during the rebuild process. RG21 appears to remain at about 7 percent throughout the test.

Storage processor utilization

This chart clearly shows when the rebuild begins and ends, along with spikes for the storage processor during the CRU rebuild processes for this drive:

From the time the drive is pulled until the rebuild is complete, the read size goes from a low of 64 KB up to 100 KB. The read bandwidth, in MB/s, is shown below:

Total bandwidth (MB/s) for the RAID group was approximately 58 MB/s:

Jetstress testing shows the effects of the RAID rebuild compared to the baseline tests:
- Achieved IOPS were 1279, or 13 percent lower than baseline.
- RG20 IOPS were 641.226, or 13 percent lower than baseline.
- DB read latencies for RG20 were 19.25 ms, or 22 percent higher than baseline.
- DB read latencies for RG21 were 12.75 ms, or 12 percent lower than baseline (this would be expected due to the lower IOPS).
- Write latencies did not change for either RAID group's database or log files.

Test 002: RAID 5 rebalance after Jetstress with one drive replacement

Because the RAID group utilization and response times registered at almost 0 (or 1 to 2.5 percent, which is attributable to normal array processes), the following chart combines RAID group utilization, RAID group response time, and storage processor utilization. It shows that, after a failed drive is replaced, a rebalance does not affect the RAID groups or storage processors. The time to complete was 6.7 hours.


Test results for RAID 6

Test 004: RAID 6 with no activity and two drive pulls

For the following tests, two drives were pulled to show the ability of RAID 6 to recover from two drive failures.

Utilization

Response time

Response time increased during the parity rebuild process, but by no more than during the RAID 5 tests.

Storage processor utilization

Storage processor utilization rose slightly at the beginning of the parity rebuild process, but then dropped quickly.

Test 004: RAID 6 rebalance with two drive replacements

Utilization

RAID group utilization increased slightly during the rebalance process:

Response time

RAID group response time increased briefly at the start and remained at about 12 percent throughout the rebalance. The spike at the end was not related to these tests; it was caused by the polling/pulling of data.

Storage processor utilization

Storage processor utilization during the rebalance process was unremarkable, and insignificant compared to normal operations.

Test 005: RAID 6, eight-hour Jetstress, with two drive pulls

Utilization

This test showed the effects of a parity rebuild on the storage processors and RAID groups during a heavy load:
- RAID group utilization before the drives were pulled showed both RAID groups at 80 percent.
- Upon the simulated drive failure (the drives being pulled), the affected RAID group increased to 90 percent utilization, while the second RAID group's utilization dropped to 60 percent. This is expected, as the array gives more priority to the faulted RAID group.
- The time it took to complete, even during a heavy load, was the same 6.7 hours.

Response time

Before the drives were pulled, response times with Jetstress running were about 8 percent. Upon the simulated failure, the affected RAID group increased slightly to 12 percent (still below Microsoft best practices), while the other RAID group's response times dropped slightly.

Storage processor utilization

Test 005: RAID 6 rebalance after Jetstress with two drive replacements

Utilization

RAID group utilization was almost identical to the rebalance without activity.

Response time

RAID group response times were the same as in the other rebalance tests, remaining at about 12 percent and rising slightly before completion.

Storage processor utilization

As with the other rebalance tests, storage processor utilization remained insignificant, rising briefly to 2 percent but not much higher than normal storage processor utilization.

With a read size of 512 KB (using Release 22 plus RAID 5), 4 MB of data was read from the hot spare in 512 KB chunks and written to the repaired drive in the same manner, as shown below:

Read bandwidth during the rebalance after Jetstress averaged about 12 MB/s:

Jetstress comparison between the RAID 6 baseline and RAID 6 during hot spare rebuild

In this test:
- The achieved IOPS was 1170.996, or 40 percent lower than baseline.
- RG20 IOPS was 586.472, or 41 percent lower than baseline.
- The DB read latency for RG20 was 21 ms, or 32 percent higher than baseline.
- The DB read latency for RG21 was 14.5 ms, or 4 percent higher than baseline.

Jetstress comparison between RAID 5 and RAID 6 during hot spare rebuild

In this test:
- The achieved IOPS was 1170, or 8 percent lower than RAID 5.
- The RG20 IOPS was 586.472, or 8 percent lower than RAID 5.
- The DB read latency for RG20 was 21 ms, or 8 percent higher than RAID 5.
- The DB read latency for RG21 was 14.5 ms, or 12 percent higher than RAID 5.
- Write latencies did not change for either RAID group's database or log files.
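The percentage figures in these comparison sections (and in the earlier RAID 5 comparison) are simple ratios of the reported measurements. The sketch below reproduces that arithmetic from the numbers quoted in this paper; small differences from the quoted percentages are expected because some baselines are only given approximately (for example, "about 16 ms").

```python
# Reproduces the style of comparison used above: percentage change of a
# measured value against a reference. Values are taken from figures quoted
# in this paper; minor rounding differences are expected.

def pct_change(value: float, reference: float) -> float:
    """Percentage change of value relative to reference (negative = lower)."""
    return (value - reference) / reference * 100.0

COMPARISONS = [
    # (label, measured, reference)
    ("RAID 5 rebuild IOPS vs. RAID 5 baseline", 1279.0, 1476.0),
    ("RAID 5 rebuild RG20 read latency (ms) vs. baseline", 19.25, 16.0),
    ("RAID 6 rebuild IOPS vs. RAID 5 rebuild", 1170.0, 1279.0),
    ("RAID 6 rebuild RG20 read latency (ms) vs. RAID 5 rebuild", 21.0, 19.25),
]

if __name__ == "__main__":
    for label, measured, reference in COMPARISONS:
        print(f"{label}: {pct_change(measured, reference):+.1f}%")
```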

Conclusion

These tests demonstrate the superior ability of CLARiiON's RAID 5 and RAID 6 technologies to fail over to a hot spare when faced with a drive failure. RAID 5 was tested with four data drives and the one parity drive that is required for RAID 5 technology. RAID 6 was tested with four data drives and the two parity drives required for RAID 6 technology.

In a Microsoft Exchange environment with RAID 5, with one drive failure, the CLARiiON failed over to a hot spare and the following occurred:
- IOPS dropped slightly and latencies increased slightly during the rebuild process.
- No database was dismounted or corrupted.
- No server lost connectivity to the array.

In a Microsoft Exchange environment with RAID 6, the CLARiiON recovered from two drive failures; IOPS dropped slightly and latencies increased slightly during the rebuild process, and the following occurred:
- No database was dismounted or corrupted.
- No server lost connectivity to the array.
- It was able to recover from multiple drive failures.
- It was able to rebuild to the hot spares in 6.7 hours.

Both RAID technologies had outstanding performance during the rebalancing from the hot spare to the replacement drives. Additionally, both technologies took under 7 hours to complete the rebalance, and storage processor utilization, RAID group utilization, and response times barely registered while rebalancing.

References

EMC CLARiiON Best Practices for Performance and Availability: Release 29.0 Firmware Update Applied Best Practices