Bright Idea: GE's Storage Performance Best Practices
Brian W. Walker, Principal Architect, Cloud Solutions
Speaker Introduction
Brian Walker, Principal Architect, Cloud Solutions
Brian brings more than 12 years of storage experience gained at companies such as GE and Transamerica. He's considered an expert in architecting complex storage and network solutions that affect bottom-line profits and brand image. Brian is based in Madison, WI.
The GE Corporate Storage Infrastructure
- Supports 9 GE business units
- Primary storage suppliers:
  - EMC and Hitachi for FC storage
  - NetApp for NAS
- Cisco for FC & IP switching
- Constantly evaluating new products and technologies
- 20+ PB managed
- Over 3,000 applications
- 60% virtualized; public/hybrid cloud approach
- Driving storage consolidation by leveraging storage virtualization and dynamic tiering
Why GE Needed a Storage Performance Validation Process
Problem
- The IT team did not have a standardized benchmarking test to use when architecting and implementing storage solutions
- This resulted in inconsistent baseline metrics, as well as potential duplication of work during new-technology POCs
Goals
- Develop and implement a repeatable, scalable, and vendor-neutral storage performance analysis framework to standardize baseline metric collection for storage technologies
- Create a common documentation and presentation strategy to promote sharing of results across teams
Storage Performance Framework Project Charter
Framework Design: Storage Performance Framework (SPF)
1. Key Performance Metrics
2. Available Tools
3. SPF Benchmarking Test Suite
4. Application Workload Profiling / Emulation
5. Testing Methodology
6. Storage Performance Dashboard
Project Team Members
1. Champion
2. Project Owner
3. Storage Engineering
4. Principal Technologist
5. Enterprise Architect
6. Senior Architect
Key Milestones
- Metric and profile definition
- Tool and test build-out
- Collection and documentation
- Process integration
Open Source Tools Evaluated
IOzone
- File system based
- CLI driven
- Limited graphing and analytics natively available
- High level of complexity at enterprise scale
VDBench
- File system based
- Built on Java
- Both GUI and CLI driven
- Established in the industry
- Lacks flexibility in test definition
SWAT
- Built on Java
- Workload capturing functionality
- Integration with VDBench for replay
- 20,000 IOPS limit
- Lacks workload generation capability
All three lack the flexibility and scalability an enterprise environment requires.
Finding the Best Performance Validation Solution
GE Problem
- No product can accurately recreate production-like workloads on storage hardware within the storage environment
- This limits GE's ability to make informed decisions when introducing, or upgrading to, new storage solutions
Solution: Load DynamiX
- POC duration (just over a month): April 24 to May 31, 2013
- Installed 2 Load DynamiX FC Series appliances with workload modeling software
  - 2x 8G FC ports per device
  - Rated at 175,000 IOPS per port
- Later added Load DynamiX 1G and 10G Series appliances
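The per-port rating implies an aggregate ceiling for the FC test rig. A minimal sketch of that arithmetic, assuming the 175,000 IOPS rating holds independently on every port and that both appliances drive load at the same time; `rated_aggregate_iops` is a hypothetical helper, not part of any Load DynamiX software:

```python
def rated_aggregate_iops(appliances: int, ports_per_appliance: int, iops_per_port: int) -> int:
    """Upper bound on generated IOPS if every port reaches its rated maximum."""
    return appliances * ports_per_appliance * iops_per_port


# 2 FC Series appliances x 2 x 8G FC ports each x 175,000 IOPS per port
total = rated_aggregate_iops(appliances=2, ports_per_appliance=2, iops_per_port=175_000)
print(f"Rated aggregate: {total:,} IOPS")  # Rated aggregate: 700,000 IOPS
```

This is a rated upper bound for load generation, not a measured result.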
GE Testing Strategy: Use Cases
Baseline Storage Array Performance (SPF Test Suite)
- 100% Seq. Read 1K
- 100% Seq. Write 1K
- 100% Seq. Read 1MB
- 100% Seq. Write 1MB
- 50-50 Seq. Read/Write 1K & 1MB
- 50-50 Seq. Read/Write Ideal Block (1K-1MB)
- Quantify breaking points and application workload impacts
GE Application Workload Modeling (Transportation ERP)
- 8:1 Seq. Read/Write ratio @ 94% of I/O
- 1:1 Rnd. Read/Write ratio @ 6% of I/O
- 8K default block
- 22K read I/O size
- 28K write I/O size
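The ERP profile can be collapsed into one overall read/write split and one average I/O size, which is handy when comparing it with array-level statistics. A minimal sketch, assuming the 22K read and 28K write sizes apply to both the sequential and the random component (the slide does not say) and ignoring the 8K default block; `MixComponent` and `blended_profile` are illustrative names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixComponent:
    share_of_io: float  # fraction of all I/Os that belong to this component
    read_parts: int     # read side of the read:write ratio
    write_parts: int    # write side of the read:write ratio

    @property
    def read_fraction(self) -> float:
        return self.read_parts / (self.read_parts + self.write_parts)


def blended_profile(components, read_kb: float, write_kb: float):
    """Return (overall read fraction, average I/O size in KB) for a weighted mix."""
    total_share = sum(c.share_of_io for c in components)
    reads = sum(c.share_of_io * c.read_fraction for c in components) / total_share
    avg_kb = reads * read_kb + (1 - reads) * write_kb
    return reads, avg_kb


# Transportation ERP profile from the slide
erp = [
    MixComponent(share_of_io=0.94, read_parts=8, write_parts=1),  # sequential, 8:1
    MixComponent(share_of_io=0.06, read_parts=1, write_parts=1),  # random, 1:1
]
reads, avg_kb = blended_profile(erp, read_kb=22, write_kb=28)
print(f"reads: {reads:.1%}, writes: {1 - reads:.1%}, avg I/O: {avg_kb:.1f}K")
# reads: 86.6%, writes: 13.4%, avg I/O: 22.8K
```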
Storage Performance Framework Components
- Tools: Load DynamiX POC; open source tool evaluation
- Workload Profiling: top 20 app profiling
- Reporting: central library; Load DynamiX Enterprise Platform
- Processes: core app migration; compute lab
Storage Performance Framework Process & POC Overview
Triggers: new H/W solution, new code release, config change, new app profile
Storage Performance Benchmarking Framework (a standardized and automated approach for performance baseline collection):
1. Config / install in lab
2. Run SPF Test Suite
3. Correlate data & develop baseline
4. Upload to Storage Performance Dashboard
5. Storage Eng & Ops feedback
POC Overview
- Ability to connect multiple storage arrays at once
- Load DynamiX Validation Appliance connected through the Load DynamiX Enterprise Switch to Vendor or Configuration A and Vendor or Configuration B
Load DynamiX Results Matched Storage Vendor Results
Test criteria (all tests: 200 users, 1 hr. duration):
- Test 1: Sequential Read, 1K block / 1K file
- Test 4: Sequential Write, 1MB block / 1MB file
- Test 6: Sequential Read/Write, 1MB block / 1MB file
- Test 7: Ideal Block Size, 1K-1MB block/file

Test | Load DynamiX FC Series (4 port) IOPS | Load DynamiX Throughput | Vendor Tool (4 port) IOPS | Vendor Tool Throughput
Test 1 | 345,000 | 340 MB/s | 346,000 | 339 MB/s
Test 4 | 1,624 | 1,629 MB/s | 1,616 | 1,618 MB/s
Test 6 | 2,200 | 2,200 MB/s | 2,137 | 2,138 MB/s
Test 7 | 345,000 @ 1K; 2,200 @ 1MB | 350 MB/s @ 1K; 2,200 MB/s @ 1MB | 350,000 @ 1K; 2,260 @ 1MB | 338 MB/s @ 1K; 2,221 MB/s @ 1MB
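To put "matched" in numbers, a small sketch that computes the signed percentage gap between each Load DynamiX result and the vendor tool's result on this slide, treating the vendor tool as the reference. The Test 7 values are paired by the @1K/@1MB labels on the slide; the data layout and `pct_diff` helper are illustrative:

```python
# (metric, Load DynamiX value, vendor tool value), both 4-port runs
results = [
    ("Test 1 IOPS", 345_000, 346_000),
    ("Test 1 MB/s", 340, 339),
    ("Test 4 IOPS", 1_624, 1_616),
    ("Test 4 MB/s", 1_629, 1_618),
    ("Test 6 IOPS", 2_200, 2_137),
    ("Test 6 MB/s", 2_200, 2_138),
    ("Test 7 IOPS @ 1K", 345_000, 350_000),
    ("Test 7 IOPS @ 1MB", 2_200, 2_260),
    ("Test 7 MB/s @ 1K", 350, 338),
    ("Test 7 MB/s @ 1MB", 2_200, 2_221),
]


def pct_diff(measured: float, reference: float) -> float:
    """Signed difference of measured vs. reference, as a percent of reference."""
    return (measured - reference) / reference * 100


for name, ldx, vendor in results:
    print(f"{name:<18} {pct_diff(ldx, vendor):+6.2f}%")

worst = max(abs(pct_diff(ldx, vendor)) for _, ldx, vendor in results)
print(f"largest gap: {worst:.2f}%")
```

On these figures every gap stays under 4%; the largest is Test 7 throughput at 1K (350 vs. 338 MB/s, about 3.6%).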
Use Cases
Sample Use Cases for Load DynamiX
- Validated performance of a new SAN vendor
- Varied the drive mix to balance IOPS requirements vs. cost
- Identified array IOPS and throughput limits
- Iterated from small blocks and small transfers to large blocks and large transfers
- Optimized the port layout scheme and set guidelines for traffic per port
- Evaluated the behavior of tiering policies
- Set time expectations for when a new host or app would reach its normal mix
- Validated that SAN monitoring tools' performance stats were consistent with Load DynamiX
Application Profile KPIs
Focus on key metrics; collect data for both peak and normal load.
Application Level
- Avg. transaction response time (s)
- Avg. transaction failure rate (%)
- Avg. throughput (MB/s)
- Standard block size
Host Level
- CPU utilization (%)
- Memory utilization (%)
- I/O utilization (%)
- HBA configuration
Storage Level
- Switch: avg. throughput (MB/s)
- Array / device:
  - Avg. throughput (MB/s)
  - Avg. response time (ms)
  - Avg. IOPS (read & write)
  - Read vs. write ratio (%)
  - Sequential vs. random workloads (%)
  - Overall avg. I/O size
  - Amount of storage (GB/TB) / growth rate
  - Clustering configuration
  - Replication configuration
  - Current tier (if available)*
*What storage mix defines the tier
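Two of the storage-level KPIs above, read vs. write ratio and overall average I/O size, are derived from raw counters rather than read directly. A sketch of that derivation, assuming a monitoring interval that reports separate read and write IOPS and MB/s (decimal units); `IoSample`, `derived_kpis`, and the sample values are hypothetical, not output from any particular tool:

```python
from dataclasses import dataclass


@dataclass
class IoSample:
    """Raw counters for one monitoring interval (hypothetical structure)."""
    read_iops: float
    write_iops: float
    read_mb_s: float
    write_mb_s: float


def derived_kpis(s: IoSample) -> dict:
    total_iops = s.read_iops + s.write_iops
    total_mb_s = s.read_mb_s + s.write_mb_s
    return {
        "avg_iops": total_iops,
        "avg_throughput_mb_s": total_mb_s,
        "read_pct": 100 * s.read_iops / total_iops,
        # MB/s divided by I/Os per second gives MB per I/O; x1000 converts to KB
        "avg_io_size_kb": 1000 * total_mb_s / total_iops,
    }


# Illustrative peak interval: 900 reads/s at 22 KB, 100 writes/s at 28 KB
peak = IoSample(read_iops=900, write_iops=100, read_mb_s=19.8, write_mb_s=2.8)
print(derived_kpis(peak))
```

Running it once on a peak interval and once on a normal interval gives the two data points the profile asks for.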
Example Storage Product Validation
Applying the workload for 1 hr.:
- IOPS converged at ~95K
Example Storage Product Validation, Continued
- Latency ranged from 4 to 5.3 ms
- Throughput peaked at 1.7 GB/s
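The IOPS, latency, and throughput figures from this run can be cross-checked with two standard relations: average transfer size = throughput / IOPS, and Little's law (I/Os in flight = IOPS x response time). A rough sketch, assuming the 1.7 GB/s peak coincided with the ~95K IOPS plateau (the slides do not state this) and that the run was in steady state; the function names are illustrative:

```python
def implied_avg_transfer_kb(throughput_gb_s: float, iops: float) -> float:
    """Average data per I/O = throughput / IOPS (decimal units: 1 GB = 1e6 KB)."""
    return throughput_gb_s * 1e6 / iops


def implied_outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's law: average I/Os in flight = completion rate x time in system."""
    return iops * latency_ms / 1000


print(f"~{implied_avg_transfer_kb(1.7, 95_000):.1f} KB per I/O")
for ms in (4.0, 5.3):
    print(f"{ms} ms -> ~{implied_outstanding_ios(95_000, ms):.0f} I/Os in flight")
```

If the assumption holds, the run averaged roughly 18 KB per I/O with several hundred I/Os outstanding; if the peaks were not simultaneous, the first figure is only an upper bound.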
SPF Results: Product Comparison
Vendor 2 offers superior performance for the ERP workload.
Criteria | Vendor 1, 1 port | Vendor 2, 1 port | Vendor 1, 4 port | Vendor 2, 4 port
IOPS | 27,000 | 165,000 | 140,000 | 450,000
Throughput | 420 MB/s | 750 MB/s | 1,650 MB/s | 3,050 MB/s
Response Time | 14 ms | 1 ms | 1 ms | 0.4 ms
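The size of Vendor 2's advantage is easier to see as ratios. A short sketch over the figures in this comparison table; the dictionary layout is illustrative. Note that these summary figures appear to combine the best result from different tests (in the appendix baselines, Vendor 2's 450,000 IOPS comes from Test 3 while its 1-port 165,000 IOPS comes from Test 1), so the ratios compare peak capabilities rather than one run:

```python
# (Vendor 1, Vendor 2) figures from the comparison table
comparison = {
    "IOPS, 1 port": (27_000, 165_000),
    "IOPS, 4 port": (140_000, 450_000),
    "MB/s, 1 port": (420, 750),
    "MB/s, 4 port": (1_650, 3_050),
}

# Vendor 2 advantage per metric (higher is better for both IOPS and MB/s)
ratios = {metric: v2 / v1 for metric, (v1, v2) in comparison.items()}
for metric, r in ratios.items():
    print(f"{metric}: Vendor 2 delivers {r:.2f}x Vendor 1")

# Scaling from 1 to 4 ports within each vendor
for idx, vendor in enumerate(("Vendor 1", "Vendor 2")):
    gain = comparison["IOPS, 4 port"][idx] / comparison["IOPS, 1 port"][idx]
    print(f"{vendor}: 4-port IOPS = {gain:.2f}x 1-port IOPS")
```

Vendor 1's IOPS grow by more than 4x from 1 to 4 ports, which a simple per-port model would not predict, so port-scaling conclusions from this table deserve care.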
Why Load DynamiX over Open Source Tools?
- Comprehensive all-in-one solution
- Granular workload modeling
- True enterprise scalability
- Simpler to install and maintain
- Easier to use; no scripting
- Superior reporting tools
- Professionally supported
Benefits of Storage Performance Validation
- Standardized storage performance metrics
- Standardized storage testing methodology
- Performance limits are known before production
- No downtime due to performance problems
- Storage configurations optimized for cost/benefit
Results in accelerated innovation
Thank You!
Brian W. Walker, Principal Architect, Cloud Solutions
Result Details: Storage Performance Validation Numbers
SPF Baseline Results (Vendor 1)
Test criteria (all tests: 200 users, 1 hr. duration):
- Test 1: Sequential Read, 1K block / 1K file
- Test 2: Sequential Read, 1MB block / 1MB file
- Test 3: Sequential Write, 1K block / 1K file

Test | Load DynamiX ports | Start time | IOPS | Throughput | Resp. time
Test 1 | 1 | 4:30 PM 5-28 | 27,000 | 27 MB/s | 14 ms
Test 1 | 4 | 6:30 PM 5-28 | 140,000 | 140 MB/s | 1 ms
Test 2 | 1 | 8:30 PM 5-28 | 411 | 420 MB/s | 495 ms
Test 2 | 4 | 8:35 AM 5-29 | 1,650 | 1,650 MB/s | 121 ms
Test 3 | 1 | 10:30 AM 5-29 | 8,800 | 8.7 MB/s | 22 ms
Test 3 | 4 | 1:12 PM 5-29 | 35,000 | 35 MB/s | 6 ms
SPF Baseline Results (Vendor 2)
Test criteria (all tests: 200 users, 1 hr. duration):
- Test 1: Sequential Read, 1K block / 1K file
- Test 2: Sequential Read, 1MB block / 1MB file
- Test 3: Sequential Write, 1K block / 1K file

Test | Load DynamiX ports | Start time | IOPS | Throughput | Resp. time
Test 1 | 1 | 8:35 AM 6-3 | 165,000 | 160 MB/s | 1 ms
Test 1 | 4 | 10:27 AM 6-3 | 345,000 | 340 MB/s | 1 ms
Test 2 | 1 | 11:50 AM 6-3 | 760 | 750 MB/s | 268 ms
Test 2 | 4 | 2:01 PM 6-3 | 3,000 | 3,050 MB/s | 65 ms
Test 3 | 1 | 8:58 AM 6-4 | 150,000 | 150 MB/s | 1 ms
Test 3 | 4 | 10:05 AM 6-4 | 450,000 | 450 MB/s | 0.4 ms
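Little's law (I/Os in flight = IOPS x response time) gives a quick consistency check on the two baseline tables. A sketch, assuming each of the 200 simulated users keeps about one I/O outstanding (a closed-loop workload), in which case the product should land near 200; the data layout and `implied_concurrency` are illustrative:

```python
# (vendor, test, Load DynamiX ports, IOPS, response time in ms) from the baselines
baseline = [
    ("Vendor 1", "Test 1", 1, 27_000, 14),
    ("Vendor 1", "Test 1", 4, 140_000, 1),
    ("Vendor 1", "Test 2", 1, 411, 495),
    ("Vendor 1", "Test 2", 4, 1_650, 121),
    ("Vendor 1", "Test 3", 1, 8_800, 22),
    ("Vendor 1", "Test 3", 4, 35_000, 6),
    ("Vendor 2", "Test 1", 1, 165_000, 1),
    ("Vendor 2", "Test 1", 4, 345_000, 1),
    ("Vendor 2", "Test 2", 1, 760, 268),
    ("Vendor 2", "Test 2", 4, 3_000, 65),
    ("Vendor 2", "Test 3", 1, 150_000, 1),
    ("Vendor 2", "Test 3", 4, 450_000, 0.4),
]

USERS = 200  # users configured for every SPF baseline test


def implied_concurrency(iops: float, resp_ms: float) -> float:
    """Little's law: average I/Os in flight = IOPS x response time in seconds."""
    return iops * resp_ms / 1000


for vendor, test, ports, iops, ms in baseline:
    n = implied_concurrency(iops, ms)
    print(f"{vendor} {test} {ports}-port: {n:6.1f} in flight ({n / USERS:.0%} of {USERS} users)")
```

The 1MB read rows (Test 2) land within a few percent of 200, consistent with the one-I/O-per-user assumption. Rows reported at 1 ms or less are likely rounded and carry large relative error, and Vendor 1's 1-port Test 1 (378 in flight) does not fit the model, so treat this as a sanity check rather than a measurement.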