Hitachi VSP Array with HAF Flash. Performance of the Hitachi VSP HAF Flash Product - Benchmark and In-Production Analysis. Mark Weber, Optum Technology




Transcription:

Hitachi VSP Array with HAF Flash: Performance of the Hitachi VSP HAF Flash Product - Benchmark and In-Production Analysis. Mark Weber, Optum Technology. September 25, 2014

About the Presenter, Context, and Disclaimers: 20 years in IT, much of it in storage performance. In CMG since the mid-90s. User at West Publishing, vendor at Xiotech, user at Optum. This doc is detail-level, from the trenches, and is about supporting storage more than it is about inventing strategy. Specific interest in simple, functional, no-nonsense solutions that solve problems. The presentation is built around a managed-services environment; not HPC, not special anything. My group provides the most PB to the most users at an acceptable performance level with the smallest number of people. Can't split hairs over a few hundred microseconds: 1.3 ms, 1 ms, 0.75 ms, it's all good. Ask questions as we go. Confidential property of Optum. Do not distribute or reproduce without express permission from Optum.

Hitachi Adds All-FLASH Storage Module. Hitachi's all-FLASH module is called Hitachi Accelerated Flash (HAF). HAF is MLC FLASH on a Flash Module Drive (i.e., like a "disk"). HAF is fully integrated into Hitachi VSP storage array functions: management tool support; monitoring support; use as standard basic LDEVs, or seamless integration into Hitachi Dynamic Provisioning / Hitachi Dynamic Tiering (HDP / HDT) as VVOLs; full RAID level support across FMDs; full feature support (add to pool, shrink from pool, assign to tier, etc.). Can use Hitachi Flash Acceleration code (FA) to enhance FLASH performance.

HAF Test Configuration: 2-DKC Hitachi VSP array, 8 VSDs, 512 GB array cache. 16 x 1.6 TB HAF cards, RAID6(14+2). 12 x 200 GB SLC SSD drives, RAID5(3+1). 136 x 10K SAS in a Hitachi Dynamic Provisioning pool (HDP), RAID6(6+2). 8 Gb connected servers, 2 x dual-port Brocade 16 Gb adapters, Brocade switch.

64K Random Read Response Time: HAF vs. Spinning Disk. Random read is the quintessential workload that FLASH solves, and this is a 64K random read test. Under the exact same load at the lower queue depths, the HAF is netting out 1000% more IO than spinning 10K disk (in RAID6, HDP) at 10% of the response time. Under deeper queue depths, 550% more IOPS from HAF. Note that under the deeper queue testing the HAF was pushing 2,900 MB/sec on 4 x 8Gb HBAs, so about the max MB/sec for the server ports.
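
The "about max for the server ports" claim can be sanity-checked with simple arithmetic (my calculation, not from the deck; the ~800 MB/s-per-port figure assumes 8GFC with 8b/10b encoding):

```python
# Sanity check: how close 2,900 MB/s comes to saturating 4 x 8Gb FC ports.
# 8Gb FC uses 8b/10b line coding, so payload is roughly (Gb * 100) MB/s per port.
def fc_port_headroom(observed_mb_s, ports, gb_per_port=8):
    max_mb_s = ports * gb_per_port * 100  # theoretical payload ceiling
    return max_mb_s, observed_mb_s / max_mb_s

max_mb_s, utilization = fc_port_headroom(2900, ports=4)
print(max_mb_s, round(utilization, 2))  # 3200 MB/s ceiling, ~0.91 utilized
```

At roughly 91% of theoretical port bandwidth, the test was plausibly HBA-limited rather than HAF-limited.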

64K Random Read Response Time, HAF vs. Spinning Disk - Commentary. These results are good. The average IO size on UHG's dynamic pools is somewhat larger than you might suspect, usually 32 KB to over 100 KB per IO. Under the 128-queue-element load, this larger-IO-size load was running the array at 19% CPU, so a tripling of CPU should allow a possible 135,000 64K random reads. But a tripling of the corresponding 2,900 MB/sec would approach 9,000 MB/sec; I have not seen our VSPs push that much throughput, although HDS has test results that reach this level.
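
The extrapolation above assumes linear CPU scaling (an assumption, not a measurement); the ~45,000 IOPS starting point is my back-calculation from 2,900 MB/s of 64K reads:

```python
# Rough linear extrapolation of the 64K random-read results from the
# observed 19% VSD CPU busy up to roughly triple that load.
def scale_linearly(observed, factor):
    return observed * factor

iops_at_19pct = 45_000   # approx. 64K random reads at the 128-queue point
mb_s_at_19pct = 2_900    # measured throughput at the same point

print(scale_linearly(iops_at_19pct, 3))  # 135,000 IOPS, as on the slide
print(scale_linearly(mb_s_at_19pct, 3))  # 8,700 MB/s, approaching 9,000
```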

64K Random Write, HAF vs. Spinning Disk. IOPS are about 2x to 3x higher on 64K random writes using HAF. Response time is also about 1/2 to 1/3 with HAF. Note that because the array cache absorbs and buffers server writes in both cases, by a queue depth of 16 the array groups ran at 100% busy for the remainder of the test.

32K 50-50-50-50 r-w-r-s: HAF vs. SSD vs. 10K SAS. This is my IO fidelity test: complex IO with a mixed read/write, random/sequential profile, designed not to find some max IOPS or MB/sec number but rather response time sensitivity at low queue depths. The HAF is 10% to 20% faster than SSD at low queue depths. All flash is 400% to 500% faster than spinning disk here. HAF performs better than the SLC SSD drives in this test, so "same or better" is satisfied. There were 16 HAF FMDs under test and only 12 SSDs; having 33% more drives in the HAF case explains some of the HAF advantage under deeper loads. Spinning disk, while performing as designed, performs badly when compared to flash, also as designed.

HAF and VSP VSD CPU Scalability. Port IOPS: 4 ports, all evenly busy, doing ~80,000 IOPS each. This array has 70-05-05 Flash Acceleration (FA) code enabled. (The FA feature is detailed in another paper, HDS_VSP_FlashAccelerationTest_FA_MarkWeber_v3.pptx.)

HAF Random Read @ 321,000 IOPS. When all you have is a hammer, everything looks like a nail; and when all you have is Iometer, every LUN looks like something to test! Here is Iometer doing 321,000 4K IOPS @ 1.6 ms. Eight LUNs are used, and this load represents 64 threads * 7 = 448 concurrent IO threads.
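
The concurrency and throughput math behind this test, made explicit (reading "64 threads * 7" as 7 workers at 64 outstanding IOs each is my interpretation):

```python
# Concurrency and implied throughput for the 321,000 x 4K IOPS test.
workers, queue_depth_each = 7, 64
concurrent_ios = workers * queue_depth_each
throughput_mb_s = 321_000 * 4 / 1024  # 4 KB IOs converted to MB/s

print(concurrent_ios)          # 448 concurrent IO threads
print(round(throughput_mb_s))  # ~1254 MB/s of 4K random reads
```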

VMware/Citrix Write-Intensive Load, 1 of 2: 12,000 IOPS virtualization load on a production VSP. We have VSPs in production today with spinning disks running pools of storage that do all random writes as a result of Citrix and VMware. This next test emulates that workload mix (IO size, r/w/r/s breakdown) and runs the IO load up to the level seen in production; the IO response time is then taken from this test for comparison purposes. CHA3 server BLDWP0098 does this load every day: 12,000 random writes (left chart), with a total random write MB/sec of 85, or an IO size of 7,089 bytes, so probably 8 KB writes. Write response times (right chart) on CHA3GHD1_54665 pool 4 are very good at 0.2 ms, indicating a full cache hit. This is prod work against spinning disk in a VSP (which includes array cache), with no FLASH in the array at this time.
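
The average-IO-size figure falls out of throughput divided by IOPS (my arithmetic on the slide's rounded 85 MB/s lands at 7,083 bytes; the slide's 7,089 presumably came from unrounded inputs):

```python
# Back-of-envelope average IO size from the production counters:
# bytes per IO = throughput / IOPS.
def avg_io_bytes(mb_per_sec, iops):
    return mb_per_sec * 1_000_000 / iops

size = avg_io_bytes(85, 12_000)
print(round(size))  # ~7083 bytes, i.e. roughly 8 KB writes
```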

VMware/Citrix Write-Intensive Load, 2 of 2. With 5 Iometer workers running I hit 12,200 IOPS @ 0.4 ms RT. Note that my test was a full-seek test; I am not sure what the data seek range of the real virtualization server is, but it was probably not fully synthetic random (which affects the amount of cache hits and page reuse). This test shows HAF would easily support this heavily random virtualization load. To reiterate: my load was fully synthetic for the random write part; I suspect the prod server would hit a subset of the capacity pages during a time window. The lab HAF test also had a limited CLPR size, so more IO had to go to the back end in my test.

HAF and the Write Cliff - No Cliff Here. This is 2.5 hours of 64K fully synthetic random write over the entire address space, at a queue depth of 32. See the clip above; the pool was 99% filled. The write cliff apparently happens when the flash controller has trouble writing dirty pages to flash and gets behind. When that happens, write IO processing slows dramatically and IO drops off significantly, as if falling off a cliff. A write cliff was not seen on HAF.

HAF RAID5(7+1) vs. RAID6(14+2). This is a very simple comparison of 32K 50-50-50-50 r-w-r-s IO to two AGs of eight FMDs in R5(7+1) vs. one AG of 16 FMDs as R6(14+2). Generally speaking the R5 vs. R6(14+2) results are very comparable, but under deeper queue depths the extra disk cycles needed for RAID6 eventually impact realized server IO. This test was only on 16 FMDs, so there is quite an amount of disk contention at the Queue=16 and Queue=32 data points. At Queue=32 the R5 test was doing 909 complex server IOPS per FMD; R6 was doing 733 IOPS per FMD.
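
Since both layouts span the same 16 FMDs, the per-FMD numbers quoted at Queue=32 can be turned back into totals and a relative penalty (my arithmetic, not figures from the slide):

```python
# Reconstruct total IOPS and the R6 penalty from the per-FMD figures.
fmds = 16
r5_total = 909 * fmds  # two R5(7+1) array groups
r6_total = 733 * fmds  # one R6(14+2) array group

print(r5_total, r6_total)       # 14544 vs 11728 complex server IOPS
print(round(1 - 733 / 909, 2))  # R6 gives up ~19% at this queue depth
```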

HAF Drive Rebuild Time Under Load, 1 of 2. This test runs a control load of 70-30-90-10 r-w-r-s at 48K, at a load level sufficient to push HAF response time to 1 ms+. One HAF FMD will be failed, and the IO and response time levels recorded while running on the parity rebuild, while running on the hot spare, and during the failback. [Chart annotations: 45% busy before failure; failure here.] This is a complex IO load running @ ~1 ms: the control load before the drive failure, with AGs running in the 45% busy range.

HAF Drive Rebuild Time Under Load, 2 of 2. The impact of failing a drive on running server IO was minimal, as seen in this Windows Perfmon data taken from the server. The sparing finished 5 hours after it was started, and the whole time the AGs remained under 46% load. This rebuild time was impressive, since the rebuild time under idle IO conditions (100% of pages touched) was also 5 hours.

HAF and Hitachi Flash Acceleration (FA). Flash Acceleration is code that provides a deeper queue of work to flash or SSD, along with some other optimizations. My testing of FA shows that it works. The chart to the right shows Flash Acceleration code yielding about a 10% to 15% CPU improvement. UHG should enable Flash Acceleration on our VSP storage arrays that use SSD or flash. Summary of gains with Flash Acceleration: IOPS increase of 4% to 45%; response time reduction of 4% to 50%; AG % busy reduction of 0% to 10%; VSD CPU % busy reduction of 10% to 15%.

HAF Integrated Into VSP Day-to-Day Function. HAF works just like you would expect it to in the VSP array; it is like any other disk. Add to pool, shrink from pool. Create LDEV, delete LDEV. Storage Navigator LUN management, add paths, rename LUNs. Add/change CLPR (cache logical partition). Assign/move HAF LUNs to different VSD CPU cards. Works as expected in TnM for performance reporting and alerting. Supports encryption. Array group naming/numbering conventions are as typical with VSP; physical ordering/naming of AGs in the FBX box is from left to right, whereas the DKU orders and names disks from right to left. This fact has zero impact on usability. Full support for all RAID levels at GA (we have tested R5(7+1) and R6(14+2)).

HAF in Production Lifecycle, aka Back End. Array config: 16 FMDs (bottom right DKU (FBX)). Each of 4 pools has 1 x R5(3+1) with 1.6 TB FMDs, for 4.5 TB of FLASH (3.2 TB FMDs are available now). Pools are 270 TB total size, so about 1.6% of pool capacity is FLASH.

HAF @ 3.2 TB? Just a thought: 3.2 TB FMDs are the same processors with double the capacity. If our 1.6 TB HAF FMD cards are 98% full now and 70% AG busy, what happens when FMDs double in size to 3.2 TB?

FLASH in Production - More Context. These are general-purpose managed services arrays. We don't manage at the server level, only if we get threshold alerts. Three FTE equivalents manage array performance across 60+ arrays. 80,000+ hard drives in our Hitachi arrays; 40,000 servers in our environment: 3,500 Unix, 11,500 Windows, 3,000 Linux, 22,000 VMs. We strive for high-enough performance at a reasonable cost. Capacity, performance, cost: pick 3. IO has gone from 25 ms to 10 ms to 3 ms over our careers, someday to be 1 ms, then sub-millisecond. New frontiers are commercialization, white box, and object space; the world is flipping, the world wants this stuff now. Vendor sprawl: in the last year or two the number of products to support has grown from 2 or 3 to 5 or 6. Acquisitions, politics. There might be some things in these charts that are not pristine and perfect; something is always boiling over somewhere.

FLASH: The Problem We Are Solving For. This access pattern is somewhat universal in our environment. Random reads in the blocks highlighted in the picture win big. Not all data is created equal.

Early VSP Build Strategy - Go Big. Fully populated 6-bay, 2,048 drives: 130 SSD (200 GB SLC) and 1,918 spinning 10K 2.5-inch. 512 GB cache; for the last year now, 1 TB cache. What about random read cache misses?

Technology PLM: Servers vs. Storage. [Chart: storage performance vs. server performance over time.] New server CPU cores and larger RAM hit storage hard, along with the increase in data capacity. There is always a bottleneck somewhere. Is storage's answer this time a higher percentage of flash, or an All Flash Array (AFA)?

The FLASH Is Busy. Array groups from all three tiers in a pool; FLASH is the busiest. This means FLASH is working: the closer to 100% FLASH gets, the happier we are.

FLASH in Production - VSP Behind 2 USPVs. We folded 4 fully populated USPs into this VSP, behind 2 USPVs. As a percent of total capacity, FLASH is 2% and NLS is 64%. The beauty of HDT: 41% of IO to disk is serviced from the 2% of FLASH, and only 5% of IO from the 64% of NLS. IO density is packed on SSD, almost nonexistent on NLS, with 10K SAS in the middle. On average (daily average), IO response time is good.
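
The tiering claim can be expressed as an IO-density ratio, share of IO served divided by share of capacity (the percentages are from the slide; the ratios are my arithmetic):

```python
# IO density: how much more (or less) IO a tier serves than its
# capacity share would predict.
def io_density(io_share_pct, capacity_share_pct):
    return io_share_pct / capacity_share_pct

print(io_density(41, 2))            # FLASH: 20.5x its capacity share
print(round(io_density(5, 64), 2))  # NLS: ~0.08x, almost no density
```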

FLASH in Production - VMware Array. Pretty consistent 80,000 IOPS, with more writes than reads, which is typical. The FLASH array groups are the busiest.

FLASH in Production - VMware Array. On average (daily average), IO response time is good.

FLASH in Production - Our Busiest GP Array (65496). 160,000 IOPS, 6,000 MB/sec. Read and write response times mostly in the 0.5 to 5 ms range.

FLASH in Production - Busiest GP Array (65496). 200 GB SSDs in this array: 1% of the capacity does 43% of the IOPS on the back end. That will work. Response times in this case are skewed by the 21:00 response time events in the previous chart.

GP Array, No FLASH. On average (daily average), IO response time is sufficient, but higher than on arrays with FLASH in them.

Rolled Up: 27 HDS Arrays @ 22 PB. Across 27 arrays, host IOPS are ~1 million, or 83 billion host IOs per day. 2% of the capacity does 31% of the work. 4.6 ms reads and 1 ms writes as an across-the-board average seems pretty good for a large, complicated environment.
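
Converting between the rate and the daily total is straightforward (my arithmetic; the slide's 83 billion per day implies an average a bit below the ~1M IOPS figure):

```python
# Fleet-wide daily IO totals from the per-second rate, and the average
# rate implied by the slide's daily figure.
seconds_per_day = 24 * 60 * 60

print(1_000_000 * seconds_per_day)    # 86.4 billion/day at a flat 1M IOPS
print(round(83e9 / seconds_per_day))  # ~960,648 average IOPS implied
```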

Summary Comments. We are pleased with the contribution of FLASH to our performance environment. A lot of IO is moved from spinning disk to FLASH, which lightens the load for the IO remaining on those spinning disks; everybody is better off. In the end, all that matters is response time, and FLASH helps here. We have upped oversubscription with a new touched-capacity goal of 80% or 90%, which puts more IO on the arrays. Flash has helped us scale to other array performance limits.

Thank You. Contact information: Mark Weber, Performance Architect, 612-991-5404, Mark_weber@optum.com