Most Trusted Names in Data Centre Products Rely on Calsoft!
September 2015
Calsoft Webinar - Debunking QA Myths for Flash-Based Arrays
Agenda
- Introduction to Types of Flash-Based Arrays
- Challenges in Flash-Based Array Testing
- Key Features Testing
  - Flash Translation Layer (FTL) Testing
  - Garbage Collection (GC) Testing
  - Discard/Unmap Testing
- Performance Testing
  - Using Synthetic Tools
  - Enterprise Applications
  - Using Pre-configured System Parameters
- Application Testing
- Customer Centric Testing
- QA Metrics
Flash-Based Arrays - Types
- An all-flash array, also referred to as a solid-state array, is an enterprise storage array that contains multiple solid-state disks (SSDs) in place of spinning hard drives. The nature of flash storage allows for much faster data transfer rates than traditional spinning disks.
- A hybrid flash array is a storage system that mixes flash memory drives and hard disk drives. It adds a thin slice of flash storage to an array, thereby increasing IOPS and reducing read latency.
Challenges in Flash-Based Array Testing
- Understanding features such as FTL, GC and Discard for performance benchmarking
- Building an automation framework for testing and Continuous Integration
- Customer-centric testing: understanding the user/customer environment
- Compliance testing
- White-box testing using error injection
Challenges in Flash-Based Array Testing
Challenges in performance testing:
- Evaluating tools for performance benchmarking
- Choosing the right SSD type, such as SLC or MLC
- Generating massive yet realistic loads for stress testing
- Creating an array state with characteristics similar to an aged flash storage array
- Stressing the array with realistic emulations of typical supported workloads
- Stressing specific flash array features, such as snapshots, consistency groups, fail-over, replication and backups
Key Features: Introduction and Testing
What is the Flash Translation Layer (FTL)?
The Flash Translation Layer (FTL) provides:
- Address mapping (LBA to PBA mapping, i.e. logical to physical block addressing)
- Write amplification reduction algorithm
- Garbage collection algorithm and policies
- Bad block management
- Protection against power loss
FTLs exist at different levels:
- SSD/flash drive level
- Flash array level (typically a software layer)
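The address-mapping role of an FTL can be illustrated with a toy model. This is a minimal sketch, not any vendor's design: it assumes page-granular mapping and out-of-place writes, and the class name `TinyFTL` and its fields are hypothetical.

```python
class TinyFTL:
    """Toy page-mapped FTL: logical page -> physical page, out-of-place writes."""

    def __init__(self, physical_pages):
        self.l2p = {}                             # logical page -> physical page
        self.free = list(range(physical_pages))   # physical pages ready to be written
        self.invalid = set()                      # physical pages holding stale data

    def write(self, lpn):
        """Write logical page `lpn` to the next free physical page."""
        if not self.free:
            raise RuntimeError("no free pages; garbage collection is needed")
        old = self.l2p.get(lpn)
        if old is not None:
            self.invalid.add(old)                 # overwrite: the old copy becomes stale
        ppn = self.free.pop(0)
        self.l2p[lpn] = ppn
        return ppn

    def read(self, lpn):
        """Return the physical page for `lpn`, or None if it is unmapped."""
        return self.l2p.get(lpn)


ftl = TinyFTL(physical_pages=8)
first = ftl.write(5)    # logical page 5 lands on physical page 0
second = ftl.write(5)   # the overwrite goes to physical page 1; page 0 is now stale
print(first, second, ftl.read(5), sorted(ftl.invalid))
```

The stale pages accumulated in `invalid` are what a garbage collector later has to reclaim, which is why overwrite-heavy tests exercise the FTL so directly.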
Flash Translation Layer (FTL) Testing
- Write amplification (caused by excessive overwrites)
- Data integrity
- Feature testing such as de-duplication and compression
- Load, stress and scalability
- Boundary conditions (such as device full, or very small IOs at a high rate)
- Performance of the FTL
- White-box testing using automation scripts
- Error injection and handling
- Running a static code analyzer such as Coverity
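When testing write amplification, the usual figure of merit is the ratio of bytes physically written to flash to bytes written by the host. The sketch below assumes both counters are available to the test (for example from array or drive statistics); the function name and the sample numbers are illustrative only.

```python
def write_amplification(host_bytes_written, flash_bytes_written):
    """Write amplification factor = flash writes / host writes."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


GIB = 2**30
# Illustrative numbers: the host wrote 100 GiB, the flash absorbed 250 GiB.
waf = write_amplification(100 * GIB, 250 * GIB)
print(f"write amplification factor: {waf:.2f}")
```

A factor close to 1 means little extra internal writing; an overwrite-heavy test is expected to push it higher.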
What is Garbage Collection?
A process by which the SSD management layer works across the array/drive to:
- Identify cells (write units) that are only partially filled with valid data
- Collate the valid data
- Rewrite it to a new cell, and
- Delete and/or reclaim the freed cell for new writes
Benefits:
- Increases write performance by making free cells available
- Supports wear leveling by enabling erase cycles
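The identify, collate, rewrite and reclaim steps can be sketched as follows. The victim-selection policy here is a greedy one (pick the unit with the least valid data), which is a common textbook choice and an assumption, not something the slides specify. The function names and the block map are hypothetical.

```python
def pick_victim(valid_pages_per_block):
    """Greedy policy (assumed): choose the block with the fewest valid pages."""
    return min(valid_pages_per_block, key=valid_pages_per_block.get)


def garbage_collect(valid_pages_per_block, pages_per_block):
    """Return (victim, pages_copied, pages_reclaimed) for one GC pass."""
    victim = pick_victim(valid_pages_per_block)
    copied = valid_pages_per_block[victim]   # valid data that must be rewritten elsewhere
    reclaimed = pages_per_block - copied     # net free space gained after the erase
    return victim, copied, reclaimed


# Illustrative state: three blocks of 64 pages each, with differing amounts of valid data.
state = {"A": 60, "B": 12, "C": 64}
result = garbage_collect(state, pages_per_block=64)
print(result)
```

The `copied` count is the extra internal write traffic that a GC pass generates, so it ties directly into write-amplification and write-performance measurements.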
Garbage Collection Working: A Pictorial View
Garbage Collection (GC) Testing
- Functionality test
- Performance impact using automation
- Changed/configured number of threads
- Impact in situations such as disk fill and heavy discards
- Testing and tuning garbage collection kick-on and kick-off times
- Continuous heavy IO and its impact on write performance
Discard/Unmap/TRIM
- A discard command allows an operating system to inform a solid-state drive (SSD) which data blocks are no longer in use
- Types of discard:
  - Normal: reads of discarded blocks return "don't care" data
  - Secure: SHOULD ensure the blocks are zeroed out
- Discard is used as an opportunity to TRIM the data blocks on the SSDs
- Trimming reduces garbage collection overhead, which would otherwise significantly slow down future write operations
- Enables better erase cycles, increasing the lifespan of SSDs
- SCSI provides the UNMAP command (the full analog of TRIM)
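Why discard lowers garbage collection overhead can be shown with a small model: without a discard, the storage layer still treats deleted data as live and must relocate it. This is a simplified sketch; the function and variable names are hypothetical and the LBA values are illustrative.

```python
def pages_to_relocate(block_lbas, live_lbas):
    """LBAs in a victim block that GC must copy because they are still mapped."""
    return [lba for lba in block_lbas if lba in live_lbas]


victim_block = [10, 11, 12, 13]   # LBAs stored in the block chosen for GC
live = {10, 11, 12, 13, 20}       # LBAs the storage layer believes are in use

before = pages_to_relocate(victim_block, live)

# The host deletes a file and issues a discard for LBAs 11-13.
live -= {11, 12, 13}
after = pages_to_relocate(victim_block, live)

print(len(before), len(after))    # copies needed before and after the discard
```

In this model the discard cuts the relocation work for the block from four pages to one, which is the effect a discard performance test tries to observe under load.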
Discard/Unmap/TRIM Testing
- Using tools like FIO and Iometer while deleting a LUN or writing zeros on the FTL
- Using the TRIM command through Windows-based initiators
- Impact on IO while discard/unmap is in progress
- Verifying discard performance while the SUT is loaded/stressed with other IOs
Performance Testing
Load Generation and Performance Using Synthetic Tools
- Load generator: a tool used to simulate a desired load and help reveal performance issues
- Tools like FIO, Iometer, vdbench
- Single-click automation of tools for performance benchmarking
- Adjusting the knobs (such as number of R/W threads, queue depth, block size (bs), etc.) to simulate certain types of workloads
- Deduplication testing using tools like b-test
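The "knobs" can be captured in a small wrapper that builds an fio command line, which is one way to approach single-click automation. This is a sketch under assumptions: the function name and default values are illustrative, `/dev/sdX` is a placeholder target, and the command is only printed, not executed, since running fio with writes against a raw device destroys its data.

```python
import shlex


def fio_command(name, target, rw="randrw", bs="4k", iodepth=32,
                numjobs=4, rwmixread=70, runtime_s=60):
    """Build an fio argument list from the workload knobs."""
    return [
        "fio",
        f"--name={name}",
        f"--filename={target}",
        f"--rw={rw}",                # access pattern: read, write, randread, randrw, ...
        f"--bs={bs}",                # block size
        f"--iodepth={iodepth}",      # queue depth per job
        f"--numjobs={numjobs}",      # number of parallel workers
        f"--rwmixread={rwmixread}",  # read percentage for mixed workloads
        "--ioengine=libaio",         # Linux asynchronous IO engine
        "--direct=1",                # bypass the page cache
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",
    ]


cmd = fio_command("oltp-like", "/dev/sdX", bs="8k", iodepth=16)
print(shlex.join(cmd))
```

Sweeping `bs`, `iodepth` and `numjobs` over a grid and recording the results per combination gives a simple benchmarking matrix.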
Performance Benchmarking Using Enterprise Applications
- Benchmarking: a defined workload and measurement methodology whose characteristics do not change
- Standards bodies offering benchmarks include SPC, SPEC and SNIA SSSI
- Current industry storage benchmarks include applications/tools such as TPC, SPC-1/2/C/E and SPEC SFS
- These tools generate:
  - OLTP workloads with characteristics similar to an Exchange or messaging environment
  - Distinct workloads such as large file processing, video on demand and large database queries
  - Load that measures the request-handling capabilities of file servers using file share protocols such as NFSv3/v4 and CIFS
Performance with Pre-configured System Parameters
- Published IOPS figures are typically for 512-byte, block-size-aligned IO
- With a change in block size (bs) to 4k or 8k, IOPS drops by 20-40%
- Testing against unaligned block sizes such as 11k, 13k, 17k, etc.
- Sector-unaligned IOs using tools like FIO, Iometer, etc.
- Changing the number of read/write threads, RAID chunk size and bs for performance tuning
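Two small calculations help when interpreting these tests: whether an IO is aligned to a given unit, and what a given IOPS figure means in throughput terms. The sketch below assumes a 512-byte sector and a 4 KiB internal page as the alignment units. The IOPS numbers are purely illustrative, chosen to reflect a 30% drop.

```python
SECTOR = 512    # bytes; assumed logical sector size
PAGE = 4096     # bytes; assumed internal page size


def is_aligned(offset_bytes, size_bytes, unit):
    """True if both the start offset and the length are multiples of `unit`."""
    return offset_bytes % unit == 0 and size_bytes % unit == 0


def throughput_mib_s(iops, block_size_bytes):
    """Convert IOPS at a given block size to MiB/s."""
    return iops * block_size_bytes / 2**20


# An 11k IO at offset 0 is sector-aligned but not 4 KiB-aligned.
sector_ok = is_aligned(0, 11 * 1024, SECTOR)
page_ok = is_aligned(0, 11 * 1024, PAGE)

# Illustrative: 100,000 IOPS at 512 B versus 70,000 IOPS at 4 KiB.
small = throughput_mib_s(100_000, 512)
large = throughput_mib_s(70_000, 4096)
print(sector_ok, page_ok, small, large)
```

With these sample figures the larger block size moves more data per second despite the lower IOPS, so both metrics are worth reporting side by side.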
Application Testing
Flash is best for IO- or data-intensive applications, as it delivers better IOPS than traditional HDD-based storage systems. Data-intensive applications are best served on flash-based storage. For example:
- Oracle databases such as Oracle RAC
- MS SQL, MSCS, Hyper-V clusters
- DB2, Sybase and other database applications
- NoSQL databases such as Membase, CouchDB, etc.
- ERP/CRM applications such as SAP
- Applications such as audio, video and image editing software
Customer Centric Testing
- Simulating customer environments
  - ESXi clusters and their features such as snapshots, clones, etc.
  - VDI using Login VSI
  - MSCS and Hyper-V clusters and their features
  - Citrix Xen and VMware Horizon for different hypervisors such as Hyper-V, KVM, etc.
- Production use case based testing
  - Testing a boot storm of 2000+ VMs using VDI with VAAI on and off
- Building production hardware configurations to test
  - Maximum number of paths, using multiple FC and iSCSI switches with a matrix of connections
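For the maximum-paths test, the expected path count per LUN can be derived from the connection matrix before the hardware is built. This is a rough sketch: it assumes every allowed initiator-port/target-port pair yields one path, and the port names and the `zones` set are hypothetical.

```python
from itertools import product


def enumerate_paths(initiator_ports, target_ports, zones=None):
    """Return the (initiator, target) pairs that form a path.

    zones: optional set of allowed pairs; None means a full mesh in which
    every initiator port can reach every target port.
    """
    pairs = list(product(initiator_ports, target_ports))
    if zones is not None:
        pairs = [p for p in pairs if p in zones]
    return pairs


host_ports = ["hba0", "hba1"]
array_ports = ["ctrlA-p0", "ctrlA-p1", "ctrlB-p0", "ctrlB-p1"]

full_mesh = enumerate_paths(host_ports, array_ports)

# A restricted layout in which each HBA reaches one port per controller.
zones = {("hba0", "ctrlA-p0"), ("hba0", "ctrlB-p0"),
         ("hba1", "ctrlA-p1"), ("hba1", "ctrlB-p1")}
zoned = enumerate_paths(host_ports, array_ports, zones)

print(len(full_mesh), len(zoned))
```

Comparing the computed count against what the host multipath layer reports is a quick sanity check on the switch and cabling matrix.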
QA Metrics
Some extra metrics that need to be followed:
- Number of test cases
- Tests / KLOC
- Code coverage (Coverity, lcov, gcov)
- Defect density
- Automation coverage
- Error injection and white-box TCs
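Several of these metrics are simple ratios. The sketch below uses common definitions (defects per thousand lines of code, tests per thousand lines of code, automated tests as a share of all tests). These definitions and the sample numbers are assumptions, since the slides do not spell them out.

```python
def per_kloc(count, lines_of_code):
    """Normalise a count per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return count / (lines_of_code / 1000)


def automation_coverage(automated_tests, total_tests):
    """Fraction of test cases that are automated."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return automated_tests / total_tests


# Illustrative project: 150,000 lines of code, 1,200 test cases, 45 defects.
loc, tests, automated, defects = 150_000, 1_200, 900, 45

tests_per_kloc = per_kloc(tests, loc)
defect_density = per_kloc(defects, loc)
coverage = automation_coverage(automated, tests)
print(tests_per_kloc, defect_density, coverage)
```

Tracking these figures per release shows trends that the raw counts hide.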
Calsoft, Inc. 4633 Old Ironsides Drive, Suite 408, Santa Clara CA 95054 Phone: (408) 834 7086