Solving the I/O Bottleneck in NextGen Data Centers & Cloud Computing
Are SSDs Ready for Enterprise Storage Systems?

Anil Vasudeva, President & Chief Analyst, IMEX Research
imex@imexresearch.com | 408-268-0800

© 2007-11 IMEX Research. All rights reserved. Copying prohibited; please write to imex@imexresearch.com for authorization.
Abstract
Solving the I/O Bottleneck in NextGen Data Centers & Cloud Computing

Virtualization brings tremendous advantages to data centers of all sizes, large and small. But the rapid proliferation of virtual machines (VMs) per physical server creates, in its wake, a highly randomized I/O stream that raises performance bottlenecks. To address this I/O issue, the IT industry, and storage administrators in particular, have begun adopting a slew of newer technology solutions: faster and larger cache memories; SSDs for cost-effective, higher IOPS; high-bandwidth (10GbE) networks using NPIV for virtualized connections between VMs and shared storage systems; and embedded intelligence software to optimize various VM workloads, all in order to meet SLA metrics for performance, availability, cost, etc.

Are you aware of the side effects created by server virtualization, and are you prepared to ask pertinent questions of your suppliers of IT infrastructure equipment, storage virtualization and data storage management software, and of their implementations, so as to achieve targeted results in performance, availability, scalability, interoperability and data management in your virtualized data centers?

This presentation provides an illustrative view of the impact of server virtualization on existing storage I/O solutions and best practices. It delineates the roles, capabilities and cost effectiveness of emerging technologies in mitigating I/O bottlenecks, so that IT infrastructure implementers can achieve their targeted performance under various workloads from their storage systems in virtualized data centers.
Agenda
- Data Centers & Cloud Infrastructure
- Cloud Computing Architecture
- Performance Metrics by Workload
- Anatomy of Data Access
- Data Center Performance Bottlenecks
- Improving Query Response Time in OLTP
- Role of SSD in Improving the I/O Performance Gap
- SCM: A New Storage Class Memory
- SSDs' Price Erosion & IOPS/GB
- Choosing SSD vs. Memory to Improve TPS
- New Storage Usage Hierarchy in NGDC & Clouds
- I/O Bottleneck Mitigation in Virtualized Servers
- I/O Forensics for Auto Storage-Tiering
- Apps Benefitting from Improved I/O
- Key Takeaways
- Acknowledgements
Data Centers & Cloud Infrastructure
[Diagram: end-to-end infrastructure connecting users to data center tiers]
- Access: remote/branch offices, home networks (cable/DSL), cellular wireless, supplier/partner VPNs and Web 2.0 social networks (Facebook, Twitter, YouTube) reach the data center through ISPs and the Internet core/edge (optical).
- Tier-1 Edge: caching, proxy, firewall, SSL, IDS, DNS, load balancing and web servers.
- Tier-2 Applications: application servers for HA, file/print, ERP, SCM, CRM, plus directory, security, policy management and middleware platform services.
- Tier-3 Database: database servers, middleware and data management attached to FC/IP SANs; Layer 2 and Layer 4-7 switches, 10GbE and FC storage interconnect.
- Deployment models: enterprise virtualized data center (on-premise cloud), public cloud centers and vertical clouds delivering IaaS, PaaS and SaaS.
Source: IMEX Research - Cloud Infrastructure Report 2009-11
IT Industry's Journey - Roadmap (SIVAC)
- Standardization: standard IT infrastructure with volume economics in hardware and system software (servers, storage, networking devices; OS, middleware & data management software).
- Integration/Consolidation: integrate physical infrastructure/blades to meet CAPSIMS - Cost, Availability, Performance, Scalability, Interoperability, Manageability & Security.
- Virtualization: pools resources; provisions, optimizes and monitors; shuffles resources to optimize delivery of various business services.
- Automation: automatically maintains application SLAs (self-configuration, self-healing, self-accounting/chargeback, etc.).
- Cloudization: on-premises > private clouds > public clouds; data center to cloud-aware infrastructure & apps; cascading migration to service providers/public clouds.
Source: IMEX Research - Cloud Infrastructure Report 2009-11
Cloud Computing Architecture
[Diagram: enterprise cloud computing stack spanning private, hybrid and public clouds (service providers)]
- SaaS: applications, each governed by its own application SLA.
- PaaS: platform tools & services (EJB, Ruby, Python, .NET, PHP), operating systems and management.
- IaaS: virtualization of resources (servers, storage, networks).
An application's SLA dictates the resources required to meet its specific requirements for availability, performance, cost, security, manageability, etc.
Performance Metrics by Workload
[Chart: workloads plotted by IOPS (proportional to 1/latency, 100 to 1,000K) vs. bandwidth (1 to 500 MB/sec)]
- Transaction processing (TP, OLTP, eCommerce): IOPS-intensive; typically RAID 1, 5, 6.
- Data warehousing, business intelligence, OLAP: mid-range IOPS and bandwidth.
- HPC, scientific computing, imaging, audio/video: bandwidth-intensive; typically RAID 0, 3.
- Web 2.0: spans both dimensions.
IOPS here means IOPS at a required response time (ms): IOPS = #Channels x (1/Latency).
Source: IMEX Research - Cloud Infrastructure Report 2009-11
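A minimal sketch of the slide's sizing relationship, IOPS = #Channels x (1/Latency); the channel counts and latency targets below are illustrative assumptions, not figures taken from the chart.

```python
def achievable_iops(channels: int, latency_ms: float) -> float:
    """IOPS at a required response time: #Channels x (1 / Latency)."""
    return channels * (1000.0 / latency_ms)  # convert ms latency to a per-second rate

# Illustrative (assumed) configurations, not values from the chart:
for channels, latency_ms, workload in [
    (8, 5.0, "OLTP target at 5 ms"),
    (4, 20.0, "Data warehousing scan at 20 ms"),
]:
    print(f"{workload}: {achievable_iops(channels, latency_ms):,.0f} IOPS")
```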
Anatomy of Data Access
[Chart: server processor performance vs. disk I/O performance, 1980-2010, showing a widening I/O gap]
- In the time it takes to complete one disk operation, millions of CPU operations and hundreds of thousands of memory operations can be performed.
- Anatomy of data access: time taken by CPU, memory, network and disk for a typical I/O operation.
- A 7.2K/15K rpm HDD can do roughly 100/140 IOPS.
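A rough back-of-the-envelope model behind the ~100/140 IOPS figures: random I/O rate is bounded by average seek time plus half a rotation. The average seek times below are assumed, typical values, not measurements from the slide.

```python
def hdd_random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Approximate random IOPS: 1 / (avg seek + avg rotational latency)."""
    rotational_latency_ms = (60_000.0 / rpm) / 2  # half a revolution, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms

# Assumed average seek times (illustrative):
print(f"7.2K rpm: ~{hdd_random_iops(7200, 6.0):.0f} IOPS")   # ~98
print(f"15K rpm:  ~{hdd_random_iops(15000, 5.0):.0f} IOPS")  # ~143
```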
Data Center Performance Bottlenecks
[Diagram: clients (Windows, Linux, Unix) -> LAN access networks -> web, application and database servers -> storage I/O access -> storage]
- User bottlenecks: connectivity timeouts, workload surges.
- Application bottlenecks: excessive locking, data contention, I/O delays/errors.
- Network I/O bottlenecks: network congestion, dropped packets, data retransmissions, timeouts, component failures.
- Server bottlenecks: lack of server power, I/O wait & queuing, CPU overhead, I/O timeouts.
- Storage I/O connect bottlenecks: lack of bandwidth, overloaded PCIe connect, storage device contention.
- Device bottlenecks: device I/O hotspots, cache flushes, lack of storage capacity.
Improving Query Response Time in OLTP
[Chart, conceptual only (not to scale): query response time (ms, 0-8) vs. IOPS or number of concurrent users (0-40,000) for four configurations]
- HDDs, 14 drives ($$)
- HDDs, 112 drives with short-stroking ($$$$$$$$)
- Hybrid, 36 HDDs + SSDs ($$$$)
- SSDs, 12 drives ($$$)
The cost-effective way to improve query response time for a given number of users, or to service more users at a given response time, is to use SSDs or a hybrid (SSD + HDD) approach, particularly for database and online transaction processing applications.
Source: IMEX Research SSD Industry Report 2011
Role of SSD in Improving the I/O Performance Gap
[Chart: price ($/GB) vs. performance (I/O access latency) for the memory/storage hierarchy - CPU SDRAM, DRAM, NOR, NAND, SCM, PCIe SSD, SATA SSD, hybrid storage, HDD, tape]
- DRAM is getting faster (to feed faster CPUs) and larger (to feed multi-core servers and the multiple VMs created by virtualization).
- HDDs are becoming cheaper, not faster.
- SSDs are segmenting into PCIe SSD cache, as a back end to DRAM, and SATA SSD, as a front end to HDD.
Source: IMEX Research SSD Industry Report 2011
SCM: A New Storage Class Memory
- SCM (Storage Class Memory): solid-state memory filling the gap between DRAM and HDDs.
- The marketplace is segmenting SCMs into SATA- and PCIe-based SSDs.
Key metrics required of storage class memories (performance, data reliability, environmental resistance, volumetric density):
- Device: capacity (GB); cost ($/GB); latency (random/block R/W access, ms); bandwidth (R/W, GB/sec).
- Integrity: BER (better than 1 in 10^17); write endurance (30K P/E cycles - number of writes before wear-out); data retention (5 years); MTBF (2 million hours).
- Power & density: power consumption (W); density (TB/cu.in.); power on/off time (sec).
- Environment: shock/vibration (g-force); temperature/voltage extremes, 4-corner (deg C, V); radiation (rad).
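A hedged sketch of how the endurance metric translates into device lifetime. Only the 30K P/E-cycle figure comes from the slide; the drive capacity, daily write volume and write-amplification factor below are illustrative assumptions.

```python
def lifetime_years(capacity_gb: float, pe_cycles: int,
                   daily_writes_gb: float, write_amplification: float) -> float:
    """Estimated wear-out lifetime: total rated writes / actual daily write load."""
    total_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_writes_gb / (daily_writes_gb * 365)

# 30K P/E cycles per the slide; capacity, workload and WA are assumptions.
print(f"~{lifetime_years(400, 30_000, 2_000, 3.0):.0f} years")
```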
SSDs - Price Erosion & IOPS/GB
[Chart, 2009-2014: SSD vs. enterprise HDD unit shipments (millions) and IOPS/GB for SSDs vs. HDDs]
Note: 2U storage rack; 2.5" HDD max capacity = 400GB, 24 HDDs, de-stroked to 20%; 2.5" SSD max capacity = 800GB, 36 SSDs.
Source: IMEX Research SSD Industry Report 2011
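A minimal sketch of the IOPS/GB comparison for the two rack configurations described in the slide's note. The drive counts, capacities and 20% de-stroking come from the note; the per-drive IOPS figures are assumptions.

```python
def rack_iops_per_gb(drives: int, capacity_gb: float, usable_fraction: float,
                     iops_per_drive: float) -> float:
    """IOPS per usable GB for a 2U rack of identical drives."""
    usable_gb = drives * capacity_gb * usable_fraction
    return drives * iops_per_drive / usable_gb

# Rack configurations from the slide's note; per-drive IOPS are assumptions.
hdd = rack_iops_per_gb(24, 400, 0.20, 200)      # short-stroked HDD, assumed ~200 IOPS each
ssd = rack_iops_per_gb(36, 800, 1.00, 20_000)   # enterprise SSD, assumed ~20K IOPS each
print(f"HDD rack: {hdd:.1f} IOPS/GB, SSD rack: {ssd:.1f} IOPS/GB")
```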
Choosing SSD vs. Memory to Improve TPS
[Chart: transactions per second (TPS, 0-2,500) vs. buffer pool (memory) size (0-20 GB) for three storage back ends - PCIe SSD, SATA SSD and HDD RAID 10]
- Annotations in the chart mark roughly 2.5x and 4.0x TPS gains for the SSD configurations (SATA and PCIe) relative to HDD RAID 10.
- Adding buffer pool memory raises TPS on every back end, but faster storage shifts the whole curve upward.
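A hedged sketch of why both knobs matter: model TPS as inversely proportional to the average I/O service time, which blends buffer-pool hits (served at memory latency) with misses served by the storage device. The hit rates, latencies and I/Os per transaction below are illustrative assumptions, not data from the chart.

```python
def tps_estimate(hit_rate: float, mem_latency_us: float,
                 device_latency_us: float, ios_per_txn: int = 10) -> float:
    """TPS ~ 1 / (average I/O time per transaction), storage term only."""
    avg_io_us = hit_rate * mem_latency_us + (1 - hit_rate) * device_latency_us
    return 1_000_000.0 / (avg_io_us * ios_per_txn)

# Assumed latencies: DRAM ~0.1 us effective, PCIe SSD ~50 us, SATA SSD ~100 us, HDD ~7,000 us.
for name, dev_us in [("HDD RAID 10", 7000), ("SATA SSD", 100), ("PCIe SSD", 50)]:
    print(f"{name}: ~{tps_estimate(0.9, 0.1, dev_us):,.0f} TPS at 90% buffer-pool hit rate")
```

This toy model only captures the storage term; in measured systems, CPU, locking and log writes cap the absolute TPS, which is why observed gains sit closer to the 2.5-4x range than to raw latency ratios.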
Data Storage Usage Patterns
[Chart: data access frequency vs. age of data, from 1 day to 2 years]
- Data access drops off sharply with age: roughly 80% of IOPS hit the most recent data, while roughly 80% of the terabytes are older, rarely accessed data.
- Implication: recent, hot data is about performance (a fit for SSDs); aging data is about data protection, scale, cost and data reduction as storage grows.
Source: IMEX Research - Cloud Infrastructure Report 2009-11
New Storage Hierarchy in NGDC & Clouds
[Chart: cumulative % of I/O accesses (65%, 75%, 95%) vs. % of corporate data (1%, 2%, 10%, 50%, 100%) across storage tiers]
- SSD tier (~1-2% of data, ~65-75% of I/O accesses): logs, journals, temp tables, hot tables.
- FCoE/SAS arrays (~10% of data, up to ~95% of accesses): tables, indices, hot data.
- SATA arrays (~50% of data): backup data, archived data.
- Cloud storage (toward 100% of data): offsite data vault.
Source: IMEX Research - Cloud Infrastructure Report 2009-11
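A minimal sketch of placing data sets onto tiers by access share, mirroring the hierarchy above; the thresholds and example data sets are illustrative assumptions, not values from the chart.

```python
# Illustrative tier placement by access intensity; thresholds are assumptions.
TIERS = [
    (0.50,  "SSD"),          # hottest data (logs, journals, hot tables)
    (0.05,  "FCoE/SAS"),     # warm tables and indices
    (0.001, "SATA"),         # backup / archived data
    (0.0,   "Cloud vault"),  # cold, offsite
]

def place(access_share: float) -> str:
    """Pick the first tier whose access-share threshold the data set meets."""
    for threshold, tier in TIERS:
        if access_share >= threshold:
            return tier
    return "Cloud vault"

for name, share in [("redo logs", 0.6), ("indices", 0.08),
                    ("2009 backups", 0.002), ("old archives", 0.0)]:
    print(f"{name}: {place(share)}")
```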
IO Bottleneck Mitigation in Virtualized Servers
[Diagram: vSphere ESX host with VM clients 1..n, each running an XCL driver that registers I/O with an XCL manager; in the ESX kernel, an XCL VLUN and SSD driver sit in front of the disk controller and a local SSD drive, which fronts primary storage]
- Hot I/O is served from the server-side SSD (via the ESX driver), offloading IOPS from primary storage.
- Both applications and storage run faster.
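Not the XCL product's actual interface, just a minimal sketch of the server-side caching idea the diagram describes: reads are served from a local SSD cache when possible and populated from primary storage on a miss. All names and the write-through policy are assumptions.

```python
class SsdReadCache:
    """Toy server-side block cache: SSD in front of primary storage (write-through)."""

    def __init__(self, ssd: dict, primary: dict):
        self.ssd = ssd          # stand-in for the local SSD device
        self.primary = primary  # stand-in for the shared primary array

    def read(self, block: int) -> bytes:
        if block in self.ssd:               # cache hit: served at SSD latency
            return self.ssd[block]
        data = self.primary[block]          # miss: fetch from primary storage
        self.ssd[block] = data              # populate the cache for future hits
        return data

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data          # write-through keeps primary authoritative
        self.ssd[block] = data              # and refreshes the cached copy

cache = SsdReadCache(ssd={}, primary={0: b"boot", 1: b"db-page"})
assert cache.read(1) == b"db-page"   # first read goes to primary storage...
assert 1 in cache.ssd                # ...and the block now resides on the SSD
```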
I/O Forensics for Auto Storage-Tiering
[Diagram: storage-tiered virtualization - a logical volume mapped across physical SSD arrays (hot data) and HDD arrays (cold data), with tiering at the LBA/sub-LUN level]
- LBA monitoring and tiered placement: every workload has a unique I/O access signature.
- Historical performance data for a LUN can identify performance skews and hot data regions by LBA.
Source: IBM & IMEX Research SSD Industry Report 2011
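A minimal sketch of the LBA-level forensics idea: count I/Os per fixed-size LBA region over a monitoring window and flag the hottest few percent of regions as candidates for SSD placement. The region size, window and 5% cutoff are assumptions for illustration.

```python
from collections import Counter

REGION_SIZE = 1 << 20          # assumed: group LBAs into 1M-block regions (sub-LUN extents)

def hot_regions(io_trace: list[int], hot_fraction: float = 0.05) -> set[int]:
    """Return the hottest regions (by I/O count), covering hot_fraction of all regions."""
    counts = Counter(lba // REGION_SIZE for lba in io_trace)
    n_hot = max(1, int(len(counts) * hot_fraction))
    return {region for region, _ in counts.most_common(n_hot)}

# Toy trace: region 3 is heavily skewed, the other 19 regions are touched once each.
trace = [3 * REGION_SIZE + i for i in range(1000)] + [r * REGION_SIZE for r in range(20)]
print(hot_regions(trace))   # -> {3}: migrate this extent to the SSD tier
```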
Apps Benefitting from Improved I/O
- Smart mobile devices: instant-on boot-ups; rugged, low power.
- Commercial visualization: rendering (textures & polygons).
- Bioinformatics & diagnostics: very read-intensive, small-block I/O.
- Decision support / business intelligence: data warehousing, random I/O, high OLTP transaction rates.
- Entertainment (VoD / YouTube): most-accessed videos, very read-intensive.
These workloads call for sustained bandwidths in roughly the 1-10 GB/s range at millisecond-class latencies.
Data: IMEX Research & Panasas
Key Takeaways
Solving I/O problems:
- I/O bottlenecks occur at multiple places in the compute stack, the largest being at storage I/O.
- SSDs come out cheaper per IOPS for I/O-intensive apps.
- To reduce reads: improve indexing and archive out old data.
- To minimize the impact of writes: get rid of temp tables/filesorts on slow disks; compress big varchar/text/blob columns.
Data forensics and tiered placement:
- Every workload has a unique I/O access signature.
- Historical performance data for a LUN can identify performance skews and hot data regions by LBA.
- Use smart tiering to identify hot LBA regions and non-disruptively migrate hot data from HDD to SSD. Typically 4-8% of data becomes a candidate and, when migrated to SSD, can reduce response time by ~65% at peak loads.