System Architecture for Are SSDs Ready for Enterprise Storage Systems In-Memory Database Anil Vasudeva, President & Chief Analyst, Research 2007-13 Research All Rights Reserved Copying Prohibited Contact for authorization Anil Vasudeva President & Chief Analyst imex@imexresearch.com 408-268-0800 2010-12 Research, Copying prohibited. All rights reserved.
2010-12 Research, Copying prohibited. All rights reserved. 2 IT Industry Dynamics - Roadmap IT Industry Roadmap Source: Research Cloudization On-Premises > Private Clouds > Public Clouds DC to Cloud-Aware Infrast. & Apps. Cascade migration to SPs/Public Clouds. Integration/Consolidation Integrate Physical Infrast./Blades to meet CAPSIMS Cost, Availability, Performance, Scalability, Inter-operability, Manageability & Security Standardization Virtualization Pools Resources. Provisions, Optimizes, Monitors Shuffles Resources to optimize Delivery of various Business Services Standard IT Infrastructure- Volume Economics HW/Syst SW (Servers, Storage, Networking Devices, System Software (OS, MW & Data Mgmt. SW) Analytics BI Predictive Analytics - Unstructured Data From Dashboards Visualization to Prediction Engines using Big Data. Automation/SDDC Automatically Maintains Application SLAs (Self-Configuration, Self-Healing, Self-Acctg. Charges etc.) 2
IT Infrastructure: DataCenters & Cloud Public CloudCenter Enterprise VZ Data Center On-Premise Cloud Vertical Clouds IaaS, PaaS SaaS Servers VPN Switches: Layer 4-7, Layer 2, 10GbE, FC Stg Supplier/Partners Remote/Branch Office Cellular Wireless ISP ISP ISP ISP Internet Core Optical ISP Edge Web 2.0 Social Ntwks. Facebook, Twitter, YouTube Cable/DSL ISP Home Networks ISP Caching, Proxy, FW, SSL, IDS, DNS, LB, Web Servers Tier-1 Edge Apps Application Servers HA, File/Print, ERP, SCM, CRM Servers Tier-2 Apps Directory Security Policy Middleware Platform FC/IPSANs, NAS Database Servers, Middleware, Data Mgmt Tier-3 Data Base Servers Management Source:: Research - Cloud Infrastructure Report 2009-11 2010-12 Research, Copying prohibited. All rights reserved. 3 3
2010-12 Research, Copying prohibited. All rights reserved. 4 IT Industry Dynamics - Roadmap IT Industry Roadmap Source: Research Cloudization On-Premises > Private Clouds > Public Clouds DC to Cloud-Aware Infrast. & Apps. Cascade migration to SPs/Public Clouds. Integration/Consolidation Integrate Physical Infrast./Blades to meet CAPSIMS Cost, Availability, Performance, Scalability, Inter-operability, Manageability & Security Standardization Virtualization Pools Resources. Provisions, Optimizes, Monitors Shuffles Resources to optimize Delivery of various Business Services Standard IT Infrastructure- Volume Economics HW/Syst SW (Servers, Storage, Networking Devices, System Software (OS, MW & Data Mgmt. SW) Analytics BI Predictive Analytics - Unstructured Data From Dashboards Visualization to Prediction Engines using Big Data. Automation/SDDC Automatically Maintains Application SLAs (Self-Configuration, Self-Healing, Self-Acctg. Charges etc.) 4
2010-12 Research, Copying prohibited. All rights reserved. 5 Workloads: Mapped on Infrastructure Metrics IOPS* (*Latency-1) 1000 K 100 K 10K 1K Transaction Processing (RAID - 1, 5, 6) OLTP/Database ecommerce Data Warehousing Big Data/ Bus.Intelligence OLAP Scientific Computing Imaging HPC (RAID - 0, 3) 100 Audio Video Data Streaming 10 1 5 10 50 *IOPS for a required response time ( ms) *=(#Channels*Latency-1) 100 MB/sec Workloads need Infrastructure Optimized for Cost, Availability, Performance 500
Workloads: I/O Characteristics Storage performance, management and costs are big issues in running Databases Data Warehousing Workloads are I/O intensive Predominantly read based with low hit ratios on buffer pools High concurrent sequential and random read levels Sequential Reads requires high I/O Bandwidth (MB/sec) Random Reads require high IOPS Write rates driven by life cycle management and sort operations OLTP Workloads are strongly random I/O intensive Random I/O is more dominant Read/write ratios of 80/20 are most common but can be 50/50 Difficult to build out test systems with sufficient I/O characteristics Batch Workloads (Hadoop) are more write intensive Sequential Writes requires high I/O Bandwidth (MB/sec) Backup & Recovery times are critical for these workloads Backup operations drive high level of sequential IO Recovery operation drives high levels of random I/O Source: Research SSD Industry Report 2011 6
Driver : Need Real Time Analytics 7
2010-12 Research, Copying prohibited. All rights reserved. 8 Issue: Server to Storage I/O Gap Memory - MultiSlots Performance I/O Gap 1980 1990 2000 2010 Connect to Storage as Direct Attached Storage (Internal Storage) For Each Disk Operation, Millions of CPU Opns. or Thousands of Memory Opns. can be accomplished PCIe used for HBAs to connect to (External Shared Storage) via Storage Switches/Fabric as SAN/NAS on HDDs front ended by DRAM Cache creating a gap of 100,000x latency gap A 7.2K/15k rpm HDD can do 100/140 IOPS
Solution: SSDs Improving DB Query Responses Query Response Time (ms) 8 6 4 2 0 HDDs 14 Drives $$ HDDs 112 Drives w short stroking $$$$$$$$ Hybrid HDD/SSD 36 Drives $$$$ IOPS (or Number of Concurrent Users) SSDs 12 Drives $$$ Conceptual Only Not to Scale 0 10,000 20,000 30,000 40,000 For a targeted query response time in DB & OLTP applications, many more concurrent users can be added cost-effectively when using SSDs or SSD + HDDs storage vs. adding more HDDs or short-stroking HDDs Source: Research SSD Industry Report 2011 2010-11 Research, Copying prohibited. All rights reserved. 9
2010-12 Research, Copying prohibited. All rights reserved. 10 Industry Trends: Impact on Infrastructure Systems & Services Market Revenues $B 2011 2012 2013 2014 2015 2016 Services Software Network Storage Servers Industry Dynamics Serviceable PCIe SSDs Lower Cost DRAM BYOD / Boot Storms Big Data/RealTm Analytics Server/Stg. Price/Perf. Cloud Computing Multicores Power Efficiency Virtualization Scale Out Clustering Dense Blades Impact on Infrastructure Same Connector - PCIe+SAS/SATA NVMe based Flash Memory Client Images on Servers PCIe Servers based SSD New Storage Class Memory New Protocols / REST, HTTP.. Flash for Concurrent Multitasking Green Memory New Infrastrecture for Multi-VMs Distributed Memory Architecture Fast, Low Power Memory 64 bit Computing Larger Size Memories Workloads need Infrastructure - Optimized for Cost, Availability, Performance
2010-12 Research, Copying prohibited. All rights reserved. 11 Solution: SSDs Filling Price/Perf Gap Price $/GB CPU SDRAM NAND NOR SCM DRAM PCIe SSD DRAM getting Faster (to feed faster CPUs) & Larger (to feed Multi-cores & Multi-VMs from Virtualization) Tape HDD SATA SSD HDD becoming Cheaper, not faster Source: Research SSD Industry Report 2010-12 SSD segmenting into PCIe SSD Cache - as backend to DRAM & SATA SSD - as front end to HDD Performance I/O Access Latency Best Opportunity to fill the gap is for storage to be close to Server CPU. 11
Innovations Roadmap: HW Technologies 12
Innovations Roadmap DB SW Technologies OLTP Database Innovation Progress EDW/Big Data Database Innovation Progress 10^5 10,000 100,000 10^5 10^4 1,000 10,000 10^4 10^3 10^2 $/TPMc TPMc/ Processor 100 10 1,000 100 $/GB Data Warehouse Size -TB 10^3 10^2 10^1 10 10^0 10^1 1 1 10^-1 10^0 0.1.1 10^-2 10^-3 10^-1 1985 1990 1995 2000 2005 2010 2015 0.01.01 0 1985 1990 1995 2000 2005 2010 2015 10^-4 13
2010-12 Research, Copying prohibited. All rights reserved. 14 In-Memory Computing
Technology: Disk vs InMem DB Architecture 15
Technology: RDBMS vs. In-Memory DBMS 16
Technology:Legacy vs In-Memory Computing 17
Oracle vs SAP vs IBM DB 18
Competition: Oracle DB Architecture 19
Competition: SAP/HANA (Multi-Applications) A Converged DB System In-memory database combining transactional data processing, analytical data processing, and application logic processing functionality in memory. A full DBMS with a standard SQL interface, high availability, transactional isolation and recovery (ACID properties) both row-based and column-based stores within the same engine (row-based storage is good for transactional applications, while column-based storage is better for reports and analytics. Column-based storage compresses the better too.) massively parallel execution using multicore processors, SAP HANA optimizes the SQL which scales well with the number of cores. Aggregation operations by spawning a number of threads that act in parallel, each of which has equal access to the data resident on the memory on that node Additional functions - freestyle search (as SQL extensions). BI applications using MDX for Microsoft Excel & Consumer Services plus internal I/F for BusinessObjects prepackaged algorithms in the predictive analysis library of SAP HANA to perform advanced statistical calculations built-in text support, from its predecessor BI Accelerator that was based on the TREX search engine and Inxight functionality integrated into HANA text functions. 20
2010-12 Research, Copying prohibited. All rights reserved. 21 Competition: SAP/HANA (Multi-Applications) supports distribution across hosts, where large tables may be partitioned to be processed in parallel. DB engine of the SAP HANA Analytics appliance as well HANA s combination of a row and column store is fundamentally different from any other database engine on the market today, which allows it to perform OLTP and analytics processing in memory, at the same time. Avoids CPU waiting info from Memory through its unique CPU-cache-aware algorithms and data structures that there is as much useful data in the CPU caches as possible,. it uses late materialization to decompress columnar structures as late as possible, or to run operations directly on the compressed data also sold as an appliance on Intel Xeon CPUs leveraging insights into Intel s HyperThreading, Turbo Boost and Threading Building Blocks High Performance Analytic Appliance can perform large-scale data analyses on 500 billion records in less than a minute, taking analytics to an entirely new dimension represents a complete data warehouse in RAM, and as a result, much accelerated realtime analytics..
Technology: In-Memory Computing 22
Trends: In-Memory Computing Adoption 23
2010-12 Research, Copying prohibited. All rights reserved. 24 Trends: In-Memory DB Computing Primary DB for Each Application Oracle DB IBM DB2 MS SQL Srvr Open Src DB Other DB Don't Know ERP 61 9 15 4 2 9 Finance & Accounting 61 8 10 4 5 12 Order Mgmt 48 9 19 4 6 14 HR Mgmt 60 4 16 1 4 15 SCM 62 10 6 3 3 16 CRM 41 11 21 3 6 18 Project Mgmt 39 10 21 1 7 22 Info & Knowledge 32 9 26 4 8 21 Enterprise Asset Mgmt 51 7 12 3 4 23 SRM 50 6 11 2 5 26
System Architecture for Are SSDs Ready for Enterprise Storage Systems In-Memory Database Anil Vasudeva, President & Chief Analyst, Research 2007-13 Research All Rights Reserved Copying Prohibited Contact for authorization Anil Vasudeva President & Chief Analyst imex@imexresearch.com 408-268-0800 2010-12 Research, Copying prohibited. All rights reserved.