High Performance OpenStack Cloud
Eli Karpilovski, Cloud Advisory Council Chairman
Cloud Advisory Council - Our Mission
- Development of next-generation cloud architecture
- Publication of open specifications for cloud infrastructures
- Use of existing infrastructure to extend the business
- Publication of best practices for optimizing cloud efficiency and utilization
- Ease of use through comprehensive cloud management and tools
- Providing IT and application managers with cloud tools for design, architecture, use, and development
- Strengthening the qualification and integration of cloud solutions
- Driving standards across the industry, with cloud-provider feedback on direction and priorities for cloud standards development
2012 CLOUD ADVISORY COUNCIL
Cloud Advisory Council - Board Members
- Eli Karpilovski - Chairman
- Paul Rad - High Performance Cloud Group Chair
- Kenny Li - Group Chair for Cloud Performance
- David Fishman - Co-Chairman for Open Source Ecosystem
- Brian Sparks - Media Relations Director
Exponential Data Growth - Best Interconnect Required
From 0.8 zettabytes in 2009 to 35 zettabytes in 2020: 44x growth (source: IDC)
The Power of Data
- Data-intensive simulations
- Internet of Things
- National security
- Healthcare
- Smart cars and congestion-free traffic
- Business intelligence
The Freedom to Choose Your Cloud Stack
Open platforms such as the Open Compute Project and OpenDaylight offer hardware, software, and management of choice - spanning the range from closed and proprietary stacks to fully open source.
OpenStack: Open Source Cloud APIs
OpenStack IaaS Interfaces

Service Interface   Project    Example Use Case
Compute             Nova       Configure VMs
Block Storage       Cinder     Set and assign persistent block-level storage
Network             Neutron    Manage networks and IP addresses
Object Store        Swift      Horizontally scalable object/file storage
Images              Glance     Disk and server image discovery, registration, etc.
Dashboard           Horizon    GUI for admins to control cloud operations
Authentication      Keystone   Common authentication across cloud components
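For scripting against this mapping, the table can be kept as a small lookup dictionary. The service and project names come from the table above; the helper function itself is purely illustrative:

```python
# Service-interface to OpenStack-project mapping, taken from the table above.
SERVICES = {
    "compute": "Nova",
    "block storage": "Cinder",
    "network": "Neutron",
    "object store": "Swift",
    "images": "Glance",
    "dashboard": "Horizon",
    "authentication": "Keystone",
}

def project_for(service: str) -> str:
    """Resolve a service interface name to its OpenStack project name."""
    return SERVICES[service.lower()]
```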
OpenStack IaaS Logical Architecture
OpenStack Open Source Components
OpenStack builds on a broad open source stack: Linux, KVM, Python, Ruby, Puppet, Cobbler, mcollective/salt, RabbitMQ, and more.
What does OpenStack run on? Standard hardware.
Data Must Always Be Accessible, in Real Time
Sensor data flows through compute, storage, and archive tiers. A smart interconnect - lower latency, higher bandwidth, RDMA, offloads, NIC/switch routing, overlay networks - is required to unleash the power of data.
Remote Direct Memory Access (RDMA) Advantages
- Zero-copy remote data transfer directly between application buffers
- Kernel bypass: data moves between user space and the adapter without crossing the kernel
- Protocol offload in hardware
- Low-latency, high-performance data transfers: InfiniBand at 56 Gb/s, RoCE (RDMA over Converged Ethernet) at 40 Gb/s
RDMA - How It Works
With TCP/IP, data crosses the OS on both sides, with copies into kernel buffers along the way. With RDMA over InfiniBand or Ethernet, the HCA/NIC transfers data directly between application buffers on the two machines (e.g. rack to rack), bypassing both operating systems.
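Real RDMA programming goes through the verbs API (libibverbs) and needs RDMA-capable hardware, which is beyond a slide deck. As a loose in-process analogy for the zero-copy idea only, Python's `memoryview` exposes a buffer without copying it, so a "reader" observes the owner's in-place writes with no intermediate copy:

```python
# Loose analogy only - NOT an RDMA API. RDMA lets a remote adapter
# read/write a registered application buffer with no intermediate copies;
# memoryview similarly shares one buffer between parties without copying.
buf = bytearray(b"application buffer 1")
view = memoryview(buf)        # "register" the buffer: no copy is made
buf[:11] = b"APPLICATION"     # the owner updates the data in place
print(view.tobytes()[:11])    # the viewer sees the update immediately
```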
Three Ways to Introduce RDMA in a Virtualized Environment
1. Paravirtualized networking: eIPoIB presents a standard Ethernet interface to the VM through the hypervisor vswitch
2. Paravirtualized storage: iSER (InfiniBand or Ethernet) behind the hypervisor's SCSI midlayer
3. SR-IOV (InfiniBand or Ethernet): the VM's VF driver talks directly to a virtual function on the adapter, bypassing the hypervisor; the hypervisor retains the physical function
What is iSER (iSCSI Extensions for RDMA)?
- An iSCSI-over-RDMA solution, running on InfiniBand or RoCE
- Comprehensive storage networking and management capabilities inherited from iSCSI: discovery, naming, security, error recovery, booting, etc.
- Leverages the wide adoption of iSCSI: OS code and storage products, management tools and standard interfaces, standardization, testing, and protocol maturity
iSCSI Mapping to iSER/RDMA
iSCSI PDUs (basic header segment, additional header segments, header/data digests, and data) are carried over RC Send and RC RDMA Read/Write operations, with the transport and digests handled in hardware.
iSER eliminates the traditional iSCSI/TCP bottlenecks:
- Zero copy using RDMA
- CRC calculated by hardware
- Works with message boundaries instead of byte streams
- Transport protocol implemented in hardware (minimal CPU cycles per I/O)
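The header and data digests that iSER-capable adapters compute in hardware are CRC-32C checksums (the Castagnoli polynomial). A minimal pure-Python reference makes it concrete what the hardware offloads per PDU - and why doing this bit-by-bit on the CPU for every I/O is costly:

```python
def crc32c(data: bytes) -> int:
    """CRC-32C (Castagnoli), the checksum iSCSI uses for header/data digests.

    Reflected polynomial 0x82F63B78, init and final XOR 0xFFFFFFFF.
    iSER-capable adapters compute this in hardware instead of burning
    CPU cycles per I/O; this bit-by-bit version is deliberately naive.
    """
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value:
assert crc32c(b"123456789") == 0xE3069283
```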
RDMA Accelerates OpenStack Storage
Compute servers run Open-iSCSI with iSER under the hypervisor (KVM); storage servers run the iSCSI/iSER target (tgt), managed by OpenStack Cinder, with an RDMA adapter, cache, and local disks across the switching fabric.
Cinder volume storage performance: 1.3 Gb/s with iSCSI over TCP vs 5.5 Gb/s with iSER.
This uses OpenStack's built-in components and management (Open-iSCSI, the tgt target, Cinder) to accelerate storage access. iSER patches are available on the OpenStack branch: https://github.com/mellanox/openstack
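In the Havana-era releases this deck targets, switching Cinder's LVM volumes from TCP iSCSI to iSER was essentially a driver-name change on the storage node. The fragment below is an illustrative sketch - option names changed across releases, so check the Mellanox branch above for the version matching your deployment:

```ini
# cinder.conf (storage node) - Havana-era sketch; options varied by release
[DEFAULT]
# LVM volume driver speaking iSER instead of plain iSCSI over TCP
volume_driver = cinder.volume.drivers.lvm.LVMISERDriver
# tgt is the in-box target that serves the volumes
iscsi_helper = tgtadm
```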
IPoIB - Applications See IP/Ethernet over KVM
The guest OS sends an ordinary Ethernet frame (Ethernet header + IP packet). In the hypervisor, the eIPoIB mapper driver swaps the Ethernet header for an IPoIB header, and the IPoIB driver puts the packet on the InfiniBand wire as InfiniBand header + IPoIB header + IP packet + CRC. Mellanox ConnectX-3 VPI adapters and SwitchX VPI switches/gateways bridge standard Ethernet and InfiniBand wires, so applications keep seeing IP/Ethernet end to end.
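The header rewrite above is small: per RFC 4391, the IPoIB header is just four bytes - a 16-bit protocol type sharing the EtherType value space, plus 16 reserved bits - replacing the 14-byte Ethernet header. A sketch of the mapping (the real rewrite happens in the hypervisor's eIPoIB driver; these helpers are illustrative):

```python
import struct

ETH_P_IP = 0x0800  # EtherType for IPv4, reused as the IPoIB protocol type

def ipoib_encap(ip_packet: bytes) -> bytes:
    """Prepend the 4-byte IPoIB header (RFC 4391): 16-bit type + 16 reserved bits."""
    return struct.pack("!HH", ETH_P_IP, 0) + ip_packet

def eth_to_ipoib(eth_frame: bytes) -> bytes:
    """Sketch of the eIPoIB mapping: strip the 14-byte Ethernet header
    (dst MAC, src MAC, EtherType) and push an IPoIB header instead."""
    ethertype = struct.unpack("!H", eth_frame[12:14])[0]
    return struct.pack("!HH", ethertype, 0) + eth_frame[14:]
```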
Single Root I/O Virtualization (SR-IOV)
- A PCIe device presents multiple instances to the OS/hypervisor: each VM's VF device driver binds to its own virtual function (VF), while the hypervisor keeps the physical function (PF)
- Enables Application Direct Access (ADA)
- Reduces CPU overhead and improves application performance
- Eliminates the virtualization penalty when combined with RDMA and ADA
- Low-latency applications benefit from the virtual infrastructure
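On Linux, the number of VFs a PF exposes is controlled through the standard `sriov_numvfs` sysfs attribute. A hedged sketch (the interface name and VF count are placeholders; writing requires root, an SR-IOV capable NIC, and the kernel typically requires resetting to 0 before changing a non-zero count):

```python
from pathlib import Path

def sriov_knob(ifname: str, root: str = "/sys") -> Path:
    """Path of the sysfs attribute that sets the VF count for a PF netdev."""
    return Path(root) / "class/net" / ifname / "device/sriov_numvfs"

def enable_vfs(ifname: str, num_vfs: int) -> None:
    """Create SR-IOV virtual functions (root + SR-IOV capable NIC required)."""
    knob = sriov_knob(ifname)
    knob.write_text("0")             # reset first: the count can't change in place
    knob.write_text(str(num_vfs))    # e.g. 8 VFs, one per VM
```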
SR-IOV Boosts Ethernet Performance
SR-IOV accelerates RoCE, enabling native RoCE performance in virtualized environments.
Charts: RoCE SR-IOV latency in us (message sizes 2 B, 16 B, 32 B; 1 to 8 VMs) and RoCE SR-IOV throughput in Gb/s (1 to 16 VMs). No performance compromise in a virtualized environment.
Single Root I/O Virtualization - Latency Performance Comparison
Chart: VM-to-VM latency in us vs message size (16 B to 16 KB), comparing TCP with paravirtual vNICs against RDMA over SR-IOV, both on the same machine and across two machines. RDMA over SR-IOV delivers up to 20x lower latency than a paravirtual vNIC: SR-IOV virtualization with bare-metal latency.
Network Virtualization - Evolution in Deployment Models
- Pure overlay: tunneling protocols (VXLAN, NVGRE, etc.) run directly on the hypervisor
- Hybrid network virtualization: OpenFlow on the physical switch combined with overlays on the hypervisor
- Pure OpenFlow: OpenFlow in physical switches everywhere
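The overlay framing these models rely on is compact. VXLAN, for example, adds an 8-byte header carrying a 24-bit virtual network identifier (VNI), carried over UDP (destination port 4789 per RFC 7348). A sketch of building just the VXLAN header (the UDP/IP outer layers are omitted):

```python
import struct

VXLAN_FLAG_I = 0x08000000  # the "valid VNI" flag, the only bit RFC 7348 sets

def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags word, then 24-bit VNI plus a reserved byte."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I, vni << 8)

def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Wrap an inner Ethernet frame in a VXLAN header (outer UDP/IP omitted)."""
    return vxlan_header(vni) + inner_frame
```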
Hybrid Network Virtualization Approach: Big Challenges
Chart: as the number of workloads grows, software-based tunneling drags actual performance well below the expected performance curve.
Distributed, Accelerated Hybrid Network Virtualization
- Servers with 10/40GbE or InfiniBand ports; the adapter's embedded switch delivers paravirtual or SR-IOV connectivity to the VMs
- Neutron agent (ML2 plugin), driven by the OpenStack cloud manager: creates, deletes, and configures policy per VM vNIC
- OpenFlow agent: OpenFlow 1.0 support on NIC and switch, with real-time provisioning from an SDN controller and SDN applications
- OpenFlow counters/statistics, drop/allow/mirror actions, and ingress ACLs
eSwitch & SR-IOV - Integrated Adapter Technology Performance Impact
Hardware QoS in the embedded switch delivers significant performance improvement.
Acceleration of Overlay Networks
NVGRE/VXLAN overlay networks present a virtual view - isolated virtual domains managed through an OpenFlow virtual network management API - over the physical view of VMs on servers connected by SDN switches and routers. Virtual overlay networks simplify management and VM migration, and ConnectX-3 Pro overlay accelerators enable bare-metal performance. Overlay network virtualization provides isolation, simplicity, and scalability.
Cloud Overlay Acceleration Results
With NIC hardware offload of NVGRE, initial results show 65% higher throughput (higher is better) and 79% lower CPU transport overhead (lower is better): higher throughput for less CPU.
HPC OpenStack Cloud Infrastructure Benefits
Storage:
- 6x performance improvement switching from iSCSI over TCP to iSER (RDMA)
Compute:
- RDMA support for applications, over Ethernet or InfiniBand
- Full support for all native Ethernet security and isolation features
- SR-IOV provides a faster path to bare-metal performance: 20x improvement in VM-to-VM connectivity
- Support for bridging from InfiniBand to Ethernet
Controller:
- Neutron plugins for seamless integration with Folsom, Grizzly, and Havana