ELI: Bare-Metal Performance for I/O Virtualization
Abel Gordon, Nadav Amit, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, Dan Tsafrir
IBM Research Haifa / Technion, Israel Institute of Technology
Partially supported by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreements number 248615 (IOLanes) and 248647 (ENCORE)
Background and Motivation
Virtualization is already an integral part of our systems
Virtualization overhead is high for a common subset of workloads, in particular I/O-intensive workloads
Overhead causes:
- Context-switch cost (e.g., switches between the hypervisor and the guests)
- Indirect cost (e.g., CPU cache pollution)
- Handling cost (e.g., handling external interrupts)
(Figure: timeline of a single core alternating between guest and hypervisor, vs. bare metal)
I/O-Intensive Workloads
Best-performing model: device assignment (SR-IOV devices)
- The guest has direct access to a dedicated physical device (DMA and MMIO)
- No hypervisor intervention except for interrupt handling
Overhead is still high compared to bare metal (non-virtual) [Adams06, Ben-Yehuda10, Landau11]:
- Switches to the hypervisor due to external interrupts arriving from the device
- Switches to the hypervisor due to interrupt completions signaled by the guest
Overhead is visible as [Liu10, Dong10]:
- Lower throughput (when the CPU is saturated, usually for small messages)
- Higher CPU consumption (when line rate is attained, usually for big messages)
- Higher latency
ELI: ExitLess Interrupts
Overheads addressed:
- guest/host context switches (exits and entries)
- handling cost (handling external interrupts and interrupt completions)
(Figure: interrupt-handling timelines for (a) baseline: physical interrupt, interrupt injection, interrupt completion; (b) ELI delivery; (c) ELI delivery & completion; (d) bare metal)
Related Work
Polling
- Disables interrupts and polls the device for new events
- Adds latency, wastes cycles, consumes power
Hybrid [Dovrolis01, Mogul96, Itzkovitz99]
- Dynamically switches between interrupts and polling
- Default in Linux (NAPI) [Salim01]
- Hard to predict the future interrupt rate
Interrupt coalescing [Zec02, Salah07, Ahmad11]
- Limits the interrupt rate (sends only one interrupt per several events)
- Adds latency [Larsen09, Rumble11], might burst TCP traffic [Zec02], complex to configure and change dynamically [Ahmad11, Salah08], adds variability
ELI is complementary to these approaches:
(1) Removes the virtualization overhead caused by the costly exits and entries during interrupt handling
(2) Lets the guest directly control the interrupt rate and latency
x86 Interrupt Handling
Interrupts are asynchronous events generated by external entities such as I/O devices
x86 CPUs use interrupts to notify the system software about incoming events
The CPU temporarily stops the currently executing code and jumps to a pre-specified interrupt handler
Hardware and software identify interrupts using vector numbers
(Figure: the IDT register (IDTR) holds the address and limit of the Interrupt Descriptor Table (IDT); each IDT entry points to the interrupt handler for its vector)
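The dispatch path on this slide can be modeled in a few lines of Python. This is a minimal sketch, not real hardware behavior: the vector numbers and handler names are illustrative, and a real CPU indexes an in-memory descriptor table through the IDTR rather than a dictionary.

```python
# Toy model of x86 interrupt dispatch: the IDT maps vector numbers
# to handler routines; the CPU indexes it when an interrupt arrives.
# Vector numbers and handler names below are illustrative only.

def make_idt():
    # A real IDT holds up to 256 gate descriptors; we model it as a dict.
    return {
        32: lambda: "timer_handler",
        33: lambda: "keyboard_handler",
        64: lambda: "nic_handler",
    }

def dispatch(idt, vector):
    # The CPU suspends the currently executing code and jumps to the
    # handler registered for this vector; a vector with no valid gate
    # descriptor would fault instead.
    handler = idt.get(vector)
    if handler is None:
        return "#GP"
    return handler()

idt = make_idt()
print(dispatch(idt, 64))   # nic_handler
print(dispatch(idt, 99))   # #GP
```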
x86 Interrupt Handling in Virtual Environments
Two IDTs:
- Guest IDT: handles virtual interrupts, created and injected by software (the hypervisor)
- Host IDT: handles physical interrupts raised by the physical hardware
If a physical interrupt arrives while the guest is running, the CPU forces a transition to the hypervisor context (VM exit)
- Required for correct emulation and isolation
(Figure: a physical interrupt raised by a device causes a VM exit into the host IDT's interrupt handler; the hypervisor then injects a virtual interrupt through the guest IDT on VM entry)
ELI: ExitLess Interrupts - Delivery
Allow interrupt delivery directly to the guest
- Configure the hardware to deliver all interrupts to the guest (the CPU supports only an all-or-nothing mode)
- Control which interrupts are handled by the guest and which by the host using a shadow IDT
- Shadow IDT entries for assigned interrupts are present (P=1) and point to the guest's handlers; all other entries are marked not-present (P=0), so non-assigned interrupts raise #NP (or #GP) and exit to the hypervisor
(Figure: a physical interrupt delivered through the shadow IDT; assigned interrupts reach the VM's interrupt handler directly, while non-assigned interrupts cause a #NP/#GP exit to the hypervisor)
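The delivery decision above can be sketched as a tiny Python model. This is a simulation of the control logic only, under assumed inputs: the vector numbers and the set of assigned vectors are made up for illustration, and the real mechanism is the CPU's not-present (#NP) exception, not a Python branch.

```python
# Toy model of ELI delivery with a shadow IDT. Vectors assigned to the
# guest's device have present (P=1) entries pointing at the guest's own
# handlers; every other vector is marked not-present (P=0), so the CPU
# raises a #NP exception, which the hypervisor configures to force an
# exit. Vector numbers here are illustrative.

ASSIGNED_VECTORS = {64}          # e.g. the assigned NIC's interrupt vector

def shadow_idt_entry(vector):
    # The present bit is set only for interrupts the guest may handle
    # directly; the hypervisor builds this table, not the guest.
    return {"present": vector in ASSIGNED_VECTORS, "vector": vector}

def deliver(vector):
    entry = shadow_idt_entry(vector)
    if entry["present"]:
        # Delivered straight to the guest's handler: no exit.
        return "guest_handler"
    # P=0 -> #NP exception -> exit; the hypervisor then handles the
    # interrupt itself or injects it into the appropriate guest.
    return "exit_to_hypervisor"

print(deliver(64))   # guest_handler
print(deliver(33))   # exit_to_hypervisor
```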
ELI: ExitLess Interrupts - Completion
The guest OS signals interrupt completions by writing to the Local Advanced Programmable Interrupt Controller (LAPIC) End-Of-Interrupt (EOI) register
Old LAPIC interface:
- The guest accesses the LAPIC registers using regular loads/stores to a pre-specified memory page
- The hypervisor traps accesses to the LAPIC page (almost all registers)
New LAPIC interface (x2apic):
- The guest accesses LAPIC registers using Model-Specific Registers (MSRs)
- The hypervisor traps accesses to MSRs (LAPIC registers) using the hardware virtualization MSR-bitmap capability
ExitLess completion requires x2apic: ELI gives the guest direct access only to the EOI register
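The MSR-bitmap policy above can be sketched as a small Python model. The x2apic MSR range (0x800-0x8FF) and the EOI MSR number (0x80B) are real architectural values; the rest (function names, the simplified all-or-nothing policy for non-LAPIC MSRs) is illustrative only.

```python
# Toy model of ExitLess completion. With x2apic, LAPIC registers are
# MSRs in the 0x800-0x8FF range; the EOI register is MSR 0x80B. The
# hypervisor's MSR bitmap traps every LAPIC MSR except EOI, so a guest
# EOI write completes in hardware without an exit.

X2APIC_BASE, X2APIC_LAST = 0x800, 0x8FF
X2APIC_EOI = 0x80B

def msr_bitmap_traps(msr):
    # True if a guest access to this MSR forces an exit.
    if X2APIC_BASE <= msr <= X2APIC_LAST:
        return msr != X2APIC_EOI   # ELI passes through only the EOI register
    return False                   # non-LAPIC MSR policy not modeled here

def guest_wrmsr(msr, value):
    if msr_bitmap_traps(msr):
        return "exit_to_hypervisor"
    return "handled_by_hardware"

print(guest_wrmsr(X2APIC_EOI, 0))  # handled_by_hardware (no exit)
print(guest_wrmsr(0x802, 0))       # exit_to_hypervisor (x2apic ID register)
```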
Evaluation
Throughput scaled so 100% means bare-metal throughput
Throughput gains over baseline device assignment are noted inside the bars
The CPU is saturated in all 3 benchmarks
Parameters:
- KVM uses EPT + huge pages (host only)
- Netperf: 256B messages
- Apache: stressed with apachebench (4 threads requesting 4KB static pages)
- Memcached: stressed with memslap (4 threads / 64 concurrent requests, key size = 64B, value size = 1024B, get/set ratio = 9:1)
- x2apic behavior emulated on non-x2apic hardware
Evaluation

                                        Baseline  ELI Delivery  ELI
NetPerf    Exits/s                      102K      44K           0.8K
           Time in guest                69%       94%           99%
           % of bare-metal throughput   60%       92%           98%
Apache     Exits/s                      91K       64K           1.1K
           Time in guest                67%       89%           98%
           % of bare-metal throughput   65%       83%           97%
Memcached  Exits/s                      123K      123K          1K
           Time in guest                60%       83%           98%
           % of bare-metal throughput   60%       85%           100%

ELI removed most of the exits and almost achieved bare-metal performance!
Huge Pages
(Figure: Netperf, Apache, and Memcached throughput with and without huge pages)
(1) ELI significantly improved performance even without huge pages
(2) Huge pages are required to achieve bare-metal performance
Computation vs. I/O Ratio (modified netperf)
cycles/byte = CPU frequency (2.93GHz) / throughput
ELI's improvement remains high even for 50Mbps (60 cycles/byte) because NAPI and the NIC's adaptive coalescing mechanism limit the interrupt rate (the interrupt rate is not always proportional to the throughput)
Interrupt Coalescing (netperf)
Even using the maximum coalescing interval supported by the NIC (96µs), ELI provides a 10% performance improvement
Latency (Netperf UDP Request-Response)

Configuration       Avg. Latency   % of bare-metal
Baseline            36.14µs        129%
ELI delivery-only   30.10µs        108%
ELI                 28.51µs        102%
Bare-metal          27.93µs        100%

ELI substantially reduces the time it takes to deliver interrupts to the guest, which is critical for latency-sensitive workloads.
Implementation
Locating the shadow IDT for unmodified guests:
- The shadow IDT must be mapped into the guest address space
- Use PCI BARs (MMIO regions) to force the guest OS (Linux & Windows) to map (and keep mapped) additional memory pages
- Write-protect the shadow IDT
Injecting virtual interrupts:
- Use the original guest IDT to inject a virtual interrupt
Nested interrupts (a higher vector can interrupt a lower vector):
- Check whether a physical interrupt is being handled by the guest before injecting a virtual interrupt
Security, Protection and Isolation
Threat: malicious guests might try to consume interrupts, keep interrupts disabled, or signal invalid completions
ELI defends against malicious guests using multiple mechanisms:
- Hardware virtualization preemption timer to force exits (instead of relying on timer interrupts)
- EOIs while no interrupt is being handled do not affect the system
- Periodically check the shadow IDT mappings
Protecting critical interrupts:
- Deliver to a non-ELI core
- Send spurious interrupts (to re-create a possibly lost interrupt)
- Redirect as NMI (NMIs can be configured to force an exit unconditionally)
- Use the IDTR limit (reserve the highest vectors for critical host interrupts)
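The IDTR-limit protection in the last bullet can be sketched as a small Python model. The split point (reserving vectors 240-255 for the host) is an assumed example, not a value from the talk; the real enforcement is the CPU raising #GP for any vector beyond IDTR.limit.

```python
# Toy model of the IDTR-limit protection: the hypervisor shortens the
# limit of the IDT in effect while the guest runs, so the highest
# vectors (reserved for critical host interrupts) fall outside the
# table and raise #GP, unconditionally forcing an exit. The threshold
# below is illustrative.

GUEST_IDT_LIMIT = 239     # assume vectors 240-255 are reserved for the host

def deliver_with_limit(vector):
    if vector > GUEST_IDT_LIMIT:
        # Beyond IDTR.limit -> #GP -> exit; a malicious guest cannot
        # intercept these vectors no matter what its IDT contains.
        return "exit_to_hypervisor"
    return "guest_shadow_idt"

print(deliver_with_limit(250))   # exit_to_hypervisor (critical host vector)
print(deliver_with_limit(64))    # guest_shadow_idt
```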
Future Work
Reduce the frequency of exits caused by para-virtual I/O devices:
- Use ELI to send notifications from the host to the guest (running on a different core) without forcing an exit
Reduce the frequency of exits caused by non-assigned interrupts:
- Shadow the interrupt handlers and batch/delay interrupts (host interrupts or other guests' interrupts)
Reduce the exits required to inject a virtual interrupt:
- Use ELI to asynchronously inject virtual interrupts from a different core
Improve the performance of SMP workloads with high inter-processor interrupt (IPI) rates:
- Send an IPI directly to a vCPU running on a different core without forcing an exit
Conclusions
High virtualization performance requires the CPU to spend most of its time running the guest (useful work) rather than the host (handling exits = overhead)
x86 virtualization requires host involvement (exits!) to handle interrupts, which are on the critical path for I/O workloads
ELI lets the guest handle interrupts directly (no exits!) and securely, making it possible for untrusted and unmodified guests to reach near bare-metal performance
Questions?
Backup
Injection Mode and Nested Interrupts
ELI has two modes of operation:
- Direct mode: physical interrupts are delivered through the shadow IDT and do not force an exit (exits occur only on #NP); the MSR bitmap is configured to avoid exits on EOI
- Injection mode: physical interrupts and EOIs force an exit; virtual interrupts are delivered through the guest IDT
The guest runs most of the time in direct mode
The hypervisor falls back to injection mode when it needs to inject a virtual interrupt
After the virtual EOI exit, the hypervisor switches back to direct mode
A higher vector can interrupt a lower vector (nested interrupts):
- Before injecting a virtual interrupt, ELI checks the CPU's in-service register (ISR)
- If the virtual interrupt has a lower vector than a physical interrupt being handled, the hypervisor delays the injection
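The nested-interrupt check above can be sketched as a small Python model. It simplifies real x86 priority rules (which compare 16-vector priority classes, not raw vector numbers) down to a plain vector comparison, and the vector numbers are illustrative.

```python
# Toy model of ELI's nested-interrupt check. On x86, higher vector
# numbers have higher priority, and the LAPIC's in-service register
# (ISR) records interrupts whose handlers have not yet signaled EOI.
# Before injecting a virtual interrupt, the hypervisor compares its
# vector against the highest in-service physical vector. (Simplified:
# real hardware compares 16-vector priority classes.)

def highest_in_service(isr_bits):
    # isr_bits: set of vector numbers currently in service
    return max(isr_bits) if isr_bits else -1

def can_inject(virtual_vector, isr_bits):
    # Inject only if the virtual interrupt is allowed to nest, i.e. it
    # has a higher vector than anything the guest is already handling;
    # otherwise the hypervisor delays the injection.
    return virtual_vector > highest_in_service(isr_bits)

print(can_inject(80, {64}))   # True: vector 80 may interrupt vector 64
print(can_inject(50, {64}))   # False: delay the injection
```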
Breakdown
Latency (Netperf UDP Request-Response)