Bare-Metal Performance for x86 Virtualization




Bare-Metal Performance for x86 Virtualization
Muli Ben-Yehuda, Technion & IBM Research
Intel, Haifa, 2012

Background: x86 machine virtualization
Running multiple different unmodified operating systems, each in an isolated virtual machine, simultaneously, on the x86 architecture.
Many uses: live migration, record & replay, testing, security, ...
Foundation of IaaS cloud computing; used nearly everywhere.

The problem is performance
Machine virtualization can reduce performance by orders of magnitude [Adams06, Santos08, Ram09, Ben-Yehuda10, Amit11, ...].
Overhead limits the use of virtualization in many scenarios.
We would like to make it possible to use virtualization everywhere, including I/O-intensive workloads and nested workloads.
Where does the overhead come from?

The origin of overhead
Popek and Goldberg's virtualization model [Popek74]: trap and emulate.
Privileged instructions trap to the hypervisor; the hypervisor emulates their behavior.
Each trap causes an exit.
I/O-intensive workloads cause many exits; nested workloads cause many exits.
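At its core, a trap-and-emulate hypervisor is a loop: resume the guest, wait for the next exit, emulate the privileged operation, repeat. A minimal sketch in C follows; the names (run_guest, emulate_io, emulate_cpuid, struct vcpu) are hypothetical stand-ins, not KVM's actual API.

#include <stdint.h>

/* Hypothetical exit reasons and guest state; a real hypervisor
 * (e.g., KVM) tracks far more than this. */
enum exit_reason { EXIT_IO, EXIT_CPUID, EXIT_HLT };

struct vcpu {
    uint64_t rip;
    uint64_t regs[16];
};

/* Stubs standing in for hardware VM entry and for the emulation code;
 * on real hardware run_guest() would end in vmlaunch/vmresume. */
extern enum exit_reason run_guest(struct vcpu *v);
extern void emulate_io(struct vcpu *v);
extern void emulate_cpuid(struct vcpu *v);

/* The trap-and-emulate loop: every privileged operation in the guest
 * traps (causes an exit); the hypervisor emulates it and resumes. */
void vcpu_run_loop(struct vcpu *v)
{
    for (;;) {
        enum exit_reason why = run_guest(v);  /* returns on the next exit */

        switch (why) {
        case EXIT_IO:
            emulate_io(v);      /* e.g., forward to a device model */
            break;
        case EXIT_CPUID:
            emulate_cpuid(v);   /* fill registers with virtualized values */
            break;
        case EXIT_HLT:
            return;             /* guest halted */
        }
        /* A real hypervisor also advances the guest RIP past the
         * emulated instruction before resuming. */
    }
}

Every iteration of this loop is a world switch, which is exactly where the overhead discussed on the following slides comes from.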

What is nested x86 virtualization?
Running multiple unmodified hypervisors, with their associated unmodified VMs, simultaneously, on the x86 architecture, which does not support nesting in hardware ... but does support a single level of virtualization.
(Figure: guest OSes and a guest hypervisor stacked above the host hypervisor and the hardware.)

Why?
Operating systems are already hypervisors (Windows 7 with XP mode, Linux/KVM).
Security: attack via, or defend against, hypervisor-level rootkits such as Blue Pill.
To be able to run other hypervisors in clouds.
Co-design of x86 hardware and system software.
Testing, demonstrating, debugging, and live migration of hypervisors.

What is the Turtles project?
Efficient nested virtualization for VMX, based on KVM.
Runs multiple guest hypervisors, including VMware and Windows.
"The Turtles Project: Design and Implementation of Nested Virtualization", Ben-Yehuda, Day, Dubitzky, Factor, Har'El, Gordon, Liguori, Wasserman and Yassour, OSDI '10.

What is the Turtles project? (cont.)
Nested VMX virtualization for nested CPU virtualization.
Multi-dimensional paging for nested MMU virtualization.
Multi-level device assignment for nested I/O virtualization.
Micro-optimizations to make it go fast.

Theory of nested CPU virtualization
Trap and emulate [PopekGoldberg74]: it's all about the traps.
Single-level architectural support (x86) vs. multi-level (e.g., z/VM); single level means one hypervisor, many guests.
Turtles approach: L0 multiplexes the hardware between L1 and L2, running both as guests of L0 without either being aware of it.
(The scheme generalizes to n levels; our focus is n = 2.)
(Figure: multiple logical levels, L2 guests on an L1 guest hypervisor on the L0 host hypervisor, multiplexed onto a single hardware level.)

Nested VMX virtualization: flow
L0 runs L1 with VMCS 0->1.
L1 prepares VMCS 1->2 and executes vmlaunch; vmlaunch traps to L0.
L0 merges the VMCSs: VMCS 0->1 merged with VMCS 1->2 yields VMCS 0->2.
L0 launches L2.
L2 causes a trap; L0 handles the trap itself or forwards it to L1.
... eventually, L0 resumes L2; repeat.
(Figure: the L1 guest hypervisor holds the 1->2 state; the L0 host hypervisor holds the 0->1 and 0->2 state; each consists of a VMCS, memory tables, and other state.)
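The merge step can be pictured as building VMCS 0->2 from the two existing control structures: guest state comes from VMCS 1->2 (what L1 wants L2 to look like), host state comes from VMCS 0->1 (so exits land in L0), and controls combine both. A rough C sketch with a hypothetical, greatly simplified VMCS layout (the real VMCS has dozens of fields and is accessed via vmread/vmwrite):

#include <stdint.h>

/* Hypothetical, greatly simplified software copy of a VMCS. */
struct vmcs {
    uint64_t guest_rip, guest_cr3;   /* guest-state area   */
    uint64_t host_rip,  host_cr3;    /* host-state area    */
    uint32_t exec_controls;          /* execution controls */
};

/*
 * Build the VMCS 0->2 that L0 uses to run L2 directly:
 *  - guest state is taken from VMCS 1->2 (L1's view of L2),
 *  - host state is taken from VMCS 0->1 (exits must reach L0),
 *  - controls are the union of what L0 and L1 each require.
 */
static void merge_vmcs(const struct vmcs *vmcs01,
                       const struct vmcs *vmcs12,
                       struct vmcs *vmcs02)
{
    vmcs02->guest_rip     = vmcs12->guest_rip;
    vmcs02->guest_cr3     = vmcs12->guest_cr3;
    vmcs02->host_rip      = vmcs01->host_rip;
    vmcs02->host_cr3      = vmcs01->host_cr3;
    vmcs02->exec_controls = vmcs01->exec_controls | vmcs12->exec_controls;
}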

Exit multiplication makes angry turtle angry
To handle a single L2 exit, L1 does many things: read and write the VMCS, disable interrupts, ...
Those operations can themselves trap, leading to exit multiplication: a single L2 exit can cause 40-50 L1 exits!
Optimize: make a single exit fast and reduce the frequency of exits.
(Figure: exits cascading through L0, L1, L2, and L3 for single-level, two-level, and three-level virtualization.)

Introduction to x86 MMU virtualization
x86 does page table walks in hardware; the MMU has one currently active hardware page table.
Bare metal needs only one logical translation: virtual -> physical.
Virtualization needs two logical translations:
1. Guest page table: guest virtual -> guest physical
2. Host page table: guest physical -> host physical
... but the MMU only knows how to walk a single table!

Solution: multi-dimensional paging
(Figure: L2 virtual -> [guest page table] -> L2 physical -> [EPT 1->2] -> L1 physical -> [EPT 0->1] -> L0 physical, compressed by EPT 0->2.)
The EPT table rarely changes; the guest page table changes a lot.
Compress three logical translations into the two the hardware supports.
L0 emulates EPT for L1; L0 uses EPT 0->1 and EPT 1->2 to construct EPT 0->2.
End result: a lot fewer exits!
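Conceptually, L0 builds each EPT 0->2 entry by translating an L2 guest-physical address first through EPT 1->2 (maintained by L1) and then through EPT 0->1 (maintained by L0). A sketch, treating each EPT as a simple lookup function rather than a real four-level table; install_ept02_entry() is a hypothetical helper, not a KVM function:

#include <stdint.h>
#include <stdbool.h>

/* Treat an EPT as an opaque translation for illustration; the real
 * structure is a multi-level page table walked by hardware. */
typedef bool (*ept_lookup_t)(uint64_t in, uint64_t *out);

extern void install_ept02_entry(uint64_t l2_gpa, uint64_t l0_hpa);

/* Resolve an L2 guest-physical address to an L0 host-physical address
 * by composing the two translations, and install the result in EPT 0->2. */
static bool build_ept02_entry(uint64_t l2_gpa,
                              ept_lookup_t ept12,   /* L2 phys -> L1 phys */
                              ept_lookup_t ept01)   /* L1 phys -> L0 phys */
{
    uint64_t l1_gpa, l0_hpa;

    if (!ept12(l2_gpa, &l1_gpa))
        return false;            /* L1 has not mapped this page yet */
    if (!ept01(l1_gpa, &l0_hpa))
        return false;            /* L0 has not mapped this page yet */

    install_ept02_entry(l2_gpa, l0_hpa);   /* one hardware translation */
    return true;
}

Because EPT 0->1 and EPT 1->2 change rarely, the composed EPT 0->2 stays valid most of the time, which is why exits drop so sharply.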

I/O virtualization via device emulation
(Figure: the guest's device driver talks to device emulation in the host, which drives the host's real device driver.)
Emulation is usually the default [Sugerman01].
Works for unmodified guests out of the box.
Very low performance, due to many exits on the I/O path.

I/O virtualization via paravirtualized devices
(Figure: a front-end virtual driver in the guest talks to a back-end virtual driver in the host, which drives the real device driver.)
Hypervisor-aware drivers and devices [Barham03, Russell08].
Requires new guest drivers.
Requires hypervisor involvement on the I/O path.

I/O virtualization via device assignment
(Figure: the guest's device driver drives the physical device directly.)
Bypass the hypervisor on the I/O path [LeVasseur04, Ben-Yehuda06].
SR-IOV devices provide sharing in hardware.
Better performance than paravirtual, but far from native.

Comparing I/O virtualization methods

IOV method           throughput (Mb/s)   CPU utilization
bare-metal           950                 20%
device assignment    950                 25%
paravirtual          950                 50%
emulation            250                 100%

netperf TCP_STREAM sender on 1Gb/s Ethernet (16K messages).
Device assignment is the best-performing option.

What does it mean to do I/O?
Programmed I/O (in/out instructions)
Memory-mapped I/O (loads and stores)
Direct memory access (DMA)
Interrupts

Direct memory access (DMA)
All modern devices access memory directly.
On bare metal: a trusted driver gives its device an address, and the device reads or writes that address.
Protection problem: guest drivers are not trusted.
Translation problem: guest physical addresses are not host physical addresses.
Direct access: the guest bypasses the host.
What to do?

IOMMU

The IOMMU mapping memory/performance tradeoff
When does the host map and unmap translation entries?
Direct mapping up-front, at virtual machine creation: all memory is pinned, no intra-guest protection.
Mapping and unmapping at run time: high cost in performance.
We want: direct-mapping performance, intra-guest protection, and minimal pinning.
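The tradeoff is visible directly on the DMA path: strict per-I/O map/unmap gives protection but pays for two IOMMU updates plus an IOTLB invalidation per buffer, while an up-front direct map pays nothing at run time but pins all guest memory. A hypothetical sketch of the strict path; the iommu_map/iommu_unmap/device_dma names are illustrative stand-ins (Linux exposes this through its DMA-mapping API, e.g., dma_map_single/dma_unmap_single):

#include <stdint.h>
#include <stddef.h>

extern uint64_t iommu_map(void *cpu_addr, size_t len);    /* returns an IOVA */
extern void     iommu_unmap(uint64_t iova, size_t len);   /* + IOTLB flush   */
extern void     device_dma(uint64_t iova, size_t len);    /* start the DMA   */

/* Strict mapping: protection and no long-term pinning, but two IOMMU
 * updates plus an invalidation on every single I/O buffer. */
static void dma_one_buffer_strict(void *buf, size_t len)
{
    uint64_t iova = iommu_map(buf, len);    /* pin + create translation   */
    device_dma(iova, len);                  /* device can touch only buf  */
    iommu_unmap(iova, len);                 /* tear down, flush the IOTLB */
}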

Multi-level device assignment
With nesting there are 3x3 options for I/O virtualization (L2 on L1 on L0).
Multi-level device assignment means giving an L2 guest direct access to L0's devices, safely bypassing both L0 and L1.
(Figure: the L2 device driver issues MMIOs and PIOs straight to the physical device, bypassing the L1 and L0 hypervisors and their IOMMU layers; device DMA goes through the platform IOMMU.)
Requires that L0 emulate an IOMMU efficiently.
L0 compresses the multiple IOMMU translations onto the single hardware IOMMU page table.
L2 programs the device directly; the device DMAs into L2's memory space directly.

vIOMMU: efficient IOMMU emulation
Emulate an IOMMU so that we know when to map and unmap: enables memory overcommitment and intra-guest protection.
Use a sidecore [Kumar07] for efficient emulation: avoid costly exits by running the emulation on another core, in parallel.
Optimistic teardown: relax protection to increase performance by caching translation entries.
vIOMMU provides high performance with intra-guest protection and minimal pinning.
(Figure: the guest's IOMMU mapping layer maps and unmaps I/O buffers by updating emulated IOMMU page-table entries and issuing IOTLB invalidations; a sidecore emulation domain polls the emulated IOMMU registers, updates the physical IOMMU page tables, and handles the invalidations; device DMA is then translated by the physical IOMMU into the I/O buffer.)
"vIOMMU: Efficient IOMMU Emulation", Amit, Ben-Yehuda, Schuster, Tsafrir, USENIX ATC '11.

Micro-optimizations
Goal: reduce world-switch overheads.
Reduce the cost of a single exit by focusing on VMCS merges:
  keep VMCS fields in processor encoding;
  partial updates instead of wholesale copying;
  copy multiple fields at once;
  some optimizations are not safe according to the spec.
Reduce the frequency of exits by focusing on vmread and vmwrite:
  avoid the exit multiplier effect;
  loads/stores vs. architected trapping instructions;
  binary patching?
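One of these micro-optimizations, copying only the VMCS fields that actually changed instead of the whole structure, can be sketched as a dirty-bitmap walk. The field count and bitmap layout here are hypothetical, purely to illustrate the idea:

#include <stdint.h>

#define VMCS_NFIELDS 64              /* hypothetical; a real VMCS has more */

struct soft_vmcs {
    uint64_t field[VMCS_NFIELDS];
    uint64_t dirty;                  /* one bit per field, set on vmwrite */
};

/* Copy only the fields L1 actually wrote since the last merge,
 * instead of copying the whole structure on every L2 entry. */
static void sync_dirty_fields(struct soft_vmcs *dst, struct soft_vmcs *src)
{
    while (src->dirty) {
        int i = __builtin_ctzll(src->dirty);   /* index of lowest dirty field */
        dst->field[i] = src->field[i];
        src->dirty &= src->dirty - 1;          /* clear that bit */
    }
}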

Nested VMX support in KVM

Date: Mon, 16 May 2011 22:43:54 +0300
From: Nadav Har'El <nyh (at) il.ibm.com>
To: kvm [at] vger.kernel.org
Cc: gleb [at] redhat.com, avi [at] redhat.com
Subject: [PATCH 0/31] nVMX: Nested VMX, v10

Hi,

This is the tenth iteration of the nested VMX patch set. Improvements in this version over the previous one include:

* Fix the code which did not fully maintain a list of all VMCSs loaded on each CPU. (Avi, this was the big thing that bothered you in the previous version.)
* Add nested-entry-time (L1->L2) verification of control fields of vmcs12 - procbased, pinbased, entry, exit and secondary controls - compared to the capability MSRs which we advertise to L1.

[many other changes trimmed]

This new set of patches applies to the current KVM trunk (I checked with 6f1bd0daae731ff07f4755b4f56730a6e4a3c1cb). If you wish, you can also check out an already-patched version of KVM from branch "nvmx10" of the repository: git://github.com/nyh/kvm-nested-vmx.git

Windows XP on KVM (L1) on KVM (L0)

Linux on VMware (L1) on KVM (L0)

Experimental Setup
Running Linux, Windows, KVM, VMware, SMP, ...
Macro workloads: kernbench, SPECjbb, netperf.
Questions: Multi-dimensional paging? Multi-level device assignment? KVM as L1 vs. VMware as L1?
See the paper for full experimental details and more benchmarks and analysis.

Macro: SPECjbb and kernbench

kernbench                Host     Guest    Nested   Nested DRW
Run time                 324.3    355      406.3    391.5
% overhead vs. host      -        9.5      25.3     20.7
% overhead vs. guest     -        -        14.5     10.3

SPECjbb                  Host     Guest    Nested   Nested DRW
Score                    90493    83599    77065    78347
% degradation vs. host   -        7.6      14.8     13.4
% degradation vs. guest  -        -        7.8      6.3

Table: kernbench and SPECjbb results.
The exit multiplication effect is not as bad as we feared.
Direct vmread and vmwrite (DRW) give an immediate boost.
Take-away: each level of virtualization adds approximately the same overhead!

Macro: multi-dimensional paging
(Figure: improvement ratio of multi-dimensional paging over shadow-on-EPT for kernbench, SPECjbb, and netperf.)
The impact of multi-dimensional paging depends on the rate of page faults.
Shadow-on-EPT: every L2 page fault causes L1 multiple exits.
Multi-dimensional paging: only EPT violations cause L1 exits.
The EPT table rarely changes: #(EPT violations) << #(page faults).
Multi-dimensional paging is a huge win for the page-fault-intensive kernbench.

Macro: multi-level device assignment
(Figure: throughput in Mbps and %CPU for native, emulation, single-level guests, and nested guests with every combination of emulation, virtio, and direct device assignment at each level.)
Benchmark: netperf TCP_STREAM (transmit).
Multi-level device assignment is the best-performing option.
But: native uses 20% CPU while multi-level device assignment uses 60% (3x!).
Interrupts considered harmful; they cause exit multiplication.

Macro: multi-level device assignment (sans interrupts)
(Figure: throughput in Mbps vs. message size (netperf -m, 16 to 512 bytes) for L0 bare metal, L2 direct/direct, and L2 direct/virtio.)
What if we could deliver device interrupts directly to L2?
Only a 7% difference between native and the nested guest!

Micro: synthetic worst case, a CPUID loop
(Figure: CPU cycles per exit, split into time in L1, time in L0, and CPU mode-switch time, for a single-level guest, a nested guest, and nested guests with the optimizations of sections 3.5.1 and 3.5.2, separately and combined.)
CPUID running in a tight loop is not a real-world workload!
Went from 30x worse to only 6x worse.
A nested exit is still expensive: minimize both the cost of a single exit and the frequency of exits.

Interim conclusions: Turtles
Efficient nested x86 virtualization is challenging but feasible.
A whole new ballpark, opening up many exciting applications: security, cloud, architecture, ...
Current overhead of 6-14%: negligible for some workloads, not yet for others.
Work in progress; expect at most 5% eventually.
Code is available.
Why "Turtles"? It's turtles all the way down.

Back to interrupts: how bad is it?
netperf TCP_STREAM sender on 10Gb/s Ethernet with 256-byte messages.
Using device assignment with direct mapping in the IOMMU.
Only achieves 60% of bare-metal performance; same results for memcached and apache.
Where does the rest go?
Interrupts: approximately 49,000 interrupts per second with Linux.

Interrupts cause exits
Each interrupt causes at least two exits: one to deliver the interrupt, and one to signal its completion (End of Interrupt, EOI).
(Figure: physical interrupts from the I/O device flow through the I/O APIC to the hypervisor, which injects virtual interrupts into the guest; the guest's EOI writes trap back to the hypervisor.)

ELI: ExitLess Interrupts
(Figure: timelines of handling a physical interrupt under (a) baseline virtualization, with exits for both interrupt injection and interrupt completion; (b) ELI delivery only; (c) ELI delivery and completion; and (d) bare metal.)
ELI: direct interrupts for unmodified, untrusted guests.
"ELI: Bare-Metal Performance for I/O Virtualization", Gordon, Amit, Har'El, Ben-Yehuda, Landau, Schuster, Tsafrir, ASPLOS '12.

Background: interrupts
(Figure: the IDT register (IDTR) holds the base address and limit of the Interrupt Descriptor Table; each IDT entry, indexed by vector, points to an interrupt handler.)
I/O devices raise interrupts.
The CPU temporarily stops the currently executing code and jumps to a pre-specified interrupt handler.
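On x86-64, each IDT entry is a 16-byte gate descriptor holding the handler address, a code-segment selector and a present bit, and the IDTR holds the table's base and limit. A minimal sketch of the layout (field packing per the architecture; the set_gate helper is illustrative):

#include <stdint.h>

/* x86-64 interrupt gate descriptor (16 bytes). */
struct idt_entry {
    uint16_t offset_low;     /* handler address bits 0..15   */
    uint16_t selector;       /* code segment selector        */
    uint8_t  ist;            /* interrupt stack table index  */
    uint8_t  type_attr;      /* gate type, DPL, present bit  */
    uint16_t offset_mid;     /* handler address bits 16..31  */
    uint32_t offset_high;    /* handler address bits 32..63  */
    uint32_t reserved;
} __attribute__((packed));

/* IDTR value, loaded with the lidt instruction. */
struct idt_ptr {
    uint16_t limit;          /* size of the table minus one  */
    uint64_t base;           /* linear address of the table  */
} __attribute__((packed));

#define IDT_PRESENT 0x80     /* bit 7 of type_attr */

static void set_gate(struct idt_entry *e, void (*handler)(void), uint16_t cs)
{
    uint64_t addr = (uint64_t)handler;
    e->offset_low  = addr & 0xffff;
    e->offset_mid  = (addr >> 16) & 0xffff;
    e->offset_high = (uint32_t)(addr >> 32);
    e->selector    = cs;
    e->ist         = 0;
    e->type_attr   = IDT_PRESENT | 0x0e;   /* present, 64-bit interrupt gate */
    e->reserved    = 0;
}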

ELI: delivery
(Figure: while the guest runs, the IDTR points at a shadow IDT; entries for assigned device interrupts are present (P=1) and jump straight to the guest's handlers, while non-assigned entries are not present (P=0) or lie beyond the IDTR limit, so they cause #NP or #GP exits to the hypervisor.)
All interrupts are delivered directly to the guest.
The host's and other guests' interrupts are bounced back to the host ... without the guest being aware of it.
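Building the shadow IDT amounts to copying the guest's IDT and clearing the present bit on every vector that is not an assigned device interrupt: those vectors fault with #NP and exit to the host, while assigned ones go straight to the guest handler. A sketch, reusing the idt_entry layout and IDT_PRESENT flag from the sketch above; is_assigned_vector() is a hypothetical policy hook:

#include <stdbool.h>
#include <string.h>

extern bool is_assigned_vector(int vector);   /* hypothetical policy hook */

/* Copy the guest IDT into the shadow IDT used while the guest runs:
 * assigned device vectors stay present (delivered directly to the guest),
 * everything else is marked not-present so it raises #NP and exits to L0. */
static void build_shadow_idt(struct idt_entry *shadow,
                             const struct idt_entry *guest_idt,
                             int nvectors)
{
    memcpy(shadow, guest_idt, nvectors * sizeof(*shadow));
    for (int i = 0; i < nvectors; i++)
        if (!is_assigned_vector(i))
            shadow[i].type_attr &= ~IDT_PRESENT;
}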

ELI: signaling completion
Guests signal interrupt completion by writing to the Local Advanced Programmable Interrupt Controller (LAPIC) End-of-Interrupt (EOI) register.
Old LAPIC: the hypervisor traps loads/stores to the whole LAPIC page.
x2APIC: the hypervisor can trap specific registers.
Signaling completion without trapping therefore requires x2APIC.
ELI gives the guest direct access only to the EOI register.
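With x2APIC the EOI register is an MSR (0x80B), so the hypervisor can clear a single bit in its MSR bitmap and let the guest write just that one register without exiting. A sketch of the guest-side completion path; the wrmsr wrapper is illustrative:

#include <stdint.h>

#define MSR_X2APIC_EOI 0x80B   /* x2APIC End-of-Interrupt register */

/* Write an MSR; with ELI this particular write does not trap, because
 * the hypervisor's MSR bitmap permits direct access to this register. */
static inline void wrmsr(uint32_t msr, uint64_t val)
{
    __asm__ volatile("wrmsr"
                     : : "c"(msr), "a"((uint32_t)val),
                         "d"((uint32_t)(val >> 32)));
}

/* Interrupt handler epilogue: signal completion without an exit.
 * In x2APIC mode the value written must be zero. */
static inline void ack_interrupt(void)
{
    wrmsr(MSR_X2APIC_EOI, 0);
}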

ELI: threat model
Malicious guests might try to:
keep interrupts disabled;
signal invalid completions;
consume other guests' or the host's interrupts.

ELI: protection
Use the VMX preemption timer to force exits instead of relying on timer interrupts.
Ignore spurious EOIs.
Protect critical interrupts by:
delivering them to a non-ELI core if available;
redirecting them as NMIs, which cause an unconditional exit;
using the IDTR limit to force #GP exits on critical interrupts.

Evaluation: netperf
Exits per second drop from roughly 102,000 (baseline) to 800 (ELI); time spent in the guest rises from 60% to 98%.
Bare-metal performance!

Evaluation: apache
Exits per second drop from roughly 91,000 (baseline) to 1,100 (ELI); time spent in the guest rises from 65% to 97%.
Bare-metal performance!

Evaluation: memcached
Exits per second drop from roughly 123,000 (baseline) to 1,000 (ELI); time spent in the guest rises from 60% to 100%.
Bare-metal performance!

Conclusions: ELI
Achievement unlocked: bare-metal performance for x86 guests ... if they use device assignment.

The saga continues: ELVIS at SYSTOR

Thank you! Questions?

Hardware wishlist
For nested virtualization: SIE-like functionality.
For general virtualization performance: fast inter-core exits, i.e., host-to-guest and guest-to-host inter-core notifications.
For I/O: an IOMMU with I/O page faults (PRI).
In general: better chipset-level visibility; a cycle-accurate simulator for I/O would be really nice ...