Nested Virtualization
Introduction and improvements
Bandan Das, Karen Noel
2 Outline
  Introduction
  When things don't work
  Note on AMD
  Speeding up
  Wrap-up
  References
3 Introduction: Nested Virtualization
  [Figure: layered stack diagram with the labels Linux, Windows, Windows, Xen, ESX, Linux/KVM, Hardware]
4 Introduction: Nested Virtualization
  That thing you boot in order to boot something, that lets you boot something else, finally to get you to Facebook
  [Figure: layered stack diagram with the labels Windows, Linux, Xen, Windows, ESX, Linux/KVM, Hardware]
5 Introduction: Uses
  Operating system hypervisors (Linux/KVM, WinXP mode in newer versions of Windows)
  Cloud computing
    Give users the ability to run their own hypervisors!
  Security
    McAfee DeepSafe
  Testing/debugging hypervisors
    Use me!
  Interoperability
6 Introduction: VMCS (Virtual Machine Control Structure)
  Data structure to manage VMX non-root operation and transitions
  Defines the guest OS state
  Configured by the hypervisor and used by the VMX hardware to run the guest OS
7 Introduction: VMCS (Virtual Machine Control Structure)
  Guest-state area
    Saved on VMEXIT and loaded on VMENTER
  Host-state area
    Loaded on VMEXIT
  VM-execution control fields
    Control operation in non-root mode
  VM-exit control fields
  VM-entry control fields
  VM-exit information fields
    Read-only area that describes the causes of VMEXITs
8 Introduction: How it works
  L0 runs L1 with VMCS01
  L1 wants to run L2 and executes vmlaunch with VMCS12
    vmlaunch traps to L0
  L0 merges VMCS01 with VMCS12 to create VMCS02 and runs L2
  If L2 traps, we are back in L0
    L0 decides whether to handle the trap itself or forward it to L1
    Eventually L0 resumes L1...
  [Figure: Linux (L2, nested) on Xen (L1, VMCS12) on KVM (L0, VMCS01 and VMCS02) on hardware]
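The flow on this slide can be sketched as a toy model. This is an illustration only, not KVM's code: the dictionary layout and the names `merge_vmcs` and `route_exit` are hypothetical, and a real VMCS merge covers far more fields and special cases.

```python
# Toy model of nested VMX control flow. All field and function names are
# hypothetical; real VMCS merging in KVM handles far more state.

def merge_vmcs(vmcs01, vmcs12):
    """Build VMCS02, the structure L0 uses to run L2 on real hardware."""
    return {
        # L2's guest state is whatever L1 requested in VMCS12.
        "guest_state": dict(vmcs12["guest_state"]),
        # A hardware exit must land in L0, so host state comes from VMCS01.
        "host_state": dict(vmcs01["host_state"]),
        # Exit if either L0 or L1 wants to intercept the event.
        "exit_on": set(vmcs01["exit_on"]) | set(vmcs12["exit_on"]),
    }


def route_exit(reason, vmcs12):
    """Every L2 exit arrives in L0; L0 forwards it only if L1 asked for it."""
    return "forward to L1" if reason in vmcs12["exit_on"] else "handle in L0"


vmcs01 = {
    "guest_state": {"rip": 0x1000},
    "host_state": {"rip": 0xF000},
    "exit_on": {"external_interrupt"},
}
vmcs12 = {
    "guest_state": {"rip": 0x2000},
    "host_state": {"rip": 0x1000},
    "exit_on": {"cpuid"},
}

vmcs02 = merge_vmcs(vmcs01, vmcs12)
print(sorted(vmcs02["exit_on"]))
print(route_exit("cpuid", vmcs12))
print(route_exit("external_interrupt", vmcs12))
```

The point of the sketch is the asymmetry: guest state follows L1's wishes, host state always belongs to L0, and the exit conditions are the union of both.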
9 Introduction
  Nesting is disabled by default
    modprobe kvm_intel nested=1
  Run QEMU (on L0) with -cpu host or -cpu cpu_name,+vmx
    Enables the virtual CPU to advertise vmx
    Check /proc/cpuinfo for the vmx flag!
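A minimal sketch of the two checks, written as pure functions over text so they can be tried on any machine. The helper names are hypothetical. It assumes the module parameter is exposed at /sys/module/kvm_intel/parameters/nested and reads back as Y or 1 when enabled, which may vary by kernel version.

```python
def cpu_advertises(cpuinfo_text, flag="vmx"):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists the flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole tokens so that 'vmx' does not match a longer flag name.
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def nested_enabled(param_text):
    """Interpret the contents of the kvm_intel 'nested' parameter file.

    Assumption: the file holds 'Y' or '1' when nesting is enabled.
    """
    return param_text.strip() in ("Y", "1")


sample_cpuinfo = "processor\t: 0\nflags\t\t: fpu vme sse2 vmx\n"
print(cpu_advertises(sample_cpuinfo))
print(nested_enabled("N\n"))
```

On a real system, pass the contents of /proc/cpuinfo (read inside the L1 guest) to `cpu_advertises` and the contents of the parameter file (read on L0) to `nested_enabled`.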
10 Introduction
  [Figure: CPU models at each level]
  Level 2 guest: modeled CPU (QEMU -cpu Nehalem)
  Level 1: modeled CPUs (QEMU -cpu Sandybridge,+vmx)
  Linux/KVM (L0): host CPUs
11 When things don't work: CPU models
  QEMU defines virtual CPUs to expose to guests
    Each model attempts to match its physical counterpart, but does not always succeed!
    Source of bugs and unimplemented bits
  When -cpu host fails, try removing flags or another model
    -cpu host,-flag1,-flag2
    -cpu Sandybridge,+vmx,+flag1,+flag2
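When iterating over CPU models and flags, it can help to generate the -cpu values rather than type them by hand. The sketch below only builds strings in the model,+flag,-flag form shown on the slide; `cpu_arg` and `candidates` are hypothetical helpers, and flag1/flag2 remain placeholders, as in the slide.

```python
def cpu_arg(model, enable=(), disable=()):
    """Build a QEMU -cpu value such as 'host,-flag1' or 'Sandybridge,+vmx'."""
    parts = [model]
    parts += ["+" + flag for flag in enable]
    parts += ["-" + flag for flag in disable]
    return ",".join(parts)


def candidates(flags_to_drop):
    """Yield -cpu values to try in turn, from least to most conservative."""
    yield cpu_arg("host")
    yield cpu_arg("host", disable=flags_to_drop)
    yield cpu_arg("Sandybridge", enable=["vmx"])


for value in candidates(["flag1", "flag2"]):
    print("-cpu", value)
```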
12 When things don't work: Interrupt injection
  KVM exits to L0
  L0 decides where to inject the interrupt
  Incorrect injection can lead to undesirable results
    e.g. L0 injecting L1's timer interrupt into L2
      Commit 9242b5b
    e.g. L0 mishandling watchdog interrupts
      echo 0 > /proc/sys/kernel/nmi_watchdog
13 When things don't work: Specification compliance (unimplemented features)
  KVM is the primary L1 hypervisor
    Features/code paths not used by KVM have less test exposure
  Mandatory VMX features such as the MSR load/store mechanism
    http://www.spinics.net/lists/kvm/msg111816.html
    VirtualBox depends on it
14 When things don't work: Experimenting with kvm module parameters (Intel)
  ept=0
    Disables nested EPT as well
    Significantly slower
  enable_apicv=0
    Disables APIC virtualization
    Testing newer processors
  enable_shadow_vmcs=0
    Disables shadow VMCS
15 Nested Virtualization - AMD
  Stable codebase
    nested is enabled by default
  AMD-V Advanced Virtual Interrupt Controller (AVIC)
    Hardware yet to arrive!
  More testing
    Hard-to-find bugs always exist!
    Newer releases of common and new hypervisors
    Nesting introduces I/O bottlenecks
    Are we spec compliant?
16 Speeding Up: Nested EPT
  Without nested EPT, L1 shadows L2 translations
    L0 manages its address space using EPT: (ngva => ngpa => GPA) => HPA
    L0 only sees the ngva => GPA mapping
  L1 shadowing is expensive
    Slow! Gains from using EPT in L0 are minimal
    Shadowing in L1 causes many vmexits, and nested vmexits are slow
17 Speeding Up: Nested EPT
  EPT only supports two levels!
  What if the nested guest used EPT page tables?
    On L0, use existing code to shadow the EPT tables
    L0 operates on L1's ngpa => GPA mapping (almost static)
    Processor takes care of the expensive ngva => ngpa mapping
    ngva => (ngpa => GPA => HPA)
  Shadowing is now cheap!
    Reduced nested vmexits
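The translation chain can be illustrated with page mappings modeled as dictionaries. This is a toy sketch under the assumption that ngva/ngpa are the nested guest's virtual/physical addresses and GPA/HPA are L1's physical and the host's physical addresses; the page numbers and the `compose` helper are made up for illustration.

```python
def compose(first, second):
    """Compose two page mappings: apply `first`, then `second`."""
    return {src: second[mid] for src, mid in first.items() if mid in second}


l2_page_table = {0xA: 0x1, 0xB: 0x2}          # ngva -> ngpa, changes often
l1_ept = {0x1: 0x10, 0x2: 0x20}               # ngpa -> GPA, almost static
l0_ept = {0x10: 0x100, 0x20: 0x200}           # GPA -> HPA

# With nested EPT, L0 shadows only the two EPT levels: ngpa -> HPA.
shadow_ept = compose(l1_ept, l0_ept)


def translate(ngva):
    """Model the processor walking L2's page table, then the shadow EPT."""
    return shadow_ept[l2_page_table[ngva]]


# L2 remaps a virtual page. The shadow EPT is untouched, so no software
# shadowing work is triggered by this change.
l2_page_table[0xA] = 0x2
print(hex(translate(0xA)))
```

The frequently changing mapping (L2's own page table) stays out of the shadowed structure, which is why shadowing becomes cheap in this arrangement.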
18 Speeding Up: Nested EPT performance
  Benchmark      Without EPT   With EPT
  Kernel build   47m35s        15m10s
  SPECJBB        1844          5540
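The relative gains follow from simple arithmetic on the figures above. The sketch assumes the kernel build entries are wall-clock times (lower is better) and the SPECJBB entries are scores (higher is better); `to_seconds` is a hypothetical helper.

```python
import re


def to_seconds(duration):
    """Convert a string such as '47m35s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    minutes, secs = map(int, match.groups())
    return minutes * 60 + secs


build_speedup = to_seconds("47m35s") / to_seconds("15m10s")
specjbb_gain = 5540 / 1844

print(f"kernel build: {build_speedup:.2f}x faster")
print(f"SPECJBB: {specjbb_gain:.2f}x the score")
```

Both workloads improve by roughly a factor of three with nested EPT.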
19 Speeding Up: Shadow VMCS
  L1 accessing the VMCS always causes an exit
    Slows things down!
  Solution: shadow VMCS
    L0 creates a private copy of the VMCS for L1 to use when running L2
    No exits when L1 accesses the shadow VMCS
    Shadow copy is synced with the real VMCS upon VMEXIT
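A toy model makes the saving concrete by counting exits. It is a sketch only: the class `ToyL0` and its methods are hypothetical, and it assumes one exit per trapped VMCS write plus one exit when L1 resumes L2, at which point the shadow copy is synced.

```python
class ToyL0:
    """Counts exits to L0 caused by L1's VMCS accesses (illustrative only)."""

    def __init__(self, shadow_enabled):
        self.shadow_enabled = shadow_enabled
        self.vmcs12 = {}   # L0's authoritative copy of L1's VMCS
        self.shadow = {}   # copy L1 may access without exiting
        self.exits = 0

    def l1_vmwrite(self, field, value):
        if self.shadow_enabled:
            self.shadow[field] = value   # handled without an exit
        else:
            self.exits += 1              # the write traps to L0
            self.vmcs12[field] = value

    def l1_vmresume(self):
        self.exits += 1                  # resuming L2 always traps to L0
        if self.shadow_enabled:
            self.vmcs12.update(self.shadow)  # sync on the exit


def exits_for(shadow_enabled, writes=10):
    l0 = ToyL0(shadow_enabled)
    for i in range(writes):
        l0.l1_vmwrite(f"field{i}", i)
    l0.l1_vmresume()
    return l0.exits, l0.vmcs12


print(exits_for(False)[0], exits_for(True)[0])
```

In this model both configurations end with identical VMCS contents; only the number of exits differs.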
20 Speeding Up: Shadow VMCS performance
  Benchmark      Without shadow VMCS   With shadow VMCS
  Kernel build   15m10s                13m17s
  SPECJBB        5280                  5540
21 Speeding Up: Nested APIC-v
  APIC-v: APIC register virtualization
    Virtualized APIC read access: guest reads from the virtual APIC page with no VMEXIT
    Write accesses in hot paths are virtualized (EOIs, self-IPIs)
  Patches for nVMX support posted and under review
    Add support for posted interrupts and virtual interrupt delivery
22 Speeding Up: Nested APIC-v (upstream author results)
  Benchmark   Without nested APIC-v   With nested APIC-v
  wprime      7.782s                  7.172s
  iperf       2.12 Gbps               3.50 Gbps
23 Speeding Up: Nested VT-d
  Incomplete, but foundation work is in progress
  Let L1 directly assign and manage devices for L2
  GSoC project: VT-d emulation in QEMU
24 Speeding Up: Virtualization overheads
  Investigate reducing VMEXITs in nested code paths
  Nested support for features that could improve performance or otherwise
    Recent additions: MPX, interrupt acknowledgement
  Work in progress
25 Wrap Up (Future Work)
  Stability
    Up to three levels of nesting work
  Test matrix is complicated
    Combinations of configurations and hypervisors
  Unimplemented features and bugs
    INVEPT, nested VPID, testing other hypervisors
  Migration support
    Complicated!
26 References
  Nested Virtualization: shadow turtles, Orit Wasserman, KVM Forum 2013
    http://www.linux-kvm.org/wiki/images/e/e9/kvm-forum-2013-nestedvirtualization-shadow-turtles.pdf
  Nested EPT to make Nested VMX Faster, Gleb Natapov, KVM Forum 2013
    http://www.linux-kvm.org/wiki/images/8/8c/kvm-forum-2013-nested-ept.pdf
  Making Nested Virtualization Real.., Jun Nakajima, Linuxcon Japan 2013
    http://events.linuxfoundation.org/sites/events/files/cojp13_nakajima.pdf
  The Turtles Project, Muli Ben-Yehuda et al, OSDI 2010
    https://www.usenix.org/legacy/events/osdi10/tech/slides/ben-yehuda.pdf