Virtualization Based on materials from: Introduction to Virtual Machines by Carl Waldspurger Understanding Intel Virtualization Technology (VT) by N. B. Sahgal and D. Rodgers Intel Virtualization Technology Roadmap and VT-d Support in Xen by Jun Nakajima 1 2010 VMware Inc. All rights reserved
Starting Point: A Physical Machine Physical Hardware Processors, memory, chipset, I/O devices, etc. Resources often grossly underutilized Software Tightly coupled to physical hardware Single active OS instance OS controls hardware 2
What is a Virtual Machine? Software Abstraction Behaves like hardware Encapsulates all OS and application state Virtualization Layer Extra level of indirection Decouples hardware, OS Enforces isolation Multiplexes physical hardware across VMs 3
Virtualization Properties Isolation Fault isolation Performance isolation Encapsulation Cleanly capture all VM state Enables VM snapshots, clones Portability Independent of physical hardware Enables migration of live, running VMs Interposition Transformations on instructions, memory, I/O Enables transparent resource overcommitment, encryption, compression, replication 4
Scenario 1 Jerry downloads a new un-stable OS and runs it on a physical machine. However, because there is a serious bug in the OS, it crashes and damages his hard drive. How would VM technology help? What virtualization property is it? 5
Scenario 2 In the data center, there is a server that runs a single OS. Administrator checked its usage stats and found that CPU ratio is most of the time under 2%. Administrator wants to save energy, can she simply shut it down? How would VM technology help? What virtualization property is it? 6
Scenario 3 To prevent the guest OS from leaking user privacy, below the VM monitor layer we add a functionality to track all private data, intercepts the leakage activities and reports them to the user. What property of VM makes this scenario possible? 7
Types of Virtualization Process Virtualization Language-level Java,.NET, Smalltalk OS-level processes, Solaris Zones, BSD Jails Cross-ISA emulation Apple 68K-PPC-x86 System Virtualization VMware Workstation, Microsoft VPC, Parallels VMware ESX, Xen, Microsoft Hyper-V 8
Types of Virtualization Native/Bare metal (Type 1) Higher performance ESX, Xen, HyperV Hosted (Type 2) Easier to install Leverage host s device drivers VMware Workstation, Parallels 9 Attribution: http://itechthoughts.wordpress.com/tag/full-virtualization/
Types of Virtualization Full virtualization Unmodified OS, virtualization is transparent to OS Para virtualization OS modified to be virtualized 10 Attribution http://forums.techarena.in/guides-tutorials/1104460.htm
What Needs to Virtualized? Processor Memory IO Guest OS + Applications Page Fault Unprivileged Undef Instr virq Privileged MMU Emulation CPU Emulation I/O Emulation Virtual Machine Monitor 11
Processor Protection Level x86: Only rings 0 (Kernel) and 3 (User) are typically used. 3 types of events cause entry into privileged mode (or Kernel mode) 1. An asynchronous hardware interrupt (e.g., IO device) 2. A system call instruction (e.g., int, cli) 3. Exceptions (also called traps or faults, e.g., divide-by-0, page fault) 12 Attribution: http://stackoverflow.com/questions/16649824/the-relation-between-privileged-instructions-traps-and-system-calls
Processor Virtualization An architecture is classically/strictly virtualizable if all its sensitive instructions (those that violate safety and encapsulation) are a subset of the privileged instructions. all instructions either trap or execute identically instructions that access privileged state trap 13 Attribution: http://itechthoughts.wordpress.com/tag/full-virtualization/
Trap and Emulate Run guest operating system deprivileged All privileged instructions trap into VMM VMM emulates instructions against virtual state e.g. disable virtual interrupts, not physical interrupts Resume direct execution from next guest instruction 14
x86 Virtualization Challenges Not Classically Virtualizable x86 ISA includes instructions that read or modify privileged state But which don t trap in unprivileged mode Example: POPF instruction Pop top-of-stack into EFLAGS register EFLAGS.IF bit privileged (interrupt enable flag) POPF silently ignores attempts to alter EFLAGS.IF in unprivileged mode! So no trap to return control to VMM Deprivileging not possible with x86! 15
x86 Virtualization Approaches Binary translation Para virtualization HW support 16
Binary Translation Translate dangerous instruction sequences into safe instruction sequences Replace privileged instructions with calls to VMM Cache translations for performance Advantages: Can run unmodified OS Disadvantages: Frequent traps to VMM Examples: VMware Guest Code vep C mov and mov Translation Cache ebx, eax mov ebx, eax cli mov [VIF], 0 ebx, ~0xfff ebx, cr3 and mov sti call ret mov test ebx, ~0xfff [CO_ARG], ebx HANDLE_CR3 [VIF], 1 [INT_PEND], 1 jne 17 call HANDLE_INTS jmp HANDLE_RET start
Processor Paravirtualization Make OS aware of virtualization Present to OS software interface that is similar, but not identical to underlying hardware Replace dangerous system calls with calls to VMM Page table updates Advantages: High performance Disadvantages: Requires porting OS Examples: Xen 18
HW Support Intel VT-x Codenamed "Vanderpool" Available since Itanium 2 (2005), Xeon and Centrino (2006) AMD-V Codename Pacifica Available since Athlon 64 (2006) 19
Intel VT-x VT extends the original x86 architecture to eliminate holes that make virtualization hard. 20
Operating Modes VMX root operation: Fully privileged, intended for VM monitor VMX non-root operation: Not fully privileged, intended for guest software Reduces Guest SW privilege w/o relying on rings Solution to Ring Aliasing and Ring Compression 21
VM Entry and VM Exit VM Entry Transition from VMM to Guest Enters VMX non-root operation Loads Guest state and Exit criteria from VMCS VMLAUNCH instruction used on initial entry VMRESUME instruction used on subsequent entries VM Exit VMEXIT instruction used on transition from Guest to VMM Enters VMX root operation Saves Guest state in VMCS Loads VMM state from VMCS VMM can control which instructions cause VM exits 22 CR3 accesses, INVLPG
VT-x Operations 23
Traditional Memory Translation Virtual Address Physical Address TLB 1 4 2 5 3 Operating System s Page Fault Handler Process Page Table 2 24 Attribution: Scott Devine,E6998 - Virtual Machines, Lecture 3 Memory Virtualization
Memory Virtualization Native Virtualized 25
Memory Virtualization Techniques Shadow page tables Paravirtualization HW supported nested page tables 26
Shadow Page Tables Keep a second set of page tables hidden from guest Map between guest virtual and machine pages Detect when guest changes page tables TLB invalidation requests, page table creation, write to existing page tables Update shadow page accordingly On context switch, install shadow page instead of guest page Advantages: Can support unmodified guest Disadvantages: Significant overhead to maintain consistency Examples: VMware and Xen HVM 27
Memory Paravirtualization Page table maps between virtual and machine addresses OS and VMM share page tables OS can only read Changes to page table require hyper call VMM validates that guest owns machine address Advantages: Higher performance can be achieved by batching updates Disadvantages: Requires changes to the OS Examples: Xen 28
Nested Page Tables 0 4GB Virtual Address Space 0 Guest Page Table 4GB Physical Address Space 0 VMM PhysMap 4GB Machine Address Space 29 Attribution: Scott Devine,E6998 - Virtual Machines, Lecture 3 Memory Virtualization
Nested Page Tables (cont'd) Virtual Address TLB Machine Address 3 1 2 Guest Page Table 2 PhysMap 3 By VMM 30 Attribution: Scott Devine,E6998 - Virtual Machines, Lecture 3 Memory Virtualization
Hardware Support Nested page tables HW keeps a second set of page tables that map from physical to machine addresses. On a TLB miss, first find physical address from guest page tables, then map to machine address Intel EPT (Extended Page Table) Since Corei7 (2008) AMD RVI (Rapid Virtualization Indexing) Since Opteron and Phenom II (2007) 31
Issues with Nested Page Tables Positives Simplifies monitor design No need for page protection calculus Negatives Guest page table is in physical address space Need to walk PhysMap multiple times 32 Need physical-to-machine mapping to walk guest page table Need physical-to-machine mapping for original virtual address
Memory Reclamation Balloning: guest driver allocates pinned PPNs, hypervisor deallocates backing MPNs Swapping: hypervisor transparently pages out PPNs, paged in on demand Page sharing: hypervisor identifies identical PPNs based on content, maps to same MPN copy-on-write 33
Ballooning 34
Page Sharing 35
Page Sharing 36
I/O Virtualization Emulation Paravirtualization (split driver) Direct mapped/pci passthrough Hardware support 37
Emulation Guest runs original driver VMM emulates HW in SW Advantages: Can run unmodified guest Disadvantages: Slow 38
IO Paravirtualization Slip driver approach Privileged domain interact with IO devices, exports high level interface as back-end drive Guest domain implements front end driver Front and back end drivers 39
Direct Mapped/PCI Passthrough Allocate a physical device to a specific domain Driver runs of guest domain Cannot use DMA DMA uses physical addresses. Breaks isolation 40
Hardware Support IOMMU (IO Memory Management Unit) Translates memory addresses from IO space to physical space Provides isolation. Limits device s ability to access machine memory. Intel VT-d Core 2 (2008) AMD-Vi Six Core Opteron (2010) 41
Intel VT-d Provides infrastructure for I/O virtualization DMA and interrupt remapping 42
VT-d Applied to Pass-through Model Direct Device Assignment to Guest OS Guest OS directly programs physical device VMM sets up guest- to host-physical DMA mapping PCI-SIG I/O Virtualization Working Group Activity towards standardizing natively sharable I/O devices IOV devices provide virtual interfaces, each independently assignable to VMs Advantages: High performance and simple VMM Disadvantages: Limits VM migration 43
Extra slides 44
Timeline 45
Brief History of VMware x86 Virtualization x86-64 Intel EPT Intel VT-x AMD-V AMD RVI 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009... ESX 4.0 ESX 3.5 ESX 3.0 Workstation 5.5 (64 bit guests) ESX 2.0 (vsmp) ESX Server 1.0 Workstation 2.0 Workstation 1.0 VMware founded 46
Virtualization Applications Server consolidation Convert underutilized servers to VMs Significant cost savings (equipment, space, power) reduce space, power and cooling 70-80% reduction numbers cited in industry Increasingly used for virtual desktops Data center management VM portability and live migration a key enabler automate resource scheduling across a pool of servers optimize for performance and/or power consumption allocate resources for new applications on the fly add/remove servers without application downtime Desktop management centralize management of desktop VM images automate deployment and patching of desktop VMs run desktop VMs on servers or on client machines 47
Virtualization Applications (cont.) Development, test and deployment Developers: test multiple OS versions, distributed application configurations on a single machine Record/replay application execution deterministically Trace application behavior online and offline Model distributed hardware for multi-tier applications Virtual appliances: a complete, portable application execution environment Application and OS flexibility run any application or operating system 48
Virtualization Applications (cont.) Fast, automated recovery automated failover/restart within a cluster disaster recovery across sites VM portability enables this to work reliably across potentially different hardware configurations Fault tolerance hypervisor-based fault tolerance against hardware failures [Bressoud and Schneider, SOSP 1995] run two identical VMs on two different machines, backup VM takes over if primary VM s hardware crashes commercial prototypes beginning to emerge (2008) 49