Intel Virtualization Technology and Extensions
Rochester Institute of Technology
Prepared and presented by: Swapnil S. Jadhav (Computer Engineering), Chaitanya Gadiyam (Computer Engineering)
Agenda
Virtualization Overview
Ring-Deprivileging on Intel Processors
Challenges of Ring-Based VMM Virtualization
Intel VT (Virtualization Technology): Hardware Support for Virtualization
  Intel VT-x
  Intel VT-i
Solving Virtualization Challenges with VT-x and VT-i
Enhancements to Intel VT
  Virtual Processor IDs
  Extended Page Tables
Performance with Intel VT EPT
Intel VT Extensions
  VT-d refers to Intel VT for Directed I/O
  VT-c refers to Intel VT for Connectivity
References
Virtualization Overview
Basic goals: workload isolation, workload consolidation, and workload migration.
Types:
  Full virtualization: no guest OS modifications
  Para-virtualization: guest OS modifications required
Fig 1. Workload Isolation, Consolidation and Migration
Support for virtualization on Intel processors (ring deprivileging): IA-32 architecture and Itanium architecture.
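Ring-Deprivileging on Intel Processors
Privilege-based mechanism: the VMM runs at privilege level 0 and guest software is deprivileged. Each model lists the privilege levels of the VMM / guest OS / guest applications:
  IA-32 architecture: 0/3/3 model
  Itanium architecture: 0/1/3 model
Figure: Ring Deprivileging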
Challenges Because of Ring-Deprivileging on IA-32 and Itanium Architectures
1. Ring aliasing: software runs at a privilege level other than the one for which it was written. IA-32: PUSH of the CS register exposes the current privilege level. Itanium: br.call records the privilege level in the PFS register.
2. Address-space compression: guest software expects access to the processor's full virtual address space, but the VMM needs part of that space for itself, including control structures that reside in the virtual address space (IDT and GDT on IA-32, IVT on Itanium). The VMM must protect these control structures while still supporting guest accesses to them.
3. Non-faulting access to privileged state: protection should prevent unprivileged software from accessing privileged components of CPU state, but some such accesses do not fault, so the VMM cannot intercept them. IA-32: the GDTR, IDTR, LDTR, and TR registers. Itanium: the PTA register, which contains the base address of the VHPT.
4. Adverse impacts on guest transitions: deprivileging diminishes the effectiveness of the mechanisms that deliver and handle transitions to OS software. IA-32: affects the low-latency system calls SYSENTER and SYSEXIT. Itanium: affects interrupt-handler performance.
5. Interrupt virtualization: the VMM intercepts external interrupts and must control interrupt masking (IA-32: the interrupt flag (IF) in the EFLAGS register; Itanium: the i bit in the PSR). Some OSes mask and unmask interrupts frequently, so intercepting every attempt degrades performance; the VMM must also hold virtual interrupts until the guest has unmasked interrupts.
6. Ring compression: the guest OS runs at the same privilege level as guest applications, so the guest OS is not protected from guest applications.
7. Access to hidden state: some components of CPU state are hidden, so guest software cannot access them and there is no mechanism for saving and restoring them. IA-32: hidden descriptor caches for segment registers. Itanium: the Current Frame Load Enable (CFLE) bit in the Register Stack Engine (RSE) register.
8. Frequent access to privileged resources: for example, accesses to the Task Priority Register (TPR). Each access faults to the VMM, and frequent faults degrade performance.
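Ring aliasing and ring compression can be made concrete with a short sketch. The Python below is a toy model, not how any VMM is implemented. It assumes a kernel code selector of 0x10 (a hypothetical value) and relies on the CS rule given in the glossary, where the low 2 bits hold the current privilege level. Under that rule it shows what a deprivileged guest kernel would observe after PUSH CS. It also shows which deprivileging model makes the guest OS share a ring with its applications.

```python
"""Toy illustration of ring aliasing and ring compression (not a real VMM)."""

# Privilege levels (VMM, guest OS, guest applications) for each deprivileging model.
MODELS = {
    "0/1/3": (0, 1, 3),
    "0/3/3": (0, 3, 3),
}

# Hypothetical ring-0 code selector that a guest kernel was written to run with.
KERNEL_CS = 0x10


def cpl_from_cs(cs_selector: int) -> int:
    """Return the current privilege level held in the low 2 bits of CS."""
    return cs_selector & 0b11


def deprivileged_cs(cs_selector: int, ring: int) -> int:
    """Return the CS value the guest kernel actually runs with after the VMM
    moves it to `ring` (this sketch changes only the low 2 bits)."""
    return (cs_selector & ~0b11) | ring


def has_ring_compression(model: str) -> bool:
    """True if the guest OS shares a privilege level with guest applications."""
    _vmm, guest_os, guest_apps = MODELS[model]
    return guest_os == guest_apps


if __name__ == "__main__":
    for model, (_vmm, guest_os, _apps) in MODELS.items():
        # Ring aliasing: a kernel that pushes CS and inspects the low bits sees
        # its real (deprivileged) level instead of the 0 it was written for.
        seen = cpl_from_cs(deprivileged_cs(KERNEL_CS, guest_os))
        print(f"{model}: guest kernel observes CPL {seen}, "
              f"ring compression: {has_ring_compression(model)}")
```

Under the 0/1/3 model the guest kernel observes CPL 1, and there is no ring compression. Under the 0/3/3 model it observes CPL 3, and it shares ring 3 with its applications.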
Intel VT
Intel VT: Hardware Support for Virtualization
Full virtualization: no guest OS modifications
Instruction-set virtualization:
  Eliminates the need for CPU para-virtualization and binary-translation techniques
  Enables support for a broad range of unmodified guest OSes
  Maintains high levels of performance
Virtualization in Intel processor architectures:
  CPU virtualization (first generation): Intel VT-x, Intel VT-i
  I/O virtualization (second generation): Intel VT-d
  Connectivity virtualization (third generation): Intel VT-c
Evolution of Intel Virtualization Technology
Features of Intel VT (First Generation)
Focuses on CPU/ISA virtualization
Provides hardware assists to the virtualization software (VMM), which:
  reduces VMM size and complexity
  enables lower-cost, more efficient, and more powerful virtualization solutions
CPU Virtualization with VT-x
CPU Virtualization with VT-x (IA-32 Architecture)
New CPU operating modes: VMX root operation (for the VMM) and VMX non-root operation (for guests); eliminates ring deprivileging
New transitions: VM entry (to the guest OS) and VM exit (to the VMM)
VM Control Structure (VMCS):
  Configured by VMM software
  Specifies guest OS state
  Controls when VM exits occur (eliminating over- and under-exiting)
  Supports on-die caching of CPU state
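The sketch below is a toy model of how the VMCS, rather than the VMM, decides when a VM exit occurs. The class and field names are hypothetical. It covers only two VM-execution controls, external-interrupt exiting and HLT exiting, plus CPUID, which exits unconditionally in VMX non-root operation. Every other event is assumed to run directly in the guest. Real VMCS fields are accessed with the VMREAD and VMWRITE instructions, which this sketch does not model.

```python
"""Toy model of how VMCS controls decide when a VM exit occurs (not real VMX)."""
from dataclasses import dataclass, field

# In VMX non-root operation, CPUID always causes a VM exit.
UNCONDITIONAL_EXITS = {"CPUID"}


@dataclass
class ToyVMCS:
    """Hypothetical subset of a VMCS: guest state plus two exit controls."""
    guest_state: dict = field(default_factory=dict)
    external_interrupt_exiting: bool = True  # pin-based VM-execution control
    hlt_exiting: bool = False                # processor-based VM-execution control


def causes_vm_exit(vmcs: ToyVMCS, event: str) -> bool:
    """Decide, from the VMCS alone, whether a guest event transfers control to the VMM."""
    if event in UNCONDITIONAL_EXITS:
        return True
    if event == "EXTERNAL_INTERRUPT":
        return vmcs.external_interrupt_exiting
    if event == "HLT":
        return vmcs.hlt_exiting
    return False  # sketch assumption: all other events run directly in the guest


def run_guest(vmcs: ToyVMCS, events: list) -> tuple:
    """Model a VM entry: execute guest events in non-root operation until one
    causes a VM exit. Returns (exit reason or None, events completed before it)."""
    for completed, event in enumerate(events):
        if causes_vm_exit(vmcs, event):
            return event, completed
    return None, len(events)


vmcs = ToyVMCS(hlt_exiting=True)
print(run_guest(vmcs, ["ADD", "MOV", "HLT", "ADD"]))  # ('HLT', 2)
```

If hlt_exiting is cleared, the same guest can idle without involving the VMM. This is the kind of per-event control that lets the VMCS avoid over- and under-exiting.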
Latency Reductions by CPU Virtualization in VT-x
VMX transition and instruction latency improvements are dramatic.
CPU Virtualization with VT-i
CPU Virtualization with VT-i (Itanium Architecture)
Extensions to the Itanium processor hardware:
  A processor status bit, PSR.vm
  IVT vectors
Extensions to the processor abstraction layer (PAL) firmware:
  A set of new procedures
  PAL services for high-frequency VMM operations
  A virtual processor descriptor (VPD) table, containing the virtualization-acceleration field and the virtualization-disable field
Solving Virtualization Challenges with VT-x and VT-i
Solving Virtualization Challenges with VT-x and VT-i
1. Address-space compression
  Intel VT-x: Every transition between guest software and the VMM can change the linear-address space. VMX transitions are managed by the VMCS, which resides in the physical-address space, not the linear-address space.
  Intel VT-i: The VMM has a virtual-address bit that guest software cannot use. The VMM conceals hardware support for this bit by intercepting guest calls to the PAL procedure that reports the implemented virtual-address bits. This gives the VMM exclusive use of half of the virtual-address space.
2. Ring aliasing and ring compression
  Intel VT-x and VT-i: Both allow a VMM to run guest software at its intended privilege level. As a result, instructions such as PUSH (of CS) and br.call cannot reveal that software is running in a virtual machine. The ring compression problems that arise when a guest OS executes at the same privilege level as guest applications are also eliminated.
Solving Virtualization Challenges with VT-x and VT-i (continued)
3. Non-faulting access to privileged state
  Intel VT-x: The VMCS, not the VMM, controls the disposition of interrupts and exceptions. The guest OS can access the GDTR, IDTR, LDTR, and TR registers (which reference the GDT, IDT, LDT, and TSS).
  Intel VT-i: The thash instruction causes virtualization faults, allowing the VMM to conceal any modifications it makes to the VHPT base address.
4. Guest transitions
  Intel VT-x: The guest OS can run at privilege level 0 and can use SYSENTER and SYSEXIT.
  Intel VT-i: The virtualization-acceleration field in the VPD lets the VMM give guest software read/write access to interruption-control registers, so the VMM is not involved in these transitions.
Solving Virtualization Challenges with VT-x and VT-i (continued)
5. Interrupt virtualization
  Intel VT-x: The external-interrupt exiting VM-execution control, when set to 1, lets the VMM control external interrupts. It does so without intercepting every guest attempt to modify the interrupt flag. The interrupt-window exiting VM-execution control, when set to 1, causes a VM exit whenever guest software is ready to receive interrupts. This helps when the VMM has a virtual interrupt to deliver to a guest.
  Intel VT-i: A virtualization-acceleration field prevents guest software from affecting interrupt masking while avoiding frequent transitions to the VMM. A PAL service lets the VMM register that it has a virtual interrupt pending. The PAL service then transfers control to the VMM via the new virtual external interrupt vector.
6. Access to hidden state
  Intel VT-x: Hidden components of CPU state are kept in the guest-state area of the VMCS. These fields are loaded on VM entry and saved on VM exit, which preserves CPU state across transitions.
  Intel VT-i: An argument value passed to a PAL service sets the RSE.CFLE bit to the desired value.
7. Frequent access to privileged resources
  Intel VT-x: The TPR shadow and TPR threshold fields in the VMCS invoke the VMM only when required.
  Intel VT-i: The virtualization-acceleration field in the VPD indicates that the VMM can be bypassed, so guest software can read interrupt-control registers directly.
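A simplified model can illustrate the VT-x entries for interrupt virtualization (row 5) and frequent access to privileged resources (row 7). It makes several assumptions. Guests are 64-bit, and their task priority is the 4-bit class written with MOV to CR8. Details such as interrupt blocking after STI or MOV SS are ignored, and all names are hypothetical. The model shows three behaviours. A TPR write updates a shadow copy and reaches the VMM only when the new priority drops below the threshold the VMM programmed. TPR reads never reach the VMM. A pending virtual interrupt can be injected at the first point the guest has interrupts enabled.

```python
"""Simplified model of TPR shadowing and interrupt-window exiting (VT-x)."""
from dataclasses import dataclass


@dataclass
class TprControls:
    """Hypothetical subset of VMCS state used for TPR virtualization."""
    tpr_shadow: int = 0     # guest-visible task priority class (0-15)
    tpr_threshold: int = 0  # programmed by the VMM (0-15)


def guest_write_tpr(ctl: TprControls, priority: int) -> bool:
    """Guest writes its task priority (e.g. MOV to CR8). The shadow is
    updated without the VMM; returns True if the write causes a VM exit."""
    ctl.tpr_shadow = priority & 0xF
    return ctl.tpr_shadow < ctl.tpr_threshold


def guest_read_tpr(ctl: TprControls) -> int:
    """Reads are satisfied from the shadow and never reach the VMM."""
    return ctl.tpr_shadow


def interrupt_window_exit(window_exiting: bool, guest_if: bool) -> bool:
    """With interrupt-window exiting set, a VM exit occurs once the guest
    has interrupts enabled (IF = 1)."""
    return window_exiting and guest_if


def first_delivery_point(if_trace: list):
    """Step at which a pending virtual interrupt can be injected, given the
    guest's IF value at each step; None if the guest never enables interrupts."""
    for step, guest_if in enumerate(if_trace):
        if interrupt_window_exit(True, guest_if):
            return step
    return None


# Illustration: the VMM holds a virtual interrupt of priority class 5, so it
# sets the threshold to 5 and is notified only when the guest's TPR drops below it.
ctl = TprControls(tpr_shadow=8, tpr_threshold=5)
print([guest_write_tpr(ctl, p) for p in (9, 6, 4)])        # [False, False, True]
print(first_delivery_point([False, False, True, False]))  # 2
```

Only one of the three TPR writes reaches the VMM. This is the saving that row 7 attributes to the TPR shadow and threshold fields.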
Enhancements to Intel VT
Enhancements to Intel VT
Virtual-Processor Identifiers (VPIDs)
  A unique non-zero ID for each virtual processor
  VPIDs tag translations in the TLBs
  Avoids TLB flushes on each VM entry and VM exit
Extended Page Tables (EPT)
  Hardware MMU virtualization instead of software shadow paging
  Reduces page-table translation overhead
Intel VT Virtual Processor IDs
Intel VT Virtual Processor IDs: Motivation
The first generation of Intel VT forces a flush of the Translation Lookaside Buffer (TLB) on each VMX transition. This causes a performance loss on all VM exits and on most VM entries.
Most of the time, the VMM has not modified the guest page tables and does not require a TLB flush. Exceptions include emulating MOV CR3, MOV CR4, and INVLPG.
Better VMM software control of TLB flushes is therefore beneficial.
Intel VT Virtual Processor IDs: Details
VPIDs are activated when the new enable-VPID control bit is set in the VMCS.
A new 16-bit virtual-processor-identifier (VPID) field in the VMCS holds the VPID.
The VMM allocates a unique value for each guest virtual processor. The VMM itself uses VPID 0x0000, so no guest can have this VPID.
Cached linear translations are tagged with the VPID value.
When VPIDs are active, the TLBs are not flushed on VM entry or VM exit.
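The effect of VPID tagging can be shown with a toy TLB. This Python sketch is an illustration, not Intel's hardware design, and the class names are hypothetical. The TLB is a dictionary keyed by (VPID, linear page number). flush_vpid stands in for the selective invalidation a VMM would perform, for example when emulating MOV CR3 for one guest. On real hardware, the INVVPID instruction provides this kind of targeted invalidation.

```python
"""Toy model of a VPID-tagged TLB (illustrative; not Intel's implementation)."""

VMM_VPID = 0x0000   # reserved for the VMM; never given to a guest
MAX_VPID = 0xFFFF   # VPIDs are 16-bit values


class VpidAllocator:
    """Hands out unique non-zero VPIDs, one per guest virtual processor."""

    def __init__(self):
        self.next_vpid = 1
        self.free = []

    def allocate(self) -> int:
        if self.free:
            return self.free.pop()
        if self.next_vpid > MAX_VPID:
            raise RuntimeError("all 16-bit VPIDs are in use")
        vpid = self.next_vpid
        self.next_vpid += 1
        return vpid

    def release(self, vpid: int) -> None:
        self.free.append(vpid)


class TaggedTLB:
    """Cached linear translations keyed by (VPID, linear page number)."""

    def __init__(self):
        self.entries = {}
        self.full_flushes = 0

    def fill(self, vpid: int, linear_page: int, physical_page: int) -> None:
        self.entries[(vpid, linear_page)] = physical_page

    def lookup(self, vpid: int, linear_page: int):
        return self.entries.get((vpid, linear_page))  # None means a TLB miss

    def flush_all(self) -> None:
        self.entries.clear()
        self.full_flushes += 1

    def flush_vpid(self, vpid: int) -> None:
        """Drop only one virtual processor's translations."""
        self.entries = {k: v for k, v in self.entries.items() if k[0] != vpid}


def vmx_transition(tlb: TaggedTLB, vpid_enabled: bool) -> None:
    """First-generation VT flushes on every VM entry/exit; with VPIDs it need not."""
    if not vpid_enabled:
        tlb.flush_all()


alloc = VpidAllocator()
guest_a, guest_b = alloc.allocate(), alloc.allocate()
tlb = TaggedTLB()
tlb.fill(guest_a, 0x400, 0x1234)
tlb.fill(guest_b, 0x400, 0x5678)  # same linear page, different guest
vmx_transition(tlb, vpid_enabled=True)
print(tlb.lookup(guest_a, 0x400) == 0x1234, tlb.full_flushes)  # True 0
```

Two guests can cache the same linear page side by side because the tag keeps their translations apart. This is why the transition itself no longer needs to flush.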
Intel VT Extended Page Tables
Intel VT Extended Page Tables: Motivation
The VMM needs to retain control of the physical-address space. With Intel 64, paging is the main mechanism for protecting that space.
Intel VT provides hooks for page-table virtualization, but page-table virtualization in software is a major source of overhead.
Extended Page Tables (EPT):
  A new CPU mechanism for remapping guest-physical memory references: maps guest-physical to host-physical addresses
  Allows the guest to retain control of legacy Intel 64 paging
  Reduces the frequency of VM exits to the VMM
  Uses a new hardware page-table walker (hardware MMU vs. software MMU)
Benefits:
  The guest OS can modify its own page tables freely, eliminating the associated VM exits
  Memory savings: shadow page tables are not required with EPT, and a single EPT supports an entire VM
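Two consequences of EPT can be worked through in a short sketch. It rests on stated assumptions: 4 KiB pages, a single-level toy page table in place of the real multi-level structures, and a worst-case count that ignores paging-structure caches. translate() shows the two-stage mapping. The guest's own table gives a guest-physical page, and the VMM's EPT then gives the host-physical page. nested_walk_references() counts memory references on a TLB miss. Every guest table pointer is itself a guest-physical address, so each one must also be walked through the EPT.

```python
"""Sketch of EPT two-stage translation (toy model, 4 KiB pages assumed)."""

PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def translate(gva: int, guest_pt: dict, ept: dict) -> int:
    """Guest-virtual -> guest-physical (guest's table) -> host-physical (EPT)."""
    gpa_page = guest_pt[gva >> PAGE_SHIFT]  # maintained freely by the guest OS
    hpa_page = ept[gpa_page]                 # maintained by the VMM
    return (hpa_page << PAGE_SHIFT) | (gva & OFFSET_MASK)


def nested_walk_references(guest_levels: int, ept_levels: int) -> int:
    """Worst-case memory references for one TLB miss without paging-structure
    caches. Each guest level: walk its table's guest-physical address through
    the EPT (ept_levels reads), then read the guest entry (1 read). Finally the
    data's guest-physical address is walked through the EPT once more."""
    return guest_levels * (ept_levels + 1) + ept_levels


guest_pt = {0x10: 0x2}   # hypothetical guest mapping: GVA page 0x10 -> GPA page 0x2
ept = {0x2: 0x80}        # hypothetical EPT mapping:  GPA page 0x2  -> HPA page 0x80
print(hex(translate(0x10ABC, guest_pt, ept)))  # 0x80abc
print(nested_walk_references(4, 4))            # 24 (vs. 4 for a native 4-level walk)
```

The count illustrates a trade-off. EPT removes the VM exits that software page-table virtualization needs when the guest edits its page tables. However, a single TLB miss can cost considerably more memory references than a native walk.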
Intel VT Extended Page Tables: Overview
Intel VT Extended Page Tables: Overview
Figure: software MMU with shadow tables (without EPT) vs. hardware MMU with no shadow tables (with EPT)
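The contrast between a software MMU with shadow tables and a hardware MMU with EPT can be sketched in code. The model is hypothetical and simplified: page tables are dictionaries of page numbers. The shadow-paging VMM is assumed to write-protect guest page tables so that every guest page-table write traps to it. Real VMMs also fill shadow entries lazily on page faults. The software MMU keeps a shadow table equal to the composition of the guest's table and the VMM's guest-physical-to-host-physical map, and it pays VM exits to keep that table current. With EPT, the guest edits its table freely and the hardware walker performs the composition.

```python
"""Software MMU (shadow tables) vs. hardware MMU (EPT): a toy comparison."""


class ShadowPagingVM:
    """The VMM keeps a shadow table (guest-virtual page -> host-physical page)
    that the hardware actually uses, so it must see every guest PTE write."""

    def __init__(self, p2m: dict):
        self.p2m = p2m        # guest-physical page -> host-physical page
        self.guest_pt = {}    # guest's own table: guest-virtual -> guest-physical
        self.shadow = {}      # composed table walked by the hardware
        self.vm_exits = 0

    def guest_writes_pte(self, gva_page: int, gpa_page: int) -> None:
        self.vm_exits += 1    # write to a write-protected guest table traps
        self.guest_pt[gva_page] = gpa_page
        self.shadow[gva_page] = self.p2m[gpa_page]

    def hardware_walk(self, gva_page: int) -> int:
        return self.shadow[gva_page]


class EptVM:
    """The guest owns its page table outright; the processor composes it with
    the EPT (guest-physical page -> host-physical page) on each TLB miss."""

    def __init__(self, ept: dict):
        self.ept = ept
        self.guest_pt = {}
        self.vm_exits = 0

    def guest_writes_pte(self, gva_page: int, gpa_page: int) -> None:
        self.guest_pt[gva_page] = gpa_page   # no VMM involvement

    def hardware_walk(self, gva_page: int) -> int:
        return self.ept[self.guest_pt[gva_page]]


p2m = {1: 100, 2: 200}
shadow_vm, ept_vm = ShadowPagingVM(dict(p2m)), EptVM(dict(p2m))
for vm in (shadow_vm, ept_vm):
    vm.guest_writes_pte(0x10, 1)
    vm.guest_writes_pte(0x11, 2)
print(shadow_vm.hardware_walk(0x11), ept_vm.hardware_walk(0x11))  # 200 200
print(shadow_vm.vm_exits, ept_vm.vm_exits)                        # 2 0
```

Both designs resolve to the same host-physical page. Only the shadow version pays a VM exit per page-table write and keeps an extra composed table.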
Performance with Intel VT EPT
Performance with Intel VT EPT
Kernel microbenchmarks: a suite of benchmarks that stress different subsystems of the operating system.
Performance with Intel VT EPT
Apache compile: this workload compiles and builds the Apache web server.
Performance with Intel VT EPT
SPECjbb2005: an industry-standard server-side Java benchmark. It has little MMU activity but high TLB-miss activity due to Java's use of the heap and the associated garbage collection.
Performance with Intel VT EPT
Oracle Server Swingbench: Swingbench is a database workload for evaluating Oracle database performance.
Performance with Intel VT EPT
SQL Server Database Hammer: Database Hammer is a database workload for evaluating Microsoft SQL Server database performance.
Performance with Intel VT EPT
Citrix XenApp: a presentation server (application session provider) that enables clients to connect to and run their favourite personal desktop applications.
Intel VT Supporting Hypervisors
Intel VT Supporting Hypervisors
As of 2010
Intel VT Extensions
Intel VT Extensions
Intel VT-d: supports directed I/O virtualization
Intel VT-c: optimizes virtualized networking throughput
As of 2010
References
R. Uhlig, G. Neiger, D. Rodgers, A. L. Santoni, F. C. M. Martins, A. V. Anderson, S. M. Bennett, A. Kagi, F. H. Leung, and L. Smith, "Intel virtualization technology," Computer, vol. 38, no. 5, pp. 48-56, May 2005.
"Performance Evaluation of Intel EPT Hardware Assist," http://www.vmware.com
"Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization," http://www.intel.com
Liu Yuhang, Hao Qinfen, Xiao Limin, and Zhu Mingfa, "Design of ISA for efficient virtualization," in Proc. 4th IEEE Conference on Industrial Electronics and Applications (ICIEA 2009), pp. 3167-3172, 25-27 May 2009.
Intel Architecture Glossary
The IA-32 and Itanium architectures each include specific instructions, registers, and tables, some of which are listed below.
IA-32 terms
CPUID: CPU identification instruction
CR: control registers: CR0, CR3 (page-table base address, which controls translation from linear to physical addresses), CR4, and CR8 (current task priority)
CS: segment register for the current code segment; in some modes, its low 2 bits are the current privilege level
DR: debug register
EFLAGS: 32-bit version of the flags register; contains arithmetic flags as well as the interrupt flag (IF), used to mask interrupts
GDT: global descriptor table; contains descriptors that can be loaded into segment registers, LDTR, and TR
GDTR, IDTR, LDTR, TR: registers that reference the GDT, IDT, LDT, and TSS
HLT: halt instruction
IDT: interrupt descriptor table; controls the delivery of exceptions and interrupts to their software handlers
IF: bit in the EFLAGS register that controls interrupt masking
INVLPG: invalidate TLB entry instruction
LDT: local descriptor table; contains descriptors that can be loaded into segment registers
LGDT, LIDT, LLDT, LTR: instructions that write to GDTR, IDTR, LDTR, and TR
MOV: move instruction; different versions allow read and write access to the control registers and debug registers
MWAIT: monitor wait instruction
PUSH: push instruction; pushes its operand on the stack
RDMSR, WRMSR: instructions to read from and write to model-specific registers
RDPMC: read performance-monitoring counters instruction
RDTSC: read time-stamp counter instruction
segment registers: registers that control translation from logical to linear addresses
SGDT, SIDT, SLDT, STR: instructions that read from GDTR, IDTR, LDTR, and TR
SYSENTER, SYSEXIT: fast system call and fast return from fast system call instructions
TSS: task-state segment; among other things, the current TSS controls the ability of software to access I/O ports
Itanium terms
br.call: branch instruction used to effect a conditional procedure call
i: bit in the PSR that controls interrupt masking
IVT: interrupt vector table; controls delivery of exceptions and interrupts to their software handlers
mov: move instruction; different versions allow read and write access to the control registers (including PTA)
PFS: previous function state register
ppl: previous privilege level field in the PFS register
PAL: processor abstraction layer; provides a consistent firmware interface to processor implementation-specific features
PSR: processor status register
PTA: page table address register
rfi: return from interruption instruction
thash: translation hashed entry address instruction
VHPT: virtual hash page table; controls translation from virtual to physical addresses
Questions?
Thank you!