Virtual Machines: Architectures, Implementations and Applications

Similar documents
Hybrid Virtualization The Next Generation of XenLinux

Virtualization. Jia Rao Assistant Professor in CS

Hardware Assisted Virtualization Intel Virtualization Technology

CS5460: Operating Systems. Lecture: Virtualization 2. Anton Burtsev March, 2013

Virtualization in Linux KVM + QEMU

Intel Virtualization Technology and Extensions

Intel Virtualization Technology Overview Yu Ke

Nested Virtualization

Virtual Machines. COMP 3361: Operating Systems I Winter

Virtualization. ! Physical Hardware. ! Software. ! Isolation. ! Software Abstraction. ! Encapsulation. ! Virtualization Layer. !

Hardware virtualization technology and its security

Virtualization. Clothing the Wolf in Wool. Wednesday, April 17, 13

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?

Intel Vanderpool Technology for IA-32 Processors (VT-x) Preliminary Specification

CS 695 Topics in Virtualization and Cloud Computing. More Introduction + Processor Virtualization

Intel Virtualization Technology Specification for the IA-32 Intel Architecture

Virtual machines and operating systems

Kernel Virtual Machine

Jukka Ylitalo Tik TKK, April 24, 2006

Full and Para Virtualization

Virtualization Technology. Zhiming Shen

COS 318: Operating Systems. Virtual Machine Monitors

Chapter 5 Cloud Resource Virtualization

The Microsoft Windows Hypervisor High Level Architecture

Virtualization VMware Inc. All rights reserved

Virtualization. Dr. Yingwu Zhu

Uses for Virtual Machines. Virtual Machines. There are several uses for virtual machines:

matasano Hardware Virtualization Rootkits Dino A. Dai Zovi

Windows Server Virtualization & The Windows Hypervisor

EE282 Lecture 11 Virtualization & Datacenter Introduction

System Virtual Machines

Intel Virtualization Technology Processor Virtualization Extensions and Intel Trusted execution Technology

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University

Platform Virtualization: Model, Challenges and Approaches

ARM Virtualization: CPU & MMU Issues

x86 Virtualization Hardware Support Pla$orm Virtualiza.on

Virtualization. Jukka K. Nurminen

Virtualization. Pradipta De

Virtualizing a computing system s physical

Basics in Energy Information (& Communication) Systems Virtualization / Virtual Machines

Virtualization. Types of Interfaces

Outline. Outline. Why virtualization? Why not virtualize? Today s data center. Cloud computing. Virtual resource pool

COS 318: Operating Systems. Virtual Machine Monitors

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014)

Intel Virtualization Technology (VT) in Converged Application Platforms

Virtualization. Explain how today s virtualization movement is actually a reinvention

evm Virtualization Platform for Windows

A Unified View of Virtual Machines

FRONT FLYLEAF PAGE. This page has been intentionally left blank

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

Xen and the Art of. Virtualization. Ian Pratt

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

WHITE PAPER. AMD-V Nested Paging. AMD-V Nested Paging. Issue Date: July, 2008 Revision: 1.0. Advanced Micro Devices, Inc.

A Hypervisor IPS based on Hardware assisted Virtualization Technology

COS 318: Operating Systems

Attacking Hypervisors via Firmware and Hardware

Performance monitoring with Intel Architecture

Distributed Systems. Virtualization. Paul Krzyzanowski

Intel Virtualization Technology

Introduction to Virtual Machines

Virtualization Technologies

Volume 10 Issue 03 Published, August 10, 2006 ISSN X DOI: /itj Virtualization Technology

OSes. Arvind Seshadri Mark Luk Ning Qu Adrian Perrig SOSP2007. CyLab of CMU. SecVisor: A Tiny Hypervisor to Provide

Brian Walters VMware Virtual Platform. Linux J. 1999, 63es, Article 6 (July 1999).

Security Overview of the Integrity Virtual Machines Architecture

Virtual Machines. Virtual Machine (VM) Examples of Virtual Systems. Types of Virtual Machine

Taming Hosted Hypervisors with (Mostly) Deprivileged Execution

Virtualization Technology. Zhonghong Ou Data Communications Software Lab, Aalto University

Windows Server Virtualization & The Windows Hypervisor. Background and Architecture Reference

WHITE PAPER Mainstreaming Server Virtualization: The Intel Approach

Security of Cloud Computing

Cloud Computing #6 - Virtualization

Clouds, Virtualization and Security or Look Out Below

BHyVe. BSD Hypervisor. Neel Natu Peter Grehan

Compromise-as-a-Service

Virtualization Concepts And Applications. Yash Jain DA-IICT (DCOM Research Group)

Enterprise-Class Virtualization with Open Source Technologies

Virtualization for Cloud Computing

Chapter 16: Virtual Machines. Operating System Concepts 9 th Edition

CS 61C: Great Ideas in Computer Architecture Virtual Memory Cont.

Intel 64 and IA-32 Architectures Software Developer s Manual

IOMMU: A Detailed view

VMware and CPU Virtualization Technology. Jack Lo Sr. Director, R&D

Intel Virtualization Technology FlexMigration Application Note

The Turtles Project: Design and Implementation of Nested Virtualization

Attacking Hypervisors via Firmware and Hardware

kvm: Kernel-based Virtual Machine for Linux

Hardware accelerated Virtualization in the ARM Cortex Processors

Hypervisors and Virtual Machines

Lecture 2 Cloud Computing & Virtualization. Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

asdc Introduction to Virtualization Technology Argentina Software Development Center Software and Solutions Group Gisela Giusti October 11, 2007

Knut Omang Ifi/Oracle 19 Oct, 2015

Architecture of the Kernel-based Virtual Machine (KVM)

AMD 64 Virtualization

<Insert Picture Here> Oracle Database Support for Server Virtualization Updated December 7, 2009

Bare-Metal Performance for x86 Virtualization

KVM: Kernel-based Virtualization Driver

Virtualization. P. A. Wilsey. The text highlighted in green in these slides contain external hyperlinks. 1 / 16

Basics of Virtualisation

Transcription:

Virtual Machines: Architectures, Implementations and Applications HOTCHIPS 17 Tutorial 1, Part 2 J. E. Smith University of Wisconsin-Madison Rich Uhlig Intel Corporation August 14, 2005

System Virtual Machines Rich Uhlig Intel Corporation August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 2

System Virtual Machines: Outline Applications and Usage Models Virtualization Methods and VMM Software Architecture Hardware Resource Virtualization General principles of CPU virtualization (with IA-32 / Intel VT* case study) General principles of memory virtualization (page-table shadowing case study) General principles of IO virtualization Wrap-up * Intel Virtualization Technology (VT) August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 3

System Virtual Machines (VMs) App App... App VM 0 App App... App VM 1 App App... App Operating System IDE NIC Physical Host Hardware... Device Drivers GFX A new layer of software... Guest OS 0... Guest OS 1 Processors Memory Graphics VMM Physical Host Hardware Network Storage Keyboard / Mouse Without VMs: Single OS owns all hardware resources With VMs: Multiple OSes share hardware resources A Virtual Machine Monitor (VMM) honors existing hardware interfaces to create virtual copies of a complete hardware system August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 4

System VMs: Applications and Usage Models

Basic System VM Capabilities Workload Isolation Workload Aggregation App 1 App 2 App 1 App 2 App 1 App 2 App 1 App 2 OS OS 1 OS 2 OS 1 OS 2 OS 1 OS 2 HW VMM HW 1 HW 2 VMM HW HW Workload Migration Workload Embedding App OS App OS App 1 OS 1 HW App 1 OS 1 App 2 OS 2 VMM VMM VMM VMM HW VMM HW 1 HW 2 HW 1 HW 2 HW August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 6

Traditional Server Applications DB Server OS 1 DP Server Legacy Server Installations Mail Server OS 2 UP Server Web Server OS 3 DP Server Server Consolidation DB Server Failure Isolation Mail Server OS 1 OS 2 VMM Web Server OS 3 4P / 8P / 16P Server Service Migration DB Server DB Server OS 4 OS 4 VMM DP Server Manageability, Reliability, Availability Server consolidation (Legacy OSes, One App per OS ) Staged deployment of OS upgrades, security patches, etc. Software failures confined to VM in which they occur Service migration in Virtual Data Center August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 7

Emerging Client Applications Untrusted VM Trusted VM User-visible Capability OS User-hidden IT Management Stack Apps Legacy OS Trusted Apps User Apps OS IT Apps Embedded OS Trusted VMM VMM Hardware Platform Hardware Platform Security / Trusted Computing VMs encapsulate untrusted legacy software Create new environment for trusted code Client Partitioning Extending server manageability features to the client (e.g., Embedded IT client) August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 8

Virtualization Methods and VMM Software Architecture

Anatomy of a Virtualized System VM 0 VM App App 1 App App OS 1... OS 2 Guest OSes Virtualized Hardware of VM VMM VM Monitor Physical HW Resources August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 10

Base VMM Requirements A VMM must be able to: Protect itself from guest software Isolate guest software stacks (OS + Apps) from one another Present a (virtual) platform interface to guest software To achieve this, VMM must control access to: CPUs, Memory and I/O Devices Ways that a VMM can share resources between VMs Time multiplexing Resource partitioning Mediating hardware interfaces August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 11

(1) Time Multiplexing VM 0 VM 1 VMM Processor VM is allowed direct access to resource for a period of time before being context switched to another VM (e.g., CPU resource) Devil is in the details (will examine via a case study in later foils) August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 12

(2) Resource Partitioning VM 0 VM 1 VMM Remap / Protection Mechanism Storage Memory Display VMM allocates ownership of phys resources to VMs Typically involves some remapping and protection mechanism Examples: physical memory, disk partitions, graphical display August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 13

(3) Mediating Hardware Interfaces VM 0 VM 1 VMM Network Keyboard / Mouse VMM retains direct ownership of physical resource VMM hosts device driver as well as a virtualized device interface Virtual interface can be same as or different than physical device August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 14

Putting it all Together... VM 0 VM 1 VM 2 VM 3 VMM Processor Storage Network Memory Keyboard / Mouse Display VMM applies all 3 sharing methods, as needed, to create illusion of platform ownership to each guest OS August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 15

Some VMM Architecture Options Hypervisor Architecture Hosted Architecture VM 0 VM 1 VM n User-level VMM VM n Guest OS and Apps Guest OS and Apps... Guest OS and Apps User Apps Device Models VM 0 Guest OS and Apps Hypervisor Host OS Device Models (Top) Device Drivers (Bottom) Device Drivers Ring-0 VMM Kernel Host HW Host HW Hypervisor architecture provides its own device drivers and services Hosted architecture leverages device drivers and services of a host OS August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 16

System Virtualization Case Studies Processor Virtualization

CPU Virtualization: General Principles VM 0 VM 1 VMM Processor To virtualize a CPU, a VMM must retain control over: Accesses to privileged state (control regs, debug regs, etc.) Exceptions (page faults, machine-check exceptions, etc.) Interrupts and interrupt masking Address translation (via page tables) CPU access to I/O (via I/O ports or MMIO) August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 18

CPU Control via Ring Deprivileging Ring Deprivileging Defined: Guest OS kernel runs in a less privileged ring than usual (i.e., above ring 0) VMM runs in the most privileged ring 0 Goal of ring deprivileging is to prevent guest OS from: Accessing privileged instructions / state Modifying VMM code and data August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 19

Case Study: IA-32 CPU Virtualization IA-32 Provides 4 Privilege Levels (Rings) Segment-based Protections Distinguish between all 4 rings Page-based Protections Separate only User and Supervisor modes User mode: Code running in ring 3 Supervisor mode: Code running in rings 0, 1, or 2 August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 20

Ring Deprivileging: Some Options Without Ring Deprivileging Applications Ring 3 Guest Apps Guest OS VMM Ring 3 Ring 1 Ring 0 The 0/1/3 Model OS Kernel Ring 0 With Ring Deprivileging Each option has certain pros / cons Will explore in the coming foils Guest Apps Guest OS VMM Ring 3 Ring 0 The 0/3 Model August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 21

Ring Compression For the case of the 0/3 Model: Guest OS and Apps run in the same ring (3) Lose ring protections between guest OS / Apps Two rings are compressed into one For the case of the 0/1/3 Model: No ring compression, but Can t use paging to protect VMM from guest OS VMM forced to use segment-based protections The following foils assume 0/1/3 Model August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 22

IA-32 Virtualization Holes Ring 3 Ring 1 Guest Apps Guest OS PUSH CS/SS CALL Expose that guest OS is running in ring 1 LAR LSL VERR VERW Non-trapping writes of privileged state POPF Guest Apps Guest OS SYSENTER CLI STI CPUID SGDT SIDT SLDT STR Non-trapping Reads of Privileged State Ring 0 VMM Incorporate current ring # in computation (issues if executed in ring 1) Unable to access hidden segment-register state on VM context switch Excessive Faulting August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 23

Addressing IA-32 Virtualization Holes Method 1: Paravirtualization Techniques Modify guest OS to work around virtualization holes Requires ability to modify guest-os source code Method 2: Binary Translation or Patching Modify guest OS binaries on-the-fly Source code not required, but introduces new complexities E.g., self-modifying code, translation caching, etc. Some excessive trapping remains (e.g., SYSENTER case) Method 3: Change Processor ISA Software-only Methods Re-architect instruction set to close virtualization holes by design Example: New VT-x features for IA-32 processors August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 24

Case Study: IA-32 Virtualization w/ VT-x VT-x is a new operating mode for IA-32 processors Part of Intel Virtualization Technology (VT) Will be launched in Intel desktop CPUs in second half of 2005 Operating mode enabled with VMXON / VMXOFF VT-x provides two new forms of operation: Root Operation: Fully privileged, intended for VMM Non-root Operation: Not fully privileged, intended for guest OS August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 25

Case Study: IA-32 Virtualization w/ VT-x Non-Root Operation Apps OS Apps OS Ring 3 Ring 0 Guest software runs at intended privilege level (no ring deprivileging) Standard Root Operation VMM IA-32 VMXON VMXOFF Standard IA-32 August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 26

VT-x Transitions: VM Entry and VM Exit VM Entry VMM-to-guest transition Initiated by new instructions: VMLAUNCH or VMRESUME Enters non-root operation, loading guest state Establishes key guest state in a single, atomic operation VM Exit Guest-to-VMM transition Caused by virtualization events Enters root operation Saves guest state Load VMM state Ring 3 Ring 0 VM Entry Virtual Machines (VMs) Apps OS VM Exit Apps OS Root Operation VMRESUME VMM August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 27

VT-x Config Flexibility with the VMCS VM Control Structure (VMCS) specifies CPU behavior Holds guest state loaded / stored on VM entry / exit Accessed through a VMREAD / VMWRITE interface Configuration of VMCS controls guest OS behavior VMM programs VMCS to cause VM exits on desired events VM exits possible on: Privileged State: CRn, DRn, MSRs Sensitive Ops: CPUID, HLT, etc. Paging events: #PF, INVLPG Interrupts and Exceptions Other optimizations: Bitmaps, shadow registers, etc. Ring 3 Ring 0 VMREAD VMWRITE VM Exit Apps OS VMCS VM Entry (VMM) August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 28

The VM Control Structure (VMCS) VM-execution controls Guest -state area Host -state are a VM-exit controls VM-entry controls Determines what operations cause VM exits Saved on VM exits Reloaded on VM entry Loaded on VM exits Determines which state to save, load, how to transition Determines which state to load, how to transition CR0, CR3, CR4, Exceptions, IO Ports, Interrupts, Pin Events, etc. EIP, ESP, EFLAGS, IDTR, Segment Regs, Exit info, etc. CR3, EIP set to monitor entry point, EFLAGS hardcoded, etc. Example: MSR save -load list Incl uding injecting events (interrupts, exceptions) on entry Each virtual CPU has a separate VMCS For MP guest OS: separate VMCS for each virtual CPU One VMCS per logical CPU is active at any given time VMPTRLD instruction used to switch from one VMCS to another August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 29

Example VM-exit Causes Sensitive Instructions CPUID Reports processor capabilities RDMSR, WRMSR Read and write Model-Specific Registers INVLPG Invalidate TLB Entry RDPMC, RDTSC Read Perf Mon or Time-Stamp Counters HLT, MWAIT, PAUSE Indicate Guest OS Inactivity VMCALL New Instruction for Explicit Call to VMM Accesses to Sensitive State MOV DRx Accesses to Debug Registers MOV CRx Accesses to Control Registers Task Switch Accesses to CR3 Exceptions and Asynchronous Events Page Faults, Debug Exceptions, Interrupts, NMIs, etc. August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 30

Some Example VM-Exit Optimizations VT-x provides various optimizations to minimize frequency of VM exits: Shadow Registers and Masks Reads from CR0 and CR4 are satisfied from shadow registers established by the VMM VM exits can be conditional based on the specific bits modified on a CR write (via a mask) Execution-Control Bitmaps VM exits can be selectively controlled via bitmaps (e.g., for exceptions, IO-port accesses) August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 31

Some Example VM-Exit Optimizations (2) Time-Stamp Counter (TSC) Offsets VMM can supply an offset that is applied to reads of the TSC during guest execution Eliminates VM exits on executions of RDTSC and reduces distortions of virtual time External-interrupt Exiting External interrupts cause VM exits Interrupts never masked; no need for VM exits on CLI, STI, etc. Optimized Interrupt Delivery VMM can pend a virtual interrupt to a guest OS VM exit occurs only when guest-os interrupt window is open Eliminates exits on most executions of CLI, STI, IRET, etc. August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 32

VM Entry: Event Injection Allows VMM to inject events on VM entry: External interrupts NMI Exceptions (e.g., page fault) Injection occurs after all guest state is loaded Performs all the normal IDT checks, etc. Removes burden from VMM of emulating IDT, fault checking, etc. August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 33

How VT-x Closes Virtualization Holes New execution control causes instruction to VM exit Ring 3 (Non-Root Operation) Guest Apps Report that guest OS is running at ring 0 (as expected) No longer need to trap (EFLAGS.IF does not control interrupt masking) Guest Apps SYSENTER CPUID No longer need to trap these because relevant registers are atomically context switched on VM entry/exit Ring 0 (Non-Root Operation) Guest OS PUSH CS/SS CALL LAR LSL VERR VERW POPF Guest OS CLI STI SGDT SIDT SLDT STR Root Operation VMM Instructions report correct values without requiring traps (no ring deprivileging) Clean context switching supported through VM entry / exit and VMPTRLD operations (no hidden state) Excessive Faulting Avoided: - SYSENTER functions correctly (no ring deprivileging) - CLI / STI behavior optimized for pending virtual interrupts August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 34

System Virtualization Case Studies Memory Virtualization

Mem Virtualization: General Principles VM 0 VM 1 CR3 PD PT CR3 PD PT Guest OS PT Guest OS PT VMM Memory Virtualization Host Hardware TLB Memory Guest OS expects to control address translation Allocates memory, page tables, manages TLB consistency, etc. But, VMM must have ultimate control over phys mem Must map guest-physical address space to host-physical space August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 36

A Case Study: IA-32 Address Translation TLB D R/W U/S CR3 PD PDE... PT PTE... F F Paging-related Control Registers CR0 PE, PG, WP VPN PFN Access Hardware sets A / D Bits PT PTE... F F CR4 CR2 PAE, PSE Faulting Address PFN D A U/S R/W P IA-32 defines a hierarchical page-table structure Defines linear-to-physical address translation After page-table walk, page-table Entries (PTEs) are cached in a hardware TLB IA-32 address translation configured via control registers (CR3, etc.) Invalidation of PTEs signaled by OS via INVLPG instruction August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 37

Virtualizing Page Tables: Some Options Option 1: Protect access to guest-os page tables (PTs) Use paging protections or binary translation to detect changes Upon write access, substitute remapped phys address in PTE Also need VM exit on page-table reads (to report original PTE value to guest OS) Option 2: Make a shadow copy of page tables Guest OS freely changes its page tables VM exit occurs whenever CR3 changes VMM copies contents of guest page tables to active page tables Copy operation is analogous to a TLB refill, hence: Virtual TLB Details of option 2 follow As illustration of the use of VT-x August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 38

Virtual TLB: Basic Idea VM CR3 Guest Page Table PD PT Guest OS PT VMM VTLB TLB CR3 PD Active Page Table PT PT VTLB = Processor TLB + Active Page Table VMM initializes an empty VTLB and starts guest execution When guest accesses memory, #PF occurs, and is sent to VMM VMM copies needed translation (VTLB refill) and resumes guest August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 39

Virtual TLB: VT-x Setup VMCS Set INVLPG exiting = 1 MOV CR3 and task switch always cause exits VM-execution Controls Exception bitmap CR0 guest / host mask CR4 guest / host mask CR0 read shadow CR4 read shadow Bitmap set to cause exits on #PF exceptions Guest / host masks for both CR0 and CR4 set to protect paging-related bits. Read shadows for CR0 and CR4 set to follow guest values (may differ from actual values). VTLB algorithm programs VMX to cause VM exits on: Any writes to CR3 and relevant writes to CR0 and CR4 Any page-fault (#PF) exceptions Any executions of INVLPG August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 40

Virtual TLB: Actions on CR3 Write Guest OS write to CR3 causes VM exit CR3 Guest Host CR3 Put new CR3 value into guest area of VMCS and resume guest with VMRESUME PD P 0 PDE 0 0 CR3 write implies a TLB flush and page-table change VMM notes new CR3 value (used later to walk guest PT) VMM allocates a new PD page, with all invalid entries VMM sets actual CPU CR3 register to point to the new PD page August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 41

Virtual TLB: Actions on a Page Fault Guest page fault causes a VM exit CR3 PD PDE P 0 Guest Host Page fault reflected back to guest using vector-on-entry with VMRESUME CR3 PD PDE P 0 VMM examines guest PT using faulting addr If relevant PTE or PDE is invalid (P=0), then the #PF must be reflected to the guest OS. VMM configures VMCS for a #PF vector-on-entry Then resumes guest execution with a VMRESUME August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 42

Virtual TLB: Actions on a Page Fault (2) User-level read access... Guest page fault causes a VM exit CR3 PD PDE P 1 PT G A D U/S R/W P PTE 0 0 0 1 1 1 F F Guest Host CR3 PD PT Remap P G A D U/S R/W P PDE 1 PTE 0 1 1 F If guest page table indicates sufficient access, then VMM allocates PT and copies guest PTE to the active PT PFN of active PTE remapped to new value as per VMM policy Other active PTE bits set as in guest PTE (e.g., P, G, U/S) August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 43

Virtual TLB: Actions on INVLPG INVLPG causes VM exit CR3 PD PDE P 1 PT G A D U/S R/W P PTE 0 0 0 0 0 0... F F Guest Invalidation of guest PT doesn t cause VM exit Host CR3 PD P PT G A D U/S R/W P... F PDE 1 PTE 0 10 0 10 0 10 F Guest OS permitted to freely modify its page tables Implies guest PTs and active PTs can become inconsistent This is okay! (same as inconsistencies between PTs and TLB) If guest reduces access, signals via INVLPG, causing a VM exit VMM invalidates corresponding PTE in the active PT August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 44

Virtual TLB: A few other details MP considerations (TLB shootdown) Each logical processor has its own VTLB (just as it has a TLB) TLB shootdown in software resolves down to cases shown previously (e.g., INVLPG) Other Details Accessed and Dirty Bits require special treatment (emulated through R/W and P page protections) Real-mode supported through an identity page table Other Optimizations Other VTLB refill policies possible (eager vs. lazy refill) with different trade-offs Possible to maintain multiple shadow page tables to reduce VTLB flush cost August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 45

System Virtualization Case Studies IO-Device Virtualization

IO Virtualization: General Principles Hypervisor Architecture Hosted Architecture VM 0 VM 1 VM n User-level VMM VM n Guest OS and Apps Guest OS and Apps... Guest OS and Apps User Apps Device Models VM 0 Guest OS and Apps Hypervisor Host OS Device Models (Top) Device Drivers (Bottom) Device Drivers Ring-0 VMM Kernel Virtual device model presents interface to guest operating system Physical device driver programs and responds to actual device hardware Virtual Device Interface and Model Physical Device Interface and Driver August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 47

Virtual and Physical Device Interfaces VM 0 VM 0 Guest OS and Apps Guest device driver programs virtual device interface: Device Configuration Accesses IO-port Accesses Memory-mapped Device Registers Guest OS and Apps Virtual device model proxies device activity back to guest OS: Copying (or translation) of DMA buffers Injection of virtual interrupts Virtual Device Interface and Model Virtual device model proxies accesses to physical device driver: Possible translation of commands Translation of DMA addresses Virtual Device Interface and Model Physical Device Interface and Driver Physical Device Interface and Driver Device driver programs actual physical IO device: Device configuration IO-port and MMIO accesses Physical Device Physical device responds to commands: DMA transactions to host physical memory Physical device interrupts August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 48

Case Study: IO Virtualization with VT-x VM 0 Guest OS and Apps Virtual Device Interface and Model Guest device driver programs virtual device interface: Device Configuration Accesses IO-port Accesses Memory-mapped Device Registers VMCS VM-execution Controls Various Paging Controls IO-port bitmap Bits set as shown previously to implement VTLB algorithm Bitmap set to cause exits on specific IO ports as needed VT-x provides and IO-port bitmap execution control Enables VMM to intercept any IO-port accesses for bus configuration or IO-device control VT-x provides paging controls to intercept MMIO VTLB-like algorithm can enforce VM exits on physical pages with memory-mapped IO (MMIO) registers August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 49

IO Virtualization with VT-x (cont.) VMCS VM-execution Controls Interrupt-window exiting VM-entry Controls Interrupt-information field Bit set to allow guest to run until it is ready to accept interrupts Used to inject a virtual interrupt when guest is ready VM 0 Guest OS and Apps Virtual Device Interface and Model Virtual device model proxies device activity back to guest OS: Copying (or translation) of DMA buffers Injection of virtual interrupts VT-x Interrupt-window exiting Guest OS may not be interruptible (e.g., critical section) Interrupt-window exiting allows guest OS to run until it has enabled interrupts (via EFLAGS.IF) VT-x Event Injection on VM entry Enables VMM to vector interrupt through guest IDT on VM entry August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 50

Summary and Wrap-up For more information on Intel Virtualization Technology (VT): http://www.intel.com/technology/computing/vptech/ Questions? August 2005 System Virtual Machines, HotChips 17 Tutorial, (c) 2005, Intel Corporation 51