Virtualization in Linux KVM + QEMU




CS695 Topics in Virtualization and Cloud Computing
KVM + QEMU
Senthil, Puru, Prateek and Shashank

Topics covered
- KVM and QEMU architecture
  - VTx support
  - CPU virtualization in KVM
  - Memory virtualization techniques: shadow page table, EPT/NPT page table
  - IO virtualization in QEMU
- KVM and QEMU usage
  - Virtual disk creation
  - Creating virtual machines
  - Copy-on-write disks

KVM + QEMU - Architecture

KVM + QEMU Architecture
- Need for hardware support
  - Running the guest in less privileged rings (rings > 0) is not sufficient, because sensitive but unprivileged guest instructions do not trap
  - The alternatives are:
    - Binary instrumentation/patching
    - Paravirtualization
    - Hardware support: VTx and AMD-V
- 4 different address spaces: host physical, host virtual, guest physical and guest virtual

X86 VTx support
[Figure: guest rings 3 to 0 next to host rings 3 to 0, showing guest user, guest kernel, QEMU and KVM]
Communication channels
- KVM <-> guest (ring 0): VMCS + vmx instructions
- QEMU <-> KVM: /dev/kvm
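As a small illustration of the /dev/kvm channel, the sketch below opens the device node and asks the kernel module for its API version. The helper name kvm_api_version is hypothetical; the request number is KVM_GET_API_VERSION as defined in <linux/kvm.h>, and the code assumes a Linux host. It returns None when KVM is missing or not accessible.

```python
import fcntl
import os

# ioctl request number for KVM_GET_API_VERSION: _IO(0xAE, 0x00) in <linux/kvm.h>.
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(path="/dev/kvm"):
    """Return the KVM API version, or None if KVM is not usable on this host."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None  # device node missing, module not loaded, or no permission
    try:
        # For this request the ioctl return value is the version number itself.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    except OSError:
        return None
    finally:
        os.close(fd)


print(kvm_api_version())
```

QEMU uses the same device node, with further ioctls, to create VMs and VCPUs; those calls are not sketched here.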

X86 VMX Instructions
- Control transitions between VMX root and VMX non-root mode
  - VMX root -> VMX non-root: VM entry
  - VMX non-root -> VMX root: VM exit
- Example instructions
  - VMXON: enables VMX operation
  - VMXOFF: disables VMX operation
  - VMLAUNCH: VM entry
  - VMRESUME: VM entry
  - VMREAD: read from VMCS
  - VMWRITE: write to VMCS

X86 VMCS Structure
- Controls CPU behavior in VTx non-root mode
- 4KB structure configured by KVM
- Also provides space to save and restore guest and host registers
- Example fields
  - HLT exiting: if 1, VM exit on HLT
  - CR3-load exiting: if 1, VM exit on CR3 load
  - Exception bitmap: if bit i is set, VM exit on exception i
  - VM-entry interrupt: used to deliver interrupts during VM entry
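The effect of these control fields can be sketched with a toy model. ToyVmcs and causes_vm_exit are hypothetical names, and the model is deliberately simplified: the real VMCS is accessed through VMREAD/VMWRITE and has many more controls.

```python
from dataclasses import dataclass


@dataclass
class ToyVmcs:
    """Hypothetical, heavily simplified stand-in for a few VMCS control fields."""
    hlt_exiting: bool = False
    cr3_load_exiting: bool = False
    exception_bitmap: int = 0   # bit i set -> exception vector i causes a VM exit


def causes_vm_exit(vmcs, event, vector=None):
    """Return True if the guest event leaves VMX non-root mode in this toy model."""
    if event == "hlt":
        return vmcs.hlt_exiting
    if event == "cr3_load":
        return vmcs.cr3_load_exiting
    if event == "exception":
        return bool((vmcs.exception_bitmap >> vector) & 1)
    raise ValueError(f"unknown event: {event}")


PAGE_FAULT_VECTOR = 14  # x86 page-fault exception (#PF)

vmcs = ToyVmcs(hlt_exiting=True, exception_bitmap=1 << PAGE_FAULT_VECTOR)
print(causes_vm_exit(vmcs, "hlt"))
print(causes_vm_exit(vmcs, "exception", PAGE_FAULT_VECTOR))
print(causes_vm_exit(vmcs, "exception", 3))
```

With this configuration HLT and page faults exit to the hypervisor, while a breakpoint exception (vector 3) is handled entirely inside the guest.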

CPU Virtualization in KVM
[Figure: each guest VCPU is a thread of the VM's QEMU process (VM 1 thread 0, VM 1 thread 1, ...); ordinary host process scheduling maps these threads onto the physical CPUs]

Shadow page table
- Problem in memory virtualization
  - There are 3 levels of indirection, but the MMU can translate only 1 level
  - GVA -> GPA -> HVA -> HPA must be achieved
- Solution 1: shadow page table
  - Contains GVA -> HPA; the MMU uses it instead of the guest page table
  - One shadow table for each guest page table
  - Built incrementally

Shadow page table building
[Figure: four address spaces: GVA (pages 1 to 4), GPA (pages 1 to 10), HVA of QEMU (pages 1 to 15) and HPA (frames A to S)]
- Guest wants to create a linear mapping for a process
- Guest does pure demand paging
- QEMU knows the GPA -> HVA mapping (guest memory is allocated with malloc())

Shadow page table building: steps
[Figure, repeated for each step: the GPA, HVA (QEMU) and HPA address spaces, the shadow page table (GVA -> HPA) and the guest process page table (GVA -> GPA, read-only)]
Step 1: Guest tries to map GVA 1 -> GPA 1
- The write to the read-only guest page table raises a page fault, which causes a VM exit
- KVM learns that the GPA is 1 by emulating the instruction or from register contents
Step 2: GPA 1 -> HVA 1 is obtained
- This is possible because the GPA -> HVA mapping is known to QEMU/KVM
Step 3: KVM does a lookup in QEMU's page table to find HVA -> HPA
- KVM finds HVA 1 -> HPA C
Step 4: KVM updates the shadow page table with GVA 1 -> HPA C
- KVM also updates the guest page table by emulating the instruction that tried to map GVA 1 -> GPA 1
- GVA -> GPA -> HVA -> HPA is done
Step 5: Similarly, other entries are updated as and when page faults happen
- In the figure, two further GVAs are resolved through the same GVA -> GPA -> HVA -> HPA chain, ending at HPA B and HPA E
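The incremental construction can be sketched with dictionaries standing in for the page tables. This is a toy model, not KVM's implementation: the function name on_guest_pt_write is hypothetical, and the page numbers and frame letters other than GVA 1 -> GPA 1 -> HVA 1 -> HPA C are illustrative.

```python
# Mappings the hypervisor side already knows (illustrative values).
GPA_TO_HVA = {1: 1, 2: 2, 3: 3}          # guest RAM is memory allocated by QEMU
HVA_TO_HPA = {1: "C", 2: "B", 3: "E"}    # QEMU's own host page table

guest_page_table = {}    # GVA -> GPA, write-protected from the guest's point of view
shadow_page_table = {}   # GVA -> HPA, the table the MMU actually walks


def on_guest_pt_write(gva, gpa):
    """Handle the VM exit caused by a guest write to its read-only page table."""
    hva = GPA_TO_HVA[gpa]            # GPA -> HVA, known to QEMU/KVM
    hpa = HVA_TO_HPA[hva]            # HVA -> HPA, from QEMU's page table
    shadow_page_table[gva] = hpa     # install GVA -> HPA in the shadow table
    guest_page_table[gva] = gpa      # emulate the write the guest attempted
    return hpa


print(on_guest_pt_write(1, 1))
print(shadow_page_table, guest_page_table)
```

Every later write to the guest page table goes through the same handler, which is how the two tables stay consistent.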

Shadow page table (additional info)
- Additional questions
  - How to identify the pages used in page tables in order to write-protect them?
  - How to remove write protection when a page is no longer used in any page table?
  - What happens when pure demand paging is not used, i.e. the guest builds the page table before loading it into CR3?
- Advantages
  - No guest OS change is required
  - Any OS can be a guest
  - No special hardware is required
- Disadvantages
  - A shadow version has to be kept for every page table used by the guest
  - The shadow page table must be kept consistent with both guest and host
  - Caching shadow page tables needs considerable memory

EPT/NPT Basics
- Solution 2: EPT/NPT hardware support
- An EPT/NPT-enabled MMU can translate two levels of indirection: first GVA -> GPA, then GPA -> HPA
- GVA -> GPA is maintained by the guest and GPA -> HPA is maintained by KVM
- KVM can do the GPA -> HVA translation because guest memory is allocated with malloc()
- The MMU walks the EPT table for every GPA

EPT/NPT Building
[Figure: four address spaces: GVA (pages 1 to 4), GPA (pages 1 to 10), HVA of QEMU (pages 1 to 15) and HPA (frames A to S)]
- The EPT solution consists of two tables
  - GPA -> HPA: EPT table
  - GVA -> GPA: guest process page table
- The MMU accesses these two tables to complete address translation
- The guest has full rights on its page table

EPT/NPT Building: steps
[Figure, repeated for each step: the GPA, HVA (QEMU) and HPA address spaces, the EPT/NPT page table (GPA -> HPA) and the guest process page table (GVA -> GPA)]
Step 1: Guest tries to access linear address 1
- The resulting page fault does not cause a VM exit, because the VMCS is configured not to exit on page faults
- The guest OS handles the fault itself and fills in GVA 1 -> GPA 1
Step 2: When the guest accesses linear address GVA 1 again, the hardware gets GPA 1 from the guest page table
- It then tries to find the corresponding HPA in the EPT table, which causes an EPT violation
Step 3: The EPT violation occurred because the corresponding HPA is not mapped
- KVM fills this entry using GPA 1 -> HVA 1 -> HPA C
Step 4: When the guest re-executes the faulting instruction, the MMU walks the two tables in a nested loop to find GVA -> HPA
- For every guest physical address the MMU encounters, an EPT walk is done to find GPA -> HPA
Step 5: Similarly, every EPT table entry is filled after an EPT violation for the corresponding GPA
- Since the EPT stores GPA -> HPA, the size of the EPT table is proportional to the guest RAM size
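A toy model of this lazy fill is sketched below, with dictionaries standing in for the two tables. The names guest_access and handle_ept_violation are hypothetical, and only the mapping GPA 1 -> HVA 1 -> HPA C comes from the walkthrough; the rest is illustrative.

```python
GPA_TO_HVA = {1: 1, 2: 2}          # known to KVM/QEMU (illustrative values)
HVA_TO_HPA = {1: "C", 2: "B"}      # QEMU's host page table

guest_page_table = {1: 1}   # GVA -> GPA, owned and freely written by the guest
ept = {}                    # GPA -> HPA, owned by KVM, initially empty
ept_violations = 0


def handle_ept_violation(gpa):
    """KVM's side: fill the missing GPA -> HPA entry via GPA -> HVA -> HPA."""
    global ept_violations
    ept_violations += 1
    ept[gpa] = HVA_TO_HPA[GPA_TO_HVA[gpa]]


def guest_access(gva):
    """Hardware's side: translate GVA -> GPA -> HPA, exiting to KVM if needed."""
    gpa = guest_page_table[gva]     # the guest table is used without any VM exit
    if gpa not in ept:
        handle_ept_violation(gpa)   # VM exit; the instruction is then re-executed
    return ept[gpa]


print(guest_access(1), ept_violations)
print(guest_access(1), ept_violations)
```

The second access finds the EPT entry already present, so it completes without involving KVM.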

EPT/NPT (additional info)
- Advantages
  - No guest OS change is required
  - Any OS can be a guest
  - No need to trap page faults or guest page-table updates
  - Size of the EPT table is proportional to guest memory size
- Disadvantages
  - A TLB miss causes considerable translation overhead
    - Example: with one-level page tables, a miss causes 3 page-table memory accesses
    - For an m-level EPT and an n-level guest page table, the EPT solution makes mn + m + n page references
  - Hardware support is required
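The mn + m + n count follows from how the nested walk proceeds. A short derivation, assuming no caching of intermediate translations: each of the n guest page-table levels is located by a guest physical address, so reaching it costs an m-level EPT walk plus one read of the guest entry, and the final guest physical address needs one more EPT walk.

```latex
\begin{align*}
\text{references}
  &= \underbrace{n\,(m + 1)}_{\text{per guest level: EPT walk + entry read}}
   + \underbrace{m}_{\text{EPT walk for the final GPA}} \\
  &= mn + m + n
\end{align*}
```

For m = n = 1 this gives 3 references, for m = n = 2 it gives 8, and for four-level tables on both sides it gives 24.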

EPT/NPT Scenario of TLB Miss
[Figure: a two-level EPT table (GPA -> HPA) rooted at HPA D, with second-level tables at HPA E and HPA F, and a two-level guest page table (GVA -> GPA) rooted at CR3 = GPA 6]
Resolving GVA 1:
- GPA 6 -> HPA A requires access to HPA D, HPA E = 2
- Read GVA 1 = GPA 4 from HPA A = 1
- GPA 4 -> HPA B requires access to HPA D, HPA E = 2
- Read the GPA for GVA 1 from HPA B = 1
- That GPA -> HPA C requires access to HPA D, HPA F = 2
- Total 8 accesses
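The access pattern of such a miss can be traced with a few lines of code. This is a counting sketch only (nested_walk_trace is a hypothetical helper and no real page tables are modeled), assuming nothing is cached between the individual walks.

```python
def nested_walk_trace(guest_levels, ept_levels):
    """Return (description, accesses) steps for one TLB miss in a toy 2-D page walk."""
    steps = []
    for level in range(1, guest_levels + 1):
        # The guest table at this level is named by a GPA, so it needs an EPT walk first.
        steps.append((f"EPT walk to locate guest table level {level}", ept_levels))
        steps.append((f"read guest page-table entry at level {level}", 1))
    # The GPA produced by the last guest level must itself be translated.
    steps.append(("EPT walk to translate the final GPA", ept_levels))
    return steps


steps = nested_walk_trace(guest_levels=2, ept_levels=2)
for description, cost in steps:
    print(cost, description)
print("total:", sum(cost for _, cost in steps))
```

For two levels on each side the per-step costs are 2, 1, 2, 1, 2, matching the breakdown of the scenario.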

QEMU IO device emulation
- Basic IO devices are emulated by QEMU
  - Examples: keyboard, mouse, display, hard drive and NIC
- Device accesses from the guest (both PIO and MMIO) are trapped by KVM
- KVM passes control to QEMU to handle the IO
- QEMU injects interrupts from devices through KVM
- To emulate DMA, QEMU uses threads to do the IO

QEMU IO device emulation: Example
Assume a disk drive with the following interface
- register x to specify the sector number
- register y to receive commands (1 read, 0 write)
- register z to read/write data
When the guest wants to read sector number 10
1. Guest does PIO 10 on register x
2. QEMU saves this information in the device state
3. Guest issues the read command using PIO 1 on register y
4. QEMU maps sector 10 onto the virtual disk file and reads the necessary content
5. QEMU issues an interrupt
6. Guest reads 512 bytes from register z using PIO
7. QEMU returns the data it read from the VD

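QEMU IO device emulation: Example
[Figure: sequence diagram between Guest, KVM and QEMU: PIO 10 => X (QEMU saves X=10 in disk state), PIO 1 => Y (QEMU reads sector 10 from the VD file), virtual interrupt to the guest, then the guest's interrupt handler reads the data with PIO reads from Z, served from the data read from the file]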
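The read path of this example can be sketched as a small device model. ToyDisk and its methods are hypothetical and do not correspond to QEMU's device API; the sketch assumes 512-byte sectors and a raw image in which a sector number maps directly to a file offset, and it leaves the write command unimplemented.

```python
import os
import tempfile

SECTOR_SIZE = 512  # bytes per sector


class ToyDisk:
    """Hypothetical device model exposing the three registers from the example."""

    def __init__(self, image_path):
        self.image_path = image_path
        self.sector = 0            # last value written to register x
        self.buffer = b""          # data the guest will read through register z
        self.irq_pending = False   # stands in for the interrupt injected through KVM

    def pio_write(self, register, value):
        """Called after a guest port write has been trapped and handed to the device."""
        if register == "x":
            self.sector = value                        # save sector in device state
        elif register == "y" and value == 1:           # 1 means "read"
            with open(self.image_path, "rb") as image:
                image.seek(self.sector * SECTOR_SIZE)  # raw image: sector -> offset
                self.buffer = image.read(SECTOR_SIZE)
            self.irq_pending = True                    # signal completion to the guest
        else:
            raise NotImplementedError("only the read path is sketched here")

    def pio_read(self, register, length):
        """The guest drains the sector data through register z."""
        if register != "z":
            raise ValueError("only register z is readable")
        data, self.buffer = self.buffer[:length], self.buffer[length:]
        self.irq_pending = False
        return data


# Build a small raw image in which sector 10 is filled with 0xAB bytes.
with tempfile.NamedTemporaryFile(delete=False) as image:
    image.write(bytes(10 * SECTOR_SIZE) + b"\xab" * SECTOR_SIZE)
    image_path = image.name

disk = ToyDisk(image_path)
disk.pio_write("x", 10)                 # PIO 10 => x
disk.pio_write("y", 1)                  # PIO 1 => y (read command)
data = disk.pio_read("z", SECTOR_SIZE)  # guest reads the sector from z
os.remove(image_path)
print(len(data))
```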

KVM + QEMU Usage
Prerequisites
- Linux distribution
- Install QEMU packages
    yum install qemu*        (Fedora or other rpm-based distributions)
    apt-get install qemu*    (Ubuntu or other deb-based distributions)
- Ensure hardware support
    grep vmx /proc/cpuinfo
- Download an ISO image from the IITB FTP server
Virtual disk (VD)
- A file on the host that acts as the disk drive of the virtual machine
- A VD can be a file or a raw partition
- Create a VD
    qemu-img create -f raw disk1.img 10G
  Creates a 10G disk in raw format (sectors are mapped directly to file offsets)
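The hardware-support check can also be done programmatically. The sketch below parses /proc/cpuinfo-style text and looks for the vmx flag (Intel VTx) as well as svm, the corresponding flag on AMD-V processors; hw_virt_flags is a hypothetical helper name and the sample text is made up for illustration.

```python
def hw_virt_flags(cpuinfo_text):
    """Return the subset of {'vmx', 'svm'} present in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


sample = "processor\t: 0\nflags\t\t: fpu vme de pse msr vmx ept\n"
print(hw_virt_flags(sample))
```

On a real host the text would come from reading /proc/cpuinfo; an empty result means the CPU lacks the extensions or they are hidden from the OS, for example by a firmware setting.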

KVM + QEMU Usage cont.
Creating the first VM
- To boot from the ISO
    qemu-kvm -m 1G -hda disk1.img -cdrom F10-i686-live.iso
  - -m sets the size of RAM, -smp the number of processors
  - -hda is the primary hard disk, -cdrom the CD-ROM for the guest
- After this command, the standard installation wizard runs in the guest. Easy!
- Once the guest is installed on disk1.img,
    qemu-kvm -m 1G disk1.img
  boots the guest from disk1.img directly
QEMU + KVM = host user space process
- Every virtual machine runs as a user space process on the host
- It can be monitored using standard Linux tools such as ps, top and kill
- One thread for every CPU in the guest (use the -smp option)

KVM + QEMU Usage cont.
Copy-On-Write VD
- COW disks provide versioning / snapshots at the disk level
- Any version can be chosen without losing consistency
- Equivalent to disk-level backups
- Supported only on cow, qcow and qcow2
- Create a COW disk
    qemu-img create -f qcow2 disk.cow 10G
- Install the VM
- Take a snapshot
    qemu-img snapshot -c s1 disk.cow
- Start the VM, create and delete a few files inside the VM, and shut down
- Take another snapshot, s2
- Now, to roll back to s1
    qemu-img snapshot -a s1 disk.cow
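The idea behind copy-on-write snapshots can be sketched with layered block maps. ToyCowDisk is a hypothetical in-memory model, not the qcow2 on-disk format: writes land only in the active layer, reads fall through to older layers, and a snapshot freezes the current layer instead of copying the disk.

```python
class ToyCowDisk:
    """Hypothetical layered block store illustrating copy-on-write snapshots."""

    def __init__(self):
        self.chain = ()       # frozen layers under the active one, oldest first
        self.active = {}      # writable layer holding only blocks written since the last snapshot
        self.snapshots = {}   # snapshot name -> chain of frozen layers

    def write(self, block, data):
        self.active[block] = data          # older layers are never modified

    def read(self, block):
        for layer in (self.active, *reversed(self.chain)):
            if block in layer:
                return layer[block]
        return None                        # block was never written

    def snapshot(self, name):
        self.chain = self.chain + (self.active,)   # freeze the current layer
        self.snapshots[name] = self.chain
        self.active = {}

    def rollback(self, name):
        self.chain = self.snapshots[name]  # discard everything written afterwards
        self.active = {}


disk = ToyCowDisk()
disk.write(0, "installed system")
disk.snapshot("s1")
disk.write(1, "new file")
disk.write(0, "modified system")
disk.snapshot("s2")
disk.rollback("s1")
print(disk.read(0), disk.read(1))
```

After the rollback the disk shows exactly the state captured by s1, while s2 remains available for a later rollback.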

References
1. Intel 64 and IA-32 Architectures Software Developer's Manual
2. http://www.linux-kvm.org/page/documents
3. http://wiki.qemu.org/manual
4. Accelerating Two-Dimensional Page Walks for Virtualized Systems