Virtual Machine Monitors: Disco and Xen. Robert Grimm New York University


The Three Questions
What is the problem?
What is new or different?
What are the contributions and limitations?

Disco Overview

Background: ccNUMA
Cache-coherent non-uniform memory architecture
  Multiprocessor with a high-performance interconnect
  Non-uniform memory: global address space, but memory distributed amongst processing elements
  Cache coherence: how to ensure consistency between processor caches?
    Solutions: bus snooping, directories
Targeted system: FLASH, Stanford's own ccNUMA machine

The Challenge
Commodity OS's are not well suited for ccNUMA
  Do not scale: lock contention, memory architecture
  Do not isolate/contain faults: more processors mean more failures
Customized operating systems
  Take time to build, lag behind hardware
  Cost a lot of money

The Disco Solution
Add a virtual machine monitor (VMM)
  Commodity OS's run in their own virtual machines (VMs)
  VMs communicate through standard distributed protocols
VMM uses global policies to manage resources
  Moves memory between VMs to avoid paging
  Schedules virtual processors to balance load
[Figure: virtual machines running on Disco atop processing elements (PEs) connected by the interconnect of a ccNUMA multiprocessor]

Virtual Machine Challenges
Overheads
  Instruction execution, exception processing, I/O
  Memory: code and data of hosted operating systems, replicated buffer caches
Resource management
  Lack of information: idle loop, lock busy-waiting, page usage
Communication and sharing
  Not really a problem anymore because of distributed protocols

Disco in Detail

Interface
MIPS R10000 processor
  All instructions, the MMU, the trap architecture
  Memory-based interface to the VMM: enabling/disabling interrupts, accessing privileged registers
Physical memory
  Contiguous, starting at address 0
I/O devices
  Virtual devices exclusive to each VM
  Physical devices multiplexed by Disco
  Idealized interface to SCSI disks and network
  Virtual subnet across all virtual machines
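To make the memory-based VMM interface concrete, here is a minimal sketch in C, assuming a hypothetical shared-page layout and a vmm_yield() trap helper (neither is Disco's real interface): the guest kernel toggles interrupt state with plain loads and stores instead of privileged instructions.

```c
/* Hypothetical sketch (not Disco's actual layout) of a memory-based VMM
 * interface: the guest kernel manipulates "privileged" state through plain
 * loads and stores to a per-VM page shared with the monitor, avoiding a trap
 * on every interrupt enable/disable or privileged-register access. */
#include <stdint.h>

struct vmm_iface {                 /* one page, shared between guest kernel and monitor */
    volatile uint32_t int_enable;  /* 1 = virtual interrupts enabled */
    volatile uint32_t int_pending; /* bitmask of pending virtual interrupts */
    volatile uint64_t cause;       /* shadow of a privileged cause register */
    volatile uint64_t epc;         /* shadow of the exception PC */
};

extern struct vmm_iface *vmm;      /* established by the monitor at boot */
extern void vmm_yield(void);       /* hypothetical explicit trap into the monitor */

static inline void guest_disable_interrupts(void) {
    vmm->int_enable = 0;           /* plain store; no trap into the VMM */
}

static inline void guest_enable_interrupts(void) {
    vmm->int_enable = 1;
    if (vmm->int_pending)          /* let the monitor deliver deferred interrupts */
        vmm_yield();
}
```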

Virtual CPUs
Three modes
  Kernel mode: Disco, with full access to the hardware
  Supervisor mode: guest operating system, with access to a protected memory segment
  User mode: applications
Emulation by direct execution
  Not for privileged instructions, access to physical memory, or I/O devices
    These are emulated in the VMM with per-VM data structures (registers, TLB)
  Syscalls and page faults are handled by the guest OS's trap handlers
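As a rough illustration of how privileged instructions are handled under direct execution, here is a sketch with assumed structures and operation codes (not Disco's actual data layout): when the guest kernel traps, the monitor decodes the instruction and applies it to that virtual CPU's copy of the privileged state.

```c
#include <stdint.h>

#define NUM_VTLB_ENTRIES 64

struct vtlb_entry { uint64_t vaddr, machine_addr; uint32_t flags; };

struct vcpu {                          /* per-virtual-CPU state kept by the monitor */
    uint64_t regs[32];                 /* general-purpose registers */
    uint64_t status, cause;            /* virtual copies of privileged registers */
    struct vtlb_entry vtlb[NUM_VTLB_ENTRIES];  /* software TLB for this VM */
};

enum priv_op { OP_READ_STATUS, OP_WRITE_STATUS, OP_ENABLE_INT, OP_DISABLE_INT };

/* Called from the monitor's trap handler when direct execution of the guest
 * kernel (running in supervisor mode) hits a privileged instruction. */
void emulate_privileged(struct vcpu *v, enum priv_op op, int reg) {
    switch (op) {
    case OP_READ_STATUS:  v->regs[reg] = v->status;     break;
    case OP_WRITE_STATUS: v->status    = v->regs[reg];  break;
    case OP_ENABLE_INT:   v->status   |=  1u;           break;  /* bit 0: interrupt enable */
    case OP_DISABLE_INT:  v->status   &= ~1ull;         break;
    }
}
```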

Virtual Physical Memory
Adds a level of translation: physical-to-machine
  Performed in the software-reloaded TLB
  Based on the pmap data structure: one entry per physical page
  Requires changes to the IRIX memory layout
Flushes the TLB when scheduling different virtual CPUs
  The MIPS TLB is tagged with an address space ID
Increases the number of misses, but adds a second-level software TLB
  The guest operating system is now mapped through the TLB
  The TLB is flushed on virtual CPU switches
  Virtualization introduces overhead
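A minimal sketch of the extra translation step, with assumed data structures (guest_lookup() and tlb_insert() are placeholders, and the per-VM physical-to-machine table stands in for Disco's pmap): on a TLB miss, the monitor composes the guest's virtual-to-physical mapping with its own physical-to-machine mapping before filling the hardware TLB.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12

struct vm {
    uint64_t *phys_to_machine;   /* per-VM table: guest physical page -> machine page */
    size_t    num_phys_pages;
};

/* guest_lookup() stands in for walking the guest OS's page tables. */
extern int  guest_lookup(uint64_t vaddr, uint64_t *guest_phys_page, uint32_t *prot);
extern void tlb_insert(uint64_t vpage, uint64_t machine_page, uint32_t prot);

int handle_tlb_miss(struct vm *vm, uint64_t vaddr) {
    uint64_t ppn;
    uint32_t prot;
    if (guest_lookup(vaddr, &ppn, &prot) != 0)
        return -1;                               /* forward the fault to the guest OS */
    if (ppn >= vm->num_phys_pages)
        return -1;                               /* bogus "physical" address */
    uint64_t mpn = vm->phys_to_machine[ppn];     /* second translation: physical -> machine */
    tlb_insert(vaddr >> PAGE_SHIFT, mpn, prot);  /* install the combined mapping */
    return 0;
}
```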

NUMA Memory Management
Two optimizations
  Heavily accessed pages are relocated to the node that uses them
  Read-only shared pages are replicated across nodes
Based on FLASH's cache-miss-counting facility
Supported by the memmap data structure
  For each machine page, points to the physical addresses, VMs, and copies
[Figure: transparent page replication across nodes, with each VCPU's virtual pages mapping through physical pages to machine pages on its own node]
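A sketch of the kind of policy this enables; the thresholds, counters, and helper functions are assumptions rather than FLASH's or Disco's actual mechanism: per-page cache-miss counts drive either migration of a hot page to the node that misses on it most, or replication when the page is read-shared.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_NODES 8
#define HOT_THRESHOLD 1024          /* misses before we act (assumed value) */

struct memmap_entry {
    uint32_t miss_count[MAX_NODES]; /* cache misses per node, from the hardware counters */
    bool     read_only;             /* page currently mapped read-only? */
    int      home_node;
};

extern void migrate_page(struct memmap_entry *e, int node);
extern void replicate_page(struct memmap_entry *e, int node);

void numa_policy_tick(struct memmap_entry *e) {
    int hottest = 0, sharers = 0;
    uint32_t total = 0;
    for (int n = 0; n < MAX_NODES; n++) {
        total += e->miss_count[n];
        if (e->miss_count[n] > 0) sharers++;
        if (e->miss_count[n] > e->miss_count[hottest]) hottest = n;
    }
    if (total < HOT_THRESHOLD)
        return;                              /* not hot enough to bother */
    if (e->read_only && sharers > 1)
        replicate_page(e, hottest);          /* read-shared: give each node a copy */
    else if (hottest != e->home_node)
        migrate_page(e, hottest);            /* hot remote page: move it closer */
}
```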

Virtual I/O Devices
Specialized interface for common devices
  Special drivers for guest OS's: one trap per operation
DMA requests are modified
  From physical to machine memory
Copy-on-write disks
  Pages with the same contents require only one machine copy
[Figure: the physical memory of VM 1 and VM 2 (code, data, buffer cache) mapped onto machine memory, which holds shared pages, private pages, and free pages]
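Here is a rough sketch of copy-on-write disk sharing, with an assumed hash table and helper routines (not Disco's implementation): when a VM reads a disk block that is already resident for another VM, the monitor maps the existing machine page read-only instead of repeating the DMA; a later write faults and receives a private copy.

```c
#include <stdint.h>

#define TABLE_SIZE 4096

struct shared_page { uint64_t disk_block; uint64_t machine_page; int refs; };
static struct shared_page table[TABLE_SIZE];

extern uint64_t dma_read_block(uint64_t disk_block);  /* DMA into a fresh machine page */
extern void map_readonly(int vm_id, uint64_t guest_phys, uint64_t machine_page);

/* Map a disk block into a VM's physical memory, sharing an existing copy when possible. */
void cow_disk_read(int vm_id, uint64_t guest_phys, uint64_t disk_block) {
    struct shared_page *e = &table[disk_block % TABLE_SIZE];
    if (e->refs > 0 && e->disk_block == disk_block) {
        e->refs++;                                    /* share the resident copy */
    } else if (e->refs == 0) {
        e->disk_block = disk_block;
        e->machine_page = dma_read_block(disk_block); /* first reader pays for the DMA */
        e->refs = 1;
    } else {
        /* hash collision: fall back to an unshared copy (still mapped read-only) */
        map_readonly(vm_id, guest_phys, dma_read_block(disk_block));
        return;
    }
    map_readonly(vm_id, guest_phys, e->machine_page); /* writes fault and get a private copy */
}
```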

Virtual Network Interface
Issue: different VMs communicate through standard distributed protocols (here, NFS)
  May lead to page duplication in memory
Solution: virtual subnet
  Ethernet-like addresses, no maximum transfer unit
  Read-only mapping instead of copying
  Supports scatter/gather
[Figure 4, "Transparent Sharing of Pages Over NFS": illustrates the case when the NFS reply to a read request is a data page; (1) the monitor's networking device remaps the data page from the source's machine address space to the destination, and (2) the monitor remaps the data page from the driver's mbuf to the client's buffer cache, a remap initiated by the operating system; the NFS server and client end up with virtual and physical pages backed by the same machine page]
What about NUMA?
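A small sketch of the zero-copy delivery idea, with assumed helpers (alloc_guest_phys(), map_readonly_into_vm(), and copy_into_vm() are placeholders, not Disco's interfaces): page-sized fragments are remapped read-only into the receiving VM, while small fragments such as headers are copied.

```c
#include <stdint.h>

#define PAGE_SIZE 4096

struct pkt_frag { uint64_t machine_page; uint32_t offset, len; };
struct packet   { struct pkt_frag frags[4]; int nfrags; };   /* scatter/gather list */

extern uint64_t alloc_guest_phys(int vm_id);                 /* slot in the receiver's buffer */
extern void map_readonly_into_vm(int vm_id, uint64_t guest_phys, uint64_t machine_page);
extern void copy_into_vm(int vm_id, uint64_t guest_phys, uint64_t machine_page,
                         uint32_t offset, uint32_t len);

/* Deliver a packet to dst_vm: remap full pages, copy only small fragments. */
void vnet_deliver(int dst_vm, const struct packet *p) {
    for (int i = 0; i < p->nfrags; i++) {
        const struct pkt_frag *f = &p->frags[i];
        uint64_t gpa = alloc_guest_phys(dst_vm);
        if (f->offset == 0 && f->len == PAGE_SIZE)
            map_readonly_into_vm(dst_vm, gpa, f->machine_page);           /* zero-copy remap */
        else
            copy_into_vm(dst_vm, gpa, f->machine_page, f->offset, f->len); /* e.g., headers */
    }
}
```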

IRIX 5.3 on Disco
Changed the memory layout to make all pages mapped
Added device drivers for Disco's special I/O devices
  Disco's drivers are the same as those in IRIX
Patched the HAL to use memory loads/stores instead of privileged instructions
Added new calls
  Request zeroed-out memory pages, inform the monitor about freed pages
Changed mbuf management to be page-aligned
Changed bcopy to use remap (with copy-on-write)

Disco Evaluation

Experimental Methodology
FLASH machine "unfortunately not yet available"
  Use SimOS instead
    Models hardware in enough detail to run unmodified code
    Supports different levels of accuracy, checkpoint/restore
Workloads
  pmake, engineering, scientific computing, database

Execution Overhead
Uniprocessor configuration comparing IRIX and Disco
  Disco overhead between 3% and 16% (!)
  Mostly due to trap emulation and TLB reload misses
[Figure: relative execution time of the pmake, engineering, raytrace, and database workloads on IRIX vs. Disco, broken down into idle, Disco, kernel, and user time]

Diggin' Deeper
Excerpt from Section 5.2, "Execution Overheads", of the Disco paper: "To evaluate the overheads of running on Disco, we ran each workload on a uniprocessor, once using IRIX directly on the simulated hardware, and once using Disco running IRIX in a single virtual machine on the same hardware. Figure 5 shows this comparison. Overall, the overhead of virtualization ranges from a modest 3% for [...] These slowdowns can be explained by the common path to enter and leave the kernel for all page faults, system calls and interrupts. This path includes many privileged instructions that must be individually emulated by Disco. A restructuring of the HAL of IRIX could remove most of this overhead. For example, IRIX uses the same TLB wired entry for different purposes in user mode and in the kernel. The path on each kernel entry and exit contains many [...]"

Table 2, "Service Breakdown for the Pmake Workload": breaks down the virtualization overheads for the seven top kernel services. DEMAND-ZERO is a demand-zero page fault, QUICK-FAULT is a slow TLB refill, UTLB-MISS is a fast TLB refill. Other than UTLB-MISS, the IRIX and IRIX-on-Disco configurations request the same number of services of each category. The execution time on Disco is expressed as a fraction of the IRIX time and separates the time spent in the kernel, emulating TLB writes and privileged instructions, performing monitor calls, and emulating the unmapped kernel segments; the breakdown column below lists those relative times in that order, and the slowdown column is their sum.

  Service       % of System Time   Avg Time/Invocation   Slowdown on Disco   Relative time breakdown
  DEMAND-ZERO   30%                21 us                 1.42                0.43 / 0.21 / 0.16 / 0.47 / 0.16
  QUICK-FAULT   10%                5 us                  3.17                1.27 / 0.80 / 0.56 / 0.00 / 0.53
  open          9%                 42 us                 1.63                1.16 / 0.08 / 0.06 / 0.02 / 0.30
  UTLB-MISS     7%                 0.035 us              1.35                0.07 / 1.22 / 0.05 / 0.00 / 0.02
  write         6%                 12 us                 2.14                1.01 / 0.24 / 0.21 / 0.31 / 0.17
  read          6%                 23 us                 1.53                1.10 / 0.13 / 0.09 / 0.01 / 0.20
  execve        6%                 437 us                1.60                0.97 / 0.03 / 0.05 / 0.17 / 0.40

What does this table tell us?
What is the problem with entering/exiting the kernel?
What is the problem with placing the OS in mapped memory?

Memory Overhead
Workload: 8 instances of pmake
  Memory partitioned across virtual machines
  NFS configuration uses more memory than available
[Figure: memory footprint (including buffer cache) of the pmake workload for IRIX and for Disco with 1, 2, 4, and 8 VMs, and 8 VMs with NFS]

Scalability
IRIX: high synchronization and memory overheads
  memlock: the spinlock for memory management data structures
Disco: partitioning reduces overheads
What about the RADIX experiment?
[Figure: normalized execution time of the pmake workload on IRIX and on Disco with eight VMs (with and without NFS), and of RADIX on SplashOS, broken down into idle, Disco, sync, and kernel/user time]

Page Migration and Replication
[Figure 8, "Performance Benefits of Page Migration and Replication": for the engineering and raytrace workloads, compares the execution time of IRIX on NUMA, IRIX on Disco on NUMA with page migration and replication, and IRIX on a bus-based UMA machine]
What does this figure tell us?

Xen in (Some) Detail

Actually, the Big Picture First
[Figure 1 from the Xen paper: the structure of a machine running the Xen hypervisor on H/W (SMP x86, physical memory, Ethernet, SCSI/IDE), hosting several guest operating systems with Xeno-aware device drivers; Domain0 runs control plane software on XenoLinux via the control interface, while other domains run user software on XenoLinux, XenoBSD, and XenoXP, all over Xen's virtual x86 CPU, virtual physical memory, virtual network, and virtual block devices]
How does this differ from Disco?

Xen by Comparison
Three main differences
  Less complete virtualization
  Domain0 to initialize/manage VMs, including setting policies
  Strong performance isolation
Primary contribution
  An interface that is close to the hardware and enables low-overhead virtualization
  Need to change more OS code than in Disco
    Yet still not too much: about 3,000 lines for Linux

CPU Interface
Xen runs in ring 0
  Guest OS's run in the otherwise unused ring 1
  Privileged instructions need to be processed by Xen
    Though x86 makes life a little difficult: how?
Fast exception handlers do not require Xen interaction
  Must execute outside ring 0: what does Xen need to do?
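To illustrate the paravirtualized pattern, here is a hedged sketch with made-up hypercall numbers and wrappers (not Xen's real ABI): operations the guest kernel can no longer perform directly in ring 1 become explicit requests to the hypervisor, while exception handlers are registered once so that later faults can be dispatched without Xen's involvement.

```c
#include <stdint.h>

#define HC_SET_TRAP_TABLE 1          /* assumed hypercall numbers, not Xen's */
#define HC_FLUSH_TLB      2

/* A hypercall is just a software trap into ring 0 with arguments in registers;
 * shown here as an opaque function supplied by the hypervisor interface layer. */
extern long hypercall2(int nr, uint64_t a1, uint64_t a2);

struct trap_entry { uint8_t vector; uint8_t flags; uint64_t handler; };

/* Register the guest's exception handlers with the hypervisor once; afterwards,
 * "fast" exceptions such as system calls can be dispatched straight to ring-1
 * handlers without entering the hypervisor. */
static inline long guest_set_trap_table(struct trap_entry *t, int n) {
    return hypercall2(HC_SET_TRAP_TABLE, (uint64_t)(uintptr_t)t, (uint64_t)n);
}

/* Privileged operations the guest cannot perform in ring 1 become calls. */
static inline long guest_flush_tlb(void) {
    return hypercall2(HC_FLUSH_TLB, 0, 0);
}
```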

Memory Interface
x86 has hardware-walked page tables and a hardware-filled TLB
Guest OS's are responsible for managing their own page tables
  Provide machine memory to Xen (from the guest's reservation)
  Have direct read-only access
  Defer to Xen for performing (batched) updates
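A sketch in the spirit of batched page-table updates; the structure, queue size, and hypercall wrapper are assumptions rather than Xen's actual interface: the guest queues page-table writes and hands the whole batch to the hypervisor in a single trap, where each update can be validated before it is applied.

```c
#include <stdint.h>
#include <stddef.h>

struct mmu_req { uint64_t pte_machine_addr; uint64_t new_val; };

#define BATCH 32
static struct mmu_req queue[BATCH];
static size_t queued;

extern long hypercall_mmu_update(struct mmu_req *reqs, size_t count);

void queue_pte_update(uint64_t pte_maddr, uint64_t val) {
    queue[queued].pte_machine_addr = pte_maddr;
    queue[queued].new_val = val;
    if (++queued == BATCH) {                  /* flush when the batch is full */
        hypercall_mmu_update(queue, queued);
        queued = 0;
    }
}

void flush_pte_updates(void) {                /* call before the update must be visible */
    if (queued) {
        hypercall_mmu_update(queue, queued);
        queued = 0;
    }
}
```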

Device Driver Interface
Shockingly, an idealized hardware abstraction
  Virtual firewall-router
  Virtual network interfaces
  Virtual block devices

VM/VMM Communication
Control transfer
  Hypercalls: synchronous software traps into the VMM
  Events: asynchronous, possibly batched upcalls to VMs
Data transfer through I/O rings
  Separate descriptors from actual data
  Zero-copy transfer during device I/O
  Support batching and re-ordering
[Figure 2, "The structure of asynchronous I/O rings": four pointers walk the ring: a request producer (shared pointer updated by the guest OS), a request consumer (private pointer in Xen), a response producer (shared pointer updated by Xen), and a response consumer (private pointer in the guest OS); between them lie the request queue (descriptors queued by the VM but not yet accepted by Xen), outstanding descriptors (slots awaiting a response from Xen), the response queue (descriptors returned by Xen for serviced requests), and unused descriptors]
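A minimal sketch of such a ring, with assumed field names and layout (not Xen's shared-ring headers): descriptors reference data pages rather than carrying payloads, each shared producer pointer is advanced by exactly one side, and each side keeps its consumer pointer private.

```c
#include <stdint.h>

#define RING_SIZE 64                      /* power of two so masking works */
#define RING_MASK (RING_SIZE - 1)

struct io_desc { uint64_t page_ref; uint32_t offset, len; uint32_t op, status; };

struct io_ring {
    /* shared pointers, each updated by exactly one side */
    volatile uint32_t req_prod;           /* guest advances after queuing a request */
    volatile uint32_t rsp_prod;           /* Xen advances after queuing a response */
    struct io_desc ring[RING_SIZE];       /* slots reused for requests and responses */
};

/* private pointers live outside the shared page */
static uint32_t req_cons_private;         /* Xen's: next request to accept */
static uint32_t rsp_cons_private;         /* guest's: next response to consume */

/* Guest side: queue a request; returns 0 on success, -1 if the ring is full. */
int ring_put_request(struct io_ring *r, const struct io_desc *d) {
    if (r->req_prod - rsp_cons_private >= RING_SIZE)
        return -1;                        /* no free slots */
    r->ring[r->req_prod & RING_MASK] = *d;
    __sync_synchronize();                 /* publish the descriptor before the pointer */
    r->req_prod++;
    return 0;                             /* a hypercall/event would now notify Xen */
}

/* Guest side: consume one response if available. */
int ring_get_response(struct io_ring *r, struct io_desc *out) {
    if (rsp_cons_private == r->rsp_prod)
        return -1;                        /* nothing serviced yet */
    *out = r->ring[rsp_cons_private & RING_MASK];
    rsp_cons_private++;
    return 0;
}
```

Because the producer and consumer pointers advance independently, the backend can accept, reorder, and respond to requests in whatever order suits the device, which is what enables batching and re-ordering.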

Memory Management
Virtual memory
  Guest OS's manage their own page tables (not shadows)
    Exposes names and allocation
  Validated by types and reference counts
    Types: page directory/table, local/global descriptor table, writable
  Page directories and tables are pinned, i.e., they cannot be swapped (why?)
Physical memory
  Controlled through a balloon driver in the guest OS
    Requests and pins pages, which are then returned to the VMM
  May be mapped into hardware memory (why?)
  Xen publishes the machine-to-physical mapping
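A sketch of the balloon-driver idea, with hypothetical guest allocation helpers and hypercalls (not Xen's real balloon driver): to shrink a domain, the driver allocates and pins pages inside the guest so the guest OS stops using them, then returns the underlying machine pages to the hypervisor; deflating does the reverse.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_BALLOON   4096     /* pages we are willing to hold */
#define BALLOON_BATCH 256      /* pages handed back per hypercall */

extern void *guest_alloc_page(void);                 /* allocate inside the guest OS */
extern void  guest_free_page(void *page);
extern uint64_t guest_page_to_machine(void *page);   /* via the machine-to-physical map */
extern long hypercall_return_pages(uint64_t *machine_pages, size_t n);

static void *balloon[MAX_BALLOON];
static size_t balloon_size;

/* Inflate the balloon: take up to n pages away from the guest and give them to the VMM. */
size_t balloon_inflate(size_t n) {
    uint64_t mfns[BALLOON_BATCH];
    size_t got = 0;
    if (n > BALLOON_BATCH)
        n = BALLOON_BATCH;                           /* one batch per call */
    while (got < n && balloon_size < MAX_BALLOON) {
        void *page = guest_alloc_page();
        if (!page) break;                            /* guest is out of memory */
        balloon[balloon_size++] = page;              /* keep the page pinned in the guest */
        mfns[got++] = guest_page_to_machine(page);
    }
    if (got)
        hypercall_return_pages(mfns, got);           /* machine pages now belong to the VMM */
    return got;
}
```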

Performance Isolation
Four "domains"
  PostgreSQL
  SPECweb99
  Disk bandwidth hog (sustained dd)
  Fork bomb
Results: Xen sees only a 2-4% loss, while Linux locks up
What is the key ingredient?

What Do You Think?