EE282 Lecture 11 Virtualization & Datacenter Introduction




EE282 Lecture 11: Virtualization & Datacenter Introduction. Christos Kozyrakis, http://ee282.stanford.edu (EE282, Spring 2013, Lecture 11).

Announcements: Project 1 is due on 5/8. HW2 is due on 5/20. Project 2 is pushed to next week.

Virtualization Summary. With VMs, multiple OSes share hardware resources. [Figure: VMs (each with apps and a guest OS) run on the Virtual Machine Monitor (VMM), which runs on the physical host hardware.] The VMM creates virtual copies of a complete HW system. Key properties: partitioning, encapsulation. Uses: server consolidation/scaledown, client consolidation, security. Options: hypervisor vs. hosted architecture, full vs. para-virtualization. Essential properties: safety, equivalency, efficiency.

Reminder: Is My ISA Virtualizable? Basic requirement: at least two execution modes (user & kernel). Extra requirement: all sensitive instructions must be privileged. Sensitive instructions are those that change the HW configuration (allocations, mappings, ...) or whose outcome depends on the HW configuration, e.g., writing the TLB or reading the processor mode. Privileged instructions trap into kernel mode if executed in user mode; there can be privileged instructions that are not sensitive. Memory accesses must go through a privileged translation stage (e.g., paging). An architecture may provide further support for VMs.
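The Popek-Goldberg criterion on this slide reduces to a one-line set check. A minimal sketch; the instruction lists below are illustrative examples, not complete ISA listings:

```python
# Popek-Goldberg criterion: an ISA supports classic trap-and-emulate
# virtualization only if every sensitive instruction is also privileged
# (i.e., traps when executed in user mode).
def is_virtualizable(sensitive, privileged):
    return set(sensitive) <= set(privileged)

# Pre-VT x86: POPF silently ignores the interrupt-enable flag in user
# mode instead of trapping, so it is sensitive but not privileged.
x86_sensitive  = {"POPF", "SGDT", "MOV_to_CR3", "LIDT"}
x86_privileged = {"MOV_to_CR3", "LIDT", "HLT"}
print(is_virtualizable(x86_sensitive, x86_privileged))      # False

# A hypothetical "clean" ISA where every sensitive instruction traps:
clean_sensitive  = {"WRITE_TLB", "READ_CPU_MODE"}
clean_privileged = {"WRITE_TLB", "READ_CPU_MODE", "HLT"}
print(is_virtualizable(clean_sensitive, clean_privileged))  # True
```

The failing first check is exactly why classic x86 needed binary translation or paravirtualization before VT-x.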

x86 Virtualization Challenges. Ring deprivileging: run the guest OS above ring 0 and control access to privileged state. Virtualization holes: ring compression, non-trapping operations (e.g., CPUID, CLI/STI, POPF), and excessive trapping. Software solutions: binary translation (which also enables optimization!) and paravirtualization. [Figure: with binary translation, a legacy OS deprivileged out of ring 0 runs through a binary translator with a translation cache; with paravirtualization, a modified OS runs above the VMM.]

HW Support for CPU Virtualization. New CPU operating modes: VT root operation (for the VMM) and non-root operation (for guests). Guest OSes run at their intended rings, which eliminates ring compression. New transitions: VM entry and VM exit swap registers and the address space in one atomic operation. The VM control structure (VMCS) is configured by VMM software, specifies the guest OS state, and controls when VM exits occur. [Figure: guest VMs (e.g., Windows and Linux with their apps) run in rings 0-3 of non-root mode; VM entry/exit transitions to the VMM in VT root mode are driven by the hardware VMCS, on processors with VT-x (or VT-i).]

Memory Virtualization Challenges. Address translation: the guest OS expects contiguous, zero-based physical memory, and the VMM must preserve this illusion. Page-table shadowing: the VMM intercepts paging operations and constructs a copy of the guest page tables (the shadow page tables), which the hardware TLB actually uses. Overheads: the induced VM exits add to execution time, and shadow page tables consume significant host memory.

HW Support for Memory Virtualization. Extended page tables (EPT) map guest-physical to host addresses using a new hardware page-table walker. Performance benefit: the guest OS can modify its own page tables freely, with no VM exits. Memory savings: without EPT, a shadow page table is required for each guest user process; a single EPT supports an entire VM.

Extended Page Tables. [Figure: CR3 points to the guest page tables, which map a guest linear address to a guest physical address; the EPT base pointer (EPTP) points to the extended page tables, which map the guest physical address to a host physical address.] The regular page tables map guest-linear to guest-physical addresses (which are translated again) and can be read and written by the guest. The new EPT page tables are under VMM control: they map guest-physical to host-physical addresses (which actually access memory) and are referenced by the new EPT base pointer. No VM exits occur due to page faults, INVLPG, or CR3 accesses.
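The two-stage mapping can be sketched with page-granular lookup tables. The page numbers and mappings below are invented for illustration:

```python
# Two-stage translation with EPT, modeled at page granularity (4 KB pages).
PAGE = 4096

# Stage 1: guest page tables (guest-writable),
# guest-virtual page number -> guest-physical page number
guest_page_table = {0x10: 0x2A, 0x11: 0x2B}

# Stage 2: extended page tables (VMM-controlled),
# guest-physical page number -> host-physical page number
ept = {0x2A: 0x90, 0x2B: 0x91}

def translate(guest_linear):
    vpn, offset = divmod(guest_linear, PAGE)
    gpn = guest_page_table[vpn]   # walk the guest page tables
    hpn = ept[gpn]                # translate the result through the EPT
    return hpn * PAGE + offset

print(hex(translate(0x10 * PAGE + 0x123)))   # 0x90123
```

The guest can rewrite `guest_page_table` at will without VMM involvement; only the `ept` stage is under VMM control, which is the isolation boundary.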

EPT Details. All guest-physical addresses are translated by the EPT: CR3, PDEs, and PTEs, including the PDPTRs and the 64-bit paging structures. Page faults take priority over VM exits. [Figure: each step of the guest walk (CR3, page directory, page table) is itself translated through the EPT tables before the host physical address is produced.]

Watch Out: Higher TLB Miss Cost. With 32-bit addressing, a TLB miss triggers a 2-level walk: CR3, PDE, PTE, then the physical address.

Watch Out: Higher TLB Miss Cost. With 64-bit addressing, a TLB miss triggers a 4-level walk: CR3, PML4, PDP, PDE, PTE, then the physical address.

Watch Out: Higher TLB Miss Cost. With virtualization, every guest-physical address in the walk (CR3, PML4, PDP, PDE, PTE) must itself go through the 4-level EPT walk (L4, L3, L2, L1): 24 steps in the walk!
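The step counts quoted on the last three slides follow from a simple formula: with g guest page-table levels and e EPT levels, a full walk makes (g + 1) x (e + 1) - 1 memory references. A quick check:

```python
# Memory references on a TLB miss: CR3 plus each of the g guest-level
# entries yields a guest-physical address, and with EPT every one of
# those must itself be walked through e EPT levels.
def walk_refs(guest_levels, ept_levels):
    return (guest_levels + 1) * (ept_levels + 1) - 1

print(walk_refs(2, 0))   # 2  -- native 32-bit, 2-level walk
print(walk_refs(4, 0))   # 4  -- native 64-bit, 4-level walk
print(walk_refs(4, 4))   # 24 -- 4-level guest walk under a 4-level EPT
```

This sixfold blow-up (4 to 24 references) is why hardware walkers cache intermediate paging-structure entries aggressively.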

Discussion: How do we solve the problem of expensive translation?

I/O Virtualization Challenges. Software methods: (1) I/O device emulation, where a virtual device interface traps device commands, translates DMA operations, and injects virtual interrupts; or (2) a paravirtualized device interface. Challenges: the overhead of copying I/O buffers, and controlling DMA and interrupts. [Figure: guest device drivers in each VM talk to device models in the VMM, which drive the physical device driver for storage and network devices.]

Virtual and Physical Device Interfaces. The guest device driver programs the virtual device interface: device configuration accesses, I/O-port accesses, and memory-mapped device registers. The virtual device model proxies these accesses to the physical device driver (possibly translating commands, and translating DMA addresses) and proxies device activity back to the guest OS (copying or translating DMA buffers, injecting virtual interrupts). The physical device driver programs the actual I/O device (device configuration, I/O-port and MMIO accesses), and the physical device responds with DMA transactions to host physical memory and physical device interrupts.

Motivation for HW Support. Example: DMA incoming network packets directly into guest space (a guest OS buffer or a guest user buffer). This requires DMA that understands virtual addresses, and with virtualization it must remain secure in the presence of multiple apps/VMs.

HW Support for I/O Virtualization. Common challenges in I/O virtualization: controlling device access to memory (DMA remapping) and controlling device interrupts (interrupt remapping). Applications of DMA remapping: memory protection and isolation (for reliability/security), direct assignment of I/O devices to VMs, and controlling DMA to I/O buffers within a VM. Applications of interrupt remapping: isolating interrupt requests to the proper VMs, enabling the VMM to efficiently route interrupt requests, and complementing the CPU support for interrupt virtualization.

HW Support for I/O Virtualization. VT-d (the IOMMU) defines an architecture for DMA and interrupt remapping. It was originally implemented as part of the core-logic chipset; most of its functionality is now integrated into the CPU chips. [Figure: the VT-d IOMMU sits in the I/O controller between the CPUs/DRAM on the system bus and the integrated devices, PCIe root ports, and south bridge (PCI, LPC, legacy devices).]

IOMMU Architecture. [Figure: a DMA request carries a device ID (bus, device, function), a virtual address, and a length. The DMA remapping engine uses device-assignment structures (cached in a context cache) to select per-device address-translation structures (4 KB page tables, cached in a translation cache), then accesses memory with the resulting system physical address; invalid requests cause fault generation. The partitioning and translation structures are memory-resident.]
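A toy model of the device-ID-keyed remapping described above. The device names and page mappings are invented; real hardware indexes its tables by bus/device/function:

```python
# Toy IOMMU: DMA remapping keyed by requester (device) ID, so each
# device translates its DMA addresses through its own tables.
PAGE = 4096

iommu_tables = {
    "nic0":  {0x40: 0xA0},   # IOVA page 0x40 -> host page 0xA0
    "disk0": {0x40: 0xB0},   # same IOVA, different host page: isolation
}

def dma_translate(device, iova):
    vpn, offset = divmod(iova, PAGE)
    table = iommu_tables[device]
    if vpn not in table:                       # unmapped access
        raise PermissionError(f"{device}: DMA fault at {hex(iova)}")
    return table[vpn] * PAGE + offset

print(hex(dma_translate("nic0", 0x40 * PAGE + 4)))    # 0xa0004
print(hex(dma_translate("disk0", 0x40 * PAGE + 4)))   # 0xb0004
```

Note how the same device-visible address lands in different host pages per device, and how an unmapped access faults instead of silently corrupting memory; these are the protection and isolation properties the slides attribute to DMA remapping.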

Page Tables. The requester ID (bus: bits 15-8, device: bits 7-3, function: bits 2-0) indexes the device-assignment tables, whose entry specifies the base of a page table (a 4-level table in this example). The DMA virtual address is then translated like a regular 64-bit address: the level-4, level-3, level-2, and level-1 table offsets (bits 47-39, 38-30, 29-21, and 20-12) walk the page tables, and the page offset (bits 11-0) selects the byte within the 4 KB page.

Translation Caching. The hardware caches frequently used remapping structures, avoiding the overhead of accessing structures in memory; the caches support tagging by software-assigned IDs. IOTLB: caches translations for recently accessed pages; IOTLB scaling through PCIe Address Translation Services allows devices to locally cache translations. Context cache: caches device-assignment entries.

Interrupt Remapping. An interrupt request specifies a request ID and an originator ID, and the remapping hardware transforms the request into a physical interrupt. The interrupt-remapping hardware enforces isolation through the originator ID and caches frequently used remapping structures; software may modify the remapping for efficient interrupt redirection. This applies to all interrupt sources (legacy interrupts delivered through I/O APICs, and message-signaled interrupts: MSI, MSI-X) and works with existing device hardware.

Discussion: How do NICs with multiple queues help with virtualization?

Deployment Models for I/O Virtualization. Hypervisor model: the device drivers and I/O services live in the hypervisor, which shares the devices among the guest VMs (pros: high performance, I/O device sharing, VM migration; con: large hypervisor). Service VM model: the device drivers and I/O services run in dedicated service VMs (pros: higher security, I/O device sharing, VM migration; con: lower performance). Pass-through model: devices are assigned directly to guest VMs, which run their own device drivers (pros: higher performance, rich device features; cons: limited sharing, VM migration limits).

IOMMU & Hypervisor Model. Improved reliability and protection: the hypervisor programs the remapping tables, and errant DMA is detected and reported to the VMM. Bounce buffer support: limited DMA addressability in some I/O devices prevents direct access to high memory, so bounce buffers (a software technique that stages DMA in low memory and copies the data to or from the high-memory I/O buffers) are needed, at the cost of extra copies; the IOMMU eliminates the need for bounce buffers.

IOMMU & Service VM Model. Device driver deprivileging: device drivers run in a service OS and program devices in a DMA-virtual address space; the service VM forwards DMA API calls to the hypervisor, which sets up the DMA-virtual to host-physical translation. Further improvement in protection: a guest device driver cannot compromise hypervisor code or data, either through DMA or through CPU-initiated memory accesses.

IOMMU & Pass-Through Model. Direct device assignment to the guest OS: the hypervisor sets up the guest-to-host physical DMA mapping tables, and the guest OS directly programs the physical device. Multi-queue or multi-interface devices: the hypervisor can assign device interfaces directly to VMs (see the PCI I/O virtualization standards, e.g., SR-IOV).

VMMs Before & After HW Support. The higher-level VMM functions (resource discovery, provisioning, scheduling, the user interface) are unchanged; the low-level mechanisms shift from software to hardware. Processor virtualization: ring deprivileging and binary translation become VT-x (or VT-i) configuration. Memory virtualization: page-table shadowing becomes EPT configuration. I/O device virtualization: I/O device emulation and software DMA/interrupt remapping become VT-d DMA and interrupt remapping configuration (plus the PCI-SIG I/O virtualization standards). [Figure: VMs over the VMM over the physical platform resources: processors with VT-x/VT-i and EPT, memory, and I/O devices with VT-d.]

Performance Implications of Virtualization. Layered resource management; frequent context switching; pressure on the TLB and address translation; longer I/O code paths and extra data copies; increased memory-hierarchy contention. [Figure: VMs over the VMM over the physical host hardware (processors, memory & cache, I/O).]

HW Support vs. Binary Translation. A binary-translation VMM converts traps to callouts (callouts are faster than trapping), has faster emulation routines (the VMM does not need to reconstruct state), and can avoid callouts entirely. A hardware-supported VMM preserves code density, has no precise-exception overhead, and has faster system calls.

Compute-Bound Benchmarks. Bottom line: little difference for SPEC.

Other Benchmarks. What is the explanation for the mixed results?

Reminder: Uses of VMs. Memory compression; memory deduplication; virtual machine migration; thin provisioning; enabling an old OS on modern hardware; archiving applications; sandboxing; taking your apps with you on a memory stick.

Introduction to Data Centers. Readings for today: Barroso & Holzle textbook, chapters 1 & 2. Slide credits: James Hamilton, Jeff Dean, Facebook. Hamilton's blog is a great resource for DC technology: http://perspectives.mvdirona.com. Facebook: http://opencompute.org/

What Is a Datacenter (DC)? The compute infrastructure for internet-scale services & cloud computing. Examples: Google, Facebook, Yahoo, Amazon Web Services, Microsoft, Baidu. Both consumer and enterprise services: Windows Live, Gmail, Hotmail, Dropbox, Bing, Google, adCenter, Exchange Hosted Services, web apps, Exchange Online, salesforce.com, the Azure platform, Google Apps, etc.

What Is a Datacenter (DC)? A simplistic view: a scaled-up version of the machine rooms used for enterprise computing. A large collection of commodity components: PC-based servers (CPUs, DRAM, disks) and Ethernet networking, with a commodity OS and software stack, at 10s to 100s of thousands of nodes. On top sit (centralized) system software for DC management and the software that implements the internet services.

A More Complete View of a DC. Apart from the computers & network switches, you need: power infrastructure (voltage converters and regulators, generators and UPSs) and cooling infrastructure (A/C, cooling towers, heat exchangers, air impellers). These are co-designed!

Example: MS Quincy Datacenter. 470k sq. feet (10 football fields), next to a hydro-electric generation plant. At up to 40 megawatts, $0.02/kWh is better than $0.15/kWh; that load equals the power consumption of 30,000 homes.

Example: MS Chicago Datacenter. Microsoft's Chicago data center [K. Vaid, Microsoft Global Foundation Services; HotPower'10, Oct 3, 2010].

Motivation for Internet-Scale Services & Datacenters. Some applications need big machines: search, language translation, etc. User experience: ubiquitous access and ease of management (no backups, no configuration). Vendor benefits (which all translate to lower costs): faster application development; tight control of the system configuration; ease of (re)deployment for upgrades and fixes; a single-system view of storage and other resources; lower cost by sharing HW resources across many users; lower cost by amortizing HW/storage management costs.

Data Center Cost. Business models are based on low costs: high volume needs low costs (how much do you pay for Gmail?), and cost is a key competitive advantage. The key metric is total cost of ownership (TCO), which has two components: capital expenses (CAPEX) and operational expenses (OPEX).

Cost Model: Facilities CAPEX. Size of facility: 8 MW; cost of facility: $11/W (a typical construction cost). Total facility CAPEX: $88M, of which power and cooling = 82% = $72.16M and other infrastructure = 18% = $15.84M. US accounting rules convert this CAPEX to OPEX: facilities amortization time = 10-15 years, annual cost of money = 5%, OPEX = Pmt(interest_rate, number_of_payments, principal). Monthly OPEX: power and cooling $765K; other infrastructure $168K.
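As a sanity check on those monthly figures, here is the standard loan-payment (Pmt) formula applied to the slide's numbers, assuming a 10-year amortization (the slide gives a 10-15 year range):

```python
# Standard loan-payment (PMT) formula: the monthly payment that
# amortizes `principal` over `years` at `annual_rate`.
def pmt(annual_rate, years, principal):
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)

power_cooling = pmt(0.05, 10, 72.16e6)   # $72.16M power & cooling CAPEX
other_infra   = pmt(0.05, 10, 15.84e6)   # $15.84M other infrastructure CAPEX
print(round(power_cooling))   # ~ $765K per month
print(round(other_infra))     # ~ $168K per month
```

Both outputs match the slide's monthly OPEX figures, confirming the 10-year amortization assumption.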

Cost Model: Systems CAPEX. Servers: 45,978 servers x $1,450 per server = $66.7M CAPEX; with 3-year depreciation and a 5% cost of money, monthly CAPEX is $2,000K. Networking: rack switches (1,150 x $4,800), array switches (22 x $300K), layer-3 switches (2 x $500K), and border routers (2 x $144.8K) = $13.41M CAPEX; with 4-year depreciation and a 5% cost of money, monthly CAPEX is $309K.

Cost Model: OPEX Costs. Power [= MW_critical_load x 1000 x average_power_usage x PUE x power_cost x 24 x 365/12], with $0.07/kWh, PUE = 1.45, and 80% average power usage: $475K monthly. People: security guards (3 x 24 x 365 x $20) + facilities staff (1 x 24 x 365 x $30), with a 1.3 benefits multiplier: $85K monthly. Network bandwidth costs to the internet: vary by application and usage. Vendor maintenance fees + sysadmins: vary by equipment and negotiations.
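The power formula written out as code (the factor of 1000 converts MW of critical load to kW so the units match the $/kWh price):

```python
# Monthly power OPEX from the slide's inputs: 8 MW critical load,
# 80% average utilization, PUE of 1.45, $0.07/kWh.
def monthly_power_cost(critical_mw, avg_usage, pue, dollars_per_kwh):
    kw_drawn = critical_mw * 1000 * avg_usage * pue   # facility draw in kW
    hours_per_month = 24 * 365 / 12                   # 730 hours
    return kw_drawn * dollars_per_kwh * hours_per_month

print(round(monthly_power_cost(8, 0.80, 1.45, 0.07)))   # 474208
```

This reproduces the $474,208 monthly power line item in the TCO breakdown on the next slide.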

Cost Model: TCO. Servers (amortized): $1,998,103. Power & cooling infrastructure (amortized): $765,369. Power: $474,208. Other infrastructure (amortized): $168,008. Network (amortized): $308,814. People: $85,410. Observations: 34% of costs are related to power (trending up, while server costs trend down); networking is high at 8% of overall costs (15% of server costs). How is this different from traditional enterprise computing?

Enterprise vs. Internet-Scale Computing: a Cost Perspective. Enterprise computing approach: the largest cost is people, which scales roughly with servers (~100:1 is common); enterprise interests focus on consolidation & utilization (consolidate the workload onto fewer, larger systems; large SANs for storage & large routers for networking). Internet-scale services approach: the largest cost is server H/W, typically followed by cooling, power distribution, and power; networking varies from very low to dominant depending upon the service; people costs are under 10% & often under 5% (>1000:1 server:admin); services interests center around work-done-per-$ (or per watt). Observation: people costs shift from the top to nearly irrelevant; focus instead on work done/$ & work done/watt.

TCO Discussion. Anything else missing? Tip: what can go wrong in a data center?

Using the Cost Analysis. The cost model is a powerful tool for design tradeoffs, e.g., can we reduce the power cost with a different disk? Burdened cost of a Watt per year: ($765K + $475K) x 12 / 8 MW = $1.86/Watt/year. What does this mean? A 1 TB disk uses 10 W of power and costs $90; an alternate disk consumes only 5 W but costs $150. If you were the data center architect, what would you do?

Answer. A 1 TB disk uses 10 W of power and costs $90; an alternate disk consumes only 5 W but costs $150. At ~$2/Watt/year, even if we saved the entire 10 W of the disk's power, we would save about $20 per year; we are paying $60 more for the disk, so it is probably not worth it. What is this analysis missing?
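The arithmetic behind the answer, using the monthly TCO numbers from the earlier slides and an assumed 3-year disk lifetime (matching the server depreciation period):

```python
# Burdened cost of a Watt: amortized power/cooling infrastructure plus
# the electricity bill, per Watt of critical load per year.
monthly_power_infra = 765_369    # amortized power & cooling infrastructure
monthly_power_bill  = 474_208    # monthly electricity cost
critical_watts      = 8_000_000  # 8 MW facility

cost_per_watt_year = (monthly_power_infra + monthly_power_bill) * 12 / critical_watts
print(round(cost_per_watt_year, 2))   # 1.86

# Disk trade-off: the low-power disk saves 5 W but costs $60 more.
watts_saved   = 10 - 5
price_premium = 150 - 90
three_year_savings = watts_saved * cost_per_watt_year * 3
print(round(three_year_savings, 2))   # 27.89 -- well under the $60 premium
```

Even crediting every saved Watt at the fully burdened rate, the savings over the disk's lifetime fall short of the price premium, which is the slide's conclusion.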