This Unit: Virtual Memory. CIS 501 Computer Architecture.




This Unit: Virtual Memory
CIS 501 Computer Architecture, Unit 5: Virtual Memory
- The operating system (OS): a super-application
- Hardware support for an OS
- Virtual memory
- Page tables and address translation
- TLBs and memory hierarchy issues
[Figure: App, App, App running on system software, running on Mem, CPU, I/O]

Slides originally developed by Amir Roth with contributions by Milo Martin at the University of Pennsylvania, with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood.

Readings
- P+H, Virtual Memory: C.4, C.5 starting with page C-53 (Opteron), C.6

A Computer System: Hardware
- CPUs and memories, connected by a memory bus
- I/O peripherals: storage, input, display, network, ...
  - With separate or built-in DMA
  - Connected by a system bus (which is connected to the memory bus by a bus bridge)
[Figure: CPU/$ chips and memory on the memory bus; bus bridge to the system (I/O) bus with I/O ctrl, kbd, display, NIC]

A Computer System: + App Software
- Application software: the computer must do something
[Figure: application software running on the hardware from the previous slide]

A Computer System: + OS
- Operating System (OS): virtualizes hardware for apps
  - Abstraction: provides services (e.g., threads, files, etc.)
    + Simplifies app programming model; raw hardware is nasty
  - Isolation: gives each app the illusion of private CPU, memory, I/O
    + Simplifies app programming model
    + Increases hardware resource utilization
[Figure: multiple applications running on the OS, which runs on the hardware]

Operating System (OS) and User Apps
- Sane system development requires a split
  - Hardware itself facilitates/enforces this split
- Operating System (OS): a super-privileged process
  - Manages hardware resource allocation/revocation for all processes
  - Has direct access to resource allocation features
  - Aware of many nasty hardware details
  - Aware of other processes
  - Talks directly to input/output devices (device driver software)
- User-level apps: ignorance is bliss
  - Unaware of most nasty hardware details
  - Unaware of other apps (and the OS)
  - Explicitly denied access to resource allocation features

System Calls
- Controlled transfers to/from the OS
- System call: a user-level app "function call" to the OS
  - Leave a description of what you want done in registers
  - SYSCALL instruction (also called TRAP or INT)
- Can't allow user-level apps to invoke arbitrary OS code
  - Restricted set of legal OS addresses to jump to (trap vector)
  - Processor jumps into the OS using the trap vector, sets privileged mode
- OS performs the operation
- OS does a "return from system call", unsets privileged mode
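
To make the mechanism concrete, here is a minimal sketch (assuming Linux and the generic syscall(2) wrapper; not from the slides) of a user-level app requesting a write through the OS:

    #include <unistd.h>       /* syscall() */
    #include <sys/syscall.h>  /* SYS_write */

    int main(void) {
        /* Arguments land in registers; the SYSCALL/INT instruction then
           jumps to a fixed OS entry point and sets privileged mode. */
        long n = syscall(SYS_write, 1, "hi\n", 3);
        return n == 3 ? 0 : 1;  /* kernel unset privileged mode on return */
    }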

Interrupts
- Exceptions: synchronous, generated by the running app
  - E.g., illegal insn, divide by zero, etc.
- Interrupts: asynchronous events generated externally
  - E.g., timer, I/O request/reply, etc.
- Interrupt handling: same mechanism for both
  - Interrupts are on-chip signals/bits
    - Either internal (e.g., timer, exceptions) or from I/O devices
  - Processor continuously monitors interrupt status; when one is high,
    hardware jumps to some preset address in OS code (interrupt vector)
  - Like an asynchronous, non-programmatic SYSCALL
- Timer: programmable on-chip interrupt
  - Initialize with some number of microseconds
  - Timer counts down and interrupts when it reaches 0

Virtualizing Processors
- How do multiple apps (and the OS) share the processors?
- Goal: applications think there are an infinite # of processors
- Solution: time-share the resource
  - Trigger a context switch at a regular interval (~1ms)
  - Pre-emptive: app doesn't yield the CPU, the OS forcibly takes it
    + Stops greedy apps from starving others
- Architected state: PC, registers
  - Save and restore them on context switches (sketch below)
  - Memory state?
- Non-architected state: caches, predictor tables, etc.
  - Ignore or flush
- Operating system is responsible for handling context switching
  - Hardware support is just a timer interrupt

Virtualizing Main Memory
- How do multiple apps (and the OS) share main memory?
- Goal: each application thinks it has infinite memory
- One app may want more memory than is in the system
  - App's insn/data footprint may be larger than main memory
  - Requires main memory to act like a cache, with disk as the next level in the memory hierarchy (slow)
  - Write-back, write-allocate, large blocks or "pages"
  - No notion of a program not fitting in registers or caches (why?)
- Solution:
  - Part #1: treat memory as a cache; store the overflowed blocks in "swap space" on disk
  - Part #2: add a level of indirection (address translation)

Virtual Memory (VM)
- Programs use virtual addresses (VA): 0...2^N-1
  - VA size also referred to as machine size, e.g., 32-bit (embedded) or 64-bit (server)
- Memory uses physical addresses (PA): 0...2^M-1 (typically M<N, especially if N=64)
  - 2^M is the most physical memory the machine supports
- VA→PA mapping at page granularity (VP→PP), by the system
  - Mapping need not preserve contiguity
  - VP need not be mapped to any PP
  - Unmapped VPs live on disk (swap)
[Figure: program (code, heap, stack) with virtual pages mapped to physical pages in main memory or to disk]
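
As a sketch of what "save and restore architected state" means (my illustration, not the slides'; a real kernel does this in assembly since C cannot directly spill all registers, and save_registers/restore_registers are hypothetical helpers):

    struct context {
        unsigned long pc;        /* architected: program counter */
        unsigned long regs[32];  /* architected: integer register file */
    };

    /* hypothetical helpers, actually written in assembly */
    void save_registers(struct context *c);
    void restore_registers(struct context *c);

    void context_switch(struct context *old, struct context *next) {
        save_registers(old);      /* spill current app's PC + registers */
        restore_registers(next);  /* reload next app's PC + registers */
        /* non-architected state (caches, predictors) is ignored or flushed */
    }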

Virtual Memory (VM)
- Virtual memory (VM): a level of indirection
  - Application-generated addresses are virtual addresses (VAs)
    - Each process thinks it has its own 2^N bytes of address space
  - Memory is accessed using physical addresses (PAs)
  - VAs are translated to PAs at some coarse granularity
  - OS controls the VA→PA mapping for itself and all other processes
  - Logically: translation performed before every insn fetch, load, store
  - Physically: hardware acceleration removes translation overhead
[Figure: OS, App1, App2 virtual address spaces; OS-controlled VA→PA mappings into physical memory]

VM is an Old Idea: Older than Caches
- Original motivation: single-program compatibility
  - IBM System 370: a family of computers with one software suite
    + Same program could run on machines with different memory sizes
  - Prior to this, programmers explicitly accounted for memory size
- But also: full associativity + software replacement
  - t_miss is high: extremely important to reduce %_miss

    Parameter     I$/D$        L2           Main Memory
    t_hit         2ns          10ns         30ns
    t_miss        10ns         30ns         10ms (10M ns)
    Capacity      8-64KB       128KB-2MB    64MB-64GB
    Block size    16-32B       32-256B      4+KB
    Assoc./Repl.  1-4, NMRU    4-16, NMRU   Full, "working set"

Uses of Virtual Memory
- More recently: isolation and multi-programming
  - Each app thinks it has 2^N B of memory, its stack starts at 0xFFFFFFFF, etc.
  - Apps prevented from reading/writing each other's memory: they can't even address the other program's memory!
- Protection
  - Each page has read/write/execute permissions, set by the OS
  - Enforced by hardware
- Inter-process communication
  - Map the same physical pages into multiple virtual address spaces
  - Or share files via the UNIX mmap() call (sketch below)

Virtual Memory: The Basics
- Programs use virtual addresses (VA)
  - VA size (N), aka machine size (e.g., Core 2 Duo: 48-bit)
- Memory uses physical addresses (PA)
  - PA size (M), typically M<N, especially if N=64
  - 2^M is the most physical memory the machine supports
- VA→PA mapping at page granularity (VP→PP)
  - Mapping need not preserve contiguity
  - VP need not be mapped to any PP
  - Unmapped VPs live on disk (swap) or nowhere (if not yet touched)
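
A minimal sketch of the mmap() sharing mentioned above (POSIX; the file name is my own choosing): any two processes that map the same file with MAP_SHARED get the same physical pages, at possibly different virtual addresses.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/tmp/shared-page", O_RDWR | O_CREAT, 0600);
        ftruncate(fd, 4096);                /* one 4KB page */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);  /* OS maps the same PP into each mapper's VA space */
        p[0] = 'x';                         /* visible to any other process mapping the file */
        munmap(p, 4096);
        close(fd);
        return 0;
    }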

Address Translation
[Figure: virtual address[31:0] = VPN[31:16] + POFS[15:0]; VPN is translated to PPN[27:16], POFS[15:0] is "don't touch"; physical address[27:0] = PPN[27:16] + POFS[15:0]]
- The VA→PA mapping is called address translation
- Split the VA into virtual page number (VPN) & page offset (POFS)
  - Translate the VPN into a physical page number (PPN)
  - POFS is not translated
- VA→PA = [VPN, POFS] → [PPN, POFS]
- Example above:
  - 64KB pages → 16-bit POFS
  - 32-bit machine → 32-bit VA → 16-bit VPN
  - Maximum 256MB memory → 28-bit PA → 12-bit PPN

Address Translation Mechanics I
- How are addresses translated?
  - In software (for now), but with hardware acceleration (a little later)
- Each process is allocated a page table (PT)
  - Software data structure constructed by the OS
  - Maps VPs to PPs or to disk (swap) addresses
    - VP entries are empty if the page was never referenced
  - Translation is a table lookup:

    #define NUM_VIRTUAL_PAGES (1 << 20)  /* e.g., 1M VPs: 32-bit VA, 4KB pages */

    struct PTE {
        int ppn;
        int is_valid, is_dirty, is_swapped;
    };
    struct PTE page_table[NUM_VIRTUAL_PAGES];

    int translate(int vpn) {
        if (page_table[vpn].is_valid)
            return page_table[vpn].ppn;
        return -1;  /* page on disk or unmapped: page fault (later) */
    }

[Figure: page table indexed by VPN; entries hold a PPN or a disk (swap) address]

Page Table Size
- How big is a page table on the following machine?
  - 32-bit machine, 4B page table entries (PTEs), 4KB pages
  - 32-bit machine → 32-bit VA → 4GB virtual memory
  - 4GB virtual memory / 4KB page size → 1M VPs
  - 1M VPs × 4B per PTE → 4MB
- How big would the page table be with 64KB pages?
- How big would it be for a 64-bit machine? (See the sketch below.)
- Page tables can get big; there are ways of making them smaller

Multi-Level Page Table (PT)
- One way: multi-level page tables
  - Tree of page tables
  - Lowest-level tables hold PTEs; upper-level tables hold pointers to lower-level tables
  - Different parts of the VPN are used to index different levels
- Example: two-level page table for the machine on the last slide
  - Compute the number of pages needed for the lowest level (PTEs)
    - 4KB pages / 4B PTEs → 1K PTEs/page
    - 1M PTEs / (1K PTEs/page) → 1K pages
  - Compute the number of pages needed for the upper level (pointers)
    - 1K lowest-level pages → 1K pointers
    - 1K pointers × 4B (32-bit) per pointer → 4KB → 1 upper-level page
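
The page-table size arithmetic generalizes; here is a small sketch (mine, not the slides') that answers the two "how big" questions posed above (assumes a 64-bit unsigned long):

    #include <stdio.h>

    /* single-level page table size = (2^va_bits / page size) * PTE size */
    unsigned long pt_size(int va_bits, unsigned long page_bytes,
                          unsigned long pte_bytes) {
        unsigned long num_vps = (1UL << va_bits) / page_bytes;
        return num_vps * pte_bytes;
    }

    int main(void) {
        printf("%lu MB\n", pt_size(32, 4096, 4) >> 20);   /* 4MB, as computed above */
        printf("%lu KB\n", pt_size(32, 65536, 4) >> 10);  /* 64KB pages -> 256KB */
        printf("%lu GB\n", pt_size(48, 4096, 8) >> 30);   /* "64-bit" machine with 48-bit VAs,
                                                             8B PTEs -> 512GB: hence multi-level */
        return 0;
    }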

Multi-Level Page Table (PT)
- 20-bit VPN
  - Upper 10 bits index the 1st-level table
  - Lower 10 bits index the 2nd-level table

    struct PTE {
        int ppn;
        int is_valid, is_dirty, is_swapped;
    };
    struct L2PT {
        struct PTE ptes[1024];
    };
    struct L2PT *page_table[1024];  /* pt root */

    int translate(int vpn) {
        int index1 = (vpn >> 10);    /* upper 10 bits */
        int index2 = (vpn & 0x3ff);  /* lower 10 bits */
        struct L2PT *l2pt = page_table[index1];
        if (l2pt != NULL && l2pt->ptes[index2].is_valid)
            return l2pt->ptes[index2].ppn;
        return -1;  /* unmapped or on disk: page fault */
    }

[Figure: VPN[19:10] indexes the 1st-level pointers from the pt root; VPN[9:0] indexes the 2nd-level PTEs]

- Have we saved any space? Isn't the total size of the 2nd-level tables the same as a single-level table (i.e., 4MB)?
- Yes, but...
  - Large virtual address regions are unused
  - Corresponding 2nd-level tables need not exist
  - Corresponding 1st-level pointers are null
- Example: 2MB code, 64KB stack, 16MB heap
  - Each 2nd-level table maps 4MB of virtual addresses
  - 1 for code, 1 for stack, 4 for heap (+1 1st-level)
  - 7 total pages = 28KB (much less than 4MB)

Page-Level Protection
- Page-level protection piggy-backs on the page-table mechanism
  - Map VPN to PPN + read/write/execute permission bits
  - Attempt to execute data, or to write read-only data? Exception! OS terminates the program
  - Useful (for the OS itself, actually)

    struct PTE {
        int ppn;
        int is_valid, is_dirty, is_swapped, permissions;
    };
    struct PTE page_table[NUM_VIRTUAL_PAGES];

    int translate(int vpn, int action) {
        if (page_table[vpn].is_valid &&
            !(page_table[vpn].permissions & action))
            kill();  /* protection violation: OS terminates program */
        /* ... then translate as before ... */
    }

Address Translation Mechanics II
- Conceptually
  - Translate VA to PA before every cache access
  - Walk the page table before every load/store/insn-fetch
  - Would be terribly inefficient (even in hardware)
- In reality
  - Translation lookaside buffer (TLB): caches translations
  - Only walk the page table on a TLB miss
- Hardware truisms
  - Functionality problem? Add indirection (e.g., VM)
  - Performance problem? Add a cache (e.g., TLB)
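
To illustrate "cache translations, walk only on a miss", here is a sketch (mine; real TLBs are hardware and at least 4-way associative, this one is direct-mapped for brevity) layered over the single-level translate() shown earlier:

    #define TLB_ENTRIES 64

    struct tlb_entry { int valid, vpn, ppn; };
    struct tlb_entry tlb[TLB_ENTRIES];

    int translate_with_tlb(int vpn) {
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn)
            return e->ppn;               /* TLB hit: no page-table walk */
        int ppn = translate(vpn);        /* TLB miss: walk the page table */
        e->valid = 1; e->vpn = vpn; e->ppn = ppn;  /* fill the TLB */
        return ppn;
    }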

Translation Lookaside Buffer
- Translation lookaside buffer (TLB)
  - Small cache: 16-64 entries
  - Associative (4+ way or fully associative)
    + Exploits temporal locality in the page table
  - What if an entry isn't found in the TLB? Invoke the TLB miss handler
[Figure: TLBs sit between the CPU (VAs) and the I$/D$, L2, and main memory (PAs); TLB entries are tag (VPN) / data (PPN) pairs]

Serial TLB & Cache Access
- "Physical" caches: indexed and tagged by physical addresses
  + Natural, lazy sharing of caches between apps/OS
    - VM ensures isolation (via physical addresses)
    - No need to do anything on context switches
    - Multi-threading works too
  + Cached inter-process communication works
    - Single copy indexed by physical address
  - Slow: adds at least one cycle to t_hit
- Note: TLBs are by definition "virtual"
  - Indexed and tagged by virtual addresses
  - Flush the TLB across context switches, or extend entries with process identifier tags (x86)

Parallel TLB & Cache Access
- Two ways to look at a VA
  - Cache: tag+index+offset
  - TLB: VPN+page offset
- Parallel cache/TLB access works if address translation doesn't change the index
  - That is, the VPN and the index must not overlap
[Figure: the same VA viewed as cache tag [31:12] / index [11:5] / offset [4:0] and as VPN [31:16] / page offset [15:0]; TLB and cache tag comparisons produce TLB hit/miss and cache hit/miss]

Parallel TLB & Cache Access
- What about parallel TLB access?
- Only if (cache size) / (associativity) ≤ page size
  - Index bits are then the same in virtual and physical addresses!
- Access the TLB in parallel with the cache
  - Cache access needs the tag (PPN[27:16]) only at the very end
  + Fast: no additional t_hit cycles
  + No context-switching/aliasing problems
  - Dominant organization used today
- Example: Core 2, 4KB pages, 32KB 8-way set-associative L1 data cache
- Implication: associativity allows bigger caches
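
The parallel-access condition is easy to check numerically (a sketch of mine, using the Core 2 figures above): 32KB / 8 ways = 4KB per way, which equals the page size, so the index bits lie entirely within the untranslated page offset.

    /* parallel TLB/cache access is safe iff the index+offset bits
       fit within the page offset: cache_size/assoc <= page_size */
    int parallel_ok(long cache_bytes, int assoc, long page_bytes) {
        return cache_bytes / assoc <= page_bytes;
    }
    /* parallel_ok(32*1024, 8, 4*1024) == 1: Core 2 L1 D$ qualifies */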

TLB Organization
- Like caches: TLBs also have ABCs
  - Capacity
  - Associativity (at least 4-way; fully-associative is common)
  - What does it mean for a TLB to have a block size of two? Two consecutive VPs share a single tag
- Like caches: there can be L2 TLBs
  - Example: AMD Opteron
    - 32-entry fully-assoc. TLBs, 512-entry 4-way L2 TLBs (insn & data)
    - 4KB pages, 48-bit virtual addresses, four-level page table
- Rule of thumb: the TLB should "cover" the L2 cache's contents
  - In other words: (#PTEs in TLB) × page size ≥ L2 cache size
  - Why? Consider the relative miss latency in each (worked example at the end)

TLB Misses
- TLB miss: translation not in the TLB, but in the page table
  - Two ways to fill it, both relatively fast
- Software-managed TLB: e.g., Alpha, MIPS, ARM
  - Short (~10 insn) OS routine walks the page table, updates the TLB
  + Keeps the page table format flexible
  - Latency: one or two memory accesses + an OS call (pipeline flush)
- Hardware-managed TLB: e.g., x86
  - Page table root in a hardware register; hardware walks the table
  + Latency: saves the cost of the OS call (avoids pipeline flush)
  - Page table format is hard-coded
- Trend is towards hardware TLB miss handlers

Page Faults
- Page fault: PTE not in the TLB or the page table → page not in memory
  - Or no valid mapping → segmentation fault
  - Starts out as a TLB miss, detected by the OS/hardware handler
- OS software routine:
  - Choose a physical page to replace
    - "Working set": refined LRU, tracks active page usage
  - If dirty, write it to disk
  - Read the missing page from disk
    - Takes so long (~10ms) that the OS schedules another task
  - Requires yet another data structure: the frame map (why?)
  - Treat like a normal TLB miss from here

Summary
- OS virtualizes memory and I/O devices
- Virtual memory
  - "Infinite" memory, isolation, protection, inter-process communication
  - Page tables
  - Translation lookaside buffers (TLBs)
    - Parallel vs. serial access, interaction with caching
  - Page faults
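
Closing worked example of the TLB-coverage rule of thumb above (a sketch with Opteron-like numbers): a 512-entry L2 TLB with 4KB pages covers 512 × 4KB = 2MB, so it covers any L2 cache of up to 2MB.

    /* rule of thumb: (#PTEs in TLB) * page size >= L2 cache size */
    int tlb_covers_l2(long tlb_entries, long page_bytes, long l2_bytes) {
        return tlb_entries * page_bytes >= l2_bytes;
    }
    /* tlb_covers_l2(512, 4*1024, 2*1024*1024) == 1: a 2MB L2 is covered */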