How is it possible for each process to have contiguous addresses, and so many of them?

Computer Systems Organization (Spring )
CSCI-UA. Instructor: Joanna Klukowska
Teaching Assistants: Paige Connelly & Carlos Guzman
Slides adapted from Andrew Case, Jinyang Li, Mohamed Zahran, Stewart Weiss, Randy Bryant and Dave O'Hallaron
Image source: http://en.wikipedia.org/wiki/file:_.svg

A System Using Physical Addressing
- The CPU generates a physical address (PA) that is used directly to access a word in main memory.
- Used in simple systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames.

A System Using Virtual Addressing
- The CPU chip generates a virtual address (VA); the memory management unit (MMU) translates it to a physical address before main memory is accessed.
- Used in all modern servers, desktops, and laptops.
- One of the great ideas in computer science.

Address Spaces
- Linear address space: ordered set of contiguous non-negative integer addresses: {0, 1, 2, ...}
- Virtual address space: set of N = 2^n virtual addresses {0, 1, 2, ..., N-1}
- Physical address space: set of M = 2^m physical addresses {0, 1, 2, ..., M-1}

Why Virtual Memory (VM)?
- Uses main memory efficiently
  - Use DRAM as a cache for the parts of a virtual address space
- Simplifies memory management
  - Each process gets the same uniform linear address space
- Clean distinction between data (bytes) and their attributes (addresses)
  - Each object can now have multiple addresses
  - Every byte in main memory: one physical address, one (or more) virtual addresses
- Isolates address spaces
  - One process can't interfere with another's memory
  - User program cannot access privileged kernel information
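To make the parameters n, m, and p concrete, here is a minimal sketch in Python. The bit widths chosen below (n = 48, m = 36, p = 12) are illustrative assumptions, not values fixed by the slides:

```python
# Sketch: sizes of virtual and physical address spaces.
# The widths (n = 48, m = 36, p = 12) are illustrative assumptions.
n, m, p = 48, 36, 12          # virtual bits, physical bits, page-offset bits

N = 2 ** n                    # number of virtual addresses
M = 2 ** m                    # number of physical addresses
P = 2 ** p                    # page size in bytes

num_virtual_pages = N // P    # 2**(n - p)
num_physical_pages = M // P   # 2**(m - p)

print(P)                      # 4096 -> 4 KB pages
print(num_virtual_pages)      # 68719476736 virtual pages
print(num_physical_pages)     # 16777216 physical pages
```

Note that the virtual address space can be (and usually is) much larger than the physical one; the page table bridges the two.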
VM as a Tool for Caching
- Conceptually, virtual memory is an array of N contiguous bytes stored on disk.
- The contents of the array on disk are cached in physical memory (the DRAM cache).
- These cache blocks are called pages (size is P = 2^p bytes).
- Each virtual page (VP) is either unallocated, cached in a physical page (PP) of DRAM, or uncached (allocated but resident only on disk).
[Figure: 2^(n-p) virtual pages (VPs) stored on disk, some cached in the 2^(m-p) physical pages (PPs) of DRAM]

DRAM Cache Organization
- Organization driven by the enormous miss penalty
  - DRAM is about 10x slower than SRAM
  - Disk is about 10,000x slower than DRAM
- Consequences
  - Large page (block) size: typically 4-8 KB, sometimes 4 MB
  - Fully associative: any VP can be placed in any PP
  - Requires a "large" mapping function, different from CPU caches
  - Highly sophisticated, expensive replacement algorithms: too complicated and open-ended to be implemented in hardware
  - Write-back rather than write-through

Page Tables
- A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.
- Per-process kernel data structure in DRAM.
[Figure: page table with valid bits; valid entries hold physical page numbers, invalid entries hold null or disk addresses]

Page Hit
- Page hit: reference to a VM word that is in physical memory (DRAM cache hit).

Page Fault
- Page fault: reference to a VM word that is not in physical memory (DRAM cache miss).
- A page miss causes a page fault.
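The page-table lookup described above starts by splitting a virtual address into a page number and a page offset. A minimal sketch, assuming a 6-bit page offset (64-byte pages, matching the simple memory system example later in the slides):

```python
P_BITS = 6                          # p: page-offset bits -> 64-byte pages

def split_va(va: int):
    """Split a virtual address into (VPN, VPO)."""
    vpn = va >> P_BITS              # virtual page number indexes the page table
    vpo = va & ((1 << P_BITS) - 1)  # offset is copied unchanged into the PA
    return vpn, vpo

print(split_va(0x03D4))             # (15, 20), i.e. VPN 0x0F, VPO 0x14
```

The same split works for any page size by changing `P_BITS`; real MMUs do this with wiring rather than arithmetic.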
Handling Page Fault
- A page miss causes a page fault (an exception).
- The page fault handler selects a victim page to be evicted.
- The handler pages in the new page and updates the PTE.
- The offending instruction is restarted: page hit!
[Figure: page table before and after the victim is evicted and the new page is paged in]

Locality to the Rescue Again!
- Virtual memory works because of locality.
- At any point in time, programs tend to access a set of active virtual pages called the working set.
  - Programs with better temporal locality will have smaller working sets.
- If (working set size) < (main memory size):
  - Good performance for one process after compulsory misses.
- If SUM(working set sizes) > (main memory size):
  - Thrashing: performance meltdown where pages are swapped (copied) in and out continuously.

VM as a Tool for Memory Management
- Key idea: each process has its own virtual address space.
  - It can view memory as a simple linear array.
  - The mapping function scatters addresses through physical memory; well-chosen mappings simplify memory allocation and management.
- Memory allocation
  - Each virtual page can be mapped to any physical page.
  - A virtual page can be stored in different physical pages at different times.
- Sharing code and data among processes
  - Map virtual pages in different processes to the same physical page (e.g., read-only library code).
[Figure: two processes whose virtual pages map into one physical address space, with one shared physical page]
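The fault-and-evict cycle above can be sketched as a tiny simulation. DRAM acts as a fully associative cache of virtual pages; a FIFO victim policy stands in for the OS's real (much more sophisticated) replacement algorithm, and the two-frame DRAM is an illustrative assumption:

```python
# Sketch: DRAM as a fully associative cache of virtual pages.
# FIFO eviction is a stand-in for the OS's real replacement policy.
from collections import OrderedDict

NUM_PHYSICAL_PAGES = 2               # illustrative, tiny DRAM

free_ppns = list(range(NUM_PHYSICAL_PAGES))
resident = OrderedDict()             # vpn -> ppn for pages cached in DRAM

def access(vpn: int) -> str:
    if vpn in resident:
        return "page hit"
    if not free_ppns:                # DRAM full: evict a victim page
        victim, ppn = resident.popitem(last=False)
        free_ppns.append(ppn)        # victim's frame becomes free
    resident[vpn] = free_ppns.pop()  # page in the requested page
    return "page fault"

print(access(1))   # page fault (compulsory miss)
print(access(2))   # page fault
print(access(1))   # page hit: the working set fits
print(access(3))   # page fault: evicts the oldest page
```

When the working set exceeds `NUM_PHYSICAL_PAGES`, every access in the loop faults, which is exactly the thrashing behavior the slide describes.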
Simplifying Linking and Loading
- Linking
  - Each program has a similar virtual address space.
  - Code, stack, and shared libraries always start at the same address.
- Loading
  - execve() allocates virtual pages for the .text and .data sections and creates PTEs marked as invalid.
  - The .text and .data sections are copied, page by page, on demand by the virtual memory system.
[Figure: process address space from low to high: unused, read-only segment (.init, .text, .rodata) loaded from the executable file, read/write segment (.data, .bss), run-time heap (created by malloc, grows up to brk), memory-mapped region for shared libraries, user stack (created at runtime, %esp at its top), kernel memory invisible to user code]

VM as a Tool for Memory Protection
- Extend PTEs with permission bits (e.g., SUP, READ, WRITE).
- The page fault handler checks these before remapping.
- If a permission is violated, the kernel sends the process SIGSEGV (segmentation fault).
[Figure: per-process page tables with SUP, READ, and WRITE bits on each PTE]

VM Address Translation
- Virtual address space: V = {0, 1, ..., N-1}
- Physical address space: P = {0, 1, ..., M-1}
- Address translation: MAP: V -> P U {∅}
- For virtual address a:
  - MAP(a) = a' if data at virtual address a is at physical address a' in P
  - MAP(a) = ∅ if data at virtual address a is not in physical memory (either invalid or stored on disk)

Summary of Address Translation Symbols
- Basic parameters
  - N = 2^n: number of addresses in the virtual address space
  - M = 2^m: number of addresses in the physical address space
  - P = 2^p: page size (bytes)
- Components of the virtual address (VA)
  - TLBI: TLB index
  - TLBT: TLB tag
  - VPO: virtual page offset
  - VPN: virtual page number
- Components of the physical address (PA)
  - PPO: physical page offset (same as VPO)
  - PPN: physical page number
  - CO: byte offset within cache line
  - CI: cache index
  - CT: cache tag

Address Translation With a Page Table
- The page table base register (PTBR) points to the current process's page table.
- Valid bit = 0: page not in memory (page fault).
- Virtual address: bits n-1..p are the virtual page number (VPN); bits p-1..0 are the virtual page offset (VPO).
- Physical address: bits m-1..p are the physical page number (PPN); bits p-1..0 are the physical page offset (PPO), copied directly from the VPO.
- The VPN indexes the page table; a valid PTE supplies the PPN.
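The MAP function above can be sketched directly: index the page table with the VPN, check the valid bit, and glue the PPN onto the unchanged offset. The page size and the page-table contents below are made up for illustration:

```python
P_BITS = 6                                  # 64-byte pages (illustrative)

# Page table: VPN -> (valid bit, PPN). Contents are made up for this sketch.
page_table = {0x0F: (1, 0x0D), 0x2E: (0, None)}

def translate(va: int):
    """MAP(a): return the physical address, or None on a page fault."""
    vpn, vpo = va >> P_BITS, va & ((1 << P_BITS) - 1)
    valid, ppn = page_table.get(vpn, (0, None))
    if not valid:
        return None                         # MAP(a) = empty: page fault
    return (ppn << P_BITS) | vpo            # PPO is a copy of the VPO

print(hex(translate(0x03D4)))               # 0x354
print(translate(0x0B8F))                    # None -> page fault
```

Returning `None` plays the role of ∅ in the slide's definition; a real MMU raises an exception instead and lets the page fault handler fill in the mapping.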
Address Translation: Page Hit
1) The processor sends the virtual address (VA) to the MMU.
2-3) The MMU fetches the PTE from the page table in memory, using the page table entry address (PTEA).
4) The MMU sends the physical address (PA) to the cache/main memory.
5) The cache/main memory sends the requested data word to the processor.

Address Translation: Page Fault
1) The processor sends the virtual address to the MMU.
2-3) The MMU fetches the PTE from the page table in memory.
4) The valid bit is zero, so the MMU triggers a page fault exception.
5) The handler identifies a victim page (and, if it is dirty, pages it out to disk).
6) The handler pages in the new page and updates the PTE in memory.
7) The handler returns to the original process, restarting the faulting instruction.

(Abbreviations: PTEA: page table entry address; PTE: page table entry; PA: physical address; VA: virtual address; VPN: virtual page number.)

Integrating VM and Cache
- Page table entries (PTEs) are cached in L1 like any other memory word.
  - PTEs may be evicted by other data references.
  - A PTE hit still requires a small L1 delay.
- Solution: Translation Lookaside Buffer (TLB)
  - Small hardware cache in the MMU.
  - Maps virtual page numbers to physical page numbers.
  - Contains complete page table entries for a small number of pages.

TLB Hit
- A TLB hit eliminates a memory access for the PTE.

TLB Miss
- A TLB miss incurs an additional memory access (to fetch the PTE from the page table).
- Fortunately, TLB misses are rare.
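The TLB sits in front of the page table: on a hit the PTE memory access is skipped entirely. A minimal direct-mapped sketch (real TLBs like the one in the next example are set-associative; the sizing and contents here are made up to keep the code short):

```python
TLB_SETS = 4                   # TLBI = low 2 bits of the VPN (toy sizing)

tlb = {}                       # set index -> (tag, PPN); starts cold
page_table_accesses = 0        # counts the extra memory access on a miss

def lookup(vpn: int, page_table: dict) -> int:
    """Return the PPN for vpn, consulting the TLB first."""
    global page_table_accesses
    tlbi, tlbt = vpn % TLB_SETS, vpn // TLB_SETS
    entry = tlb.get(tlbi)
    if entry and entry[0] == tlbt:
        return entry[1]                    # TLB hit: no page table access
    page_table_accesses += 1               # TLB miss: go to the page table
    ppn = page_table[vpn]
    tlb[tlbi] = (tlbt, ppn)                # cache the PTE for next time
    return ppn

pt = {0x0F: 0x0D}
print(lookup(0x0F, pt), page_table_accesses)   # 13 1 (miss fills the TLB)
print(lookup(0x0F, pt), page_table_accesses)   # 13 1 (hit, no extra access)
```

Because programs reference the same pages repeatedly (locality again), almost every lookup after the first is a hit.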
Multi-Level Page Tables
- Suppose: 4 KB (2^12) page size, 48-bit virtual address space, 8-byte PTE.
- Problem: a single-level page table would need 512 GB!
  - 2^48 * 2^-12 * 2^3 = 2^39 bytes
- Common solution: multi-level page tables.
- Example: 2-level page table
  - Level 1 table: each PTE points to a level 2 page table (always memory-resident).
  - Level 2 table: each PTE points to a page (paged in and out like any other data).

A Two-Level Page Table Hierarchy
- 32-bit addresses, 4 KB pages, 4-byte PTEs.
[Figure: level 1 table whose non-null PTEs point to level 2 tables; 2K allocated VM pages for code and data, 6K unallocated VM pages, a large gap of unallocated pages covered by null level 1 PTEs, and 1 allocated VM page for the stack]

Why a Two-Level Page Table Reduces the Memory Requirement
- If a PTE in the level 1 table is null, then the corresponding level 2 table does not even have to exist.
- Only the level 1 table needs to be in main memory at all times.
- The level 2 page tables can be created and paged in and out by the VM system as they are needed.

Simple Memory System Example
- Addressing
  - 14-bit virtual addresses
  - 12-bit physical addresses
  - Page size = 64 bytes
- Virtual address: high 8 bits are the virtual page number (VPN), low 6 bits are the virtual page offset (VPO).
- Physical address: high 6 bits are the physical page number (PPN), low 6 bits are the physical page offset (PPO).

Simple Memory System Page Table
- Only the first 16 entries (out of 256) are shown.
[Table: per-VPN valid bits and PPNs for the example system]
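Both calculations from the slides above can be checked directly. The second one uses the 32-bit two-level example's parameters; the "coverage" comment reflects that each level 1 PTE spans 1024 pages:

```python
# Problem from the slide: 48-bit VA, 4 KB (2**12) pages, 8-byte PTEs.
flat_48 = 2 ** (48 - 12) * 8          # one PTE per virtual page
print(flat_48 == 2 ** 39)             # True: 512 GB for a single-level table

# Two-level example from the slide: 32-bit VA, 4 KB pages, 4-byte PTEs.
flat_32 = 2 ** (32 - 12) * 4          # 4 MB single-level table
level1 = 2 ** 10 * 4                  # 1024 level-1 PTEs, each covering
print(flat_32, level1)                # 1024 pages: 4194304 vs 4096 bytes
```

So in the 32-bit case only 4 KB (the level 1 table) must stay resident; level 2 tables exist only for the allocated chunks of the address space.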
Simple Memory System TLB
- 16 entries, 4-way associative.
- TLBI (TLB index): low 2 bits of the VPN; TLBT (TLB tag): remaining high 6 bits of the VPN.
[Table: 4 TLB sets, each holding 4 (tag, PPN, valid) entries]

Simple Memory System Cache
- 16 lines, 4-byte block size.
- Physically addressed, direct mapped.
- CO (cache offset): low 2 bits of the PA; CI (cache index): next 4 bits; CT (cache tag): remaining high 6 bits.
[Table: 16 cache lines, each with a tag, valid bit, and 4 data bytes]

Address Translation Example #1
- Virtual address 0x03D4:
  - VPN 0x0F, TLBI 0x3, TLBT 0x03; TLB hit? Yes; page fault? No; PPN 0x0D.
- Physical address 0x354:
  - CO 0x0, CI 0x5, CT 0x0D; cache hit? Yes; byte returned: 0x36.

Address Translation Example #2
- Virtual address 0x0B8F:
  - VPN 0x2E, TLBI 0x2, TLBT 0x0B; TLB hit? No; page fault? Yes; PPN: TBD.

Address Translation Example #3
- Virtual address 0x0020:
  - VPN 0x00, TLBI 0x0, TLBT 0x00; TLB hit? No; page fault? No; PPN 0x28.
- Physical address 0xA20:
  - CO 0x0, CI 0x8, CT 0x28; cache hit? No; byte fetched from memory.

Intel Core i7 Memory System
- Processor package, per core:
  - Registers and instruction fetch
  - L1 d-cache: 32 KB, 8-way; L1 i-cache: 32 KB, 8-way
  - L1 d-TLB: 64 entries, 4-way; L1 i-TLB: 128 entries, 4-way (address translation)
  - L2 unified cache: 256 KB, 8-way; L2 unified TLB: 512 entries, 4-way
- Shared by all cores:
  - L3 unified cache: 8 MB, 16-way
  - QuickPath interconnect: 4 links @ 25.6 GB/s each (to other cores and the I/O bridge)
  - DDR3 memory controller: 3 x 64 bit @ 10.66 GB/s, 32 GB/s total, to main memory
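Translation example #1 can be replayed in code; the bit-slicing follows the VA/PA layouts of the simple memory system (14-bit VA, 12-bit PA, 64-byte pages), and the TLB is reduced to the one entry this example needs:

```python
# Replay of translation example #1 for VA 0x03D4.
VA = 0x03D4

vpn  = VA >> 6                  # 0x0F: high 8 bits of the VA
vpo  = VA & 0x3F                # 0x14: low 6 bits of the VA
tlbi = vpn & 0x3                # 0x3: low 2 bits of the VPN
tlbt = vpn >> 2                 # 0x03: high 6 bits of the VPN

tlb = {(0x3, 0x03): 0x0D}       # (TLBI, TLBT) -> PPN; a hit in this example
ppn = tlb[(tlbi, tlbt)]

pa = (ppn << 6) | vpo           # PPO is the VPO, unchanged
co = pa & 0x3                   # 0x0: byte offset within the 4-byte block
ci = (pa >> 2) & 0xF            # 0x5: index into the 16 cache lines
ct = pa >> 6                    # 0x0D: cache tag

print(hex(pa), hex(co), hex(ci), hex(ct))   # 0x354 0x0 0x5 0xd
```

The same slicing applied to 0x0B8F yields VPN 0x2E with TLBT 0x0B and TLBI 0x2, reproducing example #2's TLB miss.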
End-to-End Core i7 Address Translation
- The virtual address splits into a 36-bit VPN and a 12-bit VPO; the VPN further splits into TLBT and TLBI.
- CR3 holds the physical address of the top-level page table.
- L1 TLB: 16 sets, 4 entries/set. On a TLB miss, the page tables are walked to produce the PPN.
- The physical address (PPN + PPO) splits into CT, CI, and CO for the L1 d-cache (64 sets, 8 lines/set); on an L1 miss, the request goes to L2, L3, and main memory.

Memory of a Linux Process
- Kernel virtual memory
  - Process-specific data structures (page tables, task and mm structs, kernel stack): different for each process.
  - Kernel code and data: identical for each process.
- Process virtual memory, low to high: program text (.text), initialized data (.data), uninitialized data (.bss), runtime heap (malloc, grows up to brk), memory-mapped region for shared libraries, user stack (%esp).

Linux Organizes VM as a Collection of Areas
- task_struct -> mm (an mm_struct) -> pgd and mmap (a list of vm_area_struct).
  - pgd: page global directory address; points to the L1 page table.
  - Each vm_area_struct describes one area (e.g., text, data, shared libraries): its start and end addresses, the read/write permissions for this area, and whether its pages are shared with other processes or private to this process.

Linux Page Fault Handling
- Segmentation fault: accessing a non-existing page (the address lies in no area).
- Protection exception: e.g., violating a permission by writing to a read-only page (Linux also reports this as a segmentation fault).
- Otherwise: a normal page fault, handled by paging in the missing page.

Memory Mapping
- VM areas are initialized by associating them with disk objects.
- This process is known as memory mapping.
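The three-way fault classification above amounts to a scan over the process's areas. The list below is an illustrative stand-in for the kernel's vm_area_struct chain (the addresses and permission strings are made up), not real kernel code:

```python
# Illustrative stand-in for the vm_area_struct list: (start, end, perms).
areas = [(0x08048000, 0x08060000, "r-x"),   # text area (addresses made up)
         (0x08060000, 0x08070000, "rw-")]   # data area

def classify_fault(addr: int, access: str) -> str:
    """access is 'r' or 'w'; mirrors the three cases on the slide."""
    for start, end, perms in areas:
        if start <= addr < end:
            if access == "w" and "w" not in perms:
                return "protection exception (reported as segmentation fault)"
            return "normal page fault: page it in"
    return "segmentation fault: no such area"

print(classify_fault(0x08048010, "w"))   # write to read-only text
print(classify_fault(0x08061000, "w"))   # legal write: normal page fault
print(classify_fault(0x00000000, "r"))   # address in no area
```

The real kernel keeps the areas in a balanced tree rather than a list so this lookup stays fast even with many areas.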
- An area can be backed by (i.e., get its initial values from):
  - A regular file on disk (e.g., an executable object file): initial page bytes come from a section of the file.
  - An anonymous file (i.e., nothing): the first fault allocates a physical page full of 0's (a demand-zero page); once the page is written to (dirtied), it is like any other page.
- Dirty pages are copied back and forth between memory and a special swap file.
Demand Paging
- Key point: no virtual pages are copied into physical memory until they are referenced!
  - Known as demand paging.
  - Crucial for time and space efficiency.

Sharing Revisited: Shared Objects
- Process 1 maps the shared object; process 2 then maps the same shared object.
- Both processes' virtual pages map to the same physical pages.
- Notice how the virtual addresses of the object can be different in the two processes.

Sharing Revisited: Private Copy-on-Write (COW) Objects
- Two processes map a private copy-on-write (COW) object.
- The area is flagged as private copy-on-write, and the PTEs in private areas are flagged as read-only.
- An instruction writing to a private page triggers a protection fault.
  - The handler creates a new R/W page (the private copy).
  - The instruction restarts upon handler return.
- Copying is deferred as long as possible!
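The COW mechanism above can be sketched as a fault handler that copies only on the first write. Everything here (frame table, PTE dictionaries, function names) is an illustrative model, not a kernel API:

```python
# Sketch of copy-on-write: two "processes" map the same private object.
# Frames hold page contents; each PTE carries a writable flag.
frames = {0: bytearray(b"shared page")}     # frame id -> page contents
next_frame = 1

# Both processes start with a read-only PTE pointing at the shared frame.
pte = {"p1": {"frame": 0, "writable": False},
       "p2": {"frame": 0, "writable": False}}

def write(proc: str, offset: int, value: int):
    """Write one byte; a read-only private page faults and is copied first."""
    global next_frame
    entry = pte[proc]
    if not entry["writable"]:               # protection fault -> COW handler
        frames[next_frame] = bytearray(frames[entry["frame"]])  # private copy
        entry["frame"], entry["writable"] = next_frame, True
        next_frame += 1
    frames[entry["frame"]][offset] = value  # restart the faulting write

write("p1", 0, ord("S"))
print(frames[pte["p1"]["frame"]])   # bytearray(b'Shared page')
print(frames[pte["p2"]["frame"]])   # bytearray(b'shared page'): unchanged
```

Until the first write, both processes share one physical copy; the copy happens lazily and only for the process that writes, which is why fork() in Linux is cheap.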