Memory Management
Part 1: Mechanisms for Managing Memory

CS 1550, cs.pitt.edu (originally modified by Ethan L. Miller and Scott Brandt)

Memory management
  - Basic memory management
  - Swapping
  - Virtual memory
  - Page replacement algorithms
  - Modeling page replacement algorithms
  - Design issues for paging systems
  - Implementation issues
  - Segmentation

In an ideal world
  - The ideal world has memory that is
      - Very large
      - Very fast
      - Non-volatile (doesn't go away when power is turned off)
  - The real world has memory that is
      - Very large
      - Very fast
      - Affordable!
      Pick any two
  - Memory management goal: make the real world look as much like the ideal world as possible

Memory hierarchy
  - What is the memory hierarchy?
      - Different levels of memory
      - Some are small & fast
      - Others are large & slow
  - What levels are usually included?
      - Cache: small amount of fast, expensive memory
          - L1 (level 1) cache: usually on the CPU chip
          - L2 & L3 cache: off-chip, made of SRAM
      - Main memory: medium-speed, medium price (DRAM)
      - Disk: many gigabytes of slow, cheap, non-volatile storage
  - Memory manager handles the memory hierarchy

Basic memory management
  - Components include
      - Operating system (perhaps with device drivers)
      - Single process
  - Goal: lay these out in memory
      - Memory protection may not be an issue (only one program)
      - Flexibility may still be useful (allow OS changes, etc.)
      - No swapping or paging
  [Figure: three single-process memory layouts between address 0 and 0xFFFF: operating system in RAM with the user program in RAM; operating system in ROM with the user program in RAM; device drivers in ROM with the user program and operating system in RAM]

Fixed partitions: multiple programs
  - Fixed memory partitions
      - Divide memory into fixed spaces
      - Assign a process to a space when it's free
  - Mechanisms
      - Separate input queues for each partition
      - Single input queue: better ability to optimize CPU usage
  [Figure: memory divided into fixed partitions, shown with separate input queues per partition and with a single input queue]

How many programs is enough?
  - Several memory partitions (fixed or variable size)
  - Lots of processes wanting to use the CPU
  - Tradeoff
      - More processes utilize the CPU better
      - Fewer processes use less memory (cheaper!)
  - How many processes do we need to keep the CPU fully utilized?
      - This will help determine how much memory we need
      - Is this still relevant at current memory prices?

Modeling multiprogramming
  - More I/O wait means less processor utilization
      - At a low I/O wait percentage, a few processes fully utilize the CPU
      - At a high I/O wait percentage, even many processes aren't enough
  - This means that the OS should have more processes if they're I/O bound
  - More processes => memory management & protection more important!
  [Figure: CPU utilization versus degree of multiprogramming, one curve per I/O wait percentage]

Multiprogrammed system performance
  - Arrival and work requirements of the jobs
  - CPU utilization for the jobs at a given I/O wait percentage
  - Sequence of events as jobs arrive and finish
      - Numbers show amount of CPU time jobs get in each interval
      - More processes => better utilization, less time per process
  [Figure: table of job arrival times and CPU time needed, CPU idle and busy fractions with CPU per process, and a timeline of the jobs]

Memory and multiprogramming
  - Memory needs two things for multiprogramming
      - Relocation
      - Protection
  - The OS cannot be certain where a program will be loaded in memory
      - Variables and procedures can't use absolute locations in memory
      - Several ways to guarantee this
  - The OS must keep processes' memory separate
      - Protect a process from other processes reading or modifying its own memory
      - Protect a process from modifying its own memory in undesirable ways (such as writing to program code)

Base and limit registers
  - Special CPU registers: base & limit
      - Access to the registers limited to system mode
      - Registers contain
          - Base: start of the process's memory partition
          - Limit: length of the process's memory partition
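
To put numbers on the "Modeling multiprogramming" slide earlier in this section, the sketch below uses the usual probabilistic approximation: if each process independently waits for I/O a fraction p of the time, the CPU is idle only when all n processes wait at once, so utilization is roughly 1 - p^n. The formula, the function names, and the sample percentages are assumptions added for illustration; the slides show the relationship only as a graph.

```python
def cpu_utilization(io_wait: float, n_processes: int) -> float:
    """Approximate CPU utilization, assuming independent processes.

    io_wait is the fraction of time one process spends waiting for I/O.
    The CPU is idle only when every process is waiting: io_wait ** n.
    """
    if not 0.0 <= io_wait <= 1.0:
        raise ValueError("io_wait must be between 0 and 1")
    if n_processes < 0:
        raise ValueError("n_processes must be non-negative")
    return 1.0 - io_wait ** n_processes


def processes_needed(io_wait: float, target: float) -> int:
    """Smallest degree of multiprogramming that reaches the target utilization."""
    if not 0.0 <= io_wait < 1.0:
        raise ValueError("io_wait must be in [0, 1)")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while cpu_utilization(io_wait, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative values only: a CPU-bound mix and an I/O-bound mix.
    for wait in (0.2, 0.8):
        print(wait, processes_needed(wait, 0.99))
```

Under these assumptions an I/O-bound mix needs far more resident processes than a CPU-bound one to reach the same utilization, which is why the amount of memory matters.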

Address generation
  - Physical address: location in actual memory
  - Logical address: location from the process's point of view
  - Physical address = base + logical address
  - Logical address larger than limit => error
  [Figure: a process partition located by the base and limit registers, with an example logical address turned into a physical address by adding the base]

Swapping
  - Memory allocation changes as
      - Processes come into memory
      - Processes leave memory
          - Swapped to disk
          - Complete execution
  - Gray regions are unused memory
  [Figure: successive snapshots of memory as processes are swapped in and out]
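
The base and limit address generation rule above can be sketched in a few lines. This is a minimal illustration, not a description of any particular CPU: the names `translate` and `ProtectionFault` and the register values are hypothetical, and because the limit is a length, the sketch treats a logical address equal to the limit as out of range.

```python
class ProtectionFault(Exception):
    """Raised when a logical address falls outside the process's partition."""


def translate(logical: int, base: int, limit: int) -> int:
    """Map a logical address to a physical one using base and limit registers.

    base  = start of the process's partition
    limit = length of the process's partition
    """
    # Valid logical addresses are 0 .. limit - 1 (assumption: limit is a length).
    if logical < 0 or logical >= limit:
        raise ProtectionFault(f"logical address {logical:#x} outside limit {limit:#x}")
    return base + logical


if __name__ == "__main__":
    # Hypothetical register contents, chosen only for the example.
    print(hex(translate(0x1204, base=0x9000, limit=0x2000)))
```

Because every access goes through the same check and addition, relocating a process only requires loading a different base value.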

Swapping: leaving room to grow
  - Need to allow for programs to grow
      - Allocate more memory for data
      - Larger stack
  - Handled by allocating more space than is necessary at the start
      - Inefficient: wastes memory that's not currently in use
      - What if the process requests too much memory?
  [Figure: two processes, each with code, data, and stack segments and room left to grow]

Tracking memory usage: bitmaps
  - Keep track of free / allocated memory regions with a bitmap
      - One bit in map corresponds to a fixed-size region of memory
      - Bitmap is a constant size for a given amount of memory regardless of how much is allocated at a particular time
  - Chunk size determines efficiency
      - At 1 bit per 4 KB chunk, we need just 256 bits (32 bytes) per MB of memory
      - For smaller chunks, we need more memory for the bitmap
      - Can be difficult to find large contiguous free areas in bitmap
  [Figure: memory regions and the corresponding bitmap]

Tracking memory usage: linked lists
  - Keep track of free / allocated memory regions with a linked list
      - Each entry in the list corresponds to a contiguous region of memory
      - Entry can indicate either allocated or free (and, optionally, owning process)
      - May have separate lists for free and allocated areas
  - Efficient if chunks are large
      - Fixed-size representation for each region
      - More regions => more space needed for free lists
  [Figure: memory regions represented as a linked list of allocated and free entries]

Allocating memory
  - Search through region list to find a large enough space
  - Suppose there are several choices: which one to use?
      - First fit: the first suitable hole on the list
      - Next fit: the first suitable hole after the previously allocated hole
      - Best fit: the smallest hole that is larger than the desired region (wastes least space?)
      - Worst fit: the largest available hole (leaves largest fragment)
  - Option: maintain separate queues for different-size holes
  [Figure: a hole list with example allocations made by first fit, best fit, next fit, and worst fit]

Freeing memory
  - Allocation structures must be updated when memory is freed
  - Easy with bitmaps: just set the appropriate bits in the bitmap
  - Linked lists: modify adjacent elements as needed
      - Merge adjacent free regions into a single region
      - May involve merging two regions with the just-freed area
  [Figure: the cases for freeing a region X, depending on whether its neighbors are allocated or free]

Limitations of swapping
  - Problems with swapping
      - Process must fit into physical memory (impossible to run larger processes)
      - Memory becomes fragmented
          - External fragmentation: lots of small free areas
          - Compaction needed to reassemble larger free areas
      - Processes are either in memory or on disk: half and half doesn't do any good
  - Overlays solved the first problem
      - Bring in pieces of the process over time (typically data)
      - Still doesn't solve the problem of fragmentation or partially resident processes
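
The hole-selection strategies from the "Allocating memory" slide, and the merging step from "Freeing memory", can be sketched over a simple hole list. The representation (a Python list of `(start, length)` tuples kept in address order) and all function names are assumptions made for illustration; the sketch also accepts a hole that is exactly the requested size, which the slides do not spell out.

```python
from typing import List, Optional, Tuple

Hole = Tuple[int, int]  # (start, length) in allocation units


def first_fit(holes: List[Hole], size: int) -> Optional[int]:
    """Index of the first hole that is large enough, or None."""
    for i, (_, length) in enumerate(holes):
        if length >= size:
            return i
    return None


def next_fit(holes: List[Hole], size: int, last: int) -> Optional[int]:
    """Like first fit, but scan from just after index `last`, wrapping around."""
    n = len(holes)
    for step in range(1, n + 1):
        i = (last + step) % n
        if holes[i][1] >= size:
            return i
    return None


def best_fit(holes: List[Hole], size: int) -> Optional[int]:
    """Index of the smallest hole that is large enough, or None."""
    fits = [i for i, (_, length) in enumerate(holes) if length >= size]
    return min(fits, key=lambda i: holes[i][1], default=None)


def worst_fit(holes: List[Hole], size: int) -> Optional[int]:
    """Index of the largest hole, provided it is large enough, or None."""
    fits = [i for i, (_, length) in enumerate(holes) if length >= size]
    return max(fits, key=lambda i: holes[i][1], default=None)


def allocate(holes: List[Hole], index: int, size: int) -> int:
    """Carve `size` units from the front of holes[index]; return the start."""
    start, length = holes[index]
    if length == size:
        del holes[index]              # hole is used up completely
    else:
        holes[index] = (start + size, length - size)
    return start


def release(holes: List[Hole], start: int, size: int) -> None:
    """Return a region to the hole list and merge it with adjacent holes."""
    holes.append((start, size))
    holes.sort()                      # keep the list in address order
    merged: List[Hole] = []
    for s, length in holes:
        if merged and merged[-1][0] + merged[-1][1] == s:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((s, length))
    holes[:] = merged
```

Each strategy only picks an index, so the same `allocate` and `release` helpers work with any of them; that separation is a design choice of this sketch rather than something the slides prescribe.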

Virtual memory
  - Basic idea: allow the OS to hand out more memory than exists on the system
  - Keep recently used stuff in physical memory
  - Move less recently used stuff to disk
  - Keep all of this hidden from processes
      - Processes still see an address space from 0 to max address
      - Movement of information to and from disk handled by the OS without process help
  - Virtual memory (VM) especially helpful in multiprogrammed system
      - CPU schedules one process while another waits for its memory to be retrieved from disk

Virtual and physical addresses
  - Program uses virtual addresses
      - Addresses local to the process
      - Hardware translates virtual address to physical address
  - Translation done by the Memory Management Unit
      - Usually on the same chip as the CPU
      - Only physical addresses leave the CPU/MMU chip
  - Physical memory indexed by physical addresses
  [Figure: CPU chip containing the CPU and MMU; virtual addresses go from CPU to MMU, physical addresses go out on the bus to memory and the disk controller]

Paging and page tables
  - Virtual addresses mapped to physical addresses
      - Unit of mapping is called a page
      - All addresses in the same virtual page are in the same physical page
      - Page table entry (PTE) contains translation for a single page
  - Table translates virtual page number to physical page number
      - Not all virtual memory has a physical page
      - Not every physical page need be used
  [Figure: example page table mapping the pages of a virtual address space to page frames in physical memory; unmapped pages are marked with "-"]

What's in a page table entry?
  - Each entry in the page table contains
      - Valid bit: set if this logical page number has a corresponding physical frame in memory
          - If not valid, remainder of PTE is irrelevant
      - Page frame number: page in physical memory
      - Referenced bit: set if data on the page has been accessed
      - Dirty (modified) bit: set if data on the page has been modified
      - Protection information
  [Figure: PTE layout with fields for protection, dirty bit (D), referenced bit (R), valid bit (V), and page frame number]

Mapping logical => physical address
  - Split address from CPU into two pieces
      - Page number (p)
      - Page offset (d)
  - Page number
      - Index into page table
      - Page table contains base address of page in physical memory
  - Page offset
      - Added to base address to get actual physical memory address
  - Page size = 2^d bytes
  - Example
      - 4 KB (= 4096 byte) pages
      - 32 bit logical addresses
      - 2^d = 4096, so d = 12
      - p = 32 - 12 = 20 bits

Address translation architecture
  [Figure: the CPU emits a logical address (p, d); p indexes the page table to get frame number f; the physical address (f, d) selects a location in physical memory]
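
The split into page number and offset described under "Mapping logical => physical address" can be sketched as follows, using the 4 KB page size from the example. The `PTE` fields mirror the page table entry slide, but the class layout, the single `writable` flag standing in for protection information, and the use of a dictionary as the page table are simplifying assumptions, not how an MMU stores these structures.

```python
from dataclasses import dataclass
from typing import Dict

OFFSET_BITS = 12                    # d = 12, so page size = 2**12 = 4096 bytes
PAGE_SIZE = 1 << OFFSET_BITS
OFFSET_MASK = PAGE_SIZE - 1


@dataclass
class PTE:
    frame: int = 0                  # page frame number
    valid: bool = False             # page has a frame in physical memory
    referenced: bool = False        # set on any access
    dirty: bool = False             # set on a write
    writable: bool = True           # assumed stand-in for protection info


class PageFault(Exception):
    """No valid mapping exists for the referenced page."""


class ProtectionError(Exception):
    """The access is not permitted by the page's protection bits."""


def translate(page_table: Dict[int, PTE], vaddr: int, write: bool = False) -> int:
    """Translate a virtual address to a physical address via a one-level table."""
    page = vaddr >> OFFSET_BITS     # p: index into the page table
    offset = vaddr & OFFSET_MASK    # d: unchanged by translation
    pte = page_table.get(page)
    if pte is None or not pte.valid:
        raise PageFault(f"page {page} is not resident")
    if write and not pte.writable:
        raise ProtectionError(f"page {page} is read-only")
    pte.referenced = True
    if write:
        pte.dirty = True
    return (pte.frame << OFFSET_BITS) | offset
```

For instance, with page 2 mapped to frame 6, virtual address 0x2ABC keeps its offset 0xABC and becomes physical address 0x6ABC.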

Memory & paging structures
  [Figure: the logical memory of two processes, their page tables, the list of free pages, and the physical memory frames that each page occupies]

Two-level page tables
  - Problem: page tables can be too large
      - 2^32 bytes in 4 KB pages need 1 million PTEs
  - Solution: use multi-level page tables
      - "Page size" in first page table is large (megabytes)
      - PTE marked invalid in first page table needs no 2nd level page table
  - 1st level page table has pointers to 2nd level page tables
  - 2nd level page table has actual physical page numbers in it
  [Figure: a 1st level page table pointing to 2nd level page tables, whose entries point to frames in main memory]

More on two-level page tables
  - Tradeoffs between 1st and 2nd level page table sizes
      - Total number of bits indexing 1st and 2nd level table is constant for a given page size and logical address length
      - Tradeoff between number of bits indexing 1st and number indexing 2nd level tables
          - More bits in 1st level: fine granularity at 2nd level
          - Fewer bits in 1st level: maybe less wasted space?
  - All addresses in table are physical addresses
  - Protection bits kept in 2nd level table

Two-level paging: example
  - System characteristics
      - 8 KB pages
      - 32-bit logical address divided into 13 bit page offset, 19 bit page number
  - Page number divided into
      - 10 bit index into the 1st level page table (p1)
      - 9 bit index into the 2nd level page table (p2)
  - Logical address looks like this
        | page number                 | page offset      |
        | p1 = 10 bits | p2 = 9 bits  | offset = 13 bits |
      - p1 is an index into the 1st level page table
      - p2 is an index into the 2nd level page table pointed to by p1

2-level address translation example
  [Figure: the page table base locates the 1st level page table; p1 selects a 2nd level page table, p2 selects the frame number, and the 13 bit offset is appended to form the physical address in main memory]

Implementing page tables in hardware
  - Page table resides in main (physical) memory
  - CPU uses special registers for paging
      - Page table base register (PTBR) points to the page table
      - Page table length register (PTLR) contains length of page table: restricts maximum legal logical address
  - Translating an address requires two memory accesses
      - First access reads page table entry (PTE)
      - Second access reads the data / instruction from memory
  - Reduce number of memory accesses
      - Can't avoid second access (we need the value from memory)
      - Eliminate first access by keeping a hardware cache (called a translation lookaside buffer or TLB) of recently used page table entries
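
The two-level paging example above (10 bit p1, 9 bit p2, 13 bit offset) can be walked through in code. This is a sketch under stated assumptions: the tables are plain Python lists, a missing 2nd level table or unmapped page is represented by `None`, and the names `split`, `translate`, and `PageFault` are hypothetical.

```python
from typing import List, Optional, Tuple

P1_BITS, P2_BITS, OFFSET_BITS = 10, 9, 13   # 10 + 9 + 13 = 32-bit address

SecondLevel = List[Optional[int]]           # frame numbers, None if unmapped
FirstLevel = List[Optional[SecondLevel]]    # None if no 2nd level table exists


class PageFault(Exception):
    """The address has no mapping at the 1st or 2nd level."""


def split(vaddr: int) -> Tuple[int, int, int]:
    """Break a 32-bit logical address into (p1, p2, offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    p2 = (vaddr >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = (vaddr >> (OFFSET_BITS + P2_BITS)) & ((1 << P1_BITS) - 1)
    return p1, p2, offset


def translate(top: FirstLevel, vaddr: int) -> int:
    """Walk both levels and return the physical address."""
    p1, p2, offset = split(vaddr)
    second = top[p1]                 # 1st level entry points to a 2nd level table
    if second is None:
        raise PageFault(f"no 2nd level table for p1={p1}")
    frame = second[p2]               # 2nd level entry holds the frame number
    if frame is None:
        raise PageFault(f"page p1={p1}, p2={p2} is not mapped")
    return (frame << OFFSET_BITS) | offset


if __name__ == "__main__":
    top: FirstLevel = [None] * (1 << P1_BITS)
    top[3] = [None] * (1 << P2_BITS)  # only one 2nd level table is allocated
    top[3][5] = 7
    vaddr = (3 << 22) | (5 << 13) | 0x123
    print(hex(translate(top, vaddr)))
```

Only the 2nd level tables that are actually in use need to exist, which is the space saving the slides point to: an invalid 1st level entry costs one slot instead of a whole table.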

Translation Lookaside Buffer (TLB)
  - Search the TLB for the desired logical page number
      - Search entries in parallel
      - Use standard cache techniques
  - If desired logical page number is found, get frame number from TLB
  - If desired logical page number isn't found
      - Get frame number from page table in memory
      - Replace an entry in the TLB with the logical & physical page numbers from this reference
  [Figure: example TLB with columns for logical page # and physical frame #; some entries are unused]

Handling TLB misses
  - If PTE isn't found in TLB, OS needs to do the lookup in the page table
  - Lookup can be done in hardware or software
  - Hardware TLB replacement
      - CPU hardware does page table lookup
      - Can be faster than software
      - Less flexible than software, and more complex hardware
  - Software TLB replacement
      - OS gets TLB exception
      - Exception handler does page table lookup & places the result into the TLB
      - Program continues after return from exception
      - Larger TLB (lower miss rate) can make this feasible
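
The lookup-then-refill behavior described on the TLB slides can be modeled in software. The sketch below is illustrative only: a real TLB searches its entries in parallel in hardware, whereas this model uses a dictionary lookup, and the least-recently-used replacement policy is an assumption, since the slides say only that an entry is replaced. The class and attribute names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class TLB:
    """Small cache of page number -> frame number translations."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries: "OrderedDict[int, int]" = OrderedDict()  # oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, page: int, page_table: Dict[int, int]) -> int:
        """Return the frame for `page`, refilling from the page table on a miss."""
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)      # mark as most recently used
            return self.entries[page]
        self.misses += 1
        frame = page_table[page]                # miss: consult the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        self.entries[page] = frame
        return frame

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Running the same reference string against different capacities shows the effect the slides mention: a larger TLB tends to give a higher hit ratio.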

How long do memory accesses take?
  - Assume the following times
      - TLB lookup time = a (often zero: overlapped in CPU)
      - Memory access time = m
  - Hit ratio (h) is percentage of time that a logical page number is found in the TLB
      - Larger TLB usually means higher h
      - TLB structure can affect h as well
  - Effective access time (an average) is calculated as
      - EAT = (m + a)h + (m + m + a)(1 - h)
      - EAT = a + (2 - h)m
  - Interpretation
      - Reference always requires TLB lookup, 1 memory access
      - TLB misses also require an additional memory reference

Inverted page table
  - Reduce page table size further: keep one entry for each frame in memory
  - PTE contains
      - Virtual address pointing to this frame
      - Information about the process that owns this page
  - Search page table by
      - Hashing the virtual page number and process ID
      - Starting at the entry corresponding to the hash result
      - Search until either the entry is found or a limit is reached
  - Page frame number is index of PTE
  - Improve performance by using more advanced hashing algorithms

Inverted page table architecture
  [Figure: the process ID and page number p (19 bits) are used to search the inverted page table; the index k of the matching (pid, p) entry is the page frame number, which is combined with the 13 bit offset to form the physical address in main memory]
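
The inverted page table search described above (hash the process ID and page number, start at that entry, search until found or a limit is reached) can be sketched like this. Linear probing, Python's built-in `hash`, the default probe limit, and every name in the block are assumptions chosen for illustration; the slides do not specify a probing scheme.

```python
from typing import List, Optional, Tuple


class NoFreeFrame(Exception):
    """No free frame was found within the probe limit."""


class InvertedPageTable:
    """One entry per physical frame; the entry's index is the frame number."""

    def __init__(self, n_frames: int, probe_limit: Optional[int] = None) -> None:
        self.n_frames = n_frames
        # slots[k] holds (pid, page) for frame k, or None if the frame is free.
        self.slots: List[Optional[Tuple[int, int]]] = [None] * n_frames
        self.probe_limit = probe_limit or n_frames

    def _start(self, pid: int, page: int) -> int:
        """Entry corresponding to the hash of (process ID, page number)."""
        return hash((pid, page)) % self.n_frames

    def insert(self, pid: int, page: int) -> int:
        """Claim a free frame for (pid, page) and return its frame number."""
        i = self._start(pid, page)
        for _ in range(self.probe_limit):
            if self.slots[i] is None:
                self.slots[i] = (pid, page)
                return i
            i = (i + 1) % self.n_frames      # assumed: linear probing
        raise NoFreeFrame("probe limit reached")

    def lookup(self, pid: int, page: int) -> Optional[int]:
        """Return the frame holding (pid, page), or None (page fault)."""
        i = self._start(pid, page)
        for _ in range(self.probe_limit):
            if self.slots[i] == (pid, page):
                return i
            i = (i + 1) % self.n_frames
        return None
```

The table size depends on the number of physical frames rather than on the size of each process's virtual address space, which is the reduction the slide is after; the cost is that every lookup becomes a search.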