CSL373: Lecture 14 User level memory management

Today: dynamic memory allocation
Almost every useful program uses dynamic allocation: it gives wonderful functionality benefits
- don't have to statically specify complex data structures
- data can grow as a function of input size
- allows recursive procedures (stack growth)
- but it can have a huge impact on performance
Today: how to implement it, and what's hard.
Some interesting facts:
- a two- or three-line code change can have a huge, non-obvious impact on how well an allocator works (examples to come)
- proven: it is impossible to construct an allocator that is always good
- surprising result: after 35 years, memory management is still poorly understood.

What's the goal? And why is it hard?
Satisfy an arbitrary set of allocations and frees.
Easy without free: set a pointer to the beginning of some big chunk of memory (the "heap") and increment it on each allocation.
Problem: free creates holes ("fragmentation"). Result? Lots of free space, but we cannot satisfy the request!

More abstractly
What an allocator must do:
- track which parts of memory are in use and which parts are free
- ideal: no wasted space, no time overhead
What the allocator cannot do:
- control the number, size, or order of requested blocks
- move allocated regions (cannot change user pointers) = (bad) placement decisions are permanent
The core fight: minimize fragmentation
- the app frees blocks in any order, creating holes in the heap
- holes too small? cannot satisfy future requests (e.g., malloc(20) fails even though the heap holds several free 10-byte holes)

What is fragmentation really? The inability to use memory that is free. Two causes:
- Different lifetimes: if adjacent objects die at different times, fragmentation; if they die at the same time, no fragmentation.
- Different sizes: if all requests are the same size, no fragmentation (paging artificially creates this).

The important decisions for fragmentation
Placement choice: where in free memory to put a requested block?
- freedom: can select any memory in the heap
- ideal: put the block where it won't cause fragmentation later (impossible in general: requires future knowledge)
Splitting free blocks to satisfy smaller requests
- fights internal fragmentation
- freedom: can choose any larger block to split
- one way: choose the block with the smallest remainder (best fit)
Coalescing free blocks to yield larger blocks
- freedom: when coalescing is done (deferring can be good)
- fights external fragmentation

Impossible to "solve" fragmentation
If you read allocation papers or books to find the best allocator(!?!?!), it can be frustrating: all discussions revolve around tradeoffs. The reason? There cannot be a best allocator.
Theoretical result: for any possible allocation algorithm, there exist streams of allocation and deallocation requests that defeat the allocator and force it into severe fragmentation.
What is "bad"?
- Good allocator: M*log(n), where M = bytes of live data and n = ratio between the smallest and largest request sizes.
- Bad allocator: M*n

Pathological examples
Given an allocation of 7 20-byte chunks: what's a bad stream of frees followed by allocates?
Given 100 bytes of free space: what's a really bad combination of placement decisions and mallocs & frees?
Next: two allocators (best fit, first fit) that, in practice, work pretty well.
- "pretty well" = ~20% fragmentation under many workloads

Best fit
Strategy: minimize fragmentation by allocating space from the block that leaves the smallest fragment.
Data structure: the heap is a list of free blocks; each has a header holding the block size and a pointer to the next free block.
Code: search the freelist for the block closest in size to the request. (An exact match is ideal.) During free, (usually) coalesce adjacent blocks.
Problem: "sawdust"
- remainders so small that over time you're left with sawdust everywhere
- fortunately, not a problem in practice

Best fit gone wrong
Simple bad case: allocate n, m (m<n) in alternating order, free all the m's, then try to allocate an m+1.
Example: start with 100 bytes of memory
- alloc 19, 21, 19, 21, 19
- free 19, 19, 19
- alloc 20? Fails! (wasted space = 57 bytes)
However, this doesn't seem to happen in practice.

First fit
Strategy: pick the first block that fits.
Data structure: free list, sorted LIFO, FIFO, or by address.
Code: scan the list, take the first block that fits.
- LIFO: put the freed object on the front of the list. Simple, but causes higher fragmentation.
- Address sort: order free blocks by address. Makes coalescing easy (just check if the next block is free). Also preserves empty space (good).
- FIFO: put the freed object at the end of the list. Gives roughly the same fragmentation as address sort, but it's unclear why.

An example of subtle pathology: LIFO FF
Storage management gives examples of the subtle impact of simple decisions.
LIFO first fit seems good: put the freed object on the front of the list (cheap), and hope the same size is requested again soon (cheap + good locality).
But it has big problems for simple allocation patterns:
- repeatedly intermix short-lived large allocations with long-lived small allocations
- each time a large object is freed, a small chunk is quickly carved out of it
- result: pathological fragmentation

First fit: nuances
First fit + address order in practice:
- blocks at the front are preferentially split; ones at the back are only split when no larger block is found before them
- result? the free list ends up roughly sorted by size
- so? this makes first fit operationally similar to best fit: first fit on a sorted list = best fit!
Problem: sawdust at the beginning of the list
- the sorting of the list forces large requests to skip over many small blocks
- need to use a scalable heap organization
When is first fit better than best fit? Suppose memory has free blocks of size 20 and 15:
- if the allocation ops are 10 then 20, best fit wins (it takes the 10 from the 15-byte block, preserving the 20)
- if the allocation ops are 8, 12, then 12, first fit wins (best fit's 8-from-15 placement leaves no block that can hold the final 12)

The strange parallels of first and best fit
Both seem to perform roughly equivalently. In fact, the placement decisions of both are roughly identical under both randomized and real workloads! Pretty strange, since they seem quite different.
Possible explanations:
- first fit ~ best fit because over time its free list becomes sorted by size: the beginning of the free list accumulates small objects, so fits tend to be close to best
- both have an implicit "open space" heuristic: try not to cut into large open spaces; large blocks at the end are only used when they have to be (e.g., first fit skips over all smaller blocks)

Some worse ideas
Worst fit:
- strategy: fight sawdust by splitting the block that maximizes the leftover size
- in real life, seems to ensure that no large blocks are ever around
Next fit:
- strategy: use first fit, but remember where the last search ended and start searching from there
- seems like a good idea, but tends to break down the entire list
Buddy systems:
- round up allocations (to powers of two) to make management faster
- result? heavy internal fragmentation

Known patterns of real programs
So far we've treated programs as black boxes. Most real programs exhibit one or two (or all three) of the following patterns of alloc/dealloc:
- ramps: accumulate data monotonically over time
- peaks: allocate many objects, use them briefly, then free them all
- plateaus: allocate many objects, use them for a long time

Pattern 1: ramps (bytes in use grow steadily over time)
e.g., an LRU simulator. In a practical sense, ramp = no free!
- Implication for fragmentation?
- What happens if you evaluate an allocator with ramp programs only?

Pattern 2: peaks (bytes in use spike, then drop; e.g., a trace of gcc compiling with full optimization)
Peaks: allocate many objects, use them briefly, then free them all. Fragmentation is a real danger here.
- Interleave a peak & a ramp? Interleave two different peaks?
- What happens if the peak is allocated from contiguous memory?

Exploiting peaks
Peak phases: alloc a lot, then free everything. So use a new allocation interface: alloc as before, but only support freeing everything at once. This is called arena allocation or obstack (object stack) allocation (or "procedure call" by compiler people).
An arena = a linked list of large (e.g., 64k) chunks of memory, with a free pointer into the current chunk.
Advantages: alloc is a pointer increment, free is essentially free, and there is no space wasted on size tags or list pointers.

Pattern 3: plateaus (bytes in use rise and stay high; e.g., a trace of perl running a string-processing script)
Plateaus: allocate many objects, use them for a long time.
- What happens if a plateau overlaps with a peak, or with a different plateau?

Some observations to fight fragmentation
Segregation = reduced fragmentation:
- allocated at the same time ~ freed at the same time
- different type ~ freed at a different time
Implementation observations:
- programs allocate a small number of different sizes
- fragmentation at peak use matters more than at low use
- most allocations are small (< 10 words)
- work done with allocated memory increases with its size
Implications?

Simple, fast segregated free lists
- an array of free lists for small sizes, a tree for larger ones
- place blocks of the same size on the same page
- keep a count of allocated blocks per page: if it drops to zero, the page can be returned
Pro: segregates sizes + no per-block size tag + very fast small alloc
Con: worst-case waste of one page per size

Typical space overheads
Free-list bookkeeping + alignment determine the minimum allocatable size:
- store the size of the block
- store pointers to the next and previous freelist elements
Machine-enforced overhead: alignment. The allocator doesn't know the type, so it must align memory to a conservative boundary (e.g., 8-byte alignment: addr % 8 == 0).
- Minimum allocation unit?
- Space overhead when allocated?

How do you actually get space?
On Unix, use sbrk to grow the process's heap segment:

/* add nbytes of valid virtual address space */
void *get_free_space(unsigned nbytes)
{
    void *p;
    if ((p = sbrk(nbytes)) == (void *)-1)  /* sbrk returns (void *)-1 on failure, not NULL */
        error("virtual memory exhausted");
    return p;
}

This activates zero-filled, page-sized chunks of virtual address space. Remove them from the address space with sbrk(-nbytes). The last block in the heap is called the "wilderness."

Malloc versus OS memory management
Relocation:
- virtual memory allows the OS to relocate physical blocks (just update the page table); as a result, it can compact memory
- user level cannot: placement decisions are permanent
Size and distribution:
- OS: a small number of large objects
- malloc: a huge number of small objects; internal fragmentation matters more, and speed of allocation is very important
Duplication of data structures:
- malloc memory management is layered on top of VM; why can't they cooperate?

Fragmentation generalized
Whenever we allocate anything, fragmentation is a problem: CPU, memory, disk blocks, ... The general statement: fragmentation = the inability to use X that is free.
Internal fragmentation:
- How does malloc minimize internal fragmentation?
- What corresponds to internal fragmentation of a process's time quanta? How does the scheduler minimize it?
- In a book? How does the English language minimize it? (And: page-size tradeoffs?)
External fragmentation = cannot satisfy an allocation request:
- Why is external fragmentation not a problem with money?

Reclamation: beyond free
Automatic reclamation:
- User level: anything manual can be done wrong; storage deallocation is a major source of bugs
- OS level: the OS must manage reclamation of shared resources to prevent evil things. How?
Easy if the resource is used in only one place: when the pointer dies, deallocate (example?)
Hard when shared: can't recycle until all sharers are done
- sharing is indicated by the presence of pointers to the data
- insight: no pointers to the data = it's free!
Two schemes: reference counting; mark & sweep garbage collection

Reference counting
Algorithm: count the pointers to each object
- each object has a ref count of the pointers to it
- increment the count when a pointer is set to the object
- decrement it when such a pointer is killed

void foo(bar *c)
{
    bar *a, *b;
    a = c; c->refcnt++;
    b = a; a->refcnt++;
    a->refcnt--; a = 0;   /* decrement before overwriting the pointer */
    b->refcnt--;          /* b dies when foo returns */
    return;
}

refcnt == 0? Free the resource. Works fine for hierarchical data structures: file descriptors in Unix, pages, thread blocks.

Problems
- Circular data structures always have refcnt > 0 (each node in a cycle holds a pointer to the next): if there are no external references, the cycle is lost!
- Naïve implementation: counts must be adjusted on every pointer creation and deletion
- Without compiler support, it's easy to forget a decrement or an increment. Nasty bug.

Mark & sweep garbage collection
Algorithm: mark all reachable memory; the rest is garbage.
- Must find all roots: any global or stack variable that holds a pointer to an object. Must also find all pointers inside objects.
- Pass 1 (mark): mark the memory pointed to by the roots, then recursively mark all objects those objects point to.
- Pass 2 (sweep): go through all objects and free those that aren't marked. Usually this entails moving live objects to one end of the heap (compaction) and updating pointers.

Some details
Huge lever: the collector can update pointers
- so it can compact the heap instead of running out of storage
- is fragmentation no longer an issue?
Compiler support helps (to parse objects). Java and Modula-3 support GC.
Can sort of do it without compiler support: "conservative" GC
- at every allocation, record (address, size)
- scan the heap, data segment, and stack for integers that would be legal pointer values, and mark the objects they would point to!