Outline CS 6V81-05: System Security and Malicious Code Analysis Understanding the Implementation of Virtual Memory 1 Zhiqiang Lin Department of Computer Science University of Texas at Dallas February 29 th, 2012 2 Implementation 3 Outline Virtual Memory of a Linux Virtual Memory of a Linux 1 2 Implementation Different for each process Identical for each process %esp -specific data structs (ptables, task and mm structs, kernel stack) Physical Kernel code and data User stack Kernel 3 brk Memory mapped region for shared libraries Runtime heap (malloc) 0x08048000 (32) 0x00400000 (64) 0 Uninitialized data (.bss) Initialized data (.data) Program text (.text) 37
ber of nice things. With, programs running on the system can allocate far more than is physically available; indeed, even a single process can have a address space larger than the system s physical. Virtual also allows playing a number of tricks with the process s address space, including mapping in device. Thus far, we have talked about and physical addresses, but a number of the details have been glossed over. The Linux system deals with several types of addresses, each with its own semantics. Unfortunately, the kernel code is not always very clear on exactly which type of address is being used in each situation, so the programmer must be careful. Address type in Linux user process high low kernel addresses Linux Organizes VM as Collection of Areas Linux Organizes VM as Collection of Areas vm_area_struct task_struct mm_struct mm pgd mmap Shared libraries Key user process physical address space page mapping Figur e 13-1. Address types used in Linux kernel logical addresses The following is a list of address types used in Linux. Figure 13-1 shows how these address types relate to physical. User addresses These are the regular addresses seen by user-space programs. User addresses are either 32 or 64 bits in length, depending on the underlying hardware architecture, and each process has its own address space. i386+pae pgd: Page global directory address Points to L1 page table : Read/write permissions for this area Pages shared with other processes or private to this process x86_64 Data Text 0 38 371 sources: http://linux-mm.org/pagetablestructure sources: http://linux-mm.org/pagetablestructure
Powerpc-4k Powerpc-64k sources: http://linux-mm.org/pagetablestructure Simplifying Linking and Loading sources: http://linux-mm.org/pagetablestructure Demand paging Linux Page Fault Handling vm_area_struct shared libraries data 1 read 3 read Segmentation fault: accessing a non-existing page Segmentation fault: accessing a non-existing page Normal page fault Normal page fault Key point: no pages are copied into physical until they are referenced! Known as demand paging Crucial for time and space efficiency text 2 write Protection exception: e.g., violating permission by writing to a read-only page (Linux reports as Segmentation fault) Protection exception: e.g., violating permission by writing to a read-only page 39 (Linux reports as Segmentation fault)
Implementation Implementation Shared Objects Copy-on-write Sharing Revisited: Shared ObjectsSharing Revisited: Shared Objects (COW) Objects Sharing Revisited: Sharing Revisited: Two processes Copy-on-write (COW) Objects Copy-on-write (COW 1 Physical 2 Physical maps the shared object. 1 1 2 1 Physical 1 maps the shared object. Notice how the addresses can be different. mapping a private copy-on-write (COW) object.1 Physical Two processes Area flagged as mapping a private private copy-on-write 22 maps the shared object. Notice how the copy-on-write addresses can area be different. Shared object Shared object copy-on-write object 43 Implementation Demand paging copy-on-write PTEs in private areas Copy-on-write (COW) object. are flagged as read-only Area flagged as private copy-oninstruction writing to private page triggers write protection fault. PTEs in private Handler creates new areas are flagged R/W page. as read-only Instruction restarts upon handler return. copy-on-write Copying deferred as object 44 long as possible! Implementation 45 The execve Function The execve Function Revisited VM and mapping explain how fork provides private address space for each process. To create address for new new process Create exact copies of current mm_struct, vm_area_struct, and page tables. Flag each page in both processes as read-only Flag each vm_area_struct in both processes as private COW User stack Shared, file-backed libc.so.data.text Memory mapped region for shared libraries On return, each process has exact copy of Subsequent writes create new pages using COW mechanism., demand-zero a.out Runtime heap (via malloc), demand-zero Uninitialized data (.bss), demand-zero Initialized data (.data).data.text Program text (.text) Programs and initialized Create vm_area_struct s data backed object files. and page tables for by new areas.bss and stack backed by Setanonymous PC to entry files. point in.text Linux will fault in code and 0 To load and run a new the current Toprogram load anda.out run ainnew processa.out using in execve: program the current process using Free vm_area_struct s and execve: page tables for old areas Free vm_area_struct s Create vm_area_struct s and and page tables for old areas page tables for new areas Programs and initialized data anonymous files. backed by object files..bss and stack backed by, file-backed 2 Set PC to entry point in.textdata pages as needed. Linux will fault in code and data pages as needed. 48 Write to pr copy-on-w page
User-Level Memory Mapping void *mmap(void *start, int len, int prot, int flags, int fd, int offset) User-Level Memory Mapping User-Level Memory Mapping void *mmap(void *start, int len, int prot, int flags, int fd, int offset) void *mmap(void *start, int len, int prot, int flags, int fd, int offset) Map len bytes starting at the offset of the file specified by file description fd, preferably at address start start: may be 0 for pick an address prot: PROT_READ, PROT_WRITE,... flags: MAP_ANON, MAP_PRIVATE, MAP_SHARED,... Return a pointer to start of mapped area (may not be start) len bytes offset (bytes) len bytes start (or address chosen by kernel) Outline 0 0 Disk file specified by file descriptor fd Memory Management 50 Memory management is the heart of operating systems; Each process in a multi-tasking OS runs in its address space. 1 2 Implementation 3 Credit: http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-
How The Kernel Manages Memory paged out. If a region is backed by a file, its vm file field will be set. By traversing vm file f dentry d inode i mapping, the associated address space for the region may be obtained. The address space has all the filesystem specific information required to perform page-based operations on disk. The relationship between the different address space related structures is illustraed in Figure 4.2. A number of system calls are provided which affect the address Implementation space and regions. These are listed in Table 4.1. Relevant data structures in address space Credit: http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your- mm_struct Figure 4.2. Data Structures Implementation Related to the Address Space mm_struct struct mm_struct { struct mm_struct { [0] struct vm_area_struct *mmap; [120] long unsigned int total_vm; [4] struct rb_root mm_rb; [124] long unsigned int locked_vm; [8] struct vm_area_struct *mmap_cache; [128] long unsigned int shared_vm; [12] long unsigned int (*get_unmapped_area)(struct file *, long unsigned int, long unsigned int, long unsigned int, long unsigned int); [16] void (*unmap_area)(struct mm_struct *, long unsigned int); [20] long unsigned int mmap_base; [24] long unsigned int task_size; [28] long unsigned int cached_hole_size; [32] long unsigned int free_area_cache; [36] *pgd; [40] atomic_t mm_users; [44] atomic_t mm_count; [48] int map_count; [52] struct rw_semaphore mmap_sem; [80] spinlock_t page_table_lock; [96] struct list_head mmlist; [104] mm_counter_t _file_rss; [108] mm_counter_t _anon_rss; [112] long unsigned int hiwater_rss; [116] long unsigned int hiwater_vm; [132] long unsigned int exec_vm; [136] long unsigned int stack_vm; [140] long unsigned int reserved_vm; [144] long unsigned int def_flags; [148] long unsigned int nr_ptes; [152] long unsigned int start_code; [156] long unsigned int end_code; [160] long unsigned int start_data; [164] long unsigned int end_data; [168] long unsigned int start_brk; [172] long unsigned int brk; [176] long unsigned int start_stack; [180] long unsigned int arg_start; [184] long unsigned int arg_end; [188] long unsigned int env_start; [192] long unsigned int env_end;
mm_struct Page Directory Chapter 13: mmap and DMA struct mm_struct { [196] long unsigned int saved_auxv[44]; [372] unsigned int dumpable : 2; [376] cpumask_t cpu_vm_mask [380] mm_context_t context; [424] long unsigned int swap_token_time; [428] char recent_pagein; [432] int core_waiters; [436] struct completion *core_startup_done; [440] struct completion core_done; [468] rwlock_t ioctx_list_lock; [484] struct kioctx *ioctx_list; } SIZE: 488 struct mm_struct PGD Virtual Address (addr) 00111010110110011001101110110101111 PMD Software relationships pgd part pmd part pte part offset PTE pgd_offset(mm_struct, addr); pmd_offset(, addr); pte_offset(, addr); pte_page(); page. struct page physical page Hardware relationships pgd_val(pgd); pmd_val(pmd); pte_val(pte); Figur e 13-2. The thr ee levels of Linux page tables vm_area_struct Credit: http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your- Page Table A page-aligned array of items, Implementation each of which is called a Page Table Entry. The kernel uses the type for the items. A contains the physical address of the data page. The types introduced in this list are defined in <asm/page.h>, which must be vm_area_struct included by every source file that plays with paging. The kernel doesn t need to worry about doing page-table lookups during normal program execution, because they are done by the hardware. Nonetheless, the kernel must arrange things so that the hardware can do its work. It must build the page tables and look them up whenever the processor reports a page fault, that is, struct vm_area_struct { [0] struct mm_struct *vm_mm; [4] long unsigned int ; [8] long unsigned int ; 376 [12] struct vm_area_struct *; [16] pgprot_t vm_page_prot; [20] long unsigned int ; [24] struct rb_node vm_rb; union { struct {...} vm_set; struct raw_prio_tree_node prio_tree_node; [36] } shared; [52] struct list_head anon_vma_node; [60] struct anon_vma *anon_vma; [64] struct vm_operations_struct *vm_ops; [68] long unsigned int vm_pgoff; [72] struct file *vm_file; [76] void *vm_private_data; [80] long unsigned int vm_truncate_count; } SIZE: 84
Page Fault 120 Noncontiguous Memory Allocation Chapter 7 Figure 7.3. Relationship Between vmalloc(), alloc page() and Page Faulting do_page_fault 7.3 Freeing a Noncontiguous Area The function vfree() is responsible for freeing a area as described in Table 7.2. It linearly searches the list of vm structs looking for the desired region and then calls vmfree area pages() on the region of to be freed, as shown in Figure 7.4. void vfree(void *addr) Frees a region of allocated with vmalloc(), vmalloc dma() or vmalloc 32() Table 7.2. Noncontiguous Memory Free API do_page_fault Outline do_page_fault expand_stack handle_mm_fault force_sig_info search_exception_table find_vma find_vma_prev pmd_alloc pte_alloc handle_pte_fault search_one_table do_no_page do_swap_page do_wp_page establish_pte do_anonymous_page alloc_page lru_cache_add Figure 4.11. Call Graph: do page fault() 1 2 Implementation 4.6. Page Faulting 81 vmfree area pages() is the exact opposite of vmalloc area pages(). It walks the page tables and frees up the page table entries and associated pages for the region. 3
References Memory management is crucial Machine introspection Memory forensics Traversing kernel data structures to understand the Memory failures Exploits Dening BEFORE MEMORY WAS VIRTUAL http://cs.gmu.edu/cne/pjd/pubs/bvm.pdf Linux device drivers, Addison-wisely. http://www.cs.cmu.edu/afs/cs/academic/class/15213- f10/www/lectures/16-vm-systems.pdf Understanding Linux manager http://www.kernel.org/doc/gorman/pdf/understand.pdf Understanding Linux kernel (3rd edition) http://linux-mm.org/pagetablestructure