SecVisor: A Seshadri, Mark Luk, Ning Qu. CyLab, CMU. SOSP 2007
Outline: Introduction; Assumption; SVM Background; Design Problems; Implementation; Kernel Porting; Evaluation; Limitation
Introduction. Why? Only approved code should be able to run in kernel mode. What? A tiny hypervisor. How? Use hardware memory protection to distinguish approved code from unapproved code.
Assumption. Deployment requirement: a CPU with SVM support. TCB (Trusted Computing Base): CPU, memory controller, and memory.
SVM Background: 1/3. Two CPU modes: Host mode for the VMM (e.g. the VMM runs at Ring 0) and Guest mode for the guest OS (e.g. the kernel runs at Ring 0, applications at Ring 3).
SVM Background: 2/3. Host-Guest execution model. Each Guest has a data structure called the VMCB (Virtual Machine Control Block), which contains its execution state. Guest execution starts when the VMM executes vmrun(vmcb). On hitting an intercept, the CPU suspends the Guest, stores its state in the VMCB, and exits to the Host. The Guest does not execute again until the Host calls vmrun(vmcb) again.
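The vmrun/intercept cycle can be illustrated with a toy model. This is only a sketch: the `VMCB` fields, the string "instructions", and the `host_loop` helper are hypothetical stand-ins chosen for illustration, not the real SVM data layout or anything from SecVisor itself.

```python
from dataclasses import dataclass, field


@dataclass
class VMCB:
    """Toy stand-in for a Virtual Machine Control Block (fields are hypothetical)."""
    rip: int = 0                       # saved guest instruction pointer
    exit_reason: str = ""              # why the guest last exited to the host
    intercepts: set = field(default_factory=set)


def vmrun(vmcb, program):
    """Run guest 'instructions' until one is intercepted, then return to the host."""
    while vmcb.rip < len(program):
        op = program[vmcb.rip]
        if op in vmcb.intercepts:
            vmcb.exit_reason = op      # guest state stays saved in the VMCB
            return
        vmcb.rip += 1                  # non-intercepted instruction runs natively
    vmcb.exit_reason = "halt"


def host_loop(program, intercepts):
    """Host side: call vmrun repeatedly, handling each exit in between."""
    vmcb = VMCB(intercepts=set(intercepts))
    exits = []
    while True:
        vmrun(vmcb, program)
        exits.append((vmcb.exit_reason, vmcb.rip))
        if vmcb.exit_reason == "halt":
            return exits
        vmcb.rip += 1                  # host "handles" the intercepted op, then skips it
```

For example, `host_loop(["add", "wrmsr", "add", "io", "add"], {"wrmsr", "io"})` records one exit per intercepted operation plus a final halt; between exits the guest is suspended and only the host runs.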
SVM Background: 3/3. ASID (Address Space IDentifier): extra bits in each TLB entry that distinguish Host address-space entries from Guest address-space entries. Intercept events: instruction, interrupt, exception, I/O, and MSR intercepts. DEV (Device Exclusion Vector): one bit per 4 KB physical page; setting the bit to 1 disables DMA reads and writes to that page. NPT (Nested Page Table): translates Guest physical addresses to host physical addresses, providing hardware-supported virtualization of physical memory.
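The DEV idea (one bit per 4 KB physical page) can be sketched as a plain bitmap. The class and method names below are assumptions made for illustration; the real DEV is a hardware-consulted bit vector configured through registers, not a Python object.

```python
PAGE_SIZE = 4096  # 4 KB pages, as on the slide


class DeviceExclusionVector:
    """Toy DEV: bit i set to 1 means DMA to physical page i is blocked."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def protect(self, phys_addr, length):
        """Set the DEV bit for every page overlapping [phys_addr, phys_addr + length)."""
        first = phys_addr // PAGE_SIZE
        last = (phys_addr + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.bits[page // 8] |= 1 << (page % 8)

    def dma_allowed(self, phys_addr):
        """Return True if a device may read or write the page containing phys_addr."""
        page = phys_addr // PAGE_SIZE
        bit = (self.bits[page // 8] >> (page % 8)) & 1
        return bit == 0
```

A hypervisor-style user of this sketch would call `protect()` over its own memory and over approved code pages, so that a device cannot overwrite them behind the CPU's back.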
Background of Background. TLB (hw): a Translation Lookaside Buffer is a CPU cache used by the MMU to speed up virtual address translation. MMU (hw): a memory management unit is the hardware component responsible for handling memory accesses requested by the CPU. Page Table (OS): a page table is the data structure used by the OS virtual memory system to store the mapping between virtual addresses and physical addresses. When the CPU issues a memory access, the MMU first checks the TLB. On a TLB hit, the physical address comes directly from the TLB. On a TLB miss, the MMU looks up the page table; if no valid mapping exists, a page fault is triggered.
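The lookup order described above (TLB first, then page table, then page fault) can be sketched as follows. The single-level dictionary page table and the `MMU` class are simplifications assumed for illustration; real page tables are multi-level and walked by hardware.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when no valid mapping exists for a virtual address."""


class MMU:
    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical frame number
        self.tlb = {}                  # cache of recently used translations
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:            # TLB hit: translation comes from the TLB
            self.hits += 1
            pfn = self.tlb[vpn]
        else:                          # TLB miss: fall back to the page table
            self.misses += 1
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))
            pfn = self.page_table[vpn]
            self.tlb[vpn] = pfn        # fill the TLB for later accesses
        return pfn * PAGE_SIZE + offset
```

With `MMU({0: 5})`, the first access to page 0 is a miss that fills the TLB, a second access to the same page is a hit, and an access to an unmapped page raises `PageFault`.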
Design Problems. How to protect SecVisor itself? How to ensure that only approved code executes in kernel mode? How to secure switches between kernel mode and user mode?
Requirement Analysis. P1: on every entry into kernel mode, the IP must point to approved code. P2: while in kernel mode, the IP must continue to point to approved code. P3: every exit from kernel mode must set the CPU to user mode. P4: memory containing approved code must not be modified.
Implementation: SecVisor Memory Management; Virtualizing Memory; Virtualizing DEV; Kernel Mode Entry & Exit
Implementation: SecVisor Memory Management. SecVisor allocates its physical memory just after the ACPI code in RAM. A command-line parameter is passed to the kernel at boot to inform it of the reduced RAM. SecVisor executes in Host mode, which has a different address space from the Guest.
Implementation: Virtualizing Memory based on NPT (Nested Page Table). The NPT is a second page table, which makes it well suited for setting page-table-based protections. SecVisor allocates the physical pages for the NPT from its own memory, since SecVisor's physical pages are never accessible to the Guest and are protected against DMA writes. Basic idea: SecVisor maintains a list of Guest physical pages that contain approved code. When the Guest executes in kernel mode, SecVisor clears the non-execute permission only for the entries in the approved list. Optimization: maintain two NPTs, one for user mode and one for kernel mode.
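The two-NPT optimization can be sketched as two permission maps built from the approved list. The `Perm` type and `build_npts` function are hypothetical, and the exact permission choices (approved code read-only in both tables, approved code non-executable in the user-mode table so that a jump into the kernel faults into the hypervisor) are assumptions about how such a scheme would typically be laid out, not a transcription of SecVisor's tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perm:
    read: bool
    write: bool
    execute: bool


def build_npts(all_pages, approved_pages):
    """Return (kernel_npt, user_npt) as dicts: guest physical page -> Perm."""
    approved_pages = set(approved_pages)
    kernel_npt, user_npt = {}, {}
    for page in all_pages:
        approved = page in approved_pages
        # Kernel-mode table: only approved code is executable, and it is not
        # writable; every other page keeps the non-execute restriction.
        kernel_npt[page] = Perm(read=True, write=not approved, execute=approved)
        # User-mode table (assumed layout): approved kernel code is not
        # executable, so any attempt to enter it faults and can be checked.
        user_npt[page] = Perm(read=True, write=not approved, execute=not approved)
    return kernel_npt, user_npt
```

Keeping both tables prebuilt means a mode switch only has to change which table is active, rather than rewriting permissions on every page at each transition.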
Implementation: Virtualizing Memory based on SPT (Shadow Page Table). With an SPT, SecVisor modifies the execute permissions over user and kernel memory on each mode transition. The SPT maintains the mappings between virtual and host physical addresses and is kept synchronized with the kernel's page table by intercepts triggered on page-table-related operations, e.g. page table modifications and page fault handling.
Implementation: Virtualizing DEV. SecVisor allocates the physical pages for the DEV from its own memory. SecVisor intercepts all writes to the DEV configuration registers.
Implementation: Kernel Mode Entry & Exit. Kernel mode exit: every exit from kernel mode causes a protection exception; in handling it, SecVisor sets the CPL field of the VMCB to 3, so that the CPU is in Ring 3 when the Guest resumes execution. Kernel mode entry: SecVisor maintains shadow copies of the GDT, LDT, IDT and some MSRs to make sure their contents are approved. Because these registers are only written in kernel mode, SecVisor only needs to update them before allowing user mode to execute. It does so by changing the related values in the VMCB as part of handling a kernel-to-user mode transition.
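The exit and entry handling can be sketched as a small fault handler. Everything named here (`GuestState`, `handle_execute_fault`, the `approved_entry_points` set, and the string return values) is a hypothetical model; in particular, checking the fault address against a set of approved entry points is an assumed simplification of what the shadow copies of the GDT, LDT, IDT and MSRs are meant to guarantee.

```python
from dataclasses import dataclass


@dataclass
class GuestState:
    """Toy subset of VMCB state relevant to mode switches (hypothetical)."""
    cpl: int = 3               # current privilege level: 0 = kernel, 3 = user
    active_npt: str = "user"   # which protection table is in effect


def handle_execute_fault(state, fault_addr, approved_entry_points):
    """Decide what to do when the guest faults on an instruction fetch."""
    if state.active_npt == "kernel":
        # Kernel mode exit: execution left approved code, so force Ring 3
        # before the guest is resumed.
        state.cpl = 3
        state.active_npt = "user"
        return "exit-to-user"
    if fault_addr in approved_entry_points:
        # Kernel mode entry at an approved entry point. The hardware has
        # already raised the privilege level; the model mirrors that here.
        state.cpl = 0
        state.active_npt = "kernel"
        return "enter-kernel"
    # Entry attempt at an address that is not an approved entry point.
    return "reject"
```

In this model, an entry at an approved address switches to the kernel-mode protections, and a later fault while those protections are active always leaves the guest at CPL 3.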
Kernel Porting (1/2). Kernel bootstrap code: 1) setup(): initializes the hardware with calls to the BIOS. 2) decompress_kernel(): performs further hardware initialization, decompresses the runtime code, then jumps to the start address of the runtime. Porting: decompress_kernel() invokes SecVisor and passes it the start and end addresses of the runtime code segment. After the approval policy approves the runtime, SecVisor creates a VMCB, sets memory protections over the runtime code, then transfers control to the runtime code with the vmrun instruction.
Kernel Porting (2/2). Load/unload kernel modules: load_module(), free_module(). Porting: load_module() invokes SecVisor via a hypercall whose arguments are the start and end addresses of the relocated module.
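The module-load hypercall can be sketched as a call that hands a code range to the hypervisor for approval. The slides do not say what the approval policy is, so the SHA-256 whitelist below is purely an assumed example policy; `ToySecVisor`, `hypercall_load`, and `hypercall_free` are hypothetical names, and the unload path is likewise an assumption based on free_module() being listed among the ported functions.

```python
import hashlib


class ToySecVisor:
    """Toy hypervisor side of the load/unload hypercalls (illustrative only)."""

    def __init__(self, approved_digests):
        self.approved_digests = set(approved_digests)  # assumed whitelist policy
        self.approved_ranges = []                      # (start, end) of approved code

    def hypercall_load(self, memory, start, end):
        """Approve the module code in memory[start:end] if its hash is whitelisted."""
        digest = hashlib.sha256(memory[start:end]).hexdigest()
        if digest not in self.approved_digests:
            return False                               # module stays non-executable
        self.approved_ranges.append((start, end))      # would now be marked executable
        return True

    def hypercall_free(self, start, end):
        """Drop a previously approved range when the module is unloaded."""
        self.approved_ranges.remove((start, end))
```

Passing only the start and end addresses keeps the kernel-side change small: the kernel reports where the relocated code lives, and the decision stays entirely on the hypervisor side.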
Evaluation: Design Compliance; Performance
Evaluation: Design Compliance. Small size. Kernel interface: 1) load and unload kernel modules, 2) kernel init. Effort to port a kernel: a) decompress_kernel(), b) load_module(), free_module().
Performance Evaluation: LMBench results
Performance Evaluation: Application performance
Performance Evaluation: Application performance (continued)
Limitation & Future Work. Limitation: SecVisor guarantees the integrity of the code that executes in kernel mode, but not the integrity of the control flow. Future work: multi-CPU support.
Q&A. Thank you. Any questions?