Inline Reference Monitors

Similar documents
Fast Byte-Granularity Software Fault Isolation

Processor Architectures

Intel 8086 architecture

W4118: segmentation and paging. Instructor: Junfeng Yang

Return-oriented programming without returns

I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation. Mathias Payer, ETH Zurich

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

Instruction Set Design

Full System Emulation:

Virtualization. Clothing the Wolf in Wool. Wednesday, April 17, 13

User. Role. Privilege. Environment. Checkpoint. System

(Refer Slide Time: 00:01:16 min)

Virtualization. Explain how today s virtualization movement is actually a reinvention

Fine-Grained User-Space Security Through Virtualization. Mathias Payer and Thomas R. Gross ETH Zurich

CSC 2405: Computer Systems II

CSCI E 98: Managed Environments for the Execution of Programs

Board Notes on Virtual Memory

VMkit A lightweight hypervisor library for Barrelfish

Review from last time. CS 537 Lecture 3 OS Structure. OS structure. What you should learn from this lecture

How do Users and Processes interact with the Operating System? Services for Processes. OS Structure with Services. Services for the OS Itself

COS 318: Operating Systems. Virtual Machine Monitors

Cloud Computing. Up until now

Computer Organization and Architecture

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Exception and Interrupt Handling in ARM

Chapter 5 Cloud Resource Virtualization

COS 318: Operating Systems. Virtual Machine Monitors

Parrot in a Nutshell. Dan Sugalski dan@sidhe.org. Parrot in a nutshell 1

Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University

Format string exploitation on windows Using Immunity Debugger / Python. By Abysssec Inc

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

CIS 551 / TCOM 401 Computer and Network Security

Carlos Villavieja, Nacho Navarro Arati Baliga, Liviu Iftode

COS 318: Operating Systems

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Jonathan Worthington Scarborough Linux User Group

CVE Adobe Flash Player Integer Overflow Vulnerability Analysis

The Design of the Inferno Virtual Machine. Introduction

Computer Architectures

Eugene Tsyrklevich. Ozone HIPS: Unbreakable Windows

x86 ISA Modifications to support Virtual Machines

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?

Bypassing Memory Protections: The Future of Exploitation

Operating System Organization. Purpose of an OS

Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

Using Power to Improve C Programming Education

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Microcontrollers Deserve Protection Too

Addressing The problem. When & Where do we encounter Data? The concept of addressing data' in computations. The implications for our machine design(s)

x64 Servers: Do you want 64 or 32 bit apps with that server?

Intrusion Detection via Static Analysis

6.828 Operating System Engineering: Fall Quiz II Solutions THIS IS AN OPEN BOOK, OPEN NOTES QUIZ.

CS161: Operating Systems

Processes and Non-Preemptive Scheduling. Otto J. Anshus

Hardware Assisted Virtualization

Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications

The Microsoft Windows Hypervisor High Level Architecture

Full and Para Virtualization

Outline. Outline. Why virtualization? Why not virtualize? Today s data center. Cloud computing. Virtual resource pool

Virtualization for Cloud Computing

Lecture 17: Mobile Computing Platforms: Android. Mythili Vutukuru CS 653 Spring 2014 March 24, Monday

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

İSTANBUL AYDIN UNIVERSITY

V3 Storage Architecture Overview and Implications for VDI. May 2016

Example of Standard API

Introduction. Application Security. Reasons For Reverse Engineering. This lecture. Java Byte Code

Oracle Solaris Security: Mitigate Risk by Isolating Users, Applications, and Data

Load Testing and Monitoring Web Applications in a Windows Environment

Leveraging Thin Hypervisors for Security on Embedded Systems

CS5460: Operating Systems

Operating Systems. Notice that, before you can run programs that you write in JavaScript, you need to jump through a few hoops first

Eclipse Visualization and Performance Monitoring

Virtualization. Dr. Yingwu Zhu

LSN 2 Computer Processors

The Classical Architecture. Storage 1 / 36

A deeper look at Inline functions

Operating System Overview. Otto J. Anshus

A Unified View of Virtual Machines

CPU Organization and Assembly Language

QUIRE: : Lightweight Provenance for Smart Phone Operating Systems

General Introduction

Instruction Set Architecture

Software Vulnerabilities

ReactOS is (not) Windows. Windows internals and why ReactOS couldn t just use a Linux kernel

Design and Implementation of the Heterogeneous Multikernel Operating System

An Implementation Of Multiprocessor Linux

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

Portable Software Fault Isolation

Virtualization. P. A. Wilsey. The text highlighted in green in these slides contain external hyperlinks. 1 / 16

COS 318: Operating Systems. Virtual Memory and Address Translation

Driving force. What future software needs. Potential research topics

sndio OpenBSD audio & MIDI framework for music and desktop applications

Hypervisor-Based, Hardware-Assisted System Monitoring

Kernel Types System Calls. Operating Systems. Autumn 2013 CS4023

KVM & Memory Management Updates

Bypassing Browser Memory Protections in Windows Vista

Multi-core architectures. Jernej Barbic , Spring 2007 May 3, 2007

Virtualization. P. A. Wilsey. The text highlighted in green in these slides contain external hyperlinks. 1 / 16

Transcription:

Inline Reference Monitors Michael McCoyd 7 Septemeber, 2012 1 SFI sandboxing SFI is a way to mitigate risk of untrusted guest code by constraining what it can do. For example running a image decoder or arbitrary plugin in a browser at close to full speed. Sand Box some part of address space for guest can not trample on browsers (host) data structures. SFI on RISC Can modify the code so nothing bad can happen. All instructions are fixed length so for a given byte stream there is only one instruction stream. Why is SFI on CISC hard SFI on CISC was harder as variable length instructions. The attack can jump to an address in the middle of an intended instruction and thus create a separate instruction stream. Variable length instructions makes it impractical to modify code to always be safe. PittSFIeld: transform CISC into fixed length instruction architecture Was the next revolution and showed could have fast SFI on x86. Approach was to ensure that jumps only occur to the start of 32bit chunks, by masking addresses. 32 bytes Need to fit in chunk all the checks you want for that instruction, else you could jump past the checks. 32 bytes gives enough room if rewrite any really long x86 instructions. Also a power of 2 given bit masking method of enforcing addresses. 2 SFI Development Path SFI (RISC) -> PittSFIeld (CISC) -> (5 years) -> NaCL x86 w/ segments -> NaCL x86-64, ARM Native Client x86 32bit (NaCl) key: use x86 segment registers to reduce overhead of PittS- FIeld. x86 has two memory protection mechanisms: segmentation and page tables. 64 bit removed segmentation as not often used. Criticized as locking web into 32bit and x86, not ARM as well. NaCL ARM and 64bit x86-64 and ARM with modest success, but only modest. 20 years deployment from initial RISC SFI to cross platform SFI. Inputs Pitsfield needs assembly code (source), research system, can not be applied to arbitrary binaries. NaCl - modified complier - need the source to instrument the binary. 1

2.1 Trusted Computing Base A fundamental part of PittSFIeld is that rewriter need not be trusted, only the verifier. Trusted Computing Base (TCB) code that needs be correct for security properties to be true. Instrumentor takes x86 program source and produces an instrumented x86 program. Add masks and checks before jumps. -> instrumented Not part of the TCB. Bugs in instrumentor could only cause it to fail to accept program or change program semantics. Verifier at runtime verifies properly instrumented. Checks each jump or store immediately preceded by mask instruction. The security property that we want is sandboxing: not able to write outside its designated security area. For PitSFIeld, the TCB is only the verifier (and the Browser/OS). The PitSFIeld TCB was only intended to assure security, not proper execution. Crucially this split architecture allows the verifier to be much simpler than the rewriter. 2.2 Why was this separation possible? 1) Transform code to subset of x86 that was easier to verify Change a hairy instruction set into a simpler subset that is easier to verify. For example, change crazy address modes to easier code. Instead of: eax + 8 * ebx + 17, use lea to figure out what address that is, load into other register ecx, then mask ecx, then store to address in ecx. 2) Sufficient conditions, but stricter than necessary The transformation and verification are sufficient to ensure we do not write outside sandbox but may be more strict than needed to be. The verifier can be simpler as it has no memory across chunks. Verifier assumes nothing about a block of code at its start. Marks registers as safe for memory or jumps as their values are masked. Takes about 2 bits for each register at each point in code; 1 bit for code, 1 bit for data. Transform and check In important technique is to map a hairy problem (in this case an instruction set) into much smaller subset with easier semantics. Here it is easy to see that the smaller subset is safe and a verifier for it is possible. The insight needed is in picking the safer subset or problem. A problem with the technique is that it can reduce efficiency. Hardware requirements PitSFIeld does not require NOX bit, nor use if hardware has it. 2.3 PittSFIeld Memory Design PittSFIeld had clever memory layout so memory checks are just masking. Base address 0x30000000 0x20000000 0x10000000 0x00000000 Use host code & data guard pages guest data & stack guard pages guest code guard pages Zero tag region The sandbox is the guest code and data areas, maybe also the zero tag region. 2

Invariant: The guest code will never write outside the guest data region How: instrument every write instruction Need to not only prevent writes outside of guest area, but also writes that could change the code and remove its write protections. Ensure: every write instruction in code is what was loaded in So: need to protect code are from writing to prevent changes to the instructions. Method: every write instruction is masked 2.4 Examples 2.5 Write memory and 0x20FFFFFFFF, ebx mov eax, (ebx) mask ebx to either 0x00... (unused zero mov eax, ( ebx ) tag) or 0x20... (guest data) Could imagine check with branch, but branch instructions are slow. It is faster to just force it into that range. Writing to the zero tag area is fine as either never use that area or set the OS to mark them not writable. 2.6 Jump jmp 0 x12345678 [no change] Constant address so statically reject or allow jmp ( eax ) AND 0x10FFFFE0, jmp ( eax ) eax 0x10... masks to guest code area 0x...E0 alignes to start of chunk The masking of a jump will allow execution to enter the zero tag region. Thus that region needs to be marked not used by the page tables. 2.7 Return Return is effectively an indirect jump, which is unsafe. return pop eax and 0x10FFFFE0, eax Return address on stack in guest data area, jmp ( eax ) function body could have changed the return address. Problematic for performance as modern x86 hardware caches recent return addresses and will speculate where an upcoming return will go. This rewrite removes the return speculation. A significant performance overhead for programs with many call returns. 3

Concurrency A return optimization that does not work is to AND the return addess in place on the stack just before the return. This preserves the speculation gains. It does not work in the presence of concurrency. With two guest threads one thread could change the return address between the mask and the return of the other. Concurrency was not an issue with the earlier examples as registers are thread local, saved and restored between threads. Can extra hardware support solve the problem? Instruction that blocks other threads for one instruction. Might have different threads of same process have different protection views of the same memory. How generally solve concurrency issues? Common defense against malicious code changing your data is to make private copy in local state know others can not modify. Easy to erroneously write: check(x), do with(x). time of check to time of use vulnearability, a race condition on shared mutable state. This is why kernels turn off all concurrency for critical actions. 2.8 Read memory read (ebx), eax [no change] Reading is not part of their security claims. User must judge if PittSFIeld s claims met users needs. Does allow leak of privacy. Fixing by checking all reads is big performance hit. 4

2.9 SFI vs OS process separation Why do we need both SFI and OS processes? What are the tradeoffs of choosing between them? Seems a lot of similarity in wanting to ensure isolation between the guests/processes. Using an example of a video player, how do we communicate encoded video to the guest? Standard argument, but no great performance comparisons, is: Feature SFI (PittSFIeld) OS Call speed host to guest fast - no need to copy data slower - fork, copy data Guest to host writes Must break sandbox Same context switch as call. 2.10 Details Feature SFI (PittSFIeld) OS Basic setup Code rewriting for separation page tables and memory protection to ensure each has own virtual address space. Startup costs compile time rewrite forking overhead share data - maybe SFI faster IPC with API you create host guest just write data into guest, read any guest data. host guest: read just access, or browser store originally in players area. host guest: write??? Multiple guests No - memory layout prevents Yes Memory mapped files Limited up to address space size Cross domain calls host guest: trivial IPC marshal host guest: Program changes Normal access when reading data structures in host Need to marshal / unmarshal for each access access to resources rewrite?? Other alternatives to SFI are lightweight mechanisms to run in virtual machine. OSDI 2012 had an interesting paper Dune: Safe User-level Access to Privileged CPU Features that used hardware virtualization to give a process limited access to CPU privileged features. Clearly SFI is useful for very specialized low end hardware with out memory protection. 2.11 Can Guest talk to Host? How does SFI return any results to the caller, as masked inside sandbox. 5

Imagine running a guest video encoder. Version 1: only write to from buffer on stream. That is nice but also want to connect back to wed site for streaming video. Lets say that is ok to the site we came from but not other sites. How do we do that? Adding access code into the guest will not let that code talk to the browser, as still sandboxed. Could store arguments in guest data and cause host to read and execute on them. We could use page faults or other exceptions to transfer to host, but we are reimplementing system calls. Cleaner is to change verifier to allow one address that you can call to outside the sandbox, a clean implementation of system call. Stack Where is stack in this process? Guest and host would share a stack. So one guest could read the stack after host use. If two guests were running, one could modify the hosts stack out from under it. If such concurrency, host copies arguments to a separate stack used just by the host. If host call back to guest, host should save its registers before call guest. Make lots of defensive copies with mutual distrusting parties in face of concurrency, much like you need in OS design. Paper did not discuss. The interesting issues of OS design show up in SFI as well. Could allow no callbacks to guest. 3 Reference Monitor Pattern Given sandboxed guest code with no external access and host code can do anything such as accessing the network. You want to allow the guest a few limited abilities, say 10, such as to connect back to its website. The pattern is to use a Reference Monitor. The guest makes its request to some code within the host, called the reference monitor. The reference monitor checks if the guest is allowed to do the requested action. Instead of changing the SFI verifier to allow all 10 things. Decompose problem into two pieces. SFI verifier that completely sandboxes guest except one entry point to the host. Reference monitor in host checks those 10 things. Similar to system calls. 4 SFI limitations summary only one guest return gets slower (fix with mask and memory ) privacy not a goal 6