Týr: a dependent type system for spatial memory safety in LLVM
Vítor De Araújo
Álvaro Moreira (advisor)
Rodrigo Machado (co-advisor)
August 13, 2015

Memory safety
- Higher-level languages are usually memory-safe
  - A program cannot make an invalid access to memory
- This usually has:
  - a static component (e.g., the type system disallows invalid type casts)
  - a dynamic component (e.g., run-time bounds checking)
- Data carries enough metadata with it to allow run-time checking
  - e.g., an array is stored as its length followed by its elements

    int[] array = new int[] { 23, 42, 81 };

    Memory layout: [ 3 | 23 | 42 | 81 ]

    array[1] = 13;    (works)
    array[42] = 13;   (throws exception)
    array[i] = 13;    (may or may not throw exception: dynamic check)

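The "length followed by elements" layout and its dynamic check can be imitated in C. The sketch below is only an illustration of the idea: the names `checked_array` and `checked_store` are hypothetical, and returning `false` stands in for the exception a managed runtime would throw.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout mimicking a managed-language array:
   the length is stored in front of the elements. */
struct checked_array {
    size_t length;
    int elements[3];
};

/* Dynamic bounds check: store only if the index is valid.
   Returns false where a managed runtime would throw an exception. */
bool checked_store(struct checked_array *a, size_t index, int value) {
    if (index >= a->length) {
        return false;
    }
    a->elements[index] = value;
    return true;
}
```

With an array holding { 23, 42, 81 }, a store at index 1 succeeds and a store at index 42 is rejected, because the length needed for the check is stored with the data.
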
Memory safety (or lack thereof)
- In contrast, C data carries no metadata
  - int[3] is just three contiguous integers in memory

    int array[] = { 23, 42, 81 };

    Memory layout: [ 23 | 42 | 81 ]

- The language enforces no memory safety

    array[1] = 13;
    array[42] = 13;   (invalid memory access; may segfault, may overwrite other program data)
    array[i] = 13;

- Programmer has full control of data representation (no metadata)
- Programmer can decide when checks are needed
- However, very error-prone
  - Source of bugs and security vulnerabilities (e.g., Heartbleed)

Recovering memory safety
- How can we recover memory safety in C programs?
- Traditional solution: add metadata to allow checking
- This has a number of drawbacks:
  - It changes the memory representation of objects
    - Requires recompilation of everything (external libraries, OS syscalls)
  - C pointers can point to any part of an object
    - No simple/cheap way to find metadata from an arbitrary pointer
  - Pointers themselves must carry bounds, or a separate data structure must be looked up
    - Changes representation and/or is expensive
- But there is another way...

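To make the "pointers themselves must carry bounds" option concrete, here is a minimal sketch of a so-called fat pointer. All names (`fat_ptr`, `fat_from_array`, `fat_add`, `fat_store`) are hypothetical and not taken from any of the tools discussed; the point is only that the bounded pointer no longer has the representation of a plain `int *`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the pointer carries its own bounds,
   so it is larger than, and incompatible with, a plain int *. */
struct fat_ptr {
    int *base;        /* start of the underlying object */
    size_t len;       /* number of elements in the object */
    ptrdiff_t offset; /* current position relative to base */
};

struct fat_ptr fat_from_array(int *array, size_t len) {
    struct fat_ptr p = { array, len, 0 };
    return p;
}

/* Pointer arithmetic only moves the offset; the bounds travel along. */
struct fat_ptr fat_add(struct fat_ptr p, ptrdiff_t delta) {
    p.offset += delta;
    return p;
}

/* Checked store through the fat pointer; false means out of bounds. */
bool fat_store(struct fat_ptr p, int value) {
    if (p.offset < 0 || (size_t)p.offset >= p.len) {
        return false;
    }
    p.base[p.offset] = value;
    return true;
}
```

Code compiled without knowledge of `struct fat_ptr` (an external library, for example) cannot receive such a value where it expects `int *`, which is the compatibility problem listed above.
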
Recovering memory safety
- A correct C program has no memory safety violations
  - Programmer must keep track of array bounds, etc., manually
- Common idioms:

    int sum(int *array, int len)

    struct data {
        int len;
        char *payload;
    };

- Bounds information is already present in C programs
  - But in an ad-hoc way that the compiler cannot check
- Solution: allow the programmer to formally express these relationships
  - So that the compiler can validate their correct usage

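Written out in full, the two idioms might look as follows. The body of `sum` and the helper `count_byte` are illustrative assumptions (the slide shows only the signature and the struct). In both cases the code trusts that `len` matches the buffer, and nothing in the language verifies it.

```c
/* The length travels next to the pointer, by convention only. */
int sum(int *array, int len) {
    int total = 0;
    for (int i = 0; i < len; i++) {
        total += array[i];   /* valid only if the caller passed the right len */
    }
    return total;
}

struct data {
    int len;         /* number of bytes in payload, by convention only */
    char *payload;
};

/* Hypothetical helper: counts occurrences of byte c, trusting d->len. */
int count_byte(const struct data *d, char c) {
    int n = 0;
    for (int i = 0; i < d->len; i++) {
        if (d->payload[i] == c) {
            n++;
        }
    }
    return n;
}
```
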
Deputy
- Dependent type system for C
- Programmer adds annotations to pointers

    int sum(int * COUNT(len) array, int len)

    struct data {
        int len;
        char * COUNT(len) payload;
    };

- Compiler now has enough information to check memory accesses
  - Automatically inserts checks to ensure correct usage
- Compiler employs the same metadata already present in the program
  - Smaller memory overhead
- Inserted checks can often be proved redundant and optimized out

    for (i=0; i<len; i++) {
        assert(i>=0 && i<len);
        sum += array[i];
    }

- However, Deputy is based on CIL, which is C-only
  - C++ suffers from the same problems

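The annotated code with its check can be tried with an ordinary C compiler. In this sketch `COUNT` is defined as an empty macro, which is an assumption made purely so the code compiles without Deputy; the `assert` is written by hand where the slide shows the tool inserting one, and the name `sum_with_checks` is hypothetical.

```c
#include <assert.h>

/* Assumption: COUNT expands to nothing here, so that an ordinary
   C compiler accepts the annotated declarations. */
#define COUNT(n)

int sum_with_checks(int * COUNT(len) array, int len) {
    int sum = 0;
    for (int i = 0; i < len; i++) {
        /* Hand-written stand-in for the inserted check. The loop starts
           at 0 and the condition gives i < len, so an optimizer can
           prove the check redundant and remove it. */
        assert(i >= 0 && i < len);
        sum += array[i];
    }
    return sum;
}

struct data {
    int len;
    char * COUNT(len) payload;
};
```

The annotation adds no field and no extra word of storage: `len` was already part of the interface, so the data layout is the same as in the unannotated program.
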
LLVM, Clang, Týr
- Our approach: use LLVM instead of CIL
- LLVM is a language-agnostic framework for
  - compilation and optimization
  - code analysis and transformation in general
  - designed around a typed assembly-like language (LLVM IR)
- Clang is a C/C++ compiler which emits LLVM IR
- We propose a dependent type system for LLVM IR, called Týr
  - Supports both C and C++ by targeting LLVM IR
  - LLVM/Clang are actively developed, unlike CIL

Týr-1
[Pipeline diagram: foo.c -> Clang -> foo.ll -> Týr-1 -> foo.ll + checks; other nodes shown: Annotation extractor, foo.dep, User diagnostics]
- Compile C/C++ to LLVM
- Check pointer usage against provided annotations
- Insert run-time checks
- Insert tracing information

Týr-2
[Pipeline diagram: foo.ll + checks -> LLVM opt -> foo.ll + chk/opt -> Týr-2 -> foo.ll* -> LLVM assembler -> Machine code; other nodes shown: Annotation extractor, foo.dep, User diagnostics]
- Run the rest of the LLVM pipeline (optimizations)
- Look for checks which were found to be always false
  - Static error
- Remove tracing information and generate machine code

Type system
- Replaces the LLVM IR pointer constructor (τ*) with two new types:
  - Ptr τ, low-bound, high-bound: bounded pointer
  - LocalVar τ: pointer to a local variable in the stack
- Defines rules which ensure checks will be performed
  - when a pointer is accessed: is this access valid?
  - when metadata is modified: does this break any invariant?

    int f(int * COUNT(len) array, int len) {
        array[5] = 42;    // is this within bounds?
        len = len + 1;    // are these new bounds valid?
    }

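The two questions asked in the example can be phrased as two run-time predicates. The sketch below is one plausible reading for a `COUNT(len)` pointer and is not the set of rules actually defined by Týr; the function names are hypothetical. An access is valid when the index lies in [0, len), and a new value for `len` is accepted only if it does not claim more elements than the old one guaranteed.

```c
#include <stdbool.h>

/* Check for a pointer access: is index within [0, len)? */
bool access_in_bounds(int index, int len) {
    return index >= 0 && index < len;
}

/* Check for a metadata update (assumed rule): the new count must not
   promise more elements than the old count already guaranteed. */
bool new_count_valid(int old_len, int new_len) {
    return new_len >= 0 && new_len <= old_len;
}

/* Store guarded by the access check; false means the check failed. */
bool store_checked(int *array, int len, int index, int value) {
    if (!access_in_bounds(index, len)) {
        return false;
    }
    array[index] = value;
    return true;
}
```

Under this reading, `array[5] = 42` passes only when `len` is at least 6, and `len = len + 1` is always rejected, since nothing guarantees that the extra element exists.
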
Current status
- Done
  - Formal rules for typechecking and insertion of checks
  - Initial work on building the LLVM module
- Next steps
  - Implementation of the rules within the LLVM module
  - Experimental validation (performance, coverage)
  - Proof of correctness of the type system

Related work
- Hardware-based approaches
  - Watchdog (Nagarakatte et al.) (ISCA 2012, CGO 2014)
  - Uses hardware to speed up pointer bounds verification
- Automatic instrumentation of legacy code
  - SoftBound (Nagarakatte et al.) (PLDI 2009)
  - SAFECode (Dhurjati et al.) (PLDI 2006)
  - CCured (Necula et al.) (TOPLAS 2005)
  - Keep their own (possibly redundant) metadata
- Safe dialects of C
  - Cyclone (Jim et al.) (USENIX 2002)
  - Replaces unsafe C constructions with better-behaved ones

Conclusion
- Approach based on dependent types
  - Makes information already latent in C/C++ programs explicit
  - Compiler can enforce invariants described by the programmer
- No change in data representation
  - Allows partial/gradual migration
  - Compatibility with external libraries
- Low overhead
  - Reuses already existing information
  - Compiler-inserted checks can be optimized