Týr: a dependent type system for spatial memory safety in LLVM




Týr: a dependent type system for spatial memory safety in LLVM
Vítor De Araújo
Álvaro Moreira (advisor)
Rodrigo Machado (co-advisor)
August 13, 2015

Memory safety
Higher-level languages are usually memory-safe
    A program cannot make an invalid access to memory
This usually has:
    a static component (e.g., type system disallows invalid type casts)
    a dynamic component (e.g., run-time bounds checking)
        data carries enough metadata with it to allow checking
        e.g., array is stored as length followed by elements
    int[] array = new int[] { 23, 42, 81 };
    Memory layout: 3 | 23 | 42 | 81
    array[1] = 13;    (works)
    array[42] = 13;   (throws exception)
    array[i] = 13;    (may or may not throw exception: dynamic check)
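
The length-prefixed layout and the dynamic check above can be sketched in C. This is only an illustration, not how any particular language runtime is implemented: the names checked_array, checked_array_new, checked_store and demo_stores are hypothetical, and a failed check returns false where a managed language would throw an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout mimicking a managed-language array:
   the length is stored first, followed by the elements. */
typedef struct {
    size_t length;
    int elems[];            /* C99 flexible array member */
} checked_array;

/* Allocate an array of n zero-initialized ints; returns NULL on failure. */
static checked_array *checked_array_new(size_t n) {
    checked_array *a = calloc(1, sizeof *a + n * sizeof(int));
    if (a != NULL) a->length = n;
    return a;
}

/* Store with a run-time bounds check against the stored length.
   Returns false on an out-of-range index instead of throwing. */
static bool checked_store(checked_array *a, size_t i, int value) {
    if (i >= a->length) return false;
    a->elems[i] = value;
    return true;
}

/* Mirrors the three stores on the slide for a 3-element array:
   index 1, index 42, and a caller-chosen index i.
   Returns how many of the stores were accepted. */
static int demo_stores(size_t i) {
    checked_array *a = checked_array_new(3);
    assert(a != NULL);
    int accepted = 0;
    accepted += checked_store(a, 1, 13);    /* in range */
    accepted += checked_store(a, 42, 13);   /* rejected */
    accepted += checked_store(a, i, 13);    /* depends on i */
    free(a);
    return accepted;
}
```

The check is only possible because the length travels with the data; that is exactly the metadata a plain C array lacks.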

Memory safety (or lack thereof)
In contrast, C data carries no metadata
    int[3] is just three contiguous integers in memory
    int array[] = { 23, 42, 81 };
    Memory layout: 23 | 42 | 81
Language enforces no memory safety
    array[1] = 13;
    array[42] = 13;
    array[i] = 13;
    (an invalid memory access may segfault, or may overwrite other program data)
Programmer has full control of data representation (no metadata)
Programmer can decide when checks are needed
However, very error-prone
    Source of bugs, security vulnerabilities (e.g., Heartbleed)

Recovering memory safety
How can we recover memory safety in C programs?
Traditional solution: add metadata to allow checking
This has a number of drawbacks:
    It changes memory representation of objects
        requires recompilation of everything (external libraries, OS syscalls)
    C pointers can point to any part of an object
        No simple/cheap way to find metadata from an arbitrary pointer
        Pointers themselves must carry bounds, or separate data structure must be looked up
        Changes representation and/or is expensive
But there is another way...
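
The "pointers themselves must carry bounds" option can be sketched as a so-called fat pointer. This is a minimal illustration under assumed names (fat_ptr, fat_from_array, fat_add, fat_store, fat_ptr_is_larger are all hypothetical), not the design of any specific tool. The position is kept as an offset rather than a raw address so that moving it out of range stays well-defined C.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: the pointer carries its own bounds. */
typedef struct {
    int *base;       /* start of the underlying object */
    ptrdiff_t len;   /* number of elements in the object */
    ptrdiff_t pos;   /* current offset; checked only on access */
} fat_ptr;

static fat_ptr fat_from_array(int *array, ptrdiff_t len) {
    fat_ptr p = { array, len, 0 };
    return p;
}

/* Pointer arithmetic keeps the bounds, so a pointer into the middle
   of the object can still be checked later. */
static fat_ptr fat_add(fat_ptr p, ptrdiff_t offset) {
    p.pos += offset;
    return p;
}

/* Checked store through the fat pointer; false means out of bounds. */
static bool fat_store(fat_ptr p, int value) {
    if (p.pos < 0 || p.pos >= p.len) return false;
    p.base[p.pos] = value;
    return true;
}

/* The representation change: a fat pointer is larger than int *,
   so code compiled against plain int * cannot consume it unchanged. */
static bool fat_ptr_is_larger(void) {
    return sizeof(fat_ptr) > sizeof(int *);
}
```

The size difference is the compatibility problem named above: every library that receives such a pointer would have to be recompiled to understand it.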

Recovering memory safety
A correct C program has no memory safety violations
    Programmer must keep track of array bounds, etc., manually
Common idioms
    int sum(int *array, int len)
    struct data { int len; char *payload; };
Bounds information is already present in C programs
    But in an ad-hoc way that the compiler cannot check
Solution: allow the programmer to formally express these relationships
    So that the compiler can validate their correct usage

Deputy
Dependent type system for C
Programmer adds annotations to pointers
    int sum(int * COUNT(len) array, int len)
    struct data { int len; char * COUNT(len) payload; };
Compiler now has enough information to check memory access
    Automatically insert checks to ensure correct usage
Compiler employs the same metadata already present in the program
    Smaller memory overhead
Inserted checks can often be proved redundant and optimized out
    for (i=0; i<len; i++) { assert(i>=0 && i<len); sum += array[i]; }
However, Deputy is based on CIL, which is C-only
    C++ suffers from the same problems
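
A compilable approximation of the annotated idioms above, for illustration only. It assumes COUNT is defined as a no-op macro so that an ordinary C compiler accepts the annotations; the asserts and the explicit test in payload_at (a hypothetical helper) stand in for the checks that an annotation-aware compiler would insert automatically.

```c
#include <assert.h>

/* Assumption: COUNT expands to nothing here, so the annotation is
   documentation only. A checker such as Deputy would read the bound. */
#define COUNT(n)

struct data {
    int len;
    char * COUNT(len) payload;   /* payload has len elements */
};

/* The assert stands in for a compiler-inserted bounds check. */
static int sum(int * COUNT(len) array, int len) {
    int total = 0;
    for (int i = 0; i < len; i++) {
        /* Implied by the loop condition, so an optimizer can remove it. */
        assert(i >= 0 && i < len);
        total += array[i];
    }
    return total;
}

/* Hypothetical accessor: reads payload[i] only if i is within the
   bound recorded in the same struct. Returns 1 on success, 0 otherwise. */
static int payload_at(const struct data *d, int i, char *out) {
    if (i < 0 || i >= d->len) return 0;
    *out = d->payload[i];
    return 1;
}
```

No new field is introduced anywhere: the bound used by each check is the len that the program already carried.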

LLVM, Clang, Týr
Our approach: Use LLVM instead of CIL
LLVM is a language-agnostic framework for compilation, optimization, code analysis and transformation in general
    designed around a typed assembly-like language (LLVM IR)
Clang is a C/C++ compiler which emits LLVM IR
We propose a dependent type system for LLVM IR, called Týr
    Support both C and C++ by targeting LLVM IR
    LLVM/Clang are actively developed, unlike CIL

Týr-1
[Diagram: foo.c -> Clang -> foo.ll -> Týr-1 -> foo.ll + checks; annotation extractor -> foo.dep; user diagnostics]
Compile C/C++ to LLVM
Check pointer usage against provided annotations
Insert run-time checks
Insert tracing information

Týr-2
[Diagram: annotation extractor -> foo.dep; foo.ll + checks -> LLVM opt -> foo.ll + chk/opt -> Týr-2 -> foo.ll* -> LLVM assembler -> machine code; user diagnostics]
Run the rest of the LLVM pipeline (optimizations)
Look for checks which were found to be always false -> static error
Remove tracing information and generate machine code
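
The idea of sorting checks after optimization can be sketched with a toy model. This is an assumption-laden illustration, not Týr's implementation: a check is modelled as 0 <= index < count, each operand is either a known constant or unknown, and the names bounds_check, check_status and classify are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    CHECK_ALWAYS_TRUE,    /* redundant: can be removed */
    CHECK_ALWAYS_FALSE,   /* can never pass: report a static error */
    CHECK_UNKNOWN         /* must remain as a run-time check */
} check_status;

/* Toy model of a bounds check `0 <= index < count` after optimization. */
typedef struct {
    bool index_known; long index;
    bool count_known; long count;
} bounds_check;

static check_status classify(bounds_check c) {
    /* A negative constant index fails whatever the count is. */
    if (c.index_known && c.index < 0) return CHECK_ALWAYS_FALSE;
    /* A count of zero or less admits no valid index at all. */
    if (c.count_known && c.count <= 0) return CHECK_ALWAYS_FALSE;
    /* Both constants known (and index >= 0 here): decide statically. */
    if (c.index_known && c.count_known)
        return c.index < c.count ? CHECK_ALWAYS_TRUE : CHECK_ALWAYS_FALSE;
    return CHECK_UNKNOWN;
}
```

In this model only the CHECK_UNKNOWN cases would cost anything at run time; the other two are resolved before machine code is generated.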

Type system
Replaces LLVM IR pointer constructor (τ*) with two new types:
    Ptr τ, low-bound, high-bound: bounded pointer
    LocalVar τ: pointer to local variable in the stack
Defines rules which ensure checks will be performed
    when a pointer is accessed: is this access valid?
    when metadata is modified: does this break any invariant?
    int f(int * COUNT(len) array, int len) {
        array[5] = 42;   // is this within bounds?
        len = len + 1;   // are these new bounds valid?
    }
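
The two kinds of check can be made explicit by hand for the function f above. This is a sketch, not the formal rules: in particular, the rule used for metadata updates (a new count is accepted only if it does not exceed the old one, since nothing guarantees memory beyond the old bound) is an assumption made for illustration, and the names access_ok, count_update_ok and f_checked are hypothetical.

```c
#include <stdbool.h>

typedef enum { F_OK, F_BAD_ACCESS, F_BAD_COUNT_UPDATE } f_result;

/* Check for an access array[index] through a pointer annotated COUNT(len). */
static bool access_ok(long long index, long long len) {
    return index >= 0 && index < len;
}

/* Check for an assignment to len while array still depends on it.
   Assumed rule: the bound may shrink but never grow. */
static bool count_update_ok(long long old_len, long long new_len) {
    return new_len >= 0 && new_len <= old_len;
}

/* f from the slide with both checks written out by hand.
   Returns which check failed, where real instrumentation would abort. */
static f_result f_checked(int *array, int len) {
    if (!access_ok(5, len)) return F_BAD_ACCESS;
    array[5] = 42;                       /* reached only when 5 < len */

    /* len + 1 is computed in a wider type to avoid signed overflow. */
    if (!count_update_ok(len, (long long)len + 1)) return F_BAD_COUNT_UPDATE;
    return F_OK;
}
```

Under the assumed update rule, len = len + 1 is always rejected, which matches the intuition that growing the declared bound cannot be justified without further information about the underlying allocation.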

Current status
Done
    Formal rules for typechecking and insertion of checks
    Initial work on building the LLVM module
Next steps
    Implementation of the rules within LLVM module
    Experimental validation (performance, coverage)
    Proof of correctness of the type system

Related work
Hardware-based approaches
    Watchdog (Nagarakatte et al.) (ISCA 2012, CGO 2014)
        Uses hardware to speed up pointer bounds verification
Automatic instrumentation of legacy code
    SoftBound (Nagarakatte et al.) (PLDI 2009)
    SAFECode (Dhurjati et al.) (PLDI 2006)
    CCured (Necula et al.) (TOPLAS 2005)
        Keep their own (possibly redundant) metadata
Safe dialects of C
    Cyclone (Jim et al.) (USENIX 2002)
        Replaces unsafe C constructions with more well-behaved constructions

Conclusion
Approach based on dependent types
    Makes information already latent in C/C++ programs explicit
    Compiler can enforce invariants described by the programmer
No change in data representation
    Allows partial/gradual migration
    Compatibility with external libraries
Low overhead
    Reuse already existing information
    Compiler-inserted checks can be optimized