The CompCert verified C compiler



Compiler built and proved by Xavier Leroy et al.
Talk given by David Pichardie - Harvard University / INRIA Rennes
Slides largely inspired by Xavier Leroy's own material

Critical embedded software

A high degree of assurance is required: is the program critical_prog.ppc safe?

Two options:
- qualify the PPC program as if it were hand-written (intensive testing, painful manual review...)
- qualify the program at the source level (static analysis, model checking, or program proof)

The second option is preferred in practice, but can you trust your compiler?

This talk: apply formal verification techniques to the compiler itself!

Miscompilation happens

"NULLSTONE isolated defects [in integer division] in twelve of twenty commercially available compilers that were evaluated."
http://www.nullstone.com/htmls/category/divide.htm

"We tested thirteen production-quality C compilers and, for each, found situations in which the compiler generated incorrect code for accessing volatile variables."
E. Eide & J. Regehr, EMSOFT 2008

"We created a tool that generates random C programs, and then spent two and a half years using it to find compiler bugs. So far, we have reported more than 290 previously unknown bugs to compiler developers. Moreover, every compiler that we tested has been found to crash and also to silently generate wrong code when presented with valid inputs."
J. Regehr et al, 2010

How do we verify a compiler?

A simple idea: program and prove your compiler in the same language!

Which language? Coq

Coq: an animal with two faces

First face: a proof assistant that lets you interactively build proofs in constructive logic.

Second face: a functional programming language with a very rich type system, for example

    sort: ∀ l: list int, { l' : list int | Sorted l' ∧ PermutationOf l l' }

with an extraction mechanism to OCaml:

    sort: int list → int list
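The type of sort above bundles the result with two properties. As a rough illustration only, the following Python sketch states the same two properties as runtime predicates. The names is_sorted, is_permutation_of and checked_sort are hypothetical, and a runtime assertion is much weaker than a proof checked once and for all by Coq.

```python
from collections import Counter
from typing import List


def is_sorted(xs: List[int]) -> bool:
    """Sorted l': every element is <= its successor."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def is_permutation_of(xs: List[int], ys: List[int]) -> bool:
    """PermutationOf l l': same elements with the same multiplicities."""
    return Counter(xs) == Counter(ys)


def checked_sort(l: List[int]) -> List[int]:
    """Insertion sort whose result is checked against the specification."""
    result: List[int] = []
    for x in l:
        i = 0
        while i < len(result) and result[i] <= x:
            i += 1
        result.insert(i, x)
    # In Coq these two facts are proved statically; here they are only
    # asserted on each call (an assumption of this sketch).
    assert is_sorted(result) and is_permutation_of(l, result)
    return result
```

After extraction, only the computational part survives, which is why the OCaml type is simply int list → int list.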

Verifying a compiler in Coq: a simple recipe

1. Program the compiler inside Coq:

    Definition compiler (p: source) : option target := ...

2. State its correctness with respect to a formal specification of the language semantics:

    Theorem compiler_is_correct :
      ∀ p, compiler p = Some p' → behaviors(p') ⊆ behaviors(p)

   If compilation succeeds, the generated code should behave as prescribed by the semantics of the source program.

3. Interactively and mechanically prove this theorem:

    Proof. ... (* few days later *) ... Qed.

4. Extract an OCaml implementation of the compiler:

    Extraction compiler.

[Diagram: the compiler, the source and target language semantics, and the soundness proof all sit inside the logical framework (here Coq); the extraction step shows parser.ml, compiler.ml, linker.ml.]
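To make the shape of the recipe concrete, here is a minimal Python sketch on a toy language. Everything in it is an assumption made for illustration: the expression language, the stack machine, the 16-bit constant limit that makes compilation partial, and the function names. It has nothing to do with CompCert's actual languages, and respects_semantics only tests the correctness statement on one input at a time, whereas the Coq theorem is proved for all programs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Source language: arithmetic expressions (hypothetical toy language).
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Div:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Add, Div]
# Target language: stack machine instructions ("push", n), ("add", 0), ("div", 0).
Instr = Tuple[str, int]


def source_behavior(e: Expr) -> Optional[int]:
    """Source semantics: the value of e, or None if e goes wrong (division by zero)."""
    if isinstance(e, Const):
        return e.value
    a, b = source_behavior(e.left), source_behavior(e.right)
    if a is None or b is None:
        return None
    if isinstance(e, Add):
        return a + b
    return None if b == 0 else a // b


def target_behavior(code: List[Instr]) -> Optional[int]:
    """Target semantics: the final stack top, or None if the machine goes wrong."""
    stack: List[int] = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
            continue
        if len(stack) < 2 or op not in ("add", "div"):
            return None
        b, a = stack.pop(), stack.pop()
        if op == "div" and b == 0:
            return None
        stack.append(a + b if op == "add" else a // b)
    return stack[0] if len(stack) == 1 else None


def compiler(e: Expr) -> Optional[List[Instr]]:
    """Partial compiler: None means compilation failed (constant outside 16 bits)."""
    if isinstance(e, Const):
        return [("push", e.value)] if -2**15 <= e.value < 2**15 else None
    left, right = compiler(e.left), compiler(e.right)
    if left is None or right is None:
        return None
    return left + right + [("add" if isinstance(e, Add) else "div", 0)]


def respects_semantics(e: Expr) -> bool:
    """The correctness statement, checked on a single input."""
    code = compiler(e)
    if code is None:
        return True  # compilation failed: nothing is claimed
    expected = source_behavior(e)
    if expected is None:
        return True  # source goes wrong: nothing is claimed
    return target_behavior(code) == expected
```

The option return type of compiler mirrors the slide: the theorem only constrains the output when compilation succeeds.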

Trusted Computing Base (TCB)

1. Formal specification of the programming language semantics
   - already (informally) shared between any end-user programmer, compiler, and static analyzer
   - must itself be rigorously proofread/animated/tested

2. Logical framework
   - only the proof checker needs to be trusted
   - we don't trust sophisticated decision procedures

Still a large code base, but at least a foundational code base: logic & semantics.

[Diagram: compiler, soundness proof, and source & target language semantics inside the logical framework (here Coq), which contains the proof checker.]

The CompCert project
X. Leroy, S. Blazy et al.

Develop and prove correct a realistic compiler, usable for critical embedded software.
- Source language: a very large subset of C.
- Target language: PowerPC/ARM/x86 assembly.
- Generates reasonably compact and fast code: careful code generation; some optimizations.

Note: the compiler is written from scratch, along with its proof; the project is not trying to prove an existing compiler (for that, see Zdancewic et al.'s Verified LLVM project).

The formally verified part of the compiler

CompCert C
  → side-effects out of expressions
Clight
  → type elimination, loop simplifications
C#minor
  → stack allocation of variables
Cminor
  → instruction selection
CminorSel
  → CFG construction, expr. decomp.
RTL (optimizations: constant prop., CSE, tail calls)
  → register allocation
LTL
  → linearization of the CFG
LTLin
  → spilling, reloading, calling conventions
Linear
  → layout of stack frames
Mach
  → asm code generation
ASM

Formally verified in Coq

After 50 000 lines of Coq and 4 person-years of effort:

    Theorem transf_c_program_is_refinement:
      ∀ p tp,
      transf_c_program p = OK tp →
      (∀ beh, exec_c_program p beh → not_wrong beh) →
      (∀ beh, exec_asm_program tp beh → exec_c_program p beh).

Behaviors beh = termination / divergence / going wrong + trace of I/O operations (syscalls, volatile accesses).
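The refinement statement can be read as a property of two sets of behaviors. The Python sketch below models a behavior as an (outcome, trace) pair and a program as the finite set of behaviors it may exhibit; this finite-set model and the names Behavior, not_wrong and is_refinement are assumptions made for illustration, not CompCert definitions.

```python
from typing import FrozenSet, Tuple

# A behavior: an outcome ("terminates", "diverges" or "goes_wrong")
# together with the trace of I/O events observed along the way.
Behavior = Tuple[str, Tuple[str, ...]]


def not_wrong(beh: Behavior) -> bool:
    return beh[0] != "goes_wrong"


def is_refinement(source: FrozenSet[Behavior], target: FrozenSet[Behavior]) -> bool:
    """If no source behavior goes wrong, every target behavior must be a source behavior."""
    if not all(not_wrong(b) for b in source):
        return True  # the premise fails, so the statement promises nothing
    return target <= source


# A source program with two allowed evaluation orders (hypothetical example).
source = frozenset({
    ("terminates", ("print a", "print b")),
    ("terminates", ("print b", "print a")),
})
# The compiled code commits to one of them: a valid refinement.
compiled = frozenset({("terminates", ("print a", "print b"))})
# A miscompiled program drops an I/O event: not a refinement.
miscompiled = frozenset({("terminates", ("print a",))})
```

Here is_refinement(source, compiled) holds while is_refinement(source, miscompiled) does not. Note that the target may have fewer behaviors than the source, which is what lets the compiler resolve non-determinism in the source semantics.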

Compiler verification patterns (for each pass)

- Verified transformation: the transformation itself is formally verified.
- Verified translation validation: the transformation is not verified; a formally verified validator checks its result.
- External solver with verified validation: the transformation calls an untrusted solver, and a formally verified checker validates the solver's answer.

[Diagram legend: each box is marked either "formally verified" or "not verified".]

External solver with verified validation

Example: register allocation (RTL → LTL). The graph coloring heuristics play the role of the untrusted solver; a graph coloring checker validates their result.
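The division of labour in this pattern can be sketched in a few lines of Python. The greedy heuristic, the dictionary-based interference graph and all function names are hypothetical stand-ins; the point is only that the checker is small and simple enough to verify, so the heuristic can be arbitrarily clever (or buggy) without threatening correctness. Returning None on a rejected coloring is a simplification of this sketch.

```python
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]  # interference graph: node -> set of neighbours


def untrusted_coloring(graph: Graph, k: int) -> Dict[str, int]:
    """Greedy heuristic standing in for an arbitrary untrusted solver."""
    coloring: Dict[str, int] = {}
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        used = {coloring[m] for m in graph[node] if m in coloring}
        free = [c for c in range(k) if c not in used]
        # When no color is free, the heuristic returns a wrong answer on purpose.
        coloring[node] = free[0] if free else 0
    return coloring


def check_coloring(graph: Graph, k: int, coloring: Dict[str, int]) -> bool:
    """The part that would be formally verified: is this a valid k-coloring?"""
    for node, neighbours in graph.items():
        color = coloring.get(node)
        if color is None or not (0 <= color < k):
            return False
        for other in neighbours:
            if other != node and coloring.get(other) == color:
                return False
    return True


def allocate_registers(graph: Graph, k: int) -> Optional[Dict[str, int]]:
    """Run the untrusted solver, keep its answer only if the checker accepts it."""
    coloring = untrusted_coloring(graph, k)
    return coloring if check_coloring(graph, k, coloring) else None


# Three mutually interfering variables need three registers.
triangle: Graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```

With this triangle graph, allocate_registers(triangle, 3) returns a valid assignment, while allocate_registers(triangle, 2) returns None because the checker rejects the heuristic's answer.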

Verified translation validation

Example: SSA generation (in the CompCert SSA extension), from RTL to SSA, with an untrusted SSA generator followed by a validator.

The untrusted generator can rely on advanced graph algorithms such as Lengauer and Tarjan's dominator tree construction and dominance frontier computation.

We prove the validator is complete with respect to this family of algorithms.
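As a much simplified illustration of translation validation, the Python sketch below handles straight-line code only: no control flow, hence no phi functions and no dominators. It is not the CompCert SSA validator. The instruction format and function names are hypothetical, and source variable names are assumed to contain no underscore.

```python
from typing import Dict, List, Set, Tuple

# One instruction: (destination, operator, arguments).
Instr = Tuple[str, str, Tuple[str, ...]]


def untrusted_ssa(code: List[Instr]) -> List[Instr]:
    """Untrusted generator: rename each definition of x to x_1, x_2, ..."""
    version: Dict[str, int] = {}
    out: List[Instr] = []
    for dest, op, args in code:
        # Variables never defined so far are program inputs and keep their name.
        new_args = tuple(f"{a}_{version[a]}" if a in version else a for a in args)
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}_{version[dest]}", op, new_args))
    return out


def validate_ssa(code: List[Instr], ssa: List[Instr]) -> bool:
    """Validator: accept only if ssa is a faithful single-assignment renaming of code."""
    if len(code) != len(ssa):
        return False
    defined: Set[str] = set()
    current: Dict[str, str] = {}  # source variable -> SSA name of its latest value
    for (dest, op, args), (sdest, sop, sargs) in zip(code, ssa):
        if sop != op or len(sargs) != len(args):
            return False
        # Each use must read the latest SSA name of the variable (or the input).
        for a, sa in zip(args, sargs):
            if current.get(a, a) != sa:
                return False
        # Each destination must be a fresh, versioned name of the same variable.
        if "_" not in sdest or sdest.split("_")[0] != dest or sdest in defined:
            return False
        defined.add(sdest)
        current[dest] = sdest
    return True


program: List[Instr] = [
    ("x", "add", ("a", "b")),
    ("x", "mul", ("x", "x")),
    ("y", "neg", ("x",)),
]
```

The validator never looks at how the generator works; it only inspects the input and output programs, so the generator can be replaced without touching what has to be trusted.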

The whole CompCert compiler

C source
  → parsing, construction of an AST; type-checking, de-sugaring
AST C
  → verified compiler (side boxes: type reconstruction, graph coloring, code linearization heuristics)
AST Asm
  → printing of asm syntax
Assembly
  → assembling, linking
Executable

[Diagram legend: part of the TCB / not part of the TCB; not proved (hand-written in OCaml) / proved in Coq (extracted to OCaml).]

Performance of generated code (on a PowerPC G5 processor)

[Bar chart: execution time for gcc -O0, CompCert, gcc -O1 and gcc -O3 on the benchmarks AES, Almabench, Binarytrees, Fannkuch, FFT, Knucleotide, Nbody, Qsort, Raytracer, Spectral, VMach. Source: X. Leroy (INRIA), "Verified squared", POPL 2011.]

Conclusions

The formal verification of realistic compilers is feasible (within the limitations of contemporary proof assistants).

Much work remains:
- Shrinking the TCB (e.g. verified parsing, validated assembling & linking).
- More optimizations (see CompCert SSA).
- Front-ends for other languages.
- Concurrency (see Sevcik et al.'s CompCert TSO and Appel et al.'s Verified Software Toolchain).
- Connections with source-level verification (ongoing French project on a verified C static analyzer).