Automated Repair of Binary and Assembly Programs for Cooperating Embedded Devices

Similar documents
Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z)

Language Processing Systems

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

Lecture 27 C and Assembly

Dynamic Behavior Analysis Using Binary Instrumentation

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

64-Bit NASM Notes. Invoking 64-Bit NASM

Search Algorithm in Software Testing and Debugging

Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail

Two Flavors in Automated Software Repair: Rigid Repair and Plastic Repair

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

Return-oriented programming without returns

Physical Data Organization

CS61: Systems Programing and Machine Organization

System/Networking performance analytics with perf. Hannes Frederic Sowa

Embedded Software Development

Buffer Overflows. Security 2011

Lecture 9. Semantic Analysis Scoping and Symbol Table

TODAY, FEW PROGRAMMERS USE ASSEMBLY LANGUAGE. Higher-level languages such

Programming Languages

The Quest for Speed - Memory. Cache Memory. A Solution: Memory Hierarchy. Memory Hierarchy

Parallelization: Binary Tree Traversal

Lecture 26: Obfuscation

Automating Mimicry Attacks Using Static Binary Analysis

CSC230 Getting Starting in C. Tyler Bletsch

CSE 141L Computer Architecture Lab Fall Lecture 2

Lexical Analysis and Scanning. Honors Compilers Feb 5 th 2001 Robert Dewar

Off-by-One exploitation tutorial

Chapter 7D The Java Virtual Machine

Raima Database Manager Version 14.0 In-memory Database Engine

Applying Clang Static Analyzer to Linux Kernel

CSCI 3136 Principles of Programming Languages

Toward Improving Graftability on Automated Program Repair

Object-Oriented Software Engineering THE TOOLS OF THE TRADE CHAPTER 5. Stephen R. Schach 5.1 Stepwise Refinement.

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007

DATA ITEM DESCRIPTION

Machine Programming II: Instruc8ons

Machine-Level Programming II: Arithmetic & Control

Optimizing Linux Performance

The programming language C. sws1 1

A Dozen Years of Shellphish From DEFCON to the Cyber Grand Challenge

Exploiting nginx chunked overflow bug, the undisclosed attack vector

Compilers I - Chapter 4: Generating Better Code

CS:APP Chapter 4 Computer Architecture Instruction Set Architecture. CS:APP2e

Assembly Language: Function Calls" Jennifer Rexford!

The Elective Part of the NSS ICT Curriculum D. Software Development

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Software Vulnerabilities

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

Oracle Solaris Studio Code Analyzer

For a 64-bit system. I - Presentation Of The Shellcode

Efficient Program Exploration by Input Fuzzing

Designing Better Fitness Functions for Automated Program Repair

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

Data Structures and Data Manipulation

Stack Overflows. Mitchell Adair

Leak Check Version 2.1 for Linux TM

Secure Programming with Static Analysis. Jacob West

Machine-Level Programming I: Basics

Persistent Binary Search Trees

Bringing Big Data Modelling into the Hands of Domain Experts

Java in High-Performance Computing

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes

Static Analysis. Find the Bug! : Analysis of Software Artifacts. Jonathan Aldrich. disable interrupts. ERROR: returning with interrupts disabled

Format string exploitation on windows Using Immunity Debugger / Python. By Abysssec Inc

Fast Arithmetic Coding (FastAC) Implementations

Instruction Set Architecture

Symbol Tables. Introduction

Configuration management. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 29 Slide 1

Tailored Source Code Transformations to Synthesize Computationally Diverse Program Variants. Benoit Baudry, Simon Allier, Martin Monperrus

Keil C51 Cross Compiler

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1

Informatica e Sistemi in Tempo Reale

A3 Computer Architecture

Best Practices for Monitoring Databases on VMware. Dean Richards Senior DBA, Confio Software

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

MSc Computer Science Dissertation

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Advanced IBM AIX Heap Exploitation. Tim Shelton V.P. Research & Development HAWK Network Defense, Inc. tshelton@hawkdefense.com

Simplifying Failure-Inducing Input

Optimal Stack Slot Assignment in GCC

CSE 326: Data Structures B-Trees and B+ Trees

DATA STRUCTURES USING C

Jonathan Worthington Scarborough Linux User Group

Exam 1 Review Questions

Transcription:

Automated Repair of Binary and Assembly Programs for Cooperating Embedded Devices Eric Schulte 1 Jonathan DiLorenzo 2 Westley Weimer 2 Stephanie Forrest 1 1 Department of Computer Science University of New Mexico Albuquerque, NM 87131-0001 2 Department of Computer Science University of Virginia Charlottesville, VA 22904-4740 March 19, 2013 ASPLOS 2013 0

i Outline Background Technical Approach Empirical Results Distributed Program Repair Discussion Conclusion b; i if(a! a; int argc, []) main(int int mai argc, char *argv[]) ASPLOS 2013 1

Embedded Devices Computing Device Sales 500 tablet pc smartphone 400 Million Units 300 200 100 0 2008 2009 2010 2011 Years [Maier, 2011] ASPLOS 2013 2

Embedded Devices Resource Constraints Small disks Less memory Slow processors Slow, costly comm. ASPLOS 2013 2

Genprog: Automatically Repairing Software Bugs Use of algorithmic and heuristic methods to search for, generate, and evaluate program repairs. Strengths Repaired 55/105 bugs for $8 each [Le Goues et al., 2012a] Repairs multiple classes of bugs and security defects [Weimer et al., 2009] Wins human-competitive awards [Forrest et al., 2009; Le Goues et al., 2012b] Limitations Requires source code Requires build tool chain Requires program instrumentation Expensive fitness function (compilation, test execution) ASPLOS 2013 3

Software Repair Algorithm: Baseline Input Evolve Fitness Evaluation Output Accept X Discard Mutants ASPLOS 2013 4

Software Repair Algorithm: Contributions if(a==0) puts(b); while(b!=0) How do we mutate? Where do we mutate? if(a>b) puts(a); a=a-b; else b=b-a; ASPLOS 2013 5

Software Repair Algorithm: Contributions if(a==0) puts(b); while(b!=0) How do we mutate? Where do we mutate? if(a>b) puts(a); a=a-b; else b=b-a; ASPLOS 2013 5

ASM and ELF Program Representations Source 1 if (a ==0){ 2 printf ("%g\n", b); } 3 else { 4 while (b!=0){ 5 if (a>b){ a=a- b; } 6 else { b=b-a; } } } 7 printf ("%g\n", a); AST printf("%g\n",b); a=a-b; if(a==0) if(a>b) while(b!=0) else b=b-a; printf("%g\n",a); ASPLOS 2013 6

ASM and ELF Program Representations Source 1 if (a ==0){ 2 printf ("%g\n", b); } 3 else { 4 while (b!=0){ 5 if (a>b){ a=a- b; } 6 else { b=b-a; } } } 7 printf ("%g\n", a); ASM.file "gcd.c".globl main.type main, @function main:.cfi startproc pushq %rbp.cfi def cfa offset 16.cfi offset 6, -16 movq %rsp, %rbp.cfi def cfa register 6 subq $48, %rsp AST if(a==0) printf("%g\n",b); if(a>b) a=a-b; while(b!=0) printf("%g\n",a); else b=b-a; ASPLOS 2013 6

ASM and ELF Program Representations Source 1 if (a ==0){ 2 printf ("%g\n", b); } 3 else { 4 while (b!=0){ 5 if (a>b){ a=a- b; } 6 else { b=b-a; } } } 7 printf ("%g\n", a); AST if(a==0) printf("%g\n",b); if(a>b) a=a-b; while(b!=0) printf("%g\n",a); else b=b-a; ASM.file "gcd.c".globl main.type main, @function main:.cfi startproc pushq %rbp.cfi def cfa offset 16.cfi offset 6, -16 movq %rsp, %rbp.cfi def cfa register 6 subq $48, %rsp ELF ELF\? ELF header program header table section 1....text section [55] [48 89 e5] [48 83 ec 20] [48 89 7d e8] [89 75 e4] [83 7d e4 01] [7e 60]...... section n section header table ASPLOS 2013 6

ASM and ELF Program Mutations Delete Insert Swap movq 8(%rdx), %rdi xorl %eax, %eax movq %rdx, -80(%rbp) addl $1, %r14d call atoi movq -80(%rbp), %rdx movl %eax, (%r15) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax addl $1, %r14d call atoi movq %rdx, -80(%rbp) movl %eax, (%r15) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movq -80(%rbp), %rdx addl $1, %r14d call atoi movq -80(%rbp), %rdx movl %eax, (%r15) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movq -80(%rbp), %rdx addl $1, %r14d call atoi movq %rdx, -80(%rbp) movq -80(%rbp), %rdx movl %eax, (%r15) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movq %rdx, -80(%rbp) addl $1, %r14d call atoi movq -80(%rbp), %rdx movl %eax, (%r15) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movq -80(%rbp), %rdx addl $1, %r14d call atoi movq %rdx, -80(%rbp) movl %eax, (%r15) addq $4, %r15 (... additional ELF bookkeeping... ) ASPLOS 2013 7

Genprog: Automatically Repairing Software Bugs if(a==0) puts(b); while(b!=0) How do we mutate? Where do we mutate? if(a>b) puts(a); a=a-b; else b=b-a; ASPLOS 2013 8

Genprog: Automatically Repairing Software Bugs if(a==0) puts(b); while(b!=0) How do we mutate? Where do we mutate? if(a>b) puts(a); a=a-b; else b=b-a; ASPLOS 2013 8

Genprog: Automatically Repairing Software Bugs if(a==0) puts(b); while(b!=0) How do we mutate? Where do we mutate? if(a>b) puts(a); a=a-b; else b=b-a; Fault Localization ASPLOS 2013 8

Light Weight Fault Localization 1. Sample program counter. 2. Translate memory addresses to program offsets. 3. Smooth sample with Gaussian convolution. CPU movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) addl $1, %r14d call atoi movq -80(%rbp), %rdx movq %rdx, -80(%rbp) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) Machine-code Instructions ASPLOS 2013 9

Light Weight Fault Localization 1. Sample program counter. 2. Translate memory addresses to program offsets. 3. Smooth sample with Gaussian convolution. CPU movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) addl $1, %r14d call atoi movq -80(%rbp), %rdx movq %rdx, -80(%rbp) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) Machine-code Instructions ASPLOS 2013 9

Light Weight Fault Localization 1. Sample program counter. 2. Translate memory addresses to program offsets. 3. Smooth sample with Gaussian convolution. CPU memory addr. to instruction movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) addl $1, %r14d call atoi movq -80(%rbp), %rdx movq %rdx, -80(%rbp) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) Machine-code Instructions ASPLOS 2013 9

Light Weight Fault Localization 1. Sample program counter. 2. Translate memory addresses to program offsets. 3. Smooth sample with Gaussian convolution. CPU memory addr. to instruction movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) addl $1, %r14d call atoi movq -80(%rbp), %rdx movq %rdx, -80(%rbp) addq $4, %r15 movq 8(%rdx), %rdi xorl %eax, %eax movl %eax, (%r15) Machine-code Instructions ASPLOS 2013 9

Fault Localization Comparison Execution Count 700 600 500 400 300 200 100 0 Merge Sort Full Trace Sampling 0x4005D8 0x400510 0x400768 0x4006A0 Address 0x400B50 0x400A88 0x4009C0 0x4008F8 0x400830 ASPLOS 2013 10

Genprog: Automatically Repairing Software Bugs Use of algorithmic and heuristic methods to search for, generate, and evaluate program repairs. Strengths Repaired 55/105 bugs for $8 each [Le Goues et al., 2012a] Repairs multiple classes of bugs and security defects [Weimer et al., 2009] Wins human-competitive awards [Forrest et al., 2009; Le Goues et al., 2012b] Limitations Requires source code Requires build tool chain Requires program instrumentation Expensive fitness function (compilation, test execution) ASPLOS 2013 11

Genprog: Automatically Repairing Software Bugs Use of algorithmic and heuristic methods to search for, generate, and evaluate program repairs. Strengths Repaired 55/105 bugs for $8 each [Le Goues et al., 2012a] Repairs multiple classes of bugs and security defects [Weimer et al., 2009] Wins human-competitive awards [Forrest et al., 2009; Le Goues et al., 2012b] Limitations Requires source code Requires build tool chain Requires program instrumentation Expensive fitness function (compilation, test execution) ASPLOS 2013 11

Benchmark Programs Program Program Description Bug atris graphical tetris game local stack buffer exploi ccrypt encryption utility segfault deroff document processing segfault flex lexical analyzer generator segfault indent source code processing infinite loop look svr4 dictionary lookup infinite loop look ultrix dictionary lookup infinite loop merge merge sort duplicate inputs merge-cpp merge sort (in C++) duplicate inputs s3 sendmail utility buffer overflow uniq duplicate text processing segfault units metric conversion segfault zune embedded media player infinite loop ASPLOS 2013 12

Empirical Results Effective 62% faster runtime 95% smaller disk footprint 86% less memory Total bugs repaired Rep. Num. Bugs AST 13 ASM 12 ELF 11 Average success rate 100 runs per bug Rep. Success Rate AST 78.17% ASM 70.75% ELF 65.83% ASPLOS 2013 13

Empirical Results Effective 62% faster runtime 95% smaller disk footprint 86% less memory Expected fitness evaluations Rep. Evaluations AST 583.98 ASM 188.38 ELF 207.15 Total runtime Rep. Sec. AST 229.50 ASM 278.30 ELF 74.20 ASPLOS 2013 13

Empirical Results Effective 62% faster runtime 95% smaller disk footprint 86% less memory Example: Merge Sort Repair by representation AST, 2 of 4900 Swaps ASM, 1 of 280 Deletes merge.c 44 if( left [l-mid -1] <= right [0]){ 45 result = list ; } 46 else {/* fix : swap branches */ 47 result = merge (left,l-mid, 48 right, mid ); } merge.s 210 cmpl % eax, % edx ; fix : del. 211 jg.l12 212 movq -72(% rbp ), % rax ASPLOS 2013 13

Empirical Results Disk size Effective 62% faster runtime 95% smaller disk footprint 86% less memory Rep. AST ASM ELF Requirements Source code & build toolchain Assembly code & linker Compiled executable ASPLOS 2013 13

Empirical Results Working memory Effective 62% faster runtime 95% smaller disk footprint 86% less memory Rep. MB AST 1402 ASM 756 ELF 200 ASPLOS 2013 13

Distributed Genetic Repair Algorithm 1 4 2 3 ASPLOS 2013 14

Distributed Genetic Repair Algorithm 1 4 2 3 ASPLOS 2013 14

Distributed Genetic Repair Algorithm delete(45) 1 swap(23,86) 4 2 insert(4,23) 3 delete(4) ASPLOS 2013 14

Distributed Genetic Repair Evaluation Relative performance of DGA Expected Wall Clock # Nodes Fitness Evaluations Seconds w/sms 1 1 1 1 2 0.94 0.89 1.07 3 0.84 0.67 0.81 4 0.80 0.55 0.63 ASPLOS 2013 15

Distributed Genetic Repair Evaluation Relative performance of DGA Expected Wall Clock # Nodes Fitness Evaluations Seconds w/sms 1 1 1 1 2 0.94 0.89 1.07 3 0.84 0.67 0.81 4 0.80 0.55 0.63 ASPLOS 2013 15

Distributed Genetic Repair Evaluation Relative performance of DGA Expected Wall Clock # Nodes Fitness Evaluations Seconds w/sms 1 1 1 1 2 0.94 0.89 1.07 3 0.84 0.67 0.81 4 0.80 0.55 0.63 ASPLOS 2013 15

Discussion ASM & ELF search space Search space System protection Program size 3 more assembly instructions than C statements Search space size program size = alphabet Possible program coverage Possible Programs Reachable Programs Original Program ASPLOS 2013 16

Discussion Search space System protection System protection Arbitrary assembly is arbitrarily dangerous Light weight sandboxing (ulimit and chroot) ASPLOS 2013 16

Conclusion ASM and ELF representation Remove requirement for source code and build toolchain Language Agnostic; x86 or ARM assembly or ELF Change program repair search space Reduce resources; 95% smaller disk footprint, 86% less memory, 62% faster runtime Distributed Genetic Program Repair Allows multiple devices to collaborate Fewer fitness evaluations and faster runtime ASPLOS 2013 17

Conclusion Take away Assembly mutations cause semantic changes! Software is not brittle! ASPLOS 2013 17

Conclusion Take away Assembly mutations cause semantic changes! Software is not brittle! ASPLOS 2013 17

Thank You Contact Name Email Homepage Eric Schulte eschulte@cs.unm.edu http://cs.unm.edu/~eschulte Code Program Repair Tool ptrace Tracer ELF (C) ELF (Common Lisp) http://genprog.cs.virginia.edu http://github.com/eschulte/tracer http://github.com/eschulte/rw-elf http://github.com/eschulte/elf ASPLOS 2013 18

Bibliography S. Forrest, W. Weimer, T. Nguyen, and C. Le Goues. A genetic programming approach to automated software repair. In Genetic and Evolutionary Computing Conference, 2009. C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In International Conference on Software Engineering, 2012a. C. Le Goues, W. Weimer, and S. Forrest. Representations and operators for improving evolutionary software repair. In Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference, pages 959 966. ACM, 2012b. D. Maier. Sales of smartphones and tablets to exceed pcs. Practical Ecommerce, October 2011. http://www.practicalecommerce.com/articles/ 3069-Sales-of-Smartphones-and-Tablets-to-Exceed-PCs-. W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. Automatically finding patches using genetic programming. In International Conference on Software Engineering, pages 364 367, 2009. ASPLOS 2013 19

Backup Slides ASPLOS 2013 20

Backup: Wall Clock Times Mean wall clock time in seconds to find a successful repair DGA Naïve Parallel # Nodes Seconds Rounds Seconds 1 205.531 2 173.868 43.2 195.821 3 135.17 28.2 201.346 4 115.566 14.5 211.989 ASPLOS 2013 21

Backup: Software Mutational Robustness Semantic Space, Specification & Test Suite Specification Program Test Suite ASPLOS 2013 22

Backup: Software Mutational Robustness Semantic Space, Specification & Test Suite Specification Program 1 Test Suite ASPLOS 2013 22

Backup: Software Mutational Robustness Semantic Space, Specification & Test Suite 2 Specification Program 1 Test Suite ASPLOS 2013 22

Backup: Software Mutational Robustness Semantic Space, Specification & Test Suite 2 Specification Program 1 3 Test Suite ASPLOS 2013 22

Backup: Software Mutational Robustness Semantic Space, Specification & Test Suite 2 Specification 4 Program 1 3 Test Suite ASPLOS 2013 22

Backup: Software Mutational Robustness Semantic Space, Specification & Test Suite 2 Specification 4 5 Program 1 3 Test Suite ASPLOS 2013 22