Static Taint-Analysis on Binary Executables



Similar documents
COMP 356 Programming Language Structures Notes for Chapter 10 of Concepts of Programming Languages Implementing Subprograms.

A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation

Detecting Software Vulnerabilities Static Taint Analysis

Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities (Technical Report)

Platform-independent static binary code analysis using a metaassembly

Obfuscation: know your enemy

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Static Analysis. Find the Bug! : Analysis of Software Artifacts. Jonathan Aldrich. disable interrupts. ERROR: returning with interrupts disabled

Address Taken FIAlias, which is equivalent to Steensgaard. CS553 Lecture Alias Analysis II 2

Review; questions Discussion of Semester Project Arbitrary interprocedural control flow Assign (see Schedule for links)

Static Analysis of Virtualization- Obfuscated Binaries

1 Classical Universal Computer 3

How To Trace

Static Program Transformations for Efficient Software Model Checking

Regression Verification: Status Report

X86-64 Architecture Guide

CIS570 Modern Programming Language Implementation. Office hours: TDB 605 Levine

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Write Barrier Removal by Static Analysis

General Flow-Sensitive Pointer Analysis and Call Graph Construction

Software Testing & Analysis (F22ST3): Static Analysis Techniques 2. Andrew Ireland

Sources: On the Web: Slides will be available on:

Chapter 5 Names, Bindings, Type Checking, and Scopes

Why Do Software Assurance Tools Have Problems Finding Bugs Like Heartbleed?

µz An Efficient Engine for Fixed points with Constraints

Computer Organization and Architecture

Organization of Programming Languages CS320/520N. Lecture 05. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Lecture Notes on Static Analysis

Automated Program Behavior Analysis

Bypassing Browser Memory Protections in Windows Vista

Instruction Set Architecture of Mamba, a New Virtual Machine for Python

Static analysis for detecting taint-style vulnerabilities in web applications

IKOS: A Framework for Static Analysis based on Abstract Interpretation (Tool Paper)

Stack Allocation. Run-Time Data Structures. Static Structures

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Towards practical reactive security audit using extended static checkers 1

Introduction to Data-flow analysis

A Survey of Static Program Analysis Techniques

Linux Kernel. Security Report

Scalable and Precise Static Analysis of JavaScript Applications via Loop-Sensitivity

1 External Model Access

InvGen: An Efficient Invariant Generator

Static Error Detection using Semantic Inconsistency Inference

SAF: Static Analysis Improved Fuzzing

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Upcompiling Legacy Code to Java

10CS35: Data Structures Using C

A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin

Software Fingerprinting for Automated Malicious Code Analysis

Instruction Set Architecture (ISA)

Visualizing Information Flow through C Programs

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Dynamic Spyware Analysis

Binary Code Extraction and Interface Identification for Security Applications

Assembly Language: Function Calls" Jennifer Rexford!

Applying Clang Static Analyzer to Linux Kernel

16.1 MAPREDUCE. For personal use only, not for distribution. 333

Binary storage of graphs and related data

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

The programming language C. sws1 1

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Jakstab: A Static Analysis Platform for Binaries

Dynamic Spyware Analysis

I PUC - Computer Science. Practical s Syllabus. Contents

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

How To Develop A Static Analysis System For Large Programs

Lecture 2 Introduction to Data Flow Analysis

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

Software security assessment based on static analysis

Practical Memory Leak Detection using Guarded Value-Flow Analysis

Introduction. Figure 1 Schema of DarunGrim2

Intermediate Code. Intermediate Code Generation

Compiler Construction

Machine-Code Generation for Functions

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Dynamic Behavior Analysis Using Binary Instrumentation

Class notes Program Analysis course given by Prof. Mooly Sagiv Computer Science Department, Tel Aviv University second lecture 8/3/2007

C++FA 5.1 PRACTICE MID-TERM EXAM

Static Code Analysis Procedures in the Development Cycle

A Formally-Verified C Static Analyzer

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

Why? A central concept in Computer Science. Algorithms are ubiquitous.

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

Tachyon: a Meta-circular Optimizing JavaScript Virtual Machine

Transcription:

Static Taint-Analysis on Binary Executables Sanjay Rawat, Laurent Mounier, Marie-Laure Potet VERIMAG University of Grenoble October 2011 Static Taint-Analysis on Binary Executables 1/29

Outline 1 Introduction 2 Intra-procedural taint analysis 3 Inter-procedural taint analysis 4 Tool plateform and experimental results 5 Conclusion Static Taint-Analysis on Binary Executables 2/29

Context: vulnerability analysis Vulnerable functions VF unsafe library functions (strcpy, memcpy, etc.) or code patterns (unchecked buffer copies, memory de-allocations) critical parts of the code (credential checkings) etc. Vulnerable paths = execution paths allowing to read external inputs (keyboard, network, files, etc.) on a memory location M i call a vulnerable function VF with parameter values depending on M i Static Taint-Analysis on Binary Executables 3/29

Vulnerable paths detection based on taint analysis Input: a set of input sources (IS) = tainted data a set of vulnerable functions (VF) Output: a set of tainted paths = tainted data-dependency paths from IS to VF x=is() y := x VF(y) Static Taint-Analysis on Binary Executables 4/29

Work objective Staticaly compute vulnerable execution paths: on large applications (several thousands of functions) scalability issues lightweight analysis from binary executable code Evaluation on existing vulnerable code (Firefox, Acroread, MediaPlayer,... ) Links with more general (test related) problems: interprocedural information flow analysis program chopping [T. Reps], impact analysis Static Taint-Analysis on Binary Executables 5/29

Outline 1 Introduction 2 Intra-procedural taint analysis 3 Inter-procedural taint analysis 4 Tool plateform and experimental results 5 Conclusion Static Taint-Analysis on Binary Executables 6/29

Taint Analysis Identify input dependent variables at each program location Two kinds of dependencies: Data dependencies // x is tainted y = x ; z = y + 1 ; y = 3 ; // z is tainted Control dependencies // x is tainted if (x > 0) y = 3 else y = 4 ; // y is tainted we focus on data dependencies... Static Taint-Analysis on Binary Executables 7/29

Static taint data-dependency analysis (classical) data-flow analysis problem: input functions return tainted values constants are untainted forward computation of a set of pairs (v, T ) at each program location: v is a variable T {Tainted, Untainted} is a taint value fix-point computation (backward dependencies inside loops) Main difficulties: memory aliases (pointers) non scalar variables (e.g., arrays) read (T[i]) ;... ; x = T[j] Static Taint-Analysis on Binary Executables 8/29

A source-level Example int x, y, z, t ;... read(y) ; read(t) ; y = 3 ; x = t ; z = y ; // x and t are now tainted Static Taint-Analysis on Binary Executables 9/29

Taint analysis at the assembly level y at ebp-8, x at ebp-4 and z at ebp-12. y = 3 ;... z = y ; 1: t3 := 3 2: t4 := ebp-8 3: Mem[t4] := t3... 7: t5 := ebp-8 8: t6 := Mem[t5] 9: t7 := ebp-12 10: Mem[t7] := t6 Needs to identify that: value written at ebp-8 mem. loc. written at line 3 content of reg. t4 at line 2 = content of reg. t5 at line 7 compute possible values of each registers and mem. locations... Static Taint-Analysis on Binary Executables 10/29

Value Set Analysis (VSA) Compute the sets of mem. addresses defined at each prog. loc. Difficult because: addresses and other values are not distinguishable both direct and indirect memory adresssing address arithmetic is pervasive Compute at each prog. loc. an over-approximation of: the set of (abstract) adresses that are defined the value contained in each register and each abstract address Can be expressed as a forward data-flow analysis... Static Taint-Analysis on Binary Executables 11/29

Memory model Memory = (unbounded) set of fix-sized memory cells Memloc = (consecutive) memory cells accessed during load/store ops. Memloc addresses: local variables and parameters offset w.r.t to ebp global variables fixed value dynamically allocated memory return values from malloc However: the exact value of ebp is unknown the value returned by a malloc() is unknown arithmetic computations to access non-scalar variables set of memory locations accessed is unknown statically Static Taint-Analysis on Binary Executables 12/29

Abstracting Adresses and Values Abstract address/value = - set of offsets w.r.t. register content at a given instruction - expressed as a pair < B, X > s.t.: B is an (abstract) base value, which can be either a pair (instruction, register) an element of {Empty, None, Any} X is a finite set of integers (X Z). Concrete values represented by < B, X > = if B = Empty (empty value) Z if B = Any (any value) X if B = None (constant value) {v + x x X v concrete val. of t at i} if B = (i, t) Static Taint-Analysis on Binary Executables 13/29

Example 1. t0 = ebp + 8 t0 =< (1, ebp), {8} > 2. t1 = Mem[t0] {the content of Mem[t0] is unknown... } t1 =< (2, t1), {0} > 3. t2 = t1 + 4 t2 =< (2, t1), {4} > 4. Mem[t2] = 50 < (2, t1), 4 >=< None, {50} > Static Taint-Analysis on Binary Executables 14/29

Intraprocedural VSA as a data-flow analysis Mapping { Register Abstract addresses } Abstract values associated to each CFG node Forward least-fix-point computation: lattice of abstract address/values (more precise less precise) widening operator (set of offsets is bounded) merge operator: least upper bound { < Any, > if B1 B2 < B1, X 1 > < B2, X 2 >= < B1, X 1 X 2 > if B1 = B2 transfer function: abstracts the instruction semantics... Static Taint-Analysis on Binary Executables 15/29

Example 1: conditional statement 1 # include <stdio.h> 2 int main () 3 { 4 int x, y=5, z ; 5 if (y <4) { 6 x =3; z =4; 7 } else { 8 x =4; z =3; 9 } ; 10 y=x+z; 11 return z; 12 } ebp = < 40105601, {0}> esp = < init, {4}> init-12 = < noval, {6, 7, 8}> init-16 = < noval, {3, 4}> init-8 = < noval, {3, 4}> Static Taint-Analysis on Binary Executables 16/29

Example 2: iterative statement 1 # include <stdio.h> 2 int main () 3 { 4 int x=0, i, y; 5 for (i =0; i <4; i ++) { 6 y =6; x=x+i; 7 } 8 return 0; 9 } ebp = < 40105701, {0}> esp = init, {4}> initesp-12 = < anyval, {}> initesp-16 = < noval, {6}> initesp-8 = < anyval, {}> Static Taint-Analysis on Binary Executables 17/29

Possible improvements use more sophisticated abstract domain? (e.g., stridded intervals?) restrict VSA to registers and memory locations involved in address computations Partially done... take into account the size of memory transfers Static Taint-Analysis on Binary Executables 18/29

Outline 1 Introduction 2 Intra-procedural taint analysis 3 Inter-procedural taint analysis 4 Tool plateform and experimental results 5 Conclusion Static Taint-Analysis on Binary Executables 19/29

Hypothesis on information flows Inside procedures: assigments: x := y + z From caller to callee: arguments: foo (x, y+12) From callee to caller: return value and pointer to arguments: z = foo (x, &y) And global variables... compute procedure summaries to express these dependencies. Static Taint-Analysis on Binary Executables 20/29

A summary-based inter-procedural data-flow analysis intra-procedural level: summary computation express side-effects wrt taintedness and aliases int foo(int x, int *y){ int z; z = x+1 ; *y = z ; return z ; } Summary: x is tainted z is tainted, z and *y are aliases inter-procedural level: apply summaries to effective parameters read(b) ; // taints b a = foo (b+12, &c) ; // a and c are now tainted... Static Taint-Analysis on Binary Executables 21/29

A summary-based inter-procedural data-flow analysis intra-procedural level: summary computation express side-effects wrt taintedness and aliases int foo(int x, int *y){ int z; z = x+1 ; *y = z ; return z ; } Summary: x is tainted z is tainted, z and *y are aliases inter-procedural level: apply summaries to effective parameters read(b) ; // taints b a = foo (b+12, &c) ; // a and c are now tainted... Static Taint-Analysis on Binary Executables 21/29

Scalability issues Fine-grained data-flow analysis not applicable on large programs needs some aggressive approximations: some deliberate over-approximations (global variables, complex data structures, etc.) consider only data-flow propagation operate at fine-grained level only on a program slice (parts of the code outside the slice either irrelevant or approximated) Static Taint-Analysis on Binary Executables 22/29

Slicing the Call Graph Inter-procedural information flow from IS to VF How to reduce the set of procedures to be analysed? A slice computation performed at the call graph level Split this set into 3 parts: 1 procedure that are not relevant 2 procedure those side-effect can be (implicitely) over-approximated use of default summaries... 3 procedure requiring a more detailed analysis summary computation through intra-procedural analysis Static Taint-Analysis on Binary Executables 23/29

Outline 1 Introduction 2 Intra-procedural taint analysis 3 Inter-procedural taint analysis 4 Tool plateform and experimental results 5 Conclusion Static Taint-Analysis on Binary Executables 24/29

Tool Highlights Based on two existing platforms: IDA Pro, a general purpose disassembler BinNavi: translation to an intermediate representation (REIL) a data-flow analysis engine (MonoREIL) + an additional set of Jython procedures But still under construction/evaluation... Static Taint-Analysis on Binary Executables 25/29

Tool Architecture Value Analysis Executable IDA Pro x86 BinNavi REIL Vulnerable Path detection var identification actual param identification VF detection Static Taint-Analysis on Binary Executables 26/29

Example of experimental result Name: Fox Player Total functions: 1074 Total vulnerable functions: 48 Total slices found: 16 (5 with a tainted data flow) Smallest slice: 3 func Largest slice: 40 func Average func in slice: 18 About 10 vulnerable paths discovered... Static Taint-Analysis on Binary Executables 27/29

Taintflow slice for Foxplayer Example sub_40de00 985 980 986 995 sscanf 975 sub_40dd20 972 sub_40d810 984 941 943 sub_4106a0 sub_411480 1231 1232 1321 sub_40eb80 sub_40eed0 sub_40f6f0 1075 1076 1078 1073 1097 1160 Static Taint-Analysis on Binary Executables 28/29 ReadFile

Conclusion Part of a more complete tool chain for Vulnerability Detection and Exploitability Analysis To be continued within the BinSec ANR project... Static Taint-Analysis on Binary Executables 29/29