Platform-independent static binary code analysis using a metaassembly

Similar documents
Static Taint-Analysis on Binary Executables

BARF: A multiplatform open source Binary Analysis and Reverse engineering Framework

Compilers I - Chapter 4: Generating Better Code

Assembly Language Programming

Static Analysis. Find the Bug! : Analysis of Software Artifacts. Jonathan Aldrich. disable interrupts. ERROR: returning with interrupts disabled

Instruction Set Architecture (ISA)

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Object Oriented Software Design

Object Oriented Software Design

Introduction to MIPS Assembly Programming

High level code and machine code

Computer Architectures

Stack machines The MIPS assembly language A simple source language Stack-machine implementation of the simple language Readings:

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

OAMulator. Online One Address Machine emulator and OAMPL compiler.

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Jonathan Worthington Scarborough Linux User Group

Compilers. Introduction to Compilers. Lecture 1. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.

Introduction. Application Security. Reasons For Reverse Engineering. This lecture. Java Byte Code

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

An Overview of Stack Architecture and the PSC 1000 Microprocessor

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

X86-64 Architecture Guide

Sources: On the Web: Slides will be available on:

1/20/2016 INTRODUCTION

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University

The C Programming Language course syllabus associate level

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

Bypassing Memory Protections: The Future of Exploitation

Identification and Removal of

Cloud Computing. Up until now

Sandy. The Malicious Exploit Analysis. Static Analysis and Dynamic exploit analysis. Garage4Hackers

Automating Mimicry Attacks Using Static Binary Analysis

Software Fingerprinting for Automated Malicious Code Analysis

Obfuscation: know your enemy

Computer Organization and Architecture

Instruction Set Architecture

Magento & Zend Benchmarks Version 1.2, 1.3 (with & without Flat Catalogs)

MACHINE ARCHITECTURE & LANGUAGE

CIS570 Modern Programming Language Implementation. Office hours: TDB 605 Levine

How to make the computer understand? Lecture 15: Putting it all together. Example (Output assembly code) Example (input program) Anatomy of a Computer

CSCI 3136 Principles of Programming Languages

1 Classical Universal Computer 3

ios applications reverse engineering Julien Bachmann

Language Processing Systems

CHAPTER 7: The CPU and Memory

Habanero Extreme Scale Software Research Project

Stitching the Gadgets On the Ineffectiveness of Coarse-Grained Control-Flow Integrity Protection

CSE 373: Data Structure & Algorithms Lecture 25: Programming Languages. Nicki Dell Spring 2014

EC 362 Problem Set #2

İSTANBUL AYDIN UNIVERSITY

Return-oriented programming without returns

M A S S A C H U S E T T S I N S T I T U T E O F T E C H N O L O G Y DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

The Security Development Lifecycle. OWASP 24 June The OWASP Foundation

DNA Data and Program Representation. Alexandre David

Software Protection through Code Obfuscation

Outline. hardware components programming environments. installing Python executing Python code. decimal and binary notations running Sage

Machine-Code Generation for Functions

CS 3719 (Theory of Computation and Algorithms) Lecture 4

Central Processing Unit (CPU)

Secure Software Programming and Vulnerability Analysis

Computer Programming I

Crema. A LangSec-Inspired Programming Language. Jacob Torrey (@JacobTorrey) & Mark Bridgman (@c0ercion) torreyj@ainfosec.com & bridgmanm@ainfosec.

Graded ARM assembly language Examples

Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

Using Object Database db4o as Storage Provider in Voldemort

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

The Little Man Computer

Source Code Review Using Static Analysis Tools

The Advantages and Disadvantages of Network Computing Nodes

MICROPROCESSOR AND MICROCOMPUTER BASICS

CPU Organisation and Operation

1 The Java Virtual Machine

Chapter 7D The Java Virtual Machine

ARM Cortex-M3 Assembly Language

Lecture Outline. Stack machines The MIPS assembly language. Code Generation (I)

Efficient Low-Level Software Development for the i.mx Platform

CSC Software II: Principles of Programming Languages

VLIW Processors. VLIW Processors

Factoring & Primality

Bug hunting. Vulnerability finding methods in Windows 32 environments compared. FX of Phenoelit

.NET and J2EE Intro to Software Engineering

C# and Other Languages

How To Write Portable Programs In C

Towards practical reactive security audit using extended static checkers 1

Programming Language Rankings. Lecture 15: Type Inference, polymorphism & Type Classes. Top Combined. Tiobe Index. CSC 131! Fall, 2014!

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

64-Bit NASM Notes. Invoking 64-Bit NASM

Visualizing Information Flow through C Programs

The Course.

Introduction. Figure 1 Schema of DarunGrim2

a storage location directly on the CPU, used for temporary storage of small amounts of data during processing.

EMSCRIPTEN - COMPILING LLVM BITCODE TO JAVASCRIPT (?!)

Transcription:

Platform-independent static binary code analysis using a metaassembly language Thomas Dullien, Sebastian Porst zynamics GmbH CanSecWest 2009

Overview The REIL Language Abstract Interpretation MonoREIL Results 2

Motivation Bugs are getting harder to find Defensive side (most notably Microsoft) has invested a lot of money in a bugocide Concerted effort: Lots of manual code auditing aided by static analysis tools Phoenix RDK: Includes lattice based analysis framework to allow pluggable abstract interpretation in the compiler 3

Motivation Offense needs automated tools if they want to avoid being sidelined Offensive static analysis: Depth vs. Breadth Offense has no source code, no Phoenix RDK, and should not depend on Microsoft We want a static analysis framework for offensive purposes 4

Overview The REIL Language Abstract Interpretation MonoREIL Results 5

REIL Reverse Engineering Intermediate Language Platform-Independent meta-assembly language Specifically made for static code analysis of binary files Can be recovered from arbitrary native assembly code Supported so far: x86, PowerPC, ARM 6

Advantages of REIL Very small instruction set (17 instructions) Instructions are very simple Operands are very simple Free of side-effects Analysis algorithms can be written in a platform-independent way Great for security researchers working on more than one platform 7

Creation of REIL code Input: Disassembled Function x86, ARM, PowerPC, potentially others Each native assembly instruction is translated to one or more REIL instructions Output: The original function in REIL code 8

Example 9

Design Criteria Simplicity Small number of instructions Simplifies abstract interpretation (more later) Explicit flag modeling Simplifies reasoning about control-flow Explicit load and store instructions No side-effects 10

One Address REIL Instructions Source Address * 0x100 + n Easy to map REIL instructions back to input code One Mnemonic Three Operands Always An arbitrary amount of meta-data Nearly unused at this point 11

All operands are typed REIL Operands Can be either registers, literals, or sub-addresses No complex expressions All operands have a size 1 byte, 2 bytes, 4 bytes,... 12

The REIL Instruction Set Arithmetic Instructions ADD, SUB, MUL, DIV, MOD, BSH Bitwise Instructions AND, OR, XOR Data Transfer Instructions LDM, STM, STR 13

The REIL Instruction Set Conditional Instructions BISZ, JCC Other Instructions NOP, UNDEF, UNKN Instruction set is easily extensible 14

Register Machine REIL Architecture Unlimited number of registers t 0, t 1,... No explicit stack Simulated Memory Infinite storage Automatically assumes endianness of the source platform 15

Limitations of REIL Does not support certain instructions (FPU, MMX, Ring-0,...) yet Can not handle exceptions in a platformindependent way Can not handle self-modifying code Does not correctly deal with memory selectors 16

Overview The REIL Language Abstract Interpretation MonoREIL Results 17

Abstract Interpretation Theoretical background for most code analysis Developed by Patrick and Rhadia Cousot around 1975-1977 Formalizes static abstract reasoning about dynamic properties Huh? A lot of the literature is a bit dense for many security practitioners 18

Abstract Interpretation We want to make statements about programs Example: Possible set of values for variable x at a given program point p In essence: For each point p, we want to find Kp P(States) Problem: P(States) is a bit unwieldly Problem: Many questions are undecidable (where is the w*nker that yells halting problem )? 19

Dealing with unwieldy stuff Reason about something simpler: P(States) P(States) Abstraction Concretisation D D Example: Values vs. Intervals 20

Lattices In order for this to work, similar to P(States) must be structurally P(States) supports intersection and union You can check for inclusion (contains, does not contain) You have an empty set (bottom) and everything (top) D 21

Lattices A lattice is something like a generalized powerset Example lattices: Intervals, Signs, P(Registers ), mod p 22

Dealing with halting Original program consists of p 1... p n program points Each instruction transforms a set of states into a different set of states p 1... p n are mappings Specify p' 1 p' n : D ~ p : D This yields us n D P( States) P( States) D n 23

Dealing with halting n We cheat: Let be finite D is finite Make sure that Begin with initial state I Calculate Calculate D ~ p ( l ) ~ p( ~ p( l )) Eventually, you reach p~ is monotonous (like this talk) ~ n ( ) ~ n p l p 1 ( l ) You are done read off the results and see if your question is answered 24

Theory vs. practice A lot of the academic focus is on proving correctness of the transforms P(States) p i P(States) D As practitioner we know that p i is probably not fully correctly specified We care much more about choosing and constructing a so that we get the results we need D p i ' D 25

Overview The REIL Language Abstract Interpretation MonoREIL Results 26

MonoREIL You want to do static analysis You do not want to write a full abstract interpretation framework We provide one: MonoREIL A simple-to-use abstract interpretation framework based on REIL 27

You give it What does it do? The control flow graph of a function (2 LOC) A way to walk through the CFG (1 + n LOC) The lattice D (15 + n LOC) Lattice Elements A way to combine lattice elements The initial state (12 + n LOC) Effects of REIL instructions on D (50 + n LOC) 28

How does it work? Fixed-point iteration until final state is found Interpretation of result Map results back to original assembly code Implementation of MonoREIL already exists Usable from Java, ECMAScript, Python, Ruby 29

Overview The REIL Language Abstract Interpretation MonoREIL Results 30

First Example: Simple Register Tracking Question: What are the effects of a register on other instructions? Useful for following register values 31

Register Tracking Demo 32

Register Tracking Lattice: For each instruction, set of influenced registers, combine with union Initial State Empty (nearly) everywhere Start instruction: { tracked register } Transformations for MNEM op1, op2, op3 If op1 or op2 are tracked op3 is tracked too Otherwise: op3 is removed from set 33

Negative indexing Second Example: More complicated Question: Is this function indexing into an array with a negative value? This gets a bit more involved 34

Negative indexing Simple intervals alone do not help us much How would you model a situation where A function gets a structure pointer as argument The function retrieves a pointer to an array from an array of pointers in the structure The function then indexes negatively into this array Uh. Ok. 35

Abstract locations For each instruction, what are the contents of the registers? Let s slowly build complexity: If eax contains arg_4, how could this be modelled? eax = *(esp.in + 8) If eax contains arg_4 + 4? eax = *(esp.in + 8) + 4 If eax can contain arg_4+4, arg_4+8, arg_4+16, arg_4 + 20? eax = *(esp.in + 8) + [4, 20] 36

Abstract locations If eax can contain arg_4+4, arg_8+16? eax = *(esp.in + [8,12]) + [4,16] If eax can contain any element from arg_4 mem[0] to arg_4 mem[10], incremented once, how do we model this? eax = *(*(esp.in + [8,8]) + [4, 44]) + [1,1] OK. An abstract location is a base value and a list of intervals, each denoting memory dereferences (except the last) 37

Range Tracking eax.in + [a, b] + [0, 0] eax.in + a eax.in + b 38

Range Tracking eax + [a, b] + [c, d] + [0, 0] eax + a eax + b [eax+a]+c [eax+a]+d [eax+a+4]+c [eax+a+4]+d [eax+b]+c [eax+b]+d 39

Range Tracking Lattice: For each instruction, a map: Register Aloc Aloc Initial State Empty (nearly) everywhere Start instruction: { reg -> reg.in + [0,0] } Transformations Complicated. Next slide. 40

Transformations Range Tracking ADD/SUB are simple: Operate on last intervals STM op 1,, op 3 If op 1 or op 3 not in our input map M skip Otherwise, M[ M[op 3 ] ] = op 1 LDM op 1,, op 3 If op 1 or op 3 is not in our input map M skip M[ op 3 ] = M[ op 1 ] Others: Case-specific hacks 41

Where is the meat? Range Tracking Real world example: Find negative array indexing 42

MS08-67 Function takes in argument to a buffer Function performs complex pointer arithmetic Attacker can make this pointer arithmetic go bad The pointer to the target buffer of a wcscpy will be decremented beyond the beginning of the buffer 43

MS08-67 Michael Howard s Blog: In my opinion, hand reviewing this code and successfully finding this bug would require a great deal of skill and luck. So what about tools? It's very difficult to design an algorithm which can analyze C or C++ code for these sorts of errors. The possible variable states grows very, very quickly. It's even more difficult to take such algorithms and scale them to non-trivial code bases. This is made more complex as the function accepts a highly variable argument, it's not like the argument is the value 1, 2 or 3! Our present toolset does not catch this bug. 44

Michael is correct MS08-67 He has to defend all of Windows His regular developers have to live with the results of the automated tools His computational costs for an analysis are gigantic His developers have low tolerance for false positives 45

MS08-67 Attackers might have it easier They usually have a much smaller target They are highly motivated: I will tolerate 100 false positives for each real bug I can work through 20-50 a day A week for a bug is still worth it False positive reduction is nice, but if I have to read 100 functions instead of 20000, I have already gained something 46

MS08-67 Demo 47

Limitations and assumptions Limitations and assumptions The presented analysis does not deal with aliasing We make no claims about soundness We do not use conditional control-flow information We are still wrestling with calling convention issues The important bit is not our analysis itself the important part is MonoREIL Analysis algorithms will improve over time laying the foundations was the boring part 48

Status Abstract interpretation framework available in BinNavi Currently x86 In April (two weeks!): PPC and ARM Was only a matter of adding REIL translators Some example analyses: Register tracking (lame, but useful!) Negative array indexing (less lame, also useful!) 49

Outlook Deobfuscation through optimizing REIL More precise and better static analysis Register tracking etc. release in April (two weeks!) Negative array indexing etc. release in October Attempting to encourage others to build their own lattices 50

Related work? Julien Vanegue / ERESI team (EKOPARTY) Tyler Durden s Phrack 64 article Principles of Program Analysis (Nielson/Nielson/Hankin) University of Wisconsin WISA project Possibly related: GrammaTech CodeSurfer x86 51

Questions? ( Good Bye, Canada ) 52