The BINCOA Framework for Binary Code Analysis




The BINCOA Framework for Binary Code Analysis (1/13)

Sébastien Bardin, Philippe Herrmann, Jérôme Leroux, Olivier Ly, Renaud Tabary, Aymeric Vincent
CEA LIST (Saclay, Paris), LABRI (Bordeaux)

Binary code analysis (2/13)

Binary code analysis at a glance (3/13)

Recent research field [CodeSurfer/x86, SAGE, Jakstab, Osmose, TraceAnalyzer, McVeto, Vine, BAP]

Many promising applications
  - off-the-shelf components (including libraries)
  - mobile code (including malware)
  - third-party certification

Advantages over source-code analysis
  - always available
  - no compilation gap
  - allows precise quantitative analysis (e.g., WCET)

Very challenging
  - conceptual challenges
  - practical issues

Practical issues (4/13)

Engineering issue: many different (large) ISAs
  - supporting a new ISA is time-consuming, error-prone and tedious
  - consequence: each tool supports only a few ISAs (often one!)

Semantic issue: each tool comes with its own formal(?) model
  - exact semantics seldom available
  - modelling hypotheses often unclear

Consequences
  - lots of redundant engineering work between analysers
  - empirical comparisons are difficult to achieve
  - tools are difficult to combine or reuse

The BINary COde Analysis project (5/13)

French research project (CEA, Uni. Bordeaux 1, Uni. Paris 7)

Propose a common formal model for low-level programs
  - Dynamic Bitvector Automata (DBA)

Provide basic open-source tool support
  - basic DBA manipulation
  - (future) front-ends for x86, PPC, ARM

Develop (complementary) binary-level analysers
  - Osmose (CEA), TraceAnalyzer (CEA), Insight (LABRI)

Long-term objective (6/13)

  - Pool engineering work
  - Common semantics
  - Ease collaboration between analyses

Dynamic Bitvector Automata (7/13)

Main design ideas
  - small set of instructions
  - concise and natural modelling of common ISAs
  - low-level enough to allow bit-precise modelling

Can model: instruction overlapping, return address smashing, endianness, overlapping memory reads/writes

Limitations
  - (strong) no self-modifying code
  - (weak) no dynamic memory allocation, no floating-point arithmetic

Dynamic Bitvector Automata (2) (8/13)

Extended automata-like formalism
  - bitvector variables and arrays of bytes
  - all bitvector sizes statically known, no side-effects
  - standard operations from BVA

Feature 1: Dynamic transitions
  - for dynamic jumps

Feature 2: Directed multiple-byte read and write operations
  - for endianness and word load/store

Feature 3: Memory zone properties
  - for (simple) environment

Dynamic Bitvector Automata (2), continued (8/13)

Feature 1: Dynamic transitions
  - some nodes are labelled by an address
  - dynamic transitions have no predefined destination
  - the destination is computed dynamically via a target expression

Feature 2: Directed multiple-byte read and write operations
  - array[expr; k^#], where k ∈ ℕ and # ∈ {←, →}

Feature 3: Memory zone properties
  - specify special behaviour for some segments of memory
  - volatile, write-aborts, write-ignored, read-aborts
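Features 1 and 2 can be illustrated with a minimal Python sketch. This is not the BINCOA toolbox API: the names `load`, `store` and `dynamic_target` are hypothetical, and the two directions are spelled out as little-endian and big-endian rather than tied to a particular arrow, since the slides do not say which arrow denotes which order. The example also shows how a one-byte write that overlaps a saved return address redirects the dynamic jump that models the return.

```python
from typing import Dict, List


def load(mem: List[int], addr: int, k: int, little_endian: bool) -> int:
    """Read k consecutive bytes starting at addr and assemble them into one value."""
    if addr < 0 or addr + k > len(mem):
        raise IndexError("read outside of the byte array")
    order = "little" if little_endian else "big"
    return int.from_bytes(bytes(mem[addr:addr + k]), order)


def store(mem: List[int], addr: int, value: int, k: int, little_endian: bool) -> None:
    """Write value as k consecutive bytes starting at addr (truncated to 8*k bits)."""
    if addr < 0 or addr + k > len(mem):
        raise IndexError("write outside of the byte array")
    order = "little" if little_endian else "big"
    mem[addr:addr + k] = list((value % (1 << (8 * k))).to_bytes(k, order))


def dynamic_target(addressed_nodes: Dict[int, str], target_value: int) -> str:
    """Resolve a dynamic transition: the value of the target expression
    selects the node labelled with that address."""
    if target_value not in addressed_nodes:
        raise KeyError(f"no node is labelled with address {target_value:#x}")
    return addressed_nodes[target_value]


# A "call" saves the return address 0x1000 in memory as a 4-byte word.
mem = [0] * 16
store(mem, 8, 0x1000, 4, little_endian=True)
nodes = {0x1000: "after_call", 0x2000: "elsewhere"}  # hypothetical node names
normal_return = dynamic_target(nodes, load(mem, 8, 4, little_endian=True))

# A one-byte write overlapping the saved word changes where the "return" goes.
store(mem, 9, 0x20, 1, little_endian=True)
smashed_return = dynamic_target(nodes, load(mem, 8, 4, little_endian=True))
```

Because memory is a plain array of bytes and every multi-byte access states its width and direction, overlapping reads and writes need no special treatment in this sketch.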

Modelling with DBA (9/13)

Procedure calls / returns: encoded as static / dynamic jumps

Memory zone properties, a few examples:
  - ROM (write-ignored)
  - memory controlled by the environment (volatile)
  - code section (write-aborts)
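One way to read the four memory zone properties is sketched below in Python. The behaviour attached to each property is an assumption inferred from its name (an aborting access raises an exception, an ignored write is dropped, a volatile read asks the environment), and `Zone`, `Region`, `Memory` and the `env` callback are hypothetical names, not part of the DBA definition or toolbox.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Set


class Zone(Enum):
    VOLATILE = "volatile"
    WRITE_ABORTS = "write-aborts"
    WRITE_IGNORED = "write-ignored"
    READ_ABORTS = "read-aborts"


class Abort(Exception):
    """Raised when an access hits a zone whose property aborts execution."""


@dataclass(frozen=True)
class Region:
    start: int  # first address of the region
    end: int    # one past the last address of the region
    prop: Zone


class Memory:
    def __init__(self, size: int, regions: List[Region],
                 env: Callable[[int], int] = lambda addr: 0) -> None:
        self.cells = [0] * size
        self.regions = regions
        self.env = env  # assumed stand-in for values supplied by the environment

    def props(self, addr: int) -> Set[Zone]:
        """Collect the properties of every region that contains addr."""
        return {r.prop for r in self.regions if r.start <= addr < r.end}

    def read(self, addr: int) -> int:
        props = self.props(addr)
        if Zone.READ_ABORTS in props:
            raise Abort(f"read at {addr:#x}")
        if Zone.VOLATILE in props:
            return self.env(addr) & 0xFF  # value is chosen by the environment
        return self.cells[addr]

    def write(self, addr: int, value: int) -> None:
        props = self.props(addr)
        if Zone.WRITE_ABORTS in props:
            raise Abort(f"write at {addr:#x}")
        if Zone.WRITE_IGNORED in props:
            return  # the write is silently dropped
        self.cells[addr] = value & 0xFF


# The three examples from the slide: ROM, code section, environment-controlled memory.
memory = Memory(
    16,
    [Region(0, 4, Zone.WRITE_IGNORED),   # ROM
     Region(4, 8, Zone.WRITE_ABORTS),    # code section
     Region(8, 12, Zone.VOLATILE)],      # controlled by the environment
    env=lambda addr: 0xAB,
)
```

Marking the code section as write-aborts is also a simple way to enforce the "no self-modifying code" limitation at the model level.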

DBA toolbox (10/13)

Open-source OCaml code for basic DBA manipulation

Features
  - a datatype for DBAs
  - basic typing (size checking) over DBAs
  - import from and export to an XML format
  - DBA simplification (see next)

GPL license, based on xml-light, 3 kloc
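Since all bitvector sizes are statically known, size checking amounts to a bottom-up pass over expressions. The sketch below is written in Python for brevity and is not the toolbox's OCaml datatype: the constructors (`Const`, `Var`, `BinOp`, `Concat`, `Extract`) and the rule that a binary operation takes two operands of equal size are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: int
    size: int  # size in bits


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # e.g. "add", "and"; assumed to need equal-size operands
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Concat:
    high: "Expr"
    low: "Expr"


@dataclass(frozen=True)
class Extract:
    expr: "Expr"
    lo: int  # lowest extracted bit (inclusive)
    hi: int  # highest extracted bit (inclusive)


Expr = Union[Const, Var, BinOp, Concat, Extract]


class SizeError(Exception):
    pass


def size_of(e: Expr, sizes: Dict[str, int]) -> int:
    """Return the bit size of e, or raise SizeError if sizes are inconsistent."""
    if isinstance(e, Const):
        if e.size <= 0 or not 0 <= e.value < (1 << e.size):
            raise SizeError(f"constant {e.value} does not fit in {e.size} bits")
        return e.size
    if isinstance(e, Var):
        if e.name not in sizes:
            raise SizeError(f"undeclared variable {e.name}")
        return sizes[e.name]
    if isinstance(e, BinOp):
        left, right = size_of(e.left, sizes), size_of(e.right, sizes)
        if left != right:
            raise SizeError(f"{e.op}: operand sizes {left} and {right} differ")
        return left
    if isinstance(e, Concat):
        return size_of(e.high, sizes) + size_of(e.low, sizes)
    if isinstance(e, Extract):
        inner = size_of(e.expr, sizes)
        if not 0 <= e.lo <= e.hi < inner:
            raise SizeError(f"bits {e.lo}..{e.hi} out of range for size {inner}")
        return e.hi - e.lo + 1
    raise SizeError(f"unknown expression {e!r}")


def check_assign(target: str, e: Expr, sizes: Dict[str, int]) -> None:
    """An assignment is well-sized when both sides have the same bit size."""
    if size_of(Var(target), sizes) != size_of(e, sizes):
        raise SizeError(f"size mismatch in assignment to {target}")


sizes = {"eax": 32, "al": 8}  # hypothetical variable declarations
check_assign("al", Extract(Var("eax"), 0, 7), sizes)
```

A check of this kind catches a common front-end mistake, such as adding an 8-bit register to a 32-bit one without an explicit extension.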

DBA toolbox - simplifications (11/13)

Goal: simplify unduly complex DBAs
  - typically obtained from instruction-wise translation
  - useless flag computations, auxiliary variables, etc.

Inspired by standard compilation techniques [peephole, dead code, etc.]
  - beware of partial DBAs and dynamic jumps!
  - rethink these standard techniques in a partial CFG setting

Results: size reduction of 50% (all instrs), and between 30% and 50% (non-goto instrs)
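The caveat about partial CFGs can be made concrete with a small dead-assignment pass over one straight-line block. This Python sketch is an illustration under stated assumptions, not the toolbox's algorithm: `Assign` and `remove_dead_assignments` are hypothetical names, and an unknown successor (for example an unresolved dynamic jump) is handled in the most conservative way, by treating every variable as live at the block exit.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Assign:
    target: str             # variable written by the instruction
    uses: Tuple[str, ...]   # variables read by its side-effect-free right-hand side


def remove_dead_assignments(block: List[Assign],
                            all_vars: Set[str],
                            live_at_exit: Optional[Set[str]]) -> List[Assign]:
    """Backward liveness pass over one block.

    live_at_exit is None when the successor is unknown (partial CFG); in that
    case every variable is assumed live, so nothing observable is removed.
    """
    live = set(all_vars) if live_at_exit is None else set(live_at_exit)
    kept: List[Assign] = []
    for instr in reversed(block):
        if instr.target in live:
            live.discard(instr.target)
            live.update(instr.uses)
            kept.append(instr)
        # otherwise the value is never read before being overwritten: drop it
    kept.reverse()
    return kept


# Instruction-wise translation of an x86-like "add": the result plus two flags.
all_vars = {"eax", "ebx", "zf", "cf"}
block = [
    Assign("eax", ("eax", "ebx")),
    Assign("zf", ("eax",)),
    Assign("cf", ("eax", "ebx")),
]

# Known successor that overwrites both flags before reading them.
simplified = remove_dead_assignments(block, all_vars, live_at_exit={"eax"})

# Unknown successor: the flags may be read, so they must be kept.
conservative = remove_dead_assignments(block, all_vars, live_at_exit=None)
```

Dropping an assignment is only sound here because DBA expressions have no side-effects, so the sole effect of an instruction is the variable it writes.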

Binary-level analysers (12/13)

Osmose (CEA) [ICST-08, STVR-11]
  - automatic test data generation (dynamic symbolic execution)
  - 75 kloc of OCaml, front-ends: PPC, M6800, Intel c509
  - case studies: programs from aeronautics and energy
  - negotiations under way to become open source

TraceAnalyzer (CEA, with Franck Védrine) [VMCAI-11]
  - safe CFG reconstruction (refinement-based static analysis)
  - 29 kloc of C++, front-end: PPC
  - case studies: programs from aeronautics

Insight (LABRI, with Emmanuel Fleury)
  - abstract interpretation and weakest precondition
  - C++, front-end: x86
  - case studies (ongoing): polymorphic virus analysis
  - aims at being open source when the API stabilizes

Conclusion (13/13)

Current state
  - DBAs are a nice formalism to work with [an improvement over our former model]
  - common semantics allows exchange of information [between Osmose and TraceAnalyzer]
  - basic DBA support

Ongoing and future work
  - open-source front-ends
  - extensions of DBAs: support for dynamic memory allocation