Static Analysis of Virtualization-Obfuscated Binaries

Johannes Kinder
School of Computer and Communication Sciences
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

Virtualization Obfuscation
  Obfuscation
    Make code hard to understand for humans and tools
    Popular for protecting benign and malicious code
  Virtualization obfuscation
    Hide code inside a self-contained VM
    Considered one of the strongest obfuscation schemes
    Commercial tools (CodeVirtualizer, VMProtect) and a research obfuscator

Static Analysis vs. Virtualization
  Static analysis
    Computes program invariants
    Used for safety proofs, bug finding, malware detection
  What makes static analysis of virtualization-obfuscated code fundamentally hard?
  Can we still make it work and use it for deobfuscation?

Static Analysis
  void foo(int x) {
    int y = ...;
    y++;
    y++;
    if (x > 0) { y++; }
    else {}
    apicall(y);
  }
  [Figure: CFG of foo, with one node per source line and edges labeled y = ..., x > 0, x <= 0, and apicall(y).]

Virtualization Obfuscation
  void foo(int x) {
    int y = ...;
    y++;
    y++;
    if (x > 0) { y++; }
    else {}
    apicall(y);
  }
  Compile to bytecode:
  code = { 2, 01, 02, 03, 01, 03, 01, 08, 00, 03, 03, 01, 18, 01, 00 }
  data = { 00, 00, , 0 }
    data[0]: x
    data[1]: y
    data[3]: conditional jump distance

  // Opcodes are written in decimal: a literal such as 08 is not valid C.
  int vpc = 0, op1, op2;
  while (true) {
    switch (code[vpc]) {
      case 3:   // increment
        op1 = code[vpc + 1];
        data[op1]++;
        vpc += 2;   // skip opcode and its single operand
        break;
      case 8:   // conditional jump
        op1 = code[vpc + 1];
        op2 = code[vpc + 2];
        if (data[op1] <= 0)
          vpc += data[op2];
        else
          vpc += 3;
        break;
      case 18:  // call function
        op1 = code[vpc + 1];
        apicall(data[op1]);
        vpc += 2;   // skip opcode and its single operand
        break;
      case 2:   // assignment
        op1 = code[vpc + 1];
        op2 = code[vpc + 2];
        data[op1] = data[op2];
        vpc += 3;
        break;
      default:  // halt
        return;
    }  // end switch
  }  // end while
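
The interpreter above is a fragment, so the following sketch wraps it in a function and runs the bytecode next to a direct version of foo. Several names and values are assumptions rather than part of the slides: the transcript does not preserve the initial value of y, so `Y_INIT = 5` is a hypothetical choice, and the jump distance 5 is derived from the bytecode layout (the conditional jump sits at index 7, the call at index 12). `foo_direct`, `foo_virtualized`, and the `apicall` stub that records its argument are illustrative names.

```c
#include <assert.h>

/* Hypothetical initial value of y; the transcript does not preserve it. */
enum { Y_INIT = 5 };

/* Opcode values as they are legible in the transcript, in decimal. */
enum { OP_HALT = 0, OP_ASSIGN = 2, OP_INC = 3, OP_CJMP = 8, OP_CALL = 18 };
enum { CODE_LEN = 15 };

static const int code[CODE_LEN] = {
    OP_ASSIGN, 1, 2,   /* index  0: y = constant              */
    OP_INC, 1,         /* index  3: y++                       */
    OP_INC, 1,         /* index  5: y++                       */
    OP_CJMP, 0, 3,     /* index  7: if (x <= 0) jump ahead    */
    OP_INC, 1,         /* index 10: y++                       */
    OP_CALL, 1,        /* index 12: apicall(y)                */
    OP_HALT            /* index 14: halt                      */
};

static int last_api_arg;                      /* written by the stub below */
static void apicall(int v) { last_api_arg = v; }

/* Direct version of foo; returns the value passed to apicall. */
int foo_direct(int x) {
    int y = Y_INIT;
    y++;
    y++;
    if (x > 0) { y++; }
    apicall(y);
    return last_api_arg;
}

/* Virtualized version: the same computation run by the interpreter. */
int foo_virtualized(int x) {
    /* data[0] = x, data[1] = y, data[2] = constant, data[3] = jump distance.
       The distance 12 - 7 = 5 is derived from the layout (assumption). */
    int data[4] = { x, 0, Y_INIT, 5 };
    int vpc = 0, op1, op2;
    for (;;) {
        assert(vpc >= 0 && vpc < CODE_LEN);
        switch (code[vpc]) {
        case OP_INC:
            op1 = code[vpc + 1];
            data[op1]++;
            vpc += 2;
            break;
        case OP_CJMP:
            op1 = code[vpc + 1];
            op2 = code[vpc + 2];
            if (data[op1] <= 0) vpc += data[op2];
            else                vpc += 3;
            break;
        case OP_CALL:
            op1 = code[vpc + 1];
            apicall(data[op1]);
            vpc += 2;
            break;
        case OP_ASSIGN:
            op1 = code[vpc + 1];
            op2 = code[vpc + 2];
            data[op1] = data[op2];
            vpc += 3;
            break;
        case OP_HALT:
        default:
            return last_api_arg;
        }
    }
}
```

Under these assumptions both functions pass the same value to `apicall` for every x, which is the sense in which the bytecode program is a compiled form of foo.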

Control Flow Graphs
  [Figure: original CFG of foo next to the obfuscated CFG. The obfuscated CFG is the interpreter loop: starting from vpc = 0, the loop head branches on code[vpc] to the increment (03), conditional jump (08), call (18), and assignment handlers, or to the default (halt) case. Each handler reads its operands, updates data and vpc, and returns to the loop head.]

[Figure: the obfuscated CFG of the interpreter loop, shown on its own.]

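code = { 2, 01, 02, 03, 01, 03, 01, 08, 00, 03, 03, 01, 18, 01, 00 }
data = { 00, 00, , 0 }
[Figure: the interpreter path vpc = 0, code[vpc]==03, op1 = code[vpc+1], data[op1]++, with the annotations below.]
  Weak update: once vpc is imprecise, op1 is imprecise too, so data[op1]++ may write any cell of data
  Upper bounds grow to infinity
  Constant imprecise
  Jump distance imprecise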
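
The precision loss listed on this slide can be illustrated with a small interval domain. The sketch below is not Jakstab's implementation; `Interval`, `join`, `strong_inc`, and `weak_inc` are hypothetical names. It contrasts a strong update, which is possible when the operand index is known exactly, with a weak update, which is all that remains when the index may be any of several cells.

```c
#include <assert.h>

/* Closed integer interval [lo, hi] as an abstract value. */
typedef struct { int lo, hi; } Interval;

/* Least upper bound of two intervals. */
static Interval join(Interval a, Interval b) {
    Interval r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* Strong update: the index is known exactly, so the cell is overwritten. */
void strong_inc(Interval *data, int n, int idx) {
    assert(idx >= 0 && idx < n);
    data[idx].lo++;
    data[idx].hi++;
}

/* Weak update: the index is only known to lie in [idx_lo, idx_hi].
   Every candidate cell may or may not be written, so each one keeps
   its old value joined with the incremented value. */
void weak_inc(Interval *data, int n, int idx_lo, int idx_hi) {
    assert(0 <= idx_lo && idx_lo <= idx_hi && idx_hi < n);
    for (int i = idx_lo; i <= idx_hi; i++) {
        Interval inc = { data[i].lo + 1, data[i].hi + 1 };
        data[i] = join(data[i], inc);
    }
}
```

Each pass through the loop head applies another weak update, so every upper bound rises by one per iteration; an analysis that must terminate has to widen such bounds to infinity. The cells holding the constant and the jump distance are affected in the same way, which is why the next jump target can no longer be resolved.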

One interpreter case corresponds to many original locations
The interpreter loop head is shared among all of them
[Figure: original CFG next to the obfuscated CFG, as on the Control Flow Graphs slide.]

Domain Flattening
  [Figure: the original CFG, labeled "Location Sensitive Analysis", is transformed by "Obfuscate" into the interpreter CFG, labeled "Location Insensitive Analysis".]

VPC Lifting
  Virtualization flattens one dimension of location
  Idea: track the VPC and use it as an additional dimension
    Separate states with differing VPC values
    Join states with equal VPC values
  [Figure: the statements of foo (int y = ...; ...; if (...) {} else {}) shown beside the two rules.]
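
The two rules can be sketched as a table of abstract states indexed by VPC value. This is a simplified model under stated assumptions, not the analysis from the talk: it applies each handler's effect directly to interval states instead of analyzing the interpreter's machine code, it uses no widening (the example bytecode only jumps forward), and the names `State`, `join_into`, and `analyze` as well as the parameters `y_init` and `jump_dist` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { OP_ASSIGN = 2, OP_INC = 3, OP_CJMP = 8, OP_CALL = 18 };
enum { CODE_LEN = 15, DATA_LEN = 4 };

typedef struct { int lo, hi; } Interval;
typedef struct { bool reached; Interval data[DATA_LEN]; } State;

static const int code[CODE_LEN] =
    { 2, 1, 2,  3, 1,  3, 1,  8, 0, 3,  3, 1,  18, 1,  0 };

/* Join `in` into the state kept for `vpc`: states with equal VPC values
   are merged, states with differing VPC values stay in separate slots. */
static bool join_into(State *table, int vpc, const State *in) {
    assert(vpc >= 0 && vpc < CODE_LEN);
    State *s = &table[vpc];
    if (!s->reached) { *s = *in; s->reached = true; return true; }
    bool changed = false;
    for (int i = 0; i < DATA_LEN; i++) {
        if (in->data[i].lo < s->data[i].lo) { s->data[i].lo = in->data[i].lo; changed = true; }
        if (in->data[i].hi > s->data[i].hi) { s->data[i].hi = in->data[i].hi; changed = true; }
    }
    return changed;
}

/* Iterate to a fixpoint. `table` must hold CODE_LEN zero-initialized states. */
void analyze(State *table, Interval x, int y_init, int jump_dist) {
    State init;
    init.reached = true;
    init.data[0] = x;                               /* x: unknown input    */
    init.data[1] = (Interval){ 0, 0 };              /* y                   */
    init.data[2] = (Interval){ y_init, y_init };    /* constant            */
    init.data[3] = (Interval){ jump_dist, jump_dist };
    join_into(table, 0, &init);

    bool changed = true;
    while (changed) {
        changed = false;
        for (int vpc = 0; vpc < CODE_LEN; vpc++) {
            if (!table[vpc].reached) continue;
            State out = table[vpc];
            int op1, op2;
            switch (code[vpc]) {
            case OP_ASSIGN:
                op1 = code[vpc + 1]; op2 = code[vpc + 2];
                out.data[op1] = out.data[op2];
                changed |= join_into(table, vpc + 3, &out);
                break;
            case OP_INC:            /* op1 is exact here: strong update */
                op1 = code[vpc + 1];
                out.data[op1].lo++; out.data[op1].hi++;
                changed |= join_into(table, vpc + 2, &out);
                break;
            case OP_CJMP:
                op1 = code[vpc + 1]; op2 = code[vpc + 2];
                assert(out.data[op2].lo == out.data[op2].hi);
                if (out.data[op1].lo <= 0)      /* jump may be taken   */
                    changed |= join_into(table, vpc + out.data[op2].lo, &out);
                if (out.data[op1].hi > 0)       /* fall-through may run */
                    changed |= join_into(table, vpc + 3, &out);
                break;
            case OP_CALL:
                changed |= join_into(table, vpc + 2, &out);
                break;
            default:                /* halt: no successor */
                break;
            }
        }
    }
}
```

Because each table slot has a single, exact VPC value, the operand indices read from `code` are exact as well, and every increment is a strong update. With x in [-10, 10], a hypothetical constant of 5, and jump distance 5, the state at the call (VPC 12) bounds y by [7, 8], matching the two paths through foo.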

[Figure: the interpreter path vpc = 0, code[vpc]==03, op1 = code[vpc+1], data[op1]++ analyzed with VPC lifting, shown beside the statements of foo up to apicall(y).]

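VPC Values as Locations
  Location-sensitive analysis over domain A has domain $L \to A$, where L is the set of locations
  VPC-sensitive analysis over domain A, for VPC domain V, has domain $(V \times L) \to A$, where V is the VPC component and L the location component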

Reconstructing CFGs
  A program CFG is the set of all feasible transitions between locations
    Abstract transition relation
  A VPC-CFG is the set of all feasible transitions between VPC locations
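
For the example bytecode, the VPC component of such a graph can be sketched by a worklist that follows each instruction's possible next VPC values. This is an illustration under assumptions, not the reconstruction algorithm from the talk: it ignores the pc component, treats both outcomes of a conditional jump as feasible, and assumes the analysis has already shown the jump-distance cell to be constant. `Edge`, `vpc_cfg`, and `MAX_CODE` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { OP_ASSIGN = 2, OP_INC = 3, OP_CJMP = 8, OP_CALL = 18 };
enum { MAX_CODE = 64 };

typedef struct { int from, to; } Edge;

/* Collect the VPC-to-VPC edges reachable from VPC 0.
   `data` holds the (assumed constant) data cells; returns the edge count. */
int vpc_cfg(const int *code, int len, const int *data,
            Edge *edges, int max_edges) {
    bool seen[MAX_CODE] = { false };
    int work[MAX_CODE];
    int top = 0, n = 0;

    assert(len > 0 && len <= MAX_CODE);
    work[top++] = 0;
    seen[0] = true;

    while (top > 0) {
        int vpc = work[--top];
        int succ[2], ns = 0;

        switch (code[vpc]) {
        case OP_ASSIGN:
            succ[ns++] = vpc + 3;
            break;
        case OP_INC:
        case OP_CALL:
            succ[ns++] = vpc + 2;
            break;
        case OP_CJMP:
            succ[ns++] = vpc + data[code[vpc + 2]];  /* jump taken   */
            succ[ns++] = vpc + 3;                    /* fall through */
            break;
        default:                                     /* halt */
            break;
        }

        for (int i = 0; i < ns; i++) {
            assert(succ[i] >= 0 && succ[i] < len && n < max_edges);
            edges[n].from = vpc;
            edges[n].to = succ[i];
            n++;
            if (!seen[succ[i]]) {        /* each VPC is expanded once */
                seen[succ[i]] = true;
                work[top++] = succ[i];
            }
        }
    }
    return n;
}
```

With a hypothetical jump distance of 5, the result has seven edges over the VPC values 0, 3, 5, 7, 10, 12, and 14, with the branch at 7 and the merge at 12. That is the shape of the CFG of foo.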

Each node has a unique pair (pc, vpc)
[Figure: the interpreter CFG, in which each node is identified by its pair (pc, vpc).]

Each node has a unique pair (pc, vpc)
[Figure: VPC-CFG with one copy of the relevant handler per VPC value, for example switch(code[vpc]) { case 03: op1 = code[vpc + 1]; data[op1]++; ...; break; }, next to the original CFG. Constant propagation, dead code elimination, and jump threading are listed as the steps that lead back to the original CFG.]

Implementation
  Implemented in Jakstab [CAV 08]
    Processes obfuscated binaries
    Reconstructs CFGs in presence of indirect jumps
  Analysis
    VPC-lifted Bounded Address Tracking [FMCAD]
    Sets of memory states up to per-variable bound k = 3

VPC Discovery
  Intuition: the VPC changes frequently
    Each instruction will have a separate VPC value
    The VPC is the busiest variable of the interpreter
  Detect the VPC on the fly
    The variable that first hits the per-variable value bound is promoted to VPC
    The heuristic is not guaranteed to identify it correctly
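
A minimal sketch of this heuristic, assuming concrete observed values in place of the abstract value sets a real analysis would track: each variable keeps the distinct values seen so far, and the first variable that would need more than `bound` of them is promoted. `Discovery`, `discovery_init`, `discovery_observe`, and the size limits are hypothetical names, and "hits the bound" is read here as "would exceed it".

```c
#include <assert.h>

enum { MAX_VARS = 8, MAX_BOUND = 16 };

typedef struct {
    int bound;                         /* per-variable value bound k      */
    int count[MAX_VARS];               /* distinct values seen so far     */
    int values[MAX_VARS][MAX_BOUND];   /* the values themselves           */
    int vpc_var;                       /* promoted variable, -1 if none   */
} Discovery;

void discovery_init(Discovery *d, int bound) {
    assert(bound > 0 && bound <= MAX_BOUND);
    d->bound = bound;
    d->vpc_var = -1;
    for (int v = 0; v < MAX_VARS; v++)
        d->count[v] = 0;
}

/* Record that variable `var` was observed holding `value`.
   Returns the promoted variable, or -1 while none has been promoted. */
int discovery_observe(Discovery *d, int var, int value) {
    assert(var >= 0 && var < MAX_VARS);
    if (d->vpc_var >= 0)
        return d->vpc_var;             /* promotion happens only once */

    for (int i = 0; i < d->count[var]; i++)
        if (d->values[var][i] == value)
            return -1;                 /* already known, nothing new */

    if (d->count[var] == d->bound) {   /* a new value would exceed k */
        d->vpc_var = var;
        return var;
    }
    d->values[var][d->count[var]++] = value;
    return -1;
}
```

With a bound of 3, a variable that takes the values 0, 3, 5, and 7 is promoted on the fourth value, while an operand register that keeps holding the same value never is. As the slide notes, a different busy variable could reach the bound first, so the result is a guess.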

Preliminary Results
  Benchmark           Baseline   Similarity   Time
  tamperproof guard   0%         81%          13s
  search tree         0%         0%           312s
  matrix multiply     0%         0%           3s
  stuxnet             0%         8%           31s
  Targets: created with research obfuscator (C. Collberg)
  Similarity: graph edit distance using basic block markers
  Baseline: similarity of obfuscated code to original

Conclusion
  Virtualization causes domain flattening
    Strips one level of location sensitivity
  VPC lifting reintroduces location sensitivity
    CFG reconstruction by tracing VPC values
  Ongoing work
    Improve VPC discovery
    Apply to real-world obfuscators
  http://www.jakstab.org