Efficient Program Exploration by Input Fuzzing

Similar documents

Dynamic Behavior Analysis Using Binary Instrumentation

Analysis and Diversion of Duqu s Driver

[ X OR DDoS T h r e a t A d v i sory] akamai.com

Automatic Network Protocol Analysis

Inside a killer IMBot. Wei Ming Khoo University of Cambridge 19 Nov 2010

REpsych. : psycholigical warfare in reverse engineering. def con 2015 // domas

Introduction to Reverse Engineering

LASTLINE WHITEPAPER. The Holy Grail: Automatically Identifying Command and Control Connections from Bot Traffic

Fighting malware on your own

Obfuscation: know your enemy

Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis

Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation

Egil Aspevik Martinsen Polymorphic Viruses. Material from Master Thesis «Detection of Junk Instructions in Malicious Software»

TitanMist: Your First Step to Reversing Nirvana TitanMist. mist.reversinglabs.com

Return-oriented programming without returns

Binary Code Extraction and Interface Identification for Security Applications

Peeling The Layers Of Vawtrak

Dynamic Spyware Analysis

RFID MODULE Mifare Reader / Writer SL032 User Manual Version 1.5 Nov 2012 StrongLink

Identification and Removal of

LASTLINE WHITEPAPER. Why Anti-Virus Solutions Based on Static Signatures Are Easy to Evade

Nemea: Searching for Botnet Footprints

Dynamic Spyware Analysis

Parasitics: The Next Generation. Vitaly Zaytsev Abhishek Karnik Joshua Phillips

Configurable Events for APC Network Management Card

Storm Worm & Botnet Analysis

Dolphin In-Circuit programming Updating Firmware in the field

Where s the FEEB? The Effectiveness of Instruction Set Randomization

Mission 1: The Bot Hunter

Self Protection Techniques in Malware

SRF08 Ultra sonic range finder Technical Specification

Towards Automated Malware Behavioral Analysis and Profiling for Digital Forensic Investigation Purposes

From Georgia, with Love Win32/Georbot. Is someone trying to spy on Georgians?

How to Grow a TREE from CBASS Interactive Binary Analysis for Security Professionals

WIZnet S2E (Serial-to-Ethernet) Device s Configuration Tool Programming Guide

Caml Virtual Machine File & data formats Document version: 1.4

Applications of obfuscation to software and hardware systems

Operation Liberpy : Keyloggers and information theft in Latin America

INTERNET ATTACKS AGAINST NUCLEAR POWER PLANTS

NGBPA Next Generation BotNet Protocol Analysis

Automated Repair of Binary and Assembly Programs for Cooperating Embedded Devices

RFID MODULE Mifare Reader / Writer SL031 User Manual Version 2.7 Nov 2012 StrongLink

Systematically Enhancing Black-Box Web Vulnerability Scanners

CIT 480: Securing Computer Systems. Malware

Joeri de Ruiter Sicco Verwer

Interface Protocol v1.2

Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail

Getting Ahead of Malware

Automating Linux Malware Analysis Using Limon Sandbox Monnappa K A monnappa22@gmail.com

The Hillstone and Trend Micro Joint Solution

Comprehensive Advanced Threat Defense

Software Fingerprinting for Automated Malicious Code Analysis

Detecting the One Percent: Advanced Targeted Malware Detection

Trust Model for Hybrid Security Architecture Based on Reputation for Secure Execution of Mobile Agents

Automatic Test Data Synthesis using UML Sequence Diagrams

Effective and Efficient Malware Detection at the End Host

Command Processor for MPSSE and MCU Host Bus Emulation Modes

Xbox 360 File Specifications Reference

Command Param1 Param2 Return1 Return2 Description. 0xE9 0..0x7F (id) speed pos_high pos_low Set servo #id speed & read position

Automating Mimicry Attacks Using Static Binary Analysis

Evaluating a ROP Defense Mechanism. Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis Columbia University

Networks and Security Lab. Network Forensics

Standard for Software Component Testing

VIRUS TRACKER CHALLENGES OF RUNNING A LARGE SCALE SINKHOLE OPERATION

Malicious Network Traffic Analysis

Soft-Starter SSW-06 V1.6X

CS61: Systems Programing and Machine Organization

LASTLINE WHITEPAPER. Using Passive DNS Analysis to Automatically Detect Malicious Domains

Table 1 below is a complete list of MPTH commands with descriptions. Table 1 : MPTH Commands. Command Name Code Setting Value Description

Internet Monitoring via DNS Traffic Analysis. Wenke Lee Georgia Institute of Technology

How To Understand The Behavior Of A Computer Program Mathematically

Static Analysis of Virtualization- Obfuscated Binaries

Computer Virus Strategies and Detection Methods

Inside the Storm: Protocols and Encryption of the Storm Botnet

Lecture 2 Introduction to Data Flow Analysis

Sandy. The Malicious Exploit Analysis. Static Analysis and Dynamic exploit analysis. Garage4Hackers

esrever gnireenigne tfosorcim seiranib

Introduction. Figure 1 Schema of DarunGrim2

Computer Security DD2395

Anti-Malware Technologies

Selected Topics of IT Security ( ) Seminar description

Change Impact Analysis

Supporting ZDOs with the XBee API

EVILSEED: A Guided Approach to Finding Malicious Web Pages

Introduction. Application Security. Reasons For Reverse Engineering. This lecture. Java Byte Code

Assignment #3 Routing and Network Analysis. CIS3210 Computer Networks. University of Guelph

Transcription:

Efficient Program Exploration by Input Fuzzing towards a new approach in malcious code detection Guillaume Bonfante Jean-Yves Marion Ta Thanh Dinh Université de Lorraine CNRS - INRIA Nancy First Botnet Fighting Conference, 2013 1 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 2 / 49

Context Host-based botnet detection The bot need communicate with the bot-master: receives special commands: does malicious things, otherwise: stays inactive In general: trigger-based malwares Real-life infamous examples Stuxnet: checks the value NTVDM TRACE If this value is equal to infection will not occur Falliere et al 2011 Gauss: decrypt the payload using several strings from the system and, upon success, executes it GReAT 2013 3 / 49

Researches on the code coverage Code coverage is considered extensively on the source code of programs (Godefroid et al 2005 and numerous subsequent works) but much fewer if one considers binary codes, malicious obfuscated programs (Moser et al 2007 and Brumley et al 2008) Detecting trigger-based malwares The direct dynamic-analysis fails (limited behaviors) The static-analysis faces some difficulties: few work on the binary codes, very sensitive to the obfuscation (Moser et al 2007) We propose a hybrid approach 4 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 5 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 6 / 49

Trace covering: hidden behaviors detection Hybrid approach: execute a program P which receives an input message m, we get a trace t as a sequence of instructions 0x40096b: mov cl, al 0x40096d: mov byte ptr [rbp-17], cl 0x400970: movsx eax, byte ptr [rbp-17] 0x400974: sar eax, 4 0x400977: cmp eax, 5 0x40097c: jle 0x4009af 0x40099b: call 0x400860 0x400860: as a path on the CFG sar eax, 4 cmp eax, 5 jle 0x4009af call 0x400860 0x4009af: call 0x4008c0 0x4008c0: call 0x4008c0 7 / 49

Trace covering: hidden behaviors detection Hybrid approach: execute a program P which receives an input message m, we get a trace t For each conditional branch br t, as a sequence of instructions 0x40096b: mov cl, al 0x40096d: mov byte ptr [rbp-17], cl 0x400970: movsx eax, byte ptr [rbp-17] 0x400974: sar eax, 4 0x400977: cmp eax, 5 0x40097c: jle 0x4009af 0x40099b: call 0x400860 0x400860: as a path on the CFG sar eax, 4 cmp eax, 5 jle 0x4009af call 0x400860 0x4009af: call 0x4008c0 0x4008c0: call 0x4008c0 8 / 49

Trace covering: hidden behaviors detection Hybrid approach: execute a program P which receives an input message m, we get a trace t For each conditional branch br t, find m so that the execution of P leads to a new trace t as a sequence of instructions 0x40096b: mov cl, al 0x40096d: mov byte ptr [rbp-17], cl 0x400970: movsx eax, byte ptr [rbp-17] 0x400974: sar eax, 4 0x400977: cmp eax, 5 0x40097c: jle 0x4009af 0x4009af: call 0x4008c0 0x4008c0: as a path on the CFG sar eax, 4 cmp eax, 5 jle 0x4009af call 0x400860 call 0x4008c0 9 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 10 / 49

Backtracking on the CFG of the program Main ideas: for each checkpoint br t: fuzz testing minimization: find the minimal parts I br m of the input message affecting br s decision, re-execution trace optimization: find the nearest execution checkpoints C br t affecting br s decision 11 / 49

Backtracking on the CFG of the program Main ideas: for each checkpoint br t: fuzz testing minimization: find the minimal parts I br m of the input message affecting br s decision, re-execution trace optimization: find the nearest execution checkpoints C br t affecting br s decision let s consider an example 12 / 49

Example (program) 0x400966: call 0x4006e0 ;get_msg 0x40096b: mov cl, al 0x40096d: mov byte ptr [rbp-17], cl 0x400970: movsx eax, byte ptr [rbp-17] 0x400974: sar eax, 4 0x400977: cmp eax, 5 ;x[0]>5 0x40097c: jle 0x4009af 0x400982: call 0x400830 ;do_a 0x400987: movsx eax, byte ptr [rbp-17] 0x40098b: and eax, 15 0x400990: cmp eax, 7 ;x[1]<=7 0x400995: jnle 0x4009a5 0x40099b: call 0x400860 ;do_a1 0x4009a0: jmp 0x4009aa ; 0x4009a5: call 0x400890 ;do_a2 0x4009af: call 0x4008c0 ;do_b 0x4009b4: movsx eax, byte ptr [rbp-17] 0x4009b8: and eax, 15 0x4009bd: cmp eax, 8 ;x[1]>8 0x4009c2: jle 0x4009d2 0x4009c8: call 0x4008f0 ;do_b1 0x4009d2: call 0x400920 ;do_b2 m = get_msg(); if (m[0] > 5) do_a(); if (m[1] <= 7) do_a1(); else do_a2(); else do_b(); if (m[1] > 8) do_b1(); else do_b2(); input: m = byte ptr [rbp-17] branches: {0x40097c, 0x400995, 0x4009c2} checkpoints: C 0x40097c = 0x400970, C 0x400995 = 0x400987, C 0x4009c2 = 0x4009b4 13 / 49

Example (control flow graph) Input messages m = 47h Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 jle 0x4009d2 call 0x400920 14 / 49

Example (control flow graph) Input messages m = 47h m = 49h Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 jle 0x4009d2 call 0x400920 call 0x4008f0 15 / 49

Example (control flow graph) Input messages m = 47h m = 49h m = 67h Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 call 0x400830 jle 0x4009d2 jnle 0x4009a5 call 0x400920 call 0x4008f0 call 0x400860 16 / 49

Example (control flow graph) Input messages m = 47h m = 49h m = 67h m = 66h Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 call 0x400830 jle 0x4009d2 jnle 0x4009a5 call 0x400920 call 0x4008f0 call 0x400890 call 0x400860 17 / 49

Example (backtracking) Backtracking m = 47h Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 jle 0x4009d2 call 0x400920 18 / 49

Example (backtracking) Backtracking m = 47h get checkpoints Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 jle 0x4009d2 call 0x400920 19 / 49

Example (backtracking) Backtracking m = 47h get checkpoints Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 jle 0x4009d2 call 0x400920 20 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 call 0x400830 jle 0x4009d2 call 0x400920 21 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h Control flow graph call 0x4006e0 jle 0x4009af call 0x4008c0 call 0x400830 jle 0x4009d2 call 0x400920 22 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m Control flow graph call 0x4008c0 call 0x4006e0 jle 0x4009af call 0x400830 jle 0x4009d2 call 0x400920 23 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m Control flow graph call 0x4008c0 call 0x4006e0 jle 0x4009af call 0x400830 jle 0x4009d2 call 0x400920 24 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m try m = 49h Control flow graph call 0x4008c0 call 0x4006e0 jle 0x4009af call 0x400830 jle 0x4009d2 call 0x400920 call 0x4008f0 25 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m try m = 49h Control flow graph jle 0x4009d2 call 0x4008c0 call 0x4006e0 jle 0x4009af call 0x400830 call 0x400920 call 0x4008f0 26 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m try m = 49h restore m Control flow graph jle 0x4009d2 call 0x4008c0 call 0x4006e0 jle 0x4009af call 0x400830 call 0x400920 call 0x4008f0 27 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m try m = 49h restore m Control flow graph jle 0x4009d2 call 0x4008c0 call 0x4006e0 jle 0x4009af call 0x400830 call 0x400920 call 0x4008f0 28 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m try m = 49h restore m m = 67h Control flow graph call 0x400920 jle 0x4009d2 call 0x4008c0 call 0x4008f0 call 0x4006e0 jle 0x4009af call 0x400830 jnle 0x4009a5 call 0x400860 29 / 49

Example (backtracking) Backtracking m = 47h get checkpoints try m = 67h restore m try m = 49h restore m m = 67h Control flow graph call 0x400920 jle 0x4009d2 call 0x4008c0 call 0x4008f0 call 0x4006e0 jle 0x4009af call 0x400830 call 0x400890 jnle 0x4009a5 call 0x400860 30 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 31 / 49

Fuzz testing optimization naive approach (infeasible): eg a (compressed) DNS response message of size 79 bytes, has 2 79x8 possible values!!! re-executing the whole program for each test is expensive our approach: reverse execution and reduce the number of tested inputs, reduce the length of re-execution traces: checkpoints by the dynamic tainting analysis 32 / 49

Dynamic tainting analysis by the liveness dataflow graph From the executed trace t, construct a graph with edges are instructions, and for each edge source nodes: read operands, target nodes: written operands movzx edi, word ptr [11] cmp word ptr [rax], di jz 0x35c360b7d8 Taint propagation from the input message 33 / 49

PathExplorer: a code coveraging tool using Pin dynamic binary instrumentation framework [2], source codes available at https://githubcom/tathanhdinh/pathexplorer 34 / 49

Experiment Backtracking traversal on the CFG of wget pin -t path_ explorer pin -r mlr -l depth -- wget url Options: mlr: the number of rollbacks for each checkpoint, depth: the depth of backtracking traversal depth mlr resolv/detected branches total rollbacks 700 7000 4/7 21017 700 35000 8/12 169923 800 7000 19/54 301976 800 35000 40/97 2659205 835 7000 207/452 2515703 835 35000 320/673 17061162 840 7000 210/562 3908525 840 35000 212/580 19384515 845 7000 224/619 4913159 845 35000 232/695 26048334 850 7000 245/780 6221047 850 35000 261/815 32299635 855 7000 294/930 8104504 855 35000 304/961 39327555 860 7000 319/992 9273380 860 35000 383/1140 43569664 900 7000 775/2543 22998356 900 35000 911/3210 144671156 35 / 49

Experiment 07 06 max local rollbacks 7000 35000 3500 3000 max local rollbacks 7000 35000 05 2500 resolved ratio 04 03 detected branches 2000 1500 02 1000 01 500 0 700 750 800 850 900 0 700 750 800 850 900 backtracking depth backtracking depth (a) Resolved ratio (b) Detected branches 16e+08 14e+08 max local rollbacks 7000 35000 12e+08 1e+08 used rollbacks 8e+07 6e+07 4e+07 2e+07 0 0 500 1000 1500 2000 2500 3000 3500 detected branches (c) Detected ratio 36 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 37 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 38 / 49

Malware characterization by message analysis In malware detection, we look for similarities between a known malcious program M and a suspicious P Traditional approach: trace similarity 39 / 49

Malware characterization by message analysis Thesis Two equivalent programs will interpret the input messages equivalently In malware detection, we look for similarities between a known malcious program M and a suspicious P Traditional approach: trace similarity Our approach: message interpretation similarity compare message relations instead of traces 40 / 49

Program similarity Input messages partition Let T be an equivalence between traces (eg partial similarity, control flow graph similarity, etc), the derived equivalence I between input messages is defined by: i 1 I i 2 P (i 1 ) T P (i 2 ) Program similarity by input messages partition P Q having the same derived equivalence I 41 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 42 / 49

Input space partition by FA approximation Finite State Automata approximation The input message space are partitioned by Finite State Automata The input strings are the inputs of the program, The transition traces abstract the execution traces That extends the current approach in the Protocol Message Extraction (Caballero et al 2009) Corollary (systematic input format extraction) Ṭhe precisely obtained FA reveals the format of inputs 43 / 49

Early results in FA approximation {}: the parts of the input affecting to the branch s decision 0, 1: the decisions of a branch, : the execution halts before reaching the limit depth, : the execution continues after reaching the limit depth {3} 0xb2ca {0, 1} 1 0 0xb7ff 1 0xb2f6 0 1 0xa91a {4, 5} 0 1 0 0 0xa9da 0xa91a 0 Figure: wget and ping have the same approximation at the depth 600 44 / 49

Table of Contents Malicious program exploration Trace covering problem Problem Our approach: smart input fuzzing Implementation Ongoing work: a new approach in malware detection Program similarity FA approximation Conclusions 45 / 49

Conclusions Smart input fuzzing: hybrid approach for the program covering Dynamic-analysis: runs the program with a concrete input to get an execution trace, Static-analysis: construct the dataflow graph on the trace to detect checkpoints Improvement in progress: symbolic execution with SMT solver Message analysis: new approach for the program similarity Similarity: relations between traces (instead of traces) are compared, Protocol message extraction: the precisely obtained FA reveals the format of inputs 46 / 49

Conclusions (a brief comparison) current approaches our approach analysis method hybrid hybrid covering purpose functionality trace branch resolving symbolic execution fuzzing+re-execution technique trace minimization rollbacking technique whole-system emulation application-wide reverse execution source code unknown available 47 / 49

Thanks for your attention and any question? 48 / 49

Bibliography P Godefroid et al DART: Directed Automated Random Testing In: PLDI 2005 C-K Luk et al Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation In: PLDI 2005 A Moser et al Exploring Multiple Execution Paths for Malware Analysis In: SSP 2007 A Moser et al Limits of Static Analysis for Malware Detection In: ACSAC 2007 D Brumley et al Automatically Identifying Trigger-based Behavior in Malware In: Botnet Analysis and Defense 2008, pp 65 88 J Caballero et al Dispatcher: Enabling Active Botnet Infiltration Using Automatic Protocol Reverse-Engineering In: CCS 2009 N Falliere et al W32Stuxnet Dossier Tech rep Symantec Security Response, Feb 2011 GReAT The Mystery of the Encrypted Gauss Payload Aug 2013 49 / 49