Peeking into Pandora's Bochs { Instrumenting a Full System Emulator to Analyse Malicious Software



Similar documents
Peeking into Pandora s Bochs. Instrumenting a Full System Emulator to Analyse Malicious Software

Fine-grained covert debugging using hypervisors and analysis via visualization

TitanMist: Your First Step to Reversing Nirvana TitanMist. mist.reversinglabs.com

Attacking Obfuscated Code with IDA Pro. Chris Eagle

VICE Catch the hookers! (Plus new rootkit techniques) Jamie Butler Greg Hoglund

FATKit: A Framework for the Extraction and Analysis of Digital Forensic Data from Volatile System Memory p.1/11

Chapter 14 Analyzing Network Traffic. Ed Crowley

Automating Linux Malware Analysis Using Limon Sandbox Monnappa K A monnappa22@gmail.com

One-byte Modification for Breaking Memory Forensic Analysis

PE Explorer. Heaventools. Malware Code Analysis Made Easy

Parasitics: The Next Generation. Vitaly Zaytsev Abhishek Karnik Joshua Phillips

for Malware Analysis Daniel Quist Lorie Liebrock New Mexico Tech Los Alamos National Laboratory

Reverse Engineering by Crayon: Game Changing Hypervisor and Visualization Analysis

Dynamic analysis of malicious code

Sandy. The Malicious Exploit Analysis. Static Analysis and Dynamic exploit analysis. Garage4Hackers

File Disinfection Framework (FDF) Striking back at polymorphic viruses

Hypervisor-Based, Hardware-Assisted System Monitoring

Y R O. Memory Forensics: A Volatility Primer M E M. Mariano Graziano. Security Day - Lille1 University January Lille, France

Storm Worm & Botnet Analysis

Detecting Malware With Memory Forensics. Hal Pomeranz SANS Institute

Q-CERT Workshop. Matthew Geiger 2007 Carnegie Mellon University

Binary Code Extraction and Interface Identification for Security Applications

Dynamic Spyware Analysis

LASTLINE WHITEPAPER. In-Depth Analysis of Malware

Android Packer. facing the challenges, building solutions. Rowland YU. Senior Threat Researcher Virus Bulletin 2014

Detecting the One Percent: Advanced Targeted Malware Detection

CAS : A FRAMEWORK OF ONLINE DETECTING ADVANCE MALWARE FAMILIES FOR CLOUD-BASED SECURITY

EVTXtract. Recovering EVTX Records from Unallocated Space PRESENTED BY: Willi Ballenthin OCT 6, Mandiant Corporation. All rights reserved.

Adi Hayon Tomer Teller

Obfuscation: know your enemy

Managing large sound databases using Mpeg7

The Value of Physical Memory for Incident Response

How to Sandbox IIS Automatically without 0 False Positive and Negative

SAS Drug Development Release Notes 35DRG07

Binary Translation for Fun and Profit

From Georgia, with Love Win32/Georbot. Is someone trying to spy on Georgians?

Advanced ANDROID & ios Hands-on Exploitation

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Hotpatching and the Rise of Third-Party Patches

This is DEEPerent: Tracking App behaviors with (Nothing changed) phone for Evasive android malware

Bug hunting. Vulnerability finding methods in Windows 32 environments compared. FX of Phenoelit

Harnessing Intelligence from Malware Repositories

API Monitoring System for Defeating Worms and Exploits in MS-Windows System

CSCI E 98: Managed Environments for the Execution of Programs

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

Dongwoo Kim : Hyeon-jeong Lee s Husband

C Compiler Targeting the Java Virtual Machine

Lab 2 : Basic File Server. Introduction

Networking Best Practices Guide. Version 6.5

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

A Simpler Way of Finding 0day

Migration of Process Credentials

Melde- und Analysestelle Informationssicherung MELANI Torpig/Mebroot Reverse Code Engineering (RCE)

Richmond SupportDesk Web Reports Module For Richmond SupportDesk v6.72. User Guide

Python, C++ and SWIG

Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation

SYLLABUS MOBILE APPLICATION SECURITY AND PENETRATION TESTING. MASPT at a glance: v1.0 (28/01/2014) 10 highly practical modules

Memory Forensics for QQ from a Live System

Bypassing Memory Protections: The Future of Exploitation

File Server and Streams

CopyKittens Attack Group

Efficient Program Exploration by Input Fuzzing

Advantages of Amanda over Proprietary Backup

Gigabit Ethernet Packet Capture. User s Guide

Helping you avoid stack overflow crashes!

Table Of Contents. iii

Pentesting ios Apps Runtime Analysis and Manipulation. Andreas Kurtz

CIT 480: Securing Computer Systems. Malware

The HTTP Plug-in. Table of contents

Red Hat Linux Internals

Acronis Backup & Recovery: Events in Application Event Log of Windows

Everything is Terrible

Loophole+ with Ethical Hacking and Penetration Testing

Replication on Virtual Machines

Chapter 6, The Operating System Machine Level

File Systems Management and Examples

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification

Securing Secure Browsers

Hide and seek - how targeted attacks hide behind clean applications Szappanos Gábor

Component visualization methods for large legacy software in C/C++

C++ Programming Language

Run-Time Deep Virtual Machine Introspection & Its Applications

Dynamic Spyware Analysis

Chapter 2: Remote Procedure Call (RPC)

Distributed Version Control with Mercurial and git

Application Note C++ Debugging

FORBIDDEN - Ethical Hacking Workshop Duration

A Day in the Life of a Cyber Tool Developer

Glossary of Object Oriented Terms

Spyware Doctor Enterprise Technical Data Sheet

Advanced Malware Cleaning Techniques for the IT Professional

Reverse Engineering and Computer Security

Transcription:

{ Instrumenting a Full System Emulator to Analyse Malicious Software Lutz Bohne (lutz.boehne@redteam-pentesting.de) RedTeam Pentesting GmbH http://www.redteam-pentesting.de October, 28th, Luxembourg hack.lu 2009

About Myself About RedTeam Pentesting About Myself F Lutz Bohne F Graduated in 2008 from RWTH Aachen University F Now employed by RedTeam Pentesting GmbH F Talk will cover some work I did for my Diploma Thesis

About Myself About RedTeam Pentesting About RedTeam Pentesting F Founded 2004 in Aachen, Germany F Specialisation exclusively on penetration tests F Worldwide realisation of penetration tests F Research in the IT security eld

Runtime Packers

Runtime Packers F malware is an ever-increasing threat F example: Symantec generated more than 1.6 million new malware signatures in 2008 1, a 165% increase over 2007 F automated analysis of malware a necessity due to large number of samples F also: malware often runtime-packed F lack of free and open source analysis tools 1 http://www.symantec.com/business/theme.jsp?themeid=threatreport

Runtime Packers Figure: PE binaries - on disk and in memory

Runtime Packers headers headers headers headers.text.text.text.data.data.rsrc.data stub.rsrc.rsrc.text.text.data.data.rsrc.rsrc stub stub Compression Decompression Figure: How runtime packers work

Runtime Packers Runtime Packers - Compression When packing a binary, F the original code and data are packed or encrypted F a small stub to unpack or decrypt the original code and data is added F the entrypoint is set to the stub's rst instruction F often, the original import information is removed

Runtime Packers Runtime Packers - Decompression When executing a runtime-packed binary, F rst, the stub is executed to decompress or decrypt the original code and data F second, the stub performs some tasks normally carried out by the PE loader, such as import resolution F nally, the stub transfers control to the original code, for example by jumping to the so-called Original Entry Point (OEP)

Runtime Packers Analysing runtime-packed executables Static analysis F code that is unpacked at runtime is typically not visible to static analysis methods F static analysis of the unpacking stub is sometimes hampered by anti-disassembly techniques Dynamic analysis F some runtime-packers employ anti-debugging techniques to hamper dynamic anlysis

Runtime Packers Weaknesses of typical runtime packers F CPUs can only execute \plain text" code F that code is \generated" at runtime by the unpacking stub and is at some point visible in memory F typical approach: monitor execution of the unpacking stub and dump process memory whenever new code is being executed F several projects deal with automated unpacking, but tools or source code are rarely released to the public.

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Bochs Pandora's Bochs is based on Bochs 2 F FOSS PC Emulator F written in C++ F built-in debugger F supports instrumentation 2 http://bochs.sourceforge.net

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Pandora's Bochs Pandora's Bochs originally designed as an automatic unpacker. Challenges: F unobtrusiveness F awareness of guest-os semantics F OEP detection F termination F reconstruction of valid PE les

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Bochs can instrument certain events, for example F modication of the CR3 (Page Directory Base) register F memory accesses (writes) F execution of branch instructions! ideal for monitoring the unpacking process

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Boch's Facilities Bochs has many macros with inscrutable names. One might even go as far as to say that Bochs is macro infested. - Bochs Developers Guide

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Bochs's Facilities Implemented as a set of macros that are used throughout the emulator source code, for example: F BX_INSTR_TLB_CNTRL(cpu_id, what, new_cr3) F BX_INSTR_CNEAR_BRANCH_TAKEN(cpu_id, new_eip) BX_INSTR_CNEAR_BRANCH_NOT_TAKEN(cpu_id) BX_INSTR_UCNEAR_BRANCH(cpu_id, what, new_eip) BX_INSTR_FAR_BRANCH(cpu_id, what, new_cs, new_eip) F BX_INSTR_LIN_ACCESS(cpu_id, lin, phy, len, rw)

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing I prefer Python to C++, therefore wrote a Python interface: F Bochs is linked against the Python interpreter library F Bochs provides its own \module" that allows anything running within the Python interpreter to query emulator state (for example memory, registers) F at emulation startup, a module written in Python is imported F instrumentation macros essentially call a set of functions exported by the Python module

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Instrument at two dierent levels of granularity: F coarse-grained instrumentation: whenever the CR3 register is modied, determine whether the current process is of interest. Turn ne-grained instrumentation on or o accordingly. F ne-grained instrumentation: if the current process is monitored, F record memory writes F monitor branches! check whether the branch target is modied memory All processes and their corresponding PE images are logged to a database. So are (optionally) branches and writes.

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Identifying Processes on x86 F Modern operating systems provide each process with its own 4-GB virtual address space F x86 memory management unit uses page directories and page tables (\two-level paging") to translate virtual to physical memory addresses F page directory base register (CR3) contains physical address of active page directory! active page directory identies active virtual address space! every process identied by unique CR3 value

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Figure: Paging on the x86 architecture

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Figure: Segmentation on the x86 architecture

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing At fs:0 (segment descriptor 0x30) in kernel-mode: KPCR 0x000 0x01c 0x020 0x120 KPRCB 0x000 0x002 0x004 0x008 0x00c NtTib SelfPcr Prcb... PrcbData MinorVersion MajorVersion CurrentThread NextThread IdleThread... ETHREAD EPROCESS 0x000 Tcb 0x000 Pcb KTHREAD 0x000 0x010 0x018 0x034 KAPC_STATE 0x000 0x010 Header MutantListHead InitialStack... ApcState ApcListHead Process......... KPROCESS 0x000 0x010 0x018 0x020 0x06c 0x084 0x088 0x174 0x1b0 Header ProfileListHead DirectoryTableBase LdtDescriptor... ProcessLock... UniqueProcessId ActiveProcessLinks... ImageFileName... Peb... Figure: Identifying the current process in Windows (XP)

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Figure: More information about the current process

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Memory Dumps Whenever a branch targets memory that was previously written to by the same process, that memory region is dumped to a database F region to dump identied by VAD tree 3. F data structure in kernel space F contains information about a processes' virtual address space! stack, heap, memory-mapped les F need to continue execution, in case there is more to unpack! memory around the current branch target is marked clean 3 See Brendan Dolan-Gavitt. The VAD tree: A process-eye view of physical memory. Digital Investigation, Volume 4, Supplement 1:62{64, September 2007.

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing OEP Detection Branches to modied memory regions are OEP candidates Limitations: F only the rst branch to such a memory region F only branch targets within the original process image F last candidate is the most likely! when to stop monitoring?

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Termination It is undecideable whether new code will be unpacked! when to stop unpacking? F Fixed timeout can guarantee termination F Before that timeout, track \innovation". A process shows innovation, if F there are many memory writes per unique branch target F new DLLs appear in the process image F modied memory is executed F an API function not called before is called F stop emulation after a congurable number of task switches where no monitored process showed innovation

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Reconstructing a valid PE le from a memory image F copy original headers to the end of the le and zero-pad them F make \PE Signature Oset" point to the copied headers F set \Entry Point" to the detected OEP F set \File Alignment" to \Section Alignment" and correct all section headers F append new section header for a new section named.pandora that contains the copied headers F reconstruct Imports

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Import Reconstruction Import Address Table (IAT): F on-disk: describes which library functions to resolve F normally lled by the PE loader with of addresses of library functions F in packed executables, typically lled by the unpacker stub Reconstruction: F nd all branches from within the process image to a DLL F disassemble the branch instruction! operands of indirect jumps are potentially within an IAT F inspect potential IAT, and try to resolve symbols! reconstruct IAT and corresponding headers

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing API Call Tracing API Call tracing yields information about a malware sample's behaviour F branch instructions are instrumented anyway! little overhead to check if branch target is an API function F need to know API function prototype to determine stack layout for API call arguments

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing GCC-XML 4 There is one open-source C++ parser, the C++ front-end to GCC, which is currently able to deal with the language in its entirety. The purpose of the GCC-XML extension is to generate an XML description of a C++ program from GCC's internal representation. Since XML is easy to parse, other development tools will be able to work with C++ programs without the burden of a complicated C++ parser. 4 http://www.gccxml.org

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing GCC-XML Output <Function id="_9749" name="getprocaddress" returns="_9622" context="_1" location="f2:2610" file="f2" line="2610" extern="1" attributes="dllimport stdcall "> <Argument name="hmodule" type="_8702"... /> <Argument name="lpprocname" type="_6677"... /> </Function> <Typedef id="_6677" name="lpcstr" type="_2864"... /> <PointerType id="_2864" type="_294c" size="32" align="32"/> <CvQualifiedType id="_294c" type="_294" const="1"/> <Typedef id="_294" name="char" type="_293"... /> <FundamentalType id="_293" name="char" size="8" align="8"/>

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing pygccxml 5 to the rescue Using pygccxml, we can use GCC-XML's output from python, to F query functions by name F inspect function prototypes F determine the stack layout for function calls Current implementation F handles character strings and integers F doesn't know anything about input and output parameters F doesn't handle return values F has basic support for handling stolen bytes 5 http://www.language-binding.net/pygccxml/pygccxml.html

Identifying and Monitoring Processes Termination Reconstruction API Call Tracing Stolen Bytes A method employed by some executable protectors. Basic idea: F copy rst N instructions of an API function to someplace else F append a jump to the (N+1)th instruction F modify import information to call the copied bytes Basic countermeasures: F if a branch target is not an exported symbol, use the one with the next-smallest address F disassemble instruction stream from there to the branch target F keep track of and adjust for instructions that modify ESP

Synthetic Samples Malware Samples

Synthetic Samples Malware Samples - Criteria F Unpacking Time F OEP detection F Does the unpacked code match the original code (.text section) F Could a valid and executable PE image be reconstructed

Synthetic Samples Malware Samples - Synthetic Samples F Generated by packing two dierent binaries, Notepad (68kB) und Wget (732kB) F 30 dierent runtime packers, using their default conguration F Only 20 packed Notepad samples would execute

Synthetic Samples Malware Samples - Synthetic Samples F hidden code could be extracted from almost all samples F OEP detected correctly for 80% of all samples F valid, executable PE images could be reconstructed for 58% of all samples F major obstacle to reconstruction: modication of the original code by a packer F unpacking times from several minutes to an hour or more! could be somewhat improved by logging less extensively

Synthetic Samples Malware Samples Malware Samples F 409 samples, collected over the course of one month by the RWTH Aachen Honeynet F 379 known malware (ClamAV), 239 runtime-packed(peid) F 361 started execution and 343 executed modied memory F average run time was 7 minutes and 21 seconds F Dr. Whatson started in 152 cases F analysis indicates most of them could be unpacked correctly F need to do more real-world testing

and Future Work F plain unpacking seems to work fairly well, appears to be largely immune to anti-debugging techniques F API call tracing not heavily tested F results so far look promising! future work: track return values, output parameters F major obstacle to reconstruction of valid PE images: F executable protectors that modify the original code F examples: stolen bytes, API call/entry point obfuscation! need better, interactive tools? F emulation speed is subpar, some compatibility issues! use dierent emulator/virtualizer?! prole and optimise instrumentation code

Additional Information F My Thesis is available at https://0x0badc0.de/pandorasbochs.pdf F Git repository: F mirrors the Bochs CVS repository F Pandora's Bochs commited into a branch pandoras bochs F moving target, used more as a version-controlled backup F clone from git://0x0badc0.de/home/repo/git/bochs F Gitweb at https://0x0badc0.de/gitweb?p=bochs/.git F Slides will be made available at http://www.redteam-pentesting.de

Questions?

Figure: Unpacking MEW11SE 1.2

Figure: Unpacking Neolite 2.0

Figure: Unpacking npack 1.1300beta

Figure: Unpacking PESpin 1.304

Figure: Unpacking telock 0.98

Figure: Unpacking UPX 3.01