Peeking into Pandora s Bochs. Instrumenting a Full System Emulator to Analyse Malicious Software



Similar documents
Peeking into Pandora's Bochs { Instrumenting a Full System Emulator to Analyse Malicious Software

Fine-grained covert debugging using hypervisors and analysis via visualization

TitanMist: Your First Step to Reversing Nirvana TitanMist. mist.reversinglabs.com

FATKit: A Framework for the Extraction and Analysis of Digital Forensic Data from Volatile System Memory p.1/11

One-byte Modification for Breaking Memory Forensic Analysis

Reverse Engineering by Crayon: Game Changing Hypervisor and Visualization Analysis

VICE Catch the hookers! (Plus new rootkit techniques) Jamie Butler Greg Hoglund

Attacking Obfuscated Code with IDA Pro. Chris Eagle

Y R O. Memory Forensics: A Volatility Primer M E M. Mariano Graziano. Security Day - Lille1 University January Lille, France

Automating Linux Malware Analysis Using Limon Sandbox Monnappa K A monnappa22@gmail.com

Adi Hayon Tomer Teller

for Malware Analysis Daniel Quist Lorie Liebrock New Mexico Tech Los Alamos National Laboratory

File Disinfection Framework (FDF) Striking back at polymorphic viruses

Chapter 14 Analyzing Network Traffic. Ed Crowley

CAS : A FRAMEWORK OF ONLINE DETECTING ADVANCE MALWARE FAMILIES FOR CLOUD-BASED SECURITY

PE Explorer. Heaventools. Malware Code Analysis Made Easy

Detecting the One Percent: Advanced Targeted Malware Detection

Android Packer. facing the challenges, building solutions. Rowland YU. Senior Threat Researcher Virus Bulletin 2014

Dynamic analysis of malicious code

Q-CERT Workshop. Matthew Geiger 2007 Carnegie Mellon University

Lab 2 : Basic File Server. Introduction

EVTXtract. Recovering EVTX Records from Unallocated Space PRESENTED BY: Willi Ballenthin OCT 6, Mandiant Corporation. All rights reserved.

Hypervisor-Based, Hardware-Assisted System Monitoring

How To Port A Program To Dynamic C (C) (C-Based) (Program) (For A Non Portable Program) (Un Portable) (Permanent) (Non Portable) C-Based (Programs) (Powerpoint)

LASTLINE WHITEPAPER. In-Depth Analysis of Malware

Detecting Malware With Memory Forensics. Hal Pomeranz SANS Institute

The Value of Physical Memory for Incident Response

Dynamic Spyware Analysis

Developing tests for the KVM autotest framework

File Systems Management and Examples

Multi-core Programming System Overview

Storm Worm & Botnet Analysis

SAS Drug Development Release Notes 35DRG07

ProTrack: A Simple Provenance-tracking Filesystem

ELEC 377. Operating Systems. Week 1 Class 3

Sandy. The Malicious Exploit Analysis. Static Analysis and Dynamic exploit analysis. Garage4Hackers

Storage in Database Systems. CMPSCI 445 Fall 2010

Parasitics: The Next Generation. Vitaly Zaytsev Abhishek Karnik Joshua Phillips

Memory Forensics for QQ from a Live System

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

Chapter 13 Configuration Management

Training Needs Analysis

Binary Code Extraction and Interface Identification for Security Applications

Everything is Terrible

How We Did It. Unique data model abstraction layer to integrate, but de-couple EHR data from patient website design.

Run-Time Deep Virtual Machine Introspection & Its Applications

Chapter 2: Remote Procedure Call (RPC)

PTK Forensics. Dario Forte, Founder and Ceo DFLabs. The Sleuth Kit and Open Source Digital Forensics Conference

Richmond SupportDesk Web Reports Module For Richmond SupportDesk v6.72. User Guide

Technical Properties. Mobile Operating Systems. Overview Concepts of Mobile. Functions Processes. Lecture 11. Memory Management.

Jonathan Worthington Scarborough Linux User Group

Virtualization. Dr. Yingwu Zhu

Advantages of Amanda over Proprietary Backup

Chapter 5 Cloud Resource Virtualization

Python, C++ and SWIG

Networking Best Practices Guide. Version 6.5

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

ADL User Guide for Open AT V4.10

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

Storage Class Memory Support in the Windows Operating System Neal Christiansen Principal Development Lead Microsoft

Chapter 6, The Operating System Machine Level

APPLICATION VIRTUALIZATION TECHNOLOGIES WHITEPAPER

Chapter 14 Virtual Machines

API Monitoring System for Defeating Worms and Exploits in MS-Windows System

Application Note C++ Debugging

I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation. Mathias Payer, ETH Zurich

Replication on Virtual Machines

Chapter 5: Operating Systems Part 1

Table Of Contents. iii

Rotorcraft Health Management System (RHMS)

C Compiler Targeting the Java Virtual Machine

CSCI E 98: Managed Environments for the Execution of Programs

Windows Operating Systems. Basic Security

Partition Alignment Dramatically Increases System Performance

Component visualization methods for large legacy software in C/C++

winhex Disk Editor, RAM Editor PRESENTED BY: OMAR ZYADAT and LOAI HATTAR

DiskPulse DISK CHANGE MONITOR

LDIF - Linked Data Integration Framework

VMDriver: A Driver-based Monitoring Mechanism for Virtualization

Spyware Doctor Enterprise Technical Data Sheet

x86 ISA Modifications to support Virtual Machines

The Linux Virtual Filesystem

A Day in the Life of a Cyber Tool Developer

Gigabit Ethernet Packet Capture. User s Guide

Understanding EMC Avamar with EMC Data Protection Advisor

Migration of Process Credentials

The Win32 Network Management APIs

Violin: A Framework for Extensible Block-level Storage

Workflow Templates Library

About This Guide Signature Manager Outlook Edition Overview... 5

Guideline on Auditing and Log Management

SonicWALL CDP 5.0 Microsoft Exchange InfoStore Backup and Restore

Embedded Software Development with MPS

LICENSE4J LICENSE MANAGER USER GUIDE

Transcription:

Instrumenting a Full System Emulator to Analyse Malicious Software Lutz Böhne (lutz.boehne@redteam-pentesting.de) RedTeam Pentesting GmbH http://www.redteam-pentesting.de April 9th 2010, Paris Hackito Ergo Sum 2010

About Myself About RedTeam Pentesting About Myself Lutz Böhne Graduated in 2008 from RWTH Aachen University Now employed by RedTeam Pentesting GmbH Talk will cover some work I did for my Diploma Thesis

About Myself About RedTeam Pentesting About RedTeam Pentesting Founded 2004 in Aachen, Germany Specialisation exclusively on penetration tests Worldwide realisation of penetration tests Research in the IT security field

Malware Runtime Packers

Malware Runtime Packers Malware malware is an ever-increasing threat high number of samples automated analysis desirable lack of free and open source analysis tools malware is often runtime-packed

PE Files Introduction Malware Runtime Packers

Runtime Packers Introduction Malware Runtime Packers headers headers headers headers.text.text.text.data.data.rsrc.data stub.rsrc.rsrc.text.text.data.data.rsrc.rsrc stub stub Compression Decompression

Malware Runtime Packers Weaknesses of typical runtime packers CPUs can only execute plain text code that code is generated at runtime by the unpacking stub and is at some point visible in memory typical approach: monitor execution of the unpacking stub and dump process memory whenever new code is being executed several projects deal with automated unpacking, but tools or source code are rarely released to the public.

Pandora s Bochs Pandora s Bochs is an automated unpacker based on Bochs 1 Challenges: unobtrusiveness awareness of guest-os semantics reconstruction of valid PE files OEP detection termination 1 http://bochs.sourceforge.net

Bochs can instrument certain events, for example modification of the CR3 (Page Directory Base) register memory accesses (writes) execution of branch instructions ideal for monitoring the unpacking process

Boch s Facilities Bochs has many macros with inscrutable names. One might even go as far as to say that Bochs is macro infested. - Bochs Developers Guide

Bochs s Facilities Implemented as a set of macros that are used throughout the emulator source code, for example: BX_INSTR_TLB_CNTRL(cpu_id, what, new_cr3) BX_INSTR_CNEAR_BRANCH_TAKEN(cpu_id, new_eip) BX_INSTR_CNEAR_BRANCH_NOT_TAKEN(cpu_id) BX_INSTR_UCNEAR_BRANCH(cpu_id, what, new_eip) BX_INSTR_FAR_BRANCH(cpu_id, what, new_cs, new_eip) BX_INSTR_LIN_ACCESS(cpu_id, lin, phy, len, rw)

I prefer Python to C++, therefore wrote a Python interface: Bochs is linked against the Python interpreter library Bochs provides its own module that allows anything running within the Python interpreter to query emulator state (for example memory, registers) at emulation startup, a module written in Python is imported instrumentation macros essentially call a set of functions exported by the Python module

Introduction Instrument at two different levels of granularity: coarse-grained instrumentation: monitor virtual address space changes determine current process switch fine-grained instrumentation on or off fine-grained instrumentation: record memory writes monitor branches check whether the branch target is modified memory

Introduction Instrument at two different levels of granularity: coarse-grained instrumentation: monitor virtual address space changes determine current process switch fine-grained instrumentation on or off fine-grained instrumentation: record memory writes monitor branches check whether the branch target is modified memory

Paging on the x86 architecture Modern operating systems provide each process with its own 4-GB virtual address space x86 memory management unit uses page directories and page tables to translate virtual to physical memory addresses page directory base register (CR3) contains physical address of active page directory active page directory identifies active virtual address space every process identified by unique CR3 value

Paging on the x86 architecture

Introduction Instrument at two different levels of granularity: coarse-grained instrumentation: monitor virtual address space changes determine current process switch fine-grained instrumentation on or off fine-grained instrumentation: record memory writes monitor branches check whether the branch target is modified memory

Segmentation on the x86 architecture

Identifying the current process in Windows (XP) At fs:0 (segment descriptor 0x30) in kernel-mode: KPCR 0x000 0x01c 0x020 0x120 KPRCB 0x000 0x002 0x004 0x008 0x00c NtTib SelfPcr Prcb... PrcbData MinorVersion MajorVersion CurrentThread NextThread IdleThread... ETHREAD EPROCESS 0x000 Tcb 0x000 Pcb KTHREAD 0x000 0x010 0x018 0x034 KAPC_STATE 0x000 0x010 Header MutantListHead InitialStack... ApcState ApcListHead Process......... KPROCESS 0x000 0x010 0x018 0x020 0x06c 0x084 0x088 0x174 0x1b0 Header ProfileListHead DirectoryTableBase LdtDescriptor... ProcessLock... UniqueProcessId ActiveProcessLinks... ImageFileName... Peb...

More information about the current process

Introduction Instrument at two different levels of granularity: coarse-grained instrumentation: monitor virtual address space changes determine current process switch fine-grained instrumentation on or off fine-grained instrumentation: record memory writes monitor branches check whether the branch target is modified memory

Introduction Instrument at two different levels of granularity: coarse-grained instrumentation: monitor virtual address space changes determine current process switch fine-grained instrumentation on or off fine-grained instrumentation: record memory writes monitor branches check whether the branch target is modified memory

Memory Dumps Introduction Whenever a branch targets memory that was previously written to by the same process, that memory region is dumped to a database region to dump identified by VAD tree 2. data structure in kernel space contains information about a processes virtual address space stack, heap, memory-mapped files need to continue execution, in case there is more to unpack memory around the current branch target is marked clean 2 See Brendan Dolan-Gavitt. The VAD tree: A process-eye view of physical memory. Digital Investigation, Volume 4, Supplement 1:62 64, September 2007.

OEP Detection Branches to modified memory regions are OEP candidates Limitations: only the first branch to such a memory region only branch targets within the original process image last candidate is the most likely when to stop monitoring?

Will more layers get unpacked? undecidable in the general case when to stop unpacking? termination guaranteed by (configurable) time limit until then: determine innovation of a process stop emulation when no monitored process showed innovation for a certain amount of time

Reconstructing a valid PE file from a memory image copy original headers to the end of the file and zero-pad them make PE Signature Offset point to the copied headers set Entry Point to the detected OEP adjust File Alignment and correct all section headers append new section.pandora reconstruct Imports

0x0000 0x0400 0x1000 0x1400 0x1c00 Headers.text.data.rsrc on disk in memory Headers.text.data.rsrc image base +0x0400 +0x1000 +0x1c00 +0x2000 +0x2400 +0x3000 +0x3800 +0x4000

0x0000 0x0400 0x1000 0x1400 0x1c00 Headers.text.data.rsrc on disk in memory Headers.text.data.rsrc image base +0x0400 after dumping +0x1000 +0x1c00 +0x2000 +0x2400 +0x3000 +0x3800 +0x4000 Headers.text.data.rsrc.pandora copied reconstructed headers imports image base +0x0400 +0x1000 +0x1c00 +0x2000 +0x2400 +0x3000 +0x3800 +0x4000

Import Import Address Table (IAT): on-disk: describes which library functions to resolve normally filled by the PE loader with of addresses of library functions in packed executables, typically reconstructed at runtime by the unpacker stub : find indirect branches from within the process image to a DLL indirect operands are potentially within an IAT check potential IAT for validity and reconstruct

API Call tracing yields information about a malware sample s behaviour branch instructions are instrumented anyway little overhead to check if branch target is an API function need to know API function prototype to determine stack layout for API call arguments

GCC-XML 3 There is one open-source C++ parser, the C++ front-end to GCC, which is currently able to deal with the language in its entirety. The purpose of the GCC-XML extension is to generate an XML description of a C++ program from GCC s internal representation. Since XML is easy to parse, other development tools will be able to work with C++ programs without the burden of a complicated C++ parser. 3 http://www.gccxml.org

GCC-XML Output Introduction <Function id="_9749" name="getprocaddress" returns="_9622" context="_1" location="f2:2610" file="f2" line="2610" extern="1" attributes="dllimport stdcall "> <Argument name="hmodule" type="_8702"... /> <Argument name="lpprocname" type="_6677"... /> </Function> <Typedef id="_6677" name="lpcstr" type="_2864"... /> <PointerType id="_2864" type="_294c" size="32" align="32"/> <CvQualifiedType id="_294c" type="_294" const="1"/> <Typedef id="_294" name="char" type="_293"... /> <FundamentalType id="_293" name="char" size="8" align="8"/>

pygccxml 4 to the rescue Using pygccxml, we can use GCC-XML s output from python, to query functions by name inspect function prototypes determine the stack layout for function calls Current implementation handles character strings and integers outputs arguments (and return value) once on call and once on return 4 http://www.language-binding.net/pygccxml/pygccxml.html

Synthetic Samples Malware Samples

Synthetic Samples Malware Samples - Criteria Performance was evaluated on a set of synthetic samples and on unknown malware. Criteria: Unpacking Time OEP detection Does the unpacked code match the original code (.text section)? Could a valid and executable PE image be reconstructed?

Synthetic Samples Malware Samples - Synthetic Samples Packed Notepad (68kB) und Wget (732kB) 30 different runtime packers, using their default configuration hidden code could be extracted from almost all samples OEP detected correctly for 80% of all samples valid, executable PE images could be reconstructed for 58% of all samples major obstacle to reconstruction: modification of the original code by a packer unpacking times from several minutes to an hour or more

Synthetic Samples Malware Samples Malware Samples 409 samples, collected over the course of one month by the RWTH Aachen Honeynet 379 known malware (ClamAV), 239 runtime-packed (PEiD) 361 started execution and 343 executed modified memory average run time was 7 minutes and 21 seconds analysis indicates most of them could be unpacked correctly

and Future Work plain unpacking seems to work fairly well, appears to be largely immune to anti-debugging techniques API call tracing not heavily tested results so far look promising major obstacle to reconstruction of valid PE images: executable protectors that modify the original code examples: stolen bytes, API call/entry point obfuscation need better, interactive tools? emulation speed is subpar, some compatibility issues use different emulator/virtualizer? profile and optimise instrumentation code

Additional Information My Thesis is available at https://0x0badc0.de/pandorasbochs.pdf Git repository: mirrors the Bochs CVS repository Pandora s Bochs commited into a branch pandoras bochs moving target, used more as a version-controlled backup clone from git://0x0badc0.de/home/repo/git/bochs Gitweb at https://0x0badc0.de/gitweb?p=bochs/.git Slides will be made available at http://www.redteam-pentesting.de

Questions?

Figure: Unpacking MEW11SE 1.2

Figure: Unpacking Neolite 2.0

Figure: Unpacking npack 1.1300beta

Figure: Unpacking PESpin 1.304

Figure: Unpacking telock 0.98

Figure: Unpacking UPX 3.01