Live Disk Forensics on Bare Metal

Hongyi Hu and Chad Spensky ({hongyi.hu,chad.spensky}@ll.mit.edu)
Open-Source Digital Forensics Conference 2014

This work is sponsored by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Who are we? Chad Spensky
- Lifetime hacker/tinkerer
- Education: BS @ University of Pittsburgh, MS @ University of North Carolina
- Research staff at MIT Lincoln Laboratory
- 3rd time at OSDF Con
- User and modifier of TSK and Volatility

Who are we? Hongyi Hu
- Computer scientist, tinkerer, lawyer
- Education: S.B., M.Eng @ MIT, J.D. @ Boston U.
- Research staff at MIT Lincoln Laboratory
- 2nd time at OSDF Con
- My photos are not as cool as Chad's :)

Agenda
- Overview
- Motivation
- Architecture
- Live Disk Forensics
- Summary
- Future Directions

Overview
- This talk is a small portion of a larger program, LO-PHI: Low-Observable Physical Host Instrumentation
- Problem statement: instrument physical and virtual machines while introducing as few artifacts as possible
- Goals:
  - Be as difficult to detect as possible
  - Develop capabilities for bare-metal machines
  - Produce high-level semantic information

Why?
- Malware analysis: malware can actively evade detectable analysis artifacts and may behave differently
- Cleanroom execution environment: installing software on the system may not always be an option (e.g. Xbox 360)
- Low-artifact debugging: debuggers can be detected and evaded, or can mask real-world behavior

How?
- Instrument interesting tap points in the system (e.g. hard disk, main memory, CPU, network)
- Bridge the semantic gap to obtain useful information from these raw data sources (e.g. Volatility, Sleuthkit)
- Analyze the raw and semantic data to answer interesting questions:
  - Is program X malware?
  - What files were accessed?
  - Is this machine compromised?

Current Instrumentation
- Access physical memory
  - Virtual: libvmi
  - Physical: PCI & PCI-express FPGA boards
- Passively monitor disk activity
  - Virtual: custom hooks into QEMU block driver
  - Physical: SATA man-in-the-middle with custom FPGA
- CPU instrumentation
  - Virtual: custom hooks into QEMU KVM
  - Physical: working with Intel's eXtended Debug Port (XDP) and ARM's DSTREAM debugger
- Actuate inputs
  - Virtual: libvirt
  - Physical: Arduino Leonardo

Physical Instrumentation
[Diagram labels: Power, Keyboard, Mouse; SATA Introspection; Network Tap; Memory Introspection; Semantic Analysis]

Virtual Instrumentation
[Diagram labels: block.c; UNIX socket; LO-PHI; Semantic Analysis]

Bridging the Semantic Gap
- Problem
  - Most forensic tools, e.g. Volatility and Sleuthkit, assume static offline data
  - We need to analyze live data streams
- Live memory introspection
  - We were able to optimize Volatility to use a custom address space that speaks directly to our hardware
  - Other code to deal with smearing vs. snapshots, etc.
- Live disk forensics
  - Far less straightforward, especially on physical HDDs

Live Disk Forensics
1. Instrumentation: obtain a stream of disk activity
   - Read 1 sector from block 0, [DATA]
   - Write 1 sector to block 0, [DATA]
   - ...
2. Semantic gap: determine the meaning of this read/write
   - Master Boot Record was modified
   - File read/write/rename/etc.
3. Analyze data
   - Is that bad?
[Figure: 1. Data Collection -> 2. Semantic Reconstruction -> 3. Analysis]

Disk Instrumentation
- Virtual (QEMU/KVM)
  - Obtain block, sector count, data, and read/write directly from block driver
- Physical
  - Required developing specialized hardware
  - Currently using a Xilinx development board (ML507)
  - Using off-the-shelf SATA core from IntelliProp
  - Custom code for C&C over Ethernet
  - Outputs raw SATA frames over UDP (~80 MB/sec)

Disk Instrumentation: Limitations and Artifacts
- Virtual
  - Artifacts: same as QEMU
  - Limitation: requires modifications to QEMU source
- Physical
  - May sometimes need to throttle SATA to ensure full capture
  - Packet loss: UDP is a best-effort protocol
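
Because UDP is best-effort, a consumer of the frame stream needs some way to notice dropped datagrams. The slides do not describe the wire format, so the following is only a sketch under a loudly stated assumption: each datagram is imagined to begin with a 4-byte big-endian sequence number. That header, and the name `find_gaps`, are hypothetical and not taken from LO-PHI.

```python
import struct


def find_gaps(datagrams):
    """Return (expected, received) pairs wherever the sequence number skips.

    Assumption (not from the talk): every datagram starts with a 4-byte
    big-endian sequence number that increments by one per datagram.
    """
    gaps = []
    expected = None
    for dgram in datagrams:
        (seq,) = struct.unpack_from(">I", dgram, 0)
        if expected is not None and seq != expected:
            gaps.append((expected, seq))
        # Wrap around at 32 bits, as a fixed-width counter would.
        expected = (seq + 1) & 0xFFFFFFFF
    return gaps
```

A gap means the sector stream is incomplete for that interval, so any file system state derived from it should be treated as suspect until it is resynchronized.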

Disk Instrumentation: Physical
[Figure: physical disk instrumentation]

Semantic Reconstruction
1. Start with a forensic copy of the instrumented disk
2. Identify the file system on the disk (e.g. magic numbers, expert knowledge)
3. Obtain a stream of accesses to the instrumented disk in a common format, e.g. (Logical Block Address, Data, Operation)
4. Utilize forensic tools to identify subsequent file system operations
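
The common format in step 3 can be pictured as a small record type that both the virtual and the physical sensors emit. This is a minimal sketch, not the project's actual data structure: the names `SectorOp` and `Op` are illustrative, and the 512-byte sector size is an assumption.

```python
from dataclasses import dataclass
from enum import Enum

SECTOR_SIZE = 512  # assumed sector size in bytes


class Op(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class SectorOp:
    """One disk access in the (LBA, data, operation) form from the slide."""
    lba: int
    op: Op
    data: bytes

    @property
    def sector_count(self):
        return len(self.data) // SECTOR_SIZE

    def sectors(self):
        """Yield (lba, 512-byte chunk) for every sector the access covers."""
        for i in range(self.sector_count):
            chunk = self.data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
            yield self.lba + i, chunk
```

Normalizing both sources to one record like this is what lets the later file system reconstruction stay independent of where the data came from.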

SATA Reconstruction
Multiple layers of abstraction that we must bridge:
- Analog signal -> raw bits (Xilinx ML507)
- Raw bits -> SATA frames (Xilinx ML507)
- SATA frames -> sector manipulation (SATA reconstruction)
- Sector manipulation -> file system manipulation (file system reconstruction)

SATA Reconstruction: A Brief Primer on SATA (1)
- Serial ATA: bus interface that replaces older IDE/ATA standards
- SATA uses frames to communicate between host and device
- FIS: Frame Information Structure

SATA Reconstruction: A Brief Primer on SATA (2)
- Multi-layer protocol (physical, link, transport, command)
- Reconstruction focuses on the command layer
- Read the SATA standard; Appendix B is useful!

SATA Reconstruction: A Brief Primer on SATA (3)
- Register FIS, Host to Device
  - Marks the beginning of a SATA transaction
  - Contains the logical block address (LBA) and operation information (read or write)
- Register FIS, Device to Host
  - Often marks completion of a SATA transaction
  - Also used in software reset protocol, device diagnostic, etc.
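
To make the Register Host to Device FIS concrete, here is a hedged sketch of pulling the LBA, sector count, and operation out of one. The byte offsets follow the standard 20-byte Register H2D layout (FIS type 0x27), and only four 48-bit ATA commands are recognized; 28-bit commands and everything else are deliberately left out. The function name and the returned dictionary are illustrative and are not the API of the authors' module.

```python
FIS_TYPE_REG_H2D = 0x27

# ATA command opcode -> (operation, is_ncq). Only 48-bit commands are handled.
COMMANDS = {
    0x25: ("read", False),   # READ DMA EXT
    0x35: ("write", False),  # WRITE DMA EXT
    0x60: ("read", True),    # READ FPDMA QUEUED (NCQ)
    0x61: ("write", True),   # WRITE FPDMA QUEUED (NCQ)
}


def parse_register_h2d(fis):
    """Decode a 20-byte Register H2D FIS; return None for unhandled commands."""
    if len(fis) < 20 or fis[0] != FIS_TYPE_REG_H2D:
        raise ValueError("not a Register Host to Device FIS")
    command = fis[2]
    if command not in COMMANDS:
        return None
    op, queued = COMMANDS[command]
    # The 48-bit LBA is split across bytes 4-6 (low) and 8-10 (high).
    lba = int.from_bytes(fis[4:7], "little") | (
        int.from_bytes(fis[8:11], "little") << 24
    )
    features = fis[3] | (fis[11] << 8)
    count = fis[12] | (fis[13] << 8)
    if queued:
        # NCQ commands carry the sector count in Features and the tag
        # in bits 7:3 of the Count field.
        sectors, tag = features, (count >> 3) & 0x1F
    else:
        sectors, tag = count, None
    # A count of zero means 65536 sectors for these 48-bit commands.
    return {"op": op, "lba": lba, "sectors": sectors or 65536, "tag": tag}
```

Note how the NCQ variants move the sector count into a different field, one small example of why reconstruction has to be command-aware rather than purely positional.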

SATA Reconstruction: A Brief Primer on SATA (4)
- DMA Activate: device declares that it is ready to receive DMA data (for a write)
- DMA Setup: precedes Data frames (for NCQ, AFAIK)

SATA Reconstruction: A Brief Primer on SATA (5)
- Data: contains data!
- BIST (Built-In Self Test)
- PIO (Programmed I/O): older mode of data transfer before DMA
- Other protocols not mentioned here: software reset, device diagnostic, device reset, packet
- Read the SATA spec for more info

SATA Reconstruction: A Brief Primer on SATA (6)
Example DMA write
[Diagram of frames exchanged between HOST and DEVICE. Labels: Register HTD (tells us the LBA (sector), number of sectors, operation, etc.); DMA Activate; Data A; Data B; Data C; Register DTH]
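
The DMA write example can be turned into a small reassembly loop: a Register H2D frame opens a transaction, Data frames accumulate, and a Register D2H frame closes it. The sketch below assumes the frames have already been decoded into `(fis_type, lba, payload)` tuples, which is a simplification invented for this example; it also ignores error status in the closing frame.

```python
# Standard FIS type codes.
FIS_REG_H2D = 0x27
FIS_REG_D2H = 0x34
FIS_DMA_ACTIVATE = 0x39
FIS_DATA = 0x46


def reassemble_dma_writes(frames):
    """Yield (lba, data) for each non-queued DMA write in the frame stream.

    frames: iterable of (fis_type, lba, payload) tuples, a hypothetical
    pre-decoded form; lba is only meaningful on Register H2D frames.
    """
    current = None
    for fis_type, lba, payload in frames:
        if fis_type == FIS_REG_H2D:
            current = {"lba": lba, "chunks": []}
        elif fis_type == FIS_DATA and current is not None:
            current["chunks"].append(payload)
        elif fis_type == FIS_REG_D2H and current is not None:
            yield current["lba"], b"".join(current["chunks"])
            current = None
        # DMA Activate carries no payload, so it is simply skipped here.
```

This only works because a non-queued transfer has a single transaction in flight; NCQ removes that guarantee.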

SATA Reconstruction: Native Command Queuing (1)
- Native Command Queuing (NCQ) makes reconstruction harder
- NCQ allows for up to 32 separate, concurrent, asynchronous disk transactions
- Many SATA devices implement NCQ
- NCQ identifies transactions by a 5-bit TAG field (0-31)

SATA Reconstruction: Native Command Queuing (2)
- Not all NCQ frames are tagged (e.g. DATA), so we perform reconstruction to correctly de-interleave transactions
- State machine to track status of each transaction (including error conditions)
- Very tricky in practice: there are often differences between the official documentation and actual disk manufacturer practice
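
One way to picture the de-interleaving is a table of pending transactions keyed by tag, where untagged Data frames are credited to whichever tag the most recent DMA Setup frame named. This is a deliberately simplified sketch, not the authors' state machine: the event tuples are a hypothetical pre-decoded form, completion is taken from the bitmap in a Set Device Bits frame, and error handling and vendor quirks are omitted.

```python
def deinterleave(events):
    """Yield (tag, op, lba, data) as each queued transaction completes.

    events (hypothetical decoded form):
      ("cmd", tag, lba, op)      queued command issued by the host
      ("dma_setup", tag)         device selects which tag the next data is for
      ("data", payload)          untagged Data frame
      ("sdb", completed_mask)    32-bit bitmap of tags that just completed
    """
    pending = {}   # tag -> transaction state
    active = None  # tag named by the most recent DMA Setup
    for event in events:
        kind = event[0]
        if kind == "cmd":
            _, tag, lba, op = event
            pending[tag] = {"lba": lba, "op": op, "data": bytearray()}
        elif kind == "dma_setup":
            active = event[1]
        elif kind == "data":
            if active in pending:
                pending[active]["data"] += event[1]
        elif kind == "sdb":
            mask = event[1]
            for tag in range(32):
                if (mask >> tag) & 1 and tag in pending:
                    done = pending.pop(tag)
                    yield tag, done["op"], done["lba"], bytes(done["data"])
```

Keeping per-tag state in a dictionary bounded by 32 entries mirrors the 5-bit tag space described above.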

SATA Reconstruction: Native Command Queuing (3)
[Figure: NCQ example]

SATA Reconstruction
- Wrote a Python module to handle all of these transactions
  - Consumes raw SATA frames
  - Supports all of the existing SATA versions
  - Outputs a stream of logical sector operations
- Traditional SATA analyzers are expensive and don't provide analysis-friendly interfaces

File System Reconstruction
- Sector-to-file mapping is handled by existing forensic tools (e.g. Sleuthkit)
- We use TSK for our base case and only need to track changes
- Read operations
  - Report context with associated index node (inode)
- Write operations
  - Update mapping if needed
  - Report context with associated inode

File System Reconstruction: NTFS

Disk packet:
  Disk Op:       Write
  Start Sector:  493968
  Num Sectors:   16
  Data:          ...

Filesystem op:
  Type:          Content Write
  MFT Record:    1349
  Filename:      C:\foo\bar

Disk to record mapping:
  Sector    Master File Table (MFT) Record
  493968    1349

$MFT:
  Record    Attributes
  1349      $FILE_NAME: bar, etc.
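
The disk-to-record mapping above amounts to an interval lookup: given a sector, find the run that contains it and return the owning MFT record. A minimal sketch using a sorted list and binary search follows; the `RunMap` class is illustrative, runs are assumed not to overlap, and the run length of 16 in the usage note is only borrowed from the example packet.

```python
import bisect


class RunMap:
    """Map sector runs to MFT record numbers (runs assumed non-overlapping)."""

    def __init__(self):
        self._starts = []  # sorted run start sectors
        self._runs = []    # (start, length, record), same order as _starts

    def add_run(self, start, length, record):
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._runs.insert(i, (start, length, record))

    def lookup(self, sector):
        """Return the MFT record owning this sector, or None if unknown."""
        i = bisect.bisect_right(self._starts, sector) - 1
        if i >= 0:
            start, length, record = self._runs[i]
            if sector < start + length:
                return record
        return None
```

With `add_run(493968, 16, 1349)`, a write starting at sector 493968 resolves to record 1349, whose `$FILE_NAME` attribute then supplies the name reported to the analyst.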

File System Reconstruction: NTFS
- Problem
  - Sleuthkit was not made with incremental updates in mind
  - Naive solution of re-parsing the disk after updates is very slow
- Solution
  - Only parse the minimal information required to update the given file system
- Drawback
  - Optimizations are file system specific (e.g. only monitor MFT updates in NTFS)
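
Monitoring only MFT updates requires deciding, for each write, whether it overlaps the MFT and which records it touches. The sketch below shows that arithmetic under simplifying assumptions that are not from the talk: the MFT is treated as one contiguous region (in practice it can be fragmented), and each record is taken to span two 512-byte sectors (1024-byte records).

```python
def mft_records_touched(lba, count, mft_start, mft_sectors, sectors_per_record=2):
    """Return the range of MFT record numbers overlapped by a write.

    Assumes a contiguous MFT starting at sector mft_start and spanning
    mft_sectors sectors; both values would come from the initial parse.
    """
    lo = max(lba, mft_start)
    hi = min(lba + count, mft_start + mft_sectors)  # exclusive end
    if lo >= hi:
        return range(0)  # the write does not touch the MFT
    first = (lo - mft_start) // sectors_per_record
    last = (hi - 1 - mft_start) // sectors_per_record
    return range(first, last + 1)
```

Only the records in the returned range need to be re-parsed, which is the point of avoiding a full re-scan of the disk.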

File System Reconstruction: NTFS
- Current solution
  - Utilizes PyTSK to keep a unified codebase in Python (props to Joachim, Michael, et al. for the awesome work!)
  - Utilizes AnalyzeMFT to parse individual MFT entries (props to David Kovar, bug fixes are on their way!)
- Implementation
  - MFT modification
    - Diff previous MFT entry with new MFT entry
    - Update internal caching structures
    - Report changes
  - Non-MFT
    - Report if the sector is associated with a run of a known MFT structure
    - Otherwise report as unknown, to be resolved later
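
The MFT modification path (diff, update cache, report) can be sketched as follows. Parsed entries are represented here as plain dictionaries of attribute values, which is an assumption made for illustration and not the output format of AnalyzeMFT; the function names are likewise hypothetical.

```python
def diff_entries(old, new):
    """Return (attribute, before, after) for every attribute that changed."""
    changes = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append((key, before, after))
    return changes


def apply_mft_update(cache, record, new_entry):
    """Diff against the cached entry, update the cache, and return the changes."""
    changes = diff_entries(cache.get(record, {}), new_entry)
    cache[record] = new_entry
    return changes
```

A rename, for instance, would surface as a single change on the filename attribute of one record, rather than requiring a fresh walk of the whole file system.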

File System Reconstruction
- Currently have a stable, mostly optimized implementation for NTFS
  - Could still reduce memory footprint
  - Want to push AnalyzeMFT-like functionality into TSK
- Working on expanding to other file systems
  - Need to identify all of the potential regions that update the underlying structure, per file system
- In the process of pushing the code out to the community to solicit feedback

Analysis
Multiple layers of abstraction that we must bridge:
- Analog signal -> raw bits (Xilinx ML507)
- Raw bits -> SATA frames (Xilinx ML507)
- SATA frames -> sector manipulation (SATA reconstruction)
- Sector manipulation -> file system manipulation (file system reconstruction with TSK & analyzemft)

Analysis
- Analysis step is application-dependent and open to the user
- Flexible and easy-to-use API
- Example uses:
  - Simple filtering on specific files or disk regions (e.g. /bootmgr)
  - Detect writes to slack space
  - Feature extraction and machine learning for malware analysis
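
The simplest of these uses, filtering on specific files or disk regions, might look like the sketch below. The event dictionaries with `sector` and `filename` keys are a hypothetical shape for the reconstructed stream, not the framework's documented API.

```python
def filter_events(events, paths=(), regions=()):
    """Yield events that touch a watched file name or a watched sector range.

    paths:   file names to watch, compared case-insensitively
    regions: (start, end) sector ranges, end exclusive
    """
    wanted = {p.lower() for p in paths}
    for event in events:
        name = (event.get("filename") or "").lower()
        in_region = any(start <= event["sector"] < end for start, end in regions)
        if name in wanted or in_region:
            yield event
```

Watching `/bootmgr` together with the region `(0, 1)` would, for example, flag both boot manager accesses and any access to the first sector, where the Master Boot Record lives.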

Analysis
- We are currently using our framework to detect VM-aware malware
  - Results and future publication pending...
- However, we foresee there being numerous use cases that we have not yet thought of

Advantages
- Less divergence from real environments
- Introspection at the hardware level (difficult to subvert from software)
- Ability to instrument proprietary, legacy, or embedded systems that can't be virtualized
- Open and flexible framework

Summary
- Developed an instrumentation suite for both physical and virtual machines
- Showed that this instrumentation is capable of collecting complete real-time data with minimal artifacts
- Adapted popular forensics tools to bridge the semantic gap in real time on live systems
- Provides entire instrumentation suite so that researchers can focus on higher-level problems

What's Next?
- Process introspection / zero-artifact debugging
    main()
      Function1(1,2,3)
      Function2(2)
      ...
      FunctionN(X)
- Probabilistic/zero-artifact breakpoints

Questions?
LOPHI@ll.mit.edu