Automatic Network Protocol Analysis

Similar documents
Automatic Network Protocol Analysis

Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

The Advantages of Block-Based Protocol Analysis for Security Testing

Network Protocol Analysis using Bioinformatics Algorithms

Symbol Tables. Introduction

Understand Names Resolution

How do I get to

Network Security: Workshop

Understanding Slow Start

Efficient Program Exploration by Input Fuzzing

Big Data and Scripting. Part 4: Memory Hierarchies

Attacking Obfuscated Code with IDA Pro. Chris Eagle

Outline. Definition. Name spaces Name resolution Example: The Domain Name System Example: X.500, LDAP. Names, Identifiers and Addresses

Additional Information: A link to the conference website is available at:

Introduction. Figure 1 Schema of DarunGrim2

DHCP and DNS Protocols

PageR Enterprise Monitored Objects - AS/400-5

BITS-Pilani Hyderabad Campus CS C461/IS C461/CS F303/ IS F303 (Computer Networks) Laboratory 3

Exam 1 Review Questions

Internet Security [1] VU Engin Kirda

Configuring Health Monitoring

LabVIEW Internet Toolkit User Guide

SMALL INDEX LARGE INDEX (SILT)

The Application Layer. CS158a Chris Pollett May 9, 2007.

Software Fingerprinting for Automated Malicious Code Analysis

Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation

Installing and Setting up Microsoft DNS Server

Chapter 14 Analyzing Network Traffic. Ed Crowley

Motivation. Domain Name System (DNS) Flat Namespace. Hierarchical Namespace

Internetworking. Problem: There is more than one network (heterogeneity & scale)

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

Rotorcraft Health Management System (RHMS)

Modbus Protocol. PDF format version of the MODBUS Protocol. The original was found at:

DNS. Some advanced topics. Karst Koymans. (with Niels Sijm) Informatics Institute University of Amsterdam. (version 2.6, 2013/09/19 10:55:30)

Napster and Gnutella: a Comparison of two Popular Peer-to-Peer Protocols. Anthony J. Howe Supervisor: Dr. Mantis Cheng University of Victoria

Recovering Business Rules from Legacy Source Code for System Modernization

Mail 2 ZOS FTPSweeper

LASTLINE WHITEPAPER. The Holy Grail: Automatically Identifying Command and Control Connections from Bot Traffic

Communications and Networking

There are numerous ways to access monitors:

PROFESSIONAL. Node.js BUILDING JAVASCRIPT-BASED SCALABLE SOFTWARE. Pedro Teixeira WILEY. John Wiley & Sons, Inc.

TCP Session Management (SesM) Protocol Specification

Life of a Packet CS 640,

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Naming. Name Service. Why Name Services? Mappings. and related concepts

Application Protocols in the TCP/IP Reference Model

Forensic Analysis of Internet Explorer Activity Files

Purpose What is EDI X EDI X12 standards and releases Trading Partner Requirements EDI X12 Dissected... 3

VX Search File Search Solution. VX Search FILE SEARCH SOLUTION. User Manual. Version 8.2. Jan Flexense Ltd.

webmethods Certificate Toolkit

Automating Linux Malware Analysis Using Limon Sandbox Monnappa K A monnappa22@gmail.com

Hillstone T-Series Intelligent Next-Generation Firewall Whitepaper: Abnormal Behavior Analysis

Virtual private network. Network security protocols VPN VPN. Instead of a dedicated data link Packets securely sent over a shared network Internet VPN

Domain Name Resolver (DNR) Configuration

Electronic Mail

Hands-on Network Traffic Analysis Cyber Defense Boot Camp

Deploying System Center 2012 R2 Configuration Manager

CSE331: Introduction to Networks and Security. Lecture 12 Fall 2006

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Network (Tree) Topology Inference Based on Prüfer Sequence

Application Protocols in the TCP/IP Reference Model. Application Protocols in the TCP/IP Reference Model. DNS - Concept. DNS - Domain Name System

Fuzzing in Microsoft and FuzzGuru framework

Dynamic Spyware Analysis

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Analog Monitoring Tool AMT 0.3b User Manual

Lecture 1: Data Storage & Index

Lecture 9. Semantic Analysis Scoping and Symbol Table

Module: Software Instruction Scheduling Part I

Web Forensic Evidence of SQL Injection Analysis

First Workshop on Open Source and Internet Technology for Scientific Environment: with case studies from Environmental Monitoring

CS 356 Lecture 27 Internet Security Protocols. Spring 2013

Key Components of WAN Optimization Controller Functionality

Hydra. Advanced x86 polymorphic engine. Incorporates existing techniques and introduces new ones in one package. All but one feature OS-independent

Networking Domain Name System

DNS and issues in connecting UNINET-ZA to the Internet

Configuring a Load-Balancing Scheme

CSE 473 Introduction to Computer Networks. Exam 2 Solutions. Your name: 10/31/2013

Dynamic Spyware Analysis

Scoping (Readings 7.1,7.4,7.6) Parameter passing methods (7.5) Building symbol tables (7.6)

Loop-Extended Symbolic Execution on Binary Programs

ScriptGen: an automated script generation tool for honeyd

Efficient Addressing. Outline. Addressing Subnetting Supernetting CS 640 1

Application Protocols for TCP/IP Administration

Implementation of Botcatch for Identifying Bot Infected Hosts

Understanding DNS (the Domain Name System)

Distributed Systems. 09. Naming. Paul Krzyzanowski. Rutgers University. Fall 2015

A Tiny Queuing System for Blast Servers

Virtual Integrated Design Getting started with RS232 Hex Com Tool v6.0

FTP Service Reference

Transcription:

Gilbert Wondracek, Paolo M ilani C omparetti, C hristopher Kruegel and E ngin Kirda {gilbert,pmilani}@ seclab.tuwien.ac.at chris@ cs.ucsb.edu engin.kirda@ eurecom.fr

Reverse Engineering Network Protocols Find out what application-layer language is spoken by a server implementation Message formats Protocol state machine Slow manual process Do it automatically!

Reverse Engineering Network Protocols: Security Applications Black-box fuzzing Deep packet inspection Intrusion detection Reveal differences in server implementations server fingerprinting testing/auditing

Reverse Engineering Network Protocols: Sources of Information Network traces limited information (no semantics) Server binaries static analysis dynamic analysis

Our approach Mostly dynamic analysis (+ static analysis) Use dynamic taint analysis to observe the data flow Observe how the program processes (parses) input messages Analyze individual messages Generalize to a message format for messages of a given type (i.e. HTTP get, NFS lookup..) Classification of messages into types is currently done manually

Dynamic taint analysis environment server client Execution trace Execution traces for individual messages analysis Tree of fields alignment/ generalization? or Message format

Dynamic Taint Analysis Run unmodified binary in a monitored environment (based on qemu, valgrind, ptrace..) Assign a unique label to each byte of network input Propagate the labels in shadow memory for each instruction, assign labels of input to output destinations also track address dependencies (example: lookup table-based toupper() function)

Label Input: G E T / H T T P / 1. 0 \r \n \r \n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Propagate Labels: EAX c G E BL G 0 push %esi push %ebx mov (%eax),%bl sub $0x1,%ecx 0 1 Tainted data affects program flow: Is (something derived from) byte 0 equal to '\n'? cmp $0x0a,%bl je 93

Message Format Analysis Structure-forming semantics enough information to parse a message out of a network data flow variation between messages Additional semantics keywords, file names, session ids,..

Structure-Forming Semantics Length fields and corresponding target fields, padding Delimiter fields and corresponding scope fields Hierarchical structure

Detecting Length Fields (1/2) Length fields are used to control a loop over input data Leverage static analysis to detect loops Look for loops where an exit condition tests the same taint labels on every iteration Need at least 2 iterations

Detecting Length Fields (2/2) The tricky part is detecting the target field! Look at labels touched inside length loop Remove labels touched in all iterations May need to merge multiple loops (example: memcpy uses 4- byte mov instructions, but may need to move 1-3 bytes individually) Some bytes may be unused

Detecting Delimiters Delimiter is one or more bytes that separate a field or message Observation: all bytes in the scope of the delimiter are compared against a part of the delimiter Delimiter field detection Create a list of taint labels used for comparisons for each byte value, merge consecutive labels into intervals Intervals indicate delimiter scope, nesting gives us a hierarchical structure recursive analysis to break up message

Additional Semantics Protocol keywords File names Echoed fields (session id,cookie,..) Pointers (to somewhere else in packet) Unused fields

Detecting Keywords A keyword is a sequence of (1 or 2 byte) characters which is tested against a constant value adjacent characters being successfully compared to non tainted values are merged into a string take delimiters into account Ideally, we would want to check it is being tested against values which are hard coded in binary trace taint from entire binary Currently, we just check the string (of at least 3 bytes) is present in the binary

Generalization (1/3) Message alignment Based on Needlman-Wunsch Extended to a hierarchy of fields

Needleman-Wunsch Dynamic programming algorithm for string alignment Computes alignment which minimizes edit distances Also provides edit path between the strings Scoring function (for match, mismatch, gap) Generalization (2/3) ABCDE alignment ABDF A B C D E A B C- D EF generalization A B C? C D E F E

Generalization (3/3) Hierarchical Needleman-Wunsch Operate on a tree of fields, not on a string of bytes To align two inner nodes (complex fields) recursively call NW on the sequence of child nodes To align two leaf nodes, take into account field semantics a length field only matches another length field a keyword only matches same exact keyword... Simple scoring function: +1 for match, -1 for mismatch or gap

Generalization: More Semantics Sets of keywords (i.e. keep-alive OR close..) Length field semantics encoding: endianess compute target field length T from length L: T=A*L+C Pointer field semantics encoding: endianess offset: relative or absolute offset value is A*L+C Repetitions generalize a? a? to a*

Evaluation 7 servers (apache,lighttpd,iacd,sendmail,bind,nfsd,samba) 6 protocols (http, irc, smtp, dns, nfs, smb) 14 message types ( http get irc nick, user smtp mail, helo, quit, dns IPv4 A query rpc/nfs lookup, getattr, create, write smb/cifs negotiate protocol request, session setup andx request, tree connect andx request

DNS A IPV4 query Session ID 2 bytes B000100000000 0000010001 Sequence Length 1 byte Target A=1,C=0 B: any byte T: any printable ascii byte 0001: constant byte values in hex T

HTTP GET line Scope ' ' (space) GET ' ' Scope '.' ' ' HTTP/1.1 Scope '/' '/' Filename *? T Sequence Sequence Delimiter '/' '.' Keyword T T

Parsing The message format allows us to produce a parser Successfully parses real-world messages of same type all structural information was successfully recovered Rejects negative examples different message types from same protocol hand-crafted negative examples

Related Work Network traces M. Beddoe. The Protocol Informatics Project. Toorcon 2004 C. Leita, K. Mermoud, M. Dacier. ScriptGen: An Automated Script Generation Tool for Honeyd. ACSAC 2005 W. Cui, V. Paxson, N. Weaver, R. Katz. Protocol-Independent Adaptive Replay of Application Dialog. NDSS 2006 W.Cui, J.Kannan,H.J.Wang: Discoverer: Automatic Protocol Reverse Engineering from Network Traces Static and dynamic analysis J. Newsome, D. Brumley, J. Franklin, and D. Song. Replayer: Automatic Protocol Replay by Binary Analysis. ACM CCS 2006. Dynamic taint analysis J. Caballero and D. Song. Polyglot: Automatic Extraction of Protocol Format using Dynamic Binary Analysis. ACM CCS 2007 Z. Lin, X. Jiang, D. Xu, and X. Zhang. Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution. NDSS 2008.

Conclusions Reverse engineer application layer network protocols Recover a message format Validate format by parsing real world messages Tested on common servers and protocols

Questions?