Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis

Similar documents
Automatic Network Protocol Analysis

Automatic Network Protocol Analysis

Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation

HTTP Response Splitting

Research of Web Real-Time Communication Based on Web Socket

Package httprequest. R topics documented: February 20, 2015

Anatomy of a Pass-Back-Attack: Intercepting Authentication Credentials Stored in Multifunction Printers

Security-Assessment.com White Paper Leveraging XSRF with Apache Web Server Compatibility with older browser feature and Java Applet

Barracuda Networks Web Application Firewall

Internet Technologies Internet Protocols and Services

CS640: Introduction to Computer Networks. Applications FTP: The File Transfer Protocol

CS 5480/6480: Computer Networks Spring 2012 Homework 1 Solutions Due by 9:00 AM MT on January 31 st 2012

GET /FB/index.html HTTP/1.1 Host: lmi32.cnam.fr

Alteon Browser-Smart Load Balancing

All You Can Eat Realtime

ivoyeur: permission to parse

Solution of Exercise Sheet 5

Instruction Set Architecture (ISA)

Binary Code Extraction and Interface Identification for Security Applications

The Bro Network Intrusion Detection System

LBL Application Availability Infrastructure Unified Secure Reverse Proxy

Playing with Web Application Firewalls

IseHarvest: TCP packet data re-assembler framework for network traffic content

GS-AN045 S2W UDP, TCP, HTTP CONNECTION MANAGEMENT EXAMPLES

Hypertext for Hyper Techs

THE PROXY SERVER 1 1 PURPOSE 3 2 USAGE EXAMPLES 4 3 STARTING THE PROXY SERVER 5 4 READING THE LOG 6

People Data and the Web Forms and CGI CGI. Facilitating interactive web applications

HTTP/2: Operable and Performant. Mark

Effiziente Filter gegen Kinderpornos und andere Internetinhalte. Lukas Grunwald DN-Systems GmbH CeBIT Heise Forum 2010 Hannover

HTTP. Internet Engineering. Fall Bahador Bakhshi CE & IT Department, Amirkabir University of Technology

Project #2. CSE 123b Communications Software. HTTP Messages. HTTP Basics. HTTP Request. HTTP Request. Spring Four parts

Understanding Slow Start

Chapter 14 Analyzing Network Traffic. Ed Crowley

Using TestLogServer for Web Security Troubleshooting

Load balancing Microsoft IAG

TCP/IP Networking An Example

Exam 1 Review Questions

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

Information Extraction Art of Testing Network Peripheral Devices

The Hyper-Text Transfer Protocol (HTTP)

Caml Virtual Machine File & data formats Document version: 1.4

Grammar and Model Extraction for Security Applications using Dynamic Program Binary Analysis

The Open Source Bro IDS Overview and Recent Developments

Computer Networks/DV2 Lab

BCR Export Protocol SHAPE 2012

Intrusion Detection System

HTTP Authentifizierung

International Journal of Engineering & Technology IJET-IJENS Vol:14 No:06 44

Intrusion Detection Systems (IDS)

1. When will an IP process drop a datagram? 2. When will an IP process fragment a datagram? 3. When will a TCP process drop a segment?

Developing Applications With The Web Server Gateway Interface. James Gardner EuroPython 3 rd July

Default configuration for the Workstation service and the Server service

Hands-on Network Traffic Analysis Cyber Defense Boot Camp

Announcements. Lab 2 now on web site

ICSA Labs Web Application Firewall Certification Testing Report Web Application Firewall - Version 2.1 (Corrected) Radware Inc. AppWall V5.6.4.

Configuring Health Monitoring

Web applications. Web security: web basics. HTTP requests. URLs. GET request. Myrto Arapinis School of Informatics University of Edinburgh

Lecture 2-ter. 2. A communication example Managing a HTTP v1.0 connection. G.Bianchi, G.Neglia, V.Mancuso

Owner of the content within this article is Written by Marc Grote

Network Traffic Evolution. Prof. Anja Feldmann, Ph.D. Dr. Steve Uhlig

Application Note. Introduction AN2471/D 3/2003. PC Master Software Communication Protocol Specification

Using SAML for Single Sign-On in the SOA Software Platform

Arnaud Becart ip- label 11/9/11

Web Application Forensics:

Protocol-Level Evasion of Web Application Firewalls

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Computer Networks/DV2 Lab

Defcon AppSecure. Network Application Firewalls: Exploits and Defense. Brad Woodberg, Juniper Networks

UNIX Sockets. COS 461 Precept 1

Improving Signature Testing Through Dynamic Data Flow Analysis

*[Bug hunting ] Jose Miguel Esparza 7th November 2007 Pamplona S21sec

Packet Matching. Paul Offord, Advance7

Savera Tanwir. Internet Protocol

HTTP Caching & Cache-Busting for Content Publishers

IP - The Internet Protocol

Exercise 7 Network Forensics

API of DNS hosting. For DNS-master and Secondary services Table of contents

Playing with Web Application Firewalls

Remote System Monitor for Linux [RSML]: A Perspective

KB Windows 2000 DNS Event Messages 1 Through 1614

Domain Name System WWW. Application Layer. Mahalingam Ramkumar Mississippi State University, MS. September 15, 2014.

Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation

Protocolo HTTP. Web and HTTP. HTTP overview. HTTP overview

(Refer Slide Time: 00:01:16 min)

Networks and Security Lab. Network Forensics

WIZnet S2E (Serial-to-Ethernet) Device s Configuration Tool Programming Guide

Cyber Security Workshop Ethical Web Hacking

Final exam review, Fall 2005 FSU (CIS-5357) Network Security

8.2 The Internet Protocol

Developing Secure Mobile Apps

StreamServe Persuasion SP4 Service Broker

reference: HTTP: The Definitive Guide by David Gourley and Brian Totty (O Reilly, 2002)

Dynamic Spyware Analysis

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

Dynamic Spyware Analysis

Cyber Security Scan Report

Application-layer Protocols and Internet Services

How do I get to

Advanced Internetworking

Acunetix Website Audit. 5 November, Developer Report. Generated by Acunetix WVS Reporter (v8.0 Build )

Transcription:

Polyglot: Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis Juan Caballero, Heng Yin, Zhenkai Liang Carnegie Mellon University Dawn Song Carnegie Mellon University & UC Berkeley October 2007 1

Protocol reverse-engineering Process of extracting the application-level protocol used by a program, without access to the protocol specification Encompasses extracting: the Protocol Message Format the Protocol State Machine Challenges: No protocol specification» Many closed application protocols Automatic» Samba, ICQ, Yahoo Messenger took years This paper is about the Protocol Message Format 2

Outline Introduction Techniques Direction field extraction Separator extraction Keyword extraction Multi-byte fixed length fields Evaluation 3

Automatic message format extraction Extract the Protocol Message Format, one message at a time Problem: Automatically extract the message format from an unknown message given a program binary implementing the protocol Message Format = field sequence Field: smallest contiguous sequence of application data with meaning Use dynamic binary analysis to extract message format 4

Message Format Applications Extracting the message format is important for multiple problems: Protocol analyzers Application dialog replay Intrusion detection Fingerprint generation Detecting services running on non-standard ports 5

Challenges Main challenge is to identify the field boundaries: 1. Variable-length fields boundaries Extract the direction fields (e.g. length fields) Extract the separators 2. Fixed-length fields boundaries Avoid concatenating two consecutive fields Avoid splitting a single field into multiple ones Another important challenge is to: 3. Find the keywords present in the message Allow to map traffic to the protocol We propose techniques to address these challenges 6

Related work Previous work focuses on network traces Fundamental limitation: lack of protocol semantics Protocol Message Format: [Cui06,Cui07,Leita05,Leita06] Need complex heuristics to identify direction fields Need assumptions about separator values & data encoding Cannot identify consecutive fixed-length fields Keywords / Application Signatures: [Haffner05,Ma06,Cui07] Need tokens to appear at same position across messages Problems: variable-length fields, floating fields Some previous work using binary analysis: [Lim06] Assumes knowledge of program s output functions 7

Approach Shadowing: Monitor how a binary implementing the protocol processes a received message How the message is parsed How the message is used 1 0 0 1 1 0 1 0 1? Fixed Fixed Length Variable 8

Polyglot: system architecture Program binary Message Execution Monitor Execution trace Separator extraction Direction field extraction Separators Direction fields Keyword extraction Keywords Message format extraction Message format Execution Monitoring Analysis Input: program binary and received message Output: message format Two phases: Execution Monitoring generate execution trace Analysis extract the message format 9

Execution monitoring phase Run the binary inside an emulator Implements dynamic taint analysis [Chow04,Suh04,Costa05, Newsome05, Crandall06] Received network data marked as interesting Marking propagates as data is used Marking represents offsets in input stream Outputs an execution trace which contains Executed instructions Operand content during execution Taint information 10

Outline Introduction Techniques Direction field extraction Separator extraction Keyword extraction Multi-byte fixed length fields Evaluation 11

Direction fields Direction Fields: store information about the location of another target field in the message Target field has variable-length Three types of direction fields Length fields» Encode length of target field Pointer fields» Encode displacement relative to another position Counter fields» Encode position of field in a list of items 12

Direction field extraction 12 13 14 15 16 17 18 19 20 Length field Variable-length field Fixed-length field Direction field Target Field Detect direction fields when used to increment the value of a pointer Two methods to increment the pointer: Direct: increment computed from field and added to pointer» Indirect memory access that 1) accesses a tainted memory position, and 2) where the destination address has been computed from tainted data Indirect: increment computed using a loop» Need to find loops and check if they have tainted conditions 13

Direction field extraction No assumptions about field encoding used» number of bytes/words/quadwords,» integer / string Variable-length fields: Direction fields need to appear before variable-length fields Variable-length field: the sequence of bytes between the last position belonging to the direction field and the end of the target field 14

Outline Introduction Techniques Direction field extraction Separator extraction Keyword extraction Multi-byte fixed length fields Evaluation 15

Separators Positions 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 G E T / H T T P / 1. 1 \r \n Method Sep. URL Sep. Version Separator Used by protocols to mark the boundary of variable-length fields Separator = Constant + Scope Constant marks the boundary Scope contains list of positions in the message where the separator is used Two separator types depending on scope: Field, In-fields 16

Separator extraction Program compares the separator value against all positions in the separator scope until it finds it Intuition: find tokens that are compared against consecutive positions in the received data Three step process: 1. Summarize tainted comparisons (comparison table) 2. Extract byte-long separators 3. Extend byte-long separators into multi-byte separators Tokens 0x0a (\n) 0x47 (G) 0x45 (E) 0x54 (T) 0x2f (/) 0 F T Offset Positions 1 F T 2 F T Comparison table No assumptions about separator value or encoding 3 F 4 F T 5 F F 6 F F 17

Outline Introduction Techniques Direction field extraction Separator extraction Keyword extraction Multi-byte fixed length fields Evaluation 18

Keywords Protocol constants that: appear in the received application data are supported by the implementation Keywords can be strings or numbers Content-type in HTTP Version number Useful to match traffic to protocols 19

Keyword extraction Program compares the supported keywords against the received data Intuition: analyze true comparisons that include tainted data Two phases Populate the comparison table Extract true comparisons at consecutive positions Tokens 0x0a (\n) 0x47 (G) 0x45 (E) 0x54 (T) 0x2f (/) 0 F T Offset Positions 1 2 3 4 5 F F F F F T T T F 6 F F» Break keywords at false comparisons or separators Comparison table 20

Outline Introduction Techniques Direction field extraction Separator extraction Keyword extraction Multi-byte fixed length fields Evaluation 21

Multi-byte fixed length fields Intuition: program operates on the entire field All field bytes need to be considered for comparisons, arithmetic operations Most fields are independent Exceptions: direction fields, checksums Detection: for each instruction extract the list of positions it operates on If not direction field, then create a new multi-byte fixed field Limitation: can only extract fields up to the system s word size 4 bytes in 32-bit architectures But, still better than previous work that separates each byte 22

Outline Introduction Techniques Direction field extraction Separator extraction Keyword extraction Multi-byte fixed length fields Evaluation 23

Evaluation overview Evaluated on 11 programs and 5 protocols: HTTP, DNS, IRC, ICQ, Samba Clients & Servers Windows & Linux Text & Binary Results compared to Wireshark Program Version Type Size OS Apache Miniweb Savant Bind MaraDNS SimpleDNS 2.2.4 0.8.1 3.1 9.3.4 1.2.12.4 4.00.06 HTTP server HTTP server HTTP server DNS server DNS server DNS server 4,344kB 528kB 280kB 224kB 164kB 432kB Win. Win. Win. Win. Win. Win. TinyICQ 1.2 ICQ client 11kB Win. Beware ircd JoinMe Unreal IRCd 1.5.7 1.41 3.2.6 IRC server IRC server IRC server 148kB 365kB 760kB Win. Win. Win. Sambad 3.0.24 Samba server 3,580kB Linux Only Samba, HTTP results presented for brevity 24

Samba Requested Dialecs NetBios Samba results Wireshark 0.99.5 Polyglot Wireshark 0.99.5 Polyglot 0 1 4 8 9 13 14 16 18 26 28 30 32 34 36 37 39 Message Type: Fixed Length: Length Server Component: Fixed Command: Fixed NT Status: Fixed Flags: Fixed Flags2: Fixed Process ID High: Fixed Signature: Fixed Reserved: Fixed Tree ID: Fixed Process ID: Fixed User ID: Fixed Multiplex ID: Fixed Word Count: Length Byte Count: Length Requested Dialects: Variable [144] Fixed Fixed Fixed Fixed Unused Fixed Fixed Unused Fixed Unused Direction Requested Dialects: Variable [144] 39 40 63 64 88 89 112 113 123 124 134 135 149 150 156 157 171 172 182 Buffer Format: Fixed Name: Variable [23] Buffer Format: Fixed Name: Variable [24] Buffer Format: Fixed Name: Variable [23] Buffer Format: Fixed Name: Variable [10] Buffer Format: Fixed Name: Variable [10] Buffer Format: Fixed Name: Variable [14] Buffer Format: Fixed Name: Variable [6] Buffer Format: Fixed Name: Variable [14] Buffer Format: Fixed Name: Variable [11] Unused Variable [22] Separator Unused Name: Variable [23] Separator Unused Name: Variable [22] Separator Unused Name: Variable [9] Separator Unused Name: Variable [9] Separator Unused Name: Variable [13] Separator Unused Name: Variable [5] Separator Unused Name: Variable [13] Separator Unused Name: Variable [10] Separator Negotiate Protocol request 25

HTTP separators Separator Apache Savant Miniweb 0x0d0a ( \r\n ) 0x2f ( / ) 0x2e (. ) 0x20 ( ) 0x3a20 ( : ) Field In-field In-field - In-field Field In-field In-field In-field - Field - In-field In-field - GET /index.html HTTP/1.1 Host: 10.0.0.1 User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-us; rv:1.8.1.1) Gecko/20061208 Firefox/2.0.0.1 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: Keep-Alive 26

HTTP keywords Keyword Apache Savant Miniweb Server Add. keywords GET Host User-Agent Accept Accept-Language Accept-Encoding Accept-Charset Keep-Alive Connection Yes Yes NS Yes Accept Accept- Accept- NS Yes Yes NS Yes Yes Yes Accept- Accept- NS Yes Yes Yes NS NS NS NS NS NS Yes Apache Savant Miniweb - HTTP/, e, Keep-Alive HTTP/1., Keep-Alive, : GET /index.html HTTP/1.1 Host: 10.0.0.1 User-Agent: Mozilla/5.0 [ ] Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: Keep-Alive 27

Conclusion Proposed a new approach for extracting the protocol message format using dynamic binary analysis Proposed new techniques for: 1. detecting direction fields 2. detecting separators 3. detecting multi-byte fixed-length fields 4. extracting protocol keywords Evaluated techniques on 5 protocols and 11 programs. Results compared to Wireshark 28

Questions? 29