On the Use of Compression Algorithms for Network Traffic Classification




Christian Callegari
Department of Information Engineering, University of Pisa
COST-TMA Meeting, Samos, Greece, 23 September 2008

Outline

1 Introduction
  Motivations
  Theoretical Background
2 Compression Algorithms
  Lempel-Ziv-Welch
  Huffman
  Dynamic Markov Compression
3
4
  Data-Set
  Results

Motivations

Language classification
  D. Benedetto, E. Caglioti, and V. Loreto, "Language trees and zipping", Physical Review Letters, January 2002

Traffic classification based on the TCP flags
  H. Dahmouni, S. Vaton, and D. Rossé, "A Markovian signature-based approach to IP traffic classification", Proceedings of the 3rd annual ACM workshop on Mining network data, 2006

Theoretical Background: Entropy

The entropy $H$ of a discrete random variable $X$ measures the amount of uncertainty associated with the value of $X$. For an alphabet of $n$ distinct symbols, where symbol $i$ occurs with probability $p_i$,

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{bit/symbol}$$

The starting point: the entropy is a lower bound on the average number of bits per symbol that lossless compression can achieve. The more redundant the data, the better they compress.
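As an illustration of the definition above, the following minimal Python sketch estimates the entropy of a symbol sequence. The function name `entropy_bits_per_symbol` is chosen for this example only, and using the observed symbol frequencies in place of the true probabilities $p_i$ is an assumption.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Empirical entropy H = -sum(p_i * log2(p_i)) of a symbol sequence.

    The probabilities p_i are estimated from symbol counts (an assumption
    of this sketch; the true source distribution is usually unknown).
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A sequence split evenly between two symbols carries 1 bit/symbol.
print(entropy_bits_per_symbol([0, 1, 0, 1]))  # 1.0
```

A constant sequence has zero entropy under this estimate, which matches the intuition that fully redundant data compress best.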

Dictionary-based algorithms use a dictionary, which can be static or dynamic, and code each symbol or group of symbols with an element of the dictionary.
  Lempel-Ziv-Welch

Model-based algorithms encode each symbol or group of symbols with a variable-length code, according to some probability distribution.
  Huffman
  Dynamic Markov Compression

Lempel-Ziv-Welch

- Created by Abraham Lempel, Jacob Ziv, and Terry Welch; published by Welch in 1984 as an improved implementation of the LZ78 algorithm, which Lempel and Ziv published in 1978
- Universal, adaptive¹, lossless data compression algorithm
- Builds a translation table (also called a dictionary) from the text being compressed
- The string translation table maps the message strings to fixed-length codes

¹ The coding scheme used for the k-th character of a message is based on the characteristics of the preceding k-1 characters in the message.
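The dictionary construction described above can be sketched in a few lines of Python. This is a textbook-style LZW encoder written for illustration, not the implementation used in the talk; the name `lzw_encode` and the `alphabet_size` parameter (256 by default) are assumptions of this example.

```python
def lzw_encode(symbols, alphabet_size=256):
    """Encode a sequence of integers in range(alphabet_size) with LZW.

    Returns the list of emitted dictionary codes and the final dictionary.
    """
    # The dictionary initially holds every single-symbol string.
    table = {(s,): s for s in range(alphabet_size)}
    codes = []
    phrase = ()
    for s in symbols:
        candidate = phrase + (s,)
        if candidate in table:
            # Keep extending the current phrase while it is known.
            phrase = candidate
        else:
            # Emit the longest known phrase and learn the new one.
            codes.append(table[phrase])
            table[candidate] = len(table)
            phrase = (s,)
    if phrase:
        codes.append(table[phrase])
    return codes, table


codes, table = lzw_encode([1, 1, 1, 1])
print(codes)  # [1, 256, 1]
```

Each emitted code is an index into the dictionary, so it can be written with a fixed number of bits, as the last bullet above states.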

Huffman

- Developed by Huffman (1952)
- Based on a variable-length code table for encoding each source symbol
- The code table is derived from a binary tree built from the estimated probability of occurrence of each possible source symbol value
- Prefix-free code² that expresses the most common source symbols using shorter bit strings than those used for less common source symbols

² The bit string representing a particular symbol is never a prefix of the bit string representing any other symbol.
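The tree construction described above can be sketched as follows. This is a minimal illustration using Python's standard `heapq` module; the function name `huffman_code` and the choice of estimating probabilities from symbol counts are assumptions of this example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman codeword (a bit string)."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The tie breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new parent node.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


code = huffman_code("aaaabbc")
print({s: len(c) for s, c in code.items()})  # a: 1 bit, b and c: 2 bits each
```

The most frequent symbol receives the shortest codeword, and no codeword is a prefix of another.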

Dynamic Markov Compression

- Developed by Gordon Cormack and Nigel Horspool (1987)
- Adaptive lossless data compression algorithm
- Models the binary source to be encoded as a Markov chain, which describes the transition probabilities between the symbol 0 and the symbol 1
- The model is used to predict the next bit of a message; each bit is then coded with arithmetic coding, driven by that prediction
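The idea of predicting each bit from a Markov model and paying a cost that depends on the prediction can be illustrated with a much simpler stand-in. The sketch below is not DMC: it uses a fixed-order context of previous bits instead of DMC's dynamically cloned states, and it reports the ideal code length (the sum of -log2 of the predicted probabilities) instead of running an arithmetic coder. The name `adaptive_bit_cost` and the `order` parameter are assumptions of this example.

```python
import math


def adaptive_bit_cost(bits, order=2):
    """Ideal code length, in bits, of a bit sequence under an adaptive model.

    Simplified stand-in for DMC: the state is the last `order` bits, and
    the 0/1 counts of each state are updated after every coded bit.
    """
    counts = {}  # context (tuple of previous bits) -> [count of 0, count of 1]
    context = (0,) * order
    total = 0.0
    for bit in bits:
        # Counts start at 1 so that no bit ever has probability zero.
        c = counts.setdefault(context, [1, 1])
        p = c[bit] / (c[0] + c[1])
        total += -math.log2(p)  # cost an ideal arithmetic coder would pay
        c[bit] += 1
        context = (context + (bit,))[1:]
    return total


# A strictly alternating sequence becomes cheap once the model has adapted.
print(adaptive_bit_cost([0, 1] * 100, order=1))
```

A predictable sequence costs far fewer than one bit per input bit, which is the property the classifier later exploits.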

Input

- The system input consists of raw traffic traces in libpcap format
- The 5-tuple identifies a connection, while the values of the TCP flags are used to build the profile
- A value $s_i$ is associated with each packet:

$$s_i = \mathrm{SYN} + 2\,\mathrm{ACK} + 4\,\mathrm{PSH} + 8\,\mathrm{RST} + 16\,\mathrm{URG} + 32\,\mathrm{FIN}$$

- Each unidirectional connection is thus represented by a sequence of symbols $s_i$, which are integers in $\{0, 1, \ldots, 63\}$
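The symbol mapping above can be sketched as follows. The weights follow the formula on this slide, which differs from the bit order of the flags in the TCP header, so the sketch converts explicitly. Parsing of libpcap files is left out; the packet tuple layout and the names `flags_to_symbol` and `connection_sequences` are assumptions of this example.

```python
from collections import defaultdict

# Bit masks of the six classic flags in the TCP header flags byte.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def flags_to_symbol(tcp_flags):
    """Map a TCP flags byte to s = SYN + 2 ACK + 4 PSH + 8 RST + 16 URG + 32 FIN."""
    weights = ((SYN, 1), (ACK, 2), (PSH, 4), (RST, 8), (URG, 16), (FIN, 32))
    return sum(w for bit, w in weights if tcp_flags & bit)


def connection_sequences(packets):
    """Group packets into unidirectional connections keyed by the 5-tuple.

    `packets` is an iterable of hypothetical tuples
    (src_ip, dst_ip, src_port, dst_port, protocol, tcp_flags).
    """
    sequences = defaultdict(list)
    for src_ip, dst_ip, src_port, dst_port, protocol, tcp_flags in packets:
        key = (src_ip, dst_ip, src_port, dst_port, protocol)
        sequences[key].append(flags_to_symbol(tcp_flags))
    return dict(sequences)


print(flags_to_symbol(SYN | ACK))  # 3
```

Because the key includes the direction (source before destination), the two halves of a TCP conversation produce two separate symbol sequences.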

Training Phase

- Choose one of the three algorithms described previously (Huffman, DMC, or LZW)
- The compression algorithms have been modified so that learning stops at the end of the training phase:
  - Huffman: the frequency of occurrence of each symbol is estimated only on the training dataset
  - DMC: the estimate of the Markov chain is updated only during the training phase
  - LZW: the construction of the dictionary stops after the training phase
- Classification is performed with a compression scheme that is optimal for the application used to build the considered profile and suboptimal for the others
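For the LZW case, stopping the learning after training can be sketched as two separate steps: build the dictionary on the training sequence, then encode new connections against that dictionary without adding entries. This is an illustrative sketch, not the modified coder used in the talk; the names `train_lzw` and `encode_frozen` are assumptions, and the 64-symbol alphabet matches the range of the flag symbols defined on the Input slide.

```python
def train_lzw(symbols, alphabet_size=64):
    """Build an LZW dictionary from a training sequence of flag symbols."""
    table = {(s,): s for s in range(alphabet_size)}
    phrase = ()
    for s in symbols:
        candidate = phrase + (s,)
        if candidate in table:
            phrase = candidate
        else:
            table[candidate] = len(table)
            phrase = (s,)
    return table


def encode_frozen(symbols, table):
    """Encode with a frozen dictionary: longest-match parsing, no new entries."""
    codes = []
    phrase = ()
    for s in symbols:
        candidate = phrase + (s,)
        if candidate in table:
            phrase = candidate
        else:
            codes.append(table[phrase])
            phrase = (s,)  # single symbols are always in the dictionary
    if phrase:
        codes.append(table[phrase])
    return codes


profile = train_lzw([1, 3, 1, 3, 1, 3])
print(encode_frozen([1, 3, 1, 3], profile))  # [66, 3]
print(encode_frozen([2, 2, 2], profile))     # [2, 2, 2]
```

A connection that resembles the training data is covered by long dictionary phrases and needs few codes, while an unrelated one falls back to one code per symbol.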

Classification

- Append each distinct observed connection $b$ to the training sequence $A_i$ of application $i$
- Compute the compression rate per symbol:

$$L_i = \frac{\dim([A_i\, b]^*) - \dim([A_i]^*)}{\mathrm{Length}(b)} \qquad (1)$$

  where $[X]^*$ denotes the compressed version of $X$
- Choose

$$\arg\min_i \, L_i \qquad (2)$$
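Equations (1) and (2) can be tried out with any off-the-shelf compressor. The sketch below uses Python's standard `zlib` module (DEFLATE) as a stand-in for the modified Huffman, DMC, and LZW coders of the talk, so its numbers are not comparable with the reported results. The names `compressed_size`, `rate_per_symbol`, and `classify`, and the toy profiles, are assumptions of this example.

```python
import zlib


def compressed_size(symbols):
    """dim([X]*): size in bytes of the zlib-compressed symbol sequence."""
    return len(zlib.compress(bytes(symbols), 9))


def rate_per_symbol(training, connection):
    """L_i of Eq. (1): extra compressed bytes per symbol of the connection b."""
    extra = compressed_size(list(training) + list(connection)) - compressed_size(training)
    return extra / len(connection)


def classify(connection, profiles):
    """Eq. (2): return the application whose training sequence gives the smallest L_i."""
    return min(profiles, key=lambda app: rate_per_symbol(profiles[app], connection))


# Toy training sequences of flag symbols (hypothetical, for illustration only).
profiles = {
    "app_a": [1, 2, 6, 2, 6, 2, 34] * 40,
    "app_b": [1, 2, 2, 6, 6, 6, 2, 34] * 40,
}
print(classify([1, 2, 6, 2, 6, 2, 34], profiles))
```

With training sequences this short, the differences in compressed size are only a few bytes, so the sketch is meant to show the mechanics of Eq. (1) and Eq. (2) rather than realistic accuracy.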

Data-Set 1

The 1999 DARPA/MIT IDS evaluation program
- Provides a corpus of data that models the network traffic measured between a US Air Force base and the Internet
- 5 weeks of data
  - week 1: used for training
  - week 3: used for classification
- Considered applications (several thousand connections per application): FTP, SSH, SMTP, and HTTP

Data-Set 2

- Corpus of data collected in the TLC Net Group Laboratory, University of Pisa
- Considered applications (four hundred connections per application): FTP, SSH, SMTP, HTTP, and HTTPS

Data-Set 3

- Corpus of data provided by the Italian research project (PRIN) RECIPE
- Considered applications (several thousand connections per application): POP3, SMTP, and HTTP

Results

D-1, D-2, D-3: Data-Sets 1, 2, and 3.

        |        LZW         |        DMC         |      Huffman
        | D-1    D-2    D-3  | D-1    D-2    D-3  | D-1    D-2    D-3
FTP     | 100%   70%    -    | 100%   0%     -    | 100%   100%   -
SSH     | 95%    100%   -    | 0%     100%   -    | 50%    97%    -
SMTP    | 94%    60%    96%  | 100%   99%    -    | 98%    70%    100%
HTTP    | 95%    73%    97%  | 100%   76%    -    | 83%    45%    52%
HTTPS   | -      32%    -    | -      33%    -    | -      35%    -
POP3    | -      -      98%  | -      -      -    | -      -      100%

Results 2: some more details

Huffman
            HTTP    POP3    SMTP
HTTP        53%     47%     0%
HTTP nom    36%     64%     0%
POP3        0%      100%    0%
POP3 nom    0%      100%    0%
SMTP        0%      0%      100%
SMTP nom    0%      0%      100%

LZW
            HTTP    POP3    SMTP
HTTP        96%     3.5%    0.5%
HTTP nom    97%     3%      0%
POP3        0%      98%     2%
POP3 nom    1%      95%     4%
SMTP        1%      3%      96%
SMTP nom    0%      0%      100%

Future Work

- More applications
- Background traffic
- Combining several statistical methods (e.g., compression + traffic descriptor statistics)
- ...
- Application to anomaly detection

Thank you for your attention.
Any questions?