Classification of Network Traffic via Packet-Level Hidden Markov Models




Alberto Dainotti, Walter de Donato, Antonio Pescapè
Department of Computer Science and Systems, University of Naples Federico II
{alberto, walter.dedonato, pescape}@unina.it

Pierluigi Salvo Rossi
Department of Electronics and Telecommunications, Norwegian University of Science and Technology
salvoros@iet.ntnu.no

Abstract

Traffic classification and identification is a fertile research area. Beyond Quality of Service, service differentiation, and billing, one of the most important applications of traffic classification is in the field of network security. This paper proposes a packet-level traffic classification approach based on Hidden Markov Models (HMMs). Classification is performed by using real network traffic and estimating - in a combined fashion - Packet Size (PS) and Inter Packet Time (IPT) characteristics, thus remaining applicable to encrypted traffic too. The effectiveness of the proposed approach is evaluated by considering several traffic typologies: we applied our model to real traffic traces of Age of Mythology and Counter Strike (two Multi Player Network Games), HTTP, SMTP, Edonkey, PPlive (a peer-to-peer IPTV application), and MSN Messenger. An analytical basis and the mathematical details regarding the model are given. Results show how the proposed approach is able to classify network traffic by using packet-level statistical properties, and therefore it is a good candidate as a component for a multi-classification framework.

I. INTRODUCTION

Network traffic classification is the process of analyzing traffic flows and associating them to different categories of network applications, and it represents an essential task in the whole chain of network security. Studies in the field of traffic classification started in recent years, when the traditional use of transport protocol ports for classification purposes became unreliable while different kinds of new network applications were emerging (multiplayer network games, p2p IPTV, file sharing).
Beyond the need to understand which kind of traffic is carried on the Internet links, other main motivations for looking for new and reliable traffic classification techniques today are to offer proper Quality of Service (QoS) depending on the category of traffic carried by flows, and to perform billing based not only on bandwidth usage but also on the traffic category. However, in addition to these issues, some of the most important and widespread applications of traffic classification pertain to network security: (i) the enforcement of security policies on the use of different applications; (ii) the ability to classify encrypted traffic; (iii) the identification of malicious traffic flows. For these reasons, several new approaches to traffic classification are being proposed and studied. As of today, though, no definitive answer is present. The debate in the scientific community is still open and, as happened in the recent past for intrusion detection systems [1], approaches based on the joint work of different traffic classification techniques (multi-classification) seem to be among the more promising solutions.

This work has been partially supported by PRIN 2007 RECIPE Project, by CONTENT NoE, and NETQOS EU projects, by WILATI+ project.

New trends in network applications and protocol design, indeed, make traffic classification particularly difficult. Protocol encapsulation, encrypted transmission, use of non-standard ports, concerns related to users' privacy, and the need to keep up with huge traffic loads on network links are posing tremendous limits to some of the developed techniques. Payload inspection techniques, for example, make application identification difficult or even impossible under some of the above-cited conditions (mainly for both privacy and performance issues). On the other side, approaches based on statistical properties of the network traffic look more promising and robust to encryption, protocol obfuscation, privacy concerns, etc. In this paper we propose a novel classification technique based on packet-level statistical properties of network traffic exhibited by different applications.
Specifically, we propose the use of Packet-Level Hidden Markov Models (PL-HMMs), which we have proposed and validated in the past for modeling purposes [2]. In this work we present the algorithms and the statistical properties taken into consideration, and we test the proposed classification approach on a set of application traffic that ranges from traditional network applications (e.g. HTTP, Email) to more recent ones such as network games and peer-to-peer video streaming. The presented results are encouraging and show that the proposed PL-HMM approach may be a good candidate as a technique to be used in a multi-classification scenario (that is, when different classification engines are used and their output is combined by a decision system). The rest of the paper is organized as follows. In Section II a brief description of the motivations is given. Section III provides details on the analytical model at the base of our classifier. Section IV discusses the applications considered and the measurement approach. Finally, in Section V we show results of traffic classification. Section VI ends the paper.

978-1-4244-2324-8/08/$25.00 2008 IEEE. This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2008 proceedings.

II. MOTIVATION AND RELATED WORK

Several classification techniques have recently been presented in literature. Approaches based on deep payload inspection are usually considered very reliable for traffic that is not encapsulated into other application-level protocols and for un-encrypted traffic. However, the current trends show that the portion of encrypted traffic on the Internet is constantly increasing [3], and several applications are using protocol encapsulation or obfuscation to evade network policy enforced through filtering [4]. Moreover, access to full payload is often not possible (e.g. due to privacy issues). For these reasons, researchers are proposing approaches that look more robust because they are based on the intrinsic properties of the network traffic as it is generated by different applications. Flow-level parameters (e.g. flow duration, transmitted bytes, transmitted packets) are a popular choice; a valid alternative, or combination, is to exploit measurements coming from packet level (e.g. packet size, inter-packet times). Several notable works [5] [6] [7] [8] [9] presented in literature consider some of these properties to build classification features, and then use statistical or machine learning approaches to classification. Results show that a perfect classification approach does not exist. The use of different features and classifiers can bring more accuracy under some conditions or in identifying some applications, while it may not be satisfying in other cases. It is therefore probable that in the future we will see multi-classifier approaches being proposed, able to collect the advantages of different techniques and compensate for each weakness. In this paper we propose a technique for traffic classification based on a statistical approach that takes into account some new packet-level properties of network traffic, trying to offer a contribution in terms of techniques to exploit intrinsic properties of traffic generated by different network applications. Indeed, as explained in the following sections, the use of PL-HMMs allows us to take into account joint characteristics of inter-packet times (IPT) and payload size (PS), as well as their temporal correlation.
We use studies from our modeling work based on HMMs [2]: the traffic generated by a specific application is modeled as a flow of packets, seen as a sequence of (IPT, PS) pairs generated according to different distributions depending on the hidden state of the source. In [8], HMMs have been used, and compared with other techniques, for traffic classification of flows at an early stage. Sequences made of only the first 4 to 10 packets were used to train HMMs and to attempt flow classification. However, differently from our work, only packet sizes were considered in that paper. An approach based on profile HMMs has been proposed in [10]. That work is very different from ours, in that the authors present two separate classifiers working separately on IPTs or on PSs, and a left-to-right structure for the state topology of the HMM is used. However, a proposal for extending their approach was later presented in a technical report [11], where they try to account for joint IPT and PS modeling via vector quantization. The profile HMMs proposed in [11] present a very complex state structure depending on the length of the training sequence, with a pair of different states for each packet. They are designed for one-dimensional observable variables. IPT and PS joint information is taken into account via vector quantization, thus a codebook labeling allowed IPT and PS pairs is used as observable variable. Furthermore, a heuristic technique, namely model surgery, is needed to account for different trace lengths. As will be clear in the next section, compared to [11], the model proposed in this paper works directly on a two-dimensional observable variable, thus exploiting IPT and PS joint information without needing any pre-processing like vector quantization.

Fig. 1. Architecture of the classifier: the IPT-PS sequence feeds a bank of N parallel PL-HMMs (PL-HMM 1, ..., PL-HMM n, ..., PL-HMM N) producing the likelihoods λ_1, ..., λ_n, ..., λ_N; an argmax(.) block outputs the traffic class.
Our approach presents a fully-connected structure for the state topology that allows an enormous reduction of the number of states, avoids post-processing like model surgery, and, although much less structured than the profile HMMs with respect to the traffic characteristics, is still able to achieve good classification results.

III. THE ANALYTICAL MODEL

Notation - Column vectors are denoted with lower-case bold letters, with $a_i$ denoting the $i$th element of vector $\mathbf{a}$; matrices are denoted with upper-case bold letters, with $A_{i,j}$ denoting the $(i,j)$th element of matrix $\mathbf{A}$; $(\cdot)^T$ and $E\{\cdot\}$ denote transpose and expectation operators; $a \mid b = \bar{b}$ denotes the conditional random variable $a$ given that $b = \bar{b}$; the symbol $\sim$ means "distributed as".

Figure 1 shows the general system architecture that we are considering for traffic classification. It is composed of a bank of parallel PL-HMMs and a multi-input single-output block pointing at the maximum input. In order to capture the characteristics of $N$ different typologies of network traffic, it is assumed that the $N$ different PL-HMMs in the bank have been obtained via the Baum-Welch training proposed in [2]. The Baum-Welch algorithm [12] is an iterative procedure that looks for the model parameters maximizing the probability that the model itself generates the sequences used as training set. Each PL-HMM of the bank is then used to compute the likelihood ($\lambda_n$), representing the probability that the test sequence belongs to the traffic typology associated to the PL-HMM. The maximum likelihood then selects the best estimate for the traffic typology.

A. PL-HMM

The single PL-HMM is an HMM composed of a discrete hidden state variable $x[l] \in \{s_1, \ldots, s_K\}$ and a continuous bidimensional observable variable $\mathbf{y}[l] = (d[l], b[l])^T$, where $K$ denotes the number of states of the HMM, $d[l]$ denotes $10 \log_{10}(\mathrm{IPT}/1\,\mu\mathrm{s})$ and $b[l]$ denotes the PS of the $l$th packet. IPT and PS are jointly described, with memory and correlation taken into account by the state variable, and are assumed statistically independent given the state. The single PL-HMM is characterized by the set of parameters $M = \{\mathbf{A}, \mathbf{g}^{(t)}, \mathbf{w}^{(t)}, \mathbf{g}^{(p)}, \mathbf{w}^{(p)}\}$, denoting the state transition matrix and the conditional IPT and PS distribution vectors, respectively, i.e.

$$A_{i,j} = \Pr(x[l+1] = s_j \mid x[l] = s_i),$$
$$d[l] \mid x[l] = s_i \sim \mathrm{Gamma}(g_i^{(t)}, w_i^{(t)}),$$
$$b[l] \mid x[l] = s_i \sim \mathrm{Gamma}(g_i^{(p)}, w_i^{(p)}).$$

The Markovian assumption for the hidden state is apparent. The conditional (in the $i$th state) pdfs for IPT and PS are

$$f_i^{(t)}(d) = \frac{(d/w_i^{(t)})^{g_i^{(t)}-1} \, e^{-d/w_i^{(t)}}}{w_i^{(t)} \, \Gamma(g_i^{(t)})} \quad (d > 0),$$
$$f_i^{(p)}(b) = \frac{(b/w_i^{(p)})^{g_i^{(p)}-1} \, e^{-b/w_i^{(p)}}}{w_i^{(p)} \, \Gamma(g_i^{(p)})} \quad (b > 0).$$

It is worth noticing that, according to our notation, the IPT-PS sequence $\mathbf{Y} = (\mathbf{y}[1], \ldots, \mathbf{y}[L])$ corresponds to the following pair of sequences: $\mathbf{D} = (d[1], \ldots, d[L])$ for IPT values and $\mathbf{B} = (b[1], \ldots, b[L])$ for PS values.

B. Likelihood Computation

The likelihood $\lambda = \Pr(\mathbf{Y} \mid M)$ of an IPT-PS sequence $\mathbf{Y}$, given the model $M$, is computed exploiting the dependences captured by the model in both forward and backward directions. The Forward-Backward algorithm [12] is an efficient technique to compute the Forward variable $\alpha$ and the Backward variable $\beta$ in a graphical model, i.e. the variables capturing such dependences. More specifically, for HMM structures it is based on the following equations

$$\alpha_j[l] = \sum_{i=1}^{K} \alpha_i[l-1] \, A_{i,j} \, f_j^{(t)}(d[l]) \, f_j^{(p)}(b[l]),$$
$$\beta_i[l] = \sum_{j=1}^{K} A_{i,j} \, f_j^{(t)}(d[l+1]) \, f_j^{(p)}(b[l+1]) \, \beta_j[l+1].$$

Based on these formulas, the likelihood for an IPT-PS sequence $\mathbf{Y}$ is computed as

$$\lambda = \Pr(\mathbf{Y} \mid M) = \sum_{i=1}^{K} \alpha_i[l] \, \beta_i[l],$$

for an arbitrary $l$. The Forward-Backward algorithm is typically implemented in the log-domain.

C. Trained PL-HMMs

Our PL-HMMs present $K = 4$ to $K = 7$ states, depending on the complexity of the protocol. We tried to keep the number of states as low as possible in order to contain computational complexity, and at the same time provide sufficient accuracy in modeling the characteristics of a specific network-traffic typology. The set of parameters for the training algorithm is chosen in order to cover almost uniformly the whole range of observed IPT and PS values.
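The likelihood computation of Section III-B can be illustrated with a minimal log-domain sketch. It evaluates only the forward recursion, using the fact that the likelihood equals the sum of the forward variables at the last packet (where every backward variable is 1). The function names, the argument layout, and the initial state distribution `log_pi` are assumptions made for illustration; the text above does not specify how the recursion is initialized.

```python
import math


def gamma_logpdf(x, shape, scale):
    """Log of the Gamma pdf with shape g and scale w, defined for x > 0."""
    if x <= 0:
        return -math.inf
    return ((shape - 1.0) * math.log(x / scale) - x / scale
            - math.log(scale) - math.lgamma(shape))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(seq, log_pi, log_A, ipt_params, ps_params):
    """Return log Pr(Y | M) for a sequence of (d, b) pairs.

    seq        : list of (d, b) observations (IPT in dB-microseconds, PS in bytes)
    log_pi     : initial state log-probabilities (assumption, not given in the text)
    log_A      : K x K nested list, log_A[i][j] = log Pr(next = s_j | current = s_i)
    ipt_params : per-state (g, w) pairs for the IPT Gamma pdf
    ps_params  : per-state (g, w) pairs for the PS Gamma pdf
    """
    K = len(log_pi)

    def log_emit(j, d, b):
        # IPT and PS are independent given the state, so log-pdfs add up.
        return gamma_logpdf(d, *ipt_params[j]) + gamma_logpdf(b, *ps_params[j])

    d, b = seq[0]
    log_alpha = [log_pi[j] + log_emit(j, d, b) for j in range(K)]
    for d, b in seq[1:]:
        # Forward recursion: alpha_j[l] = sum_i alpha_i[l-1] A_ij f_j(d) f_j(b)
        log_alpha = [
            logsumexp([log_alpha[i] + log_A[i][j] for i in range(K)])
            + log_emit(j, d, b)
            for j in range(K)
        ]
    return logsumexp(log_alpha)
```

Working with log-probabilities avoids the numerical underflow that the plain products would cause on long packet sequences, which is the usual reason for the log-domain implementation mentioned above.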
Convergence of the Baum-Welch training for all typologies was reached in a few iterations. Figure 2 shows the characteristics of the PL-HMM used to model SMTP traffic (please refer to Section IV for a description of all the applications considered in this work). From this figure, it is clear how first and second order statistics are captured by the model. This is shown also to give an intuitive idea of how packet-level properties related to marginal distributions, time dependence, and mutual dependence between IPT and PS are captured by a PL-HMM made of few parameters, which can then be exploited for classification purposes. Table I shows the state parameters for the PL-HMM, in which each state corresponds to a different short-time behavior of the application in terms of IPT and PS generation; for more details refer to [2]. Similar behavior in terms of modeling capabilities has been obtained for each of the traffic typologies described in Section IV.

Fig. 2. PL-HMM characteristics. (a) IPT histogram and pdf (IPT in dBµ) and PS histogram and pdf (PS in bytes): normalized histogram of the training set and pdf of the PL-HMM. (b) IPT-PS auto- and cross-covariance for the training set and the PL-HMM.

TABLE I
SMTP: STATE PARAMETERS
Columns: IPT g(t), w(t); PS g(p), w(p)
1st state   96.9.28.5 38.2
2nd state   .86 9.3 2.23 25.2
3rd state   22.94.7 54.95
4th state   54.95 32.33 35.9
5th state   9.23 4.23 229828.6

Global statistics (average value and standard deviation) of the training sets used to characterize each traffic typology are shown in Table II.

TABLE II
TRAINING SETS STATISTICS
Columns: IPT [dBµ] mean, std dev.; PS [bytes] mean, std dev.
AoM       47 9 3 4
CS        48 29 25
Edonkey   49 82 377
HTTP      48 3 73 46
MSN       56 5 575 572
PPlive    66 4 77 27
SMTP      4 8 66 624

It is easy to notice that IPT and PS joint characterization is needed in order to aim at successful classification. Also, analyzing differences and similarities among traffic characteristics, it is not surprising that, anticipating the results shown in Section V, AoM and PPlive will present the two best performances for correct classification, while the worst misclassification performance will occur when confusing Edonkey with SMTP and SMTP with MSN.

IV. CONSIDERED APPLICATIONS AND MEASUREMENT APPROACH

We tested our algorithm over a heterogeneous set of network applications, shown in Table III. Each of them was verified through deep payload inspection and manual checks. The choice of the considered applications to classify was driven by the following multidimensional criteria: (i) both TCP and UDP based applications; (ii) both data and signaling traffic; (iii) both traditional and novel Internet applications. As for TCP-based and traditional applications, we considered the traffic of HTTP and SMTP (respectively related to Web and Email), still responsible for a relevant portion of the overall Internet traffic. Again in the class of TCP-based applications, and still falling in the category of traditional Internet applications, we considered Instant Messaging. It is used by about 5% of the Internet users all around the world [13], with MSN Messenger (MSN in the following) being the most popular application. In this work we consider the traffic generated by MSN clients [14]. Also, as the last TCP-based application, we considered the traffic associated to the Edonkey protocol [15], used by peer-to-peer file sharing applications such as Emule.
This category of traffic is quite novel (compared to Web and Email traffic) and it is particularly important because most of the issues related to the inability to identify applications through protocol ports started with peer-to-peer file sharing applications. As regards UDP-based and innovative applications (with QoS requirements), we considered the traffic generated by Age of Mythology (AoM) [16], a Real Time Strategy Multiplayer game, and CounterStrike (CS) [17], one of the most played First Person Shooter games on the Internet.

TABLE III
CONSIDERED TRAFFIC
Columns: training set flows, packets, bytes; test set flows, packets, bytes
AoM       4 9887.3 M 2 5567 72 K
CS        344 358 M 34 2796 88 K
Edonkey   9 24529 289 M 82 9526 228 M
HTTP      752 366 29 M 777 28484 88 M
MSN       87 92375 58 M 7836 922686 557 M
PPlive    37 452 799 K 57 6658 73 K
SMTP      57 385238 853 M 6738 72785 266 M

Finally, a category of traffic that is now constantly increasing is peer-to-peer video streaming. Triple-play operators are interested in identifying and classifying this traffic without damaging the privacy of the users. For this reason, we considered the signaling traffic generated by the PPlive application. Therefore, according to our multidimensional criteria, this last traffic typology falls in the class composed by the triple: UDP-based application, innovative Internet service, signaling traffic. To stress the importance of peer-to-peer video streaming traffic in current networks, it is worth noticing that we previously studied the traffic generated by PPlive and, while we were able to recognize that the signaling information was transmitted through UDP packets and the video was carried by TCP packets, we were not able to reliably identify all the video streaming flows on TCP. This confirms that, from the operator point of view, the ability to recognize signaling traffic instead of data traffic is of indisputable importance. Except for network games, all the traffic was captured at University of Naples Federico II, Italy, with the traffic from peer-to-peer applications generated by a set of controlled boxes.
The AoM traces, instead, have been provided by the Worcester Polytechnic Institute, MA (USA) [18], whereas the CS traces have already been used for a study on network games traffic modeling [19]. According to the results shown in [20], we can state that the time invariance of IPT does not affect the classification process (based on both IPT and PS). We considered the conventional definition of flows - given by the 4-tuple: source IP, source port, destination IP, destination port - with a timeout of 6 seconds. In this study we took into account only traffic exiting from observed hosts (e.g. packets with destination port 80 or 25 for HTTP and SMTP respectively, packets sent by observed machines in the case of peer-to-peer applications, etc.), neglecting flows in the opposite direction. We separated the available flows in two separate sets: a training set used for training the PL-HMM and thus building the models, and a test set used to verify the classifier. Flows with fewer than a minimum number of packets have been excluded both from training and test sets in order to avoid numerical problems when running the algorithms. From each considered flow we extracted sequences of IPT and PS. Since we wanted to characterize the traffic generated by the applications, independent as much as possible of the transport protocols, we dropped all packets with empty payload, as TCP-specific traffic, like connection establishment packets (SYN, ACK, SYN-ACK) and pure acknowledgment packets. For the same reason, in the estimation of the PS, we measured the byte length of the TCP/UDP payload.
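The pre-processing described above (grouping packets into unidirectional flows by 4-tuple, splitting on an idle timeout, discarding empty-payload packets, and converting inter-packet times to a logarithmic scale) could be sketched as follows. This is a hypothetical illustration, not the authors' tooling: the packet record layout, the function name, the `timeout_s` and `min_pairs` parameters, the 1 µs floor on the IPT, and the choice to start the (IPT, PS) sequence at the second payload-carrying packet are all assumptions.

```python
import math


def extract_sequences(packets, timeout_s, min_pairs=1):
    """Group packets into flows and return their (IPT, PS) sequences.

    packets   : iterable of (timestamp_s, src_ip, src_port, dst_ip, dst_port,
                payload_len) records, sorted by timestamp (assumed layout)
    timeout_s : idle time after which the same 4-tuple starts a new flow
    min_pairs : flows with fewer (IPT, PS) pairs than this are discarded
    """
    flows = []      # (key, sequence) in order of first appearance
    last_seen = {}  # key -> (timestamp of last payload packet, sequence)

    for ts, src_ip, src_port, dst_ip, dst_port, payload_len in packets:
        if payload_len == 0:
            # Drop SYN/ACK and other empty-payload packets.
            continue
        key = (src_ip, src_port, dst_ip, dst_port)
        entry = last_seen.get(key)
        if entry is None or ts - entry[0] > timeout_s:
            # First packet of a new flow: no inter-packet time yet (assumption).
            sequence = []
            flows.append((key, sequence))
        else:
            sequence = entry[1]
            # IPT in microseconds, floored at 1 us so the log is defined (assumption).
            ipt_us = max((ts - entry[0]) * 1e6, 1.0)
            sequence.append((10.0 * math.log10(ipt_us), payload_len))
        last_seen[key] = (ts, sequence)

    return [(key, seq) for key, seq in flows if len(seq) >= min_pairs]
```

Because empty-payload packets are skipped before the timestamp is updated, the inter-packet time here is measured between consecutive payload-carrying packets only; measuring it over all packets would be an equally plausible reading of the text.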

TABLE IV
CLASSIFICATION RESULTS: CONFUSION MATRIX
Rows: test set; columns: classifier output
          AoM     CS       Edonkey  HTTP     MSN     PPlive   SMTP
AoM       .%      .%       .%       .%       .%      .%       .%
CS        2.94%   93.53%   2.94%    .%       .29%    .%       .29%
Edonkey   .%      .22%     9.24%    .22%     2.44%   .22%     3.66%
HTTP      .%      .4%      .3%      93.35%   2.8%    .49%     2.7%
MSN       .%      .3%      2.34%    .94%     94.6%   .%       2.43%
PPlive    .%      .%       .64%     .64%     .9%     96.82%   .%
SMTP      .%      2.4%     2.23%    2.25%    3.25%   .%       9.23%

V. EXPERIMENTAL RESULTS

In Table IV we show, summarized through a confusion matrix, the results of the classification performed on the test sets. Each row represents, in percentage, the output of a run of the classifier over a different application test set (e.g. the cell corresponding to the HTTP row and Edonkey column tells us that .3% of the flows from the HTTP test set have been erroneously classified as Edonkey). All the correct classification percentages are shown on the diagonal. We can see that for all the applications a correct classification percentage above 90% is achieved, with the best results obtained when trying to identify AoM and PPlive traffic. For AoM the 100% value is mainly explained by the very small number of flows in the test set; however, it is important to note that the confusion values observable in the AoM column show that flows from different applications are almost never erroneously classified as AoM (this actually happens only for CounterStrike, which, like AoM, is a game over UDP), demonstrating that the AoM model is very strict in capturing AoM traffic properties. The worst results are obtained when trying to identify Edonkey or SMTP traffic. Here we see that several flows are confused with other applications. Probably the considered statistical properties of such flows do not fit their corresponding models. However, this is a typical situation in which a multi-classifier system may overcome the weaknesses of a single approach by relying also on different classification techniques based on other properties.
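The way a confusion matrix like Table IV is obtained from the classifier bank can be sketched as follows. Each test flow is assigned to the class whose model gives the highest score (the argmax block of the architecture), and each row is normalized to percentages over its own test set. The names `classify` and `confusion_matrix`, and the representation of each trained model as a plain scoring function, are assumptions made for illustration only.

```python
from collections import Counter


def classify(sequence, scorers):
    """Return the label whose model assigns the highest (log-)likelihood.

    scorers : dict mapping a class label to a function sequence -> score
              (hypothetical stand-in for the trained PL-HMMs)
    """
    return max(scorers, key=lambda label: scorers[label](sequence))


def confusion_matrix(test_sets, scorers):
    """Build row-normalized confusion percentages.

    test_sets : dict mapping the true label to a list of test sequences
    Returns a dict: true label -> {predicted label -> percentage of flows}.
    """
    labels = list(scorers)
    matrix = {}
    for true_label, sequences in test_sets.items():
        counts = Counter(classify(seq, scorers) for seq in sequences)
        total = len(sequences)
        matrix[true_label] = {
            predicted: (100.0 * counts[predicted] / total if total else 0.0)
            for predicted in labels
        }
    return matrix
```

With very small test sets a single flow shifts a row by several percentage points, so rows computed from few flows should be read with that granularity in mind.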
Moreover, it is worth noticing that in this work we considered only traffic in one direction for each host, whereas by building models also for the other direction and exploiting the bond between corresponding flows in the two directions (both being generated by the same application) it may be possible to achieve a better accuracy. The extension of the classifier to process both traffic directions at the same time is currently under investigation.

VI. CONCLUSION

Traffic classification represents an essential task for both network management architectures [23] and network security solutions [24]. In this paper we proposed an approach for traffic classification based on HMMs applied to packet-level traffic parameters. Our approach, by jointly considering IPT and PS and taking into account also their temporal structures, is able to classify a number of traffic typologies (TCP and UDP based, data and signaling, traditional and novel Internet applications). We showed how the technique is able to achieve promising results, such that it may be considered as one of the techniques to be used in a multi-classifier system. Our ongoing work is devoted both to a preliminary longitudinal/portability analysis (i.e. training and testing stages using different traffic traces) and to enlarging the set of considered traffic typologies. Moreover, we plan to compare performance against other classifiers.

REFERENCES

[1] G. Giacinto, F. Roli, L. Didaci, Fusion of multiple classifiers for intrusion detection in computer networks, Pattern Recognition Lett., Vol. 24, no. 12, pp. 1795-1803, Aug. 2003.
[2] A. Dainotti, A. Pescapé, P. Salvo Rossi, G. Iannello, G. Ventre, F. Palmieri, An HMM Approach to Internet Traffic Modeling, IEEE Global Telecommun. Conf. (GLOBECOM), pp. 1-6, Dec. 2006.
[3] http://www.net-security.org/secworld.php?id=4852, Mar. 2008.
[4] T. Karagiannis, A. Broido, N. Brownlee, K.C. Claffy, M. Faloutsos, Is P2P dying or just hiding?, IEEE Global Telecommun. Conf. (GLOBECOM), pp. 1532-1538, Dec. 2004.
[5] S. Zander, T. Nguyen, G. Armitage, Automated traffic classification and application identification using machine learning, IEEE LCN, pp. 250-257, Nov. 2005.
[6] M. Crotti, F. Gringoli, P. Pelosato, L. Salgarelli, A Statistical Approach to IP-level classification of network traffic, IEEE Int. Conf. Commun. (ICC), pp. 170-176, Jun. 2006.
[7] J. Erman, A. Mahanti, M. Arlitt, Internet Traffic Identification using Machine Learning, IEEE Global Telecommun. Conf. (GLOBECOM), pp. 1-6, Dec. 2006.
[8] L. Bernaille, R. Teixeira, K. Salamatian, Early Application Identification, ACM Co-Next, 2006.
[9] T. Auld, A.W. Moore, S.F. Gull, Bayesian Neural Networks for Internet Traffic Classification, IEEE Trans. Neural Networks, Vol. 18, no. 1, pp. 223-239, Jan. 2007.
[10] C. Wright, F. Monrose, G. Masson, HMM Profiles for Network Traffic Classification, VizSEC/DMSEC, pp. 9-15, Oct. 2004.
[11] C. Wright, F. Monrose, G. Masson, Towards better protocol identification using profile HMMs, JHU Tech. Rep. JHU-SPAR52, Jun. 2005.
[12] L.R. Rabiner, A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Procs. IEEE, Vol. 77, no. 2, pp. 257-285, Feb. 1989.
[13] http://www.comscore.com/, Sep. 2007.
[14] http://join.msn.com/messenger/overview, Sep. 2007.
[15] http://sourceforge.net/projects/pdonkey/, Mar. 2008.
[16] http://www.microsoft.com/games/ageofmythology/, Mar. 2008.
[17] http://www.counter-strike.net/, Mar. 2008.
[18] http://nile.wpi.edu/downloads, Sep. 2007.
[19] W. Feng, F. Chang, W. Feng, J. Walpole, A Traffic Characterization of Popular On-line Games, IEEE/ACM Trans. Networking, Vol. 13, no. 3, pp. 488-500, Jun. 2005.
[20] A. Botta, A. Dainotti, A. Pescapé, G. Ventre, Searching for Invariants in Network Games Traffic, Poster at Co-Next 2006 Student Workshop.
[21] http://www.microsoft.com/technet/prodtechnol/ sa/2/mantan/samsec.mspx, Sep. 2007.
[22] http://www.hypothetic.org/docs/msn/general/ overview.php, Sep. 2007.
[23] H. Jiang, A.W. Moore, Z. Ge, S. Jin, J. Wang, Lightweight Application Classification for Network Management, SIGCOMM Work. Internet Network Manag., Aug. 2007.
[24] O. Marques, P. Baillargeon, Design of a multimedia traffic classifier for Snort, Information Manag. & Computer Security J., Vol. 15, no. 2, Jun. 2007.