Toward line rate Traffic Classification

Niccolò Cascarano
Politecnico di Torino
http://sites.google.com/site/fulviorisso/

Background
- In recent years, many new traffic classification algorithms based on a statistical approach have been proposed
- One of the claims of these new algorithms is that their computational requirements are lower than those of Deep Packet Inspection (DPI) [3-8]
- DPI is commonly considered too expensive
- Is that true?
- Can DPI be further improved?
- Is there anything better than DPI?

The path toward the answers
- Create a model of some classifiers (currently DPI, Naïve Bayes and SVM) and compare their complexity
  - Joint work with Università di Brescia
- Improve the DPI engine itself
- Service-based traffic classification

Question 1: is DPI so computationally complex?

What is DPI?
- DPI = pattern matching through regular expressions
- Two main flavors:
  - Packet-Based per-Flow State (PBFS): network data are analyzed packet by packet, as soon as packets are received by the classifier
  - Message-Based per-Flow State (MBFS): network data are analyzed as a single stream of data after TCP/IP normalization
- PBFS seems roughly equivalent to MBFS with respect to traffic classification [1-2]
- We use a PBFS DPI classifier, plus the capability to analyze correlated sessions (e.g., FTP and SIP)
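The difference between the two flavors can be illustrated with a minimal sketch. It assumes packets arrive in order and stands in for TCP/IP normalization with a plain concatenation; the signature and the function names `pbfs_match` and `mbfs_match` are hypothetical and are not taken from the classifier described in these slides.

```python
import re

# Hypothetical signature, used only for illustration.
SIGNATURE = re.compile(rb"BitTorrent protocol")


def pbfs_match(packets):
    """Packet-based: each payload is inspected on its own, as it arrives."""
    return any(SIGNATURE.search(payload) for payload in packets)


def mbfs_match(packets):
    """Message-based: payloads are joined into one stream before matching.

    Real TCP/IP normalization (reordering, retransmissions) is not modeled.
    """
    return SIGNATURE.search(b"".join(packets)) is not None


whole = [b"\x13BitTorrent protocol" + b"\x00" * 8]
split = [b"\x13BitTor", b"rent protocol" + b"\x00" * 8]
```

With `whole`, both functions report a match. With `split`, where the signature straddles two packets, only `mbfs_match` does; the slides report that such cases matter little for classification in practice.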

Methodology
- Cost modeling
  - Average cost per packet (instead of worst case)
- Modeled each classifier
  - Derived the cost of each block
  - Determined the transition probability from one block to the next by analyzing real traces (with ground truth [26])
- Derived the min/max/average cost per packet
  - Cost of each block times the transition probability
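One way to write the average-cost rule is as a probability-weighted sum over the blocks of a classifier. The symbols are assumptions introduced here, not notation from the slides: $B$ is the set of basic blocks, $c_b$ the measured cost of block $b$ in ticks, and $p_b$ the probability that a packet traverses block $b$.

```latex
\bar{C}_{\text{packet}} = \sum_{b \in B} p_b \, c_b
```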

Models (DPI and SVM)
- Session ID extraction: extracts the L3 and L4 information from network packets
- Session lookup: checks in the session table whether a packet belongs to an already classified session
- Pattern matching: implements the pattern matching algorithm (DPI only)
- SVM decision: implements the SVM classification algorithm (SVM only)
- Session update: updates the session table with the outcome of the classification
- Correlated session: analyzes the application data to obtain information on correlated sessions (DPI only)
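A minimal sketch of how the DPI blocks listed above might chain together, assuming a dictionary as session table and two made-up signatures. The `Packet` type, the `Classifier` class and the signatures are hypothetical; the real classifier uses a single Flex-generated DFA rather than a loop over regular expressions, and the correlated-session block is omitted here.

```python
import re
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes


# Hypothetical signatures, for illustration only.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]"),
    "ssh": re.compile(rb"^SSH-[12]\.[0-9]"),
}


def session_id(pkt: Packet):
    """Session ID extraction: a direction-independent 5-tuple."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (pkt.proto,) + (a + b if a <= b else b + a)


def pattern_match(payload: bytes) -> Optional[str]:
    """Pattern matching: return the first signature that matches, if any."""
    for name, rx in SIGNATURES.items():
        if rx.search(payload):
            return name
    return None


class Classifier:
    def __init__(self):
        self.sessions = {}       # session table: session ID -> label
        self.slow_path_hits = 0  # packets that needed pattern matching

    def classify(self, pkt: Packet) -> Optional[str]:
        sid = session_id(pkt)
        label = self.sessions.get(sid)   # session lookup
        if label is not None:
            return label                 # fast path: already classified
        self.slow_path_hits += 1
        label = pattern_match(pkt.payload)
        if label is not None:
            self.sessions[sid] = label   # session update
        return label
```

Once the first packet of a session is classified, every later packet of that session, in either direction, costs only the extraction and the lookup.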

Basic blocks implementation
- Session ID extraction: native IA32 assembly code generated by the NetVM framework [19]
- Session lookup and session update: C++ code using the hash_map container of the extended STL C++ library [18]
- Pattern matching: C++ code implementing a DFA-based algorithm generated by Flex [20]. About 30 application protocols are recognized (NOTE: the cost of this block does NOT depend on the number of protocols recognized)
- SVM decision: C++ code written exploiting the multivariate Gaussian joint density function. We generated the models for recognizing about 10 application protocols (NOTE: the cost of this block DEPENDS linearly on the number of protocols recognized)
- Correlated session: C++ code written for this purpose, deriving correlated session rules for the FTP and SIP protocols from the NetPDL database [17]

Experimental evaluation
- Cost of each block measured with the RDTSC instruction
  - Costs that depend on the input traffic (e.g., DFA) are further characterized in order to include the relevant parameters in the final formula
- Traffic traces
  - UNIBS: contains a large percentage of P2P traffic, known to be challenging for DPI classifiers
  - POLITO: traffic from a medium-size campus network (~6000 hosts within the network)

Absolute costs of each basic block
- Pattern matching depends on the packet size
- SVM depends on the number of protocols examined

Comparison

Comparison
- Legend
  - Best case: all packets belong to already classified sessions (fast path)
  - Worst case: all packets need to take the slow path
  - Average case: the costs are normalized using the execution probabilities of each basic block
- Results
  - The cost of the DPI classifier has the same order of magnitude as that of the others, even for the challenging UNIBS trace
  - May be better on some traces
  - The comparison is not exactly fair (48 protocols for DPI against 12)
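The three cases in the legend can be sketched as follows. Every tick count and probability below is a made-up placeholder chosen to keep the arithmetic easy to follow; none of them is a measurement from the slides, and the block names are hypothetical.

```python
# Placeholder per-block costs in ticks (NOT measured values).
BLOCK_COST = {
    "session_id": 100,
    "session_lookup": 200,
    "pattern_matching": 4000,
    "session_update": 300,
}

FAST_PATH = ("session_id", "session_lookup")
SLOW_PATH = FAST_PATH + ("pattern_matching", "session_update")


def path_cost(blocks):
    """Cost of a packet that traverses exactly the given blocks."""
    return sum(BLOCK_COST[b] for b in blocks)


def average_cost(exec_prob):
    """Each block cost weighted by the probability that the block runs."""
    return sum(BLOCK_COST[b] * p for b, p in exec_prob.items())


best = path_cost(FAST_PATH)    # every packet hits a classified session
worst = path_cost(SLOW_PATH)   # every packet needs pattern matching
# Placeholder probabilities: 10% of packets take the slow path.
avg = average_cost({
    "session_id": 1.0,
    "session_lookup": 1.0,
    "pattern_matching": 0.1,
    "session_update": 0.1,
})
```

With these placeholders the best case is 300 ticks, the worst case 4600 ticks, and the average about 730 ticks, which shows how strongly the average depends on the share of packets that reach the pattern matcher.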

Conclusion 1
- Packet-based DPI may not be as complex as we thought, as far as pure traffic classification is concerned

Question 2: can we reduce DPI cost?

Yes, We Can
- if we focus on traffic classification and not network security

(1) Use fast algorithms

                        Min (ticks)   Avg (ticks)   Max (ticks)
Flex (canonical DFA)    76            3980          19147
PCRE (NFA-based)        35.7K         2.08M         9.16M

- DFA is simple and O(payload_length)
- Key question: is the DFA usable?
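The O(payload_length) bound follows from the fact that a DFA performs one table lookup per input byte and never backtracks. The toy matcher below illustrates this for a single literal prefix; it is not the Flex-generated automaton, which merges all signatures into one DFA, and the names `build_prefix_dfa` and `run_dfa` are hypothetical.

```python
DEAD = -1  # sink state: no match is possible any more


def build_prefix_dfa(prefix: bytes):
    """Build a DFA that accepts any payload starting with `prefix`.

    State i means "the first i bytes of the prefix have been seen";
    state len(prefix) is the accepting state.
    """
    transitions = [{prefix[i]: i + 1} for i in range(len(prefix))]
    return transitions, len(prefix)


def run_dfa(transitions, accept, payload: bytes):
    """Return (matched, steps); at most one transition per payload byte."""
    state = 0
    steps = 0
    for byte in payload:
        if state == accept:
            break
        steps += 1
        state = transitions[state].get(byte, DEAD)
        if state == DEAD:
            break
    return state == accept, steps


ssh_dfa, ssh_accept = build_prefix_dfa(b"SSH-")
```

Running it on `b"SSH-2.0-OpenSSH"` accepts after 4 transitions, while `b"GET / HTTP/1.1"` is rejected after a single transition; in no case does the step count exceed the payload length.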

(2) Use friendly regular expressions (preliminary results)

(2) ... and convert some into friendly ones

Average cost on HTTP
                                          Match (ticks)   No match (ticks)
Anchored                                  1663            1415
Anchored + Kleene                         5622            1367
Not anchored + Kleene                     5503            3300
Not anchored + Kleene and backtracking    5290            13659

Baseline: not anchored + Kleene
                                     http      unknown
Anchored (on UNIBS-GT)               0%        0%
Anchored + Kleene (on UNIBS-GT)      0%        0%

                                     unknown   http
Anchored (on POLITO)                 0.004%    0.38%
Anchored + Kleene (on POLITO)        0.005%    0%
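What "anchored" means can be sketched with Python's `re` module. Note that `re` is a backtracking engine, not the DFA used in the slides, so this only shows the change in matching semantics, not the tick counts in the table. Both patterns are hypothetical HTTP signatures.

```python
import re

# Anchored: the signature must appear at the very start of the payload,
# so a non-matching payload can be rejected after its first bytes.
ANCHORED = re.compile(rb"^(GET|POST|HEAD) ")

# Not anchored: the signature may appear anywhere, so a non-matching
# payload has to be scanned to the end.
UNANCHORED = re.compile(rb"(GET|POST|HEAD) ")


def is_http_anchored(payload: bytes) -> bool:
    return ANCHORED.match(payload) is not None


def is_http_unanchored(payload: bytes) -> bool:
    return UNANCHORED.search(payload) is not None


request = b"GET /index.html HTTP/1.1\r\n"
embedded = b"\x13BitTorrent protocol ... GET "
```

Both functions accept `request`. Only the unanchored one accepts `embedded`, which is the kind of payload that can be labeled differently when a signature is converted from one form to the other.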

(3) Use a packet-based approach

            Unknown TCP traffic   Additional classified TCP traffic
POLITO      23.5GB                2.6MB
UNIBS-GT    870MB                 0B

(4) Snapshot-based classification
- No differences in accuracy when length >= 256 bytes

(4) Snapshot-based classification
- Fair speedup with TCP traffic
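A minimal sketch of snapshot-based matching, assuming the snapshot is the first 256 bytes of each packet payload. The function name `classify_snapshot` and the unanchored signature are hypothetical; the second return value is the number of bytes handed to the matcher.

```python
import re

SNAPSHOT_LEN = 256  # threshold reported in the slides

# Hypothetical unanchored signature, for illustration only.
HTTP_VERSION = re.compile(rb"HTTP/1\.[01]")


def classify_snapshot(payload: bytes, snaplen: int = SNAPSHOT_LEN):
    """Match only against the first `snaplen` bytes of the payload."""
    snapshot = payload[:snaplen]
    label = "http" if HTTP_VERSION.search(snapshot) else None
    return label, len(snapshot)
```

A 2000-byte payload is cut to 256 bytes before matching, which bounds the work per packet; the price is that a signature appearing only after byte 256 is missed.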

(5) Limiting classification attempts

                    Avg # pkts   Std dev
UNIBS-GT (TCP)      654          4619
POLITO-GT (TCP)     563          3659
POLITO (TCP)        68           1879
UNIBS-GT (UDP)      2.62         0.71
POLITO-GT (UDP)     6.05         26.4
POLITO (UDP)        9.17         476

                        Avg # pkts   Std dev
Bittorrent (TCP)        1            0
Samba (TCP)             1.01         0.29
HTTP (TCP)              1.05         15.6
Skype (UDP)             1.7          437
SSL (UDP)               1.92         267
Telnet (TCP)            2599         3276
Direct Connect (TCP)    30694        60076

(5) Limiting classification attempts

(5) Limiting classification attempts
- Accuracy stable for TCP, may decrease for UDP; almost no misclassifications in both

(5) Limiting classification attempts
- Possible high speedup with TCP traffic
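An attempts limit can be sketched as a per-session counter: after N unsuccessful packets the session is marked unknown and later packets skip pattern matching entirely. The class `LimitedClassifier`, the signatures and the use of `None` for "still pending" are assumptions made for this illustration.

```python
import re
from typing import Optional

UNKNOWN = "unknown"

# Hypothetical signatures, for illustration only.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) "),
    "ssh": re.compile(rb"^SSH-[12]\.[0-9]"),
}


class LimitedClassifier:
    def __init__(self, max_attempts: int):
        self.max_attempts = max_attempts
        self.labels = {}      # session -> final label (may be UNKNOWN)
        self.attempts = {}    # session -> failed attempts so far
        self.matches_run = 0  # how many times pattern matching was invoked

    def classify(self, session, payload: bytes) -> Optional[str]:
        if session in self.labels:
            return self.labels[session]  # decided earlier: no matching
        self.attempts[session] = self.attempts.get(session, 0) + 1
        self.matches_run += 1
        for name, rx in SIGNATURES.items():
            if rx.search(payload):
                self.labels[session] = name
                del self.attempts[session]
                return name
        if self.attempts[session] >= self.max_attempts:
            self.labels[session] = UNKNOWN  # give up on this session
            del self.attempts[session]
            return UNKNOWN
        return None  # still pending: try again on the next packet
```

With a limit of 2, a session whose first two packets match nothing is labeled unknown, and its third packet is never inspected even if it would have matched; this is the accuracy risk the slides report for UDP.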

(4)+(5) Snapshot + Attempts limit
- Distribution of classified traffic changes; no clear understanding of the new parameters

Conclusions 2
- DFA is OK for traffic classification
- Fast algorithms: up to 3 orders of magnitude
- Friendly regex: may achieve up to 5 times speedup
- No message-based processing
- Snapshot = 256 for UDP and a fair attempts limit (e.g. 10)
  - Fairly small packets; signatures that operate on packet sequences
- Strict attempts limit for TCP (N=2)
  - Able to catch response packets
- A speedup of 15 on the results in Conclusion 1 gives 20 Mpps on a 3 GHz CPU
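The last figure can be checked with a short calculation, assuming a single core and one RDTSC tick per clock cycle. The pre-speedup cost is back-computed here from the numbers on this slide; it is not a value stated in the presentation.

```latex
\frac{3 \times 10^{9}\ \text{cycles/s}}{20 \times 10^{6}\ \text{packets/s}}
  = 150\ \text{cycles/packet},
\qquad
150\ \text{cycles/packet} \times 15 = 2250\ \text{cycles/packet}
```

Under these assumptions, 20 Mpps leaves a budget of 150 cycles per packet, which corresponds to an average cost of about 2250 cycles per packet before the speedup.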

Addendum
- What are regex?
  - We usually assume regex = regular expressions (e.g. Perl)
  - We believe this model is not powerful enough to cope with modern traffic classification
  - We have to think about a more extended model
  - E.g. currently Skype and RTP are detected with some imperative code in addition to regex
- Left to future work

Is there anything better than DPI?

Better perhaps no, but...
- Service-Based Traffic Classification is surely an answer
  - Not exactly a replacement for DPI
  - Instead, something orthogonal to (I would like to say most) traffic classification approaches
- Service-Based Classification: once you have associated (IP, port) with service S, all established sessions that use that endpoint are associated with S without further analysis
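The idea can be sketched as a table keyed by (IP, port). The slides give no implementation details, so the `ServiceTable` class, its method names and the rule of checking the destination endpoint before the source are all assumptions made for this illustration; the addresses come from documentation ranges.

```python
from typing import Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ServiceTable:
    def __init__(self):
        self.services = {}  # (ip, port) -> service label

    def learn(self, server_ip: str, server_port: int, service: str) -> None:
        """Record that an endpoint was classified once (e.g. by DPI)."""
        self.services[(server_ip, server_port)] = service

    def lookup(self, src: Endpoint, dst: Endpoint) -> Optional[str]:
        """Label a session by either of its endpoints, without payload analysis."""
        return self.services.get(dst) or self.services.get(src)


table = ServiceTable()
table.learn("192.0.2.10", 5222, "xmpp")
```

Any later session that touches `("192.0.2.10", 5222)` is then labeled `"xmpp"` regardless of its payload, which is why this approach can also cover encrypted sessions of a service that was first seen in the clear.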

Service-Based Traffic Classification
- No further details are provided in this presentation
  - However, a lot of analysis has been done, which confirms that it really works
- By-product: if the first classification is correct, a lot more traffic is classified
  - A service with a few sessions in the clear and most of its traffic encrypted

SBC: Services vs. sessions
[Figure: number of services and number of sessions (0 to 200000) over time (0 to 160 hours)]
- Session table is one order of magnitude larger than service table

Conclusions
- The well-known limit of DPI is encrypted sessions
  - No way to cope with that with DPI alone
- DPI (for traffic classification) may not be so costly compared to its competitors, and has many advantages
  - E.g. no training (regex are simple to derive)
  - Simple implementation
  - Most of the time, walks over small portions of the DFA (in cache)
- Service-Based Classification may be a good complement to the previous solutions
- My 2c: statistical traffic classifiers may be a better fit for a limited number of protocols (i.e. if you want to identify just P2P), but are not applicable to hundreds of protocols

Questions?

References
[1] A. Moore, K. Papagiannaki, Toward the Accurate Identification of Network Application, 6th International Workshop on Passive and Active Network Measurement, Boston MA, USA, May 2005, pp. 41-54.
[2] F. Risso, A. Baldini, M. Baldi, P. Monclus, O. Morandi, Lightweight, Payload-Based Traffic Classification: An Experimental Evaluation, IEEE International Conference on Communications (ICC 2008), Beijing (China), pp. 5869-5875, May 2008.
[3] J. Erman, A. Mahanti, M. Arlitt, C. Williamson, Identifying and discriminating between web and peer-to-peer traffic in the network core, Proceedings of the 16th International Conference on World Wide Web, Banff, Alberta, Canada, pp. 883-892, 2007.
[4] J. Erman, M. Arlitt, A. Mahanti, Traffic classification using clustering algorithms, Proceedings of the 2006 SIGCOMM, Pisa, Italy, pp. 281-286, 2006.
[5] L. Bernaille, R. Teixeira, I. Akodkenou, Traffic classification on the fly, 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, San Jose, CA, pp. 40-49, 2008.
[6] S. Zander, T. Nguyen, G. Armitage, Self-learning IP traffic classification based on statistical flow characteristics, International Workshop on Passive and Active Network Measurement, Boston MA, pp. 325-328, 2005.
[7] M. Crotti, M. Dusi, F. Gringoli, L. Salgarelli, Traffic Classification through Simple Statistical Fingerprinting, ACM SIGCOMM Computer Communication Review, Vol. 37, No. 1, pp. 5-16, Jan. 2007.
[8] L. Bernaille, R. Teixeira, K. Salamatian, Early Application Identification, 2nd CoNEXT Conference, Lisboa, Portugal, Dec. 2006.
[9] A. Este, F. Gringoli, L. Salgarelli, Support Vector Machines for TCP Traffic Classification, Università degli Studi di Brescia, Technical Report 08-07, Jul. 2008.
[10] N. Williams, S. Zander, G. Armitage, A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification, SIGCOMM Computer Communication Review, Vol. 36, No. 5, pp. 7-15, Oct. 2006.
[11] H. Kim, Kc Claffy, M. Fomenkova, D. Barman, M. Faloutsos, Internet Traffic Classification Demystified: The Myths, Caveats and Best Practices, ACM CoNEXT, Madrid, Spain, Dec. 2008.
[12] WEKA, http://www.cs.waikato.ac.nz/ml/weka
[13] T. Karagiannis, K. Papagiannaki, M. Faloutsos, BLINC: Multilevel traffic classification in the Dark, ACM SIGCOMM, Aug. 2005.
[14] A. Este, F. Gargiulo, F. Gringoli, L. Salgarelli, C. Sansone, Pattern Recognition Approaches for Classifying IP Flows, 7th International Workshop on Statistical Pattern Recognition, Orlando, FL, Dec. 2008.
[15] V.N. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, 1998.
[16] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson, Estimating the Support of a High Dimensional Distribution, Neural Computation, 13, pp. 1443-1471, 2001.
[17] Computer Networks Group (NetGroup) at Politecnico di Torino, The NetBee Library, August 2004. [online] Available at http://www.nbee.org/.
[18] Hash map container reference, http://www.sgi.com/tech/stl/hash_map.html
[19] O. Morandi, F. Risso, M. Baldi, A. Baldini, Enabling flexible protocol processing through dynamic code generation, International Conference on Communications, Beijing (China), pp. 5849-5856, May 2008.
[20] flex: The Fast Lexical Analyzer, http://flex.sourceforge.net/
[21] R. Smith, C. Estan, S. Jha, S. Kong, Deflating the big bang: fast and scalable deep packet inspection with extended finite automata, ACM SIGCOMM Computer Communication Review, Vol. 38, No. 4, pp. 207-218, October 2008.
[22] M. Becchi, P. Crowley, Efficient regular expression evaluation: Theory to practice, Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, San Jose, California, pp. 50-59, 2008.
[23] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, J. Turner, Algorithms to accelerate multiple regular expressions matching for deep packet inspection, ACM SIGCOMM Computer Communication Review, Vol. 36, No. 4, pp. 339-350, October 2006.
[24] File Transfer Protocol (FTP), RFC 959, http://www.ietf.org/rfc/rfc959.txt
[25] N. Brownlee, Traffic flow measurement: Meter MIB, Request for Comments RFC 2064, Internet Engineering Task Force, January 1997.
[26] F. Gringoli, L. Salgarelli, M. Dusi, N. Cascarano, F. Risso, K.C. Claffy, GT: picking up the truth from the ground for Internet traffic, ACM Computer Communication Review, October 2009.