MonNet: a project for network and traffic monitoring
How is SUNET really used? Results of traffic classification on backbone data
Wolfgang John and Sven Tafvelin
Dept. of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden
Introduction: Measurement location
    2x 10 Gbit/s (OC-192)
    capturing headers only
    IP addresses anonymized
    tightly synchronized
    bidirectional per-flow analysis
[Figure: measurement location; diagram labels: Stockholm, Internet, GSIX, Regional ISPs, Göteborg, GU, Chalmers, other smaller universities and institutes]
Introduction: Motivation
Problem: Operators don't know the type of their traffic
How to:
    Improve network design and provisioning?
    Support QoS or security monitoring?
    Enhance accounting possibilities?
    Reveal trends and changes in network applications?
Introduction: Motivation (2)
Solution: Network classification
Four approaches in literature:
1. Port numbers
   + easy to implement
   - unreliable (P2P, malicious traffic)
2. Packet payloads
   + accurate
   - requires updated payload signatures
   - privacy and legal issues
   - high processing requirements
Introduction: Motivation (3)
Solution: Network classification (contd.)
3. Statistical fingerprinting
   + no detailed packet information needed
   - depends on quality of training data
   - promising, but still immature
4. Connection patterns
   + no payload required
   + no training data required
   - not perfect accuracy
Introduction: Overview
Connection classification
    Overview of proposed heuristics
    Verification of methodology
Results
    Traffic volumes
    Diurnal patterns
    Signaling behavior
    Summary of more results
Methodology: Traffic Classification
Two articles classify P2P flows according to connection patterns:
    Karagiannis et al., 2004
    Perenyi et al., 2006
Updated classification heuristics:
    Refined the heuristics in prior articles
    Added new, necessary heuristics
Methodology: Proposed Heuristics
Rules based on connection patterns and port numbers
    5 rules for P2P traffic
    10 rules to classify other types of traffic (to remove false positives from P2P)
Rules are applied:
    on flows in 10-minute intervals
    independently on all flows
    prioritized when fetched from the database
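The way the rules are applied can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the flow record layout (a dict with a `start` timestamp in seconds), the function names, the example port and the priority order are all assumptions made for the sketch. It groups flows into 10-minute intervals, evaluates every rule independently, and resolves conflicts by priority only when a result is read out.

```python
from collections import defaultdict

INTERVAL_SECONDS = 600  # 10-minute intervals, as stated on the slide


def bucket_flows(flows):
    """Group flow records by the 10-minute interval in which they start.

    Each flow is assumed (hypothetically) to be a dict with a 'start' key
    holding a timestamp in seconds.
    """
    buckets = defaultdict(list)
    for flow in flows:
        buckets[int(flow["start"]) // INTERVAL_SECONDS].append(flow)
    return dict(buckets)


def apply_rules(flow, rules):
    """Evaluate every rule independently; return the names of all matches."""
    return {name for name, predicate in rules if predicate(flow)}


def resolve(matches, priority):
    """Return the highest-priority matching rule, or None if nothing matched.

    'priority' is a list of rule names, most important first. The actual
    order used in the study is not given on the slide.
    """
    for name in priority:
        if name in matches:
            return name
    return None
```

Storing all matching rule names per flow and prioritizing only at read-out keeps the rules independent of each other, which mirrors the "independently on all flows, prioritized when fetched" wording of the slide.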
Methodology: Proposed Heuristics (2)
Heuristics for potential P2P traffic (H1-H5)
All traffic to and from potential P2P hosts is marked as P2P traffic
H1: TCP and UDP traffic between IP pair
H2: Well-known P2P ports
H3: Re-usage of source port within short time
H4: Non-parallel connections to endpoint (IP/port)
H5: Unclassified, long flows
    unclassified by H1-H4 and F1-F10
    more than 1 MB in one direction, or duration of more than 10 minutes
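Two of the P2P heuristics, H1 and H5, lend themselves to a short sketch. The code below is an illustration under assumptions: the field names (`src_ip`, `dst_ip`, `proto`, `bytes_fwd`, `bytes_rev`, `duration`) are hypothetical, and 1 MB is taken as 1,000,000 bytes, which the slide does not specify.

```python
from collections import defaultdict

ONE_MB = 1_000_000        # assumption: decimal megabyte
TEN_MINUTES = 600         # seconds


def h1_tcp_udp_pairs(flows):
    """H1 sketch: return IP pairs that exchanged both TCP and UDP traffic.

    Pairs are unordered, so a->b over TCP and b->a over UDP count together.
    """
    protocols = defaultdict(set)
    for flow in flows:
        pair = frozenset((flow["src_ip"], flow["dst_ip"]))
        protocols[pair].add(flow["proto"])
    return {pair for pair, seen in protocols.items() if {"TCP", "UDP"} <= seen}


def h5_long_unclassified(flow, already_classified):
    """H5 sketch: an otherwise unclassified flow that is large or long-lived."""
    if already_classified:
        return False
    large = max(flow["bytes_fwd"], flow["bytes_rev"]) > ONE_MB
    long_lived = flow["duration"] > TEN_MINUTES
    return large or long_lived
```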
Methodology: Proposed Heuristics (3)
Heuristics for other traffic (F1-F10)
F1 and F2: Web servers
    parallel connections to Web ports
    all traffic to and from a Web server is Web traffic
F3: Common services (DNS, BGP)
    equal source and destination port, and port < 501
F4: Mail servers
    hosts receiving traffic on mail ports (SMTP, IMAP, POP) while sending traffic via SMTP
    all traffic to and from mail servers is mail traffic
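As an illustration of F3 and F4, the following sketch shows one possible reading of the two rules. The field names are hypothetical, and the mail port set (25, 110, 143 for SMTP, POP3, IMAP) is an assumption based on the standard port assignments; the slide does not list the exact ports used.

```python
MAIL_PORTS = {25, 110, 143}  # assumption: SMTP, POP3, IMAP standard ports
SMTP_PORT = 25


def f3_common_service(flow):
    """F3 sketch: equal source and destination port, and port below 501."""
    return flow["src_port"] == flow["dst_port"] and flow["src_port"] < 501


def f4_mail_servers(flows):
    """F4 sketch: hosts that receive on a mail port and also send via SMTP."""
    receives_mail = set()
    sends_smtp = set()
    for flow in flows:
        if flow["dst_port"] in MAIL_PORTS:
            receives_mail.add(flow["dst_ip"])
        if flow["dst_port"] == SMTP_PORT:
            sends_smtp.add(flow["src_ip"])
    return receives_mail & sends_smtp
```

Once a host is in the returned set, all of its traffic would be labeled as mail traffic, following the slide.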
Methodology: Proposed Heuristics (3)
Heuristics for other traffic (F1-F10)
F5 and F6: Messenger and gaming
    hosts connected to by a number of different IPs on well-known messenger, chat or gaming ports within a period of 10 days
    all traffic to and from these hosts is messenger or gaming traffic
F7: FTP
    active FTP with initiating port number 20
F8: Non-P2P ports
    some well-known, privileged port numbers typically not used by P2P, such as DNS, telnet, SSH, FTP, mail, RTP, BGP
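The two port-based rules F7 and F8 could look roughly like this. It is a sketch with hypothetical field names; the port set is an assumption built from standard assignments (FTP 21, SSH 22, telnet 23, SMTP 25, DNS 53, BGP 179) and is not the authors' actual list. RTP is left out because it has no single fixed port.

```python
# Assumption: standard well-known ports; not the list used in the study.
NON_P2P_PORTS = {21, 22, 23, 25, 53, 179}
FTP_DATA_PORT = 20


def f7_active_ftp(flow):
    """F7 sketch: active FTP data connection initiated from port 20."""
    return flow["proto"] == "TCP" and flow["src_port"] == FTP_DATA_PORT


def f8_non_p2p_port(flow):
    """F8 sketch: either endpoint uses a port typically not used by P2P."""
    return flow["src_port"] in NON_P2P_PORTS or flow["dst_port"] in NON_P2P_PORTS
```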
Methodology: Proposed Heuristics (3)
Heuristics for other traffic (F1-F10)
F9: Malicious and attack traffic
    scans through IP ranges
    scans through port ranges
    DoS or hammering attacks on few hosts at high frequency
F10: Unclassified, known non-P2P port
    unclassified by H1-H4 and F1-F9 (no connection pattern)
    well-known ports including Web, messenger and gaming
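The scan part of F9 can be illustrated with a simple fan-out count. This is a sketch only: the field names and the threshold `min_targets` are assumptions, and the slide gives no concrete thresholds. A source is flagged if it contacts many distinct hosts on one port (IP range scan) or many distinct ports on one host (port range scan).

```python
from collections import defaultdict


def f9_scanners(flows, min_targets=3):
    """F9 sketch: return source IPs that look like IP-range or port-range scanners.

    'min_targets' is a hypothetical threshold chosen for illustration.
    """
    ips_per_port = defaultdict(set)    # (src_ip, dst_port) -> distinct dst IPs
    ports_per_host = defaultdict(set)  # (src_ip, dst_ip)   -> distinct dst ports
    for flow in flows:
        src = flow["src_ip"]
        ips_per_port[(src, flow["dst_port"])].add(flow["dst_ip"])
        ports_per_host[(src, flow["dst_ip"])].add(flow["dst_port"])

    scanners = {src for (src, _), ips in ips_per_port.items()
                if len(ips) >= min_targets}
    scanners |= {src for (src, _), ports in ports_per_host.items()
                 if len(ports) >= min_targets}
    return scanners
```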
Verification of proposed heuristics
Comparison of classification for P2P traffic
[Figure: number of connections (in 10^6) and amount of data (in TB)]
Results: Traffic Volumes Application breakdown April 2006
Results: Traffic Volumes (2) Application breakdown April till Nov. 2006
Results: Diurnal Patterns
Fractions of P2P data, April till November
[Figure: fraction of P2P data (y-axis, 0 to 1) over time (x-axis, Unix timestamps 1143000000 to 1163000000), with linear trend lines for the 2:00, 10:00, 14:00 and 20:00 P2P data]
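The linear trend lines in this plot correspond to an ordinary least-squares fit of the P2P fraction against the Unix timestamp of each measurement. A minimal sketch of such a fit is shown below; the function name is hypothetical and any input values used with it here are invented for illustration, not measurement results.

```python
def linear_trend(timestamps, fractions):
    """Least-squares line through (timestamp, fraction) points.

    Returns (slope, intercept) so that fraction ~= slope * timestamp + intercept.
    Requires at least two distinct timestamps.
    """
    n = len(timestamps)
    mean_t = sum(timestamps) / n
    mean_f = sum(fractions) / n
    # Centering on the means keeps the arithmetic stable for large timestamps.
    cov = sum((t - mean_t) * (f - mean_f) for t, f in zip(timestamps, fractions))
    var = sum((t - mean_t) ** 2 for t in timestamps)
    slope = cov / var
    intercept = mean_f - slope * mean_t
    return slope, intercept
```

Fitting one line per time of day (for example all 2:00 samples) would give one trend line per series, as in the plot.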
Results: Signaling Behavior Connection establishment for P2P, Web and malicious traffic
Summary of Results
Traffic is increasing for TCP and UDP
Highest activity during evening hours
P2P dominating (~90% of data volume)
P2P peak time at evening and night-time
Web peak time during office hours
Fractions of P2P and Web constant
Malicious traffic constant in absolute numbers ('background noise')
Summary of Results (2)
Major differences in signaling behavior
    43% of TCP P2P connections are 1-packet flows (attempts)
    80% of malicious TCP traffic consists of 1-packet flows (scans)
    Web traffic behaving nicely
Different TCP options deployment
    P2P behaves as expected
    Web traffic shows artifacts of the client-server pattern, e.g. popular web servers neglecting the SACK option
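Shares such as the 1-packet flow percentages above can be computed per traffic class from classified flow records. The sketch below assumes hypothetical `class` and `packets` fields on each flow and is not the authors' tooling.

```python
from collections import Counter


def one_packet_share(flows):
    """Return, per traffic class, the fraction of flows with exactly one packet."""
    total = Counter()
    single = Counter()
    for flow in flows:
        total[flow["class"]] += 1
        if flow["packets"] == 1:
            single[flow["class"]] += 1
    return {cls: single[cls] / total[cls] for cls in total}
```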
References
W. John and S. Tafvelin, "Analysis of Internet Backbone Traffic and Anomalies observed", ACM IMC07, San Diego, USA, 2007.
W. John and S. Tafvelin, "Differences between in- and outbound Internet Backbone Traffic", TNC07, Copenhagen, DK, 2007.
Available on: http://www.ce.chalmers.se/~johnwolf
W. John and S. Tafvelin, "Heuristics to Classify Internet Backbone Traffic based on Connection Patterns", accepted at IEEE ICOIN08.
W. John, S. Tafvelin and Tomas Olovsson, "Trends and Differences in Connection Behavior within Classes of Internet Backbone Traffic", submitted for publication.
Available on request: johnwolf@ce.chalmers.se or as paper copy
Thank you very much for your attention!
Questions?