Traffic Monitoring and Application Classification: A Novel Approach



Similar documents
Network Monitoring Using Traffic Dispersion Graphs (TDGs)

Network Monitoring using Traffic Dispersion Graphs (TDGs)

Trends and Differences in Connection-behavior within Classes of Internet Backbone Traffic

Keywords Attack model, DDoS, Host Scan, Port Scan

An apparatus for P2P classification in Netflow traces

Encrypted Internet Traffic Classification Method based on Host Behavior

Traffic Analysis of Mobile Broadband Networks

Analysis of Communication Patterns in Network Flows to Discover Application Intent

How is SUNET really used?

Lehrstuhl für Informatik 4 Kommunikation und verteilte Systeme. Firewall

Lightweight Detection of DoS Attacks

INTRODUCTION TO FIREWALL SECURITY

Network Monitoring and Management NetFlow Overview

Port evolution: a software to find the shady IP profiles in Netflow. Or how to reduce Netflow records efficiently.

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

NetFlow use cases. ICmyNet / NetVizura. Miloš Zeković, milos.zekovic@soneco.rs. ICmyNet Chief Customer Officer Soneco d.o.o.

CLASSIFYING NETWORK TRAFFIC IN THE BIG DATA ERA

The Internet Is Like A Jellyfish

A host-based firewall can be used in addition to a network-based firewall to provide multiple layers of protection.

CS 665: Computer System Security. Network Security. Usage environment. Sources of vulnerabilities. Information Assurance Module

Flow Analysis Versus Packet Analysis. What Should You Choose?

Introduction to Netflow

NetFlow Tracker Overview. Mike McGrath x ccie CTO mike@crannog-software.com

IBM. Vulnerability scanning and best practices

Security Type of attacks Firewalls Protocols Packet filter

CSE 4482 Computer Security Management: Assessment and Forensics. Protection Mechanisms: Firewalls

Security Toolsets for ISP Defense

Network Security. Chapter 3. Cornelius Diekmann. Version: October 21, Lehrstuhl für Netzarchitekturen und Netzdienste Institut für Informatik

What is a Firewall? A choke point of control and monitoring Interconnects networks with differing trust Imposes restrictions on network services

Research on Errors of Utilized Bandwidth Measured by NetFlow

Netflow Overview. PacNOG 6 Nadi, Fiji

Signature-aware Traffic Monitoring with IPFIX 1

CSE331: Introduction to Networks and Security. Lecture 12 Fall 2006

Case Study: Instrumenting a Network for NetFlow Security Visualization Tools

Network Security. Vorlesung Kommunikation und Netze SS 10 E. Nett

How Cisco IT Protects Against Distributed Denial of Service Attacks

Cisco IOS Flexible NetFlow Technology

Network Management & Monitoring

Classification of Firewalls and Proxies

Wharf T&T Limited DDoS Mitigation Service Customer Portal User Guide

Firewall Firewall August, 2003

Chapter 8 Security Pt 2

NVisionIP: An Interactive Network Flow Visualization Tool for Security

Internet2 NetFlow Weekly Reports

Network Security Monitoring and Behavior Analysis Pavel Čeleda, Petr Velan, Tomáš Jirsík

Firewalls. Network Security. Firewalls Defined. Firewalls

Cisco PIX vs. Checkpoint Firewall

CISCO IOS NETFLOW AND SECURITY

HP Intelligent Management Center v7.1 Network Traffic Analyzer Administrator Guide

Firewall. User Manual

Flow-based detection of RDP brute-force attacks

Internet Protocol: IP packet headers. vendredi 18 octobre 13

Computer Security CS 426 Lecture 36. CS426 Fall 2010/Lecture 36 1

Overview. Packet filter

Firewall Introduction Several Types of Firewall. Cisco PIX Firewall

IMPLEMENTATION OF INTELLIGENT FIREWALL TO CHECK INTERNET HACKERS THREAT

Hadoop Technology for Flow Analysis of the Internet Traffic

A Preliminary Performance Comparison of Two Feature Sets for Encrypted Traffic Classification

Server Iron Hands-on Training

co Characterizing and Tracing Packet Floods Using Cisco R

Network Security. Tampere Seminar 23rd October Overview Switch Security Firewalls Conclusion

Distributed Denial of Service (DDoS)

Chapter 1 Personal Computer Hardware hours

Analysis of Network Packets. C DAC Bangalore Electronics City

Emerald. Network Collector Version 4.0. Emerald Management Suite IEA Software, Inc.

PEER-TO-PEER NETWORK

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION

Networking for Caribbean Development

Multi-layer switch hardware commutation across various layers. Mario Baldi. Politecnico di Torino.

Firewalls CSCI 454/554

INCREASE NETWORK VISIBILITY AND REDUCE SECURITY THREATS WITH IMC FLOW ANALYSIS TOOLS

Outline. Outline. Outline

Layered protocol (service) architecture

Appmon: An Application for Accurate per Application Network Traffic Characterization

1. The Web: HTTP; file transfer: FTP; remote login: Telnet; Network News: NNTP; SMTP.

Analyzing Network Traffic to Detect Spamming Machines

Computer Networks & Security 2014/2015

Radware s Behavioral Server Cracking Protection

Internet Security Firewalls

Attack and Defense Techniques 2

An Introduction to Network Vulnerability Testing

Network Security: Workshop. Dr. Anat Bremler-Barr. Assignment #2 Analyze dump files Solution Taken from

Transcription:

Traffic Monitoring and Application Classification: A Novel Approach Michalis Faloutsos, UC Riverside QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. Thomas Karagiannis Marios Iliofotou 1

General Problem Definition We don t know what goes on in the network Measure and monitor: Who uses the network? For what? How much file-sharing is there? Can we observe any trends? Security questions: Have we been infected by a virus? Is someone scanning our network? Am I attacking others? M. Faloutsos UCR 2

Problem in More Detail Given network traffic in terms of flows Flow: tuple (source IP, port; dest IP, port; protocol) Flow statistics: packet sizes, interarrival etc Find which application generates each flow Or which flows are P2P Or detect viruses/worms Issues: Definition of flow hides subtleties Monitoring tools, netflow, provide this M. Faloutsos UCR 3

State of the Art Approaches Port-based: some apps use the same port Works well for legacy applications, but not for new apps Statistics-based methods: Measure packet and flow properties Packet size, packet interarrival time etc Number of packets per flow etc Create a profile and classify accordingly Weakness: Statistical properties can be manipulated Packet payload based: Match the signature of the application in payload Weakness Require capturing the packet load (expensive) Identifying the signature is not always easy IP blacklist/whitelist filtering M. Faloutsos UCR 4

Our Novelty, Oversimplified We capture the intrinsic behavior of a user Who talks to whom Benefits: Provides novel insight Is more difficult to fake Captures intuitively explainable patterns Claim: our approach can give rise to a new family of tools M. Faloutsos UCR 5

How our work differs from others Previous work Our work BLINC: Profile behavior of user (host level) TDGs: Profile behavior of the whole network (network level) M. Faloutsos UCR 6

Motivation: People Really Care We started by measuring P2P traffic which explicitly tries to hide Karagiannis (UCR) at CAIDA, summer 2003 How much P2P traffic is out there? RIAA claimed a drop in 2003 We found a slight increase "Is P2P dying or just hiding?" Globecom 2004 M. Faloutsos UCR 7

The Reactions RIAA did not like it Respectfully said that we don t know what we are doing The P2P community loved it Without careful scrutiny of our method M. Faloutsos UCR 8

More People Got Interested Wired: ``Song-Swap Networks Still Humming" on Karagiannis work. ACM news, PC Magazine, USA Today, Congressional Internet Caucus (J. Kerry!) In litigation docs as supporting evidence! M. Faloutsos UCR 9

Structure of the talk Part I: BLINC: A host-based approach for traffic classification Part II: Monitoring using the network-wide behavior: Traffic Dispersion Graphs, TDGs M. Faloutsos UCR 10

Part I: BLINC Traffic classification The goal: Classify Internet traffic flows according to the applications that generate them Not as easy as it sounds: Traffic profiling based on TCP/UDP ports Misleading Payload-based classification Practically infeasible (privacy, space) Can require specialized hardware Joint Work with: Thomas Karagiannis, UC Riverside/ Microsoft Konstantina Papagiannaki, Nina Taft, Intel M. Faloutsos UCR 11

The State of the Art Recent research approaches Statistical/machine-learning based classification Roughan et al., IMC 04 McGregor et al., PAM 05 Moore et al., SIGMETRICS 05 Signature based Varghese, Fingerhut, Bonomi, SIGCOMM 06 Bonomi, et al. SIGCOMM 06 IP blacklist/whitelist filtering to block bad traffic Soldo+, Markopoulou, ITA 08 UCR/CAIDA a systematic study in progress: What works, under which conditions, why? M. Faloutsos UCR 12

Our contribution: BLINC BLINd Classification ie without using payload We present a fundamentally different in the dark approach We shift the focus to the host We identify signature communication patterns Difficult to fake M. Faloutsos UCR 13

BLINC overview Characterize the host Insensitive to network dynamics (wire speed) Deployable: Operates on flow records Input from existing equipment Three levels of classification Social : Popularity Functional : Consumer/provider of services Application : Transport layer interactions M. Faloutsos UCR 14

Social Level Social: Popularity Bipartite cliques Gaming communities identified by using data mining: fully automated crossassociation Chakrabarti et al KDD 2004 (C. Faloutsos CMU) M. Faloutsos UCR 15

Functional level Functional: Infer role of node Server Client Collaborator One way: #source ports vs. # of flows M. Faloutsos UCR 16

Social level Characterization of the popularity of hosts Two ways to examine the behavior: Based on number of destination IPs Analyzing communities M. Faloutsos UCR 17

Social level: Identifying Communities Find bipartite cliques M. Faloutsos UCR 18

Social Level: What can we see Perfect bipartite cliques Attacks Partial bipartite cliques Collaborative applications (p2p, games) Partial bipartite cliques with same domain IPs Server farms (e.g., web, dns, mail) M. Faloutsos UCR 19

Social Level: Finding communities in practice Gaming communities identified by using data mining: fully automated cross-association Chakrabarti et al KDD 2004 (C. Faloutsos CMU) M. Faloutsos UCR 20

Functional level Characterization based on tuple (IP, Port) Three types of behavior Client Server Collaborative M. Faloutsos UCR 21

Functional level: Characterizing the host Y-axis: number of source ports X-axis: number of flows Collaborative applications: No distinction between servers and clients Clients Servers Obscure behavior due to multiple mail protocols and passive ftp M. Faloutsos UCR 22

Application level Interactions between network hosts display diverse patterns across application types. We capture patterns using graphlets: Most typical behavior Relationship between fields of the 5-tuple M. Faloutsos UCR 23

Application level: Graphlets sourceip destinationip sourceport destinationport 445 135 Capture the behavior of a single host (IP address) Graphlets are graphs with four columns : src IP, dst IP, src port and dst port Each node is a distinct entry for each column E.g. destination port 445 Lines connect nodes that appear on the same flow M. Faloutsos UCR 24

Graphlet Generation (FTP) sourceip destinationip sourceport destinationport X ZY 21 20 10002 10001 3000 X Z Y 1026 21 20 10002 3001 3000 X UZ 1026 21 5000 3001 3000 X U Z 1026 20 21 5005 5000 3001 X X Y Z U 20 21 1026 10002 5005 5000 10001 3000 3001 M. Faloutsos UCR 25

What can Graphlets do for us? Graphlets are a compact way to profile of a host capture the intrinsic behavior of a host Premise: Hosts that do the same, have similar graphlets Approach Create graphlet profiles Classify new hosts if they match existing graphlets M. Faloutsos UCR 26

Training Part: Create a Graphlet Library M. Faloutsos UCR 27

Additional Heuristics In comparing graphlets, we can use other info: the transport layer protocol (UDP or TCP). the relative cardinality of sets. the communities structure: If X and Y talk to the same hosts, X and Y may be similar Follow this recursively Other heuristics: Using the per-flow average packet size Recursive (mail/dns servers talk to mail/dns servers, etc.) Failed flows (malware, p2p) M. Faloutsos UCR 28

Evaluating BLINC We use real network traces Data provided by Intel: Residential (Web, p2p) Genome campus (ftp) Train BLINC on a small part of the trace Apply BLINC on the rest of the trace M. Faloutsos UCR 29

Compare with what? Develop a reference point Collect and analyze the whole packet Classification based on payload signatures Not perfect but nothing better than this M. Faloutsos UCR 30

Classification Results Metrics Completeness Percentage classified by BLINC relative to benchmark Do we classify most traffic? Accuracy Percentage classified by BLINC correctly When we classify something, is it correct? Exclude unknown and nonpayload flows M. Faloutsos UCR 31

Classification results : Totals 80%-90% completeness! >90% accuracy!! BLINC works well M. Faloutsos UCR 32

Characterizing the unknown: Non-payload flows BLINC is not limited by non-payload flows or unknown signatures Flows classified as attacks reveal known exploits M. Faloutsos UCR 33

BLINC issues and limitations How do we compare graphlets? Graph similarity is difficult to define Currently, based on heuristics and training What if a node runs two apps at the same time? Extensibility Creating and incorporating new graphlets Application sub-types e.g., BitTorrent vs. Kazaa Access vs. Backbone networks? Works better for access networks (e.g. campus) M. Faloutsos UCR 34

Developing a Useable Tool Java front-end by Dhiman Barman UCR M. Faloutsos UCR 35

Follow up work: Profiling the end user We examine the dynamics of profiling How much variability exists Per node over time Among nodes in a network How can I summarize a graphlet So that I can compare it with others? The answers in PAM 2007 M. Faloutsos UCR 36

Conclusions - I We shift the focus from flows to hosts Capture the intrinsic behavior of a host Multi-level analysis: each level provides more detail Good results in practice: BLINC classifies 80-90% of the traffic with greater than 90% accuracy M. Faloutsos UCR 37

Part II: Traffic Dispersion Graphs Monitoring traffic as a network-wide phenomenon Paper at Internet Measurement Conference (IMC) 2007 Joint work with: Marios Iliofotou UC Riverside, G. Varghese UCSD Prashanth Pappu, Sumeet Singh (Cisco) M. Mitzenmacher (Harvard) M. Faloutsos UCR 38

Traffic Dispersion Graphs Virus signature Traffic Dispersion Graphs: Who talks to whom Deceptively simple definition Provides powerful visualization and novel insight M. Faloutsos UCR 39

Defining TDGs A node is an IP address (host, user) A key issue: define an edge (Edge filter) Edge can represent different communications Simplest: edge = the exchange of any packet Edge Filter can be more involved: A number of pkts exchanged TCP with SYN flag set (initiating a TCP connection) sequence of packets (e.g., TCP 3-way handshake) Payload properties such as a content signature M. Faloutsos UCR 40

Generating a TDG Pick a monitoring point (router, backbone link) Select an edge filter Edge Filter = What constitutes an edge in the graph? E.g., TCP SYN Dst. Port 80 If a packet satisfies the edge filter, create the link srcip dstip Gather all the links and generate a graph within a time interval, e.g., 300 seconds (5 minutes) M. Faloutsos UCR 41

TDGs are a New Kind of Beast TDGs are Directed graphs Time evolving Possibly disconnected TDGs are not yet another scalefree graph TDGs are not a single family of graphs TDGs with different edge filters are different TDGs hide a wealth of information Make cool visualizations Can be mined to provide novel insight M. Faloutsos UCR 42

TDGs and Preliminary Results We focus on studying port-based TDGs Even that can give interesting information We study destination ports of known applications: UDP ports: we generate an edge based on the first packet between two hosts TCP we add an edge on a TCP SYN packet for the corresponding destination port number e.g., port 80 for HTTP, port 25 for SMTP etc. M. Faloutsos UCR 43

Data Used Real Data: typical duration = 1 hour OC48 from CAIDA (22 million flows, 3.5 million IPs) Abilene Backbone (23.5 million flows, 6 million IPs) WIDE Backbone (5 million flows, 1 million IPs) Access links traces (University of Auckland) + UCR traces were studied but not shown here (future work) M. Faloutsos UCR 44

TDGs as a Visualization Tool M. Faloutsos UCR 45

Identifying Hierarchies SMTP (email) DNS Hierarchical structure with multiple levels of hierarchy M. Faloutsos UCR 46

Web Traffic Web: https Web: port 8080 M. Faloutsos UCR 47

TDG Visualizations (Peer-to-Peer) WinMX P2P App UDP Dst. Port 6257 15 sec Observations Many nodes with inand-out degree (InO) One large connected component Long chains InO degree Bidirectional Zoom M. Faloutsos UCR 48

Detecting Viruses and Unusual Activities Random IP range scanning activity? Slammer: port 1434 NetBIOS: port 137 M. Faloutsos UCR 49

Visually detecting virus activity Virus signature Virus (slammer) creates more star configurations Directivity makes it clearer Center node -> nodes, for virus stars M. Faloutsos UCR 50

Quantitative Study of TDGs M. Faloutsos UCR 51

Using Graph Metrics We use new and commonly used metrics Degree distribution Giant Connected Component Largest connected subgraph Number of connected components In-Out nodes Node with in- and out- edges Joint Degree Distribution M. Faloutsos UCR 52

Degree Distribution P(X x) P(X x) P(X x) Degree Degree Degree The degree distributions of TDGs varies a lot. Only some distributions can be modeled by power-laws (HTTP, DNS). P2P communities (edonkey) have many medium degree nodes (4 to 30). HTTP and DNS have few nodes with very high degrees. NetBIOS: Scanning activity: 98% of nodes have degree of one, few nodes with very high degree scanners M. Faloutsos UCR 53

Joint Degree Distribution (JDD) 11 1 10 11 10 9 1 1 8 6 9 7 6 1 2 5 5 3 4 2 5 3 2 3 5 2 1 1 2 3 4 5 6 7 8 9 1 0 1 1 JDD: P(k1,k2), the probability that a randomly selected edge connects nodes of degrees k1 and k2 Normalized by the total Number of links M. Faloutsos UCR 54

Joint Degree Distribution (JDD) HTTP (client-server) WinMX (peer-to-peer) DNS (c-s and p2p) Couture plots (log-log scale due to high variability) x-axis: Degree of the node on the one end of the link y-axis: Degree of the other node Observations: HTTP: low degree client to low to high degree servers WinMX: medium degree nodes are connected DNS: sings of both client server and peer-to-peer behavior Top degree nodes are not directly connected (top right corner) M. Faloutsos UCR 55

TDGs Can Distinguish Applications Monitor the top 10 ports number in number of flows. Scatter Plot: Size of GCC Vs number of connected components. Stable over Time! We can separate apps! OC48 Trace Soribada UDP port 22321 UDP port 7674 WinMX UDP port 6257 M. Faloutsos UCR 56 TCP port 1433 edonkey TCP port 4662 UDP port 4665 NetBIOS UDP port 137 MS-SQL-S

TDGs as a Monitoring/Security Tool Two modes of operation: Classification: based on previously observed thresholds. Security: calculate TDGs and trigger an alarm on large change How do we choose which TDGs to monitor? Manually, Automatically-adaptively, Using automatically extracted signatures of content (Earlybird) M. Faloutsos UCR 57

Final Conclusions The behavior of hosts hides a information Studying the transport-layer can provide insight We can do this at two levels Host level using using BLINC Network-wide level using TDGs Advantages: More difficult to fake More intuitive to interpret and deploy It can be used to monitor and secure M. Faloutsos UCR 58

My Areas of Research Measurements and models for the Internet Network Topology: models and patterns [ToN03, CSB06, NSDI07] Traffic monitoring: models and classification [sigcomm05] [PAM07] Routing Security Modeling and Securing BGP routing NEMECIS: [Infocom04, 07] Adhoc routing security: [ICNP 06][ICNP07] Quantifying and protecting against URL hijacking [miniinfocom08] Design and capacity of WLANs and hybrid nets [mobicom07, infocom08] DART: A radical network layer for ad hoc [Infocom04] [ToN06] Cooperative Diversity in ad hoc networks [JSAC06, Infocom06] M. Faloutsos UCR 59

Extras M. Faloutsos UCR 60

Main research areas Measurements Traffic, BGP routing and topology, ad hoc Routing scalable ad hoc, BGP instability Security DoS, BGP attacks, ad hoc DoS Designing the future network Rethinking the network architecture M. Faloutsos UCR 61

TDG Visualization (DNS) DNS TDG UDP Dst. Port 53 5 seconds In- and Outdegree nodes Very common in DNS, presence of few very high degree node One large Connected Component! (even in such small interval) M. Faloutsos UCR 62

TDG Visualization (HTTP)HTTP TDG TCP SYN Dst. Port 80 30 seconds A busy web server? Observations There is not a large connected component as in DNS Clear roles very few nodes with inand-out degrees) Web caches? Web proxies? Many disconnected components M. Faloutsos UCR 63

TDG Visualization (Slammer Worm) Slammer Worm UDP Dst. port 1434 10 seconds About: Jan 25, 2003. MS-SQL- Server 2000 exploit. Trace: April 24th Observations (Scanning Activity) Many high out-degree nodes Many disconnected components The majority of nodes have only in-degree (nodes being scanned) M. Faloutsos UCR 64