Statistical Approaches for Network Anomaly Detection

Similar documents

On the Use of Compression Algorithms for Network Traffic Classification

On Entropy in Network Traffic Anomaly Detection

Detecting Network Anomalies. Anant Shah

Module II. Internet Security. Chapter 7. Intrusion Detection. Web Security: Theory & Applications. School of Software, Sun Yat-sen University

Intruders and viruses. 8: Network Security 8-1

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Development of a Network Intrusion Detection System

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

STUDY OF IMPLEMENTATION OF INTRUSION DETECTION SYSTEM (IDS) VIA DIFFERENT APPROACHS

CS 356 Lecture 17 and 18 Intrusion Detection. Spring 2013

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

TELEMETRY NETWORK INTRUSION DETECTION SYSTEM

Conclusions and Future Directions

Image Compression through DCT and Huffman Coding Technique

Network Security: Attacks and Defenses

CSCI 4250/6250 Fall 2015 Computer and Networks Security

Layered Approach of Intrusion Detection System with Efficient Alert Aggregation for Heterogeneous Networks

Intrusion Detection. Overview. Intrusion vs. Extrusion Detection. Concepts. Raj Jain. Washington University in St. Louis

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Role of Anomaly IDS in Network

Intruders & Intrusion Hackers Criminal groups Insiders. Detection and IDS Techniques Detection Principles Requirements Host-based Network-based

Intrusion Detection: Game Theory, Stochastic Processes and Data Mining

Firewalls, Tunnels, and Network Intrusion Detection

Adaptive Anomaly Detection for Network Security

Firewalls, Tunnels, and Network Intrusion Detection. Firewalls

HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK

Intrusion Detection. Tianen Liu. May 22, paper will look at different kinds of intrusion detection systems, different ways of

Intrusion Detection Systems

Science Park Research Journal

AUTONOMOUS NETWORK SECURITY FOR DETECTION OF NETWORK ATTACKS

Evaluating Intrusion Detection Systems without Attacking your Friends: The 1998 DARPA Intrusion Detection Evaluation

Fuzzy Network Profiling for Intrusion Detection

Detecting Flooding Attacks Using Power Divergence

An Efficient Way of Denial of Service Attack Detection Based on Triangle Map Generation

Intrusion Detection for Mobile Ad Hoc Networks

A TWO LEVEL ARCHITECTURE USING CONSENSUS METHOD FOR GLOBAL DECISION MAKING AGAINST DDoS ATTACKS

IntruPro TM IPS. Inline Intrusion Prevention. White Paper

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework

Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation

A Comparison of Four Intrusion Detection Systems for Secure E-Business

How To Prevent Network Attacks

Social Media Mining. Data Mining Essentials

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Adaptive Network Intrusion Detection System using a Hybrid Approach

Network Intrusion Detection Systems

Intrusion Detection Using Data Mining Along Fuzzy Logic and Genetic Algorithms

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

INTRUSION DETECTION SYSTEMS and Network Security

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis

An Efficient and Reliable DDoS Attack Detection Using a Fast Entropy Computation Method

CHAPTER 1 INTRODUCTION

Storage Optimization in Cloud Environment using Compression Algorithm

Introduction of Intrusion Detection Systems

A Frequency-Based Approach to Intrusion Detection

Speedy Signature Based Intrusion Detection System Using Finite State Machine and Hashing Techniques

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Combining Statistical and Spectral Analysis Techniques in Network Traffic Anomaly Detection

Network Security 網路安全. Lecture 1 February 20, 2012 洪國寶

Statistical Machine Learning

Analyzing TCP Traffic Patterns Using Self Organizing Maps

Mining and Detecting Connection-Chains in Network Traffic

SURVEY OF INTRUSION DETECTION SYSTEM

A Review on Network Intrusion Detection System Using Open Source Snort

How To Detect Denial Of Service Attack On A Network With A Network Traffic Characterization Scheme

Performance Evaluation of Intrusion Detection Systems

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Taxonomy of Intrusion Detection System

Fuzzy Network Profiling for Intrusion Detection

Web Forensic Evidence of SQL Injection Analysis

Efficient Security Alert Management System

SY system so that an unauthorized individual can take over an authorized session, or to disrupt service to authorized users.

IDS / IPS. James E. Thiel S.W.A.T.

Joint Entropy Analysis Model for DDoS Attack Detection

A new Approach for Intrusion Detection in Computer Networks Using Data Mining Technique

NETWORK INTRUSION DETECTION SYSTEM USING HYBRID CLASSIFICATION MODEL

INTRUSION DETECTION SYSTEM FOR WEB APPLICATIONS WITH ATTACK CLASSIFICATION

Adaptive Framework for Network Traffic Classification using Dimensionality Reduction and Clustering

IDS IN TELECOMMUNICATION NETWORK USING PCA

Final exam review, Fall 2005 FSU (CIS-5357) Network Security

Data Mining For Intrusion Detection Systems. Monique Wooten. Professor Robila

7 Network Security. 7.1 Introduction 7.2 Improving the Security 7.3 Internet Security Framework. 7.5 Absolute Security?

Distributed Denial of Service (DDoS)

Hybrid Model For Intrusion Detection System Chapke Prajkta P., Raut A. B.

Network Security. Chapter 9. Attack prevention, detection and response. Attack Prevention. Part I: Attack Prevention

Intrusion Detection via Machine Learning for SCADA System Protection

A SYSTEM FOR DENIAL OF SERVICE ATTACK DETECTION BASED ON MULTIVARIATE CORRELATION ANALYSIS

Mahalanobis Distance Map Approach for Anomaly Detection

Transcription:

Statistical Approaches for Network Anomaly Detection Christian CALLEGARI Department of Information Engineering University of Pisa ICIMP Conference 9, May 2009 Barcelona Spain

Short Bio Post-Doctoral Fellow with the Telecommunication Network research group at the Dept. of Information Engineering of the University of Pisa B.E. degree in 2002 from the University of Pisa, discussing a thesis on Network Firewalls M.S. degree in 2004 from the University of Pisa, discussing a thesis on Network Simulation PhD in 2008 from the University of Pisa, discussing a thesis on Network Anomaly Detection Contacts Dept. of Information Engineering Via Caruso 16-56122 Pisa - Italy christian.callegari@iet.unipi.it C. Callegari Anomaly Detection 2 / 169

Acknowledgments I d like to thank some colleagues of mine for their contribution in support of this work: Michele Pagano (Associate Professor) Teresa Pepe (PhD Student) Loris Gazzarrini (Master Student) C. Callegari Anomaly Detection 3 / 169

What about you? C. Callegari Anomaly Detection 4 / 169

Outline 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis 9 Wavelet Analysis C. Callegari Anomaly Detection 5 / 169

Outline Introduction 1 Introduction Motivations Taxonomy of the Intrusion Detection Systems Some Useful Definitions Evaluation Data-set 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis C. Callegari Anomaly Detection 6 / 169

Introduction Motivations Why an intrusion detection system? Network security mainly means PREVENTION Physical protection for hardware Passwords, access tokens, etc. for authentication Access control list for authorization Cryptography for secrecy Backups and redundancy for authenticity... and so on BUT...... Absolute security cannot be guaranteed! C. Callegari Anomaly Detection 7 / 169

Introduction Motivations What is an Intrusion Detection System? Prevention is suitable when Internal users are trusted Limited interaction with other networks Need for a system which acts when prevention fails Intrusion Detection System An intrusion detection system (IDS) is a software/hardware tool used to detect unauthorized accesses to a computer system or a network C. Callegari Anomaly Detection 8 / 169

Introduction A taxonomy of the intruders Motivations Intruders can be classified as Masquerader: an individual who is not authorized to use the computer and who penetrates a system s access control to exploit a legitimate user s account Misfeasor: a legitimate user who accesses data, programs, or resources for which such access is not authorized, or who is authorized for such access, but misuses his/her privileges Clandestine User: an individual who seizes supervisory control of the system and uses the control to evade auditing and access controls or to suppress audit collection C. Callegari Anomaly Detection 9 / 169

Introduction A taxonomy of the intrusions Intrusions can be classified as Motivations Eavesdropping and Packet Sniffing: passive interception of network traffic Snooping and Downloading Tampering and Data Diddling: unauthorized changes to data or records Spoofing: impersonating other users Jamming or Flooding: overwhelming a system s resources Injecting Malicious Code Exploiting Design or Implementation Flaws (e.g., buffer overflow) Cracking Passwords and Keys C. Callegari Anomaly Detection 10 / 169

IDS Taxonomy Introduction IDS Taxonomy Intrusion Detection Systems are classified on the basis of several criteria: 1 Scope Host IDS (HIDS) Network IDS (NIDS) 2 Architecture Centralized Distributed 3 Analysis Techniques Stateful Stateless 4 Detection Techniques Misuse Based IDS Anomaly Based IDS C. Callegari Anomaly Detection 11 / 169

Introduction Host based vs. Network based IDS Taxonomy Host based IDS Aimed at detecting attacks related to a specific host Architecture/Operating System dependent Processing of high level information (e.g. system calls) Effective in detecting insider misuse Network based IDS Aimed at detecting attacks towards hosts connected to a LAN Architecture/Operating System independent Processing data at lower level of granularity (packets) Effective in detecting attacks from the outside C. Callegari Anomaly Detection 12 / 169

Introduction IDS Taxonomy Centralized IDS vs. Distributed IDS Centralized IDS All the operations are performed by the same machine More simple to realize Only one point of failure Distributed IDS Composed of several components Sensors which generate security events Console to monitor events and alerts and control the sensors Central Engine that records events and generate alarms May need to deal with different data formats Need of a secure communication protocol (IPFIX) C. Callegari Anomaly Detection 13 / 169

Introduction Stateless IDS vs. Stateful IDS IDS Taxonomy Stateless IDS Treats each event independently of the others Simple system design High processing speed Stateful IDS Maintains information about past events The effect of a certain event depends on its position in the events stream More complex system design More effective in detecting distributed attacks C. Callegari Anomaly Detection 14 / 169

Introduction IDS Taxonomy Misuse based IDS vs. Anomaly based IDS Misuse based IDS Identifies intrusion by looking for patterns of traffic or of application data presumed to be malicious Pattern of misuses are stored in a database Effective in detecting only known attacks Anomaly based IDS Identifies intrusions by classifying activity as either anomalous or normal Needs a training phase to recognize normal activity Able to detect new attacks Generates more false alarms than a misuse based IDS C. Callegari Anomaly Detection 15 / 169

Introduction Attacks State of the Art IDS Taxonomy C. Callegari Anomaly Detection 16 / 169

IDS State of the Art Introduction IDS Taxonomy BUT... Focus is on Network based IDSs (The only ones effective in detecting Distributed Denial of Service - DDoS) State of the art IDSs are Misuse Based Most attacks are realized by means of software tools available on the Internet Most attacks are well-known attacks... The most dangerous attacks are those written ad hoc by the intruder! C. Callegari Anomaly Detection 17 / 169

The best choice? Introduction IDS Taxonomy Combined use of both HIDS (for insider attacks) & NIDS (for outsider attacks) Misuse IDS (low False Alarm rate) & Anomaly IDS (for new attacks) Stateless IDS (fast data process) & Stateful IDS (for complex attacks) Distributed IDS Not a single point of failure More effective in monitoring large networks C. Callegari Anomaly Detection 18 / 169

The best choice? Introduction IDS Taxonomy C. Callegari Anomaly Detection 19 / 169

Definitions Introduction Some Useful Definitions False Positive (FP): the error of rejecting a null hypothesis when it is actually true. In our case it implies the creation of an alarm in correspondence of normal activities False Negative (FN): the error of failing to reject a null hypothesis when it is in fact not true. In our case it corresponds to a missed detection C. Callegari Anomaly Detection 20 / 169

ROC Curve Introduction Some Useful Definitions Plots Detection Rate vs. False Positive Rate 1 0.9 0.8 0.7 Detection Rate 0.6 0.5 0.4 0.3 0.2 0.1 Non Stationary ECDF Non Homogeneous MC Homogeneous MC 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False Alarm Rate C. Callegari Anomaly Detection 21 / 169

ROC Curve Introduction Some Useful Definitions Results presented by the ROC are often considered incomplete because they do not take into account the cost of missed attacks they do not take into account the cost of false alarms they do not say if the system itself is resistant to attacks... Several researchers are working on more complete ways of representing the results C. Callegari Anomaly Detection 22 / 169

Introduction DARPA Evaluation Program Evaluation Data-set The 1998/1999 DARPA/MIT IDS evaluation program is the most comprehensive evaluation performed to date It provides a corpus of data for the development, improvement, and evaluation of IDSs Different kind of data are available: Operating systems logs Network traffic Collected by an inside sniffer Collected by an outside sniffer The data model the network traffic measured between a US Air Force base and the Internet C. Callegari Anomaly Detection 23 / 169

The DARPA Network Introduction Evaluation Data-set C. Callegari Anomaly Detection 24 / 169

The DARPA Dataset Introduction Evaluation Data-set 5 weeks data Data from weeks 1 and 3 are attack free and can be used to train the system Data from week 2 contains labeled attacks and can be used to realize the signatures database Data from weeks 4 and 5 contains several attacks and can be used for the detection phase An Attack Truth list is provided Attacks are categorized as Denial of Service (DoS) User to Root (U2R) Remote to Local (R2L) Data Probe 177 instances of 59 different types of attacks C. Callegari Anomaly Detection 25 / 169

Other Data-sets Introduction Evaluation Data-set The DARPA data-set has many drawbacks: simulated environment not up-to-date traffic the methodology used for generating the traffic has been shown to be inappropriate for simulating actual networks Other Data-sets: several publicly available traffic traces e.g. CAIDA, Abilene (Internet2), GEANT,... no ground truth is provided! C. Callegari Anomaly Detection 26 / 169

References Introduction Evaluation Data-set Anderson, Computer Security Monitoring and Surveillance, Tech Rep 98-17, 1980 E. Millard, Internet attacks increase in number, severity, Top Tech News, 2005 C. Staf, Hackers: companies encounter rise of cyber extortion, vol. 2006, Computer Crime Research Center, 2005. S. Axelsson, Intrusion Detection Systems: A Survey and Taxonomy, Chalmers University, Technical Report 99-15, March 2000 W. Stallings, Cryptography and Network Security, Prentice Hall A. Patcha, J.M. Park, An overview of anomaly detection techniques: Existing solutions and latest technological trends, Computer Networks 51, 2008 C. Callegari Anomaly Detection 27 / 169

References Introduction Evaluation Data-set MIT, Lincoln laboratory, DARPA evaluation intrusion detection, http://www.ll.mit.edu/ist/ideval/ R. Lippmann, J. Haines, D. Fried, J. Korba, and K. Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks 34, 2000 J. Haines, R. Lippmann, D. Fried, E. Tran, S. Boswell, and M. Zissman, 1999 DARPA intrusion detection system evaluation: Design and procedures, Tech. Rep. 1062, MIT Lincoln Laboratory, 2001 J. McHugh, Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection, ACM Transactions on Information and System Security 3, 2000 Christian Callegari, Stefano Giordano, Michele Pagano, New Statistical Approaches for Anomaly Detection, Security and Communication Networks, to appear C. Callegari Anomaly Detection 28 / 169

Outline IDES 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis 9 Wavelet Analysis C. Callegari Anomaly Detection 29 / 169

A bit of History IDES The history of IDSs can be split in three main blocks 1 First Generation IDSs (end of the 1970s) The concept of IDS first appears in the 1970s and early 1980s (Anderson, Computer Security Monitoring and Surveillance, Tech Rep 1980) Focus on audit data of a single machine Post processing of data 2 Second Generation IDSs (1987) Intrusion Detection Expert System (Denning, An intrusion Detection Model, IEEE Trans. on Soft. Eng., 1987) Statistical analysis of data 3 Third Generation IDSs (to come) Focus on the network Real-time detection Real-time reaction Intrusion Prevention System C. Callegari Anomaly Detection 30 / 169

IDES IDES Model s components Subjects: initiators of activity on a target system Objects: resources managed by the system files, commands, etc. Audit Records: generated by the target system in response to actions performed or attempted by subjects Profiles: structures that characterize the behavior of subjects with respect to objects in terms of statistical metrics and models of observed activity Anomaly Records: generated when abnormal behavior is detected Activity Rules: actions taken when some condition is satisfied C. Callegari Anomaly Detection 31 / 169

Subjects and Objects IDES Subjects Objects Initiators of actions on the target system It is typically a terminal user They can be grouped into different categories Users groups may overlap Receptors of subjects actions If a subject is a recipient of actions (e.g. electronic mail), then is also considered to be a object Additional structures may be imposed (e.g. records may be grouped in database) Objects granularity depends on the environment C. Callegari Anomaly Detection 32 / 169

Audit Records IDES {Subject, Action, Object, Exception-Condition, Resource-Usage, Time-stamp} Action: operation performed by the subject on or with the object Exception-Condition: denotes which, if any, execution condition is raised on the return Resource-Usage: list of quantitative elements, where each element gives the amount of some resource Time-stamp: unique time/date stamp identifying when the action took place C. Callegari Anomaly Detection 33 / 169

Profiles IDES An activity profile characterizes the behavior of a given subject (or set of subjects) with respect to a given object, thereby serving as a signature or description of normal activity for its respective subject and object Observed behavior is characterized in terms of a statistical metric and model A metric is a random variable x representing a quantitative measure accumulated over a period Observations x i of x obtained from the audit records are used together with a statistical model to determine whether a new observation is abnormal The statistical models make no assumptions about the underlying distribution of x; all knowledge about x is obtained from the observations x i C. Callegari Anomaly Detection 34 / 169

Metrics and Models IDES Metrics Event counter Interval timer Resource measure Statistical models Operational model: abnormality is decided by comparison of x n with a fixed threshold Mean and standard deviation model: abnormality is decided by checking if x n falls inside the confidence interval Multivariate model: based on the correlations between two or more metrics Markov process model: based on the transition probabilities Time series model: takes into account order and inter-arrival time of the observations C. Callegari Anomaly Detection 35 / 169

Profile structure IDES {Variable-name, Action-pattern, Exception-pattern, Resource-usage-pattern, Period, Variable-type, Threshold, Subject-pattern, Object-pattern, Value} Variable-name Action-pattern: pattern that matches one or more actions in the audit records (e.g. login ) Exception-pattern: pattern that matches on the Exception-condition field of an audit record Resource-usage-pattern: pattern that matches on the Resource-usage field of an audit record Period: time interval for measurements Variable-type: name of abstract data type that defines a particular type of metric and statistical model (e.g. event counter with mean and standard deviation model) Threshold C. Callegari Anomaly Detection 36 / 169

Profile structure IDES {Variable-name, Action-pattern, Exception-pattern, Resource-usage-pattern, Period, Variable-type, Threshold, Subject-pattern, Object-pattern, Value} Subject-pattern: pattern that matches on the Subject field of an audit record Object-pattern: pattern that matches on the Object field of an audit record Value: value of current observation and parameters used by the statistical model to represent distribution of previous values There also is the possibility of defining profiles for classes C. Callegari Anomaly Detection 37 / 169

Profile templates IDES When user accounts and objects can be created dynamically, a mechanism is needed to generate activity profiles for new subjects and objects Manual create: the security officer explicitly creates all profiles Automatic explicit create: all profiles for a new user are generated in response to a create record in the audit trail First use: a profile is automatically generated when a subject (new or old) first uses an object (new or old) C. Callegari Anomaly Detection 38 / 169

Anomaly Records IDES {Event, Time-stamp, Profile} Event: indicates the event giving rise to the abnormality and is either audit, meaning the data in an audit record was found abnormal, or period, meaning the data accumulated over the current period was found abnormal Time-stamp: either the Time-stamp in the audit trail or interval stop time Profile: activity profile with respect to which the abnormality was detected C. Callegari Anomaly Detection 39 / 169

Activity Rules IDES A condition that, when satisfied, causes the rule to be fired, and a body, which specified the action to be taken Audit-record rule: triggered by a match between a new audit record and an activity profile, updates the profiles and checks for anomalous behavior Periodic-activity-update rule: triggered by the end of an interval matching the period component of an activity profile, updates the profiles and checks for anomalous behavior Anomaly-record rule: triggered by the generation of an anomaly record, brings the anomaly to the immediate attention of the security officer Periodic-anomaly-analysis rule: triggered by the end of an interval, generates summary reports of the anomalies during the current period C. Callegari Anomaly Detection 40 / 169

References IDES D. Denning, An intrusion detection model, IEEE Transactions Software Engineering, vol. SE-13, no.2, 1987 C. Callegari Anomaly Detection 41 / 169

Outline Statistical Anomaly Detection 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis 9 Wavelet Analysis C. Callegari Anomaly Detection 42 / 169

Statistical Anomaly Detection Statistical Approach: Traffic Descriptors The goal is to identify some traffic parameters, which can be used to describe the network traffic and that vary significantly from the normal behavior to the anomalous one Some examples Packet length Inter-arrival time Flow size Number of packets per flow... and so on C. Callegari Anomaly Detection 43 / 169

Statistical Anomaly Detection Choice of the Traffic Descriptors For each parameter we can consider Mean Value Variance and higher order moments Distribution function Quantiles... and so on The number of potential traffic descriptors is huge (some papers identify up to 200 descriptors) GOAL To identify as few attack invariant descriptors as possible to classify traffic with an acceptable error rate C. Callegari Anomaly Detection 44 / 169

Outline Cluster 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering Clustering Outliers Detection 5 Markovian Models 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis 9 Wavelet Analysis C. Callegari Anomaly Detection 45 / 169

Clustering Cluster Cluster Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense Clustering is a method of unsupervised learning The clusters are computed on the basis of a distance measure, which will determine how the similarity of two elements is calculated Common distances are: Euclidean distance Manhattan distance Mahalanobis distance... C. Callegari Anomaly Detection 46 / 169

K-Means Algorithm Cluster Cluster The k-means algorithm assigns each point to the cluster whose center (also called centroid) is the nearest 1 Choose the number of clusters, k 2 Randomly generate k clusters and determine the cluster centers, or directly generate k random points as cluster centers 3 Assign each point to the nearest cluster center 4 Recompute the new cluster centers 5 Repeat the two previous steps until some convergence criterion is met (e.g., the assignment hasn t changed) C. Callegari Anomaly Detection 47 / 169

%&'($)*+(,-./0 Cluster ),/,(2&$)*34 Cluster K-Means Algorithm - An example Consider k = 2, choose 2 points (centroids), build 2 clusters C. Callegari Anomaly Detection 48 / 169

,/,(2&$)*34 Cluster Cluster K-Means Algorithm - An example Compute the new centroids C. Callegari Anomaly Detection 49 / 169

'($)*+(,-./0 Cluster /,(2&$)345 Cluster K-Means Algorithm - An example Build the new clusters C. Callegari Anomaly Detection 50 / 169

/,(2&$)345 Cluster Cluster K-Means Algorithm - An example Repeat last 2 steps, until a assignments don t change C. Callegari Anomaly Detection 51 / 169

Outliers Cluster Outliers Detection In statistics, an outlier is an observation that is numerically distant from the rest of the data Detection based on the full dimensional distances between the points as well as the densities of local neighborhoods There exist at least two approaches the anomaly detection model is trained using unlabeled data that consist of both normal as well as attack traffic the model is trained using only normal data and a profile of normal activity is created C. Callegari Anomaly Detection 52 / 169

Cluster Outliers Detection - Method 1 Outliers Detection The idea behind the first approach is that anomalous or attack data form a small percentage of the total data Anomalies and attacks can be detected based on cluster sizes large clusters correspond to normal data the rest of the data points, which are outliers, correspond to attacks C. Callegari Anomaly Detection 53 / 169

References Cluster Outliers Detection L. Portnoy, E. Eskin, S.J. Stolfo, Intrusion detection with unlabeled data using clustering, ACM Workshop on Data Mining Applied to Security, 2001 S. Ramaswamy, R. Rastogi, K. Shim, Efficient algorithms for mining outliers from large data sets, ACM SIGMOD International Conference on Management of Data, 2000 K. Sequeira, M. Zaki, ADMIT: Anomaly-based data mining for intrusions, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002 V. Barnett, T. Lewis, Outliers in Statistical Data, Wiley, 1994 C. Callegari Anomaly Detection 54 / 169

References Cluster Outliers Detection C.C. Aggarwal, P.S. Yu, Outlier detection for high dimensional data, ACM SIGMOD International Conference on Management of Data, 2001 M. Breunig, H.-P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, ACM SIGMOD International Conference on Management of Data, 2000 E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, International Conference on Very Large Data Bases, 2008 P.C. Mahalanobis, On tests and measures of groups divergence, Journal of the Asiatic Society of Bengal 26, 1930 C. Callegari Anomaly Detection 55 / 169

Outline Markovian Models 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models First Order Homogeneous Markov Chains First Order Non Homogeneous Markov Chains High Order Homogeneous Markov Chains 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis C. Callegari Anomaly Detection 56 / 169

Markovian Models State Transition Analysis The approach was first proposed by Denning and developed in the 1990s. Mainly used in two distinct environment HIDS: to model the sequence of system commands used by a user NIDS: to model the sequence of some specific fields of the packet (e.g. the sequence of the flags values in a TCP connection) The most classical approach: Markov chains C. Callegari Anomaly Detection 57 / 169

Markovian Models Markov Chains and TCP Idea: Model TCP connections by means of Markov chains The IP addresses and the TCP port numbers are used to identify a connection State space is defined by the possible values of the TCP flags The value of the flags is used to identify the chain transitions A value S p is associated to each packet according to the rule S p = syn + 2 ack + 4 psh + 8 rst + 16 urg + 32 fin C. Callegari Anomaly Detection 58 / 169

Markovian Models First Order Homogeneous Markov Chains Markov Chain and TCP - Training phase Calculate the transition probabilities a ij = P[q t+1 = j q t = i] = P[q t = i, q t+1 = j] P[q t = i] Server side 3-way handshake psh flag closing SSH Markov Chain C. Callegari Anomaly Detection 59 / 169

Markovian Models First Order Homogeneous Markov Chains Markov Chain and TCP - Training phase Calculate the transition probabilities a ij = P[q t+1 = j q t = i] = P[q t = i, q t+1 = j] P[q t = i] Client side 3-way handshake ack flag closing FTP Markov Chain C. Callegari Anomaly Detection 60 / 169

Markovian Models First Order Homogeneous Markov Chains Markov Chain and TCP - Training phase Calculate the transition probabilities a ij = P[q t+1 = j q t = i] = P[q t = i, q t+1 = j] P[q t = i] 3-Way Handshake Syn Flood Attack SSH Markov Chain C. Callegari Anomaly Detection 61 / 169

Markovian Models First Order Homogeneous Markov Chains Markov Chain and TCP - Detection phase Given the observation (S 1, S 2,, S T ) The system has to decide between two hypothesis H 0 : normal behaviour H 1 : anomaly (1) A possible statistic is given by the logarithm of the Likelihood Function LogLF(t) = T +R t=r+1 Log(a St S t+1 ) Or by its temporal derivative D w (t) = LogLF (t) 1 W W LogLF(t i) i=1 C. Callegari Anomaly Detection 62 / 169

Markovian Models First Order Homogeneous Markov Chains Markov Chain and TCP - Detection phase C. Callegari Anomaly Detection 63 / 169

Markovian Models Non Homogeneous Markov Chain First Order Non Homogeneous Markov Chains First order homogeneous Markov chain P(C t = s i0 C t 1 = s i1, C t 2 = s i2, C t 3 = s i3, ) = P(C t = s i0 C t 1 = s i1 ) = P(C 0 = s i0 C 1 = s i1 ) = P(s i0 s i1 ) First order non-homogeneous Markov chain P(C t = s i0 C t 1 = s i1, C t 2 = s i2, C t 3 = s i3, ) = P(C t = s i0 C t 1 = s i1 ) = P t(s i0 s i1 ) We build a distinct Markov Chain for each connection step (first 10 steps) The model should better characterizes the setup and the release phases C. Callegari Anomaly Detection 64 / 169

Markovian Models High order Markov Chain High Order Homogeneous Markov Chains First order homogeneous Markov chain P(C t = s i0 C t 1 = s i1, C t 2 = s i2, C t 3 = s i3, ) = P(C t = s i0 C t 1 = s i1 ) = P(C 0 = s i0 C 1 = s i1 ) = P(s i0 s i1 ) l th order homogeneous Markov chain P(C t = s i0 C t 1 = s i1, C t 2 = s i2, C t 3 = s i3, ) = P(C t = s i0 C t 1 = s i1, C t 2 = s i2,, C t l = s il ) = P(C 0 = s i0 C 1 = s i1, C 2 = s i2,, C l = s il ) = P(s i0 s i1, s i2,, s il ) Some connection phases have dependences, between packets, of order bigger than 1 C. Callegari Anomaly Detection 65 / 169

Markovian Models Mixture Transition Distribution High Order Homogeneous Markov Chains We have an explosion of the number of the chain parameters, which grows exponentially with the order (K l (K 1)) Parsimonious representation of the transition probabilities Mixture Transition Distribution (MTD) model (K (K 1) + l 1) lx P(C t = s i0 C t 1 = s i1, C t 2 = s i2,, C t l = s il ) = λ j r(s i0 s ij ) where the quantities R = {r(s i s j ); i, j = 1, 2,, K } and Λ = {λ j ; j = 1, 2,, l} satisfy the constraints KX r(s i s j ) 0; i, j = 1, 2,, K and r(s i s j ) = 1 j = 1, 2,, K s i =1 lx λ j 0; j = 1, 2,, l λ j = 1 j=1 j=1 C. Callegari Anomaly Detection 66 / 169

Markovian Models State Space Reduction High Order Homogeneous Markov Chains We only consider the states observed during the training phase We add a rare state to take into account all the other possible states We fix the following quantities: r(rare s i ) = ɛ i = 1, 2,, K with ɛ small (in our case ɛ = 10 6 ) r(s i rare) = (1 ɛ)/(k 1) i = 1, 2,, K 1 (2) C. Callegari Anomaly Detection 67 / 169

Markovian Models Parameters Estimation High Order Homogeneous Markov Chains We need to estimate the parameters of the Markov chain (Maximum Likelihood Estimation - MLE) According to the MTD model, the log-likelihood of a sequence (c 1, c 2,, c T ) of length T is: K K ( l ) LL(c 1, c 2,, c T ) = N(s i0, s i1,, s il ) log λ j r(s i0 s ij ) i 0 =1 i l =1 where N(s i0, s i1,, s il ) represents the number of times the transition s il s il 1 s i0 is observed We have to maximize the right hand side of the equation, with respect to R and Λ, taking into account the given constraints j=1 C. Callegari Anomaly Detection 68 / 169

Markovian Models Parameters Estimation High Order Homogeneous Markov Chains Estimation Steps We apply an alternate maximization with respect to R and to Λ In the first step (estimation of Λ) we use the sequential quadratic programming The second step (estimation of R) is a linear inverse problem with positivity constraints (LININPOS) that we solve applying the Expectation Maximization (EM) algorithm Global Maximum This process leads to a global maximum, since LL is concave in R and Λ. C. Callegari Anomaly Detection 69 / 169

Markovian Models Markov Chains - Detection Phase High Order Homogeneous Markov Chains Choose between a single hypothesis H 0 (estimated stochastic model), and the composite hypothesis H 1 (all the other possibilities) H 0 : {(c 1, c 2,, c T ) computed model MC 0 } H 1 : {anomaly} No optimal result is presented in the literature The best solution is represented by the use of the Generalized Likelihood Ratio (GLR) test: ( )1 Maxv u L(c 1, c 2,, c T Λ v, R v ) T H 0 X = ξ L(c 1, c 2,, c T Λ u, R u ) H 1 C. Callegari Anomaly Detection 70 / 169

Markovian Models Markov Chains - Detection Phase High Order Homogeneous Markov Chains Equivalent to decide on the basis of the Kullback-Leibler divergence between the model associated to H 0 (MC 0 ) and the one computed for the observed sequence (MC s ) The Kullback-Leibler divergence, for first order Markov chains, is defined as: KL (MC 0, MC s ) = π 0 (s i )P 0 (s j s i ) log P 0(s j s i ) P s (s j s i ) i j where π 0 (s i ) is the stationary distribution of MC 0 and P k (s j s i ) is the (single step) transition probability from state C t 1 = s i to state C t = s j Extension to Markovian models of order l The state of the chain C t has to be considered as a point in a finite l-dimensional lattice: C t = (C t, C t 1,..., C t l+1 ) C. Callegari Anomaly Detection 71 / 169

Markovian Models Non-Homogeneous Markov Chain High Order Homogeneous Markov Chains 1 0.9 0.8 0.7 Detection Rate 0.6 0.5 0.4 0.3 0.2 0.1 Non Stationary ECDF Non Homogeneous MC Homogeneous MC 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False Alarm Rate C. Callegari Anomaly Detection 72 / 169

Markovian Models High Order Markov Chain High Order Homogeneous Markov Chains C. Callegari Anomaly Detection 73 / 169

References Markovian Models High Order Homogeneous Markov Chains N. Ye, Y.Z.C.M Borror, Robustness of the Markov-chain model for cyber-attack detection, IEEE Transactions on Reliability 53, 2004 D.-Y. Yeung, Y. Ding, Host-based intrusion detection using dynamic and static behavioral models, Pattern Recognition 36, 2003 W.-H. Ju and Y. Vardi, A hybrid high-order Markov chain model for computer intrusion detection, Tech. Rep. 92, NISS, 1999 M. Schonlau, W. DuMouchel, W.-H. Ju, A. Karr, M. Theus, and Y. Vardi, Computer intrusion: Detecting masquerades, Tech. Rep. 95, NISS, 1999 N. Ye, T. Ehiabor, and Y. Zhanget, First-order versus high-order stochastic models for computer intrusion detection, Quality and Reliability Engineering International, vol. 18, 2002 C. Callegari Anomaly Detection 74 / 169

References Markovian Models High Order Homogeneous Markov Chains A. Raftery, A model for high-order markov chains, Journal of the Royal Statistical Society, series B, vol. 47, 1985 A. Raftery and S. Tavare, Estimation and modelling repeated patterns in high-order markov chains with the mixture transition distribution (MTD) model, Journal of the Royal Statistical Society, series C - Applied Statistics, vol. 43, 1994 Y. Vardi and D. Lee, From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problem, Journal of the Royal Statistical Society, series B, vol. 55, 1993 C. Callegari, S. Vaton, and M. Pagano, A new statistical approach to network anomaly detection, Performance Evaluation of Computer and Telecommunication Systems (SPECTS), 2008 C. Callegari Anomaly Detection 75 / 169

Outline Entropy 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods Entropy Compression Algorithms 7 Sketch 8 Principal Component Analysis 9 Wavelet Analysis C. Callegari Anomaly Detection 76 / 169

Theoretical Background Entropy Entropy Entropy The entropy H of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X Referring to an alphabet composed of n distinct symbols, respectively associated to a probability p i, then The starting point H = n p i log 2 p i bit/symbol i=1 The entropy represents a lower bound to the compression rate that we can obtain: the more redundant the data are and the better we can compress them. C. Callegari Anomaly Detection 77 / 169

Entropy Compression Algorithms Compression Algorithms Dictionary based algorithms: based on the use of a dictionary, which can be static or dynamic, and they code each symbol or group of symbols with an element of the dictionary Lempel-Ziv-Welch (LZW) Model based algorithms: each symbol or group of symbols is encoded with a variable length code, according to some probability distribution. Huffman Coding (HC) Dynamic Markov Compression (DMC) C. Callegari Anomaly Detection 78 / 169

Lempel-Ziv-Welch Entropy Compression Algorithms Created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm, published by Lempel and Ziv in 1978 Universal adaptative 1 lossless data compression algorithm Builds a translation table (also called dictionary) from the text being compressed The string translation table maps the message strings to fixed-length codes 1 The coding scheme used for the k th character of a message is based on the characteristics of the preceding k 1 characters in the message C. Callegari Anomaly Detection 79 / 169

Huffman Coding Entropy Compression Algorithms Developed by Huffman (1952) Based on the use of a variable-length code table for encoding each source symbol The variable-length code table is derived from a binary tree built from the estimated probability of occurrence for each possible value of the source symbols Prefix-free code 2 that expresses the most common characters using shorter strings of bits than are used for less common source symbols 2 The bit string representing some particular symbol is never a prefix of the bit string representing any other symbol C. Callegari Anomaly Detection 80 / 169

Entropy Dynamic Markov Compression Compression Algorithms Developed by Gordon Cormack and Nigel Horspool (1987) Adaptative lossless data compression algorithm Based on the modelization of the binary source to be encoded by means of a Markov chain, which describes the transition probabilities between the symbol 0 and the symbol 1 The built model is used to predict the future bit of a message. The predicted bit is then coded using arithmetic coding C. Callegari Anomaly Detection 81 / 169

System Design Entropy Compression Algorithms Input The system input is given by raw traffic traces in libpcap format The 5-tuple is used to identify a connection, while the value of the TCP flags is used to build the profile A value s i is associated to each packet: s i = SYN +2 ACK +4 PSH +8 RST +16 URG +32 FIN thus each mono-directional connection is represented by a sequence of symbols s i, which are integers in {0, 1,, 63} C. Callegari Anomaly Detection 82 / 169

System Design Entropy Compression Algorithms Training Phase Choose one of the three previously described algorithms (Huffman, DMC, or LZW) The compression algorithms have been modified so as that the learning phase is stopped after the training phase: Huffman case: the occurency frequency of each symbol is estimated only on the training dataset DMC case: the estimation of the Markov chain is only updated during the training phase LZW case: the construction of the dictionary is stopped after the training phase Detection performed with a compression scheme that is optimal for the normal traffic used for building the considered profile and suboptimal for anomalous traffic C. Callegari Anomaly Detection 83 / 169

System Design Entropy Compression Algorithms Detection Phase Append each distinct observed connection b, to the training sequence A Compute the compression rate per symbol : X = dim([a b] ) dim([a] ) Length(b) where [X] represents the compressed version of X Choose between a single hypothesis H 0 (normal traffic), and the composite hypothesis H 1 (anomaly) X H 0 H 1 ξ C. Callegari Anomaly Detection 84 / 169

Entropy Results - System Comparison Compression Algorithms 1 0.8 Detection Rate 0.6 0.4 0.2 Huffman DMC LZW 0 0 0.2 0.4 0.6 0.8 1 False Alarm Rate C. Callegari Anomaly Detection 85 / 169

Entropy Results - On-line System Compression Algorithms 1 0.8 Detection Rate 0.6 0.4 0.2 Huffman DMC LZW 0 0 0.2 0.4 0.6 0.8 1 False Alarm Rate C. Callegari Anomaly Detection 86 / 169

References Entropy Compression Algorithms T. Cover and J. Thomas, Elements of information theory, Wiley-Interscience, 2nd ed., 2006 C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal, vol. 27 1948 D. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the Institute of Radio Engineers, vol. 40, 1952 G. Cormack and N. Horspool, Data compression using dynamic Markov modelling, vol. 30, 1987 C. Callegari Anomaly Detection 87 / 169

References Entropy Compression Algorithms J. Ziv and A. Lempel, Compression of individual sequences via variable-rate coding, IEEE Transactions on Information Theory, vol. 24, 1978 T. Welch, A technique for high-performance data compression, IEEE Computer Magazine, vol. 17, no. 6, 1984 Christian Callegari, Stefano Giordano, Michele Pagano, On the Use of Compression Algorithms for Network Anomaly Detection, IEEE International Conference on Communications (ICC 2009) C. Callegari Anomaly Detection 88 / 169

Outline Sketch 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods 7 Sketch Count-Min sketch Heavy Hitters Detection 8 Principal Component Analysis 9 Wavelet Analysis C. Callegari Anomaly Detection 89 / 169

Data Stream Mining Sketch Count-Min sketch Data Stream mining The data stream mining is a set of techniques which permits to analyze a data flow, almost in real-time, without storing all the data Can be used to analyze, for example, the network traffic The aim can be the calculation of histograms, the individuation of the most common features, etc. Two constraints 1 Almost real-time 2 Without storing all the data C. Callegari Anomaly Detection 90 / 169

Sketch Count-Min sketch The Count-Min Sketch Algorithm Proposed by Cormode and Muthukrishran, 2004 Each element of the data flow is identified by id i t {1, 2,..., N}, with N big label c t R The data flow is the sequence (i t, c t ) t N An example The data flow is the network traffic i t is the source IP address of packet t c t is the length of packet t C. Callegari Anomaly Detection 91 / 169

Sketch Count-Min sketch The Count-Min Sketch Algorithm At each instant τ N and for each id i, we define a count a i (τ) a i (τ) def = τ c t δ i,it t=0 with δ i,it = 1 if i = i t and δ i,it = 0 otherwise In our example a i (τ) is the total number of bytes sent from i up to the instant τ First step of the algorithm: estimate â i (τ) C. Callegari Anomaly Detection 92 / 169

Sketch Count-Min sketch The Count-Min Sketch Algorithm Fix the precision ɛ and the failure probability δ Let s w = e ɛ and d = ln 1 δ Let s take d independent random hash functions Let s allocate a table Count dxw initialized to zero For each instant t, let s update the table according to: for each j {1, 2,..., d} Count[j, h j (i t )] = Count[j, h j (i t )] + c t C. Callegari Anomaly Detection 93 / 169

Sketch Count-Min sketch The Count-Min Sketch Algorithm At the instant τ and for each id i {1, 2,..., N} we have an estimation â i (τ) of the count a i (τ) â i (τ) = min Count[j, h j (i t )](τ) j C. Callegari Anomaly Detection 94 / 169

Sketch Count-Min sketch The Count-Min Sketch Algorithm Main properties The estimator â i (τ) a i (τ) P{â i (τ) a i (τ) + ɛ a (τ) 1 } 1 δ a (τ) def = (a 1 (τ), a 2 (τ),..., a N (τ)) a (τ) 1 def = a 1 (τ) + a 2 (τ) +... + a N (τ) C. Callegari Anomaly Detection 95 / 169

Sketch Count-Min sketch The Count-Min Sketch Algorithm Complexity in time Number of operations for updating the Count table is O(ln( 1 δ )) Number of operations for calculating â i is O(ln( 1 δ )) Complexity in space Number of words for storing the hash functions and the Count table is O( 1 ɛ ln( 1 δ )) C. Callegari Anomaly Detection 96 / 169

Finding Heavy Hitters Sketch Heavy Hitters Detection Aim: find those items whose frequencies exceed a threshold φ during the observation window These items are called Heavy Hitters Possible application: finding the IP addresses, whose contribution to the network traffic exceeds the threshold An anomaly can be detected as a variation of the heavy hitters distribution At a given instant τ and for a given threshold φ we define heavy hitters all the i, such that a i (τ) φ a(τ) 1 C. Callegari Anomaly Detection 97 / 169

Sketch Heavy Hitters Detection Application of the Count-Min sketch algorithm Initialization, at the instant t = 0 Calculate a(0) 1 = c 0 Update Count Add i 0 and his estimated count â i0 (0) to the list L of the potential heavy hitters C. Callegari Anomaly Detection 98 / 169

Sketch Heavy Hitters Detection Application of the Count-Min sketch algorithm Iteration, at the instant t = t Calculate a(t) 1 = a(t 1) 1 +c t Update Count Calculate the estimated count â it (t) If â it (t) φ a(t) 1 then If i t does not belongs to L: add i t and his estimated count â it (t) to L Else replace the count corresponding to i t Eliminate from the list every i, whose count is less than φ a(t) 1 After all the iterations, L contains all the heavy hitters The real count of all the elements of L is greater than (φ ɛ) a(t) 1, with probability at least 1 δ C. Callegari Anomaly Detection 99 / 169

Sketch Heavy Hitters Detection Finding Hierarchical Heavy Hitters Extension of the method, which takes into account the hierarchical structure of the IP addresses The Hierarchical Heavy Hitters (HHH) are defined recursively from the bottom to the top of the hierarchy At the lowest level (level 0), the HHH are the Heavy Hitters (i.e all those source addresses whose counts exceed the threshold) At level l > 0 an IP prefix is a HHH if its count minus the count of its descendant HHHs is greater that or equal to the threshold C. Callegari Anomaly Detection 100 / 169

Sketch Heavy Hitters Detection Finding Hierarchical Heavy Hitters C. Callegari Anomaly Detection 101 / 169

References Sketch Heavy Hitters Detection Graham Cormode, S. Muthukrishnan, An Improved Data Stream Summary: The Count-Min Sketch and its Applications, Theoretical Informatics, 2004 Pascal Cheung-Mon-Chan et al, Finding Hierarchical Heavy Hitters with the Count Min Sketch, IPS-MOME 2006 C. Callegari Anomaly Detection 102 / 169

Outline PCA 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis PCA Detection and Identification 9 Wavelet Analysis C. Callegari Anomaly Detection 103 / 169

PCA PCA Principal Component Analysis PCA is a coordinate transformation method that maps the measured data onto a new set of axes These axes are called Principal Components Each principal component points in the direction of maximum variation or energy remaining in the data The principal axes are ordered by the amount of energy in the data they capture C. Callegari Anomaly Detection 104 / 169

Geometric illustration PCA PCA C. Callegari Anomaly Detection 105 / 169

Data PCA PCA A week of network-wide traffic measurements from Internet2: Internet2 samples 1 out of every 100 packets for inclusion in the flow statistics In Internet2 packets are aggregated into five-minute time-bins Abilene anonymizes the last eleven bits of the IP address stored in the flow records Routing Info. In order to aggregate the collected IP flows into OD flows, we also need to parse the routing data. Internet2 deploys Zebra BGP monitors that record all BGP messages they receive. C. Callegari Anomaly Detection 106 / 169

Data PCA PCA Measurements Matrix,X t p : Let p denote the number of traffic aggregate Let t denote the number of time-bin Column i denotes the time-series of the i-th traffic aggregate, with zero mean Row j represents an instance of all the traffic aggregate at j-th time-bin x, transposed row of X X = x 11 x 12 x 1p x 21 x 22 x 2p...... x t1 x t2 x tp C. Callegari Anomaly Detection 107 / 169

PCA PCA Principal Component Analysis Using PCA, we find that the set of OD flows has small intrinsic dimension (10 or less) Stability over time of this kind of representation (from week to week) Origin Destination Flows An OD flow consists of all traffic entering the network at a given point, and exiting the network at some other point, this is one out of the three traffic aggregations analyzed. High dimensional multivariate structure PCA: lower dimensional approximation C. Callegari Anomaly Detection 108 / 169

PCA PCA Principal Component Analysis Generalization of PCA to higher dimensions, as in the case of X, take the rows of X as points in Euclidean space, so that we have a dataset of t points in R p Mapping the data onto the first r principal axes places the data into an r-dimensional hyperplane. C. Callegari Anomaly Detection 109 / 169

PCA Linear algebraic formulation PCA Calculating the principal components is equivalent to solving the symmetric eigenvalue problem for the matrix X T X Each principal component v i is the i-th eigenvector computed from the spectral decomposition of X T X: X T Xv i = λ i v i i = 1,... p (3) k-th principal component: Where λ i is the eigenvalue corrisponding to v i v k = arg max v =1 k 1 (X i=1 Xv i v T i )v C. Callegari Anomaly Detection 110 / 169

PCA PCA Principal Component Analysis Once the data have been mapped into principal component space, it can be useful to examine the transformed data one dimension at a time: The contribution of principal axis i as a function of time is given by Xv i This vector can be normalized to unit length by dividing by σ i = λ i Thus, we have for each principal axis i: u i = Xv i σ i i = 1,..., p (4) The u i (eigenflows) are vectors of size t and orthogonal by construction C. Callegari Anomaly Detection 111 / 169

PCA PCA Principal Component Analysis Thus vector u i captures the temporal variation common to all flows along principal axis i The set of principal components {v i } p i=1 can be arranged in order as columns of a principal matrix V p p Likewise we can form the matrix U t p in which column i is u i Then taken together, V, U, and σ i can be arranged to write each traffic aggregate X i as: X i σ i = U(V T ) i i = 1,..., p (5) X i is the time-series of the i-th traffic aggregate and (V T ) i is the i-th row of V C. Callegari Anomaly Detection 112 / 169

PCA PCA Principal Component Analysis Equation 5 makes clear that each traffic aggregate X i is in turn a linear combination of the u i, with associated weights (V T ) i The elements of {σ i } k i=1 are the singular values Xv i = v T i X T Xv i = λ i v T i v i = λ i (6) Thus, the singular values are useful for gauging the potential for reduced dimensionality in the data, often simply through their visual examination in a scree plot C. Callegari Anomaly Detection 113 / 169

Scree Plot PCA PCA Energy Percentages captured by first PCs C. Callegari Anomaly Detection 114 / 169

PCA PCA Lower Dimensional Approximation Finding that only r singular values are non-negligible, implies that X effectively resides on an r-dimensional subspace of R p : X r σ i u i v i (7) i=1 where r < p is the effective intrinsic dimension of X C. Callegari Anomaly Detection 115 / 169

Subspace Method PCA Detection and Identification Subspace Method This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. This separation can be performed effectively by Principal Component Analysis Once the principal axes have been determined, the data-set can be mapped onto the new axes 1 Normal subspace: S 2 Anomalous Subspace: S C. Callegari Anomaly Detection 116 / 169

PCA Modeled and Residual Part of x Detection and Identification Detecting volume anomalies in link traffic relies on the separation of link traffic x at any time-step into normal and anomalous components: 1 modeled part of x 2 residual part of x We seek to decompose the set of link measurements at a given point in time x: x = ˆx + x We form ˆx by projecting x onto S, and we form x by projecting x onto S C. Callegari Anomaly Detection 117 / 169

Anomalous Subspace PCA Detection and Identification Anomalous Subspace, x To accomplish this, we arrange the set of principal components corresponding to the normal subspace (v 1, v 2,..., v r ) as columns of a matrix P of size m r where r denotes the number of normal axes. ˆx = PP T x = Cx and x = (I PP T )x = Cx C. Callegari Anomaly Detection 118 / 169

PCA Squared Prediction Error Detection and Identification A useful statistic for detecting abnormal changes in x is the Squared Prediction Error (SPE): SPE = x 2 = Cx 2 and we may consider network traffic to be normal if: SPE threshold C. Callegari Anomaly Detection 119 / 169

Identification PCA Detection and Identification Now we know the anomalous time bin We don t know which is the anomalous traffic aggregate responsible for this anomaly Identification, for every x 2 over the threshold: We determine the smallest set of OD flows, which if removed from the corresponding statistic, would bring it under threshold. C. Callegari Anomaly Detection 120 / 169

PCA Multiway Subspace Method Detection and Identification The distributions of packet features (IP addresses and ports) observed in flow traces reveal both the presence and the structure of a wide range of anomalies. They enable highly sensitive detection of a wide range of anomalies Traffic Features: 1 Src IP 2 Dest IP 3 Src Port 4 Dest Port C. Callegari Anomaly Detection 121 / 169

Features Distributions PCA Detection and Identification Many important kinds of traffic anomalies cause changes in the distribution of addresses or ports observed in traffic How feature distributions change as the result of a traffic anomaly (e.g., port scan) 30 30 x 10 6 # of Packets 25 20 15 10 5 # of Packets 25 20 15 10 5 # Bytes 2 1.5 1 0.5 2000 1500 1000 500 # Packets2500 0 5 10 15 20 25 30 35 40 Destination Port Rank 50 100 150 200 250 300 350 400 450 500 Destination Port Rank 2.5 x 10 3 # of Packets 30 25 20 15 10 # of Packets 500 450 400 350 300 250 200 150 H(Dst IP) H(Dst Port) 2 1.5 1 0.5 x 10 3 3 2 1 5 100 50 12/19 0 5 10 15 20 25 30 35 40 45 50 Destination IP Rank (a) Normal 5 10 15 20 25 30 35 40 45 50 Destination IP Rank (b) During Anomaly Figure 2: Port scan anoma and in terms of entropy. C. Callegari Anomaly Detection 122 / 169 Figure 1: Distribution changes induced by a port scan anomaly.

Entropy PCA Detection and Identification The distribution of traffic features is a high-dimensional object and so can be difficult to work with directly. 1 Analyze the degree of dispersal or concentration of the distribution 2 A metric that captures the degree of dispersal or concentration of a distribution is sample entropy 3 Empirical histogram Y = {n i, i = 1,..., N} 4 Sample entropy: H(Y ) = N i=1 n i S log n i 2 S where S = P N i=1 n i is the total number of observations in the histogram C. Callegari Anomaly Detection 123 / 169

C. Callegari Figure 3: Multivariate, multi-way Anomaly Detection data to analyze. 124 / 169 Multiway Anomalies PCA Detection and Identification Anomalies typically induce changes in multiple traffic features.

PCA Multiway Subspace Method Detection and Identification Anomalies Spanning multiple traffic features Unfold the multiway matrix in Figure into a single, large matrix With this technique subspace method can detect anomalies spanning multiple traffic features C. Callegari Anomaly Detection 125 / 169

The Architecture PCA Detection and Identification &'()*)+')*%,#-*.+/0.'#-!"#$%&'()*)+')* B2+*) J).!"#$ 4556788749%8:;:6 4556788749%86;55 4556788749%86;86 4556788749%86;95!"#$%&&&&&&'!($%&&&&&&&%)#*+(! 84<=4:4%%844=455%%%%%%%%%855 844=455%%85>=455%%%%%%%%%%%%%4 85>=85>%%%494=:99%%%%%%%%%%6: CHH+)H2.)% '-.#%'-H+)**% +#/.)+* CHH+)H2.'#-I CHH+)H2.)% '-.#%L1%M"#$* CHH+)H2.)% '-.#%'-@/.% "'-K*.N+)*N#"F.#@K C-#(2"A G'*. 12.2%3#/+0)*,2"0/"2.)%?-.+#@A?-.+#@A &'()*)+')* B,C%C-#(2"A% 1).)0.#+ D2-/2"%E2"'F2.'#- data type used for the traffic matrixwasbyte-countsinstead of entropy values, then a cell item vi,j would uniquely map to the number of packets carried by router j, forexample, at time i. There are many ways to perform structural traffic aggregation, each with different statistical properties, e.g. different number of constituent flows, distribution in flow size, etc. Our research demonstrates that the choice of traffic aggregation can significantly impact the effectiveness of PCA as a traffic anomalydetector,andhenceitisimportantto Figure 1: The Architecture This method has been combined with Sketch tools such as PCA analyze timeseries, they classify individual time bins as anomalous, which are different from the underlying network events that may have caused the detection. In fact, a given anomalous time bin may contain multiple anomalous events of interest to a network operator and, vice versa, one anomalous event may span multiple time-bins. For simplicity, we use the term anomaly as shorthand for anomalous time bin in the remainder of this paper, consistent with previous work. Second, PCA requires the length of the time series (i.e., n) C. Callegari Anomaly Detection 126 / 169

Experimental Results PCA Detection and Identification # Bytes 2 1.5 1 0.5 x 10 6 2000 1500 1000 500 # Packets2500 500 H(Dst IP) 2.5 x 10 3 2 1.5 1 0.5 x 10 3 3 H(Dst Port) 2 1 12/19 12/20 50 C. Callegari Anomaly Detection 127 / 169

tion perfomance. We now consider PCA Detection theandeffect Identification of varying these parameters. Experimental Results Pr of missed detections 1 0.8 0.6 0.4 0.2 Abilene, s=121, 1! m! 11 Src/21+Dst/21 Src/21 Dst/21 0 10 20 30 40 50 60 70 80 90 # of PCs in normal space C. Callegari Anomaly Detection 128 / 169

C. Callegari Figure 4: The probability of Anomaly missed Detection detections as a129 / 169 Experimental Results PCA Detection and Identification Pr of missed detections 1 0.8 0.6 0.4 0.2 Dante, s=484, 1! m! 11 Src/21+Dst/21 Src/21 Dst/21 0 10 20 30 40 50 60 70 80 90 # of PCs in normal space

Experimental Results PCA Detection and Identification Some remarks on the method: The false-positive rate is very sensitive to the dimensionality of the normal subspace The effectiveness of PCA is sensitive to the way the traffic measurements are aggregated Large anomalies can contaminate the normal sub-space Pinpointing the anomalous flows is inherently difficult C. Callegari Anomaly Detection 130 / 169

Experimental Results PCA Detection and Identification abilene ingress routers geant ingress routers to at ec- % variance 60 40 20 0 2 4 6 8 10 # pc abilene OD flows % variance 60 40 20 0 2 4 6 8 10 # pc or- nt ly in n- ew rilso ht fyal % variance % variance 60 40 20 0 60 40 20 0 2 4 6 8 10 # pc abilene input links 2 4 6 8 10 # pc % variance 60 40 20 0 (a) scree plots geant input links 2 4 6 8 10 # pc C. Callegari Anomaly Detection 131 / 169

lso ht fyal al es he im s, p k for at be for a m- ints alnt % 20 0 Experimental Results # pc cumulative percent variance captured 100 90 80 70 60 PCA 2 4 6 8 10 % (a) scree plots 20 Detection and Identification 0 2 4 6 8 10 # pc 50 abilene input links abilene OD flows 40 abilene ingress routers geant input links geant ingress routers 30 10 0 10 1 10 2 10 3 # principal components (b) CDF of scree plots C. Callegari Anomaly Detection 132 / 169

Experimental Results PCA Detection and Identification 35 30 geant (threshold = 90%) ingress routers input links percent false positives 25 20 15 10 5 0 2 3 4 5 6 7 8 9 10 # principal components C. Callegari Anomaly Detection 133 / 169

# principal PCA components Detection and Identification Experimental (a) Results Geant false-positive rate 180 160 geant (threshold = 90%) ingress routers input links total detections 140 120 100 80 60 2 3 4 5 6 7 8 9 10 # principal components C. Callegari Anomaly Detection 134 / 169

# principal components PCA Detection and Identification Experimental(b) Results Geant total detections 70 60 abilene (threshold = 90%) ingress routers OD flows input links percent false positives 50 40 30 20 10 2 3 4 5 6 7 8 9 10 # principal components C. Callegari Anomaly Detection 135 / 169

# principal components PCA Detection and Identification Experimental (c) Results Abilene false-positive rate 2200 2000 1800 1600 abilene (threshold = 90%) ingress routers OD flows input links total detections 1400 1200 1000 800 600 400 200 2 3 4 5 6 7 8 9 10 # principal components C. Callegari Anomaly Detection 136 / 169

References PCA Detection and Identification Papagiannaki K. Crovella M. Diot C. Kolaczyk E. D. Lakhina, A. and N. Taft, Structural analysis of network traffic flows, Tech rep 2004 Crovella M. Lakhina, A. and C. Diot, Characterization of network-wide anomalies in traffic flows, ACM SIGCOMM conference on Internet measurement, 2004 Crovella M. Lakhina, A. and C. Diot, Diagnosing network-wide traffic anomalies, Conference on Applications, technologies, architectures, and protocols for computer communications, 2004 Crovella M. Lakhina, A. and C. Diot, Mining anomalies using traffic feature distributions, ACM SIGCOMM Computer Communication Review, 2005 Jennifer Rexford Christophe Diot Haakon Ringberg, Augustin Soule, Sensitivity of PCA for Traffic Anomaly Detection, ACM SIGMETRICS, 2007 C. Callegari Anomaly Detection 137 / 169

Outline Wavelet 1 Introduction 2 Intrusion Detection Expert System 3 Statistical Anomaly Detection 4 Clustering 5 Markovian Models 6 Entropy-based Methods 7 Sketch 8 Principal Component Analysis 9 Wavelet Analysis Case Study 1 C. Callegari Anomaly Detection 138 / 169

Wavlet Analysis Wavelet The wavelets are scaled and translated copies (known as daughter wavelets ) of a finite-length or fast-decaying oscillating waveform (known as the mother wavelet ) Wavelet transforms have advantages over traditional Fourier transforms for representing functions that have discontinuities and sharp peaks The main difference, with respect to the Fourier transform, is that wavelets are localized in both time and frequency whereas the standard Fourier transform is only localized in frequency C. Callegari Anomaly Detection 139 / 169

Wavelet Wavelet Decomposition Mother wavelet ψ(t), satisfying the admissibility condition 2 Ψ(ω) dω < ω Wavelet basis {ψ m,n (t)} m,n Z = { a m/2 0 ψ ( a m 0 t nb 0 ) } m,n Z Representation of any finite energy signal x(t) L 2 (R) by means of its inner products {x m,n } m,n Z with the wavelets {ψ m,n (t)} m,n Z : x m,n = x(t) ψ m,n (t)dt = x(t) a m/2 0 ψ ( a m 0 t nb 0 ) dt (8) Orthonormal dyadic wavelet basis a 0 = 2 and b 0 = 1 Stringent constraints on the mother wavelet C. Callegari Anomaly Detection 140 / 169

Mother Wavelet Wavelet Morlet C. Callegari Anomaly Detection 141 / 169

Mother Wavelet Wavelet Meyer C. Callegari Anomaly Detection 142 / 169

Mother Wavelet Wavelet Mexican Hat C. Callegari Anomaly Detection 143 / 169

Wavelet Filter bank implementation of the Wavelet Transform C. Callegari Anomaly Detection 144 / 169 Level 1 Level 2 Level 3 Two scale difference equation ψ(t) = 2 n g n φ(2t n) h n φ(2t n) φ(t) = 2 n where g n = ( 1) n 1 h n 1 Let x = (x 1, x 2,...) denote the approximation of a finite energy signal x(t) h 2 h 2 h 2 x g 2 g 2 g 2

Wavelet Wavelet and Edge Detection An edge in an image is a contour across which the brightness of the image changes abruptly In image processing,an edge is often interpreted as one class of singularities In a function, singularities can be characterized easily as discontinuities where the gradient approaches infinity However, image data is discrete, so edges in an image often are defined as the local maxima of the gradient Wavelet transform has been found to be a remarkable tool to analyze the singularities including the edges and to detect them effectively C. Callegari Anomaly Detection 145 / 169

Wavelet Wavelet and Edge Detection C. Callegari Anomaly Detection 146 / 169

Wavelet Wavelet and Anomaly Detection The concept of edge can be easily extended to that of anomaly in network traffic Classical approaches look at the time series of specific kinds of packets inside aggregate traffic They detect irregular traffic patterns in traffic trace Wavelet analysis is applied to evaluate the traffic signal filtered only at certain scales, and a thresholding technique is used to detect changes C. Callegari Anomaly Detection 147 / 169

A case study Wavelet Case Study 1 A Signal Analysis of Network Traffic Anomalies Paul Barford, Jeffery Kline, David Plonka and Amos Ron ACM Internet Measurement Workshop 2002 The Measurement Data: SNMP and IP flow data collected at the border router (Juniper M10) of the University of Wisconsin-Madison campus network the campus network consists primarily of four IPv4 class B networks or roughly 256,000 IP addresses of which fewer than half are utilized IP connectivity to the commodity Internet and to research networks via about 15 discrete wide-area transit and peering links all of which terminate into the aforementioned router C. Callegari Anomaly Detection 148 / 169

SNMP Data Wavelet Case Study 1 The SNMP data was gathered by MRTG at a five minute sampling interval which is commonly used by network operator The SNMP data consists of the High Capacity interface statistics, defined by RFC2863, which were polled using SNMP version 2c Byte and packet counters for each direction of each wide-area link, specifically these 64-bit counters: ifhcinoctets, ifhcoutoctets, ifhcinucastpkts, and ifhcoutucastpkt C. Callegari Anomaly Detection 149 / 169

IP Flow Data Wavelet Case Study 1 The flow data was gathered using flow-tools and was post-processed using FlowScan The Juniper M10 router was running JUNOS 5.0R1.4, and later JUNOS 5.2R1.4, and was configured to perform cflowd flow export with a packet sampling rate of 96 This caused 1 of 96 forwarded packets to be sampled, and subsequently assembled into flow records similar to those defined by Cisco s NetFlow version 5 with similar packet-sampling-interval and 1 minute flow active-timeout Data were post-processed, so as to store mean value (over 5 minutes time-bins) of rate and packet dimension C. Callegari Anomaly Detection 150 / 169

Anomalies Wavelet Case Study 1 By manual inspecting the data, 109 anomalies were identified: 41 Network Events 46 Attacks 4 Flash Crowds 18 Measurement Events Necessity of filtering out the daily and weekly variations C. Callegari Anomaly Detection 151 / 169

The Method Wavelet Case Study 1 Framelet system, i.e. a redundant wavelet system (which essentially means that r, the number of high-pass filters, is larger than 1; a simple count shows that, if r > 1, the total number of wavelet coefficients exceeds the length of the original signal) In chosen system, there is one low-pass filter L and three high-pass filters H 1, H 2, H 3 C. Callegari Anomaly Detection 152 / 169

Wavelet Case Study 1 The analysis platform Derive from a given signal x (that represents five-minute average measurements over a 2 months period) three output signals, as follows The L(ow frequency)-part of the signal: all the low-frequency wavelet coefficients from levels 9 and up should capture patterns and anomalies of very long duration: several days and up signal here is very sparse (its number of data elements is approximately 0.4% of those in the original signal), and captures weekly patterns in the data quite well for many different types of Internet data, the L-part of the signal reveals a very high degree of regularity and consistency in the traffic, hence can reliably capture anomalies of long duration C. Callegari Anomaly Detection 153 / 169

Wavelet Case Study 1 The analysis platform The M(id frequency)-part of the signal: the wavelets coefficients from frequency levels 6, 7, 8 has zero-mean is supposed to capture mainly the daily variations in the data data elements number about 3% of those in the original signal The H(igh frequency)-part of the signal: obtained by thresholding the wavelet coefficients in the first 5 frequency levels need for thresholding stems from the fact that most of the data in the H-part consists of small short-term variations, variations that we think of as noise C. Callegari Anomaly Detection 154 / 169

Wavelet Case Study 1 Detection of anomalies Normalize the H- and M-parts to have variance one Compute the local variability of the (normalized) H- and M-parts by computing the variance of the data falling within a moving window of specified size The length of this moving window should depend on the duration of the anomalies that we wish to captured If we denote the duration of the anomaly by t 0 and the time length of the window for the local deviation by t 1, we need, in the ideal situation, to have q = t 0 /t 1 = 1 If the quotient q is too small, the anomaly may be blurred and lost If the quotient is too large, we may be overwhelmed by anomalies that are of very little interest C. Callegari Anomaly Detection 155 / 169

Wavelet Case Study 1 Detection of anomalies Combine the local variability of the H- part and M- part of the signal using a weighted sum. The result is the V(ariable)-part of the signal Apply thresholding to the V-signal. By measuring the peak height and peak width of the V-signal, one is able to begin to identify anomalies, their duration, and their relative intensity Needed Parameters: M- window size H- window size weights assigned to the M- and H-parts threshold C. Callegari Anomaly Detection 156 / 169

s technique in Experimental is the remainder. Results hat occur over y the first step on the use of ependent winby simplicity. ignificance of cal variability. mbinations of components) detection. To hich compoploy machine ust automated us to evaluate ther) features this study has ignal analysis tioned in Secnt-end for our signal is 8 weeks long (the period used to evaluate long-lived Wavelet Case Study 1 anomalies), the M-part is frequency levels 6,7,8; and the L-part Bytes/sec One Autonomous System to Campus, Inbound, 2001-DEC-16 through 2001-DEC-23 20 M 15 M bytes, original signal 10 M 5 M 0 6 M 4 M 2 M -2 M 0-4 M bytes, high-band -6 M 4 M 2 M 0-2 M bytes, mid-band -4 M 20 M 15 M bytes, low-band 10 M 5 M 0 Sat Sun Mon Tue Wed Thu Fri Sat Sun Fig. 1. Aggregate byte traffic from IP flow data for a typical week plus high/mid/low decomposition. C. Callegari Anomaly Detection 157 / 169

this study has signal analysis tioned in Secnt-end for our a framelet sigs a wide range Fig. 1. Aggregate byte traffic Wavelet from IPCase flow Study data for 1 a typical week plus high/mid/low decomposition. Experimental Results APIT analysis reely-available avelet decomhod for exposseries data can f a signal. eters: an M- assigned to the t of parameter ever, one can by modifying ocal deviation; sed on the M- ger or shorter ur journal had ores of 2.0 or Bytes/sec One Interface to Campus, Inbound, 2001-DEC-16 through 2001-DEC-23 20 M 15 M bytes, original signal 10 M 5 M 0 6 M 4 M 2 M -2 M 0-4 M bytes, high-band -6 M 4 M 2 M 0-2 M bytes, mid-band -4 M 20 M 15 M bytes, low-band 10 M 5 M 0 Sat Sun Mon Tue Wed Thu Fri Sat Sun Fig. 2. Aggregate SNMP byte traffic for the same week as Figure 1 plus high/mid/low decomposition. C. Callegari Anomaly Detection 158 / 169

Wavelet Case Study 1 NT WORKSHOP 2002 7 Experimental Results e figure also to high, mid, -, M-, and L- omponent of ame week at MP. In this ty SNMP inspecific BGP flow records. ndistinguishgh-frequency ocus on flash rowds first is posed by the ysis of either 30 M 25 M 20 M 15 M 10 M 5 M 0 Bytes/sec 5 M Outbound Class-B Network Bytes, mid-band 0-5 M -10 M 20 M 15 M 10 M 5 M Class-B Network, Outbound, 2001-SEP-30 through 2001-NOV-25 Outbound Class-B Network Bytes, original signal Outbound Class-B Network Bytes, low-band 0 Oct-01 Oct-08 Oct-15 Oct-22 Oct-29 Nov-05 Nov-12 Nov-19 Nov-26 Fig. 3. Baseline signal of byte traffic for a one week on either side of a flash crowd anomaly caused by a software release plus high/mid/low decomposition. C. Callegari Anomaly Detection 159 / 169

Fig. 3. Baseline signal of byte Wavelet traffic forcasea one Study week 1 on either side of a flash crowd anomaly caused by a software release plus high/mid/low decomposition. Experimental Results ocus on flash rowds first is posed by the ysis of either ocus on floweight weeks s-b networks nux releases. distributions ror server. In d by hand to aly (the posil inspection). al. The lowas the longowds is from s of packets. es should rebound HTTP ve means for of outbound nomaly from twork packet Bytes 1500 1000 500 0 300 200 100 0-100 -200-300 1500 1000 Campus HTTP, Outbound, 2001-SEP-30 through 2001-NOV-25 outbound HTTP average packet size signal outound HTTP average packet size, mid-band 500 outbound HTTP average packet size, low-band 0 Oct-01 Oct-08 Oct-15 Oct-22 Oct-29 Nov-05 Nov-12 Nov-19 Nov-26 Fig. 4. Baseline signal of average HTTP packet sizes (bytes) for four weeks on either side of a flash crowd anomaly plus mid/low decomposition. C. Callegari Anomaly Detection 160 / 169

the high and mid bands. By separating these signals from the Wavelet Case Study 1 longer time-scale behavior, we have new signals which may be amenable to detection by thresholding. Experimental Results Flows/sec 400 300 200 100 0 200 150 100 50 0 30 20 10 0-10 -20-30 400 300 200 100 0 Campus TCP, Inbound, 2002-FEB-03 through 2002-FEB-10 Inbound TCP Flows, mid-band Inbound TCP Flows, original signal Inbound TCP Flows, high-band Inbound TCP Flows, low-band Sun Mon Tue Wed Thu Fri Sat Fig. 5. Baseline signal of packet flows for a one week period highlighting two short-lived DoS attack anomalies plus high/mid/low decomposition. detection mec atives. Our tion score dis scribed in Sec Figure 7 sh a series of sho packet rate du wise be diffic anomalies are automatically first and seco values as show most score do measurement C. Callegari Anomaly Detection 20 k161 / 169 Packets/sec 50 k 40 k 30 k

Fig. 5. Baseline signal of packet Wavelet flows forcasea one Study week 1 period highlighting two short-lived DoS attack anomalies plus high/mid/low decomposition. Experimental Results Packets/sec 30 k 20 k Bytes/sec 30 M 20 M 10 M 0 8 M 6 M 4 M 2 M -2 M 0-4 M 6 M 4 M 2 M 0-2 M 30 M 20 M 10 M 0 Campus TCP, Inbound, 2002-FEB-10 through 2002-FEB-17 Inbound TCP Bytes, original signal Inbound TCP Bytes, high-band Inbound TCP Bytes, mid-band Inbound TCP Bytes, low-band Sun Mon Tue Wed Thu Fri Sat Fig. 6. Baseline signal of byte trafficfrom flow data for a one week period showing three short-lived measurement anomalies plus high/mid/low decomposition. 10 k 0 Sun Fig. 7. Deviati anomaly in f In Figure 8 containing a fourth of the an overall dec ric (packets, scoring ident it is feasible t age to determ accumulated E. Exposing C. Callegari Anomaly Detection 162 / 169

values as shown by the scale Wavelet on the Case right Study of1 the figure (the leftmost score does not quite reach 2). The third band marks an measurement anomaly unrelated to the DoS attacks. Experimental Results low-band 50 k Campus TCP, Inbound, 2002-FEB-03 through 2002-FEB-10 Inbound TCP Packets 2 Sat hlighting two osition. Packets/sec 40 k 30 k 20 k Deviation Score 1.5 Score 7 10 k 0 Sun Mon Tue Wed Thu Fri Sat Fig. 7. Deviation analysis exposing two DoS attacks and one measurement anomaly in for a one week period in packet count data. In Figure 8, we present a deviation analysis during a week containing a network outage. This outage affected about one C. Callegari Anomaly Detection 163 / 169

Experimental Results Wavelet Case Study 1 IN PROCEEDINGS OF ACM SIGCOMM INTERNET MEASUREMENT WORKSHOP 2002 9 30 M Inbound Outbound 2 Bytes/sec 20 M 10 M 1.5 Score 0 40 k 2 Pkts/sec Flows/sec 30 k 20 k 10 k 0 200 150 100 50 1.5 2 1.5 Score Score 0 Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Fig. 8. Deviation analysis exposing a network outage of one (of four) Class-B networks. Bytes/sec 600 k 500 k 400 k 300 k 200 k 100 k 0 20 k Inbound Outbound 2 1.5 2 Score 15 k C. Callegari 10 k Anomaly Detection 164 / 169 kts/sec Score

k 20 k Pkts/sec30 10 k 0 Experimental 200 Results Flows/sec 150 100 50 Wavelet Case Study 1 1.5 2 1.5 Score Score 0 Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Fig. 8. Deviation analysis exposing a network outage of one (of four) Class-B networks. Bytes/sec 600 k 500 k 400 k 300 k 200 k 100 k 0 20 k Inbound Outbound 2 1.5 2 Score 15 k Pkts/sec 10 k 5 k 1.5 Score 0 200 2 Flows/sec 150 100 50 1.5 Score 0 Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat 30 M Inbound Outbound 2 Bytes/sec 20 M 10 M 1.5 Score Pkts/sec 0 50 k 2 40 k 30 k 20 k 1.5 C. Callegari 10 k Anomaly Detection 165 / 169 Score

k 10 k Pkts/sec15 5 k Experimental 0 Results Flows/sec 200 150 100 50 Wavelet Case Study 1 1.5 2 1.5 Score Score 0 Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat 30 M Inbound Outbound 2 Bytes/sec 20 M 10 M 1.5 Score 0 50 k 2 40 k Pkts/sec 30 k 20 k 10 k 1.5 Score Flows/sec 0 300 250 200 150 100 50 0 Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat 2 1.5 Score Fig.9. Deviation analysis of two DoS events as seen in the 254 hostsubnet containing the victim of the attack (top) and in the aggregate traffic of the entire campus. C. Callegari Anomaly Detection 166 / 169

Experimental Results flow records showed this anomaly to have been due to of net- Wavelet Case Study 1 work abuse in which four campus hosts had their security compromised and were being remotely operated as peer-to-peer file servers. 10 M 8 M 6 M 4 M 2 M 0 3 M 2 M 1 M 0-1 M -2 M 3 M 2 M 1 M -1 M 0-2 M -3 M 10 M 8 M 6 M 4 M 2 M 0 Bytes/sec Class-B Network, Outbound, 2001-NOV-25 through 2001-DEC-23 Outbound Bytes, original signal Outbound Bytes, high-band Outbound Bytes, mid-band Outbound Bytes, low-band effort often in to identify th prised the an Of those 3 deviation sco confident sco wasn t detect tial but less t showed that was detected tude of the u the normaliz we used in o B. Holt-Win To further anomaly dete Nov-27 Dec-04 Dec-11 Dec-18 Dec-25 reporting tec Fig. 10. Example of three-band analysis exposing a multi-day network abuse available dev anomaly. Holt-Winters nential smoo plementation VI. DEVIATION SCORE EVALUATION We select C. Callegari Anomaly Detection 167 / 169

References Wavelet Case Study 1 P.Barford,J.Kline,D.Plonka,A.Ron, A signal analysis of network traffic anomalies, ACM SIGCOMM InternetMeasurement Workshop, 2002 P. Huang, A. Feldmann, W. Willinger, A non-intrusive, wavelet-based approach to detecting network performance problems, ACM SIGCOMM Internet Measurement Workshop, 2001 L. Li, G. Lee, DDos attack detection and wavelets, IEEE ICCCN, 2003 A. Dainotti, A. Pescape, and G. Ventre, Wavelet-based Detection of DoS Attacks, IEEE Globecom, 2006 Christian Callegari, Stefano Giordano, and Michele Pagano, Application of Wavelet Packet Transform to Network Anomaly Detection, nternational Conference on Next Generation Teletraffic and Wired/Wireless Advanced Networking (NEW2AN), 2008 C. Callegari Anomaly Detection 168 / 169

Wavelet Case Study 1 Thank You for your attention C. Callegari Anomaly Detection 169 / 169