On Entropy in Network Traffic Anomaly Detection

Jayro Santiago-Paz, Deni Torres-Roman. Cinvestav, Campus Guadalajara, Mexico. November 2015.

Outline: Introduction (Databases); Windowing in Network Traffic; Kullback-Leibler divergence, Mutual information, Entropy calculation; The Classifier; Metrics.


Chandola et al. (2009) state that anomaly-based intrusion detection in networks refers to the problem of finding exceptional patterns in network traffic that do not conform to the expected normal behavior. Given a network, a set of selected traffic features $X = \{X_1, X_2, \ldots, X_p\}$, and $N$ time instances of $X$, the normal and abnormal behaviors of the instances can be studied. The space of all instances of $X$ forms the feature space, which can be mapped to another space by a function such as entropy. In the literature, Shannon and generalized Rényi and Tsallis entropy estimators, as well as probability estimators (Balanced, Balanced II), are used.

An A-NIDS usually consists of two stages: training and testing. In the training stage, a normal profile is built from a database of normal (anomaly-free) network traffic using the feature extraction, windowing, and entropy calculation modules. In the testing stage, the same modules are applied to the current network traffic, and anomalies are detected and classified.

Figure 1: General architecture of an entropy-based A-NIDS.

Databases

Synthetic: synthetic databases are generated artificially, e.g., the MIT-DARPA 1998, 1999, and 2000 databases (http://www.ll.mit.edu/ideval/index.html), which include the major attack categories Denial of Service (DoS), User to Root (U2R), Remote to User (R2U), and probes.

Real: some real public databases are CAIDA (https://www.caida.org/data/passive/passive_2012_dataset.xml), which contains anonymized passive traffic traces from high-speed Internet backbone links, and the traffic data repository maintained by the MAWI Working Group of the WIDE Project (http://mawi.wide.ad.jp/mawi/). Other researchers have created their own databases at different universities, e.g., Carnegie Mellon University, Xi'an Jiaotong University, and Clemson University (GENI), or collected traffic from backbone networks such as SWITCH, Abilene, and Géant.

Windowing in Network Traffic

Motoda and Liu (2002): Feature selection is a process that chooses a subset of $M$ features from the original set of $N$ features ($M \leq N$) so that the feature space is optimally reduced according to a certain criterion. Feature extraction is a process that extracts a set of new features from the original features through some functional mapping: assuming there are $N$ features $Z_1, Z_2, \ldots, Z_N$, after feature extraction another set of new features $X_1, X_2, \ldots, X_M$ ($M < N$) is obtained via the mapping functions $F_i$, i.e., $X_i = F_i(Z_1, Z_2, \ldots, Z_N)$.
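
As a concrete illustration of such a functional mapping, the sketch below reduces $N$ original per-window features to $M$ extracted features with PCA; the data and dimensions are made up, and PCA is just one possible choice of the mapping $F_i$ (it is also among the reduction methods mentioned below).

```python
import numpy as np
from sklearn.decomposition import PCA

# N = 6 original per-window features Z_1..Z_6 (synthetic values), 100 windows.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 6))

# Feature extraction: M = 2 new features X_i = F_i(Z_1, ..., Z_N), here linear
# combinations chosen by PCA to preserve the most variance.
X = PCA(n_components=2).fit_transform(Z)
print(Z.shape, "->", X.shape)   # (100, 6) -> (100, 2)
```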

Among the algorithms used to reduce the number of features in network traffic anomaly detection are PCA, mutual information and linear correlation, decision trees, and maximum entropy. In network traffic, the most commonly employed features are the source and destination IP addresses and the source and destination port numbers. Other features extracted from headers are the protocol field, number of bytes, service, flag, and country code. Zhang et al. (2009) divided packet sizes into seven types, and Gu et al. (2005) defined 587 packet classes based on the port number. At the flow level (an IP flow corresponds to port-to-port IP traffic exchanged between two IP addresses during a period of time T), the selected features were flow duration, flow size distribution (FSD), and average packet size per flow. For KDD Cup 99, the 41 features or a subset of them were employed. Tellenbach et al. (2011), on the other hand, used the source port, country code, and other features to construct the Traffic Entropy Spectrum (TES) as input data.

Window-based methods group consecutive packets or flows using a sliding window. The $i$-th window of size $L$ packets is represented as $W_i(L, \tau) = \{pack_k, pack_{k+1}, \ldots, pack_{k+L}\}$, with $k = iL - i\tau$, where $\tau$ is the overlap and $\tau \in \{0, 1, \ldots, L-1\}$. When the window size is given by time, $L$ can differ from window to window. Windowing is performed in two ways: overlapping ($\tau \neq 0$) and non-overlapping ($\tau = 0$) windows. The most commonly used time-based window sizes are 5 min, 30 min, 1 min, 100 s, 5 s, and 0.5 s. Some researchers use windows with a fixed length of $L = 4096$, 1000, or 32 packets.
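
A minimal sketch of the windowing just described, assuming packets are already available in a Python list; the helper name make_windows and the example values of L and τ are illustrative, not taken from any of the cited works.

```python
def make_windows(packets, L, tau=0):
    """Group consecutive packets into windows of length L.

    tau is the overlap (0 <= tau < L): window i starts at index i*(L - tau),
    so tau = 0 gives non-overlapping windows and tau > 0 overlapping ones.
    """
    if not 0 <= tau < L:
        raise ValueError("overlap tau must satisfy 0 <= tau < L")
    step = L - tau
    windows = []
    for start in range(0, len(packets) - L + 1, step):
        windows.append(packets[start:start + L])
    return windows

# Example: 10 dummy "packets", windows of 4 packets with overlap 2.
packets = list(range(10))
print(make_windows(packets, L=4, tau=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```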

Entropy calculation

Let $X$ be a random variable which takes values in the set $\{x_1, x_2, \ldots, x_M\}$, $p_i := P(X = x_i)$ the probability of occurrence of $x_i$, and $M$ the cardinality of the finite set; the Shannon entropy is:

$$H_S(X) = -\sum_{i=1}^{M} p_i \log(p_i). \qquad (1)$$

The Rényi entropy is defined as:

$$H_R(X, q) = \frac{1}{1-q} \log\left(\sum_{i=1}^{M} p_i^q\right) \qquad (2)$$

and the Tsallis entropy is:

$$H_T(X, q) = \frac{1}{q-1} \left(1 - \sum_{i=1}^{M} p_i^q\right). \qquad (3)$$

When $q \to 1$ the generalized entropies reduce to the Shannon entropy. In order to compare changes of entropy at different times, the entropy is normalized, i.e.,

$$\bar{H}(X) = \frac{H(X)}{H_{\max}(X)}. \qquad (4)$$
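
The sketch below computes equations (1)-(4) with the relative-frequency probability estimator; numpy, the function names, and the base-2 logarithm are assumptions made for illustration.

```python
import numpy as np

def probabilities(samples):
    """Relative-frequency estimate of p_i from a list of observed symbols."""
    _, counts = np.unique(samples, return_counts=True)
    return counts / counts.sum()

def shannon_entropy(p):
    return -np.sum(p * np.log2(p))                     # Eq. (1), log base 2

def renyi_entropy(p, q):
    return np.log2(np.sum(p ** q)) / (1.0 - q)         # Eq. (2), q != 1

def tsallis_entropy(p, q):
    return (1.0 - np.sum(p ** q)) / (q - 1.0)          # Eq. (3), q != 1

def normalized_entropy(p):
    # Eq. (4): H_max = log2(M) for M observed symbols (uniform distribution).
    return shannon_entropy(p) / np.log2(len(p)) if len(p) > 1 else 0.0

# Example: entropy of destination ports seen in one traffic window.
ports = [80, 80, 443, 80, 53, 443, 80, 22]
p = probabilities(ports)
print(shannon_entropy(p), renyi_entropy(p, 2), tsallis_entropy(p, 2), normalized_entropy(p))
```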

Kullback-Leibler divergence

Consider two complete discrete probability distributions $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$, with $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$, $1 \geq p_i \geq 0$, $1 \geq q_i \geq 0$, $i = 1, 2, \ldots, n$. The information divergence is a measure of the divergence between $P$ and $Q$ and is defined by Rényi (1961) as:

$$D_\rho(P\|Q) = \frac{1}{\rho - 1} \log\left(\sum_{i=1}^{n} p_i^\rho\, q_i^{1-\rho}\right), \quad \rho \geq 0, \qquad (5)$$

where $\rho$ is the order of the information divergence. Consequently, the smaller $D_\rho(P\|Q)$ is, the closer the distributions $P$ and $Q$ are, and $D_\rho(P\|Q) = 0$ iff $P = Q$. When $\rho \to 1$, the Kullback-Leibler (KL) divergence is obtained:

$$D_1(P\|Q) = \sum_{i=1}^{n} p_i \log\left(\frac{p_i}{q_i}\right). \qquad (6)$$
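
A small sketch of equations (5) and (6), again with base-2 logarithms and numpy as illustrative choices; the baseline and current distributions in the example are made up.

```python
import numpy as np

def renyi_divergence(p, q, rho):
    """Eq. (5): D_rho(P||Q) = log(sum p_i^rho * q_i^(1-rho)) / (rho - 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log2(np.sum(p ** rho * q ** (1.0 - rho))) / (rho - 1.0)

def kl_divergence(p, q):
    """Eq. (6): the rho -> 1 limit of Eq. (5). Assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with p_i = 0 contribute 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Example: divergence between a window's port distribution and a baseline.
baseline = [0.50, 0.30, 0.15, 0.05]
current  = [0.20, 0.20, 0.20, 0.40]
print(renyi_divergence(current, baseline, rho=0.5), kl_divergence(current, baseline))
```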

Conditional Entropy. The conditional entropy of a variable $Y$ given $X$, with alphabets $\mathcal{X}$ and $\mathcal{Y}$ respectively, is defined as:

$$H(Y|X) = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log(p(y|x)) \qquad (7)$$
$$= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log(p(y|x)). \qquad (8)$$

Joint Entropy. The joint entropy of $X$ and $Y$ is defined as:

$$H(X; Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log(p(x, y)) \qquad (9)$$

where $p(x, y)$ is the joint probability mass function.

Mutual information

The mutual information (MI) between two random variables $X$ and $Y$ is a measure of the amount of knowledge about $Y$ supplied by $X$, or vice versa. If $X$ and $Y$ are independent, their mutual information is zero. The MI of two random variables $X$ and $Y$ is defined as:

$$I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X; Y) \qquad (10)$$

where $H(\cdot)$ is the entropy, $H(X|Y)$ and $H(Y|X)$ are conditional entropies, and $H(X; Y)$ is the joint entropy. The MI equation can also be written as:

$$I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log\left(\frac{p(x, y)}{p(x)p(y)}\right) \qquad (11)$$

where $p(x)$ and $p(y)$ are the marginal probability mass functions of $X$ and $Y$, respectively. In order to estimate the MI between $X$ and $Y$, it is necessary to estimate $p(x, y)$.
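
A sketch estimating equations (7)-(11) from paired observations with relative frequencies for p(x, y); the variable names (src, dst) and helper functions are illustrative assumptions, not the estimators used by the cited authors.

```python
import numpy as np
from collections import Counter

def entropy(xs):
    counts = np.array(list(Counter(xs).values()), float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def joint_entropy(xs, ys):
    """Eq. (9): H(X;Y) from the empirical joint distribution p(x, y)."""
    joint = np.array(list(Counter(zip(xs, ys)).values()), float)
    p_xy = joint / joint.sum()
    return -np.sum(p_xy * np.log2(p_xy))

def conditional_entropy(ys, xs):
    """H(Y|X) via the chain rule H(X;Y) - H(X), equivalent to Eq. (8)."""
    return joint_entropy(xs, ys) - entropy(xs)

def mutual_information(xs, ys):
    """Eq. (10): I(X;Y) = H(X) + H(Y) - H(X;Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)

# Example: per-packet source and destination addresses in one window.
src = ["a", "a", "b", "c", "a", "b"]
dst = ["x", "y", "x", "x", "y", "x"]
print(conditional_entropy(dst, src), mutual_information(src, dst))
```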

Different probability estimators are used, e.g., relative frequency, Balanced, and Balanced II, from which the probability distribution is estimated. The entropy is then calculated from these estimates; the more accurate the estimators, the better the entropy estimates. Rahmani et al. (2009) noted that the time series of IP-flow number and aggregate traffic size are strongly statistically dependent, and that when an attack occurs it causes a rupture in the time series of joint entropy values. In order to calculate the joint entropy $H(X; Y)$ they modeled $p(x, y)$ of the time series $X$ and $Y$ using either the Gamma probability density function (when the number of connections was small) or the central limit theorem (when the number of connections was large enough). Liu et al. (2010) calculated the conditional entropy $H(Y|X)$, where $Y$ and $X$ are two of the most widely used traffic variables: source and destination IP addresses. Amiri et al. (2011) used an estimator of MI developed by Kraskov et al. (2004), which employs entropy estimates based on k-nearest-neighbor distances. Velarde-Alvarado et al. (2009) estimated entropy values using the balanced estimator II as the probability estimator.

An anomaly in network traffic is a data pattern that does not conform to those representing normal traffic behavior. Assuming that 1) $X \in \mathbb{R}^p$ is a $p$-dimensional real-valued random variable with domain $S_X \subseteq \mathbb{R}^p$ representing traffic features, 2) $x_i$ are instances of $X$, i.e., $x_i \in S_X$, and 3) data patterns of normal behavior are represented by the subspace $S_N \subseteq S_X$, anomaly detection determines whether an instance $x_i$ belongs to $S_N$ or not. The space $S_X$ can be partitioned into classes with the help of decision functions, allowing further classification.

Specific Decision Functions

Zhang et al. (2009) and Gu et al. (2005) used the KL divergence $D_1(P\|Q)$; in addition, Zhang et al. (2009) classified the abnormal situations into different classes. Coluccia et al. (2013) employed both the KL divergence and maximum entropy. Yan et al. (2008) used $D_{0.5}(P\|Q)$. In Santiago-Paz et al. (2015), one decision function is based on the Mahalanobis distance $d_M^2(x_i)$, and a second decision function is given by $f(x) = \sum_{i=1}^{N} \alpha_i k(x_i, x) - b$ for the One-Class Support Vector Machine (OC-SVM), where $k(x_i, x)$ is a kernel. Huang et al. (2006) computed the Rényi entropy ($q = 3$) of the Coiflet and Daubechies wavelets. Velarde-Alvarado et al. (2009) used the proportional uncertainty (PU) and the method of remaining elements (MRE) to detect anomalies. Tellenbach et al. (2011) used the Kalman filter, PCA, and KLE as anomaly detection methods. Ma et al. (2014) established a decision function based on the entropy of the source IP address $\hat{H}_s$ and the entropy of the destination IP address $\hat{H}_d$. In Berezínski et al. (2015) and Özçelik and Brooks (2015), a decision function based on entropy and a range of values was used to detect anomalies.
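
A hedged sketch of two such decision functions on per-window entropy vectors, using a squared Mahalanobis distance with an empirical threshold and scikit-learn's OneClassSVM; the training data, threshold quantile, and OC-SVM parameters are made-up illustrations, not the settings of Santiago-Paz et al. (2015).

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Training data: one row per window, e.g. [H(srcIP), H(dstIP), H(srcPort), H(dstPort)]
# computed from anomaly-free traffic. The values below are synthetic stand-ins.
H_train = np.random.default_rng(0).normal(loc=[0.8, 0.7, 0.9, 0.6], scale=0.02, size=(200, 4))

# Decision function 1: squared Mahalanobis distance to the normal profile.
mu = H_train.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(H_train, rowvar=False))

def mahalanobis_sq(x):
    d = x - mu
    return float(d @ inv_cov @ d)

threshold = np.quantile([mahalanobis_sq(x) for x in H_train], 0.99)  # empirical cut-off

# Decision function 2: One-Class SVM with an RBF kernel, f(x) = sum_i alpha_i k(x_i, x) - b.
ocsvm = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(H_train)

x_new = np.array([0.95, 0.40, 0.97, 0.30])            # entropy vector of a new window
print("anomalous (Mahalanobis):", mahalanobis_sq(x_new) > threshold)
print("anomalous (OC-SVM):", ocsvm.decision_function([x_new])[0] < 0)
```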

Table 1: Results of network traffic anomaly detection using entropy.

| Author | Information metric | Database | Anomaly | TNR [%] |
|---|---|---|---|---|
| Gu et al. (2005) | KL divergence | | Portscan | 91.0 |
| Zhang et al. (2009) | KL divergence | MIT-DARPA | DoS | 87.10 |
| | | | Probe | 68.18 |
| | | | R2L | 79.49 |
| | | | U2R | 60.87 |
| Liu et al. (2010) | Conditional entropy | CAIDA | DDoS | 93.0 |
| Ferreira et al. (2011) | Shannon, Rényi, Tsallis | KDD Cup 99 | DoS | 93.18 |
| | | | Probe | 79.20 |
| | | | R2L | 97.76 |
| | | | U2R | 95.05 |
| Amiri et al. (2011) | Mutual information | KDD Cup 99 | DoS | 90.02 |
| | | | Probe | 99.97 |
| | | | R2L | 99.98 |
| | | | U2R | 95.0 |
| Santiago-Paz et al. (2015) | Shannon, Rényi, Tsallis | LAN, MIT-DARPA subset | Worms, DoS, Portscan | 99.83 |

The Classifier

Gupta et al. (2014) state that, given 1) a training data set of the form $\{(x_i, y_i)\}$, where $x_i \in S_X$ is a feature vector or data pattern and $y_i \subseteq \{1, \ldots, G\}$ is the subset of the $G$ class labels known to be correct for $x_i$, and 2) a discriminant function $f(x; \beta_g)$ with class-specific parameters $\beta_g$ for each class $g = 1, \ldots, G$, the class discriminant functions are used to classify an instance $x$ as the class label that solves $\arg\max_g f(x; \beta_g)$.

Lakhina et al. (2005) apply two clustering algorithms, k-means and hierarchical agglomeration, to the vector $h = [H(\text{srcIP}), H(\text{dstIP}), H(\text{srcPort}), H(\text{dstPort})]$. Xu et al. (2005) define three free feature dimensions and introduce an entropy-based significant cluster extraction algorithm for clustering. Lima et al. (2011) use the WEKA (http://www.cs.waikato.ac.nz/ml/weka/) Simple K-Means algorithm. Tellenbach et al. (2011) apply an SVM to classify the anomalies. Yao et al. (2012) use Random Forests. Santiago-Paz et al. (2014) present the Entropy and Mahalanobis Distance (EMD) based algorithm to define elliptical regions in the feature space. In Santiago-Paz et al. (2015), OC-SVM and k-temporal nearest neighbors are used to improve classification accuracy.
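
A minimal sketch of clustering windows by their entropy vectors with scikit-learn's KMeans, in the spirit of Lakhina et al. (2005); the synthetic data, the choice of two clusters, and the smallest-cluster-is-anomalous rule are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is the entropy vector h = [H(srcIP), H(dstIP), H(srcPort), H(dstPort)]
# of one time window; the numbers are synthetic stand-ins for real traffic.
rng = np.random.default_rng(1)
normal_windows = rng.normal([0.85, 0.80, 0.90, 0.75], 0.02, size=(150, 4))
scan_windows   = rng.normal([0.85, 0.20, 0.90, 0.95], 0.02, size=(10, 4))  # e.g. scan: few dstIPs, many dstPorts
h_vectors = np.vstack([normal_windows, scan_windows])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(h_vectors)
labels = kmeans.labels_

# Treat the smaller cluster as the candidate anomaly cluster.
sizes = np.bincount(labels)
anomaly_cluster = int(np.argmin(sizes))
print("cluster sizes:", sizes, "windows flagged as anomalous:", np.where(labels == anomaly_cluster)[0])
```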

Metrics

Given a classifier and an instance, there are four possible outcomes: TN, FP, FN, and TP, where TN is the number of correct predictions that an instance is negative, FP the number of incorrect predictions that an instance is positive, FN the number of incorrect predictions that an instance is negative, and TP the number of correct predictions that an instance is positive. With these entries, the following statistics are computed: Accuracy (AC), the proportion of the total number of predictions that were correct, $AC = \frac{TN + TP}{TN + FP + FN + TP}$; True Positive Rate (TPR), the proportion of positive cases that were correctly identified, $TPR = \frac{TP}{FN + TP}$; True Negative Rate (TNR), the proportion of negative cases that were classified correctly, $TNR = \frac{TN}{TN + FP}$; False Negative Rate (FNR), the proportion of positive cases that were incorrectly classified as negative, $FNR = \frac{FN}{FN + TP}$; and the F-measure, a measure of a test's accuracy, $F\text{-measure} = \frac{2 \cdot TPR \cdot AC}{TPR + AC}$. In addition, Receiver Operating Characteristic (ROC) graphs illustrate the performance of a classifier: they are two-dimensional graphs in which an (FP rate, TP rate) pair corresponds to a single point in ROC space.
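
A small helper computing the statistics above from the four outcome counts; the example counts are invented.

```python
def detection_metrics(tn, fp, fn, tp):
    """Compute the statistics listed above from the four outcome counts."""
    ac  = (tn + tp) / (tn + fp + fn + tp)   # Accuracy
    tpr = tp / (fn + tp)                    # True Positive Rate
    tnr = tn / (tn + fp)                    # True Negative Rate
    fnr = fn / (fn + tp)                    # False Negative Rate
    f   = 2 * tpr * ac / (tpr + ac)         # F-measure as defined on this slide
    return {"AC": ac, "TPR": tpr, "TNR": tnr, "FNR": fnr, "F-measure": f}

# Example with made-up counts for one detector on one trace.
print(detection_metrics(tn=950, fp=50, fn=20, tp=180))
```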

Nowadays, there is no public database large enough to exhaustively test and compare different algorithms in order to draw significant conclusions about their performance and classification capabilities. Therefore, the construction of a common database with real normal and anomalous traffic for the evaluation of A-NIDS is needed. The value of the q parameter for the generalized entropies is found experimentally; its correct choice for the best anomaly detection is an open research problem. For different networks, the larger the slot size, the more the entropy behaviors differ. In the near future, this behavior should be studied with more, and more recent, traces in order to determine whether a model learned from one network can be used in a different network. Another open issue is the adequate window size for reducing the data volume while ensuring good entropy estimates and early detection of anomalies. The set of decision functions and classifiers should also be enhanced with new entropy-based closeness and farness measures.