Sketch-based Network-wide Traffic Anomaly Detection

Yang Liu, Linfeng Zhang, and Yong Guan
Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa 50011
Email: {yang, zhangf, guan}@iastate.edu

Abstract

The Internet has become an essential part of daily life for billions of users worldwide, who use a large variety of network services and applications every day. However, there have been serious security problems and network failures that are hard to resolve, for example botnet attacks, polymorphic worm/virus spreading, DDoS, and flash crowds. To address many of these problems, we need a network-wide view of the traffic dynamics and, more importantly, the ability to detect traffic anomalies in a timely manner. Existing network measurement and monitoring solutions often suffer from scalability problems caused by overly large processing, space, or communication overhead.

In this paper, we propose sketch-based algorithms for network-wide anomaly detection that detect both high-profile and coordinated low-profile traffic anomalies as outliers in the regular traffic patterns. Our approach is based on spatial analysis of traffic measurements from multiple monitors. Spatial analysis methods have proved effective in detecting network-wide traffic anomalies that are not detectable at a single monitor. To our knowledge, Principal Component Analysis (PCA) is the best-known spatial detection method for coordinated low-profile traffic anomalies. However, existing PCA-based solutions have scalability problems: they require $O(m^2 n)$ running time and $O(mn)$ space to analyze traffic measurements from $m$ aggregated traffic flows within a sliding window of length $n$, which often makes them infeasible to deploy for monitoring large-scale high-speed networks.

We propose two novel sketch-based algorithms for PCA-based traffic anomaly detection in a distributed fashion. Our algorithm can achieve $O(w \log n)$ running time and $O(w \log^2 n)$ space at local monitors, and $O(m^2 \log n)$ running time and $O(m \log n)$ space at the Network Operation Center, where $w$ denotes the maximum number of traffic flows at a single local monitor. Additionally, our algorithm can protect the privacy of traffic measurements for Internet Service Providers.

1 INTRODUCTION

The Internet has become an essential part of daily life for billions of users worldwide. People use and rely on a large variety of services built on top of the Internet, such as web browsing, online banking, shopping, entertainment, VoIP, video on demand, auctions, and social networks. However, every day we still read news stories about major security breaches, new polymorphic worm/virus spreading, identity theft, botnet activity, DDoS, or phishing emails. To address many of these problems (e.g. DDoS, botnets, worms/viruses), we need a network-wide view of the traffic dynamics and, more importantly, the ability to detect traffic anomalies in a timely manner. Failing to do so may cause catastrophic damage or unwanted results affecting online business, public safety, homeland security, personal privacy, the economy, and society at large.

Traffic anomalies can occur due to a variety of problems. Firstly, security threats like DDoS, worms, and botnets can generate extremely large volumes of anomalous traffic. Secondly, unusual events can cause traffic anomalies, like equipment failures, vendor implementation errors, and software bugs. Thirdly, abnormal user behaviors can change the traffic patterns, for example flash crowds and non-malicious large file transfers.

In the early days, traffic anomalies often involved unusually large-volume traffic, i.e. high-profile traffic, mainly caused by traditional DoS, worms, or flash crowds. In recent years, new threats like botnets have introduced low-profile but coordinated anomalies, which generate only a small amount of traffic but follow specific coordinated traffic patterns. Besides these, there are also some traffic anomalies that are low-profile and non-coordinated, e.g. blackmail and spam VoIP calls.
[Fig. 1. A Distributed Framework for Network Measurement and Monitoring]

To address problems like intrusion detection, fault detection and recovery, and QoS provisioning, many ISPs have chosen a distributed architecture for network monitoring, as shown in Fig. 1, which consists of local monitors and Network Operation Centers (NOCs). In this framework, local monitors collect data from routers and other network devices, perform some processing at or close to the data sources, and then transfer their measurements to NOCs. NOCs are responsible for mining characteristics of interest from the collected measurements, and for identifying problems and their root causes. Many of the measurements from these systems are also the data source for traffic anomaly detection.

Monitoring and detecting network-wide traffic anomalies has been and still is challenging for the following reasons. Firstly, Internet traffic exhibits huge fluctuations and long-range dependence [1], so traffic anomalies are often hidden by large volumes of normal traffic. Secondly, traffic anomalies show extreme diversity, and new varieties emerge every day. Thirdly, ISPs want to detect traffic anomalies while they are still at a low-profile volume in order to reduce the damage as much and as early as possible. Last but not least, there are many systems where data, computing, and other resources are distributed and cannot be transported to a center for various reasons, e.g. low bandwidth, security, privacy, and load balancing issues.

Spatial analysis such as Principal Component Analysis (PCA) [2] has been verified to be an effective method for traffic anomaly detection, but applying PCA online in practice raises several challenges [3]:

- The method only classifies specific time intervals as anomalous, but cannot identify the responsible flows.
- Local monitors must send their data to the NOC periodically, which can cause scalability problems.
- PCA requires a singular value decomposition (SVD) of an $n \times m$ matrix. The computation complexity of the SVD is $O(nm^2)$ and the space requirement is $O(nm)$, which becomes a bottleneck when performing PCA in high-speed networks.
- Because the bandwidth to a NOC may be limited, local monitors cannot send data at a high frequency. To address this problem, the NOC must set a long enough period for each monitor to update its measurements.

1.1 Our Contribution

In this paper, we propose sketch-based algorithms for traffic anomaly detection based on network-wide traffic measurements, which can significantly reduce the computation and storage overhead of the PCA-based methods [2], [4]-[7]. In our algorithms, each local monitor maintains only a series of sketches for each traffic flow rather than the raw measurements, which requires $O(w \log n)$ running time and $O(w \log^2 n)$ space, where $w$ denotes the maximum possible number of traffic flows at a local monitor. Traffic measurements are projected into randomly selected sketches that are sent to the NOC, which keeps the original data secret: it is very difficult to learn anything about the network traffic from the sketches other than the traffic anomalies. The sketch computation and update can be done either at the local monitor or at the NOC, depending on the privacy protection and communication cost requirements.
The NOC can run PCA-based detection methods on the collected sketches with $O(m^2 \log n)$ running time and $O(m \log n)$ space. Our main contributions can be summarized as follows:

1) Our algorithm is efficient in both running time and space.
2) Our algorithm gives ISPs the flexibility to balance computation and storage in a distributed measurement and monitoring system.
3) Like the PCA-based methods, our algorithm can detect both high-profile and coordinated low-profile traffic anomalies as outliers in the regular traffic patterns.
4) Our algorithm provides an additional benefit, i.e. privacy protection of traffic measurements.
5) Our algorithm can also keep the communication cost below an upper bound when the NOC only allows a long updating period due to limited communication bandwidth.

1.2 Related Work

Traffic anomaly detection has become an important issue for network management in the Internet and has attracted considerable research interest. For high-profile traffic anomalies, researchers can apply signal analysis methods to the traffic measurements from a single monitor [8]. To deal with low-profile coordinated traffic anomalies, Lakhina et al. [2], [5] proposed a PCA-based detection method that uses traffic measurements from multiple links. Li et al. [7] aggregated traffic flows into sketch subspaces and detected traffic anomalies based on PCA. Because of the high communication cost of Lakhina's method, Huang et al. [9] designed a local algorithm that filters data at the local monitor in order to avoid excessive network-wide communication: a local monitor sends its data to the NOC only if the local error exceeds a user-specified tolerance. Furthermore, a general framework of detection methods [10] and a distributed multi-dimensional indexing system for traffic measurements [11] have been proposed for the traffic anomaly detection problem.
Recently, several new methods have been introduced to detect traffic anomalies based on various features of the network traffic [12]-[14]. Chhabra et al. [13] used generalized quantile sets (GQSs) to identify a set of candidate anomalies at each local monitor. A local monitor then communicates its detection results to other local monitors to finally detect traffic anomalies. Kline et al. [14] used a Bayes net to identify potentially anomalous traffic from traffic volumes and correlations between ingress/egress packet and bit rates.

The rest of this paper is structured as follows. We formalize the traffic anomaly detection problem in Sec. 2. Next, we present two detection algorithms in Sec. 3 and Sec. 4, respectively, where the second is an advanced version of the first that improves the algorithm at local monitors. The theoretical analysis of each method is given in the same section. We evaluate our algorithm using data from the Abilene Observatory Data Collections [15] in Sec. 6. We conclude the paper with future work in Sec. 7.

2 PROBLEM DEFINITION

2.1 Background

The Internet is a global system of interconnected computer networks that exchange data using the standardized Internet Protocol Suite (TCP/IP). The computer networks are organized into autonomous systems (ASs), each of which is independently operated by an Internet Service Provider (ISP). The success of the Internet owes mainly to the end-to-end principle, which results in a simple network infrastructure. All data transported by the Internet are divided into IP packets, and each packet is forwarded hop by hop by routers. Each packet's header contains a source address and a destination address, which routers use to determine the forwarding path from the source to the destination. An inter-domain routing protocol (BGP) is used to forward IP packets among different ASs, and more than one intra-domain routing protocol can be used within a single AS. The communication between two computers is controlled by a transport protocol like TCP, which creates an individual end-to-end traffic flow.

Due to the exponential increase in the number of users and applications, it has become infeasible, if not impossible, to maintain statistics for each individual end-to-end traffic flow. Thus, ISPs often aggregate end-to-end traffic flows at different levels, such as origin autonomous systems, ingress links, or applications.
For example, ISPs can use the origin-destination (OD) flow, defined as all packets that enter the network at one origin router and exit at another destination router. Local monitors collect traffic measurements on aggregated traffic flows in real time. A traffic measurement, denoted by $s$, can contain one or several traffic features, e.g. $s = \{c_1(\mathrm{IP}), c_2(\mathrm{Port}), c_3(\mathrm{Size})\}$, where $c_k(\cdot)$ denotes a function of IP addresses, TCP/UDP ports, packet size, or other traffic features. Each $c_k(\cdot)$ is computed over the packets within a time interval and can be the count, the entropy, or another quantity of the traffic feature.

2.2 Traffic Anomaly Definition

[Fig. 2. DoS Traffic Flood (© CAIDA)]

A high-profile traffic anomaly often means an unusually large volume of traffic from one or multiple sources to one or multiple destinations. As a simple example, Fig. 2 shows the packet counts on a campus link from CAIDA [16]. There was a spike around :00 a.m. in both inbound and outbound traffic, which was caused by a flood of incoming TCP ACK packets directed at a campus computer. Flash crowds are another common example of traffic anomalies: large unexpected traffic spikes towards particular web sites due to associated user behaviors. Usually, sudden events of great interest trigger flash crowds, for example the CNN broadcast on the terrorist attacks of September 11, 2001 [17]. The increase in the request rate is dramatic, but relatively short in duration. On September 11, CNN served over 32 million pages, and the traffic was almost doubling every 7 minutes.

Usually, a local monitor can detect such big changes in the traffic pattern, but it is difficult to identify potential traffic anomalies that arise from small fluctuations in thousands of traffic flows. In low-profile coordinated traffic anomalies, some computers attempt to compromise vulnerable hosts, propagate malicious software, or operate botnets, which involves only small-volume traffic flows, as shown in Fig. 3 for the traffic volumes of four OD flows on the Abilene network [15]. Traffic anomalies such as those caused by botnets [18] cannot be detected at a single local monitor and therefore require a network-wide traffic analysis.

[Fig. 3. An example of coordinated traffic anomalies from the Abilene network. Four panels (ATLA-CHIC, CHIC-KANS, CHIC-SALT, SEAT-SALT) plot traffic volume against time for intervals 5700 to 5900.]

In a large-scale network, ISPs can use traffic measurements from multiple monitors to detect traffic anomalies that cannot be detected at a single monitor. An observation, denoted by $x = (s_1, \ldots, s_m)^T$, is defined as a vector of the measurements from multiple monitors, where $s_j$ denotes a measurement of the $j$-th flow and $m$ denotes the number of traffic flows.
Although each $s_j$ may be a normal measurement individually, they can only be observed concurrently when some traffic anomaly is occurring. In general, a traffic anomaly can be detected as an observation that does not belong to a normal observation set $\Omega$, which contains all observations corresponding to normal traffic. The easiest way to check whether an observation $x$ belongs to $\Omega$ is to compare $x$ with each element of $\Omega$, which may require a large amount of time to process all possible observations. In order to detect traffic anomalies in nearly real time, we have to investigate the properties of normal traffic and find quantities that differ enough to distinguish normal traffic from anomalous traffic.

The anomaly distance is defined as a distance norm under which the normal observations in $\Omega$ are close to each other with high probability. We can pick a typical normal observation $x_0 \in \Omega$ and rank every observation by the anomaly distance between itself and $x_0$. Then the probability distribution of the anomaly distance on normal observations can be computed, which is used to detect traffic anomalies with a statistical confidence.

Definition 1: Given an observation set $\Omega$ and an anomaly distance $d_\Omega(\cdot,\cdot)$, an observation $x$ is a traffic anomaly at the $(1-\alpha)$ confidence level if

$$P\big(x' : d_\Omega(x', x_0) > d_\Omega(x, x_0)\big) \le \alpha \qquad (1)$$

where $P(\cdot)$ denotes the probability distribution of the anomaly distance, and $\alpha$ is a real number between 0 and 1.

In this paper, we take the detection of traffic volume anomalies as an example to verify our method, which is a specific case study of the general problem. Our algorithm does not specify any aggregation method; we assume that the aggregation method is given by the ISPs, who can balance the computation overhead against their requirements. We also have no prior knowledge about the statistical properties of the traffic. Consequently, we are given a previous measurement set $\hat{\Omega}$ and estimate the probability distribution of the anomaly distance from it.

The traffic volume is defined as the total size of all IP packets in a traffic flow within a single time interval. Here $x_{ij}$ denotes the traffic volume of the $j$-th flow at the $i$-th time interval. The measurement set $\hat{\Omega}$ contains all recent observations within a sliding window of length $n$ from $m$ traffic flows. The probability density function of the anomaly distance is denoted by $f_d(x)$. For simplicity, we use $d_{\hat{\Omega}}(x)$ to denote the anomaly distance between $x$ and $x_0$.

Definition 2: Given the measurement set $\hat{\Omega}$, an observation $x_i$ measured at the $i$-th time interval is a traffic anomaly at the $(1-\alpha)$ confidence level if

$$d_{\hat{\Omega}}(x_i) > \delta \qquad (2)$$

where $\delta$ denotes the threshold of the anomaly distance assigned to the observation set $\hat{\Omega}$, which can be obtained by solving the following equation:

$$\int_{\delta}^{\infty} f_d(x)\,dx = \alpha. \qquad (3)$$

2.3 Objective of this research

Our algorithm aims at detecting traffic anomalies under continuous updates of the traffic measurements, without retrieving previous data, in high-speed networks.
To achieve this goal, we must compute the anomaly distance for sampled observations based on the measurement set $\hat{\Omega}$ under computation and space constraints. Firstly, the computation should be very efficient at both the local monitors and the NOC. Secondly, the algorithm is implemented in a distributed system, which requires careful consideration of the storage and communication overhead. Thirdly, it should support continuous updates so that the detection method can adjust to the evolution of traffic anomalies. Last but not least, a privacy protection mechanism enables ISPs to share traffic measurements with each other, which allows a network-wide detection of traffic anomalies.

3 SIMPLE SKETCH METHOD (SSM)

[Fig. 4. System model of simple sketch method. Modules shown between the local monitor and the Network Operation Center: packet stream aggregation, volume counter, sketch computation, PCA computation, anomaly distance computation, threshold computation, and the alarm decision.]

The system model of the simple sketch method is shown in Fig. 4. It uses random projection [19]-[21] to reduce the computation complexity. In this method, the measurement set $\hat{\Omega}$ is converted into a compact representation, i.e. the sketches. The sketches require less storage space and can therefore be kept in memory. The threshold $\delta$ and the anomaly distance $d_{\hat{\Omega}}(x_i)$ are computed from the sketches rather than from the original measurement set $\hat{\Omega}$. There are five modules, and the algorithm in each module is described in the following section. We then give an example to show the detection procedure of SSM. Finally, we provide the theory behind SSM and analyze its computation complexity and error bound.

3.1 Algorithm

3.1.1 Volume Counter

The ISP implements an aggregation method and reports a pair (FlowID, Size) to the volume counter, where Size denotes the packet size and FlowID is the index of the traffic flow. The volume counter maintains a list of buckets for each flow. A bucket $U_{ij}$ stores the traffic volume $x_{ij}$ of the $j$-th flow at the $i$-th time interval. For each flow, an array of buckets contains all traffic volumes within a sliding window of length $n$. When a pair (FlowID, Size) with FlowID $= j$ arrives in the $i$-th time interval, the corresponding bucket $U_{ij}$ is increased by Size. When a time interval ends, we delete the oldest bucket and create a new bucket for the new time interval. At time interval $t$, the local monitor reports to the NOC the deviation

$$y_{ij} = x_{ij} - \bar{x}_{tj} \qquad (4)$$

which possibly contains anomalous traffic, where $\bar{x}_{tj}$ denotes the mean of the traffic volumes within the sliding window at the current time interval $t$,

$$\bar{x}_{tj} = \frac{1}{n} \sum_{i=t-n+1}^{t} x_{ij}. \qquad (5)$$

3.1.2 Sketch Computation

The sketch is a compact representation of a sequence of traffic measurements that allows statistics of interest to be computed efficiently [22], [23].
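As an illustration of the volume counter in Sec. 3.1.1, the following minimal Python sketch keeps one sliding window of buckets per flow and evaluates Eqs. (4) and (5) for the current interval. The class and method names are hypothetical, and the handling of the warm-up phase (dividing by the number of intervals seen so far instead of $n$) is an assumption that the text does not specify.

```python
from collections import deque


class VolumeCounter:
    """Sliding-window volume counter for one local monitor (illustrative)."""

    def __init__(self, num_flows, window):
        self.window = window  # n, the sliding-window length
        # One deque of buckets per flow; the right end is the current interval.
        self.buckets = [deque([0.0], maxlen=window) for _ in range(num_flows)]

    def add(self, flow_id, size):
        """Account one (FlowID, Size) pair in the current interval's bucket."""
        self.buckets[flow_id][-1] += size

    def end_interval(self):
        """Open a new bucket per flow; deque(maxlen) evicts the oldest one."""
        for flow in self.buckets:
            flow.append(0.0)

    def deviation(self, flow_id):
        """Eqs. (4)-(5): current volume minus the mean over the window.

        Assumption: before the window is full, the mean is taken over the
        intervals seen so far.
        """
        flow = self.buckets[flow_id]
        mean = sum(flow) / len(flow)
        return flow[-1] - mean


# Three intervals of packets for flow 0: volumes 400, 200 and 900 bytes.
counter = VolumeCounter(num_flows=2, window=3)
for sizes in ([100, 300], [200], [900]):
    for size in sizes:
        counter.add(0, size)
    report = counter.deviation(0)  # y for the interval that is about to close
    counter.end_interval()
```

After the third interval the window holds 400, 200 and 900, so the mean is 500 and the reported deviation is 400. Using `deque(maxlen=n)` is one simple way to drop the oldest bucket when a new interval starts.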
The sketch is an extension of random projection, which projects an $n$-dimensional vector into a randomly selected $l$-dimensional sketch using a set of random vectors. After the projection, the distances between the original vectors are preserved, with the benefit that the size $l$ of the sketches is much smaller than the dimension $n$ of the original vectors, i.e. $l = O(\log n)$.

At each local monitor, we compute an $l$-dimensional sketch of the traffic volume for each flow. All monitors share the same random numbers for the sketch computation. At each time interval $i$, we use pseudo-random generators to compute $l$ independent and identically distributed (i.i.d.) random numbers, denoted by $r_{i1}, \ldots, r_{il}$, from a specific probability distribution $F$. Here, $F$ can be the standard normal distribution or a simpler distribution from which random numbers are easy to generate [24]. The properties of the probability distribution $F$ are closely related to the detection accuracy. The sketch $z_{kj}$ is computed as

$$z_{kj} = \sum_{i=t-n+1}^{t} r_{ik}\,(x_{ij} - \bar{x}_{tj}) = \sum_{i=t-n+1}^{t} r_{ik}\, y_{ij} \qquad (6)$$

for $i = t-n+1, \ldots, t$, $j = 1, \ldots, m$, and $k = 1, \ldots, l$.

3.1.3 Principal Component Analysis

The NOC gets all sketches $\{z_{kj} : k = 1, \ldots, l,\ j = 1, \ldots, m\}$ from the local monitors and organizes them into an $l \times m$ matrix $Z$. PCA is applied to $Z$, treating each row as a point in $\mathbb{R}^m$ and each column as a variable. PCA performs a coordinate rotation that aligns the transformed axes with the directions along which the projections of the row vectors have the largest possible variance. The principal components are the unit vectors along these axes. The first principal component of $Z$, denoted by $a_1$, can be found as

$$a_1 = \arg\max_{\|x\| = 1} \|Zx\|. \qquad (7)$$
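To make Eqs. (6) and (7) concrete, the following Python sketch builds a sketch matrix from a small matrix of mean-adjusted volumes and then approximates the first principal component. It is only a minimal illustration under stated assumptions: the function names are hypothetical, the shared random numbers are modeled by a common seed, and power iteration on $Z^T Z$ is swapped in for the full SVD that the text uses later.

```python
import math
import random


def sketch_matrix(Y, sketch_len, seed):
    """Eq. (6): z_kj = sum_i r_ik * y_ij with i.i.d. N(0, 1) entries r_ik.

    Y is a list of n rows, each holding the m mean-adjusted volumes y_ij.
    Monitors that use the same seed draw the same r_ik (an assumption about
    how the shared random numbers are realized).
    """
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    Z = [[0.0] * m for _ in range(sketch_len)]
    for i in range(n):
        r_i = [rng.gauss(0.0, 1.0) for _ in range(sketch_len)]
        for k in range(sketch_len):
            for j in range(m):
                Z[k][j] += r_i[k] * Y[i][j]
    return Z


def first_principal_component(Z, iterations=200):
    """Eq. (7) via power iteration on Z^T Z (a stand-in for a full SVD)."""
    m = len(Z[0])
    a = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        Za = [sum(row[j] * a[j] for j in range(m)) for row in Z]
        b = [sum(Z[k][j] * Za[k] for k in range(len(Z))) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in b))
        a = [v / norm for v in b]
    return a


# Toy data: flow 0 carries almost all of the variance.
Y = [[100.0 * math.sin(i), 0.01 * math.cos(i), 0.01 * math.sin(2 * i)]
     for i in range(50)]
Z = sketch_matrix(Y, sketch_len=8, seed=42)
a1 = first_principal_component(Z)
```

In this toy example the sketch matrix has 8 rows instead of 50, and the recovered unit vector points almost entirely along flow 0, the flow that dominates the variance of the original data.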

In Eq. (7), $\arg\max$ stands for the vector $x = (x_1, \ldots, x_m)^T$ that satisfies $\|x\| = 1$ and maximizes $\|Zx\|$. Here, $\|x\|$ stands for the Euclidean norm

$$\|x\| = \sqrt{x_1^2 + \cdots + x_m^2}. \qquad (8)$$

Given the first $r-1$ principal components $a_1, \ldots, a_{r-1}$, the $r$-th principal component $a_r$ can be found by subtracting the first $r-1$ principal components from $Z$,

$$a_r = \arg\max_{\|x\| = 1} \Big\| \Big( Z - \sum_{j=1}^{r-1} Z a_j a_j^T \Big) x \Big\|. \qquad (9)$$

The standard procedure for finding the principal components is the singular value decomposition (SVD). A pair of vectors $a \in \mathbb{R}^m$ and $b \in \mathbb{R}^l$ are singular vectors of $Z$ if $Za = \lambda b$ and $b^T Z = \lambda a^T$, where $\lambda$ is a real number denoting the corresponding singular value. The principal components $a_1, \ldots, a_m$ are the right singular vectors of these pairs. Usually, the singular values corresponding to the principal components are ordered, i.e. $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$. Then the matrix $Z$ can be written as

$$Z = \sum_{j=1}^{m} \lambda_j\, b_j a_j^T. \qquad (10)$$

3.1.4 Anomaly Distance Computation

Given the principal components $a_1, \ldots, a_m$, we choose the last few principal components to compute the anomaly distance of an observation $y_i = (y_{i1}, \ldots, y_{im})^T$,

$$d_Z(y_i) = \sqrt{\sum_{j=r+1}^{m} (a_j^T y_i)^2} \qquad (11)$$

where $r$ is an integer, $r < m$. Because the first $r$ principal components capture the main pattern in the traffic, a normal observation should reside in the subspace spanned by $a_1, \ldots, a_r$, and the last $m-r$ principal components are assumed to contain only random fluctuations. Thus, the probability distribution of the squared anomaly distance can be approximated by the distribution of a sum of chi-square random variables, because $a_j^T y_i$ follows the normal distribution.

Before computing the anomaly distance, we need to determine the number of principal components that correspond to random fluctuations in the traffic volumes. The last $m-r$ principal components are chosen to compute the anomaly distance, where $r$ is usually referred to as the size of the normal subspace. Several techniques can be applied to determine the size of the normal subspace, such as the $k\sigma$-heuristic, Cattell's scree test, and so forth.
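The anomaly distance of Eq. (11) is straightforward to evaluate once the principal components are available. The following Python sketch is a minimal illustration; the function name and the three-flow example values are hypothetical, and the components are assumed to be orthonormal and ordered as $a_1, \ldots, a_m$.

```python
import math


def anomaly_distance(y, components, r):
    """Eq. (11): norm of y projected on the last m - r principal components.

    `components` holds the orthonormal vectors a_1, ..., a_m in order.
    """
    residual = 0.0
    for a in components[r:]:
        projection = sum(a_j * y_j for a_j, y_j in zip(a, y))
        residual += projection ** 2
    return math.sqrt(residual)


# Hypothetical 3-flow example: the normal subspace is spanned by a_1 only.
s = 1.0 / math.sqrt(2.0)
components = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
normal = anomaly_distance([5.0, 5.0, 0.0], components, r=1)
odd = anomaly_distance([5.0, 5.0, 3.0], components, r=1)
```

The first observation lies entirely in the normal subspace, so its distance is zero. The second adds 3 units along $a_3$, and that is exactly the distance the function returns.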
Here, we briefly introduce the $3\sigma$-heuristic [2]. The projections of the matrix $Z$ on the principal components, i.e. $Za_j$, are examined one by one. When a projection $Za_j$ is found in which the value of an element deviates from the mean by more than $3\sigma_j$, where $\sigma_j$ is the standard deviation, this principal component and all remaining ones are selected.

3.1.5 Threshold Computation

The threshold computation is based on fault detection in multivariate process control [25]. Because $a_1, \ldots, a_m$ is an orthonormal set of vectors, we get

$$\|y_i\| = \sqrt{\sum_{j=1}^{m} (a_j^T y_i)^2}. \qquad (12)$$

Let $Q = [a_1, \ldots, a_r]$; then

$$d_Z(y_i) = \sqrt{\sum_{j=r+1}^{m} (a_j^T y_i)^2} = \sqrt{y_i^T y_i - \sum_{j=1}^{r} (a_j^T y_i)^2} = \|(I - QQ^T)\, y_i\|. \qquad (13)$$
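The $3\sigma$-heuristic of Sec. 3.1.4 can be sketched as a single scan over the projections. The following Python code is only an illustration under stated assumptions: the function name is hypothetical, the projections $Za_j$ are assumed to be precomputed and passed in order, and the population standard deviation is used because the text does not say which estimator is meant.

```python
import math


def normal_subspace_size(projections, k_sigma=3.0):
    """Return r, the number of leading components kept as the normal subspace.

    projections[j] is the vector Z a_(j+1). The first projection with an
    element farther than k_sigma standard deviations from its mean, and all
    later ones, are assigned to the anomalous subspace.
    """
    for j, proj in enumerate(projections):
        mean = sum(proj) / len(proj)
        sigma = math.sqrt(sum((v - mean) ** 2 for v in proj) / len(proj))
        if any(abs(v - mean) > k_sigma * sigma for v in proj):
            return j
    return len(projections)


smooth = [1.0, -1.0] * 10          # no element beyond 3 sigma
spiky = [0.0] * 19 + [10.0]        # one element far from the mean
r = normal_subspace_size([smooth, smooth, spiky, smooth])
```

Here the third projection is the first one with a spike, so the first two components form the normal subspace and `r` is 2.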

By Eq. (13), the squared anomaly distance $d_Z(y_i)^2$ equals the squared prediction error (SPE) [25]. We can compute a threshold $\delta$ based on the Q-statistic developed by Jackson and Mudholkar, where

$$\delta^2 = \phi_1 \left[ \frac{c_\alpha \sqrt{2 \phi_2 h^2}}{\phi_1} + 1 + \frac{\phi_2 h (h-1)}{\phi_1^2} \right]^{1/h} \qquad (14)$$

with $c_\alpha$ the $(1-\alpha)$ percentile of the standard normal distribution (15),

$$h = 1 - \frac{2 \phi_1 \phi_3}{3 \phi_2^2}, \qquad (16)$$

$$\phi_k = \frac{1}{(n-1)^k} \sum_{j=r+1}^{m} \lambda_j^{2k} \quad (k = 1, 2, 3). \qquad (17)$$

The NOC first computes the anomaly distance according to Eq. (11). In the last step, if $d_Z(y_i) > \delta$, the NOC identifies $y_i$ as a traffic anomaly.

3.2 Detection Example

[Fig. 5. An example of traffic anomalies by SSM. Top panel: traffic volume of the ATLA-CHIC flow against time; bottom panel: anomaly distance against time, for intervals 5700 to 5900.]

Given an observation $y_i$, we can identify $y_i$ as a traffic anomaly with probability at least $1-\alpha$ if $d_Z(y_i) > \delta$. Here, we use an example from the Abilene network [15] to show the detection procedure. There are 9 routers in the Abilene network, i.e. ATLA, CHIC, HOUS, KANS, LOSA, NEWY, SALT, SEAT, and WASH. We first get the traffic volume series $y$ of each OD flow within 10 days from the volume counter, e.g.

$$y_{\mathrm{ATLA\text{-}CHIC}} = (-4.07 \times 10^7, \ldots, 4.54 \times 10^7), \quad n = 2880 \text{ entries}. \qquad (18)$$

Then we compute the sketch $z_{\mathrm{ATLA\text{-}CHIC}}$ according to Eq. (6), using $n$ random numbers $r_{ik}$ for each of its $l$ entries,

$$z_{\mathrm{ATLA\text{-}CHIC}} = (4.26 \times 10^7, \ldots, 8.73 \times 10^7), \quad l = 300 \text{ entries}. \qquad (19)$$

The NOC collects $z_{\mathrm{O\text{-}D}}$ for all OD flows and organizes them into a matrix $Z$. By applying PCA to $Z$, we get the principal components and the eigenvalues. Consider the observation

$$y_{5857} = (2.24 \times 10^7, \ldots, -2.42 \times 10^8), \quad m = 81 \text{ entries}. \qquad (20)$$

For the observation $y_{5857}$, the NOC computes the anomaly distance $d_Z(y_{5857}) = 2.08 \times 10^8$ according to Eq. (11). The threshold $\delta = 1.79 \times 10^8$ is computed as in Eq. (14). Therefore, $d_Z(y_i) > \delta$, and the NOC identifies the time interval $i = 5857$ as a traffic anomaly.

3.3 Computation Complexity

SSM detects traffic anomalies based on the same principles as Lakhina's method [2], which can detect low-profile coordinated traffic anomalies. If several flows increase at the same time, the anomaly distance exceeds the threshold. In fact, SSM is an approximation algorithm for Lakhina's method that improves the computation complexity.

Theorem 1: The computation complexity and the space requirement of SSM are both $O(wn \log n)$ at the local monitor, where $w$ denotes the maximum possible number of traffic flows. The computation complexity is $O(m^2 \log n)$ and the space requirement is $O(m \log n)$ at the NOC.

Proof: At a local monitor, we first compute the mean of the traffic volumes and then subtract it from $x_{ij}$ to get $y_{ij}$, which takes $O(wn)$ running time. Then, $y_{ij}$ is multiplied by the random numbers $r_{ik}$, which requires $O(wn \log n)$ products. We calculate the sum of the products as the sketch, which requires $O(wn)$ running time. Therefore, the computation complexity is $O(wn \log n)$. The local monitor needs to store the traffic volumes and the random numbers, which requires $O(wn)$ space.

At the NOC, the computation complexity of the SVD of an $l \times m$ matrix is $O(m^2 l)$, which means that the computation complexity is at most $O(m^2 \log n)$. In order to store the sketch matrix, the NOC needs $O(ml) = O(m \log n)$ memory space. At each time step, the NOC uses the observation $y$ and the pre-computed principal components to detect traffic anomalies, which requires only $O(m^2)$ running time. In general, SSM requires $O(m^2 \log n)$ running time and $O(m \log n)$ space at the NOC.

3.4 Error Bound Analysis

First of all, we give a brief introduction to the PCA-based method, which was introduced by Lakhina to detect coordinated low-profile traffic anomalies.
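The threshold of Eq. (14) used in the detection example can be evaluated in a few lines. The following Python sketch is a minimal illustration, not the implementation behind the reported numbers: the function name and the input variances are hypothetical, $c_\alpha$ is taken as the $(1-\alpha)$ quantile of the standard normal distribution, and the sums $\phi_1, \phi_2, \phi_3$ are assumed to be already computed from the residual components.

```python
import math
from statistics import NormalDist


def q_threshold(phi1, phi2, phi3, alpha):
    """Eq. (14): threshold delta (not squared) at the 1 - alpha level."""
    h = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)
    bracket = (c_alpha * math.sqrt(2.0 * phi2 * h ** 2) / phi1
               + 1.0
               + phi2 * h * (h - 1.0) / phi1 ** 2)
    return math.sqrt(phi1 * bracket ** (1.0 / h))


# Hypothetical variances of the projections on the residual components.
variances = [4.0, 2.0, 1.0]
phi = [sum(v ** k for v in variances) for k in (1, 2, 3)]
d_strict = q_threshold(*phi, alpha=0.001)
d_loose = q_threshold(*phi, alpha=0.05)
```

A smaller $\alpha$ gives a larger threshold, so fewer observations are flagged. Given the three sums, the evaluation takes constant time, so the cost at the NOC stays dominated by the SVD.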
Next, we prove that SSM can bound the detection error at a user-specified accuracy level.

3.4.1 PCA-based Anomaly Detection

In Lakhina's method, the traffic volumes $x_{ij}$ from $m$ traffic flows within the sliding window of length $n$ are organized into an $n \times m$ matrix $X$, which is adjusted to a matrix $Y$ with zero column mean, i.e. $y_{ij} = x_{ij} - \bar{x}_{tj}$. PCA is applied on the matrix $Y$, and the SVD of $Y$ is denoted by

\[ Y = \sum_{j} \eta_j u_j v_j^T, \tag{21} \]

where $v_j$ is the principal component and $\eta_j$ is the corresponding singular value. Lakhina found that the traffic volumes of multiple flows had a low intrinsic dimensionality, which means that the normal traffic can effectively reside in an $r$-dimensional subspace with $r \ll m$. An adjusted observation $y_i = (y_{i1}, \ldots, y_{im})^T$ can be decomposed into normal and abnormal subspaces,

\[ y_i = y_{i,\mathrm{normal}} + y_{i,\mathrm{anomaly}}, \tag{22} \]

where

\[ y_{i,\mathrm{normal}} = PP^T y_i, \qquad y_{i,\mathrm{anomaly}} = (I - PP^T) y_i, \tag{23} \]

with $P = [v_1, \ldots, v_r]$. The size of the normal subspace, denoted by $r$, is determined by the $3\sigma$-heuristic. The traffic observation is classified as normal traffic if

\[ \| y_{i,\mathrm{anomaly}} \| \le Q, \tag{24} \]

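where $Q$ is defined in the same way as Eq. (14),

\[ Q^2 = \varphi_1 \left[ \frac{c\sqrt{2\varphi_2 h_0^2}}{\varphi_1} + 1 + \frac{\varphi_2 h_0 (h_0 - 1)}{\varphi_1^2} \right]^{1/h_0}, \tag{25} \]

where

\[ h_0 = 1 - \frac{2\varphi_1\varphi_3}{3\varphi_2^2}, \tag{26} \]

\[ \varphi_k = \sum_{j=r+1}^{m} \sigma_j^{2k} \quad (k = 1,2,3), \tag{27} \]

and $\sigma_j$ is the standard deviation of the projection of the measurements on the $j$-th principal component, which can be estimated as

\[ \sigma_j = \frac{\| Y v_j \|}{\sqrt{n-1}} = \frac{\eta_j}{\sqrt{n-1}}. \tag{28} \]

3.4.2 Error Bound for SSM

In this section, we explain why SSM can compute principal components based on the sketch matrix $Z$ and detect anomalies based on them like Lakhina's method. All proofs assume that the standard normal distribution is used for the sketch computation. Let $R$ be the $n \times l$ random matrix that consists of the random numbers $r_{ik}$ from the standard normal distribution. According to the sketch computation in Eq. (6), we have

\[ z_j = R^T y_j \quad \text{and} \quad Z = R^T Y, \tag{29} \]

where $y_j$ and $z_j$ are columns of $Y$ and $Z$, respectively. The vector $z_j$ is also called the random projection of $y_j$, which has the following properties [20]. Throughout, the projection is taken to be normalized by $1/\sqrt{l}$, i.e. $z_{kj} = \frac{1}{\sqrt{l}} \sum_{i} r_{ik} y_{ij}$, so that squared norms are preserved in expectation.

Lemma 1: Let $R$ be an $n \times l$ random matrix from the standard normal distribution and $z_j = R^T y_j$. We have $E(\|z_j\|^2) = \|y_j\|^2$ and, for $\varepsilon > 0$,

\[ P\left( \left| \|z_j\|^2 - \|y_j\|^2 \right| \ge \varepsilon \|y_j\|^2 \right) < 2 e^{-(\varepsilon^2 - \varepsilon^3) l / 4}. \]

Proof:

\[
\begin{aligned}
E(\|z_j\|^2) &= E\Big( \sum_{k=1}^{l} z_{kj}^2 \Big) = \sum_{k=1}^{l} E\Big( \frac{1}{\sqrt{l}} \sum_{i=t-n+1}^{t} r_{ik} y_{ij} \Big)^2 \\
&= \frac{1}{l} \sum_{k=1}^{l} \Big( \sum_{i=t-n+1}^{t} y_{ij}^2 E(r_{ik}^2) + \sum_{i=t-n+1}^{t} \sum_{i'=i+1}^{t} 2 y_{ij} y_{i'j} E(r_{ik}) E(r_{i'k}) \Big) \\
&= \sum_{i=t-n+1}^{t} y_{ij}^2 = \|y_j\|^2.
\end{aligned} \tag{30}
\]

Let $W_k = \frac{\sqrt{l}}{\|y_j\|} z_{kj} = \frac{1}{\|y_j\|} \sum_{i=t-n+1}^{t} r_{ik} y_{ij}$, which is a standard normal variable. We define

\[ W = \frac{l \, \|z_j\|^2}{\|y_j\|^2} = \sum_{k=1}^{l} W_k^2. \tag{31} \]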

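It follows the chi-square distribution $\chi^2_l$. Therefore, using Markov's inequality,

\[
\begin{aligned}
P\left( \|z_j\|^2 \ge (1+\varepsilon) \|y_j\|^2 \right) &= P\left( W \ge (1+\varepsilon) l \right) = P\left( e^{\theta W} \ge e^{(1+\varepsilon)\theta l} \right) \le \frac{E(e^{\theta W})}{e^{(1+\varepsilon)\theta l}}
\end{aligned} \tag{32}
\]

\[ = \frac{\prod_{k=1}^{l} E(e^{\theta W_k^2})}{e^{(1+\varepsilon)\theta l}} = \left( \frac{E(e^{\theta W_1^2})}{e^{(1+\varepsilon)\theta}} \right)^{l}. \tag{33} \]

Because $W_1$ follows the standard normal distribution,

\[ E(e^{\theta W_1^2}) = \frac{1}{\sqrt{1 - 2\theta}}. \tag{34} \]

The above equation holds for any $\theta < 1/2$. Thus we get

\[ P\left( W \ge (1+\varepsilon) l \right) \le \left( \frac{e^{-2(1+\varepsilon)\theta}}{1 - 2\theta} \right)^{l/2}. \tag{35} \]

The optimal choice of $\theta$ is $\varepsilon / (2(1+\varepsilon))$. So we get

\[ P\left( W \ge (1+\varepsilon) l \right) \le \left( (1+\varepsilon) e^{-\varepsilon} \right)^{l/2} < e^{-(\varepsilon^2 - \varepsilon^3) l / 4}. \tag{36} \]

Similarly,

\[ P\left( W \le (1-\varepsilon) l \right) \le \left( (1-\varepsilon) e^{\varepsilon} \right)^{l/2} < e^{-(\varepsilon^2 - \varepsilon^3) l / 4}. \tag{37} \]

According to the above two equations, we get $P\left( \left| \|z_j\|^2 - \|y_j\|^2 \right| \ge \varepsilon \|y_j\|^2 \right) < 2 e^{-(\varepsilon^2 - \varepsilon^3) l / 4}$.

Besides the standard normal distribution, several other probability distributions have been proposed for the random projection. Alon introduced the tug-of-war algorithm [24], where the random matrix $R$ is generated from the probability distribution

\[ r_{ik} = \begin{cases} -1 & \text{with probability } 1/2 \\ +1 & \text{with probability } 1/2 \end{cases} \tag{38} \]

Later, Achlioptas [19] gave a more efficient algorithm, i.e. the sparse random projection with

\[ r_{ik} = \sqrt{s} \begin{cases} -1 & \text{with probability } 1/2s \\ 0 & \text{with probability } 1 - 1/s \\ +1 & \text{with probability } 1/2s \end{cases} \tag{39} \]

where $s$ is an integer. In the sparse random projection, only $1/s$ of the data need to be processed. Recently, very sparse random projection has been recommended by Li et al. [21], which uses $R$ with entries in $\{-1, 0, 1\}$ with probabilities $\{\frac{1}{2\sqrt{n}}, 1 - \frac{1}{\sqrt{n}}, \frac{1}{2\sqrt{n}}\}$. For the sparse random projection, we have the following properties [19], [21].

Lemma 2: Let $R$ be an $n \times l$ random matrix with entries in $\{-1, 0, 1\}$, scaled as in Eq. (39), with probabilities $\{1/2s, 1 - 1/s, 1/2s\}$, and $z_j = R^T y_j$. For $\varepsilon > 0$, we have $E(\|z_j\|^2) = \|y_j\|^2$ and

\[ P\left( \left| \|z_j\|^2 - \|y_j\|^2 \right| \ge \varepsilon \|y_j\|^2 \right) < 2 e^{-(\varepsilon^2/2 - \varepsilon^3/3) l / 2}. \]

Proof: It is easy to verify that

\[ E(\|z_j\|^2) = \|y_j\|^2. \tag{40} \]

The proof of the second part is not yet complete.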
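The norm-preservation property stated in Lemma 1 and Lemma 2 can be checked numerically. The following is a minimal sketch rather than the authors' implementation: it assumes the projection is normalized by $1/\sqrt{l}$, and the function names (`gaussian_entry`, `tug_of_war_entry`, `sparse_entry`, `project`), the test vector, and the sizes $n = 200$, $l = 400$, $s = 3$ are hypothetical choices made only for illustration.

```python
import math
import random


def gaussian_entry(rng):
    """Standard normal entry (the case assumed in Lemma 1)."""
    return rng.gauss(0.0, 1.0)


def tug_of_war_entry(rng):
    """+1 or -1 with probability 1/2 each, as in Eq. (38)."""
    return 1.0 if rng.random() < 0.5 else -1.0


def sparse_entry(rng, s=3):
    """sqrt(s) * {-1, 0, +1} with probabilities {1/2s, 1 - 1/s, 1/2s}, Eq. (39)."""
    u = rng.random()
    if u < 1.0 / (2 * s):
        return -math.sqrt(s)
    if u < 1.0 / s:
        return math.sqrt(s)
    return 0.0


def project(y, l, entry, rng):
    """Return z = R^T y / sqrt(l), drawing the n x l matrix R column by column."""
    z = []
    for _ in range(l):
        column = [entry(rng) for _ in y]  # k-th column of R (assumed layout)
        z.append(sum(r * v for r, v in zip(column, y)) / math.sqrt(l))
    return z


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(a * a for a in v)


rng = random.Random(0)
y = [rng.uniform(-1.0, 1.0) for _ in range(200)]  # hypothetical centered series
ratios = {
    name: sq_norm(project(y, 400, entry, rng)) / sq_norm(y)
    for name, entry in [
        ("gaussian", gaussian_entry),
        ("tug_of_war", tug_of_war_entry),
        ("sparse", sparse_entry),
    ]
}
print(ratios)  # each ratio ||z||^2 / ||y||^2 should be close to 1
```

With the sparse distribution, roughly a fraction $1 - 1/s$ of the entries are zero, which is where the saving in processed data comes from.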

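In the following part, we can use either the conventional random projection or the sparse random projection, both of which give the same result. We will not distinguish between the two kinds of random projections. For the SVD, i.e. $Z = \sum_j \lambda_j b_j a_j^T$ and $Y = \sum_j \eta_j u_j v_j^T$, the singular values are approximately preserved.

Lemma 3: If $l > \frac{C \log n}{\varepsilon^2}$ for a large enough constant $C$ and an arbitrary positive constant $\varepsilon$, then

\[ (1-\varepsilon) \sum_{j=1}^{r} \eta_j^2 \le \sum_{j=1}^{r} \lambda_j^2 \le (1+\varepsilon) \sum_{j=1}^{r} \eta_j^2 \tag{41} \]

for $r \ge 1$ with probability $1 - 2e^{-\frac{C \log n}{4}}$.

Proof: Because $\lambda_1^2, \ldots, \lambda_r^2$ are the $r$ largest eigenvalues of the matrix $Z^T Z$ and $v_1, \ldots, v_m$ are an orthonormal set of vectors, we have

\[ \sum_{j=1}^{r} \lambda_j^2 \ge \sum_{j=1}^{r} v_j^T (Z^T Z) v_j = \sum_{j=1}^{r} v_j^T Y^T R R^T Y v_j = \sum_{j=1}^{r} \eta_j^2 u_j^T R R^T u_j = \sum_{j=1}^{r} \eta_j^2 \| R^T u_j \|^2. \tag{42} \]

According to the properties of random projection, we have

\[ \sum_{j=1}^{r} \lambda_j^2 \ge \sum_{j=1}^{r} \eta_j^2 (1-\varepsilon) \|u_j\|^2 = (1-\varepsilon) \sum_{j=1}^{r} \eta_j^2. \tag{43} \]

We also know that

\[ \lambda_j^2 = a_j^T Z^T Z a_j = a_j^T Y^T R R^T Y a_j = \| R^T (Y a_j) \|^2 \le (1+\varepsilon) \| Y a_j \|^2. \tag{44} \]

Because $\eta_1^2, \ldots, \eta_r^2$ are the $r$ largest eigenvalues of the matrix $Y^T Y$ and $a_1, \ldots, a_m$ are an orthonormal set of vectors, we have

\[ \sum_{j=1}^{r} \eta_j^2 \ge \sum_{j=1}^{r} a_j^T (Y^T Y) a_j = \sum_{j=1}^{r} \| Y a_j \|^2 \ge \frac{1}{1+\varepsilon} \sum_{j=1}^{r} \lambda_j^2. \tag{45} \]

Therefore, we have

\[ \sum_{j=1}^{r} \lambda_j^2 \le (1+\varepsilon) \sum_{j=1}^{r} \eta_j^2. \tag{46} \]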
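Lemma 3 can be illustrated on synthetic data by comparing the energy in the top $r$ singular values before and after sketching. This is only a sketch under stated assumptions: it uses NumPy, a Gaussian projection scaled by $1/\sqrt{l}$, and hypothetical sizes ($n = 1000$, $m = 20$, $l = 800$, $r = 3$); the traffic matrix is randomly generated and is not the Abilene data.

```python
import numpy as np


def top_energy(M, r):
    """Sum of the r largest squared singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return float(np.sum(s[:r] ** 2))


rng = np.random.default_rng(1)
n, m, l, r = 1000, 20, 800, 3  # hypothetical sizes

# Hypothetical traffic: three strong shared patterns plus weak noise.
patterns = rng.normal(size=(n, r)) * np.array([30.0, 20.0, 10.0])
Y = patterns @ rng.normal(size=(r, m)) + rng.normal(size=(n, m))
Y -= Y.mean(axis=0)  # zero column mean, as for the matrix Y in the text

R = rng.normal(size=(n, l)) / np.sqrt(l)  # assumed 1/sqrt(l) scaling
Z = R.T @ Y  # l x m sketch matrix

ratio_r = top_energy(Z, r) / top_energy(Y, r)
ratio_all = top_energy(Z, m) / top_energy(Y, m)
print(ratio_r, ratio_all)  # both ratios should be close to 1
```

The second ratio is the case $r = m$, where the sums in Eq. (41) reduce to the squared Frobenius norms of $Z$ and $Y$.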

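Next, we want to bound the error of the covariance matrix in order to get a good approximation of the anomaly distance. Because the principal components form an orthonormal set of vectors and $\|Y\|_F^2 = \sum_{j=1}^{m} \eta_j^2$, we can easily get the following result.

Lemma 4: Let $V = Y^T Y$ and $A = Z^T Z$. If $l > \frac{C \log n}{\varepsilon^2}$ for a large enough constant $C$, then with probability $1 - 2e^{-\frac{C \log n}{4}}$, we have

\[ \| V - A \|_F \le \sqrt{2\varepsilon} \, \|Y\|_F^2, \tag{47} \]

where $\|X\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij}^2}$ is the Frobenius norm of a matrix $X \in R^{n \times m}$.

Proof: First, because $a_1, \ldots, a_m$ are an orthonormal set of vectors, we have

\[ \| V - A \|_F^2 = \sum_{j=1}^{m} \| (V - A) a_j \|^2. \tag{48} \]

For each $j = 1, \ldots, m$,

\[ \| (V - A) a_j \|^2 = a_j^T V^T V a_j + a_j^T A^T A a_j - a_j^T V^T A a_j - a_j^T A^T V a_j = \| V a_j \|^2 + \lambda_j^4 - 2 \lambda_j^2 a_j^T V a_j. \tag{49} \]

Because

\[ \lambda_j^2 = a_j^T A a_j = a_j^T Y^T R R^T Y a_j = \| R^T (Y a_j) \|^2, \tag{50} \]

according to the properties of random projection, we have

\[ (1-\varepsilon) \| Y a_j \|^2 \le \| R^T (Y a_j) \|^2 \le (1+\varepsilon) \| Y a_j \|^2 \tag{51} \]

with a high probability. According to $\| Y a_j \|^2 = a_j^T V a_j$, we have

\[ a_j^T V a_j \ge \frac{1}{1+\varepsilon} \lambda_j^2. \tag{52} \]

Because $a_1, \ldots, a_m$ are an orthonormal set of vectors, we have $\sum_{j=1}^{m} \| V a_j \|^2 = \sum_{j=1}^{m} \eta_j^4$. Therefore,

\[ \| V - A \|_F^2 = \sum_{j=1}^{m} \left( \| V a_j \|^2 + \lambda_j^4 - 2 \lambda_j^2 a_j^T V a_j \right) \le \sum_{j=1}^{m} \left( \eta_j^4 + \lambda_j^4 - \frac{2}{1+\varepsilon} \lambda_j^4 \right). \tag{53} \]

Second, $v_1, \ldots, v_m$ are also an orthonormal set of vectors, so

\[ \| V - A \|_F^2 = \sum_{j=1}^{m} \| (V - A) v_j \|^2. \tag{54} \]

For each $j = 1, \ldots, m$,

\[ \| (V - A) v_j \|^2 = v_j^T V^T V v_j + v_j^T A^T A v_j - v_j^T V^T A v_j - v_j^T A^T V v_j = \eta_j^4 + \| A v_j \|^2 - 2 \eta_j^2 v_j^T A v_j. \tag{55} \]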

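We also have

\[ v_j^T A v_j = v_j^T Z^T Z v_j = v_j^T Y^T R R^T Y v_j = \eta_j^2 \| R^T u_j \|^2. \tag{56} \]

According to the properties of random projection, we have

\[ \| R^T u_j \|^2 \ge (1-\varepsilon) \| u_j \|^2 = (1-\varepsilon). \tag{57} \]

Because $v_1, \ldots, v_m$ are also an orthonormal set of vectors, we have $\sum_{j=1}^{m} \| A v_j \|^2 = \sum_{j=1}^{m} \lambda_j^4$. Therefore, we have

\[ \| V - A \|_F^2 = \sum_{j=1}^{m} \left( \eta_j^4 + \lambda_j^4 - 2 \eta_j^2 v_j^T A v_j \right) \le \sum_{j=1}^{m} \left( \eta_j^4 + \lambda_j^4 - 2(1-\varepsilon) \eta_j^4 \right). \tag{58} \]

Finally, based on Eq. (53) and Eq. (58), we get

\[ \| V - A \|_F^2 \le \sum_{j=1}^{m} \left( \varepsilon \eta_j^4 + \frac{\varepsilon}{1+\varepsilon} \lambda_j^4 \right) \le \varepsilon \sum_{j=1}^{m} \left( \lambda_j^4 + \eta_j^4 \right). \tag{59} \]

Based on Lemma 3 and $\|Y\|_F^2 = \sum_{j=1}^{m} \eta_j^2$, we have the following result.

\[
\begin{aligned}
\| V - A \|_F^2 &\le \varepsilon \sum_{j=1}^{m} \left( \lambda_j^4 + \eta_j^4 \right) \le \varepsilon \left[ \Big( \sum_{j=1}^{m} \lambda_j^2 \Big)^2 + \Big( \sum_{j=1}^{m} \eta_j^2 \Big)^2 \right] \\
&\le \varepsilon \left( (1+\varepsilon)^2 + 1 \right) \Big( \sum_{j=1}^{m} \eta_j^2 \Big)^2 = 2\varepsilon (1 + \varepsilon + \varepsilon^2/2) \|Y\|_F^4 \approx 2\varepsilon \|Y\|_F^4.
\end{aligned} \tag{60}
\]

Therefore, $\| V - A \|_F \le \sqrt{2\varepsilon} \|Y\|_F^2$.

According to Eq. (47), we also get a perturbation error bound for the eigenvalues from Mirsky's theorem [26],

\[ \sqrt{ \sum_{j=1}^{m} (\lambda_j^2 - \eta_j^2)^2 } \le \| V - A \|_F \le \sqrt{2\varepsilon} \|Y\|_F^2. \tag{61} \]

Based on Lemma 3 and Lemma 4, we know that $\varphi_k$ can be approximated by $\phi_k$ up to the multiplicative factor $(1 \pm \varepsilon^{k/2})$. Therefore, the threshold $Q$ can be approximated by $\delta$ up to the multiplicative factor $(1 \pm \varepsilon^{1/2})$.

In the following part, we want to prove that $d_Y(y)$ can be approximated by $d_Z(y)$. The column space of a matrix $M$ is the subspace spanned by its columns, which is denoted by $\mathcal{R}(M) = \{ Mx : x \in R^m \}$. The set of all eigenvalues of the matrix $M$ is denoted by $\mathcal{L}(M) = \{ \lambda : Mx = \lambda x, x \ne 0 \}$. And $\Theta$ denotes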

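the canonical angle between two subspaces, $\Theta(M, N) = \sin^{-1} \Sigma$, where $M$ and $N$ are $r$-dimensional subspaces of $R^m$. The columns of their orthogonal bases can be transformed by a unitary matrix to

\[
\begin{pmatrix} I \\ 0 \\ 0 \end{pmatrix} \text{ and } \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix} \text{ if } 2r \le m, \quad \text{or} \quad \begin{pmatrix} I & 0 \\ 0 & I \\ 0 & 0 \end{pmatrix} \text{ and } \begin{pmatrix} \Gamma & 0 \\ 0 & I \\ \Sigma & 0 \end{pmatrix} \text{ if } 2r > m. \tag{62}
\]

The matrix perturbation theorem can be written as follows [27].

Lemma 5 (Matrix Perturbation): Let $M$ have the spectral resolution

\[ \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix} M \begin{pmatrix} U_1 & U_2 \end{pmatrix} = \begin{pmatrix} L_1 & 0 \\ 0 & L_2 \end{pmatrix}, \tag{63} \]

where $(U_1, U_2)$ is unitary with $U_1 \in R^{n \times r}$. Let $B \in R^{n \times r}$ have orthonormal columns, and for any symmetric $H$ of order $r$, let $E = MB - BH$. If $\nu = \min | \mathcal{L}(L_2) - \mathcal{L}(H) | > 0$, then we have

\[ \| \sin \Theta[\mathcal{R}(U_1), \mathcal{R}(B)] \|_F \le \frac{\|E\|_F}{\nu}. \tag{64} \]

We apply the matrix perturbation theorem to the covariance matrices $A$ and $V$, and get an error bound for the anomaly distance.

Theorem 2: If $l > \frac{C \log n}{\varepsilon^2}$ for a large enough constant $C$, then

\[ | d_Z(y) - d_Y(y) | \le \frac{2\sqrt{\varepsilon}}{| \eta_{r+1}^2 - \eta_r^2 |} \|Y\|_F^2 \|y\| \tag{65} \]

with probability $1 - 2e^{-\frac{C \log n}{4}}$.

Proof: Let $Q = [a_1, \cdots, a_r]$, $Q_c = [a_{r+1}, \cdots, a_m]$, $P = [v_1, \cdots, v_r]$, and $P_c = [v_{r+1}, \cdots, v_m]$. We have the following spectral resolutions,

\[ \begin{pmatrix} Q^T \\ Q_c^T \end{pmatrix} A \begin{pmatrix} Q & Q_c \end{pmatrix} = \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix}, \tag{66} \]

\[ \begin{pmatrix} P^T \\ P_c^T \end{pmatrix} V \begin{pmatrix} P & P_c \end{pmatrix} = \begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix}, \tag{67} \]

where $\Lambda_1 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_r^2)$, $\Lambda_2 = \mathrm{diag}(\lambda_{r+1}^2, \ldots, \lambda_m^2)$, $M_1 = \mathrm{diag}(\eta_1^2, \ldots, \eta_r^2)$, and $M_2 = \mathrm{diag}(\eta_{r+1}^2, \ldots, \eta_m^2)$. Here $\mathrm{diag}(\cdot)$ denotes a diagonal matrix. Let $E = VQ - Q\Lambda_1$. Because $Q\Lambda_1 = AQ$, we have

\[ E = VQ - AQ. \tag{68} \]

According to Lemma 5, we have

\[ \| \sin \Theta[\mathcal{R}(P), \mathcal{R}(Q)] \|_F \le \frac{\|E\|_F}{\nu} \le \frac{\| V - A \|_F}{\nu}, \tag{69} \]

where $\nu = | \eta_{r+1}^2 - \lambda_r^2 | \approx | \eta_{r+1}^2 - \eta_r^2 |$. The projection matrices of $\mathcal{R}(P)$ and $\mathcal{R}(Q)$ are $PP^T$ and $QQ^T$, respectively. Then, according to Ref. [27], we have

\[ \| PP^T - QQ^T \|_F = \sqrt{2} \, \| \sin \Theta[\mathcal{R}(P), \mathcal{R}(Q)] \|_F \le \frac{\sqrt{2} \, \| V - A \|_F}{\nu}. \tag{70} \]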

Then we get

\[
\begin{aligned}
| d_Z(y) - d_Y(y) | &= \left| \, \| (I - QQ^T) y \| - \| (I - PP^T) y \| \, \right| \le \| (I - QQ^T) y - (I - PP^T) y \| \\
&= \| (PP^T - QQ^T) y \| \le \| PP^T - QQ^T \|_F \|y\| \le \frac{\sqrt{2} \, \| V - A \|_F \|y\|}{\nu} \\
&\le \frac{2\sqrt{\varepsilon}}{| \eta_{r+1}^2 - \eta_r^2 |} \|Y\|_F^2 \|y\|.
\end{aligned} \tag{71}
\]

Therefore, the anomaly distance can be approximated up to the multiplicative factor $(1 \pm \varepsilon^{1/2})$.

4 ADVANCED SKETCH METHOD (ASM)

Fig. 6. System model of advanced sketch method. (Local Monitor: Packet Stream, Aggregation, Volume Counter, Variance Histograms, Sketch. Network Operation Center: Traffic Volume, PCA Computation, Computing Anomaly Distance, Computing Threshold, alarm if $d > \delta$.)

If the sliding window is so long that a local monitor cannot hold all traffic measurements in memory, we need an alternative method to compute the sketch online. In ASM, we utilize a variance estimation algorithm to maintain an approximation of the sketch in order to reduce the computation complexity and the space requirement at a local monitor. The architecture of the advanced sketch method is shown in Fig. 6. We use Variance Histograms (VH) to maintain an approximation of the sketch for each flow, which is a modification of a variance estimation algorithm [28]. In ASM, the Volume Counter module maintains only one bucket for each traffic flow at the current time interval. The Sketch Computation module is replaced by Variance Histograms. All other modules are the same as in SSM. We first introduce the Variance Histograms and then give the sketch computation algorithm at each local monitor. Finally, we use the same example as for SSM to show the detection procedure in ASM.

4.1 Algorithm

4.1.1 Variance Histograms

A Variance Histogram (VH) contains a list of buckets for each traffic flow, which are maintained by the variance estimation algorithm in Fig. 7. The traffic volume $x_{ij}$ at each time interval is treated as a data element for the variance computation in this section. Given a sequence of data elements $\{ x_{(t-n+1)j}, \ldots, x_{tj} \}$, the variance is defined as

\[ V_{tj} = \sum_{i=t-n+1}^{t} (x_{ij} - \bar{x}_{tj})^2, \tag{72} \]

where $\bar{x}_{tj} = \frac{1}{n} \sum_{i=t-n+1}^{t} x_{ij}$ is the mean of the data elements.
A bucket $B_{pj}$ contains the following statistics for a subsequence of traffic volumes $x_{ij}$:
$\tau_{pj}$: time stamp;
$n_{pj}$: total number of data elements in the subsequence;

$\mu_{pj}$: mean of the data elements in the subsequence;
$V_{pj}$: variance of the data elements in the subsequence;
$Z_{pkj}$: sum of $x_{ij} r_{ik}$ for all $x_{ij}$ in the subsequence;
$R_{pkj}$: sum of the corresponding $r_{ik}$.

Step 1: Check the time stamp of the last bucket $B_{Nj}$
    if $\tau_{Nj} \le t - n$
        delete $B_{Nj}$;
    endif
Step 2: Create a new bucket $B_{1j}$
    $\tau_{1j} = t$; $n_{1j} = 1$; $\mu_{1j} = x_{tj}$; $V_{1j} = 0$;
    for $k = 1, \ldots, l$
        $Z_{1kj} = x_{tj} r_{tk}$; $R_{1kj} = r_{tk}$;
    endfor
Step 3: Traverse the bucket list to merge buckets
    $p = 1$; $B_B = B_{1j}$;
    while $B_{(p+2)j}$ exists
        $B_A = B_{(p+1)j} \cup B_{(p+2)j}$;
        if $n_A + n_B > n/2$
            return
        endif
        if $n_A \le \frac{\varepsilon}{10} n_B$ and $V_{A \cup B} - V_B \le \frac{\varepsilon}{5} V_B$
            delete $B_{(p+2)j}$; $B_{(p+1)j} = B_A$;
        else
            $p = p + 1$; $B_B = B_B \cup B_{pj}$;
        endif
    endwhile

Fig. 7. Procedures for updating VH

The algorithm starts with an empty list of buckets and updates the list in three steps, as shown in Fig. 7. First, when a new data element $x_{tj}$ arrives, the current time stamp is updated to $t$. We check the oldest bucket $B_{Nj}$ and delete it if it has expired, where $N$ denotes the number of buckets in the list. Second, the new element constitutes a new bucket $B_{1j}$ and each old bucket $B_{pj}$ becomes $B_{(p+1)j}$ for $p = 1, \ldots, N$. Last, we check whether there are qualified pairs of buckets that can be merged. Let $B_A = B_{(p+1)j} \cup B_{(p+2)j}$ and $B_B = \bigcup_{q=1}^{p} B_{qj}$. We merge two adjacent buckets $B_{(p+1)j}$ and $B_{(p+2)j}$ if and only if they satisfy the following merging rules.

Rule 1: $V_{A \cup B} - V_B \le \frac{\varepsilon}{5} V_B$.
Rule 2: $n_A \le \frac{\varepsilon}{10} n_B$.
Rule 3: $n_A + n_B \le n/2$.

When two adjacent buckets $B_{pj}$ and $B_{qj}$ merge into a new bucket $B_{(p \cup q)j}$, the merged bucket's time stamp is set to the time stamp of the older one, and the merged bucket's statistics are calculated as follows,

\[ n_{(p \cup q)j} = n_{pj} + n_{qj} \tag{73} \]

\[ \mu_{(p \cup q)j} = \frac{n_{pj} \mu_{pj} + n_{qj} \mu_{qj}}{n_{pj} + n_{qj}} \tag{74} \]

\[ V_{(p \cup q)j} = V_{pj} + V_{qj} + \frac{n_{pj} n_{qj} (\mu_{pj} - \mu_{qj})^2}{n_{pj} + n_{qj}} \tag{75} \]

\[ Z_{(p \cup q)kj} = Z_{pkj} + Z_{qkj} \tag{76} \]

\[ R_{(p \cup q)kj} = R_{pkj} + R_{qkj} \tag{77} \]
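The merge step of Eqs. (73)-(77) can be sketched in a few lines. The code below is an illustration only, not the authors' implementation: the `Bucket` class, the `new_bucket` and `merge` helpers, and the sample volumes and random numbers are hypothetical, and "older time stamp" is interpreted here as the smaller of the two time stamps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    tau: int        # time stamp
    n: int          # number of data elements
    mu: float       # mean of the elements
    V: float        # sum of squared deviations from the mean, as in Eq. (72)
    Z: List[float]  # Z[k] = sum of x_i * r_ik over the elements
    R: List[float]  # R[k] = sum of r_ik over the elements


def new_bucket(t, x, r_row):
    """Step 2 of Fig. 7: a bucket holding the single element x seen at time t."""
    return Bucket(t, 1, x, 0.0, [x * r for r in r_row], list(r_row))


def merge(p, q):
    """Combine two adjacent buckets following Eqs. (73)-(77)."""
    n = p.n + q.n
    return Bucket(
        tau=min(p.tau, q.tau),  # assumed meaning of "the older one"
        n=n,
        mu=(p.n * p.mu + q.n * q.mu) / n,
        V=p.V + q.V + p.n * q.n * (p.mu - q.mu) ** 2 / n,
        Z=[a + b for a, b in zip(p.Z, q.Z)],
        R=[a + b for a, b in zip(p.R, q.R)],
    )


xs = [4.0, 7.0, 1.0, 6.0]  # hypothetical traffic volumes
rs = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]  # hypothetical r_ik, l = 2

merged = new_bucket(0, xs[0], rs[0])
for t in range(1, len(xs)):
    merged = merge(merged, new_bucket(t, xs[t], rs[t]))
print(merged)
```

Merging all four single-element buckets reproduces the mean and the sum of squared deviations of the whole sequence, which is why a bucket never needs to keep the raw volumes.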

Fig. 8. Sketch computation with VH. (The packet stream is aggregated from (SrcIP, DstIP, Size, ...) into (FlowID, Size); for FlowID $= j$, the volume $x_{tj}$ and the products $x_{tj} r_{tk}$ update the buckets $B_{1j}, \ldots, B_{Nj}$.)

Let $B_{\mathrm{all},j} = \bigcup_{p=1}^{N} B_{pj}$ denote the bucket obtained by merging all buckets together, and $\hat{V} = V_{\mathrm{all},j}$ be the estimated variance. We get the following result [28].

Lemma 6: Variance Histogram maintains an $\varepsilon$-approximate variance,

\[ (1-\varepsilon) V \le \hat{V} \le V, \tag{78} \]

with $O(\frac{1}{\varepsilon} \log n)$ space and $O(1)$ running time.

4.1.2 Sketch Computation Algorithm

At a local monitor, we implement a VH for each traffic flow and $n$ pseudo random number generators shared by all traffic flows among local monitors. The architecture for the sketch computation at a local monitor is shown in Fig. 8. The volume counter uses only one bucket to maintain the traffic volume at the current time interval $t$ for each traffic flow. When a time interval ends, the volume counter reports the traffic volume $x_{tj}$ to the Variance Histogram $VH_j$. $VH_j$ updates its buckets as shown in Fig. 7. At each time interval, we can compute an approximation of the sketch as

\[ \hat{z}_{kj} = \left( Z_{\mathrm{all},kj} - n_{\mathrm{all},j} \, \mu_{\mathrm{all},j} \, R_{\mathrm{all},kj} \right), \tag{79} \]

where $n_{\mathrm{all},j}$, $\mu_{\mathrm{all},j}$, $Z_{\mathrm{all},kj}$, and $R_{\mathrm{all},kj}$ are the elements of $B_{\mathrm{all},j} = \bigcup_{p=1}^{N} B_{pj}$.

4.2 Detection Example

NOC collects the sketches $\hat{z}_{kj}$ from the local monitors and follows the same procedure as in SSM to identify traffic anomalies. Here, we use the same example as in Section 3.2. The sketch of the measurement $y_{\mathrm{ATLA\text{-}CHIC}}$ in Eq. (18) is

\[ \hat{z}_{\mathrm{ATLA\text{-}CHIC}} = (1.00 \times 10^{8}, \ldots, 2.04 \times 10^{8}). \tag{80} \]

First, NOC organizes $\hat{z}_{\mathrm{Origin\text{-}Destination}}$ into a matrix $\hat{Z}$. Second, NOC uses the same method as SSM to compute the principal components and eigenvalues. Last, the anomaly distance and the threshold are computed from the anomaly-distance definition and Eq. (14), respectively,

\[ \hat{\delta} = 1.82 \times 10^{8}, \qquad d_{\hat{Z}}(y_{5857}) = 2.11 \times 10^{8}. \tag{81} \]

Therefore, we get $d_{\hat{Z}}(y_i) > \hat{\delta}$ and thus identify $i = 5857$ as a time interval containing traffic anomalies, which is the same result as with SSM.
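The NOC-side procedure (PCA on the sketch matrix, the Q-statistic threshold of Eq. (14), and the comparison of the anomaly distance with the threshold) can be sketched as follows. This is a hedged illustration and not the authors' code. It assumes NumPy, a Gaussian sketch scaled by $1/\sqrt{l}$, an anomaly distance equal to the norm of the residual outside the normal subspace, and $c$ taken as the $(1-\alpha)$ percentile of the standard normal distribution, which is the usual choice for the Q-statistic. The function names, the value $\alpha = 0.001$, and the synthetic data are all hypothetical.

```python
from statistics import NormalDist

import numpy as np


def fit_normal_subspace(Z, r):
    """Return the top-r principal directions (m x r) and all singular values of Z."""
    _, sing_vals, vt = np.linalg.svd(Z, full_matrices=False)
    return vt[:r].T, sing_vals


def q_threshold(sing_vals, n, r, alpha=0.001):
    """Threshold delta from Eqs. (14), (16), (17); alpha is an assumed parameter."""
    resid_var = sing_vals[r:] ** 2 / (n - 1)  # lambda_j^2 / (n - 1) for j > r
    phi1, phi2, phi3 = (np.sum(resid_var ** k) for k in (1, 2, 3))
    h = 1.0 - 2.0 * phi1 * phi3 / (3.0 * phi2 ** 2)
    c = NormalDist().inv_cdf(1.0 - alpha)  # assumed definition of c
    bracket = (c * np.sqrt(2.0 * phi2 * h ** 2) / phi1
               + 1.0 + phi2 * h * (h - 1.0) / phi1 ** 2)
    return float(np.sqrt(phi1 * bracket ** (1.0 / h)))


def anomaly_distance(Q, y):
    """Norm of the component of y outside the normal subspace spanned by Q."""
    return float(np.linalg.norm(y - Q @ (Q.T @ y)))


rng = np.random.default_rng(0)
n, m, l, r = 500, 12, 200, 2  # hypothetical sizes

# Synthetic traffic: two strong shared patterns plus unit-variance noise.
basis = rng.normal(size=(r, m))
Y = rng.normal(size=(n, r)) @ basis * 10 + rng.normal(size=(n, m))
Y -= Y.mean(axis=0)

R = rng.normal(size=(n, l)) / np.sqrt(l)
Z = R.T @ Y  # l x m sketch matrix received by the NOC

Q, sing_vals = fit_normal_subspace(Z, r)
delta = q_threshold(sing_vals, n, r)

in_subspace = Q @ np.array([3.0, -1.0])  # observation inside the normal subspace
spike = in_subspace.copy()
spike[:3] += 100.0  # simultaneous increase on three flows

print(anomaly_distance(Q, in_subspace) > delta)  # normal observation
print(anomaly_distance(Q, spike) > delta)        # coordinated anomaly
```

Only the sketch matrix `Z` and the current observation are needed at the NOC; the full $n \times m$ matrix `Y` appears here only to generate the synthetic sketch.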

Fig. 9. An illustration of sketch approximation. (The window $x_{(t-n)j}, x_{(t-n+1)j}, \ldots, x_{tj}$ of length $n$ is covered by the buckets $B_{(N+1)j}, B_{Nj}, \ldots, B_{2j}, B_{1j}$; the retained buckets span the last $l_j$ elements.)

4.3 Computation Complexity

ASM follows the same procedure as SSM to detect traffic anomalies at the NOC, so it has the same computation complexity and space requirement as SSM there. Because the variance estimation algorithm only needs $O(\frac{1}{\varepsilon} \log n)$ space and $O(1)$ running time, we have the following theorem.

Theorem 3: ASM requires $O(w \log n)$ running time and $O(w \log^2 n)$ space at the local monitor.

Proof: According to the variance estimation algorithm [28], VH only needs $O(1)$ running time to update the variance buckets. But we also need to update the sketches $Z_{pkj}$ and the random numbers $R_{pkj}$. Therefore, ASM needs $O(l)$ running time to update the variance histograms. A bucket needs $O(l)$ space to store the statistics, and we have at most $O(\log n)$ buckets for each traffic flow. In general, the local monitor needs $O(w \log^2 n)$ space and $O(w \log n)$ running time, because $l = O(\log n)$.

4.4 Error Bound Analysis

The approximated sketch $\hat{z}_{kj}$ is a sketch of a subsequence of the traffic volumes within the sliding window, as shown in Fig. 9. We organize $\hat{z}_{kj}$ into an $l \times m$ matrix $\hat{Z}$. Then we have the following result.

Lemma 7: Let $\hat{A} = \hat{Z}^T \hat{Z}$. If $l > \frac{C \log n}{\varepsilon^2}$ for a large enough constant $C$, then

\[ \| \hat{A} - V \|_F \le 2\sqrt{\varepsilon} \, \|Y\|_F^2 \tag{82} \]

with probability $1 - 2e^{-\frac{C}{4} \log n}$.

Proof: The variance estimation algorithm maintains the variance $\hat{V}$ of a subsequence of the data elements in the sliding window of size $n$, which has the property

\[ (1-\varepsilon) V < \hat{V} < V \tag{83} \]

according to Ref. [28]. Let

\[ \hat{y}_j = ( \overbrace{0, \ldots, 0, \underbrace{y_{(t-l_j+1)j}, \ldots, y_{tj}}_{l_j}}^{n} )^T. \tag{84} \]

Because $V = \sum_{i=t-n+1}^{t} (x_{ij} - \bar{x}_{tj})^2 = \|y_j\|^2$ and $\hat{V} = \sum_{i=t-l_j+1}^{t} (x_{ij} - \bar{x}_{tj})^2 = \|\hat{y}_j\|^2$, we have

\[ \| \hat{y}_j - y_j \|^2 = | \hat{V} - V | < \varepsilon V = \varepsilon \|y_j\|^2. \tag{85} \]

According to Eq. (79), we have

\[ \hat{z}_{kj} = \sum_{i=t-l_j+1}^{t} (x_{ij} - \bar{x}_{tj}) r_{ik} = r_k \hat{y}_j, \tag{86} \]

where $r_k = (r_{(t-n+1)k}, \ldots, r_{tk})$. Therefore,

\[ \hat{z}_j - z_j = R^T (\hat{y}_j - y_j), \tag{87} \]

where $\hat{z}_j$ and $z_j$ are the $j$-th columns of $\hat{Z}$ and $Z$, respectively.
We apply the properties of random projection to the vector $\hat{z}_j - z_j$,

\[ \| \hat{z}_j - z_j \|^2 \le (1+\varepsilon) \| \hat{y}_j - y_j \|^2. \tag{88} \]

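Based on Eq. (85) and Eq. (88), we have

\[ \| z_j - \hat{z}_j \|^2 \le \varepsilon \|y_j\|^2, \tag{89} \]

\[ \| \hat{Z} - Z \|_F^2 < \varepsilon \|Y\|_F^2. \tag{90} \]

Because $A = Z^T Z$, we have

\[
\begin{aligned}
\| \hat{A} - A \|_F^2 &= \| \hat{Z}^T \hat{Z} - Z^T Z \|_F^2 = \| \hat{Z}^T \hat{Z} - \hat{Z}^T Z + \hat{Z}^T Z - Z^T Z \|_F^2 \\
&\le \| \hat{Z}^T \hat{Z} - \hat{Z}^T Z \|_F^2 + \| \hat{Z}^T Z - Z^T Z \|_F^2 \le 2\varepsilon \|Y\|_F^4.
\end{aligned} \tag{91}
\]

According to Lemma 4, we have

\[ \| \hat{A} - V \|_F^2 = \| \hat{A} - A + A - V \|_F^2 \le \| \hat{A} - A \|_F^2 + \| A - V \|_F^2 \le 4\varepsilon \|Y\|_F^4. \tag{92} \]

Using Lemma 7, we can bound the threshold $\hat{\delta}$ in ASM. The anomaly distance $d_{\hat{Z}}(y_i)$ can be bounded by a method similar to that of Theorem 2.

Theorem 4: If $l > \frac{C \log n}{\varepsilon^2}$ for a large enough constant $C$, then

\[ | d_{\hat{Z}}(y) - d_Y(y) | \le \frac{2\sqrt{2\varepsilon}}{| \eta_{r+1}^2 - \eta_r^2 |} \|Y\|_F^2 \|y\| \tag{93} \]

with probability $1 - 2e^{-\frac{C}{4} \log n}$.

Proof: Let $\hat{Z} = \sum_j \hat{\lambda}_j \hat{b}_j \hat{a}_j^T$, and let $\hat{Q} = [\hat{a}_1, \cdots, \hat{a}_r]$, $\hat{Q}_c = [\hat{a}_{r+1}, \cdots, \hat{a}_m]$. The spectral resolution of the matrix $\hat{A}$ can be written as

\[ \begin{pmatrix} \hat{Q}^T \\ \hat{Q}_c^T \end{pmatrix} \hat{A} \begin{pmatrix} \hat{Q} & \hat{Q}_c \end{pmatrix} = \begin{pmatrix} \hat{\Lambda}_1 & 0 \\ 0 & \hat{\Lambda}_2 \end{pmatrix}, \tag{94} \]

where $\hat{\Lambda}_1 = \mathrm{diag}(\hat{\lambda}_1^2, \ldots, \hat{\lambda}_r^2)$ and $\hat{\Lambda}_2 = \mathrm{diag}(\hat{\lambda}_{r+1}^2, \ldots, \hat{\lambda}_m^2)$. Let $\hat{E} = V\hat{Q} - \hat{Q}\hat{\Lambda}_1$, where $V = Y^T Y$ as before. Because $\hat{Q}\hat{\Lambda}_1 = \hat{A}\hat{Q}$, we have

\[ \hat{E} = V\hat{Q} - \hat{A}\hat{Q}. \tag{95} \]

According to Lemma 5 (the matrix perturbation theorem), we have

\[ \| \sin \Theta[\mathcal{R}(P), \mathcal{R}(\hat{Q})] \|_F \le \frac{\|\hat{E}\|_F}{\hat{\nu}} \le \frac{\| V - \hat{A} \|_F}{\hat{\nu}}, \tag{96} \]

where $\hat{\nu} = | \eta_{r+1}^2 - \hat{\lambda}_r^2 | \approx | \eta_{r+1}^2 - \eta_r^2 |$. The projection matrices of $\mathcal{R}(P)$ and $\mathcal{R}(\hat{Q})$ are $PP^T$ and $\hat{Q}\hat{Q}^T$, respectively. Then, according to Ref. [27], we have

\[ \| PP^T - \hat{Q}\hat{Q}^T \|_F = \sqrt{2} \, \| \sin \Theta[\mathcal{R}(P), \mathcal{R}(\hat{Q})] \|_F \le \frac{\sqrt{2} \, \| V - \hat{A} \|_F}{\hat{\nu}}. \tag{97} \]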

Then we get

\[
\begin{aligned}
| d_{\hat{Z}}(y) - d_Y(y) | &= \left| \, \| (I - \hat{Q}\hat{Q}^T) y \| - \| (I - PP^T) y \| \, \right| \le \| (I - \hat{Q}\hat{Q}^T) y - (I - PP^T) y \| \\
&= \| (PP^T - \hat{Q}\hat{Q}^T) y \| \le \| PP^T - \hat{Q}\hat{Q}^T \|_F \|y\| \le \frac{\sqrt{2} \, \| V - \hat{A} \|_F \|y\|}{\hat{\nu}} \\
&\le \frac{2\sqrt{2\varepsilon}}{| \eta_{r+1}^2 - \eta_r^2 |} \|Y\|_F^2 \|y\|.
\end{aligned} \tag{98}
\]

5 DISCUSSIONS

We can bound the error of the estimated threshold and the anomaly distance in terms of $\eta_j$ and $Y$. Therefore, our algorithms are an approximation of Lakhina's method, and their accuracy depends on the properties of the covariance matrix $V$ of the traffic measurements. In fact, the PCA-based detection method can have a high false alarm rate when all the eigenvalues of $V$ are close to each other. Fortunately, Lakhina observed that such situations were rare in traffic volume measurements [6]. Only a few eigenvalues are much larger than 0, and the others are close to 0. In this situation, the values of the threshold and the anomaly distance fluctuate only slightly in both SSM and ASM. Therefore, our algorithms can detect traffic anomalies with a low false negative rate.

The random projection method has been proposed for privacy-preserving distributed data mining [29]. If NOC doesn't know the sequence of the random numbers $r_{ik}$ used for the sketch computation, there is no way to guess the traffic volume series. Even if NOC knows $r_{ik}$, NOC doesn't know the length $l_j$ because of the VH algorithm. Also, local monitors can introduce an additional perturbation $\epsilon$ into the random numbers, e.g. $r'_{ik} = r_{ik} + \epsilon w_{ik}$, where $w_{ik}$ is an independent random number from the standard normal distribution. Then NOC needs to solve a set of linear equations,

\[ \hat{z}'_j = R^T \hat{y}_j, \tag{99} \]

where $\hat{z}'_j = R'^T \hat{y}_j$ and $R' = R + \epsilon W$ ($W$ is a matrix with entries $w_{ik}$). The local monitor can specify the parameter $\epsilon$ in order to increase the estimation error of the traffic volume $y_{ij}$.

For the communication cost, suppose the NOC sets the period of updating traffic measurements to $T$. Lakhina's method needs to send a vector of measurements of length $T$, and our algorithm sends a vector of length $l$.
Because $l$ is independent of $T$, our algorithm can reduce the communication cost to a fraction $l/T$ of that of Lakhina's method if $T > l$. If the NOC has enough bandwidth, the sketch computation can also be done at the NOC side. Then our algorithm has the same communication cost as Lakhina's method.

6 EXPERIMENTAL EVALUATION

In the evaluation, we want to determine the size of the sketches that is enough to get a good approximation of Lakhina's method, because a smaller sketch size means that our algorithms require fewer computation resources. Our algorithm is most useful when the time span of the traffic measurements is so long that the local monitors don't have enough space to save them. Thus we only implement ASM in the following evaluation.

Abilene Observatory Data Collections [15] are used as the data set to evaluate the performance of our algorithm. Abilene is the Internet2 backbone network, which spans the continental USA. We use the data collected by Juniper's J-Flow tool between 06/09/2008 and 06/29/2008. We use both BGP and ISIS routing information to aggregate packets into OD flows. When a packet arrives, we first update a temporary list that saves the total traffic volume of each OD flow in every five-minute interval. In this way, we can construct the traffic volume time series. We first apply Lakhina's method to detect anomalies, and then use these detected anomalies as the real

anomalies to evaluate the detection accuracy of our method. We compute both Type I errors and Type II errors with different sizes of the sketches.

\[ \text{Type I} = \frac{\text{number of false anomalies}}{\text{total number of true normal observations}}, \qquad \text{Type II} = \frac{\text{number of false normal observations}}{\text{total number of true anomalies}} \tag{100} \]

Fig. 10. Detection Errors in Abilene Data. (Errors versus length of the sketch; curves: Type I error and Type II error for n = 10 days and for n = 14 days.)

We check the eigenvalues of the measurement matrix $Y$ and choose the size of the normal subspace as $r = 6$, which is proper for our data. We check each observation just after the sliding window, and show the Type I errors and Type II errors of the last week in Fig. 10. We find that both Type I errors and Type II errors decrease quickly at the beginning and then reach a nearly optimal value. If the size of the sketch is more than 100, there is no remarkable decrease in the mean of the errors and only the variance of the errors becomes smaller. Because the number of true anomalies is much smaller than the number of true normal observations, the optimal Type II error is higher than the optimal Type I error. Because there is always some randomness in the data and the length of the sketch should be less than the length of the sliding window, we cannot reduce Type I and Type II errors to zero in practice.

7 CONCLUSION AND FUTURE WORK

In this paper, we study the network-wide traffic anomaly detection problem. Our algorithm achieves $O(w \log n)$ running time and $O(w \log^2 n)$ space at local monitors. The NOC can run the PCA-based detection method with $O(m^2 \log n)$ running time and $O(m \log n)$ space. Our algorithm also enables ISPs to implement the detection method with careful consideration of privacy protection, communication cost, and other resources in a distributed computing environment.
In the future, we will study the detection of traffic anomalies by using various traffic features such as communication patterns.

ACKNOWLEDGMENT

This project has benefited from the use of measurement data collected on the Internet2 network as part of the Internet2 Observatory Project.

REFERENCES

[1] K. Park and W. Willinger, Self-Similar Network Traffic and Performance Evaluation. New York: John Wiley & Sons, Inc., 2000.
[2] A. Lakhina, M. Crovella, and C. Diot, "Diagnosing network-wide traffic anomalies," SIGCOMM Comput. Commun. Rev., vol. 34, no. 4, pp. 219-230, 2004.
[3] H. Ringberg, A. Soule, J. Rexford, and C. Diot, "Sensitivity of PCA for traffic anomaly detection," SIGMETRICS '07, pp. 109-120, 2007.

[4] Y. Huang, N. Feamster, A. Lakhina, and J. J. Xu, "Diagnosing network disruptions with network-wide analysis," SIGMETRICS '07, pp. 61-72, 2007.
[5] A. Lakhina, M. Crovella, and C. Diot, "Mining anomalies using traffic feature distributions," SIGCOMM '05, pp. 217-228, 2005.
[6] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft, "Structural analysis of network traffic flows," SIGMETRICS '04/Performance '04, pp. 61-72, 2004.
[7] X. Li, F. Bian, M. Crovella, C. Diot, R. Govindan, G. Iannaccone, and A. Lakhina, "Detection and identification of network anomalies using sketch subspaces," IMC '06, pp. 147-152, 2006.
[8] P. Barford, J. Kline, D. Plonka, and A. Ron, "A signal analysis of network traffic anomalies," IMW '02, pp. 71-82, 2002.
[9] L. Huang, X. L. Nguyen, M. Garofalakis, J. Hellerstein, M. Jordan, A. Joseph, and N. Taft, "Communication-efficient online detection of network-wide anomalies," INFOCOM '07, pp. 134-142, 2007.
[10] Y. Zhang, Z. Ge, A. Greenberg, and M. Roughan, "Network anomography," IMC '05, pp. 30-30, 2005.
[11] X. Li, F. Bian, H. Zhang, C. Diot, R. Govindan, W. Hong, and G. Iannaccone, "MIND: A distributed multi-dimensional indexing system for network diagnosis," INFOCOM '06, pp. 1-12, 2006.
[12] T. Ahmed, M. Coates, and A. Lakhina, "Multivariate online anomaly detection using kernel recursive least squares," INFOCOM '07, pp. 625-633, 2007.
[13] P. Chhabra, C. Scott, E. Kolaczyk, and M. Crovella, "Distributed spatial anomaly detection," INFOCOM '08, pp. 1705-1713, 2008.
[14] J. Kline, S. Nam, P. Barford, D. Plonka, and A. Ron, "Traffic anomaly detection at fine time scales with Bayes nets," ICIMP '08, pp. 37-46, 2008.
[15] Abilene observatory data collections, www.internet2.edu/observatory/.
[16] CAIDA, www.caida.org.
[17] W. LeFebvre, "CNN.com: Facing a world crisis," http://www.tcsa.org/lisa2001/cnn.txt, 2001.
[18] P. Barford and V. Yegneswaran, "An inside look at botnets," Advances in Information Security, vol. 27, pp. 171-191, 2007.
[19] D. Achlioptas, "Database-friendly random projections: Johnson-Lindenstrauss with binary coins," Journal of Computer and System Sciences, vol. 66, no. 4, pp. 671-687, 2003.
[20] S. S. Vempala, The Random Projection Method. Rhode Island: American Mathematical Society, 2004.
[21] P. Li, T. J. Hastie, and K. W. Church, "Very sparse random projections," KDD '06, pp. 287-296, 2006.
[22] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, "Sketch-based change detection: methods, evaluation, and applications," IMC '03, pp. 234-247, 2003.
[23] C. C. Aggarwal, Data Streams: Models and Algorithms. New York: Springer, 2007.
[24] N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy, "Tracking join and self-join sizes in limited storage," PODS '99, pp. 10-20, 1999.
[25] J. E. Jackson and G. S. Mudholkar, "Control procedures for residuals associated with principal component analysis," Technometrics, pp. 341-349, 1979.
[26] R. Bhatia, Perturbation Bounds for Matrix Eigenvalues. Philadelphia: SIAM, 2007.
[27] G. Stewart and J.-G. Sun, Matrix Perturbation Theory. Boston: Academic Press, 1990.
[28] L. Zhang and Y. Guan, "Variance estimation over sliding windows," PODS '07, pp. 225-232, 2007.
[29] K. Liu and J. Ryan, "Random projection-based multiplicative data perturbation for privacy preserving distributed data mining," IEEE Trans. on Knowl. and Data Eng., vol. 18, no. 1, pp. 92-106, 2006.