QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING



Similar documents
Sequential Non-Bayesian Network Traffic Flows Anomaly Detection and Isolation

The CUSUM algorithm a small review. Pierre Granjon

Quickest Detection in Multiple ON OFF Processes Qing Zhao, Senior Member, IEEE, and Jia Ye

Is Average Run Length to False Alarm Always an Informative Criterion?

Factor analysis. Angela Montanari

Message-passing sequential detection of multiple change points in networks

Statistics Graduate Courses

Statistical Machine Learning

Capacity Limits of MIMO Channels

Least Squares Estimation

Signal Detection. Outline. Detection Theory. Example Applications of Detection Theory

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation

Lecture 8: Signal Detection and Noise Assumption

(67902) Topics in Theory and Complexity Nov 2, Lecture 7

Introduction to Detection Theory

Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.

LECTURE 4. Last time: Lecture outline

Similarity and Diagonalization. Similar Matrices

SYSTEMS OF REGRESSION EQUATIONS

Adaptive Online Gradient Descent

Recall that two vectors in are perpendicular or orthogonal provided that their dot

Factorization Theorems

2.3 Convex Constrained Optimization Problems

Background: State Estimation

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

Component Ordering in Independent Component Analysis Based on Data Power

Detecting Network Anomalies. Anant Shah

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Notes on Determinant

Linear Codes. Chapter Basics

MIMO CHANNEL CAPACITY

A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property

CONTROL SYSTEMS, ROBOTICS, AND AUTOMATION - Vol. V - Relations Between Time Domain and Frequency Domain Prediction Error Methods - Tomas McKelvey

Decision-making with the AHP: Why is the principal eigenvector necessary

Rethinking MIMO for Wireless Networks: Linear Throughput Increases with Multiple Receive Antennas

0.1 Phase Estimation Technique

Big Data - Lecture 1 Optimization reminders

67 Detection: Determining the Number of Sources

171:290 Model Selection Lecture II: The Akaike Information Criterion

Department of Economics

Monte Carlo testing with Big Data

Orthogonal Diagonalization of Symmetric Matrices

Linear Threshold Units

The p-norm generalization of the LMS algorithm for adaptive filtering

Chapter 6: Multivariate Cointegration Analysis

Chapter 6. Orthogonality

Load Balancing and Switch Scheduling

Model order reduction via Proper Orthogonal Decomposition

On Adaboost and Optimal Betting Strategies

Lecture 4: AC 0 lower bounds and pseudorandomness

NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing

MATHEMATICAL METHODS OF STATISTICS

Modeling and Performance Evaluation of Computer Systems Security Operation 1

Performance of TD-CDMA systems during crossed slots

LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS. 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method

2004 Networks UK Publishers. Reprinted with permission.

Load balancing of temporary tasks in the l p norm

Monte Carlo Simulation

Multivariate Normal Distribution

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 16, NO. 5, SEPTEMBER Statistical Analysis of Network Traffic for Adaptive Faults Detection

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Time Series Analysis III

Introduction to Online Learning Theory

QUANTIZED PRINCIPAL COMPONENT ANALYSIS WITH APPLICATIONS TO LOW-BANDWIDTH IMAGE COMPRESSION AND COMMUNICATION. D. Wooden. M. Egerstedt. B.K.

TD(0) Leads to Better Policies than Approximate Value Iteration

By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

Lecture 3: Linear methods for classification

Optimization in R n Introduction

Anomaly detection. Problem motivation. Machine Learning

Direct Methods for Solving Linear Systems. Matrix Factorization

Multivariate Analysis of Variance (MANOVA): I. Theory

Segmentation models and applications with R

Stochastic Inventory Control

Lecture 3: Finding integer solutions to systems of linear equations

Online Hidden Markov Model Parameter Estimation and Minimax Robust Quickest Change Detection in Uncertain Stochastic Processes

System Identification for Acoustic Comms.:

Multisensor Data Fusion and Applications

Assessing the Relative Power of Structural Break Tests Using a Framework Based on the Approximate Bahadur Slope

Probability and Random Variables. Generation of random variables (r.v.)

A Stochastic 3MG Algorithm with Application to 2D Filter Identification

An Adaptive Decoding Algorithm of LDPC Codes over the Binary Erasure Channel. Gou HOSOYA, Hideki YAGI, Toshiyasu MATSUSHIMA, and Shigeichi HIRASAWA

Distributionally Robust Optimization with ROME (part 2)

ALOHA Performs Delay-Optimum Power Control

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.1, No.3,August 2013

A linear algebraic method for pricing temporary life annuities

Transcription:

QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING I. Nikiforov Université de Technologie de Troyes, UTT/ICD/LM2S, UMR 6281, CNRS 12, rue Marie Curie, CS 42060 10004 TROYES CEDEX - FRANCE nikiforov@utt.fr Abstract The quickest change detection/isolation (multidecision) problem is of importance for a variety of applications. Efficient statistical decision tools are needed for detecting and isolating abrupt changes in the properties of stochastic signals and dynamical systems, ranging from on-line fault diagnosis in complex technical systems (like networks) to detection/classification in radar, infrared, and sonar signal processing. Keywords: sequential change detection/isolation, multidecision problems, anomaly detection, network monitoring 1. Introduction The quickest change detection/isolation (multidecision) problem is of importance for a variety of applications. Efficient statistical decision tools are needed for detecting and isolating abrupt changes in the properties of stochastic signals and dynamical systems, ranging from on-line fault diagnosis in complex technical systems (like networks) to detection/classification in radar, infrared, and sonar signal processing. The early on-line fault diagnosis (detection/isolation) in industrial processes (SCADA systems) helps in preventing these processes from more catastrophic failures. The quickest multidecision detection/isolation problem is the generalization of the quickest changepoint detection problem to the case of K 1 post-change hypotheses. It is necessary to detect the change in distribution as soon as possible and indicate which hypothesis is true after a change occurs. Both the rate of false alarms and the misidentification (misisolation) rate should be controlled by given levels. 2. Problem statement Let X 1, X 2,... denote the series of observations, and let ν be the serial number of the last pre-change observation. In the case of multiple hypothesis, there are several possible post-change hypotheses H j, j = 1, 2,..., K 1. Let P j k and E j k denote the probability measure and the expectation when ν = k and H j is the true post-change hypothesis, and let P and E = E 0 0 denote the same when ν =, i.e., there is no change. Let (see [1] for details) C γ = δ = (T, d): 0 l K 1 1 j l K 1 El 0 ( ) inf T r : d r = j γ, (1) r 1

where T is the stopping time, d is the final decision (the number of post-change hypotheses) and the event d r = j denotes the first false alarm of the j-th type, be the class of detection and isolation procedures for which the average run length (ARL) to false alarm and false isolation is at least γ > 1. In the case of detection isolation procedures, the risk associated with the detection delay is defined analogously to Lorden s worst-worst-case and it is given by [1] ESADD(δ) = max sup esssup E j ν [(T ν) + F ν ]. (2) 1 j K 1 0 ν< Hence, the imax optimization problem seeks to Find δ opt C γ such that ESADD(δ opt ) = inf δ C γ ESADD(δ) for every γ > 1, (3) where C γ is the class of detection and isolation procedures with the lower bound γ on the ARL to false alarm and false isolation defined in (1). Another imax approach to change detection and isolation is as follows [2, 3]; unlike the definition of the class C γ in (1), where we fixed a priori the changepoint ν = 0 in the definition of false isolation to simplify theoretical analysis, the false isolation rate is now expressed by the maximal probability of false isolation sup ν 0 P l ν(d = j l T > ν). As usual, we measure the level of false alarms by the ARL to false alarm E T. Hence, define the class C γ,β = δ = (T, d): E T γ, max max sup 1 l K 1 1 j l K 1 ν 0 P l ν(d = j T > ν) β. (4) Sometimes Lorden s worst-worst-case ADD is too conservative, especially for recursive change detection and isolation procedures, and another measure of the detection speed, namely the maximal conditional average delay to detection SADD(T) = sup ν E ν (T ν T > ν), is better suited for practical purposes. In the case of change detection and isolation, the SADD is given by SADD(δ) = max sup 1 j K 1 0 ν< E j ν(t ν T > ν). (5) We require that the SADD(δ) should be as small as possible subject to the constraints on the ARL to false alarm and the maximum probability of false isolation. Therefore, this version of the imax optimization problem seeks to Find δ opt C γ,β such that SADD(δ opt ) = inf SADD(δ)for every γ > 1 and β (0, 1). δ C γ,β (6) A detailed description of the developed theory and some practical examples can be found in the recently published book [4]. 3. Efficient procedures of quickest change detection/isolation Asymptotic theory. In this paragraph we recall a lower bound for the worst mean detection/isolation delay over the class C γ of sequential change detection/isolation

tests proposed in [1]. First, we start with a technical result on sequential multiple hypotheses tests and then we give an asymptotic lower bound for ESADD(δ). Lemma 1. Let (X k ) k 1 be a sequence of i.i.d. random variables. Let H 0,..., H K 1 be K 2 hypotheses, where H i is the hypothesis that X has density f i with respect to some probability measure µ, for i = 0,..., K 1 and assume the inequality def. 0 < ρ i j = f i log f i f j dµ <, 0 i j K 1, to be true. Let E i (N) be the average sample number (ASN) in a sequential test (N, δ) which chooses one of the K hypotheses subject to a K K error matrix A = [ a i j ], where a i j = P i (accepting H j ), i, j = 0,..., K 1. Let us reparameterize the matrix A in the following manner : 1 K 1 l=1 α l α 1... α K 1 γ 1 1 K 1 l=2 β 1,l γ 1... β 1,K 1 γ 2 β 2,1... β 2,K 1............. γ i β i,1... β i,k 1............ γ K 1 β K 1,1... 1 K 2 l=1 β K 1,l γ K 1 Then a lower bound for the ASN E i (N) is given by the following formula : (1 γ i ) ln ( K 1 l=1 E i (N) max α 1 l) log 2, ρ i0 (1 γ i ) ln β 1 ji log 2 max for i = 1,..., K 1, where 1 j i K 1 γ i = γ i + K 1 l=1,l i Theorem 1. Let (Y k ) k 1 be an independent random sequence observed sequentially : P0 if k ν L(Y k ) =, ν = 0, 1, 2,..., for 1 l K 1 P l if k ν + 1 ρ i j β i,l.

The distribution P l has density f l, l = 0,..., K 1. An asymptotic lower bound for ESADD(δ), which extends the result of Lorden [5] to multiple hypotheses case, is : ESADD(T; γ) log γ ρ as γ, where ρ def. = 1 l K 1 is the K-L information. 0 j l K 1 ρ l, j ( def. and 0 < ρ l, j = E l 1 log f ) l(y i ) < f j (Y i ) Generalized CUSUM test. The generalized CUSUM (non recursive) test asymptotically attains the above mentioned lower bound [1]. Let us introduce the following stopping time and final decision Ñ = Ñ 1,..., Ñ K 1 ; d = argñ 1,..., Ñ K 1 of the detection/isolation algorithm. The stopping time Ñ l is responsible for the detection of hypothesis H l : Ñ l = inf Ñ l (k), Ñ l (k) = inf n k : S n k (l, j) h k 1 0 j l K 1 Ñ l = inf n 1 : max S n k 1 k n 0 j l K 1 n (l, j) h, S nk (l, j) = i=k log f l(y i ) f j (Y i ). The generalized matrix recursive CUSUM test, which also attains the asymptotic lower bound, has been considered in [6, 7]. Let us introduce the following stopping time and final decision N r = N 1,..., N K 1 ; d r = arg N 1,..., N K 1 of the detection/isolation algorithm. The stopping time N l is responsible for the detection of hypothesis H l : N l = inf n 1 : Q n(l, j) h, 0 k j K 1 Q n (l, j) = (Q n 1 (l, j) + Z n (l, j)) +, Z n (l, j) = log f l(y n ) f j (Y n ) For some safety critical applications, a more tractable criterion consists in imizing the maximum detection/isolation delay: SADD(δ) = max 1 j K 1 0 ν< sup Eν(T j ν T > ν). (7)

subject to : C γ,β = δ : E T γ, max max sup 1 l K 1 1 j l K 1 ν 0 P l ν(d = j T > ν) β. for 1 l, j l K 1. An asymptotic lower bound in this case is given by the following theorem [3]. Theorem 2. Let (Y k ) k 1 be an independent random sequence observed sequentially : P0 if k ν L(Y k ) =, ν = 0, 1, 2,..., for 1 l K 1 P l if k ν + 1 Then log γ log β 1 SADD(N; γ, β) max ρ, d ρ as γ, β 1, i where ρ d = 1 j K 1 ρ j,0 and ρ i = 1 l K 1 1 j l K 1 ρ l, j. Vector recursive CUSUM test. If γ, β 0 and log γ log β 1 (1 + o(1)), then the above mentioned lower bound can be realized by using the following recursive change detection/isolation algorithm [2] : N r = N r(l), 1 l K 1 d r =arg 1 l K 1 N r(l), where N r (l)=inf n 1 : 0 j l K 1 [ S n (l, j) h l, j ] 0, S n (l, j) = g n (l, 0) g n ( j, 0), g n (l, 0) = (g n 1 (l, 0) + Z n (l, 0)) +, with Z n (l, 0) = log f l(y n ) f 0 (Y n ), g 0(l, 0) = 0 for every 1 l K 1 and g n (0, 0) 0, hd if 1 l K 1 and j = 0 h l, j =. h i if 1 j, l K 1 and j l 4. Applications to network monitoring In this section the above mentioned theoretical results are illustrated by application of the proposed detection/isolation procedures to the problem of network monitoring. Let us consider a network composed of r nodes and n mono-directional links, where y l denotes the volume of traffic on the link l at discrete time k (see details in [8, 9]). For the sake of simplicity, the subscript k denoting the time is omitted now. Let x i, j be the Origin-Destination (OD) traffic demand from node i to node j at time k. The traffic matrix X = x i, j is reordered in the lexicographical order as a column vector X = [ (x (1),..., x (m) ) ]T, where m = r 2 is the number of OD flows. Let us define an n m routing matrix A = [ a l,k ] where 0 al,k 1 represents the fraction of OD flow k volume that is routed through link l. This leads to the linear model Y = A X,

where Y = (y 1,..., y n ) T is the Simple Network Management Protocol (SNMP) measurements. Without loss of generality, the known matrix A is assumed to be of full row rank, i.e., rank A = n. The problem consists in detecting and isolating a significant volume anomaly in an OD flow x i, j by using only SNMP measurements y 1,..., y n. In fact, the main problem with the SNMP measurements is that n m. To overcome this difficulty a parsimonious linear model of non-anomalous traffic has been developed in the following papers [10, 11, 12, 13, 14, 15, 16, 17]. The derivation of this model includes two steps: i) description of the ambient traffic by using a spatial stationary model and ii) linear approximation of the model by using piecewise polynomial splines. The idea of the spline model is that the non-anomalous (ambient) traffic at each time k can be represented by using a known family of basis functions superimposed with unknown coefficients, i.e., it is assumed that X k Bµ k, k = 1, 2,..., where the m q matrix B is assumed to be known and µ t R q is a vector of unknown coefficients such that q < n. Finally, it is assumed that the model residuals together with the natural variability of the OD flows follow a Gaussian distribution, which leads to the following equation: X k = Bµ k + ξ k (8) where ξ k N(0, Σ) is Gaussian noise, with the m m diagonal covariance matrix Σ = diag (σ 2 1,..., σ2 m). The advantages of the detection algorithm based on a parametric model of ambient traffic and its comparison to a non-parametric approach are discussed in [11, 14], (see also [18] for PCA based approach). Hence, the link load measurement model is given by the following linear equation : Y k = A Bµ k + Aξ k = Hµ k + ζ k + [θ l ], (9) where Y k = (y 1,..., y n ) k T and ζ k N(0, AΣA T ). Without any loss of generality, the resulting matrix H = A B is assumed to be of full column rank. Typically, when an anomaly occurs on OD flow l at time ν + 1 (change-point), the vector θ l has the form θ l = ε a(l), where a(l) is the l-th normalized column of A and ε is the intensity of the anomaly. The goal is to detect/isolate the presence of an anomalous vector θ l, which cannot be explained by the ambient traffic model X k Bµ k. Therefore, after the de-correlation transformation, the change detection/isolation problem is based on the following model with nuisance parameter X k : Y k = HX k + ξ k + θ(k, ν), ξ k N(0, σ 2 I n ), k = 1, 2,..., (10) where H is a full rank matrices of size n q, n > q, and θ(k, ν) is a change occurring at time ν + 1, namely : 0 if k ν θ(k, ν) = θ l if k ν + 1, 1 l K 1.

This problem is invariant under the group G = Y g(y) = Y + HX (see details in [19]). The invariant test is based on maximal invariant statistics. The solution is the projection of Y on the orthogonal complement R(H) of the column space R(H) of the matrix H. The parity vector Z = WY is a maximal invariant to the group G. WH = 0, W T W = P H = I r H(H T H) H T, WW T = I n q. Transformation by W removes the interference of the nuisance parameter X Z = WY = Wξ (+Wθ). Hence, the sequential change detection/isolation problem can be re-written as Z k = WY k = Wξ k + Wθ(k, ν), ξ k N(0, σ 2 I n q ), k = 1, 2,.... Theorem 3. Let (Y k ) k 1 be the output of the model given by (10) observed sequentially. Then the generalized CUSUM or matrix recursive CUSUM tests attain the lower bound corresponding to the imax setup : ESADD(N; γ) log γ ρ as γ, ρ def. = inf X l,x j 1 l K 1 ρ l, j(x l, X j ) 0 j l K 1 where X l (resp. X j ) corresponds to the hypothesis H l (resp. H j ). The vector recursive CUSUM test attains the lower bound log γ log β 1 SADD(N; γ, β) max, as γ, β 0, log γ log β 1 (1 + o(1)), where ρ d ρ i ρ d = inf ρ j,0(x j, X 0 ) and ρ i = inf 1 j K 1 X j,x 0 X l,x j 1 l K 1 5. Acknowledgement ρ l, j(x l, X j ). 1 j l K 1 This work was partially supported by the French National Research Agency (ANR) through ANR CSOSG Program (Project ANR-11-SECU-0005). REFERENCES 1. Nikiforov, I.V., A generalized change detection problem. IEEE Transactions on Information Theory, 41(1) : 171-187, 1995. 2. Nikiforov, I.V., A simple recursive algorithm for diagnosis of abrupt changes in random signals. IEEE Transactions on Information Theory, 46 (7) :2740-2746, 2000. 3. Nikiforov, I.V. A lower bound for the detection/isolation delay in a class of sequential tests, IEEE Transactions on Information Theory, 49 (11), p. 3037-3047, 2003.

4. A. Tartakovsky, I. Nikiforov and M. Basseville (2014) Sequential Analysis : Hypothesis Testing and Changepoint Detection, CRC Press, Taylor & Francis Group. 5. Lorden, G. Procedures for reacting to a change in distribution, Annals Math. Statistics, vol.42, pp. 1897-1908, 1971. 6. Oskiper, T. and Poor, H. V. Online activity detection in a multiuser environment using the matrix CUSUM algorithm, IEEE Transactions on Information Theory, 48 (2), p. 477 493, 2002 7. Tartakovsky, A. G. Multidecision Quickest Change-Point Detection: Previous Achievements and Open Problems, Sequential Analysis, 27, p. 201 231, 2008 8. Lakhina, A. et al., Diagnosing network-wide traffic anomalies, in SIGCOMM, 2004. 9. Zhang, Y. et al. (2005): Network anomography, in IMC 05. 10. Fillatre, L., Nikiforov, I., Vaton, S. Sequential Non-Bayesian Change Detection- Isolation and Its Application to the Network Traffic Flows Anomaly Detection, in Proceedings of the 56th Session of ISI, Lisboa, 22-29 August 2007, pp. 1-4, (special session). 11. Casas, P., Fillatre, L., Vaton, S., Nikiforov, I. Volume Anomaly Detection in Data Networks: an Optimal Detection Algorithm vs. the PCA Approach. International Workshop on Traffic Management and Traffic Engineering for the Future Internet, FITraMEn 08, 11-12 Decembre 2008, Porto. 8 p. 12. Fillatre, L., Nikiforov, I., Vaton, S., Casas, P. Network Traffic Flows Anomaly Detection and Isolation. 4th edition of the International Workshop on Applied Probability, IWAP 2008, 7-10 July 2008, Compiègne. 1-6 p, (invited paper). 13. Fillatre, L., Nikiforov, I., Casas, P., Vaton, S. Optimal volume anomaly detection in network traffic flows. 16th European Signal Processing Conference (EU- SIPCO 08), 25-29 August 2008, Lausanne. 5 p. 14. Casas, P., Fillatre, L., Vaton S., Nikiforov I. Volume anomaly detection in data networks : an optimal detection algorithm vs the PCA approach. Lecture Notes in Computer Science, 2009, vol. 5464, n. 96, pp. 96-113 15. Casas, P., Vaton, S., Fillatre, L., Nikiforov, I. V. Optimal Volume Anomaly Detection and Isolation in Large-Scale IP Networks Using Coarse-Grained Measurements, Computer Networks, vol. 54, pp. 1750-1766, 2010. 16. Casas, P., Fillatre, L., Vaton, S., Nikiforov, I. Reactive Robust Routing: Anomaly Localization and Routing Reconfiguration for Dynamic Networks. Journal of Network and Systems Management, vol.19, n. 1, p. 58-83, 2010. 17. Fillatre, L., Nikiforov, I. Asymptotically Uniformly Minimax Detection and Isolation in Network Monitoring. IEEE Transactions on Signal Processing, vol. 60, no. 7, July, pp. 3357-3371, 2012. 18. Ringberg, H. et al. (2007) Sensitivy of PCA for traffic anomaly detection, in SIG- METRICS. 19. M. Fouladirad and I. Nikiforov, Optimal statistical fault detection with nuisance parameters, Automatica, vol. 41, no. 7, pp. 1157 1171, July 2005.