Time Series Analysis of Network Traffic

Cyriac James
IIT Madras
February 9, 2011

Outline of the presentation
- Background
- Motivation for the Work
- Network Trace
- Analysis and Results
- Conclusion
- Drawbacks and Future Work

Background
- Traffic analysis: looking for invariants
- Modeling and prediction: linear models
- Features
- Assumptions: stationarity, predictability, linear fit

Motivation for the Work
- Network traffic is bursty and non-Poisson (Ref: W. E. Leland et al., V. Paxson et al.).
- This can contradict the assumptions of stationarity, linear fit and predictability.
- Stationarity doesn't imply predictability.
- Experiments: in the context of a TCP SYN flood attack.

Figure: Time Series and Prediction. Panels (a) Prediction-1 and (b) Prediction-2 each plot the actual and predicted data set against the polling interval (legend: Polling Interval = 1s).

Figure: Panels (a) Detection-1 and (b) Detection-2 each plot the predicted and actual data set against the polling interval (legend: Polling Interval = 1s), with the attack period marked. Panel (b) carries the annotation "Model Getting Adjusted to Attack".

Figure: Prediction of WGN. Actual vs predicted values of Gaussian white noise (magnitude of the process against sampling interval).
Ideally: stationary and predictable.

Unanswered Questions
- Are the assumptions true?
- Can predictability be quantified?
- Is the feature a good one, across networks and at all times?
- How often do the model parameters need to be re-estimated?
- What is the relation between stability, stationarity, ACF, Hurst exponent, etc.?
- Currently lacking: a systematic approach.

Network Trace
Figure: Tenet Network Architecture
- Traces collected using tcpdump.
- Link Bandwidth: 4Mbps
- Feature: SYN - SYN/ACK (half open count)
- Data Set-1: 26th July 2010 to 30th July 2010
- Data Set-2: 23rd August 2010 to 27th August 2010
- Data Set-3: 20th September 2010 to 24th September 2010

Figure: Original Time Series. Four panels (Data Set 1: Monday, Tuesday, Wednesday, Thursday) plot the half open count against the sampling interval (seconds).

Time Series Transformations
- Study of a time-invariant feature: inconclusive.
- Transformation of the time series: differencing and averaging.
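The two transformations named above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the slides do not give the averaging window or say whether the differences are signed, so `window` is a hypothetical parameter and the difference below is the plain signed first difference.

```python
def first_difference(series):
    """Return d[t] = x[t] - x[t-1] for t = 1 .. n-1 (signed; an assumption)."""
    return [b - a for a, b in zip(series, series[1:])]


def moving_average(series, window):
    """Return the sliding mean over `window` consecutive samples.

    `window` is a hypothetical parameter; the slides do not state its value.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])
    averaged = [total / window]
    for i in range(window, len(series)):
        # Slide the window by one sample: add the new value, drop the oldest.
        total += series[i] - series[i - window]
        averaged.append(total / window)
    return averaged


half_open = [3, 5, 4, 8]  # toy half open counts, not measured data
print(first_difference(half_open))
print(moving_average(half_open, 2))
```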

Figure: Difference Time Series. Four panels (Data Set 1: Monday, Tuesday, Wednesday, Thursday) plot the first difference of the half open count against the sampling interval (seconds).

Figure: Average Time Series. Four panels (Data Set 1: Monday, Tuesday, Wednesday, Thursday) plot the average half open count against the sampling interval (seconds).

Analysis and Results

Stationarity Check
First and second order moments should be invariant over time.

(a) Mean: Original Series
Day         Data Set-1   Data Set-2   Data Set-3
Monday      13.4132      7.8968       8.163
Tuesday     11.431       8.1568       6.747
Wednesday   14.949       8.4121       4.9967
Thursday    14.374       8.4447       4.7113
Friday      13.3957      8.2669       6.29
Average     13.349       8.2355       6.1152

(b) Mean: Difference Series
Day         Data Set-1   Data Set-2   Data Set-3
Monday      5.543        5.499        4.8475
Tuesday     5.8          5.958        4.2121
Wednesday   5.6499       5.1832       3.7834
Thursday    5.7435       5.452        3.674
Friday      5.373        5.2722       3.9951
Average     5.4615       5.1292       4.117

(c) Mean: Average Series
Day         Data Set-1   Data Set-2   Data Set-3
Monday      13.4352      7.8969       8.1667
Tuesday     11.4376      8.1533       6.7134
Wednesday   14.162       8.41         4.9954
Thursday    14.383       8.4461       4.7138
Friday      13.399       8.2724       6.13
Average     13.3524      8.23572      6.124

Stationarity and Auto-Correlation Function (ACF)
- Studies define stationarity in terms of the ACF (Ref: H. Liu and M. S. Kim; G. Kirchgassner and J. Wolters).
- Fast decaying ACF => stationary
- Slow decaying ACF => non-stationary
- Remember: all the time series were found to be non-stationary.
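For reference, a sample ACF of the kind discussed above can be computed as below. This is a sketch using the standard biased estimator (normalising by the lag-0 sum); the slides do not say which estimator or how the ensemble average over days was formed, so both are left out here.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelation at lags 0 .. max_lag.

    Uses the usual biased estimator: the lag-k sum of products of
    deviations from the mean, divided by the lag-0 sum.
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; ACF is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# A steadily trending toy series: the ACF falls off slowly with lag.
print(sample_acf(list(range(50)), 5))
```

How fast the values fall towards zero as the lag grows is what the "fast decaying" and "slow decaying" labels refer to.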

Figure: ACF Plot: Original Series. Three panels (Data Set 1, Data Set 2, Data Set 3; ensemble average) plot the ACF against lag.

Figure: ACF Plot: Difference Series. Three panels (Data Set 1, Data Set 2, Data Set 3; ensemble average) plot the ACF against lag.

Figure: ACF Plot: Average Series. Three panels (Data Set 1, Data Set 2, Data Set 3; ensemble average) plot the ACF against lag.

Stationarity and Stability
- Definition: stability => stationarity (Ref: G. Kirchgassner and J. Wolters).
- Consider the series as a Linear Prediction (LP) process:

  $a_t = \alpha_1 a_{t-1} + \alpha_2 a_{t-2} + \epsilon_t$   (1)

  where $a_t, a_{t-1}, \ldots$ is the time series data, $\alpha_1$ and $\alpha_2$ are the model coefficients and $\epsilon_t$ is the random shock or residual at time $t$.
- Yule-Walker estimation: $\alpha_1$ and $\alpha_2$
- Characteristic equation:

  $x^2 - \alpha_1 x - \alpha_2 = 0$   (2)

Table: Roots of Original Series
Data Set   Magnitude of the Root
1          0.61
2          0.5322
3          0.564
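The stability check above can be sketched in a few lines. For an order-2 model the Yule-Walker equations are $\rho_1 = \alpha_1 + \alpha_2 \rho_1$ and $\rho_2 = \alpha_1 \rho_1 + \alpha_2$, where $\rho_k$ is the lag-$k$ autocorrelation; the model is stable when both roots of the characteristic equation have magnitude below 1. The function names are illustrative, and the sample autocorrelation estimator is an assumption, since the slides do not specify one.

```python
import cmath


def autocorr(series, lag):
    """Sample autocorrelation at one lag (biased estimator; an assumption)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / c0


def yule_walker_ar2(series):
    """Solve the two Yule-Walker equations for (alpha_1, alpha_2)."""
    r1, r2 = autocorr(series, 1), autocorr(series, 2)
    denom = 1.0 - r1 * r1
    alpha1 = r1 * (1.0 - r2) / denom
    alpha2 = (r2 - r1 * r1) / denom
    return alpha1, alpha2


def characteristic_roots(alpha1, alpha2):
    """Roots of x^2 - alpha_1 x - alpha_2 = 0 (possibly complex)."""
    disc = cmath.sqrt(alpha1 * alpha1 + 4.0 * alpha2)
    return (alpha1 + disc) / 2.0, (alpha1 - disc) / 2.0


def is_stable(alpha1, alpha2):
    """True when both roots lie strictly inside the unit circle."""
    return all(abs(root) < 1.0 for root in characteristic_roots(alpha1, alpha2))


a1, a2 = yule_walker_ar2([4, 7, 5, 9, 6, 8, 5, 7, 6, 9])  # toy data
print(a1, a2, [abs(r) for r in characteristic_roots(a1, a2)])
```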

Smoothness Factor
- Matthew Roughan et al., in their work on modeling backbone traffic, quantified the smoothness of the time series.
- Relative Variance (RV): variance divided by the mean.
- Lower RV implies a smoother series.

Table: Smoothness
Data Set   Original Series   Difference Series   Average Series
1          8.413             12.2682             3.5591
2          14.728            19.3858             2.1594
3          1.61624           17.1416             2.4271
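Relative variance as defined above is a one-line computation. The sketch below uses the population variance (dividing by n); whether the study used the population or the sample variance is not stated, so that choice is an assumption.

```python
def relative_variance(series):
    """Return variance / mean, using the population variance (an assumption)."""
    n = len(series)
    mean = sum(series) / n
    if mean == 0:
        raise ValueError("relative variance is undefined for a zero mean")
    variance = sum((x - mean) ** 2 for x in series) / n
    return variance / mean


# Toy values, not measured data: mean 5, variance 4.
print(relative_variance([2, 4, 4, 4, 5, 5, 7, 9]))
```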

Hurst Exponent Estimation
- Measure of burstiness.
- H = 0.5: white Gaussian noise.
- 0 < H < 0.5: mean-reverting and less bursty series.
- 0.5 < H < 1: bursty and trend-reinforcing series.
- Rescaled Range estimator used.

Table: Hurst exponent
Data Set   Original Series   Difference Series   Average Series
1          0.7178            0.622               0.4486
2          0.5978            0.5616              0.4238
3          0.6244            0.5732              0.4258
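A minimal sketch of a rescaled range (R/S) estimator is given below. It follows the textbook recipe: split the series into non-overlapping blocks, compute R/S per block, average per block size, and take the slope of log(R/S) against log(block size) as H. The block sizes (powers of two starting at a hypothetical `min_size`) are an assumption; the slides do not describe the exact variant used.

```python
import math


def rescaled_range(block):
    """Return R/S for one block, or None if the block is constant."""
    n = len(block)
    mean = sum(block) / n
    cumulative, lowest, highest = 0.0, 0.0, 0.0
    for x in block:
        cumulative += x - mean          # running sum of deviations
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / n)
    return (highest - lowest) / std if std > 0 else None


def hurst_rs(series, min_size=8):
    """Estimate H as the least-squares slope of log(R/S) vs log(block size)."""
    log_sizes, log_rs = [], []
    size = min_size
    while size <= len(series) // 2:
        values = []
        for start in range(0, len(series) - size + 1, size):
            rs = rescaled_range(series[start:start + size])
            if rs is not None:
                values.append(rs)
        if values:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(values) / len(values)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for the chosen block sizes")
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_sizes)
    return num / den
```

The plain R/S estimator is known to be biased on short series, so estimates near 0.5 should be read with some caution.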

Modeling and Prediction
- LP model with order 2.
- Parameter estimation: Yule-Walker method.
- Training data: Monday to Thursday.
- Testing: Friday.

Table: Average Relative Error
Data Set   Original Series   Difference Series   Average Series
1          0.4613            0.964               0.168
2          0.787             0.954               0.26
3          0.7776            0.8767              0.327
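The evaluation above can be illustrated with a one-step-ahead order-2 predictor and an error measure. The slides do not define "average relative error"; the sketch assumes the mean of |actual - predicted| / |actual|, skipping samples where the actual value is zero. The optional `mean` argument (centering the series before prediction) is likewise an assumption.

```python
def predict_ar2(series, alpha1, alpha2, mean=0.0):
    """One-step-ahead predictions for t = 2 .. n-1 of an order-2 LP model.

    `mean` is subtracted before and added back after prediction; passing
    0.0 applies the coefficients to the raw values.
    """
    return [
        mean + alpha1 * (series[t - 1] - mean) + alpha2 * (series[t - 2] - mean)
        for t in range(2, len(series))
    ]


def average_relative_error(actual, predicted):
    """Mean of |a - p| / |a| over samples with a != 0 (assumed definition)."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to compare against")
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Toy usage: coefficients would come from training days, here they are made up.
friday = [10, 12, 11, 13, 12, 14]
predicted = predict_ar2(friday, 0.6, 0.3)
print(average_relative_error(friday[2:], predicted))
```

Predictions start at the third sample, so they are compared against `series[2:]`.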

Modeling and Prediction
Figure: Actual vs Predicted, Data Set 1. Three panels plot actual and predicted values of the half open count, the first difference of the half open count, and the average half open count against the sampling interval (in seconds).

Detection
Figure: FP vs FN. Probability of False Negative (FN) and Probability of False Positive (FP) plotted against the threshold; annotation: "Threshold Value of Zero FN and 3% FP".

Detection
Figure: FP vs FN. Panel title: Prediction Error During an Attack: Data Set 1. Prediction error plotted against the sampling interval (in seconds), with the attack period marked.
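The FP/FN trade-off shown above can be reproduced in outline by sweeping a threshold over labelled prediction errors. The slides do not state how the prediction error is normalised or how an alarm is raised; the sketch assumes an alarm whenever the error exceeds the threshold, and the input labels are hypothetical.

```python
def fp_fn_rates(errors, is_attack, threshold):
    """Return (false positive rate, false negative rate) for one threshold.

    An alarm is assumed to fire when the prediction error exceeds the
    threshold. `is_attack` holds one boolean label per sample.
    """
    normal = sum(1 for attack in is_attack if not attack)
    attacks = len(is_attack) - normal
    false_pos = sum(
        1 for e, attack in zip(errors, is_attack) if not attack and e > threshold
    )
    false_neg = sum(
        1 for e, attack in zip(errors, is_attack) if attack and e <= threshold
    )
    fp_rate = false_pos / normal if normal else 0.0
    fn_rate = false_neg / attacks if attacks else 0.0
    return fp_rate, fn_rate


# Toy errors and labels, not measured data.
errors = [0.1, 0.2, 0.9, 0.8, 0.3]
labels = [False, False, True, True, False]
for threshold in (0.05, 0.25, 0.5, 0.85):
    print(threshold, fp_fn_rates(errors, labels, threshold))
```

Raising the threshold lowers the false positive rate and raises the false negative rate, which is the crossover the FP vs FN figure depicts.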

Conclusion
- The assumption of stationarity is not correct in all cases.
- Stability does not imply stationarity.
- The ACF graph alone cannot establish stationarity.
- Transformations appear promising.
- Predictable series: slowly decaying ACF, low Hurst exponent and low relative variance.
- Open question: over what window is the series stationary?

Drawbacks and Future Work
- Statistical significance tests.
- Other transformations: median smoothing and mean differencing.
- Hour-based analysis.
- Repeat the experiments with traffic traces from a different source or network.
- Compare the traffic characteristics at the edge and in the core network.
- Applications: anomaly detection, bandwidth management.

Publications
1. Cyriac James and Hema A. Murthy, "Time Series Analysis of Network Data: A Case Study," in Third International Workshop on Network Science for Communication Networks, in conjunction with IEEE Infocom 2011. Status: submitted on 15th January 2011.

References
G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control. Pearson Education, 1994.
D. M. Divakaran, H. A. Murthy, and T. A. Gonsalves, "Detection of SYN flooding attacks using linear prediction analysis," 14th IEEE International Conference on Networks, pp. 1-6, September 2006.
G. Zhang, S. Jiang, G. Wei, and Q. Guan, "A prediction-based detection algorithm against distributed denial-of-service attacks," in Proceedings of the International Conference on Wireless Communications and Mobile Computing (IWCMC), June 2009.
J. Cheng, J. Yin, C. Wu, B. Zhang, and Y. Liu, "DDoS attack detection method based on linear prediction model," in ICIC, 2009.
W. U. Qing-tao and S. Zhi-qing, "Detecting DDoS attacks against web server using time series analysis," Wuhan University Journal of Natural Sciences, vol. 11, no. 1, pp. 175-180, 2006.
W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, "On the self-similar nature of Ethernet traffic," 1994.

V. Paxson and S. Floyd, "Wide-Area Traffic: The Failure of Poisson Modeling," 1995.
T. Karagiannis, M. Molle, and M. Faloutsos, "Long-range dependence: ten years of Internet traffic modeling," in IEEE Internet Computing, vol. 8, Sept 2004, pp. 57-64.
M. Roughan and J. Gottlieb, "Large Scale Measurement and Modeling of Backbone Internet Traffic," in Internet Performance and Control of Network Systems, 2002.
B. Qian and K. Rasheed, "Hurst Exponent and Financial Market Predictability," in IASTED Conference on Financial Engineering and Applications, 2004.
H. Liu and M. S. Kim, "Real-time detection of stealthy DDoS attacks using time-series decomposition," in Proceedings of ICC, July 2010.
G. Kirchgassner and J. Wolters, Introduction to Modern Time Series Analysis. Springer, 2007.