Time Series Analysis of Network Traffic
Cyriac James
IIT MADRAS
February 9, 2011
Outline of the presentation
  Background
  Motivation for the Work
  Network Trace
  Analysis and Results
  Conclusion
  Drawbacks and Future Work
Background
  Traffic Analysis: Looking for Invariants
  Modeling and Prediction: Linear Models
  Features
  Assumptions:
    Stationarity
    Predictability
    Linear fit
Motivation for the Work
  Network traffic is bursty and non-Poisson (Ref: W. E. Leland et al., V. Paxson et al.).
  This can contradict the assumptions:
    Stationarity
    Linear fit
    Predictability
  Stationarity does not imply predictability.
  Experiments: in the context of a TCP SYN flood attack
Figure: Time Series and Prediction. Panels: (a) Prediction-1, (b) Prediction-2. Actual vs. predicted data, polling interval = 1s. x-axis: Polling Interval; y-axis: Data set.
Figure panels: (a) Detection-1, (b) Detection-2. Predicted vs. actual data, polling interval = 1s, with the attack period marked; panel (b) is annotated "Model Getting Adjusted to Attack". x-axis: Polling Interval; y-axis: Data set.
Figure: Prediction of WGN. Actual vs. predicted, Gaussian white noise. x-axis: Sampling Interval; y-axis: Magnitude of the process.
  Ideally: Stationary and Predictable
Unanswered Questions
  Are the assumptions true?
  Quantify predictability?
  Good feature? Across networks at all times?
  How often do model parameters need to be re-estimated?
  Relation between stability, stationarity, ACF, Hurst exponent, etc.
  Currently lacking: a systematic approach
Network Trace
  Figure: Tenet Network Architecture
  Traces collected using tcpdump.
  Link Bandwidth: 4Mbps
  Feature: SYN - SYN/ACK (half-open count)
  Data Set-1: 26th July 2010 to 30th July 2010
  Data Set-2: 23rd August 2010 to 27th August 2010
  Data Set-3: 20th September 2010 to 24th September 2010
Figure: Original Time Series. Panels: Data Set 1, Monday, Tuesday, Wednesday and Thursday. x-axis: Sampling Interval (Seconds); y-axis: Half open count.
Time Series Transformations
  Study on time-invariant feature: inconclusive.
  Transformation of Time Series
    Differencing and Averaging
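The two transformations can be sketched as below. The slides give neither the exact differencing rule nor the averaging window, so both are assumptions here: the absolute first difference is used because the difference series is reported with positive means, and `window` is a hypothetical parameter.

```python
def first_difference(series):
    """Absolute first difference |x[t] - x[t-1]| (the absolute value is an assumption)."""
    return [abs(curr - prev) for prev, curr in zip(series, series[1:])]


def moving_average(series, window):
    """Mean of every run of `window` consecutive samples (window size is an assumption)."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running = sum(series[:window])
    averaged = [running / window]
    for i in range(window, len(series)):
        # Slide the window by one sample: add the new value, drop the oldest.
        running += series[i] - series[i - window]
        averaged.append(running / window)
    return averaged
```

Averaging leaves the mean of the series almost unchanged while reducing its variance, which is why it is a natural candidate for a smoother, more predictable series.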
Figure: Difference Time Series. Panels: Data Set 1, Monday, Tuesday, Wednesday and Thursday. x-axis: Sampling Interval (Seconds); y-axis: Half open count: first difference.
Figure: Average Time Series. Panels: Data Set 1, Monday, Tuesday, Wednesday and Thursday. x-axis: Sampling Interval (Seconds); y-axis: Average half open count.
Analysis and Results
Stationarity Check
  First and second order moments should be time-invariant.

  (a) Mean: Original Series
  Day         Data Set-1   Data Set-2   Data Set-3
  Monday      13.4132      7.8968       8.163
  Tuesday     11.431       8.1568       6.747
  Wednesday   14.949       8.4121       4.9967
  Thursday    14.374       8.4447       4.7113
  Friday      13.3957      8.2669       6.29
  Average     13.349       8.2355       6.1152

  (b) Mean: Difference Series
  Day         Data Set-1   Data Set-2   Data Set-3
  Monday      5.543        5.499        4.8475
  Tuesday     5.8          5.958        4.2121
  Wednesday   5.6499       5.1832       3.7834
  Thursday    5.7435       5.452        3.674
  Friday      5.373        5.2722       3.9951
  Average     5.4615       5.1292       4.117

  (c) Mean: Average Series
  Day         Data Set-1   Data Set-2   Data Set-3
  Monday      13.4352      7.8969       8.1667
  Tuesday     11.4376      8.1533       6.7134
  Wednesday   14.162       8.41         4.9954
  Thursday    14.383       8.4461       4.7138
  Friday      13.399       8.2724       6.13
  Average     13.3524      8.23572      6.124
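A minimal sketch of such a moment check, assuming the series is cut into equal-length segments (for example one per day) and the mean and variance of each segment are compared. The function name and the choice of population variance are illustrative assumptions, not taken from the slides.

```python
def segment_moments(series, segment_length):
    """Return (mean, variance) for each full segment; a trailing partial segment is dropped."""
    moments = []
    for start in range(0, len(series) - segment_length + 1, segment_length):
        segment = series[start:start + segment_length]
        mean = sum(segment) / segment_length
        # Population variance (divide by n); the slides do not say which form was used.
        variance = sum((x - mean) ** 2 for x in segment) / segment_length
        moments.append((mean, variance))
    return moments
```

If the per-segment means or variances drift noticeably from one segment to the next, the series is unlikely to be stationary over the whole observation period.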
Stationarity and Auto-Correlation Function (ACF)
  Studies define stationarity in terms of the ACF (Ref: H. Liu and M. S. Kim, G. Kirchgassner and J. Wolters)
    Fast decaying ACF => Stationary
    Slow decaying ACF => Non-Stationary
  Remember: all time series are found non-stationary.
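For reference, the sample ACF can be computed as sketched below. This uses the standard biased estimator normalised by the lag-0 sum; whether the slides use exactly this estimator is an assumption.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalised so that acf[0] == 1."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    c0 = sum(d * d for d in deviations)
    if c0 == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [
        sum(deviations[t] * deviations[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]
```

How quickly the returned values fall towards zero as the lag grows is what the "fast decaying" and "slow decaying" criteria refer to.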
Figure: ACF Plot: Original Series. Panels: Data Set 1, Data Set 2 and Data Set 3 (ensemble average). x-axis: Lag; y-axis: ACF.
Figure: ACF Plot: Difference Series. Panels: Data Set 1, Data Set 2 and Data Set 3 (ensemble average). x-axis: Lag; y-axis: ACF.
Figure: ACF Plot: Average Series. Panels: Data Set 1, Data Set 2 and Data Set 3 (ensemble average). x-axis: Lag; y-axis: ACF.
Stationarity and Stability
  Definition: Stability => Stationarity (Ref: G. Kirchgassner and J. Wolters)
  Consider the series as a Linear Prediction (LP) process:
    $a_t = \alpha_1 a_{t-1} + \alpha_2 a_{t-2} + \epsilon_t$    (1)
  where $a_t, a_{t-1}, \ldots$ is the time series data, $\alpha_1$ and $\alpha_2$ are the model coefficients and $\epsilon_t$ is the random shock or residual at time $t$.
  Yule-Walker Estimation: $\alpha_1$ and $\alpha_2$
  Characteristic Equation:
    $x^2 - \alpha_1 x - \alpha_2 = 0$    (2)

  Table: Roots of Original Series
  Data Set   Magnitude of the Root
  1          0.61
  2          0.5322
  3          0.564
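A minimal sketch of the order-2 Yule-Walker step and the stability check: the coefficients are obtained in closed form from the lag-1 and lag-2 autocorrelations, and the process is stable when both roots of the characteristic equation have magnitude below 1. The function names are illustrative, and the closed-form solve is one standard way to do this rather than necessarily the procedure used for the table.

```python
import cmath


def yule_walker_ar2(rho1, rho2):
    """Solve rho1 = a1 + a2*rho1 and rho2 = a1*rho1 + a2 for (a1, a2)."""
    denom = 1.0 - rho1 * rho1
    if denom == 0:
        raise ValueError("rho1 must not be +1 or -1")
    alpha1 = rho1 * (1.0 - rho2) / denom
    alpha2 = (rho2 - rho1 * rho1) / denom
    return alpha1, alpha2


def root_magnitudes(alpha1, alpha2):
    """Magnitudes of the roots of x^2 - alpha1*x - alpha2 = 0 (roots may be complex)."""
    disc = cmath.sqrt(alpha1 * alpha1 + 4.0 * alpha2)
    return abs((alpha1 + disc) / 2.0), abs((alpha1 - disc) / 2.0)


def is_stable(alpha1, alpha2):
    """Stable when every characteristic root lies strictly inside the unit circle."""
    return max(root_magnitudes(alpha1, alpha2)) < 1.0
```

The autocorrelations `rho1` and `rho2` are the lag-1 and lag-2 values of the sample ACF of the series.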
Smoothness Factor
  Matthew Roughan et al., in their work on modeling backbone traffic, have quantified the smoothness of the time series.
  Relative Variance (RV): variance divided by the mean
  Lower RV implies a smoother series

  Table: Smoothness
  Data Set   Original Series   Difference Series   Average Series
  1          8.413             12.2682             3.5591
  2          14.728            19.3858             2.1594
  3          1.61624           17.1416             2.4271
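The relative variance is a one-line computation. The sketch below assumes the population variance (divide by n); the slides do not state which variance estimator was used.

```python
def relative_variance(series):
    """Variance divided by the mean; lower values indicate a smoother series."""
    n = len(series)
    mean = sum(series) / n
    if mean == 0:
        raise ValueError("relative variance is undefined for a zero-mean series")
    variance = sum((x - mean) ** 2 for x in series) / n  # population variance (assumption)
    return variance / mean
```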
Hurst Exponent Estimation
  Measure of the burstiness
    H = 0.5: white Gaussian noise
    0 < H < 0.5: mean-reverting and less bursty series
    0.5 < H < 1: bursty and trend-reinforcing series
  Rescaled Range Estimator used

  Table: Hurst exponent
  Data Set   Original Series   Difference Series   Average Series
  1          0.7178            0.622               0.4486
  2          0.5978            0.5616              0.4238
  3          0.6244            0.5732              0.4258
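A rough sketch of a rescaled-range (R/S) estimator follows. The window schedule (powers of two starting at a hypothetical `min_window`) and the use of non-overlapping blocks are assumptions; the slides do not say which variant was used. The estimate is the least-squares slope of log(R/S) against log(window size).

```python
import math


def rescaled_range(block):
    """R/S of one block: range of cumulative mean-adjusted sums over the standard deviation."""
    n = len(block)
    mean = sum(block) / n
    cumulative = low = high = squares = 0.0
    for x in block:
        deviation = x - mean
        cumulative += deviation
        low = min(low, cumulative)
        high = max(high, cumulative)
        squares += deviation * deviation
    std = math.sqrt(squares / n)
    return (high - low) / std if std > 0 else None


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) versus log(window size)."""
    n = len(series)
    xs, ys = [], []
    window = min_window
    while window <= n // 2:
        values = [rescaled_range(series[i:i + window])
                  for i in range(0, n - window + 1, window)]
        values = [v for v in values if v is not None and v > 0]
        if values:
            xs.append(math.log(window))
            ys.append(math.log(sum(values) / len(values)))
        window *= 2
    if len(xs) < 2:
        raise ValueError("series too short for the chosen window sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    return slope_num / slope_den
```

On short series the R/S estimator is known to be biased upwards, so values slightly above 0.5 for uncorrelated data are expected and should not be over-interpreted.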
Modeling and Prediction
  LP model with order 2
  Parameter Estimation: Yule-Walker Method
  Training data: Monday - Thursday
  Testing: Friday

  Table: Average Relative Error
  Data Set   Original Series   Difference Series   Average Series
  1          0.4613            0.964               0.168
  2          0.787             0.954               0.26
  3          0.7776            0.8767              0.327
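One-step-ahead prediction with an order-2 LP model, and an average relative error, can be sketched as follows. The slides do not define the error measure, so the form used here (mean of |actual - predicted| / |actual|, skipping samples where the actual value is zero) is an assumption.

```python
def predict_lp2(series, alpha1, alpha2):
    """Predict x[t] from x[t-1] and x[t-2] for every t >= 2."""
    return [alpha1 * series[t - 1] + alpha2 * series[t - 2]
            for t in range(2, len(series))]


def average_relative_error(actual, predicted):
    """Mean of |a - p| / |a|; samples with a == 0 are skipped (assumption)."""
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero actual values to compare against")
    return sum(terms) / len(terms)
```

The predictions start at the third sample, so they should be compared against `series[2:]`, for example `average_relative_error(friday[2:], predict_lp2(friday, alpha1, alpha2))`, where `friday` is a hypothetical name for the test-day series.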
Modeling and Prediction
  Figure panels: Actual vs. Predicted, Data Set 1, for the original series (y-axis: Half open count), the difference series (y-axis: Half open count: first difference) and the average series (y-axis: Average half open count). x-axis: Sampling Interval (in seconds).
Detection
  Figure: FP vs FN. Probability of False Negative (FN) and Probability of False Positive (FP) plotted against the threshold, with the annotation "Threshold Value of Zero FN and 3% FP". x-axis: Threshold; y-axis: Probability.
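The FP/FN trade-off can be illustrated with a simple threshold rule on the prediction error. This is only a sketch: the decision rule (flag an interval when its error exceeds the threshold) and the availability of per-interval attack labels are assumptions, not details given on the slides.

```python
def fp_fn_rates(errors, is_attack, threshold):
    """Return (false positive rate, false negative rate) for a given threshold."""
    false_pos = sum(1 for e, attack in zip(errors, is_attack)
                    if e > threshold and not attack)
    false_neg = sum(1 for e, attack in zip(errors, is_attack)
                    if e <= threshold and attack)
    normal_count = sum(1 for attack in is_attack if not attack)
    attack_count = len(is_attack) - normal_count
    fp_rate = false_pos / normal_count if normal_count else 0.0
    fn_rate = false_neg / attack_count if attack_count else 0.0
    return fp_rate, fn_rate
```

Sweeping `threshold` over a range and plotting both rates gives a curve pair of the kind shown in the figure; raising the threshold lowers FP at the cost of FN.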
Detection
  Figure: FP vs FN. Plot title: Prediction Error During an Attack: Data Set 1, with the attack period marked. x-axis: Sampling Interval (in seconds); y-axis: Prediction Error.
Conclusion
  The assumption of stationarity is not correct in all cases.
  Stability does not imply stationarity.
  The ACF graph alone cannot establish stationarity.
  Transformations appear promising.
  Predictable series: slowly decaying ACF, low Hurst exponent and low relative variance.
  Open question: over what window is the series stationary?
Drawbacks and Future Work
  Statistical significance tests.
  Other transformations: median smoothing and mean differencing.
  Hour-based analysis.
  Repeat experiments with traffic traces from a different source or network.
  Compare the traffic characteristics at the edge and core network.
  Application: anomaly detection, bandwidth management.
Publications
  1. Cyriac James and Hema A. Murthy, Time Series Analysis of Network Data: A Case Study, in Third International Workshop on Network Science for Communication Networks, in conjunction with IEEE Infocom 2011.
     Status: Submitted on 15th January 2011
References
  G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control. Pearson Education, 1994.
  D. M. Divakaran, H. A. Murthy, and T. A. Gonsalves, Detection of SYN flooding attacks using linear prediction analysis, 14th IEEE International Conference on Networks, pp. 1-6, September 2006.
  G. Zhang, S. Jiang, G. Wei, and Q. Guan, A prediction-based detection algorithm against distributed denial-of-service attacks, in Proceedings of the International Conference on Wireless Communications and Mobile Computing (IWCMC), June 2009.
  J. Cheng, J. Yin, C. Wu, B. Zhang, and Y. Liu, DDoS attack detection method based on linear prediction model, in ICIC, 2009.
  W. U. Qing-tao and S. Zhi-qing, Detecting DDoS attacks against web server using time series analysis, Wuhan University Journal of Natural Sciences, vol. 11, no. 1, pp. 175-180, 2006.
  W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, On the self-similar nature of Ethernet traffic, 1994.
  V. Paxson and S. Floyd, Wide-Area Traffic: The Failure of Poisson Modeling, 1995.
  T. Karagiannis, M. Molle, and M. Faloutsos, Long-range dependence: ten years of Internet traffic modeling, in IEEE Internet Computing, vol. 8, Sept 2004, pp. 57-64.
  M. Roughan and J. Gottlieb, Large Scale Measurement and Modeling of Backbone Internet Traffic, in Internet Performance and Control of Network Systems, 2002.
  B. Qian and K. Rasheed, Hurst Exponent and Financial Market Predictability, in IASTED Conference on Financial Engineering and Applications, 2004.
  H. Liu and M. S. Kim, Real-time detection of stealthy DDoS attacks using time-series decomposition, in Proceedings of ICC, July 2010.
  G. Kirchgassner and J. Wolters, Introduction to Modern Time Series Analysis. Springer, 2007.