Applying Image Analysis Methods to Network Traffic Classification

Thorsten Kisner and Firoz Kaderali
Department of Communication Systems, Faculty of Mathematics and Computer Science
FernUniversität in Hagen, Germany

SPRING 2007 (SIDAR Graduierten-Workshop über Reaktive Sicherheit)
Outline

- Motivation
- Texture Analysis Methods
- Accuracy of classification
- Conclusion and Future Work
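Texture Analysis Methods

Grey Level Co-occurrence Matrix: Definition

The Grey Level Co-occurrence Matrix (GLCM) $C(\delta, T) = [s(i, j, \delta, T)]$ is used for texture analysis [1] [2]. Its entry $s(i, j, \delta, T)$ is the second-order probability of going from one grey level $i$ to another grey level $j$, given the displacement vector $\delta = (\Delta x, \Delta y)$:

\[
s(i, j, \delta, T) = \frac{\Theta\{\, x \mid x,\; x + \delta \in T,\; g(x) = i,\; g(x + \delta) = j \,\}}{\Theta\{\, x \mid x,\; x + \delta \in T \,\}} \tag{1}
\]

$T$ defines a tile of the original picture, $g(x)$ is the grey level at position $x$, and $\Theta\{\cdot\}$ counts the elements of a set.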
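Equation (1) can be read as a counting procedure. The following is a minimal sketch, not the authors' implementation: the function name `glcm`, the list-of-rows tile representation and the toy tile are assumptions made for illustration.

```python
from collections import Counter

def glcm(tile, delta):
    """Return {(i, j): s(i, j)} for a 2-D tile (list of rows) and delta = (dx, dy)."""
    dx, dy = delta
    rows, cols = len(tile), len(tile[0])
    counts = Counter()
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            # Count a pair only if both x and x + delta lie inside the tile T.
            if 0 <= x2 < cols and 0 <= y2 < rows:
                counts[(tile[y][x], tile[y2][x2])] += 1
    total = sum(counts.values())  # denominator of eq. (1)
    if total == 0:
        raise ValueError("displacement leaves no valid pair inside the tile")
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical 3x3 tile with three grey levels (0, 1, 2).
tile = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
s = glcm(tile, (1, 0))  # horizontal neighbour, one pixel to the right
```

Storing only the non-zero entries in a dictionary keeps the sketch short; a dense matrix would work equally well for a small number of grey levels.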
Grey Level Co-occurrence Matrix: Parameters describing a texture

\[ \text{Angular Second Moment} = \sum_i \sum_j \bigl(s(i, j)\bigr)^2 \tag{2} \]

\[ \text{Entropy} = -\sum_i \sum_j s(i, j) \log\bigl(s(i, j)\bigr) \tag{3} \]

\[ \text{Inverse Difference Moment} = \sum_i \sum_j \frac{s(i, j)}{1 + (i - j)^2} \tag{4} \]

\[ \text{Inertia} = \sum_i \sum_j (i - j)^2 \, s(i, j) \tag{5} \]

(2) describes the energy of the matrix and (3) its information content. (5) can be interpreted as the contrast and (4) as an inversely weighted measure of contrast.
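The four parameters (2) to (5) can be sketched as direct sums over the matrix entries. This is an illustration under assumptions: the dictionary representation, the function name `texture_parameters`, the natural logarithm and the conventional negative sign of the entropy are choices made here, since the slides state neither the log base nor an implementation.

```python
import math

def texture_parameters(s):
    """s maps (i, j) -> probability; missing entries are treated as zero."""
    asm = sum(p * p for p in s.values())                           # eq. (2)
    # Zero entries contribute nothing and are skipped to avoid log(0).
    entropy = -sum(p * math.log(p) for p in s.values() if p > 0)   # eq. (3)
    idm = sum(p / (1 + (i - j) ** 2) for (i, j), p in s.items())   # eq. (4)
    inertia = sum((i - j) ** 2 * p for (i, j), p in s.items())     # eq. (5)
    return {"ASM": asm, "ENT": entropy, "IDM": idm, "INE": inertia}

# Hypothetical 2-level co-occurrence matrix whose entries sum to 1.
s = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25}
params = texture_parameters(s)
```

For this toy matrix the off-diagonal mass is 0.5, so the inertia is 0.5 and the inverse difference moment is 0.5 + 0.25/2 + 0.25/2 = 0.75.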
In- and outgoing traffic of two types: SMTP and HTTP

- Measured at the gateway to the external network with the built-in packet and byte counters of iptables (1 second resolution in time).
- 70 independent traces of 9 hours (weekdays between 7:30am and 4:30pm) for each type of traffic:
  - 60 traces for training data
  - 10 traces for verification
- Analogous to the windowing mechanism in texture analysis (see T in eq. (1)), we divide each 9-hour time series into 6 segments of 90 minutes.
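The windowing step can be sketched as follows. Nine hours at one-second resolution give 32,400 samples, so six equal segments hold 5,400 samples each. The helper name `split_into_segments` and the all-zero placeholder trace are assumptions for illustration; non-overlapping, consecutive segments are also assumed.

```python
def split_into_segments(series, n_segments):
    """Split a sequence into n_segments equal, consecutive, non-overlapping parts."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    size = len(series) // n_segments
    return [series[k * size:(k + 1) * size] for k in range(n_segments)]

SECONDS_PER_TRACE = 9 * 60 * 60      # 9 hours sampled once per second
trace = [0] * SECONDS_PER_TRACE      # placeholder byte counts, not real data
segments = split_into_segments(trace, 6)
```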
- In texture analysis the size of the co-occurrence matrix is explicitly given by the range of the greyscale values.
- In our scenario the source for the co-occurrence matrix is a time series with no explicitly given limit for the values.
- A huge matrix on the order of $10^7 \times 10^7$ does not make sense, so quantisation is required.
- We analysed a linear quantisation to a matrix size of $2^i$ with $i \in \{2, 3, \ldots, 12\}$.
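One way to picture the quantisation and the mapping of a time series to a co-occurrence matrix is sketched below. Several details are assumptions, because the slides do not specify them: scaling between the segment's minimum and maximum, a displacement (lag) of one sample, and the function names `quantise_linear` and `cooccurrence_1d`. The sample values are made up.

```python
from collections import Counter

def quantise_linear(series, levels):
    """Map each value linearly onto the integer levels 0 .. levels - 1."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    # The maximum would land on `levels`, so it is clamped into the top level.
    return [min(int((v - lo) * levels / (hi - lo)), levels - 1) for v in series]

def cooccurrence_1d(q, lag=1):
    """Relative frequency of the level pair (q[t], q[t + lag]) over the series."""
    if len(q) <= lag:
        raise ValueError("series too short for the chosen lag")
    pairs = Counter(zip(q, q[lag:]))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}

# Hypothetical byte counts per interval.
series = [0, 120, 4_500_000, 2_250_000, 1_000, 4_499_999]
q = quantise_linear(series, 2 ** 2)   # smallest matrix size analysed, i = 2
s = cooccurrence_1d(q)
```

With only four levels, small and medium byte counts collapse into the same level, which illustrates why the choice of matrix size affects the texture parameters.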
Figure: Parameters as a function of matrix size. Panels: "Linearly Dependent" and "Not Dependent". Legend entries: Inertia, Cluster Shade, Cluster Prominence, Inverse Difference Moment, Correlation, Angular Second Moment, Entropy. x-axis: size of GLCM (log2).
Example

Figure: Typical network traffic time series. x-axis: time interval T_i; y-axis: Bytes / T_i.
Figure: Histograms of selected GLCM parameters (panels: ASM, IDM, ENT, INE, CORR, CP) for SMTP traffic and HTTP traffic.
Inverse Difference Moment (IDM) and Correlation (CORR) plotted against each other. The two classes overlap, but clustering can be observed.

Figure: IDM against CORR for SMTP traffic and HTTP traffic (x-axis: CORR; y-axis: IDM).
Accuracy of classification

- The k-nearest-neighbor (kNN) algorithm with k = 5 is used to assign the 120 segments of unknown traffic (10 days × 6 segments × 2 types) to the classes SMTP or HTTP.
- Only the four most relevant parameters are used: Angular Second Moment (2), Entropy (3), Inverse Difference Moment (4) and Inertia (5).

Table: Accuracy of classification

Traffic   Positive   Negative   Classification rate
HTTP      55         5          91.67%
SMTP      52         8          86.67%
Total     107        13         89.17%

Positive counts the correctly classified segments, Negative the misclassified ones.
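The classification step can be sketched as a majority vote among the five nearest training segments. This is an illustration only: the unscaled Euclidean distance, the function names and the toy feature vectors (ordered as ASM, ENT, IDM, INE) are assumptions, not measured values. The rate helper reproduces the percentages from the HTTP and SMTP rows of the table.

```python
import math
from collections import Counter

def knn_classify(train, query, k=5):
    """train: list of (feature_vector, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def classification_rate(positive, negative):
    """Share of correctly classified segments, in percent."""
    return 100 * positive / (positive + negative)

# Made-up training vectors (ASM, ENT, IDM, INE) for illustration.
train = [
    ((0.10, 2.0, 0.30, 5.0), "SMTP"),
    ((0.12, 2.1, 0.28, 5.5), "SMTP"),
    ((0.11, 1.9, 0.32, 4.8), "SMTP"),
    ((0.40, 1.0, 0.70, 1.0), "HTTP"),
    ((0.42, 1.1, 0.68, 1.2), "HTTP"),
    ((0.38, 0.9, 0.72, 0.9), "HTTP"),
]
label_a = knn_classify(train, (0.39, 1.0, 0.69, 1.1))
label_b = knn_classify(train, (0.11, 2.0, 0.30, 5.1))

http_rate = classification_rate(55, 5)
smtp_rate = classification_rate(52, 8)
total_rate = classification_rate(55 + 52, 5 + 8)
```

With two classes and an odd k, the vote cannot tie. In practice the four parameters have different value ranges, so some form of feature scaling may be needed before computing distances.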
Conclusion

- Novel approach for identifying network traffic by mapping a given time series to the co-occurrence matrix known from the domain of texture analysis.
- Using texture analysis methods, we classified even inaccurate and aggregated data with an accuracy of about 90%.
Future Work

- Analysis of multi-dimensional time series.
- Examination of network traffic with the proposed method at packet level, also including network flow information.
- Implementation of a visualisation framework based on the Grey Level Co-occurrence Matrix and related parameters.
For Further Reading

[1] R. M. Haralick, K. Shanmugam and I. Dinstein, Textural features for image classification, IEEE Transactions on Systems, Man, and Cybernetics, 3(6), November 1973, 610-621.
[2] R. W. Conners, M. M. Trivedi and C. A. Harlow, Segmentation of a High-Resolution Urban Scene using Texture Operators, Computer Vision, Graphics and Image Processing, 25, 1984, 273-310.
Thank you!