IP NETWORK MONITORING AND AUTOMATIC ANOMALY DETECTION

Michael de Castro, DEEC, Instituto Superior Técnico, Lisboa, Portugal
November 2008

Abstract. This paper presents a new tool designed to assist in network monitoring and management. First, an algorithm is developed for automatic layer-2 topology discovery. This feature relies on information collected through SNMP from standard MIBs, so that it is supported by the vast majority of equipment. The second part of the project describes an unsupervised learning method for detecting abnormal situations in the network. These irregularities are identified using a traffic model composed of a Gaussian mixture, which represents the normal operation of the system.

Index Terms: Anomaly Detection, Information Security, Network Management, Network Monitoring, SNMP, Topology Discovery.

I. INTRODUCTION

Nowadays, a constantly growing share of people's work relies on Information Technologies (IT). The rising demand for network applications and information sharing has resulted in higher IT requirements, such as performance and quality of service (QoS), which have become one of the main concerns of IT managers. To manage IT networks correctly, their configuration and operation have to be continuously monitored and reviewed by network administrators. Knowledge of the network infrastructure is fundamental for proper network management and problem identification, and this information must be kept constantly updated so that it remains reliable and quickly available. However, the size and complexity of today's networks make the manual search for topology difficult and time-consuming, which frequently results in a lack of details about the network. In response, this project proposes a new autonomous tool designed to assist in network monitoring and management.
The main focus is the development of an algorithm for automatic layer-2 topology discovery, removing the need for manual operations. The tool requires information to be collected regularly from the network, which builds up an important database that can support several new functionalities for improving network control and administration. Data analysis makes it possible to verify network operation and performance, with the purpose of identifying equipment failures or security issues. This project also implements that functionality, describing a process to identify network anomalies, even unknown ones, in order to mitigate the risks that threaten the proper functioning of a network. An unsupervised learning method is presented for detecting abnormal situations in the network, which are identified using a traffic model composed of a Gaussian mixture representing the normal operation of the system.

The remainder of this document is organized as follows. Section II discusses network management approaches and presents the methodology chosen for this project. Section III describes the network topology discovery algorithm, while Section IV focuses on anomaly detection. Section V presents the test results and, finally, Section VI concludes the document.

II. NETWORK MANAGEMENT APPROACH

The main objective of this project is to develop and implement a new network management tool that increases administration efficiency and system security. The chosen approach is based on automatically retrieving information from the network, building a flexible and complete data resource. This task can be accomplished by polling the network equipment using the Simple Network Management Protocol (SNMP), a management protocol supported by most IP network devices. This information gathering must be done as regularly as possible, in order to keep the data up to date and valuable.
The collected data can then be analysed to provide different functionalities. The tool aims to centralize several administration needs in a single console, speeding up access to information and problem resolution. The principal aim is to contribute to the automation and optimization of monitoring and control operations, resulting in improved network management and increased security and quality of service.

III. NETWORK TOPOLOGY DISCOVERY

Knowledge of the correct network topology is crucial for network management tasks such as event correlation or the localization of harmful irregularities. Numerous methods and techniques have been developed to obtain this information.

A. Related Work

One of the first approaches consists in discovering the layer-3 topology by collecting routing information from all available routers in the network [1, 2]. This methodology provides interesting results for large and decentralized networks; however, it is not very useful for small and medium networks with a centralized star topology, such as the one at Instituto Superior Técnico (IST). In these cases, most of the network is based on layer-2 equipment that forwards the traffic to the core router, which is responsible for the connection to the Internet. For this reason, the remainder of this document focuses on layer-2 topology discovery.

One of the first problems encountered in layer-2 topology discovery is the use of the Spanning Tree Protocol (STP) [3]. This layer-2 protocol offers connection redundancy and failure tolerance, but results in a higher number of topology changes, which complicates the discovery process. Although STP provides some information that can be used to deduce connections between devices [4, 5], it only identifies parts of the network and fails to recognise the global topology of complex or heterogeneous networks.

From a different perspective, we can compare the network traffic on all ports of the existing devices and conclude that two interfaces are connected if their sampled traffic is similar. Alternatively, data packets can be injected into the network in order to analyse where they are delivered [6]. However, the results of these techniques are mainly based on approximations, which makes them insufficiently reliable for the objectives of this project.

In order to obtain consistent results, we can use the information available in the switches' Address Forwarding Tables (AFTs). When two devices communicate, they exchange network packets that are forwarded by the switches located between them, forming a forwarding path. In the process, the switches involved populate their AFTs with the Media Access Control (MAC) addresses of the neighbouring devices that send or receive information through the network. By combining the AFTs of several switches, we can infer connections between devices, with interesting results [7, 8, 9, 10, 11].
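As a toy illustration of how AFTs from several switches can be combined, the following Python sketch applies two simple set tests to a hand-written three-switch chain. The first test is the disjoint-and-covering criterion that Section III.B states as Lemma 1, and the second is the path-membership condition of Lemma 2. The function names, the `AFTS` dictionary layout and the use of switch names in place of MAC addresses are assumptions made for this example only; they are not the data structures of the tool described in this paper.

```python
from itertools import combinations


def direct_links(afts):
    """Return (switch, port, switch, port) tuples that pass the Lemma 1 test.

    `afts` maps each switch to {port: set of switches seen on that port}.
    The AFTs are assumed to be complete.
    """
    all_switches = set(afts)
    ports = [(sw, port, seen)
             for sw, table in afts.items()
             for port, seen in table.items()]
    links = []
    for (sw_a, port_a, seen_a), (sw_b, port_b, seen_b) in combinations(ports, 2):
        if sw_a == sw_b:
            continue  # two ports of the same switch are never compared
        disjoint = not (seen_a & seen_b)
        covering = (seen_a | seen_b) == all_switches
        if disjoint and covering:
            links.append((sw_a, port_a, sw_b, port_b))
    return links


def on_path(afts, middle, end_a, end_b):
    """Lemma 2 test: `middle` sees end_a and end_b on two different ports."""
    ports_a = {p for p, seen in afts[middle].items() if end_a in seen}
    ports_b = {p for p, seen in afts[middle].items() if end_b in seen}
    return any(pa != pb for pa in ports_a for pb in ports_b)


# Hypothetical chain topology: A(port 1) -- (port 1)B(port 2) -- (port 1)C
AFTS = {
    "A": {1: {"B", "C"}},
    "B": {1: {"A"}, 2: {"C"}},
    "C": {1: {"A", "B"}},
}

links = direct_links(AFTS)
b_between_a_and_c = on_path(AFTS, "B", "A", "C")
a_between_b_and_c = on_path(AFTS, "A", "B", "C")
```

On this chain, the first test reports the two physical links (A port 1 to B port 1, and B port 2 to C port 1), and the second confirms that B lies between A and C while A does not lie between B and C.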
However, AFT-based methods are often designed for ideal systems or multi-network topologies, resulting in requirements that are hard to satisfy in real networks with the characteristics of the IST network. For this reason, we develop a new discovery algorithm, based on AFT analysis but flexible and aimed at heterogeneous star networks composed mainly of layer-2 devices.

B. Discovery Algorithm

We start by presenting the notation used in this section. We denote by $S_i$ the switch $i$ of the network, where $i$ is only an identifier, and by $S_{ij}$ the interface $j$ of switch $S_i$. For a given switch $i$, the AFT corresponding to interface $j$ is denoted by $A_{ij}$, while the set of all AFTs of that switch is denoted by $A_i$. Finally, the set of all switches in the network is represented by $N$.

One of the first issues with AFT-based algorithms is that the AFTs are often incomplete or even empty. This occurs when traffic in the network is low, so that the switches learn few addresses. To mitigate this issue, we can ping all the switches of the network before starting each topology discovery, which fills most of the AFT entries. If we consider complete AFTs (for each switch $i$, $A_i$ contains the MAC addresses of all the other switches in the network), we can easily deduce Lemma 1, based on [7]:

Lemma 1: If $A_i$ and $A_k$ are complete, the switches $i$ and $k$ are connected through the interfaces $j$ and $l$, respectively, if and only if $A_{ij} \cap A_{kl} = \emptyset$ and $A_{ij} \cup A_{kl} = N$.

Proof: If $S_{ij}$ and $S_{kl}$ are directly connected, there can be no other switch between them, so $A_{ij} \cap A_{kl} = \emptyset$. As the AFTs are complete, all the switches of the network are contained in the union of the AFTs of two connected interfaces, resulting in $A_{ij} \cup A_{kl} = N$. Considering now that $A_{ij} \cap A_{kl} = \emptyset$ and $A_{ij} \cup A_{kl} = N$, we can deduce the following statements.
As $A_{ij} \cup A_{kl} = N$, the interfaces $S_{ij}$ and $S_{kl}$ must be connected; otherwise we would have $S_i, S_k \notin (A_i \cup A_k)$ and consequently $A_{ij} \cup A_{kl} \neq N$, because $S_i \notin A_i$ and $S_k \notin A_k$. If $S_{ij}$ and $S_{kl}$ are connected but not directly connected, there must be a path between $S_{ij}$ and $S_{kl}$ that contains at least one switch, and thus $A_{ij} \cap A_{kl} \neq \emptyset$. Therefore, we can conclude that if $A_{ij} \cap A_{kl} = \emptyset$ and $A_{ij} \cup A_{kl} = N$, then the interfaces $S_{ij}$ and $S_{kl}$ must be directly connected.

Applying Lemma 1 makes it possible to search for every direct connection in the network, but requires the AFTs to be complete, which is very unlikely in real networks. In order to create a robust discovery tool, we need to complement this algorithm with a different approach. The chosen idea is to analyse all the forwarding paths between pairs of switches [9]. Given $S_i$ and $S_j$, all the switches that are connected between $S_i$ and $S_j$ are part of the forwarding path between $S_i$ and $S_j$. A switch can only be part of the path if it has the addresses of $S_i$ and $S_j$ in different AFTs, as we can verify with Lemma 2:

Lemma 2: Given three connected switches $S_i$, $S_k$ and $S_m$, where $C$ is the path between $S_i$ and $S_k$; if $S_i \in A_{mr}$ and $S_k \in A_{ms}$, with $r \neq s$, then $S_m$ belongs to $C$.

Proof: If $S_i$, $S_k$ and $S_m$ are connected and $S_i, S_k \in A_m$, then there are only four possible situations:

1. $S_m$ is connected to $S_i$, which is connected to $S_k$ through another interface. In this situation, $S_m$ is not part of $C$ because $S_i, S_k \in A_{mr}$, which means that $S_i$ and $S_k$ are reached through the same interface of $S_m$.
2. $S_m$ is connected to $S_k$, which is connected to $S_i$ through another interface. This is the same type of situation as the previous one: $S_m$ does not belong to $C$, this time because $S_i, S_k \in A_{ms}$.
3. $S_m$ is connected between $S_i$ and $S_k$, but is not part of the path $C$ (this situation can occur if the three switches are directly connected to the same hub). $A_i$ contains $S_m$ and $S_k$ on the same interface, as does $A_k$ with $S_m$ and $S_i$; however, both $S_i$ and $S_k$ are reached through the same interface of $S_m$, which results in $S_i, S_k \in A_{mt}$.
4. $S_m$ is connected between $S_i$ and $S_k$ and belongs to $C$. $A_i$ contains $S_m$ and $S_k$ on the same interface, as does $A_k$ with $S_m$ and $S_i$; but now $S_m$ has $S_i \in A_{mr}$ and $S_k \in A_{ms}$, with $r \neq s$.

We can conclude that for $S_m$ to be part of $C$, we only need to verify that $S_i \in A_{mr}$ and $S_k \in A_{ms}$, with $r \neq s$.

The converse of Lemma 2 is not valid, because the AFTs may be incomplete. Consequently, this method cannot guarantee that all network connections are discovered, nor that each identified connection is a direct connection between devices. However, inferring the connections along all possible paths between pairs of switches returns important results even if the AFTs are mostly incomplete, as we will verify later in this document.

The complete discovery algorithm starts by searching for direct connections using Lemma 1. Afterwards, it searches all the possible paths between switches (Lemma 2) and sorts the results using the relative information in the AFTs of the devices belonging to the paths. The correlation of all this information enables a topology discovery in which only a small number of AFT entries is needed to identify real connections, even in lightly loaded networks, as we will observe in the results from the implementation on the IST network. This information provides faster problem identification and automatic location of links and equipment, which makes it an important support tool for any system administrator with network management responsibilities.

IV. ANOMALY DETECTION

The rising dependence on IT services and applications results in larger and more complex networks, increasing their exposure to threats and failures. Equipment crashes, malicious attacks and security breaches all carry risks, which can be mitigated by a sound prevention plan and constant monitoring. There are quite a few tools for equipment monitoring or intrusion detection, but they are mostly based on supervising local thresholds or scanning for known anomalies. It is quite common for unknown threats or complex situations to bypass automatic detection, although they can sometimes be identified by experienced human operators. Even if hostile traffic is often different from benign traffic, it is frequently hard to translate this difference into a set of explicit rules or deterministic parameters.
This is mainly due to the highly irregular nature of the traffic, which constantly changes network patterns and the effects of anomalies.

A. Related Work

A first approach consists in searching for relevant differences in traffic by comparing real samples with the results of statistical [12] or prediction [13, 14] models. Nevertheless, these studies are mainly focused on backbone links or high-bandwidth nodes, where small anomalies are insignificant compared with the global throughput and the normal traffic presents some regularity. For this reason, this methodology cannot easily be adapted to medium-sized university networks like that of IST, where normal traffic is predominantly irregular and hard to predict.

A different technique divides the traffic into several components corresponding to different frequencies, as with the Fourier transform [15, 16]. Despite the complex implementation, the reported results are quite satisfactory, but they only reflect local anomalies. With this approach, an irregular situation would not be detected if the values from each device, observed independently, were all below their limits, even though the global correlation of the information would indicate abnormal activity.

In order to respond to a wider range of situations, we propose to study unsupervised learning systems with the intention of creating a model that characterizes the normal behaviour of the network. This kind of approach is not deterministic but has wider applicability, as it is not based on predefined rules or parameters and can continuously adapt itself to the real data. This methodology can be used with several different systems, such as neural networks or artificial intelligence models [17].

B. Traffic Model

We propose to model the normal operation of the network by a probability density function (pdf) composed of a Gaussian mixture.
As inputs for the unsupervised learning model, we use the traffic values from different nodes in the network together with the time and date of the samples. It is important to consider the time and date of the data, because network utilization differs strongly between workdays and weekends, and between night and business hours. The main objective of this study is to find the model parameters that best characterize the patterns in the data, reflecting the normal behaviour of the network. After obtaining these values, we can set thresholds on the likelihood function in order to identify anomalous situations.

The pdf of a mixture of $M$ Gaussians in $d$ dimensions is given by

$$p(x) = \sum_{j=1}^{M} p(x \mid j)\, P(j) \qquad (1)$$

where $P(j)$ corresponds to the probability of Gaussian $j$, whose pdf is

$$p(x \mid j) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right\} \qquad (2)$$

where $\mu$ is the mean of Gaussian $j$ and $\Sigma$ the respective covariance matrix.

Various approaches can now be used to identify the unknown parameters. One of the simplest solutions is to use the expectation-maximization (EM) algorithm to find the maximum likelihood estimates of the mixture parameters [17]. This deterministic algorithm provides the desired results but requires the number of Gaussians in the mixture to be chosen manually. The experimental results showed that the number of Gaussians that best represents the data differs from model to model, depending on the patterns present in the data. With the purpose of setting this parameter automatically, we complement the EM algorithm with a method for scoring models.

Shannon's information theory [18] states that information can be compressed by an optimal code, so that the smallest possible message is used to represent the original information. Following this theory, the length (in bits) of a model $\theta$ with probability $P(\theta)$ is given by

$$\mathrm{length}(\theta) = -\log_2 P(\theta) \qquad (3)$$
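Equations (1) to (3) can be evaluated numerically as in the following Python sketch. To keep it free of external dependencies, it assumes diagonal covariance matrices, a simplification that is not made in the text, so that $|\Sigma|$ reduces to the product of the variances and the quadratic form to a weighted sum of squares. The function names and the example mixture are illustrative assumptions.

```python
import math


def gaussian_pdf_diag(x, mean, variances):
    """Eq. (2), restricted to a diagonal covariance matrix (assumption)."""
    d = len(x)
    det = math.prod(variances)  # determinant of a diagonal matrix
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    norm = (2.0 * math.pi) ** (d / 2.0) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm


def mixture_pdf(x, components):
    """Eq. (1): `components` is a list of (P(j), mean, variances) tuples."""
    return sum(weight * gaussian_pdf_diag(x, mean, variances)
               for weight, mean, variances in components)


def message_length_bits(probability):
    """Eq. (3): optimal code length, in bits, for the given probability."""
    return -math.log2(probability)


# Hypothetical two-component mixture in d = 2 dimensions.
COMPONENTS = [
    (0.7, (0.0, 0.0), (1.0, 1.0)),
    (0.3, (5.0, 5.0), (1.0, 1.0)),
]

density_at_origin = mixture_pdf((0.0, 0.0), COMPONENTS)
bits_for_quarter = message_length_bits(0.25)
```

At the origin the second Gaussian contributes almost nothing, so the density is very close to $0.7 / (2\pi)$, and a model with probability 0.25 costs exactly two bits.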

Based on the principle of Occam's razor, Wallace developed the Minimum Message Length (MML) methodology, which searches for the simplest model that still fits the data correctly [19]. Considering a message $E$ containing all the traffic data, we can compress this information by replacing the data with a model $\theta$ and a difference $D$, corresponding to the error between the model $\theta$ and the real data:

$$\mathrm{length}(E) = \mathrm{length}(\theta) + \mathrm{length}(D \mid \theta) = -\log_2 P(\theta) - \log_2 P(D \mid \theta) \qquad (4)$$

Both the model and the error can be modelled by a variety of functions or distributions, such as a Gaussian mixture. In this case, the more complex the model, the more Gaussians are used in the mixture, but the simpler the remaining error distribution. Minimizing the message length of the data therefore provides a way to identify the preferred Gaussian mixture, following the MML principle, and to balance model accuracy against the corresponding complexity.

For the purposes of this project, we use the PyMML toolkit [20], which provides an MML implementation that can be used to search for and compute all the parameters of the Gaussian mixture that minimize the length of the data representation.

Using the presented methodology, we can construct a traffic model in which the Gaussians represent the principal network patterns. Normal operation of the network fits the identified patterns, resulting in high pdf values. In contrast, irregular traffic is far from the normal behaviour of the network, resulting in low pdf values. Setting a threshold on the pdf values therefore provides a way to identify network anomalies, even unknown ones, corresponding to unusual events that result in abnormal traffic. As a result, we can generate alarms for potentially harmful situations that should be further investigated, increasing operational control and global security awareness.
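The thresholding step described above can be sketched in a few lines of Python. For brevity the sketch uses a one-dimensional mixture over a single traffic value, whereas the model in this paper combines traffic from several nodes with the time and date of each sample. The mixture parameters, the sample values and the function names are hypothetical and chosen only to make the example run.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def make_mixture(components):
    """Build p(x) from a list of (weight, mean, std) tuples."""
    def pdf(x):
        return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
    return pdf


def find_anomalies(samples, pdf, threshold):
    """Return the indices of the samples whose density is below the threshold."""
    return [i for i, x in enumerate(samples) if pdf(x) < threshold]


def anomaly_rate(samples, pdf, threshold):
    """Fraction of samples flagged as anomalous."""
    return len(find_anomalies(samples, pdf, threshold)) / len(samples)


# Hypothetical model of normal traffic: two usage patterns around 10 and 50.
model = make_mixture([(0.6, 10.0, 2.0), (0.4, 50.0, 5.0)])

# Hypothetical samples; the last two lie far from both patterns.
traffic = [9.5, 11.0, 48.0, 52.5, 30.0, 80.0, 95.0]

alarms = find_anomalies(traffic, model, 1.0e-7)
rate = anomaly_rate(traffic, model, 1.0e-7)
```

With these values only the last two samples fall below the threshold. The sample at 30 is unusual but still has a density of roughly $10^{-5}$, so it is not flagged, which illustrates how the choice of threshold trades false positives against missed anomalies.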
V. RESULTS FROM IST NETWORK

The presented functionalities were tested at IST, a medium-sized and heterogeneous network where traffic is strongly irregular, as it depends, among other influences, on the students' schedules.

A. Network Topologies

In a first experiment, we observe the results of the topology discovery algorithm. Even in a moderately loaded network, no connections were detected using the complete-AFT algorithm (Lemma 1). However, the path analysis allowed a complete topology discovery that correctly identified the existing connections at IST for several configurations, as shown in Figure 1. We conclude that the developed algorithm returns valid results, follows network changes and completely fulfils the proposed objectives.

Fig. 1. Results from the network topology discovery algorithm applied at IST: (a) January 2008; (b) June 2008.

B. Detected Anomalies

Several traffic models were created, corresponding to different weeks of usual operation. In order to evaluate and compare the results, we measure the percentage of detected anomalies in three types of situations: real traffic, manually created anomalies and random traffic, as shown in Figure 2. It is important that all simulated anomalies are detected; otherwise the number of false negatives will be too significant. On the other hand, we need to find a balance between the false positives (anomalies in real traffic above 0.5%) and the accuracy of the model, reflected by the random traffic results, as these chaotic values are far from normal behaviour and therefore model abnormal situations.

Using the threshold $1.0 \times 10^{-7}$, we verify that less than 0.4% of real traffic is detected as anomalous, which is acceptable, while over 96% of the irregular traffic is identified. We can conclude that the presented methodology produces traffic models that are able to correctly differentiate normal data from anomalies, meeting the desired objectives.

Threshold        Real traffic          Simulated anomalies   Random traffic
                 Model 1   Model 2     Model 1   Model 2     Model 1   Model 2
1.0 × 10^-12     0.0%      N/A         71.4%     N/A         86.1%     N/A
1.0 × 10^-8      0.1%      0.2%        100.0%    100.0%      95.0%     97.2%
1.0 × 10^-7      0.3%      0.4%        100.0%    100.0%      96.7%     97.7%
1.0 × 10^-6      0.8%      1.2%        100.0%    100.0%      97.9%     98.9%
1.0 × 10^-3      22.6%     N/A         100.0%    N/A         100.0%    N/A

Fig. 2. Percentage of anomalous traffic in different models.

Finally, Figure 3 presents the evolution of traffic over time around the detected irregularities, which are marked by red arrows on the graphs. These situations confirm the previous results, as the identified anomalies correspond to real irregularities that may originate from harmful behaviour: (a) may be the consequence of a network attack or a malicious application resulting in intense network activity, while (b) might indicate an equipment failure. Both situations justify further investigation, which can be triggered by the anomaly detection alarms that warn system administrators of the incident.

Fig. 3. Results from anomaly detection applied to IST core traffic: (a) anomaly at port 1001 of the core; (b) anomaly at port 2008 of the core.

VI. CONCLUSIONS

In this document, we proposed a new methodology for discovering layer-2 topology and detecting anomalies. The main objectives were achieved, providing several new functionalities for network management and security. Different information is made available, such as the global topology, device details and network configurations. In addition, the anomaly detection method offers a new layer of security and awareness, not only warning administrators about irregular operation but also providing a wide range of useful functionalities, such as equipment location or traffic analysis, that assist in problem resolution tasks.
The practicality of this approach was verified by running the new centralized tool, which implements the presented techniques and algorithms. One of the main advantages of the implemented algorithms is their adaptability to a variety of network conditions, producing valuable results across different scenarios. The application to the IST network revealed all the network connections between devices, as well as various network anomalies. These results show that the tool enables faster operations management, increasing the global efficiency and security of the system.

REFERENCES

[1] Hwa-Chun Lin, Yi-Fan Wang, Chien-Hsing Wang, Chien-Lin Chen, "Web-based Distributed Topology Discovery of IP Networks", Proceedings of 15th International Conference on Information Networking (ICOIN'01), 2001, pp. 857-862.
[2] D. T. Stott, "SNMP-based layer-3 path discovery", Tech. Rep. ALR-2002-005, Avaya Labs Research, Avaya Inc., Basking Ridge, NJ, 2002.
[3] IEEE Computer Society, IEEE Std 802.1D-2004, IEEE Standard for Local and Metropolitan Area Networks: Media Access Control (MAC) Bridges, IEEE Standard, 2004.
[4] D. T. Stott, "Layer-2 path discovery using spanning tree MIBs", Tech. Rep. ALR-2002-004, Avaya Labs Research, Avaya Inc., Basking Ridge, NJ, 2002.
[5] Yuzhao Li, Changxing Pei, Changhua Zhu, Jiandong Li, "An Algorithm for Discovering Physical Topology in Single Subnet IP Networks", in Proceedings of 19th International Conference on Advanced Information Networking and Applications (AINA'05), Volume 2, 2005, pp. 351-354.
[6] Richard Black, Austin Donnelly, Cedric Fournet, "Ethernet Topology Discovery without Network Assistance", Proceedings of 12th IEEE International Conference on Network Protocols (ICNP'04), 2004, pp. 328-339.
[7] Y. Breitbart, M. Garofalakis, C. Martin, R. Rastogi, S. Seshadri, and A. Silberschatz, "Topology discovery in heterogeneous IP networks", in Proc. IEEE INFOCOM, 2000, pp. 265-274.
[8] Yuri Breitbart, Minos Garofalakis, Ben Jai, Cliff Martin, Rajeev Rastogi, and Avi Silberschatz, "Topology Discovery in Heterogeneous IP Networks: The NetInventory System", IEEE/ACM Transactions on Networking, Vol. 12, No. 3, June 2004, pp. 401-414.
[9] Y. Bejerano, Y. Breitbart, M. Garofalakis, and R. Rastogi, "Physical topology discovery for large multisubnet networks", in Proc. IEEE INFOCOM, 2003, pp. 342-352.
[10] Yuzhao Li, Changxing Pei, Changhua Zhu, Jiandong Li, "An Algorithm for Discovering Physical Topology in Single Subnet IP Networks", in Proceedings of 19th International Conference on Advanced Information Networking and Applications (AINA'05), Volume 2, 2005, pp. 351-354.
[11] B. Lowekamp, D. R. O'Hallaron, and T. R. Gross, "Topology Discovery for Large Ethernet Networks", in Proceedings of ACM SIGCOMM, San Diego, California, Aug. 2001.
[12] G. Cormode, S. Muthukrishnan, "What's new: Finding significant differences in network data streams", in Proc. of IEEE Infocom, 2004.
[13] B. Krishnamurthy, S. Sen, Y. Zhang, Y. Chen, "Sketch-based Change Detection: Methods, Evaluation, and Applications", Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, 2003.
[14] Jun Jiang, Symeon Papavassiliou, "A Network Fault Diagnostic Approach Based on a Statistical Traffic Normality Prediction Algorithm", GLOBECOM, IEEE, 2003, pp. 2918-2922.
[15] Jun Gao, Guangmin Hu, Xingmiao Yao, Rocky K. C. Chang, "Anomaly Detection of Network Traffic Based on Wavelet Packet", IEEE, 2006.
[16] Jun Lv, Xing Li, Tong Li, "The New Detection Algorithms for Network Traffic Anomalies", Proceedings of the Sixth International Conference on Networking (ICN'07), IEEE, 2007.
[17] Christopher M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, UK, 1995.
[18] Claude Elwood Shannon, Warren Weaver, The Mathematical Theory of Communication, Urbana: University of Illinois Press, USA, 1949.
[19] C. S. Wallace, Statistical and Inductive Inference by Minimum Message Length, Springer, USA, 2005.
[20] Paul Harrison, PyMML, Python library for implementing MML estimators, version 0.5, 2005 (http://www.logarithmic.net/pfh/pymml).