A statistical approach to IP-level classification of network traffic


Manuel Crotti, Francesco Gringoli, Paolo Pelosato, Luca Salgarelli
DEA, Università degli Studi di Brescia, via Branze, 38, Brescia, Italy

Abstract. Correct classification of traffic flows according to the application layer protocols that generated them is essential for most network-management, resource-allocation and intrusion-detection systems in TCP/IP networks. With the ever-increasing number of network protocols and services running on non-standard TCP ports, classification methods based on the analysis of the transport layer header are rapidly becoming ineffective. On the other hand, mechanisms based on full payload analysis are too computationally demanding to be run on most high-bandwidth links. Here we present a novel classification technique based on the statistical analysis of network traffic performed at the IP level. The key idea behind our approach is to build a set of protocol fingerprints that we believe summarize, in a compact and efficient way, the main IP-level statistical properties of application layer protocols. By means of a simple, lightweight algorithm based on the notion of anomaly scores, also presented in this paper, an unknown flow can be compared against known protocol fingerprints to detect the application that generated it. Our methodology is based entirely on IP-level analysis: no payload or port analysis is required to classify an unknown flow. Besides introducing our approach, we describe preliminary experimental results showing that this technique is effective in correctly classifying network traffic in a real network environment.

Keywords: Traffic classification, traffic measurement.

I. INTRODUCTION

Traffic classification mechanisms belong to the wide set of tools that help the allocation, control and management of resources in TCP/IP networks, and improve the reliability of Network Intrusion Detection Systems (NIDS).
An effective mechanism for classifying traffic flows according to the application layer protocols that generated them can suggest suitable measures to prevent or ease network congestion, to deploy QoS-aware mechanisms successfully, or to counter network attacks. Different techniques can be used to classify IP traffic. The simplest method identifies the application layer service that generated each flow by its transport-level source and destination ports [1]. Since many services are supposed to run on well-known, standard ports, this technique can be successful in some cases. Nowadays, however, standard services are frequently run on non-standard ports, for example to circumvent policy restrictions (e.g., running an HTTP server on port 8080 instead of the canonical 80). Moreover, some increasingly popular applications, such as peer-to-peer services, do not rely on any predefined set of well-known ports at all. Another set of techniques, such as those found in many NIDS [2], [3], classifies flows through a detailed analysis of the captured traffic at different layers, including the application layer. By means of exhaustive payload analysis, these techniques try to discover which application layer protocol originated each flow. Apart from legal issues concerning the privacy of end users, the main drawback of these approaches is the computational power needed to classify the traffic, since the finite state machines that drive the application layer protocols must be decoded. These techniques therefore scale poorly to the capacity of current high-speed networks, which limits their use to lower-bandwidth links. Moreover, even these signature-based classification algorithms can fail when traffic is tunneled: for example, when HTTP is used as a transport for peer-to-peer traffic, most signature-based engines would classify such flows as regular HTTP.
In the worst case, payload analysis techniques can become completely ineffective, for example when end-to-end encryption mechanisms such as Transport Layer Security are used to protect the payload. Our approach belongs to yet another class of techniques, which try to classify network traffic relying exclusively on the statistical properties of the flows (see, for example, [4], [5]). The key idea behind our work is that the statistical properties of three basic elements of each network flow, i.e., the size of the IP packets, their inter-arrival times and their cardinality (position) within the flow, should be sufficient to determine which application layer protocol generated the flow. We describe our idea in this paper through three research contributions. We first define the notion of protocol fingerprints, which express the three statistical properties mentioned above in a compact and efficient way. We then introduce the notion of an anomaly score in this context, which measures how close an unknown traffic flow is to a given protocol fingerprint. Finally, we introduce a simple algorithm that classifies unknown flows by checking their anomaly scores against all known protocol fingerprints. The key benefits of our approach over existing techniques are its light weight and its robustness against the emergence of traffic generated by new application layer protocols. Furthermore, it is not based on signature matching and does not rely in any way on port numbers or transport layer payloads: in principle, this allows the classification of traffic flows that are tunneled or even encrypted.
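The three contributions fit together in a few lines of code. The following Python sketch anticipates the definitions given in Section III: a fingerprint is taken to be a list of L sparse matrices $M_i$ (one per packet position) over binned (s, t) cells, each obtained by Gaussian-smoothing and renormalizing the corresponding $PDF_i$; the anomaly score follows Eqs. (1) and (2), and the decision follows rule 2 of Section III-C. The sparse-dictionary representation, the function names `smooth`, `anomaly_score` and `classify`, the separate `sigma` and `radius` kernel parameters, and the optional `threshold` (modelled on the extension discussed in Section IV-C) are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]        # (packet-size bin, log10 inter-arrival-time bin)
Matrix = Dict[Cell, float]    # sparse matrix: cells not stored are zero


def smooth(pdf: Matrix, sigma: float, radius: int, shape: Tuple[int, int]) -> Matrix:
    """Turn one PDF_i into a fingerprint component M_i (assumed kernel shape).

    Spreads each cell's mass with an isotropic Gaussian kernel truncated to a
    circle of `radius` bins, drops mass falling outside the grid, and rescales
    the result so that it sums to 1.
    """
    rows, cols = shape
    out: Matrix = {}
    for (r, c), p in pdf.items():
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                d2 = dr * dr + dc * dc
                rr, cc = r + dr, c + dc
                if d2 <= radius * radius and 0 <= rr < rows and 0 <= cc < cols:
                    w = p * math.exp(-d2 / (2.0 * sigma * sigma))
                    out[(rr, cc)] = out.get((rr, cc), 0.0) + w
    total = sum(out.values())
    return {cell: v / total for cell, v in out.items()}


def anomaly_score(pairs: List[Cell], fingerprint: List[Matrix], eps: float) -> float:
    """Eq. (2): normalised mean of A_i = 1 / max(eps, M_i(P_i)), i.e. Eq. (1).

    pairs[0] is the binned P_1 and fingerprint[0] is M_1; N = min(len(pairs), L).
    """
    n = min(len(pairs), len(fingerprint))
    if n == 0:
        raise ValueError("flow has no (s, t) pairs to score")
    a = [1.0 / max(eps, fingerprint[i].get(pairs[i], 0.0)) for i in range(n)]
    a_min, a_max = 1.0, 1.0 / eps
    return (sum(a) / n - a_min) / (a_max - a_min)


def classify(pairs: List[Cell], fingerprints: Dict[str, List[Matrix]],
             eps: float, threshold: Optional[float] = None) -> str:
    """Hard decision rule: return the protocol whose fingerprint scores lowest.

    If `threshold` is given (hypothetical extension), a flow whose best score
    exceeds it is labelled "unknown" instead.
    """
    scores = {p: anomaly_score(pairs, m, eps) for p, m in fingerprints.items()}
    best = min(scores, key=scores.get)
    if threshold is not None and scores[best] > threshold:
        return "unknown"
    return best


if __name__ == "__main__":
    grid = (10, 10)  # toy plane; the Ethernet example in Section III-A is 1461 x 1001
    fps = {
        "A": [smooth({(2, 2): 1.0}, 1.0, 2, grid), smooth({(5, 5): 1.0}, 1.0, 2, grid)],
        "B": [smooth({(7, 7): 1.0}, 1.0, 2, grid), smooth({(1, 8): 1.0}, 1.0, 2, grid)],
    }
    print(classify([(2, 2), (5, 5)], fps, eps=1e-6))                 # A
    print(classify([(9, 0), (0, 0)], fps, eps=1e-6, threshold=0.1))  # unknown
```

In the toy run, the second flow lies far from both fingerprints, so both scores equal 1 and only the threshold prevents it from being forced into one of the known classes.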

A preliminary experimental application of our technique to traffic traces collected on the network of the University of Brescia's Faculty of Engineering shows promising results.

The remainder of this paper is organized as follows. Section II briefly describes related work. In Section III we introduce our classification methodology, showing how protocol fingerprints can be derived by statistical analysis at the IP level. We also define the anomaly score of a given IP traffic flow versus a fingerprint, and introduce a simple classification algorithm based on this score. In Section IV we describe preliminary numerical results: given a set of protocol fingerprints, we show that our anomaly score is effective at classifying traffic flows by analyzing exclusively their IP-level properties. Finally, Section V concludes the paper.

II. RELATED WORK

The idea of using the statistical properties of network traffic to classify flows, or at least to describe their behavior, is not new. Early, pioneering work on Internet traffic characterization [6], [7] focuses on the relationship between the observed statistical properties of flows and the application protocols that generated them. These works show that analytical models of random variables can suitably express the behavior of a few protocols; such models describe the observed lengths, durations and inter-arrival times of different TCP flows. However, the analysis of inter-arrival statistics shows that non-homogeneous Poisson models, though they can successfully describe some user-induced events (e.g., remote shell sessions), do not capture most of the traffic characteristics. These preliminary works make no attempt to classify flows according to application layer protocols. One of the first attempts to classify content-biased traffic [8] shows how Real Audio flows may be identified within aggregates.
Through a simple analysis of packet lengths and inter-arrival times, the technique described in that work aims at enabling QoS deployment for audio traffic independently of the particular transport protocol used. A similar approach has been used in [4] to analyze chat traffic. Starting from the observation that this kind of traffic is dominated by human interactions, that work proved the feasibility of identifying chat flows, whether they use their own transport protocol or are layered on top of other application protocols such as HTTP. To overcome one of the key issues with statistically trained classifiers, i.e., the lack of verifiable reference data, it was based on the statistical analysis of Internet Relay Chat traffic traces, since such flows are easily identified by payload analysis. These works differ from ours in that they focus exclusively on a single class of applications (audio and chat traffic, respectively). In [9], [10] it is shown that traffic-pattern similarity between different application layer protocols can be exploited to group observed flows into hierarchical clusters. Even if only a few representative features are taken from each flow (the number of exchanges between the endpoints, the total number of bytes, the connection duration and so on), the resulting clusters show the feasibility of untrained, coarse statistical traffic classification aimed at discriminating among different application classes (e.g., peer-to-peer versus remote login flows). Although these techniques represent the first attempts at untrained traffic classification, it is not clear whether they could be applied to the fine-grained application layer classification we are interested in. Other, trained approaches confirm the possibility of discriminating between different application classes.
One of the main obstacles to the deployment of effective QoS mechanisms in the Internet remains the lack of a fast and reliable mechanism to classify flows according to the kind of application data they transport (e.g., bulk FTP data versus low-jitter audio packets). The technique presented in [11] focuses on this problem: it shows that a useful set of features enabling this kind of classification can be found at different levels (single packet, flow, connection and so on), and that these features can be successfully exploited by Nearest Neighbor and Linear Discriminant Analysis algorithms. Another trained approach to class discrimination among flows, based on a supervised Bayesian learning machine, has been demonstrated in [12]. Although based on full payload analysis, [13] also tries to identify classes of traffic instead of focusing on the classification of specific application layer protocols. Even if these approaches could lead to an effective deployment of QoS-based mechanisms, they would probably not be precise enough to allow fine-grained application layer protocol discrimination, e.g., POP versus IMAP. Finally, [5] is one of the recent works that focus on the statistical analysis of traffic and show a promising approach for fine-grained protocol classification. It introduces a classification method based on the analysis of host behavior, with the same goal as ours: classifying flows according to the applications that generated them without payload analysis. However, that approach differs considerably from ours: in [5] classification is performed by associating a host behavior pattern with one or more applications and refining the association by means of heuristics and behavior stratification.

III. OUR CLASSIFICATION METHODOLOGY: APPLICATION LAYER FINGERPRINTS AT THE NETWORK LAYER

In this paper we focus on the classification of IP flows produced by network applications that exchange data through TCP connections, such as HTTP, SMTP, SSH, etc.¹ These applications always follow the same basic model of operation: a client and a server establish a connection by means of a three-way handshake (connection set-up); they exchange application data through TCP segments (communication phase); finally, they end the communication (connection tear-down). If something goes wrong, this three-step procedure is usually interrupted and the resulting flow is incomplete. On this basis we define a flow F as the unidirectional, ordered sequence of IP packets produced either by the client towards the server or by the server towards the client. Each communication therefore generates two flows: the client-server flow, composed of $N_{client}+1$ IP packets,

$$F_{client} = \{Pkt_0, Pkt_1, Pkt_2, \ldots, Pkt_{N_{client}}\},$$

where $Pkt_j$ is the j-th IP packet sent by the client to the server, and the corresponding server-client flow, composed of $N_{server}+1$ IP packets,

$$F_{server} = \{Pkt_0, Pkt_1, Pkt_2, \ldots, Pkt_{N_{server}}\}.$$

At the IP layer, each flow F can be characterized as an ordered sequence of N pairs $P_i = (s_i, t_i)$, with $1 \le i \le N$, where $s_i$ is the size of $Pkt_i$ and $t_i$ is the inter-arrival time between $Pkt_{i-1}$ and $Pkt_i$. Our study is based on the tenet that the statistical information contained in a sufficient number of flows generated according to the same application layer protocol rules should be enough to decide whether an unknown flow is in agreement with that protocol or not. We call this statistical information a protocol fingerprint, and define it in the remainder of this section.

¹ The extension of our work to other transport layer protocols, such as the User Datagram Protocol, is left as future work.

A. Protocol fingerprint precursors: Probability Density Function vectors

The generation of a given application layer protocol's fingerprint starts from the estimation of a set of L Probability Density Functions $PDF_i$ from a set of flows (a training set) generated by the same, known protocol and captured by a monitoring device. The i-th function $PDF_i$ is built on the i-th packets of all those flows that are at least i packets long. As i increases, $PDF_i$ is therefore estimated on a decreasing number of flows, so L is chosen such that each $PDF_i$ is based on a statistically significant number of flows. If this holds, $PDF_i$ describes the behavior of the i-th packets of a given protocol on the plane (packet size s, inter-arrival time t). Variable s is discrete² and takes values in a range determined by the minimum and maximum size of an IP packet on the type of network interface used to collect the traces.
For example, on an Ethernet link s ranges from 40 to 1500 bytes. Variable t, instead, is sampled with a resolution consistent with the speed of the network interface used to capture the traffic traces and with the clock resolution of the capture device, and binned accordingly. When Tcpdump [14] is used on off-the-shelf hardware, the $PDF_i$ plane can realistically be binned along the $\log_{10} t$ axis from $10^{-7}$ to $10^{3}$ seconds, with step 0.01.³ Each resulting $PDF_i$ matrix in our example would thus be 1461 × 1001 (1461 size values times 1001 time bins). Finally, if L + 1 is the number of packets of the longest-lived flows used to analyze a certain protocol, we arrange the resulting L functions $PDF_i$ into the Probability Density Function vector $\mathbf{PDF}$.

² As usual, when a random variable X takes values on a finite domain $\mathcal{X} = \{x_1, \ldots, x_N\}$, we implicitly state that its distribution function is constant everywhere except at a finite number of points belonging to $\mathcal{X}$. With a slight abuse of notation, we then take its probability density function to be zero everywhere except at these points, where it equals the amplitude of the corresponding discontinuity of the distribution function.

B. Anomaly score: from protocol PDFs to protocol fingerprints

In order to classify an unknown traffic flow given a set of different protocol PDF vectors, we need to check whether the behavior of the flow is statistically compatible with the description given by at least one of them, and to choose which one describes it best. We could then state that the unknown flow belongs to the application protocol that produced that PDF vector. We are therefore looking for a definition of an anomaly score S that describes how statistically far an unknown flow F, composed of a series of packet pairs $P_i$, is from a given protocol's $\mathbf{PDF}$. A basic building block of such an anomaly score is the value that the i-th component of $\mathbf{PDF}$ assumes in $P_i$.
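The value $PDF_i(P_i)$ presupposes the PDF vector of Section III-A. Its estimation, and the lookup itself, can be sketched as follows. The sketch assumes the Ethernet/Tcpdump binning of the example above (s from 40 to 1500 bytes, $\log_{10} t$ from -7 to 3 in steps of 0.01, i.e., 1461 × 1001 cells). Clamping out-of-range values to the edge bins, the sparse dictionary representation, and the names `flow_pairs`, `to_cell`, `estimate_pdf_vector`, `pdf_value` and `min_flows` (a stand-in for "statistically significant number of flows") are hypothetical choices made for illustration, not the authors' code.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

S_MIN, S_MAX = 40, 1500                              # IP packet sizes on Ethernet (bytes)
LOG_T_MIN, LOG_T_MAX, LOG_T_STEP = -7.0, 3.0, 0.01   # log10(seconds)

Cell = Tuple[int, int]


def flow_pairs(packets: Sequence[Tuple[float, int]]) -> List[Tuple[int, float]]:
    """Map Pkt_0..Pkt_N, given as (timestamp, IP size), to P_1..P_N = (s_i, t_i)."""
    return [(packets[i][1], packets[i][0] - packets[i - 1][0])
            for i in range(1, len(packets))]


def to_cell(s: int, t: float) -> Cell:
    """Bin one pair on the (s, log10 t) plane, clamping to the edge bins."""
    s_bin = min(max(s, S_MIN), S_MAX) - S_MIN                   # 0..1460
    log_t = math.log10(max(t, 10 ** LOG_T_MIN))                 # t = 0 -> lowest bin
    log_t = min(max(log_t, LOG_T_MIN), LOG_T_MAX)
    t_bin = round((log_t - LOG_T_MIN) / LOG_T_STEP)             # 0..1000
    return s_bin, t_bin


def estimate_pdf_vector(flows: List[List[Tuple[int, float]]], L: int,
                        min_flows: int = 1) -> List[Dict[Cell, float]]:
    """Estimate PDF_1..PDF_L from the i-th pair of every flow that has one.

    The vector is cut at the first position observed in fewer than
    `min_flows` flows, so every kept PDF_i rests on enough samples.
    """
    counts: List[Counter] = [Counter() for _ in range(L)]
    for pairs in flows:
        for i, (s, t) in enumerate(pairs[:L]):
            counts[i][to_cell(s, t)] += 1
    vector: List[Dict[Cell, float]] = []
    for c in counts:
        n = sum(c.values())          # number of flows with at least i pairs
        if n == 0 or n < min_flows:
            break
        vector.append({cell: k / n for cell, k in c.items()})
    return vector


def pdf_value(pdf_vector: List[Dict[Cell, float]], i: int,
              pair: Tuple[int, float]) -> float:
    """PDF_i(P_i) for a 1-based position i; unseen cells have value zero."""
    return pdf_vector[i - 1].get(to_cell(*pair), 0.0)
```

A sparse representation is used here because most of the 1461 × 1001 cells of each $PDF_i$ are expected to be empty; a dense array would work equally well.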
Since the value of $PDF_i$ at (s, t) expresses the probability that the pair $P_i$ equals (s, t), the value $PDF_i(P_i)$ measures how well the i-th packet of the unknown flow agrees with the application layer protocol described by the specific $\mathbf{PDF}$ used: the higher the value, the higher the probability that the flow was generated by that protocol. Because the random variables (s, t) are affected by noisy components such as network congestion and path MTU values, the values of $PDF_i$ in the region close to $P_i$ should also have an impact on the anomaly score S. To this end, we introduce the concept of protocol fingerprint $\mathbf{M}$, defined as the vector of L matrices $M_i$ obtained by applying a circular Gaussian filter to each component of the $\mathbf{PDF}$ vector and rescaling every resulting matrix so that it still sums to 1. Figure 1 gives a graphic representation of the first three components of the fingerprint obtained from the HTTP $F_{client}$ traces of the training set described in the following section. It is clear that the behavior of a packet extracted from an HTTP flow strongly depends on its cardinality within the flow.

Fig. 1. First three components of the HTTP fingerprint, as derived from the University of Brescia's $F_{client}$ traffic. A small section of $M_2$ is magnified to better show the differences from other protocol fingerprints.

We can now proceed to the actual definition of the anomaly score S. We start by introducing an anomaly score vector $\mathbf{A}$, whose i-th component $A_i$ is a function of the value $M_i(P_i)$, defined as follows:

$$A_i(P_i, M_i) = \frac{1}{\max\left(\varepsilon, M_i(P_i)\right)} \qquad (1)$$

where $M_i(P_i)$ is the value of $M_i$ at $P_i$, and ε is a small positive quantity. We introduce the term ε so that the score is always finite, even when $M_i$ is zero at $P_i$.⁴ By construction, since every $M_i$ is non-negative and sums to 1 (and ε < 1), the following holds for any value of $P_i$ on the plane (s, t):

$$1 \le A_i(P_i, M_i) \le \varepsilon^{-1}.$$

Starting from the anomaly score vector $\mathbf{A}$, we can now define the anomaly score S of F versus $\mathbf{M}$ as follows:

$$S(F, \mathbf{M}) = \frac{\frac{1}{N}\sum_{i=1}^{N} A_i(P_i, M_i) - A_{min}}{A_{max} - A_{min}} \qquad (2)$$

where N is the minimum between the number of pairs composing F and L, and $A_{min}$, $A_{max}$ are the extreme values allowed for $A_i$ as defined above, i.e., 1 and $\varepsilon^{-1}$ respectively. This implies that $0 \le S(F, \mathbf{M}) \le 1$.

³ Although the t axis is binned to accept timestamp differences as small as $10^{-7}$ seconds, in our case this value is far too conservative, since Tcpdump cannot reach this kind of accuracy on off-the-shelf hardware. However, this does not affect the correctness of our methodology, since the same inaccuracies imputable to Tcpdump are expected to affect the generation of every protocol's $PDF_i$ homogeneously.

⁴ ε should be an order of magnitude smaller than the smallest non-zero value of $M_i$.

C. Classification algorithm

Building on the definitions of flow F, protocol fingerprint $\mathbf{M}$ and anomaly score $S(F, \mathbf{M})$, we can introduce the following simple classification algorithm. Given K protocol fingerprints $\mathbf{M}^j$, with $1 \le j \le K$, and an unknown flow F, we state that F was originated by protocol p if its score versus $\mathbf{M}^p$ is lower than its score versus any other $\mathbf{M}^j$, with $j \ne p$ and $1 \le j \le K$. More precisely, our classification algorithm works as follows:

1) For each protocol fingerprint $\mathbf{M}^j$, with $1 \le j \le K$, calculate $S(F, \mathbf{M}^j)$.
2) F was originated by protocol p if $S(F, \mathbf{M}^p) = \min\left(S(F, \mathbf{M}^1), \ldots, S(F, \mathbf{M}^K)\right)$.

D. Using the technique in practice

The application of this technique to classify TCP flows on a real network is relatively simple, and can be summarized in the following steps:

a. Collect traffic traces on the edge gateway of the network, using Tcpdump or any other traffic-capture mechanism available. These traces will serve as the training set for our classification technique.
b. Pre-classify the traces by means of any effective mechanism, either payload- or header-based, such as Snort, Bro, the techniques proposed in [5], [12], or a combination of these mechanisms.
c. Build protocol fingerprints from the pre-classified traces following the procedures described in Sections III-A and III-B, and install the fingerprints on the classification engine.
d. Start the classification engine built on the algorithm introduced in Section III-C. This activity can be performed on live traffic.
e. Periodically, if necessary, update the fingerprints by running steps a to c again.

A few notes on the applicability of this technique are in order.

1) Applicability of the classification algorithm: At first glance the classification algorithm may appear computationally intensive, since it needs an anomaly score for each protocol fingerprint to classify a flow. However, network administrators are usually interested in actively managing only a fraction of the protocols running on their networks: only a few fingerprints need to be stored and examined to prioritize a limited number of critical services or to block others. Moreover, an anomaly score is fast to calculate, being the algebraic sum of at most L ordered terms obtained by looking up values in the fingerprint (smoothed PDF) matrices. At present the classification engine emits a verdict only when the flow is completed or L packets have been analyzed. This limits the application of our technique mainly to network analysis and hampers its use for real-time flow classification. We are working on a new definition of anomaly score that would allow our technique to classify flows as packets pass by, enabling its use in real time.

2) Building accurate protocol fingerprints: The accuracy of the tools used in step b is critical.
Pre-classification of the flows that will be used to build fingerprints should introduce as little noise as possible: for example, when building the HTTP fingerprint, HTTP flows tunneling peer-to-peer bulk data must be strictly excluded. A simple method to collect pre-classified flows is to rely on traffic exchanged by trusted servers: having total control over the software executed on a server makes it possible to capture as many flows as needed to produce statistically significant fingerprints. This is the mechanism that we adopted to build fingerprints in this preliminary stage of our research. In general, the validation of training sets is widely recognized as a difficult problem [5]. A combination of different payload-based classifiers could help in many cases: even though computationally inefficient, such a combination would have to be run only seldom, i.e., when the fingerprints are created for the first time and when they have to be updated as application layer protocols evolve. However, an implementation of such an effective combination of techniques is not available to the research community at the time of this writing. We plan to address this problem in our ongoing work.

IV. ANALYSIS AND EXPERIMENTAL RESULTS

A. Testbed setup and protocol fingerprints

In order to test the validity of our classification technique in a preliminary way, we collected network traces at the edge gateway of the data center network of the University of Brescia's Engineering Faculty. The data center is composed of a dozen high-end servers running a mix of versions of the Linux operating system and interconnected by several layer-2 100/1000BaseT segments. This network hosts e-mail, web and shell accounts for around eight hundred people, including researchers, administrative staff and students. The edge gateway where the traces were collected connects the data center to the Internet through a 100 Mb/s link, and is implemented with a Linux-based dual-processor server. We collected traffic traces by running Tcpdump for several hours a day over a period of six months, choosing both the time and the duration of each trace at random. This phase resulted in traces totaling over 150 GB of traffic, composed of a variable mix of semantically valid HTTP, POP3 and SMTP flows. Here we call semantically valid those TCP flows that go beyond the initial three-way handshake, i.e., that are characterized by more than two t values. We divided the traces into two sets: a training set, amounting to more than […] flows for each of the three protocols considered, used to build protocol fingerprints, and an evaluation set, amounting to more than 6000 flows for each of the three protocols, used to validate our technique. Following the procedure described in Section III-D, we started by creating protocol fingerprints from the training set. We actually generated several sets of fingerprints, starting from increasingly large portions of the training set.
This is useful to assess the sensitivity of our technique to the number of flows used to fingerprint a given protocol. The radius used in the Gaussian filter (applied to PDFs to derive protocol fingerprints, see Section III-B) was […].⁵ We generated fingerprint vectors composed of twenty elements (L = 20), which seems more than enough to statistically characterize even longer-lived flows. In this paper, the technique used to pre-classify the training set (step b, Section III-D) is very simple: since the traffic used for validation was either originated from or directed to the data center network described above, simply observing each flow's TCP headers allows us to detect its application layer protocol with certainty. As stated in Section III-D.2, since we have complete control over the software run on the network servers,⁶ we can be sure that traffic flowing on a given port is actually valid traffic generated by the related application layer protocol; otherwise our servers would not generate semantically correct TCP flows. The same considerations hold for the procedure we used to obtain classification data for the evaluation set: in this case too, simply examining the TCP headers of each flow provided the data needed to validate our technique.

⁵ This value was chosen empirically, since it gave the best results with the data available during this preliminary validation phase. The study of how the radius affects the precision of our technique, whether it should be a fixed value or fingerprint-dependent, and whether the Gaussian filter should be parametrized differently along the two axes (s, t), is left to future work.

⁶ Note that, on the contrary, we do not have any control over the clients, which, for all the traces involved in our experiments, are on the Internet.

Fig. 2. First three components of the SMTP fingerprint, as derived from the University of Brescia's $F_{client}$ traffic. A small section of $M_2$ is magnified to better show the differences from other protocol fingerprints.

Fig. 3. First three components of the POP3 fingerprint, as derived from the University of Brescia's $F_{client}$ traffic. A small section of $M_2$ is magnified to better show the differences from other protocol fingerprints.

Figures 1, 2 and 3 show the first three components of the fingerprint vectors that we obtained for HTTP, SMTP and POP3, respectively. Once again, the behavior of packets clearly depends on their cardinality, and the different packet distributions of the three protocols are clearly visible. Note that although at this resolution the POP3 fingerprint seems to show fewer differences than the other two, the same kind of cardinality dependence exhibited by SMTP and HTTP is clearly visible for POP3 at higher resolutions.

B. Experimental results

By applying the methodology explained in Section III, we classified the flows of the evaluation set. For each flow, we compared the result given by our technique with the actual application layer protocol determined from the TCP headers, as explained in Section IV-A.

TABLE I
PERCENTAGE OF FLOWS FROM THE EVALUATION SET THAT WERE CORRECTLY CLASSIFIED

Protocol   F_server   F_client
HTTP       99.41%     97.46%
SMTP       99.65%     97.79%
POP3                  92.79%

Fig. 4. Ratio of HTTP, POP3 and SMTP flows from the evaluation set that were correctly classified vs. number of flows used to build protocol fingerprints.

Table I presents the main results of this experimental phase. Our technique correctly classifies the application layer protocol of more than 92% of the flows, with protocol fingerprints obtained from […] flows. The technique seems to perform better for $F_{server}$ flows than for $F_{client}$ ones. This can be explained by the fact that, in our experiment, the statistics of the observed parameters are more deterministic for $F_{server}$ than for $F_{client}$ flows: while the former are generated under a relatively uniform set of conditions (operating systems, hardware and network conditions), we can expect much more heterogeneity for the latter (clients on the Internet). Nevertheless, our technique shows promising results even in classifying $F_{client}$ flows. As shown in Figure 4, our technique is sensitive to the number of flows used to build fingerprints: as this number increases, the hit ratio rises, apart from local oscillations.
However, it is worth noting that even when the fingerprints are built on only 1000 flows, the hit ratio is still above 96% for the $F_{server}$ traffic of all three protocols. Among the $F_{client}$ flows, only HTTP classification achieves relatively lower results, but still above the 83% mark. Once again, this confirms that our technique seems to perform reasonably well even with traffic generated under a highly heterogeneous set of conditions, as in the case of traffic coming from clients on the Internet.

C. Extension to the classification of non-fingerprinted protocols

Applying our technique to a generic TCP/IP network could, after some time spent collecting traces, produce fingerprints for the majority of the protocols actually in use, allowing the direct application of our classification methodology to a large portion of the traffic. However, no matter how many fingerprints are available, it is reasonable to assume that a traffic flow that does not belong to any of them will always be encountered. In such cases our classification algorithm would fail, incorrectly assigning the unknown flow to one of the known fingerprints. In fact, the proposed algorithm is based on a hard decision rule: given some fingerprints, the type of an unknown flow is always set to that of the closest fingerprint, as stated by rule 2 of Section III-C. This approach cannot work for traffic for which we do not have a fingerprint, because the absolute value of the anomaly score is not taken into account. To understand how to improve our algorithm, we can look at Table II, which reports the mean anomaly scores of flows in the evaluation set versus the three available fingerprints.
For flows produced by fingerprinted protocols (i.e., HTTP, SMTP and POP3), the mean score versus the available fingerprints is at its minimum for the fingerprint of the protocol that generated the flows (values in italic), and this is the property on which our classification algorithm is based. Furthermore, the mean score of the flows versus the correct protocol fingerprint is below 0.1 in all cases, and the differences are again more marked for $F_{server}$ flows. We can hence modify our algorithm to include the absolute score value in the decision mechanism, so that a warning is raised when a flow of a protocol that does not correspond to any available fingerprint is being classified. For example, we could set a threshold value so that, when the smallest score is above it, the flow is classified as unknown. This idea is validated by the last row of each half of Table II, which shows the mean anomaly score of SSH flows versus the three protocol fingerprints. Contrary to what happens for fingerprinted protocols, in this case all mean scores are well above the 0.1 mark, and only a small set of flows reaches an anomaly score lower than 0.1 (0.6% of flows in the best case and 6.5% in the worst case). This value (0.1) therefore seems a good candidate for the proposed threshold, and could be used to improve the robustness of the proposed algorithm with respect to flows generated by non-fingerprinted protocols.

TABLE II
MEAN VALUES OF ANOMALY SCORES S(F, M) FOR FINGERPRINTED PROTOCOLS AND FOR SSH
(Rows: HTTP, SMTP, POP3 and SSH flows; columns: fingerprints $\mathbf{M}_{HTTP}$, $\mathbf{M}_{SMTP}$, $\mathbf{M}_{POP3}$; reported separately for $F_{client}$ and $F_{server}$.)

In other words, while the blind application of our methodology to the classification of, for example, SSH flows based on HTTP, POP3 and SMTP fingerprints would (incorrectly) assign them to SMTP traffic, adding an anomaly-score threshold to our algorithm would signal that the unknown traffic does not belong to any of the fingerprinted protocols. Furthermore, a per-fingerprint threshold rather than a fixed one should yield even better results. Finally, including in the classification process data derived from the deviation properties of the scores should further increase the correctness of our technique. We will report on this topic in more detail in future papers.

V. CONCLUSIONS AND FUTURE WORK

In this paper we introduced and analyzed a new methodology for the IP-level classification of network traffic. The main highlight of our technique is that it is based on the statistical properties of network traffic rather than on the analysis of its payloads. It is therefore less computationally intensive than payload-based mechanisms, and scales better to the increasing speed of today's networks. Furthermore, it can in principle be extended to the classification of encrypted traffic. Experimental tests show promising, albeit preliminary, results with respect to the correctness of our classification algorithm and to its sensitivity to a series of parameters.
Comparisons with related work are also encouraging: besides the novelty of the main idea behind our approach, the numerical results presented in this paper indicate that this methodology can potentially outperform existing traffic-classification mechanisms. Our work in this area is continuing in several directions. A natural first step will be to run new tests with different training sets and to extend the analysis to other fingerprinted protocols. Several improvements to our algorithm are also possible, among them the correlation of classification information from F client and F server, and the integration of additional factors into the decision process, such as the numerical value of the anomaly score of an unknown flow versus each fingerprint, moving away from the hard-decision algorithm presented in this paper. We are also studying mechanisms to apply the classification algorithm dynamically as the classifier examines each flow's packets. This would allow our technique to be applied in real time on a router, classifying traffic after capturing only a few packets of each flow. Finally, Table II shows that the mean anomaly score is protocol-biased. A preliminary analysis that we recently conducted indicates that the bias is related to the Gaussian filter applied to the PDFs to obtain protocol fingerprints. We are investigating the possibility of building each fingerprint with a different smoothing factor, and of applying a Gaussian filter with different parameters on the s and t axes, in order to obtain more precise fingerprints and anomaly score measurements.