Data Streaming Algorithms for Estimating Entropy of Network Traffic


Ashwin Lall, University of Rochester
Vyas Sekar, Carnegie Mellon University
Mitsunori Ogihara, University of Rochester
Jun (Jim) Xu, Georgia Inst. of Technology
Hui Zhang, Carnegie Mellon University

ABSTRACT

Using entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. In this paper, we investigate the problem of estimating the entropy in a streaming computation model. We give lower bounds for this problem, showing that neither approximation nor randomization alone will let us compute the entropy efficiently. We present two algorithms for randomly approximating the entropy in a time and space efficient manner, applicable for use on very high speed (greater than OC-48) links. The first algorithm for entropy estimation is inspired by the structural similarity with the seminal work of Alon et al. for estimating frequency moments, and we provide strong theoretical guarantees on the error and resource usage. Our second algorithm utilizes the observation that the performance of the streaming algorithm can be enhanced by separating the high-frequency items (or elephants) from the low-frequency items (or mice). We evaluate our algorithms on traffic traces from different deployment scenarios.

Categories and Subject Descriptors: C.2.3 [Computer Systems Organization]: Computer-Communication Networks: Network Operations, Network Monitoring

Supported in part by grants Xerox/NYSRAT #C43 and NSF-EIA-256. Supported in part by NSF grant NETS-NBD and NSF CAREER Award ANI. Supported in part by grants NSF CNS and ANI and U.S.
Army Research Office contract number DAAD

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMETRICS/Performance '06, June 26-30, 2006, Saint Malo, France. Copyright 2006 ACM /6/6...$5..

General Terms: Algorithms, Measurement, Theory

Keywords: Traffic Analysis, Data Streaming

1. INTRODUCTION

In network traffic flow analysis there has been a shift of focus from simple volume-based analysis to network flow distribution-based analysis. Much work has been published on making inferences about the network status from such statistics [2, 7, 24]. Intrinsically, distribution-based analysis could capture the network status more succinctly than volume-based analysis would, but it requires appropriate metrics to encapsulate and capture features of the underlying traffic distribution. The standard quantities in assessing distributions are the moments (the mean, standard deviation, skewness, kurtosis, etc.). A number of recent empirical studies [7, 7, 23, 24] have suggested the use of entropy as a succinct means of summarizing traffic distributions for different applications, in particular, in anomaly detection and in fine-grained traffic analysis and classification.

With respect to anomaly detection [7], the use of entropy for tracking changes in traffic distributions provides two significant benefits. First, the use of entropy can increase the sensitivity of detection to uncover anomalous incidents that may not manifest as volume anomalies.
Second, using such traffic features provides additional diagnostic information into the nature of the anomalous incidents (e.g., making a distinction among worms, DDoS attacks, and scans) that is not available from just volume-based anomaly detection. With respect to fine-grained traffic analysis and traffic classification [24], the entropy of traffic feature distributions offers useful information to measure distance among (traffic) clusters.

While these recent studies demonstrate that using the entropy of traffic distributions has tremendous value for network monitoring applications, realizing the potential benefit requires efficient algorithms for computing the entropy. In general, computing traffic statistics on high-speed links is a hard task, because it is infeasible for traditional methods to keep up with the line-rates, due to constraints on available processing capacity. In addition, constraints imposed on memory make it almost impossible to compute the statistics per flow, or even to maintain per-flow state. Then, the

use of sampling comes as a natural solution. Sampling based methods [5, 6] have been shown to be able to reduce the processing and memory requirements, and to be suitable for capturing some traffic statistics. However, one must trade off accuracy for efficiency: the estimates obtained from sampled data may have large errors [].

One may then naturally wonder whether there are efficient methods for accurately estimating the entropy. In particular, we ask the following questions: What amount of resources (time and space) do we provably need to capture the entropy of a stream of packets on a high-speed link? Are there efficient algorithms for entropy computation that can operate on high-speed links and that have low memory and CPU costs?

To address these questions, data streaming algorithms assume significance. Data streaming algorithms [9] for computing different statistics over input streams have recently received tremendous interest from the networking and theory communities. Data streaming algorithms have the desirable property that both the computational and memory requirements are low. This property makes them ideal for such high-speed monitoring applications. They are also guaranteed to work with any distribution, which makes them useful in dealing with data for which the distribution is not known.

The contribution of this paper is the investigation and application of streaming algorithms to compute the entropy over network traffic streams. The challenge is to design algorithms for estimating entropy that are lightweight in terms of both memory and computational complexity. We present two algorithms for computing the entropy in a streaming model.

The first algorithm is based on the insight that estimating the entropy shares structural similarity with the well-known problem of estimating the frequency moments [2]. Despite the apparent structural similarity, providing theoretical approximation and resource guarantees for entropy estimation is a challenging task.
Our contributions are the identification of appropriate estimator functions for calculating the entropy accurately, and providing proofs of approximation guarantees and resource usage. The theoretical guarantees hold for arbitrary streams, without making any assumptions regarding the underlying distributions and structural properties of their distribution.

Network traffic data-streams have considerable underlying structure (e.g., they may have a Zipfian or power-law distribution), which suggests that we can optimize algorithms further by leveraging this fact. Our second algorithm builds on the basic streaming algorithm, but can substantially improve the efficiency based on techniques for separating the large (elephant) flows from the small (mice) flows. We use a lightweight sampling method that enables sieving out the elephant flows from the stream, and extend the earlier algorithm to utilize this separation to achieve better performance in practice.

We evaluate our algorithms on real traffic traces collected from three different deployment scenarios. The first streaming algorithm outperforms traditional sampling based approaches, and provides much lower estimation errors while using similar (or lesser) memory resources. Interestingly, we notice that the observed errors are an order of magnitude smaller than the theoretical error guarantees. While it has proved difficult to provide rigorous theoretical (i.e., worst-case) guarantees for the second algorithm (which makes use of the elephant-mice separation), we find that the observed errors are further reduced with this approach.

Footnote: This approach has also been independently proposed by Chakrabarti et al. []. We will discuss this and other approaches in Section 8, highlighting that while our intellectual trails cross each other on some results, our approaches and evaluations differ substantially in others.

The remainder of this paper is organized as follows. We introduce the notation that we will use and formally define the problem in Section 2.
In Section 3 we prove that any (deterministic) approximation algorithm or (exact) randomized algorithm must use a linear amount of space. Section 4 outlines the basic streaming algorithm and provides theoretical approximation guarantees, while Section 5 provides improvements based on the technique of separating the elephant and mice flows. We evaluate our algorithms on real-world traces in Section 6, confirming the effectiveness of our approaches. We discuss some features of our algorithms in Section 7 and related work in Section 8, before concluding.

2. PROBLEM FORMULATION

We first outline the notation used in the remainder of the paper, and formulate the problem of estimating entropy in a streaming context. Throughout this paper we will assume that all items coming over the stream are drawn from the set $[n] = \{1, 2, 3, \ldots, n\}$. For example, if we are interested in measuring the entropy of packets over various application ports, then $n$ is the number of ports (the maximum number of ports for each protocol). Similarly, if we are interested in measuring the entropy of packets over unique source or destination addresses in the traffic stream, then $n$ would have a maximum value of $2^{32}$ for 32-bit IPv4 addresses.

We will denote the frequency of item $i \in [n]$ (e.g., the number of packets seen at port $i$) by $m_i$ and the total number of items in the stream by $m$, i.e., $m = \sum_{i=1}^{n} m_i$. The $j$th item observed in the stream will be denoted by $a_j \in [n]$. We define $n_0$ to be the number of distinct items that appear in the stream, since it is possible that not all $n$ items are present.

As a simple example consider a stream drawn from a set of $n = 4$ different possible objects $\{A, B, C, D\}$. Let the stream $X = (A, A, B, B, C, A, B, A, C)$. For this stream, the total number of items is $m = 9$, with the number of distinct items $n_0 = 3$. Note that all our analysis is in terms of $m$, rather than $n$, since in general $n \gg m$.

The natural definition of entropy (sometimes referred to as sample entropy) in this setting is the expression

\[ H \equiv -\sum_{i=1}^{n} \frac{m_i}{m} \log\left(\frac{m_i}{m}\right). \]
Intuitively, the entropy is a measure of the diversity or randomness of the data coming over the stream. The entropy attains its minimum value of zero when all the items coming over the stream are the same and its maximum value of $\log m$ when all the items in the stream are distinct. Unless otherwise specified, all logarithms in this paper are to the base 2 and we define $0 \log 0 = 0$. For our example stream $X$, the entropy is

\[ H(X) = -\tfrac{4}{9}\log\tfrac{4}{9} - \tfrac{3}{9}\log\tfrac{3}{9} - \tfrac{2}{9}\log\tfrac{2}{9} = 1.53. \]

Often it is useful to normalize this number to compare entropy estimates across different measurement epochs. For this purpose, we define the standardized entropy to be $H/\log m$. In our example, the standardized entropy is $1.53/\log 9 = 0.48$.
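As a quick check of the definitions above, the following minimal Python sketch computes the exact (non-streaming) sample entropy and the standardized entropy of the example stream. The function names `entropy` and `standardized_entropy` are illustrative choices, not names from the paper, and the sketch keeps one count per distinct item, which is exactly the cost the streaming algorithms are meant to avoid.

```python
import math
from collections import Counter

def entropy(stream):
    """Exact sample entropy H = -sum_i (m_i/m) * log2(m_i/m)."""
    counts = Counter(stream)          # m_i for every distinct item
    m = sum(counts.values())          # total stream length
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

def standardized_entropy(stream):
    """H / log2(m); defined as 0 for streams too short to normalize."""
    m = len(stream)
    return entropy(stream) / math.log2(m) if m > 1 else 0.0

X = list("AABBCABAC")                       # the example stream X
print(round(entropy(X), 2))                 # 1.53
print(round(standardized_entropy(X), 2))    # 0.48
```

The two printed values agree with the worked example in the text.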

To compute the entropy,

\[ H = -\sum_{i=1}^{n} \frac{m_i}{m}\log\left(\frac{m_i}{m}\right) = -\frac{1}{m}\left[\sum_i m_i \log m_i - \sum_i m_i \log m\right] = \log m - \frac{1}{m}\sum_i m_i \log m_i, \]

it suffices to compute $S \equiv \sum_i m_i \log m_i$, since we can keep a count of $m$ exactly with $\log m$ bits. For the remainder of this paper we will concern ourselves with estimating the value $S$. The measure of accuracy we use to evaluate our estimates is the notion of relative error, which is defined to be $|S - \tilde{S}|/S$, where $\tilde{S}$ is the estimated value and $S$ the true value. For practical applications in traffic monitoring, we require that the relative error be low (say less than 2-3%), so that the accuracy of applications such as anomaly detection and traffic clustering is not affected.

An accurate estimate of $S$ may not necessarily give an accurate estimate of $H$. In particular, when $H$ is very small and $S$ is close to its maximum value, a small relative error estimate of $S$ may not correspond to a small relative error estimate of $H$. Let $\tilde{S}$ be the estimated value of $S$ and $\tilde{H}$ the estimated value of $H$ computed from $\tilde{S}$, i.e., $\tilde{H} = \log m - \tilde{S}/m$. Suppose we have an algorithm to compute $S$ with relative error at most $\epsilon$. Then, the relative error in estimating $H$ can be bounded as follows:

\[ \frac{|H - \tilde{H}|}{H} = \frac{|\log m - S/m - \log m + \tilde{S}/m|}{H} = \frac{|S - \tilde{S}|}{mH} \le \frac{\epsilon S}{mH}. \]

Note that the relative error in $H$ actually depends on the ratio $S/(mH)$, which can theoretically become arbitrarily high if $H$ is close to zero. However, given reasonable lower bounds for how small $H$ can get, an algorithm that can give an approximation of $S$ with relative error at most $\epsilon$ can be converted to one that gives an approximation of $H$ with relative error $\epsilon' = \Theta(\epsilon)$. Specifically, since we know that $S \le m \log m$, if we assume a lower bound of $\alpha \log m$ for $H$ (for some constant $\alpha$) then the relative error in estimating $H$ is at most $\epsilon' = \epsilon/\alpha$. Thus any approximation scheme for $S$ can be converted to one for $H$ if we can assume a lower bound on the entropy. Our evaluations (Section 6.2) confirm that the errors for $H$ and $S$ are comparable.
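The relationship between $S$ and $H$, and the way a relative error in $S$ carries over to $H$, can be illustrated with a short Python sketch. The helper names (`s_value`, `entropy_from_s`, `entropy_error_bound`) are hypothetical and the 2% perturbation is only an example value.

```python
import math
from collections import Counter

def s_value(stream):
    """S = sum_i m_i * log2(m_i), computed exactly from per-item counts."""
    return sum(c * math.log2(c) for c in Counter(stream).values())

def entropy_from_s(s, m):
    """H = log2(m) - S/m."""
    return math.log2(m) - s / m

def entropy_error_bound(eps, s, m):
    """Bound eps * S / (m * H) on the relative error of H,
    given a relative error of at most eps in the estimate of S."""
    return eps * s / (m * entropy_from_s(s, m))

X = list("AABBCABAC")
m, s = len(X), s_value(X)
h = entropy_from_s(s, m)              # about 1.53, as in the example
s_est = 1.02 * s                      # an estimate of S that is 2% too high
h_est = entropy_from_s(s_est, m)
print(abs(h - h_est) / h, entropy_error_bound(0.02, s, m))
```

For a one-sided perturbation like this one the bound is met with equality, which shows that the factor $S/(mH)$ is what amplifies the error when $H$ is small.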
3. LOWER BOUNDS

In this paper we will present a randomized approximation algorithm that uses $O(\log m)$ space for computing the value $S$ of a stream. Before we do this, we would like to answer the first question of how much effort is required to estimate the entropy of a given traffic distribution. We will demonstrate that any exact randomized algorithm or any deterministic approximation algorithm needs at least linear (in the length of the stream) space. This motivates the need to use both randomization and approximation.

We first demonstrate that any randomized algorithm to compute $S$ must use $\Omega(m)$ space by reducing the communication complexity problem of set intersection to it. Using communication complexity is a common way to prove lower bounds for streaming algorithms [2, 8]. We show here how to apply it to the computation of $S$ (and hence the entropy $H$).

In the communication complexity model two parties (typically called Alice and Bob), who have non-overlapping but jointly complete parts of the input, wish to compute some function of the input. The communication complexity of the function at input size $n$ is then the largest number of bits that the parties have to communicate using the best protocol to compute the function, for any input of size $n$. There are no bounds on the computational power of either party and the only resource being measured is the number of bits communicated.

For the problem of set intersection, Alice and Bob have subsets $A$ and $B$ of $\{1, \ldots, N\}$ as input. The question is then whether the sets $A$ and $B$ have any elements in common. It is known that the deterministic communication complexity of this problem is $\Theta(N)$ [5]. It was shown by Kalyanasundaram and Schnitger in [] that any communication complexity protocol for set intersection that has probability of error at most $\delta$, for any $\delta < 1/2$, must use $\Omega(N)$ bits of communication. We make use of this result in the proof.

Theorem 1. Any randomized streaming algorithm to compute the exact value of $S$ when there are at most $m$ items must use $\Omega(m)$ bits of space.

Proof.
Let us assume that we have a randomized streaming algorithm that computes $S = \sum_i m_i \log m_i$ for any stream exactly using $s$ bits of space. This gives rise to a communication complexity protocol, using $\Theta(s)$ bits of communication, for computing set intersection that works as follows. Suppose that Alice and Bob have as input subsets of the set $\{1, \ldots, m/2\}$. Alice simulates the algorithm using her set (in any arbitrary order) as input into the algorithm and sends the saved state of the algorithm (at most $\Theta(s)$ bits) to Bob. Bob then restarts the algorithm, starting with that saved state, and enters his entire set. At the end of this run, Bob checks the output of the algorithm: if the output is zero, he outputs "disjoint", otherwise he outputs "not disjoint".

The above protocol relies on the fact that any items that have frequency at most one do not count toward the sum $S$ (since $1 \log 1 = 0 \log 0 = 0$). So, the value of $S$ computed is exactly twice the size of the intersection. If we find that the intersection has size zero then we know that Alice and Bob's sets are disjoint, otherwise they have something in common. Hence, even if the streaming algorithm is randomized, it must use $s = \Omega(m)$ bits. If it used fewer bits it would lead to a randomized protocol for set intersection with less than $\Omega(N)$ communication, which we know from [] to be impossible.

Theorem 2. Any deterministic streaming algorithm to approximate $S$ with relative error less than $1/3$ must use $\Omega(m)$ bits of space.

Proof. The proof that any (non-randomized) approximation algorithm is inefficient is similar to the proof of Proposition 3.7 in [2]. Let $\mathcal{G}$ be a family of $2^{\Theta(m)}$ subsets of $\{1, \ldots, 2m\}$, such that each subset has cardinality $m/2$ and any pair of distinct subsets has at most $m/4$ elements in common. (It is possible to show that such a $\mathcal{G}$ exists using the probabilistic method.)

Let us assume for a contradiction that there exists a deterministic streaming algorithm that estimates $S$ with relative error at most $1/3$, using less than linear (in $m$) space. For every pair of elements $G_1, G_2 \in \mathcal{G}$, let $A(G_1, G_2)$ be the sequence of length $m$ consisting of the elements of $G_1$ in sorted order followed by the elements of $G_2$ in sorted order. By the pigeonhole principle, if the memory used by the algorithm has less than $\log |\mathcal{G}| = \Omega(m)$ bits, then at least two distinct subsets $G_i, G_j \in \mathcal{G}$ result in the same memory configuration when their contents are entered into the algorithm. Hence, the algorithm cannot distinguish between the streams $A(G_i, G_i)$ and $A(G_j, G_i)$. For the input $A(G_i, G_i)$ we have that $S = (m/2)(2 \log 2) = m$, but for $A(G_j, G_i)$, $S \le (m/4)(2 \log 2) = m/2$. Now, if the relative error for $A(G_i, G_i)$ is less than $1/3$, its estimated value is more than $2m/3$, but if the relative error for $A(G_j, G_i)$ is less than $1/3$ its estimated value is less than $2m/3$. Therefore, the algorithm makes a relative error of at least $1/3$ on at least one of these inputs. This tells us that any non-randomized algorithm must either use $\Omega(m)$ space or have a relative error of at least $1/3$.

Thus, we see that if we use only randomization or only approximation we cannot hope to use a sublinear amount of space. As a result, the following algorithms that we present are both randomized and approximate. Fortunately, when we allow for these two relaxations we get algorithms that are sublinear (in particular, polylogarithmic) in space and time per item.

4. A STREAMING ALGORITHM

In this section we present our first algorithm and show guarantees on the performance and the size of the memory footprint. The basic algorithm is based on the key insight that estimating $S$ is structurally similar to estimating the frequency moments [2]. The advantage of this technique is that it gives an unbiased estimate of the entropy, with strong theoretical guarantees on the space consumption based upon the desired accuracy of the algorithm.
We then show how the assumptions and analysis of the algorithm can be further tightened.

4.1 Algorithm

As demonstrated in the previous section, randomization and approximation alone do not allow us to estimate $S$ efficiently. Hence, we present an algorithm that is an $(\epsilon, \delta)$-approximation of $S$. An $(\epsilon, \delta)$-approximation algorithm is one that has a relative error of at most $\epsilon$ with probability at least $1 - \delta$, i.e., $\Pr(|X - \tilde{X}| \le X\epsilon) \ge 1 - \delta$, where $X$ and $\tilde{X}$ are the real and estimated values, respectively. This algorithm uses the idea of the celebrated Alon-Matias-Szegedy frequency moment estimation algorithm [2].

Conceptually, the algorithm can be divided into three stages. In the first stage we select random locations in the stream. These locations decide the set of counters that the algorithm tracks during the online stage. In the second stage, the online stage, we keep track of the number of occurrences of items that appear at the randomly selected locations. For each selected item, we keep an exact counter for the number of subsequent occurrences of that item. For example, if position $k$ in the stream was selected, we keep an exact counter for the item at position $k$ (denoted as $a_k$) for the remainder of the stream (i.e., between locations $k$ and $m$). In the third and final stage the algorithm uses the various counters it has tracked to obtain an estimator for the $S$ value of the stream.

Algorithm 1: The streaming algorithm

    Pre-processing stage
        z := 32 log m / ε², g := 2 log(1/δ)
        choose z·g locations in the stream at random
    Online stage
        for each item a_j in the stream do
            if a_j already has one or more counters then
                increment all of a_j's counters
            if j is one of the randomly chosen locations then
                start keeping a count for a_j, initialized at 1
    Post-processing stage
        // View the g·z counts as a matrix c of size g × z
        for i := 1 to g do
            for j := 1 to z do
                X_{i,j} := m · (c_{i,j} log c_{i,j} − (c_{i,j} − 1) log(c_{i,j} − 1))
        for i := 1 to g do
            avg[i] := the average of the Xs in group i
        return the median of avg[1], ..., avg[g]
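The three stages described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stream is held in a list so that its length $m$ is known up front, positions are drawn uniformly with replacement, `z` and `g` are passed in directly, and the names `estimate_s` and `_xlogx` are hypothetical. Each counter is tied to a fixed slot so that every group of `z` counters is a random sample of stream positions.

```python
import math
import random
from collections import defaultdict
from statistics import median

def _xlogx(c):
    """c * log2(c), with the convention 0 log 0 = 0."""
    return c * math.log2(c) if c > 0 else 0.0

def estimate_s(stream, z, g, rng=None):
    """Median-of-averages estimate of S = sum_i m_i log2(m_i)."""
    rng = rng or random.Random()
    m = len(stream)
    if m == 0:
        return 0.0
    # Pre-processing: pick z*g random positions; slot k belongs to group k // z.
    slots_at = defaultdict(list)
    for k in range(z * g):
        slots_at[rng.randrange(m)].append(k)
    # Online stage: count occurrences of a_j from each chosen position onward.
    counters = [0] * (z * g)
    slots_of_item = {}
    for j, item in enumerate(stream):
        for k in slots_of_item.get(item, ()):
            counters[k] += 1                    # item already tracked
        for k in slots_at.get(j, ()):
            counters[k] = 1                     # start a new counter at 1
            slots_of_item.setdefault(item, []).append(k)
    # Post-processing: X = m * (c log c - (c-1) log(c-1)), then median of averages.
    xs = [m * (_xlogx(c) - _xlogx(c - 1)) for c in counters]
    averages = [sum(xs[i * z:(i + 1) * z]) / z for i in range(g)]
    return median(averages)
```

Calling, for instance, `estimate_s(packets, z=400, g=5, rng=random.Random(7))` on a list of item labels returns an estimate of $S$; the update loop here touches every counter of an item, which is the worst-case behaviour that a more careful record structure would avoid.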
The goal of the post-processing or estimating stage is to obtain an estimate of $S$ that is unbiased and whose error is provably low.

We present the pseudocode for this algorithm in Algorithm 1. In the pre-processing stage we need to choose $z \cdot g$ locations in the stream. Note that for this stage we need to know the length of the stream to both compute $z$ and to choose the random locations. The choice of the random locations can be deferred as described in [2] and to compute $z$ we can use a safe overestimate for $\log m$ without increasing the space too much. In the online stage, for each such position we keep a counter $c$ for that item from that position on. We update at most one record per item during the online stage, using a data structure described in the following section. In the post-processing stage, for each of the tracked counters we compute an unbiased estimator for $S$ as follows: $X = m\,(c \log c - (c - 1)\log(c - 1))$. These $g \cdot z$ unbiased estimators are then divided into $g$ groups each containing $z$ variables. First we compute the average over each of the $g$ groups, and then obtain the median of the groups as our returned estimate for $S$.

Intuitively, the estimator variable $X$ provides us an unbiased estimate of $S$, but does not give good guarantees on the variance, and hence the relative error. By computing many such estimates, and obtaining the median over the averages of multiple groups, we can provide rigorous guarantees on the error as we will see in Section 4.3.

4.2 Implementation Details

One major advantage of this algorithm is that it is lightweight. For any item in the stream, the algorithm has to update its count if the item is being counted. Checking whether the item is being counted can be done very quickly using a hash table. However, it is possible that a single item has multiple records for it. In the worst case, we would need to update every record for each item. We could greatly improve the efficiency of the algorithm by instead keeping

a single record for every unique item. This can be implemented by only updating the most recent record for that item and maintaining a pointer to the next most recent record. When the entire stream has been processed, the counts for the older records can be reconstructed from those of the newer ones.

The record data structure that we suggest is illustrated in Figure 1. Each record in our implementation would require 200 bits because we would need to store the item label ITEM_LABEL (100 bits), the counter for the item COUNTER (32 bits), a pointer CHAINING_PTR (32 bits) to resolve hash collisions if we use chaining, and another pointer PREV_PTR (32 bits) to point to the older records for the item. We use a conservative estimate of 100 bits for each item label, assuming that we would store all 5 main IP packet header fields, i.e., srcaddr, dstaddr, srcport, dstport, protocol.

[Figure 1: The record data structure, with fields CHAINING_PTR (32 bits), ITEM_LABEL (~100 bits), COUNTER (32 bits), and PREV_PTR (32 bits).]

At the end of each epoch the algorithm needs to perform the operations of averaging and finding the median of a list. However, both these operations only need to be done in the post-processing step. If we make an epoch sufficiently large, then these computations need be done relatively infrequently.

4.3 Theoretical Guarantees

We present analysis that shows we can give strong guarantees while using very little space. The proof is along the lines of the one in [2] and the main contribution here is to show how the variance of the variable $X$ can be bounded to give such a small space requirement. The proof requires the assumption that $S \ge m$ or, equivalently, that $H \le \log m - 1$. We show in Section 4.5 why this assumption is reasonable.

Theorem 3. If we assume that $S \ge m$, then Algorithm 1 is an $(\epsilon, \delta)$-approximation algorithm for $S$ that uses $O(\log m \log(1/\delta)/\epsilon^2)$ records.

Proof. We will first show that the variable $X$ is an unbiased estimator for $S$.
We will then make use of Chebyshev's inequality to bound the probability of having a relative error greater than $\epsilon$. Next, we show that if we average $z = 32 \log m/\epsilon^2$ variables, this probability is at most $1/8$. We can then use Chernoff bounds to show that if we take $g = 2 \log(1/\delta)$ such averages, with probability at least $1 - \delta$ more than half of them have less than $\epsilon$ relative error. In this case, the median of the averages must have relative error less than $\epsilon$.

We first observe that the expected value of each variable $X$ is an unbiased estimate of our desired quantity $S$:

\[ E[X] = \frac{m}{m}\sum_{i=1}^{n}\sum_{j=1}^{m_i}\bigl(j \log j - (j-1)\log(j-1)\bigr) = \sum_{i=1}^{n} m_i \log m_i = S. \]

To make use of Chebyshev's inequality, we need to bound the variance of $X$ from above, in terms of $S^2$. The bound proceeds as follows:

\[ Var(X) = E(X^2) - E(X)^2 \le E(X^2) = \frac{m^2}{m}\left[\sum_{i=1}^{n}\sum_{j=2}^{m_i}\bigl(j \log j - (j-1)\log(j-1)\bigr)^2\right]. \]

Now we observe that

\[ n \log n - (n-1)\log(n-1) = \log\frac{n^n}{(n-1)^{n-1}} \le \log\frac{n^n}{n^{n-2}} = 2\log n, \qquad (1) \]

where the inequality comes from the facts that the logarithm function is monotonically increasing and that for all $n > 1$, $n^{n-2} \le (n-1)^{n-1}$, which is proven as follows. For $n = 2$ the fact can easily be checked. For all other $n$, $n > e$, so

\[ \frac{n^{n-2}}{(n-1)^{n-1}} = \frac{1}{n}\left(\frac{n}{n-1}\right)^{n-1} = \frac{1}{n}\left(1 + \frac{1}{n-1}\right)^{n-1}. \]

This is at most $e/n$. So, the inequality holds. Now, substituting (1) into the bound on the variance, we get that

\[ Var(X) \le m\sum_{i=1}^{n}\sum_{j=2}^{m_i}(2\log j)^2 \le 4m\sum_{i=1}^{n} m_i \log^2 m_i \le 4m\log m\left(\sum_i m_i \log m_i\right) \le 4S\log m\left(\sum_i m_i \log m_i\right) = 4S^2\log m, \]

where for the last inequality we make use of our assumption that $S \ge m$.

Let the average of the $i$th group be $Y_i$. We know that $Var(Y_i) = Var(X)/z$ and that it is also an unbiased estimator of $S$. Applying Chebyshev's inequality, we get that

for each $Y_i$,

\[ \Pr(|Y_i - S| > \epsilon S) \le \frac{Var(Y_i)}{\epsilon^2 S^2} \le \frac{4S^2\log m}{z\epsilon^2 S^2} = \frac{4\log m}{z\epsilon^2} \le \frac{1}{8}. \]

Now, by Chernoff bounds we get that with probability at least $1 - \delta$, at least $g/2$ of the averages have at most $\epsilon$ relative error. Hence, the median of the averages has relative error at most $\epsilon$ with probability at least $1 - \delta$.

Note that if we had chosen $z = \log m/(\epsilon^2\delta)$ we could have guaranteed an error probability of at most $\delta$ with just this one bigger group. While the analysis in the proof works well for smaller $\delta$ (i.e., $\delta \le 1/128$), for practical applications we may want to use larger $\delta$. Because of the independence of each run, with $\delta = 10\%$ we detect anomalous entropy values within one epoch with 90% certainty, within two epochs with 99% certainty and so on. For the case where $\delta$ is greater than $1/128 \approx 0.8\%$ we can use the average of a single group of $z = \log m/(\epsilon^2\delta)$ estimators for our estimate.

The total space (in bits) used by this algorithm is

\[ O\left(\frac{\log m \,\log(1/\delta)}{\epsilon^2}\,(\log n + \log m)\right). \]

For fixed $\delta$ and $\epsilon$ this algorithm uses $O(\log m)$ records of size $O(\log m + \log n)$ bits.

Numerical Illustration: To put this into a practical perspective, let us consider an example where we have a stream of length $m$ = [...] million, with $n_0$ = 6 million distinct items. To compute the entropy exactly, we could have to maintain counts for each item using 6 million item labels and counters (132 bits/record × 6 million records = 94 MB). Using Algorithm 1 we could approximate the entropy with at most 25% relative error at least 75% of the time with 54 thousand records or 1.4 MB, using 200 bit records as discussed earlier.

4.4 Exact Space Bounds

In practical settings we want to know the exact values of the parameters of the above algorithm so that we use as little space as possible. We tighten the bound on the number of groups needed by making the observation that

\[ \frac{j^j}{(j-1)^{j-1}} = j\left(1 + \frac{1}{j-1}\right)^{j-1} < ej. \]

Here is a tighter (nonasymptotic) analysis for the bound on the variance:

Theorem 4. If we assume that $S \ge m$, then Algorithm 1 can be modified to use exactly $\frac{(16\log m + 64)\log(1/\delta)}{\epsilon^2}$ records.

Proof.
\[ E(X^2) = m\sum_{i=1}^{n}\sum_{j=1}^{m_i}\bigl(j\log j - (j-1)\log(j-1)\bigr)^2 \le m\sum_{i=1}^{n}\sum_{j=1}^{m_i}\log^2(ej) \]
\[ = m\left(\sum_{i=1}^{n}\sum_{j=1}^{m_i}\bigl(\log^2 j + \log^2 e + 2\log e\,\log j\bigr)\right) \le m\left(\sum_{i=1}^{n} m_i\log^2 m_i + m\log^2 e + 2S\log e\right) \]
\[ \le S\left(\sum_{i=1}^{n} m_i\log^2 m_i + m\log^2 e + 2S\log e\right) \le S\left(S\log m + m\log^2 e + 2S\log e\right) \]
\[ = S^2\left(\log m + m\log^2 e/S + 2\log e\right) \qquad (2) \]
\[ \le S^2\left(\log m + \log^2 e + 2\log e\right) \qquad (3) \]
\[ \le S^2(\log m + 5), \]

where (2) and (3) require the assumption that $S \ge m$. Hence we have that the variance $Var(X) = E(X^2) - (E(X))^2 \le S^2(\log m + 4)$. So, we see that $z = \frac{8\log m + 32}{\epsilon^2}$ suffices.

Numerical Illustration: Returning to our example of a stream of size 67 million, the above improvements would drop the number of records for the case of at most 25% error with 75% probability to just 16 thousand (400 KB).

4.5 A Note on Assumptions

For the above analysis, we needed to make the assumption that $S \ge m$. It is not hard to see (and prove) that we need some kind of lower bound on the value of $S$ to protect ourselves from the case that we are trying to distinguish two streams of low $S$ value. If one stream has all unique elements (so that $S = 0$) and another has only one repeated element, then it is very hard to distinguish them. However, we must distinguish them to have less than 100% relative error.

Assuming that $S \ge m$, or that $H \le \log m - 1$, is reasonable because $H$ attains its maximum value at $\log m$. We now show some other conditions that give us that $S \ge m$, thereby making them reasonable assumptions to make.

Theorem 5. If $m \ge 2n_0$ then $S \ge m$.

Proof. It is easy to show using Lagrange multipliers that $S$ attains its minimum value when all the items in the stream have the same count. Hence, a lower bound for $S$ is

\[ S \ge n_0 \cdot \frac{m}{n_0}\log\left(\frac{m}{n_0}\right) = m\log\left(\frac{m}{n_0}\right). \]

Since we have assumed that $m \ge 2n_0$, this gives us that $S \ge m\log(m/n_0) \ge m\log 2 = m$.

Hence, we need only assume that each item in the stream appears at least twice on average. This assumption protects us from the case described earlier and in any setting where $S$ can get arbitrarily small. We feel that in any practical setting this simple assumption is very reasonable.
For example, on all the traces that we experimented on, the factor $m/n_0$ was in the range of 5 to 3.
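To make the record counts from Theorems 3 and 4 and the lower bound from Theorem 5 concrete, here is a small Python sketch. The function names are hypothetical, and rounding $z$ and $g$ up to integers is an assumption of this sketch rather than something stated in the text.

```python
import math

def records_theorem3(m, eps, delta):
    """z * g with z = 32 log2(m) / eps^2 and g = 2 log2(1/delta), rounded up."""
    z = math.ceil(32 * math.log2(m) / eps ** 2)
    g = math.ceil(2 * math.log2(1 / delta))
    return z * g

def records_theorem4(m, eps, delta):
    """z * g with the tighter z = (8 log2(m) + 32) / eps^2, rounded up."""
    z = math.ceil((8 * math.log2(m) + 32) / eps ** 2)
    g = math.ceil(2 * math.log2(1 / delta))
    return z * g

def s_lower_bound(m, n0):
    """Theorem 5: S >= m * log2(m / n0) for a stream with n0 distinct items."""
    return m * math.log2(m / n0)

m = 2 ** 26                                  # about 67 million items
print(records_theorem3(m, 0.25, 0.25))       # 53248
print(records_theorem4(m, 0.25, 0.25))       # 15360
print(s_lower_bound(8, 4))                   # 8.0, i.e. S >= m when m = 2 * n0
```

The record counts grow only with $\log m$, so doubling the stream length adds a fixed number of records rather than doubling the memory.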

4.6 A Constant-space Solution

As it turns out, if we make a stronger (but still reasonable) assumption on how large the entropy can get, we can make the space usage of the algorithm independent of $m$ (assuming fixed sized records). Upper bounding the entropy is reasonable to do since even during abnormal events (e.g., worm attacks), when the randomness of the distributions is increased, there will still be a sufficiently large amount of legitimate activity to offset the increased randomness.

Recall that $H$ attains its maximum at $\log m$, when each of the items in the stream appears exactly once. We will assume that $H \le \beta\log m$. This gives us the following bound on $S$:

\[ S = m\log m - mH \ge m\log m - \beta(m\log m) = (1-\beta)\,m\log m. \]

We can now apply this to decrease the space usage of our algorithm:

Theorem 6. If we assume that $H \le \beta\log m$, then Algorithm 1 can be modified to use exactly $\frac{64\log(1/\delta)}{(1-\beta)\epsilon^2}$ records.

Proof. We once again bound the variance:

\[ Var(X) = E(X^2) - E(X)^2 \le E(X^2) = \frac{m^2}{m}\left[\sum_{i=1}^{n}\sum_{j=2}^{m_i}\bigl(j\log j - (j-1)\log(j-1)\bigr)^2\right] \]
\[ \le m\sum_{i=1}^{n}\sum_{j=2}^{m_i}(2\log j)^2 \le 4m\sum_{i=1}^{n} m_i\log^2 m_i \le 4m\log m\left(\sum_i m_i\log m_i\right) \le 4S^2/(1-\beta). \]

Hence, we need only $z = \frac{32}{(1-\beta)\epsilon^2}$ estimators per group, which is independent of $m$. The desired bound on the number of records follows from this.

Numerical Illustration: For a stream with 67 million packets, if we make the simple assumption that the entropy never goes above 90% of its maximum value then we need 21 thousand records (525 KB), and if we assume that it never exceeds 75% of its maximum value then we only need 8,200 records (205 KB). Note that these space bounds will not increase with the size of the stream; they depend only on the error parameters. Hence, we can use a few hundred kilobytes for arbitrarily large streams, as long as we can safely make an assumption about how large its standardized entropy can get.

5. SEPARATING THE ELEPHANTS FROM THE MICE

The algorithm described in the previous section provides worst-case theoretical guarantees independent of the structure of the underlying traffic distributions.
In practice, however, most network traffic streams have significant structure. In particular, a simple but useful insight [6] is that traffic distributions often have a clear demarcation between large flows (or elephants) and smaller flows (or mice). A small number of elephant flows contribute a large volume of traffic, and for many traffic monitoring applications it may often suffice to estimate the elephants accurately. In our second algorithm (see Algorithm 2) we make use of the idea of separating the elephants from the mice in the stream. By separately estimating the contribution of the elephants and mice to the entropy we can further improve the accuracy of our results, thereby also decreasing the space usage of the algorithm. We believe that such a sieving idea has much broader applicability. Other streaming algorithms for estimating different traffic statistics can potentially benefit by using such an idea. Intuitively, the amount of space needed by the first algorithm is directly proportional to the variance of the estimator X (see Section 4.3), and by sieving out the high-count items we can significantly decrease the variance of the estimator and hence the space required.

For this algorithm we change the method of sampling slightly. Rather than pre-compute positions in the stream (which requires foreknowledge of the length of the stream), we sample each location with some small probability. After the item is sampled, an exact count is maintained for it, similar to the Sample and Hold algorithm described in [6]. If an item is sampled exactly once, then we consider it a mouse and compute the entropy of the mice using the previous algorithm. If an item is sampled more than once, we consider it an elephant and estimate its exact count. Note that this method is different from [9] in that we are looking for items that are sampled multiple times, not necessarily in consecutive samples. Once an item is sampled a second time, it is considered an elephant. To estimate its exact count (i.e., to compensate for the number of times the item appeared before it was first sampled), we simply add the count between the first and second sampling. Intuitively, the number of occurrences of the item between successive samples should be equal if it is evenly distributed. This method of approximating the exact count of the elephant was empirically found to be a good estimator.

The record data structure for this sampling method is similar to that used by Algorithm 1. The main difference is that we no longer need a pointer to older copies of an item since we only maintain a single count for each unique item. To be able to tell whether the item has been sampled before or not (to determine whether it should be promoted to an elephant) we require just a single additional bit. Thus, we see that this sampling method requires minimal overhead to separate the elephants from the mice.

The sieving algorithm assumes that every flow that is sampled twice is elevated to the status of an elephant. Rather than fix the elevation threshold a priori, we evaluated different values of the threshold before arriving at the number two. Figure 2 shows the relative error in estimating S (of the destination address distribution) as a function of k, the threshold for promoting mice to elephants, for three different packet traces. The next section provides further details on the traces used in our evaluations. We observe that the lowest error is achieved with a value of k = 2. Intuitively, a higher strike-threshold decreases the number of elephants, and we do not achieve the desired elephant-mice separation.
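The sampling scheme described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `Record`, `sieve`, and `elephant_contribution` are hypothetical, logarithms are taken base 2, every occurrence of a tracked item is counted (including the one that triggers promotion), and "the count between the first and second sampling" is interpreted as the number of occurrences seen from the first sampling up to, but not including, the second. The mice counts are returned untouched so that a separate estimator (Algorithm 1 in the paper) can handle them.

```python
import math
import random
from typing import Dict, Hashable, Iterable, Optional, Tuple


class Record:
    """Per-item state: a running count plus the gap between the first two samplings."""

    def __init__(self) -> None:
        self.count = 1                  # occurrences seen since the first sampling (inclusive)
        self.gap: Optional[int] = None  # set once, when the item is sampled a second time


def sieve(stream: Iterable[Hashable], p: float, rng: random.Random
          ) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Online stage: returns (estimated elephant counts, raw mouse counts)."""
    records: Dict[Hashable, Record] = {}
    for item in stream:
        sampled = rng.random() < p
        rec = records.get(item)
        if rec is None:
            if sampled:
                records[item] = Record()  # first sampling: start an exact count
            continue
        rec.count += 1                    # assumption: every tracked occurrence is counted
        if sampled and rec.gap is None:
            rec.gap = rec.count - 1       # occurrences between first and second sampling
    # Elephants: compensate for unseen early occurrences by adding the gap.
    elephants = {k: r.count + r.gap for k, r in records.items() if r.gap is not None}
    mice = {k: r.count for k, r in records.items() if r.gap is None}
    return elephants, mice


def elephant_contribution(elephants: Dict[Hashable, int]) -> float:
    """S_e = sum of c * log2(c) over the estimated elephant counts."""
    return sum(c * math.log2(c) for c in elephants.values())


elephants, mice = sieve(["a", "a", "a", "b"], p=1.0, rng=random.Random(0))
print(elephants, mice, elephant_contribution(elephants))
```

The single `gap` field doubles as the "sampled before" bit mentioned in the text: an item is an elephant exactly when its gap has been set. With p = 1.0 the compensation overshoots, since no early occurrences were actually missed; the scheme is intended for small sampling probabilities.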

Algorithm 2: The sieving algorithm

  Online stage
  for each item in the stream do
    if the item is sampled then
      if the item is already being counted then
        promote the item to elephant status
      else
        allocate space for a counter for this item
    else
      increment the counter for this item, if there is one
  Post-processing stage
  S_e := 0
  for each elephant (with estimated count c) do
    S_e := S_e + c log c
  estimate the contribution of the mice S_m from the remaining counts using Algorithm 1
  return S_e + S_m

Figure 2: Selecting the threshold k for Sieving (relative error vs. threshold for elephant status k, for Trace 1, Trace 2, and Trace 3).

A natural question with such a sieving algorithm is one regarding the relative weights of the two different contributing factors. Intuitively, if either the elephant or the mice flows are not substantial contributors, then we can potentially reduce the space usage further by ignoring the contribution of the insignificant one. We empirically confirmed the need for accurate estimation of both the elephant and the mice flows. Figure 3 shows the relative contribution of the elephant and mice flows to the S estimate (for the destination address distribution on Trace 1). We observe that both elephant and mice flows have substantial contributions to the overall estimation, and ignoring one of them can yield inaccurate results for estimating S, and hence H. The results across different traces and across different traffic distributions of interest were similar and are omitted for brevity.

6. EVALUATION

We first describe the datasets used in this paper. We then present a comparison of the two streaming algorithms introduced in this paper with other sampling based approaches. There are two natural metrics for characterizing the performance of the streaming algorithm for entropy computation: resource usage and error.
The resource usage is related to the number of counters used by different algorithms, which directly translates into the total memory (SRAM) requirements of the algorithm, and the total CPU usage. For the following evaluations, we use the notion of relative error to determine the accuracy of different algorithms.

Figure 3: Confirming that estimating both elephants and mice is necessary (relative contribution of elephants and mice to the S estimate, per epoch).

Datasets: We use three different packet-header traces for evaluating the accuracy of our algorithms. We provide a brief description of each. (See footnote 2.)

University Trace (Trace 1): Our first packet trace is an hour-long packet trace collected from USC's Los Nettos collecting facility on Feb 2, 24. We bin the trace into 1-minute epochs, with each epoch containing roughly .7 million TCP packets, 3267 distinct IP addresses, and 565 ports per minute. We refer to this as Trace 1 in the following discussion.

Department Trace (Trace 2): We use a 5-hour long packet trace collected on Aug 5, 23 at the gateway router of a medium sized department with approximately 35 hosts. We observe all traffic to and from the 35 hosts behind the access router to the commercial Internet, other non-department university hosts and servers. We bin the dataset into 5-minute epochs for our evaluation, with each epoch observing 5 TCP packets, 2587 distinct addresses, and 4672 distinct ports on average. We refer to this as Trace 2 in the following discussion.

University Trace (Trace 3): The third trace we use is an hour-long trace collected at the access link of the university to the rest of the Internet, at UNC on Apr 24, 23. We bin this trace into 1-minute epochs as with Trace 1. Each epoch contains on average 2.5 million packets, distinct IP addresses, and 88 unique application ports. We refer to this as Trace 3.

Distributions of Interest: We focus on two main types of distributions for our evaluation.
The number of distinct source and destination addresses observed in a dataset, and the distribution of traffic across destinations, are typically affected by network attacks, including DDoS and worm attacks. We track the distribution of traffic across different addresses for the source and destination addresses. Understanding the application mix that traverses a network can usually be mapped into a study of the distribution of traffic on different application ports. The distribution of traffic across different ports can also be indicative of scanning attacks or the emergence of new popular applications. In each case we are interested in the distribution of the number of packets observed at each port or address (source or destination) within the measurement epoch. Lakhina et al. [17] give an overview of different types of network events and the distributions that each would affect.

Footnote 2: The university traces are available on request from the respective universities. The department trace is a private dataset from CMU.

For Algorithm 1 we use the assumption that m/n ≥ 2 since it is both a weak assumption (i.e., weaker than the one made in Section 4.6) and easy to check. To confirm that this assumption holds for our traces and distributions, we present the ratio m/n for them here. For Trace 1 the ratio is roughly 55 for the addresses and 5 for the ports. For Trace 2, m/n is around 93 for the addresses and 95 for the ports. Lastly, the ratio is around 97 for the addresses and 3 for the ports in Trace 3. Thus we see that in all of our traces the assumption is satisfied.

6.1 Comparison with Sampling Algorithms

We first evaluate the accuracy of estimation of our streaming algorithms by comparing them against the following:

1. Sampling: This is the well-known uniform packet sampling approach used in most commercial router implementations [20]. Given a sampling probability p, the sampling approach will pick each packet independently with probability p. The estimation of S and H is performed over the set of sampled packets, after normalizing the counts by 1/p.

2. Sample and Hold: This is the sampling approach proposed by Estan and Varghese [6]. Here, given a sampling probability p, the algorithm picks each item in the stream with probability p and keeps an exact count for that item from that point on. Each sample is appropriately renormalized (incrementing by a factor 1/p) to account for occurrences of the item before it was sampled.

The Sieving algorithm introduced in Section 5 is also conceptually a sampling algorithm, similar to Sample and Hold, which selects a sampling probability p a priori.
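As a rough illustration of the uniform packet sampling baseline described in item 1, the following Python sketch keeps each packet with probability p, scales the sampled counts by 1/p, and plugs them into S = Σ m_i log m_i and H = log m − S/m (the identity S = m log m − mH used in Section 4.6). The function names `sampled_entropy` and `relative_error`, the use of base-2 logarithms, the use of the scaled sample total as the estimate of m, and the relative error definition |estimate − actual| / actual are assumptions made for this sketch rather than details taken from the paper.

```python
import math
import random
from collections import Counter
from typing import Hashable, Iterable, Tuple


def sampled_entropy(stream: Iterable[Hashable], p: float, rng: random.Random
                    ) -> Tuple[float, float]:
    """Estimate (S, H) from packets kept independently with probability p."""
    kept = Counter(item for item in stream if rng.random() < p)
    scaled = [c / p for c in kept.values()]   # normalize sampled counts by 1/p
    m_hat = sum(scaled)                       # estimated stream length (assumption)
    if m_hat == 0:
        return 0.0, 0.0                       # nothing sampled: no estimate possible
    s_hat = sum(c * math.log2(c) for c in scaled)
    h_hat = math.log2(m_hat) - s_hat / m_hat  # from S = m log m - m H
    return s_hat, h_hat


def relative_error(estimate: float, actual: float) -> float:
    """Assumed definition: |estimate - actual| / actual."""
    return abs(estimate - actual) / actual


s_hat, h_hat = sampled_entropy(["a", "a", "b", "b"], p=1.0, rng=random.Random(0))
print(s_hat, h_hat)
```

Items that are never sampled contribute nothing to the estimate, which is one intuition for why plain sampling can misjudge the contribution of low-frequency items at small p.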
In order to perform a fair comparison of the performance across the different algorithms, we normalize the number of records to keep track of to be the same. For the following experiments we fix the sampling probability p, and pick the (ε, δ) values for Algorithm 1, such that the number of counters used across the different algorithms is the same.

We implemented and tested our algorithms on commodity hardware (Intel Xeon 3. GHz desktops with GB of RAM). We found that the total CPU utilization for the streaming algorithms was very low: even though we used a preliminary implementation with very few code optimizations, each measurement epoch took less than seconds to process when the epoch length was an entire minute. This demonstrates that our algorithm can comfortably run in real-time. We also found that the post-processing step consumed a negligible fraction of the time of each run.

Since all of the algorithms are randomized or sampling-based, for the following results we present the mean relative errors and estimates over 5 independent runs. We found that the standard deviations were very small, and do not present the deviations for clarity of presentation.

Figure 4 compares the performance of different algorithms across different traces, using a sampling rate of p = . for the different algorithms, and using ε and δ for Algorithm 1 such that the number of counters used in all four algorithms is roughly the same. The figures show the CDF of the relative error in estimating the entropy of destination addresses observed across different measurement epochs. The streaming algorithms consistently outperform the sampling based approaches. For example, on Trace 1 we observe that the worst-case relative error with the sampling based approaches can be as high as 8%, whereas the streaming algorithms guarantee an error of at most 6% (Algorithm 1) and 4% (Sieving). We also find that the sieving algorithm provides substantially more accurate estimates for the same space usage compared to the basic streaming algorithm.
The sieving algorithm has a worst-case error of at most 2-5%, which bodes well for the practical utility of the algorithms for traffic monitoring applications.

For the rest of the discussion, for brevity we only present the results from Trace 1, and summarize the results from Trace 2 and Trace 3. Figure 5 compares the CDF of relative error across measurement epochs, for different distributions of interest from Trace 1. We observe a similar trend across algorithms: the sieving algorithm is consistently better than Algorithm 1, which again is substantially more accurate than the sampling based approaches. Both the streaming algorithms have a worst-case error of 7% and mean error of less than 3% across all the different traffic metrics of interest, which is a tolerable operating range for typical monitoring applications, confirming the practical utility of our approaches. We summarize the results for the other two traces in Table 1 and Table 2.

Table 1: Trace 2: Mean relative error in S estimate (columns: Sample, Sample&Hold, Algo. 1, Sieving; rows: DSTADDR, SRCADDR, DSTPORT, SRCPORT).

Table 2: Trace 3: Mean relative error in S estimate (columns: Sample, Sample&Hold, Algo. 1, Sieving; rows: DSTADDR, SRCADDR, DSTPORT, SRCPORT).

6.2 Error in estimating entropy

Recall from our discussion in Section 2 that an accurate estimate of S does not necessarily translate into an accurate estimate of H. However, we find from our evaluations that the streaming algorithms can yield very accurate estimates of H as well. Figures 6(a) and 6(b) compare the relative error in estimating S to the relative error in estimating H, for Algorithm 1 and the sieving algorithm respectively. We observe that across different traces and distributions, the relative error in estimating H is very low as well (less than 3% mean error with the sieving algorithm). Figure 7 also provides visual confirmation of the utility of the different algorithms in tracking the standardized entropy for the destination address distribution.
The sieving algorithm once again appears to have the greatest accuracy, which can be confirmed with visual inspection. We summarize the results for the other two traces in Table 3 and Table 4, and we observe that in each case the error in H is comparable to (or less than) the corresponding error in S (Tables 1 and 2 respectively).

Last, we vary the memory consumption of the algorithm, and show how the mean and maximum relative errors (for destination address entropy on Trace 1) vary as a function of the memory usage in Figure 8. We observe that the streaming algorithms have an order of magnitude lower error than the sampling algorithms, and can achieve very high accuracy (< 2% mean error), even with as low as KB of SRAM usage. Note that even though the sampling algorithms also can give reasonably low errors at higher memory consumption (> 8 KB), the corresponding sampling rates are much higher (> in 2 packet sampling) than what is feasible for very high-speed links.

Figure 4: Comparing performance of different traces for estimating destination address entropy. Panels: (a) Trace 1, (b) Trace 2, (c) Trace 3. Each panel plots the fraction of measurement epochs for Sampling, Sample and Hold, Algorithm 1, and the Sieving Algorithm.

Figure 5: Comparing different distributions, using Trace 1. Panels: (a) Destination Address, (b) Source Address, (c) Destination Port, (d) Source Port. Each panel plots the fraction of measurement epochs for Sampling, Sample and Hold, Algorithm 1, and the Sieving Algorithm.

Figure 6: Relative error in S vs. relative error in H, per epoch. Panels: (a) Algorithm 1, (b) Sieving algorithm.

Table 3: Trace 2: Mean relative error in H estimate (columns: Sample, Sample&Hold, Algo. 1, Sieving; rows: DSTADDR, SRCADDR, DSTPORT, SRCPORT).

Table 4: Trace 3: Mean relative error in H estimate (columns: Sample, Sample&Hold, Algo. 1, Sieving; rows: DSTADDR, SRCADDR, DSTPORT, SRCPORT).

7. DISCUSSION

One interesting observation from our evaluations is that the observed errors on the traffic traces are much smaller than the theoretical guarantees for Algorithm 1. In particular, we observe that the empirical error is at least one order of magnitude smaller than the theoretical error guarantee. This is because the algorithm must guarantee the error bound for any stream with any distribution. Real-world packet traces have considerable underlying structure that the algorithm cannot directly take advantage of.

Figure 7: Verifying the accuracy in estimating the standardized entropy (actual vs. estimated standardized entropy, per epoch). Panels: (a) Algorithm 1, (b) Sieving algorithm.

Figure 8: Relative error in estimating H vs. memory usage for different algorithms (Sample, Sample&Hold, Algorithm 1, Sieving algorithm; x-axis: space usage in KB). Panels: (a) Mean error, (b) Maximum error.

It now follows that one way to tighten the bounds on the space/error tradeoff is to make reasonable assumptions about the distribution of the stream and have our algorithms take advantage of them. In Section 4.6 we demonstrate this by making the simple assumption that the standardized entropy of the stream never goes above some fixed constant. This gives us an algorithm that needs a fixed number of records, independent of the size of the stream. Such additional assumptions can help in tightening our space bounds. However, in order to be as general and trace-independent as possible, in our algorithms and evaluations we use very weak assumptions (i.e., m ≥ 2n).

It is a common observation that network packets have a skewed Zipfian distribution. We took advantage of this fact by separating out the few high-count elephants to facilitate the estimation of the remainder more accurately. In doing so, however, we do not make any assumption about the nature of the stream. Algorithm 2 has the property that if there are no elephants in the stream, then it should perform comparably to Algorithm 1. Hence, we expect that, in general, Algorithm 2 should perform better for highly-skewed distributions, but no better than Algorithm 1 when the skew is less pronounced.

8. RELATED WORK

Many of today's network monitoring applications use the traffic volume, in terms of flow, packet, and byte counts, as the primary metric of choice. These are especially of interest for anomaly detection mechanisms to flag incidents of interest.
Some of the well-known methods include signal analysis (e.g., [3]), forecasting (e.g., [4, 21]), and other statistical approaches (e.g., [16, 25]).

There has been a recent interest in using entropy and traffic distribution features for different network monitoring applications. Lakhina et al. [17] use the entropy to augment anomaly detection and network diagnosis, within their PCA framework. Others have suggested the use of such information measures for tracking malicious network activity [7, 23]. Xu et al. [24] use the entropy as a metric to automatically cluster traffic, to infer patterns of interesting activity. For detecting specific types of attacks, researchers have suggested the use of entropy of different traffic features for worm [23] and DDoS detection [7].

Streaming algorithms have received a lot of interest in the algorithms and networking community. The seminal work is that of Alon et al. [2], who provide a framework for estimating frequency moments. Since then, there has been a huge body of literature produced on streaming algorithms, and this is well surveyed in [19]. Kumar et al. use a combination of counting algorithms and Bayesian estimation for accurate estimation of flow size distributions [13, 14]. Streaming algorithms have also been used for identifying heavy-hitters in streams [22, 26]. While the entropy can theoretically be estimated from the flow size distribution [13], computing the flow size distribution conceptually provides much greater functionality than that required for an accurate estimate of the entropy. The complexity of estimating the flow size distribution is significantly higher than the complexity of estimating the entropy, requiring significantly more memory and effort in post-processing.

We are aware of two concurrent efforts in the streaming algorithms community for estimating entropy. Chakrabarti et al. [1] independently proposed an algorithm to estimate S that is similar to Algorithm 1.
In this paper we show how this algorithm can be modified such that the memory usage is independent of the size of the stream if we make a simple assumption on how large the standardized entropy can get. McGregor et al. [8] outline algorithms estimating entropy and other information-theoretic measures in the streaming context. However, our algorithms provide unbiased estimates of the entropy, and do not make strong assumptions regarding the underlying distribution. We also provide extensive empirical validation of the utility and accuracy of our algorithms on real datasets, and observe that our sieving approach actually outperforms Algorithm 1.

9. CONCLUSIONS

In this paper, we addressed the need for efficient algorithms for estimating the entropy of network traffic streams, for enabling several real-time traffic monitoring capabilities. We presented lower bounds for the problem of estimating the entropy of a stream, demonstrating that for space-efficient estimation of entropy, both randomization and approximation are necessary. We provide two streaming algorithms for the problem of estimating the entropy. The first algorithm is based on the key insight that the problem shares structural similarity with the problem of estimating frequency moments over streams. By virtue of the strong bounds that we obtain on the variance of the estimator variable, we are able to limit the space usage of the algorithm to polylogarithmic in the length of the stream. Under some practical assumptions on the size of the entropy, we also give an algorithm that samples a number of flows that is independent of the length of the stream. We also identified a method for increasing the accuracy of entropy estimation by separating the elephants from the mice. Our evaluations on multiple packet traces demonstrate that our techniques produce very accurate estimates, with very low CPU and memory requirements, making them suitable for deployment on routers with multi-gigabit per second links.

Acknowledgments

We would like to thank A. Chakrabarti, K. Do Ba, and S. Muthukrishnan for their useful discussion and for kindly sharing the most recent version of their paper [1] with us. We thank Minho Sung for helping us with the datasets used in this paper. We would also like to acknowledge the useful feedback provided by the anonymous reviewers.

10. REFERENCES

[1] A. Chakrabarti, K. Do Ba, and S. Muthukrishnan. Estimating entropy and entropy norm on data streams. In Proceedings of the 23rd International Symposium on Theoretical Aspects of Computer Science (STACS), 2006.
[2] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proceedings of ACM Symposium on Theory of Computing (STOC), 1996.
[3] P. Barford, J. Kline, D. Plonka, and A. Ron. A Signal Analysis of Network Traffic Anomalies. In Proceedings of ACM SIGCOMM Internet Measurement Workshop (IMW), 2002.
[4] J. D. Brutlag. Aberrant behavior detection in time series for network monitoring. In Proceedings of USENIX Large Installation System Administration Conference (LISA), 2000.
[5] N. Duffield, C. Lund, and M. Thorup. Estimating flow distributions from sampled flow statistics. In Proceedings of ACM SIGCOMM, 2003.
[6] C. Estan and G. Varghese. New directions in traffic measurement and accounting. In Proceedings of ACM SIGCOMM, 2002.
[7] L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred. Statistical approaches to DDoS attack detection and response. In Proceedings of the DARPA Information Survivability Conference and Exposition, 2003.
[8] S. Guha, A. McGregor, and S. Venkatasubramanian. Streaming and sublinear approximation of entropy and information distances. In Proceedings of ACM Symposium on Discrete Algorithms (SODA), 2006.
[9] F. Hao, M. Kodialam, and T. V. Lakshman. ACCEL-RATE: a faster mechanism for memory efficient per-flow traffic estimation. In Proceedings of ACM SIGMETRICS, 2004.
[10] N. Hohn and D. Veitch. Inverting sampled traffic. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2003.
[11] B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM Journal on Discrete Mathematics, 5(4), 1992.
[12] V. Karamcheti, D. Geiger, Z. Kedem, and S. Muthukrishnan. Detecting malicious network traffic using inverse distributions of packet contents. In Proceedings of ACM SIGCOMM Workshop on Mining Network Data (MineNet), 2005.
[13] A. Kumar, M. Sung, J. Xu, and J. Wang. Data streaming algorithms for efficient and accurate estimation of flow distribution. In Proceedings of ACM SIGMETRICS/IFIP WG 7.3 Performance, 2004.
[14] A. Kumar, M. Sung, J. Xu, and E. Zegura. A data streaming algorithm for estimating subpopulation flow size distribution. In Proceedings of ACM SIGMETRICS, 2005.
[15] E. Kushilevitz and N. Nisan. Communication complexity. Cambridge University Press, New York, NY, USA, 1997.
[16] A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. In Proceedings of ACM SIGCOMM, 2004.
[17] A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. In Proceedings of ACM SIGCOMM, 2005.
[18] K. Levchenko, R. Paturi, and G. Varghese. On the difficulty of scalably detecting network attacks. In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2004.
[19] S. Muthukrishnan. Data streams: algorithms and applications.
[20] Cisco Netflow. /Tech/np/netflow/index.shtml.
[21] M. Roughan, A. Greenberg, C. Kalmanek, M. Rumsewicz, J. Yates, and Y. Zhang. Experience in measuring internet backbone traffic variability: Models, metrics, measurements and meaning. In Proceedings of International Teletraffic Congress (ITC), 2003.
[22] S. Venkataraman, D. Song, P. B. Gibbons, and A. Blum. New Streaming Algorithms for Fast Detection of Superspreaders. In Proceedings of Network and Distributed System Security Symposium (NDSS), 2005.
[23] A. Wagner and B. Plattner. Entropy Based Worm and Anomaly Detection in Fast IP Networks. In Proceedings of IEEE International Workshop on Enabling Technologies, Infrastructures for Collaborative Enterprises, 2005.
[24] K. Xu, Z.-L. Zhang, and S. Bhattacharya. Profiling internet backbone traffic: Behavior models and applications. In Proceedings of ACM SIGCOMM, 2005.
[25] Y. Zhang, Z. Ge, M. Roughan, and A. Greenberg. Network anomography. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2005.
[26] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund. Online identification of hierarchical heavy hitters: algorithms, evaluations, and applications. In Proceedings of ACM/USENIX Internet Measurement Conference (IMC), 2004.


More information

Media Adaptation Framework in Biofeedback System for Stroke Patient Rehabilitation

Media Adaptation Framework in Biofeedback System for Stroke Patient Rehabilitation Media Adaptation Fraework in Biofeedback Syste for Stroke Patient Rehabilitation Yinpeng Chen, Weiwei Xu, Hari Sundara, Thanassis Rikakis, Sheng-Min Liu Arts, Media and Engineering Progra Arizona State

More information

Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2

Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2 Exploiting Hardware Heterogeneity within the Sae Instance Type of Aazon EC2 Zhonghong Ou, Hao Zhuang, Jukka K. Nurinen, Antti Ylä-Jääski, Pan Hui Aalto University, Finland; Deutsch Teleko Laboratories,

More information

Machine Learning Applications in Grid Computing

Machine Learning Applications in Grid Computing Machine Learning Applications in Grid Coputing George Cybenko, Guofei Jiang and Daniel Bilar Thayer School of Engineering Dartouth College Hanover, NH 03755, USA gvc@dartouth.edu, guofei.jiang@dartouth.edu

More information

Approximately-Perfect Hashing: Improving Network Throughput through Efficient Off-chip Routing Table Lookup

Approximately-Perfect Hashing: Improving Network Throughput through Efficient Off-chip Routing Table Lookup Approxiately-Perfect ing: Iproving Network Throughput through Efficient Off-chip Routing Table Lookup Zhuo Huang, Jih-Kwon Peir, Shigang Chen Departent of Coputer & Inforation Science & Engineering, University

More information

Searching strategy for multi-target discovery in wireless networks

Searching strategy for multi-target discovery in wireless networks Searching strategy for ulti-target discovery in wireless networks Zhao Cheng, Wendi B. Heinzelan Departent of Electrical and Coputer Engineering University of Rochester Rochester, NY 467 (585) 75-{878,

More information

Software Quality Characteristics Tested For Mobile Application Development

Software Quality Characteristics Tested For Mobile Application Development Thesis no: MGSE-2015-02 Software Quality Characteristics Tested For Mobile Application Developent Literature Review and Epirical Survey WALEED ANWAR Faculty of Coputing Blekinge Institute of Technology

More information

A framework for performance monitoring, load balancing, adaptive timeouts and quality of service in digital libraries

A framework for performance monitoring, load balancing, adaptive timeouts and quality of service in digital libraries Int J Digit Libr (2000) 3: 9 35 INTERNATIONAL JOURNAL ON Digital Libraries Springer-Verlag 2000 A fraework for perforance onitoring, load balancing, adaptive tieouts and quality of service in digital libraries

More information

Generating Certification Authority Authenticated Public Keys in Ad Hoc Networks

Generating Certification Authority Authenticated Public Keys in Ad Hoc Networks SECURITY AND COMMUNICATION NETWORKS Published online in Wiley InterScience (www.interscience.wiley.co). Generating Certification Authority Authenticated Public Keys in Ad Hoc Networks G. Kounga 1, C. J.

More information

Managing Complex Network Operation with Predictive Analytics

Managing Complex Network Operation with Predictive Analytics Managing Coplex Network Operation with Predictive Analytics Zhenyu Huang, Pak Chung Wong, Patrick Mackey, Yousu Chen, Jian Ma, Kevin Schneider, and Frank L. Greitzer Pacific Northwest National Laboratory

More information

Protecting Small Keys in Authentication Protocols for Wireless Sensor Networks

Protecting Small Keys in Authentication Protocols for Wireless Sensor Networks Protecting Sall Keys in Authentication Protocols for Wireless Sensor Networks Kalvinder Singh Australia Developent Laboratory, IBM and School of Inforation and Counication Technology, Griffith University

More information
