Analysis of symbolic sequences using the Jensen-Shannon divergence


PHYSICAL REVIEW E, VOLUME 65

Ivo Grosse,1,2 Pedro Bernaola-Galván,2,3 Pedro Carpena,2,3 Ramón Román-Roldán,4 Jose Oliver,5 and H. Eugene Stanley2

1Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
2Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts
3Departamento de Física Aplicada II, ETSI de Telecomunicación, Universidad de Málaga, Málaga, Spain
4Departamento de Física Aplicada, Universidad de Granada, Granada, Spain
5Departamento de Genética e Instituto de Biotecnología, Universidad de Granada, Granada, Spain

(Received 22 December 2000; revised manuscript received 8 August 2001; published 25 March 2002)

We study statistical properties of the Jensen-Shannon divergence D, which quantifies the difference between probability distributions, and which has been widely applied to analyses of symbolic sequences. We present three interpretations of D in the framework of statistical physics, information theory, and mathematical statistics, and obtain approximations of the mean, the variance, and the probability distribution of D in random, uncorrelated sequences. We present a segmentation method based on D that is able to segment a nonstationary symbolic sequence into stationary subsequences, and apply this method to DNA sequences, which are known to be nonstationary on a wide range of different length scales.

DOI: /PhysRevE    PACS numbers: Cc

I. INTRODUCTION

The statistical analysis of symbolic sequences is of central importance in various fields of science, such as symbolic dynamics [1,2], linguistics following the pioneering works of Shannon [3], or DNA sequence analysis [4–7]. One advantage of using information-theoretical functionals for the analysis of symbolic sequences is that they do not require the symbolic sequence to be mapped to a numerical sequence, which is necessary in spectral or correlation analyses [8].
One of these functionals is the Jensen-Shannon divergence D [9–12], which quantifies the difference between two or more probability distributions, and which can be used to compare the symbol composition between different sequences. There are three reasons why we choose D as a measure of divergence between probability distributions: (i) D is related to other information-theoretical functionals, such as the relative entropy or the Kullback divergence, and hence it shares their mathematical properties as well as their intuitive interpretability; (ii) D can be generalized to measure the distance between more than two distributions; and (iii) the compared distributions can be weighted, which allows us to take into account the different lengths of the subsequences from which the probability distributions are computed [13]. D has been used for measuring the distance between random graphs [10], for testing the goodness-of-fit of point estimations [12], in the analysis of DNA sequences [13,14], in the segmentation of textured images [15], and in the design of a statistical characterization of the mobility edge in disordered materials [16]. In addition, by making use of its ability to be generalized to an arbitrary number of probability distributions, D has been used to quantify the complex heterogeneity of DNA sequences as well as to detect borders between coding and noncoding DNA [20]. Here we describe in detail some statistical properties of D as well as some theoretical background relevant for the above-mentioned applications.

This paper is organized as follows: in Sec. II we introduce D and some of its mathematical properties. In Sec. III we provide three interpretations of D, one in the framework of statistical physics, one in the framework of information theory, and one in the framework of mathematical statistics. In Sec. IV we discuss some statistical properties of D, and we derive the mean, the variance, and the asymptotic probability distribution function of D. In Sec. V we apply the Jensen-Shannon divergence to the problem of segmenting a nonstationary sequence into stationary subsequences, and show that in this context the maximum value $D_{\max}$ of the Jensen-Shannon divergence D sampled along a sequence becomes a quantity of central importance. Hence, we study the probability distribution of $D_{\max}$ by means of Monte Carlo simulations. In Sec. VI we present three examples of how D can be applied to the problem of segmenting nonstationary symbolic sequences such as DNA sequences into stationary subsequences, and Sec. VII concludes this paper.

II. THE JENSEN-SHANNON DIVERGENCE

Several measures have been proposed to quantify the difference (sometimes called divergence) between two or more probability distributions [9]. One of those measures is the Jensen-Shannon divergence, which is defined as follows: let $p^{(1)} = (p^{(1)}_1, p^{(1)}_2, \dots, p^{(1)}_k)$ and $p^{(2)} = (p^{(2)}_1, p^{(2)}_2, \dots, p^{(2)}_k)$ denote two probability distributions satisfying the usual constraints $\sum_{i=1}^{k} p^{(j)}_i = 1$ and $0 \le p^{(j)}_i \le 1$ for all $i = 1, 2, \dots, k$ and $j = 1, 2$; and let $\pi^{(1)}$ and $\pi^{(2)}$ denote the weights of the distributions $p^{(1)}$ and $p^{(2)}$, satisfying the constraints $\pi^{(1)} + \pi^{(2)} = 1$ and $0 \le \pi^{(j)} \le 1$. Then the Jensen-Shannon divergence D between the probability distributions $p^{(1)}$ and $p^{(2)}$ with weights $\pi^{(1)}$ and $\pi^{(2)}$ is defined by [11]

$$D[p^{(1)}, p^{(2)}] = H[\pi^{(1)} p^{(1)} + \pi^{(2)} p^{(2)}] - \pi^{(1)} H[p^{(1)}] - \pi^{(2)} H[p^{(2)}], \qquad (1)$$

where

$$H[p] = -\sum_{i=1}^{k} p_i \log_2 p_i$$

denotes the Shannon entropy of the probability distribution $p = (p_1, p_2, \dots, p_k)$.

X/2002/654/ /$ The American Physical Society

The Jensen-Shannon divergence D can be shown to be a special case of the Jensen difference divergence introduced by Burbea and Rao [21]. Also, D can be shown to be a special case of the divergence introduced by Csiszár [12,22]. Hence, the Jensen-Shannon divergence D shares all mathematical properties of both the Jensen difference divergence and the Csiszár divergence. It is interesting to note that the Jensen-Shannon divergence is the only measure that simultaneously belongs to the family of Jensen difference divergences and the family of Csiszár divergences [12], i.e., the intersection of these two families contains only a single measure, and that measure is the Jensen-Shannon divergence D.

In the following we list some mathematical properties of D that turn out to be important for its application as a divergence measure:
(1) By using the Jensen inequality [23] it is easy to see that $D[p^{(1)}, p^{(2)}] \ge 0$, with $D[p^{(1)}, p^{(2)}] = 0$ if and only if $p^{(1)} = p^{(2)}$ (provided both weights are nonzero).
(2) D is symmetric in its arguments $p^{(1)}$ and $p^{(2)}$ (with the weights exchanged accordingly), i.e., $D[p^{(1)}, p^{(2)}] = D[p^{(2)}, p^{(1)}]$.
(3) D is well defined even if $p^{(1)}$ and $p^{(2)}$ are not absolutely continuous with respect to each other, i.e., D is well defined even if $p^{(1)}_i$ vanishes while $p^{(2)}_i$ does not, or if $p^{(2)}_i$ vanishes while $p^{(1)}_i$ does not.

D can be generalized to quantify the divergence between an arbitrary number of probability distributions. Let us consider m probability distributions $p^{(1)}, p^{(2)}, \dots, p^{(m)}$, and let us denote by $\pi^{(1)}, \pi^{(2)}, \dots, \pi^{(m)}$ the corresponding weights. We can define the Jensen-Shannon divergence between the probability distributions $p^{(1)}, p^{(2)}, \dots, p^{(m)}$ with weights $\pi^{(1)}, \pi^{(2)}, \dots, \pi^{(m)}$ by

$$D[p^{(1)}, p^{(2)}, \dots, p^{(m)}] = H\Big[\sum_{j=1}^{m} \pi^{(j)} p^{(j)}\Big] - \sum_{j=1}^{m} \pi^{(j)} H[p^{(j)}].$$
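As an illustration only, the definitions above translate into a few lines of Python. The sketch below assumes probability vectors given as plain lists and weights that sum to one, and evaluates the weighted m-ary divergence (the binary definition, Eq. (1), is the case m = 2). The function names `shannon_entropy` and `js_divergence` are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """H[p] = -sum_i p_i log2 p_i in bits, using the convention 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(dists: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> float:
    """Weighted Jensen-Shannon divergence of m distributions over k symbols:
    D = H[sum_j pi_j p^(j)] - sum_j pi_j H[p^(j)]."""
    if len(dists) != len(weights):
        raise ValueError("need exactly one weight per distribution")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    k = len(dists[0])
    # Mixture distribution sum_j pi_j p^(j), computed component by component.
    mixture = [sum(w * p[i] for w, p in zip(weights, dists)) for i in range(k)]
    return shannon_entropy(mixture) - sum(
        w * shannon_entropy(p) for w, p in zip(weights, dists))


# Property (3): distributions with disjoint support still give a finite value.
print(js_divergence([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]))  # 1.0
```

Because zero probabilities are simply skipped in the entropy sum, no special handling is needed when one distribution vanishes where the other does not.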
It is interesting to note that the three mathematical properties mentioned above for the binary case can be generalized to the m-ary case as follows:
(1) The Jensen inequality [23] implies that $D[p^{(1)}, p^{(2)}, \dots, p^{(m)}] \ge 0$, with $D[p^{(1)}, p^{(2)}, \dots, p^{(m)}] = 0$ if and only if all probability distributions (with nonzero weights) are identical, i.e., if and only if $p^{(1)} = p^{(2)} = \dots = p^{(m)}$.
(2) D is symmetric in its arguments $p^{(1)}, p^{(2)}, \dots, p^{(m)}$, i.e., D is invariant under any permutation of its arguments (together with their weights).
(3) D is well defined even if the probability distributions $p^{(1)}, p^{(2)}, \dots, p^{(m)}$ are not absolutely continuous with respect to one another.

III. INTERPRETATIONS OF D

In the following three sections we present three intuitive interpretations of the Jensen-Shannon divergence D.

A. Interpretation of D in the framework of statistical physics

In this section we show that D can be interpreted as the intensive mixture entropy in the following way: let us consider m vessels, each one containing a mixture of k ideal gases, let $f^{(j)} = (f^{(j)}_1, f^{(j)}_2, \dots, f^{(j)}_k)$ denote the vector of molar fractions of the gases in the jth vessel for $j = 1, 2, \dots, m$, and let $n^{(j)}$ denote the total number of molecules in the jth vessel. Then we know from the second law of thermodynamics that the sum of the Boltzmann entropies of the separate vessels is smaller than or equal to the Boltzmann entropy of the joint vessel that we obtain after mixing the gases from all m vessels, and we can easily show that the difference between the entropy obtained after the ideal gases are mixed and the sum of the entropies obtained before they are mixed is equal to

$$H_{\mathrm{mix}} = N k_B \ln 2\, H[f] - \sum_{j=1}^{m} n^{(j)} k_B \ln 2\, H[f^{(j)}],$$

where $k_B$ denotes the Boltzmann constant, $N = \sum_{j=1}^{m} n^{(j)}$ denotes the total number of ideal-gas particles in all m vessels, and $f = \sum_{j=1}^{m} (n^{(j)}/N) f^{(j)}$ denotes the vector of molar fractions of the gases in the mixture containing the gas particles of all of the m vessels. $H_{\mathrm{mix}}$ is commonly called the mixing entropy, and it is easy to see that $H_{\mathrm{mix}} = N k_B \ln 2\, D$ if the weights are chosen to be $\pi^{(j)} = n^{(j)}/N$.
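The identity $H_{\mathrm{mix}} = N k_B \ln 2\, D$ can be checked numerically. The sketch below assumes the standard ideal-gas expression $-k_B \sum_i N_i \ln x_i$ for the mixing entropy of a vessel holding $N_i$ molecules of gas i at molar fraction $x_i$; the vessel contents are made-up illustrative counts, and the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def entropy_bits(p):
    """Shannon entropy in bits (0 log 0 = 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def ideal_mixing_entropy(counts):
    """Mixing entropy -k_B sum_i N_i ln(N_i / n) of one vessel, in J/K."""
    n = sum(counts)
    return -K_B * sum(c * math.log(c / n) for c in counts if c > 0)


def h_mix(vessels):
    """Entropy after pooling all vessels minus the sum of the entropies before."""
    k = len(vessels[0])
    pooled = [sum(v[i] for v in vessels) for i in range(k)]
    return ideal_mixing_entropy(pooled) - sum(ideal_mixing_entropy(v) for v in vessels)


def js_natural_weights(vessels):
    """D of the molar-fraction vectors f^(j) with weights pi_j = n_j / N."""
    N = sum(sum(v) for v in vessels)
    k = len(vessels[0])
    fractions = [[c / sum(v) for c in v] for v in vessels]
    weights = [sum(v) / N for v in vessels]
    mixture = [sum(w * f[i] for w, f in zip(weights, fractions)) for i in range(k)]
    return entropy_bits(mixture) - sum(w * entropy_bits(f) for w, f in zip(weights, fractions))


vessels = [[30, 70], [60, 40], [10, 90]]  # illustrative molecule counts, k = 2 gases
N = sum(sum(v) for v in vessels)
print(h_mix(vessels), N * K_B * math.log(2) * js_natural_weights(vessels))
```

The conversion factor $\ln 2$ appears only because D is measured in bits while thermodynamic entropies use natural logarithms.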
Hence, D can be interpreted as the intensive mixture entropy measured in units of $k_B \ln 2$.

B. Interpretation of D in the framework of information theory

In this section we show that D can be interpreted as the mutual information in the following way: let us consider a sequence S of N symbols chosen from the alphabet $A = \{a_1, a_2, \dots, a_k\}$, and let us denote by $p_i$ the probability of finding symbol $a_i$ at an arbitrary but fixed position in sequence S, for $i = 1, 2, \dots, k$. Suppose that the sequence S is divided into m subsequences $S^{(1)}, S^{(2)}, \dots, S^{(m)}$ of given lengths $n^{(1)}, n^{(2)}, \dots, n^{(m)}$, with $\sum_j n^{(j)} = N$, and let us denote by $p^{(j)}_i$ the probability of finding symbol $a_i$ at an arbitrary but fixed position in subsequence $S^{(j)}$, for $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, m$. In order to establish the connection between D and the mutual information defined in the framework of information theory, we define the random vector (a, s), where the random variables $a \in A$ and $s \in \{S^{(1)}, S^{(2)}, \dots, S^{(m)}\}$ are generated as follows: draw a random position n with a uniform probability distribution along the sequence S, denote by a the symbol at position n, denote by s the subsequence that contains position n, and denote by $p_{ij}$ the joint probability of $a = a_i$ and $s = S^{(j)}$, for $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, m$. Then the random variable a assumes the values $a_1, a_2, \dots, a_k$ with probabilities $p_1, p_2, \dots, p_k$, and the random variable s assumes the values $S^{(1)}, S^{(2)}, \dots, S^{(m)}$ with probabilities $\pi^{(1)} = n^{(1)}/N, \pi^{(2)} = n^{(2)}/N, \dots, \pi^{(m)} = n^{(m)}/N$, where the marginal probabilities $p_i$ and $\pi^{(j)}$ are defined by $p_i = \sum_{j=1}^{m} p_{ij}$ and $\pi^{(j)} = \sum_{i=1}^{k} p_{ij}$, for $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, m$.

Suppose that someone is drawing a symbol a from the entire sequence S, not telling us from which subsequence s this symbol was drawn, and suppose it is our task to guess the subsequence s from which symbol a was drawn. One question answered by information theory is: How much information I can we obtain from learning the identity of the symbol a about the identity of the subsequence s from which symbol a was drawn, provided we know the probability distribution $p_{ij}$? I is called the mutual information in a about s and is defined by [3]

$$I = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} \log_2 \frac{p_{ij}}{p_i\, \pi^{(j)}}. \qquad (9)$$

Taking into account that $p^{(j)}_i$ denotes the conditional probability of finding symbol $a_i$ at an arbitrary but fixed position in a given fixed subsequence $S^{(j)}$, it follows that $p_{ij} = \pi^{(j)} p^{(j)}_i$, and Eq. (9) can be rewritten as

$$I = \sum_{i=1}^{k} \sum_{j=1}^{m} \pi^{(j)} p^{(j)}_i \log_2 \frac{p^{(j)}_i}{p_i}. \qquad (10)$$

By rewriting Eq. (10) we obtain

$$I = \sum_{i=1}^{k} \sum_{j=1}^{m} \pi^{(j)} p^{(j)}_i \log_2 p^{(j)}_i - \sum_{i=1}^{k} \sum_{j=1}^{m} \pi^{(j)} p^{(j)}_i \log_2 p_i. \qquad (11)$$

As $p_i = \sum_{j=1}^{m} \pi^{(j)} p^{(j)}_i$ defines the probability of finding symbol $a_i$ in the whole sequence, the second term equals $H[\sum_j \pi^{(j)} p^{(j)}]$ and the first equals $-\sum_j \pi^{(j)} H[p^{(j)}]$, so we obtain

$$I = D[p^{(1)}, p^{(2)}, \dots, p^{(m)}]. \qquad (12)$$

Hence, D (with the weights $\pi^{(j)} = n^{(j)}/N$) is identical to the mutual information in a about s, which quantifies the amount of information we obtain from learning the identity of the chosen symbol a about the identity of the subsequence s from which symbol a was chosen.
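Equation (12) can be checked on a toy example. The sketch below assumes a concrete DNA-like sequence split into three made-up subsequences, builds $p_{ij}$ from counts exactly as in the random-position construction above, evaluates Eq. (9) directly, and compares it with D computed from the per-subsequence frequencies with weights $n^{(j)}/N$. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(p):
    """Shannon entropy in bits (0 log 0 = 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def mutual_information(subsequences, alphabet):
    """Eq. (9): I = sum_ij p_ij log2(p_ij / (p_i pi_j)) for a uniform random position."""
    N = sum(len(s) for s in subsequences)
    counts = [Counter(s) for s in subsequences]
    joint = [[c[a] / N for c in counts] for a in alphabet]  # p_ij
    p_sym = [sum(row) for row in joint]                      # p_i = sum_j p_ij
    p_sub = [len(s) / N for s in subsequences]               # pi_j = n_j / N
    return sum(pij * math.log2(pij / (p_sym[i] * p_sub[j]))
               for i, row in enumerate(joint)
               for j, pij in enumerate(row) if pij > 0)


def js_of_subsequences(subsequences, alphabet):
    """D of the per-subsequence symbol frequencies with weights n_j / N."""
    N = sum(len(s) for s in subsequences)
    weights = [len(s) / N for s in subsequences]
    counts = [Counter(s) for s in subsequences]
    freqs = [[c[a] / len(s) for a in alphabet] for c, s in zip(counts, subsequences)]
    mixture = [sum(w * f[i] for w, f in zip(weights, freqs)) for i in range(len(alphabet))]
    return entropy_bits(mixture) - sum(w * entropy_bits(f) for w, f in zip(weights, freqs))


parts = ["AACGTTTA", "GGGCCA", "TTTTACGA"]  # toy, non-empty subsequences S^(1..3)
print(mutual_information(parts, "ACGT"), js_of_subsequences(parts, "ACGT"))
```

Both functions should print the same number up to rounding, which is the content of Eq. (12).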
As I is symmetric in its arguments a and s, we may also consider the following game: suppose someone is drawing a symbol a from sequence S, not telling us the identity of the drawn symbol a, but telling us the identity of the subsequence s from which symbol a was drawn. Suppose further that it is our task to guess the identity of the drawn symbol a. One question answered by information theory is: How much information I can we obtain from learning the identity of the subsequence s about the identity of the drawn symbol a, provided we know the probability distribution $p_{ij}$? It can be mathematically proven that the mutual information in a about s is identical to the mutual information in s about a, and hence we can state that the Jensen-Shannon divergence D also quantifies the amount of information we obtain from learning the identity of the subsequence s about the identity of the drawn symbol a.

If $p^{(1)} = p^{(2)} = \dots = p^{(m)}$, then knowing the identity of the symbol a does not tell us anything about the identity of the subsequence s from which a was drawn, as the probability distributions of a are identical in all subsequences. Likewise, in this case knowing the subsequence s from which a was drawn does not tell us anything about the identity of a. Hence, it is intuitively clear that the mutual information in a about s (or in s about a) is equal to zero, and hence that in this case the Jensen-Shannon divergence D is equal to zero.

C. Interpretation of D in the framework of mathematical statistics

In this section we show that D can be interpreted as the log-likelihood ratio in the following way: consider the problem of estimating the probabilities $p = (p_1, p_2, \dots, p_k)$ from a symbolic i.i.d. [24] sequence S of length N, in which at each position a symbol $a_i \in A = \{a_1, a_2, \dots, a_k\}$ is randomly drawn with probability $p_i$.
The maximum likelihood principle suggests choosing the probability vector p that maximizes the likelihood

$$L(S \mid p) = \prod_{i=1}^{k} p_i^{F_i}, \qquad (13)$$

where $F_i$ denotes the number of occurrences of symbol $a_i$ in sequence S. As the logarithm is a strictly monotonic function, one may equally search for the p that maximizes $\ln L = \sum_{i} F_i \ln p_i$. It is easy to derive, by using one Lagrange multiplier for the constraint $\sum_i p_i = 1$, that $p_i = F_i/N$ maximizes the log-likelihood $\ln L$. Hence, we obtain as maximum log-likelihood

$$\ln L_{\max} = N \sum_{i=1}^{k} f_i \ln f_i = -N \ln 2\, H[f], \qquad (14)$$

where $f_i = F_i/N$ denotes the relative frequency of finding symbol $a_i$ in sequence S of length N.

Now consider the slightly more complicated problem of a nonstationary sequence S of length N consisting of m stationary subsequences $S^{(1)}, S^{(2)}, \dots, S^{(m)}$ with lengths $n^{(1)}, n^{(2)}, \dots, n^{(m)}$ (with $\sum_j n^{(j)} = N$), where the probability $p^{(j)}_i$ of generating symbol $a_i$ in subsequence $S^{(j)}$ may vary from subsequence to subsequence. The likelihood of obtaining the entire sequence S is equal to the product of the likelihoods of obtaining the subsequences $S^{(1)}, S^{(2)}, \dots, S^{(m)}$. Hence, the maximum likelihood principle suggests choosing, for each $j = 1, 2, \dots, m$, the probability vector $p^{(j)} = (p^{(j)}_1, p^{(j)}_2, \dots, p^{(j)}_k)$ that maximizes the likelihood

$$L(S^{(j)} \mid p^{(j)}) = \prod_{i=1}^{k} \big(p^{(j)}_i\big)^{F^{(j)}_i}, \qquad (15)$$

where $F^{(j)}_i$ is the number of occurrences of symbol $a_i$ in subsequence $S^{(j)}$. It is again easy to derive, by using m Lagrange multipliers for the constraints $\sum_i p^{(j)}_i = 1$, that $p^{(j)}_i = F^{(j)}_i/n^{(j)}$ maximizes the log-likelihood $\ln L^{(j)}$. Hence, we obtain as maximum log-likelihood

$$\ln L^{(j)}_{\max} = n^{(j)} \sum_{i=1}^{k} f^{(j)}_i \ln f^{(j)}_i = -n^{(j)} \ln 2\, H[f^{(j)}], \qquad (16)$$

where $f^{(j)}_i = F^{(j)}_i/n^{(j)}$ denotes the relative frequency of finding symbol $a_i$ in subsequence $S^{(j)}$ of length $n^{(j)}$.

As problem one (with just one sequence) is a special case of problem two (with m subsequences), the sum of the maximum log-likelihoods $\sum_{j=1}^{m} \ln L^{(j)}_{\max}$ cannot be smaller than $\ln L_{\max}$, because in the worst case, in which all of the subsequences of problem two had identical symbol frequencies, problem two would just reduce to problem one, giving the same log-likelihood as in problem one. Hence, the quantity

$$\mathcal{L} = \sum_{j=1}^{m} \ln L^{(j)}_{\max} - \ln L_{\max} \qquad (17)$$

is non-negative, and $\mathcal{L}$ is commonly called the log-likelihood ratio. It is straightforward to see from Eqs. (14), (16), and (17) that, with the weights $\pi^{(j)} = n^{(j)}/N$,

$$\mathcal{L} = N \ln 2\, D. \qquad (18)$$

Hence, in the framework of mathematical statistics $\mathcal{L}$ can be interpreted as the increase of the log-likelihood when sequence S, instead of being modeled as a sequence generated with a single probability vector p, is modeled as a concatenation of m subsequences $S^{(1)}, S^{(2)}, \dots, S^{(m)}$ (in that order) generated from the probability vectors $p^{(1)}, p^{(2)}, \dots, p^{(m)}$. The inequality $\mathcal{L} \ge 0$ states that no partition of the original sequence into subsequences can decrease the maximum likelihood of the second model relative to the first model. In order to choose hypothesis two (m subsequences) in favor of hypothesis one (only one sequence), we require that $\mathcal{L}$ be significantly greater than zero, and it is a goal of this paper to derive an approximation of the probability distribution function of $\mathcal{L}$.
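Equation (18) is easy to verify numerically from symbol counts. In this sketch, which assumes each segment is summarized by its count vector $F^{(j)}_i$ (made-up numbers), `max_log_likelihood` implements Eqs. (14) and (16) in natural logarithms, and the Jensen-Shannon divergence uses the natural weights $n^{(j)}/N$. All names are hypothetical.

```python
import math


def entropy_bits(p):
    """Shannon entropy in bits (0 log 0 = 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def max_log_likelihood(counts):
    """Eqs. (14)/(16): ln L_max = sum_i F_i ln(F_i / n), using the ML estimate F_i / n."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def log_likelihood_ratio(segment_counts):
    """Eq. (17): sum_j ln L_max^(j) minus ln L_max of the pooled (one-model) sequence."""
    k = len(segment_counts[0])
    pooled = [sum(c[i] for c in segment_counts) for i in range(k)]
    return sum(max_log_likelihood(c) for c in segment_counts) - max_log_likelihood(pooled)


def js_from_counts(segment_counts):
    """D with natural weights n_j / N; the mixture is then the pooled frequency."""
    N = sum(sum(c) for c in segment_counts)
    k = len(segment_counts[0])
    pooled = [sum(c[i] for c in segment_counts) / N for i in range(k)]
    return entropy_bits(pooled) - sum(
        sum(c) / N * entropy_bits([x / sum(c) for x in c]) for c in segment_counts)


segments = [[120, 380], [1050, 950]]  # illustrative counts F_i^(j), k = 2, m = 2
N = sum(map(sum, segments))
print(log_likelihood_ratio(segments), N * math.log(2) * js_from_counts(segments))
```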
Note that in all of the above interpretations of D the weights of the distributions $\pi^{(1)}, \pi^{(2)}, \dots, \pi^{(m)}$ are proportional to the sizes $n^{(1)}, n^{(2)}, \dots, n^{(m)}$ of the elements considered: the number of particles in each of the m vessels or the number of symbols in each of the m subsequences. It is interesting that this particular choice of weights arises in a natural way from all three interpretations presented above, and, as we will see later, this choice of weights endows the Jensen-Shannon divergence D with several statistical properties that make D particularly suitable for the analysis of symbolic sequences.

IV. STATISTICAL PROPERTIES OF D

Formally, D is a function of the probability distributions $p^{(1)}, p^{(2)}, \dots, p^{(m)}$, but in analyses of experimental data those probability distributions are not directly observable. However, when we study experimental symbolic sequences we can estimate those probability distributions $p^{(j)}$ from the frequency distributions $f^{(j)} = (f^{(j)}_1, f^{(j)}_2, \dots, f^{(j)}_k)$, where $f^{(j)}_i$ denotes the relative frequency of symbol $a_i$ in subsequence $S^{(j)}$, for $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, m$. In all analyses of experimental data the Jensen-Shannon divergence D must be computed from those observable frequency distributions $f^{(1)}, f^{(2)}, \dots, f^{(m)}$ rather than from the nonobservable probability distributions $p^{(1)}, p^{(2)}, \dots, p^{(m)}$. As a consequence of replacing the probabilities $p^{(j)}_i$ by the corresponding relative frequencies $f^{(j)}_i$ in Eq. (1), the numerical values of D will fluctuate from data set to data set, even if those data sets can be assumed to be generated from the same probability distribution. The fluctuation of $f^{(j)}_i$ from data set to data set may result not only in fluctuations of the numerical values of D, but also in a systematic shift (bias) of the values of D computed from the observed data as compared to the value of D computed from the unobservable probability distributions.
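Both effects can be seen in a small simulation. The sketch below is a minimal illustration, assuming i.i.d. symbols drawn with Python's `random.Random.choices`: both subsequences share the same p, so the true value D[P] is zero, yet every plug-in value computed from frequencies is non-negative and the values scatter around a positive mean. The helper names and the seed are hypothetical choices made only for reproducibility.

```python
import math
import random


def entropy_bits(p):
    """Shannon entropy in bits (0 log 0 = 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def plugin_js(counts_list):
    """D computed from observed counts (frequencies f^(j)), natural weights n_j / N."""
    N = sum(sum(c) for c in counts_list)
    k = len(counts_list[0])
    pooled = [sum(c[i] for c in counts_list) / N for i in range(k)]
    return entropy_bits(pooled) - sum(
        sum(c) / N * entropy_bits([x / sum(c) for x in c]) for c in counts_list)


def sample_counts(rng, p, n):
    """Symbol counts of n i.i.d. draws from p (hypothetical helper)."""
    counts = [0] * len(p)
    for s in rng.choices(range(len(p)), weights=p, k=n):
        counts[s] += 1
    return counts


rng = random.Random(0)  # fixed seed so the run is reproducible
p = [0.5, 0.5]          # both subsequences share p, so D[P] = 0
values = [plugin_js([sample_counts(rng, p, 500), sample_counts(rng, p, 2000)])
          for _ in range(200)]
mean_d = sum(values) / len(values)
print(min(values), mean_d, max(values))
```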
In order to illustrate the presence of those fluctuations of D as well as its systematic shift, called bias, we perform the following control experiments. We generate an ensemble of 2000 binary sequences (k = 2) of N = 2500 symbols each, obtained by joining m = 2 subsequences as follows: we generate the left subsequence of length n = 500 by concatenating random, uncorrelated symbols drawn from the probability distribution $p^{(1)} = (0.45, 0.55)$, and the right subsequence of length N − n = 2000 from symbols drawn from the probability distribution $p^{(2)} = (0.55, 0.45)$. We move a cursor along the entire sequence, and we compute D between the subsequences on both sides of the cursor for all positions $n^{(1)} = 1, 2, \dots, N-1$ and $n^{(2)} = N-1, N-2, \dots, 1$. In order to illustrate the effect of different choices of the weights $\pi^{(j)}$, we compute the Jensen-Shannon divergence in two different ways: (i) for the choice of equal weights $\pi^{(j)} = 1/m$ for all subsequences $S^{(j)}$, and (ii) for the natural choice of weights $\pi^{(j)} = n^{(j)}/N$. In the following we denote by $D^{1/m}$ the Jensen-Shannon divergence with the choice of equal weights (i), and we denote by D the Jensen-Shannon divergence with the natural choice of weights (ii).

An ideal estimator of D, which quantifies the difference between two probability distributions, should reach its maximum value exactly at the point that separates the subsequences generated by different probability distributions, i.e., it should reach its maximum value when $n^{(1)} = n = 500$ and $n^{(2)} = N - n = 2000$. Figure 1(a) shows $\langle D \rangle$ versus $n^{(1)}$ and $\langle D^{1/2} \rangle$ versus $n^{(1)}$, where the symbol $\langle \cdot \rangle$ denotes the ensemble average over all 2000 realizations. Figure 1(a) shows that there are dramatic finite-size effects when using $D^{1/2}$ (dashed line) instead of D (solid line). While D clearly achieves its global maximum at position $n^{(1)} = n = 500$ (marked with a vertical dotted line in Fig. 1(a)), $D^{1/2}$ achieves its highest values at the beginning and the end of the horizontal axis, i.e., at very small and very large values of $n^{(1)}$.

We perform a second control experiment similar to the first, in which we change the lengths of the two subsequences to n = 1250 and N − n = 1250, and keep all other parameters the same as before. Figure 1(b) shows clearly that, again, D achieves its maximum at $n^{(1)} = n = 1250$, while $D^{1/2}$ achieves its highest values at the beginning and the end of the horizontal axis, i.e., at very small and very large values of $n^{(1)}$.

These control experiments demonstrate two results: (i) the location of the maximum of D can separate regions of different composition and size in a symbolic sequence, and (ii) the estimation of $D^{1/2}$ and D from sequences of finite length is affected by finite-size effects. In order to illustrate point (ii) directly, we perform a third control experiment in which we generate the two subsequences from the same probability distribution. In this case the experimentally obtained values of D that are nonzero are due only to statistical fluctuations.

FIG. 1. Comparison of D and $D^{1/2}$. We generate an ensemble of 2000 binary sequences of length N = 2500, obtained by joining two subsequences of lengths n and N − n, where the left subsequence of length n is generated from a probability distribution (x, 1 − x) and the right subsequence of length N − n is generated from a probability distribution (y, 1 − y). We move a cursor along the entire sequence and we compute D and $D^{1/2}$ between the subsequences on both sides of the cursor. Finally we plot the ensemble averages $\langle D \rangle$ (solid line) and $\langle D^{1/2} \rangle$ (dashed line) as a function of the position of the cursor $n^{(1)} = 1, 2, \dots, N-1$. In (a) we choose n = 500, x = 0.45, and y = 0.55, and find that $\langle D \rangle$ achieves its global maximum at $n^{(1)} \approx 500$, in the vicinity of the true fusion point of the two subsequences at n = 500, whereas $\langle D^{1/2} \rangle$ achieves its global maximum at the edges $n^{(1)} \approx 0$ or $n^{(1)} \approx 2500$, far away from the true fusion point. This finding indicates that D might serve as an appropriate divergence measure to quantify the compositional differences between symbolic subsequences, whereas $D^{1/2}$ might not. In (b) we choose n = 1250, x = 0.45, and y = 0.55, and find again that $\langle D \rangle$ achieves its global maximum at $n^{(1)} \approx 1250$, in the vicinity of the true fusion point at n = 1250, whereas $\langle D^{1/2} \rangle$ achieves its global maximum at the edges $n^{(1)} \approx 0$ or $n^{(1)} \approx 2500$, confirming the finding from (a). In (c) we choose n = 1250 and x = y = 0.5, and we find that $\langle D \rangle$ stays nearly constant at a small positive value, reflecting the fact that the analyzed sequences are stationary, whereas $\langle D^{1/2} \rangle$ clearly increases as $n^{(1)} \to 0$ or $n^{(1)} \to 2500$, confirming the findings from (a) and (b). The effect that, even in the case of i.i.d. sequences, the expected value of D is greater than zero is referred to as a finite-size effect, and we address this finite-size effect in Sec. IV.
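The cursor procedure of these control experiments can be sketched as follows, assuming symbols encoded as integers 0, ..., k − 1; running prefix counts make each cursor position cost O(k). The function `cursor_scan` and its `natural_weights` flag are hypothetical names. The usage below builds a single realization of the Fig. 1(a) setup, not the 2000-member ensemble average plotted in the figure, so its maximum will only be near n = 500.

```python
import math
import random


def entropy_bits(p):
    """Shannon entropy in bits (0 log 0 = 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def cursor_scan(seq, k, natural_weights=True):
    """Return D(n1) for n1 = 1, ..., N-1 between seq[:n1] and seq[n1:].
    natural_weights=True uses pi_j = n_j / N (D); False uses 1/2 each (D^{1/2})."""
    N = len(seq)
    total = [0] * k
    for s in seq:
        total[s] += 1
    left = [0] * k
    profile = []
    for n1 in range(1, N):
        left[seq[n1 - 1]] += 1  # symbol seq[n1-1] moves to the left side
        n2 = N - n1
        f1 = [c / n1 for c in left]
        f2 = [(t - c) / n2 for t, c in zip(total, left)]
        w1 = n1 / N if natural_weights else 0.5
        w2 = 1.0 - w1
        mixture = [w1 * a + w2 * b for a, b in zip(f1, f2)]
        profile.append(entropy_bits(mixture) - w1 * entropy_bits(f1) - w2 * entropy_bits(f2))
    return profile


# One realization: n = 500 symbols with P(0) = 0.45, then 2000 symbols with P(0) = 0.55.
rng = random.Random(1)
seq = ([0 if rng.random() < 0.45 else 1 for _ in range(500)]
       + [0 if rng.random() < 0.55 else 1 for _ in range(2000)])
d = cursor_scan(seq, k=2)
d_half = cursor_scan(seq, k=2, natural_weights=False)
best = max(range(len(d)), key=d.__getitem__) + 1  # cursor position n1 of the maximum
print(best)
```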
Figure 1(c) shows $\langle D \rangle$ versus $n^{(1)}$ and $\langle D^{1/2} \rangle$ versus $n^{(1)}$ for an ensemble of 2000 stationary, binary sequences of length N = 2500 in which each symbol is generated with probability 0.5. We find that, for all positions $n^{(1)}$, the values of $\langle D \rangle$ are approximately the same, whereas the values of $\langle D^{1/2} \rangle$ depend dramatically on $n^{(1)}$. Figure 1(c) also shows that $\langle D \rangle$ is not identically zero, and we devote the following three sections to derivations of approximations of the mean, the variance, and the probability distribution function of D.

A. Mean of D

In this section we derive an analytical approximation of the mean value of D when computed from an ensemble of finite i.i.d. sequences of length N. Since H is concave and $\langle f \rangle = p$, it follows directly from the Jensen inequality that the expected value, $\langle H[f] \rangle$, of the entropy computed from an ensemble of finite-length sequences cannot be greater than the theoretical value, H[p], of the entropy computed from the unobservable probabilities [25], i.e.,

$$\langle H[f] \rangle \le H[p], \qquad (19)$$

where $\langle \cdot \rangle$ denotes the expectation value over the ensemble of finite-length i.i.d. sequences generated by the probability distribution p. This mathematical statement is intuitively clear: due to the finite sample size, the relative frequency vector f fluctuates from sample to sample around the probability vector p, and the majority of these fluctuations will make f less uniform than p. Since the entropy H[p] quantifies the uniformity of the probability distribution p, we expect that the majority of the values of H[f] computed from an ensemble of fluctuating frequency vectors f will be smaller than the value of H[p].
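For a binary alphabet the inequality (19) can be checked without simulation, because the count of one symbol in a sample of size N is binomially distributed. The sketch below, whose function names are hypothetical, sums over all possible counts using log-gamma to evaluate the binomial probabilities, and also evaluates the first-order correction quoted as Eq. (20) for comparison.

```python
import math


def binary_entropy(x):
    """H[(x, 1 - x)] in bits."""
    return -sum(t * math.log2(t) for t in (x, 1.0 - x) if t > 0)


def expected_plugin_entropy(p, n):
    """Exact <H[f]> when the count of one symbol is Binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for c in range(n + 1):
        # Log of the binomial probability, via lgamma to avoid overflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(c + 1) - math.lgamma(n - c + 1)
                   + c * math.log(p) + (n - c) * math.log(1 - p))
        total += math.exp(log_pmf) * binary_entropy(c / n)
    return total


p, n = 0.3, 1000
exact = expected_plugin_entropy(p, n)
first_order = binary_entropy(p) - (2 - 1) / (2 * n * math.log(2))  # Eq. (20) with k = 2
print(binary_entropy(p), exact, first_order)
```

The exact expectation lies below H[p], and the gap shrinks roughly like 1/N as the sample grows.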

Up to first order the expected value of H[f] can be approximated by

$$\langle H[f] \rangle \approx H[p] - \frac{k-1}{2N \ln 2}, \qquad (20)$$

where k is the number of components of the probability and frequency vectors p and f, N is the sample size, and the symbol ≈ indicates that we neglect terms of the order of $O(1/N^2)$. By applying Eq. (20) to each of the m subsequences we obtain

$$\langle H[f^{(j)}] \rangle \approx H[p^{(j)}] - \frac{k-1}{2 n^{(j)} \ln 2}, \qquad (21)$$

for $j = 1, 2, \dots, m$, where the symbol ≈ indicates that we neglect terms of the order of $O(1/(n^{(j)})^2)$. We will use approximations (20) and (21) to derive in the remainder of this section the expected value of the Jensen-Shannon divergence $D[f^{(1)}, f^{(2)}, \dots, f^{(m)}]$ computed from an ensemble of i.i.d. sequences of total length N. In order to avoid lengthy expressions, we define $D[F] \equiv D[f^{(1)}, f^{(2)}, \dots, f^{(m)}]$ and $D[P] \equiv D[p^{(1)}, p^{(2)}, \dots, p^{(m)}]$, and by substituting Eqs. (20) and (21) into Eq. (1) we obtain

$$\langle D[F] \rangle \approx D[P] + \frac{k-1}{2N \ln 2}\Big(N \sum_{j=1}^{m} \frac{\pi^{(j)}}{n^{(j)}} - 1\Big). \qquad (22)$$

This expression shows that, in general, the bias $\langle D[F] \rangle - D[P]$ depends on the lengths $n^{(j)}$ of the subsequences. It is easy to see that one choice of weights that makes Eq. (22) independent of the subsequence lengths $n^{(j)}$ is

$$\pi^{(j)} = n^{(j)}/N, \qquad (23)$$

for $j = 1, 2, \dots, m$. This finding is interesting because this particular choice of weights turns out to be identical to the natural choice of weights in all three interpretations of D presented in Sec. III. With this choice of weights, $N \sum_j \pi^{(j)}/n^{(j)} = m$, and the expected value of the Jensen-Shannon divergence D becomes

$$\langle D[F] \rangle \approx D[P] + \frac{(k-1)(m-1)}{2N \ln 2}, \qquad (24)$$

which is independent of the subsequence lengths $n^{(j)}$. Figure 2 illustrates the independence of the mean value of D of the subsequence lengths $n^{(j)}$, and it also shows that Eq. (24) is a reasonable approximation of the mean value of D. Hence, expression (24) can be used as a reference to decide if a difference in composition between two sequences is larger than expected. Note that in Fig. 1(c) the average value of D fits the value predicted by Eq. (24) with k = m = 2 and N = 2500. In addition, from Eq. (24) we see that the bias of the quantity N D is independent of the sequence length N, which allows us to compare Jensen-Shannon divergence values obtained from sequences of different sizes.

FIG. 2. Mean value of D as a function of the total sequence length N, ranging from N = 10 to N = $10^5$, averaged over an ensemble of 2000 i.i.d. sequences generated from a four-letter alphabet (k = 4), where each symbol occurs with probability 1/4. For each sequence length N we choose three different cutting points, $n^{(1)} = 0.5N$, $n^{(1)} = 0.6N$, and $n^{(1)} = 0.7N$, and we compute for each N, each $n^{(1)}$, and each of the 2000 i.i.d. sequences the Jensen-Shannon divergence D between the composition of the left subsequence of length $n^{(1)}$ and the composition of the right subsequence of length $n^{(2)} = N - n^{(1)}$. For each N and $n^{(1)}$ we compute the average of D over the ensemble of all 2000 i.i.d. sequences, and the figure shows the ensemble average $\langle D \rangle$ as a function of N and $n^{(1)}$. We find that $\log_{10} \langle D \rangle$ decays almost linearly as a function of $\log_{10} N$, with a slope very close to −1, for each $n^{(1)} = 0.5N$ (circles), $n^{(1)} = 0.6N$ (triangles), and $n^{(1)} = 0.7N$ (diamonds), and we also find that the approximation of $\langle D \rangle$ from Eq. (24) (solid line) agrees very well with the simulation results.

With the naive choice of weights $\pi^{(j)} = 1/m$ we obtain for the expected value of the Jensen-Shannon divergence the approximation

$$\langle D^{1/m}[F] \rangle \approx D^{1/m}[P] + \frac{k-1}{2N \ln 2}\Big(\frac{N}{A} - 1\Big), \qquad (25)$$

where $A = m / \sum_{j=1}^{m} (1/n^{(j)})$ denotes the harmonic mean of the subsequence lengths $n^{(j)}$. Clearly $\langle D^{1/m} \rangle$ depends on the subsequence lengths $n^{(j)}$, and we see that $\langle D^{1/m} \rangle$ becomes minimal for $n^{(j)} = N/m$, while it diverges to infinity for $n^{(j)} \to 0$. This analytical approximation of the expected value of $D^{1/m}$ is consistent with the dramatic increase of the dashed line corresponding to $D^{1/2}$ close to the edges $n^{(1)} \to 0$ or $n^{(2)} \to 0$ of the abscissa of Fig. 1. There is another advantage of choosing the weights $\pi^{(j)}$ by Eq. (23).
We will show in the following section that the choice of the weights $\pi^{(j)} = n^{(j)}/N$ minimizes the quadratic deviation of the observed from the true Jensen-Shannon divergence. This advantage is more important than the advantage of having a bias that is independent of $n^{(j)}$, because the bias can be corrected analytically, in a first-order approximation, whereas the quadratic deviation of the observed from the true Jensen-Shannon divergence (i.e., the quadratic error) cannot be removed by such a correction. Hence, it is desirable to obtain an estimator of D that minimizes the quadratic error, and we will show in the following section that the choice of the weights $\pi^{(j)} = n^{(j)}/N$ yields exactly that optimal estimator.
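The bias formulas (22), (24), and (25) are simple enough to evaluate directly. The sketch below is a hypothetical helper, not code from the paper. It compares the first-order bias for natural and equal weights and, for the binary two-segment case under the null hypothesis, checks Eq. (24) against an exact expectation. That exact value is available because, with natural weights, the mixture equals the pooled frequency, whose count is Binomial(N, p), so linearity of expectation reduces $\langle D \rangle$ to three exact entropy expectations.

```python
import math


def binary_entropy(x):
    """H[(x, 1 - x)] in bits."""
    return -sum(t * math.log2(t) for t in (x, 1.0 - x) if t > 0)


def expected_plugin_entropy(p, n):
    """Exact <H[f]> for a binary sample of size n (count ~ Binomial(n, p))."""
    total = 0.0
    for c in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(c + 1) - math.lgamma(n - c + 1)
                   + c * math.log(p) + (n - c) * math.log(1 - p))
        total += math.exp(log_pmf) * binary_entropy(c / n)
    return total


def first_order_bias(k, lengths, weights):
    """Eq. (22): <D[F]> - D[P] ~ (k-1)/(2N ln 2) * (N sum_j pi_j / n_j - 1)."""
    N = sum(lengths)
    return (k - 1) / (2 * N * math.log(2)) * (N * sum(w / n for w, n in zip(weights, lengths)) - 1)


def exact_mean_js_natural(p, n1, n2):
    """Exact <D> under the null hypothesis for k = 2, m = 2, weights n_j / N."""
    N = n1 + n2
    return (expected_plugin_entropy(p, N)
            - n1 / N * expected_plugin_entropy(p, n1)
            - n2 / N * expected_plugin_entropy(p, n2))


lengths = [500, 2000]
natural = [n / sum(lengths) for n in lengths]
print(first_order_bias(2, lengths, natural),     # Eq. (24): (k-1)(m-1)/(2N ln 2)
      first_order_bias(2, lengths, [0.5, 0.5]),  # equal weights, Eq. (25)
      exact_mean_js_natural(0.5, *lengths))
```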

7 ANALYSIS OF SYMBOLIC SEQUENCES USING THE... PHYSICAL REVIEW E

B. Variance of D

The variance of $D^F$ is given by

$$ \sigma^2(D^F) = \sigma^2\Big(H[f] - \sum_{j=1}^{m} \pi^{(j)} H[f^{(j)}]\Big) = \sigma^2(H[f]) + \sum_{j=1}^{m} \big(\pi^{(j)}\big)^2 \sigma^2(H[f^{(j)}]) - 2 \sum_{j=1}^{m} \pi^{(j)} \operatorname{cov}(H[f], H[f^{(j)}]) + \sum_{j=1}^{m} \sum_{l \neq j} \pi^{(j)} \pi^{(l)} \operatorname{cov}(H[f^{(j)}], H[f^{(l)}]). \tag{26} $$

As the set of vectors $f^{(1)}, f^{(2)}, \ldots, f^{(m)}$ is product-multinomially distributed, $H[f^{(j)}]$ and $H[f^{(l)}]$ are statistically independent for any $j \neq l$. Hence, the terms $\operatorname{cov}(H[f^{(j)}], H[f^{(l)}])$ are all equal to zero, and we need to consider only the terms $\sigma^2(H[f])$, $\sigma^2(H[f^{(j)}])$, and $\operatorname{cov}(H[f], H[f^{(j)}])$.

By Taylor-expanding $H[f]$ about $p$ we obtain a first-order approximation of the variance of $H[f]$ [5,6,27,28],

$$ \sigma^2(H[f]) \simeq \frac{1}{N}\, \sigma^2(\log_2 p), \tag{27} $$

where $N$ denotes the length of the whole sequence, $\sigma^2(\log_2 p)$ denotes the variance of the numbers $\log_2 p_i$ with respect to the probability distribution $p_i$, and the symbol $\simeq$ indicates that we neglect terms of the order of $O(1/N^2)$. Likewise, we obtain a first-order approximation of the variance of $H[f^{(j)}]$,

$$ \sigma^2(H[f^{(j)}]) \simeq \frac{1}{n^{(j)}}\, \sigma^2(\log_2 p^{(j)}), \tag{28} $$

where $n^{(j)}$ denotes the length of subsequence $S^{(j)}$, $\sigma^2(\log_2 p^{(j)})$ denotes the variance of the numbers $\log_2 p_i^{(j)}$ with respect to the probability distribution $p_i^{(j)}$ for every $j = 1, 2, \ldots, m$, and the symbol $\simeq$ indicates that we neglect terms of the order of $O(1/(n^{(j)})^2)$.

In the Appendix we derive a similar first-order approximation of the covariance terms, and under the null hypothesis that $p^{(1)} = p^{(2)} = \cdots = p^{(m)} = p$ we obtain

$$ \operatorname{cov}(H[f], H[f^{(j)}]) \simeq \frac{1}{N}\, \sigma^2(\log_2 p) \tag{29} $$

for all $j = 1, 2, \ldots, m$, where $\sigma^2(\log_2 p)$ denotes the variance of the numbers $\log_2 p_i$ with respect to the probability distribution $p_i$, and the symbol $\simeq$ indicates that we neglect terms of the order of $O(1/N^2)$. It is interesting to note that the first-order approximation of the covariance between $H[f]$ and $H[f^{(j)}]$ [Eq. (29)] is equal to the first-order approximation of the variance of $H[f]$ [Eq. (27)]. By substituting the expressions from Eqs. (27), (28), and (29) into Eq. (26) and using $\sum_j \pi^{(j)} = 1$, we obtain for the variance of the Jensen-Shannon divergence with arbitrary weights $\pi^{(1)}, \pi^{(2)}, \ldots, \pi^{(m)}$

$$ \sigma^2(D) \simeq \left( \sum_{j=1}^{m} \frac{(\pi^{(j)})^2}{n^{(j)}} - \frac{1}{N} \right) \sigma^2(\log_2 p) \tag{30} $$

under the null hypothesis that $p^{(1)} = p^{(2)} = \cdots = p^{(m)} = p$, where the symbol $\simeq$ indicates that we neglect terms of the order of $O(1/N^2)$.

Let us now consider the choice of weights $\pi^{(j)}$ that minimizes the quadratic deviation of the observed from the true Jensen-Shannon divergence,

$$ E\big[(D^F - D^P)^2\big] = \sigma^2(D) + \big(E[D] - D^P\big)^2 . \tag{31} $$

As the second term on the right-hand side of Eq. (31) is of the order of $O(1/N^2)$, the minimization of the quadratic deviation of the observed from the true Jensen-Shannon divergence reduces to the minimization of the variance of the Jensen-Shannon divergence estimator. By using one Lagrange multiplier for the normalization constraint $\sum_j \pi^{(j)} = 1$ we obtain that the set of weights $\pi^{(j)} = n^{(j)}/N$ minimizes the variance of the Jensen-Shannon divergence $D$: setting the derivative of $\sum_j (\pi^{(j)})^2/n^{(j)} - \lambda \big(\sum_j \pi^{(j)} - 1\big)$ with respect to $\pi^{(j)}$ to zero gives $\pi^{(j)} = \lambda n^{(j)}/2$, and normalization yields $\pi^{(j)} = n^{(j)}/N$. This finding is intriguing, because this set of weights is (i) identical to the natural choice of weights in all three interpretations of $D$ presented in Sec. III, and (ii) identical to the special choice of weights that makes the bias of $D$ independent of the subsequence lengths $n^{(j)}$ [Eq. (24)].

Furthermore, we find that for the special choice of weights $\pi^{(j)} = n^{(j)}/N$ the $O(1/N)$ term of the variance of $D$ vanishes, because Eq. (30) then becomes $\sum_j n^{(j)}/N^2 - 1/N = 0$. This means that for this choice of weights the leading term of $\sigma^2(D)$ decreases with the sequence length $N$ as $1/N^2$, whereas in general it decreases as $1/N$. It is clear that for $\pi^{(j)} = n^{(j)}/N$ the $O(1/N)$ term of $\sigma^2(D)$ becomes independent of both $n^{(j)}$ and $p$, and it is interesting that for this special choice of weights the $O(1/N^2)$ term of $\sigma^2(D)$ also turns out to be independent of both $n^{(j)}$ and $p$.

In contrast, we find that for the naive choice of weights $\pi^{(j)} = 1/m$ the $O(1/N)$ term of the variance of $D^{1/m}$ neither vanishes nor becomes independent of the subsequence lengths $n^{(j)}$, and we obtain for the variance of the Jensen-Shannon divergence $D^{1/m}$

$$ \sigma^2(D^{1/m}) \simeq \frac{\sigma^2(\log_2 p)}{N} \left( \frac{N}{mA} - 1 \right), \tag{32} $$

where $A = m \big/ \sum_{j=1}^{m} 1/n^{(j)}$ denotes the harmonic mean of the subsequence lengths $n^{(j)}$.
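As a quick numerical check of Eqs. (30) and (32), the following Python sketch evaluates the leading-order variance for arbitrary weights. It only evaluates the first-order formula (it does not simulate sequences), and the function names are illustrative rather than taken from the paper. With the natural weights $\pi^{(j)} = n^{(j)}/N$ the result is zero up to rounding, while the naive weights $\pi^{(j)} = 1/m$ reproduce the harmonic-mean form of Eq. (32).

```python
import math


def var_log2(p):
    """Variance of the numbers log2(p_i) with respect to the distribution p."""
    mean = sum(pi * math.log2(pi) for pi in p)
    second = sum(pi * math.log2(pi) ** 2 for pi in p)
    return second - mean ** 2


def first_order_var_D(weights, lengths, p):
    """Leading-order variance of D under the null hypothesis, Eq. (30).

    weights: pi^(j); lengths: n^(j); p: the common distribution p_i.
    """
    N = sum(lengths)
    factor = sum(w * w / n for w, n in zip(weights, lengths)) - 1.0 / N
    return factor * var_log2(p)


def naive_var_D(lengths, p):
    """Eq. (32): the same quantity for pi^(j) = 1/m, written with the harmonic mean A."""
    m, N = len(lengths), sum(lengths)
    A = m / sum(1.0 / n for n in lengths)  # harmonic mean of the n^(j)
    return var_log2(p) / N * (N / (m * A) - 1.0)


if __name__ == "__main__":
    p = [0.1, 0.2, 0.3, 0.4]
    lengths = [100, 300]
    N = sum(lengths)
    natural = [n / N for n in lengths]
    naive = [1 / len(lengths)] * len(lengths)
    print(first_order_var_D(natural, lengths, p))  # the O(1/N) term cancels
    print(first_order_var_D(naive, lengths, p), naive_var_D(lengths, p))
```

Trying other normalized weight vectors in `first_order_var_D` gives larger values than the natural weights, in line with the Lagrange-multiplier argument.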
Note that the expression inside the parentheses on the right-hand side of Eq. (32) is similar to the expression inside the parentheses on the right-hand side of Eq. (25). Hence, the variance of $D^{1/m}$ shows a singular behavior similar to that of the mean of $D^{1/m}$ when the length of at least one subsequence becomes very small.

C. Probability distribution of D

Expression (24) provides a good criterion to tell whether an experimentally observed Jensen-Shannon divergence $D$ between frequency distributions is greater than expected by chance, but it does not tell whether $D$ is significantly greater than expected by chance. In this section we derive the probability distribution of $D$ in order to quantify the statistical significance of experimentally observed values of $D$. Given an observed value $D = x$, we calculate the probability of obtaining this value or a lower value by chance under the null hypothesis that all sequences are generated from the same probability distribution. We call this probability the significance threshold of the given value $x$, and we denote it by

$$ s(x) \equiv \mathrm{Prob}(D \le x). \tag{33} $$

As $s(x)$ does not seem to admit an easy analytical expression, we obtain an approximation by using the Taylor expansion

$$ x \log_2 x = x \log_2 a + \frac{x - a}{\ln 2} + \frac{(x - a)^2}{2a \ln 2} + O\big((x - a)^3\big) \tag{34} $$

to approximate $D$ in terms of quadratic functions as follows:

$$ D = \sum_{j=1}^{m} \pi^{(j)} \sum_{i=1}^{k} p_i^{(j)} \log_2 \frac{p_i^{(j)}}{p_i} \simeq \sum_{j=1}^{m} \sum_{i=1}^{k} \pi^{(j)} \frac{p_i^{(j)} - p_i}{\ln 2} + \sum_{j=1}^{m} \sum_{i=1}^{k} \pi^{(j)} \frac{(p_i^{(j)} - p_i)^2}{2 \ln 2 \; p_i} \tag{35} $$

$$ = \frac{1}{2 \ln 2} \sum_{j=1}^{m} \sum_{i=1}^{k} \pi^{(j)} \frac{(p_i^{(j)} - p_i)^2}{p_i}. \tag{36} $$

It is interesting to note that in this quadratic approximation of $D$ there are no constant or linear terms, because the first double sum of Eq. (35) vanishes exactly due to the normalization of the probability distributions $p_i^{(j)}$, $p_i$, and $\pi^{(j)}$. If we express the $\chi^2$ statistic [31] in the same notation (with the natural weights $\pi^{(j)} = n^{(j)}/N$), we obtain

$$ \chi^2 = N \sum_{j=1}^{m} \sum_{i=1}^{k} \pi^{(j)} \frac{(p_i^{(j)} - p_i)^2}{p_i} \simeq 2N (\ln 2)\, D. \tag{37} $$

The above $\chi^2$ statistic is known to converge for asymptotically large values of $N$ to the $\chi^2$ distribution with $(k-1)(m-1)$ degrees of freedom [31].
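To see Eqs. (35)-(37) at work, the sketch below computes $D$ with the natural weights $\pi^{(j)} = n^{(j)}/N$ from a table of symbol counts, its quadratic approximation (36), and Pearson's $\chi^2$ statistic of the same $m \times k$ contingency table. The function names and the counts are illustrative assumptions, not data from the paper. For nearly equal compositions the exact and quadratic values of $D$ agree closely, and $\chi^2$ equals $2N(\ln 2)$ times the quadratic approximation.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (in bits) of the frequency vector given by counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def js_divergence(tables):
    """D with natural weights pi^(j) = n^(j)/N; tables[j][i] counts symbol i in S^(j)."""
    lengths = [sum(row) for row in tables]
    N = sum(lengths)
    pooled = [sum(col) for col in zip(*tables)]
    return entropy_bits(pooled) - sum(n / N * entropy_bits(row)
                                      for n, row in zip(lengths, tables))


def js_quadratic(tables):
    """Quadratic approximation of D, Eq. (36)."""
    lengths = [sum(row) for row in tables]
    N = sum(lengths)
    p = [sum(col) / N for col in zip(*tables)]  # pooled distribution p_i
    total = 0.0
    for n, row in zip(lengths, tables):
        for c, pi in zip(row, p):
            if pi > 0:
                total += (n / N) * (c / n - pi) ** 2 / pi
    return total / (2 * math.log(2))


def pearson_chi2(tables):
    """Pearson's chi-square statistic of the m x k table of counts."""
    lengths = [sum(row) for row in tables]
    N = sum(lengths)
    p = [sum(col) / N for col in zip(*tables)]
    return sum((c - n * pi) ** 2 / (n * pi)
               for n, row in zip(lengths, tables)
               for c, pi in zip(row, p) if pi > 0)


if __name__ == "__main__":
    tables = [[250, 250, 250, 250], [240, 260, 255, 245]]  # made-up counts
    N = sum(map(sum, tables))
    print(js_divergence(tables), js_quadratic(tables))
    print(pearson_chi2(tables) / (2 * N * math.log(2)))
```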
Hence, $2N(\ln 2) D$ also converges for asymptotically large values of $N$ to the $\chi^2$ distribution with $\nu = (k-1)(m-1)$ degrees of freedom, i.e., for asymptotically large values of $N$ we obtain the approximation

$$ s(x) \approx F_\nu\big(2N(\ln 2)\, x\big) = \frac{\gamma\big(\nu/2,\; N(\ln 2)\, x\big)}{\Gamma(\nu/2)}, \tag{38} $$

where $\gamma(a, x)$ and $\Gamma(a)$ denote the incomplete and complete gamma functions, respectively [31,32]. The fact that $D$ can be interpreted as mutual information agrees with Eq. (38), as it is known that, up to a multiplicative constant, the mutual information converges for asymptotically large values of $N$ to the $\chi^2$ probability distribution with $(k-1)(m-1)$ degrees of freedom [6].

V. STATISTICAL PROPERTIES OF D_max

Expression (38) gives the significance threshold of a single value of $D$ computed between two samples of fixed length. From a practical point of view this is equivalent to preselecting a fixed point that divides a sequence into two subsequences and asking for the probability that the two subsequences have been generated from different probability distributions. In general, however, when facing an unknown sequence we do not have any a priori knowledge of the location of the possible cutting point.

The problem of finding the point where a nonstationary sequence can most likely be divided into two stationary subsequences has been widely studied in mathematics. There, it is known as the change-point problem [33-35], which consists of finding out (i) whether there exists a change point in the studied sequence, and (ii) at which position in the sequence the change point is located, provided it exists. Task (i) corresponds to determining whether the studied sequence is nonstationary, and task (ii) corresponds to determining the most likely location of the nonstationarity, provided it exists. Since $2N(\ln 2) D$ can be interpreted as the log-likelihood ratio statistic (twice the logarithm of the likelihood ratio) of the model with a change point versus the model without a change point, the maximization of $D$ along the sequence yields a natural way of determining the most likely location of the change point.
Hence, we move a cursor along the entire sequence, compute $D$ between the subsequences on both sides of the cursor for all positions, and choose as the optimal change point the position at which $D$ reaches its maximum value $D_{\max}$. In Sec. VI we describe a recursive segmentation algorithm that is based on this idea. The problem we address in this section is to decide whether the value $D_{\max}$ of the Jensen-Shannon divergence at the optimal change point is sufficiently large to partition the sequence at that point, or whether it is sufficiently small to consider the entire sequence as stationary and not partition it at all. Hence, we address in this section the problem of computing the statistical significance of experimentally observed values of $D_{\max}$.

Even if the studied sequence has been generated from a single probability distribution, we find $D_{\max} > 0$ due to statistical fluctuations. Moreover, we find that, as $N$ increases, $D_{\max}$ eventually exceeds the critical value of any significance threshold $s$ computed in Sec. IV. To decide whether the obtained value $D_{\max} = x$ is statistically significant we need to compute the probability of obtaining this value or a lower value by chance in a random sequence, i.e., we need to compute

$$ s_{\max}(x) \equiv \mathrm{Prob}(D_{\max} \le x). \tag{39} $$

Obviously $s_{\max}(x) \le s(x)$. In fact, if the values of $D$ at the different positions of the cursor were independent of each other, we would obtain [36]

$$ s_{\max}(x) \approx s(x)^N \approx \big[F_{k-1}\big(2N(\ln 2)\, x\big)\big]^N, \tag{40} $$

where $N$ denotes the sequence length. Note that we are dealing with the comparison between only two distributions ($m = 2$), and hence the number of degrees of freedom is $\nu = k - 1$. It is clear that the random variables $D$ sampled at different positions of the same sequence are not statistically independent, because the value of $D$ at a given position is almost identical to the value of $D$ at the neighboring positions.

For binary ($k = 2$) i.i.d. sequences Horváth [37] derives an analytic expression for $s_{\max}(x)$ in the limit of asymptotically large sequence lengths $N$, and Csörgő and Horváth [38] generalize that result to arbitrary $k$ by showing that the probability distribution function of $Z_N \equiv 2N(\ln 2) D_{\max}$ converges for asymptotically large values of $N$ to

$$ \mathrm{Prob}\big(A_N \sqrt{Z_N} - B_N(\nu) \le x\big) \to \exp\big(-2 e^{-x}\big), \tag{41} $$

where $N$ denotes the sequence length, $\nu = k - 1$ denotes the number of degrees of freedom, $A_N$ is defined by

$$ A_N = \sqrt{2 \ln \ln N}, \tag{42} $$

and $B_N(\nu)$ is defined by

$$ B_N(\nu) = 2 \ln \ln N + \frac{\nu}{2} \ln \ln \ln N - \ln \Gamma(\nu/2). \tag{43} $$

By converting Eq. (41) into our notation we obtain

$$ s_{\max}(x) \approx \exp\Big[-2 \exp\Big(B_N(\nu) - A_N \sqrt{2N(\ln 2)\, x}\Big)\Big]. \tag{44} $$

In the following paragraphs we test how accurately the asymptotic approximation $s_{\max}(x)$ agrees with the finite-size histogram $\hat{s}_{\max}(x)$ obtained by Monte Carlo simulations of sequences of length $N$ ranging from $10^2$ to $10^8$. For each sequence length $N = 10^2$, $10^4$, $10^6$, and $10^8$, we generate an ensemble of $10^5$ quaternary ($k = 4$) i.i.d. sequences of length $N$, and for each sequence of each ensemble we move a cursor along the sequence and compute at each position $15 \le n \le N - 15$ the Jensen-Shannon divergence $D$ [39]. We define $D_{\max}$ as the maximum of all values of $D$ computed from one sequence, and by collecting all values $D_{\max}$ of each ensemble of $10^5$ random i.i.d. sequences of length $N$ we obtain the histograms $\hat{s}_{\max}(x)$ for each $N$.
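The cursor scan and the Monte Carlo estimate of $\hat{s}_{\max}(x)$ described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the two subsequences use the natural weights $n^{(j)}/N$, the cursor keeps a margin of 15 symbols as in the text, and the i.i.d. sequences are drawn with equal symbol probabilities. The names `dmax_scan`, `empirical_smax`, `cdf`, and `smax_asymptotic` are hypothetical, and the ensemble in the usage example is tiny compared with the $10^5$ sequences used in the paper. The last function evaluates the asymptotic formula (44) for comparison.

```python
import bisect
import math
import random


def _h(counts, n):
    """Entropy in bits of a count vector with total n."""
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def dmax_scan(seq, k, margin=15):
    """Return (D_max, position) over cursor positions margin <= pos <= len(seq) - margin.

    seq holds symbols 0..k-1 and must have length >= 2 * margin;
    the cursor at pos splits seq into seq[:pos] and seq[pos:].
    """
    N = len(seq)
    total = [0] * k
    for s in seq:
        total[s] += 1
    h_all = _h(total, N)
    left = [0] * k
    for s in seq[:margin]:
        left[s] += 1
    best, best_pos = -1.0, None
    for pos in range(margin, N - margin + 1):
        right = [t - l for t, l in zip(total, left)]
        d = h_all - pos / N * _h(left, pos) - (N - pos) / N * _h(right, N - pos)
        if d > best:
            best, best_pos = d, pos
        if pos < N:
            left[seq[pos]] += 1  # move the cursor one symbol to the right
    return best, best_pos


def empirical_smax(N, k, n_seq, rng):
    """Sorted D_max values of n_seq uniform i.i.d. sequences (stand-in for the histogram)."""
    samples = [dmax_scan([rng.randrange(k) for _ in range(N)], k)[0]
               for _ in range(n_seq)]
    samples.sort()
    return samples


def cdf(samples, x):
    """Empirical Prob(D_max <= x) from sorted samples."""
    return bisect.bisect_right(samples, x) / len(samples)


def smax_asymptotic(x, N, k):
    """Eq. (44): asymptotic Csorgo-Horvath approximation with nu = k - 1."""
    nu = k - 1
    lln = math.log(math.log(N))
    A = math.sqrt(2 * lln)                                      # Eq. (42)
    B = 2 * lln + nu / 2 * math.log(lln) - math.lgamma(nu / 2)  # Eq. (43)
    return math.exp(-2 * math.exp(B - A * math.sqrt(2 * N * math.log(2) * x)))


if __name__ == "__main__":
    rng = random.Random(1)
    N, k = 500, 4
    samples = empirical_smax(N, k, 200, rng)
    for x in (0.01, 0.02, 0.03):
        print(x, cdf(samples, x), smax_asymptotic(x, N, k))
```

Updating the left-hand counts incrementally keeps the cost of each cursor position at $O(k)$, so one scan costs $O(Nk)$.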
Figure 3(a) shows the histograms $\hat{s}_{\max}(x)$ for $k = 4$ and $N = 10^2$, $10^4$, $10^6$, and $10^8$ symbols together with the asymptotic approximations $s_{\max}(x)$ (solid lines). We find that the asymptotic approximations $s_{\max}(x)$ are not very accurate, and that even for sequence lengths as large as $N = 10^8$ there is still a significant deviation between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$. Figure 3(a) also shows that the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ are particularly large in the right tail, where we want both distributions to agree particularly well. Figure 3(b) illustrates the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ by plotting $\hat{s}_{\max}(x) - s_{\max}(x)$ versus $2N(\ln 2)x$. We find that the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ tend to become smaller as the sequence length $N$ increases, but even for sequences of length $N = 10^8$ the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ are greater than …

FIG. 3. Histograms $\hat{s}_{\max}(x)$ of $2N(\ln 2) D_{\max}$ and their asymptotic approximations $s_{\max}(x)$ obtained from ensembles of $10^5$ quaternary ($k = 4$) i.i.d. sequences of length $N = 10^2$, $10^4$, $10^6$, and $10^8$. (a) shows that the asymptotic approximations $s_{\max}(x)$ are not very accurate for finite-size sequences ranging in length $N$ from $10^2$ to $10^8$, and that the largest deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ occur in the right tails of the distributions. (b) shows a plot of the differences between the histograms $\hat{s}_{\max}(x)$ and their asymptotic approximations $s_{\max}(x)$ versus $2N(\ln 2) D_{\max}$. We find that the accuracy of the approximations increases with increasing $N$, but that even for sequences of length $N = 10^8$ the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ are greater than …

As the asymptotic approximation $s_{\max}(x)$ is not very accurate for sequences ranging in length from $N = 10^2$ to $10^8$, we use Monte Carlo simulations to obtain numerical approximations of $\hat{s}_{\max}(x)$ as a function of the sequence length $N$ and the alphabet size $k$. We find that the functional form of $\hat{s}_{\max}(x)$ seems to be very similar to the functional form stated in Eq. (40) if we replace the sequence length $N$ by an effective length $N_{\mathrm{eff}}$, and if we introduce a scaling factor $\beta < 1$ by which we multiply the argument of $F_{k-1}$. Specifically, we find that the probability distribution of $D_{\max}$ may be approximated by

$$ s_{\max}(x) \approx s(\beta x)^{N_{\mathrm{eff}}} \approx \big[F_{k-1}\big(\beta\, 2N(\ln 2)\, x\big)\big]^{N_{\mathrm{eff}}}. \tag{45} $$

$N_{\mathrm{eff}}$ can be understood as the effective number of independent cutting points, and the scaling factor $\beta$ accounts for the reduction of the variance of $D_{\max}$ caused by correlations between the values of $D$ computed at different positions of the same sequence. Note that, in principle, both parameters $N_{\mathrm{eff}}$ and $\beta$ depend on both $N$ and $k$. To approximate this dependence of $N_{\mathrm{eff}}$ and $\beta$ on $N$ and $k$, we perform the following simulations:

(1) We generate, for a given alphabet size $k$ and a given sequence length $N$, an ensemble of $10^5$ random i.i.d. sequences.
(2) For each sequence, we move a cursor along the sequence and compute at each position $15 \le n \le N - 15$ the Jensen-Shannon divergence $D$ [39], and we define $D_{\max}$ as the maximum of all values of $D$ computed from one sequence.
(3) For each ensemble of $10^5$ random i.i.d. sequences we obtain the histogram $\hat{s}_{\max}(x)$, and we fit the parameters $N_{\mathrm{eff}}$ and $\beta$ of $s_{\max}(x)$ given by expression (45) to $\hat{s}_{\max}(x)$ by minimizing the Kolmogorov-Smirnov distance $\max_x |\hat{s}_{\max}(x) - s_{\max}(x)|$.
(4) We repeat the above procedure for different values of $k$ and $N$.

Figure 4(a) shows the histograms $\hat{s}_{\max}(x)$ for $k = 4$ and $N = 10^2$, $10^4$, $10^6$, and $10^8$ symbols together with the finite-size approximations $s_{\max}(x)$ obtained by the above procedure. We find by visual inspection of Fig. 4(a) and by extensive analysis of the Kolmogorov-Smirnov distances between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ for $k$ varying from 2 to 12 and $N$ varying from $10^2$ to $10^8$ that $s_{\max}(x)$ from Eq. (45) provides a good approximation of $\hat{s}_{\max}(x)$. Figure 4(b) shows the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ by plotting $\hat{s}_{\max}(x) - s_{\max}(x)$ versus $2N(\ln 2)x$, and we find that the maximum deviation between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ stays below 0.02 for all of the cases we analyze, ranging from $k = 2$ to 12 and from $N = 10^2$ to $N = 10^8$.
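Evaluating Eqs. (38), (40), and (45) requires the cumulative $\chi^2$ distribution $F_\nu(y) = \gamma(\nu/2, y/2)/\Gamma(\nu/2)$. The sketch below implements the regularized lower incomplete gamma function with the standard power-series and continued-fraction expansions (the textbook approach popularized by Numerical Recipes), so it needs only the Python standard library; in practice a library routine such as `scipy.special.gammainc` computes the same function. The function names are illustrative, and `n_eff` and `beta` are supplied by the caller, since this sketch does not assume any particular fitted values.

```python
import math


def gammainc_lower_reg(a, x, eps=1e-15, max_iter=10_000):
    """Regularized lower incomplete gamma P(a, x) = gamma(a, x) / Gamma(a)."""
    if x <= 0.0:
        return 0.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series: P = e^{-x} x^a / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))
        ap, term = a, 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * math.exp(log_pref)
    # Continued fraction for Q = 1 - P, evaluated with the modified Lentz algorithm
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return 1.0 - math.exp(log_pref) * h


def chi2_cdf(y, nu):
    """F_nu(y): cumulative chi-square distribution with nu degrees of freedom."""
    return gammainc_lower_reg(nu / 2.0, y / 2.0)


def s_single(x, N, k, m=2):
    """Eq. (38): significance of a single observed D = x, with nu = (k - 1)(m - 1)."""
    return chi2_cdf(2 * N * math.log(2) * x, (k - 1) * (m - 1))


def s_max_finite(x, N, k, n_eff, beta):
    """Eq. (45); n_eff = N and beta = 1 recover the independence form of Eq. (40)."""
    return chi2_cdf(beta * 2 * N * math.log(2) * x, k - 1) ** n_eff


if __name__ == "__main__":
    N, k, x = 1000, 4, 0.01
    print("Eq. (38):", s_single(x, N, k))
    print("Eq. (40):", s_max_finite(x, N, k, n_eff=N, beta=1.0))
    # n_eff = 20 and beta = 0.8 are illustrative numbers, not fitted values
    print("Eq. (45):", s_max_finite(x, N, k, n_eff=20.0, beta=0.8))
```

The series branch is used for $x < a + 1$ and the continued fraction otherwise, because each expansion converges quickly in its own region.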
Moreover, we find that the maximum deviation between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ stays below 0.01 if we restrict the comparison of $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ to the right tails of the distributions, where we want the approximations to be particularly accurate.

Next, we study how the parameters $N_{\mathrm{eff}}$ and $\beta$ obtained by the fitting procedure described above depend on the alphabet size $k$ and the sequence length $N$. Figure 5 shows $N_{\mathrm{eff}}$ and $\beta$ versus $N$ for varying values of $k$. First, we find that $\beta$ is practically independent of $N$. Second, we find that for each $k$ the effective number of cutting points $N_{\mathrm{eff}}$ admits a good linear fit as a function of $\ln N$, i.e.,

$$ N_{\mathrm{eff}} = a \ln N + b. \tag{46} $$

Both parameters $a$ and $b$ depend on the alphabet size $k$, and we present the least-squares values of $a$ and $b$ as a function of $k$ in Table I.

FIG. 4. Histograms $\hat{s}_{\max}(x)$ of $2N(\ln 2) D_{\max}$ and their finite-size approximations $s_{\max}(x)$ obtained from ensembles of $10^5$ quaternary ($k = 4$) i.i.d. sequences of length $N = 10^2$, $10^4$, $10^6$, and $10^8$. (a) shows that the approximations $s_{\max}(x)$ are more accurate for sequences of length $N$ ranging from $10^2$ to $10^8$ than the asymptotic approximations $s_{\max}(x)$ presented in Fig. 3, and that the largest deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ do not occur in the right tails of the distributions, which we want to approximate as accurately as possible. (b) shows a plot of the differences between the histograms $\hat{s}_{\max}(x)$ and their finite-size approximations $s_{\max}(x)$ versus $2N(\ln 2) D_{\max}$. We find that the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ are smaller than 0.02. Moreover, we find that the deviations between $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ are smaller than 0.01 if we restrict the comparison of $\hat{s}_{\max}(x)$ and $s_{\max}(x)$ to the tails of the distributions, which we want to approximate as accurately as possible.

VI. APPLICATIONS OF D

In this section we illustrate how the results obtained in the previous sections may be used to develop an algorithm that can partition a nonstationary sequence into stationary subsequences.
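The recursive segmentation procedure that this section develops can be sketched for the binary case ($k = 2$), where $F_1(y) = \operatorname{erf}(\sqrt{y/2})$ makes Eq. (45) computable with the Python standard library alone. This is an illustrative sketch, not the authors' implementation: `segment` and `walk` are hypothetical names, the cursor keeps a margin of 15 symbols as in the text, and $N_{\mathrm{eff}}$ follows Eq. (46). Because the $k = 2$ entries of Table I are not reproduced here, the default `a`, `b`, and `beta` are placeholders copied from the $k = 4$ fit quoted in the Fig. 5 caption and should be replaced by properly fitted values. Flooring $N_{\mathrm{eff}}$ at 1 is an extra safeguard added in this sketch for very short subsequences.

```python
import math
import random


def _h2(ones, n):
    """Binary entropy in bits of a subsequence with `ones` ones out of n symbols."""
    h = 0.0
    for c in (ones, n - ones):
        if c > 0:
            h -= c / n * math.log2(c / n)
    return h


def _dmax(bits, lo, hi, margin):
    """Maximum of D (natural weights) over cursor positions inside bits[lo:hi]."""
    n = hi - lo
    total = sum(bits[lo:hi])
    h_all = _h2(total, n)
    ones_left = sum(bits[lo:lo + margin])
    best, best_pos = -1.0, None
    for pos in range(margin, n - margin + 1):
        d = (h_all - pos / n * _h2(ones_left, pos)
             - (n - pos) / n * _h2(total - ones_left, n - pos))
        if d > best:
            best, best_pos = d, lo + pos
        if pos < n:
            ones_left += bits[lo + pos]
    return best, best_pos


def segment(bits, s0=0.95, a=2.44, b=6.15, beta=0.80, margin=15):
    """Recursive segmentation of a 0/1 sequence; returns the sorted cut positions.

    a, b, beta are PLACEHOLDERS (k = 4 fit), not the k = 2 values of Table I.
    """
    cuts, stack = [], [(0, len(bits))]
    while stack:
        lo, hi = stack.pop()
        n = hi - lo
        if n < 2 * margin:
            continue  # too short to place the cursor
        d_max, pos = _dmax(bits, lo, hi, margin)
        n_eff = max(a * math.log(n) + b, 1.0)           # Eq. (46), floored at 1
        y = beta * 2 * n * math.log(2) * max(d_max, 0.0)
        s_max = math.erf(math.sqrt(y / 2)) ** n_eff      # Eq. (45) with F_1
        if s_max > s0:                                   # significant: cut and recurse
            cuts.append(pos)
            stack.extend([(lo, pos), (pos, hi)])
    return sorted(cuts)


def walk(bits):
    """Eq. (47): w(n) = sum of y_i for i <= n, with y_i = +1 for a 1 and -1 for a 0."""
    w, out = 0, []
    for bit in bits:
        w += 1 if bit else -1
        out.append(w)
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    # two hypothetical patches with P(1) = 0.3 and P(1) = 0.7
    bits = ([int(rng.random() < 0.3) for _ in range(2000)]
            + [int(rng.random() < 0.7) for _ in range(2000)])
    print(segment(bits))
```

Plotting `walk(bits)` next to the returned cuts gives a picture of the same kind as Fig. 6: the cuts should sit where the slope of the walk changes.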
We describe this segmentation algorithm based on the Jensen-Shannon divergence $D$ in detail, and we present three application examples of this recursive segmentation algorithm. Many sequence analysis techniques rely on the stationarity of the analyzed sequence, i.e., on the assumption that all portions of the sequence have at least the same composition. This a priori assumption is very often in conflict with experimental data, as, for example, in the case of DNA sequences [40]. The algorithm described here, which is an improved version of the algorithm presented in Refs. [13] and [18], allows us to decompose a nonstationary sequence into stationary subsequences of homogeneous composition as follows. First, we move along the sequence a cursor that divides the sequence at each position into two subsequences, and we compute $D$ for each position of the cursor. We select the point at which $D$ reaches its maximum value $D_{\max}$, and we compute its statistical significance $s_{\max}$. If $s_{\max}$ exceeds a given threshold $s_0$, the sequence is cut at this point, and the procedure continues recursively for each of the two resulting subsequences. Otherwise, the sequence remains undivided. The process stops when none of the possible cutting points has a significance exceeding $s_0$, and we say that the sequence is segmented at significance threshold $s_0$. In the following three sections we present three examples that illustrate this recursive segmentation process.

FIG. 5. Parameter values of $N_{\mathrm{eff}}$ (squares) and $\beta$ (circles) as a function of the sequence length $N$, ranging from 200 to $10^5$, for an alphabet size $k = 4$. We find that $\beta$ is almost independent of $N$, $\beta \approx 0.80$, while $N_{\mathrm{eff}}$ admits a good linear fit to $\ln N$. The least-squares fit to $N_{\mathrm{eff}} = a \ln N + b$ yields $a = 2.44$ and $b = 6.15$.

TABLE I. Values of the parameters $a$, $b$, and $\beta$ obtained by least-squares fitting of $s_{\max}(x)$ for three values of the alphabet size $k$.

A. Segmentation of a model sequence with known compositional domains

In order to test whether the segmentation algorithm works, we generate a binary sequence of length … obtained by joining patches of different length and composition. We choose the sizes of the patches randomly from a power-law distribution in order to obtain a wide range of different sizes, and we choose the composition of the patches randomly from a truncated Gaussian distribution centered at 1/2. To show graphically the variation in composition along this sequence, we plot in Fig. 6 the walk of the sequence. Given a binary sequence $y_i$, $i = 1, \ldots, N$, where $y_i$ can assume the values $+1$ or $-1$, the walk of the sequence at position $n$ is defined by [41]

$$ w(n) = \sum_{i=1}^{n} y_i . \tag{47} $$

Regions with a positive slope in Fig. 6 correspond to an abundance of $+1$'s, and regions with a negative slope correspond to an abundance of $-1$'s. We apply the segmentation procedure presented above to this example sequence, and the vertical lines in Fig. 6 correspond to the cuts obtained by means of the segmentation procedure. Figure 6 shows clearly that the positions of the cuts coincide accurately with changes in the slope of $w(n)$. Moreover, regions without any cut do not seem to show a significant change of the slope of $w(n)$. This observation allows us to conjecture that the subsequences obtained by the segmentation procedure are indeed homogeneous at the considered significance threshold. It is worth mentioning that the method does not rely on any initial assumption about the size distribution of the subsequences, and, as we can verify by inspecting Fig. 6, the resulting subsequences indeed have a great variety of sizes.

FIG. 6. Segmentation of a computer-generated binary sequence of length … obtained by joining patches of different length and composition. The solid line represents the walk of the sequence (see text) and the vertical dotted lines represent the locations of the cuts obtained by the recursive segmentation procedure at significance threshold $s_0 = 95\%$. We find that the recursive segmentation procedure is indeed capable of partitioning the nonstationary input sequence into stationary subsequences at those points (vertical dotted lines) at which the local composition of the sequence changes, indicated by changes of the slope of the sequence walk (solid line).

B. Length distribution of compositionally stationary domains in prokaryotic and eukaryotic DNA

In this subsection we present one example in which we apply the recursive segmentation procedure to DNA sequences with the goal of studying the length distribution of compositionally stationary domains in prokaryotic and eukaryotic DNA. We segment at a significance threshold of $s_0 = 95\%$ the complete genome of the bacterium Escherichia coli [42] with a length of … base pairs (bp) as well as the human major histocompatibility complex (MHC) region of chromosome 6 [43] with a similar size of … bp. In both cases we use the natural four-letter alphabet A


Extended-Horizon Analysis of Pressure Sensitivities for Leak Detection in Water Distribution Networks: Application to the Barcelona Network

Extended-Horizon Analysis of Pressure Sensitivities for Leak Detection in Water Distribution Networks: Application to the Barcelona Network 2013 European Control Conference (ECC) July 17-19, 2013, Zürich, Switzerland. Extended-Horizon Analysis of Pressure Sensitivities for Leak Detection in Water Distribution Networks: Application to the Barcelona

More information

Binary Embedding: Fundamental Limits and Fast Algorithm

Binary Embedding: Fundamental Limits and Fast Algorithm Binary Ebedding: Fundaental Liits and Fast Algorith Xinyang Yi The University of Texas at Austin yixy@utexas.edu Eric Price The University of Texas at Austin ecprice@cs.utexas.edu Constantine Caraanis

More information

Quality evaluation of the model-based forecasts of implied volatility index

Quality evaluation of the model-based forecasts of implied volatility index Quality evaluation of the odel-based forecasts of iplied volatility index Katarzyna Łęczycka 1 Abstract Influence of volatility on financial arket forecasts is very high. It appears as a specific factor

More information

Markovian inventory policy with application to the paper industry

Markovian inventory policy with application to the paper industry Coputers and Cheical Engineering 26 (2002) 1399 1413 www.elsevier.co/locate/copcheeng Markovian inventory policy with application to the paper industry K. Karen Yin a, *, Hu Liu a,1, Neil E. Johnson b,2

More information

Pure Bending Determination of Stress-Strain Curves for an Aluminum Alloy

Pure Bending Determination of Stress-Strain Curves for an Aluminum Alloy Proceedings of the World Congress on Engineering 0 Vol III WCE 0, July 6-8, 0, London, U.K. Pure Bending Deterination of Stress-Strain Curves for an Aluinu Alloy D. Torres-Franco, G. Urriolagoitia-Sosa,

More information

Multi-Class Deep Boosting

Multi-Class Deep Boosting Multi-Class Deep Boosting Vitaly Kuznetsov Courant Institute 25 Mercer Street New York, NY 002 vitaly@cis.nyu.edu Mehryar Mohri Courant Institute & Google Research 25 Mercer Street New York, NY 002 ohri@cis.nyu.edu

More information

AN ALGORITHM FOR REDUCING THE DIMENSION AND SIZE OF A SAMPLE FOR DATA EXPLORATION PROCEDURES

AN ALGORITHM FOR REDUCING THE DIMENSION AND SIZE OF A SAMPLE FOR DATA EXPLORATION PROCEDURES Int. J. Appl. Math. Coput. Sci., 2014, Vol. 24, No. 1, 133 149 DOI: 10.2478/acs-2014-0011 AN ALGORITHM FOR REDUCING THE DIMENSION AND SIZE OF A SAMPLE FOR DATA EXPLORATION PROCEDURES PIOTR KULCZYCKI,,

More information

Image restoration for a rectangular poor-pixels detector

Image restoration for a rectangular poor-pixels detector Iage restoration for a rectangular poor-pixels detector Pengcheng Wen 1, Xiangjun Wang 1, Hong Wei 2 1 State Key Laboratory of Precision Measuring Technology and Instruents, Tianjin University, China 2

More information

RECURSIVE DYNAMIC PROGRAMMING: HEURISTIC RULES, BOUNDING AND STATE SPACE REDUCTION. Henrik Kure

RECURSIVE DYNAMIC PROGRAMMING: HEURISTIC RULES, BOUNDING AND STATE SPACE REDUCTION. Henrik Kure RECURSIVE DYNAMIC PROGRAMMING: HEURISTIC RULES, BOUNDING AND STATE SPACE REDUCTION Henrik Kure Dina, Danish Inforatics Network In the Agricultural Sciences Royal Veterinary and Agricultural University

More information

Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2

Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2 Exploiting Hardware Heterogeneity within the Sae Instance Type of Aazon EC2 Zhonghong Ou, Hao Zhuang, Jukka K. Nurinen, Antti Ylä-Jääski, Pan Hui Aalto University, Finland; Deutsch Teleko Laboratories,

More information

SOME APPLICATIONS OF FORECASTING Prof. Thomas B. Fomby Department of Economics Southern Methodist University May 2008

SOME APPLICATIONS OF FORECASTING Prof. Thomas B. Fomby Department of Economics Southern Methodist University May 2008 SOME APPLCATONS OF FORECASTNG Prof. Thoas B. Foby Departent of Econoics Southern Methodist University May 8 To deonstrate the usefulness of forecasting ethods this note discusses four applications of forecasting

More information

2. FINDING A SOLUTION

2. FINDING A SOLUTION The 7 th Balan Conference on Operational Research BACOR 5 Constanta, May 5, Roania OPTIMAL TIME AND SPACE COMPLEXITY ALGORITHM FOR CONSTRUCTION OF ALL BINARY TREES FROM PRE-ORDER AND POST-ORDER TRAVERSALS

More information

ASIC Design Project Management Supported by Multi Agent Simulation

ASIC Design Project Management Supported by Multi Agent Simulation ASIC Design Project Manageent Supported by Multi Agent Siulation Jana Blaschke, Christian Sebeke, Wolfgang Rosenstiel Abstract The coplexity of Application Specific Integrated Circuits (ASICs) is continuously

More information

Comment on On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes

Comment on On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes Coent on On Discriinative vs. Generative Classifiers: A Coparison of Logistic Regression and Naive Bayes Jing-Hao Xue (jinghao@stats.gla.ac.uk) and D. Michael Titterington (ike@stats.gla.ac.uk) Departent

More information

Real Time Target Tracking with Binary Sensor Networks and Parallel Computing

Real Time Target Tracking with Binary Sensor Networks and Parallel Computing Real Tie Target Tracking with Binary Sensor Networks and Parallel Coputing Hong Lin, John Rushing, Sara J. Graves, Steve Tanner, and Evans Criswell Abstract A parallel real tie data fusion and target tracking

More information

Implementation of Active Queue Management in a Combined Input and Output Queued Switch

Implementation of Active Queue Management in a Combined Input and Output Queued Switch pleentation of Active Queue Manageent in a obined nput and Output Queued Switch Bartek Wydrowski and Moshe Zukeran AR Special Research entre for Ultra-Broadband nforation Networks, EEE Departent, The University

More information

MINIMUM VERTEX DEGREE THRESHOLD FOR LOOSE HAMILTON CYCLES IN 3-UNIFORM HYPERGRAPHS

MINIMUM VERTEX DEGREE THRESHOLD FOR LOOSE HAMILTON CYCLES IN 3-UNIFORM HYPERGRAPHS MINIMUM VERTEX DEGREE THRESHOLD FOR LOOSE HAMILTON CYCLES IN 3-UNIFORM HYPERGRAPHS JIE HAN AND YI ZHAO Abstract. We show that for sufficiently large n, every 3-unifor hypergraph on n vertices with iniu

More information

COMBINING CRASH RECORDER AND PAIRED COMPARISON TECHNIQUE: INJURY RISK FUNCTIONS IN FRONTAL AND REAR IMPACTS WITH SPECIAL REFERENCE TO NECK INJURIES

COMBINING CRASH RECORDER AND PAIRED COMPARISON TECHNIQUE: INJURY RISK FUNCTIONS IN FRONTAL AND REAR IMPACTS WITH SPECIAL REFERENCE TO NECK INJURIES COMBINING CRASH RECORDER AND AIRED COMARISON TECHNIQUE: INJURY RISK FUNCTIONS IN FRONTAL AND REAR IMACTS WITH SECIAL REFERENCE TO NECK INJURIES Anders Kullgren, Maria Krafft Folksa Research, 66 Stockhol,

More information

Information Processing Letters

Information Processing Letters Inforation Processing Letters 111 2011) 178 183 Contents lists available at ScienceDirect Inforation Processing Letters www.elsevier.co/locate/ipl Offline file assignents for online load balancing Paul

More information

ESTIMATING LIQUIDITY PREMIA IN THE SPANISH GOVERNMENT SECURITIES MARKET

ESTIMATING LIQUIDITY PREMIA IN THE SPANISH GOVERNMENT SECURITIES MARKET ESTIMATING LIQUIDITY PREMIA IN THE SPANISH GOVERNMENT SECURITIES MARKET Francisco Alonso, Roberto Blanco, Ana del Río and Alicia Sanchis Banco de España Banco de España Servicio de Estudios Docuento de

More information

HOW CLOSE ARE THE OPTION PRICING FORMULAS OF BACHELIER AND BLACK-MERTON-SCHOLES?

HOW CLOSE ARE THE OPTION PRICING FORMULAS OF BACHELIER AND BLACK-MERTON-SCHOLES? HOW CLOSE ARE THE OPTION PRICING FORMULAS OF BACHELIER AND BLACK-MERTON-SCHOLES? WALTER SCHACHERMAYER AND JOSEF TEICHMANN Abstract. We copare the option pricing forulas of Louis Bachelier and Black-Merton-Scholes

More information

Preference-based Search and Multi-criteria Optimization

Preference-based Search and Multi-criteria Optimization Fro: AAAI-02 Proceedings. Copyright 2002, AAAI (www.aaai.org). All rights reserved. Preference-based Search and Multi-criteria Optiization Ulrich Junker ILOG 1681, route des Dolines F-06560 Valbonne ujunker@ilog.fr

More information

Position Auctions and Non-uniform Conversion Rates

Position Auctions and Non-uniform Conversion Rates Position Auctions and Non-unifor Conversion Rates Liad Blurosen Microsoft Research Mountain View, CA 944 liadbl@icrosoft.co Jason D. Hartline Shuzhen Nong Electrical Engineering and Microsoft AdCenter

More information

Modeling Cooperative Gene Regulation Using Fast Orthogonal Search

Modeling Cooperative Gene Regulation Using Fast Orthogonal Search 8 The Open Bioinforatics Journal, 28, 2, 8-89 Open Access odeling Cooperative Gene Regulation Using Fast Orthogonal Search Ian inz* and ichael J. Korenberg* Departent of Electrical and Coputer Engineering,

More information

Introduction to Unit Conversion: the SI

Introduction to Unit Conversion: the SI The Matheatics 11 Copetency Test Introduction to Unit Conversion: the SI In this the next docuent in this series is presented illustrated an effective reliable approach to carryin out unit conversions

More information

The Fundamentals of Modal Testing

The Fundamentals of Modal Testing The Fundaentals of Modal Testing Application Note 243-3 Η(ω) = Σ n r=1 φ φ i j / 2 2 2 2 ( ω n - ω ) + (2ξωωn) Preface Modal analysis is defined as the study of the dynaic characteristics of a echanical

More information

Audio Engineering Society. Convention Paper. Presented at the 119th Convention 2005 October 7 10 New York, New York USA

Audio Engineering Society. Convention Paper. Presented at the 119th Convention 2005 October 7 10 New York, New York USA Audio Engineering Society Convention Paper Presented at the 119th Convention 2005 October 7 10 New York, New York USA This convention paper has been reproduced fro the authors advance anuscript, without

More information

A CHAOS MODEL OF SUBHARMONIC OSCILLATIONS IN CURRENT MODE PWM BOOST CONVERTERS

A CHAOS MODEL OF SUBHARMONIC OSCILLATIONS IN CURRENT MODE PWM BOOST CONVERTERS A CHAOS MODEL OF SUBHARMONIC OSCILLATIONS IN CURRENT MODE PWM BOOST CONVERTERS Isaac Zafrany and Sa BenYaakov Departent of Electrical and Coputer Engineering BenGurion University of the Negev P. O. Box

More information

The Virtual Spring Mass System

The Virtual Spring Mass System The Virtual Spring Mass Syste J. S. Freudenberg EECS 6 Ebedded Control Systes Huan Coputer Interaction A force feedbac syste, such as the haptic heel used in the EECS 6 lab, is capable of exhibiting a

More information

Airline Yield Management with Overbooking, Cancellations, and No-Shows JANAKIRAM SUBRAMANIAN

Airline Yield Management with Overbooking, Cancellations, and No-Shows JANAKIRAM SUBRAMANIAN Airline Yield Manageent with Overbooking, Cancellations, and No-Shows JANAKIRAM SUBRAMANIAN Integral Developent Corporation, 301 University Avenue, Suite 200, Palo Alto, California 94301 SHALER STIDHAM

More information

Energy Efficient VM Scheduling for Cloud Data Centers: Exact allocation and migration algorithms

Energy Efficient VM Scheduling for Cloud Data Centers: Exact allocation and migration algorithms Energy Efficient VM Scheduling for Cloud Data Centers: Exact allocation and igration algoriths Chaia Ghribi, Makhlouf Hadji and Djaal Zeghlache Institut Mines-Téléco, Téléco SudParis UMR CNRS 5157 9, Rue

More information

An Integrated Approach for Monitoring Service Level Parameters of Software-Defined Networking

An Integrated Approach for Monitoring Service Level Parameters of Software-Defined Networking International Journal of Future Generation Counication and Networking Vol. 8, No. 6 (15), pp. 197-4 http://d.doi.org/1.1457/ijfgcn.15.8.6.19 An Integrated Approach for Monitoring Service Level Paraeters

More information

Data Streaming Algorithms for Estimating Entropy of Network Traffic

Data Streaming Algorithms for Estimating Entropy of Network Traffic Data Streaing Algoriths for Estiating Entropy of Network Traffic Ashwin Lall University of Rochester Vyas Sekar Carnegie Mellon University Mitsunori Ogihara University of Rochester Jun (Ji) Xu Georgia

More information

ABSTRACT KEYWORDS. Comonotonicity, dependence, correlation, concordance, copula, multivariate. 1. INTRODUCTION

ABSTRACT KEYWORDS. Comonotonicity, dependence, correlation, concordance, copula, multivariate. 1. INTRODUCTION MEASURING COMONOTONICITY IN M-DIMENSIONAL VECTORS BY INGE KOCH AND ANN DE SCHEPPER ABSTRACT In this contribution, a new easure of coonotonicity for -diensional vectors is introduced, with values between

More information

STRONGLY CONSISTENT ESTIMATES FOR FINITES MIX'IURES OF DISTRIBUTION FUNCTIONS ABSTRACT. An estimator for the mixing measure

STRONGLY CONSISTENT ESTIMATES FOR FINITES MIX'IURES OF DISTRIBUTION FUNCTIONS ABSTRACT. An estimator for the mixing measure STRONGLY CONSISTENT ESTIMATES FOR FINITES MIX'IURES OF DISTRIBUTION FUNCTIONS Keewhan Choi Cornell University ABSTRACT The probles with which we are concerned in this note are those of identifiability

More information

INTEGRATED ENVIRONMENT FOR STORING AND HANDLING INFORMATION IN TASKS OF INDUCTIVE MODELLING FOR BUSINESS INTELLIGENCE SYSTEMS

INTEGRATED ENVIRONMENT FOR STORING AND HANDLING INFORMATION IN TASKS OF INDUCTIVE MODELLING FOR BUSINESS INTELLIGENCE SYSTEMS Artificial Intelligence Methods and Techniques for Business and Engineering Applications 210 INTEGRATED ENVIRONMENT FOR STORING AND HANDLING INFORMATION IN TASKS OF INDUCTIVE MODELLING FOR BUSINESS INTELLIGENCE

More information

Calculation Method for evaluating Solar Assisted Heat Pump Systems in SAP 2009. 15 July 2013

Calculation Method for evaluating Solar Assisted Heat Pump Systems in SAP 2009. 15 July 2013 Calculation Method for evaluating Solar Assisted Heat Pup Systes in SAP 2009 15 July 2013 Page 1 of 17 1 Introduction This docuent describes how Solar Assisted Heat Pup Systes are recognised in the National

More information

ADJUSTING FOR QUALITY CHANGE

ADJUSTING FOR QUALITY CHANGE ADJUSTING FOR QUALITY CHANGE 7 Introduction 7.1 The easureent of changes in the level of consuer prices is coplicated by the appearance and disappearance of new and old goods and services, as well as changes

More information

Efficient Key Management for Secure Group Communications with Bursty Behavior

Efficient Key Management for Secure Group Communications with Bursty Behavior Efficient Key Manageent for Secure Group Counications with Bursty Behavior Xukai Zou, Byrav Raaurthy Departent of Coputer Science and Engineering University of Nebraska-Lincoln Lincoln, NE68588, USA Eail:

More information

Experiment 2 Index of refraction of an unknown liquid --- Abbe Refractometer

Experiment 2 Index of refraction of an unknown liquid --- Abbe Refractometer Experient Index of refraction of an unknown liquid --- Abbe Refractoeter Principle: The value n ay be written in the for sin ( δ +θ ) n =. θ sin This relation provides us with one or the standard ethods

More information

A magnetic Rotor to convert vacuum-energy into mechanical energy

A magnetic Rotor to convert vacuum-energy into mechanical energy A agnetic Rotor to convert vacuu-energy into echanical energy Claus W. Turtur, University of Applied Sciences Braunschweig-Wolfenbüttel Abstract Wolfenbüttel, Mai 21 2008 In previous work it was deonstrated,

More information

How To Get A Loan From A Bank For Free

How To Get A Loan From A Bank For Free Finance 111 Finance We have to work with oney every day. While balancing your checkbook or calculating your onthly expenditures on espresso requires only arithetic, when we start saving, planning for retireent,

More information

A Scalable Application Placement Controller for Enterprise Data Centers

A Scalable Application Placement Controller for Enterprise Data Centers W WWW 7 / Track: Perforance and Scalability A Scalable Application Placeent Controller for Enterprise Data Centers Chunqiang Tang, Malgorzata Steinder, Michael Spreitzer, and Giovanni Pacifici IBM T.J.

More information

Managing Complex Network Operation with Predictive Analytics

Managing Complex Network Operation with Predictive Analytics Managing Coplex Network Operation with Predictive Analytics Zhenyu Huang, Pak Chung Wong, Patrick Mackey, Yousu Chen, Jian Ma, Kevin Schneider, and Frank L. Greitzer Pacific Northwest National Laboratory

More information

PREDICTION OF POSSIBLE CONGESTIONS IN SLA CREATION PROCESS

PREDICTION OF POSSIBLE CONGESTIONS IN SLA CREATION PROCESS PREDICTIO OF POSSIBLE COGESTIOS I SLA CREATIO PROCESS Srećko Krile University of Dubrovnik Departent of Electrical Engineering and Coputing Cira Carica 4, 20000 Dubrovnik, Croatia Tel +385 20 445-739,

More information

Equivalent Tapped Delay Line Channel Responses with Reduced Taps

Equivalent Tapped Delay Line Channel Responses with Reduced Taps Equivalent Tapped Delay Line Channel Responses with Reduced Taps Shweta Sagari, Wade Trappe, Larry Greenstein {shsagari, trappe, ljg}@winlab.rutgers.edu WINLAB, Rutgers University, North Brunswick, NJ

More information

Stable Learning in Coding Space for Multi-Class Decoding and Its Extension for Multi-Class Hypothesis Transfer Learning

Stable Learning in Coding Space for Multi-Class Decoding and Its Extension for Multi-Class Hypothesis Transfer Learning Stable Learning in Coding Space for Multi-Class Decoding and Its Extension for Multi-Class Hypothesis Transfer Learning Bang Zhang, Yi Wang 2, Yang Wang, Fang Chen 2 National ICT Australia 2 School of

More information

Protecting Small Keys in Authentication Protocols for Wireless Sensor Networks

Protecting Small Keys in Authentication Protocols for Wireless Sensor Networks Protecting Sall Keys in Authentication Protocols for Wireless Sensor Networks Kalvinder Singh Australia Developent Laboratory, IBM and School of Inforation and Counication Technology, Griffith University

More information

Performance Evaluation of Machine Learning Techniques using Software Cost Drivers

Performance Evaluation of Machine Learning Techniques using Software Cost Drivers Perforance Evaluation of Machine Learning Techniques using Software Cost Drivers Manas Gaur Departent of Coputer Engineering, Delhi Technological University Delhi, India ABSTRACT There is a treendous rise

More information

Enrolment into Higher Education and Changes in Repayment Obligations of Student Aid Microeconometric Evidence for Germany

Enrolment into Higher Education and Changes in Repayment Obligations of Student Aid Microeconometric Evidence for Germany Enrolent into Higher Education and Changes in Repayent Obligations of Student Aid Microeconoetric Evidence for Gerany Hans J. Baugartner *) Viktor Steiner **) *) DIW Berlin **) Free University of Berlin,

More information

Exercise 4 INVESTIGATION OF THE ONE-DEGREE-OF-FREEDOM SYSTEM

Exercise 4 INVESTIGATION OF THE ONE-DEGREE-OF-FREEDOM SYSTEM Eercise 4 IVESTIGATIO OF THE OE-DEGREE-OF-FREEDOM SYSTEM 1. Ai of the eercise Identification of paraeters of the euation describing a one-degree-of- freedo (1 DOF) atheatical odel of the real vibrating

More information

CLOSED-LOOP SUPPLY CHAIN NETWORK OPTIMIZATION FOR HONG KONG CARTRIDGE RECYCLING INDUSTRY

CLOSED-LOOP SUPPLY CHAIN NETWORK OPTIMIZATION FOR HONG KONG CARTRIDGE RECYCLING INDUSTRY CLOSED-LOOP SUPPLY CHAIN NETWORK OPTIMIZATION FOR HONG KONG CARTRIDGE RECYCLING INDUSTRY Y. T. Chen Departent of Industrial and Systes Engineering Hong Kong Polytechnic University, Hong Kong yongtong.chen@connect.polyu.hk

More information

The Application of Bandwidth Optimization Technique in SLA Negotiation Process

The Application of Bandwidth Optimization Technique in SLA Negotiation Process The Application of Bandwidth Optiization Technique in SLA egotiation Process Srecko Krile University of Dubrovnik Departent of Electrical Engineering and Coputing Cira Carica 4, 20000 Dubrovnik, Croatia

More information

Generating Certification Authority Authenticated Public Keys in Ad Hoc Networks

Generating Certification Authority Authenticated Public Keys in Ad Hoc Networks SECURITY AND COMMUNICATION NETWORKS Published online in Wiley InterScience (www.interscience.wiley.co). Generating Certification Authority Authenticated Public Keys in Ad Hoc Networks G. Kounga 1, C. J.

More information

Software Quality Characteristics Tested For Mobile Application Development

Software Quality Characteristics Tested For Mobile Application Development Thesis no: MGSE-2015-02 Software Quality Characteristics Tested For Mobile Application Developent Literature Review and Epirical Survey WALEED ANWAR Faculty of Coputing Blekinge Institute of Technology

More information

Energy Proportionality for Disk Storage Using Replication

Energy Proportionality for Disk Storage Using Replication Energy Proportionality for Disk Storage Using Replication Jinoh Ki and Doron Rote Lawrence Berkeley National Laboratory University of California, Berkeley, CA 94720 {jinohki,d rote}@lbl.gov Abstract Energy

More information

A quantum secret ballot. Abstract

A quantum secret ballot. Abstract A quantu secret ballot Shahar Dolev and Itaar Pitowsky The Edelstein Center, Levi Building, The Hebrerw University, Givat Ra, Jerusale, Israel Boaz Tair arxiv:quant-ph/060087v 8 Mar 006 Departent of Philosophy

More information

High Performance Chinese/English Mixed OCR with Character Level Language Identification

High Performance Chinese/English Mixed OCR with Character Level Language Identification 2009 0th International Conference on Docuent Analysis and Recognition High Perforance Chinese/English Mixed OCR with Character Level Language Identification Kai Wang Institute of Machine Intelligence,

More information

Evaluating the Effectiveness of Task Overlapping as a Risk Response Strategy in Engineering Projects

Evaluating the Effectiveness of Task Overlapping as a Risk Response Strategy in Engineering Projects Evaluating the Effectiveness of Task Overlapping as a Risk Response Strategy in Engineering Projects Lucas Grèze Robert Pellerin Nathalie Perrier Patrice Leclaire February 2011 CIRRELT-2011-11 Bureaux

More information