Bayes Point Machines


Journal of Machine Learning Research 1 (2001). Submitted 2/01; Published 8/01

Bayes Point Machines

Ralf Herbrich
Microsoft Research, St George House, 1 Guildhall Street, CB2 3NH Cambridge, United Kingdom

Thore Graepel
Technical University of Berlin, Franklinstr. 28/29, 10587 Berlin, Germany

Colin Campbell
Department of Engineering Mathematics, Bristol University, BS8 1TR Bristol, United Kingdom

Editor: Christopher K. I. Williams

Abstract

Kernel-classifiers comprise a powerful class of non-linear decision functions for binary classification. The support vector machine is an example of a learning algorithm for kernel classifiers that singles out the consistent classifier with the largest margin, i.e. minimal real-valued output on the training sample, within the set of consistent hypotheses, the so-called version space. We suggest the Bayes point machine as a well-founded improvement which approximates the Bayes-optimal decision by the centre of mass of version space. We present two algorithms to stochastically approximate the centre of mass of version space: a billiard sampling algorithm and a sampling algorithm based on the well known perceptron algorithm. It is shown how both algorithms can be extended to allow for soft boundaries in order to admit training errors. Experimentally, we find that for the zero training error case Bayes point machines consistently outperform support vector machines on both surrogate data and real-world benchmark data sets. In the soft-boundary/soft-margin case, the improvement over support vector machines is shown to be reduced. Finally, we demonstrate that the real-valued output of single Bayes points on novel test points is a valid confidence measure and leads to a steady decrease in generalisation error when used as a rejection criterion.

1. Introduction

Kernel machines have recently gained a lot of attention due to the popularisation of the support vector machine (Vapnik, 1995) with a focus on classification and the revival of Gaussian processes for regression (Williams, 1999). Subsequently, support vector machines have been modified to handle regression (Smola, 1998) and Gaussian processes have been adapted to the problem of classification (Williams and Barber, 1998; Opper and Winther, 2000). Both schemes essentially work in the same function space that is characterised by kernels and covariance functions, respectively. Whilst the formal similarity of the two methods is striking, the underlying paradigms of inference are very different. The support vector machine was inspired by results from statistical/PAC learning theory while Gaussian processes are usually considered in a Bayesian framework. This ideological clash can be viewed as a continuation in machine learning of the by now classical disagreement between Bayesian and frequentist statistics (Aitchison, 1964).

With regard to algorithmics the two schools of thought appear to favour two different methods of learning and predicting: the support vector community, as a consequence of the formulation of the support vector machine as a quadratic programming problem, focuses on learning as optimisation, while the Bayesian community favours sampling schemes based on the Bayesian posterior. Of course there exists a strong relationship between the two ideas, in particular with the Bayesian maximum a posteriori (MAP) estimator being the solution of an optimisation problem. In practice, optimisation based algorithms have the advantage of a unique, deterministic solution and the availability of the cost function as an indicator of the quality of the solution. In contrast, Bayesian algorithms based on sampling and voting are more flexible and enjoy the so-called anytime property, providing a

relatively good solution at any point in time. Often, however, they suffer from the computational costs of sampling the Bayesian posterior.

In this paper we present the Bayes point machine as an approximation to Bayesian inference for linear classifiers in kernel space. In contrast to the Gaussian process viewpoint we do not define a Gaussian prior on the length ||w|| of the weight vector. Instead, we only consider weight vectors of length ||w|| = 1 because it is only the spatial direction of the weight vector that matters for classification. It is then natural to define a uniform prior on the resulting ball-shaped hypothesis space. Hence, we determine the centre of mass of the resulting posterior that is uniform in version space, i.e. in the zero training error region. It should be kept in mind that the centre of mass is merely an approximation to the real Bayes point from which the name of the algorithm was derived. In order to estimate the centre of mass we suggest both a dynamic system called a kernel billiard and an approximative method that uses the perceptron algorithm trained on permutations of the training sample. The latter method proves to be efficient enough to make the Bayes point machine applicable to large data sets.

An additional insight into the usefulness of the centre of mass comes from the statistical mechanics approach to neural computing where the generalisation error for Bayesian learning algorithms has been calculated for the case of randomly constructed and unbiased patterns x (Opper and Haussler, 1991). Thus if ζ is the number of training examples per weight and ζ is large, the generalisation error of the centre of mass scales as 0.44/ζ whereas scaling with ζ is poorer for the solutions found by the linear support vector machine (scales as 0.50/ζ; see Opper and Kinzel, 1995), Adaline (scales as 0.24/√ζ; see Opper et al., 1990) and other approaches.

Of course many of the viewpoints and algorithms presented in this paper are based on extensive previous work carried out by numerous authors in the past. In particular it seems worthwhile to mention that linear classifiers have been studied intensively in two rather distinct communities: the machine learning community and the statistical physics community. While it is beyond the scope of this paper to review the entire history of the field we would like to emphasise that our geometrical viewpoint as expressed later in the paper has been inspired by the very original paper "Playing billiard in version space" by P. Ruján (Ruján, 1997). Also, in that paper the term Bayes point was coined and the idea of using a billiard-like dynamical system for uniform sampling was introduced. Both we (Herbrich et al., 1999a,b, 2000a) and Ruján and Marchand (2000) independently generalised the algorithm to be applicable in kernel space. Finally, following a theoretical suggestion of Watkin (1993) we were able to scale up the Bayes point algorithm to large data sets by using different perceptron solutions from permutations of the training sample.

The paper is structured as follows: In the following section we review the basic ideas of Bayesian inference with a particular focus on classification learning. Along with a discussion about the optimality of the Bayes classification strategy we show that for the special case of linear classifiers in feature space the centre of mass of all consistent classifiers is arbitrarily close to the Bayes point (with increasing training sample size) and can be efficiently estimated in the linear span of the training data. Moreover, we give a geometrical picture of support vector learning in feature space which reveals that the support vector machine can be viewed as an approximation to the Bayes point machine. In Section 3 we present two algorithms for the estimation of the centre of mass of version space, one exact method and an approximate method tailored for large training samples. An extensive list of experimental results is presented in Section 4, both on small machine learning benchmark datasets as well as on large scale datasets from the field of handwritten digit recognition. In Section 5 we summarise the results and discuss some theoretical extensions of the method presented. In order to unburden the main text, the lengthy proofs as well as the pseudocode have been relegated to the appendix.

We denote n-tuples by italic bold letters (e.g. x = (x_1, ..., x_n)), vectors by roman bold letters (e.g. x), random variables by sans serif font (e.g. X) and vector spaces by calligraphic capitalised letters (e.g. X). The symbols P, E and I denote a probability measure, the expectation of a random variable and the indicator function, respectively.

2. A Bayesian Consideration of Learning

In this section we would like to revisit the Bayesian approach to learning (see Buntine, 1992; MacKay, 1991; Neal, 1996; Bishop, 1995, for a more detailed treatment). Suppose we are given a training sample z = (x, y) = ((x_1, y_1), ..., (x_m, y_m)) ∈ (X × Y)^m of size m drawn iid from an unknown distribution P_Z = P_XY. Furthermore, assume we are given a fixed set H ⊆ Y^X of functions h : X → Y referred to as the hypothesis space. The task of learning is then to find the function h* which performs best on new, yet unseen patterns z = (x, y) drawn according to P_XY.

Definition 1 (Learning Algorithm) A (deterministic) learning algorithm A : ∪_{m=1}^∞ Z^m → Y^X is a mapping from training samples z of arbitrary size m ∈ N to functions from X to Y. The image of A, i.e. {A(z) | z ∈ Z^m} ⊆ Y^X, is called the effective hypothesis space H_{A,m} of the learning algorithm A for the training sample size m ∈ N. If there exists a hypothesis space H ⊆ Y^X such that for every training sample size m ∈ N we have H_{A,m} ⊆ H we shall omit the indices on H.

In order to assess the quality of a function h ∈ H we assume the existence of a loss function l : Y × Y → R^+. The loss l(y, y') ∈ R^+ is understood to measure the incurred cost when predicting y while the true output was y'. Hence we always assume that for all y ∈ Y, l(y, y) = 0. A typical loss function for classification is the so-called zero-one loss l_{0-1} defined as follows.

Definition 2 (Zero-One Loss) Given a fixed output space Y, the zero-one loss is defined by l_{0-1}(y, y') := I_{y ≠ y'}.

Based on the concept of a loss l, let us introduce several quality measures for hypotheses h ∈ H.

Definition 3 (Generalisation and Training Error) Given a probability measure P_XY and a loss l : Y × Y → R^+ the generalisation error R[h] of a function h : X → Y is defined by

    R[h] := E_XY[l(h(X), Y)].

Given a training sample z = (x, y) ∈ (X × Y)^m of size m and a loss l : Y × Y → R^+ the training error R_emp[h, z] of a function h : X → Y is given by

    R_emp[h, z] := (1/m) Σ_{i=1}^m l(h(x_i), y_i).

Clearly, only the generalisation error R[h] is appropriate to capture the performance of a fixed classifier h ∈ H on new patterns z = (x, y). Nonetheless, we shall see that the training error plays a crucial role as it provides an estimate of the generalisation error based on the training sample.

Definition 4 (Generalisation Error of Algorithms) Suppose we are given a fixed learning algorithm A : ∪_{m=1}^∞ Z^m → Y^X. Then for any fixed training sample size m ∈ N the generalisation error R_m[A] of A is defined by

    R_m[A] := E_{Z^m}[R[A(Z^m)]],

that is, the expected generalisation error of the hypotheses found by the algorithm.

Note that for any loss function l : Y × Y → R^+ a small generalisation error R_m[A] of the algorithm A guarantees a small generalisation error for most randomly drawn training samples z because by Markov's inequality we have for ε > 0,

    P_{Z^m}(R[A(Z^m)] > ε · E_{Z^m}[R[A(Z^m)]]) ≤ 1/ε.

Hence we can view R_m[A] also as a performance measure of A's hypotheses for randomly drawn training samples z.
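As an illustration of Definitions 2 and 3, the following minimal Python sketch computes the zero-one loss and the training error R_emp[h, z] as an average over the sample; the data set and the candidate hypothesis below are made up purely for illustration and are not taken from the paper.

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    # Zero-one loss l(y, y') of Definition 2: 1 if the prediction differs from the truth, 0 otherwise.
    return float(y_pred != y_true)

def training_error(h, x, y):
    # Training error R_emp[h, z] of Definition 3: the average loss of h over the sample z = (x, y).
    return float(np.mean([zero_one_loss(h(xi), yi) for xi, yi in zip(x, y)]))

# Illustrative sample and hypothesis (not prescribed by the text).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = np.sign(x[:, 0] + 0.5 * x[:, 1])     # labels generated by some "true" linear rule
h = lambda xi: np.sign(xi[0])            # a candidate hypothesis h in H
print(training_error(h, x, y))           # training error as a sample-based estimate of R[h]
```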

Finally, let us consider a probability measure P_H over the space of all possible mappings from X to Y. Then, the average generalisation error of a learning algorithm A is defined as follows.

Definition 5 (Average Generalisation Error of Algorithms) Suppose we are given a fixed learning algorithm A : ∪_{m=1}^∞ Z^m → Y^X. Then for each fixed training sample size m ∈ N the average generalisation error R̄_m[A] of A is defined by

    R̄_m[A] := E_H[E_{Z^m|H=h}[E_X[E_{Y|X=x,H=h}[l((A(Z^m))(x), Y)]]]],        (1)

that is, the average performance of the algorithm A's solution learned over the random draw of training samples and target hypotheses.

The average generalisation error is the standard measure of performance of an algorithm A if we have little knowledge about the potential function h* that labels all our data, expressed via P_H. Then, the measure (1) averages out our ignorance about the unknown h*, thus considering the performance of A on average. There is a noticeable relation between R_m[A] and R̄_m[A] if we assume that, given a measure P_H, the conditional distribution of outputs y given x is governed by

    P_{Y|X=x}(y) = P_H(H(x) = y).        (2)

Under this condition we have that R̄_m[A] = R_m[A]. This result, however, is not too surprising taking into account that under the assumption (2) the measure P_H fully encodes the unknown relationship between inputs x and outputs y.

2.1 The Bayesian Solution

In the Bayesian framework we are not simply interested in h* := argmin_{h∈H} R[h] itself but in our knowledge or belief in h*. To this end, Bayesians use the concept of prior and posterior belief, i.e. the knowledge of h* before having seen any data and after having seen the data, which in the current case is our training sample z. It is well known that under consistency rules known as Cox's axioms (Cox, 1946) beliefs can be mapped onto probability measures P_H. Under these rather plausible conditions the only consistent way to transfer prior belief P_H into posterior belief P_{H|Z^m=z} is therefore given by Bayes' theorem:

    P_{H|Z^m=z}(h) = P_{Z^m|H=h}(z) P_H(h) / E_H[P_{Z^m|H=h}(z)] = P_{Y^m|X^m=x,H=h}(y) P_H(h) / E_H[P_{Y^m|X^m=x,H=h}(y)].        (3)

The second expression is obtained by noticing that

    P_{Z^m|H=h}(z) = P_{Y^m|X^m=x,H=h}(y) P_{X^m|H=h}(x) = P_{Y^m|X^m=x,H=h}(y) P_{X^m}(x),

because hypotheses do not have an influence on the generation of patterns. Based on a given loss function l we can further decompose the first term of the numerator of (3), known as the likelihood of h. Let us assume that the probability of a class y given an instance x and an hypothesis h is inversely proportional to the exponential of the loss incurred by h on x.(1) Thus we obtain

    P_{Y|X=x,H=h}(y) = exp(−β l(h(x), y)) / Σ_{y'∈Y} exp(−β l(h(x), y')) = exp(−β l(h(x), y)) / C(x)
                     = 1/(1 + exp(−β))         if l_{0-1}(h(x), y) = l(h(x), y) = 0,
                     = exp(−β)/(1 + exp(−β))   if l_{0-1}(h(x), y) = l(h(x), y) = 1,        (4)

where C(x) is a normalisation constant which in the case of the zero-one loss l_{0-1} is independent(2) of x, and β controls the assumed level of noise.

1. In fact, it already suffices to assume that E_{Y|X=x}[l(y, Y)] = E_H[l(y, H(x))], i.e. the prior correctly models the conditional distribution of the classes as far as the fixed loss is concerned.
2. Note that for loss functions with real-valued arguments this need not be the case, which makes a normalisation independent of x quite intricate (see Sollich, 2000, for a detailed treatment).
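For concreteness, the two values taken by the likelihood (4) under the zero-one loss can be checked with a short sketch; the noise level β, the hypothesis and the input below are arbitrary illustrative choices.

```python
import numpy as np

def likelihood(y, x, h, beta, loss, Y=(-1, +1)):
    # P(y | x, h) of equation (4): proportional to exp(-beta * loss(h(x), y)), normalised over Y.
    numerator = np.exp(-beta * loss(h(x), y))
    C = sum(np.exp(-beta * loss(h(x), yp)) for yp in Y)
    return numerator / C

beta = 2.0                                    # assumed noise level (illustrative choice)
zero_one = lambda yp, yt: float(yp != yt)
h = lambda x: np.sign(x[0])
x = np.array([0.3, -1.2])
print(likelihood(+1, x, h, beta, zero_one))   # equals 1 / (1 + exp(-beta)) since h(x) = +1
print(likelihood(-1, x, h, beta, zero_one))   # equals exp(-beta) / (1 + exp(-beta))
```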

Note that the loss used in the exponentiated loss likelihood function (4) is not to be confused with the decision-theoretic loss used in the Bayesian framework, which is introduced only after a posterior has been obtained in order to reach a risk optimal decision.

Definition 6 (PAC Likelihood) Suppose we are given an arbitrary loss function l : Y × Y → R^+. Then, we call the function

    P_{Y|X=x,H=h}(y) := I_{y=h(x)}        (5)

of h the PAC likelihood for h.

Note that (5) is the limiting case of (4) for β → ∞. Assuming the PAC likelihood it immediately follows that for any prior belief P_H the posterior belief P_{H|Z^m=z} simplifies to

    P_{H|Z^m=z}(h) = P_H(h) / P_H(V(z))   if h ∈ V(z),
                   = 0                     if h ∉ V(z),        (6)

where the version space V(z) is defined as follows (see Mitchell, 1977, 1982).

Definition 7 (Version Space) Given an hypothesis space H ⊆ Y^X and a training sample z = (x, y) ∈ (X × Y)^m of size m ∈ N the version space V(z) ⊆ H is defined by

    V(z) := {h ∈ H | ∀i ∈ {1, ..., m} : h(x_i) = y_i}.

Since all information contained in the training sample z is used to update the prior P_H by equation (3), all that will be used to classify a novel test point x is the posterior belief P_{H|Z^m=z}.

2.2 The Bayes Classification Strategy

In order to classify a new test point x, for each class y the Bayes classification strategy(3) determines the loss incurred by each hypothesis h ∈ H applied to x and weights it according to its posterior probability P_{H|Z^m=z}(h). The final decision is made for the class y ∈ Y that achieves the minimum expected loss, i.e.

    Bayes_z(x) := argmin_{y∈Y} E_{H|Z^m=z}[l(H(x), y)].        (7)

This strategy has the following appealing property.

Theorem 8 (Optimality of the Bayes Classification Strategy) Suppose we are given a fixed hypothesis space H ⊆ Y^X. Then, for any training sample size m ∈ N, for any symmetric loss l : Y × Y → R^+ and for any two measures P_H and P_X, among all learning algorithms the Bayes classification strategy Bayes_z given by (7) minimises the average generalisation error R̄_m[Bayes_z] under the assumption that for each h with P_H(h) > 0

    ∀y ∈ Y : ∀x ∈ X : E_{Y|X=x,H=h}[l(y, Y)] = l(y, h(x)).        (8)

Proof Let us consider a fixed learning algorithm A. Then it holds true that

    R̄_m[A] = E_H[E_{Z^m|H=h}[E_X[E_{Y|X=x,H=h}[l((A(Z^m))(x), Y)]]]]
            = E_X[E_H[E_{Z^m|H=h}[E_{Y|X=x,H=h}[l((A(Z^m))(x), Y)]]]]
            = E_X[E_{Z^m}[E_{H|Z^m=z}[E_{Y|X=x,H=h}[l((A(Z^m))(x), Y)]]]]
            = E_X[E_{Z^m}[E_{H|Z^m=z}[l((A(Z^m))(X), H(X))]]],        (9)

where we exchanged the order of expectations over X in the second line, applied the theorem of repeated integrals (see, e.g., Feller, 1966) in the third line and finally used (8) in the last line. Using the symmetry of the loss function, the inner-most expression of (9) is minimised by the Bayes classification strategy (7) for any possible training sample z and any possible test point x.

3. The reason we do not call this mapping from X to Y a classifier is that the resulting mapping is (in general) not within the hypothesis space considered beforehand.

Hence, (7) minimises the whole expression, which proves the theorem.

In order to enhance the understanding of this result let us consider the simple case of l = l_{0-1} and Y = {−1, +1}. Then, given a particular classifier h ∈ H having non-zero prior probability P_H(h) > 0, by assumption (8) we require that the conditional distribution of classes y given x is delta peaked at h(x), because

    E_{Y|X=x,H=h}[l_{0-1}(y, Y)] = l_{0-1}(y, h(x))
    ⇔  P_{Y|X=x,H=h}(−y) = I_{y ≠ h(x)}
    ⇔  P_{Y|X=x,H=h}(y) = I_{h(x)=y}.

Although for a fixed h ∈ H drawn according to P_H we do not know that Bayes_z achieves the smallest generalisation error R[Bayes_z], we can guarantee that on average over the random draw of h's the Bayes classification strategy is superior. In fact, the optimal classifier for a fixed h ∈ H is simply h itself(4) and in general Bayes_z(x) ≠ h(x) for at least a few x ∈ X.

2.3 The Bayes Point Algorithm

Although the Bayes classification strategy is on average the optimal strategy to perform when given a limited amount of training data z, it is computationally very demanding as it requires the evaluation of E_{H|Z^m=z}[l(H(x), y)] for each possible y at each new test point x (Graepel et al., 2000). The problem arises because the Bayes classification strategy does not correspond to any one single classifier h ∈ H. One way to tackle this problem is to require the classifier A(z) learned from any training sample z to lie within a fixed hypothesis space H ⊆ Y^X containing functions h ∈ H whose evaluation at a particular test point x can be carried out efficiently. Thus if it is additionally required to limit the possible solution of a learning algorithm to a given hypothesis space H ⊆ Y^X, we can in general only hope to approximate Bayes_z.

Definition 9 (Bayes Point Algorithm) Suppose we are given a fixed hypothesis space H ⊆ Y^X and a fixed loss l : Y × Y → R^+. Then, for any two measures P_X and P_H, the Bayes point algorithm A_bp is given by

    A_bp(z) := argmin_{h∈H} E_X[E_{H|Z^m=z}[l(h(X), H(X))]],

that is, for each training sample z ∈ Z^m the Bayes point algorithm chooses the classifier h_bp := A_bp(z) ∈ H that mimics best the Bayes classification strategy (7) on average over randomly drawn test points. The classifier A_bp(z) is called the Bayes point.

Assuming the correctness of the model given by (8) we furthermore remark that the Bayes point algorithm A_bp is the best approximation to the Bayes classification strategy (7) in terms of the average generalisation error, i.e. measuring the distance of the learning algorithm A for H using the distance ||A − Bayes|| = R̄_m[A] − R̄_m[Bayes]. In this sense, for a fixed training sample z we can view the Bayes point h_bp as a projection of Bayes_z into the hypothesis space H ⊆ Y^X. The difficulty with the Bayes point algorithm, however, is the need to know the input distribution P_X for the determination of the hypothesis learned from z. This somehow limits the applicability of the algorithm as opposed to the Bayes classification strategy, which requires only broad prior knowledge about the underlying relationship expressed via some prior belief P_H.

4. It is worthwhile mentioning that the only information to be used in any classification strategy is the training sample z and the prior P_H. Hence it is impossible to detect which classifier h ∈ H labels a fixed tuple x only on the basis of the labels y observed on the training sample. Thus, although we might be lucky in guessing h* for a fixed h* ∈ H and z ∈ Z^m, we cannot do better than the Bayes classification strategy Bayes_z when considering the average performance, the average being taken over the random choice of the classifiers and the training samples z.
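The Bayes classification strategy (7) is a weighted vote rather than a single classifier. A minimal sketch of this vote, using a handful of hand-picked unit weight vectors as a stand-in for posterior samples (all numbers and names here are illustrative only):

```python
import numpy as np

def bayes_strategy(x, hypotheses, weights, Y=(-1, +1)):
    # Bayes classification strategy (7): choose the class with minimal expected zero-one loss
    # under the posterior, here represented by a weighted sample of hypotheses.
    expected_loss = {y: sum(w * float(h(x) != y) for h, w in zip(hypotheses, weights)) for y in Y}
    return min(expected_loss, key=expected_loss.get)

# A hand-made stand-in for posterior samples: three unit weight vectors (illustrative only).
ws = [np.array([1.0, 0.2]), np.array([0.8, 0.6]), np.array([0.9, -0.1])]
ws = [w / np.linalg.norm(w) for w in ws]
hypotheses = [lambda x, w=w: np.sign(w @ x) for w in ws]
weights = [1.0 / len(ws)] * len(ws)       # uniform posterior over the sampled classifiers

print(bayes_strategy(np.array([0.1, -0.9]), hypotheses, weights))
```

For the zero-one loss and binary outputs this reduces to a majority vote of the sampled classifiers, which is exactly what a single Bayes point tries to mimic with one hypothesis.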

2.3.1 The Bayes Point for Linear Classifiers

We now turn our attention to the special case of linear classifiers where we assume that N measurements of the objects x ∈ X are taken by features φ_i : X → R, thus forming a (vectorial) feature map φ : X → K ⊆ ℓ_2^N, φ(x) = (φ_1(x), ..., φ_N(x)). Note that by this formulation the special case of vectorial objects x is automatically taken care of by the identity map φ(x) = x. For notational convenience we use the shorthand notation(5) x for φ(x) such that ⟨x, w⟩ := Σ_{i=1}^N φ_i(x) w_i. Hence, for a fixed mapping φ the hypothesis space is given by

    H := {x ↦ sign(⟨x, w⟩) | w ∈ W},    W := {w ∈ K | ||w|| = 1}.        (10)

As each hypothesis h_w is uniquely defined by its weight vector w we shall in the following consider prior beliefs P_W over W, i.e. possible weight vectors (of unit length), in place of priors P_H. By construction, the output space is Y = {−1, +1} and we furthermore consider the special case of l = l_{0-1} as defined in Definition 2. If we assume that the input distribution is spherically Gaussian in the feature space K of dimensionality d = dim(K), i.e.

    f_X(x) = π^{−d/2} exp(−||x||²),        (11)

then we find that the centre of mass

    w_cm := E_{W|Z^m=z}[W] / ||E_{W|Z^m=z}[W]||        (12)

is a very good approximation to the Bayes point w_bp and converges towards w_bp if the posterior belief P_{W|Z^m=z} becomes sharply peaked (for a similar result see Watkin, 1993).

Theorem 10 (Optimality of the Centre of Mass) Suppose we are given a fixed mapping φ : X → K ⊆ ℓ_2^N. Then, for all m ∈ N, if P_X possesses the density (11) and the prior belief is correct, i.e. (8) is valid, the average generalisation error of the centre of mass as given by (12) always fulfils

    R̄_m[A_cm] − R̄_m[A_bp] ≤ E_{Z^m}[κ(ε(Z^m))],

where

    κ(ε) := (arccos(ε)/π) √(1 − ε²)   if ε < 1,   and 2/3 otherwise,

and

    ε(z) := min_{w : P_{W|Z^m=z}(w) > 0} ⟨w_cm, w⟩.

The lengthy proof of this theorem is given in Appendix A.1. The interesting fact to note about this result is that lim_{ε→1} κ(ε) = 0 and thus, whenever the prior belief P_W is not vanishing for some w*, lim_{m→∞} E_{Z^m}[κ(ε(Z^m))] = 0, because for increasing training sample size the posterior is sharply peaked at the weight vector labelling the data.(6) This shows that for increasing training sample size the centre of mass (under the posterior P_{W|Z^m=z}) is a good approximation to the optimal projection of the Bayes classification strategy, the Bayes point. Henceforth, any algorithm which aims at returning the centre of mass under the posterior P_{W|Z^m=z} is called a Bayes point machine. Note that in the case of the PAC likelihood as defined in Definition 6 the centre of mass under the posterior P_{W|Z^m=z} coincides with the centre of mass of version space (see Definition 7).

5. This should not be confused with x, which denotes the sample (x_1, ..., x_m) of training objects.
6. This result is a slight generalisation of the result in Watkin (1993), which only proved this to be true for the uniform prior P_W.
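The centre of mass (12) itself is straightforward to estimate once samples from the posterior are available: average the unit-length weight vectors and renormalise. A short sketch, with random unit vectors standing in for genuine version-space samples (purely illustrative):

```python
import numpy as np

def centre_of_mass(samples):
    # Estimate of (12): normalise the average of unit-length weight vectors drawn from the posterior.
    w_bar = np.mean(samples, axis=0)
    return w_bar / np.linalg.norm(w_bar)

# Random unit vectors standing in for genuine samples from version space (illustrative only).
rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
w_cm = centre_of_mass(samples)
print(w_cm, np.linalg.norm(w_cm))         # a unit vector approximating the Bayes point
```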

Figure 1: Shown is the margin a = γ_x(w) = ⟨x, w⟩ under the assumption that ||w|| = ||x|| = 1. At the same time, a (length of the dotted line) equals the distance of x from the hyperplane {x̃ | ⟨x̃, w⟩ = 0} (dashed line) as well as the distance of the weight vector w from the hyperplane {w̃ | ⟨x, w̃⟩ = 0} (dashed line). Note, however, that the Euclidean distance of w from the separating boundary {w̃ ∈ W | ⟨x, w̃⟩ = 0} equals b(a) where b is a strictly monotonic function of its argument.

2.4 A (Pseudo) Bayesian Derivation of the Support Vector Machine

In this section we would like to show that the well known support vector machine (Boser et al., 1992; Cortes, 1995; Vapnik, 1995) can also be viewed as an approximation to the centre of mass of version space V(z) in the noise free scenario, i.e. considering the PAC likelihood given in Definition 6, and additionally assuming that ∀x_i ∈ x : ||x_i|| = ||φ(x_i)|| = const. In order to see this let us recall that the support vector machine aims at maximising the margin γ_z(w) of the weight vector w on the training sample z given by

    γ_z(w) := min_{i∈{1,...,m}} y_i ⟨x_i, w⟩ / ||w||  =  min_{i∈{1,...,m}} y_i ⟨x_i, w/||w||⟩,        (13)

where the term under the minimum is denoted γ_{x_i}(w). For all w of unit length this is merely the minimal real-valued output (flipped to the correct sign) over the whole training sample. In order to solve this problem algorithmically one takes advantage of the fact that fixing the real-valued output to one (rather than the norm ||w|| of the weight vector w) renders the problem of finding the margin maximiser w_SVM as a problem with a quadratic objective function (||w||² = w'w) under linear constraints (y_i ⟨x_i, w⟩ ≥ 1), i.e.

    w_SVM := argmax_{w∈W} min_{i∈{1,...,m}} y_i ⟨x_i, w⟩        (14)
           ∝ argmin_{w ∈ {v | min_{i∈{1,...,m}} y_i ⟨x_i, v⟩ = 1}} ||w||².        (15)

Note that the set of weight vectors in (15) are called the weight vectors of the canonical hyperplanes (see Vapnik, 1998, p. 412) and that this set is highly dependent on the given training sample. Nonetheless, the solution to (15) is (up to scaling) equivalent to the solution of (14), a formulation much more amenable for theoretical studies. Interestingly, however, the quantity γ_{x_i}(w) as implicitly defined in (13) is not only the distance of the point y_i x_i from the hyperplane having the normal w but also ||x_i|| times the Euclidean distance of the point w from the hyperplane having the normal y_i x_i (see Figure 1). Thus γ_z(w) can be viewed as the radius of

the ball {v ∈ W | ||w − v|| ≤ b(γ_z(w))} that only contains weight vectors in version space V(z). Here, b : R^+ → R^+ is a strictly monotonic function of its argument and its effect is graphically depicted in Figure 1. As a consequence thereof, maximising the margin γ_z(w) over the choice of w returns the classifier w_SVM that is the centre of the largest ball still inscribable in version space.

Note that the whole reasoning relied on the assumption that all training points x_i have a constant norm in feature space K. If this assumption is violated, each distance of a classifier w to the hyperplane having the normal y_i x_i is measured on a different scale and thus the points with the largest norm ||x_i|| in feature space K have the highest influence on the resulting solution. To circumvent this problem it has been suggested elsewhere that input vectors should be normalised in feature space before applying any kernel method, in particular the support vector machine algorithm (see Herbrich and Graepel, 2001; Schölkopf et al., 1999; Joachims, 1998; Haussler, 1999). Furthermore, all indices i ∈ I_SV ⊆ {1, ..., m} at which the minimum y_i ⟨x_i, w_SVM⟩ in (14) is attained are the ones for which y_i ⟨x_i, w⟩ = 1 in the formulation (15). As the latter are called support vectors we see that the support vectors are the training points at which the largest inscribable ball touches the corresponding hyperplane {w ∈ W | y_i ⟨x_i, w⟩ = 0}.

2.5 Applying the Kernel Trick

When solving (15) over the possible choices of w ∈ W it is well known that the solution w_SVM admits the following representation

    w_SVM = Σ_{i=1}^m α_i x_i,

that is, the solution to (15) must live in the linear span of the training points. This follows naturally from the following theorem (see also Schölkopf et al., 2001).

Theorem 11 (Representer Theorem) Suppose we are given a fixed mapping φ : X → K ⊆ ℓ_2^N, a training sample z = (x, y) ∈ Z^m, a cost function c : X^m × Y^m × R^m → R ∪ {∞} strictly monotonically decreasing in the third argument and the class of linear functions in K as given by (10). Then any w_z ∈ W defined by

    w_z := argmin_{w∈W} c(x, y, (⟨x_1, w⟩, ..., ⟨x_m, w⟩))        (16)

admits a representation of the form

    ∃α ∈ R^m : w_z = Σ_{i=1}^m α_i x_i.        (17)

The proof is given in Appendix A.2. In order to see that this theorem applies to support vector machines note that (14) is equivalent to the minimiser of (16) when using

    c(x, y, (⟨x_1, w⟩, ..., ⟨x_m, w⟩)) = −min_{i∈{1,...,m}} y_i ⟨x_i, w⟩,

which is strictly monotonically decreasing in its third argument. A slightly more difficult argument is necessary to see that the centre of mass (12) can also be written as a minimiser of (16) using a specific cost function c. At first we recall that the centre of mass has the property of minimising E_{W|Z^m=z}[||w − W||²] over the choice of w ∈ W.

Theorem 12 (Sufficiency of the Linear Span) Suppose we are given a fixed mapping φ : X → K ⊆ ℓ_2^N. Let us assume that P_W is uniform and P_{Y|X=x,W=w}(y) = f(sign(y⟨x, w⟩)), i.e. the likelihood depends on the sign of the real-valued output y⟨x, w⟩ of w. Let L_x := {Σ_{i=1}^m α_i x_i | α ∈ R^m} be the linear span of the mapped data points {x_1, ..., x_m} and W_x := W ∩ L_x. Then for any training sample z ∈ Z^m and any w ∈ W

    ∫_W ||w − v||² dP_{W|Z^m=z}(v) = C ∫_{W_x} ||w − v||² dP_{W|Z^m=z}(v),        (18)

that is, up to a constant C ∈ R^+ that is independent of w it suffices to consider vectors of unit length in the linear span of the mapped training points {x_1, ..., x_m}.

The proof is given in Appendix A.3. An immediate consequence of this theorem is the fact that we only need to consider the m-dimensional sphere W_x in order to find the centre of mass under the assumption of a uniform prior P_W. Hence a cost function c such that (16) finds the centre of mass is given by

    c(x, y, (⟨x_1, w⟩, ..., ⟨x_m, w⟩)) = ∫_{R^m} (2 − 2 Σ_{i=1}^m α_i ⟨x_i, w⟩) dP_{A|Z^m=(x,y)}(α),

where P_{A|Z^m=z} is only non-zero for vectors α such that ||Σ_{i=1}^m α_i x_i|| = 1 and is independent of w.

The tremendous advantage of a representation of the solution w_z by (17) becomes apparent when considering the real-valued output of a classifier at any given data point (either training or test point)

    ⟨w_z, x⟩ = ⟨Σ_{i=1}^m α_i x_i, x⟩ = Σ_{i=1}^m α_i ⟨x_i, x⟩ = Σ_{i=1}^m α_i k(x_i, x).

Clearly, all that is needed in the feature space K is the inner product function

    k(x, x̃) := ⟨φ(x), φ(x̃)⟩.        (19)

Reversing the chain of arguments indicates how the kernel trick may be used to find an efficient implementation. We fix a symmetric function k : X × X → R called a kernel and show that there exists a feature mapping φ_k : X → K ⊆ ℓ_2^N such that (19) is valid for all x, x̃ ∈ X. A sufficient condition for k being a valid inner product function is given by Mercer's theorem (see Mercer, 1909). In a nutshell, whenever the evaluation of k at any given sample (x_1, ..., x_m) results in a positive semidefinite matrix G_ij := k(x_i, x_j) then k is a so-called Mercer kernel. The matrix G is called the Gram matrix and is the only quantity needed in support vector and Bayes point machine learning. For further details on the kernel trick the reader is referred to Schölkopf et al. (1999); Cristianini and Shawe-Taylor (2000); Wahba (1990); Vapnik (1998).

3. Estimating the Bayes Point in Feature Space

In order to estimate the Bayes point in feature space K we consider a Monte Carlo method, i.e. instead of exactly computing the expectation (12) we approximate it by an average over weight vectors w drawn according to P_{W|Z^m=z} and restricted to W_x (see Theorem 12). In the following we will restrict ourselves to the PAC likelihood given in (5) and P_W being uniform on the unit sphere W ⊂ K. By this assumption we know that the posterior is uniform over version space (see (6)). In Figure 2 we plotted an example for the special case of an N = 3 dimensional feature space K. It is, however, already very difficult to sample uniformly from version space V(z) as this set of points lives on a convex polyhedron on the unit sphere in W_x.(7) In the following two subsections we present two methods to achieve this sampling.

The first method develops on an idea of Ruján (1997) (later followed up by a kernel version of the algorithm in Ruján and Marchand, 2000) that is based on the idea of playing billiards in version space V(z), i.e. after entering the version space with a very simple learning algorithm such as the kernel perceptron (see Algorithm 1) the classifier w is considered as a billiard ball and is bounced for a while within the convex polyhedron V(z). If this billiard is ergodic with respect to the uniform distribution over V(z), i.e. the travel time of the billiard ball spent in a subset W̃ ⊆ V(z) is proportional to the volume of W̃ relative to V(z), then averaging over the trajectory of the billiard ball leads in the limit of an infinite number of bounces to the centre of mass of version space. The second method presented tries to overcome the large computational demands of the billiard method by only approximately achieving a uniform sampling of version space.
The idea is to use the perceptron learning algorithm in dual variables with different permutations Π : {1, ..., m} → {1, ..., m} so as to obtain different consistent classifiers w_i ∈ V(z) (see Watkin, 1993, for a similar idea).

7. Note that by Theorem 12 it suffices to sample from the projection of the version space onto W_x.
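Both sampling methods only ever touch the mapped points through the Gram matrix G and the expansion coefficients α of (17). A small sketch of this bookkeeping (the polynomial kernel and the random data are illustrative choices, not prescribed by the text):

```python
import numpy as np

def gram_matrix(kernel, X):
    # Gram matrix G_ij = k(x_i, x_j): the only quantity the sampling algorithms need.
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

def real_valued_output(alpha, kernel, X, x):
    # <w, phi(x)> for w = sum_i alpha_i phi(x_i), evaluated purely through the kernel.
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X))

def squared_norm(alpha, G):
    # ||w||^2 = alpha' G alpha, needed to keep weight vectors on the unit sphere W_x.
    return float(alpha @ G @ alpha)

# Illustrative choices: a polynomial kernel of the form used later in (25) and random data.
kernel = lambda x, xt: (np.dot(x, xt) + 1.0) ** 5
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
alpha = rng.normal(size=10)
G = gram_matrix(kernel, X)
print(real_valued_output(alpha, kernel, X, X[0]), squared_norm(alpha, G))
```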

Figure 2: Plot of a version space (convex polyhedron containing the black dot) V(z) in a 3 dimensional feature space K. Each hyperplane is defined by a training example via its normal vector y_i x_i.

Obviously, the number of different samples obtained is finite and thus it is impossible to achieve exactness of the method in the limit of considering all permutations. Nevertheless, we shall demonstrate that in particular for the task of handwritten digit recognition the achieved performances are comparable to state-of-the-art learning algorithms.

Finally, we would like to remark that recently there have been presented other efficient methods to estimate the Bayes point directly (Rychetsky et al., 2000; Minka, 2001). The main idea in Rychetsky et al. (2000) is to work out all corners w_i of version space and average over them in order to approximate the centre of mass of version space. Note that there are exactly m corners because the i-th corner w_i satisfies ⟨x_j, w_i⟩ = 0 for all j ≠ i and y_i ⟨x_i, w_i⟩ > 0. If X = (x_1, ..., x_m)' is the m × N matrix of mapped training points x = (x_1, ..., x_m) flipped to their correct side and we use the approach (17) for w, this simplifies to

    Xw_i = XX'α_i = Gα_i = (0, ..., 0, y_i, 0, ..., 0)' =: y_i e_i,

where the right hand side is the i-th unit vector multiplied by y_i. As a consequence, the expansion coefficients α_i of the i-th corner w_i can easily be computed as α_i = y_i G^{-1} e_i and then need to be normalised such that ||w_i|| = 1. The difficulty with this approach, however, is the fact that the inversion of the Gram matrix G is O(m³) and is thus as computationally complex as support vector learning while not enjoying the anytime property of a sampling scheme. The algorithm presented in Minka (2001, Chapter 5) (also see Opper and Winther, 2000, for an equivalent method) uses the idea of approximating the posterior measure P_{W|Z^m=z} by a product of Gaussian densities so that the centre of mass can be computed analytically. Although the approximation of the cut-off posterior P_{W|Z^m=z} resulting from the delta-peaked likelihood given in Definition 6 by Gaussian measures seems very crude at first glance, Minka could show that his method compares favourably to the results presented in this paper.

3.1 Playing Billiards in Version Space

In this subsection we present the billiard method to estimate the Bayes point, i.e. the centre of mass of version space when assuming a PAC likelihood and a uniform prior P_W over weight vectors of unit length (the pseudocode is given on page 275).

Figure 3: Schematic view of the kernel billiard algorithm. Starting at b_0 ∈ V(z) a trajectory of billiard bounces b_1, ..., b_5, ... is calculated and then averaged over so as to obtain an estimate ŵ_cm of the centre of mass of version space.

By Theorem 12 each position b of the billiard ball and each estimate w_i of the centre of mass of V(z) can be expressed as linear combinations of the mapped input points, i.e.

    w = Σ_{i=1}^m α_i x_i,    b = Σ_{i=1}^m γ_i x_i,    α, γ ∈ R^m.

Without loss of generality we can make the following ansatz for the direction vector v of the billiard ball

    v = Σ_{i=1}^m β_i x_i,    β ∈ R^m.

Using this notation, inner products and norms in feature space K become

    ⟨b, v⟩ = Σ_{i=1}^m Σ_{j=1}^m γ_i β_j k(x_i, x_j),    ||b||² = Σ_{i,j=1}^m γ_i γ_j k(x_i, x_j),        (20)

where k : X × X → R is a Mercer kernel and has to be chosen beforehand. At the beginning we assume that w_0 = 0, i.e. α = 0. Before generating a billiard trajectory in version space V(z) we first run any learning algorithm to find an initial starting point b_0 inside the version space (e.g. support vector learning or the kernel perceptron (see Algorithm 1)). Then the kernel billiard algorithm consists of three steps (see also Figure 3):

1. Determine the closest boundary in direction v_i starting from the current position b_i. Since it is computationally very demanding to calculate the flight time of the billiard ball on geodesics of the hyper-sphere W_x (see also Neal, 1997) we make use of the fact that the shortest distance in Euclidean space (if it exists) is also the shortest distance on the hyper-sphere W_x. Thus, we have for the flight time τ_j of the billiard ball at position b_i in direction v_i to the hyperplane with normal vector y_j x_j

    τ_j = −⟨b_i, x_j⟩ / ⟨v_i, x_j⟩.        (21)

After calculating all flight times, we look for the smallest positive one, i.e.

    c = argmin_{j : τ_j > 0} τ_j.

Determining the closest bounding hyperplane in Euclidean space rather than on geodesics causes problems if the surface of the hyper-sphere W_x is almost orthogonal to the direction vector v_i, in which case τ_c → ∞. If this happens we randomly generate a direction vector v_i pointing towards the version space V(z). Assuming that the last bounce took place at the hyperplane having normal y_c x_c this condition can easily be checked by

    y_c ⟨v_i, x_c⟩ > 0.        (22)

Note that since the samples are taken from the bouncing points, the above procedure of dealing with the curvature of the hyper-sphere does not constitute an approximation but is exact. An alternative method of dealing with the problem of the curvature of the hyper-sphere W can be found in Minka (2001, Section 5.8).

2. Update the billiard ball's position to b_{i+1} and the new direction vector to v_{i+1}. The new point b_{i+1} and the new direction v_{i+1} are calculated from

    b_{i+1} = b_i + τ_c v_i,        (23)
    v_{i+1} = v_i − 2 (⟨v_i, x_c⟩ / ||x_c||²) x_c.        (24)

Afterwards the position b_{i+1} and the direction vector v_{i+1} need to be normalised. This is easily achieved by equation (20).

3. Update the centre of mass w_i of the whole trajectory by the new line segment from b_i to b_{i+1} calculated on the hyper-sphere W_x. Since the solution w lies on the hyper-sphere W_x (see Theorem 12) we cannot simply update the centre of mass using a weighted vector addition. Let us introduce the operation ⊕_μ acting on vectors of unit length. This function has to have the following properties:

    ||s ⊕_μ t||² = 1,
    ||t − s ⊕_μ t|| = μ ||t − s||,
    s ⊕_μ t = ρ_1(s, t, μ) s + ρ_2(s, t, μ) t,    ρ_1(s, t, μ) ≥ 0,  ρ_2(s, t, μ) ≥ 0.

This rather arcane definition implements a weighted addition of s and t such that μ is the fraction between the resulting chord length ||t − s ⊕_μ t|| and the total chord length ||t − s||. In Appendix A.4 it is shown that the following formulae for ρ_1(s, t, μ) and ρ_2(s, t, μ) implement such a weighted addition:

    ρ_1(s, t, μ) = √( (1 − (1 − μ²(1 − ⟨s, t⟩))²) / (1 − ⟨s, t⟩²) ),
    ρ_2(s, t, μ) = (1 − μ²(1 − ⟨s, t⟩)) − ρ_1(s, t, μ) ⟨s, t⟩.

By assuming a constant line density on the manifold V(z) the whole line between b_i and b_{i+1} can be represented by the midpoint on the manifold V(z) given by

    m = (b_i + b_{i+1}) / ||b_i + b_{i+1}||.

Thus, one updates the centre of mass of the trajectory by

    w_{i+1} = ρ_1(w_i, m, Ξ_i/(Ξ_i + ξ_i)) w_i + ρ_2(w_i, m, Ξ_i/(Ξ_i + ξ_i)) m,

where ξ_i = ||b_i − b_{i+1}|| is the length of the trajectory in the i-th step and Ξ_i = Σ_{j=1}^i ξ_j is the accumulated length up to the i-th step.
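A condensed sketch of one bounce of steps 1 and 2, written directly in the expansion coefficients; the renormalisation via (20), the random restart for the degenerate case τ_c → ∞ and the centre-of-mass update of step 3 are omitted here, so this is only an illustration of equations (21), (23) and (24), not the full algorithm of Appendix B:

```python
import numpy as np

def billiard_bounce(gamma, beta, G):
    # One bounce of the kernel billiard in expansion coefficients:
    # gamma and beta represent the position b = sum_i gamma_i x_i and direction v = sum_i beta_i x_i.
    b_dot = G @ gamma                        # <b, x_j> for every training point j
    v_dot = G @ beta                         # <v, x_j>
    tau = -b_dot / v_dot                     # flight times (21) to all bounding hyperplanes
    tau[tau <= 0] = np.inf                   # only walls lying ahead of the ball are candidates
    c = int(np.argmin(tau))                  # index of the closest wall
    gamma_new = gamma + tau[c] * beta        # position update (23): b <- b + tau_c * v
    beta_new = beta.copy()
    beta_new[c] -= 2.0 * v_dot[c] / G[c, c]  # reflection (24): v <- v - 2 <v, x_c>/||x_c||^2 * x_c
    return gamma_new, beta_new, c
```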

Note that the operation ⊕_μ is only an approximation to the addition operation we sought because an exact weighting would require the arc lengths rather than the chord lengths.

As a stopping criterion we suggest computing an upper bound on ρ_2, the weighting factor of the new part of the trajectory. If this value falls below a pre-specified threshold (TOL) we stop the algorithm. Note that the increase in Ξ_i will always lead to termination.

3.2 Large Scale Bayes Point Machines

Clearly, all we need for estimating the centre of mass of version space (12) is a set of unit length weight vectors w_i drawn uniformly from V(z). In order to save computational resources it might be advantageous to achieve a uniform sample only approximately. The classical perceptron learning algorithm offers the possibility to obtain up to m! different classifiers in version space simply by learning on different permutations of the training sample. Of course, due to the sparsity of the solution the number of different classifiers obtained is usually considerably less.

A classical theorem to be found in Novikoff (1962) guarantees the convergence of this procedure and furthermore provides an upper bound on the number t of mistakes needed until convergence. More precisely, if there exists a classifier w_SVM with margin γ_z(w_SVM) > 0 (see (13)) then the number of mistakes until convergence, which is an upper bound on the sparsity of the solution, is not more than ς²/γ²_z(w_SVM), where ς is the smallest real number such that ∀x_i : ||x_i|| ≤ ς. The quantity γ_z(w_SVM) is maximised for the solution w_SVM found by the support vector machine, and whenever the support vector machine is theoretically justified by results from learning theory (see Shawe-Taylor et al., 1998; Vapnik, 1998) the ratio ς²/γ²_z(w_SVM) is considerably less than m, say d.

Algorithmically, we can benefit from this sparsity by the following trick: since

    w = Σ_{i=1}^m α_i x_i,

all we need to store is the m-dimensional vector α. Furthermore, we keep track of the m-dimensional vector o of real-valued outputs

    o_i = ⟨x_i, w_t⟩ = Σ_{j=1}^m α_j k(x_i, x_j)

of the current solution at the i-th training point. By definition, in the beginning α = o = 0. Now, if o_i y_i ≤ 0, we update α_i by α_i + y_i and update o by o_j ← o_j + y_i k(x_i, x_j), which requires only m kernel calculations (the evaluation of the i-th row of the Gram matrix G). In summary, the memory requirement of this algorithm is 2m and the number of kernel calculations is not more than d·m. As a consequence, the computational requirement of this algorithm is no more than the computational requirement for the evaluation of the margin γ_z(w_SVM)!
We suggest to use this efficient perceptron learning algorithm in order to obtain samples w_i for the computation of the centre of mass (12). In order to investigate the usefulness of this approach experimentally, we compared the distribution of generalisation errors of samples obtained by perceptron learning on permuted training samples with samples obtained by a full Gibbs sampling (see Graepel and Herbrich, 2000, for details on the kernel Gibbs sampler). For computational reasons, we used only 88 training patterns and 453 test patterns of the classes 1 and 2 from the MNIST data set.(8) In Figure 4 (a) and (b) we plotted the distribution over random samples using the kernel(9)

    k(x, x̃) = (⟨x, x̃⟩ + 1)^5.        (25)

Using a quantile-quantile (QQ) plot technique we can compare both distributions in one graph (see Figure 4 (c)). These plots suggest that by simple permutation of the training sample we are able to obtain a sample of classifiers exhibiting a similar distribution of generalisation error to the one obtained by time-consuming Gibbs sampling.

8. This data set is publicly available at
9. We decided to use this kernel because it showed excellent generalisation performance when using the support vector machine.
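A compact sketch of this sampling scheme: a kernel perceptron with the cached output vector o described above, run on random permutations of the training sample, whose normalised solutions are then averaged. It assumes the training sample is separable in feature space and is only meant to illustrate the idea behind Algorithm 2 in Appendix B, not to reproduce it:

```python
import numpy as np

def kernel_perceptron(G, y, order):
    # Kernel perceptron in dual variables with the cached output vector o described above:
    # a mistake at point i costs one row of the Gram matrix G.
    m = len(y)
    alpha, o = np.zeros(m), np.zeros(m)
    mistake = True
    while mistake:                          # assumes the sample is separable in feature space
        mistake = False
        for i in order:
            if y[i] * o[i] <= 0:            # margin violated: update alpha_i and the cached outputs
                alpha[i] += y[i]
                o += y[i] * G[i]
                mistake = True
    return alpha

def bpm_by_permutations(G, y, n_samples, rng):
    # Approximate the centre of mass (12) by averaging normalised perceptron solutions
    # obtained from random permutations of the training sample.
    alphas = []
    for _ in range(n_samples):
        a = kernel_perceptron(G, y, rng.permutation(len(y)))
        alphas.append(a / np.sqrt(a @ G @ a))   # unit length in feature space: alpha' G alpha = 1
    return np.mean(alphas, axis=0)              # expansion coefficients of the averaged weight vector
```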

Figure 4: (a) Histogram of generalisation errors (estimated on a test set) using a kernel Gibbs sampler. (b) Histogram of generalisation errors (estimated on a test set) using a kernel perceptron. (c) QQ plot of distributions (a) and (b). The straight line indicates that the two distributions only differ by an additive and multiplicative constant, i.e. they exhibit the same rate of decay.

A very advantageous feature of this approach as compared to support vector machines is its adjustable time and memory requirements and the anytime availability of a solution due to sampling. If the training sample grows further and we are not able to spend more time learning, we can adjust the number of samples w_i used at the cost of slightly worse generalisation error (see also Section 4).

3.3 Extension to Training Error

To allow for training errors we recall that the version space conditions are given by

    ∀(x_i, y_i) ∈ z : y_i ⟨x_i, w⟩ = y_i Σ_{j=1}^m α_j k(x_i, x_j) > 0.        (26)

Now we introduce the following version space conditions in place of (26):

    ∀(x_i, y_i) ∈ z : y_i Σ_{j=1}^m α_j k(x_i, x_j) > −λ y_i α_i k(x_i, x_i),        (27)

where λ is an adjustable parameter related to the softness of version space boundaries. Clearly, considering this from the billiard viewpoint, equation (27) can be interpreted as allowing penetration of the walls, an idea already hinted at in Ruján (1997). Since the linear decision function is invariant under any positive rescaling of the expansion coefficients α, the factor α_i on the right hand side makes λ scale invariant as well. Although other ways of incorporating training errors are conceivable, our formulation allows for a simple modification of the algorithms described in the previous two subsections. To see this we note that equation (27) can be rewritten as

    ∀(x_i, y_i) ∈ z : y_i Σ_{j=1}^m α_j (1 + λ I_{i=j}) k(x_i, x_j) > 0.

Hence we can use the above algorithms but with an additive correction to the diagonal terms of the Gram matrix. This additive correction to the kernel diagonals is similar to the quadratic margin loss used to introduce a soft margin during training of support vector machines (see Cortes, 1995; Shawe-Taylor and Cristianini, 2000). Another insight into the introduction of soft boundaries comes from noting that the distance between two points x_i and x_j in feature space K can be written

    ||x_i − x_j||² = ||x_i||² + ||x_j||² − 2⟨x_i, x_j⟩,

Figure 5: Parameter spaces for a two dimensional toy problem obtained by introducing training error via an additive correction to the diagonal term of the kernel matrix. In order to visualise the resulting parameter space we fixed m = 3 and normalised all axes by the product of eigenvalues λ_1 λ_2 λ_3. See text for further explanation.

which in the case of points of unit length in feature space becomes 2(1 + λ − k(x_i, x_j)). Thus, if we add λ to the diagonal elements of the Gram matrix, the points become equidistant for λ → ∞. This would give the resulting version space a more regular shape. As a consequence, the centre of the largest inscribable ball (the support vector machine solution) would tend towards the centre of mass of the whole of version space.

We would like to recall that the effective parameter space of weight vectors considered is given by

    W_x := {w = Σ_{i=1}^m α_i x_i | ||w||² = Σ_{i=1}^m Σ_{j=1}^m α_i α_j ⟨x_i, x_j⟩ = 1}.

In terms of α this can be rewritten as

    {α ∈ R^m | α'Gα = 1},    G_ij = ⟨x_i, x_j⟩ = k(x_i, x_j).

Let us represent the Gram matrix by its spectral decomposition, i.e. G = UΛU' where U'U = I and Λ = diag(λ_1, ..., λ_m) is the diagonal matrix of eigenvalues λ_i. Thus we know that the parameter space is the set of all coefficients α̃ = U'α which fulfil

    {α̃ ∈ R^m : α̃'Λα̃ = 1}.

This is the defining equation of an m-dimensional axis-parallel ellipsoid. Now adding the term λ to the diagonal of G makes G a full rank matrix (see Micchelli, 1986). In Figure 5 we plotted the parameter space for a 2D toy problem using only m = 3 training points. Although the parameter space is 3 dimensional for all λ > 0 we obtain a pancake-like parameter space for small values of λ. For λ → ∞ the set of admissible coefficients α becomes the m-dimensional ball, i.e. the training examples become more and more orthogonal with increasing λ. The way we incorporated training errors corresponds to the choice of a new kernel given by

    k_λ(x, x̃) := k(x, x̃) + λ I_{x=x̃}.
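In practice the soft-boundary extension therefore amounts to a one-line change for either sampler: run it on the Gram matrix with λ added to its diagonal, which on the training sample is exactly the kernel k_λ above. A minimal sketch (the value of λ is an arbitrary illustration):

```python
import numpy as np

def soften(G, lam):
    # Soft boundaries: add lambda to the diagonal of the Gram matrix, i.e. use
    # k_lambda(x, x~) = k(x, x~) + lambda * I[x = x~] on the training sample.
    return G + lam * np.eye(len(G))

# Either sampler can then simply be run on soften(G, 0.5) instead of G (lambda = 0.5 is illustrative).
```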

Figure 6: Version spaces V(z) for two 3 dimensional toy problems. (Left) One can see that the approximation of the Bayes point (diamond) by the centre of the largest inscribable ball (cross) is reasonable if the version space is regularly shaped. (Right) The situation changes in the case of an elongated and asymmetric version space V(z).

Finally, note that this modification of the kernel has no effect on new test points x ∉ x that are not elements of the training sample x. For an explanation of the effect of λ in the context of Gaussian processes see Opper and Winther (2000).

4. Experimental Results

In this section we present experimental results both on University of California, Irvine (UCI) benchmark datasets and on two bigger tasks of handwritten digit recognition, namely the US postal service (USPS) and the modified National Institute of Standards (MNIST) digit recognition tasks.(10) We compared our results to the performance of a support vector machine using reported test set performance from Rätsch et al. (2001) (UCI), Schölkopf (1997, p. 57) (USPS) and Cortes (1995) (MNIST). All the experiments were done using Algorithm 2 in Appendix B.

4.1 Artificial Data

For illustration purposes we set up a toy dataset of training and test points in R³. The data points were uniformly generated in [0, 1]³ and labelled by a randomly generated linear decision rule using the kernel k(x, x̃) = ⟨x, x̃⟩. In Figure 6 we illustrate the potential benefits of a Bayes point machine over a support vector machine for elongated version spaces. By using the billiard algorithm to estimate the Bayes point (see Subsection 3.1), we were able to track all positions b_i where the billiard ball hits a version space boundary. This allows us to easily visualise the version spaces V(z). For the example illustrated in Figure 6 (right) the support vector machine and Bayes point solutions with hard margins/boundaries are far apart, resulting in a noticeable reduction in generalisation error of the Bayes point machine (0.8%) compared to the support vector machine (5.0%) solution, whereas for regularly shaped version spaces (Figure 6 (left)) the difference is negligible (0.6% to 0.6%).

10. Publicly available at

Figure 7: Decision functions for a 2D toy problem of a support vector machine (SVM) (left) and a Bayes point machine (BPM) (right) using hard margins (λ = 0) and RBF kernels with σ = 1. Note that the Bayes point machine results in a much flatter function, sacrificing margin (γ_z(w_SVM) = 0.36 > γ_z(w_cm) = 0.2) for smoothness.

In a second illustrative example we compared the smoothness of the resulting decision function when using kernels both with support vector machines and Bayes point machines. In order to model a non-linear decision surface we used the radial basis function (RBF) kernel

    k(x, x̃) = exp(−||x − x̃||² / (2σ²)).        (28)

Figure 7 shows the resulting decision functions in the hard margin/boundary case. Clearly, the Bayes point machine solution appears much smoother than the support vector machine solution although its geometrical margin of 0.2 is significantly smaller. The above examples should only be considered as aids to enhance the understanding of the Bayes point machine algorithm's properties rather than strict arguments about general superiority.

4.2 UCI Benchmark Datasets

To investigate the performance on real world datasets we compared hard margin support vector machines to Bayes point machines with hard boundaries (λ = 0) when using the kernel billiard algorithm described in Subsection 3.1. We studied the performance on 5 standard benchmarking datasets from the UCI Repository, and banana and waveform, two toy datasets (see Rätsch et al., 2001). In each case the data was randomly partitioned into training and test sets in the ratio 60%:40%. The means and standard deviations of the average generalisation errors on the test sets are presented as percentages in the columns headed SVM (hard margin) and BPM (λ = 0) in Table 1. As can be seen from the results, the Bayes point machine outperforms support vector machines on almost all datasets at a statistically significant level. Note, however, that the result of the t-test is strictly valid only under the assumption that training and test data were independent, an assumption which may be violated by the procedure of splitting the one data set into different pairs of training and test sets (Dietterich, 1998). Thus, the resulting p values should serve only as an indication of the significance of the result.

In order to demonstrate the effect of positive λ (soft boundaries) we trained a Bayes point machine with soft boundaries and compared it to training a support vector machine with soft margin using the same Gram matrix (see equation (27)).

              SVM (hard margin)    BPM (hard boundary)    σ     p-value
Heart         254±4                228±34
Thyroid       53±24                44±2                   3
Diabetes      33±24                32±25                  5
Waveform      3±                   2±9                    2
Banana        62±5                 5±4                    5
Sonar         54±37                59±38
Ionosphere    9±25                 5±

Table 1: Experimental results on seven benchmark datasets. We used the RBF kernel given in (28) with values of σ found optimal for SVMs. Shown is the estimated generalisation error in percent. The standard deviation was obtained over different runs. The final column gives the p values of a paired t test for the hypothesis "BPM is better than SVM", indicating that the improvement is statistically significant.

It can be shown that such a support vector machine corresponds to a soft margin support vector machine where the margin slacks are penalised quadratically (see Cortes, 1995; Shawe-Taylor and Cristianini, 2000; Herbrich, 2001). In Figure 8 we have plotted the generalisation error as a function of λ for the toy problem from Figure 6 and the dataset heart, using the same setup as in the previous experiment. We observe that the support vector machine with an l_2 soft margin achieves a minimum of the generalisation error which is close to, or just above, the minimum error which can be achieved using a Bayes point machine with positive λ. This may not be too surprising taking the change of geometry into account (see Section 3.3). Thus, also the soft margin support vector machine approximates the Bayes point machine with soft boundaries.

Finally we would like to remark that the running time of the kernel billiard was not much different from the running time of our support vector machine implementation. We did not use any chunking or decomposition algorithms (see, e.g., Osuna et al., 1997; Joachims, 1999; Platt, 1999) which in the case of support vector machines would have decreased the running time by orders of magnitude. The most noticeable difference in running time was with the waveform and banana datasets where we are given m = 400 observations. This can be explained by the fact that the computational effort of the kernel billiard method is O(B·m²) where B is the number of bounces. As we set our tolerance criterion TOL for stopping very low (10⁻⁴), the approximate number B of bounces for these datasets was very large. Hence, in contrast to the computational effort of O(m³) when using the support vector machine, the number B of bounces leads to a much higher computational demand when using the kernel billiard.

4.3 Handwritten Digit Recognition

For the two tasks we now consider, our inputs are n × n grey value images which were transformed into n²-dimensional vectors by concatenation of the rows. The grey values were taken from the set {0, ..., 255}. All images were labelled by one of the ten classes 0 to 9. For each of the ten classes y ∈ {0, ..., 9} we ran the perceptron algorithm L times, each time labelling all training points of class y by +1 and the remaining training points by −1. On a Pentium III 500 MHz with 128 MB memory each learning trial took 2 minutes (MNIST) or 2 minutes (USPS), respectively.(11) For the classification of a test image x

11. Note, however, that we made use of the fact that 40% of the grey values of each image are 0 since they encode background. Therefore, we encoded each image as an index-value list which allows much faster computation of the inner products ⟨x, x̃⟩ and speeds up the algorithm considerably.

Figure 8: Comparison of the soft boundary Bayes point machine with the soft margin support vector machine. Plotted is the generalisation error versus λ for a toy problem using linear kernels (left) and the heart dataset using RBF kernels with σ = 3 (right). The error bars indicate one standard deviation of the estimated mean.

we calculated the real-valued output of all the different classifiers(12) by

    f_i(x) = ⟨x, w_i⟩ / (||w_i|| ||x||) = Σ_{j=1}^m (α_i)_j k(x_j, x) / ( √(Σ_{r=1}^m Σ_{s=1}^m (α_i)_r (α_i)_s k(x_r, x_s)) √(k(x, x)) ),

where we used the kernel k given by (25). Here, (α_i)_j refers to the expansion coefficient corresponding to the i-th classifier and the j-th data point. Now, for each of the ten classes we calculated the real-valued decision of the Bayes point estimate ŵ_cm,y by(13)

    f_bp,y(x) = ⟨x, ŵ_cm,y⟩ = (1/L) Σ_{i=1}^L ⟨x, w_{i+yL}⟩.

In a Bayesian spirit, the final decision was carried out by

    h_bp(x) := argmax_{y∈{0,...,9}} f_bp,y(x).

Note that f_bp,y(x) can be interpreted as an (unnormalised) approximation of the posterior probability that x is of class y when restricted to the function class (10) (see Platt, 2000). In order to test the dependence of the generalisation error on the magnitude max_y f_bp,y(x) we fixed a certain rejection rate r ∈ [0, 1] and rejected the fraction r of test points with the smallest value of max_y f_bp,y(x).

MNIST Handwritten Digits. In the first of our large scale experiments we used the full MNIST dataset with 60,000 training examples and 10,000 test examples of 28 × 28 grey value images of handwritten digits. The plot resulting from learning only L consistent classifiers per class and rejection based on the real-valued output of the single Bayes points is depicted in Figure 9 (left). As can be seen from this plot, even without rejection the Bayes point has excellent generalisation performance when compared to support vector machines, which achieve a generalisation error of 1.4%.(14) Furthermore, rejection based on the real-valued

12. For notational simplicity we assume that the first L classifiers are classifiers for the class 0, the next L for class 1 and so on.
13. Note that in this subsection y ranges over {0, ..., 9}.
14. The result of 1.1% with the kernel (25) and a polynomial degree of four could not be reproduced and is thus considered invalid (personal communication with P. Haffner). Note also that the best results with support vector machines were obtained when using a soft margin.
