Support Vector Machine Soft Margin Classifiers: Error Analysis


Journal of Machine Learning Research ? (2004) ?-?? Submitted 9/03; Published ??/04

Support Vector Machine Soft Margin Classifiers: Error Analysis

Di-Rong Chen (drchen@buaa.edu.cn)
Department of Applied Mathematics
Beijing University of Aeronautics and Astronautics
Beijing, P. R. CHINA

Qiang Wu (wuqiang@student.cityu.edu.hk)
Yiming Ying (yying@cityu.edu.hk)
Ding-Xuan Zhou (mazhou@math.cityu.edu.hk)
Department of Mathematics
City University of Hong Kong
Kowloon, Hong Kong, P. R. CHINA

Editor: Peter Bartlett

Abstract

The purpose of this paper is to provide a PAC error analysis for the q-norm soft margin classifier, a support vector machine classification algorithm. It consists of two parts: regularization error and sample error. While many techniques are available for treating the sample error, much less is known for the regularization error and the corresponding approximation error for reproducing kernel Hilbert spaces. We are mainly concerned with the regularization error. It is estimated for general distributions by a K-functional in weighted L^q spaces. For weakly separable distributions (i.e., the margin may be zero) satisfactory convergence rates are provided by means of separating functions. A projection operator is introduced, which leads to better sample error estimates, especially for kernels with small complexity. The misclassification error is bounded by the V-risk associated with a general class of loss functions V. The difficulty of bounding the offset is overcome. Polynomial kernels and Gaussian kernels are used to demonstrate the main results. The choice of the regularization parameter plays an important role in our analysis.

Keywords: support vector machine classification, misclassification error, q-norm soft margin classifier, regularization error, approximation error

1 Introduction

In this paper we study support vector machine (SVM) classification algorithms and investigate the SVM q-norm soft margin classifier with 1 < q < ∞. Our purpose is to provide an error analysis for this algorithm in the PAC framework. Let (X, d) be a compact metric space and Y = {1, −1}. A binary classifier f: X → {1, −1} is a function from X to Y which divides the input space X into two classes. Let ρ be a probability distribution on Z := X × Y and (X, Y) be the corresponding random variable. The misclassification error for a classifier f: X → Y is defined to be the

(c) 2004 Di-Rong Chen, Qiang Wu, Yiming Ying, and Ding-Xuan Zhou.

probability of the event {f(X) ≠ Y}:

    R(f) := Prob{f(X) ≠ Y} = ∫_X P(Y ≠ f(x) | x) dρ_X(x).    (1)

Here ρ_X is the marginal distribution on X and P(· | x) is the conditional probability measure given X = x.

The SVM q-norm soft margin classifier (Cortes & Vapnik, 1995; Vapnik, 1998) is constructed from samples and depends on a reproducing kernel Hilbert space associated with a Mercer kernel. Let K: X × X → R be continuous, symmetric and positive semidefinite, i.e., for any finite set of distinct points {x_1, ..., x_l} ⊂ X, the matrix (K(x_i, x_j))_{i,j=1}^l is positive semidefinite. Such a kernel is called a Mercer kernel.

The Reproducing Kernel Hilbert Space (RKHS) H_K associated with the kernel K is defined (Aronszajn, 1950) to be the closure of the linear span of the set of functions {K_x := K(x, ·): x ∈ X} with the inner product ⟨·,·⟩_{H_K} = ⟨·,·⟩_K satisfying ⟨K_x, K_y⟩_K = K(x, y) and the reproducing property

    ⟨K_x, g⟩_K = g(x), for all x ∈ X, g ∈ H_K.

Denote by C(X) the space of continuous functions on X with the norm ‖·‖_∞. Let κ := √(‖K‖_∞). Then the reproducing property above tells us that

    ‖g‖_∞ ≤ κ ‖g‖_K, for all g ∈ H_K.    (2)

Define H̄_K := H_K + R. For a function f = f_1 + b with f_1 ∈ H_K and b ∈ R, we denote f̄ = f_1 and b_f = b. The constant term b_f is called the offset. For a function f: X → R, the sign function is defined as sgn(f)(x) = 1 if f(x) ≥ 0 and sgn(f)(x) = −1 if f(x) < 0.

Now the SVM q-norm soft margin classifier (SVM q-classifier) associated with the Mercer kernel K is defined as sgn(f_z), where f_z is a minimizer of the following optimization problem involving a set of random samples z = (x_i, y_i)_{i=1}^m ∈ Z^m independently drawn according to ρ:

    f_z := arg min_{f ∈ H̄_K} { (1/2) ‖f̄‖_K^2 + (C/m) Σ_{i=1}^m ξ_i^q },
    subject to y_i f(x_i) ≥ 1 − ξ_i and ξ_i ≥ 0 for i = 1, ..., m.    (3)

Here C is a constant which depends on m: C = C(m), and often lim_{m→∞} C(m) = ∞. Throughout the paper, we assume 1 < q < ∞, m ∈ N, C > 0, and z = (x_i, y_i)_{i=1}^m are random samples independently drawn according to ρ.

Our target is to understand how sgn(f_z) converges (with respect to the misclassification error) to the best classifier, the Bayes rule, as m and hence C(m) tend to infinity. Recall the regression function of ρ:

    f_ρ(x) = ∫_Y y dρ(y | x) = P(Y = 1 | x) − P(Y = −1 | x), x ∈ X.    (4)

Then the Bayes rule is given (e.g., Devroye, Györfi & Lugosi, 1997) by the sign of the regression function: f_c := sgn(f_ρ). Estimating the excess misclassification error

    R(sgn(f_z)) − R(f_c)    (5)

for the classification algorithm (3) is our goal. In particular, we try to understand how the choice of the regularization parameter C affects the error.

To investigate the error bounds for general kernels, we rewrite (3) as a regularization scheme. Define the loss function V = V_q as

    V(y, f(x)) := (1 − y f(x))_+^q = |y − f(x)|^q χ_{{y f(x) ≤ 1}},    (6)

where (t)_+ = max{0, t}. The corresponding V-risk is

    E(f) := E(V(y, f(x))) = ∫_Z V(y, f(x)) dρ(x, y).    (7)

If we set the empirical error as

    E_z(f) := (1/m) Σ_{i=1}^m V(y_i, f(x_i)) = (1/m) Σ_{i=1}^m (1 − y_i f(x_i))_+^q,    (8)

then the scheme (3) can be written as (see Evgeniou, Pontil & Poggio, 2000)

    f_z = arg min_{f ∈ H̄_K} { E_z(f) + (1/(2C)) ‖f̄‖_K^2 }.    (9)

Notice that when H̄_K is replaced by H_K, the scheme (9) is exactly the Tikhonov regularization scheme (Tikhonov & Arsenin, 1977) associated with the loss function V. So one may hope that the methods for analyzing regularization schemes can be applied.

The definitions of the V-risk (7) and the empirical error (8) tell us that for a function f = f̄ + b_f ∈ H̄_K, the random variable ξ = V(y, f(x)) on Z has mean E(f) and satisfies (1/m) Σ_{i=1}^m ξ(z_i) = E_z(f). Thus we may expect by some standard empirical risk minimization (ERM) argument (e.g., Cucker & Smale, 2001; Evgeniou, Pontil & Poggio, 2000; Shawe-Taylor et al., 1998; Vapnik, 1998; Wahba, 1990) to derive bounds for E(f_z) − E(f_q), where f_q is a minimizer of the V-risk (7). It was shown in Lin (2002) that for q > 1 such a minimizer is given by

    f_q(x) = [(1 + f_ρ(x))^{1/(q−1)} − (1 − f_ρ(x))^{1/(q−1)}] / [(1 + f_ρ(x))^{1/(q−1)} + (1 − f_ρ(x))^{1/(q−1)}], x ∈ X.    (10)

For q = 1 a minimizer is f_c; see Wahba (1999). Note that sgn(f_q) = f_c.

Recall that for the classification algorithm (3), we are interested in the excess misclassification error (5), not the excess V-risk E(f_z) − E(f_q). But we shall see in Section 3 that R(sgn(f)) − R(f_c) ≤ √(2(E(f) − E(f_q))). One might apply this to the function f_z and get estimates for (5). However, special features of the loss function (6) enable us to do better: by restricting f_z onto [−1, 1], we can improve the sample error estimates. The idea of the following projection operator was introduced for this purpose in Bartlett (1998).

Definition 1 The projection operator π is defined on the space of measurable functions f: X → R as

    π(f)(x) = 1, if f(x) ≥ 1;  −1, if f(x) ≤ −1;  f(x), if −1 < f(x) < 1.    (11)
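The objects introduced so far are easy to exercise numerically. The following minimal sketch (the values of f_ρ(x), the exponents q, and the grid are arbitrary toy choices, not from the paper) checks that formula (10) minimizes the conditional V-risk pointwise, that sgn(f_q) recovers the Bayes rule sgn(f_ρ), and that the projection π of Definition 1 never increases the loss:

```python
# Toy numerical check of (10), (6), and Definition 1 (all test values hypothetical).
def V(q, y, t):                 # q-norm hinge loss (6): (1 - y t)_+^q
    return max(0.0, 1 - y * t) ** q

def f_q(q, frho):               # pointwise minimizer (10), for |frho| < 1
    a = (1 + frho) ** (1 / (q - 1))
    b = (1 - frho) ** (1 / (q - 1))
    return (a - b) / (a + b)

def cond_risk(q, frho, t):      # conditional V-risk: (1-t)_+^q (1+frho)/2 + (1+t)_+^q (1-frho)/2
    return V(q, 1, t) * (1 + frho) / 2 + V(q, -1, t) * (1 - frho) / 2

def pi(t):                      # projection operator of Definition 1
    return max(-1.0, min(1.0, t))

grid = [i / 50 - 2 for i in range(201)]      # candidate values f(x) in [-2, 2]
for q in (1.5, 2.0, 3.0):
    for frho in (-0.9, -0.3, 0.2, 0.7):
        best = cond_risk(q, frho, f_q(q, frho))
        # (10) minimizes the conditional V-risk pointwise ...
        assert all(best <= cond_risk(q, frho, t) + 1e-9 for t in grid)
        # ... and sgn(f_q) agrees with the Bayes rule sgn(f_rho)
        assert (f_q(q, frho) >= 0) == (frho >= 0)
# the projection never increases the loss, which is the content of (13) below
assert all(V(2, y, pi(t)) <= V(2, y, t) + 1e-12 for y in (1, -1) for t in grid)
```

For q = 2 the minimizer (10) reduces to f_q = f_ρ itself, which the sketch confirms on the sampled values.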

It is trivial that sgn(π(f)) = sgn(f). Hence

    R(sgn(f_z)) − R(f_c) ≤ √(2 (E(π(f_z)) − E(f_q))).    (12)

The definition of the loss function (6) also tells us that V(y, π(f)(x)) ≤ V(y, f(x)), so

    E(π(f)) ≤ E(f) and E_z(π(f)) ≤ E_z(f).    (13)

According to (12), we need to estimate E(π(f_z)) − E(f_q) in order to bound (5). To this end, we introduce a regularizing function f_{K,C} ∈ H̄_K. It is arbitrarily chosen and depends on C.

Proposition 2 Let f_{K,C} ∈ H̄_K, and let f_z be defined by (9). Then E(π(f_z)) − E(f_q) can be bounded by

    {E(f_{K,C}) − E(f_q) + (1/(2C)) ‖f̄_{K,C}‖_K^2} + {E(π(f_z)) − E_z(π(f_z))} + {E_z(f_{K,C}) − E(f_{K,C})}.    (14)

Proof Decompose the difference E(π(f_z)) − E(f_q) as

    {E(π(f_z)) − E_z(π(f_z))}
    + {E_z(π(f_z)) + (1/(2C)) ‖f̄_z‖_K^2 − (E_z(f_{K,C}) + (1/(2C)) ‖f̄_{K,C}‖_K^2)}
    + {E_z(f_{K,C}) − E(f_{K,C})}
    + {E(f_{K,C}) − E(f_q) + (1/(2C)) ‖f̄_{K,C}‖_K^2} − (1/(2C)) ‖f̄_z‖_K^2.

By the definition of f_z and (13), the second term is ≤ 0, and so is the last term. Then the statement follows.

The first term in (14) is called the regularization error (Smale & Zhou, 2004). It can be expressed as a generalized K-functional inf_{f ∈ H̄_K} {E(f) − E(f_q) + (1/(2C)) ‖f̄‖_K^2} when f_{K,C} takes a special choice f̃_{K,C} (a standard choice in the literature, e.g., Steinwart, 2001) defined as

    f̃_{K,C} := arg min_{f ∈ H̄_K} { E(f) + (1/(2C)) ‖f̄‖_K^2 }.    (15)

Definition 3 Let V be a general loss function and f_ρ^V be a minimizer of the V-risk (7). The regularization error for the regularizing function f_{K,C} ∈ H̄_K is defined as

    D(C) := E(f_{K,C}) − E(f_ρ^V) + (1/(2C)) ‖f̄_{K,C}‖_K^2.    (16)

It is called the regularization error of the scheme (9) when f_{K,C} = f̃_{K,C}:

    D̃(C) := inf_{f ∈ H̄_K} { E(f) − E(f_ρ^V) + (1/(2C)) ‖f̄‖_K^2 }.

The main concern of this paper is the regularization error. We shall investigate its asymptotic behavior. This investigation is not only important for bounding the first term in (14), but also crucial for bounding the second term (the sample error): it is well known in structural risk minimization (Shawe-Taylor et al., 1998) that the size of the hypothesis space is essential, and in our setting this size is determined by D(C). Once C is fixed, the sample error estimate becomes routine. Therefore, we need to understand the choice of the parameter C from the bound for D(C).

Proposition 4 For any C ≥ 1/2 and any f_{K,C} ∈ H̄_K, there holds

    D(C) ≥ D̃(C) ≥ κ̃²/(2C),    (17)

where

    κ̃ := E_0/(1 + κ q 4^{q−1}), E_0 := inf_{b ∈ R} E(b) − E(f_q).    (18)

Moreover, κ̃ = 0 if and only if for some p_0 ∈ [0, 1], P(Y = 1 | x) = p_0 in probability.

This proposition will be proved in Section 5. According to Proposition 4, the decay of D(C) cannot be faster than O(1/C) except for some very special distributions. This special case is caused by the offset in (9), for which D̃(C) = 0. Throughout this paper we shall ignore this trivial case and assume κ̃ > 0.

When ρ is strictly separable, D̃(C) = O(1/C). But this is a very special phenomenon. In general, one should not expect E(f) = E(f_q) for some f ∈ H̄_K. Even for (weakly) separable distributions with zero margin, D̃(C) decays as O(C^{−p}) for some 0 < p < 1. To realize such a decay for these separable distributions, the regularizing function will be a multiple of a separating function. For details and the concepts of strictly or weakly separable distributions, see Section 2.

For general distributions, we shall choose f_{K,C} = f̃_{K,C} in Sections 6 and 7 and estimate the regularization error of the scheme (9) associated with the loss function (6) by means of approximation in the function space L^q_{ρ_X}. In particular, D̃(C) ≤ K(f_q, 1/(2C)), where K(f_q, t) is a K-functional defined as

    K(f_q, t) := inf_{f ∈ H̄_K} { ‖f − f_q‖^q_{L^q_{ρ_X}} + t ‖f̄‖_K^2 },  if 1 < q ≤ 2,
    K(f_q, t) := inf_{f ∈ H̄_K} { q 2^{q−1}(2^{q−1} + 1) ‖f − f_q‖_{L^q_{ρ_X}} + t ‖f̄‖_K^2 },  if q > 2.    (19)

In the case q = 1, the regularization error (Wu & Zhou, 2004) depends on the approximation in L^1_{ρ_X} of the function f_c, which is not continuous in general. For q > 1, the regularization error depends on the approximation in L^q_{ρ_X} of the function f_q. When the regression function has good smoothness, f_q has much higher regularity than f_c. Hence the convergence for q > 1 may be faster than that for q = 1, which improves the regularization error.

The second term in (14) is called the sample error. When C is fixed, the sample error E(f_z) − E_z(f_z) is well understood (except for the offset term). In (14), the sample error is for the function π(f_z) instead of f_z, while the misclassification error (5) is kept:

R(sgn(π(f_z))) = R(sgn(f_z)). Since the bound for V(y, π(f)(x)) is much smaller than that for V(y, f(x)), the projection improves the sample error estimate.

Based on the estimates for the regularization error and the sample error above, our error analysis will provide ε(δ, m, C, β) > 0 for any 0 < δ < 1 such that with confidence 1 − δ,

    E(π(f_z)) − E(f_q) ≤ (1 + β) D(C) + ε(δ, m, C, β).    (20)

Here 0 < β ≤ 1 is an arbitrarily fixed number. Moreover, lim_{m→∞} ε(δ, m, C, β) = 0.

If f_q lies in the L^q_{ρ_X}-closure of H̄_K, then lim_{t→0} K(f_q, t) = 0 by (19). Hence E(π(f_z)) − E(f_q) → 0 with confidence as m (and hence C = C(m)) becomes large. This is the case when K is a universal kernel, i.e., H_K is dense in C(X), or when a sequence of kernels whose RKHS tends to be dense (e.g., polynomial kernels with increasing degrees) is used.

In summary, estimating the excess misclassification error R(sgn(f_z)) − R(f_c) consists of three parts: the comparison (12), the regularization error D(C), and the sample error ε(δ, m, C, β) in (20). As functions of the variable C, D(C) decreases while ε(δ, m, C, β) usually increases. Choosing suitable values for the regularization parameter C will give the optimal convergence rate. To this end, we need to consider the tradeoff between these two errors. This can be done by minimizing the right-hand side of (20), as shown in the following form.

Lemma 5 Let p, α, τ > 0. Denote c_{p,α,τ} := (p/τ)^{τ/(τ+p)} + (τ/p)^{p/(τ+p)}. Then for any C > 0,

    C^{−p} + C^τ/m^α ≥ c_{p,α,τ} m^{−αp/(τ+p)}.    (21)

The equality holds if and only if C = (p/τ)^{1/(τ+p)} m^{α/(τ+p)}.

This yields the optimal power m^{−αp/(τ+p)}. The goal of the regularization error estimates is to make p in D(C) = O(C^{−p}) as large as possible. But p ≤ 1 according to Proposition 4. Good methods for sample error estimates provide large α and small τ such that ε(δ, m, C, β) = O(C^τ/m^α). Notice that, as always, both the approximation properties (represented by the exponent p) and the estimation properties (represented by the exponents τ and α) are important for obtaining good estimates of the learning rates (with the optimal rate m^{−αp/(τ+p)}).

2 Demonstrating with Weakly Separable Distributions

With some special cases let us demonstrate how our error analysis yields some guidelines for choosing the regularization parameter C. For weakly separable distributions, we also compare our results with bounds in the literature. To this end, we need the covering number of the unit ball B := {f ∈ H_K: ‖f‖_K ≤ 1} of H_K (considered as a subset of C(X)).

Definition 6 For a subset F of a metric space and η > 0, the covering number N(F, η) is defined to be the minimal integer l ∈ N such that there exist l disks with radius η covering F.

Denote the covering number of B in C(X) by N(B, η). Recall the constant κ̃ defined in (18). Since the algorithm involves an offset term, we also need its covering number, and set

    N(η) := ( (κ + 1)/(κ̃ η) + 1 ) N(B, η).    (22)

Definition 7 We say that the Mercer kernel K has logarithmic complexity exponent s ≥ 1 if for some c > 0, there holds

    log N(η) ≤ c (log(1/η))^s, for all η > 0.    (23)

The kernel K has polynomial complexity exponent s > 0 if

    log N(η) ≤ c (1/η)^s, for all η > 0.    (24)

The covering number N(B, η) has been extensively studied (see, e.g., Bartlett, 1998; Williamson, Smola & Schölkopf, 2001; Zhou, 2002; Zhou, 2003a). In particular, for convolution type kernels K(x, y) = k(x − y) with k̂ ≥ 0 decaying exponentially fast, (23) holds (Zhou, 2002, Theorem 3). As an example, consider the Gaussian kernel K(x, y) = exp{−|x − y|²/σ²} with σ > 0. If X ⊆ [0, 1]^n, then for all sufficiently small η (an explicit range is given in Zhou, 2002), (23) is valid with s = n + 1. A corresponding lower bound (Zhou, 2003a) shows that this upper bound is almost sharp. It was also shown in Zhou (2003a) that K has polynomial complexity exponent 2n/p if K is C^p.

To demonstrate our main results concerning the regularization parameter C, we shall only choose kernels having logarithmic complexity exponents, or polynomial complexity exponents with s small. This means that H_K has small complexity. The special case we consider here is the deterministic case: R(f_c) = 0. Understanding how to find D(C) and ε(δ, m, C, β) in (20), and then how to choose the parameter C, is our target. For M ≥ 0, denote

    θ_M = 0, if M = 0;  θ_M = 1, if M > 0.    (25)

Proposition 8 Suppose R(f_c) = 0. If f_{K,C} ∈ H̄_K satisfies V(y, f_{K,C}(x)) ≤ M almost everywhere, then for every 0 < δ < 1, with confidence at least 1 − δ, we have

    R(sgn(f_z)) ≤ 2 max{ ε*, 4M log(2/δ)/m } + 4 D(C),    (26)

where, with M̃ := √(2C D(C) + 2C ε θ_M), ε* > 0 is the unique solution to the equation

    log N( ε/(q 2^{q+3} M̃) ) − 3mε/2^{q+9} = log(δ/2).    (27)

(a) If (23) is valid, then with c̃ = 2^q c ((q + 4) log 8)^s,

    ε* ≤ c̃ ( (log m + log(C D(C)) + θ_M log C)^s + log(2/δ) )/m.

(b) If (24) is valid and C ≥ 1, then with a constant c̃ (given explicitly in the proof),

    ε* ≤ c̃ log(2/δ) ( (2C D(C))^{s/(2s+2)}/m^{1/(s+1)} + (2C)^{s/(s+2)}/m^{1/(s/2+1)} ).    (28)

Note that the above constant c̃ depends on c, q, and κ, but not on C, m, or δ.

The proof of Proposition 8 will be given in Section 5. We can now derive our error bound for weakly separable distributions.

Definition 9 We say that ρ is (weakly) separable by H̄_K if there is a function f_sp ∈ H̄_K, called a separating function, satisfying ‖f̄_sp‖_K = 1 and y f_sp(x) > 0 almost everywhere. It has separation exponent θ ∈ (0, +∞] if there are positive constants γ and c' such that

    ρ_X{x ∈ X: |f_sp(x)| < γ t} ≤ c' t^θ, for all t > 0.    (29)

Observe that condition (29) with θ = +∞ is equivalent to ρ_X{x ∈ X: |f_sp(x)| < γ t} = 0 for all 0 < t < 1, that is, |f_sp(x)| ≥ γ almost everywhere. Thus, weakly separable distributions with separation exponent θ = +∞ are exactly strictly separable distributions. Recall (e.g., Vapnik, 1998; Shawe-Taylor et al., 1998) that ρ is said to be strictly separable by H̄_K with margin γ > 0 if ρ is (weakly) separable together with the requirement y f_sp(x) ≥ γ almost everywhere.

From the first part of Definition 9, we know that for a separable distribution ρ, the set {x: f_sp(x) = 0} has ρ_X-measure zero, hence lim_{t→0} ρ_X{x ∈ X: |f_sp(x)| < t} = 0. The separation exponent θ measures the asymptotic behavior of ρ near the boundary of the two classes: it quantifies this convergence by a polynomial decay.

Theorem 10 If ρ is separable and has separation exponent θ ∈ (0, +∞] with (29) valid, then for every 0 < δ < 1, with confidence at least 1 − δ, we have

    R(sgn(f_z)) ≤ 2ε* + 8 log(2/δ)/m + 4 (2c')^{2/(θ+2)} (Cγ²)^{−θ/(θ+2)},    (30)

where ε* is solved by

    log N( ε/(q 2^{q+3} √(2(2c')^{2/(θ+2)} γ^{−2θ/(θ+2)} C^{2/(θ+2)} + 2Cε)) ) − 3mε/2^{q+9} = log(δ/2).    (31)

(a) If (23) is satisfied, with a constant c̃ depending on γ we have

    R(sgn(f_z)) ≤ c̃ ( (log(2/δ) + (log m + log C)^s)/m + C^{−θ/(θ+2)} ).    (32)

(b) If (24) is satisfied, there holds

    R(sgn(f_z)) ≤ c̃ log(2/δ) ( C^{s/((s+1)(θ+2))} m^{−1/(s+1)} + C^{s/(s+2)} m^{−1/(s/2+1)} + C^{−θ/(θ+2)} ).    (33)

Proof Choose t = (γ^θ/(2c' C))^{1/(θ+2)} > 0 and the regularizing function f_{K,C} = (1/t) f_sp ∈ H̄_K. Then we have (1/(2C)) ‖f̄_{K,C}‖_K^2 = 1/(2Ct²).

Since y f_sp(x) > 0 almost everywhere, we know that for almost every x ∈ X the label is y = sgn(f_sp)(x). This means f_ρ = sgn(f_sp) and then f_q = sgn(f_sp). It follows that R(f_c) = 0 and V(y, f_{K,C}(x)) = (1 − |f_sp(x)|/t)_+^q ∈ [0, 1] almost everywhere. Therefore, we may take M = 1 in Proposition 8. Moreover,

    E(f_{K,C}) = ∫_Z (1 − y f_sp(x)/t)_+^q dρ = ∫_X (1 − |f_sp(x)|/t)_+^q dρ_X.

This can be bounded by (29) as

    E(f_{K,C}) ≤ ρ_X{x ∈ X: |f_sp(x)| < t} ≤ c' (t/γ)^θ.

It follows from the choice of t that

    D(C) ≤ c' (t/γ)^θ + 1/(2Ct²) = (2c')^{2/(θ+2)} (Cγ²)^{−θ/(θ+2)}.    (34)

Then our conclusion follows from Proposition 8.

In case (a), the bound (32) tells us to choose C such that (log C)^s/m → 0 and C → ∞ as m → ∞. By (32) a reasonable choice of the regularization parameter is C = m^{(θ+2)/θ}, which yields R(sgn(f_z)) = O((log m)^s/m). In case (b), when (24) is valid, the choice C = m^{(θ+2)/(θ+s+θs)} gives

    R(sgn(f_z)) = O( m^{−θ/(θ+s+θs)} ).    (35)

The following is a typical example of a separable distribution which is not strictly separable. In this example, the separation exponent is θ = 1.

Example 1 Let X = [−1/2, 1/2] and let ρ be the Borel probability measure on Z such that ρ_X is the Lebesgue measure on X and

    f_ρ(x) = 1, for 1/4 ≤ |x| ≤ 1/2;  f_ρ(x) = −1, for |x| < 1/4.

(a) If K is the linear polynomial kernel K(x, y) = xy, then R(sgn(f_z)) − R(f_c) ≥ 1/4.

(b) If K is the quadratic polynomial kernel K(x, y) = (xy)², then with confidence 1 − δ,

    R(sgn(f_z)) ≤ q(q + 4) 2^{q+11} (log m + log C + 2 log(2/δ))/m + 16 C^{−1/3}.

Hence one should take C such that log C/m → 0 and C → ∞ as m → ∞.
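The separation structure of Example 1 can be checked numerically. The sketch below verifies, by sampling and by a grid estimate of Lebesgue measure, that the separating function f_sp(x) = x² − 1/16 used in the proof satisfies y f_sp(x) > 0 away from the boundary points |x| = 1/4, and that the measure of {x: |f_sp(x)| < t} is O(t), i.e., the separation exponent is θ = 1 (the constant 8√2 from the proof is used with a small numerical slack):

```python
import random

random.seed(0)

def f_rho(x):      # Example 1: +1 on 1/4 <= |x| <= 1/2, -1 on |x| < 1/4
    return 1 if abs(x) >= 0.25 else -1

def f_sp(x):       # separating function from the proof of Example 1
    return x * x - 1.0 / 16

# y f_sp(x) > 0 almost everywhere (the boundary |x| = 1/4 has measure zero)
for _ in range(10_000):
    x = random.uniform(-0.5, 0.5)
    if abs(abs(x) - 0.25) > 1e-9:
        assert f_rho(x) * f_sp(x) > 0

# grid estimate of the Lebesgue measure of {x : |f_sp(x)| < t}, compared with c' t
for t in (0.001, 0.01, 0.05):
    n = 200_000
    hits = sum(1 for i in range(n) if abs(f_sp(-0.5 + i / n)) < t)
    measure = hits / n
    assert measure <= 8 * 2 ** 0.5 * t + 0.01   # (29) with theta = 1, gamma = 1
```

For small t the measured value behaves like 8t, comfortably below the bound c' t with c' = 8√2, which is why θ = 1 and hence the rate C^{−1/3} appear in part (b).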

Proof The first statement is trivial. To see (b), we note that dim H_K = 1 and κ = 1/4. Also, E_0 = inf_{b ∈ R} E(b) = 1. Hence κ̃ = 1/(1 + q 4^{q−2}). Since H_K = {a x²: a ∈ R} and ‖a x²‖_K = |a|, we have N(B, η) ≤ 1/(2η). Then (23) holds with s = 1 and c = 4q. By Proposition 8, we see that

    ε* ≤ q(q + 4) 2^{q+11} (log(mC) + log(2/δ))/m.

Take the function f_sp(x) = x² − 1/16 ∈ H̄_K with ‖f̄_sp‖_K = 1. We see that y f_sp(x) > 0 almost everywhere. Moreover,

    {x ∈ X: |f_sp(x)| < t} ⊆ [−√(1/16 + t), −√((1/16 − t)_+)] ∪ [√((1/16 − t)_+), √(1/16 + t)].

The measure of this set is bounded by 8√2 t. Hence (29) holds with θ = 1, γ = 1 and c' = 8√2. Then Theorem 10 gives the stated bound for R(sgn(f_z)).

In this paper, for the sample error estimates we only use the (uniform) covering numbers in C(X). Within the last few years, the empirical covering numbers (in l^∞ or l²), the leave-one-out error or stability analysis (e.g., Vapnik, 1998; Bousquet & Elisseeff, 2002; Zhang, 2004), and some other advanced empirical process techniques such as the local Rademacher averages (van der Vaart & Wellner, 1996; Bartlett, Bousquet & Mendelson, 2004, and references therein) and the entropy integrals (van der Vaart & Wellner, 1996) have been developed to get better sample error estimates. These techniques can be applied to various learning algorithms. They are powerful enough to handle general hypothesis spaces, even those with large capacity.

In Zhang (2004) the leave-one-out technique was applied to improve the sample error estimates given in Bousquet & Elisseeff (2002): the sample error has a kernel-independent bound O(C/m), improving the bound O(C/√m) in Bousquet & Elisseeff (2002), while the regularization error D̃(C) depends on K and ρ. In particular, for q = 2 the bound (Zhang, 2004, Corollary 4.2) takes the form

    E(E(f_z)) ≤ (1 + 4κ²C/m)² inf_{f ∈ H_K} { E(f) + (1/(2C)) ‖f‖_K^2 } = (1 + 4κ²C/m)² ( D̃(C) + E(f_q) ).    (36)

In Bartlett, Jordan & McAuliffe (2003) empirical process techniques are used to improve the sample error estimates for ERM. In particular, Theorem 12 there states that for a convex set F of functions on X, the minimizer f̂ of the empirical error over F satisfies

    E(f̂) ≤ inf_{f ∈ F} E(f) + K̄ max{ ε*_cr, (L² log(1/δ)/m)^{1/(2−β)}, B L log(1/δ)/m }.    (37)

Here K̄ and β are constants, and ε*_cr is solved by an inequality. The constant B is a bound for differences, and L is the Lipschitz constant for the loss φ with respect to a pseudometric on R. For more details, see Bartlett, Jordan & McAuliffe (2003). The definitions of B and L together tell us that

    |φ(y_1 f(x_1)) − φ(y_2 f(x_2))| ≤ LB, for all (x_1, y_1), (x_2, y_2) ∈ Z, f ∈ F.    (38)

Because of our improvement of the bound for the random variables V(y, f(x)) given by the projection operator, for the algorithm (3) the error bounds we derive here are better

than existing results when the kernel has small complexity. Let us confirm this for separable distributions with separation exponent 0 < θ < ∞ and for kernels with polynomial complexity exponent s > 0 satisfying (24). Recall that D̃(C) = O(C^{−θ/(θ+2)}). Take q = 2.

Consider the estimate (36). This together with (34) tells us that the derived bound for E(E(f_z) − E(f_q)) is at least of the order

    D̃(C) + (C/m) D̃(C) = O( C^{−θ/(θ+2)} + C^{2/(θ+2)}/m ).

By Lemma 5, the optimal bound derived from (36) is O(m^{−θ/(θ+2)}). Thus, our bound (35) is better than (36) when 0 < s < 2/(θ+1).

Turn to the estimate (37). Taking F to be the ball of radius √(2C D̃(C)) of H_K (the expected smallest ball in which f̄_z lies), we find that the constant LB in (38) should be at least κ√(2C D̃(C)). This tells us that one term in the bound (37) for the sample error is at least O(√(2C D̃(C))/m) = O(C^{1/(θ+2)}/m), while the other two terms are more involved. Applying Lemma 5 again, we find that the bound derived from (37) is at least O(m^{−θ/(θ+1)}). So our bound (35) is better than (37) at least for 0 < s < 1/(θ+1).

Thus, our analysis with the help of the projection operator improves existing error bounds when the kernel has small complexity. Note that in (35), the values of s for which the projection operator gives an improvement are values corresponding to rapidly diminishing regularization (C = m^β with β > 1 large).

For kernels with large complexity, refined empirical process techniques (e.g., van der Vaart & Wellner, 1996; Mendelson, 2002; Zhang, 2004; Bartlett, Jordan & McAuliffe, 2003) should give better bounds: the capacity of the hypothesis space may have more influence on the sample error than the bound of the random variable V(y, f(x)), and the entropy integral is powerful in reducing this influence. To find the range of this large complexity, one needs an explicit rate analysis for both the sample error and the regularization error. This is outside our scope. It would be interesting to get better error bounds by combining the ideas of empirical process techniques and the projection operator.

Problem 11 How much improvement can we get for the total error when we apply the projection operator to empirical process techniques?

Problem 12 Given θ > 0, s > 0, q ≥ 1, and 0 < δ < 1, what is the largest number α > 0 such that R(sgn(f_z)) = O(m^{−α}) with confidence 1 − δ whenever the distribution ρ is separable with separation exponent θ and the kernel K satisfies (24)? What is the corresponding optimal choice of the regularization parameter C?

In particular, for Example 1 we have the following.

Conjecture 13 In Example 1, with confidence 1 − δ there holds R(sgn(f_z)) = O(log(2/δ)/m) by choosing C = m³.

Consider the well-known setting (Vapnik, 1998; Shawe-Taylor et al., 1998; Steinwart, 2001; Cristianini & Shawe-Taylor, 2000) of strictly separable distributions with margin γ > 0. In this case, Shawe-Taylor et al. (1998) show that

    R(sgn(f)) ≤ (2/m) ( log N(F, 2m, γ/2) + log(2/δ) )    (39)

with confidence 1 − δ for m > 2/γ, whenever f ∈ F satisfies y_i f(x_i) ≥ γ. Here F is a function set, N(F, 2m, γ/2) = sup_{t ∈ X^{2m}} N(F, t, γ/2), and N(F, t, γ/2) denotes the covering number of the set {(f(t_i))_{i=1}^{2m}: f ∈ F} in R^{2m} with the l^∞ metric. For a comparison of this covering number with that in C(X), see Pontil (2003).

In our analysis, we can take f_{K,C} = (1/γ) f_sp ∈ H̄_K. Then M = V(y, f_{K,C}(x)) = 0 and D(C) = 1/(2Cγ²). We see from Proposition 8(a) that when (23) is satisfied, there holds

    R(f_z) ≤ c̃ ( (log(1/γ) + log m)^s + log(1/δ) )/m.

But the optimal bound O(1/m) is also valid for spaces with larger complexity, which can be seen from (39) by the relation between the covering numbers: N(F, 2m, γ/2) ≤ N(F, γ/2). See also Steinwart (2001). Here we see the shortcoming of our approach for kernels with large complexity, which confirms the interest of Problem 11 raised above.

3 Comparison of Errors

In this section we consider how to bound the misclassification error by the V-risk. A systematic study of this problem was done for convex loss functions by Zhang (2004), and for more general loss functions by Bartlett, Jordan, and McAuliffe (2003). Using these results, it can be shown that for the loss function V_q there holds

    ψ( R(sgn(f)) − R(f_c) ) ≤ E(f) − E(f_q),

where ψ: [0, 1] → R_+ is a function defined in Bartlett, Jordan, and McAuliffe (2003) which can be explicitly computed. In fact, we have

    R(sgn(f)) − R(f_c) ≤ c √(E(f) − E(f_q)).    (40)

Such a comparison of errors holds true even for a general convex loss function; see Theorem 34 in the Appendix. The derived bound for the constant c in (40) need not be optimal. In the following we shall give an optimal estimate in a simpler form. The constant we derive depends on q and is given by

    C_q = 1, if 1 ≤ q ≤ 2;  C_q = 2(q − 1)/q, if q > 2.    (41)

We can see that C_q ≤ 2.

Theorem 14 Let f: X → R be measurable. Then

    R(sgn(f)) − R(f_c) ≤ √( C_q (E(f) − E(f_q)) ) ≤ √( 2 (E(f) − E(f_q)) ).

Proof By the definition, only points with sgn(f)(x) ≠ f_c(x) contribute to the excess misclassification error. Hence

    R(sgn(f)) − R(f_c) = ∫_X |f_ρ(x)| χ_{{sgn(f)(x) ≠ sgn(f_q)(x)}} dρ_X.    (42)

This in connection with the Cauchy–Schwarz inequality implies that

    R(sgn(f)) − R(f_c) ≤ ( ∫_X |f_ρ(x)|² χ_{{sgn(f)(x) ≠ sgn(f_q)(x)}} dρ_X )^{1/2}.

Thus it is sufficient to show, for those x ∈ X with sgn(f)(x) ≠ sgn(f_q)(x),

    |f_ρ(x)|² ≤ C_q ( E(f | x) − E(f_q | x) ),    (43)

where for x ∈ X we have denoted

    E(f | x) := ∫_Y V_q(y, f(x)) dρ(y | x).    (44)

By the definition of the loss function V_q, we have

    E(f | x) = (1 − f(x))_+^q (1 + f_ρ(x))/2 + (1 + f(x))_+^q (1 − f_ρ(x))/2.    (45)

It follows that

    E(f | x) − E(f_q | x) = ∫_0^{f(x) − f_q(x)} F(u) du,    (46)

where F(u) is the function (depending on the parameter f_ρ(x))

    F(u) = (1 − f_ρ(x))/2 · q (1 + f_q(x) + u)_+^{q−1} − (1 + f_ρ(x))/2 · q (1 − f_q(x) − u)_+^{q−1}, u ∈ R.

Since F(u) is nondecreasing and F(0) = 0, we see from (46) that when sgn(f)(x) ≠ sgn(f_q)(x), there holds

    E(f | x) − E(f_q | x) ≥ ∫_0^{−f_q(x)} F(u) du = E(0 | x) − E(f_q | x) = 1 − E(f_q | x).    (47)

But

    E(f_q | x) = 2^{q−1} (1 − f_ρ(x)²) / [ (1 + f_ρ(x))^{1/(q−1)} + (1 − f_ρ(x))^{1/(q−1)} ]^{q−1}.    (48)

Therefore, (43) is valid (with t = f_ρ(x)) once the following inequality is verified:

    t² ≤ C_q ( 1 − 2^{q−1}(1 − t²) / [ (1 + t)^{1/(q−1)} + (1 − t)^{1/(q−1)} ]^{q−1} ), t ∈ [−1, 1].

This is the same as the inequality

    (1 − t²/C_q)^{−1/(q−1)} ≤ [ (1 + t)^{−1/(q−1)} + (1 − t)^{−1/(q−1)} ] / 2, t ∈ (−1, 1).    (49)

To prove (49), we use the Taylor expansion

    (1 + u)^α = Σ_{k=0}^∞ (α choose k) u^k = Σ_{k=0}^∞ [ ∏_{l=0}^{k−1} (α − l) / k! ] u^k, u ∈ (−1, 1).
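Before completing the expansion, the key inequality (49) together with the constant C_q of (41) can be spot-checked numerically on a grid (a sanity check, not part of the proof; the grid and the sampled values of q are arbitrary choices):

```python
# Grid check of (49): (1 - t^2/C_q)^{-1/(q-1)} <= [(1+t)^{-1/(q-1)} + (1-t)^{-1/(q-1)}]/2
# on (-1, 1), with C_q from (41).  Note the inequality is tight near t = 0 for q > 2.
def C_q(q):
    return 1.0 if q <= 2 else 2 * (q - 1) / q

for q in (1.2, 1.5, 2.0, 3.0, 5.0, 10.0):
    e = 1 / (q - 1)
    for k in range(-999, 1000):
        t = k / 1000
        lhs = (1 - t * t / C_q(q)) ** (-e)
        rhs = ((1 + t) ** (-e) + (1 - t) ** (-e)) / 2
        assert lhs <= rhs * (1 + 1e-9)
```

For q = 2 both sides equal 1/(1 − t²), so the inequality is an identity, matching C_2 = 1; for q > 2 the two sides share the same t² Taylor coefficient, which is why C_q = 2(q − 1)/q is the optimal constant.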

With α = −1/(q − 1), we have

    (1 − t²/C_q)^{−1/(q−1)} = Σ_{k=0}^∞ [ ∏_{l=0}^{k−1} (1/(q−1) + l) / k! ] (1/C_q^k) t^{2k},

and (all the odd power terms vanish)

    [ (1 + t)^{−1/(q−1)} + (1 − t)^{−1/(q−1)} ] / 2 = Σ_{k=0}^∞ [ ∏_{l=0}^{k−1} (1/(q−1) + l) / k! ] [ ∏_{l=0}^{k−1} (1/(q−1) + l + k) / (1 + l + k) ] t^{2k}.

Note that

    ∏_{l=0}^{k−1} (1/(q−1) + l + k) / (1 + l + k) ≥ 1/C_q^k.

This proves (49) and hence our conclusion.

When R(f_c) = 0, the estimate in Theorem 14 can be improved (Zhang, 2004; Bartlett, Jordan & McAuliffe, 2003) to

    R(sgn(f)) ≤ E(f).    (50)

In fact, R(f_c) = 0 implies |f_ρ(x)| = 1 almost everywhere. This in connection with (48) and (47) gives E(f_q | x) = 0 and E(f | x) ≥ 1 when sgn(f)(x) ≠ f_c(x). Then (43) can be improved to the form |f_ρ(x)| ≤ E(f | x). Hence (50) follows from (42).

Theorem 14 tells us that the misclassification error can be bounded by the V-risk associated with the loss function V. So our next step is to study the convergence of the q-classifier with respect to the V-risk E.

4 Bounding the Offset

If the offset b is fixed in the scheme (3), the sample error can be bounded by a standard argument using some measurements of the capacity of the RKHS. However, the offset is part of the optimization problem, and even its boundedness cannot be seen from the definition. This makes our setting essentially different from the standard Tikhonov regularization scheme. The difficulty of bounding the offset b has been realized in the literature (e.g., Steinwart, 2002; Bousquet & Elisseeff, 2002). In this paper we shall overcome this difficulty by means of special features of the loss function V.

The difficulty raised by the offset can also be seen from the stability analysis (Bousquet & Elisseeff, 2002). As shown in Bousquet & Elisseeff (2002), the SVM 1-norm classifier without the offset is uniformly stable, meaning that sup_{z ∈ Z^m, z_0 ∈ Z} ‖V(y, f_z(x)) − V(y, f_{z'}(x))‖_∞ ≤ β_m with β_m → 0 as m → ∞, where z' is the same as z except that one element of z is replaced by z_0. The SVM 1-norm classifier with the offset is not uniformly stable. To see this, we choose x_0 ∈ X and samples z = (x_0, y_i)_{i=1}^{2n+1} with y_i = 1 for i = 1, ..., n + 1, and y_i = −1 for i = n + 2, ..., 2n + 1. Take z_0 = (x_0, −1). As the x_i are identical, one can see from the definition

15 SVM Soft Margin Classifiers (9) that f z = 0 (since E z (f z ) = E z (b z + f z (x 0 ))) It follows that f z = 1 while f z = 1 Thus, f z f z = 2 which does not converge to zero as = 2n + 1 tends to infinity It is unknown whether the q-nor classifier is uniforly stable for q > 1 How to bound the offset is the ain goal of this section In Wu & Zhou (2004) a direct coputation is used to realize this point for q = 1 Here the index q > 1 akes a direct coputation very difficult, and we shall use two bounds to overcoe this difficulty By x (X, ρ X ) we ean that x lies in the support of the easure ρ X on X Lea 15 For any C > 0, N and z Z, a iniizer of (9) satisfies and a iniizer of (15) satisfies in f z(x i ) 1 and ax f z(x i ) 1 (51) 1 i 1 i inf f K,C (x) 1 and sup f K,C (x) 1 (52) x (X,ρ X ) x (X,ρ X ) Proof Suppose a iniizer of (9) f z satisfies r := in 1 i f z(x i ) > 1 Then f z (x i ) (r 1) 1 for each i We clai that y i = 1, i = 1,, In fact, if the set I := i 1,, : y i = 1 is not epty, we have E z (f z (r 1)) = 1 (1 + f z (x i ) (r 1)) q < 1 (1 + f z (x i )) q = E z (f z ), i I which is a contradiction to the definition of f z Hence our clai is verified Fro the clai we see that E z (f z (r 1)) = 0 = E z (f z ) This tells us that f z := f z (r 1) is a iniizer of (9) satisfying the first inequality, hence both inequalities of (51) In the sae way, if a iniizer of (9) f z satisfies r := ax f z(x i ) < 1 Then we 1 i can see that y i = 1 for each i Hence E z (f z r 1) = E z (f z ) and f z := f z r 1 is a iniizer of (9) satisfying the second inequality and hence both inequalities of (51) Therefore, we can always find a iniizer of (9) satisfying (51) We prove the second stateent in the sae way Suppose r := i I inf x (X,ρ X ) f K,C (x) > 1 for a iniizer f K,C of (15) Then f K,C (x) (r 1) 1 for alost every x (X, ρ X ) Hence ( ) ( E fk,c (r 1) = 1 + f q K,C (x) (r 1)) P (Y = 1 x)dρx E( f K,C ) X As f K,C is a iniizer of (15), the above equality ust hold It follows that P (Y = 1 
$x) = 0$ for almost every $x \in (X, \rho_X)$. Hence $F_{K,C} := f_{K,C} - (r-1)$ is a minimizer of (15) satisfying the first inequality and thereby both inequalities of (52).

Similarly, if $r := \sup_{x\in(X,\rho_X)} f_{K,C}(x) < -1$ for a minimizer $f_{K,C}$ of (15), then $P(Y = 1 \mid x) = 0$ for almost every $x \in (X, \rho_X)$. Hence $F_{K,C} := f_{K,C} - r - 1$ is a minimizer of (15) satisfying the second inequality and thereby both inequalities of (52).
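The shift argument in the proof of Lemma 15 is easy to probe numerically: if every sample value of $f_z$ exceeds $1$ and all labels are $+1$, subtracting $r - 1$ leaves the empirical $q$-norm error unchanged (both are zero), while with a negative label present the shift strictly decreases the error, which is the contradiction used in the proof. A small sketch with illustrative values (not data from the paper):

```python
def emp_error(fvals, labels, q):
    """Empirical q-norm error E_z(f) = (1/m) * sum_i (1 - y_i f(x_i))_+^q."""
    m = len(fvals)
    return sum(max(0.0, 1.0 - y * f) ** q for f, y in zip(fvals, labels)) / m

q = 1.5
fvals = [1.7, 2.3, 3.1]      # min_i f_z(x_i) = r = 1.7 > 1
labels = [1, 1, 1]           # all labels +1, as the claim in the proof forces
r = min(fvals)

before = emp_error(fvals, labels, q)
after = emp_error([f - (r - 1) for f in fvals], labels, q)
# Both errors are zero: the shifted minimizer attains min = 1, so (51) holds.
assert before == 0.0 and after == 0.0

# With a negative label present, shifting down strictly decreases the error.
labels2 = [1, -1, 1]
assert emp_error([f - (r - 1) for f in fvals], labels2, q) < emp_error(fvals, labels2, q)
```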

Chen, Wu, Ying, and Zhou

Thus, (52) can always be realized by a minimizer of (15). In what follows we always choose $f_z$ and $f_{K,C}$ to satisfy (51) and (52), respectively.

Lemma 15 yields bounds for the $H_K$-norm and the offset of $f_z$ and $f_{K,C}$. Denote $f_{K,C} = \tilde f_{K,C} + b_{K,C}$.

Lemma 16 For any $C > 0$, $m \in \mathbb N$, $\tilde f_{K,C} \in H_K$, and $z \in Z^m$, there hold

(a) $\|\tilde f_{K,C}\|_K \le \sqrt{2C\,\mathcal D(C)} \le \sqrt{2C}$ and $|b_{K,C}| \le 1 + \|\tilde f_{K,C}\|_\infty$;

(b) $\|f_{K,C}\|_\infty \le 1 + 2\kappa\sqrt{2C\,\mathcal D(C)} \le 1 + 2\kappa\sqrt{2C}$ and $\mathcal E(f_{K,C}) \le \mathcal E(f_q) + \mathcal D(C) \le 1$;

(c) $|b_z| \le 1 + \kappa\|\tilde f_z\|_K$.

Proof By the definition (15), we see from the choice $f = 0$ that

$$\mathcal D(C) = \mathcal E(f_{K,C}) - \mathcal E(f_q) + \frac{1}{2C}\|\tilde f_{K,C}\|_K^2 \le 1 - \mathcal E(f_q) \le 1. \qquad (53)$$

Then the first inequality in (a) follows. Note that (52) gives

$$-1 \le \sup_{x\in(X,\rho_X)} f_{K,C}(x) \le b_{K,C} + \|\tilde f_{K,C}\|_\infty \quad\text{and}\quad b_{K,C} - \|\tilde f_{K,C}\|_\infty \le \inf_{x\in(X,\rho_X)} f_{K,C}(x) \le 1.$$

Thus, $|b_{K,C}| \le 1 + \|\tilde f_{K,C}\|_\infty$. This proves the second inequality in (a). Since the first inequality in (a) and (2) lead to $\|\tilde f_{K,C}\|_\infty \le \kappa\sqrt{2C\,\mathcal D(C)}$, we obtain

$$\|f_{K,C}\|_\infty \le |b_{K,C}| + \|\tilde f_{K,C}\|_\infty \le 1 + 2\kappa\sqrt{2C\,\mathcal D(C)}.$$

Hence the first inequality in (b) holds. The second inequality in (b) is an easy consequence of (53). The inequality in (c) follows from (51) in the same way as the proof of the second inequality in (a).

5 Convergence of the q-Norm Soft Margin Classifier

In this section, we apply the ERM technique to analyze the convergence of the $q$-classifier for $q > 1$. The situation here is more complicated than that for $q = 1$. We need the following lemma concerning $q > 1$.

Lemma 17 For $q > 1$, there holds

$$(x)_+^q - (y)_+^q \le q\,\bigl(\max\{x, y\}\bigr)_+^{q-1}\,|x - y|, \qquad x, y \in \mathbb R.$$

If $y \in [-1, 1]$, then

$$(1-x)_+^q - (1-y)_+^q \le q\,4^{q-1}|x - y| + q\,2^{q-1}|x - y|^q, \qquad x \in \mathbb R.$$

Proof We only need to prove the first inequality for $x > y$. This is trivial:

$$(x)_+^q - (y)_+^q = \int_y^x q\,(u)_+^{q-1}\,du \le q\,(x)_+^{q-1}(x - y).$$

If $y \in [-1, 1]$, then $1 - y \ge 0$ and the first inequality yields

$$(1-x)_+^q - (1-y)_+^q \le q\,\bigl(\max\{1-x,\ 1-y\}\bigr)_+^{q-1}|x - y|. \qquad (54)$$

When $x \ge 2y - 1$, we have $1 - x \le 2(1-y)$, and hence

$$(1-x)_+^q - (1-y)_+^q \le q\,2^{q-1}(1-y)^{q-1}|x - y| \le q\,4^{q-1}|x - y|.$$

When $x < 2y - 1$, we have $1 - x < 2(y - x)$ and $x < 1$. This in combination with $\max\{1-x, 1-y\} \le \max\{1-x, 2\}$ and (54) implies

$$(1-x)_+^q - (1-y)_+^q \le q(1-x)^{q-1}|x - y| + q\,2^{q-1}|x-y| \le q\,2^{q-1}|x - y|^q + q\,2^{q-1}|x - y|.$$

This proves Lemma 17.

The second part of Lemma 17 will be used in Section 6. The first part can be used to verify Proposition 4.

Proof of Proposition 4 It is trivial that $\mathcal D(C) \le \widetilde{\mathcal D}(C)$. To show the second inequality, apply the first inequality of Lemma 17 to the two numbers $1 - y f_{K,C}(x)$ and $1 - y b_{K,C}$. We see that

$$\bigl(1 - y f_{K,C}(x)\bigr)_+^q - \bigl(1 - y b_{K,C}\bigr)_+^q \le q\bigl(1 + |\tilde f_{K,C}(x)| + |b_{K,C}|\bigr)^{q-1}\,|\tilde f_{K,C}(x)|.$$

Notice that $|\tilde f_{K,C}(x)| \le \kappa\|\tilde f_{K,C}\|_K$ and, by Lemma 16, $|b_{K,C}| \le 1 + \kappa\|\tilde f_{K,C}\|_K$. Hence

$$\mathcal E(f_{K,C}) - \mathcal E(b_{K,C}) \le \kappa q 2^{q-1}\bigl(1 + \kappa\|\tilde f_{K,C}\|_K\bigr)^{q-1}\|\tilde f_{K,C}\|_K.$$

It follows that when $\|\tilde f_{K,C}\|_K \le \tilde\kappa$, we have

$$\mathcal E(f_{K,C}) - \mathcal E(f_q) \ge \mathcal E_0 - \kappa q 2^{q-1}(1 + \kappa\tilde\kappa)^{q-1}\tilde\kappa.$$

Since $\mathcal E_0 \le 1$, the definition (18) of $\tilde\kappa$ yields $\tilde\kappa \le \frac{1}{1+\kappa}\min\{1, 1/\kappa\}$. Hence

$$\mathcal D(C) \ge \mathcal E(f_{K,C}) - \mathcal E(f_q) \ge \mathcal E_0 - \kappa q 4^{q-1}\tilde\kappa \ge \frac{\mathcal E_0}{2} \ge \frac{\tilde\kappa^2}{2}.$$

As $C \ge 1/2$, we conclude (17) in this case. When $\|\tilde f_{K,C}\|_K > \tilde\kappa$, we also have

$$\mathcal D(C) \ge \frac{1}{2C}\|\tilde f_{K,C}\|_K^2 \ge \frac{\tilde\kappa^2}{2C}.$$

Thus in both cases we have verified (17).
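Both inequalities of Lemma 17 are easy to check numerically. The following sketch probes them on a grid of points (a sanity check for the stated constants, not a proof):

```python
def plus(t):
    """The positive part (t)_+."""
    return max(t, 0.0)

def check_lemma17(q, xs, ys):
    for x in xs:
        for y in ys:
            # First inequality: |(x)_+^q - (y)_+^q| <= q * (max{x,y})_+^{q-1} * |x - y|.
            lhs = abs(plus(x) ** q - plus(y) ** q)
            rhs = q * plus(max(x, y)) ** (q - 1) * abs(x - y)
            assert lhs <= rhs + 1e-12
            # Second inequality (requires y in [-1, 1]):
            # (1-x)_+^q - (1-y)_+^q <= q 4^{q-1}|x-y| + q 2^{q-1}|x-y|^q.
            if -1 <= y <= 1:
                lhs2 = plus(1 - x) ** q - plus(1 - y) ** q
                rhs2 = q * 4 ** (q - 1) * abs(x - y) + q * 2 ** (q - 1) * abs(x - y) ** q
                assert lhs2 <= rhs2 + 1e-12

grid = [i / 4 for i in range(-12, 13)]   # points in [-3, 3]
for q in (1.5, 2.0, 3.0):
    check_lemma17(q, grid, grid)
```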

Note that $\tilde\kappa = 0$ if and only if $\mathcal E_0 = 0$. This means that for some $b_0 \in [-1, 1]$, $f_q(x) = b_0$ in probability. By the definition of $f_q$, the last assertion of Proposition 4 follows.

The loss function $V$ is not Lipschitz, but Lemma 17 enables us to derive a bound for $\mathcal E(\pi(f_z)) - \mathcal E_z(\pi(f_z))$ with confidence, as done for Lipschitz loss functions in Mukherjee, Rifkin & Poggio (2002). Since the function $\pi(f_z)$ changes with $z$ and lies in a set of functions, we shall compare the error with the empirical error for functions from a set

$$\mathcal F := \bigl\{\pi(f) : f \in \mathcal B_R + [-B, B]\bigr\}. \qquad (55)$$

Here $\mathcal B_R = \{f \in H_K : \|f\|_K \le R\}$ and the constant $B$ is a bound for the offset. The following probability inequality was motivated by sample error estimates for the square loss (Barron 1990, Bartlett 1998, Cucker & Smale 2001, Lee, Bartlett & Williamson 1998) and will be used in our estimates.

Lemma 18 Suppose a random variable $\xi$ satisfies $0 \le \xi \le M$, and $z = (z_i)_{i=1}^m$ are independent samples. Let $\mu = E(\xi)$. Then for every $\varepsilon > 0$ and $0 < \alpha \le 1$, there holds

$$\mathrm{Prob}_{z\in Z^m}\left\{\frac{\mu - \frac1m\sum_{i=1}^m \xi(z_i)}{\sqrt{\mu + \varepsilon}} \ge \alpha\sqrt{\varepsilon}\right\} \le \exp\left\{-\frac{3\alpha^2 m\varepsilon}{8M}\right\}.$$

Proof As $\xi$ satisfies $|\xi - \mu| \le M$, the one-sided Bernstein inequality tells us that

$$\mathrm{Prob}_{z\in Z^m}\left\{\frac{\mu - \frac1m\sum_{i=1}^m \xi(z_i)}{\sqrt{\mu + \varepsilon}} \ge \alpha\sqrt{\varepsilon}\right\} \le \exp\left\{-\frac{m\alpha^2(\mu + \varepsilon)\varepsilon}{2\bigl(\sigma^2 + \frac13 M\alpha\sqrt{\mu + \varepsilon}\sqrt{\varepsilon}\bigr)}\right\}.$$

Here $\sigma^2 \le E(\xi^2) \le M E(\xi) = M\mu$ since $0 \le \xi \le M$. Then we find that

$$\sigma^2 + \frac13 M\alpha\sqrt{\mu + \varepsilon}\sqrt{\varepsilon} \le M\mu + \frac{M(\mu + \varepsilon)}{3} \le \frac{4M(\mu + \varepsilon)}{3}.$$

This yields the desired inequality.

Now we can turn to the error bound involving a function set. Lemma 17 yields the following bounds concerning the loss function $V$:

$$|\mathcal E_z(f) - \mathcal E_z(g)| \le q\max\bigl\{(1 + \|f\|_\infty)^{q-1}, (1 + \|g\|_\infty)^{q-1}\bigr\}\|f - g\|_\infty \qquad (56)$$

and

$$|\mathcal E(f) - \mathcal E(g)| \le q\max\bigl\{(1 + \|f\|_\infty)^{q-1}, (1 + \|g\|_\infty)^{q-1}\bigr\}\|f - g\|_\infty. \qquad (57)$$

Lemma 19 Let $\mathcal F$ be a subset of $C(X)$ such that $\|f\|_\infty \le 1$ for each $f \in \mathcal F$. Then for every $\varepsilon > 0$ and $0 < \alpha \le 1$, we have

$$\mathrm{Prob}_{z\in Z^m}\left\{\sup_{f\in\mathcal F}\frac{\mathcal E(f) - \mathcal E_z(f)}{\sqrt{\mathcal E(f) + \varepsilon}} \ge 4\alpha\sqrt{\varepsilon}\right\} \le \mathcal N\left(\mathcal F, \frac{\alpha\varepsilon}{q2^{q-1}}\right)\exp\left\{-\frac{3\alpha^2 m\varepsilon}{2^{q+3}}\right\}.$$
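Lemma 18 can be illustrated by Monte Carlo simulation: for a bounded random variable, the empirical frequency of the deviation event should not exceed the stated bound $\exp\{-3\alpha^2 m\varepsilon/(8M)\}$. The parameters below are illustrative choices, not values from the paper:

```python
import math
import random

random.seed(0)

M, mu = 1.0, 0.3            # xi ~ Bernoulli(0.3), so 0 <= xi <= M = 1 and E(xi) = 0.3
m, eps, alpha = 200, 0.05, 0.5
trials = 5000

hits = 0
for _ in range(trials):
    avg = sum(1.0 if random.random() < mu else 0.0 for _ in range(m)) / m
    # Deviation event of Lemma 18: (mu - avg) / sqrt(mu + eps) >= alpha * sqrt(eps).
    if (mu - avg) / math.sqrt(mu + eps) >= alpha * math.sqrt(eps):
        hits += 1

bound = math.exp(-3 * alpha ** 2 * m * eps / (8 * M))
# The observed tail frequency stays below the Bernstein-type bound.
assert hits / trials <= bound
```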

Proof Let $\{f_j\}_{j=1}^J \subset \mathcal F$ with $J = \mathcal N\bigl(\mathcal F, \frac{\alpha\varepsilon}{q2^{q-1}}\bigr)$ be such that $\mathcal F$ is covered by balls centered at $f_j$ with radius $\frac{\alpha\varepsilon}{q2^{q-1}}$. Note that $\xi = V(y, f(x))$ satisfies $0 \le \xi \le (1 + \|f\|_\infty)^q \le 2^q$ for $f \in \mathcal F$. Then for each $j$, Lemma 18 tells us that

$$\mathrm{Prob}_{z\in Z^m}\left\{\frac{\mathcal E(f_j) - \mathcal E_z(f_j)}{\sqrt{\mathcal E(f_j) + \varepsilon}} \ge \alpha\sqrt{\varepsilon}\right\} \le \exp\left\{-\frac{3\alpha^2 m\varepsilon}{8\cdot 2^q}\right\}.$$

For each $f \in \mathcal F$, there is some $j$ such that $\|f - f_j\|_\infty \le \frac{\alpha\varepsilon}{q2^{q-1}}$. This in connection with (56) and (57) tells us that $|\mathcal E_z(f) - \mathcal E_z(f_j)|$ and $|\mathcal E(f) - \mathcal E(f_j)|$ are both bounded by $\alpha\varepsilon$. Hence

$$\mathcal E_z(f_j) - \mathcal E_z(f) \le \alpha\sqrt{\varepsilon}\sqrt{\mathcal E(f) + \varepsilon} \quad\text{and}\quad \mathcal E(f) - \mathcal E(f_j) \le \alpha\sqrt{\varepsilon}\sqrt{\mathcal E(f) + \varepsilon}.$$

The latter implies that $\mathcal E(f) + \varepsilon \le 2\bigl(\mathcal E(f_j) + \varepsilon\bigr)$. Therefore,

$$\mathrm{Prob}_{z\in Z^m}\left\{\sup_{f\in\mathcal F}\frac{\mathcal E(f) - \mathcal E_z(f)}{\sqrt{\mathcal E(f) + \varepsilon}} \ge 4\alpha\sqrt{\varepsilon}\right\} \le \sum_{j=1}^J \mathrm{Prob}_{z\in Z^m}\left\{\frac{\mathcal E(f_j) - \mathcal E_z(f_j)}{\sqrt{\mathcal E(f_j) + \varepsilon}} \ge \alpha\sqrt{\varepsilon}\right\},$$

which is bounded by $J\exp\{-\frac{3\alpha^2 m\varepsilon}{2^{q+3}}\}$.

Take $\mathcal F$ to be the set (55). The following covering number estimate will be used.

Lemma 20 Let $\mathcal F$ be given by (55) with $R \ge \tilde\kappa$ and $B = 1 + \kappa R$. Its covering number in $C(X)$ can be bounded as follows:

$$\mathcal N(\mathcal F, \eta) \le \left(\Bigl(\kappa + \frac{1}{\tilde\kappa}\Bigr)\frac{2R}{\eta} + 1\right)\mathcal N\left(\frac{\eta}{2R}\right), \qquad \eta > 0.$$

Proof It follows from the fact $\|\pi(f) - \pi(g)\|_\infty \le \|f - g\|_\infty$ that $\mathcal N(\mathcal F, \eta) \le \mathcal N(\mathcal B_R + [-B, B], \eta)$. The latter is bounded by $\bigl((\kappa + \frac{1}{\tilde\kappa})\frac{2R}{\eta} + 1\bigr)\mathcal N(\mathcal B_R, \frac\eta2)$, since $\frac{2B}{\eta} \le (\kappa + \frac{1}{\tilde\kappa})\frac{2R}{\eta}$ and $\|(f + b_f) - (g + b_g)\|_\infty \le \|f - g\|_\infty + |b_f - b_g|$. Note that an $\frac{\eta}{2R}$-covering of $\mathcal B_1$ is the same as an $\frac\eta2$-covering of $\mathcal B_R$. Then our conclusion follows from Definition 6.

We are in a position to state our main result on the error analysis. Recall $\theta_M$ in (25).

Theorem 21 Let $f_{K,C} \in H_K + \mathbb R$, $M \ge \sup_{(x,y)} V(y, f_{K,C}(x))$, and $0 < \beta \le 1$. For every $\varepsilon > 0$, we have

$$\mathcal E(\pi(f_z)) - \mathcal E(f_q) \le (1 + \beta)\bigl(\varepsilon + \mathcal D(C)\bigr)$$

with confidence at least $1 - F(\varepsilon)$, where $F : \mathbb R_+ \to \mathbb R_+$ is defined by

$$F(\varepsilon) := \exp\left\{-\frac{m\varepsilon^2}{2M(\Delta + \varepsilon)}\right\} + \mathcal N\left(\frac{\beta(\varepsilon + \mathcal D(C))^{3/2}}{q2^{q+3}\Sigma}\right)\exp\left\{-\frac{3\beta^2 m(\mathcal D(C) + \varepsilon)^2}{2^{q+9}(\Delta + \varepsilon)}\right\}$$

with $\Delta := \mathcal D(C) + \mathcal E(f_q)$ and $\Sigma := \sqrt{2C(\Delta + \varepsilon)(\Delta + \theta_M\varepsilon)}$. \qquad (58)

Proof Let $\varepsilon > 0$. We prove our conclusion in three steps.

Step 1: Estimate $\mathcal E_z(f_{K,C}) - \mathcal E(f_{K,C})$. Consider the random variable $\xi = V(y, f_{K,C}(x))$ with $0 \le \xi \le M$. If $M > 0$, since $\sigma^2(\xi) \le M\mathcal E(f_{K,C}) \le M(\mathcal D(C) + \mathcal E(f_q))$, by the one-sided Bernstein inequality we obtain $\mathcal E_z(f_{K,C}) - \mathcal E(f_{K,C}) \le \varepsilon$ with confidence at least

$$1 - \exp\left\{-\frac{m\varepsilon^2}{2(\sigma^2(\xi) + \frac13 M\varepsilon)}\right\} \ge 1 - \exp\left\{-\frac{m\varepsilon^2}{2M(\Delta + \varepsilon)}\right\}.$$

If $M = 0$, then $\xi = 0$ almost everywhere. Hence $\mathcal E_z(f_{K,C}) - \mathcal E(f_{K,C}) = 0$ with probability 1. Thus, in both cases, there exists $U_1 \subset Z^m$ with measure at least $1 - \exp\{-\frac{m\varepsilon^2}{2M(\Delta + \varepsilon)}\}$ such that $\mathcal E_z(f_{K,C}) - \mathcal E(f_{K,C}) \le \theta_M\varepsilon$ whenever $z \in U_1$.

Step 2: Estimate $\mathcal E(\pi(f_z)) - \mathcal E_z(\pi(f_z))$. Let $\mathcal F$ be given by (55) with $R = \sqrt{2C(\Delta + \theta_M\varepsilon)}$ and $B = 1 + \kappa R$. By Proposition 4, $R \ge \sqrt{2C\mathcal D(C)} \ge \tilde\kappa$. Applying Lemma 19 to $\mathcal F$ with $\bar\varepsilon := \mathcal D(C) + \varepsilon > 0$ and $\alpha = \frac\beta8\sqrt{\bar\varepsilon/(\bar\varepsilon + \mathcal E(f_q))} \in (0, 1/8]$, we have

$$\mathcal E(f) - \mathcal E_z(f) \le 4\alpha\sqrt{\bar\varepsilon}\,\sqrt{\bigl(\mathcal E(f) - \mathcal E(f_q)\bigr) + \bar\varepsilon + \mathcal E(f_q)}, \qquad \forall f \in \mathcal F, \qquad (59)$$

for $z \in U_2$, where $U_2$ is a subset of $Z^m$ with measure at least

$$1 - \mathcal N\left(\mathcal F, \frac{\alpha\bar\varepsilon}{q2^{q-1}}\right)\exp\left\{-\frac{3\alpha^2 m\bar\varepsilon}{2^{q+3}}\right\} \ge 1 - \mathcal N\left(\frac{\beta(\varepsilon + \mathcal D(C))^{3/2}}{q2^{q+3}\Sigma}\right)\exp\left\{-\frac{3\beta^2 m(\mathcal D(C)+\varepsilon)^2}{2^{q+9}(\Delta+\varepsilon)}\right\}.$$

In the above inequality we have used Lemma 20 to bound the covering number. For $z \in U_1 \cap U_2$, we have

$$\frac{1}{2C}\|\tilde f_z\|_K^2 \le \mathcal E_z(f_{K,C}) + \frac{1}{2C}\|\tilde f_{K,C}\|_K^2 \le \mathcal E(f_{K,C}) + \theta_M\varepsilon + \frac{1}{2C}\|\tilde f_{K,C}\|_K^2,$$

which equals $\mathcal D(C) + \theta_M\varepsilon + \mathcal E(f_q) = \Delta + \theta_M\varepsilon$. It follows that $\|\tilde f_z\|_K \le R$. By Lemma 16, $|b_z| \le B$. This means $\pi(f_z) \in \mathcal F$ and (59) is valid for $f = \pi(f_z)$.

Step 3: Bound $\mathcal E(\pi(f_z)) - \mathcal E(f_q)$ using (14). Let $\alpha$ and $\bar\varepsilon$ be the same as in Step 2. For $z \in U_1 \cap U_2$, both $\mathcal E_z(f_{K,C}) - \mathcal E(f_{K,C}) \le \theta_M\varepsilon \le \varepsilon$ and (59) with $f = \pi(f_z)$ hold true. Then (14) tells us that

$$\mathcal E(\pi(f_z)) - \mathcal E(f_q) \le 4\alpha\sqrt{\bar\varepsilon}\,\sqrt{\bigl(\mathcal E(\pi(f_z)) - \mathcal E(f_q)\bigr) + \bar\varepsilon + \mathcal E(f_q)} + \bar\varepsilon.$$

Denote $r := \sqrt{\bigl(\mathcal E(\pi(f_z)) - \mathcal E(f_q)\bigr) + \bar\varepsilon + \mathcal E(f_q)} > 0$; we see that

$$r^2 \le \bar\varepsilon + \mathcal E(f_q) + 4\alpha\sqrt{\bar\varepsilon}\,r + \bar\varepsilon.$$

It follows that

$$r \le 2\alpha\sqrt{\bar\varepsilon} + \sqrt{4\alpha^2\bar\varepsilon + 2\bar\varepsilon + \mathcal E(f_q)}.$$

Hence

$$\mathcal E(\pi(f_z)) - \mathcal E(f_q) = r^2 - \bigl(\bar\varepsilon + \mathcal E(f_q)\bigr) \le \bar\varepsilon + 8\alpha^2\bar\varepsilon + 4\alpha\sqrt{\bar\varepsilon}\sqrt{4\alpha^2\bar\varepsilon + 2\bar\varepsilon + \mathcal E(f_q)}.$$

Putting the choice of $\alpha$ into the above, we find that

$$\mathcal E(\pi(f_z)) - \mathcal E(f_q) \le \bar\varepsilon + \frac{\beta^2\bar\varepsilon^2}{8(\bar\varepsilon + \mathcal E(f_q))} + \frac{\beta\bar\varepsilon}{2}\sqrt{\frac{\beta^2\bar\varepsilon^2}{16(\bar\varepsilon + \mathcal E(f_q))^2} + \frac{2\bar\varepsilon + \mathcal E(f_q)}{\bar\varepsilon + \mathcal E(f_q)}},$$

which is bounded by $(1+\beta)\bar\varepsilon = (1+\beta)(\varepsilon + \mathcal D(C))$. Finally, noting that the measure of $U_1 \cap U_2$ is at least $1 - F(\varepsilon)$, the proof is finished.

Observe that the function $F$ is strictly decreasing, $F(0) > 1$, and $\lim_{\varepsilon\to\infty} F(\varepsilon) = 0$. Hence for every $0 < \delta < 1$ there exists a unique number $\varepsilon > 0$ satisfying $F(\varepsilon) = \delta$. Also, for a fixed $\varepsilon > 0$, $F(\varepsilon) < \delta$ for sufficiently large $m$, since $F(\varepsilon)$ can be written as $a^m + c\,b^m$ with $0 < a, b < 1$. Therefore, Theorem 21 can be restated in the following form.

Corollary 22 Let $F$ be given in Theorem 21. For every $0 < \delta < 1$, define $\varepsilon(\delta, m, C, \beta) > 0$ to be the unique solution to the equation $F(\varepsilon) = \delta$. Then with confidence at least $1 - \delta$, (20) holds. Moreover, $\lim_{m\to\infty} \varepsilon(\delta, m, C, \beta) = 0$.

Now we can prove the statements we made in Section 2 for the deterministic case.

Proof of Proposition 8 Take $\beta = 1$. Since $R(f_c) = 0$, we know that $f_q = f_c$, $\mathcal E(f_q) = 0$ and $\Delta = \mathcal D(C)$. Then $\Sigma = \sqrt{2C(\mathcal D(C)+\varepsilon)(\mathcal D(C)+\theta_M\varepsilon)}$ and

$$\mathcal N\left(\frac{(\varepsilon + \mathcal D(C))^{3/2}}{q2^{q+3}\Sigma}\right) \le \mathcal N\left(\frac{\sqrt\varepsilon}{q2^{q+3}\sqrt{2CM}}\right).$$

The function value $F(\varepsilon)$ can then be bounded by

$$\exp\left\{-\frac{m\varepsilon^2}{2M(\mathcal D(C)+\varepsilon)}\right\} + \mathcal N\left(\frac{\sqrt\varepsilon}{q2^{q+3}\sqrt{2CM}}\right)\exp\left\{-\frac{3m\varepsilon}{2^{q+9}}\right\}.$$

The first term above is bounded by $\delta/2$ when $\varepsilon \ge \frac{4M\log(2/\delta)}{m} + \mathcal D(C)$. The second term is at most $\delta/2$ if $\varepsilon \ge \varepsilon^*$. Therefore

$$\varepsilon(\delta, m, C, \beta) \le \max\left\{\frac{4M\log(2/\delta)}{m} + \mathcal D(C),\ \varepsilon^*\right\},$$

and (26) follows from Theorem 21.

(a) When (23) holds, noting that the left side of (27) is strictly decreasing, it is easy to check that

$$\varepsilon^* \le \frac{2^q c\bigl((q+4)\log 8 + \log m + \log(C\mathcal D(C)) + s\log(C\theta_M)\bigr)^s + \log(2/\delta)}{m}.$$

This yields the stated bound for $\varepsilon^*$ in the case (a).

(b) If (24) is true, then the function defined by

$$\widetilde F(\varepsilon) := \frac{c(q2^{q+4})^s\bigl((2C\mathcal D(C))^{s/2} + (2C\varepsilon)^{s/2}\bigr)}{\varepsilon^s} - \frac{3m\varepsilon}{2^{q+9}}$$

satisfies $\widetilde F(\varepsilon^*) \ge \log(\delta/2)$. Since $\widetilde F$ is a decreasing function, $\varepsilon^* \le \tilde\varepsilon$ whenever $\widetilde F(\tilde\varepsilon) \le \log(\delta/2)$. If

$$\tilde\varepsilon \ge r\,m^{-1/(s+1)}(2C\mathcal D(C))^{s/(2s+2)}\log\frac2\delta \quad\text{with}\quad r \ge 2^{q+8}\tilde\kappa^{-s/(s+1)} + q2^{q+10}c^{1/(s+1)}/\log 2, \qquad (60)$$

then

$$\frac{c(q2^{q+4})^s(2C\mathcal D(C))^{s/2}}{\tilde\varepsilon^s} \le \frac{c(q2^{q+4})^s\,m\tilde\varepsilon}{(r\log(2/\delta))^{s+1}}.$$

Since $r \ge q2^{q+10}c^{1/(s+1)}/\log 2$, this is at most $m\tilde\varepsilon/2^{q+10}$; here we have also used the condition $r \ge 2^{q+8}\tilde\kappa^{-s/(s+1)}$ together with Proposition 4. In the same way, if

$$\tilde\varepsilon \ge r\,m^{-2/(s+2)}(2C)^{s/(s+2)}\log\frac2\delta \quad\text{with}\quad r \ge 2^{q+9} + q^2 4^{q+5}c^{2/(s+2)}/\log 2, \qquad (61)$$

we have for $C \ge 1/2$,

$$\frac{c(q2^{q+4})^s(2C\tilde\varepsilon)^{s/2}}{\tilde\varepsilon^s} \le \frac{c(q2^{q+4})^s\,m\tilde\varepsilon}{(r\log(2/\delta))^{(s+2)/2}} \le \frac{m\tilde\varepsilon}{2^{q+10}}.$$

Combining the above two bounds, we obtain the desired estimate (28) with $c$ determined by the two conditions (60) and (61). The proof of Proposition 8 is complete.

In the general case, the following bounds hold.

Corollary 23 For every $0 < \delta \le 1$, with confidence at least $1 - \delta$ there holds

$$\mathcal E(\pi(f_z)) - \mathcal E(f_q) \le 2\max\left\{2^{q+1}\bigl(1 + \kappa\sqrt{2C\mathcal D(C)}\bigr)^{q/2}\sqrt{\frac{\log(2/\delta)}{m}},\ \varepsilon^*\right\} + 2\mathcal D(C),$$

where $\varepsilon^*$ is the solution to the equation

$$\log\mathcal N\left(\frac{\varepsilon^{3/2}}{q4^{q+2}\sqrt{2C}}\right) - \frac{3m\varepsilon^2}{4^{q+5}} = \log\frac\delta2. \qquad (62)$$

Proof Take $\beta = 1$, and take the function $f_{K,C}$ in Theorem 21 to be the minimizer of (15). By Lemma 16, $\|f_{K,C}\|_\infty \le 1 + 2\kappa\sqrt{2C\mathcal D(C)}$ and we can take $M = 2^q\bigl(1 + \kappa\sqrt{2C\mathcal D(C)}\bigr)^q$. Also, $\Delta = \mathcal D(C) + \mathcal E(f_q) \le 1$. It follows that $\Sigma \le \sqrt{2C}(1 + \varepsilon)$. Since $\mathcal E(\pi(f_z)) \le 2^q$, we only need to consider the range $\varepsilon \le 2^q$ to bound $\varepsilon(\delta, m, C, \beta)$. In this range,

$$\mathcal N\left(\frac{\beta(\varepsilon + \mathcal D(C))^{3/2}}{q2^{q+3}\Sigma}\right) \le \mathcal N\left(\frac{\varepsilon^{3/2}}{q4^{q+2}\sqrt{2C}}\right).$$

Then $F(\varepsilon)$ can be bounded by

$$\exp\left\{-\frac{m\varepsilon^2}{4^{q+1}\bigl(1 + \kappa\sqrt{2C\mathcal D(C)}\bigr)^q}\right\} + \mathcal N\left(\frac{\varepsilon^{3/2}}{q4^{q+2}\sqrt{2C}}\right)\exp\left\{-\frac{3m\varepsilon^2}{4^{q+5}}\right\}.$$

Thus $F(\varepsilon) \le \delta$ if

$$\varepsilon \ge \max\left\{2^{q+1}\bigl(1 + \kappa\sqrt{2C\mathcal D(C)}\bigr)^{q/2}\sqrt{\frac{\log(2/\delta)}{m}},\ \varepsilon^*\right\},$$

where $\varepsilon^*$ is the solution to the equation (62). This together with Theorem 21 yields the desired estimate.

The bound for the sample error derived in Corollary 23 may be further improved by the well developed empirical process techniques in the literature. We shall discuss this elsewhere. The total error (14) consists of two parts. We shall not discuss the possibility of further improving the sample error bound here, because it is of the same importance to understand the regularization error. This becomes more important when few estimates are available for the regularization error. In the previous sections, we could compute $\mathcal D(C)$ explicitly only for special cases. Most of the time, $\rho$ is not strictly separable, and not even weakly separable. Hence it is desirable to estimate $\mathcal D(C)$ explicitly for general distributions. In the following we shall choose $f_{K,C}$ to be the minimizer of (15) and estimate $\mathcal D(C)$.

6 Error Analysis by Approximation in L^q Spaces

The main result on the convergence analysis given in Section 5 enables us to make some nice observations. These follow from facts on approximation in $L^q$ spaces.

Lemma 24 If $1 < q \le 2$, then

$$(1 + u)_+^q \le 1 + |u|^q + qu, \qquad u \in \mathbb R.$$

Proof Set the continuous function $f(u) := 1 + |u|^q + qu - (1+u)_+^q$. Then $f(0) = 0$. Since $0 < q - 1 \le 1$, for $u > 0$ we have

$$f'(u) = q\bigl(u^{q-1} + 1 - (1+u)^{q-1}\bigr) \ge 0.$$

Hence $f(u) \ge 0$ for $u \ge 0$. For $-1 < u < 0$, we see that

$$f'(u) = q\bigl(1 - (-u)^{q-1} - (1+u)^{q-1}\bigr) = q\bigl(1 - |u|^{q-1} - (1 - |u|)^{q-1}\bigr) \le 0.$$

Hence $f(u) \ge 0$ for $-1 < u < 0$. Finally, when $u < -1$, there holds $f'(u) = q\bigl(1 - (-u)^{q-1}\bigr) \le 0$.

Therefore, we also have $f(u) \ge f(-1) \ge 0$ for $u \le -1$. Thus $f(u) \ge 0$ on the whole real line, and Lemma 24 is proved.

Theorem 25 Let $f : X \to \mathbb R$ be measurable. Then

$$\mathcal E(f) - \mathcal E(f_q) \le \begin{cases} \|f - f_q\|^q_{L^q_{\rho_X}}, & \text{if } 1 < q \le 2, \\ q2^{q-1}\bigl(2^{q-1}\|f - f_q\|_{L^q_{\rho_X}} + \|f - f_q\|^q_{L^q_{\rho_X}}\bigr), & \text{if } q > 2. \end{cases}$$

Proof Since $|f_q(x)| \le 1$, by the second inequality of Lemma 17, for each $x \in X$ we have

$$\mathcal E(f\mid x) - \mathcal E(f_q\mid x) = \int_Y \bigl\{(1 - yf(x))_+^q - (1 - yf_q(x))_+^q\bigr\}\,d\rho(y\mid x) \le q4^{q-1}|f(x) - f_q(x)| + q2^{q-1}|f(x) - f_q(x)|^q.$$

It follows that

$$\mathcal E(f) - \mathcal E(f_q) = \int_X \bigl\{\mathcal E(f\mid x) - \mathcal E(f_q\mid x)\bigr\}\,d\rho_X \le q4^{q-1}\|f - f_q\|_{L^1_{\rho_X}} + q2^{q-1}\|f - f_q\|^q_{L^q_{\rho_X}}.$$

Then the inequality for the case $q > 2$ follows from the Hölder inequality.

Turn to the case $1 < q \le 2$. It is sufficient to show that for each $x \in X$,

$$\mathcal E(f\mid x) - \mathcal E(f_q\mid x) \le |f(x) - f_q(x)|^q. \qquad (63)$$

The definition (10) of $f_q$ tells us that

$$(1 + f_q(x))^{q-1} = \frac{2^{q-1}(1 + f_\rho(x))}{\bigl\{(1+f_\rho(x))^{1/(q-1)} + (1-f_\rho(x))^{1/(q-1)}\bigr\}^{q-1}}$$

and

$$(1 - f_q(x))^{q-1} = \frac{2^{q-1}(1 - f_\rho(x))}{\bigl\{(1+f_\rho(x))^{1/(q-1)} + (1-f_\rho(x))^{1/(q-1)}\bigr\}^{q-1}}.$$

These expressions in connection with (45) imply

$$\mathcal E(f\mid x) = \frac{(1+f(x))_+^q\,(1-f_q(x))^{q-1} + (1-f(x))_+^q\,(1+f_q(x))^{q-1}}{(1+f_q(x))^{q-1} + (1-f_q(x))^{q-1}}$$

and, together with (48),

$$\mathcal E(f_q\mid x) = \frac{2(1-f_q(x))^{q-1}(1+f_q(x))^{q-1}}{(1+f_q(x))^{q-1} + (1-f_q(x))^{q-1}}.$$

Thus (63) follows from the following inequality (by taking $t = f(x)$ and $\theta = f_q(x)$):

$$\frac{(1+t)_+^q(1-\theta)^{q-1} + (1-t)_+^q(1+\theta)^{q-1} - 2(1-\theta)^{q-1}(1+\theta)^{q-1}}{(1+\theta)^{q-1} + (1-\theta)^{q-1}} \le |t-\theta|^q, \qquad t \in \mathbb R,\ \theta \in (-1, 1). \qquad (64)$$

What is left is to verify the inequality (64). Since $-1 < \theta < 1$, we have

$$(1+t)_+^q(1-\theta)^{q-1} = (1-\theta^2)^{q-1}(1+\theta)\left(\frac{1+t}{1+\theta}\right)_+^q = (1-\theta^2)^{q-1}(1+\theta)\left(1 + \frac{t-\theta}{1+\theta}\right)_+^q.$$

By Lemma 24 with $u = (t-\theta)/(1+\theta)$, we see that

$$(1+t)_+^q(1-\theta)^{q-1} \le (1-\theta^2)^{q-1}(1+\theta)\left\{1 + \left|\frac{t-\theta}{1+\theta}\right|^q + q\,\frac{t-\theta}{1+\theta}\right\}.$$

In the same way, by Lemma 24 with $u = -(t-\theta)/(1-\theta)$, we have

$$(1-t)_+^q(1+\theta)^{q-1} \le (1-\theta^2)^{q-1}(1-\theta)\left\{1 + \left|\frac{t-\theta}{1-\theta}\right|^q - q\,\frac{t-\theta}{1-\theta}\right\}.$$

Combining the above two estimates, we obtain

$$(1+t)_+^q(1-\theta)^{q-1} + (1-t)_+^q(1+\theta)^{q-1} \le 2(1-\theta^2)^{q-1} + |t-\theta|^q\bigl\{(1-\theta)^{q-1} + (1+\theta)^{q-1}\bigr\}.$$

This proves our claim (64), and thereby Theorem 25.

Recall the K-functional given by (19).

Theorem 26 For each $C > 0$, there holds $\mathcal D(C) \le K\bigl(f_q, \frac{1}{2C}\bigr)$.

Proof The case $1 < q \le 2$ is an easy consequence of Theorem 25. Turn to the case $q > 2$. The special choice $f = 0 \in H_K + \mathbb R$ and the fact $\|f_q\|_{L^q_{\rho_X}} \le 1$ tell us that for any $t > 0$,

$$K(f_q, t) = \inf_{\substack{f \in H_K + \mathbb R \\ \|f - f_q\|_{L^q_{\rho_X}} \le 1}}\left\{q2^{q-1}(2^{q-1}+1)\|f - f_q\|_{L^q_{\rho_X}} + t\|\tilde f\|_K^2\right\}.$$

According to Theorem 25, for $f \in H_K + \mathbb R$ with $\|f - f_q\|_{L^q_{\rho_X}} \le 1$, we have

$$\mathcal E(f) - \mathcal E(f_q) \le q2^{q-1}(2^{q-1}+1)\|f - f_q\|_{L^q_{\rho_X}}.$$

Thus,

$$\mathcal D(C) = \inf_{f \in H_K + \mathbb R}\left\{\mathcal E(f) - \mathcal E(f_q) + \frac{1}{2C}\|\tilde f\|_K^2\right\} \le K\left(f_q, \frac{1}{2C}\right),$$

and the proof of Theorem 26 is complete.

We can now derive several observations from Theorem 26. The first observation says that the SVM $q$-classifier converges when $f_q$ lies in the closure of $H_K + \mathbb R$ in $L^q_{\rho_X}$. In particular, for any Borel probability measure $\rho$, this is always the case if $K$ is a universal kernel, since $C(X)$ is dense in $L^q_{\rho_X}$.
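Lemma 24, the elementary inequality behind the verification of (64) above, can be probed numerically. The sketch below checks $(1+u)_+^q \le 1 + |u|^q + qu$ on a grid of $u$ values for several exponents $1 < q \le 2$ (a sanity check only; for $q = 2$ and $u \ge -1$ the two sides are equal):

```python
def plus(t):
    """The positive part (t)_+."""
    return max(t, 0.0)

for q in (1.1, 1.5, 2.0):
    for i in range(-400, 401):
        u = i / 100.0   # grid over [-4, 4]
        # Lemma 24: (1 + u)_+^q <= 1 + |u|^q + q*u for 1 < q <= 2.
        assert plus(1 + u) ** q <= 1 + abs(u) ** q + q * u + 1e-12
```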

Corollary 27 If $f_q$ lies in the closure of $H_K + \mathbb R$ in $L^q_{\rho_X}$, then for every $\varepsilon > 0$ and $0 < \delta < 1$, there exist $C_\varepsilon > 0$, $m_0 \in \mathbb N$ and a sequence $C_m$ with $\lim_{m\to\infty} C_m = \infty$ such that

$$\mathrm{Prob}_{z\in Z^m}\bigl\{R(\mathrm{sgn}(f_z)) - R(f_c) \le \varepsilon\bigr\} \ge 1 - \delta, \qquad \forall m \ge m_0,\ C_\varepsilon \le C \le C_m.$$

Proof Since $f_q$ lies in the closure of $H_K + \mathbb R$ in $L^q_{\rho_X}$, for every $\varepsilon > 0$ there is some $f_\varepsilon = \tilde f_\varepsilon + b_\varepsilon \in H_K + \mathbb R$ such that $q2^{q-1}(2^{q-1}+1)\|f_\varepsilon - f_q\|_{L^q_{\rho_X}} \le \varepsilon^2/16$. Take $C_\varepsilon = 8\|\tilde f_\varepsilon\|_K^2/\varepsilon^2$. Then for any $C \ge C_\varepsilon$, we have $\frac{1}{2C}\|\tilde f_\varepsilon\|_K^2 \le \varepsilon^2/16$. Theorem 26 tells us that $\mathcal D(C) \le \varepsilon^2/8$. Take $m_0$ such that

$$2^{q+1}(1 + \kappa\sqrt{2C_\varepsilon})^{q/2}\sqrt{\frac{\log(2/\delta)}{m_0}} \le \frac{\varepsilon^2}{8} \quad\text{and}\quad \log\mathcal N\left(\frac{\varepsilon^3}{q4^{q+4}\sqrt{2C_\varepsilon}}\right) - \frac{3m_0\varepsilon^4}{4^{q+8}} \le \log\frac\delta2.$$

Also, we choose $C_m$ such that

$$2^{q+1}(1 + \kappa\sqrt{2C_m})^{q/2}\sqrt{\frac{\log(2/\delta)}{m}} < \frac{\varepsilon^2}{8} \quad\text{and}\quad \log\mathcal N\left(\frac{\varepsilon^3}{q4^{q+4}\sqrt{2C_m}}\right) - \frac{3m\varepsilon^4}{4^{q+8}} \le \log\frac\delta2.$$

Then by Corollaries 22 and 23, $\varepsilon(\delta, m, C, \beta) \le \varepsilon^2/8$ when $m \ge m_0$ and $C_\varepsilon \le C \le C_m$. Together with Theorem 14, our conclusion is proved.

Our second observation from Theorem 26 concerns non-universal kernels which nonetheless ensure the convergence of the SVM $q$-classifier. The point here is that $H_K$ is not dense in $C(X)$, but after adding the offset the space $H_K + \mathbb R$ becomes dense.

Example 2 Let $K$ be a Mercer kernel on $X = [0, 1]$:

$$K(x, y) = \sum_{j\in J} a_j (xy)^j,$$

where $J$ is a subset of $\mathbb N$, $a_j > 0$ for each $j \in J$, and $\sum_{j\in J} a_j < \infty$. Note that this kernel satisfies $K_0(y) \equiv 0$, hence $f(0) = 0$ for all $f \in H_K$. Hence the space $H_K$ is not dense in $C(X)$ and $K$ is not a universal kernel. But if $\sum_{j\in J}\frac1j = \infty$, then $H_K + \mathbb R$ is dense in $C(X)$ (Zhou 2003a) and hence in $L^q_{\rho_X}$. Therefore, the SVM $q$-classifier associated with the (identical) kernel $K$ converges.

Remark 28 In Section 4 and the proof of Theorem 21, we have shown how the offset influences the sample error. Proposition 4 and Example 2 tell us that it may also influence the approximation error. However, our analysis in the following two sections will not focus on this point; it may be an interesting topic.
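The phenomenon in Example 2 is easy to observe numerically. The sketch below uses the concrete choice $a_j = 1/j^2$ (an illustrative assumption; any positive summable sequence with $\sum 1/j = \infty$ over its support works) and a truncated series: $K(0, y) = 0$ for all $y$, so every $f \in H_K$ vanishes at $0$, while Gram quadratic forms stay nonnegative, as a Mercer kernel requires:

```python
import random

def K(x, y, terms=50):
    # Truncation of K(x, y) = sum_{j>=1} a_j (x*y)^j with a_j = 1/j^2 (illustrative).
    return sum((x * y) ** j / j ** 2 for j in range(1, terms + 1))

# K_0 = K(0, .) vanishes identically, so f(0) = <f, K_0>_K = 0 for every f in H_K.
assert all(K(0.0, y) == 0.0 for y in (0.0, 0.3, 0.7, 1.0))

# Gram quadratic forms v^T G v = sum_j a_j (sum_i v_i x_i^j)^2 are nonnegative.
random.seed(1)
xs = [random.random() for _ in range(6)]
for _ in range(20):
    v = [random.uniform(-1, 1) for _ in range(6)]
    quad = sum(v[i] * v[k] * K(xs[i], xs[k]) for i in range(6) for k in range(6))
    assert quad >= -1e-12
```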

In practical applications, one can use varying kernels for (3).

Definition 29 Let $\{K_d\}_{d\in\mathbb N}$ be a sequence of Mercer kernels on $X$. We say that the SVM $q$-classifier associated with the kernels $\{K_d\}$ converges if for a sequence $\{C_m\}$ of positive numbers, $f_z$ defined by (3) with $K = K_d$ and $C = C_m$ satisfies the following: for every Borel probability measure $\rho$ on $Z$, and every $0 < \delta < 1$ and $\varepsilon > 0$, for sufficiently large $d$ there is some $m_d \in \mathbb N$ such that

$$\mathrm{Prob}_{z\in Z^m}\bigl\{R(\mathrm{sgn}(f_z)) - R(f_c) \le \varepsilon\bigr\} \ge 1 - \delta, \qquad \forall m \ge m_d.$$

For a universal kernel $K$, one may take $K_d$ to be identically $K$ and the convergence holds. But the kernels could change, such as the polynomial kernels (Boser, Guyon & Vapnik 1992). Our third observation from Theorem 26 is to confirm the convergence of the SVM $q$-classifiers with these kernels.

Proposition 30 For any $1 < q < \infty$, $n \in \mathbb N$ and $X \subset \mathbb R^n$, the SVM $q$-classifier associated with $\{K_d\}_{d=1}^\infty$, the polynomial kernels $K_d(x, y) = (1 + x\cdot y)^d$, converges.

Proposition 30 is a consequence of a quantitative result below. Let $\mathcal P_d$ be the space of all polynomials of degree at most $d$. It is an RKHS $H_{K_d}$ with the Mercer kernel $K_d(x, y) = (1 + x\cdot y)^d$. The rich knowledge from approximation theory tells us that for an arbitrary Borel probability measure on $Z$, there is a sequence of polynomials $\{p_d \in \mathcal P_d\}_{d=1}^\infty$ such that $\lim_{d\to\infty}\|f_q - p_d\|_{L^q_{\rho_X}} = 0$. The rate of this convergence depends on the regularity of the function $f_q$ (hence the function $f_\rho$) and the marginal distribution $\rho_X$. With this in hand, we can now state the result on the convergence of the $q$-classifier with polynomial kernels $K_d$.

Corollary 31 Let $X \subset \mathbb R^n$, and $\rho$ be an arbitrary Borel probability measure on $Z$. Let $d \in \mathbb N$ and $K_d(x, y) = (1 + x\cdot y)^d$ be the polynomial kernel. Set $|X| := \sup_{x\in X}|x|$. Let $\{p_d \in \mathcal P_d\}_{d=1}^\infty$ satisfy $E_d := \|f_q - p_d\|_{L^q_{\rho_X}} \to 0$ (as $d \to \infty$). Set $N := (n+d)!/(n!d!)$
$+\,1$ and $0 < \sigma < 2/q$. Then there exists $m_{q,\sigma} \in \mathbb N$ such that for $m \ge m_{q,\sigma}$ and $C = m^\sigma$, for every $0 < \delta < 1$, with confidence $1 - \delta$ there holds

$$R(\mathrm{sgn}(f_z)) - R(f_c) \le q2^{q+1}\bigl\{\sqrt{E_d} + \|p_d\|_{K_d}\,m^{-\sigma/2}\bigr\} + 2^{q+5}\bigl(N\log(2/\delta)\bigr)^{1/4}(1 + |X|^2)^{qd/8}\,m^{-(\frac14 - \frac{q\sigma}8)}.$$

Proof Take $p_d = \tilde p_d$ with zero offset in the K-functional. Then by Theorem 26,

$$\mathcal D(C) \le K\left(f_q, \frac{1}{2C}\right) \le q2^{q-1}(2^{q-1}+1)E_d + \frac{1}{2C}\|p_d\|_{K_d}^2.$$

The covering numbers of the finite dimensional space $\mathcal P_d$ (e.g. Cucker & Zhou 2004) and (22) give us the estimate

$$\mathcal N(\eta) \le \left(\Bigl(\kappa + \frac{1}{\tilde\kappa}\Bigr)\Bigl(\frac2\eta + 1\Bigr)\right)^N,$$

where $N - 1 = (n+d)!/(n!d!)$ is the dimension of the space $\mathcal P_d(\mathbb R^n)$. Also, $\kappa = \sqrt{\|K_d\|_\infty} \le (1 + |X|^2)^{d/2}$.

Take $C = m^\sigma$. By Corollary 23, solving the equation (62) yields

$$\varepsilon(\delta, m, C, \beta) \le \left(2^{q+5}\sqrt{N + \log(\kappa + 1/\tilde\kappa)}\,\bigl(\log m + \log(2/\delta)\bigr)^{1/2} + 4^{q+2}\kappa^{q/2}\log\frac2\delta\right)m^{-(\frac12 - \frac{q\sigma}4)},$$

which is bounded by $4^{5+q}\sqrt{N\log(2/\delta)}\,\kappa^{q/2}\,m^{-(\frac12 - \frac{q\sigma}4)}$ for $m \ge m_{q,\sigma}$. Here $m_{q,\sigma}$ is an integer depending on $q$ and $\sigma$ (but not on $m$, $d$ or $\delta$). Then for each $m \ge m_{q,\sigma}$, with confidence $1 - \delta$ the desired estimate holds true.

Remark 32 Note that $\mathcal P_d$ is a finite dimensional space. Thus the norms $\|\cdot\|_{K_d}$ and $\|\cdot\|_{L^q_{\rho_X}}$ are equivalent for a fixed $d$ when $\rho_X$ is non-degenerate. It would be interesting to compare the norm $\|p\|_{K_d}$ with $\|p\|_{L^q_{\rho_X}}$ for $p \in \mathcal P_d$ as $d$ tends to infinity.

Proof of Proposition 30 For every $\varepsilon > 0$, there exists some $d_0 \in \mathbb N$ such that $q2^{q+1}\sqrt{E_d} \le \varepsilon/2$ for every $d \ge d_0$. Then by Corollary 31 we can find some $m_d \ge m_{q,\sigma}$ such that

$$q2^{q+1}\|p_d\|_{K_d}\,m_d^{-\sigma/2} \le \frac\varepsilon4 \quad\text{and}\quad 2^{q+5}\bigl(N\log(2/\delta)\bigr)^{1/4}(1 + |X|^2)^{dq/8}\,m_d^{-(\frac14 - \frac{q\sigma}8)} \le \frac\varepsilon4.$$

Then for any $m \ge m_d$ we have $R(\mathrm{sgn}(f_z)) - R(f_c) \le \varepsilon$ with confidence $1 - \delta$. This proves Proposition 30.

7 Rate of Convergence for the q-Norm Soft Margin Classifier

Corollary 23 and Theorem 26 enable us to get the convergence rate for the SVM $q$-classifier. The rate depends on the K-functional $K(f_q, t)$. It can be characterized by the quantity (Smale & Zhou 2003, Zhou 2003b)

$$\mathcal I_q(g, R) := \inf_{\substack{f \in H_K + \mathbb R \\ \|\tilde f\|_K \le R}} \|g - f\|_{L^q_{\rho_X}}. \qquad (65)$$

Define

$$J_q(f_q, R) := \begin{cases} \bigl(\mathcal I_q(f_q, R)\bigr)^q, & \text{if } 1 < q \le 2, \\ q2^{q-1}(2^{q-1}+1)\,\mathcal I_q(f_q, R), & \text{if } q > 2. \end{cases}$$

Then the following corollary holds true.

Corollary 33 For any $t > 0$ there holds

$$K(f_q, t) \le \inf_{R > 0}\bigl\{J_q(f_q, R) + tR^2\bigr\}.$$

One may choose appropriate $R$ to estimate the convergence rate of $K(f_q, t)$, which together with Corollary 23 gives the convergence rate of the $V$-risk and a strategy for choosing the regularization parameter $C$. In general, the choice of $R$ depends on the regularity of $f_q$ and the kernel $K$. Let us demonstrate this by examples.

In what follows let $X \subset \mathbb R^n$ have Lipschitz boundary and $\rho$ be a probability measure such that $d\rho_X = dx$ is the Lebesgue measure. Consider $q = 2$, and thus $f_q = f_\rho$. We use the

approximation error studied in Smale & Zhou (2003) (see also Niyogi & Girosi (1996) for related discussion),

$$\mathcal I_2^*(g, R) := \inf_{\substack{f \in H_K \\ \|f\|_K \le R}} \|g - f\|_{L^2},$$

to bound the term $\mathcal I_2(f_\rho, R)$. With the choice $b_f = 0$ we obtain $\mathcal I_2(f_\rho, R) \le \mathcal I_2^*(f_\rho, R)$. Note that we disregard the influence of the offset here and thus the rate need not be optimal.

The first example includes spline kernels (Wahba 1990).

Example 3 Let $X \subset \mathbb R^n$ and $K$ be a Mercer kernel such that $H_K$ is the Sobolev space $H^r(X)$ with $r > n/2$. If $f_\rho$ lies in the Sobolev space $H^s(X)$ with $0 < s < r$, then

$$\mathcal E(\pi(f_z)) - \mathcal E(f_\rho) = O\left(\sqrt{\frac{\log m + C}{m}} + C^{n/(4r+3n)}\,m^{-2r/(4r+3n)}\right) + O\bigl(C^{-s/r}\bigr).$$

Thus, $C$ should be chosen such that $C \to \infty$ and $C/m \to 0$ as $m \to \infty$.

Proof It was shown in Smale & Zhou (2003, Theorem 3.1) that for $0 < \theta < 1$, $\mathcal I_2^*(f_\rho, R) = O(R^{-\theta/(1-\theta)})$ if and only if $f_\rho$ lies in the interpolation space $(L^2_{\rho_X}, H_K)_{\theta,\infty}$. It is well known that $H^s(X) \subset (L^2, H^r(X))_{s/r,\infty}$ for $0 < s < r$. Here $H_K = H^r(X)$ and $d\rho_X = dx$. Therefore, the assumption $f_\rho \in H^s(X)$ tells us that $f_\rho \in (L^2_{\rho_X}, H_K)_{s/r,\infty}$. Hence there holds

$$\mathcal I_2(f_\rho, R) \le \mathcal I_2^*(f_\rho, R) \le C_\rho R^{-s/(r-s)}$$

for some constant $C_\rho$. Choose $R = C_\rho^{(r-s)/r}\,C^{(r-s)/2r}$ to obtain

$$K\left(f_\rho, \frac{1}{2C}\right) \le \bigl(\mathcal I_2(f_\rho, R)\bigr)^2 + \frac{R^2}{2C} \le \frac32\,C_\rho^{2(r-s)/r}\,C^{-s/r}.$$

Using the well-known covering number estimates for Sobolev spaces, $\log\mathcal N(\mathcal B_1, \eta) \le C_r(1/\eta)^{n/r}$, and solving the equation (62), we see that

$$\varepsilon^* \le 2^6\bigl(\log\kappa + \log m + \log C + \log(2/\delta)\bigr)^{n/(4r)}\,C_r\,C^{n/(4r+3n)}\,m^{-2r/(4r+3n)}.$$

This proves the conclusion.

Example 4 Let $\sigma > 0$, $s > 0$ and $K$ be the Gaussian kernel $K(x, y) = \exp\{-|x-y|^2/\sigma^2\}$.

(a) If $f_\rho$ lies in the interpolation space $(L^2_{\rho_X}, H_K)_{\theta,\infty}$ for some $0 < \theta < 1$, that is, $K(f_\rho, t) \le C_\theta t^\theta$ for some constant $C_\theta$, then for any $0 < \delta < 1$, with confidence $1 - \delta$, there holds

$$\mathcal E(\pi(f_z)) - \mathcal E(f_\rho) = O\left(\sqrt{\frac{C + (\log m)^{n+1}}{m}}\right) + O\bigl(C^{-\theta}\bigr).$$

This implies the parameter $C$ should be taken to satisfy $C \to \infty$ and $C/m \to 0$ as $m \to \infty$. An asymptotically optimal choice is $C = O(m^{1/(1+2\theta)})$. With this choice, the convergence rate is $O(m^{-\theta/(1+2\theta)})$.

(b) If $f_\rho \in H^s(X)$ with $s > 0$, then for any $0 < \delta < 1$, with confidence $1 - \delta$, we have

$$\mathcal E(\pi(f_z)) - \mathcal E(f_\rho) = O\left(\sqrt{\frac{C + (\log m)^{n+1}}{m}}\right) + O\bigl((\log C)^{-s/2}\bigr).$$

An asymptotically optimal choice is $C = O\bigl((\log m)^s\bigr)$, which gives the convergence rate $O((\log m)^{-s/2})$.

Proof Solving the equation (62) with the covering number estimate (23) yields

$$\varepsilon^* = O\left(\sqrt{\frac{(1+c)(\log C + \log m)^{n+1}}{m}}\right).$$

Then the statement in (a) follows easily from Corollary 23, Theorem 21 and Corollary 33. To see the conclusion in (b), we notice that the assumption $f_\rho \in H^s(X)$ provides the approximation error estimates (Smale & Zhou 2003, Zhou 2003b)

$$\mathcal I_2(f_\rho, R) \le \mathcal I_2^*(f_\rho, R) \le C_s(\log R)^{-s/4} \qquad\text{for every } R > C_s,$$

where $C_s$ is a constant depending on $s$, $\sigma$, $n$ and the Sobolev norm of $f_\rho$. Choose $R = \sqrt{2C}\,(\log C)^{-s/4}$ to obtain

$$K\left(f_\rho, \frac{1}{2C}\right) \le \bigl(\mathcal I_2(f_\rho, R)\bigr)^2 + \frac{R^2}{2C} \le (2^s C_s^2 + 1)(\log C)^{-s/2}.$$

Then the desired bound follows from Theorem 26 and the above established bound for $\varepsilon^*$.

It was shown in Smale & Zhou (2003) that for the Gaussian kernel in Example 4, $\mathcal I_2^*(g, R) = O(R^{-\epsilon})$ with $\epsilon > 0$ only if $g$ is $C^\infty$. Hence only a logarithmic convergence rate can be expected for a Sobolev function $f_\rho$. However, in practice, one often chooses different variances $\sigma$ of the Gaussian kernel according to the sample size. With this flexibility, the regularization error can be improved greatly and polynomial convergence rates are possible.

Acknowledgments

The authors would like to thank Professor Zhen Wang for his helpful discussion on the best constants in Theorem 14 and Theorem 25. They are also grateful to the referees for the constructive suggestions and comments. This work is supported partially by the Research Grants Council of Hong Kong [Project No. CityU 1087/02P] and by City University of Hong Kong [Project No. ]. The corresponding author is Ding-Xuan Zhou. Yiming Ying is on leave from the Institute of Mathematics, Chinese Academy of Sciences, Beijing, P. R. CHINA.

Appendix: Error Comparison for a General Loss Function

In this appendix, for a general convex loss function, we bound the excess misclassification error by the excess $V$-risk. Here the loss function takes the form $V(y, f(x)) = \phi(yf(x))$ for a univariate function $\phi : \mathbb R \to \mathbb R_+$. For each $x \in X$, we denote

$$\mathcal E(f \mid x) := \int_Y V(y, f(x))\,d\rho(y \mid x); \qquad (66)$$

then $\mathcal E(f \mid x) = Q(\eta(x), f(x))$. Here $Q : [0, 1] \times (\mathbb R \cup \{\pm\infty\}) \to \mathbb R_+$ is given by $Q(\eta, f) = \eta\phi(f) + (1-\eta)\phi(-f)$, and $\eta : X \to \mathbb R$ is defined by $\eta(x) := P(Y = 1 \mid x)$. Set $f_\phi^*(\eta) := \arg\min_{f\in\mathbb R\cup\{\pm\infty\}} Q(\eta, f)$. Then $f_\rho^V(x) = f_\phi^*(\eta(x))$. The main result of Zhang (2004) can be stated as follows.

Theorem A Let $\phi$ be convex. Assume $f_\phi^*(\eta) > 0$ when $\eta > 0.5$. If there exist $c > 0$ and $s \ge 1$ such that for all $\eta \in [0, 1]$,

$$|0.5 - \eta|^s \le c^s\bigl(Q(\eta, 0) - Q(\eta, f_\phi^*(\eta))\bigr), \qquad (67)$$

then for any measurable function $f(x)$:

$$R(\mathrm{sgn}(f)) - R(f_c) \le 2c\bigl(\mathcal E(f) - \mathcal E(f_\rho^V)\bigr)^{1/s}.$$

Further analysis was made by Bartlett, Jordan & McAuliffe (2003). For example, it was proved in Bartlett, Jordan & McAuliffe (2003, Theorem 6) that for a convex function $\phi$, $f_\phi^*(\eta) > 0$ for any $\eta > 0.5$ if and only if $\phi$ is differentiable at 0 and $\phi'(0) < 0$. Borrowing some ideas from Bartlett, Jordan & McAuliffe (2003), we can derive a simple criterion for the condition (67) with $s = 2$.

The existence of $\phi''(0)$ means that the function $\phi'(x)$ is well defined in a neighborhood of 0 and is differentiable at 0. Note that the convexity of $\phi$ implies $\phi''(0) \ge 0$.

Theorem 34 Let $\phi : \mathbb R \to \mathbb R_+$ be a convex function such that $\phi''(0)$ exists. If $\phi'(0) < 0$ and $\phi''(0) > 0$, then (67) holds for $s = 2$. Hence for any measurable function $f$:

$$R(\mathrm{sgn}(f)) - R(f_c) \le 2c\sqrt{\mathcal E(f) - \mathcal E(f_\rho^V)}.$$

Proof By the definition of $\phi''(0)$, there exists some $c_0 \in (0, 1/2]$ such that

$$\left|\frac{\phi'(f) - \phi'(0)}{f} - \phi''(0)\right| \le \frac{\phi''(0)}{2}, \qquad f \in [-c_0, c_0],\ f \ne 0.$$

This implies

$$\phi'(0) + \phi''(0)f - \frac{\phi''(0)}{2}|f| \le \phi'(f) \le \phi'(0) + \phi''(0)f + \frac{\phi''(0)}{2}|f|.$$

If $\eta > 1/2$, then for $0 \le f \le c_0$,

$$\frac{\partial Q}{\partial f}(\eta, f) = \eta\phi'(f) - (1-\eta)\phi'(-f) \le (2\eta - 1)\phi'(0) + \phi''(0)f + \frac{\phi''(0)}{2}f.$$

Thus for $0 \le f \le \eta^* := \min\bigl\{-\frac{\phi'(0)}{\phi''(0)}(\eta - \frac12),\ c_0\bigr\}$, we have

$$\frac{\partial Q}{\partial f}(\eta, f) \le (2\eta - 1)\phi'(0) - \phi'(0)\Bigl(\eta - \frac12\Bigr) - \frac{\phi'(0)}{2}\Bigl(\eta - \frac12\Bigr) = \frac{\phi'(0)}{2}\Bigl(\eta - \frac12\Bigr) < 0.$$

Therefore, as a function of the variable $f$, $Q(\eta, f)$ is strictly decreasing on the interval $[0, \eta^*]$. But $f_\phi^*(\eta) > 0$ is its minimal point, hence

$$Q(\eta, 0) - Q(\eta, f_\phi^*(\eta)) \ge Q(\eta, 0) - Q(\eta, \eta^*) \ge -\frac{\phi'(0)}{2}\Bigl(\eta - \frac12\Bigr)\eta^*.$$

When $-\frac{\phi'(0)}{\phi''(0)}(\eta - \frac12) > c_0$, we have $\eta^* = c_0 \ge 2c_0(\eta - \frac12)$. Hence

$$Q(\eta, 0) - Q(\eta, f_\phi^*(\eta)) \ge -\frac{\phi'(0)}{2}\min\left\{2c_0,\ -\frac{\phi'(0)}{\phi''(0)}\right\}\Bigl(\eta - \frac12\Bigr)^2.$$

That is, (67) holds with $s = 2$ and

$$c = \max\left\{\frac{\sqrt{2\phi''(0)}}{-\phi'(0)},\ \frac{1}{\sqrt{-\phi'(0)c_0}}\right\}.$$

The proof for $\eta < 1/2$ is the same, by estimating $\frac{\partial Q}{\partial f}(\eta, f)$ for $f < 0$.

Turn to the special loss function $V = V_q$ given in (6) by $\phi(t) = (1-t)_+^q$. Applying Theorem 34, we see that the function $\phi$ satisfies $\phi'(0) = -q < 0$ and $\phi''(0) = q(q-1) > 0$. This verifies (40), and the constant $c$ can be obtained from the proof of Theorem 34.
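For the loss $\phi(t) = (1-t)_+^q$ with $q > 1$, the values $\phi'(0) = -q$ and $\phi''(0) = q(q-1)$ used above can be confirmed by finite differences (a numerical sketch, with tolerances chosen loosely to absorb rounding):

```python
def phi(t, q):
    """The q-norm soft margin loss phi(t) = (1 - t)_+^q."""
    return max(1.0 - t, 0.0) ** q

h = 1e-5
for q in (1.5, 2.0, 3.0):
    d1 = (phi(h, q) - phi(-h, q)) / (2 * h)                # central difference, phi'(0)
    d2 = (phi(h, q) - 2 * phi(0, q) + phi(-h, q)) / h ** 2  # second difference, phi''(0)
    assert abs(d1 - (-q)) < 1e-6
    assert abs(d2 - q * (q - 1)) < 1e-3
```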

References

N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68, 1950.

A. R. Barron. Complexity regularization with applications to artificial neural networks. In G. Roussas, editor, Nonparametric Functional Estimation, Kluwer, Dordrecht, 1990.

P. L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans. Inform. Theory, 44, 1998.

P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, to appear, 2004.

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Preprint, 2003.

B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, ACM, 1992.

O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learning Res., 2, 2002.

C. Cortes and V. Vapnik. Support-vector networks. Mach. Learning, 20, 1995.

N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

F. Cucker and S. Smale. On the mathematical foundations of learning. Bull. Amer. Math. Soc., 39: 1-49, 2001.

F. Cucker and D. X. Zhou. Learning Theory: an Approximation Theory Viewpoint. Monograph manuscript in preparation for Cambridge University Press, 2004.

L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, 1997.

T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Adv. Comput. Math., 13: 1-50, 2000.

W. S. Lee, P. Bartlett, and R. Williamson. The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory, 44, 1998.

Y. Lin. Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery, 6, 2002.

S. Mendelson. Improving the sample complexity using global data. IEEE Trans. Inform. Theory, 48, 2002.

S. Mukherjee, R. Rifkin, and T. Poggio. Regression and classification with regularization. In D. D. Denison, M. H. Hansen, C. C. Holmes, B. Mallick, and B. Yu, editors, Lecture Notes in Statistics: Nonlinear Estimation and Classification, Springer-Verlag, New York, 2002.

P. Niyogi and F. Girosi. On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions. Neural Comp., 8, 1996.

M. Pontil. A note on different covering numbers in learning theory. J. Complexity, 19, 2003.

F. Rosenblatt. Principles of Neurodynamics. Spartan Books, New York, 1962.

J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inform. Theory, 44, 1998.

S. Smale and D. X. Zhou. Estimating the approximation error in learning theory. Anal. Appl., 1: 17-41, 2003.

S. Smale and D. X. Zhou. Shannon sampling and function reconstruction from point values. Bull. Amer. Math. Soc., 41, 2004.

I. Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2: 67-93, 2001.

I. Steinwart. Support vector machines are universally consistent. J. Complexity, 18, 2002.

A. Tikhonov and V. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.

A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer-Verlag, New York, 1996.

V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.

V. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.

G. Wahba. Spline Models for Observational Data. SIAM, 1990.

G. Wahba. Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In Schölkopf, Burges and Smola, editors, Advances in Kernel Methods - Support Vector Learning, MIT Press, pages 69-88, 1999.

R. C. Williamson, A. J. Smola, and B. Schölkopf. Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators. IEEE Trans. Inform. Theory, 47, 2001.

Q. Wu and D. X.
Zhou Analysis of support vector achine classification Preprint,

T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56-85, 2004.

D. X. Zhou. The covering number in learning theory. J. Complexity, 18, 2002.

D. X. Zhou. Capacity of reproducing kernel spaces in learning theory. IEEE Trans. Inform. Theory, 49, 2003a.

D. X. Zhou. Density problem and approximation error in learning theory. Preprint, 2003b.
