BDUOL: Double Updating Online Learning on a Fixed Budget


Peilin Zhao and Steven C.H. Hoi
School of Computer Engineering, Nanyang Technological University, Singapore
{zhao0106,chhoi}@ntu.edu.sg

Abstract. Kernel-based online learning often exhibits promising empirical performance for various applications according to previous studies. However, it suffers from a main shortcoming: the number of support vectors is unbounded, which makes it unsuitable for handling large-scale datasets. In this paper, we investigate the problem of budget kernel-based online learning, which aims to constrain the number of support vectors by a predefined budget when learning the kernel-based prediction function in the online learning process. Unlike existing studies, we present a new framework of budget kernel-based online learning based on a recently proposed online learning method called Double Updating Online Learning (DUOL), which has shown state-of-the-art performance compared with other traditional kernel-based online learning algorithms. We analyze the theoretical underpinning of the proposed Budget Double Updating Online Learning (BDUOL) framework, and then propose several BDUOL algorithms by designing different budget maintenance strategies. We evaluate the empirical performance of the proposed BDUOL algorithms by comparing them with several well-known budget kernel-based online learning algorithms; the encouraging results validate the efficacy of the proposed technique.

1 Introduction

The goal of kernel-based online learning is to incrementally learn a nonlinear kernel-based prediction function from a sequence of training instances [1-4]. Although it often yields significantly better performance than linear online learning, the main shortcoming of kernel-based online learning is the potentially unbounded number of support vectors of the kernel-based prediction function, which requires a large amount of memory for storing the support vectors and a high computational cost for making predictions at each iteration, making it unsuitable for large-scale applications. In this paper, we aim to tackle this challenge by studying a framework for kernel-based online learning on a fixed budget, also known as budget online learning for short, in which the number of support vectors of the prediction function is bounded by a predefined budget size.

In the literature, several algorithms have been proposed for budget online learning.

Crammer et al. [5] proposed a heuristic approach for budget online learning by extending the classical kernel-based Perceptron method [6], which was further improved in [7]. The basic idea of these two algorithms is to remove the support vector that has the least impact on the classification performance whenever the budget, i.e., the maximal number of support vectors, is reached. The main shortcoming of these two algorithms is that they are heuristic and lack solid theoretical support (e.g., no mistake/regret bound is given). Forgetron [8] is perhaps the first approach to budget online learning that offers a theoretical bound on the total number of mistakes. In particular, at each iteration, if the online classifier makes a mistake, it conducts a three-step update: (i) it first runs the standard Perceptron [6] to update the prediction function; (ii) it then shrinks the weights of the support vectors by a carefully chosen scaling factor; and (iii) it finally removes the support vector with the least weight. Another similar approach is the Randomized Budget Perceptron (RBP) [9], which randomly removes one of the existing support vectors when the number of support vectors exceeds the predefined budget. In general, RBP achieves a mistake bound and empirical performance similar to Forgetron. Unlike the above strategies, which discard one support vector to maintain the budget, Projectron [10] adopts a projection strategy to bound the number of support vectors. Specifically, at each iteration where a training example is misclassified, it first updates the kernel classifier by applying a standard Perceptron step; it then projects the new classifier onto the space spanned by all the support vectors except the new example received at the current iteration, provided that the difference between the new classifier and its projection is less than a given threshold; otherwise the classifier remains unchanged. Empirical studies show that Projectron usually outperforms Forgetron in classification accuracy but with significantly longer running time. In addition to its high computational cost, another shortcoming of Projectron is that, although the number of support vectors is bounded, the exact number of support vectors achieved by Projectron is unclear in theory.

The above budget online learning approaches were designed based on the Perceptron learning framework [6]. In this paper, we propose a new framework of Budget Double Updating Online Learning (BDUOL) based on the recently proposed Double Updating Online Learning (DUOL) technique [4], which has shown state-of-the-art performance for online learning. The key challenge is to develop an appropriate strategy for maintaining the budget whenever the set of support vectors overflows. In this paper, following the theory of double updating online learning, we analyze the theoretical underpinning of the BDUOL framework and propose a principled approach to developing three different budget maintenance strategies. We also analyze the mistake bounds of the proposed BDUOL algorithms and evaluate their empirical performance extensively.

The rest of the paper is organized as follows. Section 2 first introduces the problem setting and then presents both the theoretical and the algorithmic framework of the proposed budget online learning technique. Section 3 presents several different budget maintenance strategies for BDUOL. Section 4 discusses our empirical studies. Section 5 concludes this work.

2 Double Updating Online Learning on a Fixed Budget

In this section, we first introduce the problem setting for online learning and Double Updating Online Learning (DUOL), and then present the details of the proposed Budget Double Updating Online Learning framework.

2.1 Problem Setting

We consider the problem of online classification on a fixed budget. Our goal is to learn a prediction function $f: \mathbb{R}^d \to \mathbb{R}$ from a sequence of training examples $\{(x_1,y_1),\ldots,(x_T,y_T)\}$, where $x_t \in \mathbb{R}^d$ is a $d$-dimensional instance and $y_t \in \mathcal{Y} = \{-1,+1\}$ is the class label assigned to $x_t$. We use $\mathrm{sign}(f(x))$ to predict the class assignment for any $x$, and $|f(x)|$ to measure the classification confidence. Let $\ell(f(x),y): \mathbb{R}\times\mathcal{Y} \to \mathbb{R}$ be the loss function that penalizes the deviation of the estimate $f(x)$ from the observed label $y$. We refer to the output $f$ of the learning algorithm as a hypothesis and denote the set of all possible hypotheses by $\mathcal{H} = \{f \mid f: \mathbb{R}^d \to \mathbb{R}\}$. In this paper, we consider $\mathcal{H}$ a Reproducing Kernel Hilbert Space (RKHS) endowed with a kernel function $\kappa(\cdot,\cdot): \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ [11] implementing the inner product $\langle\cdot,\cdot\rangle$, such that: 1) the reproducing property $\langle f,\kappa(x,\cdot)\rangle = f(x)$ holds for all $x \in \mathbb{R}^d$; and 2) $\mathcal{H}$ is the closure of the span of all $\kappa(x,\cdot)$ with $x \in \mathbb{R}^d$, that is, $\kappa(x,\cdot) \in \mathcal{H}$ for every $x$. The inner product $\langle\cdot,\cdot\rangle$ induces a norm on $f \in \mathcal{H}$ in the usual way: $\|f\|_{\mathcal{H}} := \langle f,f\rangle^{1/2}$. To make the kernel explicit, we use $\mathcal{H}_\kappa$ to denote an RKHS with explicit dependence on the kernel function $\kappa$. Throughout the analysis, we assume $\kappa(x,x) \le 1$ for any $x \in \mathbb{R}^d$.

2.2 Double Updating Online Learning: A Review

Our BDUOL algorithm is designed based on the state-of-the-art Double Updating Online Learning (DUOL) method [4]. Unlike traditional online learning algorithms, which usually perform a single update for each misclassified example, DUOL not only updates the weight of the newly added support vector (SV), but also updates the weight of another existing SV, namely the one that conflicts most with the new SV. Both theoretical and empirical analyses have demonstrated the effectiveness of this approach. Below we briefly review the basics of double updating online learning.

Consider an incoming instance $x_t$ received at the $t$-th step of online learning. The algorithm predicts the class label $\hat{y}_t = \mathrm{sign}(f_{t-1}(x_t))$ using the following kernel-based classifier:
$$f_{t-1}(\cdot) = \sum_{i \in S_{t-1}} \gamma_i y_i \kappa(x_i,\cdot),$$
where $S_{t-1}$ is the index set of the SVs at the $(t-1)$-th step and $\gamma_i$ is the weight of the $i$-th existing support vector. After making the prediction, the algorithm suffers a loss, defined by the hinge loss $\ell(f_{t-1}(x_t),y_t) = \max(0,\, 1 - y_t f_{t-1}(x_t))$. If $\ell(f_{t-1}(x_t),y_t) > 0$, the DUOL algorithm updates the prediction function from $f_{t-1}$ to $f_t$ by adding the training example $(x_t,y_t)$ as a new support vector.
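
To make the prediction rule and loss concrete, the following is a minimal Python sketch of the kernel expansion and hinge loss above; the Gaussian kernel form, its width parameter, and all helper names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def gaussian_kernel(x1, x2, width=8.0):
    # Gaussian kernel kappa(x1, x2) = exp(-||x1 - x2||^2 / (2 * width)); the exact
    # functional form and width used in the paper's experiments are assumptions here.
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width)))

def predict(support_vectors, x, kernel=gaussian_kernel):
    # f(x) = sum_{i in S} gamma_i * y_i * kappa(x_i, x); sign(f(x)) gives the predicted
    # label and |f(x)| the confidence. support_vectors is a list of (x_i, y_i, gamma_i).
    return sum(gamma * y * kernel(x_i, x) for x_i, y, gamma in support_vectors)

def hinge_loss(f_x, y):
    # l(f(x), y) = max(0, 1 - y * f(x))
    return max(0.0, 1.0 - y * f_x)
```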

Specifically, the newly added example $(x_t, y_t)$ is said to conflict with an existing SV $(x_b, y_b)$, $b \in S_{t-1}$, if the following conditions are satisfied: 1) $l_t = 1 - y_t f_{t-1}(x_t) > 0$; 2) $l_b = 1 - y_b f_{t-1}(x_b) > 0$; and 3) $y_t y_b \kappa(x_t,x_b) \le \min(-\rho,\; y_t y_a \kappa(x_t,x_a))$ for all $a \in S_{t-1}$, $a \ne b$, where $\rho \in [0,1)$ is a threshold. In this case, the updating strategy referred to as double updating is adopted:
$$f_t(\cdot) = f_{t-1}(\cdot) + \gamma_t y_t \kappa(x_t,\cdot) + d_{\gamma_b} y_b \kappa(x_b,\cdot),$$
where $\gamma_t$ and $d_{\gamma_b}$ are computed as follows:
$$(\gamma_t, d_{\gamma_b}) = \begin{cases}
\big(C,\; C-\hat{\gamma}_b\big) & \text{if } k_t C + w_{ab}(C-\hat{\gamma}_b) - l_t < 0 \text{ and } k_b(C-\hat{\gamma}_b) + w_{ab} C - l_b < 0,\\[4pt]
\Big(C,\; \frac{l_b - w_{ab} C}{k_b}\Big) & \text{if } \frac{w_{ab}^2 C - w_{ab} l_b - k_t k_b C + k_b l_t}{k_b} > 0 \text{ and } \frac{l_b - w_{ab} C}{k_b} \in [-\hat{\gamma}_b,\, C-\hat{\gamma}_b],\\[4pt]
\Big(\frac{l_t - w_{ab}(C-\hat{\gamma}_b)}{k_t},\; C-\hat{\gamma}_b\Big) & \text{if } \frac{l_t - w_{ab}(C-\hat{\gamma}_b)}{k_t} \in [0,C] \text{ and } l_b - k_b(C-\hat{\gamma}_b) - w_{ab}\frac{l_t - w_{ab}(C-\hat{\gamma}_b)}{k_t} > 0,\\[4pt]
\Big(\frac{k_b l_t - w_{ab} l_b}{k_t k_b - w_{ab}^2},\; \frac{k_t l_b - w_{ab} l_t}{k_t k_b - w_{ab}^2}\Big) & \text{if } \frac{k_b l_t - w_{ab} l_b}{k_t k_b - w_{ab}^2} \in [0,C] \text{ and } \frac{k_t l_b - w_{ab} l_t}{k_t k_b - w_{ab}^2} \in [-\hat{\gamma}_b,\, C-\hat{\gamma}_b],
\end{cases} \quad (1)$$
where $\hat{\gamma}_b$ is the current weight of $(x_b,y_b)$, $k_t = \kappa(x_t,x_t)$, $k_b = \kappa(x_b,x_b)$, $w_{ab} = y_t y_b \kappa(x_t,x_b)$, and $C > 0$. When no existing SV conflicts with $(x_t,y_t)$, the single update strategy is adopted instead:
$$f_t(\cdot) = f_{t-1}(\cdot) + \gamma_t y_t \kappa(x_t,\cdot), \quad (2)$$
where $\gamma_t = \min(C,\, l_t/k_t)$. It is not difficult to see that the single update strategy reduces to the Passive-Aggressive updating strategy [3].

2.3 Framework of Budget Double Updating Online Learning

Although DUOL outperforms various traditional single updating algorithms, one major limitation is that it does not bound the number of support vectors, which can lead to high computational and memory costs when applied to large-scale applications. In this paper, we aim to overcome this limitation by proposing a budget double updating online learning framework in which the number of support vectors is bounded by a predefined budget size.

Let us denote by $B$ a predefined budget size for the maximal number of support vectors associated with the prediction function. The key difference of Budget DUOL over regular DUOL is an appropriate budget maintenance step ensuring that the number of support vectors of the classifier $f_t$ is less than the budget $B$ at the beginning of each online updating step. In particular, let $f_{t-1}$ denote the classifier produced by regular DUOL at the $(t-1)$-th step; when the support vector set of $f_{t-1}$ reaches size $B$, BDUOL performs a classifier update towards budget maintenance,
$$f_{t-1} \leftarrow f_{t-1} - \Delta f_{t-1},$$
such that the support vector set of the updated $f_{t-1}$ is smaller than $B$. The details of the proposed Budget DUOL (BDUOL) algorithmic framework are summarized in Algorithm 1.
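
The following Python sketch illustrates the two update rules. The single update is exactly Eq. (2); for the double update only the interior (unconstrained) case of Eq. (1) is computed in closed form, with the boundary cases approximated by clipping, so it is a simplified sketch rather than the paper's full case analysis.

```python
def single_update(l_t, k_t, C):
    # Passive-Aggressive style single update of Eq. (2): gamma_t = min(C, l_t / k_t),
    # where k_t = kappa(x_t, x_t) and l_t is the hinge loss of the new example.
    return min(C, l_t / k_t)

def double_update(l_t, l_b, k_t, k_b, w_ab, gamma_b_hat, C):
    # Sketch of the double update of Eq. (1). Only the interior case (the joint
    # unconstrained maximizer of the dual ascent) is computed; the boundary cases
    # of Eq. (1) are approximated here by clipping, which is a simplification.
    det = k_t * k_b - w_ab ** 2
    gamma_t = (k_b * l_t - w_ab * l_b) / det
    d_gamma_b = (k_t * l_b - w_ab * l_t) / det
    gamma_t = min(max(gamma_t, 0.0), C)                             # gamma_t in [0, C]
    d_gamma_b = min(max(d_gamma_b, -gamma_b_hat), C - gamma_b_hat)  # updated weight in [0, C]
    return gamma_t, d_gamma_b
```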

Algorithm 1 The Budget Double Updating Online Learning Algorithm (BDUOL)

Procedure
 1: Initialize S_0 = {}, f_0 = 0;
 2: for t = 1, 2, ..., T do
 3:   Receive a new instance x_t;
 4:   Predict y^_t = sign(f_{t-1}(x_t));
 5:   Receive its label y_t;
 6:   l_t = max{0, 1 - y_t f_{t-1}(x_t)};
 7:   if l_t > 0 then
 8:     if (|S_{t-1}| == B) then
 9:       f_{t-1} = f_{t-1} - Δf_{t-1};   (Budget Maintenance)
10:     end if
11:     l_t = max{0, 1 - y_t f_{t-1}(x_t)};
12:     if l_t > 0 then
13:       w_min = ∞;
14:       for i in S_{t-1} do
15:         if (f^i_{t-1} <= 1) then
16:           if (y_i y_t κ(x_i, x_t) <= w_min) then
17:             w_min = y_i y_t κ(x_i, x_t);
18:             (x_b, y_b) = (x_i, y_i);
19:           end if
20:         end if
21:       end for
22:       f^t_{t-1} = y_t f_{t-1}(x_t);
23:       S_t = S_{t-1} ∪ {t};
24:       if (w_min <= -ρ) then
25:         Compute γ_t and d_{γ_b} using equation (1);
26:         for i in S_t do
27:           f^i_t = f^i_{t-1} + y_i γ_t y_t κ(x_i, x_t) + y_i d_{γ_b} y_b κ(x_i, x_b);
28:         end for
29:         f_t = f_{t-1} + γ_t y_t κ(x_t, ·) + d_{γ_b} y_b κ(x_b, ·);
30:       else  /* no auxiliary example found */
31:         γ_t = min(C, l_t / κ(x_t, x_t));
32:         for i in S_t do
33:           f^i_t = f^i_{t-1} + y_i γ_t y_t κ(x_i, x_t);
34:         end for
35:         f_t = f_{t-1} + γ_t y_t κ(x_t, ·);
36:       end if
37:     else
38:       f_t = f_{t-1}; S_t = S_{t-1};
39:       for i in S_t do
40:         f^i_t = f^i_{t-1};
41:       end for
42:     end if
43:   end if
44: end for
return f_T, S_T
End

Fig. 1. The algorithm of Budget Double Updating Online Learning (BDUOL).
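
As a complement to the pseudocode, the sketch below outlines the main BDUOL loop in Python. The kernel, the double update rule, and the budget maintenance strategy are passed in as functions; all names and interfaces are illustrative assumptions, and for clarity the sketch recomputes f(x_i) instead of caching the values f^i_t as Algorithm 1 does.

```python
def bduol(stream, kernel, double_update, budget_maintenance, C=1.0, rho=0.0, B=100):
    # High-level sketch of Algorithm 1. `stream` yields (x_t, y_t) pairs; `kernel` is
    # kappa; `double_update(l_t, l_b, k_t, k_b, w_ab, gamma_b, C)` returns the pair
    # (gamma_t, d_gamma_b) of Eq. (1); `budget_maintenance(svs, kernel, C)` implements
    # one of the Section 3 strategies and returns an SV list of size < B.
    svs = []  # each SV is a mutable [x_i, y_i, gamma_i]

    def f(x):  # f(x) = sum_i gamma_i * y_i * kappa(x_i, x)
        return sum(g * y * kernel(xi, x) for xi, y, g in svs)

    for x_t, y_t in stream:
        if max(0.0, 1.0 - y_t * f(x_t)) <= 0.0:
            continue  # no loss suffered, keep the classifier unchanged
        if len(svs) == B:
            svs = budget_maintenance(svs, kernel, C)  # shrink the SV set below the budget
        l_t = max(0.0, 1.0 - y_t * f(x_t))            # loss w.r.t. the maintained classifier
        if l_t <= 0.0:
            continue
        k_t = kernel(x_t, x_t)
        # search for the auxiliary SV that conflicts most with the new example
        b, w_min = None, float("inf")
        for i, (x_i, y_i, _) in enumerate(svs):
            if y_i * f(x_i) <= 1.0:                   # auxiliary candidates still suffer loss
                w = y_i * y_t * kernel(x_i, x_t)
                if w <= w_min:
                    b, w_min = i, w
        if b is not None and w_min <= -rho:           # double update
            x_b, y_b, g_b = svs[b]
            l_b = 1.0 - y_b * f(x_b)
            gamma_t, d_gb = double_update(l_t, l_b, k_t, kernel(x_b, x_b), w_min, g_b, C)
            svs[b][2] += d_gb
            svs.append([x_t, y_t, gamma_t])
        else:                                          # single (PA-style) update
            svs.append([x_t, y_t, min(C, l_t / k_t)])
    return svs
```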

The key challenge of BDUOL is to choose an appropriate reduction term $\Delta f_{t-1}$ by a proper budget maintenance strategy, which can not only meet the budget requirement but also minimize the impact of the reduction on the prediction performance. Unlike some existing heuristic budget maintenance approaches, in this paper we propose a principled approach for developing several different budget maintenance strategies. Before presenting the detailed strategies, in the following we analyze the theoretical underpinning of the proposed budget double updating online learning scheme, which is the theoretical foundation for the budget maintenance strategies proposed in Section 3.

2.4 Theoretical Analysis

In this section, we analyze the mistake bound of the proposed BDUOL algorithm. To simplify the analysis, the primal-dual framework is used to derive the mistake bound, following the strategy of the DUOL algorithm. Through this framework, we will show that the gap between the mistake bounds of DUOL and BDUOL is bounded by the cumulative dual ascent induced by the function reduction, i.e., $\Delta f = \sum_i \Delta\gamma_i y_i \kappa(x_i,\cdot)$. To facilitate the analysis, we first introduce the following lemma, which provides the dual objective function of the SVM.

Lemma 1. The dual objective of $P_t(f) = \frac{1}{2}\|f\|^2_{\mathcal{H}_\kappa} + C\sum_{i=1}^t \ell(f(x_i),y_i)$, $C > 0$, is
$$D_t(\gamma_1,\ldots,\gamma_t) = \sum_{i=1}^t \gamma_i - \frac{1}{2}\Big\|\sum_{i=1}^t \gamma_i y_i \kappa(x_i,\cdot)\Big\|^2_{\mathcal{H}_\kappa}, \quad \gamma_i \in [0,C], \quad (3)$$
where the relation between $f$ and $\gamma_i$, $i = 1,\ldots,t$, is $f(\cdot) = \sum_{i=1}^t \gamma_i y_i \kappa(x_i,\cdot)$.

According to the above lemma, the dual ascent resulting from the $t$-th budget maintenance can be computed as follows:

Theorem 1. The dual ascent $DA_t = D_t(\gamma_1-\Delta\gamma_1,\ldots,\gamma_t-\Delta\gamma_t) - D_t(\gamma_1,\ldots,\gamma_t)$ for the $t$-th budget maintenance, i.e., $f_t \leftarrow f_t - \Delta f_t$ with $\Delta f_t = \sum_{i=1}^t \Delta\gamma_i y_i \kappa(x_i,\cdot)$, is given by
$$DA_t = -\sum_{i=1}^t \Delta\gamma_i + \sum_{i=1}^t \Delta\gamma_i y_i f_t(x_i) - \frac{1}{2}\|\Delta f_t\|^2_{\mathcal{H}_\kappa}. \quad (4)$$

Proof.
$$D_t(\gamma_1-\Delta\gamma_1,\ldots,\gamma_t-\Delta\gamma_t) - D_t(\gamma_1,\ldots,\gamma_t)$$
$$= \sum_{i=1}^t (\gamma_i - \Delta\gamma_i) - \frac{1}{2}\Big\|\sum_{i=1}^t (\gamma_i-\Delta\gamma_i) y_i \kappa(x_i,\cdot)\Big\|^2_{\mathcal{H}_\kappa} - \Big[\sum_{i=1}^t \gamma_i - \frac{1}{2}\Big\|\sum_{i=1}^t \gamma_i y_i \kappa(x_i,\cdot)\Big\|^2_{\mathcal{H}_\kappa}\Big]$$
$$= -\sum_{i=1}^t \Delta\gamma_i - \frac{1}{2}\big[\|f_t - \Delta f_t\|^2_{\mathcal{H}_\kappa} - \|f_t\|^2_{\mathcal{H}_\kappa}\big]$$
$$= -\sum_{i=1}^t \Delta\gamma_i - \frac{1}{2}\big[\|f_t\|^2_{\mathcal{H}_\kappa} - 2\langle f_t, \Delta f_t\rangle + \|\Delta f_t\|^2_{\mathcal{H}_\kappa} - \|f_t\|^2_{\mathcal{H}_\kappa}\big]$$
$$= -\sum_{i=1}^t \Delta\gamma_i + \langle f_t, \Delta f_t\rangle - \frac{1}{2}\|\Delta f_t\|^2_{\mathcal{H}_\kappa}$$
$$= -\sum_{i=1}^t \Delta\gamma_i + \Big\langle f_t, \sum_{i=1}^t \Delta\gamma_i y_i \kappa(x_i,\cdot)\Big\rangle - \frac{1}{2}\|\Delta f_t\|^2_{\mathcal{H}_\kappa}$$
$$= -\sum_{i=1}^t \Delta\gamma_i + \sum_{i=1}^t \Delta\gamma_i y_i f_t(x_i) - \frac{1}{2}\|\Delta f_t\|^2_{\mathcal{H}_\kappa}.$$
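
For illustration, the dual ascent of Eq. (4) can be evaluated directly from the reduction coefficients, as in the following sketch (argument names are assumptions, not from the paper).

```python
import numpy as np

def dual_ascent(delta_gamma, y, f_vals, K):
    # Dual ascent of Theorem 1 for a reduction Delta f_t = sum_i dg_i * y_i * kappa(x_i, .):
    #   DA_t = -sum_i dg_i + sum_i dg_i * y_i * f_t(x_i) - 0.5 * ||Delta f_t||^2_{H_kappa}.
    # delta_gamma, y, f_vals are arrays holding dg_i, y_i and f_t(x_i) for the current SVs;
    # K is the corresponding kernel matrix.
    dg = np.asarray(delta_gamma, dtype=float)
    y = np.asarray(y, dtype=float)
    f_vals = np.asarray(f_vals, dtype=float)
    v = dg * y                               # coefficients of Delta f_t in the kernel expansion
    norm_sq = float(v @ np.asarray(K) @ v)   # ||Delta f_t||^2_{H_kappa}
    return float(-dg.sum() + np.sum(dg * y * f_vals) - 0.5 * norm_sq)
```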

Based on the above dual ascent for budget maintenance, we can now analyze the mistake bound of the proposed BDUOL algorithm. To ease our discussion, we first introduce the following lemma [4] about the mistake bound of DUOL.

Lemma 2. Let $(x_1,y_1),\ldots,(x_T,y_T)$ be a sequence of examples, where $x_t \in \mathbb{R}^d$, $y_t \in \{-1,+1\}$ and $\kappa(x_t,x_t) \le 1$ for all $t$, and assume $C \ge 1$. Then, for any function $f$ in $\mathcal{H}_\kappa$, the number of prediction mistakes $M$ made by DUOL on this sequence of examples is bounded by
$$M \le 2\min_{f\in\mathcal{H}_\kappa}\Big\{\frac{1}{2}\|f\|^2_{\mathcal{H}_\kappa} + C\sum_{i=1}^T \ell(f(x_i),y_i)\Big\} - \rho^2 M_d^w(\rho) - \frac{1+\rho}{1-\rho} M_d^s(\rho),$$
where $\rho \in [0,1)$, and $M_d^w(\rho) > 0$ and $M_d^s(\rho) > 0$ denote the numbers of mistakes for which a weak and a strong double update is performed, respectively (with $M_s$ the number of mistakes with a single update, so that $M = M_s + M_d^w(\rho) + M_d^s(\rho)$).

Combining the above lemma with Theorem 1, it is not difficult to derive the following mistake bound for the proposed BDUOL algorithm.

Theorem 2. Let $(x_1,y_1),\ldots,(x_T,y_T)$ be a sequence of examples, where $x_t \in \mathbb{R}^d$, $y_t \in \{-1,+1\}$ and $\kappa(x_t,x_t) \le 1$ for all $t$, and assume $C \ge 1$. Then, for any function $f$ in $\mathcal{H}_\kappa$, the number of prediction mistakes $M$ made by BDUOL on this sequence of examples is bounded by
$$M \le 2\min_{f\in\mathcal{H}_\kappa}\Big\{\frac{1}{2}\|f\|^2_{\mathcal{H}_\kappa} + C\sum_{i=1}^T \ell(f(x_i),y_i)\Big\} - 2\sum_{i=1}^T DA_i - \rho^2 M_d^w(\rho) - \frac{1+\rho}{1-\rho} M_d^s(\rho),$$
where $\rho \in [0,1)$.

Proof. According to the proof of Lemma 2 [4], we have
$$\frac{1}{2}M_s + \frac{1+\rho^2}{2}M_d^w(\rho) + \frac{1}{1-\rho}M_d^s(\rho) \le \min_{f\in\mathcal{H}_\kappa}\Big\{\frac{1}{2}\|f\|^2_{\mathcal{H}_\kappa} + C\sum_{i=1}^T \ell(f(x_i),y_i)\Big\},$$
and $M = M_s + M_d^w(\rho) + M_d^s(\rho)$. Furthermore, by taking the dual ascents of budget maintenance into consideration, we have
$$\sum_{i=1}^T DA_i + \frac{1}{2}M_s + \frac{1+\rho^2}{2}M_d^w(\rho) + \frac{1}{1-\rho}M_d^s(\rho) \le \min_{f\in\mathcal{H}_\kappa}\Big\{\frac{1}{2}\|f\|^2_{\mathcal{H}_\kappa} + C\sum_{i=1}^T \ell(f(x_i),y_i)\Big\}.$$
Rearranging the above inequality concludes the theorem.

According to the above theorem, if there is no budget maintenance step, the mistake bound of BDUOL reduces to the previous mistake bound of the regular DUOL algorithm. The theorem also indicates that, in order to minimize the mistake bound of BDUOL, one should try to maximize the cumulative dual ascent, i.e., $\sum_{i=1}^T DA_i$, when designing a budget maintenance strategy.

3 Budget Maintenance Strategies

In this section, we follow the above theoretical results to develop several different budget maintenance strategies in a principled way. In particular, as revealed by Theorem 2, a key to improving the mistake bound of BDUOL is to maximize the cumulative dual ascent $\sum_{t=1}^T DA_t$ caused by the budget maintenance. To achieve this, we propose to maximize the dual ascent caused by budget maintenance at each online learning step, i.e.,
$$\max_{\Delta\gamma_1,\ldots,\Delta\gamma_t} DA_t = -\sum_{i=1}^t \Delta\gamma_i + \sum_{i=1}^t \Delta\gamma_i y_i f_t(x_i) - \frac{1}{2}\|\Delta f_t\|^2_{\mathcal{H}_\kappa}. \quad (5)$$
Below, we propose three different budget maintenance strategies, and for each strategy we analyze the principled way of achieving the best dual ascent as well as its time complexity and memory cost.

3.1 BDUOL Algorithm by Removal Strategy

The first strategy for budget maintenance is the removal strategy, which discards one of the existing support vectors; this is similar to the strategies used by Forgetron [8] and RBP [9]. Unlike those heuristic removal strategies, the key idea of our removal strategy is to discard the support vector that maximizes the dual ascent, following the preceding analysis. Specifically, assume the $j$-th SV is selected for removal. We then have the following function reduction term:
$$\Delta f_t = \gamma_j y_j \kappa(x_j,\cdot). \quad (6)$$
As a result, the optimal removal solution is to discard the SV that maximizes the following dual ascent term:
$$DA_{t,j} = -\gamma_j\big(1 - y_j f_t(x_j)\big) - \frac{1}{2}\gamma_j^2 \kappa(x_j,x_j). \quad (7)$$
We note that the above removal strategy is similar to the one in [5] when a Gaussian kernel is adopted, for which $\kappa(x_j,x_j) = 1$; our strategy, however, is strongly theoretically motivated.
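
A minimal sketch of this selection rule, assuming the values f_t(x_j) and κ(x_j, x_j) are cached for every SV (as the complexity analysis below relies on); the function and argument names are illustrative.

```python
def best_sv_to_remove(gammas, ys, f_vals, k_diag):
    # Removal strategy of Section 3.1: score each SV j by Eq. (7),
    #   DA_{t,j} = -gamma_j * (1 - y_j * f_t(x_j)) - 0.5 * gamma_j^2 * kappa(x_j, x_j),
    # using cached f_t(x_j) (f_vals) and kappa(x_j, x_j) (k_diag), and return the index
    # of the SV whose removal yields the largest dual ascent.
    best_j, best_da = None, float("-inf")
    for j, (g, y, f, k) in enumerate(zip(gammas, ys, f_vals, k_diag)):
        da = -g * (1.0 - y * f) - 0.5 * g * g * k
        if da > best_da:
            best_j, best_da = j, da
    return best_j, best_da
```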

Complexity Analysis. Since the BDUOL algorithm caches the value $y_j f_t(x_j)$ for every SV, the computational complexity of evaluating $DA_{t,j}$ is $O(1)$ in practice. Thus, this BDUOL algorithm requires $O(B)$ time to compute all the $DA_{t,j}$ values. After removing the SV with the largest $DA_{t,j}$, the complexity of updating all the cached values $y_j f_t(x_j)$ is $O(B)$. Combining the above with the fact that the original DUOL update takes $O(B)$ time per step, we conclude that the overall time complexity of this BDUOL algorithm is also $O(B)$. As for the memory cost, since only the $B$ SVs, their weight parameters, and the cached values $y_j f_t(x_j)$ have to be stored, the space complexity is also $O(B)$.

3.2 BDUOL Algorithm by Projection Strategy

Although the above removal strategy is optimized to find the support vector whose removal maximizes the dual ascent (i.e., minimizes the loss of dual ascent caused by budget maintenance), removing an existing support vector still unavoidably results in some loss of dual ascent. To minimize such loss, we propose a projection strategy for budget maintenance. Specifically, in the projection strategy, the $j$-th SV selected for removal is projected onto the space spanned by the remaining SVs. The objective is to find the function closest to $\gamma_j y_j \kappa(x_j,\cdot)$ in the space spanned by the remaining SVs, or formally:
$$\min_{\beta_i \in [-\gamma_i,\, C-\gamma_i],\; i\ne j} \Big\|\gamma_j y_j \kappa(x_j,\cdot) - \sum_{i\ne j}\beta_i y_i \kappa(x_i,\cdot)\Big\|^2_{\mathcal{H}_\kappa}, \quad (8)$$
which is essentially a Quadratic Programming (QP) problem, so we can exploit existing efficient QP solvers to find the optimal solution. However, solving the above QP problem directly may not be efficient enough for online learning. To further improve the efficiency, we also propose an approximate solution, which first solves the unconstrained optimization problem and then projects the solution onto the feasible region of the constraints. Specifically, setting the gradient of the above objective with respect to $\beta = [\beta_i]$, $i\ne j$, to zero, one obtains the optimal unconstrained solution
$$\beta = \gamma_j y_j K^{-1} k_j \,./\, y, \quad (9)$$
where $K$ is the kernel matrix of the $x_i$, $i\ne j$, $k_j = [\kappa(x_i,x_j)]$, $i\ne j$, $./$ denotes element-wise division, and $y = [y_i]$, $i\ne j$. In the above, inverting $K$ can be realized efficiently by using the Woodbury formula [12]. As a result, we would set
$$\Delta f_t(\cdot) = \gamma_j y_j \kappa(x_j,\cdot) - \sum_{i\ne j}\beta_i y_i \kappa(x_i,\cdot). \quad (10)$$
However, the SV weights of the resultant $f_t - \Delta f_t$ may not fall into the range $[0,C]$. To fix this problem, we project each $\beta_i$ as follows:
$$\beta = \Pi_{[-\gamma,\, C-\gamma]}\big(\gamma_j y_j K^{-1} k_j \,./\, y\big), \quad (11)$$
where $\Pi_{[a,b]}(x) = \max(a,\min(b,x))$ and $\gamma = [\gamma_i]$, $i\ne j$.
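
A minimal sketch of the approximate projection of Eqs. (9)-(11), using a direct linear solve instead of the incrementally maintained inverse kernel matrix; the function and argument names are assumptions for illustration.

```python
import numpy as np

def approximate_projection(j, gammas, ys, K_rest, k_j, C):
    # Approximate projection of Section 3.2, Eqs. (9)-(11): project the removed SV j onto
    # the span of the remaining SVs via the unconstrained solution
    #   beta = gamma_j * y_j * K^{-1} k_j ./ y,
    # then clip each beta_i into [-gamma_i, C - gamma_i] (Eq. (11)). K_rest is the kernel
    # matrix of the remaining SVs (excluding j) and k_j = [kappa(x_i, x_j)]_{i != j}.
    # A direct solve is used for clarity; the paper maintains K^{-1} incrementally via
    # the Woodbury formula.
    gammas = np.asarray(gammas, dtype=float)
    ys = np.asarray(ys, dtype=float)
    gamma_rest = np.delete(gammas, j)
    y_rest = np.delete(ys, j)
    beta = gammas[j] * ys[j] * np.linalg.solve(np.asarray(K_rest, dtype=float),
                                               np.asarray(k_j, dtype=float)) / y_rest
    return np.clip(beta, -gamma_rest, C - gamma_rest)
```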

Now the key problem is to find the best SV for projection among the $B+1$ candidates. After the projection of $\gamma_j y_j \kappa(x_j,\cdot)$, the resultant dual ascent is
$$DA_{t,j} = -\sum_i \Delta\gamma_i + \sum_i \Delta\gamma_i y_i f_t(x_i) - \frac{1}{2}\|\Delta f_t\|^2_{\mathcal{H}_\kappa}, \quad (12)$$
so the best SV is the one that achieves the largest value of $DA_{t,j}$.

Complexity Analysis. The time complexity of this BDUOL variant is dominated by the computation of $K^{-1}$. Computing $K^{-1}$ from scratch costs $O(B^3)$ time; using the Woodbury formula, however, it can be maintained in only $O(B^2)$ time. In addition, we need to compute $B$ projections, one for every candidate SV, so the total time complexity of one updating step is $O(B^3)$. Finally, the memory cost of this BDUOL algorithm is $O(B^2)$, since storing the kernel matrix $K$ and its inverse $K^{-1}$ dominates the memory usage.

3.3 BDUOL Algorithm by Nearest Neighbor Strategy

Although the above projection strategy achieves a better improvement of the dual ascent than the removal strategy, it is much more computationally expensive. To balance the tradeoff between efficiency and effectiveness, we propose an efficient nearest neighbor strategy, which approximates the projection strategy by projecting the removed SV onto its nearest neighbor SV, measured by the distance in the mapped feature space. This strategy is motivated by the fact that the nearest neighbor SV is usually a good representative of the removed SV, and it significantly improves the time efficiency. In particular, since only the nearest neighbor is used for projection, the corresponding solution according to Equation (11) can be expressed as
$$\beta_{N_j} = \Pi_{[-\gamma_{N_j},\, C-\gamma_{N_j}]}\big(\gamma_j y_j \kappa(x_{N_j},x_{N_j})^{-1}\kappa(x_{N_j},x_j)/y_{N_j}\big), \quad (13)$$
where $\kappa(x_{N_j},\cdot)$ is the nearest neighbor of $\kappa(x_j,\cdot)$. As a result, the corresponding reduction term is $\Delta f_t = \gamma_j y_j \kappa(x_j,\cdot) - \beta_{N_j} y_{N_j} \kappa(x_{N_j},\cdot)$. After the projection of $\gamma_j y_j \kappa(x_j,\cdot)$, the resultant dual ascent is
$$DA_{t,j} = -\sum_i \Delta\gamma_i + \sum_i \Delta\gamma_i y_i f_t(x_i) - \frac{1}{2}\|\Delta f_t\|^2_{\mathcal{H}_\kappa}. \quad (14)$$
Thus, the best SV to project is the one with the largest value of $DA_{t,j}$.

Complexity Analysis. All the computation steps except the nearest neighbor search have constant $O(1)$ time complexity. To speed up the nearest neighbor search, we can cache the index of and distance to the nearest neighbor of each SV, making the search $O(1)$. After removing the best SV, we also need to update the caches, which takes at most $O(B)$ time. In summary, the time complexity of the overall strategy is $O(B)$, and the overall memory cost is also $O(B)$.

Remark: To further improve the efficiency of the projection and nearest neighbor strategies, one could first select the SV to discard using the removal strategy, and then use the projection or nearest neighbor strategy only to preserve the information of that SV before removing it, since the search cost of the removal strategy is much lower than that of the other two strategies.
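
A minimal sketch of the nearest-neighbor weight transfer of Eq. (13), assuming the nearest neighbor N_j has already been identified from the cached distances; the function and argument names are assumptions.

```python
def nearest_neighbor_weight(gamma_j, y_j, gamma_n, y_n, k_nn, k_nj, C):
    # Nearest neighbor strategy of Section 3.3, Eq. (13): absorb the removed SV j into its
    # nearest neighbor N_j by adding to the neighbor's weight
    #   beta_{N_j} = Pi_{[-gamma_{N_j}, C - gamma_{N_j}]}(
    #       gamma_j * y_j * kappa(x_Nj, x_Nj)^{-1} * kappa(x_Nj, x_j) / y_Nj).
    # Here k_nn = kappa(x_Nj, x_Nj) and k_nj = kappa(x_Nj, x_j).
    beta = gamma_j * y_j * k_nj / (k_nn * y_n)
    return min(max(beta, -gamma_n), C - gamma_n)
```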

4 Experimental Results

In this section, we evaluate the empirical performance of the proposed Budget Double Updating Online Learning (BDUOL) algorithms by comparing them with state-of-the-art algorithms for budget online learning.

4.1 Algorithms for Comparison

In our experiments, we implement the proposed BDUOL algorithms as follows:
BDUOL_remo: the BDUOL algorithm with the removal strategy for budget maintenance described in Section 3.1;
BDUOL_proj: the BDUOL algorithm with the exact projection strategy, solved by a standard QP solver, described in Section 3.2;
BDUOL_appr: the BDUOL algorithm with the approximate projection strategy described in Section 3.2;
BDUOL_near: the BDUOL algorithm with the nearest neighbor strategy described in Section 3.3.

For comparison, we include the following state-of-the-art algorithms for budget online learning:
RBP: the Randomized Budget Perceptron algorithm [9];
Forgetron: the Forgetron algorithm [8];
Projectron: the Projectron algorithm [10];
Projectron++: the aggressive version of the Projectron algorithm [10].

In addition, we include two non-budget online learning algorithms as yardsticks:
Perceptron: the classical Perceptron algorithm [6];
DUOL: the Double Updating Online Learning algorithm [4].

4.2 Experimental Testbed and Setup

Table 1. Details of the datasets in our experiments, listing the number of instances and the number of features of each dataset: german, MITface, mushrooms, spambase, splice, and w7a.

We test all the algorithms on the six benchmark datasets listed in Table 1. These datasets can be downloaded from the LIBSVM website, the UCI machine learning repository, and the MIT CBCL face datasets.

These datasets were chosen fairly randomly to cover various dataset sizes. To make a fair comparison, all the algorithms in our comparison adopt the same experimental setup. A Gaussian kernel is adopted in our study, with the kernel width set to 8 for all algorithms and datasets. To keep the number of support vectors fixed for the Projectron and Projectron++ algorithms, we simply store the received SVs until the budget overflows, and then project the subsequent ones onto the space spanned by the stored SVs. The penalty parameter C of the DUOL algorithm was selected by 5-fold cross validation on every dataset from the range 2^[-10:10]; for the cross validation, we randomly divide every dataset into two equal subsets, a cross-validation (CV) set and an online learning set, and all the budget algorithms use the same value of C as DUOL. Furthermore, the value of rho for DUOL and its budget variants is set to 0, according to the previous study on the effect of rho. The budget sizes B for the different datasets are set as proper fractions of the support vector size of Perceptron, as shown in Table 3. All the experiments were conducted 20 times, each with a different random permutation of the data points, and all results are reported as averages over the 20 runs. For performance metrics, we evaluate the online classification performance by the online cumulative mistake rate and the running time cost (a sketch of this evaluation protocol is given after Table 2).

4.3 Performance Evaluation of Non-budget Algorithms

Table 2 summarizes the average performance of the two non-budget kernel-based online learning algorithms. First of all, similar to the previous study [4], we find that DUOL significantly outperforms Perceptron on all the datasets according to t-test results, which validates our motivation for choosing DUOL as the base online learning algorithm for budget online learning. Second, we notice that the support vector size of DUOL is in general much larger than that of Perceptron. Finally, the time cost of DUOL is much higher than that of Perceptron, mostly due to the larger number of support vectors. Both the large number of support vectors and the high computational time motivate the need for studying budget DUOL algorithms in this work.

Table 2. Evaluation of the non-budget algorithms on the six datasets, reporting, for both Perceptron and DUOL, the mistake rate (%), the number of support vectors, and the running time (s) on german, MITface, mushrooms, spambase, splice, and w7a.
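
The following sketch illustrates the evaluation protocol referenced above, i.e., averaging the cumulative online mistake rate over random permutations; the learner interface is an assumption for illustration, not the paper's implementation.

```python
import numpy as np

def online_mistake_rate(make_learner, X, Y, n_runs=20, seed=0):
    # Sketch of the evaluation protocol of Section 4.2: run a fresh online learner over
    # n_runs random permutations of the data and report the mean and standard deviation
    # of the cumulative mistake rate. make_learner() must return an object with a method
    # update(x, y) that predicts the label of x before updating on (x, y).
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_runs):
        learner = make_learner()
        mistakes = 0
        for idx in rng.permutation(len(X)):
            if learner.update(X[idx], Y[idx]) != Y[idx]:
                mistakes += 1
        rates.append(mistakes / len(X))
    return float(np.mean(rates)), float(np.std(rates))
```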

4.4 Performance Evaluation of Budget Algorithms

Table 3 summarizes the results of the different budget online learning algorithms, from which we can draw several observations. First of all, we observe that RBP and Forgetron achieve very similar performance in most cases. We also find that Projectron++ achieves a lower mistake rate than Projectron for almost all datasets and budget sizes, which is consistent with the results reported in [10]. Moreover, compared with the baseline algorithms RBP and Forgetron, the proposed BDUOL_remo algorithm, which uses a simple removal strategy, achieves a comparable or better mistake rate when the budget size is large, but fails to improve when the budget size is very small; this indicates that a simple removal strategy may not always be effective and that a better budget maintenance strategy is needed.

Second, among all the budget online learning algorithms in the comparison, we find that BDUOL_proj achieves the lowest mistake rate in most cases. These promising results indicate that the projection strategy can effectively reduce the information loss. However, we also notice that the time cost of BDUOL_proj is among the highest, which indicates that it is important to find a more efficient strategy.

Third, comparing the two approximate strategies, we find that BDUOL_appr achieves better mistake rates than BDUOL_near only on the german and mushrooms datasets, while it consumes considerably more time than BDUOL_near; this indicates that BDUOL_near achieves a better trade-off between mistake rate and time complexity than BDUOL_appr. In addition, when the budget is large, BDUOL_near always achieves performance similar to BDUOL_proj while consuming significantly less time, which indicates that the proposed nearest neighbor strategy is a good alternative to the projection strategy.

Finally, Figure 2 and Figure 3 show the detailed online evaluation processes of the budget online learning algorithms. Similar observations from these figures further verify the efficacy of the proposed BDUOL technique.

5 Conclusions

This paper presented a new framework of budget double updating online learning for kernel-based online learning on a fixed budget, which requires that the number of support vectors associated with the prediction function is always bounded by a predefined budget. We theoretically analyzed its performance, revealing that its effectiveness is tightly connected with the dual ascent achieved by the model reduction for budget maintenance. Based on the theoretical analysis, we proposed three budget maintenance strategies: removal, projection, and nearest neighbor. We evaluated the proposed algorithms extensively on benchmark datasets, and the promising empirical results show that the proposed algorithms outperform the state-of-the-art budget online learning algorithms in terms of mistake rates. Future work will exploit different budget maintenance strategies and extend the proposed work to multi-class budgeted online learning.

Table 3. Evaluation of the budgeted algorithms (RBP, Forgetron, Projectron, Projectron++, BDUOL_remo, BDUOL_near, BDUOL_appr, BDUOL_proj) with varied budget sizes, reporting the mistake rate (%) and running time (s) for each algorithm, dataset, and budget size: german and MITface with B = 50, 100, 150; mushrooms with B = 50, 75, 100; spambase with B = 200, 400, 600; splice with B = 100, 200, 300; and w7a with B = 300, 400, 500.

Fig. 2. Evaluation of online mistake rates against the number of samples on three datasets: german (B = 50, 100, 150), MITface (B = 50, 100, 150), and mushrooms (B = 50, 75, 100), comparing RBP, Forgetron, Projectron, Projectron++, BDUOL_remo, BDUOL_near, BDUOL_appr, and BDUOL_proj. The plotted curves are averaged over 20 random permutations.

Acknowledgments

This work was supported by Singapore MOE tier 1 grant (RG33/11) and Microsoft Research grant (M ).

References

1. Jyrki Kivinen, Alexander J. Smola, and Robert C. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8).
2. Li Cheng, S. V. N. Vishwanathan, Dale Schuurmans, Shaojun Wang, and Terry Caelli. Implicit online learning with kernels. In NIPS.
3. Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7.
4. Peilin Zhao, Steven C. H. Hoi, and Rong Jin. Double updating online learning. Journal of Machine Learning Research, 12, 2011.

Fig. 3. Evaluation of online mistake rates against the number of samples on three datasets: spambase (B = 200, 400, 600), splice (B = 100, 200, 300), and w7a (B = 300, 400, 500), comparing RBP, Forgetron, Projectron, Projectron++, BDUOL_remo, BDUOL_near, BDUOL_appr, and BDUOL_proj. The plotted curves are averaged over 20 random permutations.

5. Koby Crammer, Jaz S. Kandola, and Yoram Singer. Online classification on a budget. In NIPS.
6. Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65.
7. Jason Weston and Antoine Bordes. Online (and offline) on an even tighter budget. In AISTATS.
8. Ofer Dekel, Shai Shalev-Shwartz, and Yoram Singer. The forgetron: A kernel-based perceptron on a budget. SIAM Journal on Computing, 37(5).
9. Giovanni Cavallanti, Nicolò Cesa-Bianchi, and Claudio Gentile. Tracking the best hyperplane with a simple budget perceptron. Machine Learning, 69(2-3).
10. Francesco Orabona, Joseph Keshet, and Barbara Caputo. Bounded kernel-based online learning. Journal of Machine Learning Research, 10.
11. Vladimir N. Vapnik. Statistical Learning Theory. Wiley.
12. Gert Cauwenberghs and Tomaso Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems 13 (NIPS), 2000.


Duality in Linear Programming Duality in Linear Programming 4 In the preceding chapter on sensitivity analysis, we saw that the shadow-price interpretation of the optimal simplex multipliers is a very useful concept. First, these shadow

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

The Set Covering Machine

The Set Covering Machine Journal of Machine Learning Research 3 (2002) 723-746 Submitted 12/01; Published 12/02 The Set Covering Machine Mario Marchand School of Information Technology and Engineering University of Ottawa Ottawa,

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

Loss Functions for Preference Levels: Regression with Discrete Ordered Labels

Loss Functions for Preference Levels: Regression with Discrete Ordered Labels Loss Functions for Preference Levels: Regression with Discrete Ordered Labels Jason D. M. Rennie Massachusetts Institute of Technology Comp. Sci. and Artificial Intelligence Laboratory Cambridge, MA 9,

More information

Vector and Matrix Norms

Vector and Matrix Norms Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty

More information

1. Classification problems

1. Classification problems Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification

More information

Support Vector Machines for Classification and Regression

Support Vector Machines for Classification and Regression UNIVERSITY OF SOUTHAMPTON Support Vector Machines for Classification and Regression by Steve R. Gunn Technical Report Faculty of Engineering, Science and Mathematics School of Electronics and Computer

More information

4.1 Introduction - Online Learning Model

4.1 Introduction - Online Learning Model Computational Learning Foundations Fall semester, 2010 Lecture 4: November 7, 2010 Lecturer: Yishay Mansour Scribes: Elad Liebman, Yuval Rochman & Allon Wagner 1 4.1 Introduction - Online Learning Model

More information

Integer Factorization using the Quadratic Sieve

Integer Factorization using the Quadratic Sieve Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give

More information

Adapting Codes and Embeddings for Polychotomies

Adapting Codes and Embeddings for Polychotomies Adapting Codes and Embeddings for Polychotomies Gunnar Rätsch, Alexander J. Smola RSISE, CSL, Machine Learning Group The Australian National University Canberra, 2 ACT, Australia Gunnar.Raetsch, Alex.Smola

More information

Support Vector Machine. Tutorial. (and Statistical Learning Theory)

Support Vector Machine. Tutorial. (and Statistical Learning Theory) Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. jasonw@nec-labs.com 1 Support Vector Machines: history SVMs introduced

More information

MONITORING SYSTEMS IN INTENSIVE CARE UNITS USING INCREMENTAL SUPPORT VECTOR MACHINES

MONITORING SYSTEMS IN INTENSIVE CARE UNITS USING INCREMENTAL SUPPORT VECTOR MACHINES MONITORING SYSTEMS IN INTENSIVE CARE UNITS USING INCREMENTAL SUPPORT VECTOR MACHINES Fahmi Ben Rejab, KaoutherNouira and AbdelwahedTrabelsi BESTMOD, Institut Supérieur de Gestion de Tunis, Université de

More information

Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

More information

Predicting Flight Delays

Predicting Flight Delays Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

More information

PRIME FACTORS OF CONSECUTIVE INTEGERS

PRIME FACTORS OF CONSECUTIVE INTEGERS PRIME FACTORS OF CONSECUTIVE INTEGERS MARK BAUER AND MICHAEL A. BENNETT Abstract. This note contains a new algorithm for computing a function f(k) introduced by Erdős to measure the minimal gap size in

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

Early defect identification of semiconductor processes using machine learning

Early defect identification of semiconductor processes using machine learning STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew

More information

GenOpt (R) Generic Optimization Program User Manual Version 3.0.0β1

GenOpt (R) Generic Optimization Program User Manual Version 3.0.0β1 (R) User Manual Environmental Energy Technologies Division Berkeley, CA 94720 http://simulationresearch.lbl.gov Michael Wetter MWetter@lbl.gov February 20, 2009 Notice: This work was supported by the U.S.

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

More information

Maximum Margin Clustering

Maximum Margin Clustering Maximum Margin Clustering Linli Xu James Neufeld Bryce Larson Dale Schuurmans University of Waterloo University of Alberta Abstract We propose a new method for clustering based on finding maximum margin

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

An improved on-line algorithm for scheduling on two unrestrictive parallel batch processing machines

An improved on-line algorithm for scheduling on two unrestrictive parallel batch processing machines This is the Pre-Published Version. An improved on-line algorithm for scheduling on two unrestrictive parallel batch processing machines Q.Q. Nong, T.C.E. Cheng, C.T. Ng Department of Mathematics, Ocean

More information

Jubatus: An Open Source Platform for Distributed Online Machine Learning

Jubatus: An Open Source Platform for Distributed Online Machine Learning Jubatus: An Open Source Platform for Distributed Online Machine Learning Shohei Hido Seiya Tokui Preferred Infrastructure Inc. Tokyo, Japan {hido, tokui}@preferred.jp Satoshi Oda NTT Software Innovation

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information