Direct Convex Relaxations of Sparse SVM
Antoni B. Chan, Nuno Vasconcelos, Gert R. G. Lanckriet
Department of Electrical and Computer Engineering, University of California, San Diego, CA 92037, USA

Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).

Abstract

Although support vector machines (SVMs) for binary classification give rise to a decision rule that only relies on a subset of the training data points (support vectors), it will in general be based on all available features in the input space. We propose two direct, novel convex relaxations of a nonconvex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically-constrained quadratic program (QCQP), while the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft-threshold on the SVM hyperplane, while the SDP formulation learns a weighted inner product (i.e. a kernel) that results in a sparse hyperplane. Experimental results show an increase in sparsity while conserving the generalization performance compared to a standard as well as a linear programming SVM.

1. Introduction

Support vector machines (SVMs) (Vapnik, 1995) address binary classification problems by constructing a maximum-margin hyperplane that separates two classes of data points. Typically, the SVM hyperplane is a function of a relatively small set of training points, i.e., those on or over the margin. Although sparse with respect to data points, the SVM hyperplane is usually not sparse in the original feature space; the decision surface spans all dimensions of the latter, and all features contribute to the decision rule. In many applications this may not be desired. First, if it is known that some of the features are noise, it may be sensible to simply ignore the associated dimensions, improving the generalization ability of the decision rule.
Second, if some features are redundant (e.g. linear combinations of other features), it may be possible to ignore the redundant features without drastically changing the classification performance, and thus reduce the computational (or experimental) requirements of feature extraction. Third, feature selection is often desirable for reasons of interpretability, i.e. there is a need to find which features are important for a given physical process. For example, in biological experiments the features may be gene expression data on DNA microarrays, and identifying important features may lead to a better understanding of the underlying biological process. Finally, if the input space is high dimensional, and obtaining the features is expensive (e.g., the result of costly or time-consuming biological experiments), economical considerations may strongly advocate for a classifier based on a small subset of features.

An example of the improved generalization achievable with a sparse decision rule is presented in Figure 1, which shows a two-dimensional classification problem where the second feature is noise. Twenty feature vectors were randomly drawn from each class y ∈ {−1, +1}, with the first feature containing the signal, x1 ∼ N(3y, 3), and the second feature containing only noise, x2 ∼ N(0, 3), where N(µ, σ²) is a Gaussian of mean µ and variance σ². A standard and a sparse SVM (using an SDP relaxation to be described later) are trained, and the hyperplanes and margins of the two SVMs are shown in the figure. Note that the standard SVM hyperplane is skewed by the noise in the second feature, while the sparse SVM finds a hyperplane that only depends on the first feature. The latter has better generalization properties.

Numerous methods for feature selection have been proposed (Guyon & Elisseeff, 2003; Blum & Langley, 1997). In the area of feature selection in SVMs, previous work falls into two categories: 1) algorithms that
adopt a feature selection strategy disjoint from SVM training, and 2) algorithms that simultaneously learn the optimal subset of features and the SVM classifier.

Figure 1. Example of a two-dimensional classification problem where the second feature is noise. The normal SVM fits the noise in the data, while the sparse SVM is robust and ignores the noise.

Algorithms in the first category rank features based on the parameters of the SVM hyperplane. In (Guyon et al., 2002), recursive feature elimination is combined with linear SVMs. After training an SVM, the feature with the smallest weight in the decision rule is removed. A new SVM is trained on the remaining features, and the process is repeated until the desired number of features remains. This method was later extended in (Rakotomamonjy, 2003) to other ranking criteria, still based on the hyperplane parameters. (Weston et al., 2003) proposes another iterative method, alternating between SVM training and re-scaling of the data according to the feature weights.

A second category of approaches integrates feature selection and classifier training by adjusting the SVM formulation to ensure sparsity of the resulting decision rule. The proposed adjustments indirectly attempt to minimize the l0-norm (i.e., cardinality) of the hyperplane normal, a non-convex and difficult problem. The linear programming SVM (LP-SVM) (Bennett & Mangasarian, 1992; Bradley & Mangasarian, 2000; Zhu et al., 2003) achieves sparsity by minimizing the convex envelope of the l0-norm, i.e., the l1-norm of the normal vector (rather than the l2-norm as in the standard SVM). It has been applied to various problems in computational biology (Grate et al., 2002; Fung & Mangasarian, 2004) and drug design (Bi et al., 2003). Several methods achieve sparsity by augmenting the SVM objective function with a penalty term on the cardinality of the hyperplane normal.
(Bradley & Mangasarian, 1998) proposes to modify the LP-SVM with a penalty term based on an l0-norm approximation. Similarly, (Neumann et al., 2005) proposes two modified SVMs that add penalty terms based on the l1-norm and on an l0-norm approximation. In contrast to adding a penalty term on the cardinality, several methods introduce adaptive scale parameters that multiply the hyperplane normal, and feature selection is achieved by encouraging sparsity of the scale parameters. (Grandvalet & Canu, 2003) proposes to learn the scale parameters simultaneously with the standard SVM problem. In (Weston et al., 2000) the scale parameters and the SVM are learned by minimizing a bound on the leave-one-out error, while (Peleg & Meir, 2004) uses the global minimization of a data-dependent generalization error bound.

In this paper, we study a sparse SVM formulation that is obtained by augmenting the standard SVM formulation with an explicit cardinality constraint on the hyperplane normal. Note that this formulation is in contrast to previous work, which either penalizes an l0-norm approximation of the hyperplane in the objective, or uses adaptive scale parameters. We explore two direct convex relaxations of the sparse SVM formulation. A first relaxation results in a quadratically-constrained quadratic program (QCQP) (Boyd & Vandenberghe, 2004), while the second is based on a semidefinite programming (SDP) relaxation (Lemaréchal & Oustry, 1999). Empirical results show an increase in sparsity, with roughly identical classification performance, compared to both the standard SVM and the LP-SVM.

The remainder of the paper is organized as follows. In Section 2, we briefly review the standard SVM and the LP-SVM. Section 3 presents the sparse SVM and derives the QCQP and SDP convex relaxations. Finally, in Section 4 we present the results of experiments on a synthetic example and on a large set of UCI databases.
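As a concrete illustration, the two-dimensional toy problem of Figure 1 can be sampled in a few lines. This is a sketch under our own assumptions: the second Gaussian parameter is read as a variance (so the standard deviation is √3), and the function name and seed are not from the paper.

```python
import math
import random

def sample_toy_data(n_per_class=20, seed=0):
    """Sample the two-dimensional toy problem: feature 1 carries the
    signal, x1 ~ N(3y, 3); feature 2 is pure noise, x2 ~ N(0, 3).
    The second Gaussian parameter is treated as a variance, so the
    standard deviation passed to the sampler is sqrt(3)."""
    rng = random.Random(seed)
    sigma = math.sqrt(3.0)
    X, y = [], []
    for label in (-1, +1):
        for _ in range(n_per_class):
            X.append((rng.gauss(3.0 * label, sigma),  # signal feature
                      rng.gauss(0.0, sigma)))         # noise feature
            y.append(label)
    return X, y

X, y = sample_toy_data()
```

Only the first coordinate separates the classes; any weight placed on the second coordinate is fitting noise, which is exactly what the sparse SVM avoids.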
2. Standard SVM

Given a set of feature vectors {x_i}_{i=1}^N with x_i ∈ R^d, and corresponding labels {y_i}_{i=1}^N with y_i ∈ {−1, +1}, the standard (linear) SVM learns the hyperplane that separates the two classes of training points with maximal margin, measured in l2-norm. The C-SVM (Vapnik, 1995) formulation introduces slack variables to allow errors for data that may not be linearly separable. The SVM hyperplane is learned by solving a quadratic programming problem:
Problem 1 (C-SVM)

  min_{w,ξ,b}  (1/2)||w||_2^2 + C Σ_{i=1}^N ξ_i
  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
        ξ_i ≥ 0.

The dual of the C-SVM is also a quadratic program.

Problem 2 (C-SVM dual)

  max_α  Σ_{i=1}^N α_i − (1/2) Σ_{i,j=1}^N α_i α_j y_i y_j x_i^T x_j
  s.t.   Σ_{i=1}^N α_i y_i = 0,
         0 ≤ α_i ≤ C,  i = 1, ..., N.

The normal of the maximum-margin hyperplane can be computed as w = Σ_{i=1}^N α_i y_i x_i (Vapnik, 1995). The α_i variables are usually sparse (i.e., many of them are zero), and the hyperplane only depends on a few training points. The hyperplane is, however, generally not sparse in the original feature space; w is usually a dense vector and the decision rule depends on all features. An alternative formulation, which encourages feature selection, is the LP-SVM, introduced in (Bennett & Mangasarian, 1992). It replaces the l2-norm of the C-SVM formulation with the l1-norm.

Problem 3 (LP-SVM)

  min_{w,ξ,b}  ||w||_1 + C Σ_{i=1}^N ξ_i
  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
        ξ_i ≥ 0.

The dual of the LP-SVM is also a linear program.

Problem 4 (LP-SVM dual)

  max_α  Σ_{i=1}^N α_i
  s.t.   ||Σ_{i=1}^N α_i y_i x_i||_∞ ≤ 1,
         Σ_{i=1}^N α_i y_i = 0,
         0 ≤ α_i ≤ C,  i = 1, ..., N.

3. Sparse SVM

The goal of the sparse SVM (SSVM) methods now proposed is to construct a maximum-margin hyperplane based on a limited subset of features in input space. This is to be achieved by computing the hyperplane parameters and the optimal feature subset simultaneously. The sparsity of the hyperplane normal w can be explicitly enforced by adding a cardinality constraint on w:

Problem 5 (Sparse SVM)

  min_{w,ξ,b}  (1/2)||w||_2^2 + C Σ_{i=1}^N ξ_i
  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
        ξ_i ≥ 0,
        Card(w) ≤ r,

where Card(w) is the cardinality of w, i.e., its l0-norm or the number of non-zero entries. This is a difficult optimization problem, since the cardinality constraint is not convex. However, a convex relaxation of the SSVM can be found by replacing the non-convex cardinality constraint by a weaker, non-convex, constraint. Indeed, for all w ∈ R^d, it follows from the Cauchy-Schwarz inequality that

  Card(w) ≤ r  ⇒  ||w||_1 ≤ √r ||w||_2,   (1)

enabling the replacement of the cardinality constraint by ||w||_1 ≤ √r ||w||_2.
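The implication in (1) is easy to check numerically. A minimal sketch, using an arbitrary example vector of our own choosing:

```python
import math

def norms(w):
    """Return (Card(w), ||w||_1, ||w||_2) for a vector given as a list."""
    card = sum(1 for v in w if v != 0.0)
    l1 = sum(abs(v) for v in w)
    l2 = math.sqrt(sum(v * v for v in w))
    return card, l1, l2

# A vector in R^5 with cardinality r = 2.
w = [3.0, 0.0, -4.0, 0.0, 0.0]
card, l1, l2 = norms(w)
# Cauchy-Schwarz: a vector with r non-zero entries has l1-norm at
# most sqrt(r) times its l2-norm (here 7 <= sqrt(2) * 5).
bound_holds = l1 <= math.sqrt(card) * l2
```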
This weaker, non-convex, constraint can now be relaxed in two ways, leading to two convex relaxations of the sparse SVM: a quadratically-constrained quadratic program (QCQP) and a semidefinite program (SDP).

3.1. QCQP Relaxation

If the l2-norm of w is bounded by another variable t, i.e., ||w||_2^2 ≤ t, then the constraint ||w||_1 ≤ √r ||w||_2 can be relaxed to the weaker, convex, constraint ||w||_1^2 ≤ r t. Relaxing the cardinality constraint of Problem 5 in this way gives rise to a quadratically-constrained quadratic programming relaxation of the sparse SVM (QCQP-SSVM):

Problem 6 (QCQP-SSVM)

  min_{w,ξ,b,t}  (1/2)t + C Σ_{i=1}^N ξ_i
  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
        ξ_i ≥ 0,
        ||w||_2^2 ≤ t,
        ||w||_1^2 ≤ r t.

This problem is equivalent to
Problem 7 (QCQP-SSVM)

  min_{w,ξ,b}  (1/2) max( (1/r)||w||_1^2, ||w||_2^2 ) + C Σ_{i=1}^N ξ_i
  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
        ξ_i ≥ 0.

In other words, the QCQP-SSVM is a combination of the C-SVM and the LP-SVM, where the (squared) l1-norm encourages sparsity and the l2-norm encourages a large margin (in l2-norm). The dual of QCQP-SSVM is given by (see (Chan et al., 2007) for a derivation):

Problem 8 (QCQP-SSVM dual)

  max_{α,ν,η,µ}  Σ_{i=1}^N α_i − (1/(2η)) ||Σ_{i=1}^N α_i y_i x_i + ν||_2^2 − (r µ^2)/(2(1 − η))
  s.t.  Σ_{i=1}^N α_i y_i = 0,
        0 ≤ α_i ≤ C,  i = 1, ..., N,
        −µ ≤ ν_j ≤ µ,  j = 1, ..., d,
        0 ≤ η ≤ 1,  0 ≤ µ.

The hyperplane normal w can be recovered from the dual as w = (1/η)(Σ_{i=1}^N α_i y_i x_i + ν). Note that the dual formulation is similar to the C-SVM dual (Problem 2) in that they both contain similar linear and quadratic terms in α_i. However, the QCQP-SSVM dual introduces a d-dimensional vector ν, subject to a box constraint of ±µ, which is added to the SVM hyperplane q = Σ_{i=1}^N α_i y_i x_i. The role of the vector ν is to apply a soft-threshold to the entries of q. Consider the case where we only optimize ν, while holding all the other variables fixed. It can be shown that the optimal ν is

  ν*_j = { −q_j,  |q_j| ≤ µ,
         { −µ,    q_j > µ,          (2)
         { +µ,    q_j < −µ.

Hence, when computing the hyperplane w = (1/η)(q + ν*), the feature weight w_j with corresponding |q_j| ≤ µ will be set to zero, while all other weights will have their magnitudes reduced by µ. This is equivalent to applying a soft-threshold of µ on the hyperplane weights q_j (see Figure 2), and leads to sparse entries in w. In the general case, the magnitude of the soft-threshold µ (and hence the sparsity) is regularized by a quadratic penalty term weighted by the parameter r. In this sense, the QCQP-SSVM dual automatically learns an adaptive soft-threshold on the original SVM hyperplane.

Figure 2. Example of a soft-threshold of µ on hyperplane weight q_j.

There are two interesting modes of operation of the QCQP-SSVM dual with respect to the trade-off parameter η.
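The soft-threshold implied by (2) is simple to implement; a minimal sketch (the function name and the example values of q and µ are our own):

```python
def soft_threshold(q, mu):
    """Apply the soft-threshold implied by the optimal dual vector
    nu* in (2): entries with |q_j| <= mu are zeroed, all other
    entries have their magnitude reduced by mu."""
    out = []
    for qj in q:
        if abs(qj) <= mu:
            out.append(0.0)          # nu*_j = -q_j
        elif qj > mu:
            out.append(qj - mu)      # nu*_j = -mu
        else:
            out.append(qj + mu)      # nu*_j = +mu
    return out

q = [2.5, -0.3, 0.9, -4.0]
w = soft_threshold(q, mu=1.0)   # [1.5, 0.0, 0.0, -3.0]
```

Small entries of q vanish entirely, which is where the sparsity of the QCQP-SSVM hyperplane comes from; the overall scale factor 1/η is omitted since it does not affect which entries are zero.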
The first is when η = 0, which corresponds to the case where the l1-norm constraint is active and the l2-norm constraint is inactive in the primal (Problem 6). Noting that ν = −Σ_{i=1}^N α_i y_i x_i minimizes the quadratic α term, the dual problem reduces to

Problem 9

  max_{α,µ}  Σ_{i=1}^N α_i − (1/2) r µ^2
  s.t.  ||Σ_{i=1}^N α_i y_i x_i||_∞ ≤ µ,
        Σ_{i=1}^N α_i y_i = 0,
        0 ≤ α_i ≤ C,  i = 1, ..., N.

Problem 9 is similar to the dual of the LP-SVM (Problem 4), except for two additions: 1) the variable µ, which sets the box constraint on Σ_i α_i y_i x_i (for the LP-SVM, this is set to µ = 1); and 2) a quadratic penalty, weighted by r, that regularizes µ, i.e. keeps the box constraint from becoming too large. This can be interpreted as an extension of the LP-SVM dual where the box constraint is automatically learned. The second interesting mode is when η = 1, which corresponds to the case where the l2-norm constraint is active and the l1-norm constraint is inactive in the primal (Problem 6). In this case, µ = 0 and ν = 0, and the QCQP-SSVM dual simplifies to the standard SVM dual (Problem 2).

3.2. SDP Relaxation

A semidefinite programming relaxation (Lemaréchal & Oustry, 1999) of the sparse SVM is obtained by first rewriting the weak, non-convex, cardinality constraint (1) using the matrix W = ww^T:

  ||w||_1 ≤ √r ||w||_2  ⟺  1^T |W| 1 ≤ r tr(W),  W = ww^T,   (3)
where |W| is the element-wise absolute value of the matrix W. Replacing the cardinality constraint of Problem 5 with the non-convex constraint in (3) yields

Problem 10

  min_{W,w,b,ξ}  (1/2)tr(W) + C Σ_{i=1}^N ξ_i
  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
        ξ_i ≥ 0,
        1^T |W| 1 ≤ r tr(W),
        W = ww^T,

which is still non-convex because of the quadratic equality constraint, W = ww^T. Since we are minimizing tr(W), the quadratic equality constraint is equivalent to (Lemaréchal & Oustry, 1999)

  W = ww^T  ⟺  W − ww^T ⪰ 0,  rank(W) = 1.   (4)

Finally, relaxing the constraint (4) by simply dropping the rank constraint leads to a convex relaxation of the sparse SVM as a semidefinite program (SDP-SSVM):

Problem 11 (SDP-SSVM)

  min_{W,w,b,ξ}  (1/2)tr(W) + C Σ_{i=1}^N ξ_i
  s.t.  y_i(w^T x_i + b) ≥ 1 − ξ_i,  i = 1, ..., N,
        ξ_i ≥ 0,
        1^T |W| 1 ≤ r tr(W),
        W − ww^T ⪰ 0.

The dual of the SDP-SSVM is also an SDP (again see (Chan et al., 2007) for the derivation):

Problem 12 (SDP-SSVM dual)

  max_{α,µ,ν}  Σ_{i=1}^N α_i − (1/2) Σ_{i,j=1}^N α_i α_j y_i y_j x_i^T Λ^{-1} x_j
  s.t.  Σ_{i=1}^N α_i y_i = 0,
        0 ≤ α_i ≤ C,  i = 1, ..., N,
        Λ = (1 − µr)I + ν ⪰ 0,
        µ ≥ 0,
        −µ ≤ ν_{j,k} ≤ µ,  j, k = 1, ..., d.

The hyperplane w is computed from the optimal dual variables as w = Λ^{-1} Σ_{i=1}^N α_i y_i x_i. The SDP-SSVM dual is similar to the SVM dual (Problem 2), but in the SDP dual the inner product between x_i and x_j is replaced by a more general, weighted inner product ⟨x_i, x_j⟩_Λ = x_i^T Λ^{-1} x_j. The weighting matrix Λ is regularized by µ and r, which control the amount of rotation and scaling. Ideally, setting Λ^{-1} = 0 would minimize the quadratic term in the objective function. However, this is not possible since the entries of Λ are bounded by µ. Instead, the optimal weighting matrix is one that increases the influence of the relevant features (i.e. scales up those features), while demoting the less relevant features. Hence, the SDP-SSVM learns a weighting on the inner product (i.e., learns a kernel) such that the separating hyperplane in the feature space is sparse.
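Two facts used in the SDP relaxation can be checked numerically: for rank-one W = ww^T, tr(W) = ||w||_2^2 and 1^T|W|1 = ||w||_1^2, so constraint (3) is exactly the square of the Cauchy-Schwarz bound (1); and down-weighting a feature through Λ^{-1} shrinks its contribution to the weighted inner product. A minimal sketch, assuming a diagonal Λ for the inner-product part (the SDP itself learns a general positive semidefinite Λ):

```python
def lift(w):
    """Form W = w w^T and the two scalars appearing in constraint (3)."""
    d = len(w)
    W = [[w[i] * w[j] for j in range(d)] for i in range(d)]
    trace = sum(W[i][i] for i in range(d))           # = ||w||_2^2
    abs_sum = sum(abs(e) for row in W for e in row)  # = 1^T |W| 1 = ||w||_1^2
    return W, trace, abs_sum

def weighted_inner_product(x, z, lam_diag):
    """<x, z>_Lambda = x^T Lambda^{-1} z for a diagonal Lambda
    (an illustrative simplification)."""
    return sum(xj * zj / lj for xj, zj, lj in zip(x, z, lam_diag))

w = [3.0, 0.0, -4.0]
_, tr_W, abs_W = lift(w)   # tr(W) = 25 = ||w||_2^2; 1^T|W|1 = 49 = ||w||_1^2

x, z = [1.0, 2.0], [3.0, 4.0]
plain = weighted_inner_product(x, z, [1.0, 1.0])     # ordinary x^T z = 11
damped = weighted_inner_product(x, z, [1.0, 100.0])  # second feature demoted
```

A large Λ entry for an irrelevant feature makes that feature nearly invisible to the kernel, which is how the learned weighting produces a sparse hyperplane.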
4. Experiments

We compare the performance of the two proposed sparse SVMs, QCQP-SSVM and SDP-SSVM, with the standard C-SVM and LP-SVM, on a synthetic problem and on fifteen UCI data sets.

4.1. Data Sets

The synthetic problem is a binary classification problem, similar to (Weston et al., 2000), where only the first six dimensions of the feature space are relevant for classification. With probability 0.7, the first three features {x_1, x_2, x_3} are drawn as x_i ∼ y N(i, 1) and the second triplet {x_4, x_5, x_6} as x_i ∼ N(0, 1). Otherwise, the first three features are drawn as x_i ∼ N(0, 1), and the second triplet as x_i ∼ y N(i − 3, 1). The remaining features are noise, x_i ∼ N(0, 20) for i = 7, ..., 30. One thousand data points are sampled, with equal probability for each class y ∈ {−1, +1}.

The remaining experiments were performed on the fifteen UCI data sets listed in Table 2. Most of these are straightforward binary or multi-class classification problems. For the Wisconsin prognostic breast cancer data set (wpbc), two classes were formed by selecting examples with recurrence before 24 months, and examples with non-recurrence after 24 months (wpbc 24). The data set was also split using 60-month recurrence (wpbc 60). The Cleveland heart-disease data set was split into two classes based on the disease level. All data sets are available from (Newman et al., 1998), and the brown yeast data set is available from (Weston et al., 2003).

4.2. Experimental Setup

For each data set, each dimension of the data is normalized to zero mean and unit variance. The data is split randomly, with 80% of the data used for training and cross-validation, and 20% held out for testing. Two standard SVMs, the C-SVM (Problem 1) and the LP-SVM (Problem 3), and two sparse SVMs,
the QCQP-SSVM (Problem 6) and the SDP-SSVM (Problem 11), are learned from the training data.

Figure 3. Results on the toy experiment: (left) test error and (right) cardinality of the SVM hyperplane versus the number of training examples.

Table 1. Test error and sparsity results averaged over the 15 UCI data sets. Average change in error rate w.r.t. C-SVM: C-SVM 0.000%, LP-SVM 0.337%, QCQP-SSVM 0.09%, SDP-SSVM 0.449%; average sparsity (cardinality / dimension) and average sparsity w.r.t. LP-SVM: values not recoverable.

The sparsity parameters are set to r_qcqp = 0.0 for the QCQP-SSVM, and r_sdp = for the SDP-SSVM. Although the value of r could be selected using cross-validation, in these experiments we select a low value of r to maximize the sparsity of the SSVM. For each SVM, the optimal C parameter is selected using 5-fold cross-validation (80% training, 20% validation) over the range C = {2^−10, 2^−9, ..., 2^10}. The C value yielding the best average accuracy on the validation sets is used to train the SVM on all training data. Finally, the cardinality of the hyperplane w is computed as the number of weights w_j with large relative magnitude, i.e. the number of weights with |w_j| / max_i |w_i| ≥ 10^−4. The weights that do not fit this criterion are set to zero. Each SVM is tested on the held-out data, and the final results reported are averaged over repeated trials with different random splits of the data. For the three multi-class data sets (wine, image, and brown yeast), a one-vs-all classification problem is generated for each of the classes. The reported results are averaged over all classes. Because of their large size, the image and spambase data sets are split with 50% for training and 50% for testing. All SVMs are trained using standard convex optimization toolboxes. In particular, SeDuMi (Sturm, 1999) or CSDP (Borchers, 1999) is used to solve the SDP-SSVM, and MOSEK for the other SVMs.
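The cardinality criterion described above can be sketched as follows; the use of max_i |w_i| as the normalizer and the function name are our own reading, not taken verbatim from the paper.

```python
def effective_cardinality(w, tol=1e-4):
    """Count hyperplane weights with large *relative* magnitude: a
    weight counts as non-zero when |w_j| / max_i |w_i| >= tol
    (the normalizer max_i |w_i| is an assumption)."""
    m = max(abs(v) for v in w)
    if m == 0.0:
        return 0
    return sum(1 for v in w if abs(v) / m >= tol)

w = [0.8, 1e-9, -0.05, 0.0]
card = effective_cardinality(w)   # 2: only 0.8 and -0.05 survive
```

Using a relative rather than absolute tolerance makes the count invariant to an overall rescaling of the hyperplane, which matters because the different SVM formulations normalize w differently.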
Investigation of the trade-off between sparsity and accuracy using r is also interesting, but is not presented here.

4.3. Results

Figure 3 (left) shows the test error on the synthetic data set versus the number of examples used to train each of the four SVMs. The results suggest that the accuracy of all SVMs depends on the latter. When trained with sufficiently many examples, the test errors of LP-SVM, QCQP-SSVM, and SDP-SSVM are similar and consistently smaller than the test error of C-SVM. On the other hand, the performance of the sparse SVMs is worse than C-SVM and LP-SVM when trained with too few examples. Figure 3 (right) shows the cardinality of the classifiers. C-SVM has full cardinality (30), while the other three use far fewer features: the QCQP-SSVM selects slightly fewer features (5.9) than the LP-SVM, while the SDP-SSVM selects the lowest number of features.

The overall experimental results on the 15 UCI data sets are shown in Table 1. In these experiments, the test errors obtained with the C-SVM and the sparse SVMs are roughly identical, e.g. on average the QCQP-SSVM error rate only increases by 0.09% over the C-SVM error rate. However, while C-SVM typically uses all the dimensions of the feature space, LP-SVM, QCQP-SSVM, and SDP-SSVM use much fewer. The QCQP-SSVM used the fewest features. In contrast, the SDP-SSVM and LP-SVM had an average sparsity (i.e. the ratio between the cardinality and the dimension of the feature space) of 0.6 and 0.658, respectively. Table 2 shows the cardinality and the change in test
error for each of the UCI data sets, and Figure 4 plots the sparsities.

Table 2. Results on the 15 UCI data sets: d is the dimension of the feature space, ||w||_0 is the average cardinality of the SVM hyperplane, err is the average test error, and Δerr is the average change in test error with respect to the C-SVM test error. The lowest cardinality and best test errors among the LP-SVM, QCQP-SSVM, and SDP-SSVM are highlighted. [Rows: 1. Pima Indians Diabetes, 2. Breast Cancer (Wisc.), 3. Wine, 4. Heart Disease (Cleve.), 5. Image Segm., 6. SPECT, 7. Breast Cancer (wdbc), 8. Breast Cancer (wpbc 24), 9. Breast Cancer (wpbc 60), 10. Ionosphere, 11. SPECTF, 12. spambase, 13. sonar, 14. brown yeast, 15. musk; numeric entries not recoverable.]

Note that the SSVMs have the best sparsity for all of the data sets, with the improvement tending to increase with the dimensionality. In some data sets, the test error drops significantly when using the SSVM (e.g. ionosphere, 4.30%), which indicates that some features in these data sets are noise (similar to the toy experiment). In others, the error remains the same while the sparsity increases (e.g. brown yeast), which suggests that these data sets contain redundant features. In both cases, the SSVM ignores the noise or redundant features and achieves a sparse solution. Finally, there are some cases where the SSVM is too aggressive and the added sparsity introduces a slight increase in error (e.g. sonar). This is perhaps an indication that the data set is not sparse, although the SSVM still finds a sparse classifier.

5. Conclusions and Future Work

In this paper, we have formulated the sparse SVM as a standard SVM with an explicit cardinality constraint on the weight vector.
Relaxing the cardinality constraint yields two convex optimization problems, a QCQP and an SDP, that approximately solve the original sparse SVM formulation. An interpretation of the QCQP formulation is that it applies an adaptive soft-threshold on the hyperplane weights to achieve sparsity. On the other hand, the SDP formulation learns an inner-product weighting (i.e. a kernel) that results in a sparse hyperplane. Experimental results on fifteen UCI data sets show that both sparse SVMs achieve a test error similar to the standard C-SVM, while using fewer features than both C-SVM and LP-SVM.

Figure 4. UCI results: sparsity of the SVM hyperplane.

One interesting property of SDP-SSVM, which is missing for LP-SVM, is that its dual problem depends on an inner product (x_i^T Λ^{-1} x_j), which suggests the possibility of kernelizing SDP-SSVM. In this case, the sparsity of the weight vector in the high-dimensional feature space (induced by the kernel) may lead to better generalization of the classifier. The main obstacle, however, is that the weighting matrix Λ lives in the feature space. Hence, further study is needed on the properties of this matrix and whether it can be computed using the kernel. Finally, the implementation of SDP-SSVM using the
off-the-shelf SDP optimizers (SeDuMi and CSDP) is quite slow for high-dimensional data. Future work will be directed at developing a customized solver that will make the SDP-SSVM amenable to larger and higher-dimensional data sets.

Acknowledgments

The authors thank the anonymous reviewers for insightful comments, and Sameer Agarwal for helpful discussions. This work was partially supported by NSF award IIS, NSF grant DMS-MSPA, and NSF IGERT award DGE.

References

Bennett, K. P., & Mangasarian, O. L. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw., 1.

Bi, J., Bennett, K. P., Embrechts, M., Breneman, C. M., & Song, M. (2003). Dimensionality reduction via sparse support vector machines. J. Mach. Learn. Res., 3.

Blum, A. L., & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97.

Borchers, B. (1999). CSDP, a C library for semidefinite programming. Optim. Methods Softw., 11.

Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

Bradley, P. S., & Mangasarian, O. L. (1998). Feature selection via concave minimization and support vector machines. Intl. Conf. on Machine Learning.

Bradley, P. S., & Mangasarian, O. L. (2000). Massive data discrimination via linear support vector machines. Optim. Methods Softw., 13(1), 1–10.

Chan, A. B., Vasconcelos, N., & Lanckriet, G. R. G. (2007). Duals of the QCQP and SDP sparse SVM (Technical Report SVCL-TR). University of California, San Diego.

Fung, G. M., & Mangasarian, O. L. (2004). A feature selection Newton method for support vector machine classification. Computational Optimization and Applications, 28.

Grandvalet, Y., & Canu, S. (2003). Adaptive scaling for feature selection in SVMs. Neural Information Processing Systems.

Grate, L. R., Bhattacharyya, C., Jordan, M. I., & Mian, I. S. (2002). Simultaneous relevant feature identification and classification in high-dimensional spaces. Lecture Notes in Computer Science.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. J. Mach. Learn. Res., 3.

Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Mach. Learn., 46.

Lemaréchal, C., & Oustry, F. (1999). Semidefinite relaxations and Lagrangian duality with application to combinatorial optimization (Technical Report 3710). INRIA.

MOSEK (2006). MOSEK optimization software (Technical Report).

Neumann, J., Schnörr, C., & Steidl, G. (2005). Combined SVM-based feature selection and classification. Mach. Learn., 61.

Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases (Technical Report).

Peleg, D., & Meir, R. (2004). A feature selection algorithm based on the global minimization of a generalization error bound. Neural Information Processing Systems.

Rakotomamonjy, A. (2003). Variable selection using SVM-based criteria. J. Mach. Learn. Res., 3.

Sturm, J. F. (1999). Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw., 11–12.

Vapnik, V. N. (1995). The nature of statistical learning theory. Springer-Verlag.

Weston, J., Elisseeff, A., Scholkopf, B., & Tipping, M. (2003). Use of the zero-norm with linear models and kernel methods. J. Mach. Learn. Res., 3.

Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for SVMs. Neural Information Processing Systems.

Zhu, J., Rosset, S., Hastie, T., & Tibshirani, R. (2003). 1-norm support vector machines. Neural Information Processing Systems.
Multiple Kernel Learning on the Limit Order Book
JMLR: Workshop and Conference Proceedings 11 (2010) 167 174 Workshop on Applications of Pattern Analysis Multiple Kernel Learning on the Limit Order Book Tristan Fletcher Zakria Hussain John Shawe-Taylor
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
Least-Squares Intersection of Lines
Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
RSVM: Reduced Support Vector Machines
page 1 RSVM: Reduced Support Vector Machines Yuh-Jye Lee and Olvi L. Mangasarian 1 Introduction Abstract An algorithm is proposed which generates a nonlinear kernel-based separating surface that requires
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
Lecture 2: August 29. Linear Programming (part I)
10-725: Convex Optimization Fall 2013 Lecture 2: August 29 Lecturer: Barnabás Póczos Scribes: Samrachana Adhikari, Mattia Ciollaro, Fabrizio Lecci Note: LaTeX template courtesy of UC Berkeley EECS dept.
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
Supporting Online Material for
www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence
Author Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
A Hybrid Forecasting Methodology using Feature Selection and Support Vector Regression
A Hybrid Forecasting Methodology using Feature Selection and Support Vector Regression José Guajardo, Jaime Miranda, and Richard Weber, Department of Industrial Engineering, University of Chile Abstract
Semi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
Towards better accuracy for Spam predictions
Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial
An Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia [email protected] Tata Institute, Pune,
Solving polynomial least squares problems via semidefinite programming relaxations
Solving polynomial least squares problems via semidefinite programming relaxations Sunyoung Kim and Masakazu Kojima August 2007, revised in November, 2007 Abstract. A polynomial optimization problem whose
Supervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
SUPPORT VECTOR MACHINE (SVM) is the optimal
130 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008 Multiclass Posterior Probability Support Vector Machines Mehmet Gönen, Ayşe Gönül Tanuğur, and Ethem Alpaydın, Senior Member, IEEE
Early defect identification of semiconductor processes using machine learning
STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
Proximal mapping via network optimization
L. Vandenberghe EE236C (Spring 23-4) Proximal mapping via network optimization minimum cut and maximum flow problems parametric minimum cut problem application to proximal mapping Introduction this lecture:
Benchmarking Open-Source Tree Learners in R/RWeka
Benchmarking Open-Source Tree Learners in R/RWeka Michael Schauerhuber 1, Achim Zeileis 1, David Meyer 2, Kurt Hornik 1 Department of Statistics and Mathematics 1 Institute for Management Information Systems
Regularized Logistic Regression for Mind Reading with Parallel Validation
Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland
DUOL: A Double Updating Approach for Online Learning
: A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 [email protected] Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University
Machine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.
1. Introduction Linear Programming for Optimization Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1.1 Definition Linear programming is the name of a branch of applied mathematics that
On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers
On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers Francis R. Bach Computer Science Division University of California Berkeley, CA 9472 [email protected] Abstract
Component Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
A fast multi-class SVM learning method for huge databases
www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Epipolar Geometry. Readings: See Sections 10.1 and 15.6 of Forsyth and Ponce. Right Image. Left Image. e(p ) Epipolar Lines. e(q ) q R.
Epipolar Geometry We consider two perspective images of a scene as taken from a stereo pair of cameras (or equivalently, assume the scene is rigid and imaged with a single camera from two different locations).
Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach
MASTER S THESIS Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach PAULINE ALDENVIK MIRJAM SCHIERSCHER Department of Mathematical
Principal components analysis
CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,
Multiclass Classification. 9.520 Class 06, 25 Feb 2008 Ryan Rifkin
Multiclass Classification 9.520 Class 06, 25 Feb 2008 Ryan Rifkin It is a tale Told by an idiot, full of sound and fury, Signifying nothing. Macbeth, Act V, Scene V What Is Multiclass Classification? Each
Infinite Kernel Learning
Max Planck Institut für biologische Kybernetik Max Planck Institute for Biological Cybernetics Technical Report No. TR-78 Infinite Kernel Learning Peter Vincent Gehler and Sebastian Nowozin October 008
L13: cross-validation
Resampling methods Cross validation Bootstrap L13: cross-validation Bias and variance estimation with the Bootstrap Three-way data partitioning CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU
Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
Support Vector Machines for Classification and Regression
UNIVERSITY OF SOUTHAMPTON Support Vector Machines for Classification and Regression by Steve R. Gunn Technical Report Faculty of Engineering, Science and Mathematics School of Electronics and Computer
Making Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research
A QCQP Approach to Triangulation. Chris Aholt, Sameer Agarwal, and Rekha Thomas University of Washington 2 Google, Inc.
A QCQP Approach to Triangulation 1 Chris Aholt, Sameer Agarwal, and Rekha Thomas 1 University of Washington 2 Google, Inc. 2 1 The Triangulation Problem X Given: -n camera matrices P i R 3 4 -n noisy observations
CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott
Nonlinear Programming Methods.S2 Quadratic Programming
Nonlinear Programming Methods.S2 Quadratic Programming Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard A linearly constrained optimization problem with a quadratic objective
Large Margin DAGs for Multiclass Classification
S.A. Solla, T.K. Leen and K.-R. Müller (eds.), 57 55, MIT Press (000) Large Margin DAGs for Multiclass Classification John C. Platt Microsoft Research Microsoft Way Redmond, WA 9805 [email protected]
Machine Learning in FX Carry Basket Prediction
Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines
Visualization of large data sets using MDS combined with LVQ.
Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode
A Novel Feature Selection Method Based on an Integrated Data Envelopment Analysis and Entropy Mode Seyed Mojtaba Hosseini Bamakan, Peyman Gholami RESEARCH CENTRE OF FICTITIOUS ECONOMY & DATA SCIENCE UNIVERSITY
Model Trees for Classification of Hybrid Data Types
Model Trees for Classification of Hybrid Data Types Hsing-Kuo Pao, Shou-Chih Chang, and Yuh-Jye Lee Dept. of Computer Science & Information Engineering, National Taiwan University of Science & Technology,
17. Inner product spaces Definition 17.1. Let V be a real vector space. An inner product on V is a function
17. Inner product spaces Definition 17.1. Let V be a real vector space. An inner product on V is a function, : V V R, which is symmetric, that is u, v = v, u. bilinear, that is linear (in both factors):
Lecture 11: 0-1 Quadratic Program and Lower Bounds
Lecture : - Quadratic Program and Lower Bounds (3 units) Outline Problem formulations Reformulation: Linearization & continuous relaxation Branch & Bound Method framework Simple bounds, LP bound and semidefinite
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
A Feature Selection Methodology for Steganalysis
A Feature Selection Methodology for Steganalysis Yoan Miche 1, Benoit Roue 2, Amaury Lendasse 1, Patrick Bas 12 1 Laboratory of Computer and Information Science Helsinki University of Technology P.O. Box
USING SUPPORT VECTOR MACHINES FOR LANE-CHANGE DETECTION
USING SUPPORT VECTOR MACHINES FOR LANE-CHANGE DETECTION Hiren M. Mandalia Drexel University Philadelphia, PA Dario D. Salvucci Drexel University Philadelphia, PA Driving is a complex task that requires
Predicting Flight Delays
Predicting Flight Delays Dieterich Lawson [email protected] William Castillo [email protected] Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing
Music Mood Classification
Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may
Loss Functions for Preference Levels: Regression with Discrete Ordered Labels
Loss Functions for Preference Levels: Regression with Discrete Ordered Labels Jason D. M. Rennie Massachusetts Institute of Technology Comp. Sci. and Artificial Intelligence Laboratory Cambridge, MA 9,
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.
Summer course on Convex Optimization Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.Minnesota Interior-Point Methods: the rebirth of an old idea Suppose that f is
CHAPTER 9. Integer Programming
CHAPTER 9 Integer Programming An integer linear program (ILP) is, by definition, a linear program with the additional constraint that all variables take integer values: (9.1) max c T x s t Ax b and x integral
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Sub-class Error-Correcting Output Codes
Sub-class Error-Correcting Output Codes Sergio Escalera, Oriol Pujol and Petia Radeva Computer Vision Center, Campus UAB, Edifici O, 08193, Bellaterra, Spain. Dept. Matemàtica Aplicada i Anàlisi, Universitat
Tree based ensemble models regularization by convex optimization
Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B-4000
Mathematical Models of Supervised Learning and their Application to Medical Diagnosis
Genomic, Proteomic and Transcriptomic Lab High Performance Computing and Networking Institute National Research Council, Italy Mathematical Models of Supervised Learning and their Application to Medical
Moving Least Squares Approximation
Chapter 7 Moving Least Squares Approimation An alternative to radial basis function interpolation and approimation is the so-called moving least squares method. As we will see below, in this method the
Identification algorithms for hybrid systems
Identification algorithms for hybrid systems Giancarlo Ferrari-Trecate Modeling paradigms Chemistry White box Thermodynamics System Mechanics... Drawbacks: Parameter values of components must be known
Numerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 6. Eigenvalues and Singular Values In this section, we collect together the basic facts about eigenvalues and eigenvectors. From a geometrical viewpoint,
Subspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China [email protected] Stan Z. Li Microsoft
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
Conic optimization: examples and software
Conic optimization: examples and software Etienne de Klerk Tilburg University, The Netherlands Etienne de Klerk (Tilburg University) Conic optimization: examples and software 1 / 16 Outline Conic optimization
Support Vector Machine. Tutorial. (and Statistical Learning Theory)
Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. [email protected] 1 Support Vector Machines: history SVMs introduced
Advice Refinement in Knowledge Based SVMs
Advice Refinement in Knowledge-Based SVMs Gautam Kunapuli Univ. of Wisconsin-Madison 1300 University Avenue Madison, WI 53705 [email protected] Richard Maclin Univ. of Minnesota, Duluth 1114 Kirby Drive
KERNEL LOGISTIC REGRESSION-LINEAR FOR LEUKEMIA CLASSIFICATION USING HIGH DIMENSIONAL DATA
Rahayu, Kernel Logistic Regression-Linear for Leukemia Classification using High Dimensional Data KERNEL LOGISTIC REGRESSION-LINEAR FOR LEUKEMIA CLASSIFICATION USING HIGH DIMENSIONAL DATA S.P. Rahayu 1,2
