RSVM: Reduced Support Vector Machines
Yuh-Jye Lee and Olvi L. Mangasarian
Computer Sciences Department, University of Wisconsin, Madison, WI

Abstract

An algorithm is proposed which generates a nonlinear kernel-based separating surface that requires as little as 1% of a large dataset for its explicit evaluation. To generate this nonlinear surface, the entire dataset is used as a constraint in an optimization problem with very few variables corresponding to the 1% of the data kept. The remainder of the data can be thrown away after solving the optimization problem. This is achieved by making use of a rectangular m × m̄ kernel K(A, Ā') that greatly reduces the size of the quadratic program to be solved and simplifies the characterization of the nonlinear separating surface. Here, the m rows of A represent the original m data points, while the m̄ rows of Ā represent a greatly reduced m̄ data points. Computational results indicate that test set correctness for the reduced support vector machine (RSVM), with a nonlinear separating surface that depends on a small randomly selected portion of the dataset, is better than that of a conventional support vector machine (SVM) with a nonlinear surface that explicitly depends on the entire dataset, and much better than a conventional SVM using a small random sample of the data. Computational times, as well as memory usage, are much smaller for RSVM than for a conventional SVM using the entire dataset.

1 Introduction

Support vector machines have come to play a very dominant role in data classification using a kernel-based linear or nonlinear classifier [23, 6, 21, 22]. Two major problems that confront large data classification by a nonlinear kernel are:

1. The sheer size of the mathematical programming problem that needs to be solved, and the time it takes to solve, even for moderately sized datasets.

2. The dependence of the nonlinear separating surface on the entire dataset, which creates unwieldy storage problems that prevent the use of nonlinear kernels for anything but small datasets.

For example, even for a thousand-point dataset, one is confronted by a fully dense quadratic program with 1001 variables and 1000 constraints, resulting in a constraint matrix with over a million entries. In contrast, our proposed approach would typically reduce the problem to one with 101 variables and 1000 constraints, which is readily solved by a smoothing technique [10] as an unconstrained 101-dimensional minimization problem. This generates a nonlinear separating surface which depends on a hundred data points only, instead of the conventional nonlinear kernel surface which would depend on the entire 1000 points. In [24], an approximate kernel has been proposed which is based on an eigenvalue decomposition of a randomly selected subset of the training set; however, unlike our approach, the entire kernel matrix is generated within an iterative linear equation solution procedure. We note that our data-reduction approach should work equally well for 1-norm based support vector machines [1], chunking methods [2], as well as Platt's sequential minimal optimization (SMO) [19].

We briefly outline the contents of the paper now. In Section 2 we describe kernel-based classification for linear and nonlinear kernels. In Section 3 we outline our reduced SVM approach. Section 4 gives computational and graphical results that show the effectiveness and power of RSVM. Section 5 concludes the paper.

A word about our notation and background material. All vectors will be column vectors unless transposed to a row vector by a prime '. For a vector x in the n-dimensional real space R^n, the plus function x_+ is defined as (x_+)_i = max{0, x_i}, while the step function x_∗ is defined as (x_∗)_i = 1 if x_i > 0 and (x_∗)_i = 0 otherwise, i = 1, ..., n. The scalar (inner) product of two vectors x and y in the n-dimensional real space R^n will be denoted by x'y, and the p-norm of x will be denoted by ‖x‖_p. For a matrix A ∈ R^{m×n}, A_i is the ith row of A, which is a row vector in R^n. A column vector of ones of arbitrary dimension will be denoted by e. For A ∈ R^{m×n} and B ∈ R^{n×l}, the kernel K(A, B) maps R^{m×n} × R^{n×l} into R^{m×l}. In particular, if x and y are column vectors in R^n, then K(x', y) is a real number, K(x', A') is a row vector in R^m and K(A, A') is an m × m matrix. The base of the natural logarithm will be denoted by ε.

2 Linear and Nonlinear Kernel Classification

We consider the problem of classifying m points in the n-dimensional real space R^n, represented by the m × n matrix A, according to membership of each point A_i in the class +1 or −1 as specified by a given m × m diagonal matrix D with ones or minus ones along its diagonal. For this problem, the standard support vector machine with a linear kernel AA' [23, 6] is given by the following quadratic program
for some ν > 0:

    min_{(w,γ,y) ∈ R^{n+1+m}}  ν e'y + (1/2) w'w
    s.t.  D(Aw − eγ) + y ≥ e,  y ≥ 0.                                    (1)

As depicted in Figure 1, w is the normal to the bounding planes

    x'w = γ + 1,
    x'w = γ − 1,                                                         (2)

and γ determines their location relative to the origin. The first plane above bounds the class +1 points and the second plane bounds the class −1 points when the two classes are strictly linearly separable, that is, when the slack variable y = 0. The linear separating surface is the plane

    x'w = γ,                                                             (3)

midway between the bounding planes (2). If the classes are linearly inseparable, then the two planes bound the two classes with a soft margin determined by a nonnegative slack variable y, that is:

    x'w − γ + y_i ≥ +1,  for x' = A_i and D_ii = +1,
    x'w − γ − y_i ≤ −1,  for x' = A_i and D_ii = −1.                     (4)

The 1-norm of the slack variable y is minimized with weight ν in (1). The quadratic term in (1), which is twice the reciprocal of the square of the 2-norm distance 2/‖w‖_2 between the two bounding planes of (2) in the n-dimensional space of w ∈ R^n for a fixed γ, maximizes that distance, often called the margin. Figure 1 depicts the points represented by A, the bounding planes (2) with margin 2/‖w‖_2, and the separating plane (3) which separates A+, the points represented by rows of A with D_ii = +1, from A−, the points represented by rows of A with D_ii = −1.

[Figure 1. The bounding planes (2) with margin 2/‖w‖_2, and the plane (3) separating A+, the points represented by rows of A with D_ii = +1, from A−, the points represented by rows of A with D_ii = −1.]

In our smooth approach, the square of the 2-norm of the slack variable y is minimized with weight ν/2 instead of the 1-norm of y as in (1). In addition, the distance between the planes (2) is measured in the (n+1)-dimensional space of (w, γ) ∈ R^{n+1}, that is, 2/‖(w, γ)‖_2. Measuring the margin in this (n+1)-dimensional space instead of R^n induces strong convexity and has little or no effect on the problem, as was shown in [14]. Thus, using twice the reciprocal squared of the margin instead yields our modified SVM problem:

    min_{(w,γ,y) ∈ R^{n+1+m}}  (ν/2) y'y + (1/2)(w'w + γ^2)
    s.t.  D(Aw − eγ) + y ≥ e,  y ≥ 0.                                    (5)

It was shown computationally in [15] that this reformulation (5) of the conventional support vector machine formulation (1) yields results similar to (1). At a solution of problem (5), y is given by

    y = (e − D(Aw − eγ))_+,                                              (6)

where, as defined in the Introduction, (·)_+ replaces negative components of a vector by zeros. Thus, we can replace y in (5) by (e − D(Aw − eγ))_+ and convert the SVM problem (5) into an equivalent SVM which is an unconstrained optimization problem:

    min_{(w,γ) ∈ R^{n+1}}  (ν/2) ‖(e − D(Aw − eγ))_+‖_2^2 + (1/2)(w'w + γ^2).   (7)

This problem is a strongly convex minimization problem without any constraints, and it is easy to show that it has a unique solution. However, the objective function in (7) is not twice differentiable, which precludes the use of a fast Newton method. In [10] we smoothed this problem and applied a fast Newton method to solve it, as well as the nonlinear kernel problem which we describe now.

We first describe how the generalized support vector machine (GSVM) [12] generates a nonlinear separating surface by using a completely arbitrary kernel. The GSVM solves the following mathematical program for a general kernel K(A, A'):

    min_{(u,γ,y) ∈ R^{2m+1}}  ν e'y + f(u)
    s.t.  D(K(A, A')Du − eγ) + y ≥ e,  y ≥ 0.                            (8)

Here f(u) is some convex function on R^m which suppresses the parameter u, and ν is some positive number that weights the classification error e'y versus the suppression
of u. A solution of this mathematical program for u and γ leads to the nonlinear separating surface

    K(x', A')Du = γ.                                                     (9)

The linear formulation (1) of Section 2 is obtained if we let K(A, A') = AA', w = A'Du and f(u) = (1/2) u'DAA'Du. We now use a different classification objective which not only suppresses the parameter u but also suppresses γ in our nonlinear formulation:

    min_{(u,γ,y) ∈ R^{2m+1}}  (ν/2) y'y + (1/2)(u'u + γ^2)
    s.t.  D(K(A, A')Du − eγ) + y ≥ e,  y ≥ 0.                            (10)

At a solution of (10), y is given by

    y = (e − D(K(A, A')Du − eγ))_+,                                      (11)

where, as defined earlier, (·)_+ replaces negative components of a vector by zeros. Thus, we can replace y in (10) by (e − D(K(A, A')Du − eγ))_+ and convert the SVM problem (10) into an equivalent SVM which is an unconstrained optimization problem:

    min_{(u,γ) ∈ R^{m+1}}  (ν/2) ‖(e − D(K(A, A')Du − eγ))_+‖_2^2 + (1/2)(u'u + γ^2).   (12)

Again, as in (7), this problem is a strongly convex minimization problem without any constraints and has a unique solution, but its objective function is not twice differentiable. To apply a fast Newton method, we use the smoothing techniques of [4, 5] and replace x_+ by a very accurate smooth approximation, as was done in [10]. Thus we replace x_+ by p(x, α), the integral of the sigmoid function 1/(1 + ε^{−αx}) of neural networks [11, 4], for some α > 0. That is:

    p(x, α) = x + (1/α) log(1 + ε^{−αx}),  α > 0.                        (13)

This p function with a smoothing parameter α is used here to replace the plus function of (12) and obtain a smooth support vector machine (SSVM):

    min_{(u,γ) ∈ R^{m+1}}  (ν/2) ‖p(e − D(K(A, A')Du − eγ), α)‖_2^2 + (1/2)(u'u + γ^2).   (14)

It was shown in [10] that the solution of problem (10) is obtained by solving problem (14) with α approaching infinity. Computationally, we used the limit values of the sigmoid function 1/(1 + ε^{−αx}) and the p function (13) as the smoothing parameter α approaches infinity, that is, the unit step function with value 1/2 at zero and the plus function (·)_+ respectively. This gave extremely good results both here and in [10]. The twice-differentiable objective function of (14) enables us to utilize a globally quadratically convergent Newton algorithm for solving the smooth support vector machine (14) [10, Algorithm 3.1], which consists of solving successive linearizations of the gradient of the objective function set to zero. Problem (14), which is capable of generating a highly nonlinear separating surface (9), retains the strong convexity and differentiability properties for any arbitrary kernel. However, we still have to contend with two difficulties. Firstly, problem (14) is a problem in m + 1 variables, where m could be of the order of millions for large datasets. Secondly, the resulting nonlinear separating surface (9) depends on the entire dataset represented by the matrix A. This creates an unwieldy storage difficulty for very large datasets and makes the use of nonlinear kernels impractical for such problems. To avoid these two difficulties we turn our attention to the reduced support vector machine.

3 RSVM: The Reduced Support Vector Machine

The motivation for RSVM comes from the practical objective of generating a nonlinear separating surface (9) for a large dataset that requires only a small portion of the dataset for its characterization. The difficulty in using nonlinear kernels on large datasets is twofold. First is the computational difficulty of solving the potentially huge unconstrained optimization problem (14), which involves the kernel function K(A, A') and typically leads to the computer running out of memory even before beginning the solution process. For example, for the Adult dataset with 32562 points, which is actually solved with RSVM in Section 4, this would mean a map into a space of over one billion dimensions for a conventional SVM. The second difficulty comes from utilizing the formula (9) for the separating surface on a new unseen point x. The formula dictates that we store and utilize the entire data set represented by the matrix A, which may be prohibitively expensive storage-wise and computing-time-wise. For example, for the Adult dataset just mentioned, which has an input space of 123 dimensions, the nonlinear surface (9) requires a storage capacity for 4,005,126 numbers.

To avoid all these difficulties, and based on experience with chunking methods [2, 13], we hit upon the idea of using a very small random subset of the dataset, given by m̄ points of the original m data points with m̄ << m, that we call Ā, and using Ā' in place of A' both in the unconstrained optimization problem (14), to cut problem size and computation time, and for the same purposes in evaluating the nonlinear surface (9). Note that the matrix A itself is left intact in K(A, Ā'). Computational testing results show a standard deviation of 0.002 or less of test set correctness over 50 random choices for Ā. By contrast, if both A and A' are replaced by Ā and Ā' respectively, then test set correctness declines substantially compared to RSVM, while the standard deviation of test set correctness over 50 cases increases more than tenfold over that of RSVM.

The justification for our proposed approach is this. We use a small random sample Ā of our dataset as a representative sample with respect to the entire dataset A, both in solving the optimization problem (14) and in evaluating the nonlinear separating surface (9). We interpret this as a possible instance-based learning [17, Chapter 8], where the small sample Ā is learning from the much larger training set A by forming the appropriate rectangular kernel relationship K(A, Ā') between the original and reduced sets. This formulation works extremely well computationally, as evidenced by the computational results that we present in the next section of the paper.

By using the formulations described in Section 2 for the full dataset A ∈ R^{m×n} with a square kernel K(A, A') ∈ R^{m×m}, and modifying these formulations for the reduced dataset Ā ∈ R^{m̄×n} with corresponding diagonal matrix D̄ and rectangular kernel K(A, Ā') ∈ R^{m×m̄}, we obtain our RSVM Algorithm below. This algorithm solves, by smoothing, the RSVM quadratic program obtained from (10) by replacing A' with Ā':

    min_{(ū,γ,y) ∈ R^{m̄+1+m}}  (ν/2) y'y + (1/2)(ū'ū + γ^2)
    s.t.  D(K(A, Ā')D̄ū − eγ) + y ≥ e,  y ≥ 0.                           (15)

Algorithm 3.1 RSVM Algorithm

(i) Choose a random subset matrix Ā ∈ R^{m̄×n} of the original data matrix A ∈ R^{m×n}. Typically m̄ is 1% to 10% of m. (The random matrix Ā was chosen such that the distance between its rows exceeded a certain tolerance.)

(ii) Solve the following modified version of the SSVM (14), where A' only is replaced by Ā', with corresponding D̄ ⊂ D:

    min_{(ū,γ) ∈ R^{m̄+1}}  (ν/2) ‖p(e − D(K(A, Ā')D̄ū − eγ), α)‖_2^2 + (1/2)(ū'ū + γ^2),   (16)

which is equivalent to solving (10) with A' only replaced by Ā'.

(iii) The separating surface is given by (9) with A' replaced by Ā' as follows:

    K(x', Ā')D̄ū = γ,                                                    (17)

where (ū, γ) ∈ R^{m̄+1} is the unique solution of (16) and x ∈ R^n is a free input space variable of a new point.

(iv) A new input point x ∈ R^n is classified into class +1 or −1 depending on whether the step function

    (K(x', Ā')D̄ū − γ)_∗                                                 (18)

is +1 or zero, respectively.

As stated earlier, this algorithm is quite insensitive to which submatrix Ā is chosen for (16)-(17), as far as tenfold cross-validation correctness is concerned. In fact, another choice for Ā is to choose it randomly but only keep rows that are more than a certain minimal distance apart. This leads to a slight improvement in testing correctness but increases computational time somewhat. Replacing both A and A' in a conventional SVM by randomly chosen reduced matrices Ā and Ā' gives poor testing set results that vary significantly with the choice of Ā, as will be demonstrated in the numerical results given in the next section, to which we turn now.
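To make Algorithm 3.1 concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It assumes a Gaussian kernel and hands the smooth problem (16) to SciPy's general-purpose L-BFGS-B quasi-Newton solver with numerically estimated gradients, rather than to the globally quadratically convergent smoothing Newton method of [10] that the paper actually uses. All names (rsvm_fit, kgamma, and so on) and the default parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, kgamma):
    """Rectangular Gaussian kernel: K(A, B')_{ij} = exp(-kgamma * ||A_i - B_j||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-kgamma * np.maximum(sq, 0.0))

def p_smooth(x, alpha):
    """Smooth plus function (13): p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x))."""
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def rsvm_fit(A, d, m_bar, nu=100.0, kgamma=1.0, alpha=5.0, seed=0):
    """Sketch of Algorithm 3.1: draw a random reduced set Abar and solve problem (16)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(A.shape[0], size=m_bar, replace=False)   # step (i): random subset of the rows of A
    Abar, dbar = A[idx], d[idx]
    KD = gaussian_kernel(A, Abar, kgamma) * dbar              # K(A, Abar') Dbar, an m x m_bar matrix

    def objective(z):                                         # smooth objective of (16)
        u, g = z[:-1], z[-1]
        r = 1.0 - d * (KD @ u - g)                            # e - D(K(A, Abar') Dbar u - e*gamma)
        return 0.5 * nu * np.sum(p_smooth(r, alpha) ** 2) + 0.5 * (u @ u + g * g)

    z = minimize(objective, np.zeros(m_bar + 1), method="L-BFGS-B").x
    return Abar, dbar, z[:-1], z[-1]                          # reduced points, their labels, u-bar, gamma

def rsvm_predict(X, Abar, dbar, u, gamma, kgamma=1.0):
    """Steps (iii)-(iv): classify new points by the sign of K(x', Abar') Dbar u - gamma."""
    KD = gaussian_kernel(X, Abar, kgamma) * dbar
    return np.where(KD @ u - gamma > 0.0, 1, -1)
```

Training touches every row of A through the rectangular kernel K(A, Ā'), but only the m̄ reduced points, their labels and (ū, γ) are returned; that is all that prediction needs, which is the storage saving the paper emphasizes.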
4 Computational Results

We applied RSVM to three groups of publicly available test problems: the checkerboard problem [8, 9], six test problems from the University of California (UC) Irvine repository [18], and the Adult dataset from the same repository. We show that RSVM performs better than a conventional SVM using the entire training set, and much better than a conventional SVM using only the same randomly chosen subset used by RSVM. We also show, using time comparisons, that RSVM performs better than sequential minimal optimization (SMO) [19] and projected conjugate gradient chunking (PCGC) [7, 3]. Computational time on the Adult datasets grows nearly linearly for RSVM, whereas SMO and PCGC times grow at a much faster nonlinear rate.

All our experiments were solved by using the globally quadratically convergent smooth support vector machine (SSVM) algorithm [10], which merely solves a finite sequence of systems of linear equations defined by a positive definite Hessian matrix to get a Newton direction at each iteration. Typically 5 to 8 systems of linear equations are solved by SSVM, and hence each data point A_i, i = 1, ..., m, is accessed 5 to 8 times by SSVM. Note that no special optimization packages such as linear or quadratic programming solvers are needed. We implemented SSVM using standard native MATLAB commands [16]. We used a Gaussian kernel [12],

    (K(A, A'))_ij = ε^{−α‖A_i − A_j‖_2^2},  i, j = 1, ..., m,

for all our numerical tests. A polynomial kernel of degree 6 was also used on the checkerboard, with similar results that are not reported here. All parameters in these tests were chosen for optimal performance on a tuning set, a surrogate for a test set. All our experiments were run on the University of Wisconsin Computer Sciences Department Ironsides cluster, which consists of four Sun Enterprise E6000 machines, each with 16 UltraSPARC II 250 MHz processors and 2 gigabytes of RAM, for a total of 64 processors and 8 gigabytes of RAM.

The checkerboard dataset [8, 9] consists of 1000 points in R^2 of black and white points taken from sixteen black and white squares of a checkerboard. This dataset is chosen in order to depict graphically the effectiveness of RSVM using a random 5% or 10% of the given 1000-point training dataset, compared to the very poor performance of a conventional SVM on the same 5% or 10% randomly chosen subset. Figures 2 and 4 show the poor pattern approximating a checkerboard obtained by a conventional SVM using a Gaussian kernel, that is, solving (10) with both A and A' replaced by the randomly chosen Ā and Ā' respectively. Test set correctness of this conventional SVM using the reduced Ā and Ā' averaged, over 15 cases, 43.60% for the 50-point dataset and 67.91% for the 100-point dataset. In contrast, using our RSVM Algorithm 3.1 on the same randomly chosen submatrices Ā yields the much more accurate representations of the checkerboard depicted in Figures 3 and 5, with corresponding average test set correctness of 96.70% and 97.55% on the same test set.
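The comparison just described can be imitated with the hypothetical rsvm_fit and rsvm_predict helpers sketched in Section 3: RSVM builds its rectangular kernel from all 1000 training points but only 100 kernel columns, while the conventional SVM uses the same solver with both A and A' replaced by Ā and Ā'. The data generator, the parameter values and the resulting accuracies below are illustrative and untuned; the paper selects its parameters on a tuning set.

```python
import numpy as np

def checkerboard(n, seed=1):
    """n random points in [0, 4)^2 labeled +1/-1 by a 4 x 4 (sixteen-square) checkerboard."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 4.0, size=(n, 2))
    d = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)
    return X, d

A, d = checkerboard(1000)                  # 1000-point training set
T, t = checkerboard(20000, seed=2)         # a large random test sample

# RSVM: rectangular kernel K(A, Abar') built from all 1000 rows of A, 100 columns from Abar.
Abar, dbar, u, g = rsvm_fit(A, d, m_bar=100, nu=100.0, kgamma=2.0)
acc_rsvm = np.mean(rsvm_predict(T, Abar, dbar, u, g, kgamma=2.0) == t)

# Conventional SVM on the subset only: both A and A' replaced by Abar and Abar'.
A0, d0, u0, g0 = rsvm_fit(Abar, dbar, m_bar=100, nu=100.0, kgamma=2.0)
acc_sub = np.mean(rsvm_predict(T, A0, d0, u0, g0, kgamma=2.0) == t)

print(f"RSVM: {acc_rsvm:.3f}   SVM on the 100 sampled points only: {acc_sub:.3f}")
```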
9 page Figure 2. SVM: Checkerboard resulting from a randomly selected 50 points, out of a 1000-point dataset, and used in a conventional Gaussian kernel SVM (10). The resulting nonlinear surface, separating white and black areas, generated using the 50 random points only, depends eplicitly on those points only. Correctness on a point test set averaged 43.60% on 15 randomly chosen 50-point sets, with a standard deviation of and best correctness of 61.03% depicted above Figure 3. RSVM: Checkerboard resulting from randomly selected 50 points and used in a reduced Gaussian kernel SVM (15). The resulting nonlinear surface, separating white and black areas, generated using the entire 1000-point dataset, depends eplicitly on the 50 points only. The remaining 950 points can be thrown away once the separating surface has been generated. Correctness on a point test set averaged 96.7% on 15 randomly chosen 50-point sets, with a standard deviation of and best correctness of 98.04% depicted above.
[Figure 4. SVM: Checkerboard resulting from a randomly selected 100 points, out of a 1000-point dataset, used in a conventional Gaussian kernel SVM (10). The resulting nonlinear surface, separating white and black areas, generated using the 100 random points only, depends explicitly on those points only. Test set correctness averaged 67.91% over 15 randomly chosen 100-point sets, with the best correctness of 76.09% depicted above.]

[Figure 5. RSVM: Checkerboard resulting from a randomly selected 100 points used in a reduced Gaussian kernel SVM (15). The resulting nonlinear surface, separating white and black areas, generated using the entire 1000-point dataset, depends explicitly on the 100 points only. The remaining 900 points can be thrown away once the separating surface has been generated. Test set correctness averaged 97.55% over 15 randomly chosen 100-point sets, with the best correctness of 98.26% depicted above.]
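Pictures in the style of Figures 2 through 5 can be produced by evaluating the learned surface (17) on a dense grid. A short continuation of the illustrative checkerboard sketch above; note that only the retained points Ā, their labels and (ū, γ) are needed, so the other 900 training points never enter the picture.

```python
import numpy as np

# Evaluate the separating surface (17) on a 200 x 200 grid covering the checkerboard square.
xs = np.linspace(0.0, 4.0, 200)
gx, gy = np.meshgrid(xs, xs)
grid = np.column_stack([gx.ravel(), gy.ravel()])

pattern = rsvm_predict(grid, Abar, dbar, u, g, kgamma=2.0).reshape(gx.shape)

# `pattern` is a 200 x 200 array of +/-1 values; rendering it as an image (for example with
# matplotlib's imshow) gives black-and-white regions analogous to Figures 3 and 5.
```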
The next set of numerical results, given in Table 1 for the six UC Irvine test problems Ionosphere, BUPA Liver, Pima Indians, Cleveland Heart, Tic-Tac-Toe and Mushroom, shows that RSVM, with m̄ ≈ m/10 on all these datasets, obtained better test set correctness than a conventional SVM (10) using the full data matrix A, and much better correctness than the conventional SVM (10) using the same reduced matrices Ā and Ā'. RSVM was also better than a linear SVM using the full data matrix A. A possible reason for the improved test set correctness of RSVM is the avoidance of data overfitting through the use of a reduced data matrix Ā instead of the full data matrix A.

[Table 1. Tenfold test set correctness (%) and tenfold computational time (seconds) on the six UC Irvine datasets Cleveland Heart, BUPA Liver, Ionosphere, Pima Indians, Tic-Tac-Toe and Mushroom, for the Gaussian kernel matrices K(A, Ā') (m × m̄), K(A, A') (m × m), K(Ā, Ā') (m̄ × m̄) and the linear kernel AA' used in SSVM; numerical entries omitted. The results demonstrate that the RSVM Algorithm 3.1 can get test set correctness that is better than a conventional nonlinear SVM (10) using either the full data matrix A or the reduced matrix Ā, as well as better than a linear kernel SVM using the full data matrix A. The computer ran out of memory while generating the full nonlinear kernel for the Mushroom dataset; N/A denotes results not available because the kernel K(A, A') was too large to store.]

The third group of test problems, the UCI Adult dataset, uses an m̄ that ranges between 1% and 5% of m in the RSVM Algorithm 3.1. We make the following observations on this set of results, given in Table 2:

(i) Test set correctness of RSVM was better on average by 10.52%, and by as much as 12.52%, than that of a conventional SVM using the same reduced submatrices Ā and Ā'.

(ii) The standard deviation of test set correctness over 50 randomly chosen Ā for RSVM was no greater than 0.002, while the corresponding standard deviation for a conventional SVM using the same 50 random Ā and Ā' was more than ten times larger. In fact, smallness of the standard deviation was used as a guide in determining m̄, the size of the reduced data used in RSVM.

[Table 2. Computational results for 50 runs of RSVM on each of nine commonly used subsets of the Adult dataset [18], with (training, testing) set sizes (1605, 30957), (2265, 30297), (3185, 29377), (4781, 27781), (6414, 26148), (11221, 21341), (16101, 16461), (22697, 9865) and (32562, 16282). Each run uses a randomly chosen Ā ∈ R^{m̄×123} from A in an RSVM Gaussian kernel K(A, Ā') (m × m̄), with the number of rows m̄ of Ā between 1% and 5% of the number of rows m of the full data matrix A; test set correctness and its standard deviation are reported for K(A, Ā') and, for comparison, for K(Ā, Ā') (m̄ × m̄), together with m̄ and the ratio m̄/m; numerical entries omitted. Test set correctness for the largest case is the same as that of SMO [20].]

Finally, Table 3 and Figure 6 show the nearly linear time growth of RSVM on the Adult dataset as a function of the number of points m in the dataset, compared to the faster nonlinear time growth of SMO [19] and PCGC [7, 3].

[Table 3. CPU time comparisons (seconds) of RSVM, SMO [19] and PCGC [7, 3] with a Gaussian kernel on the Adult datasets, as a function of training set size; numerical entries omitted. SMO and PCGC were run on a 266 MHz Pentium II processor under Windows NT 4, using Microsoft's Visual C++ compiler; their times are quoted from [19]. PCGC ran out of memory (128 megabytes) while generating the kernel matrix for the largest training set sizes; N/A denotes results not available because the kernel K(A, A') was too large to store.]

[Figure 6. Indirect CPU time comparison of RSVM, SMO and PCGC for a Gaussian kernel SVM on the nine Adult data subsets (training set size versus CPU time in seconds).]

5 Conclusion

We have proposed a Reduced Support Vector Machine (RSVM) Algorithm 3.1 that uses a randomly selected subset of the data, typically 10% or less of the original dataset, to obtain a nonlinear separating surface. Despite this reduced dataset, RSVM gets better test set results than those obtained by using the entire data. This may be attributable to a reduction in data overfitting. The reduced dataset is all that is needed to characterize the final nonlinear separating surface. This is very important for massive datasets, such as those used in fraud detection, which number in the millions. We may think of all the information in the discarded data as having been distilled, during the training process, into the parameters defining the nonlinear surface via the rectangular kernel K(A, Ā'). Although the training process, which consists of the RSVM Algorithm 3.1, uses the entire dataset in an unconstrained optimization problem (14), it is a problem in R^{m̄+1} with m̄ ≈ m/10 or less, and hence much easier to solve than the problem for the full dataset, which would be in R^{m+1}. The choice of the random data submatrix Ā used in RSVM does not affect test set correctness. In contrast, a random choice of a data submatrix for a conventional SVM has a standard deviation of test set correctness which is more than ten times that of RSVM. With all these properties, RSVM appears to be a very promising method for handling large classification problems using a nonlinear separating surface.

Acknowledgements

The research described in this Data Mining Institute Report 00-07, July 2000, was supported by National Science Foundation Grants CCR and CDA, by Air Force Office of Scientific Research Grant F, and by the Microsoft Corporation. We thank Paul S. Bradley for valuable comments and David R. Musicant for his Gaussian kernel generator.
Bibliography

[1] P. S. Bradley and O. L. Mangasarian, Feature selection via concave minimization and support vector machines, in Machine Learning Proceedings of the Fifteenth International Conference (ICML 98), J. Shavlik, ed., Morgan Kaufmann, San Francisco, California, 1998.

[2] P. S. Bradley and O. L. Mangasarian, Massive data discrimination via linear support vector machines, Optimization Methods and Software, 13 (2000).

[3] C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2) (1998).

[4] Chunhui Chen and O. L. Mangasarian, Smoothing methods for convex inequalities and linear complementarity problems, Mathematical Programming, 71(1) (1995).

[5] Chunhui Chen and O. L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Computational Optimization and Applications, 5(2) (1996).

[6] V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory and Methods, John Wiley & Sons, New York.

[7] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, London.

[8] T. K. Ho and E. M. Kleinberg, Building projectable classifiers of arbitrary complexity, in Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria, 1996. Checker dataset at: ftp://ftp.cs.wisc.edu/mathprog/cpo-dataset/machine-learn/checker.

[9] L. Kaufman, Solving the quadratic programming problem arising in support vector classification, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, eds., MIT Press, 1999.

[10] Yuh-Jye Lee and O. L. Mangasarian, SSVM: A smooth support vector machine, Technical Report 99-03, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, September 1999. Computational Optimization and Applications, to appear. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.ps.

[11] O. L. Mangasarian, Mathematical programming in neural networks, ORSA Journal on Computing, 5(4) (1993).

[12] O. L. Mangasarian, Generalized support vector machines, in Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, eds., MIT Press, Cambridge, MA, 2000.

[13] O. L. Mangasarian and D. R. Musicant, Massive support vector regression, Technical Report 99-02, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, July 1999. Machine Learning, to appear. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-02.ps.

[14] O. L. Mangasarian and D. R. Musicant, Successive overrelaxation for support vector machines, IEEE Transactions on Neural Networks, 10 (1999).

[15] O. L. Mangasarian and D. R. Musicant, Lagrangian support vector machines, Technical Report 00-06, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, June 2000. Journal of Machine Learning Research, to appear. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-06.ps.

[16] MATLAB, User's Guide, The MathWorks, Inc., Natick, MA 01760.

[17] T. M. Mitchell, Machine Learning, McGraw-Hill, Boston.

[18] P. M. Murphy and D. W. Aha, UCI repository of machine learning databases, mlearn/mlrepository.html.

[19] J. Platt, Sequential minimal optimization: A fast algorithm for training support vector machines, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, eds., MIT Press, 1999. jplatt/smo.html.

[20] J. Platt, Personal communication, May 2000.

[21] B. Schölkopf, C. J. C. Burges, and A. J. Smola, eds., Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999.

[22] A. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, eds., Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, 2000.

[23] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.

[24] C. K. I. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, in Advances in Neural Information Processing Systems (NIPS 2000), 2000, to appear.