Lecture 3: SVM dual, kernels and regression
1 Lecture 3: SVM dual, kernels and regression
C19 Machine Learning, Hilary 2015, A. Zisserman
Outline: primal and dual forms; linear separability revisited; feature maps; kernels for SVMs; regression; ridge regression; basis functions
2 SVM review
We have seen that learning a linear SVM classifier $f(x) = w^\top x + b$ is formulated as solving an optimization problem over $w$:
$$\min_{w \in \mathbb{R}^d} \|w\|^2 + C \sum_i \max(0,\, 1 - y_i f(x_i))$$
This quadratic optimization problem is known as the primal problem. Alternatively, the SVM can be formulated to learn a linear classifier
$$f(x) = \sum_i \alpha_i y_i \,(x_i^\top x) + b$$
by solving an optimization problem over the $\alpha_i$. This is known as the dual problem, and we will look at the advantages of this formulation.
3 Sketch derivation of the dual form
The Representer Theorem states that the solution $w$ can always be written as a linear combination of the training data:
$$w = \sum_{j=1}^N \alpha_j y_j x_j$$
(proof: see example sheet). Now substitute for $w$ in $f(x) = w^\top x + b$:
$$f(x) = \sum_{j=1}^N \alpha_j y_j \,(x_j^\top x) + b$$
and for $w$ in the cost function $\min_w \|w\|^2$ subject to $y_i(w^\top x_i + b) \ge 1$ for all $i$, using
$$\|w\|^2 = \Big(\sum_j \alpha_j y_j x_j\Big)^{\!\top} \Big(\sum_k \alpha_k y_k x_k\Big) = \sum_{jk} \alpha_j \alpha_k y_j y_k \,(x_j^\top x_k).$$
Hence an equivalent optimization problem is over the $\alpha_j$:
$$\min_{\alpha_j} \sum_{jk} \alpha_j \alpha_k y_j y_k \,(x_j^\top x_k) \quad \text{subject to} \quad y_i\Big(\sum_{j=1}^N \alpha_j y_j \,(x_j^\top x_i) + b\Big) \ge 1 \;\; \forall i,$$
and a few more steps are required to complete the derivation.
4 Primal and dual formulations
$N$ is the number of training points, and $d$ is the dimension of the feature vector $x$.
Primal problem, for $w \in \mathbb{R}^d$:
$$\min_{w \in \mathbb{R}^d} \|w\|^2 + C \sum_i \max(0,\, 1 - y_i f(x_i))$$
Dual problem, for $\alpha \in \mathbb{R}^N$ (stated without proof):
$$\max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k \,(x_j^\top x_k) \quad \text{subject to} \quad 0 \le \alpha_i \le C \;\forall i, \;\; \sum_i \alpha_i y_i = 0.$$
We need to learn $d$ parameters for the primal and $N$ for the dual; if $N \ll d$ it is more efficient to solve for $\alpha$ than for $w$. The dual form only involves the data through the scalar products $(x_j^\top x_k)$. We will return to why this is an advantage when we look at kernels.
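As a concrete illustration (not from the lecture): scikit-learn's SVC solves exactly this dual problem internally and exposes the learned $\alpha_i y_i$ and the support vectors, so the expansion $w = \sum_i \alpha_i y_i x_i$ can be checked numerically. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=10.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# every other training point has alpha_i = 0.
print("support vector indices:", clf.support_)
print("w from the dual expansion:", clf.dual_coef_ @ clf.support_vectors_)
print("w from the primal form:   ", clf.coef_)   # the two agree
```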
5 Primal and dual formulations
Primal version of the classifier: $f(x) = w^\top x + b$. Dual version of the classifier: $f(x) = \sum_i \alpha_i y_i \,(x_i^\top x) + b$.
At first sight the dual form appears to have the same disadvantage as a K-NN classifier: it requires the training data points $x_i$. However, many of the $\alpha_i$ are zero. The ones that are non-zero define the support vectors $x_i$.
6 Support Vector Machine (figure)
Diagram of the margin: the hyperplane $w^\top x + b = 0$, the margin boundaries, and the support vectors that determine $w$. The classifier is $f(x) = \sum_i \alpha_i y_i \,(x_i^\top x) + b$, where the sum runs over the support vectors.
7 C = 1: soft margin (figure)
8 Handling data that is not linearly separable
Introduce slack variables $\xi_i \ge 0$:
$$\min_{w \in \mathbb{R}^d,\, \xi_i \in \mathbb{R}^+} \|w\|^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i\left(w^\top x_i + b\right) \ge 1 - \xi_i \;\; \text{for } i = 1 \ldots N$$
(figure: data for which a linear classifier is not appropriate)
9 Solution 1: use polar coordinates
$$\Phi : \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} r \\ \theta \end{pmatrix}, \quad \mathbb{R}^2 \to \mathbb{R}^2$$
The data is linearly separable in polar coordinates; the map acts non-linearly in the original space.
10 Solution 2: map the data to a higher dimension
$$\Phi : \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} x_1^2 \\ x_2^2 \\ \sqrt{2}\, x_1 x_2 \end{pmatrix}, \quad \mathbb{R}^2 \to \mathbb{R}^3$$
The data is linearly separable in 3D. This means that the problem can still be solved by a linear classifier.
11 SVM classifiers in a transformed feature space
Use a feature map $\Phi : x \mapsto \Phi(x)$, $\mathbb{R}^d \to \mathbb{R}^D$, and learn a classifier linear in $w$ for $\mathbb{R}^D$:
$$f(x) = w^\top \Phi(x) + b$$
12 Primal classifier in a transformed feature space
Classifier, with $w \in \mathbb{R}^D$: $f(x) = w^\top \Phi(x) + b$.
Learning, for $w \in \mathbb{R}^D$:
$$\min_{w \in \mathbb{R}^D} \|w\|^2 + C \sum_i \max(0,\, 1 - y_i f(x_i))$$
Simply map $x$ to $\Phi(x)$, where the data is separable, and solve for $w$ in the high-dimensional space $\mathbb{R}^D$. If $D \gg d$ then there are many more parameters to learn for $w$. Can this be avoided?
13 Dual classifier in a transformed feature space
Classifier:
$$f(x) = \sum_i \alpha_i y_i \,(x_i^\top x) + b \quad \longrightarrow \quad f(x) = \sum_i \alpha_i y_i \, \Phi(x_i)^\top \Phi(x) + b$$
Learning:
$$\max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k \,(x_j^\top x_k) \quad \longrightarrow \quad \max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k \, \Phi(x_j)^\top \Phi(x_k)$$
subject to $0 \le \alpha_i \le C$ for all $i$, and $\sum_i \alpha_i y_i = 0$.
14 Dual classifier in a transformed feature space
Note that $\Phi(x)$ only occurs in pairs $\Phi(x_j)^\top \Phi(x_i)$. Once these scalar products are computed, only the $N$-dimensional vector $\alpha$ needs to be learnt; it is not necessary to learn in the $D$-dimensional space, as it is for the primal. Write $k(x_j, x_i) = \Phi(x_j)^\top \Phi(x_i)$. This is known as a kernel.
Classifier: $f(x) = \sum_i \alpha_i y_i \, k(x_i, x) + b$.
Learning:
$$\max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k \, k(x_j, x_k) \quad \text{subject to} \quad 0 \le \alpha_i \le C \;\forall i, \;\; \sum_i \alpha_i y_i = 0.$$
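Because the dual only needs the Gram matrix of kernel values, any kernel can be plugged into an off-the-shelf dual solver. A hedged sketch (scikit-learn's SVC accepts a precomputed Gram matrix; `gram` and the kernel `k` here are illustrative names, not from the lecture):

```python
import numpy as np
from sklearn.svm import SVC

def gram(A, B, k):
    # Gram matrix K[i, j] = k(A[i], B[j]) for an arbitrary kernel function k
    return np.array([[k(a, b) for b in B] for a in A])

k = lambda x, z: (x @ z) ** 2          # e.g. the quadratic kernel of slide 15
clf = SVC(kernel="precomputed", C=1.0)
# training: clf.fit(gram(X_train, X_train, k), y_train)
# testing:  clf.predict(gram(X_test, X_train, k))
```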
15 Special transformations: the kernel trick
For $\Phi : (x_1, x_2)^\top \mapsto (x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2)^\top$, $\mathbb{R}^2 \to \mathbb{R}^3$:
$$\Phi(x)^\top \Phi(z) = \left(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2\right) \begin{pmatrix} z_1^2 \\ z_2^2 \\ \sqrt{2}\,z_1 z_2 \end{pmatrix} = x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 = (x_1 z_1 + x_2 z_2)^2 = (x^\top z)^2$$
The classifier can be learnt and applied without explicitly computing $\Phi(x)$. All that is required is the kernel $k(x, z) = (x^\top z)^2$. The complexity of learning depends on $N$ (typically it is $O(N^3)$), not on $D$.
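A small numerical check of the identity above, assuming only numpy: the explicit quadratic feature map and the kernel $k(x, z) = (x^\top z)^2$ give the same scalar product.

```python
import numpy as np

def phi(v):
    # Phi: R^2 -> R^3, (x1, x2) -> (x1^2, x2^2, sqrt(2) x1 x2)
    return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.7])

lhs = phi(x) @ phi(z)   # scalar product in the feature space R^3
rhs = (x @ z) ** 2      # kernel evaluated in the original space R^2
print(lhs, rhs)         # identical up to floating-point rounding
```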
16 Example kernels
Linear kernels: $k(x, x') = x^\top x'$.
Polynomial kernels: $k(x, x') = \left(1 + x^\top x'\right)^d$ for any $d > 0$; contains all polynomial terms up to degree $d$.
Gaussian kernels: $k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma^2\right)$ for $\sigma > 0$; infinite-dimensional feature space.
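For concreteness, a minimal numpy sketch of these three kernels on single vectors (the default parameter values are illustrative):

```python
import numpy as np

def k_linear(x, xp):
    return x @ xp

def k_poly(x, xp, d=3):
    # contains all polynomial terms up to degree d
    return (1 + x @ xp) ** d

def k_gauss(x, xp, sigma=1.0):
    # corresponds to an infinite-dimensional feature space
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))
```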
17 SVM classifier with a Gaussian kernel
$$f(x) = \sum_{i=1}^N \alpha_i y_i \, k(x_i, x) + b$$
where $N$ is the size of the training data, $\alpha_i$ is a weight (which may be zero) and $x_i$ a support vector. With the Gaussian kernel $k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma^2\right)$ this is the Radial Basis Function (RBF) SVM:
$$f(x) = \sum_i \alpha_i y_i \exp\left(-\|x - x_i\|^2 / 2\sigma^2\right) + b$$
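A sketch of evaluating this decision function, given weights, labels, support vectors and bias already obtained from training (the argument names are illustrative, not from the lecture):

```python
import numpy as np

def rbf_svm_decision(x, alphas, ys, svs, b, sigma):
    # f(x) = sum_i alpha_i y_i exp(-||x - x_i||^2 / (2 sigma^2)) + b
    sq_dists = np.sum((svs - x) ** 2, axis=1)
    return np.sum(alphas * ys * np.exp(-sq_dists / (2 * sigma ** 2))) + b
```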
18 RBF kernel SVM example (figure)
The data is not linearly separable in the original feature space.
19 σ = 1.0, C = ∞ (figure: decision boundary with contours $f(x) = 1$, $f(x) = 0$, $f(x) = -1$ of $f(x) = \sum_i \alpha_i y_i \exp(-\|x - x_i\|^2/2\sigma^2) + b$)
20 σ = 1.0, C = 100 (figure). Decreasing C gives a wider (soft) margin.
21 σ = 1.0, C = 10 (figure: $f(x) = \sum_i \alpha_i y_i \exp(-\|x - x_i\|^2/2\sigma^2) + b$)
22 σ = 1.0, C = ∞ (figure: $f(x) = \sum_i \alpha_i y_i \exp(-\|x - x_i\|^2/2\sigma^2) + b$)
23 σ = 0.25, C = ∞ (figure). Decreasing sigma moves the classifier towards a nearest-neighbour classifier.
24 σ = 0.1, C = ∞ (figure: $f(x) = \sum_i \alpha_i y_i \exp(-\|x - x_i\|^2/2\sigma^2) + b$)
25 Kernel trick: summary
Classifiers can be learnt for high-dimensional feature spaces without actually having to map the points into the high-dimensional space. The data may be linearly separable in the high-dimensional space even though it is not linearly separable in the original feature space. Kernels can be used for an SVM because of the scalar product in the dual form, but they can also be used elsewhere; they are not tied to the SVM formalism. Kernels also apply to objects that are not vectors, e.g. $k(h, h') = \sum_k \min(h_k, h'_k)$ for histograms with bins $h_k, h'_k$.
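The histogram-intersection kernel mentioned above is a one-liner in numpy; a sketch, assuming the two histograms share the same binning:

```python
import numpy as np

def k_hist_intersection(h, hp):
    # k(h, h') = sum_k min(h_k, h'_k)
    return np.sum(np.minimum(h, hp))

# e.g. two 4-bin histograms of counts:
print(k_hist_intersection(np.array([3, 0, 2, 5]), np.array([1, 4, 2, 3])))  # 6
```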
26 Regression
Suppose we are given a training set of $N$ observations $((x_1, y_1), \ldots, (x_N, y_N))$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. The regression problem is to estimate $f(x)$ from this data such that $y_i = f(x_i)$.
27 Learning by optimization
As in the case of classification, learning a regressor can be formulated as an optimization: minimize with respect to $f \in \mathcal{F}$
$$\sum_{i=1}^N l\left(f(x_i), y_i\right) + \lambda R(f)$$
where $l$ is the loss function and $R$ the regularizer. There is a choice of both loss function and regularizer, e.g. squared loss or an SVM hinge-like loss; a squared regularizer or the lasso regularizer.
28 Choice of regression function: non-linear basis functions
The regression function $f(x, w)$ is a non-linear function of $x$, but linear in $w$:
$$f(x, w) = w_0 + w_1 \phi_1(x) + w_2 \phi_2(x) + \ldots + w_M \phi_M(x) = w^\top \Phi(x)$$
For example, for $x \in \mathbb{R}$, polynomial regression uses $\phi_j(x) = x^j$:
$$f(x, w) = \sum_{j=0}^M w_j x^j = w^\top \Phi(x)$$
e.g. for $M = 3$: $f(x, w) = (w_0, w_1, w_2, w_3)\,(1,\, x,\, x^2,\, x^3)^\top$, with $\Phi : x \mapsto \Phi(x)$, $\mathbb{R}^1 \to \mathbb{R}^4$.
29 Least squares ridge regression
The cost function is a squared loss on the target values $y_i$ plus a regularization term (figure: the loss is measured vertically at each $x_i$). The regression function for $x$ (1D) is
$$f(x, w) = w_0 + w_1 \phi_1(x) + \ldots + w_M \phi_M(x) = w^\top \Phi(x)$$
NB: the squared loss arises in maximum-likelihood estimation for the error model $y_i = \tilde{y}_i + n_i$, $n_i \sim \mathcal{N}(0, \sigma^2)$, where $y_i$ is the measured value and $\tilde{y}_i$ the true value.
30 Solving for the weights w
Notation: write the target and regressed values as $N$-vectors,
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad f = \begin{pmatrix} \Phi(x_1)^\top w \\ \Phi(x_2)^\top w \\ \vdots \\ \Phi(x_N)^\top w \end{pmatrix} = \Phi w = \begin{pmatrix} 1 & \phi_1(x_1) & \ldots & \phi_M(x_1) \\ 1 & \phi_1(x_2) & \ldots & \phi_M(x_2) \\ \vdots & & & \vdots \\ 1 & \phi_1(x_N) & \ldots & \phi_M(x_N) \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{pmatrix}$$
$\Phi$ is an $N \times (M+1)$ design matrix. E.g. for polynomial regression with basis functions up to $x^2$:
$$\Phi w = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & & \\ 1 & x_N & x_N^2 \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ w_2 \end{pmatrix}$$
31 
$$\tilde{E}(w) = \frac{1}{2} \sum_{i=1}^N \left\{ f(x_i, w) - y_i \right\}^2 + \frac{\lambda}{2} \|w\|^2 = \frac{1}{2} \sum_{i=1}^N \left( y_i - w^\top \Phi(x_i) \right)^2 + \frac{\lambda}{2} \|w\|^2 = \frac{1}{2} \|y - \Phi w\|^2 + \frac{\lambda}{2} \|w\|^2$$
Now compute where the derivative w.r.t. $w$ is zero for the minimum:
$$\frac{d\tilde{E}(w)}{dw} = -\Phi^\top (y - \Phi w) + \lambda w = 0$$
Hence
$$\left( \Phi^\top \Phi + \lambda I \right) w = \Phi^\top y \qquad \Rightarrow \qquad w = \left( \Phi^\top \Phi + \lambda I \right)^{-1} \Phi^\top y$$
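A minimal sketch of this closed-form solution in numpy, using a polynomial design matrix for 1-D inputs (the data here is synthetic, for illustration only):

```python
import numpy as np

def poly_design_matrix(x, M):
    # N x (M+1) matrix with rows (1, x_i, x_i^2, ..., x_i^M)
    return np.vander(x, M + 1, increasing=True)

def ridge_fit(Phi, y, lam):
    # w = (Phi^T Phi + lambda I)^{-1} Phi^T y, computed via a linear solve
    M1 = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M1), Phi.T @ y)

x = np.linspace(0, 1, 9)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).normal(size=9)
w = ridge_fit(poly_design_matrix(x, 3), y, lam=1e-3)
```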
32 M basis functions, N data points
$$w = \left( \Phi^\top \Phi + \lambda I \right)^{-1} \Phi^\top y$$
with dimensions $(M \times 1) = (M \times M)(M \times N)(N \times 1)$, assuming $N > M$. This shows that there is a unique solution. If $\lambda = 0$ (no regularization), then $w = (\Phi^\top \Phi)^{-1} \Phi^\top y = \Phi^+ y$, where $\Phi^+$ is the pseudo-inverse of $\Phi$ (pinv in Matlab). Adding the term $\lambda I$ improves the conditioning of the inverse: even if $\Phi^\top \Phi$ is not full rank, $(\Phi^\top \Phi + \lambda I)$ will be (for sufficiently large $\lambda$). As $\lambda \to \infty$, $w \to \frac{1}{\lambda} \Phi^\top y$. Often the regularization is applied only to the inhomogeneous part of $w$, i.e. to $\tilde{w}$, where $w = (w_0, \tilde{w})$.
33 
$$w = \left( \Phi^\top \Phi + \lambda I \right)^{-1} \Phi^\top y$$
$$f(x, w) = w^\top \Phi(x) = \Phi(x)^\top w = \Phi(x)^\top \left( \Phi^\top \Phi + \lambda I \right)^{-1} \Phi^\top y = b(x)^\top y$$
The output is a linear blend, $b(x)$, of the training values $\{y_i\}$.
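A sketch of this "linear blend" view: the prediction at $x$ is $b(x)^\top y$, where $b(x)$ depends only on the inputs, not on the targets (`phi_x` denotes the vector $\Phi(x)$; names are illustrative).

```python
import numpy as np

def blend_weights(phi_x, Phi, lam):
    # b(x)^T = Phi(x)^T (Phi^T Phi + lambda I)^{-1} Phi^T
    M1 = Phi.shape[1]
    A = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M1), Phi.T)  # (M+1) x N
    return phi_x @ A  # the N-vector b(x)

# prediction: f(x) = blend_weights(phi_x, Phi, lam) @ y
```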
34 Example 1: polynomial basis functions
The red curve is the true function (which is not a polynomial); the data points are samples from the curve with added noise in $y$ (figure: sample points and ideal fit). There is a choice in both the degree $M$ of the basis functions used and in the strength of the regularization.
$$f(x, w) = \sum_{j=0}^M w_j x^j = w^\top \Phi(x), \quad \Phi : x \mapsto \Phi(x), \;\; \mathbb{R} \to \mathbb{R}^{M+1}$$
$w$ is an $(M+1)$-dimensional vector.
35 N = 9 samples (figure: polynomial fits for four values of the regularization weight λ, each panel showing the sample points and the ideal fit)
36 Least-squares fits for M = 3 and a larger M (figure: sample points, ideal fit and least-squares solution for each degree, with the corresponding polynomial basis functions shown below)
37 Example 2: Gaussian basis functions
The red curve is the true function (which is not a polynomial); the data points are samples from the curve with added noise in $y$ (figure: sample points and ideal fit). The basis functions are centred on the training data ($N$ points). There is a choice in both the scale, sigma, of the basis functions used and in the strength of the regularization.
$$f(x, w) = \sum_{i=1}^N w_i \, e^{-(x - x_i)^2 / \sigma^2} = w^\top \Phi(x), \quad \Phi : x \mapsto \Phi(x), \;\; \mathbb{R} \to \mathbb{R}^N$$
$w$ is an $N$-vector.
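A sketch of the corresponding design matrix, with Gaussian basis functions centred on the training inputs (numpy, 1-D inputs assumed; the exponent matches the slide's $e^{-(x-x_i)^2/\sigma^2}$ form):

```python
import numpy as np

def gaussian_design_matrix(x_query, x_centres, sigma):
    # entry (i, j) = exp(-(x_i - c_j)^2 / sigma^2)
    diffs = x_query[:, None] - x_centres[None, :]
    return np.exp(-diffs ** 2 / sigma ** 2)

# fitting on the training set gives an N x N design matrix:
# Phi = gaussian_design_matrix(x_train, x_train, sigma)
```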
38 N = 9 samples (figure: Gaussian-basis fits for four values of λ at a fixed sigma, each panel showing the sample points and the ideal fit)
39 Choosing lambda using a validation set (figure: training and validation error norms plotted against log λ; the λ at the minimum of the validation error is selected, and the resulting validation-set fit is shown against the sample points and ideal fit)
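A minimal sketch of this procedure, assuming the design matrices for a training and a validation split have already been built (names are illustrative): fit for each λ on the training split and keep the λ with the lowest validation error.

```python
import numpy as np

def select_lambda(Phi_tr, y_tr, Phi_val, y_val, lambdas):
    errors = []
    for lam in lambdas:
        M1 = Phi_tr.shape[1]
        w = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(M1), Phi_tr.T @ y_tr)
        errors.append(np.linalg.norm(Phi_val @ w - y_val))  # validation error norm
    return lambdas[int(np.argmin(errors))]

lambdas = 10.0 ** np.arange(-8, 3)   # grid over log lambda, as in the plot
```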
40 Validation-set fits for sigma = 0.334 and sigma = 0.1 (figure: sample points, ideal fit and validation-set fit for each scale, with the corresponding Gaussian basis functions shown below)
41 Application: regressing face pose
Estimate two face pose angles: yaw (around the Y axis) and pitch (around the X axis). Compute a HOG feature vector for each face region, then learn a regressor from the HOG vector to the two pose angles.
42 Summary and dual problem
So far we have considered the primal problem, where
$$f(x, w) = \sum_{i=1}^M w_i \phi_i(x) = w^\top \Phi(x)$$
and we wanted a solution for $w \in \mathbb{R}^M$. As in the case of SVMs, we can also consider the dual problem, where
$$w = \sum_{i=1}^N a_i \Phi(x_i) \quad \text{and} \quad f(x, a) = \sum_i a_i \, \Phi(x_i)^\top \Phi(x),$$
and obtain a solution for $a \in \mathbb{R}^N$. Again there is a closed-form solution for $a$; the solution involves the $N \times N$ Gram matrix $k(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j)$, so we can use the kernel trick again to replace the scalar products.
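As a hedged sketch of the closed form alluded to (the standard kernel ridge regression result, not derived on this slide): substituting $w = \sum_i a_i \Phi(x_i)$ into the ridge cost gives $a = (K + \lambda I)^{-1} y$ with $K_{ij} = k(x_i, x_j)$, and prediction uses only kernel evaluations against the training inputs.

```python
import numpy as np

def kernel_ridge_fit(K, y, lam):
    # a = (K + lambda I)^{-1} y, with K the N x N Gram matrix
    N = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(N), y)

def kernel_ridge_predict(k_query, a):
    # f(x) = sum_i a_i k(x_i, x); k_query holds k(x_i, x) for each training x_i
    return k_query @ a
```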
43 Background reading and more
Bishop, chapters 6 & 7, for kernels and SVMs; Hastie et al., chapter 12; Bishop, chapter 3, for regression. More on the web page.