Introduction to Machine Learning
1 Introduction to Machine Learning
Felix Brockherde (1,2), Kristof Schütt (1)
(1) Technische Universität Berlin, (2) Max Planck Institute of Microstructure Physics
IPAM Tutorial 2013
2 What is Machine Learning?
Data with Pattern -> ML Algorithm -> ML Model -> Inferred Structure
ML is about learning structure from data.
3 Examples
- Drug discovery
- Face recognition
- BCI
- Recommender systems
- Search engines
- DNA splice site detection
- Speech recognition
4 This Talk
Part 1: Learning Theory and Supervised ML
- Basic Ideas of Learning Theory
- Support Vector Machines
- Kernels
- Kernel Ridge Regression
Part 2: Unsupervised ML and Application
- PCA
- Model Selection
- Feature Representation
Not covered: Probabilistic Models, Neural Networks, Online Learning, Reinforcement Learning, Semi-supervised Learning, etc.
5 Supervised Learning
Classification: $y_i \in \{-1, +1\}$; Regression: $y_i \in \mathbb{R}$.
Given: points $X = (x_1, \ldots, x_N)$ with $x_i \in \mathbb{R}^d$ and labels $Y = (y_1, \ldots, y_N)$, generated by some joint probability distribution.
Learn the underlying unknown mapping $f(x) = y$.
Important: performance on unseen data.
6 Basic Ideas in Learning Theory
Risk minimization (RM)
Learn a model function $f$ from examples $(x_1, y_1), \ldots, (x_N, y_N) \in \mathbb{R}^d \times \mathbb{R}$ (or $\times \{+1, -1\}$), generated from $P(x, y)$, such that the expected error on test data (drawn from $P(x, y)$),
$R[f] = \int \frac{1}{2}\,|f(x) - y|^2 \, dP(x, y),$
is minimal.
Problem: the distribution $P(x, y)$ is unknown.
Empirical Risk Minimization (ERM)
Replace the average over $P(x, y)$ by the average over the training samples (i.e. minimize the training error):
$R_{emp}[f] = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2}\,|f(x_i) - y_i|^2$
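To make ERM concrete, here is a minimal sketch (not from the original slides; NumPy assumed, data and candidate model invented for illustration) that evaluates the empirical risk of a classifier on a training sample:

import numpy as np

def empirical_risk(f, X, y):
    # R_emp[f] = (1/N) sum_i 1/2 |f(x_i) - y_i|^2; for labels in {-1, +1}
    # each misclassification contributes 1/2 * 2^2 = 2 to the sum.
    preds = np.array([f(x) for x in X])
    return np.mean(0.5 * (preds - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # toy inputs (invented)
y = np.sign(X[:, 0] + 0.5 * X[:, 1])             # toy labels in {-1, +1}
f = lambda x: np.sign(0.9 * x[0] + 0.6 * x[1])   # some candidate classifier
print(empirical_risk(f, X, y))                   # training error only; says nothing about R[f]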
7–8 Law of large numbers: $R_{emp}[f] \to R[f]$ as $N \to \infty$.
Question: Does $\min_f R_{emp}[f]$ give us $\min_f R[f]$ for sufficiently large $N$?
No: uniform convergence is needed.
Error bound for classification
With probability of at least $1 - \eta$:
$R[f] \le R_{emp}[f] + \sqrt{\frac{D\,(\log\frac{2N}{D} + 1) - \log(\frac{\eta}{4})}{N}}$
where $D$ is the VC dimension (Vapnik and Chervonenkis (1971)).
Introduce structure on the set of possible functions and use Structural Risk Minimization (SRM).
9 The linear function class (in $\mathbb{R}^2$) has VC dimension $D = 3$.
SRM: $\min_f R_{emp}[f] + \text{Complexity}[f]$
10–13 Support Vector Machines (SVM)
(Figure build-up slides.)
14 Support Vector Machines (SVM)
Hyperplane $\{x \mid w \cdot x + b = 0\}$ with margin hyperplanes $\{x \mid w \cdot x + b = +1\}$ and $\{x \mid w \cdot x + b = -1\}$.
Normalize $w$ so that $\min_i |w \cdot x_i + b| = 1$. Then for points on the two margins,
$w \cdot x_1 + b = +1$ and $w \cdot x_2 + b = -1$,
so $w \cdot (x_1 - x_2) = 2$ and $\frac{w}{\|w\|} \cdot (x_1 - x_2) = \frac{2}{\|w\|}$,
i.e. the margin width is $\frac{2}{\|w\|}$.
15 VC Dimension of Hyperplane Classifiers
Theorem (Cortes and Vapnik (1995)): Hyperplanes in canonical form have VC dimension
$D \le \min\{R^2 \|w\|^2 + 1,\; N + 1\}$
where $R$ is the radius of the smallest sphere containing the data.
SRM bound: $R[f] \le R_{emp}[f] + \sqrt{\frac{D\,(\log\frac{2N}{D} + 1) - \log(\frac{\eta}{4})}{N}}$
Maximal margin = minimal $\|w\|^2$ = good generalization, i.e. low risk:
$\min_{w,b} \|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$ for $i = 1, \ldots, N$
16–17 Slack variables
Introduce slack variables $\xi_i$ to allow margin violations:
$\min_{w,b,\xi} \|w\|^2 + C \sum_{i=1}^{N} \xi_i$
subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$
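As a practical illustration (a sketch, not part of the slides; scikit-learn and a made-up toy data set assumed), the trade-off controlled by $C$ can be seen directly in a soft-margin SVM:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Small C: wide margin, many slack violations tolerated.
# Large C: narrow margin, violations penalized heavily.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.n_support_)  # number of support vectors per class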
18–19 Non-linear hyperplanes
Map into a higher-dimensional feature space:
$\Phi: \mathbb{R}^2 \to \mathbb{R}^3, \quad (x_1, x_2) \mapsto (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$
20 Dual SVM
Primal:
$\min_{w,b,\xi} \|w\|^2 + C \sum_{i=1}^{N} \xi_i$
subject to $y_i (w \cdot \Phi(x_i) + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for $i = 1, \ldots, N$
Dual:
$\max_\alpha \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j (\Phi(x_i) \cdot \Phi(x_j))$
subject to $\sum_{i=1}^{N} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$ for $i = 1, \ldots, N$
Data points $x_i$ only appear in scalar products $(\Phi(x_i) \cdot \Phi(x_j))$.
21 The Kernel Trick
Replace scalar products with a kernel function (Müller et al. (2001)): $k(x, y) = \Phi(x) \cdot \Phi(y)$
- Compute the kernel matrix $K_{ij} = k(x_i, x_j)$, i.e. never use $\Phi$ directly.
- The underlying mapping $\Phi$ can be unknown.
- Kernels can be adapted to a specific task, e.g. using prior knowledge (kernels for graphs, trees, strings, ...).
Common kernels:
- Gaussian kernel: $k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$
- Linear kernel: $k(x, y) = x \cdot y$
- Polynomial kernel: $k(x, y) = (x \cdot y + c)^d$
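A minimal sketch (NumPy only; toy data invented) of computing the Gaussian kernel matrix $K_{ij} = k(x_i, x_j)$ directly, without ever forming $\Phi$:

import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2)) for rows of X and Y
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-sq / (2 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))
K = gaussian_kernel(X, X)
print(K.shape, np.allclose(K, K.T))  # (5, 5) True: symmetric, ones on the diagonal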
22 The Support Vectors in SVM
$\max_\alpha \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j (\Phi(x_i) \cdot \Phi(x_j))$
subject to $\sum_{i=1}^{N} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$ for $i = 1, \ldots, N$
KKT conditions:
$y_i [w \cdot \Phi(x_i) + b] > 1 \Rightarrow \alpha_i = 0$: $x_i$ is irrelevant
$y_i [w \cdot \Phi(x_i) + b] = 1 \Rightarrow x_i$ lies on/in the margin: $x_i$ is a Support Vector
The old model $f(x) = w \cdot \Phi(x) + b$ becomes, via $w = \sum_{i=1}^{N} \alpha_i y_i \Phi(x_i)$:
$f(x) = \sum_{i=1}^{N} \alpha_i y_i k(x_i, x) + b = \sum_{x_i \in SV} \alpha_i y_i k(x_i, x) + b$
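To see the support-vector expansion in practice, here is a hedged sketch (scikit-learn assumed; in sklearn's SVC, dual_coef_ holds the products $\alpha_i y_i$ for the support vectors) that reconstructs $f(x)$ by hand and checks it against the library's decision function:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))                     # toy data (invented)
y = np.sign(X[:, 0] ** 2 + X[:, 1] - 0.5)

clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# f(x) = sum over support vectors of alpha_i y_i k(x_i, x) + b
sv = X[clf.support_]                             # only support vectors enter the sum
K = rbf_kernel(X[:5], sv, gamma=0.5)
f_manual = K @ clf.dual_coef_.ravel() + clf.intercept_
print(np.allclose(f_manual, clf.decision_function(X[:5])))  # True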
23 Kernel Ridge Regression (KRR)
Ridge Regression:
$\min_w \sum_{i=1}^{N} |y_i - w \cdot x_i|^2 + \lambda \|w\|^2$
Setting the derivative to zero gives
$w = \left(\lambda I + \sum_{i=1}^{N} x_i x_i^\top\right)^{-1} \sum_{i=1}^{N} y_i x_i$
Linear model: $f(x) = w \cdot x$
24 Kernelizing Ridge Regression
Setting $X = (x_1, \ldots, x_N) \in \mathbb{R}^{d \times N}$ and $Y = (y_1, \ldots, y_N) \in \mathbb{R}^N$:
$w = (\lambda I + X X^\top)^{-1} X Y$
Apply the Woodbury matrix identity:
$w = X (X^\top X + \lambda I)^{-1} Y$
Introduce $\alpha$:
$\alpha = (K + \lambda I)^{-1} Y$ and $w = \sum_{i=1}^{N} \Phi(x_i) \alpha_i$
Kernel model: $f(x) = w \cdot \Phi(x) = \sum_{i=1}^{N} \alpha_i k(x_i, x)$
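The closed form above translates directly into code. A self-contained sketch (NumPy; the sine toy problem and all parameter values are invented for illustration):

import numpy as np

def gaussian_kernel(X, Y, sigma):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

sigma, lam = 1.0, 1e-2
K = gaussian_kernel(X, X, sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # alpha = (K + lambda I)^{-1} Y

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
y_pred = gaussian_kernel(X_test, X, sigma) @ alpha      # f(x) = sum_i alpha_i k(x_i, x)
print(y_pred)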
25 Unsupervised Learning
- Learn structure from unlabeled data
- Fit an assumed model / distribution to the data
Examples: clustering, blind source separation, outlier detection, dimensionality reduction
(Figure: illustration of the K-means algorithm on the re-scaled Old Faithful data set, showing the initial cluster centres, alternating E and M steps, and convergence; Bishop, Fig. 9.1.)
26 Principal Component Analysis (PCA)
Given a centered data matrix $X = (x_1, \ldots, x_N) \in \mathbb{R}^{N \times D}$:
- best linear approximation: $w_1 = \arg\min_{\|w\|=1} \|X - X w w^\top\|^2$
- direction of largest variance: $w_1 = \arg\max_{\|w\|=1} \|X w\|^2$
- matrix deflation for further components: $X_{k+1} = X_k - X_k w w^\top$
Pearson (1901)
27–28 Principal Component Analysis (PCA)
Given a centered data matrix $X \in \mathbb{R}^{N \times D}$:
- decompose the correlated data matrix into uncorrelated, orthogonal PCs
- diagonalize the covariance matrix $\Sigma = \frac{1}{N} X^\top X$: $\Sigma w_k = \sigma_k^2 w_k$
- order the principal components $w_k$ by variance $\sigma_k^2$
- project the data onto the first $n$ principal components
What about nonlinear correlations?
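A short sketch of the eigendecomposition view of PCA (NumPy; random toy data invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)                     # PCA assumes centered data

Sigma = X.T @ X / len(X)                   # covariance matrix, D x D
var, W = np.linalg.eigh(Sigma)             # eigh returns ascending eigenvalues
order = np.argsort(var)[::-1]              # order PCs by variance, largest first
var, W = var[order], W[:, order]

n = 2
Z = X @ W[:, :n]                           # project onto the first n PCs
print(Z.shape)                             # (200, 2)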
29–34 Kernel Principal Component Analysis (kPCA)
Transformation to feature space $X \to X_f$:
$\Sigma_f = \frac{1}{N} X_f^\top X_f, \quad K = X_f X_f^\top, \quad K_{ij} = k(x_i, x_j)$
$\Sigma_f w_k = \sigma_k^2 w_k \;\Leftrightarrow\; X_f^\top X_f w_k = N \sigma_k^2 w_k$
Ansatz $w_k = X_f^\top \alpha_k$:
$X_f^\top X_f X_f^\top \alpha_k = N \sigma_k^2 X_f^\top \alpha_k$
Multiply by $X_f$ from the left:
$K^2 \alpha_k = N \sigma_k^2 K \alpha_k$
Multiply by $K^{-1}$:
$K \alpha_k = N \sigma_k^2 \alpha_k$
35 Kernel Principal Component Analysis (kPCA)
Projection:
$x_f \cdot w_k = x_f \cdot X_f^\top \alpha_k = \sum_{i=1}^{N} \alpha_{k,i}\, k(x, x_i)$
Schölkopf et al. (1997)
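A sketch of the full kPCA pipeline (NumPy; toy data invented). Note one assumption beyond the slides: in practice $K$ must be centered in feature space, which is added here:

import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

K = gaussian_kernel(X, X)
N = len(X)
J = np.eye(N) - np.ones((N, N)) / N
Kc = J @ K @ J                                 # center K in feature space (not on the slides)

eig, A = np.linalg.eigh(Kc)                    # solves K alpha_k = N sigma_k^2 alpha_k
order = np.argsort(eig)[::-1]                  # largest variance first
eig, A = eig[order], A[:, order]

A2 = A[:, :2] / np.sqrt(eig[:2])               # scale alpha_k so that ||w_k|| = 1
Z = Kc @ A2                                    # projections sum_i alpha_{k,i} k(., x_i)
print(Z.shape)                                 # (100, 2)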
36 Model Selection
- Find the model that best fits the data distribution
- We can only estimate this distribution
- Consider the noise ratio / distribution and the data correlation
37 Hyperparameters
- adjust model complexity: regularization, kernel parameters, etc.
- have to be tuned using examples not used for training
- standard solution: exhaustive search over a parameter grid
Example (figure: train/test fit of $f(x) = \sin(x)$) with Gaussian-kernel KRR:
$f(x) = \sum_i \alpha_i \exp\left(-\frac{\|x - x_i\|^2}{\sigma^2}\right), \quad \alpha = (K + \tau I)^{-1} y$
38 Grid Search
(Figures: RMSE over the $(\sigma, \tau)$ grid and the resulting fits $f(x)$ for selected parameter values.)
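A bare-bones version of such a grid search for the KRR example above (NumPy; grid values, train/validation split, and toy data all invented for illustration; the kernel uses the slide's $\exp(-\|x - x_i\|^2 / \sigma^2)$ convention):

import numpy as np

def kern(X, Y, sigma):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma ** 2)            # matches the slide's kernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)
Xtr, ytr, Xva, yva = X[:40], y[:40], X[40:], y[40:]   # hold out validation data

best = (np.inf, None)
for sigma in [0.1, 0.3, 1.0, 3.0]:
    for tau in [1e-4, 1e-2, 1e0]:
        alpha = np.linalg.solve(kern(Xtr, Xtr, sigma) + tau * np.eye(40), ytr)
        pred = kern(Xva, Xtr, sigma) @ alpha
        rmse = np.sqrt(np.mean((pred - yva) ** 2))
        best = min(best, (rmse, (sigma, tau)))
print(best)  # lowest validation RMSE and the (sigma, tau) that achieved it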
39–40 k-fold cross-validation
Split the data:
- model selection: training/test folds, 4x, inner loop
- evaluation: training/test folds, 5x, outer loop
Don't even think about looking at the test set!
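A compact sketch of this nested scheme with scikit-learn (assumed available; parameter values are placeholders): the 4x inner loop does model selection, the 5x outer loop does the evaluation, so the outer test folds are never seen during tuning.

from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

param_grid = {"alpha": [1e-3, 1e-1, 1e1], "gamma": [1e-2, 1e-1, 1e0]}
inner = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=4)   # model selection
scores = cross_val_score(inner, X, y, cv=5)                         # evaluation
print(scores.mean())   # test folds were never touched by the inner loop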
41 From objects to vectors
How to represent complex objects for kernel methods?
- explicit map to a vector space $\phi: M \to \mathbb{R}^n$, then use a standard kernel (e.g., linear, polynomial, Gaussian) $k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ on the mapped features
- direct use of a kernel function $k: M \times M \to \mathbb{R}$
42 Feature Representation
Given a physical object (molecule, crystal, etc.) and a property of interest, what is a good ML representation?
- no loss of valuable information
- support generalization
- remove invariances
- decompose the problem
- incorporation of domain knowledge
Depends on the data set, the target function, and the learning method.
43 Feature Representation - Molecules
Coulomb matrix:
$C_{ij} = \begin{cases} 0.5\, Z_i^{2.4} & \text{if } i = j \\ \frac{Z_i Z_j}{\|r_i - r_j\|} & \text{if } i \ne j \end{cases}$
(Rupp et al., 2012; Montavon et al., 2012)
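A direct transcription of the definition into code (NumPy; the water-like geometry and unit conventions are invented for illustration):

import numpy as np

def coulomb_matrix(Z, R):
    # C_ij = 0.5 Z_i^2.4 on the diagonal, Z_i Z_j / ||r_i - r_j|| off it
    n = len(Z)
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i, j] = 0.5 * Z[i] ** 2.4
            else:
                C[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return C

Z = np.array([8, 1, 1])                          # nuclear charges: O, H, H
R = np.array([[0.0, 0.0, 0.0],                   # toy coordinates
              [0.96, 0.0, 0.0],
              [-0.24, 0.93, 0.0]])
print(coulomb_matrix(Z, R))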
44 Feature Representation - Molecules
PCA of Coulomb matrices with atom permutations
Montavon et al. (2013)
45 Results - Molecules
46 Feature Representation - Crystals
Features $g_{\alpha\beta}(r)$ tabulated per element pair and radius:

element pair | r_1       | ... | r_n
α α          | g_αα(r_1) | ... | g_αα(r_n)
α β          | g_αβ(r_1) | ... | g_αβ(r_n)
β α          | g_βα(r_1) | ... | g_βα(r_n)
β β          | g_ββ(r_1) | ... | g_ββ(r_n)
47 Results - Crystals
Learning curve of predictions of the DOS at the Fermi energy.
K.T. Schütt, H. Glawe, F. Brockherde, A. Sanna, K.-R. Müller, E.K.U. Gross. How to represent crystal structures for machine learning: towards fast prediction of electronic properties. arXiv, 2013.
48–52 Machine Learning ...
... has been successfully applied to various research fields.
... is based on statistical learning theory.
... provides fast and accurate predictions on previously unseen data.
... is able to model non-linear relationships of high-dimensional data.
Feature representation is key!
53 Literature I
- Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3).
- Montavon, G., Hansen, K., Fazli, S., Rupp, M., Biegler, F., Ziehe, A., Tkatchenko, A., von Lilienfeld, O. A., and Müller, K.-R. (2012). Learning invariant representations of molecules for atomization energy prediction. In Advances in Neural Information Processing Systems.
- Montavon, G., Rupp, M., Gobre, V., Vazquez-Mayagoitia, A., Hansen, K., Tkatchenko, A., Müller, K.-R., and von Lilienfeld, O. A. (2013). Machine learning of molecular electronic properties in chemical compound space. arXiv preprint.
- Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., and Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2).
- Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11).
- Rupp, M., Tkatchenko, A., Müller, K.-R., and von Lilienfeld, O. A. (2012). Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters, 108(5).
- Schölkopf, B., Smola, A., and Müller, K.-R. (1997). Kernel principal component analysis. In Artificial Neural Networks, ICANN'97. Springer.
- Vapnik, V. N. and Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2).