Latent variable and deep modeling with Gaussian processes; application to system identification. Andreas Damianou


 Ronald Golden
 2 years ago
 Views:
Transcription
1 Latent variable and deep modeling with Gaussian processes; application to system identification Andreas Damianou Department of Computer Science, University of Sheffield, UK Brown University, 17 Feb. 2016
2 Outline Part 1: Introduction Recap from yesterday Part 2: Incorporating latent variables Unsupervised GP learning Structure in the latent space  representation learning Bayesian Automatic Relevance Determination Deep GP learning Multiview GP learning Part 3: Dynamical systems Policy learning Autoregressive Dynamics Going deeper: Deep Recurrent Gaussian Process Regressive dynamics with deep GPs
3 Outline Part 1: Introduction Recap from yesterday Part 2: Incorporating latent variables Unsupervised GP learning Structure in the latent space  representation learning Bayesian Automatic Relevance Determination Deep GP learning Multiview GP learning Part 3: Dynamical systems Policy learning Autoregressive Dynamics Going deeper: Deep Recurrent Gaussian Process Regressive dynamics with deep GPs
4 Part 1: GP Takehome messages Gaussian processes as infinite dimensional Gaussian distributions can be used as priors over functions Nonparametric: training data act as parameters Uncertainty Quantification Learning from scarce data
5 Notation and graphical models Graphical models x y f x f y White nodes: Observed variables Shaded nodes: Unobserved, or latent variables. Convention: Sometimes the latent function will be placed next to the arrow; sometimes I ll explicitly include the collection of instantiations f
6 Outline Part 1: Introduction Recap from yesterday Part 2: Incorporating latent variables Unsupervised GP learning Structure in the latent space  representation learning Bayesian Automatic Relevance Determination Deep GP learning Multiview GP learning Part 3: Dynamical systems Policy learning Autoregressive Dynamics Going deeper: Deep Recurrent Gaussian Process Regressive dynamics with deep GPs
7 Unsupervised learning: GPLVM x y f If X is unobserved, treat it as a parameter and optimize over it. GPLVM is interpreted as nonlinear, nonparametric probabilistic PCA (Lawrence, JMLR 2015). Objective (likelihood function) is similar to GPs: p(y x) = p(y f)p(f x)df = N (y 0, K ff + σ 2 I) but now x s are optimized too. Neil Lawrence, JMLR, 2005.
8 Unsupervised learning: GPLVM x y f If X is unobserved, treat it as a parameter and optimize over it. GPLVM is interpreted as nonlinear, nonparametric probabilistic PCA (Lawrence, JMLR 2015). Objective (likelihood function) is similar to GPs: p(y x) = p(y f)p(f x)df = N (y 0, K ff + σ 2 I) but now x s are optimized too. Neil Lawrence, JMLR, 2005.
9 Unsupervised learning: GPLVM x y f If X is unobserved, treat it as a parameter and optimize over it. GPLVM is interpreted as nonlinear, nonparametric probabilistic PCA (Lawrence, JMLR 2015). Objective (likelihood function) is similar to GPs: p(y x) = p(y f)p(f x)df = N (y 0, K ff + σ 2 I) but now x s are optimized too. Neil Lawrence, JMLR, 2005.
10 Fitting the GPLVM
11 Fitting the GPLVM 5.5 Figure credits: C. H. Ek
12 Fitting the GPLVM 5.5 Figure credits: C. H. Ek Additional difficulty: x s are also missing! Improvement: Invoke the Bayesian methodology to find x s too.
13 Bayesian GPLVM Additionally integrate out the latent space. p(y) = p(y f)p(f x)p(x)df dx where p(x) N (0, I) Titsias and Lawrence 2010, AISTATS; Damianou et al. 2015, JMLR
14 Automatic dimensionality detection In general, X = [x 1,..., x N ] with x (n) R Q. Automatic dimenionality detection by employing automatic relevance determination (ARD) priors for the mapping f. f GP(0, k f ) with: k f (x (i), x (j)) = σ 2 exp 1 Q ( w (q) x (i,q) x (j,q)) 2 2 Example: q=1
15 Outline Part 1: Introduction Recap from yesterday Part 2: Incorporating latent variables Unsupervised GP learning Structure in the latent space  representation learning Bayesian Automatic Relevance Determination Deep GP learning Multiview GP learning Part 3: Dynamical systems Policy learning Autoregressive Dynamics Going deeper: Deep Recurrent Gaussian Process Regressive dynamics with deep GPs
16 Sampling from a deep GP Input f Unobserved f Y Output
17 Model x = given f h = f h (x) = GP(0, k h (x, x)) h = f h (x) + ɛ h f y = f y (h) = GP(0, k f (h, h)) y = f y (h) + ɛ y So: y = f y (f h (x) + ɛ h ) + ɛ y Objective: p(y x) = p(y f y )p(f y h)p(h f h )p(f h x)df y df h dh Damianou and Lawrence, AISTATS 2013; Damianou, PhD Thesis, 2015
18 Deep GP: Step function (credits for idea to J. Hensman, R. Calandra, M. Deisenroth) (standard GP) (deep GP  1 hidden layer) (deep GP  3 hidden layers)
19 Deep GP with three hidden plus one warping layer Standard GP
20 Nonlinear feature learning Stacked GP (nonlinear) Stacked PCA (linear) Successive warping creates knots which act as features. Features discovered in one layer remain in the next ones (ie knots are not untied) f 1 (f 2 (f 3 (x))) With linear warpings (stacked PCA) we can t achieve this effect: W 1 (W 2 (W 3 x))) = W x Damianou, PhD Thesis, 2015
21 Deep GP: MNIST example (Live sampling video)
22 Outline Part 1: Introduction Recap from yesterday Part 2: Incorporating latent variables Unsupervised GP learning Structure in the latent space  representation learning Bayesian Automatic Relevance Determination Deep GP learning Multiview GP learning Part 3: Dynamical systems Policy learning Autoregressive Dynamics Going deeper: Deep Recurrent Gaussian Process Regressive dynamics with deep GPs
23 Multiview: Manifold Relevance Determination (MRD) Multiview data arise from multiple information sources. These sources naturally contain some overlapping, or shared signal (since they describe the same phenomenon ), but also have some private signal. MRD: Model such data using overlapping sets of latent variables Demo: Faces, mocapsilhouette Damianou et al., ICML 2012; Damianou, PhD Thesis, 2015
24 Multiview: Manifold Relevance Determination (MRD) Multiview data arise from multiple information sources. These sources naturally contain some overlapping, or shared signal (since they describe the same phenomenon ), but also have some private signal. MRD: Model such data using overlapping sets of latent variables Demo: Faces, mocapsilhouette Damianou et al., ICML 2012; Damianou, PhD Thesis, 2015
25 Multiview: Manifold Relevance Determination (MRD) Multiview data arise from multiple information sources. These sources naturally contain some overlapping, or shared signal (since they describe the same phenomenon ), but also have some private signal. MRD: Model such data using overlapping sets of latent variables Demo: Faces, mocapsilhouette Damianou et al., ICML 2012; Damianou, PhD Thesis, 2015
26 Deep GPs: Another multiview example X X
27 Automatic structure discovery summary: ARD and MRD Tools: ARD: Eliminate uncessary nodes/connections MRD: Conditional independencies Approximating evidence: Number of layers (?)
28 Automatic structure discovery summary: ARD and MRD Tools: ARD: Eliminate uncessary nodes/connections MRD: Conditional independencies Approximating evidence: Number of layers (?)
29 More deepgp representation learning examples icub interaction Faces  autoencoder
30 Outline Part 1: Introduction Recap from yesterday Part 2: Incorporating latent variables Unsupervised GP learning Structure in the latent space  representation learning Bayesian Automatic Relevance Determination Deep GP learning Multiview GP learning Part 3: Dynamical systems Policy learning Autoregressive Dynamics Going deeper: Deep Recurrent Gaussian Process Regressive dynamics with deep GPs
31 Dynamical systems examples Through a model, we wish to learn to: perform free simulation by learning patterns coming from the latent generation process (a mechanistic system we do not know) perform inter/extrapolation in timeseries data which are very highdimensional (e.g. video) detect outliers in data coming from a dynamical system optimize policies for control based on a model of the data.
32 Policy learning Modelbased policy learning (Cartpole) (Unicycle) Work by: Marc Deisenroth, Andrew McHutchon, Carl Rasmussen
33 NARX model A standard NARX model considers an input vector x i R D comprised of L y past observed outputs y i R and L u past exogenous inputs u i R: x i = [y i 1,, y i Ly, u i 1,, u i Lu ], y i = f(x i ) + ɛ (y) i, ɛ (y) i Latent autoregressive GP model: N (ɛ (y) i 0, σy), 2 x i = f(x i 1,, x i Lx u i 1,, u i Lu ) + ɛ (x) i, y i = x i + ɛ (y) i, Contribution 1: Simultaneous autoregressive and representation learning. Contribution 2: Latents avoid the feedback of possibly corrupted observations into the dynamics. Mattos, Damianou, Barreto, Lawrence, 2016
34 NARX model A standard NARX model considers an input vector x i R D comprised of L y past observed outputs y i R and L u past exogenous inputs u i R: x i = [y i 1,, y i Ly, u i 1,, u i Lu ], y i = f(x i ) + ɛ (y) i, ɛ (y) i Latent autoregressive GP model: N (ɛ (y) i 0, σy), 2 x i = f(x i 1,, x i Lx u i 1,, u i Lu ) + ɛ (x) i, y i = x i + ɛ (y) i, Contribution 1: Simultaneous autoregressive and representation learning. Contribution 2: Latents avoid the feedback of possibly corrupted observations into the dynamics. Mattos, Damianou, Barreto, Lawrence, 2016
35 Robustness to outliers Latent autoregressive GP model: x i = f(x i 1,, x i Lx u i 1,, u i Lu ) + ɛ (x) i, y i = x i + ɛ (y) i, ɛ (x) i ɛ (y) i N (ɛ (x) i 0, σx), 2 N (ɛ (y) 0, τ 1 i i ), τ i Γ(τ i α, β), Contribution 3: Switchingoff outliers by including the above Studentt likelihood for the noise. Mattos, Damianou, Barreto, Lawrence, DYCOPS 2016
36 Robust GP autoregressive model: demonstration (a) Artificial 1. (b) Artificial 2. Figure: RMSE values for free simulation on test data with different levels of contamination by outliers.
37 (c) Artificial 3. (d) Artificial 4. (e) Artificial 5.
38 Going deeper: Deep Recurrent Gaussian Process u x (1) x (2) x (H ) y Figure 1: RGP graphical model with H hidden layers. x is the lagged latent function values augmented with the lagged exogenous inputs. Mattos, Dai, Damianou, Barreto, Lawrence, ICLR 2016
39 Inference is tricky... log p(y) N L 2 Tr H+1 h=1 ( ( K (H+1) z log 2πσ 2 h 1 ) 1 Ψ (H+1) (σH+1 2 y Ψ (H+1) )2 1 { ( H + 1 N 2σ 2 h=1 h i=l K (h) z 1 2 K(h) + 1 ( µ (h)) (h) Ψ 2(σh 2)2 1 N ) i=l+1 x (h) i ( q 2σ 2 H+1 ) ) ( y y + Ψ (H+1) 0 K z (H+1) ( K z (H+1) + 1 Ψ (H+1) σh ( λ (h) i + µ (h)) µ (h) + Ψ (h) 0 Tr z + 1 Ψ (h) σh 2 2 ( K z (h) + 1 ) 1 ( Ψ (h) σh 2 2 ( ) L log q + x (h) i x (h) i i=1 x (h) i 1 2 K(H+1) z + 1 σh+1 2 ) 1 ( ) y Ψ (h) 1 ( q x (h) i Ψ (H+1) 1 ( ( K (h) z ) µ (h) ) ( log p Ψ (H+1) 2 ) 1 Ψ (h) 2 x (h) i ) }. ) )
40 Results Results in nonlinear systems identification: 1. artificial dataset 2. drive dataset: by a system with two electric motors that drive a pulley using a flexible belt. input: the sum of voltages applied to the motors output: speed of the belt.
41 RGP GPNARX MLPNARX LSTM
42 RGP GPNARX MLPNARX LSTM
43 Avatar control Figure: The generated motion with a step function signal, starting with walking (blue), switching to running (red) and switching back to walking (blue) Videos: Switching between learned speeds Interpolating (un)seen speed Constant unseen speed
44 Regressive dynamics with deep GPs t x h f Instead of coupling f s by encoding the Markov property, we couple them by coupling the f s inputs through another GP with time as input. y = f(x) + ɛ x GP(0, k x (t, t)) f GP(0, k f (x, x)) y Damianou et al., NIPS 2011; Damianou, Titsias and Lawrence, JMLR, 2015
45 Dynamics Dynamics are encoded in the covariance matrix K = k(t, t). We can consider special forms for K. Model individual sequences Model periodic data (missa) (dog) (mocap)
46 Summary Datadriven, modelbased approach to real and control problems Gaussian processes: Uncertainty quantification / propagation gives an advantage Deep Gaussian processes: Representation learning + dynamics learning Future work: Deep Gaussian processes + mechanistic information; consider real applications. Thanks!
47 Thanks Thanks to: Neil Lawrence, Michalis Titsias, Carl Henrik Ek, James Hensman, Cesar Lincoln Mattos, Zhenwen Dai, Javier Gonzalez, Tony Prescott, Uriel MartinezHernandez, Luke Boorman Thanks to George Karniadakis, Paris Perdikaris for hosting me
48 BACKUP SLIDES 1: Deep GP Optimization  variational inference
49 MAP optimisation? x f 1 h 1 f 2 h 2 f 3 y Joint Distr. = p(y h 2 )p(h 2 h 1 )p(h 1 x) MAP optimization is extremely problematic because: Dimensionality of hs has to be decided a priori Prone to overfitting, if h are treated as parameters Deep structures are not supported by the model s objective but have to be forced [Lawrence & Moore 07] We want: To use the marginal likelihood as the objective: marg. lik. = h 2,h 1 p(y h 2 )p(h 2 h 1 )p(h 1 x) Further regularization tools.
50 Marginal likelihood is intractable Let s try to marginalize out the top layer only: p(h 2 ) = p(h 2 h 1 )p(h 1 )dh 1 = p(h 2 f 2 )p(f 2 h 1 )p(h 1 )df 2 h 1 = p(h 2 f 2 )[ ] p(f 2 h 1 )p(h 1 )dh 1 df 2 } {{ } Intractable! Intractability: h 1 appears nonlinearly in p(f 2 h 1 ), inside K 1 (and also the determinant term), where K = k(h 1, h 1 ).
51 Solution: Variational Inference Similar issues arise for 1layer models. Solution was given by Titsias and Lawrence, A small modification to that solution does the trick in deep GPs too. Extend Titsias method for variational learning of inducing variables in Sparse GPs. Analytic variational bound F p(y x) Approximately marginalise out h Hence obtain the approximate posterior q(h)
52 Inducing points: sparseness, tractability and Big Data h (1) f (1) h (2) f (2) h (30) f (30) h (31) f (31) h (N) f (N)
53 Inducing points: sparseness, tractability and Big Data h (1) f (1) h (2) f (2) h (30) f (30) h (i) u (i) h (31) f (31) h (N) f (N)
54 Inducing points: sparseness, tractability and Big Data h (1) f (1) h (2) f (2) h (30) f (30) h (i) u (i) h (31) f (31) h (N) f (N) Inducing points originally introduced for faster (sparse) GPs But this also induces tractability in our models, due to the conditional independencies assumed Viewing them as global variables extension to Big Data [Hensman et al., UAI 2013]
55 Bayesian regularization F = Data Fit KL (q(h 1 ) p(h 1 )) + } {{ } Regularisation L H (q(h l )) } {{ } Regularisation l=2
arxiv:1410.4984v1 [cs.dc] 18 Oct 2014
Gaussian Process Models with Parallelization and GPU acceleration arxiv:1410.4984v1 [cs.dc] 18 Oct 2014 Zhenwen Dai Andreas Damianou James Hensman Neil Lawrence Department of Computer Science University
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationBayesian Time Series Learning with Gaussian Processes
Bayesian Time Series Learning with Gaussian Processes Roger FrigolaAlcalde Department of Engineering St Edmund s College University of Cambridge August 2015 This dissertation is submitted for the degree
More informationGaussian Processes for Big Data
Gaussian Processes for Big Data James Hensman Dept. Computer Science The University of Sheffield Sheffield, UK Nicolò Fusi Dept. Computer Science The University of Sheffield Sheffield, UK Neil D. Lawrence
More informationHybrid DiscriminativeGenerative Approach with Gaussian Processes
Hybrid DiscriminativeGenerative Approach with Gaussian Processes Ricardo AndradePacheco acq11ra@sheffield.ac.uk Max Zwießele m.zwiessele@sheffield.ac.uk James Hensman james.hensman@sheffield.ac.uk Neil
More informationLearning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
More informationNeural Networks. CAP5610 Machine Learning Instructor: GuoJun Qi
Neural Networks CAP5610 Machine Learning Instructor: GuoJun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector
More informationModelbased Synthesis. Tony O Hagan
Modelbased Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
More informationCheng Soon Ong & Christfried Webers. Canberra February June 2016
c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I
More informationGaussian Process Dynamical Models
Gaussian Process Dynamical Models Jack M. Wang, David J. Fleet, Aaron Hertzmann Department of Computer Science University of Toronto, Toronto, ON M5S 3G4 {jmwang,hertzman}@dgp.toronto.edu, fleet@cs.toronto.edu
More informationBayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationGaussian Process Training with Input Noise
Gaussian Process Training with Input Noise Andrew McHutchon Department of Engineering Cambridge University Cambridge, CB PZ ajm57@cam.ac.uk Carl Edward Rasmussen Department of Engineering Cambridge University
More informationLecture 10: Sequential Data Models
CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Timeseries: stock
More informationCell Phone based Activity Detection using Markov Logic Network
Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart
More informationLearning Shared Latent Structure for Image Synthesis and Robotic Imitation
University of Washington CSE TR 20050602 To appear in Advances in NIPS 18, 2006 Learning Shared Latent Structure for Image Synthesis and Robotic Imitation Aaron P. Shon Keith Grochow Aaron Hertzmann
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationSolving Challenging Nonlinear Regression Problems by Manipulating a Gaussian Distribution
Solving Challenging Nonlinear Regression Problems by Manipulating a Gaussian Distribution Sheffield Gaussian Process Summer School, 214 Carl Edward Rasmussen Department of Engineering, University of Cambridge
More informationManifold Learning with Variational Autoencoder for Medical Image Analysis
Manifold Learning with Variational Autoencoder for Medical Image Analysis Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill eunbyung@cs.unc.edu Abstract Manifold
More informationNeural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation
Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdelrahman Mohamed A brief history of backpropagation
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationAcknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
More informationGaussian Processes for Big Data Problems
Gaussian Processes for Big Data Problems Marc Deisenroth Department of Computing Imperial College London http://wp.doc.ic.ac.uk/sml/marcdeisenroth Machine Learning Summer School, Chalmers University 14
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationSwitching Gaussian Process Dynamic Models for Simultaneous Composite Motion Tracking and Recognition
Switching Gaussian Process Dynamic Models for Simultaneous Composite Motion Tracking and Recognition Jixu Chen Minyoung Kim Yu Wang Qiang Ji Department of Electrical,Computer and System Engineering Rensselaer
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationSampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data
Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data (Oxford) in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua) Balaji Lakshminarayanan (Gatsby) Bayesian
More informationDeterministic Samplingbased Switching Kalman Filtering for Vehicle Tracking
Proceedings of the IEEE ITSC 2006 2006 IEEE Intelligent Transportation Systems Conference Toronto, Canada, September 1720, 2006 WA4.1 Deterministic Samplingbased Switching Kalman Filtering for Vehicle
More informationGaussian Process Latent Variable Models for Visualisation of High Dimensional Data
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield,
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationFeature Extraction and Selection. More Info == Better Performance? Curse of Dimensionality. Feature Space. HighDimensional Spaces
More Info == Better Performance? Feature Extraction and Selection APR Course, Delft, The Netherlands Marco Loog Feature Space Curse of Dimensionality A pdimensional space, in which each dimension is a
More informationLecture 6: The Bayesian Approach
Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Loglinear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set
More informationBayesian Image SuperResolution
Bayesian Image SuperResolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian
More informationTracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking
Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local
More informationBayesian networks  Timeseries models  Apache Spark & Scala
Bayesian networks  Timeseries models  Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup  November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationEfficient online learning of a nonnegative sparse autoencoder
and Machine Learning. Bruges (Belgium), 2830 April 2010, dside publi., ISBN 293030102. Efficient online learning of a nonnegative sparse autoencoder Andre Lemme, R. Felix Reinhart and Jochen J. Steil
More informationMachine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
More informationMonte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)
Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationCoding and decoding with convolutional codes. The Viterbi Algor
Coding and decoding with convolutional codes. The Viterbi Algorithm. 8 Block codes: main ideas Principles st point of view: infinite length block code nd point of view: convolutions Some examples Repetition
More informationGaussian Processes to Speed up Hamiltonian Monte Carlo
Gaussian Processes to Speed up Hamiltonian Monte Carlo Matthieu Lê Murray, Iain http://videolectures.net/mlss09uk_murray_mcmc/ Rasmussen, Carl Edward. "Gaussian processes to speed up hybrid Monte Carlo
More informationUnsupervised Learning
Unsupervised Learning Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK zoubin@gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~zoubin September 16, 2004 Abstract We give
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulationbased method for estimating the parameters of economic models. Its
More informationPrinciples of Data Mining by Hand&Mannila&Smyth
Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences
More informationFeature Engineering in Machine Learning
Research Fellow Faculty of Information Technology, Monash University, Melbourne VIC 3800, Australia August 21, 2015 Outline A Machine Learning Primer Machine Learning and Data Science BiasVariance Phenomenon
More informationInference on Phasetype Models via MCMC
Inference on Phasetype Models via MCMC with application to networks of repairable redundant systems Louis JM Aslett and Simon P Wilson Trinity College Dublin 28 th June 202 Toy Example : Redundant Repairable
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationIntroduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011
Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning
More informationMachine Learning and Statistics: What s the Connection?
Machine Learning and Statistics: What s the Connection? Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh, UK August 2006 Outline The roots of machine learning
More informationLearning Deep Hierarchies of Representations
Learning Deep Hierarchies of Representations Yoshua Bengio, U. Montreal Google Research, Mountain View, California September 23rd, 2009 Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier
More informationData Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a
More informationClassification. Chapter 3
Chapter 3 Classification In chapter we have considered regression problems, where the targets are real valued. Another important class of problems is classification problems, where we wish to assign an
More informationMixtures of Robust Probabilistic Principal Component Analyzers
Mixtures of Robust Probabilistic Principal Component Analyzers Cédric Archambeau, Nicolas Delannay 2 and Michel Verleysen 2  University College London, Dept. of Computer Science Gower Street, London WCE
More informationExact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure
Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission
More informationDetection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationProbabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm
Probabilistic Graphical Models 10708 Homework 1: Due January 29, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 13. You must complete all four problems to
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.
Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationProbability Theory. Elementary rules of probability Sum rule. Product rule. p. 23
Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction
More informationClassification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
More informationMachine Learning Overview
Machine Learning Overview Sargur N. Srihari University at Buffalo, State University of New York USA 1 Outline 1. What is Machine Learning (ML)? 1. As a scientific Discipline 2. As an area of Computer Science/AI
More informationDynamical Binary Latent Variable Models for 3D Human Pose Tracking
Dynamical Binary Latent Variable Models for 3D Human Pose Tracking Graham W. Taylor New York University New York, USA gwtaylor@cs.nyu.edu Leonid Sigal Disney Research Pittsburgh, USA lsigal@disneyresearch.com
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationData Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
More informationLecture 11: Graphical Models for Inference
Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference  the Bayesian network and the Join tree. These two both represent the same joint probability
More informationIntroduction to Machine Learning
Introduction to Machine Learning Felix Brockherde 12 Kristof Schütt 1 1 Technische Universität Berlin 2 Max Planck Institute of Microstructure Physics IPAM Tutorial 2013 Felix Brockherde, Kristof Schütt
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationMachine Learning in FX Carry Basket Prediction
Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines
More informationMachine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950F, Spring 2012 Instructor: Erik Sudderth Graduate TAs: Dae Il Kim & Ben Swanson Head Undergraduate TA: William Allen Undergraduate TAs: Soravit
More informationPseudoLabel : The Simple and Efficient SemiSupervised Learning Method for Deep Neural Networks
PseudoLabel : The Simple and Efficient SemiSupervised Learning Method for Deep Neural Networks DongHyun Lee sayit78@gmail.com Nangman Computing, 117D Garden five Tools, Munjeongdong Songpagu, Seoul,
More informationFlexible and efficient Gaussian process models for machine learning
Flexible and efficient Gaussian process models for machine learning Edward Lloyd Snelson M.A., M.Sci., Physics, University of Cambridge, UK (2001) Gatsby Computational Neuroscience Unit University College
More informationMonte Carlobased statistical methods (MASM11/FMS091)
Monte Carlobased statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 5, 2013 J. Olsson Monte Carlobased
More informationIntroduction to latent variable models
Introduction to latent variable models lecture 1 Francesco Bartolucci Department of Economics, Finance and Statistics University of Perugia, IT bart@stat.unipg.it Outline [2/24] Latent variables and their
More informationArtificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =
Artificial Intelligence 15381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM 10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationMachine Learning I Week 14: Sequence Learning Introduction
Machine Learning I Week 14: Sequence Learning Introduction Alex Graves Technische Universität München 29. January 2009 Literature Pattern Recognition and Machine Learning Chapter 13: Sequential Data Christopher
More informationParametric Statistical Modeling
Parametric Statistical Modeling ECE 275A Statistical Parameter Estimation Ken KreutzDelgado ECE Department, UC San Diego Ken KreutzDelgado (UC San Diego) ECE 275A SPE Version 1.1 Fall 2012 1 / 12 Why
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationDynamic Modeling, Predictive Control and Performance Monitoring
Biao Huang, Ramesh Kadali Dynamic Modeling, Predictive Control and Performance Monitoring A Datadriven Subspace Approach 4y Spri nnger g< Contents Notation XIX 1 Introduction 1 1.1 An Overview of This
More informationIntroduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones
Introduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation
More informationSimilarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle
Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More informationRealtime Body Tracking Using a Gaussian Process Latent Variable Model
Realtime Body Tracking Using a Gaussian Process Latent Variable Model Shaobo Hou Aphrodite Galata Fabrice Caillette Neil Thacker Paul Bromiley School of Computer Science The University of Manchester ISBE
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationLearning Deep Feature Hierarchies
Learning Deep Feature Hierarchies Yoshua Bengio, U. Montreal Stanford University, California September 21st, 2009 Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier Delalleau, Olivier Breuleux,
More informationTracking in flussi video 3D. Ing. Samuele Salti
Seminari XXIII ciclo Tracking in flussi video 3D Ing. Tutors: Prof. Tullio Salmon Cinotti Prof. Luigi Di Stefano The Tracking problem Detection Object model, Track initiation, Track termination, Tracking
More informationBayesian probability theory
Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations
More informationSemiSupervised Support Vector Machines and Application to Spam Filtering
SemiSupervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationarxiv: v2 [math.st] 29 Aug 2015
Gaussian Processes for Regression: A Quick Introduction M. Ebden, August 008 Comments to mebden@gmail.com arxiv:505.0965v [math.st] 9 Aug 05 MOTIVATION Figure illustrates a typical example of a prediction
More informationAnalytic Long Term Forecasting with Periodic Gaussian Processes
Master Thesis Computer Science December 2013 Analytic Long Term Forecasting with Periodic Gaussian Processes Author: Nooshin Haji Ghassemi School of Computing Blekinge Institute of Technology 37179 Karlskrona
More informationCSC321 Introduction to Neural Networks and Machine Learning. Lecture 21 Using Boltzmann machines to initialize backpropagation.
CSC321 Introduction to Neural Networks and Machine Learning Lecture 21 Using Boltzmann machines to initialize backpropagation Geoffrey Hinton Some problems with backpropagation The amount of information
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationA Hybrid Neural NetworkLatent Topic Model
Li Wan Leo Zhu Rob Fergus Dept. of Computer Science, Courant institute, New York University {wanli,zhu,fergus}@cs.nyu.edu Abstract This paper introduces a hybrid model that combines a neural network with
More informationBig Data, Machine Learning, Causal Models
Big Data, Machine Learning, Causal Models Sargur N. Srihari University at Buffalo, The State University of New York USA Int. Conf. on Signal and Image Processing, Bangalore January 2014 1 Plan of Discussion
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationMonocular 3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis
Monocular 3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis Kooksang Moon and Vladimir Pavlović Department of Computer Science, Rutgers University, Piscataway, NJ 8854 {ksmoon,
More information