Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables
1 Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables

Dominique Brunet, Hanne Kekkonen, Vitor Nunes, Iryna Sivak
Fields-MITACS Thematic Program on Inverse Problems and Imaging
Group on Bayesian/Statistical Inverse Problems
July 27, 2012
2 Menu

- Hanne: Problem Formulation and Motivation; Frequentist and Bayesian Origins; Relation to Imaging
- Iryna: The LASSO Methods
- Vitor: Newton Methods and Quasi-Newton Methods; The QUIC Trick; Thoughts on Future Work
- Dominique: Simulation and Results; Conclusions
3 Bayes formula brings a priori information and measured data together

Statistical Inversion. We consider the linear measurement model $m = Ax + \varepsilon$, where $x \in \mathbb{R}^p$ and $m \in \mathbb{R}^p$ are treated as random variables.

Bayes formula. Bayes formula gives us the posterior distribution $\pi(x \mid m)$:
$$\pi(x \mid m) \propto \pi(m \mid x)\,\pi(x)$$
We recover $x$ as a point estimate from $\pi(x \mid m)$.
4 Likelihood distribution describes the measurement

If we assume that the noise $\varepsilon$ is Gaussian, $\varepsilon_k \sim \mathcal{N}(0, \sigma^2)$ for $1 \le k \le p$, we can write the likelihood distribution, which measures data misfit, as
$$\pi(m \mid x) = \pi_\varepsilon(Ax - m) = \frac{1}{\sigma^p (2\pi)^{p/2}} \exp\left(-\frac{1}{2\sigma^2}\,\|Ax - m\|_2^2\right).$$
5 Prior distribution contains all other information about the unknown x

The prior distribution shouldn't depend on the measurement. The prior distribution should assign a clearly higher probability to those values of x that we expect to see than to unexpected ones. Designing a good prior distribution is one of the main difficulties in statistical inversion.
6 Gaussian prior distribution

Gaussian model. If we assume that $x$ is also Gaussian, $x \sim \mathcal{N}_p(\mu, \Sigma)$, we can write
$$\pi(x) = \frac{(\det(\Sigma^{-1}))^{1/2}}{(2\pi)^{p/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
7 Estimating the inverse covariance matrix $\Sigma^{-1}$

We consider the problem of finding a good estimator for the inverse covariance matrix $\Sigma^{-1}$ under the constraint that certain given pairs of variables are conditionally independent. The conditional independence constraints describe the sparsity pattern of $\Sigma^{-1}$: zeros indicate conditional independence between variables.
8 Why the inverse covariance matrix? Simple example

We can think of a line of objects connected by springs. If we move the leftmost object $y_1$, the rightmost object $y_5$ of the line will also move, so clearly $y_1$ and $y_5$ are not independent.
9 However, $y_1$ and $y_5$ are conditionally independent given all the other variables. This is because, given the information of how $y_2$ moves, knowing $y_5$ gives us no further information about $y_1$. In fact $y_i$ is conditionally dependent only on its neighbours.
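A minimal sketch of this point in Python (the entries of the precision matrix are made up): the precision matrix of the 5-object spring chain is tridiagonal, with zeros encoding the conditional independences, while its inverse, the covariance matrix, is fully dense.

```python
import numpy as np

# Tridiagonal precision matrix (inverse covariance) of a 5-object chain:
# each y_i interacts only with its immediate neighbours.
Theta = 2.0 * np.eye(5) - np.diag(np.ones(4), 1) - np.diag(np.ones(4), -1)
Sigma = np.linalg.inv(Theta)   # covariance matrix

print(Theta[0, 4])             # 0.0: y_1 and y_5 are conditionally independent
print(Sigma[0, 4])             # nonzero: y_1 and y_5 are still correlated
```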
10 What does this have to do with pictures?

Neighbourhood rule. We are interested in exploring the assumption that every pixel is conditionally dependent only on its neighbours.

[Spy plot of the resulting sparsity pattern; nz = 838]

In this case the inverse covariance matrix has non-zero elements only on nine diagonals.
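Where the nine diagonals come from, as a short sketch (grid size illustrative): under the 8-neighbour rule on an $n_x \times n_y$ grid, entry $(i, j)$ of $\Sigma^{-1}$ can be non-zero only for offsets $j - i \in \{0, \pm 1, \pm(n_x - 1), \pm n_x, \pm(n_x + 1)\}$.

```python
import numpy as np

# Support of the precision matrix under the 8-neighbour rule, as a 0/1 mask.
nx, ny = 10, 10
n = nx * ny
idx = np.arange(n).reshape(ny, nx)       # pixel (y, x) -> row/column index in Theta

mask = np.eye(n, dtype=bool)             # every pixel is related to itself
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        if dy == 0 and dx == 0:
            continue
        # pixels (y, x) and (y + dy, x + dx), restricted to valid grid positions
        src = idx[max(0, -dy):ny - max(0, dy), max(0, -dx):nx - max(0, dx)]
        dst = idx[max(0, dy):ny - max(0, -dy), max(0, dx):nx - max(0, -dx)]
        mask[src.ravel(), dst.ravel()] = True

diags = np.unique(np.subtract(*np.nonzero(mask)))
print(len(diags))   # 9 distinct diagonal offsets: 0, ±1, ±(nx-1), ±nx, ±(nx+1)
```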
11 Log-likelihood function

Likelihood function. We want to estimate the parameters $\Sigma^{-1}$ and $\mu$ of $\mathcal{N}_p(\mu, \Sigma)$, based on $N$ independent samples $x_i$. The likelihood function is
$$\prod_{i=1}^{N} \pi(x_i \mid \Sigma, \mu) = \frac{(\det \Sigma^{-1})^{N/2}}{(2\pi)^{Np/2}} \exp\left\{-\frac{1}{2}\sum_{j=1}^{N}(x_j - \mu)^T \Sigma^{-1} (x_j - \mu)\right\}$$

Log-likelihood function. The log-likelihood of the data is, up to a constant,
$$L(\Sigma^{-1}, \mu) = N \log \det \Sigma^{-1} - \sum_{j=1}^{N}(x_j - \mu)^T \Sigma^{-1} (x_j - \mu)$$
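A small sketch evaluating this log-likelihood (function name ours):

```python
import numpy as np

def log_likelihood(Theta, mu, X):
    """Log-likelihood up to a constant; X has shape (N, p), Theta = Sigma^{-1}."""
    N = X.shape[0]
    sign, logdet = np.linalg.slogdet(Theta)       # numerically stable log(det Theta)
    assert sign > 0, "Theta must be positive definite"
    D = X - mu                                    # centred samples
    quad = np.einsum('ij,jk,ik->', D, Theta, D)   # sum_j (x_j - mu)^T Theta (x_j - mu)
    return N * logdet - quad
```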
12 How to obtain a sparse $\Sigma^{-1}$

Maximum likelihood estimate. Maximizing the log-likelihood with respect to $\Sigma^{-1}$ leads to the maximum likelihood estimate $\hat\Sigma^{-1} = S^{-1}$, which usually isn't sparse (here $S$ is the sample covariance obtained from the data). Also, when $p > N$, $S$ will be singular and so the maximum likelihood estimate cannot be computed.

Penalised log-likelihood function. To find a sparse $\Sigma^{-1}$ we use the penalised log-likelihood function
$$\operatorname*{argmin}_{\Sigma^{-1} \succ 0} \left\{ -\log \det \Sigma^{-1} + \operatorname{tr}(S\Sigma^{-1}) + \|P \circ \Sigma^{-1}\|_1 \right\}$$
where $S$ is the sample covariance obtained from the data and $P$ is our sparsity constraint.
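For a uniform penalty ($P$ replaced by a single scalar), scikit-learn's GraphicalLasso solves this problem directly; a hedged sketch with illustrative parameter values:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))        # N = 100 samples, p = 20 variables

model = GraphicalLasso(alpha=0.2).fit(X)  # alpha plays the role of the penalty
Theta_hat = model.precision_              # sparse estimate of Sigma^{-1}
print(np.sum(np.abs(Theta_hat) > 1e-8))   # number of non-zero entries
```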
13 In a nutshell

- Bayes: $\pi(x \mid m) \propto \pi(m \mid x)\,\pi(x)$; expect that $x \sim \mathcal{N}_p(\mu, \Sigma)$.
- What is $\Sigma^{-1}$? Minimise the penalised log-likelihood function; use algorithms to find $\hat\Sigma^{-1}$.
- Back to the original inverse problem: solve it using $\hat\Sigma^{-1}$,
$$\operatorname*{argmin}_x \|Ax - m\|^2 + (x - \mu)^T \hat\Sigma^{-1} (x - \mu).$$
14 Graphical Lasso

$$\Sigma_1^{-1} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}, \qquad \Sigma_2^{-1} = \begin{pmatrix} b_{11} & 0 & b_{13} \\ 0 & b_{22} & b_{23} \\ b_{13} & b_{23} & b_{33} \end{pmatrix}$$
16-17 Graphical Lasso [figures]
18 GLASSO

$\Theta := \Sigma^{-1}$
$$\hat\Theta = \operatorname*{argmin}_{\Theta \succ 0}\; -\log \det(\Theta) + \operatorname{tr}(S\Theta) + \lambda \|\Theta\|_1$$
The optimality condition is
$$\hat\Theta^{-1} - S - \lambda\,\Gamma(\hat\Theta) = 0,$$
where $(\Gamma(\hat\Theta))_{ij} = \Gamma(\hat\Theta_{ij})$ is the subgradient of $|\hat\Theta_{ij}|$:
$\Gamma(\hat\Theta_{ij}) = \operatorname{sign}(\hat\Theta_{ij})$ if $\hat\Theta_{ij} \neq 0$, and $\Gamma(\hat\Theta_{ij}) \in [-1, 1]$ if $\hat\Theta_{ij} = 0$.
19 GLASSO

$\Theta := \Sigma^{-1}$, $W := \Theta^{-1}$. Partition
$$\Theta = \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix}, \quad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}, \quad W = \begin{pmatrix} W_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}$$

Start with $W = S + \lambda I$. (We also added thresholding.) Cycle around rows/columns till convergence:
- Solve the QP
$$\hat\beta = \operatorname*{argmin}_{\beta \in \mathbb{R}^{p-1}} \frac{1}{2}\,\beta^T W_{11} \beta - \beta^T s_{12} + \lambda \|\beta\|_1, \qquad W_{11} \succ 0,$$
using as warm start the solution from the previous round for this column.
- Update $\hat w_{12} = W_{11} \hat\beta$ and save $\hat\beta$ for this column in the matrix $B$.
- For every row/column compute $\hat\theta_{22} = (s_{22} + \lambda - \hat\beta^T \hat w_{12})^{-1}$, $\hat\theta_{12} = -\hat\beta\,\hat\theta_{22}$.

[J. Friedman, T. Hastie, R. Tibshirani, 2007]
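A minimal sketch of the inner QP solver, with the signs as reconstructed above: coordinate descent with soft-thresholding, warm-started from the previous sweep.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Coordinate descent for min_beta 1/2 beta^T W11 beta - beta^T s12 + lam ||beta||_1,
# warm-started at beta0 (the solution from the previous sweep over this column).
def glasso_inner(W11, s12, lam, beta0, n_iter=100, tol=1e-6):
    beta = beta0.copy()
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(len(beta)):
            # partial residual: s12[j] minus the contribution of all other coordinates
            r = s12[j] - W11[j] @ beta + W11[j, j] * beta[j]
            beta[j] = soft_threshold(r, lam) / W11[j, j]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```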
20 GLASSO

Theorem. The solution to the graphical lasso problem is block diagonal with blocks $C_1, C_2, \dots, C_K$ if and only if $|S_{ij}| \le \lambda$ for all $i \in C_k$, $j \in C_l$, $k \neq l$. Then
$$\Theta = \begin{pmatrix} \Theta_1 & & \\ & \ddots & \\ & & \Theta_K \end{pmatrix}$$
where $\Theta_k$ solves the graphical lasso problem applied to the submatrix of $S$ consisting of the entries whose indices are in $C_k$.

[D.M. Witten, J. Friedman, N. Simon, 2011]
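A sketch of the resulting screening step (function name ours): threshold $|S_{ij}|$ at $\lambda$, take connected components, and solve one small graphical lasso problem per block.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def glasso_blocks(S, lam):
    """Return the index sets C_1, ..., C_K from the screening rule above."""
    adj = csr_matrix(np.abs(S) > lam)                       # thresholded adjacency
    n_blocks, labels = connected_components(adj, directed=False)
    return [np.where(labels == k)[0] for k in range(n_blocks)]
```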
21 Problem Formulation

Cost Functional Definition. Our goal is to minimize the following functional over the set of positive definite matrices:
$$\min_{\Sigma^{-1} \succ 0} f(\Sigma^{-1}) = -\log(\det \Sigma^{-1}) + \operatorname{tr}(\Sigma^{-1} S) + \frac{\rho}{2}\,\|R(\Sigma^{-1})\|_2^2 \qquad (1)$$
Minimizing the above is equivalent to minimizing
$$\min_{\Theta \succ 0} f(\Theta) = -\log(\det \Theta) + \operatorname{tr}(\Theta S) + \frac{\rho}{2}\,\|R(\Theta)\|_2^2 \qquad (2)$$
22 Model Reduction - Closest Neighbour

Neighbourhood rule. Reduce the model by assuming that each pixel is only related to itself and to those pixels with which it shares an edge or a corner, as the following figure shows. This rule reduces the problem dimension from $n = (n_x n_y)^2$ to $m = 5 n_x n_y - 3 n_x - 3 n_y + 2$, where $n_x$ and $n_y$ are the numbers of pixels along the x-axis and y-axis respectively.
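A quick check of the dimension count: $m$ equals the number of pixels (the diagonal entries) plus the number of 8-neighbour pixel pairs.

```python
# m = nx*ny diagonal entries + horizontal + vertical + two diagonal pair counts
def reduced_dimension(nx, ny):
    pairs = ny * (nx - 1) + nx * (ny - 1) + 2 * (nx - 1) * (ny - 1)
    return nx * ny + pairs

# matches 5*nx*ny - 3*nx - 3*ny + 2, e.g. for a 7x7 grid: 205
assert reduced_dimension(7, 7) == 5 * 49 - 3 * 7 - 3 * 7 + 2
```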
23 Model Reduction - Problem Reformulation

Transformation. Once $g : \mathbb{R}^m \to \mathbb{R}^{n \times n}$ is defined, the problem becomes an unconstrained optimization problem on $\mathbb{R}^m$. The minimization problem (2) is therefore written as
$$\min_{v \in \mathbb{R}^m} J(v) = f(g(v)) = -\log(\det g(v)) + \operatorname{tr}(g(v) S) + \frac{\rho}{2}\,\|R(g(v))\|_2^2 \qquad (3)$$
24 Numerical Approach

Quasi-Newton's method. Expanding $J(v)$ as a quadratic we get
$$J(v + \Delta) \approx J(v) + \nabla J(v)^T \Delta + \frac{1}{2}\,\Delta^T H(v)\,\Delta \qquad (4)$$
Therefore the $\Delta$ that minimizes $J(v + \Delta)$ must be the solution of $\Delta = -H^{-1}(v)\,\nabla J(v)$. So given an initial guess $v_0$ one can iterate
$$v_k = v_{k-1} - \alpha_k H_k^{-1}(v_{k-1})\,\nabla J(v_{k-1}) \qquad (5)$$
25 Gradient Computation. The gradient of $J(v)$ is computed exactly by the chain rule:
$$\nabla J(v) = \nabla f(g(v))\, g'(v) \in \mathbb{R}^m \qquad (6)$$
where $\nabla f(\Theta)$ is given by
$$(\nabla f)_{i,j} = -(\Theta^{-1})_{j,i} + S_{j,i} + \rho \left[ R'(\Theta)\, \frac{\partial \Theta}{\partial \sigma_{i,j}} \right]_{i,j} \qquad (7)$$

Hessian Computation. Since it is computationally expensive to compute the Hessian, one should use an approximation method such as BFGS.
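A hedged sketch of the quasi-Newton approach using scipy's BFGS, with a full lower-triangle parametrization in place of the neighbourhood reduction and the regularizer dropped ($\rho = 0$); all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

p = 5
tril = np.tril_indices(p)

def g(v):                                        # v -> symmetric matrix Theta
    Theta = np.zeros((p, p))
    Theta[tril] = v
    return Theta + np.tril(Theta, -1).T

def J_and_grad(v, S):                            # J(v) and its exact gradient
    Theta = g(v)
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:                                # left the positive definite cone
        return np.inf, np.zeros_like(v)
    G = S - np.linalg.inv(Theta)                 # dJ/dTheta, i.e. (7) with rho = 0
    G = 2 * G - np.diag(np.diag(G))              # chain rule for the symmetric map g
    return -logdet + np.trace(Theta @ S), G[tril]

rng = np.random.default_rng(1)
X = rng.standard_normal((100, p))
S = X.T @ X / 100
res = minimize(J_and_grad, np.eye(p)[tril], args=(S,), jac=True, method='BFGS')
print(np.allclose(np.linalg.inv(g(res.x)), S, atol=1e-3))  # unpenalised MLE: Theta = S^{-1}
```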
26 Numerical Results, n = 100

[Plots: error analysis of $\|g(v)S - I\|_2$ and of $\|g(v) - \Theta\|_2$]
27 QUIC Algorithm, Cho-Jui Hsieh, University of Texas

Cost function:
$$\min_\Theta J(\Theta) = -\log(\det \Theta) + \operatorname{tr}(\Theta S) + \frac{\rho}{2}\,\|\Theta\|_1 \qquad (8)$$
Split into a differentiable function and a non-differentiable one:
$$g(\Theta) = -\log(\det \Theta) + \operatorname{tr}(\Theta S) \qquad (9)$$
$$h(\Theta) = \frac{\rho}{2}\,\|\Theta\|_1 \qquad (10)$$
28 QUIC Algorithm, Cho-Jui Hsieh, University of Texas

Second-order approximation of $g(\cdot)$:
$$G(\Delta) \approx g(\Theta + \Delta) = g(\Theta) + \langle \nabla g(\Theta), \Delta \rangle + \frac{1}{2}\,\langle \Delta, H \Delta \rangle \qquad (11)$$
New cost functional:
$$\min_\Delta F(\Delta) = G(\Delta) + h(\Theta + \Delta) \qquad (12)$$
Choice of coordinate update:
$$\Delta = \mu\,[e_j e_i^T + e_i e_j^T] \qquad (13)$$
$$\min_\mu F(\mu) \qquad (14)$$
29 Overview / Future Work

Overview:
- Model Reduction
- Numerical Approach
- Numerical Results
- QUIC Algorithm

Future Work:
- Improve Memory Efficiency
- Better Choice of $x_0$
- Different Rules of Reduction
- Incorporate the Model Reduction into QUIC
30 Simulation

[Spy plot of the inverse covariance test matrix; nz = 838]

Pipeline: inverse covariance test matrix $\Sigma^{-1}$ $\to$ multivariate normal sampling $x_1, x_2, \dots, x_N$ $\to$ sample covariance $S = XX^T / N$ $\to$ inverse covariance estimation $\hat\Sigma^{-1}$.
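A sketch of the sampling step (test matrix illustrative): draw $x \sim \mathcal{N}(0, \Sigma)$ directly from the precision matrix via its Cholesky factor, then form the sample covariance fed to the estimators.

```python
import numpy as np

def sample_from_precision(Theta, N, rng):
    """Columns x_j with covariance Theta^{-1}: if Theta = L L^T, take x = L^{-T} z."""
    L = np.linalg.cholesky(Theta)
    Z = rng.standard_normal((Theta.shape[0], N))
    return np.linalg.solve(L.T, Z)

rng = np.random.default_rng(2)
# tridiagonal, diagonally dominant test precision matrix (positive definite)
Theta = np.diag([2.0] * 50) + np.diag([-0.9] * 49, 1) + np.diag([-0.9] * 49, -1)
X = sample_from_precision(Theta, N=100, rng=rng)
S = X @ X.T / 100                     # sample covariance (zero-mean samples)
```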
31 Timing (s)

Sample data: $X \in \mathbb{R}^{d \times N}$ with $N = 100$. Covariance matrix $S \in \mathbb{R}^{d \times d}$.

Algorithm     d = 49       d = 100   d = 200   d = 400
S             ...          ...       ...       ...
GLASSO        0.01 ± ...   ...       ...       ... ± 0.22
QUIC          0.04 ± ...   ...       ...       ... ± 2.46
quasi-Newton  ...          ...       ...       ...
32 Error Analysis (N = 100)

[Plot: Frobenius-norm error of the estimated inverse covariance vs. dimension d, for quasi-Newton, QUIC/GLASSO, and the inverse of S]
33 Training Set extracted from
34 Results

[Images: original; noisy (σ = 5%); GMRF denoising; TV denoising]
35 Remarks

1. Computed the inverse covariance matrix on sub-blocks and aggregated blocks;
2. Assumed stationarity;
3. Assumed Gaussianity of the image pixels;
4. Performed very basic (poor) alignment of the training set;
5. Had a small (29 images) and unhealthy training set.

Nevertheless, it performs better than TV denoising!
36 Future Challenges

1. Computational challenge: computing the inverse covariance matrix on full images (e.g. 40,000 x 40,000). Could Iteratively Re-weighted Least Squares Minimization for Sparse Recovery help?
2. Parameter selection: warm start and generalized cross-validation;
3. Non-stationary model: use a bigger and better-aligned database;
4. Try the Gaussianity model in a transform domain, or higher-order models.
37 THANK YOU FOR YOUR TIME!!!