Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables

1 Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables Dominique Brunet, Hanne Kekkonen, Vitor Nunes, Iryna Sivak Fields-MITACS Thematic Program on Inverse Problems and Imaging Group on Bayesian/Statistical Inverse Problems July 27, 2012

2 Menu

Hanne: Problem Formulation and Motivation; Frequentist and Bayesian Origins; Relation to Imaging
Iryna: The LASSO Methods
Vitor: Newton Methods and Quasi-Newton Methods; The QUIC Trick; Thoughts on Future Work
Dominique: Simulation and Results; Conclusions

3 Bayes formula brings a priori information and measured data together

Statistical Inversion. We consider the linear measurement model $m = Ax + \varepsilon$, where $x \in \mathbb{R}^p$ and $m \in \mathbb{R}^p$ are treated as random variables.

Bayes formula gives us the posterior distribution $\pi(x \mid m)$:
$$\pi(x \mid m) \propto \pi(m \mid x)\,\pi(x).$$
We recover $x$ as a point estimate from $\pi(x \mid m)$.

4 Likelihood distribution describes the measurement

If we assume that the noise is Gaussian, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, we can write the likelihood distribution, which measures the data misfit, as
$$\pi(m \mid x) = \pi_\varepsilon(Ax - m) = \frac{1}{\sigma^p (2\pi)^{p/2}} \exp\!\Big( -\frac{1}{2\sigma^2} \|Ax - m\|_2^2 \Big).$$

5 Prior distribution contains all other information about the unknown x

The prior distribution shouldn't depend on the measurement. The prior distribution should assign a clearly higher probability to those values of x that we expect to see than to unexpected values of x. Designing a good prior distribution is one of the main difficulties in statistical inversion.

6 Gaussian prior distribution

Gaussian model. If we assume that $x$ is also Gaussian, $x \sim \mathcal{N}_p(\mu, \Sigma)$, we can write
$$\pi(x) = \frac{(\det \Sigma^{-1})^{1/2}}{(2\pi)^{p/2}} \exp\!\Big( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Big).$$
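
Putting the likelihood and the prior together shows where $\Sigma^{-1}$ enters the reconstruction; up to additive constants,
$$-\log \pi(x \mid m) = \frac{1}{2\sigma^2}\,\|Ax - m\|_2^2 + \frac{1}{2}\,(x - \mu)^T \Sigma^{-1} (x - \mu) + \text{const},$$
so the MAP estimate minimizes a Tikhonov-type functional whose penalty term is determined entirely by $\Sigma^{-1}$.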

7 Estimating the inverse covariance matrix $\Sigma^{-1}$

We consider the problem of finding a good estimator for the inverse covariance matrix $\Sigma^{-1}$, with the constraint that certain given pairs of variables are conditionally independent. The conditional independence constraints describe the sparsity pattern of the inverse covariance matrix $\Sigma^{-1}$: a zero entry indicates conditional independence between the corresponding pair of variables.

8 Why the inverse covariance matrix? A simple example

Think of a line of objects connected by springs. If we move the leftmost object $y_1$, the rightmost object $y_5$ of the line will also move, so clearly $y_1$ and $y_5$ are not independent.

9 However, $y_1$ and $y_5$ are conditionally independent given all the other variables. This is because, given the information of how $y_2$ moves, knowing $y_5$ gives us no further information about $y_1$. In fact, $y_i$ is conditionally dependent only on its neighbours.

10 What does this have to do with pictures?

Neighbourhood rule: we are interested in exploring the assumption that every pixel is conditionally dependent only on its neighbours. (Figure: spy plot of the resulting sparsity pattern, nz = 838.) In this case the inverse covariance matrix has non-zero elements only on nine diagonals.
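
The nine diagonals follow directly from the pixel ordering: for a row-major grid of width $n_x$, the neighbours of a pixel sit at index offsets $\{0, \pm 1, \pm(n_x - 1), \pm n_x, \pm(n_x + 1)\}$. A tiny illustrative check (the grid width is arbitrary):

```python
def precision_band_offsets(nx):
    """Diagonal offsets on which the inverse covariance may be non-zero when every
    pixel depends only on its 8 neighbours (row-major ordering, grid width nx)."""
    return sorted({dx + dy * nx for dx in (-1, 0, 1) for dy in (-1, 0, 1)})

print(precision_band_offsets(10))   # nine offsets -> nine non-zero diagonals
# [-11, -10, -9, -1, 0, 1, 9, 10, 11]
```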

11 Log-likelihood function

Likelihood function. We want to estimate the parameters $\Sigma^{-1}$ and $\mu$ of $\mathcal{N}_p(\mu, \Sigma)$, based on $N$ independent samples $x_i$. The likelihood function is
$$\prod_{i=1}^N \pi(x_i \mid \Sigma, \mu) = \frac{(\det \Sigma^{-1})^{N/2}}{(2\pi)^{Np/2}} \exp\!\Big\{ -\frac{1}{2} \sum_{j=1}^N (x_j - \mu)^T \Sigma^{-1} (x_j - \mu) \Big\}.$$

Log-likelihood function. The log-likelihood of the data is, up to a constant,
$$L(\Sigma^{-1}, \mu) = N \log \det \Sigma^{-1} - \sum_{j=1}^N (x_j - \mu)^T \Sigma^{-1} (x_j - \mu).$$

12 How to obtain a sparse $\Sigma^{-1}$

Maximum likelihood estimate. Maximizing the log-likelihood with respect to $\Sigma^{-1}$ leads to the maximum likelihood estimate $\hat{\Sigma}^{-1} = S^{-1}$, which usually isn't sparse (here $S$ is the sample covariance obtained from the data). Moreover, when $p > N$, $S$ is singular and the maximum likelihood estimate cannot be computed.

Penalised log-likelihood function. To find a sparse $\Sigma^{-1}$ we use the penalised log-likelihood function
$$\underset{\Sigma^{-1} \succ 0}{\operatorname{argmin}} \Big\{ -\log \det \Sigma^{-1} + \operatorname{tr}(S \Sigma^{-1}) + \| P \circ \Sigma^{-1} \|_1 \Big\},$$
where $S$ is the sample covariance obtained from the data and $P$ encodes our sparsity constraint.
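
A small NumPy sketch evaluating this penalised objective (the weight matrix P and the toy data are illustrative placeholders):

```python
import numpy as np

def penalised_neg_loglik(Theta, S, P):
    """-log det(Theta) + tr(S Theta) + || P * Theta ||_1 (entrywise product),
    the objective minimized over positive definite Theta = Sigma^{-1}."""
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return np.inf                      # outside the positive definite cone
    return -logdet + np.trace(S @ Theta) + np.abs(P * Theta).sum()

# toy example: 3 variables, penalise only the off-diagonal entries
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))           # 50 samples
S = np.cov(X, rowvar=False, bias=True)
P = 0.1 * (1.0 - np.eye(3))                # illustrative weights
print(penalised_neg_loglik(np.linalg.inv(S), S, P))
```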

13 In a nutshell

$\pi(x \mid m) \propto \pi(m \mid x)\,\pi(x)$; expect that $x \sim \mathcal{N}_p(\mu, \Sigma)$.

What is $\Sigma^{-1}$? Estimate it with the penalised log-likelihood function, using algorithms to find $\Sigma^{-1}$.

Back to the original inverse problem: solve the original inverse problem using the estimated $\Sigma^{-1}$,
$$\underset{x}{\operatorname{argmin}} \; \|Ax - m\|^2 + (x - \mu)^T \Sigma^{-1} (x - \mu).$$
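
Once $\hat{\Sigma}^{-1}$ is available, the quadratic functional above has a closed-form minimizer: setting its gradient to zero gives $(A^T A + \Sigma^{-1})\,\hat{x} = A^T m + \Sigma^{-1}\mu$, a single linear solve. A small NumPy sketch with an illustrative blurring operator $A$ and a placeholder $\Sigma^{-1}$:

```python
import numpy as np

def map_estimate(A, m, mu, Theta):
    """MAP estimate: minimizes ||Ax - m||^2 + (x - mu)^T Theta (x - mu),
    where Theta = Sigma^{-1} is the (estimated) inverse covariance."""
    lhs = A.T @ A + Theta
    rhs = A.T @ m + Theta @ mu
    return np.linalg.solve(lhs, rhs)

# toy 1-D deblurring example; A, the noise level and the prior are illustrative only
rng = np.random.default_rng(1)
p = 50
A = np.eye(p) + 0.5 * np.eye(p, k=1) + 0.5 * np.eye(p, k=-1)   # simple blur
x_true = np.sin(np.linspace(0, 3 * np.pi, p))
m = A @ x_true + 0.05 * rng.standard_normal(p)
Theta = np.eye(p)                                               # placeholder Sigma^{-1}
x_hat = map_estimate(A, m, np.zeros(p), Theta)
print(np.linalg.norm(x_hat - x_true))
```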

14 Graphical Lasso

$$\Sigma_1^{-1} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}, \qquad \Sigma_2^{-1} = \begin{pmatrix} b_{11} & 0 & b_{13} \\ 0 & b_{22} & b_{23} \\ b_{13} & b_{23} & b_{33} \end{pmatrix}$$

15 Graphical Lasso

16 Graphical Lasso

17 Graphical Lasso

18 GLASSO

$\Theta := \Sigma^{-1}$,
$$\hat{\Theta} = \underset{\Theta \succ 0}{\operatorname{argmin}} \; -\log \det(\Theta) + \operatorname{tr}(S\Theta) + \lambda \|\Theta\|_1.$$
Optimality condition:
$$\hat{\Theta}^{-1} - S - \lambda \Gamma(\hat{\Theta}) = 0,$$
where $(\Gamma(\hat{\Theta}))_{ij} = \Gamma(\hat{\Theta}_{ij})$ is the subgradient of $|\hat{\Theta}_{ij}|$:
$\Gamma(\hat{\Theta}_{ij}) = \operatorname{sign}(\hat{\Theta}_{ij})$ if $\hat{\Theta}_{ij} \neq 0$, and $\Gamma(\hat{\Theta}_{ij}) \in (-1, 1)$ if $\hat{\Theta}_{ij} = 0$.
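
A quick way to sanity-check a candidate $\hat{\Theta}$ against this optimality condition (tolerances and usage are illustrative):

```python
import numpy as np

def check_glasso_optimality(Theta_hat, S, lam, tol=1e-6):
    """Verify the subgradient condition Theta^{-1} - S - lam * Gamma(Theta) = 0:
    on non-zero entries the residual must equal lam * sign(Theta_ij),
    on zero entries it must lie in [-lam, lam]."""
    R = np.linalg.inv(Theta_hat) - S
    nz = np.abs(Theta_hat) > tol
    ok_nonzero = np.allclose(R[nz], lam * np.sign(Theta_hat[nz]), atol=1e-4)
    ok_zero = np.all(np.abs(R[~nz]) <= lam + 1e-4)
    return ok_nonzero and ok_zero
```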

19 GLASSO

$\Theta := \Sigma^{-1}$, $W := \Theta^{-1}$. Partition
$$\Theta = \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix}, \quad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}, \quad W = \begin{pmatrix} W_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}.$$
Start with $W = S + \lambda I$. We also added thresholding. Cycle around rows/columns till convergence:
solve the QP
$$\hat{\beta} = \underset{\beta \in \mathbb{R}^{p-1}}{\operatorname{argmin}} \; \tfrac{1}{2} \beta^T W_{11} \beta - \beta^T s_{12} + \lambda \|\beta\|_1, \qquad W_{11} \succ 0,$$
using as warm starts the solution from the previous round for this column; update $\hat{w}_{12} = W_{11} \hat{\beta}$ and save $\hat{\beta}$ for this column in the matrix $B$.
For every row/column compute $\hat{\theta}_{22} = (s_{22} + \lambda - \hat{\beta}^T \hat{w}_{12})^{-1}$, $\hat{\theta}_{12} = -\hat{\beta}\, \hat{\theta}_{22}$.
[J. Friedman, T. Hastie, R. Tibshirani, 2007]
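
This block coordinate descent is also available off the shelf; an illustrative usage sketch with scikit-learn's GraphicalLasso estimator (the data and the value of alpha are arbitrary):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso   # scikit-learn's graphical lasso estimator

rng = np.random.default_rng(0)
# sample from a known sparse precision matrix (tridiagonal, so only neighbours interact)
p, N = 20, 500
Theta_true = np.eye(p) + 0.4 * np.eye(p, k=1) + 0.4 * np.eye(p, k=-1)
Sigma_true = np.linalg.inv(Theta_true)
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=N)

model = GraphicalLasso(alpha=0.05).fit(X)       # alpha plays the role of lambda
Theta_hat = model.precision_
print(np.count_nonzero(np.abs(Theta_hat) > 1e-3))
```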

20 GLASSO

Theorem. The solution to the graphical lasso problem is block diagonal with blocks $C_1, C_2, \ldots, C_K$ if and only if $|S_{ij}| \le \lambda$ for all $i \in C_k$, $j \in C_l$, $k \neq l$. In that case
$$\Theta = \begin{pmatrix} \Theta_1 & & \\ & \ddots & \\ & & \Theta_K \end{pmatrix},$$
where $\Theta_k$ solves the graphical lasso problem applied to the submatrix of $S$ consisting of the entries whose indices are in $C_k$. [D. M. Witten, J. Friedman, N. Simon, 2011]
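
This screening rule is cheap to apply before running any solver; an illustrative sketch that finds the blocks as connected components of the thresholded $|S_{ij}| > \lambda$ graph:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def glasso_blocks(S, lam):
    """Index sets C_1, ..., C_K on which the graphical lasso problem decouples,
    found from the graph of entries with |S_ij| > lam."""
    adjacency = (np.abs(S) > lam).astype(int)
    np.fill_diagonal(adjacency, 0)
    n_blocks, labels = connected_components(csr_matrix(adjacency), directed=False)
    return [np.where(labels == k)[0] for k in range(n_blocks)]

# example: a covariance matrix with two weakly coupled groups of variables
S = np.array([[1.00, 0.60, 0.01, 0.02],
              [0.60, 1.00, 0.03, 0.01],
              [0.01, 0.03, 1.00, 0.70],
              [0.02, 0.01, 0.70, 1.00]])
print(glasso_blocks(S, lam=0.1))    # two blocks: {0, 1} and {2, 3}
```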

21 Problem Formulation

Cost Functional Definition. Our goal is to minimize the following functional over the set of positive definite matrices:
$$\min_{\Sigma^{-1} \succ 0} f(\Sigma^{-1}) = -\log(\det \Sigma^{-1}) + \operatorname{tr}(\Sigma^{-1} S) + \frac{\rho}{2} \|R(\Sigma^{-1})\|_2^2 \qquad (1)$$
Minimizing the previous functional is equivalent to minimizing
$$\min_{\Theta \succ 0} f(\Theta) = -\log(\det \Theta) + \operatorname{tr}(\Theta S) + \frac{\rho}{2} \|R(\Theta)\|_2^2 \qquad (2)$$

22 Model Reduction: Closest Neighbour

Neighbourhood rule: reduce the model by assuming that each pixel is related only to itself and to those pixels with which it shares a boundary or a corner.

This rule reduces the problem dimension from $n = (n_x n_y)^2$ to $m = 5 n_x n_y - 3 n_x - 3 n_y + 2$, where $n_x$ and $n_y$ are the numbers of pixels along the x-axis and y-axis respectively.
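
A quick numerical check of this dimension count (grid sizes chosen arbitrarily): counting the upper-triangular entries of the neighbourhood pattern reproduces the formula.

```python
def count_free_parameters(nx, ny):
    """Number of distinct entries (upper triangle incl. diagonal) allowed to be
    non-zero when each pixel interacts only with itself and its 8 neighbours."""
    count = 0
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    jx, jy = ix + dx, iy + dy
                    if 0 <= jx < nx and 0 <= jy < ny and jy * nx + jx >= i:
                        count += 1
    return count

for nx, ny in [(5, 5), (7, 10), (10, 10)]:
    formula = 5 * nx * ny - 3 * nx - 3 * ny + 2
    print(nx, ny, count_free_parameters(nx, ny), formula)   # the two counts agree
```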

23 Model Reduction: Problem Reformulation

Transformation. Once $g : \mathbb{R}^m \to \mathbb{R}^{n \times n}$ is defined, the problem becomes an unconstrained optimization problem on $\mathbb{R}^m$. Therefore the minimization problem (2) is written as
$$\min_{v \in \mathbb{R}^m} J(v) = f(g(v)) = -\log(\det g(v)) + \operatorname{tr}(g(v) S) + \frac{\rho}{2} \|R(g(v))\|_2^2 \qquad (3)$$

24 Numerical Approach: Quasi-Newton Method

Expanding $J(v)$ to a quadratic function we get
$$J(v + \Delta) \approx J(v) + \nabla J(v)^T \Delta + \tfrac{1}{2} \Delta^T H(v) \Delta \qquad (4)$$
Therefore the $\Delta$ that minimizes $J(v + \Delta)$ must be the solution of $\Delta = -H^{-1}(v) \nabla J(v)$. So, given an initial guess $v_0$, one can solve the iterative sequence
$$v_k = v_{k-1} - \alpha_k H_k^{-1}(v_{k-1}) \nabla J(v_{k-1}) \qquad (5)$$

25 Gradient Computation

The gradient of $J(v)$ is computed exactly by
$$\nabla J(v) = \nabla f(g(v))\, g'(v) \in \mathbb{R}^m \qquad (6)$$
where $\nabla f(\Theta)$ is given by
$$(\nabla f)_{i,j} = -\Theta^{-1}_{j,i} + S_{j,i} + \rho \Big[ R'(\Theta)\, \tfrac{\partial \Theta}{\partial \sigma_{i,j}} \Big]_{i,j} \qquad (7)$$

Hessian Computation. Since it is computationally expensive to compute the Hessian, one should use an approximation method such as BFGS.
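
A minimal sketch of the reduced-model quasi-Newton idea, under simplifying assumptions: a tridiagonal stand-in for the neighbourhood pattern, $\rho = 0$ (no R penalty), finite-difference gradients instead of formulas (6)-(7), and SciPy's L-BFGS-B in place of a hand-rolled BFGS. All sizes and names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 8
# Allowed sparsity pattern: a simple tridiagonal pattern stands in for the
# pixel-neighbourhood pattern; (rows, cols) index its upper-triangular entries.
mask = np.triu(np.eye(n, dtype=bool) | np.eye(n, k=1, dtype=bool))
rows, cols = np.where(mask)

def g(v):
    """Map the parameter vector v in R^m to a symmetric matrix with the given pattern."""
    Theta = np.zeros((n, n))
    Theta[rows, cols] = v
    Theta[cols, rows] = v
    return Theta

def J(v, S):
    """Objective (3) with rho = 0: -log det g(v) + tr(g(v) S)."""
    Theta = g(v)
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return 1e10            # crude barrier against leaving the positive definite cone
    return -logdet + np.trace(Theta @ S)

X = rng.standard_normal((200, n))
S = np.cov(X, rowvar=False, bias=True)
v0 = (rows == cols).astype(float)          # start from the identity matrix
res = minimize(J, v0, args=(S,), method='L-BFGS-B')
Theta_hat = g(res.x)
print(res.fun, np.linalg.eigvalsh(Theta_hat).min())
```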

26 Numerical Results, n = 100

Error analysis plots: $\|g(v)S - I\|_2$ and $\|g(v) - \Theta\|_2$.

27 QUIC Algorithm [Cho-Jui Hsieh, University of Texas]

Cost function:
$$\min_{\Theta} J(\Theta) = -\log(\det \Theta) + \operatorname{tr}(\Theta S) + \frac{\rho}{2} \|\Theta\|_1 \qquad (8)$$
Split into a differentiable function and a non-differentiable one:
$$g(\Theta) = -\log(\det \Theta) + \operatorname{tr}(\Theta S) \qquad (9)$$
$$h(\Theta) = \frac{\rho}{2} \|\Theta\|_1 \qquad (10)$$

28 QUIC Algorithm [Cho-Jui Hsieh, University of Texas]

Second-order approximation of $g(\cdot)$:
$$G(\Delta) = g(\Theta + \Delta) \approx g(\Theta) + \langle \nabla g(\Theta), \Delta \rangle + \tfrac{1}{2} \Delta^T H \Delta \qquad (11)$$
New cost functional:
$$\min_{\Delta} F(\Delta) = G(\Delta) + h(\Theta + \Delta) \qquad (12)$$
Choice of coordinate direction:
$$\Delta = \mu \big[ e_j e_i^T + e_i e_j^T \big] \qquad (13)$$
$$\min_{\mu} F(\mu) \qquad (14)$$
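
For reference, for $g(\Theta) = -\log(\det \Theta) + \operatorname{tr}(\Theta S)$ the gradient and Hessian entering this quadratic model are the standard expressions
$$\nabla g(\Theta) = S - \Theta^{-1}, \qquad H = \nabla^2 g(\Theta) = \Theta^{-1} \otimes \Theta^{-1},$$
so evaluating $G(\Delta)$ only requires the current $W = \Theta^{-1}$.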

29 Overview / Future Work

Overview: Model Reduction; Numerical Approach; Numerical Results; QUIC Algorithm.

Future Work: improve memory efficiency; better choice of $x_0$; different rules of reduction; incorporate the model reduction into QUIC.

30 Conclusions: Simulation

Simulation pipeline: start from an inverse covariance test matrix $\Sigma^{-1}$ (figure: spy plot of the test matrix, nz = 838); draw multivariate normal samples $x_1, x_2, \ldots, x_N$; form the sample covariance $S = X X^T / N$, where the columns of $X$ are the samples; run the inverse covariance estimation to obtain $\hat{\Sigma}^{-1}$.
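
A compact sketch of this simulation loop (the test precision matrix below is an arbitrary sparse stand-in, not the one used in the talk):

```python
import numpy as np

rng = np.random.default_rng(42)
d, N = 49, 100                                   # matches the smallest timing experiment

# sparse test precision matrix: tridiagonal, hence positive definite
Theta_true = np.eye(d) + 0.4 * np.eye(d, k=1) + 0.4 * np.eye(d, k=-1)
Sigma_true = np.linalg.inv(Theta_true)

# multivariate normal sampling, X in R^{d x N}
X = rng.multivariate_normal(np.zeros(d), Sigma_true, size=N).T

# sample covariance S = X X^T / N
S = X @ X.T / N

# any of the estimators (GLASSO, QUIC, quasi-Newton) is then run on S;
# as a baseline, compare the unregularised inverse of S with the truth
print(np.linalg.norm(np.linalg.inv(S) - Theta_true, ord='fro'))
```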

31 Timing (s)

Sample data: $X \in \mathbb{R}^{d \times N}$ with $N = 100$. Covariance matrix $S \in \mathbb{R}^{d \times d}$.

Algorithm       d = 49        d = 100     d = 200     d = 400
S               ...           ...         ...         ...
GLASSO          0.01 ± ...    ...         ...         ... ± 0.22
QUIC            0.04 ± ...    ...         ...         ... ± 2.46
quasi-Newton    ...           ...         ...         ...

32 Error Analysis (N = 100)

Plot: Frobenius norm of the estimation error versus $d$, for quasi-Newton, QUIC or GLASSO, and the inverse of $S$.

33 Training Set extracted from

34 Results

Panels: original, noisy ($\sigma$ = 5%), GMRF denoising, TV denoising.

35 Remarks

1. Computed the inverse covariance matrix on subblocks and aggregated blocks;
2. Assumed stationarity;
3. Assumed Gaussianity of the image pixels;
4. Performed very basic (poor) alignment of the training set;
5. Had a small (29) (and unhealthy) training set.

Nevertheless, it performs better than TV denoising!

36 Future Challenges

1. Computational challenge: computing the inverse covariance matrix on full images (e.g. 40,000 x 40,000). Could Iteratively Re-weighted Least Squares Minimization for Sparse Recovery help?
2. Parameter selection: warm start and generalized cross-validation;
3. Non-stationary model: use a bigger and better aligned database;
4. Try the Gaussianity model in a transform domain, or higher order models.

37 THANK YOU FOR YOUR TIME!!!
