Constrained Least Squares




Constrained Least Squares
Authors: G.H. Golub and C.F. Van Loan
Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp. 580-587

Background

The least squares problem:

    \min_x \|Ax - b\|_2

Sometimes we want x to be chosen from some proper subset S \subset \mathbb{R}^n. Example:

    S = \{ x \in \mathbb{R}^n : \|x\|_2 = 1 \}

Such problems can be solved using the QR factorization and the singular value decomposition (SVD).

Least Squares with a Quadratic Inequality Constraint (LSQI)

General problem:

    \min_x \|Ax - b\|_2   subject to   \|Bx - d\|_2 \le \alpha

where A \in \mathbb{R}^{m \times n} (m \ge n), b \in \mathbb{R}^m, B \in \mathbb{R}^{p \times n}, d \in \mathbb{R}^p, \alpha \ge 0.

Assume the generalized SVD of the matrices A and B is given as:

    U^T A X = D_A = \mathrm{diag}(\alpha_1, \dots, \alpha_n),   U^T U = I_m
    V^T B X = D_B = \mathrm{diag}(\beta_1, \dots, \beta_q),   V^T V = I_p,   q = \min\{p, n\}

Assume also the following definitions:

    \tilde{b} \equiv U^T b,   \tilde{d} \equiv V^T d,   y \equiv X^{-1} x

Then the problem becomes:

    \min_y \|D_A y - \tilde{b}\|_2   subject to   \|D_B y - \tilde{d}\|_2 \le \alpha

    \min_y \|D_A y - \tilde{b}\|_2   subject to   \|D_B y - \tilde{d}\|_2 \le \alpha

Correctness: inserting the definitions gives

    \|D_A y - \tilde{b}\|_2 = \|U^T A X X^{-1} x - U^T b\|_2 = \|U^T (Ax - b)\|_2 = \|Ax - b\|_2

since multiplication with an orthogonal matrix does not affect the 2-norm. (The same argument applies to the inequality constraint.)
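As a quick sanity check of this invariance, here is a minimal NumPy sketch on made-up data (the vector v and orthogonal matrix Q below are purely illustrative):

```python
import numpy as np

# Multiplying by an orthogonal matrix leaves the 2-norm unchanged,
# which is why the transformed (diagonal) problem is equivalent to the original.
rng = np.random.default_rng(0)
v = rng.standard_normal(7)
Q, _ = np.linalg.qr(rng.standard_normal((7, 7)))   # Q is orthogonal
print(np.linalg.norm(v), np.linalg.norm(Q.T @ v))  # the two values agree
```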

The objective function becomes:

    \sum_{i=1}^{n} (\alpha_i y_i - \tilde{b}_i)^2 + \sum_{i=n+1}^{m} \tilde{b}_i^2        (12.1.4)

The constraint becomes:

    \sum_{i=1}^{r} (\beta_i y_i - \tilde{d}_i)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2        (12.1.5)

where r = \mathrm{rank}(B), so that \beta_{r+1} = \beta_{r+2} = \dots = \beta_q = 0.

We have a solution if and only if:

    \sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2

Otherwise, there is obviously no way to satisfy the constraint.

Special Case: \sum_{i=r+1}^{p} \tilde{d}_i^2 = \alpha^2

The first sum in (12.1.5) must then equal zero, which means:

    y_i = \tilde{d}_i / \beta_i,   i \in [1, r]

The remaining variables can be chosen to minimize the first sum in (12.1.4):

    y_i = \tilde{b}_i / \alpha_i,   i \in [r+1, n]

(Of course, if \alpha_i = 0 for some i \in [r+1, n], this does not make sense; we then choose y_i = 0.)

The General Case: \sum_{i=r+1}^{p} \tilde{d}_i^2 < \alpha^2

The minimizer (without regard to the constraint) is given by:

    y_i = \tilde{b}_i / \alpha_i   if \alpha_i \ne 0
    y_i = \tilde{d}_i / \beta_i    if \alpha_i = 0

This may or may not be a feasible solution, depending on whether it lies in S.

The Method of Lagrange Multipliers

    h(\lambda, y) = \|D_A y - \tilde{b}\|_2^2 + \lambda \left( \|D_B y - \tilde{d}\|_2^2 - \alpha^2 \right)

Setting \partial h / \partial y_i = 0 for i = 1, \dots, n yields:

    (D_A^T D_A + \lambda D_B^T D_B)\, y = D_A^T \tilde{b} + \lambda D_B^T \tilde{d}

Solution using Lagrange multipliers:

    y_i(\lambda) = \frac{\alpha_i \tilde{b}_i + \lambda \beta_i \tilde{d}_i}{\alpha_i^2 + \lambda \beta_i^2},   i = 1, 2, \dots, q
    y_i(\lambda) = \frac{\tilde{b}_i}{\alpha_i},   i = q+1, \dots, n

Determining the Lagrange parameter \lambda

Define:

    \varphi(\lambda) \equiv \|D_B y(\lambda) - \tilde{d}\|_2^2 = \sum_{i=1}^{r} \left( \frac{\alpha_i (\beta_i \tilde{b}_i - \alpha_i \tilde{d}_i)}{\alpha_i^2 + \lambda \beta_i^2} \right)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2

Solve \varphi(\lambda) = \alpha^2. Because \varphi(0) > \alpha^2 (the unconstrained minimizer violates the constraint) and the function is monotone decreasing for \lambda > 0, there must be a unique positive solution \lambda^* with \varphi(\lambda^*) = \alpha^2.
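A minimal root-finding sketch for this step, assuming the generalized singular values alphas, betas and the transformed vectors bt (= U^T b) and dt (= V^T d) are already available from a GSVD routine; the helper name solve_secular, the bracket-expansion strategy, and the fixed bisection count are illustrative choices, not part of the original text:

```python
import numpy as np

def solve_secular(alphas, betas, bt, dt, r, alpha):
    """Find lambda* > 0 with phi(lambda*) = alpha**2 by bisection.

    alphas, betas : generalized singular values of (A, B) (assumed given)
    bt, dt        : U^T b and V^T d from the same GSVD
    r             : rank(B)
    alpha         : constraint radius
    """
    target = alpha**2
    tail = np.sum(dt[r:]**2)                       # constant term: sum_{i>r} dt_i^2

    def phi(lam):
        num = alphas[:r] * (betas[:r] * bt[:r] - alphas[:r] * dt[:r])
        den = alphas[:r]**2 + lam * betas[:r]**2
        return np.sum((num / den)**2) + tail

    lo, hi = 0.0, 1.0
    while phi(hi) > target:                        # phi is decreasing: expand until bracketed
        hi *= 2.0
    for _ in range(200):                           # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if phi(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because \varphi is monotone on (0, \infty), plain bisection suffices; Newton's method on the same equation would converge faster.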

Algorithm: Spherical Constraint

The special case B = I_n, d = 0, \alpha > 0 can be interpreted as selecting x from the interior (and surface) of an n-dimensional sphere of radius \alpha. It can be solved using the following algorithm:

    [U, \Sigma, V] \leftarrow \mathrm{SVD}(A)
    \tilde{b} \leftarrow U^T b
    r \leftarrow \mathrm{rank}(A)

Algorithm: Spherical Constraint (continued)

    if \sum_{i=1}^{r} (\tilde{b}_i / \sigma_i)^2 > \alpha^2:
        solve \sum_{i=1}^{r} \left( \frac{\sigma_i \tilde{b}_i}{\sigma_i^2 + \lambda} \right)^2 = \alpha^2 for \lambda
        x \leftarrow \sum_{i=1}^{r} \frac{\sigma_i \tilde{b}_i}{\sigma_i^2 + \lambda} v_i
    else:
        x \leftarrow \sum_{i=1}^{r} \frac{\tilde{b}_i}{\sigma_i} v_i
    end if

Computing the SVD is the most computationally intensive operation in the above algorithm.
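A self-contained NumPy sketch of these steps; the function name lsqi_sphere, the rank tolerance tol, and the bisection root-finder are illustrative choices, not prescribed by the original algorithm:

```python
import numpy as np

def lsqi_sphere(A, b, alpha, tol=1e-12):
    """Sketch of: min ||Ax - b||_2  subject to  ||x||_2 <= alpha (alpha > 0)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    bt = U.T @ b                                     # b-tilde = U^T b
    r = int(np.sum(s > tol * s[0]))                  # numerical rank of A

    if np.sum((bt[:r] / s[:r])**2) > alpha**2:
        # Unconstrained minimum-norm solution is infeasible: solve the
        # secular equation sum_i (sigma_i bt_i / (sigma_i^2 + lam))^2 = alpha^2.
        f = lambda lam: np.sum((s[:r] * bt[:r] / (s[:r]**2 + lam))**2) - alpha**2
        lo, hi = 0.0, 1.0
        while f(hi) > 0:
            hi *= 2.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        y = s[:r] * bt[:r] / (s[:r]**2 + lam)
    else:
        y = bt[:r] / s[:r]
    return Vt[:r].T @ y                              # x = sum_i y_i v_i
```

Calling x = lsqi_sphere(A, b, alpha) returns the unconstrained minimum-norm solution when it is already feasible, and otherwise a solution on the sphere \|x\|_2 = \alpha.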

Spherical Constraint as a Ridge Regression Problem

Using Lagrange multipliers to solve the spherical-constraint problem results in:

    (A^T A + \lambda I)\, x = A^T b,   where \lambda > 0 and \|x\|_2 = \alpha

This is the solution to the ridge regression problem:

    \min_x \|Ax - b\|_2^2 + \lambda \|x\|_2^2

We need some procedure for selecting a suitable \lambda.
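A small self-contained check of this correspondence on made-up data: for a fixed \lambda, the ridge normal equations give the same x as one standard way of solving the ridge problem, an ordinary LS solve of the stacked system [A; \sqrt{\lambda} I] x \approx [b; 0].

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.3

# Ridge normal equations: (A^T A + lam I) x = A^T b
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)

# Equivalent stacked least squares problem
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(5)])
b_aug = np.concatenate([b, np.zeros(5)])
x_stacked, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

print(np.allclose(x_normal, x_stacked))   # True
```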

Define the problem:

    x_k(\lambda) = \arg\min_x \|D_k (Ax - b)\|_2^2 + \lambda \|x\|_2^2

where D_k = I - e_k e_k^T is the matrix operator that removes (zeroes out) the kth row.

Select \lambda to minimize the cross-validation weighted square error:

    C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( a_k^T x_k(\lambda) - b_k \right)^2

This means choosing a \lambda that does not make the final model rely too much on any one observation.
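Evaluated directly from this definition, C(\lambda) costs one ridge solve per left-out row. A minimal sketch along those lines (the helper name loo_ridge_score and the uniform default weights are illustrative; deleting the kth row is equivalent to applying D_k, since a zeroed residual contributes nothing to the objective):

```python
import numpy as np

def loo_ridge_score(A, b, lam, w=None):
    """Direct evaluation of C(lambda): one ridge solve per left-out observation."""
    m, n = A.shape
    w = np.ones(m) if w is None else w
    total = 0.0
    for k in range(m):
        mask = np.arange(m) != k                     # drop the kth row
        Ak, bk = A[mask], b[mask]
        xk = np.linalg.solve(Ak.T @ Ak + lam * np.eye(n), Ak.T @ bk)
        total += w[k] * (A[k] @ xk - b[k])**2
    return total / m
```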

Through some calculation, we find that:

    C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{r_k}{\partial r_k / \partial b_k} \right)^2

where r_k is an element of the residual vector r = b - Ax(\lambda). The expression inside the parentheses can be interpreted as an inverse measure of the impact of the kth observation on the model.

Using the SVD, evaluating C(\lambda) reduces to:

    C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{ b_k - \sum_{j=1}^{r} u_{kj}\, \tilde{b}_j \frac{\sigma_j^2}{\sigma_j^2 + \lambda} }{ 1 - \sum_{j=1}^{r} u_{kj}^2 \frac{\sigma_j^2}{\sigma_j^2 + \lambda} } \right)^2

where \tilde{b} = U^T b as before.
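A vectorized NumPy sketch of this formula (one SVD up front, then cheap work per candidate \lambda); the function name cv_score_svd and the rank cutoff tol are illustrative:

```python
import numpy as np

def cv_score_svd(A, b, lam, w=None, tol=1e-12):
    """Evaluate C(lambda) via the SVD-based formula above."""
    m, n = A.shape
    w = np.ones(m) if w is None else w
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    bt = U.T @ b                                   # b-tilde = U^T b
    f = s[:r]**2 / (s[:r]**2 + lam)                # filter factors sigma_j^2/(sigma_j^2 + lam)
    num = b - U[:, :r] @ (f * bt[:r])              # b_k - sum_j u_kj bt_j f_j
    den = 1.0 - (U[:, :r]**2) @ f                  # 1 - sum_j u_kj^2 f_j
    return np.mean(w * (num / den)**2)
```

For small problems this can be checked against the direct leave-one-out computation sketched earlier.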

Equality Constrained Least Squares (LSE)

We consider a problem similar to LSQI, but with an equality constraint, i.e. an ordinary least squares problem

    \min_x \|Ax - b\|_2

with the constraint

    Bx = d

We assume the following dimensions:

    A \in \mathbb{R}^{m \times n},  B \in \mathbb{R}^{p \times n},  b \in \mathbb{R}^m,  d \in \mathbb{R}^p,  \mathrm{rank}(B) = p

We start by computing the QR factorization of B^T:

    B^T = Q \begin{bmatrix} R \\ 0 \end{bmatrix}

with Q \in \mathbb{R}^{n \times n}, R \in \mathbb{R}^{p \times p}, 0 \in \mathbb{R}^{(n-p) \times p}, and then add the following definitions:

    AQ = [A_1 \ A_2],   Q^T x = \begin{bmatrix} y \\ z \end{bmatrix}

This gives us:

    Bx = \left( Q \begin{bmatrix} R \\ 0 \end{bmatrix} \right)^T x = [R^T \ 0]\, Q^T x = [R^T \ 0] \begin{bmatrix} y \\ z \end{bmatrix} = R^T y

We also get (because QQ^T = I):

    Ax = (AQ)(Q^T x) = [A_1 \ A_2] \begin{bmatrix} y \\ z \end{bmatrix} = A_1 y + A_2 z

So the problem becomes:

    \min \|A_1 y + A_2 z - b\|_2   subject to   R^T y = d

where y is determined directly from the constraint and then inserted into the LS problem:

    \min_z \|A_2 z - (b - A_1 y)\|_2

giving us a vector z that can be used to form the final answer:

    x = Q \begin{bmatrix} y \\ z \end{bmatrix}
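A sketch of these steps with NumPy/SciPy, assuming B has full row rank p; the function name lse_nullspace is an illustrative choice:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def lse_nullspace(A, b, B, d):
    """min ||Ax - b||_2  subject to  Bx = d, via the QR factorization of B^T."""
    p, n = B.shape
    Q, Rfull = qr(B.T)                        # full QR: B^T = Q [R; 0], Q is n x n
    R = Rfull[:p, :p]                         # p x p upper triangular block
    y = solve_triangular(R.T, d, lower=True)  # constraint: R^T y = d
    AQ = A @ Q
    A1, A2 = AQ[:, :p], AQ[:, p:]
    z, *_ = np.linalg.lstsq(A2, b - A1 @ y, rcond=None)
    return Q @ np.concatenate([y, z])         # x = Q [y; z]
```

A quick check is that B @ lse_nullspace(A, b, B, d) reproduces d up to rounding error.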

The Method of Weighting

A method for approximating the solution of the LSE problem (minimize \|Ax - b\|_2 subject to Bx = d) through a normal, unconstrained LS problem:

    \min_x \left\| \begin{bmatrix} A \\ \lambda B \end{bmatrix} x - \begin{bmatrix} b \\ \lambda d \end{bmatrix} \right\|_2

for large values of \lambda.
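A short self-contained illustration on synthetic data: stack A on top of \lambda B, solve one ordinary LS problem, and observe the constraint violation shrinking as \lambda grows (the data and the particular \lambda values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 6))
b = rng.standard_normal(30)
B = rng.standard_normal((2, 6))
d = rng.standard_normal(2)

for lam in (1e2, 1e4, 1e6):
    A_w = np.vstack([A, lam * B])                   # [A; lam*B]
    b_w = np.concatenate([b, lam * d])              # [b; lam*d]
    x_lam, *_ = np.linalg.lstsq(A_w, b_w, rcond=None)
    print(lam, np.linalg.norm(B @ x_lam - d))       # constraint violation decreases with lam
```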

The exact solution to the LSE problem (with x_i denoting the ith column of X from the generalized SVD of A and B):

    x = \sum_{i=1}^{p} \frac{v_i^T d}{\beta_i}\, x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i}\, x_i

The approximation:

    x(\lambda) = \sum_{i=1}^{p} \frac{\alpha_i\, u_i^T b + \lambda^2 \beta_i^2 \frac{v_i^T d}{\beta_i}}{\alpha_i^2 + \lambda^2 \beta_i^2}\, x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i}\, x_i

The difference:

    x(\lambda) - x = \sum_{i=1}^{p} \frac{\alpha_i \left( \beta_i\, u_i^T b - \alpha_i\, v_i^T d \right)}{\beta_i \left( \alpha_i^2 + \lambda^2 \beta_i^2 \right)}\, x_i

It is apparent that as \lambda grows larger, the approximation error is reduced. This method is attractive because it only utilizes ordinary LS solving.