Constrained Least Squares
Authors: G.H. Golub and C.F. Van Loan
Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp. 580-587
Background

The least squares problem:

    $\min_x \|Ax - b\|_2$

Sometimes we want $x$ to be chosen from some proper subset $S \subseteq \mathbb{R}^n$, for example $S = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$. Such problems can be solved using the QR factorization and the singular value decomposition (SVD).
Least Squares with a Quadratic Inequality Constraint (LSQI)

General problem:

    $\min_x \|Ax - b\|_2$   subject to   $\|Bx - d\|_2 \le \alpha$

where $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), $b \in \mathbb{R}^m$, $B \in \mathbb{R}^{p \times n}$, $d \in \mathbb{R}^p$, $\alpha \ge 0$.
Assume the generalized SVD of the matrices $A$ and $B$ is given as:

    $U^T A X = D_A = \mathrm{diag}(\alpha_1, \ldots, \alpha_n)$,   $U^T U = I_m$
    $V^T B X = D_B = \mathrm{diag}(\beta_1, \ldots, \beta_q)$,   $V^T V = I_p$,   $q = \min\{p, n\}$

Assume also the following definitions:

    $\tilde{b} \equiv U^T b$,   $\tilde{d} \equiv V^T d$,   $y \equiv X^{-1} x$

Then the problem becomes:

    $\min_y \|D_A y - \tilde{b}\|_2$   subject to   $\|D_B y - \tilde{d}\|_2 \le \alpha$
Correctness: by inserting the definitions we get

    $\|D_A y - \tilde{b}\|_2 = \|U^T A X X^{-1} x - U^T b\|_2 = \|U^T (Ax - b)\|_2 = \|Ax - b\|_2$

since multiplication with an orthogonal matrix does not affect the 2-norm. (The same argument applies to the inequality constraint.)
The objective function becomes:

    $\sum_{i=1}^{n} (\alpha_i y_i - \tilde{b}_i)^2 + \sum_{i=n+1}^{m} \tilde{b}_i^2$   (12.1.4)

The constraint becomes:

    $\sum_{i=1}^{r} (\beta_i y_i - \tilde{d}_i)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2$   (12.1.5)

where $r = \mathrm{rank}(B)$, so that $\beta_{r+1} = \beta_{r+2} = \cdots = \beta_q = 0$.
We have a solution if and only if:

    $\sum_{i=r+1}^{p} \tilde{d}_i^2 \le \alpha^2$

Otherwise there is no way to satisfy the constraint, since these terms do not depend on $y$.
Special Case: $\sum_{i=r+1}^{p} \tilde{d}_i^2 = \alpha^2$

The first sum in (12.1.5) must then equal zero, which means:

    $y_i = \tilde{d}_i / \beta_i$,   $i = 1, \ldots, r$

The remaining variables can be chosen to minimize the first sum in (12.1.4):

    $y_i = \tilde{b}_i / \alpha_i$,   $i = r+1, \ldots, n$

(Of course, if $\alpha_i = 0$ for some $i \in [r+1, n]$, this quotient is undefined; we then choose $y_i = 0$.)
The General Case: $\sum_{i=r+1}^{p} \tilde{d}_i^2 < \alpha^2$

The minimizer (without regard to the constraint) is given by:

    $y_i = \tilde{b}_i / \alpha_i$ if $\alpha_i \ne 0$,   $y_i = \tilde{d}_i / \beta_i$ if $\alpha_i = 0$

This may or may not be a feasible solution, depending on whether it lies in $S$. If it does, we are done; otherwise the constraint is active and we turn to Lagrange multipliers.
The Method of Lagrange Multipliers

    $h(\lambda, y) = \|D_A y - \tilde{b}\|_2^2 + \lambda \left( \|D_B y - \tilde{d}\|_2^2 - \alpha^2 \right)$

Setting $\partial h / \partial y_i = 0$ for $i = 1, \ldots, n$ yields:

    $(D_A^T D_A + \lambda D_B^T D_B)\, y = D_A^T \tilde{b} + \lambda D_B^T \tilde{d}$
Solution using Lagrange multipliers:

    $y_i(\lambda) = \dfrac{\alpha_i \tilde{b}_i + \lambda \beta_i \tilde{d}_i}{\alpha_i^2 + \lambda \beta_i^2}$,   $i = 1, \ldots, q$

    $y_i(\lambda) = \dfrac{\tilde{b}_i}{\alpha_i}$,   $i = q+1, \ldots, n$
Determining the Lagrange parameter $\lambda$

Define:

    $\varphi(\lambda) \equiv \|D_B y(\lambda) - \tilde{d}\|_2^2 = \sum_{i=1}^{r} \left( \dfrac{\alpha_i (\beta_i \tilde{b}_i - \alpha_i \tilde{d}_i)}{\alpha_i^2 + \lambda \beta_i^2} \right)^2 + \sum_{i=r+1}^{p} \tilde{d}_i^2$

and solve $\varphi(\lambda) = \alpha^2$. Because $\varphi(0) > \alpha^2$ and the function is monotone decreasing for $\lambda > 0$, we know that there must be a unique positive solution $\lambda^*$ with $\varphi(\lambda^*) = \alpha^2$.
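As an illustration, here is a minimal Python sketch of this root-finding step (the function name, the bracket-expansion loop, and the argument names `alpha_c`, `beta_c`, `btil`, `dtil` are my own; the arrays are assumed to hold the generalized singular values and the transformed vectors $\tilde{b} = U^T b$, $\tilde{d} = V^T d$):

```python
import numpy as np
from scipy.optimize import brentq

def solve_secular(alpha_c, beta_c, btil, dtil, r, p, alpha):
    """Find lambda* > 0 with phi(lambda*) = alpha**2, assuming phi(0) > alpha**2."""
    tail = np.sum(dtil[r:p] ** 2)                 # constant term: sum_{i>r} dtil_i^2

    def phi(lam):
        num = alpha_c[:r] * (beta_c[:r] * btil[:r] - alpha_c[:r] * dtil[:r])
        den = alpha_c[:r] ** 2 + lam * beta_c[:r] ** 2
        return np.sum((num / den) ** 2) + tail

    hi = 1.0
    while phi(hi) > alpha ** 2:                   # expand until the root is bracketed
        hi *= 10.0
    return brentq(lambda lam: phi(lam) - alpha ** 2, 0.0, hi)
```

Since $\varphi$ is monotone decreasing on $(0, \infty)$, Brent's method on the bracket $[0, \mathrm{hi}]$ is guaranteed to find the unique root.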
Algorithm: Spherical Constraint

The special case $B = I_n$, $d = 0$, $\alpha > 0$ can be interpreted as selecting $x$ from the interior of an $n$-dimensional sphere of radius $\alpha$. It can be solved using the following algorithm:

    $[U, \Sigma, V] \leftarrow \mathrm{SVD}(A)$
    $\tilde{b} \leftarrow U^T b$
    $r \leftarrow \mathrm{rank}(A)$
    if $\sum_{i=1}^{r} (\tilde{b}_i / \sigma_i)^2 > \alpha^2$:
        solve $\sum_{i=1}^{r} \left( \dfrac{\sigma_i \tilde{b}_i}{\sigma_i^2 + \lambda} \right)^2 = \alpha^2$ for $\lambda$
        $x \leftarrow \sum_{i=1}^{r} \dfrac{\sigma_i \tilde{b}_i}{\sigma_i^2 + \lambda} v_i$
    else:
        $x \leftarrow \sum_{i=1}^{r} \dfrac{\tilde{b}_i}{\sigma_i} v_i$
    end if

Computing the SVD is the most computationally intensive operation in the above algorithm.
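Putting the pieces together, a minimal NumPy/SciPy sketch of this algorithm (the function name `lsqi_sphere` and the rank tolerance are my own choices):

```python
import numpy as np
from scipy.optimize import brentq

def lsqi_sphere(A, b, alpha, tol=1e-12):
    """Minimize ||Ax - b||_2 subject to ||x||_2 <= alpha (B = I, d = 0)."""
    U, s, Vt = np.linalg.svd(A)
    btil = U.T @ b
    r = int(np.sum(s > tol * s[0]))               # numerical rank of A
    sr, br = s[:r], btil[:r]

    if np.sum((br / sr) ** 2) > alpha ** 2:
        # constraint is active: solve sum_i (s_i btil_i / (s_i^2 + lam))^2 = alpha^2
        def phi(lam):
            return np.sum((sr * br / (sr ** 2 + lam)) ** 2) - alpha ** 2
        hi = 1.0
        while phi(hi) > 0:
            hi *= 10.0
        lam = brentq(phi, 0.0, hi)
        y = sr * br / (sr ** 2 + lam)
    else:
        y = br / sr                                # unconstrained minimizer is feasible
    return Vt[:r].T @ y                            # x = sum_i y_i v_i
```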
Spherical Constraint as Ridge Regression Problem

Using Lagrange multipliers to solve the spherical constraint problem results in:

    $(A^T A + \lambda I)\, x = A^T b$

where $\lambda > 0$ and $\|x\|_2 = \alpha$. This is the solution of the ridge regression problem:

    $\min_x \|Ax - b\|_2^2 + \lambda \|x\|_2^2$

We need some procedure for selecting a suitable $\lambda$.
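For a given $\lambda$, the ridge problem can be solved without explicitly forming $A^T A$ (which squares the condition number) by stacking $\sqrt{\lambda}\, I$ under $A$; a minimal sketch:

```python
import numpy as np

def ridge(A, b, lam):
    """Solve min ||Ax - b||_2^2 + lam * ||x||_2^2 via an augmented LS problem."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])   # [A; sqrt(lam) I]
    b_aug = np.concatenate([b, np.zeros(n)])           # [b; 0]
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
```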
Define the problem:

    $x_k(\lambda) = \underset{x}{\mathrm{argmin}}\; \|D_k (Ax - b)\|_2^2 + \lambda \|x\|_2^2$

where $D_k = I - e_k e_k^T$ is the matrix operator that zeroes out the $k$th row. Select $\lambda$ to minimize the cross-validation weighted squared error:

    $C(\lambda) = \dfrac{1}{m} \sum_{k=1}^{m} w_k \left( a_k^T x_k(\lambda) - b_k \right)^2$

where $a_k^T$ is the $k$th row of $A$. This means choosing a $\lambda$ that does not make the final model rely too much on any one observation.
Through some calculation, we find that:

    $C(\lambda) = \dfrac{1}{m} \sum_{k=1}^{m} w_k \left( \dfrac{r_k}{\partial r_k / \partial b_k} \right)^2$

where $r_k$ is the $k$th element of the residual vector $r = b - A x(\lambda)$. The derivative $\partial r_k / \partial b_k$ can be interpreted as an inverse measure of the impact of the $k$th observation on the model.
Using the SVD, the minimization problem is reduced to:

    $C(\lambda) = \dfrac{1}{m} \sum_{k=1}^{m} w_k \left( \dfrac{ b_k - \sum_{j=1}^{r} u_{kj} \tilde{b}_j \,\sigma_j^2 / (\sigma_j^2 + \lambda) }{ 1 - \sum_{j=1}^{r} u_{kj}^2 \,\sigma_j^2 / (\sigma_j^2 + \lambda) } \right)^2$

where $\tilde{b} = U^T b$ as before.
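A minimal sketch of evaluating $C(\lambda)$ from a precomputed SVD (the function name `cv_score` is mine; the weight vector `w` and the numerical rank `r` are assumed given):

```python
import numpy as np

def cv_score(lam, U, s, b, w, r):
    """Evaluate the cross-validation score C(lambda) from the SVD of A."""
    btil = U.T @ b
    f = s[:r] ** 2 / (s[:r] ** 2 + lam)      # filter factors sigma_j^2 / (sigma_j^2 + lam)
    Ur = U[:, :r]
    num = b - Ur @ (f * btil[:r])            # numerators: b_k - sum_j u_kj btil_j f_j
    den = 1.0 - (Ur ** 2) @ f                # denominators: 1 - sum_j u_kj^2 f_j
    return np.mean(w * (num / den) ** 2)
```

The resulting one-dimensional function can then be handed to a scalar minimizer such as `scipy.optimize.minimize_scalar`.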
Equality Constrained Least Squares

We now consider a problem similar to LSQI, but with an equality constraint (LSE), i.e. an ordinary least squares problem:

    $\min_x \|Ax - b\|_2$   subject to   $Bx = d$

We assume the following dimensions:

    $A \in \mathbb{R}^{m \times n}$,   $B \in \mathbb{R}^{p \times n}$,   $b \in \mathbb{R}^m$,   $d \in \mathbb{R}^p$,   $\mathrm{rank}(B) = p$
We start by computing the QR factorization of $B^T$:

    $B^T = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$

with $Q \in \mathbb{R}^{n \times n}$, $R \in \mathbb{R}^{p \times p}$, $0 \in \mathbb{R}^{(n-p) \times p}$, and then add the following definitions:

    $AQ = [A_1 \;\; A_2]$,   $Q^T x = \begin{bmatrix} y \\ z \end{bmatrix}$

This gives us:

    $Bx = \left( Q \begin{bmatrix} R \\ 0 \end{bmatrix} \right)^{\!T} x = [R^T \;\; 0]\, Q^T x = [R^T \;\; 0] \begin{bmatrix} y \\ z \end{bmatrix} = R^T y$
We also get (because $Q Q^T = I$):

    $Ax = (AQ)(Q^T x) = [A_1 \;\; A_2] \begin{bmatrix} y \\ z \end{bmatrix} = A_1 y + A_2 z$

So the problem becomes:

    $\min \|A_1 y + A_2 z - b\|_2$   subject to   $R^T y = d$

where $y$ is determined directly from the (triangular) constraint system and then inserted into the unconstrained LS problem:

    $\min_z \|A_2 z - (b - A_1 y)\|_2$

giving us a vector $z$ which can be used to form the final answer:

    $x = Q \begin{bmatrix} y \\ z \end{bmatrix}$
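The whole procedure in a minimal Python sketch (the name `lse_qr` is mine; note that $R^T$ is lower triangular, so the constraint is solved by forward substitution):

```python
import numpy as np
from scipy.linalg import solve_triangular

def lse_qr(A, B, b, d):
    """Minimize ||Ax - b||_2 subject to Bx = d, assuming rank(B) = p."""
    p = B.shape[0]
    Q, Rfull = np.linalg.qr(B.T, mode='complete')    # B^T = Q [R; 0]
    R = Rfull[:p, :]                                 # p-by-p upper triangular block
    y = solve_triangular(R.T, d, lower=True)         # solve R^T y = d
    A1, A2 = A @ Q[:, :p], A @ Q[:, p:]              # AQ = [A1 A2]
    z = np.linalg.lstsq(A2, b - A1 @ y, rcond=None)[0]
    return Q @ np.concatenate([y, z])                # x = Q [y; z]
```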
The Method of Weighting

A method for approximating the solution of the LSE problem (minimize $\|Ax - b\|_2$ subject to $Bx = d$) through an ordinary, unconstrained LS problem:

    $\min_x \left\| \begin{bmatrix} A \\ \lambda B \end{bmatrix} x - \begin{bmatrix} b \\ \lambda d \end{bmatrix} \right\|_2$

for large values of $\lambda$.
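A minimal sketch, assuming an ordinary LS solver is all that is available; the default $\lambda$ is purely illustrative, since in floating point an excessively large $\lambda$ can itself cause accuracy problems:

```python
import numpy as np

def lse_weighted(A, B, b, d, lam=1e8):
    """Approximate min ||Ax - b||_2 s.t. Bx = d by weighting the constraint rows."""
    M = np.vstack([A, lam * B])                  # [A; lam*B]
    rhs = np.concatenate([b, lam * d])           # [b; lam*d]
    return np.linalg.lstsq(M, rhs, rcond=None)[0]
```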
The exact solution to the LSE problem, expressed in the generalized SVD of $(A, B)$ with $x_i$ denoting the columns of $X$:

    $x = \sum_{i=1}^{p} \dfrac{v_i^T d}{\beta_i} x_i + \sum_{i=p+1}^{n} \dfrac{u_i^T b}{\alpha_i} x_i$

The approximation:

    $x(\lambda) = \sum_{i=1}^{p} \dfrac{\alpha_i u_i^T b + \lambda^2 \beta_i v_i^T d}{\alpha_i^2 + \lambda^2 \beta_i^2} x_i + \sum_{i=p+1}^{n} \dfrac{u_i^T b}{\alpha_i} x_i$

The difference:

    $x(\lambda) - x = \sum_{i=1}^{p} \dfrac{\alpha_i (\beta_i u_i^T b - \alpha_i v_i^T d)}{\beta_i (\alpha_i^2 + \lambda^2 \beta_i^2)} x_i$

It is apparent that as $\lambda$ grows larger, the approximation error is reduced. This method is attractive because it only utilizes ordinary LS solving.
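A quick numerical check of this error behavior, reusing the `lse_qr` and `lse_weighted` sketches above on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((20, 8)), rng.standard_normal((3, 8))
b, d = rng.standard_normal(20), rng.standard_normal(3)

x_exact = lse_qr(A, B, b, d)                 # QR-based LSE solution
for lam in [1e2, 1e3, 1e4]:
    x_lam = lse_weighted(A, B, b, d, lam)
    # the GSVD analysis predicts the error to shrink roughly like 1/lam^2
    print(lam, np.linalg.norm(x_lam - x_exact))
```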