8. Linear least-squares
EE103 (Fall 2011-12)

• definition
• examples and applications
• solution of a least-squares problem, normal equations
Definition

overdetermined linear equations: if $b \notin \operatorname{range}(A)$, we cannot solve $Ax = b$ for $x$ ($A$ is $m \times n$ with $m > n$)

least-squares formulation

$$\mbox{minimize} \quad \|Ax - b\| = \left( \sum_{i=1}^m \Big( \sum_{j=1}^n a_{ij} x_j - b_i \Big)^2 \right)^{1/2}$$

• $r = Ax - b$ is called the residual or error
• $x$ with smallest residual norm $\|r\|$ is called the least-squares solution
• equivalent to minimizing $\|Ax - b\|^2$
Example

$$A = \begin{bmatrix} 2 & 0 \\ -1 & 1 \\ 0 & 2 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$

least-squares solution

• minimize $(2x_1 - 1)^2 + (-x_1 + x_2)^2 + (2x_2 + 1)^2$
• to find the optimal $x_1$, $x_2$, set the derivatives w.r.t. $x_1$ and $x_2$ equal to zero:
$$10x_1 - 2x_2 - 4 = 0, \qquad -2x_1 + 10x_2 + 4 = 0$$
• solution: $x_1 = 1/3$, $x_2 = -1/3$

(much more on practical algorithms for LS problems later)
(surface plots of $r_1^2 = (2x_1 - 1)^2$, $r_2^2 = (-x_1 + x_2)^2$, $r_3^2 = (2x_2 + 1)^2$, and of the sum $r_1^2 + r_2^2 + r_3^2$, as functions of $x_1$ and $x_2$)
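A quick numerical check of this example, as a NumPy sketch: it solves the normal equations for the small problem above and prints the residual norm.

```python
import numpy as np

# the 3x2 example above
A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# solve the normal equations (A^T A) x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)                          # [ 0.3333 -0.3333], i.e. (1/3, -1/3)
print(np.linalg.norm(A @ x - b))  # norm of the residual r = Ax - b
```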
Outline

• definition
• examples and applications
• solution of a least-squares problem, normal equations
Data fitting

fit a function

$$g(t) = x_1 g_1(t) + x_2 g_2(t) + \cdots + x_n g_n(t)$$

to data $(t_1, y_1), \ldots, (t_m, y_m)$, i.e., choose coefficients $x_1, \ldots, x_n$ so that

$$g(t_1) \approx y_1, \quad g(t_2) \approx y_2, \quad \ldots, \quad g(t_m) \approx y_m$$

• $g_i(t): \mathbf{R} \to \mathbf{R}$ are given functions (basis functions)
• problem variables: the coefficients $x_1, x_2, \ldots, x_n$
• usually $m \gg n$, hence no exact solution with $g(t_i) = y_i$ for all $i$
• applications: developing a simple, approximate model of observed data
Least-squares data fitting

compute $x$ by minimizing

$$\sum_{i=1}^m (g(t_i) - y_i)^2 = \sum_{i=1}^m (x_1 g_1(t_i) + x_2 g_2(t_i) + \cdots + x_n g_n(t_i) - y_i)^2$$

in matrix notation: minimize $\|Ax - b\|^2$ where

$$A = \begin{bmatrix} g_1(t_1) & g_2(t_1) & g_3(t_1) & \cdots & g_n(t_1) \\ g_1(t_2) & g_2(t_2) & g_3(t_2) & \cdots & g_n(t_2) \\ \vdots & \vdots & \vdots & & \vdots \\ g_1(t_m) & g_2(t_m) & g_3(t_m) & \cdots & g_n(t_m) \end{bmatrix}, \qquad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
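As an illustration, a minimal NumPy sketch of this formulation (the data and basis functions below are made up for the example): the matrix $A$ is built column by column from the basis functions, and the coefficients are found with a least-squares solver.

```python
import numpy as np

# made-up data (t_i, y_i) and basis functions g_1, g_2, g_3, for illustration only
t = np.linspace(0.0, 1.0, 20)
y = np.sin(3 * t) + 0.05 * np.random.randn(t.size)
basis = [lambda s: np.ones_like(s),   # g_1(t) = 1
         lambda s: s,                 # g_2(t) = t
         lambda s: np.sin(3 * s)]     # g_3(t) = sin(3t)

# A[i, j] = g_j(t_i), b = y
A = np.column_stack([g(t) for g in basis])
x, *_ = np.linalg.lstsq(A, y, rcond=None)

# fitted values g(t_i) = x_1 g_1(t_i) + ... + x_n g_n(t_i)
g_fit = A @ x
```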
Example: data fitting with polynomials

$$g(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}$$

basis functions are $g_k(t) = t^{k-1}$, $k = 1, \ldots, n$

$$A = \begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_m & t_m^2 & \cdots & t_m^{n-1} \end{bmatrix}, \qquad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$

• interpolation ($m = n$): can satisfy $g(t_i) = y_i$ exactly by solving $Ax = b$
• approximation ($m > n$): make the error small by minimizing $\|Ax - b\|$
example: fit a polynomial to $f(t) = 1/(1 + 25t^2)$ on $[-1, 1]$

• pick $m = n$ points $t_i$ in $[-1, 1]$, and calculate $y_i = 1/(1 + 25 t_i^2)$
• interpolate by solving $Ax = b$

(plots for $n = 5$ and $n = 15$; dashed line: $f$; solid line: polynomial $g$; circles: the points $(t_i, y_i)$)

increasing $n$ does not improve the overall quality of the fit
same example by approximation

• pick $m = 50$ points $t_i$ in $[-1, 1]$
• fit the polynomial by minimizing $\|Ax - b\|$

(plots for $n = 5$ and $n = 15$; dashed line: $f$; solid line: polynomial $g$; circles: the points $(t_i, y_i)$)

much better fit overall
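A sketch of this experiment in NumPy, assuming equally spaced points $t_i$: np.vander builds the matrix of powers of $t$, interpolation solves $Ax = b$ exactly, and the approximation minimizes $\|Ax - b\|$ over more points than coefficients.

```python
import numpy as np

def f(t):
    return 1.0 / (1.0 + 25.0 * t**2)

n = 15                                        # number of coefficients (degree n - 1)

# interpolation: m = n equally spaced points, solve Ax = b exactly
t_i = np.linspace(-1.0, 1.0, n)
A_i = np.vander(t_i, n, increasing=True)      # columns 1, t, t^2, ..., t^(n-1)
x_interp = np.linalg.solve(A_i, f(t_i))

# approximation: m = 50 points, minimize ||Ax - b||
t_a = np.linspace(-1.0, 1.0, 50)
A_a = np.vander(t_a, n, increasing=True)
x_approx, *_ = np.linalg.lstsq(A_a, f(t_a), rcond=None)

# evaluate both polynomials on a fine grid and compare with f
t_plot = np.linspace(-1.0, 1.0, 500)
P = np.vander(t_plot, n, increasing=True)
print(np.max(np.abs(P @ x_interp - f(t_plot))))  # large oscillations near the endpoints
print(np.max(np.abs(P @ x_approx - f(t_plot))))  # much smaller error overall
```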
Least-squares estimation

$$y = Ax + w$$

• $x$ is what we want to estimate or reconstruct
• $y$ is our measurement(s)
• $w$ is an unknown noise or measurement error (assumed small)
• the $i$th row of $A$ characterizes the $i$th sensor or $i$th measurement

least-squares estimation: choose as estimate the vector $\hat{x}$ that minimizes $\|A\hat{x} - y\|$, i.e., minimize the deviation between what we actually observed ($y$) and what we would observe if $x = \hat{x}$ and there were no noise ($w = 0$)
Navigation by range measurements

find position $(u, v)$ in a plane from distances to beacons at positions $(p_i, q_i)$

(figure: four beacons at $(p_1, q_1), \ldots, (p_4, q_4)$, the unknown position $(u, v)$, and the measured ranges $\rho_1, \ldots, \rho_4$)

four nonlinear equations in the two variables $u$, $v$:

$$\sqrt{(u - p_i)^2 + (v - q_i)^2} = \rho_i \qquad \mbox{for } i = 1, 2, 3, 4$$

$\rho_i$ is the measured distance from the unknown position $(u, v)$ to beacon $i$
linearized distance function: assume $u = u_0 + \Delta u$, $v = v_0 + \Delta v$ where

• $u_0$, $v_0$ are known (e.g., position a short time ago)
• $\Delta u$, $\Delta v$ are small (compared to the $\rho_i$'s)

$$\sqrt{(u_0 + \Delta u - p_i)^2 + (v_0 + \Delta v - q_i)^2} \approx \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2} + \frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}$$

this gives four linear equations in the variables $\Delta u$, $\Delta v$:

$$\frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}} \approx \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2} \qquad \mbox{for } i = 1, 2, 3, 4$$
linearized equations: $Ax \approx b$ where $x = (\Delta u, \Delta v)$ and $A$ is $4 \times 2$ with

$$a_{i1} = \frac{u_0 - p_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}, \qquad a_{i2} = \frac{v_0 - q_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}, \qquad b_i = \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}$$

• due to linearization and measurement error, we do not expect an exact solution ($Ax = b$)
• we can try to find $\Delta u$ and $\Delta v$ that almost satisfy the equations
numerical example

• beacons at positions $(10, 0)$, $(-10, 2)$, $(3, 9)$, $(10, 10)$
• measured distances $\rho = (8.22, 11.9, 7.08, 11.33)$
• (unknown) actual position is $(2, 2)$

linearized range equations (linearized around $(u_0, v_0) = (0, 0)$):

$$\begin{bmatrix} -1.00 & 0.00 \\ 0.98 & -0.20 \\ -0.32 & -0.95 \\ -0.71 & -0.71 \end{bmatrix} \begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix} \approx \begin{bmatrix} -1.77 \\ 1.72 \\ -2.41 \\ -2.81 \end{bmatrix}$$

least-squares solution: $(\Delta u, \Delta v) = (1.97, 1.90)$ (norm of the position error is 0.1)
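A NumPy sketch of this example, using the beacon positions, measured distances, and linearization point listed above; it builds the $4 \times 2$ linearized system and solves it in the least-squares sense.

```python
import numpy as np

# beacon positions, measured distances, and linearization point from the slide
beacons = np.array([[10.0, 0.0], [-10.0, 2.0], [3.0, 9.0], [10.0, 10.0]])
rho = np.array([8.22, 11.9, 7.08, 11.33])
u0, v0 = 0.0, 0.0

# 4x2 linearized system A [du, dv]^T ~ b
d = np.sqrt((u0 - beacons[:, 0])**2 + (v0 - beacons[:, 1])**2)
A = np.column_stack([(u0 - beacons[:, 0]) / d,
                     (v0 - beacons[:, 1]) / d])
b = rho - d

delta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(u0 + delta[0], v0 + delta[1])   # estimated position, close to the actual (2, 2)
```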
Least-squares system identification

measure input $u(t)$ and output $y(t)$ for $t = 0, \ldots, N$ of an unknown system

(block diagram: $u(t) \to$ unknown system $\to y(t)$; example with $N = 70$: plots of the measured input $u(t)$ and output $y(t)$ versus $t$)

system identification problem: find a reasonable model for the system based on the measured I/O data $u$, $y$
moving average model

$$y_{\rm model}(t) = h_0 u(t) + h_1 u(t-1) + h_2 u(t-2) + \cdots + h_n u(t-n)$$

where $y_{\rm model}(t)$ is the model output

• a simple and widely used model
• predicted output is a linear combination of the current and $n$ previous inputs
• $h_0, \ldots, h_n$ are the parameters of the model
• called a moving average (MA) model with $n$ delays

least-squares identification: choose the model that minimizes the error

$$E = \left( \sum_{t=n}^{N} (y_{\rm model}(t) - y(t))^2 \right)^{1/2}$$
formulation as a linear least-squares problem:

$$E = \left( \sum_{t=n}^{N} (h_0 u(t) + h_1 u(t-1) + \cdots + h_n u(t-n) - y(t))^2 \right)^{1/2} = \|Ax - b\|$$

$$A = \begin{bmatrix} u(n) & u(n-1) & u(n-2) & \cdots & u(0) \\ u(n+1) & u(n) & u(n-1) & \cdots & u(1) \\ u(n+2) & u(n+1) & u(n) & \cdots & u(2) \\ \vdots & \vdots & \vdots & & \vdots \\ u(N) & u(N-1) & u(N-2) & \cdots & u(N-n) \end{bmatrix}, \qquad x = \begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix}, \qquad b = \begin{bmatrix} y(n) \\ y(n+1) \\ y(n+2) \\ \vdots \\ y(N) \end{bmatrix}$$
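A sketch of the identification step in NumPy, assuming u and y are arrays holding the measured input and output for $t = 0, \ldots, N$: the matrix of shifted inputs is built row by row and the coefficients $h$ are found by least-squares.

```python
import numpy as np

def fit_ma_model(u, y, n):
    """Least-squares fit of a moving average model with n delays."""
    u, y = np.asarray(u, dtype=float), np.asarray(y, dtype=float)
    N = len(u) - 1
    # row for time t holds u(t), u(t-1), ..., u(t-n); t runs from n to N
    A = np.array([u[t - n:t + 1][::-1] for t in range(n, N + 1)])
    b = y[n:N + 1]
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h

def predict(u, h):
    """Model output y_model(t) = h_0 u(t) + ... + h_n u(t-n) for t = n, ..., N."""
    u, h = np.asarray(u, dtype=float), np.asarray(h, dtype=float)
    n = len(h) - 1
    return np.array([h @ u[t - n:t + 1][::-1] for t in range(n, len(u))])
```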
example (the I/O data shown earlier) with $n = 7$: the least-squares solution is

$$h_0 = 0.24, \quad h_1 = 0.2819, \quad h_2 = 0.4176, \quad h_3 = 0.3536, \quad h_4 = 0.2425, \quad h_5 = 0.4873, \quad h_6 = 0.284, \quad h_7 = 0.4412$$

(plot versus $t$; solid: actual output $y(t)$; dashed: model output $y_{\rm model}(t)$)
model order selection: how large should $n$ be?

(plot of the relative error $E / \|y\|$ versus $n$; the error decreases as $n$ grows)

• this suggests using the largest possible $n$ for the smallest error
• a much more important question: how good is the model at predicting new data (i.e., data not used to calculate the model)?
model validation: test the model on a new data set (from the same system)

(plots of the validation input $\bar{u}(t)$ and output $\bar{y}(t)$ versus $t$, and of the relative prediction error versus $n$ for the validation data and the modeling data)

• for $n$ too large, the predictive ability of the model becomes worse!
• the validation data suggest $n = 10$
for $n = 50$ the actual and predicted outputs on the system identification and model validation data are:

(two plots versus $t$; left: I/O set used to compute the model, solid: $y(t)$, dashed: $y_{\rm model}(t)$; right: model validation I/O set, solid: $\bar{y}(t)$, dashed: $\bar{y}_{\rm model}(t)$)

loss of predictive ability when $n$ is too large is called overfitting or overmodeling
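A sketch of the validation procedure, reusing fit_ma_model and predict from the previous sketch; it assumes a second data set ubar, ybar recorded from the same system and compares the fit error with the prediction error over a range of model orders.

```python
import numpy as np

def relative_error(u, y, h):
    """Relative error of the MA model h on an input/output record (u, y)."""
    n = len(h) - 1
    e = predict(u, h) - np.asarray(y, dtype=float)[n:]
    return np.linalg.norm(e) / np.linalg.norm(y)

def compare_orders(u, y, ubar, ybar, max_n=40):
    """Fit on (u, y) and report fit vs. validation error for n = 1, ..., max_n."""
    for n in range(1, max_n + 1):
        h = fit_ma_model(u, y, n)
        err_fit = relative_error(u, y, h)        # keeps decreasing as n grows
        err_val = relative_error(ubar, ybar, h)  # starts growing once n is too large
        print(n, err_fit, err_val)
```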
Outline

• definition
• examples and applications
• solution of a least-squares problem, normal equations
Geometric interpretation of a LS problem

$$\mbox{minimize} \quad \|Ax - b\|^2$$

$A$ is $m \times n$ with columns $a_1, \ldots, a_n$

• $\|Ax - b\|$ is the distance of $b$ to the vector
$$Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n$$
• the solution $x_{\rm ls}$ gives the linear combination of the columns of $A$ closest to $b$
• $A x_{\rm ls}$ is the projection of $b$ on the range of $A$
example

$$A = \begin{bmatrix} 1 & -1 \\ 1 & 2 \\ 0 & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}$$

(figure: $b$ and its projection $A x_{\rm ls} = 2 a_1 + a_2$ on the plane spanned by $a_1$ and $a_2$)

least-squares solution

$$A x_{\rm ls} = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}, \qquad x_{\rm ls} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
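A quick numerical check of this example with NumPy's least-squares solver (the matrix and right-hand side follow the reconstruction above):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 4.0, 2.0])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ls)       # [2. 1.]
print(A @ x_ls)   # [1. 4. 0.], the projection of b on range(A)
```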
The solution of a least-squares problem

if $A$ is left-invertible, then

$$x_{\rm ls} = (A^T A)^{-1} A^T b$$

is the unique solution of the least-squares problem

$$\mbox{minimize} \quad \|Ax - b\|^2$$

• in other words, if $x \neq x_{\rm ls}$, then $\|Ax - b\|^2 > \|A x_{\rm ls} - b\|^2$
• recall from page 4-25 that $A^T A$ is positive definite and that $(A^T A)^{-1} A^T$ is a left-inverse of $A$
proof: we show that $\|Ax - b\|^2 > \|A x_{\rm ls} - b\|^2$ for $x \neq x_{\rm ls}$:

$$\|Ax - b\|^2 = \|A(x - x_{\rm ls}) + (A x_{\rm ls} - b)\|^2 = \|A(x - x_{\rm ls})\|^2 + \|A x_{\rm ls} - b\|^2 > \|A x_{\rm ls} - b\|^2$$

• the 2nd step follows from $A(x - x_{\rm ls}) \perp (A x_{\rm ls} - b)$:
$$(A(x - x_{\rm ls}))^T (A x_{\rm ls} - b) = (x - x_{\rm ls})^T (A^T A x_{\rm ls} - A^T b) = 0$$
• the 3rd step follows from the zero nullspace property of $A$:
$$x \neq x_{\rm ls} \;\Longrightarrow\; A(x - x_{\rm ls}) \neq 0$$
The normal equations

$$(A^T A) x = A^T b$$

if $A$ is left-invertible:

• the least-squares solution can be found by solving the normal equations
• $n$ equations in $n$ variables with a positive definite coefficient matrix
• can be solved using a Cholesky factorization
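A sketch of this approach using SciPy's Cholesky routines (scipy.linalg.cho_factor and cho_solve); the matrix A and vector b below are random placeholders standing in for actual problem data.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# placeholder data: a left-invertible A (m > n with independent columns)
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

# normal equations (A^T A) x = A^T b, solved via a Cholesky factorization
factor = cho_factor(A.T @ A)          # A^T A is positive definite
x_ls = cho_solve(factor, A.T @ b)

# agrees with the library least-squares solver
print(np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0]))
```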