8. Linear least-squares
EE103 (Fall 2011-12)

• definition
• examples and applications
• solution of a least-squares problem, normal equations
Definition

overdetermined linear equations: if $b \notin \operatorname{range}(A)$, we cannot solve $Ax = b$ for $x$ ($A$ is $m \times n$ with $m > n$)

least-squares formulation

$$\mbox{minimize} \quad \|Ax - b\| = \left( \sum_{i=1}^m \Big( \sum_{j=1}^n a_{ij} x_j - b_i \Big)^2 \right)^{1/2}$$

• $r = Ax - b$ is called the residual or error
• $x$ with smallest residual norm $\|r\|$ is called the least-squares solution
• equivalent to minimizing $\|Ax - b\|^2$
Example

$$A = \begin{bmatrix} 2 & 0 \\ -1 & 1 \\ 0 & 2 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$

least-squares solution

• minimize $(2x_1 - 1)^2 + (-x_1 + x_2)^2 + (2x_2 + 1)^2$
• to find the optimal $x_1$, $x_2$, set the derivatives w.r.t. $x_1$ and $x_2$ equal to zero:
$$10x_1 - 2x_2 - 4 = 0, \qquad -2x_1 + 10x_2 + 4 = 0$$
• solution: $x_1 = 1/3$, $x_2 = -1/3$

(much more on practical algorithms for LS problems later)
(surface plots of $r_1^2 = (2x_1 - 1)^2$, $r_2^2 = (-x_1 + x_2)^2$, $r_3^2 = (2x_2 + 1)^2$, and of the sum $r_1^2 + r_2^2 + r_3^2$, as functions of $x_1$ and $x_2$)
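A quick numerical check of this example, as a NumPy sketch: it solves the normal equations for the small problem above and prints the residual norm.

```python
import numpy as np

# the 3x2 example above
A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# solve the normal equations (A^T A) x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)                          # [ 0.3333 -0.3333], i.e. (1/3, -1/3)
print(np.linalg.norm(A @ x - b))  # norm of the residual r = Ax - b
```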
Outline

• definition
• examples and applications
• solution of a least-squares problem, normal equations
Data fitting

fit a function

$$g(t) = x_1 g_1(t) + x_2 g_2(t) + \cdots + x_n g_n(t)$$

to data $(t_1, y_1), \ldots, (t_m, y_m)$, i.e., choose coefficients $x_1, \ldots, x_n$ so that

$$g(t_1) \approx y_1, \quad g(t_2) \approx y_2, \quad \ldots, \quad g(t_m) \approx y_m$$

• $g_i(t): \mathbf{R} \to \mathbf{R}$ are given functions (basis functions)
• problem variables: the coefficients $x_1, x_2, \ldots, x_n$
• usually $m \gg n$, hence no exact solution with $g(t_i) = y_i$ for all $i$
• applications: developing a simple, approximate model of observed data
Least-squares data fitting

compute $x$ by minimizing

$$\sum_{i=1}^m (g(t_i) - y_i)^2 = \sum_{i=1}^m (x_1 g_1(t_i) + x_2 g_2(t_i) + \cdots + x_n g_n(t_i) - y_i)^2$$

in matrix notation: minimize $\|Ax - b\|^2$ where

$$A = \begin{bmatrix} g_1(t_1) & g_2(t_1) & g_3(t_1) & \cdots & g_n(t_1) \\ g_1(t_2) & g_2(t_2) & g_3(t_2) & \cdots & g_n(t_2) \\ \vdots & \vdots & \vdots & & \vdots \\ g_1(t_m) & g_2(t_m) & g_3(t_m) & \cdots & g_n(t_m) \end{bmatrix}, \qquad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
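As an illustration, a minimal NumPy sketch of this formulation (the data and basis functions below are made up for the example): the matrix $A$ is built column by column from the basis functions, and the coefficients are found with a least-squares solver.

```python
import numpy as np

# made-up data (t_i, y_i) and basis functions g_1, g_2, g_3, for illustration only
t = np.linspace(0.0, 1.0, 20)
y = np.sin(3 * t) + 0.05 * np.random.randn(t.size)
basis = [lambda s: np.ones_like(s),   # g_1(t) = 1
         lambda s: s,                 # g_2(t) = t
         lambda s: np.sin(3 * s)]     # g_3(t) = sin(3t)

# A[i, j] = g_j(t_i), b = y
A = np.column_stack([g(t) for g in basis])
x, *_ = np.linalg.lstsq(A, y, rcond=None)

# fitted values g(t_i) = x_1 g_1(t_i) + ... + x_n g_n(t_i)
g_fit = A @ x
```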
Example: data fitting with polynomials

$$g(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}$$

basis functions are $g_k(t) = t^{k-1}$, $k = 1, \ldots, n$

$$A = \begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_m & t_m^2 & \cdots & t_m^{n-1} \end{bmatrix}, \qquad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$

• interpolation ($m = n$): can satisfy $g(t_i) = y_i$ exactly by solving $Ax = b$
• approximation ($m > n$): make the error small by minimizing $\|Ax - b\|$
example: fit a polynomial to $f(t) = 1/(1 + 25t^2)$ on $[-1, 1]$

• pick $m = n$ points $t_i$ in $[-1, 1]$, and calculate $y_i = 1/(1 + 25 t_i^2)$
• interpolate by solving $Ax = b$

(plots for $n = 5$ and $n = 15$; dashed line: $f$; solid line: polynomial $g$; circles: the points $(t_i, y_i)$)

increasing $n$ does not improve the overall quality of the fit
same example by approximation

• pick $m = 50$ points $t_i$ in $[-1, 1]$
• fit the polynomial by minimizing $\|Ax - b\|$

(plots for $n = 5$ and $n = 15$; dashed line: $f$; solid line: polynomial $g$; circles: the points $(t_i, y_i)$)

much better fit overall
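A sketch of this experiment in NumPy, assuming equally spaced points $t_i$: np.vander builds the matrix of powers of $t$, interpolation solves $Ax = b$ exactly, and the approximation minimizes $\|Ax - b\|$ over more points than coefficients.

```python
import numpy as np

def f(t):
    return 1.0 / (1.0 + 25.0 * t**2)

n = 15                                        # number of coefficients (degree n - 1)

# interpolation: m = n equally spaced points, solve Ax = b exactly
t_i = np.linspace(-1.0, 1.0, n)
A_i = np.vander(t_i, n, increasing=True)      # columns 1, t, t^2, ..., t^(n-1)
x_interp = np.linalg.solve(A_i, f(t_i))

# approximation: m = 50 points, minimize ||Ax - b||
t_a = np.linspace(-1.0, 1.0, 50)
A_a = np.vander(t_a, n, increasing=True)
x_approx, *_ = np.linalg.lstsq(A_a, f(t_a), rcond=None)

# evaluate both polynomials on a fine grid and compare with f
t_plot = np.linspace(-1.0, 1.0, 500)
P = np.vander(t_plot, n, increasing=True)
print(np.max(np.abs(P @ x_interp - f(t_plot))))  # large oscillations near the endpoints
print(np.max(np.abs(P @ x_approx - f(t_plot))))  # much smaller error overall
```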
Least-squares estimation

$$y = Ax + w$$

• $x$ is what we want to estimate or reconstruct
• $y$ is our measurement(s)
• $w$ is an unknown noise or measurement error (assumed small)
• the $i$th row of $A$ characterizes the $i$th sensor or $i$th measurement

least-squares estimation: choose as estimate the vector $\hat{x}$ that minimizes $\|A\hat{x} - y\|$, i.e., minimize the deviation between what we actually observed ($y$) and what we would observe if $x = \hat{x}$ and there were no noise ($w = 0$)
Navigation by range measurements

find position $(u, v)$ in a plane from distances to beacons at positions $(p_i, q_i)$

(figure: four beacons at $(p_1, q_1), \ldots, (p_4, q_4)$, the unknown position $(u, v)$, and the measured ranges $\rho_1, \ldots, \rho_4$)

four nonlinear equations in the two variables $u$, $v$:

$$\sqrt{(u - p_i)^2 + (v - q_i)^2} = \rho_i \qquad \mbox{for } i = 1, 2, 3, 4$$

$\rho_i$ is the measured distance from the unknown position $(u, v)$ to beacon $i$
linearized distance function: assume $u = u_0 + \Delta u$, $v = v_0 + \Delta v$ where

• $u_0$, $v_0$ are known (e.g., position a short time ago)
• $\Delta u$, $\Delta v$ are small (compared to the $\rho_i$'s)

$$\sqrt{(u_0 + \Delta u - p_i)^2 + (v_0 + \Delta v - q_i)^2} \approx \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2} + \frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}$$

this gives four linear equations in the variables $\Delta u$, $\Delta v$:

$$\frac{(u_0 - p_i)\Delta u + (v_0 - q_i)\Delta v}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}} \approx \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2} \qquad \mbox{for } i = 1, 2, 3, 4$$
linearized equations: $Ax \approx b$ where $x = (\Delta u, \Delta v)$ and $A$ is $4 \times 2$ with

$$a_{i1} = \frac{u_0 - p_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}, \qquad a_{i2} = \frac{v_0 - q_i}{\sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}}, \qquad b_i = \rho_i - \sqrt{(u_0 - p_i)^2 + (v_0 - q_i)^2}$$

• due to linearization and measurement error, we do not expect an exact solution ($Ax = b$)
• we can try to find $\Delta u$ and $\Delta v$ that almost satisfy the equations
numerical example

• beacons at positions $(10, 0)$, $(-10, 2)$, $(3, 9)$, $(10, 10)$
• measured distances $\rho = (8.22, 11.9, 7.08, 11.33)$
• (unknown) actual position is $(2, 2)$

linearized range equations (linearized around $(u_0, v_0) = (0, 0)$):

$$\begin{bmatrix} -1.00 & 0.00 \\ 0.98 & -0.20 \\ -0.32 & -0.95 \\ -0.71 & -0.71 \end{bmatrix} \begin{bmatrix} \Delta u \\ \Delta v \end{bmatrix} \approx \begin{bmatrix} -1.77 \\ 1.72 \\ -2.41 \\ -2.81 \end{bmatrix}$$

least-squares solution: $(\Delta u, \Delta v) = (1.97, 1.90)$ (norm of the position error is 0.1)
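A NumPy sketch of this example, using the beacon positions, measured distances, and linearization point listed above; it builds the $4 \times 2$ linearized system and solves it in the least-squares sense.

```python
import numpy as np

# beacon positions, measured distances, and linearization point from the slide
beacons = np.array([[10.0, 0.0], [-10.0, 2.0], [3.0, 9.0], [10.0, 10.0]])
rho = np.array([8.22, 11.9, 7.08, 11.33])
u0, v0 = 0.0, 0.0

# 4x2 linearized system A [du, dv]^T ~ b
d = np.sqrt((u0 - beacons[:, 0])**2 + (v0 - beacons[:, 1])**2)
A = np.column_stack([(u0 - beacons[:, 0]) / d,
                     (v0 - beacons[:, 1]) / d])
b = rho - d

delta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(u0 + delta[0], v0 + delta[1])   # estimated position, close to the actual (2, 2)
```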
Least-squares system identification

measure input $u(t)$ and output $y(t)$ for $t = 0, \ldots, N$ of an unknown system

(block diagram: $u(t) \to$ unknown system $\to y(t)$; example with $N = 70$: plots of the measured input $u(t)$ and output $y(t)$ versus $t$)

system identification problem: find a reasonable model for the system based on the measured I/O data $u$, $y$
moving average model

$$y_{\rm model}(t) = h_0 u(t) + h_1 u(t-1) + h_2 u(t-2) + \cdots + h_n u(t-n)$$

where $y_{\rm model}(t)$ is the model output

• a simple and widely used model
• predicted output is a linear combination of the current and $n$ previous inputs
• $h_0, \ldots, h_n$ are the parameters of the model
• called a moving average (MA) model with $n$ delays

least-squares identification: choose the model that minimizes the error

$$E = \left( \sum_{t=n}^{N} (y_{\rm model}(t) - y(t))^2 \right)^{1/2}$$
formulation as a linear least-squares problem:

$$E = \left( \sum_{t=n}^{N} (h_0 u(t) + h_1 u(t-1) + \cdots + h_n u(t-n) - y(t))^2 \right)^{1/2} = \|Ax - b\|$$

$$A = \begin{bmatrix} u(n) & u(n-1) & u(n-2) & \cdots & u(0) \\ u(n+1) & u(n) & u(n-1) & \cdots & u(1) \\ u(n+2) & u(n+1) & u(n) & \cdots & u(2) \\ \vdots & \vdots & \vdots & & \vdots \\ u(N) & u(N-1) & u(N-2) & \cdots & u(N-n) \end{bmatrix}, \qquad x = \begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix}, \qquad b = \begin{bmatrix} y(n) \\ y(n+1) \\ y(n+2) \\ \vdots \\ y(N) \end{bmatrix}$$
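A sketch of the identification step in NumPy, assuming u and y are arrays holding the measured input and output for $t = 0, \ldots, N$: the matrix of shifted inputs is built row by row and the coefficients $h$ are found by least-squares.

```python
import numpy as np

def fit_ma_model(u, y, n):
    """Least-squares fit of a moving average model with n delays."""
    u, y = np.asarray(u, dtype=float), np.asarray(y, dtype=float)
    N = len(u) - 1
    # row for time t holds u(t), u(t-1), ..., u(t-n); t runs from n to N
    A = np.array([u[t - n:t + 1][::-1] for t in range(n, N + 1)])
    b = y[n:N + 1]
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h

def predict(u, h):
    """Model output y_model(t) = h_0 u(t) + ... + h_n u(t-n) for t = n, ..., N."""
    u, h = np.asarray(u, dtype=float), np.asarray(h, dtype=float)
    n = len(h) - 1
    return np.array([h @ u[t - n:t + 1][::-1] for t in range(n, len(u))])
```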
example (the I/O data shown earlier) with $n = 7$: the least-squares solution is

$$h_0 = 0.24, \quad h_1 = 0.2819, \quad h_2 = 0.4176, \quad h_3 = 0.3536, \quad h_4 = 0.2425, \quad h_5 = 0.4873, \quad h_6 = 0.284, \quad h_7 = 0.4412$$

(plot versus $t$; solid: actual output $y(t)$; dashed: model output $y_{\rm model}(t)$)
model order selection: how large should $n$ be?

(plot of the relative error $E / \|y\|$ versus $n$; the error decreases as $n$ grows)

• this suggests using the largest possible $n$ for the smallest error
• a much more important question: how good is the model at predicting new data (i.e., data not used to calculate the model)?
model validation: test the model on a new data set (from the same system)

(plots of the validation input $\bar{u}(t)$ and output $\bar{y}(t)$ versus $t$, and of the relative prediction error versus $n$ for the validation data and the modeling data)

• for $n$ too large, the predictive ability of the model becomes worse!
• the validation data suggest $n = 10$
for $n = 50$ the actual and predicted outputs on the system identification and model validation data are:

(two plots versus $t$; left: I/O set used to compute the model, solid: $y(t)$, dashed: $y_{\rm model}(t)$; right: model validation I/O set, solid: $\bar{y}(t)$, dashed: $\bar{y}_{\rm model}(t)$)

loss of predictive ability when $n$ is too large is called overfitting or overmodeling
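A sketch of the validation procedure, reusing fit_ma_model and predict from the previous sketch; it assumes a second data set ubar, ybar recorded from the same system and compares the fit error with the prediction error over a range of model orders.

```python
import numpy as np

def relative_error(u, y, h):
    """Relative error of the MA model h on an input/output record (u, y)."""
    n = len(h) - 1
    e = predict(u, h) - np.asarray(y, dtype=float)[n:]
    return np.linalg.norm(e) / np.linalg.norm(y)

def compare_orders(u, y, ubar, ybar, max_n=40):
    """Fit on (u, y) and report fit vs. validation error for n = 1, ..., max_n."""
    for n in range(1, max_n + 1):
        h = fit_ma_model(u, y, n)
        err_fit = relative_error(u, y, h)        # keeps decreasing as n grows
        err_val = relative_error(ubar, ybar, h)  # starts growing once n is too large
        print(n, err_fit, err_val)
```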
Outline

• definition
• examples and applications
• solution of a least-squares problem, normal equations
Geometric interpretation of a LS problem

$$\mbox{minimize} \quad \|Ax - b\|^2$$

$A$ is $m \times n$ with columns $a_1, \ldots, a_n$

• $\|Ax - b\|$ is the distance of $b$ to the vector
$$Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n$$
• the solution $x_{\rm ls}$ gives the linear combination of the columns of $A$ closest to $b$
• $A x_{\rm ls}$ is the projection of $b$ on the range of $A$
example

$$A = \begin{bmatrix} 1 & -1 \\ 1 & 2 \\ 0 & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}$$

(figure: $b$ and its projection $A x_{\rm ls} = 2 a_1 + a_2$ on the plane spanned by $a_1$ and $a_2$)

least-squares solution

$$A x_{\rm ls} = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}, \qquad x_{\rm ls} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
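A quick numerical check of this example with NumPy's least-squares solver (the matrix and right-hand side follow the reconstruction above):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 4.0, 2.0])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ls)       # [2. 1.]
print(A @ x_ls)   # [1. 4. 0.], the projection of b on range(A)
```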
The solution of a least-squares problem

if $A$ is left-invertible, then

$$x_{\rm ls} = (A^T A)^{-1} A^T b$$

is the unique solution of the least-squares problem

$$\mbox{minimize} \quad \|Ax - b\|^2$$

• in other words, if $x \neq x_{\rm ls}$, then $\|Ax - b\|^2 > \|A x_{\rm ls} - b\|^2$
• recall from page 4-25 that $A^T A$ is positive definite and that $(A^T A)^{-1} A^T$ is a left-inverse of $A$
proof: we show that $\|Ax - b\|^2 > \|A x_{\rm ls} - b\|^2$ for $x \neq x_{\rm ls}$:

$$\|Ax - b\|^2 = \|A(x - x_{\rm ls}) + (A x_{\rm ls} - b)\|^2 = \|A(x - x_{\rm ls})\|^2 + \|A x_{\rm ls} - b\|^2 > \|A x_{\rm ls} - b\|^2$$

• the 2nd step follows from $A(x - x_{\rm ls}) \perp (A x_{\rm ls} - b)$:
$$(A(x - x_{\rm ls}))^T (A x_{\rm ls} - b) = (x - x_{\rm ls})^T (A^T A x_{\rm ls} - A^T b) = 0$$
• the 3rd step follows from the zero nullspace property of $A$:
$$x \neq x_{\rm ls} \;\Longrightarrow\; A(x - x_{\rm ls}) \neq 0$$
The normal equations

$$(A^T A) x = A^T b$$

if $A$ is left-invertible:

• the least-squares solution can be found by solving the normal equations
• $n$ equations in $n$ variables with a positive definite coefficient matrix
• can be solved using a Cholesky factorization
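A sketch of this approach using SciPy's Cholesky routines (scipy.linalg.cho_factor and cho_solve); the matrix A and vector b below are random placeholders standing in for actual problem data.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# placeholder data: a left-invertible A (m > n with independent columns)
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

# normal equations (A^T A) x = A^T b, solved via a Cholesky factorization
factor = cho_factor(A.T @ A)          # A^T A is positive definite
x_ls = cho_solve(factor, A.T @ b)

# agrees with the library least-squares solver
print(np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0]))
```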