Lecture 5 Least-squares

EE263 Autumn 2007-08, Stephen Boyd

Lecture 5: Least-squares

• least-squares (approximate) solution of overdetermined equations
• projection and orthogonality principle
• least-squares estimation
• BLUE property

Overdetermined linear equations

• consider $y = Ax$ where $A \in \mathbf{R}^{m \times n}$ is (strictly) skinny, i.e., $m > n$
• called an overdetermined set of linear equations (more equations than unknowns)
• for most $y$, cannot solve for $x$

one approach to approximately solve $y = Ax$:

• define residual or error $r = Ax - y$
• find $x = x_{\mathrm{ls}}$ that minimizes $\|r\|$

$x_{\mathrm{ls}}$ is called the least-squares (approximate) solution of $y = Ax$

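Geometric interpretation

$Ax_{\mathrm{ls}}$ is the point in $\mathcal{R}(A)$ closest to $y$ ($Ax_{\mathrm{ls}}$ is the projection of $y$ onto $\mathcal{R}(A)$)

[Figure: $y$, its projection $Ax_{\mathrm{ls}}$ onto $\mathcal{R}(A)$, and the residual $r$]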

Least-squares (approximate) solution

• assume $A$ is full rank, skinny
• to find $x_{\mathrm{ls}}$, we'll minimize the norm of the residual squared,
  $$\|r\|^2 = x^T A^T A x - 2 y^T A x + y^T y$$
• set gradient w.r.t. $x$ to zero:
  $$\nabla_x \|r\|^2 = 2 A^T A x - 2 A^T y = 0$$
• yields the normal equations: $A^T A x = A^T y$
• assumptions imply $A^T A$ invertible, so we have
  $$x_{\mathrm{ls}} = (A^T A)^{-1} A^T y$$
  ... a very famous formula
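As a numerical illustration of the normal equations, the following minimal sketch solves $A^T A x = A^T y$ for a small random problem and compares the result with NumPy's built-in least-squares solver. The matrix sizes, the random data, and the variable names are assumptions made for the example, not part of the lecture.

```python
import numpy as np

# Hypothetical data: a random 6x3 matrix is skinny and (almost surely) full rank.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

# Solve the normal equations A^T A x = A^T y directly.
x_ls = np.linalg.solve(A.T @ A, A.T @ y)

# Cross-check against NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

# The gradient 2 A^T A x - 2 A^T y should vanish at x_ls.
grad = 2 * A.T @ A @ x_ls - 2 * A.T @ y
```

Forming $A^T A$ explicitly is fine for a small, well-conditioned example like this one; the QR-based approach later in the lecture avoids forming it.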

• $x_{\mathrm{ls}}$ is a linear function of $y$
• $x_{\mathrm{ls}} = A^{-1} y$ if $A$ is square
• $x_{\mathrm{ls}}$ solves $y = A x_{\mathrm{ls}}$ if $y \in \mathcal{R}(A)$
• $A^\dagger = (A^T A)^{-1} A^T$ is called the pseudo-inverse of $A$
• $A^\dagger$ is a left inverse of (full rank, skinny) $A$:
  $$A^\dagger A = (A^T A)^{-1} A^T A = I$$
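The left-inverse property can be checked numerically. The sketch below, which assumes a random 5x2 matrix as a stand-in for a full rank, skinny $A$, forms $(A^T A)^{-1} A^T$ and confirms that it agrees with `np.linalg.pinv`, that multiplying $A$ on the left gives the identity, and that multiplying on the right does not.

```python
import numpy as np

# Hypothetical full rank, skinny matrix (5 equations, 2 unknowns).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))

# Pseudo-inverse from the formula on the slide.
A_pinv = np.linalg.inv(A.T @ A) @ A.T

left = A_pinv @ A    # 2x2, should equal the identity
right = A @ A_pinv   # 5x5, a rank-2 matrix, so it cannot equal the identity
```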

Projection on $\mathcal{R}(A)$

• $Ax_{\mathrm{ls}}$ is (by definition) the point in $\mathcal{R}(A)$ that is closest to $y$, i.e., it is the projection of $y$ onto $\mathcal{R}(A)$:
  $$Ax_{\mathrm{ls}} = \mathcal{P}_{\mathcal{R}(A)}(y)$$
• the projection function $\mathcal{P}_{\mathcal{R}(A)}$ is linear, and given by
  $$\mathcal{P}_{\mathcal{R}(A)}(y) = Ax_{\mathrm{ls}} = A (A^T A)^{-1} A^T y$$
• $A (A^T A)^{-1} A^T$ is called the projection matrix (associated with $\mathcal{R}(A)$)
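A small sketch of the projection matrix, under the assumption of random test data: it builds $A(A^TA)^{-1}A^T$, checks the standard properties of an orthogonal projection (idempotent and symmetric), and compares the distance from $y$ to its projection with the distance to many other points $Az$.

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

# Projection matrix associated with the range of A.
P = A @ np.linalg.inv(A.T @ A) @ A.T

x_ls = np.linalg.solve(A.T @ A, A.T @ y)
proj = P @ y                      # should coincide with A @ x_ls

# Distance from y to its projection, and to 1000 other points A z.
dist_proj = np.linalg.norm(proj - y)
dist_others = [np.linalg.norm(A @ rng.standard_normal(3) - y)
               for _ in range(1000)]
```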

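Orthogonality principle

optimal residual
$$r = Ax_{\mathrm{ls}} - y = \left( A (A^T A)^{-1} A^T - I \right) y$$
is orthogonal to $\mathcal{R}(A)$:
$$\langle r, Az \rangle = y^T \left( A (A^T A)^{-1} A^T - I \right)^T A z = 0$$
for all $z \in \mathbf{R}^n$

[Figure: $y$, its projection $Ax_{\mathrm{ls}}$ onto $\mathcal{R}(A)$, and the residual $r$]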
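The orthogonality principle can be verified numerically. This sketch (random data and names are assumptions) checks that the optimal residual has zero inner product with every column of $A$, and that the Pythagorean identity $\|Az - y\|^2 = \|A(z - x_{\mathrm{ls}})\|^2 + \|r\|^2$, which follows from that orthogonality, holds for an arbitrary $z$.

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

x_ls = np.linalg.solve(A.T @ A, A.T @ y)
r = A @ x_ls - y                  # optimal residual

# Inner products of r with each column of A (all should be zero).
inner = A.T @ r

# Pythagorean identity for an arbitrary z.
z = rng.standard_normal(3)
lhs = np.linalg.norm(A @ z - y) ** 2
rhs = np.linalg.norm(A @ (z - x_ls)) ** 2 + np.linalg.norm(r) ** 2
```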

Least-squares via QR factorization

• $A \in \mathbf{R}^{m \times n}$ skinny, full rank
• factor as $A = QR$ with $Q^T Q = I_n$, $R \in \mathbf{R}^{n \times n}$ upper triangular, invertible
• pseudo-inverse is
  $$(A^T A)^{-1} A^T = (R^T Q^T Q R)^{-1} R^T Q^T = R^{-1} Q^T$$
  so $x_{\mathrm{ls}} = R^{-1} Q^T y$
• projection on $\mathcal{R}(A)$ given by matrix
  $$A (A^T A)^{-1} A^T = A R^{-1} Q^T = Q Q^T$$
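A minimal sketch of the QR route, assuming random test data: `np.linalg.qr` in its default reduced mode returns $Q$ with orthonormal columns and a square upper triangular $R$, from which $x_{\mathrm{ls}} = R^{-1} Q^T y$. The example uses a general solver on $R$ for brevity; a dedicated triangular solver would exploit the structure.

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

# Reduced QR: Q is 6x3 with Q^T Q = I, R is 3x3 upper triangular.
Q, R = np.linalg.qr(A)

# x_ls = R^{-1} Q^T y, computed by solving R x = Q^T y.
x_qr = np.linalg.solve(R, Q.T @ y)

# Reference values for comparison.
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix from the formula
```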

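Least-squares via full QR factorization

• full QR factorization:
  $$A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$
  with $\begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \in \mathbf{R}^{m \times m}$ orthogonal, $R_1 \in \mathbf{R}^{n \times n}$ upper triangular, invertible
• multiplication by an orthogonal matrix doesn't change the norm, so
  $$\|Ax - y\|^2 = \left\| \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - y \right\|^2 = \left\| \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T y \right\|^2$$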

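$$\|Ax - y\|^2 = \left\| \begin{bmatrix} R_1 x - Q_1^T y \\ -Q_2^T y \end{bmatrix} \right\|^2 = \|R_1 x - Q_1^T y\|^2 + \|Q_2^T y\|^2$$

• this is evidently minimized by the choice $x_{\mathrm{ls}} = R_1^{-1} Q_1^T y$ (which makes the first term zero)
• residual with optimal $x$ is
  $$Ax_{\mathrm{ls}} - y = -Q_2 Q_2^T y$$
• $Q_1 Q_1^T$ gives projection onto $\mathcal{R}(A)$
• $Q_2 Q_2^T$ gives projection onto $\mathcal{R}(A)^\perp$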
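The full QR argument can be reproduced with `np.linalg.qr(A, mode="complete")`, which returns a square orthogonal factor. The sketch below (random data and the names `Q1`, `Q2`, `R1` are assumptions chosen to mirror the notation) splits that factor and checks the expressions for $x_{\mathrm{ls}}$, the residual, and the residual norm.

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(5)
m, n = 6, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Full QR: Q_full is m x m orthogonal, R_full is m x n with zeros below row n.
Q_full, R_full = np.linalg.qr(A, mode="complete")
Q1, Q2 = Q_full[:, :n], Q_full[:, n:]
R1 = R_full[:n, :]

# Minimizer makes the first term ||R1 x - Q1^T y||^2 zero.
x_ls = np.linalg.solve(R1, Q1.T @ y)

residual = A @ x_ls - y
residual_formula = -Q2 @ (Q2.T @ y)     # -Q2 Q2^T y
residual_norm = np.linalg.norm(Q2.T @ y)
```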

Least-squares estimation

many applications in inversion, estimation, and reconstruction problems have the form
$$y = Ax + v$$

• $x$ is what we want to estimate or reconstruct
• $y$ is our sensor measurement(s)
• $v$ is an unknown noise or measurement error (assumed small)
• $i$th row of $A$ characterizes the $i$th sensor

least-squares estimation: choose as estimate $\hat{x}$ that minimizes
$$\|A\hat{x} - y\|$$
i.e., the deviation between

• what we actually observed ($y$), and
• what we would observe if $x = \hat{x}$, and there were no noise ($v = 0$)

least-squares estimate is just $\hat{x} = (A^T A)^{-1} A^T y$

BLUE property

linear measurement with noise:
$$y = Ax + v$$
with $A$ full rank, skinny

consider a linear estimator of the form $\hat{x} = By$

• called unbiased if $\hat{x} = x$ whenever $v = 0$ (i.e., no estimation error when there is no noise)
• same as $BA = I$, i.e., $B$ is a left inverse of $A$

• estimation error of an unbiased linear estimator is
  $$x - \hat{x} = x - B(Ax + v) = -Bv$$
  obviously, then, we'd like $B$ 'small' (and $BA = I$)
• fact: $A^\dagger = (A^T A)^{-1} A^T$ is the smallest left inverse of $A$, in the following sense: for any $B$ with $BA = I$, we have
  $$\sum_{i,j} B_{ij}^2 \geq \sum_{i,j} \left( A^\dagger_{ij} \right)^2$$
  i.e., least-squares provides the best linear unbiased estimator (BLUE)
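The "smallest left inverse" fact can be illustrated numerically. Any left inverse can be written as $B = A^\dagger + Z$ with $ZA = 0$; the sketch below builds one such $Z$ as $W(I - AA^\dagger)$ for an arbitrary $W$ (a construction chosen for this example, not taken from the lecture) and compares the sums of squared entries. Because $Z (A^\dagger)^T = 0$, the cross terms vanish and the sum for $B$ splits exactly into the sum for $A^\dagger$ plus the sum for $Z$.

```python
import numpy as np

# Hypothetical full rank, skinny A.
rng = np.random.default_rng(6)
m, n = 6, 3
A = rng.standard_normal((m, n))
A_pinv = np.linalg.pinv(A)

# Build another left inverse: Z A = 0 because (I - A A_pinv) A = 0.
W = rng.standard_normal((n, m))
Z = W @ (np.eye(m) - A @ A_pinv)
B = A_pinv + Z

# Sums of squared entries (squared Frobenius norms).
size_B = np.sum(B ** 2)
size_pinv = np.sum(A_pinv ** 2)
size_Z = np.sum(Z ** 2)
```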

Navigation from range measurements

navigation using range measurements from distant beacons

[Figure: unknown position $x$ and unit vectors $k_1, k_2, k_3, k_4$ pointing toward the beacons]

beacons far from unknown position $x \in \mathbf{R}^2$, so linearization around $x = 0$ (say) nearly exact

• ranges $y \in \mathbf{R}^4$ measured, with measurement noise $v$:
  $$y = -\begin{bmatrix} k_1^T \\ k_2^T \\ k_3^T \\ k_4^T \end{bmatrix} x + v$$
  where $k_i$ is the unit vector from $0$ to beacon $i$
• measurement errors are independent, Gaussian, with standard deviation 2 (details not important)
• problem: estimate $x \in \mathbf{R}^2$, given $y \in \mathbf{R}^4$ (roughly speaking, a 2:1 measurement redundancy ratio)
• actual position is $x = (5.59, 10.58)$; measurement is $y = (-11.95, -2.84, -9.81, 2.81)$

Just enough measurements method

• $y_1$ and $y_2$ suffice to find $x$ (when $v = 0$)
• compute estimate $\hat{x}$ by inverting the top ($2 \times 2$) half of $A$:
  $$\hat{x} = B_{\mathrm{je}}\, y = \begin{bmatrix} 0 & -1.0 & 0 & 0 \\ -1.12 & 0.5 & 0 & 0 \end{bmatrix} y = \begin{bmatrix} 2.84 \\ 11.9 \end{bmatrix}$$
  (norm of error: 3.07)

Least-squares method

• compute estimate $\hat{x}$ by least-squares:
  $$\hat{x} = A^\dagger y = \begin{bmatrix} -0.23 & -0.48 & 0.04 & 0.44 \\ -0.47 & -0.02 & -0.51 & -0.18 \end{bmatrix} y = \begin{bmatrix} 4.95 \\ 10.26 \end{bmatrix}$$
  (norm of error: 0.72)
• $B_{\mathrm{je}}$ and $A^\dagger$ are both left inverses of $A$
• larger entries in $B$ lead to larger estimation error
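The comparison between the two estimators can be rechecked from the numbers printed on the slides. The sketch below recomputes the error norms from the rounded estimates, so the results agree with the quoted 3.07 and 0.72 only up to rounding. It also compares the sums of squared entries of the two left inverses, using entry magnitudes only since squaring makes the signs irrelevant. Variable names are chosen for this example.

```python
import numpy as np

# Values as printed on the slides (rounded).
x_true = np.array([5.59, 10.58])
x_je = np.array([2.84, 11.9])     # just-enough-measurements estimate
x_ls = np.array([4.95, 10.26])    # least-squares estimate

err_je = np.linalg.norm(x_true - x_je)
err_ls = np.linalg.norm(x_true - x_ls)

# Entry magnitudes of the two left inverses (signs do not affect the squares).
B_je_abs = np.array([[0.0, 1.0, 0.0, 0.0],
                     [1.12, 0.5, 0.0, 0.0]])
A_pinv_abs = np.array([[0.23, 0.48, 0.04, 0.44],
                       [0.47, 0.02, 0.51, 0.18]])

sum_sq_je = np.sum(B_je_abs ** 2)
sum_sq_ls = np.sum(A_pinv_abs ** 2)
```

The least-squares left inverse has the smaller sum of squared entries, consistent with the BLUE property and with its smaller estimation error here.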

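Example from overview lecture

[Figure: block diagram, $u \rightarrow H(s) \rightarrow w \rightarrow$ A/D $\rightarrow y$]

• signal $u$ is piecewise constant, period 1 sec, $0 \leq t \leq 10$:
  $$u(t) = x_j, \quad j - 1 \leq t < j, \quad j = 1, \ldots, 10$$
• filtered by system with impulse response $h(t)$:
  $$w(t) = \int_0^t h(t - \tau)\, u(\tau)\, d\tau$$
• sample at 10 Hz: $\tilde{y}_i = w(0.1 i)$, $i = 1, \ldots, 100$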

• 3-bit quantization: $y_i = Q(\tilde{y}_i)$, $i = 1, \ldots, 100$, where $Q$ is the 3-bit quantizer characteristic
  $$Q(a) = (1/4)\left( \mathrm{round}(4a + 1/2) - 1/2 \right)$$
• problem: estimate $x \in \mathbf{R}^{10}$ given $y \in \mathbf{R}^{100}$

example:

[Figure: plots of $s(t)$, $u(t)$, $w(t)$, and $y(t)$ versus $t$, for $0 \leq t \leq 10$]
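A minimal sketch of the quantizer characteristic $Q(a)$. It assumes that `round` rounds halves up, implemented as `floor(z + 0.5)`; NumPy's own `np.round` rounds halves to the nearest even integer, which differs only at exact ties. The function name `quantize` and the test grid are choices made for this example, and the function does not clip inputs outside $[-1, 1)$.

```python
import numpy as np

def quantize(a):
    """Q(a) = (1/4)(round(4a + 1/2) - 1/2), with halves rounded up (assumed)."""
    a = np.asarray(a, dtype=float)
    # round(z) with halves up is floor(z + 0.5); here z = 4a + 0.5.
    return 0.25 * (np.floor(4.0 * a + 1.0) - 0.5)

# Evaluate on a grid covering [-1, 1).
samples = np.linspace(-1.0, 1.0, 2000, endpoint=False)
q = quantize(samples)

levels = np.unique(q)                    # distinct output levels
max_err = np.max(np.abs(q - samples))    # worst-case quantization error
```

On $[-1, 1)$ the output takes eight levels, from $-0.875$ to $0.875$ in steps of $0.25$, which is what a 3-bit quantizer provides, and the error never exceeds half a step.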

we have $y = Ax + v$, where

• $A \in \mathbf{R}^{100 \times 10}$ is given by
  $$A_{ij} = \int_{j-1}^{j} h(0.1 i - \tau)\, d\tau$$
• $v \in \mathbf{R}^{100}$ is the quantization error: $v_i = Q(\tilde{y}_i) - \tilde{y}_i$ (so $|v_i| \leq 0.125$)

least-squares estimate: $x_{\mathrm{ls}} = (A^T A)^{-1} A^T y$

[Figure: $u(t)$ (solid) and $\hat{u}(t)$ (dotted) versus $t$, for $0 \leq t \leq 10$]
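The construction of $A$ and the resulting estimate can be sketched end to end. The impulse response used in the lecture is not given in this section, so the code below assumes a hypothetical first-order response $h(t) = e^{-t}$ for $t \geq 0$, for which the integral defining $A_{ij}$ has a closed form. The input levels `x` are also made up, so the RMS error it produces is not comparable to the lecture's figures.

```python
import numpy as np

def build_A(m=100, n=10, dt=0.1):
    """A[i-1, j-1] = integral over [j-1, j] of h(dt*i - tau) d tau,
    for the ASSUMED impulse response h(t) = exp(-t), t >= 0."""
    A = np.zeros((m, n))
    for i in range(1, m + 1):
        t = dt * i
        for j in range(1, n + 1):
            lo, hi = j - 1, min(j, t)    # causal h: only tau <= t contributes
            if hi > lo:
                # closed form of the integral of exp(-(t - tau)) from lo to hi
                A[i - 1, j - 1] = np.exp(-(t - hi)) - np.exp(-(t - lo))
    return A

def quantize(a):
    # Q(a) = (1/4)(round(4a + 1/2) - 1/2), halves rounded up (assumed)
    return 0.25 * (np.floor(4.0 * np.asarray(a, dtype=float) + 1.0) - 0.5)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10)           # hypothetical input levels x_1..x_10

A = build_A()
y_tilde = A @ x                          # unquantized samples
y = quantize(y_tilde)                    # 3-bit quantized measurements
v = y - y_tilde                          # quantization error

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
rms_error = np.linalg.norm(x - x_ls) / np.sqrt(10)

# With no quantization (v = 0) the estimate recovers x exactly.
x_clean, *_ = np.linalg.lstsq(A, y_tilde, rcond=None)
```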

RMS error is
$$\frac{\|x - x_{\mathrm{ls}}\|}{\sqrt{10}} = 0.03$$

better than if we had no filtering! (RMS error 0.07)

more on this later...

some rows of $B_{\mathrm{ls}} = (A^T A)^{-1} A^T$:

[Figure: rows 2, 5, and 8 of $B_{\mathrm{ls}}$ plotted versus $t$, for $0 \leq t \leq 10$]

• rows show how sampled measurements of $y$ are used to form the estimate of $x_i$ for $i = 2, 5, 8$
• to estimate $x_5$, which is the original input signal for $4 \leq t < 5$, we mostly use $y(t)$ for $3 \leq t \leq 7$