Linear Algebra Methods for Data Mining
Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi
Spring 2007, University of Helsinki

Lecture 3: QR, least squares, linear regression
QR decomposition

Any matrix A ∈ R^(m×n), m ≥ n, can be transformed to upper triangular form by an orthogonal matrix:

    A = Q ( R )
          ( 0 )

If the columns of A are linearly independent, then R is non-singular.
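This factorization is easy to reproduce numerically. A minimal sketch (in NumPy rather than the MATLAB used later in these slides), using a hypothetical 4×3 matrix:

```python
import numpy as np

# Hypothetical 4x3 matrix with linearly independent columns
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 9.0],
              [1.0, 4.0, 16.0]])

# mode='complete' returns the full m-by-m orthogonal Q and the m-by-n
# upper triangular factor, i.e. R stacked on the zero block
Q, R = np.linalg.qr(A, mode='complete')

assert np.allclose(Q.T @ Q, np.eye(4))   # Q is orthogonal
assert np.allclose(Q @ R, A)             # A = Q [R; 0]
assert np.allclose(R[3:, :], 0.0)        # rows below the n-th are zero
```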
How? By using Givens rotations: let x be a vector. The parameters c and s, c² + s² = 1, can be chosen so that multiplication of x by

    G = ( 1  0  0  0 )
        ( 0  c  0  s )
        ( 0  0  1  0 )
        ( 0 -s  0  c )

zeros element 4 of x by a rotation in the plane (2,4). How?
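The choice of c and s can be sketched as follows; this is a NumPy illustration of the slide's G, with the plane (2,4) written in 0-based indices:

```python
import numpy as np

def givens(xi, xj):
    """Return c, s with c^2 + s^2 = 1 such that the rotation
    [[c, s], [-s, c]] maps (xi, xj) to (r, 0)."""
    r = np.hypot(xi, xj)
    return (1.0, 0.0) if r == 0 else (xi / r, xj / r)

x = np.array([1.0, 2.0, 3.0, 4.0])
c, s = givens(x[1], x[3])          # plane (2,4) in the slide's 1-based numbering

G = np.eye(4)
G[1, 1], G[1, 3] = c, s
G[3, 1], G[3, 3] = -s, c

y = G @ x
assert abs(y[3]) < 1e-12                                  # element 4 is zeroed
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))   # rotations preserve length
```

With c = x₂/r and s = x₄/r, r = √(x₂² + x₄²), the rotated element 4 becomes −s·x₂ + c·x₄ = 0.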
[Figure: rotation of the vector x through the angle θ to Gx.]
Skinny QR decomposition

Partition Q = (Q1 Q2), where Q1 ∈ R^(m×n):

    A = (Q1 Q2) ( R1 ) = Q1 R1.
                ( 0  )
A =
     1     1     1
     1     2     4
     1     3     9
     1     4    16

[Q,R] = qr(A)   % the QR decomposition

Q =
   -0.5000    0.6708    0.5000    0.2236
   -0.5000    0.2236   -0.5000   -0.6708
   -0.5000   -0.2236   -0.5000    0.6708
   -0.5000   -0.6708    0.5000   -0.2236

R =
   -2.0000   -5.0000  -15.0000
         0   -2.2361  -11.1803
         0         0    2.0000
         0         0         0
[Q,R] = qr(A,0)   % the skinny version of QR

Q =
   -0.5000    0.6708    0.5000
   -0.5000    0.2236   -0.5000
   -0.5000   -0.2236   -0.5000
   -0.5000   -0.6708    0.5000

R =
   -2.0000   -5.0000  -15.0000
         0   -2.2361  -11.1803
         0         0    2.0000
Uniqueness?

Suppose A ∈ R^(m×n) has full column rank (i.e. its column vectors are linearly independent). Then the skinny QR factorization A = Q1 R1, where Q1 has orthonormal columns and R1 is upper triangular with positive diagonal entries, is unique.
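Library routines fix the diagonal signs arbitrarily (the MATLAB output above has a negative diagonal); the unique factorization with a positive diagonal can be recovered by flipping signs. A NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0]])
Q1, R1 = np.linalg.qr(A)          # thin QR; diagonal signs are implementation-defined

# Flip signs of matching columns of Q1 and rows of R1 so that R1 has a
# positive diagonal -- this is the unique skinny factorization
D = np.sign(np.diag(R1))
Q1, R1 = Q1 * D, D[:, None] * R1

assert np.all(np.diag(R1) > 0)
assert np.allclose(Q1 @ R1, A)               # still a factorization of A
assert np.allclose(Q1.T @ Q1, np.eye(2))     # columns still orthonormal
```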
What is QR good for?

- It will come up when we discuss the computation of some matrix decompositions.
- It can also be used for solving the least squares problem.
- Other applications exist.
Least squares for linear regression

Consider problems of the following form: given a number of measurements X = [x1 ... xn] and an outcome y, build a linear model

    ŷ = b0 + Σ_{j=1}^n x_j b_j,

which uses the measurements X to predict the outcome y.

- X = clinical measures of patients, y = level of cancer-specific antigen.
- X = atmospheric measurements of each day, y = occurrence of spontaneous particle formation.
Least squares for linear regression

The model ŷ = b0 + Σ_{j=1}^n x_j b_j can be written in the form

    ŷ = X^T b,   b = (b1 ... bn b0)^T.

To do this, one must append X = [x1 ... xn x_{n+1}], where x_{n+1} is a vector of ones.

Fitting a linear model to data is usually done using the method of least squares.

Note: X need not be a square matrix!
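The appending of a vector of ones can be sketched in NumPy (rows as observations, as in the MATLAB session later in these slides; the data values and coefficients below are hypothetical):

```python
import numpy as np

# Hypothetical data matrix: one row per observation, columns x_1, x_2
X = np.array([[0.3, 1.2],
              [0.7, 0.9],
              [1.1, 0.4]])

# Append a column of ones so the intercept b_0 becomes an ordinary coefficient
Xap = np.column_stack([X, np.ones(X.shape[0])])

b = np.array([2.0, -1.0, 0.5])    # hypothetical coefficients (b_1, b_2, b_0)
yhat = Xap @ b                    # each entry is b_0 + sum_j x_j b_j

assert np.allclose(yhat, X @ b[:2] + b[2])
```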
Example

We have measurement data:

    x   1      2      3      4      5
    y   7.97   10.2   14.2   16.0   21.2
[Figure: the measurement data (x, y) plotted as points.]
Example (continued)

    x   1      2      3      4      5
    y   7.97   10.2   14.2   16.0   21.2

We wish to find α and β such that αx + β = y. Thus

     α + β = 7.97
    2α + β = 10.2
    3α + β = 14.2
    4α + β = 16.0
    5α + β = 21.2
Example (continued)

In matrix form:

    ( 1  1 )            ( 7.97 )
    ( 2  1 )  ( α )     ( 10.2 )
    ( 3  1 )  ( β )  =  ( 14.2 )
    ( 4  1 )            ( 16.0 )
    ( 5  1 )            ( 21.2 )

Overdetermined! (More equations than unknowns.) Solve using the least squares method.
Example

X = atmospheric measurements for each day: temperature, UV-A:

    X = ( t1  t2  ...  tn )  =  ( temperatures      )
        ( r1  r2  ...  rn )     ( UV-A measurements )

y = which month each day belongs to. The choices are August (y = 0) or September (y = 1).

We look for b such that ŷ = X^T b and ŷ ≈ y. Use the method of least squares.
[Figure: August days (b) and September days (r), UV-A vs. temperature T.]
The least squares problem

Let A ∈ R^(m×n), m > n. The system Ax = b is called overdetermined: more equations than unknowns. Usually such a system has no solution.
Example: m = 3, n = 2.
What to do?

Make the residual vector r = b − Ax as small as possible. But how? Make r orthogonal to the columns of A:

    r^T ( a1  a2  ...  an ) = r^T A = 0.

Now substitute r = b − Ax to get the normal equations

    A^T A x = A^T b;

solve for x.
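A NumPy sketch of the normal-equations route, using the example data from these slides:

```python
import numpy as np

# Data from the slides: x = 1..5 with an intercept column, y as measured
A = np.column_stack([np.arange(1.0, 6.0), np.ones(5)])
b = np.array([7.97, 10.2, 14.2, 16.0, 21.2])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual r = b - Ax is orthogonal to the columns of A
r = b - A @ x
assert np.allclose(A.T @ r, 0.0)

print(x)   # approximately [3.2260, 4.2360], matching the MATLAB session below
```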
Example

A = [1 1; 2 1; 3 1; 4 1; 5 1];
b = [7.97; 10.2; 14.2; 16.0; 21.2];

C = A'*A     % normal equations
C =
    55    15
    15     5

x = C\(A'*b)
x =
    3.2260
    4.2360
If the column vectors of A are linearly independent, then the normal equations A^T A x = A^T b are non-singular and have a unique solution.

BUT: the normal equations have two significant drawbacks:

- forming A^T A leads to loss of information;
- the condition number of A^T A is the square of that of A: κ(A^T A) = (κ(A))².
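The squaring of the condition number is easy to verify numerically; a NumPy sketch with the example matrix from these slides:

```python
import numpy as np

A = np.column_stack([np.arange(1.0, 6.0), np.ones(5)])

kA = np.linalg.cond(A)           # 2-norm condition number of A
kATA = np.linalg.cond(A.T @ A)   # condition number of the normal-equations matrix

# kappa(A^T A) = kappa(A)^2
assert np.isclose(kATA, kA**2, rtol=1e-6)

print(kA, kATA)   # about 8.3657 and 69.9857, as on the next slide
```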
Example

A = [1 1; 2 1; 3 1; 4 1; 5 1]

cond(A)    = 8.3657
cond(A'*A) = 69.9857
Worse example

A = [101 1; 102 1; 103 1; 104 1; 105 1]

cond(A)    = 7.5038e+03
cond(A'*A) = 5.6307e+07
Solving the least squares problem using QR

Since Q is orthogonal, multiplication by Q^T preserves norms:

    ‖r‖² = ‖b − Ax‖² = ‖b − Q [R; 0] x‖² = ‖Q (Q^T b − [R; 0] x)‖² = ‖Q^T b − [R; 0] x‖².

Partition Q = (Q1 Q2), where Q1 ∈ R^(m×n), and denote

    Q^T b = ( b1 ) := ( Q1^T b )
            ( b2 )    ( Q2^T b ).
Then

    ‖r‖² = ‖ ( b1 ) − ( R ) x ‖²  =  ‖b1 − Rx‖² + ‖b2‖².
             ( b2 )   ( 0 )

Minimize ‖r‖ by making the first term equal to zero, i.e. solve Rx = b1.
LS by QR

Theorem. Let A ∈ R^(m×n) have full column rank and a thin QR decomposition A = Q1 R. Then the least squares problem min_x ‖Ax − b‖₂ has the unique solution

    x = R⁻¹ Q1^T b.
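A NumPy sketch of the theorem, on the example data used earlier; the result agrees with the built-in least squares solver:

```python
import numpy as np

A = np.column_stack([np.arange(1.0, 6.0), np.ones(5)])
b = np.array([7.97, 10.2, 14.2, 16.0, 21.2])

Q1, R = np.linalg.qr(A)            # thin QR
x = np.linalg.solve(R, Q1.T @ b)   # solve R x = Q1^T b (never form R^-1 explicitly)

# cross-check against NumPy's own least squares routine
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)

print(x)   # approximately [3.2260, 4.2360]
```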
Example

A = [1 1; 2 1; 3 1; 4 1; 5 1];
b = [7.97; 10.2; 14.2; 16.0; 21.2];

[Q1,R] = qr(A,0)   % skinny QR
Q1 =
   -0.1348   -0.7628
   -0.2697   -0.4767
   -0.4045   -0.1907
   -0.5394    0.0953
   -0.6742    0.3814
R =
   -7.4162   -2.0226
         0   -0.9535

x = R\(Q1'*b)
x =
    3.2260
    4.2360
Back to this example:

X = atmospheric measurements for each day: temperature, UV-A:

    X = ( t1  t2  ...  tn )  =  ( temperatures      )
        ( r1  r2  ...  rn )     ( UV-A measurements )

y = which month each day belongs to. The choices are August (y = 0) or September (y = 1).

We look for b such that ŷ = X^T b and ŷ ≈ y. Use the method of least squares.
[Figure: August days (b) and September days (r), UV-A vs. temperature T.]
Example

%X (transpose of) data matrix, 1st col: temperature, 2nd col: UV-A
%y month vector, y(j)=0 if August and 1 if September

[length(find(y==0)) length(find(y==1))]
  = 116 99    %number of August and September days
size(X)
  = 215 2     %size of data matrix

Xap = [X ones(215,1)];
[Q1,R] = qr(Xap,0);
b = R\(Q1'*y)   %this is the same as b=inv(R)*(Q1'*y)
b =
   -0.4355
   -0.2923
    0.4605
yhat=xap*b; %least squares solution [min(yhat) max(yhat)] = -0.3506 1.2436 alpha=0.5; %day classified as September if x y>alpha length(find(1.0*(yhat>alpha)-y)) = 34 %misclassifications x=-1:0.01:1;db=(alpha-b(3)-b(1)*x)/b(2);%decision boundary x b=alpha %attn: in reality, choose alpha with more care! scatter(xnorm(:,1),xnorm(:,2),20,y, filled ) hold;plot(x,db, k ) Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki 32
[Figure: August days (b) and September days (r), UV-A vs. temperature T, with the decision boundary.]
Updating the solution of the LS problem

Assume we have reduced the matrix and the right hand side:

    (A  b)  →  Q^T (A  b) = ( R  Q1^T b )
                            ( 0  Q2^T b ).

From this the solution of the LS problem is readily available. Assume we have not saved Q. We then get a new observation (a^T  b), a ∈ R^n, b ∈ R. Do we have to recompute the whole solution?
Updating the solution of the LS problem

No. Instead, write the reduction on the previous slide as

    ( A    b )      ( Q^T  0 ) ( A    b )     ( R    Q1^T b )
    ( a^T  b )  →   ( 0    1 ) ( a^T  b )  =  ( 0    Q2^T b )
                                              ( a^T  b      )

and reduce this to triangular form using plane rotations.
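The update can be sketched in NumPy; `update_ls` below is a hypothetical helper that appends one observation to the triangular system and restores triangular form with n plane rotations, checked against refactorizing from scratch:

```python
import numpy as np

def givens(xi, xj):
    """c, s with c^2 + s^2 = 1 so that [[c, s], [-s, c]] maps (xi, xj) to (r, 0)."""
    r = np.hypot(xi, xj)
    return (1.0, 0.0) if r == 0 else (xi / r, xj / r)

def update_ls(R, qtb, a, beta):
    """Append one observation (a, beta) to the reduced LS system.
    R is n-by-n upper triangular, qtb = Q1^T b.  Zeros the new row with
    n plane rotations instead of recomputing the whole factorization."""
    n = R.shape[0]
    R = np.vstack([R, a.astype(float)])     # new row a^T below R
    z = np.append(qtb, beta)                # new element below Q1^T b
    for k in range(n):
        c, s = givens(R[k, k], R[n, k])     # rotate rows k and n
        G = np.array([[c, s], [-s, c]])
        R[[k, n], :] = G @ R[[k, n], :]     # zeros R[n, k], keeps triangularity
        z[[k, n]] = G @ z[[k, n]]
    return R[:n, :], z[:n]

# Example data from the slides, plus a hypothetical new observation
A = np.column_stack([np.arange(1.0, 6.0), np.ones(5)])
b = np.array([7.97, 10.2, 14.2, 16.0, 21.2])
Q1, R = np.linalg.qr(A)

R2, z2 = update_ls(R, Q1.T @ b, np.array([6.0, 1.0]), 24.0)
x = np.linalg.solve(R2, z2)

# must match solving the augmented system from scratch
x_ref, *_ = np.linalg.lstsq(np.vstack([A, [6.0, 1.0]]), np.append(b, 24.0), rcond=None)
assert np.allclose(x, x_ref)
```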