Linear Algebra Methods for Data Mining


1 Linear Algebra Methods for Data Mining Saara Hyvönen, Spring 2007 The Singular Value Decomposition (SVD)

2 Matrix decompositions We wish to decompose the matrix A by writing it as a product of two or more matrices: A_{m×n} = B_{m×k} C_{k×n}, or A_{m×n} = B_{m×k} C_{k×r} D_{r×n}. This is done in such a way that the right-hand side of the equation yields some useful information or insight into the nature of the data matrix A, or is in other ways useful for solving the problem at hand.

3 Example (SVD): a customer-by-day data matrix. Rows (customers): ABC Inc, CDE Co, FGH Ltd, NOP Inc, Smith, Brown, Johnson. Columns (days): Wed, Thu, Fri, Sat, Sun. (The numerical entries of the matrix did not survive this transcription.)

4 (The SVD of the example matrix written out numerically as a product of factors; the entries did not survive this transcription.)

5 The Singular Value Decomposition Any m × n matrix A, with m ≥ n, can be factorized as A = U [Σ; 0] V^T, where U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal, and Σ ∈ R^{n×n} is diagonal: Σ = diag(σ_1, σ_2, ..., σ_n), σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0. (The zero block below Σ has size (m−n) × n.) Skinny version: A = U_1 Σ V^T, U_1 ∈ R^{m×n}.
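The full and skinny versions can be compared directly in MATLAB/Octave. This is an illustrative sketch, not part of the original slides; the small matrix below is made up for the example.

```matlab
% Compare the full SVD with the "skinny" (economy-size) SVD
% on an assumed example matrix.
A = [1 2; 3 4; 5 6];          % m = 3, n = 2, so m >= n

[U, S, V]    = svd(A);        % full SVD: U is 3x3, S is 3x2, V is 2x2
[U1, S1, V1] = svd(A, 0);     % skinny SVD: U1 is 3x2, S1 is 2x2

norm(A - U*S*V')              % both reproduce A up to rounding
norm(A - U1*S1*V1')
norm(U'*U - eye(3))           % U has orthonormal columns
```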

6 Singular values and singular vectors The diagonal elements σ_j of Σ are the singular values of the matrix A. The columns of U and V are the left singular vectors and right singular vectors, respectively. Equivalent form of the SVD: A v_j = σ_j u_j, A^T u_j = σ_j v_j.
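A minimal numerical check of these two relations, using an assumed random test matrix rather than anything from the lecture:

```matlab
% Verify A*v_j = sigma_j*u_j and A'*u_j = sigma_j*v_j for each j.
A = randn(5, 3);
[U, S, V] = svd(A, 0);
for j = 1:size(S, 1)
    sigma = S(j, j);
    norm(A*V(:, j)  - sigma*U(:, j))    % should be ~0
    norm(A'*U(:, j) - sigma*V(:, j))    % should be ~0
end
```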

7 Outer product form: Start from the skinny version of the SVD: A = U_1 Σ V^T = (u_1 u_2 ... u_n) diag(σ_1, σ_2, ..., σ_n) [v_1^T; v_2^T; ...; v_n^T] = (u_1 u_2 ... u_n) [σ_1 v_1^T; σ_2 v_2^T; ...; σ_n v_n^T]

8 to get the outer product form A = ∑_{i=1}^{n} σ_i u_i v_i^T. This is a sum of rank-one matrices! (Each term σ_i u_i v_i^T in the sum is a rank-one matrix.)
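A short sketch (assumed random matrix, not the lecture's example) that rebuilds A term by term from the rank-one matrices σ_i u_i v_i^T:

```matlab
% Reconstruct A as a sum of rank-one matrices sigma_i * u_i * v_i'.
A = randn(6, 4);
[U, S, V] = svd(A, 0);
Asum = zeros(size(A));
for i = 1:size(S, 1)
    Asum = Asum + S(i, i) * U(:, i) * V(:, i)';   % rank-one update
end
norm(A - Asum)    % ~0: the outer product form reproduces A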

9 Example: a matrix A is entered in MATLAB and its full SVD is computed with [U,S,V]=svd(A). (The numerical values of A, U, S and V shown on the slide did not survive this transcription.)

10 The economy-size SVD of the same matrix is computed with [U,S,V]=svd(A,0). (The numerical output did not survive this transcription.)

11 Rank deficient case: A is not of full rank, i.e. the columns of A are not linearly independent. The third column is replaced by a linear combination of the first two, A(:,3)=A(:,1)+A(:,2)*0.5, and [U,S,V]=svd(A,0) is computed. (The numerical values of A and U did not survive this transcription.)

12 (The matrices S and V for the rank-deficient example; the numerical values did not survive this transcription. Since rank(A) = 2, the third singular value is zero up to rounding.)
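The rank-deficient case can be reproduced with an assumed matrix (the lecture's actual numbers are not available in this transcription):

```matlab
% Rank-deficient sketch: the third column is a combination of the first two.
A = [1 2 0; 3 4 0; 5 6 0; 7 8 0];
A(:,3) = A(:,1) + A(:,2)*0.5;
[U, S, V] = svd(A, 0);
diag(S)     % the third singular value is (numerically) zero
rank(A)     % = 2
```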

13 Matrix approximation Theorem. Let U_k = (u_1 u_2 ... u_k), V_k = (v_1 v_2 ... v_k) and Σ_k = diag(σ_1, σ_2, ..., σ_k), and define A_k = U_k Σ_k V_k^T. Then min_{rank(B) ≤ k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}.
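A quick numerical illustration of the theorem, using an assumed random matrix rather than the lecture's data:

```matlab
% Best rank-k approximation A_k = U_k * Sigma_k * V_k',
% with spectral-norm error equal to sigma_{k+1}.
A = randn(8, 6);
[U, S, V] = svd(A, 0);
k  = 3;
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
norm(A - Ak, 2)     % equals S(k+1, k+1) up to rounding
S(k+1, k+1)
```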

14 What does this mean? It means that the best rank-k approximation of the matrix A is A_k = U_k Σ_k V_k^T. This is useful for compression and noise reduction. It also means that we can estimate the correct rank by looking at the singular values...

15 Example: noise reduction Assume A is a low rank matrix plus noise: A = A_k + N. Then the singular values of A typically behave as in the following plot (singular values vs. index): the first k values dominate and the remaining ones level off at a noise floor.
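A sketch of this behaviour with synthetic data (the sizes and noise level are made up for the illustration):

```matlab
% Singular values of "low rank + noise": the first k dominate,
% the rest form a noise floor.
k  = 3;
Ak = randn(50, k) * randn(k, 30);     % rank-k matrix
N  = 0.01 * randn(50, 30);            % small noise
A  = Ak + N;
s  = svd(A);
semilogy(s, 'o-')
xlabel('index'), ylabel('singular value')
```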

16 (The same singular values plotted on a logarithmic scale: log of singular values vs. index.)

17 SVD is useful for: compression; noise reduction; finding concepts or topics (text mining / LSI); data exploration and visualizing data (e.g. spatial data / PCA); classification (of e.g. handwritten digits).

18 SVD appears under different names: Principal Component Analysis (PCA); Latent Semantic Indexing (LSI) / Latent Semantic Analysis (LSA); Karhunen-Loève expansion / Hotelling transform (in image processing).

19 Example: Classification of handwritten digits Digitized images, 28 × 28, grayscale. Task: classify unknown digits.

20 Digitized images, 28 × 28, grayscale. The image can also be considered as a function of two variables: c = c(x, y), where c is the intensity of the color.

21 The image of one digit is a matrix (of size 28 × 28). Stacking all the columns of each image matrix above each other gives a vector (of size 784 × 1). The image vectors can then be collected into a data matrix (of size 784 × N, where N is the number of images). Distance between images: Euclidean distance in R^784 or cosine distance.
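A sketch of this preprocessing step, assuming the images are stored in a 28 × 28 × N array named imgs (the variable names and indices are made up for the illustration):

```matlab
% Stack 28x28 images into columns of a 784 x N data matrix.
[h, w, N] = size(imgs);            % h = w = 28
A = reshape(imgs, h*w, N);         % column j is image j, columns stacked

% Euclidean and cosine distance between two images:
i = 1; j = 2;
d_euc = norm(A(:, i) - A(:, j));
d_cos = 1 - (A(:, i)' * A(:, j)) / (norm(A(:, i)) * norm(A(:, j)));
```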

22 Naive method Given a database of handwritten digits (training set), compute the mean (centroid) of all digits of one kind. Centroid of threes: (image not reproduced in this transcription).

23 Naive method Given a database of handwritten digits (training set), compute the mean (centroid) of all digits of one kind. Compare an unknown digit (from the test set) with all means, and classify it as the digit whose centroid is closest (using e.g. Euclidean or cosine distance). Good results for well-written digits, worse for bad digits. The method does not use any information about the variation of the digits of one kind.
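A sketch of the naive centroid classifier. It assumes a 784 × N training matrix Atrain with labels 0-9 in a vector labels, and a test image z; all names are made up for this illustration.

```matlab
% Naive method: nearest-centroid classification.
centroids = zeros(784, 10);
for d = 0:9
    centroids(:, d+1) = mean(Atrain(:, labels == d), 2);   % mean image of digit d
end
dists = sum((centroids - repmat(z, 1, 10)).^2, 1);          % squared Euclidean distances
[mindist, idx] = min(dists);
predicted_digit = idx - 1;
```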

24 Results using the naive method Success rate around 75%. Good... but not good enough! For the homework set (training set size = 4000, test set size = 2000) the number of misclassifications per digit looks like this: (table of misclassification counts per digit; the numbers did not survive this transcription). Some digits are easier to recognize than others.

25 SVD Any matrix A can be written as a sum of rank-one matrices: A = ∑_{i=1}^{n} σ_i u_i v_i^T. Consider a data matrix A, where each column of A is one image of a digit. Column j: a_j = ∑_{i=1}^{n} (σ_i v_{ij}) u_i. u_j is the same size as a_j. We can fold it back to get an image. We can think of the left singular vectors u_j as singular images.
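A sketch of how a singular image can be formed and displayed, assuming A3 is a 784 × n matrix whose columns are images of threes (the name is made up for the illustration):

```matlab
% "Singular images" of one digit class: fold a left singular vector
% back into a 28x28 image.
[U, S, V] = svd(A3, 0);
imagesc(reshape(U(:, 1), 28, 28));   % first singular image
colormap(gray); axis image;
```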

26 Compute the SVD of the matrix of one digit (=3) (392 images). Singular values and right singular vectors v_j: (plots: "Singular values of threes" and "Three right singular vectors of threes").

27 Left singular vectors, or singular images u_j: (images not reproduced in this transcription).

28 Classification Let z be an unknown digit, and consider the least squares problem (for some predetermined k): min_{α_j} ||z − ∑_{j=1}^{k} α_j u_j||. Then, if z is the same digit as that represented by the u_j, the residual should be small, i.e. z should be well approximated by some linear combination of the u_j.
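Because the u_j are orthonormal, this least squares problem has a closed-form solution; a sketch, reusing the assumed U and a test image z from the earlier illustrations:

```matlab
% Relative residual of an unknown digit z in the basis of the first k
% singular images of one digit class.
k      = 10;
Uk     = U(:, 1:k);
alpha  = Uk' * z;             % least squares solution (Uk has orthonormal columns)
r      = z - Uk * alpha;      % residual
relres = norm(r) / norm(z);   % small if z resembles this digit class
```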

29 Principal components (SVD) regression Assume: Each digit is well characterized by a few of the first singular images of its own kind. Each digit is NOT characterized very well by a few of the first singular images of some other digit. We can use this to classify unknown digits:

30 Algorithm Training: For the training set of digits, compute the SVD of each set of digits of one kind. We get 10 sets of singular images, one for each digit.

31 Classification: 1. Take an unknown digit z. For each set of singular images u_j, solve the least squares problem min_{α_j} ||z − ∑_{j=1}^{k} α_j u_j|| and compute the relative residual ||r|| / ||z||, where r is the residual. 2. Classify z as the digit corresponding to the smallest relative residual. Or: find the smallest residual r_0 and the next smallest residual r_1. If r_0 ≤ τ r_1, where τ is the tolerance, then classify, else give up.
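A sketch of the whole classification step. It assumes a cell array Uk_all in which Uk_all{d+1} holds the first k left singular vectors of digit d, and a test image z; the names and the tolerance value are made up for the illustration.

```matlab
% Classify z by the smallest relative residual over the 10 digit bases.
relres = zeros(1, 10);
for d = 0:9
    Uk = Uk_all{d+1};
    r  = z - Uk * (Uk' * z);          % least squares residual in the digit-d basis
    relres(d+1) = norm(r) / norm(z);
end
[rsorted, order] = sort(relres);
tau = 0.9;                            % assumed tolerance
if rsorted(1) <= tau * rsorted(2)
    predicted_digit = order(1) - 1;
else
    predicted_digit = NaN;            % give up: too ambiguous
end
```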

32 Results To be reported by you in your homework assignment. Performance is better: success rate over 90%. Still not up to the best (more complicated) methods.

33 Nice test digit 3 in basis 3 Original image of digit and approximations using 1, 3, 5, 7 and 9 basis vectors in the 3-basis:

34 Relative residual in the least squares problem for the nice test digit 3 in the 3-basis (plot: relative residual vs. number of basis vectors).

35 Nice test digit 3 in basis 5 Original image of digit and approximations using 1, 3, 5, 7 and 9 basis vectors in the 5-basis:

36 Relative residual in the least squares problem for the nice test digit 3 in the 5-basis (plot: relative residual vs. number of basis vectors).

37 Ugly test digit 3 in basis 3 Original image of digit and approximations using 1, 3, 5, 7 and 9 basis vectors in the 3-basis:

38 Relative residual in the least squares problem for the ugly test digit 3 in the 3-basis (plot: relative residual vs. number of basis vectors).

39 Ugly test digit 3 in basis 5 Original image of digit and approximations using 1, 3, 5, 7 and 9 basis vectors in the 5-basis:

40 Relative residual in the least squares problem for the ugly test digit 3 in the 5-basis (plot: relative residual vs. number of basis vectors).

41 Conclusion: do not use many terms in the basis. The 100th singular images of all digits: (images not reproduced in this transcription).

42 Work Training: compute the SVDs of 10 matrices of dimension m² × n_i, where each image of a digit is an m × m matrix and n_i = number of training digits of kind i. Classification: solve 10 least squares problems, i.e. multiply the test digit by 10 matrices with orthogonal columns. Fast test phase! Suitable for real-time computations!

43 Test results Correct classification as a function of the number of basis vectors for each digit (plot: classification rate vs. number of basis vectors).

44 How to improve performance? Ugly digits are difficult to handle automatically.

45 The Singular Value Decomposition Any m × n matrix A, with m ≥ n, can be factorized as A = U [Σ; 0] V^T, where U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal, and Σ ∈ R^{n×n} is diagonal: Σ = diag(σ_1, σ_2, ..., σ_n), σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0. Skinny version: A = U_1 Σ V^T, U_1 ∈ R^{m×n}.

46 We can write A^T A = V Σ U^T U Σ V^T = V Σ² V^T, so we get A^T A V = V Σ². Let v_j be the j-th column of V. Writing the above column by column: A^T A v_j = σ_j² v_j. Equivalently, we can write A A^T = ... = U Σ² U^T to get (u_j is the j-th column of U) A A^T u_j = σ_j² u_j.
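These relations can be checked numerically; the matrix below is an assumed example, not from the lecture.

```matlab
% Eigenvalues of A'*A are the squared singular values of A,
% and its eigenvectors are the right singular vectors.
A = randn(6, 4);
[U, S, V] = svd(A, 0);
lam = sort(eig(A'*A), 'descend');
[lam, diag(S).^2]                           % columns agree up to rounding
norm(A'*A*V(:, 1) - S(1,1)^2 * V(:, 1))     % ~0
```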

47 Singular values and vectors The singular values are the nonnegative square roots of the eigenvalues of A^T A, and hence uniquely determined. The columns of V are the eigenvectors of A^T A, arranged in the same order as the σ_j. The columns of U are the eigenvectors of A A^T, arranged in the same order as the σ_j. If A is real, then V, Σ and U can be taken to be real.

48 Uniqueness considerations Σ is unique: the eigenvalues of A^T A are unique, their nonnegative square roots (= the singular values) are therefore also unique, and the ordering σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0 fixes Σ. But how about U and V?

49 Example from lecture 4 revisited: the rank-deficient case, where A is not of full rank, i.e. the columns of A are not linearly independent. A is entered and its third column is replaced by A(:,3)=A(:,1)+A(:,2)*0.5. (The numerical values did not survive this transcription.)

50 [U,S,V]=svd(A,0) is run both in Matlab version 7 and in Matlab version 6.5. (The numerical output did not survive this transcription; the two versions return different U and V for the same A.)

51 Uniqueness of U and V Let A be an m × n real-valued matrix, with m ≥ n. If the singular values σ_j are distinct, and A = U_1 Σ V^T = U_2 Σ W^T, the following holds for the columns of V and W: w_j = (−1)^{k_j} v_j, k_j ∈ {0, 1}. If m = n = rank of A, then, given V, U is uniquely determined. But if m > n, U is never uniquely determined! (Only the first k columns are, up to multiplication by ±1, where k = rank(A).)

52 Example: 1st singular images of handwritten digits

53 References [1] Lars Eldén: Matrix Methods in Data Mining and Pattern Recognition, SIAM, 2007. [2] T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer-Verlag, New York, 2001. [3] The data is from the MNIST database of handwritten digits.
