Fast Monte-Carlo Low Rank Approximations for Matrices
Shmuel Friedland, University of Illinois at Chicago
joint work with M. Kaveh, A. Niknejad and H. Zare
IEEE SoSE 2006, LA, April 25, 2006
http://www.math.uic.edu/~friedlan
1 Statement of the problem

Data is presented as a matrix

$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}$

Examples:
1. digital picture: $512 \times 512$ matrix of pixels
2. DNA-microarrays: $60{,}000 \times 30$ (rows are genes, columns are experiments)
3. web page activity: $a_{ij}$ is the number of times web page $j$ was accessed from web page $i$

Objective: condense the data and store it effectively.
2 Matrix SVD

Let $A \in \mathbb{C}^{m \times n}$. Then $A : \mathbb{C}^n \to \mathbb{C}^m$. Assume $\mathbb{C}^n, \mathbb{C}^m$ are equipped with the standard inner product $\langle x, y \rangle := y^* x$. Then $A = U \Sigma V^*$, where $U \in \mathrm{U}(m)$, $V \in \mathrm{U}(n)$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_{\min(m,n)}) \in \mathbb{R}^{m \times n}_+$. $U, V$ are the transition matrices from $[u_1, \dots, u_m]$, $[v_1, \dots, v_n]$ to the standard bases in $\mathbb{C}^m, \mathbb{C}^n$ respectively.

For $k \le r$ let $\Sigma_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k) \in \mathbb{R}^{k \times k}$, and let $U_k \in \mathrm{U}(m,k)$, $V_k \in \mathrm{U}(n,k)$ consist of the first $k$ columns of $U, V$ respectively. Then $A_k := U_k \Sigma_k V_k^*$ is the best rank-$k$ approximation of $A$ in the Frobenius and operator norms:

$\min_{B \in \mathcal{R}(m,n,k)} \|A - B\| = \|A - A_k\|.$

$A = U_r \Sigma_r V_r^*$ is the reduced SVD ($r = \mathrm{rank}\, A$).

$\nu$ is the numerical rank of $A$ if $\sigma_{\nu+1}/\sigma_\nu \approx 0$. $A_\nu$ is a noise reduction of $A$. Noise reduction has many applications in image processing, DNA-microarray analysis, and data compression.
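As a numerical sanity check of the best-approximation property above (not part of the talk), the following NumPy sketch forms $A_k$ from the SVD and verifies that the Frobenius error equals the tail of the singular values:

```python
import numpy as np

def best_rank_k(A, k):
    """A_k = U_k Sigma_k V_k^*: best rank-k approximation of A."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))
A3 = best_rank_k(A, 3)

# Eckart-Young: ||A - A_3||_F^2 = sigma_4^2 + ... + sigma_{40}^2
s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A3)
```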
3 SVD in inner product spaces

$\mathbf{U}_i$ is an $m_i$-dimensional IPS over $\mathbb{C}$ with inner product $\langle \cdot, \cdot \rangle_i$, $i = 1, 2$. $T : \mathbf{U}_1 \to \mathbf{U}_2$ is a linear operator, and $T^* : \mathbf{U}_2 \to \mathbf{U}_1$ the adjoint operator: $\langle T x, y \rangle_2 = \langle x, T^* y \rangle_1$.

$S_1 := T^* T : \mathbf{U}_1 \to \mathbf{U}_1$, $S_2 := T T^* : \mathbf{U}_2 \to \mathbf{U}_2$. $S_1, S_2$ are self-adjoint, $S_1^* = S_1$, $S_2^* = S_2$, and nonnegative definite: $\langle S_i x_i, x_i \rangle_i \ge 0$.

$\sigma_1^2 \ge \dots \ge \sigma_r^2 > 0$ are the positive eigenvalues of $S_1$ and $S_2$, where $r = \mathrm{rank}\, T = \mathrm{rank}\, T^*$. Let $S_1 v_i = \sigma_i^2 v_i$, $\langle v_i, v_j \rangle_1 = \delta_{ij}$, $i, j = 1, \dots, r$. Define $u_i := \sigma_i^{-1} T v_i$, $i = 1, \dots, r$. Then $\langle u_i, u_j \rangle_2 = \delta_{ij}$, $i, j = 1, \dots, r$. Complete $\{v_1, \dots, v_r\}$ and $\{u_1, \dots, u_r\}$ to orthonormal bases $[v_1, \dots, v_{m_1}]$ and $[u_1, \dots, u_{m_2}]$ of $\mathbf{U}_1$ and $\mathbf{U}_2$.
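The construction $u_i := \sigma_i^{-1} T v_i$ can be checked numerically. In this sketch (an illustration with a random matrix, not from the slides) the eigenvectors of $S_1 = T^* T$ are converted into an orthonormal system in the target space, recovering the singular values of $T$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((8, 5))   # T : U_1 -> U_2 with m1 = 5, m2 = 8

S1 = T.T @ T                      # S_1 = T^* T, self-adjoint, nonnegative definite
evals, V = np.linalg.eigh(S1)     # eigenvalues sigma_i^2 in ascending order
order = np.argsort(evals)[::-1]   # reorder descending
sigma = np.sqrt(np.maximum(evals[order], 0.0))
V = V[:, order]

# u_i := sigma_i^{-1} T v_i forms an orthonormal system in U_2
r = int(np.sum(sigma > 1e-12))    # r = rank T
U = T @ V[:, :r] / sigma[:r]
```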
4 RANDOM k-svd Stable numerical algortihms of SVD introduced by Golub-Kahan 1965, Golub-Reinsch 1970: Implicit QR Algo to reduce to upper bidiagonal form using Householder matrices, then Golub-Reinsch SVD algo to zero superdiagonal elements. Complexity: O(mn min(m, n)). In applications for massive data: A R m n, m, n >> 1 needed a good approximation A k = k i=1 x iy T i, x i R m, y i R n, i = 1,..., k << min(m, n). Random A k approximation algo: Find a good algo by reading l rows or columns of A at random and update the approximations. Frieze-Kannan-Vempala FOCS 1998 suggest algo without updating. 5
5 FKNZ RANDOM ALGO [4]

Fast k-rank approximation and SVD algorithm

Input: positive integers $m, n, k, l, N$, an $m \times n$ matrix $A$, $\epsilon > 0$.
Output: an $m \times n$ rank-$k$ approximation $B_f$ of $A$, the ratios $\|B_0\|/\|B_t\|$ and $\|B_{t-1}\|/\|B_t\|$, and approximations to the $k$ singular values and the $k$ left and right singular vectors of $A$.

1. Choose a rank-$k$ approximation $B_0$ using $k$ columns (or rows) of $A$.
2. for $t = 1$ to $N$:
   - Select $l$ columns (or rows) of $A$ at random and update $B_{t-1}$ to $B_t$.
   - Compute the approximations to the $k$ singular values and the $k$ left and right singular vectors of $A$.
   - If $\|B_{t-1}\|/\|B_t\| > 1 - \epsilon$, let $f = t$ and finish.

Each iteration: $\|A - B_{t-1}\|_F \ge \|A - B_t\|_F$. Complexity: $O(mnk)$.
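The loop above can be realized compactly. This is a sketch, not the authors' code: it samples columns only, maintains the approximation through an orthonormal basis $Q$ (so $B_t = Q Q^T A$), and uses the stopping rule $\|B_{t-1}\|_F / \|B_t\|_F > 1 - \epsilon$:

```python
import numpy as np

def fknz_rank_k(A, k, l, N=30, eps=1e-3, rng=None):
    """Sketch of the FKNZ iteration (columns-only variant): start from
    k random columns, fold in l fresh columns per step, and keep the
    best rank-k part of the enlarged approximation."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Q, _ = np.linalg.qr(A[:, rng.choice(n, size=k, replace=False)])
    norm_prev = np.linalg.norm(Q.T @ A)            # ||B_0||_F
    for t in range(1, N + 1):
        extra = A[:, rng.choice(n, size=l, replace=False)]
        Q, _ = np.linalg.qr(np.hstack([Q, extra])) # re-orthonormalize basis
        U = np.linalg.svd(Q.T @ A, full_matrices=False)[0][:, :k]
        Q = Q @ U                                  # top-k left singular vectors
        norm_t = np.linalg.norm(Q.T @ A)           # ||B_t||_F = ||Q Q^T A||_F
        if norm_prev / norm_t > 1 - eps:           # stopping rule
            break
        norm_prev = norm_t
    return Q @ (Q.T @ A)
```

Since $\|B_t\|_F$ is nondecreasing (see the DETAILS slide), the ratio approaches 1 from below, which is what the stopping rule detects.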
6 DETAILS

Choose $k$ columns of $A$ at random. Apply the modified Gram-Schmidt algorithm to obtain $x_1, \dots, x_q \in \mathbb{R}^m$, $q \le k$. Set $B_0 := \sum_{i=1}^q x_i (A^T x_i)^T$. Then

$\|A - B_0\|_F^2 = \mathrm{tr}\, A^T A - \mathrm{tr}\, B_0^T B_0 = \mathrm{tr}\, A^T A - \sum_{i=1}^q (A^T x_i)^T (A^T x_i).$

Choose at random another $l$ columns of $A$: $w_1, \dots, w_l$. Apply modified Gram-Schmidt to $x_1, \dots, x_q, w_1, \dots, w_l$ to obtain the orthonormal system $x_1, \dots, x_q, x_{q+1}, \dots, x_p$. Form $C_0 := B_0 + \sum_{i=q+1}^p x_i (A^T x_i)^T$. Find the first $k$ orthonormal left singular vectors $v_1, \dots, v_k$ of $C_0$. Then $B_1 := \sum_{i=1}^k v_i (A^T v_i)^T$ and $\mathrm{tr}\, B_0^T B_0 \le \mathrm{tr}\, B_1^T B_1$.

Obtain $B_t$ from $B_{t-1}$ as above.
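One update step and the trace monotonicity $\mathrm{tr}\, B_0^T B_0 \le \mathrm{tr}\, B_1^T B_1$ can be demonstrated directly (a sketch with random data, using QR in place of explicit modified Gram-Schmidt):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 60))
k, l = 8, 6

# B_0 from k random columns, orthonormalized
X, _ = np.linalg.qr(A[:, rng.choice(60, k, replace=False)])
B0 = X @ (X.T @ A)                       # sum_i x_i (A^T x_i)^T

# enlarge with l more columns -> C_0, keep top-k left singular vectors
Xp, _ = np.linalg.qr(np.hstack([X, A[:, rng.choice(60, l, replace=False)]]))
C0 = Xp @ (Xp.T @ A)
V = np.linalg.svd(C0, full_matrices=False)[0][:, :k]
B1 = V @ (V.T @ A)                       # sum_i v_i (A^T v_i)^T
```

Both $B_0$ and $B_1$ are orthogonal projections of $A$, so gaining Frobenius mass ($\mathrm{tr}\, B^T B$) is the same as shrinking the error $\|A - B\|_F$.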
7 Lifting body original

Figure 1: Lifting body image, $512 \times 512$.
8 Lifting body compressed

Figure 2: 80-rank approximation of the Lifting body image, $512 \times 512$.
9 SIMULATIONS 1

Figure 3: Convergence of the Monte-Carlo method for the Liftingbody image ($512 \times 512$), $k = 80$: relative error versus number of iterations, for weighted sampling, uniform sampling with replacement, and uniform sampling without replacement.
10 SIMULATIONS 2

Figure 4: Liftingbody: relative error versus total number of sampled rows, $k = 100$, for uniform sampling without replacement, uniform sampling with replacement, and weighted sampling.
11 Camera man original

Figure 5: Camera man image, $256 \times 256$.
12 Camera man compressed

Figure 6: 80-rank approximation of the Camera man image, $256 \times 256$.
13 SIMULATIONS 3

Figure 7: Convergence of the Monte-Carlo method for the Cameraman image ($256 \times 256$), $k = 80$: relative error versus number of iterations, for uniform sampling without replacement, weighted sampling, and uniform sampling with replacement.
14 SIMULATIONS 4

Figure 8: Cameraman: relative error versus total number of sampled rows, $k = 80$, for uniform sampling without replacement, weighted sampling, and uniform sampling with replacement.
15 SIMULATIONS 5

Figure 9: Convergence of the Monte-Carlo method for a random data matrix ($3000 \times 500$), $k = l = 100$: relative error versus number of iterations, for uniform sampling without and with replacement.
16 COMPARISONS

Table 1: Comparison of relative error and speed-up of our algorithm against the optimal rank-$k$ approximation algorithm

Data set                               Speed up   Rel. error ratio
Cameraman (256 x 256), k = 80          1.145      1.083
Liftingbody (512 x 512), k = 100       8          1.08
Map image (627 x 865), k = 200         3.33       1.067
Random matrix (8000 x 200), k = 100    42         1.1
17 Choosing columns of A

Frieze, Kannan and Vempala [8] suggest choosing column $c_i(A)$ with probability $\|c_i(A)\|^2 / \|A\|_F^2$.

If $s \ge k$ columns are chosen, then the $k$-approximation satisfies

$\|A - A_k\|_F^2 \le \sum_{i=k+1}^{m} \sigma_i(A)^2 + \frac{10k}{s} \|A\|_F^2.$

If $s \ge \frac{10k}{\epsilon}$ then $\|A - A_k\|_F^2 \le \sum_{i=k+1}^{m} \sigma_i(A)^2 + \epsilon \|A\|_F^2$.

Deshpande, Rademacher, Vempala and Wang [2] improved the sampling by re-sampling the columns $c_i(A - A_k)$ according to the new probabilities $\|c_i(A - A_k)\|^2 / \|A - A_k\|_F^2$.

Perhaps our algorithm can be combined with the above sampling of columns to get better results.
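A minimal sketch of the FKV weighted column sampling (the $1/\sqrt{s\,p_i}$ rescaling follows the standard analysis and is an addition here, not stated on the slide):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 200))
k, s = 10, 60

# pick column c_i(A) with probability ||c_i(A)||^2 / ||A||_F^2
p = (A ** 2).sum(axis=0) / (A ** 2).sum()
idx = rng.choice(A.shape[1], size=s, replace=True, p=p)

# rescale so the sampled sketch estimates A A^T without bias
S = A[:, idx] / np.sqrt(s * p[idx])
# rank-k approximation from the top-k left singular vectors of the sample
U = np.linalg.svd(S, full_matrices=False)[0][:, :k]
Ak = U @ (U.T @ A)

# the FKV bound: tail singular values plus (10k/s) ||A||_F^2
sigma = np.linalg.svd(A, compute_uv=False)
bound = (sigma[k:] ** 2).sum() + (10 * k / s) * (sigma ** 2).sum()
```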
References

[1] O. Alter, P.O. Brown and D. Botstein, Singular value decomposition for genome-wide expression data processing and modelling, Proc. Nat. Acad. Sci. USA 97 (2000), 10101-10106.

[2] A. Deshpande, L. Rademacher, S. Vempala and G. Wang, Matrix approximation and projective clustering via volume sampling, Proceedings of SODA, 2006.

[3] S. Friedland, A new approach to generalized singular value decomposition, SIMAX 27 (2005), 434-444.

[4] S. Friedland, M. Kaveh, A. Niknejad and H. Zare, Fast Monte-Carlo low rank approximations for matrices, Proc. IEEE SoSE, 2006, 6 pp., to appear.

[5] S. Friedland, M. Kaveh, A. Niknejad and H. Zare, An algorithm for missing value estimation for DNA microarray data, Proceedings of ICASSP 2006, Toulouse, France, 4 pp., to appear.

[6] S. Friedland, A. Niknejad and L. Chihara, A simultaneous reconstruction of missing data in DNA microarrays, to appear in Linear Algebra and Its Applications.

[7] S. Friedland, J. Nocedal and M. Overton, The formulation and analysis of numerical methods for inverse eigenvalue problems, SIAM J. Numer. Anal. 24 (1987), 634-667.

[8] A. Frieze, R. Kannan and S. Vempala, Fast Monte-Carlo algorithms for finding low rank approximations, Proceedings of the 39th Annual Symposium on Foundations of Computer Science, 1998.

[9] G.H. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins Univ. Press, 3rd ed., 1996.