Fast Monte-Carlo Low Rank Approximations for Matrices. Shmuel Friedland, University of Illinois at Chicago

Fast Monte-Carlo Low Rank Approximations for Matrices
Shmuel Friedland, University of Illinois at Chicago
Joint work with M. Kaveh, A. Niknejad and H. Zare
IEEE SoSE 2006, LA, April 25, 2006
http://www.math.uic.edu/~friedlan

1 Statement of the problem

Data is presented in terms of a matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}.$$

Examples:
1. Digital picture: a 512 × 512 matrix of pixels.
2. DNA microarrays: 60,000 × 30 (rows are genes and columns are experiments).
3. Web page activity: $a_{ij}$ is the number of times web page $j$ was accessed from web page $i$.

Objective: condense the data and store it effectively.
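
As a quick back-of-the-envelope illustration (my arithmetic, not from the slides) of the storage saving that a rank-$k$ representation $\sum_{i=1}^k x_i y_i^T$ gives over the full matrix:
$$mn \ \text{numbers (full } A\text{)} \quad\text{vs.}\quad k(m+n) \ \text{numbers (rank-}k\text{)}, \qquad 512 \cdot 512 = 262{,}144 \quad\text{vs.}\quad 80\,(512+512) = 81{,}920 \ \ (k = 80).$$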

2 Matrix SVD

Let $A \in \mathbb{C}^{m \times n}$, viewed as a linear map $A : \mathbb{C}^n \to \mathbb{C}^m$. Assume $\mathbb{C}^n, \mathbb{C}^m$ are equipped with the standard inner product $\langle x, y \rangle := y^* x$. Then $A = U \Sigma V^*$, where $U \in \mathrm{U}(m)$, $V \in \mathrm{U}(n)$, and $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_{\min(m,n)}) \in \mathbb{R}_+^{m \times n}$. $U, V$ are the transition matrices from $[u_1, \dots, u_m]$, $[v_1, \dots, v_n]$ to the standard bases in $\mathbb{C}^m$, $\mathbb{C}^n$ respectively.

For $k \le r$ let $\Sigma_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k) \in \mathbb{R}^{k \times k}$, and let $U_k \in \mathrm{U}(m,k)$, $V_k \in \mathrm{U}(n,k)$ consist of the first $k$ columns of $U$, $V$ respectively. Then $A_k := U_k \Sigma_k V_k^*$ is the best rank-$k$ approximation of $A$ in the Frobenius and operator norms:
$$\min_{B \in \mathrm{R}(m,n,k)} \|A - B\| = \|A - A_k\|.$$

$A = U_r \Sigma_r V_r^*$ is the reduced SVD ($r = \operatorname{rank} A$).

$\nu$ is the numerical rank of $A$ if $\sigma_{\nu+1}/\sigma_\nu \approx 0$; $A_\nu$ is then a noise reduction of $A$. Noise reduction has many applications in image processing, DNA-microarray analysis, and data compression.
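
A minimal numpy sketch of the truncated SVD $A_k$ and of estimating the numerical rank from the decay of the singular values (illustrative only; the function names and the tolerance-based rank test are mine, not from the talk):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation A_k = U_k Sigma_k V_k^* (truncated SVD)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

def numerical_rank(A, tol=1e-8):
    """Smallest nu with sigma_{nu+1}/sigma_nu essentially zero (here: below tol)."""
    s = np.linalg.svd(A, compute_uv=False)
    for nu in range(1, len(s)):
        if s[nu] <= tol * s[nu - 1]:
            return nu
    return len(s)

# Example: a 100 x 80 matrix of exact rank <= 40.
A = np.random.randn(100, 40) @ np.random.randn(40, 80)
print(numerical_rank(A))                                   # ~40
print(np.linalg.norm(A - best_rank_k(A, 10), 'fro'))       # error of the rank-10 truncation
```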

3 SVD in inner product spaces

$U_i$ is an $m_i$-dimensional IPS over $\mathbb{C}$ with inner product $\langle \cdot,\cdot \rangle_i$, $i = 1, 2$. $T : U_1 \to U_2$ is a linear operator and $T^* : U_2 \to U_1$ its adjoint: $\langle Tx, y \rangle_2 = \langle x, T^*y \rangle_1$.

$S_1 := T^*T : U_1 \to U_1$, $S_2 := TT^* : U_2 \to U_2$. $S_1, S_2$ are self-adjoint ($S_1^* = S_1$, $S_2^* = S_2$) and nonnegative definite ($\langle S_i x_i, x_i \rangle_i \ge 0$).

$\sigma_1^2 \ge \dots \ge \sigma_r^2 > 0$ are the positive eigenvalues of $S_1$ and $S_2$, where $r = \operatorname{rank} T = \operatorname{rank} T^*$. Let $S_1 v_i = \sigma_i^2 v_i$, $\langle v_i, v_j \rangle_1 = \delta_{ij}$, $i, j = 1, \dots, r$. Define $u_i := \sigma_i^{-1} T v_i$, $i = 1, \dots, r$. Then $\langle u_i, u_j \rangle_2 = \delta_{ij}$, $i, j = 1, \dots, r$.

Complete $\{v_1, \dots, v_r\}$ and $\{u_1, \dots, u_r\}$ to orthonormal bases $[v_1, \dots, v_{m_1}]$ and $[u_1, \dots, u_{m_2}]$ of $U_1$ and $U_2$.
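
A small numpy check of this construction for $T$ given by a concrete complex matrix, so that $T^*$ is the conjugate transpose and $S_1 = A^*A$ (the slide works in general inner product spaces; this snippet is only an illustration and not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))   # T : C^4 -> C^6

S1 = A.conj().T @ A                        # S_1 = T^* T  (self-adjoint, nonnegative definite)
evals, V = np.linalg.eigh(S1)              # eigenvalues in ascending order
order = np.argsort(evals)[::-1]            # reorder: sigma_1^2 >= ... >= sigma_r^2
sigma, V = np.sqrt(np.maximum(evals[order], 0)), V[:, order]

r = int(np.sum(sigma > 1e-12))             # r = rank T
U = A @ V[:, :r] / sigma[:r]               # u_i := sigma_i^{-1} T v_i

print(np.allclose(U.conj().T @ U, np.eye(r)))                           # <u_i, u_j>_2 = delta_ij
print(np.allclose(np.linalg.svd(A, compute_uv=False)[:r], sigma[:r]))   # same singular values
```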

4 RANDOM k-svd Stable numerical algortihms of SVD introduced by Golub-Kahan 1965, Golub-Reinsch 1970: Implicit QR Algo to reduce to upper bidiagonal form using Householder matrices, then Golub-Reinsch SVD algo to zero superdiagonal elements. Complexity: O(mn min(m, n)). In applications for massive data: A R m n, m, n >> 1 needed a good approximation A k = k i=1 x iy T i, x i R m, y i R n, i = 1,..., k << min(m, n). Random A k approximation algo: Find a good algo by reading l rows or columns of A at random and update the approximations. Frieze-Kannan-Vempala FOCS 1998 suggest algo without updating. 5

5 FKNZ RANDOM ALGO [4]

Fast k-rank approximation and SVD algorithm.

Input: positive integers $m, n, k, l, N$, an $m \times n$ matrix $A$, and $\epsilon > 0$.
Output: an $m \times n$ $k$-rank approximation $B_f$ of $A$, the ratios $\|B_0\|_F/\|B_t\|_F$ and $\|B_{t-1}\|_F/\|B_t\|_F$, and approximations to the $k$ singular values and the $k$ left and right singular vectors of $A$.

1. Choose a $k$-rank approximation $B_0$ using $k$ columns (or rows) of $A$.
2. For $t = 1$ to $N$:
   - Select $l$ columns (or rows) from $A$ at random and update $B_{t-1}$ to $B_t$.
   - Compute the approximations to the $k$ singular values and the $k$ left and right singular vectors of $A$.
   - If $\|B_{t-1}\|_F/\|B_t\|_F > 1 - \epsilon$, let $f = t$ and finish.

At each iteration, $\|A - B_{t-1}\|_F \ge \|A - B_t\|_F$. Complexity: $O(mnk)$.
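
A self-contained Python/numpy sketch of this iteration as I read it from the slides; the variable names, the use of QR in place of modified Gram-Schmidt, and the exact form of the stopping test are my assumptions, not the authors' code:

```python
import numpy as np

def fknz_low_rank(A, k, l, N, eps, seed=0):
    """Monte-Carlo k-rank approximation of A by sampling columns (sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Step 1: B_0 from k columns chosen at random; QR plays the role of
    # modified Gram-Schmidt, giving an orthonormal system x_1,...,x_q.
    X, _ = np.linalg.qr(A[:, rng.choice(n, size=k, replace=False)])
    B = X @ (X.T @ A)                      # B_0 = sum_i x_i (A^T x_i)^T
    for t in range(1, N + 1):
        # Step 2: sample l further columns and extend the orthonormal system.
        W = A[:, rng.choice(n, size=l, replace=False)]
        P, _ = np.linalg.qr(np.hstack([X, W]))
        C = P @ (P.T @ A)                  # C_{t-1} = B_{t-1} + sum_{i>q} x_i (A^T x_i)^T
        # Keep the top-k left singular vectors of C_{t-1} as the new basis.
        X = np.linalg.svd(C, full_matrices=False)[0][:, :k]
        B_new = X @ (X.T @ A)              # B_t = sum_i v_i (A^T v_i)^T
        if np.linalg.norm(B, 'fro') / np.linalg.norm(B_new, 'fro') > 1 - eps:
            return B_new                   # ratio close to 1: ||B_t||_F has stabilized
        B = B_new
    return B
```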

6 DETAILS

Choose $k$ columns of $A$ at random. Apply the modified Gram-Schmidt algorithm to obtain $x_1, \dots, x_q \in \mathbb{R}^m$, $q \le k$. Set
$$B_0 := \sum_{i=1}^q x_i (A^T x_i)^T, \qquad \|A - B_0\|_F^2 = \operatorname{tr} A^T A - \operatorname{tr} B_0^T B_0 = \operatorname{tr} A^T A - \sum_{i=1}^q (A^T x_i)^T (A^T x_i).$$

Choose at random another $l$ columns of $A$: $w_1, \dots, w_l$. Apply modified Gram-Schmidt to $x_1, \dots, x_q, w_1, \dots, w_l$ to obtain an orthonormal system $x_1, \dots, x_q, x_{q+1}, \dots, x_p$. Form
$$C_0 := B_0 + \sum_{i=q+1}^p x_i (A^T x_i)^T.$$

Find the first $k$ orthonormal left singular vectors $v_1, \dots, v_k$ of $C_0$. Then
$$B_1 := \sum_{i=1}^k v_i (A^T v_i)^T, \qquad \operatorname{tr} B_0^T B_0 \le \operatorname{tr} B_1^T B_1.$$

Obtain $B_t$ from $B_{t-1}$ in the same way.
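
A short numerical check (my own, not from the slides) of the two facts used here: the error formula $\|A - B_0\|_F^2 = \operatorname{tr} A^T A - \operatorname{tr} B_0^T B_0$, and the monotonicity $\operatorname{tr} B_0^T B_0 \le \operatorname{tr} B_1^T B_1$ after one update:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 120))
k, l = 10, 15

X, _ = np.linalg.qr(A[:, rng.choice(120, size=k, replace=False)])   # orthonormal x_1,...,x_q
B0 = X @ (X.T @ A)

# ||A - B_0||_F^2 = tr A^T A - tr B_0^T B_0  (B_0 projects the columns of A onto span{x_i})
lhs = np.linalg.norm(A - B0, 'fro')**2
rhs = np.trace(A.T @ A) - np.trace(B0.T @ B0)
print(np.isclose(lhs, rhs))

# One update: extend by l sampled columns, keep the top-k left singular vectors of C_0.
P, _ = np.linalg.qr(np.hstack([X, A[:, rng.choice(120, size=l, replace=False)]]))
C0 = P @ (P.T @ A)
V = np.linalg.svd(C0, full_matrices=False)[0][:, :k]
B1 = V @ (V.T @ A)
print(np.trace(B0.T @ B0) <= np.trace(B1.T @ B1))      # tr B_0^T B_0 <= tr B_1^T B_1
```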

7 Lifting body original

Figure 1: Lifting body image, 512 × 512.

8 Lifting body compressed

Figure 2: 80-rank approximation of the Lifting body image, 512 × 512.

9 SIMULATIONS 1

Figure 3: Convergence of the Monte-Carlo method for the Liftingbody image (512 × 512), k = 80: relative error versus number of iterations, for weighted sampling, uniform sampling with replacement, and uniform sampling without replacement.

10 SIMULATIONS 2

Figure 4: Liftingbody: relative error versus total number of sampled rows, k = 100, for uniform sampling without replacement, uniform sampling with replacement, and weighted sampling.

11 Camera man original

Figure 5: Camera man image, 256 × 256.

12 Camera man compressed

Figure 6: 80-rank approximation of the Camera man image, 256 × 256.

13 SIMULATIONS 3

Figure 7: Convergence of the Monte-Carlo method for the Cameraman image (256 × 256), k = 80: relative error versus number of iterations, for uniform sampling without replacement, weighted sampling, and uniform sampling with replacement.

14 SIMULATIONS 4

Figure 8: Cameraman: relative error versus total number of sampled rows, k = 80, for uniform sampling without replacement, weighted sampling, and uniform sampling with replacement.

15 SIMULATIONS 5

Figure 9: Convergence of the Monte-Carlo method for a random data matrix (3000 × 500), k = l = 100: relative error versus number of iterations, for uniform sampling without replacement and uniform sampling with replacement.

16 COMPARISONS

Table 1: Comparison of relative error and speed-up of our algorithm versus the optimal k-rank approximation algorithm.

Data set                               Speed-up   Rel. error ratio
Cameraman (256 × 256), k = 80          1.145      1.083
Liftingbody (512 × 512), k = 100       8          1.08
Map image (627 × 865), k = 200         3.33       1.067
Random matrix (8000 × 200), k = 100    42         1.1

17 Choosing columns of A

Frieze, Kannan and Vempala [8] suggest choosing column $c_i(A)$ with probability $\|c_i(A)\|^2 / \|A\|_F^2$. If $s \ge k$ columns are chosen, then the resulting $k$-approximation $A_k$ satisfies
$$\|A - A_k\|_F^2 \le \sum_{i=k+1}^m \sigma_i(A)^2 + \frac{10k}{s} \|A\|_F^2.$$
If $s \ge 10k/\epsilon$, then
$$\|A - A_k\|_F^2 \le \sum_{i=k+1}^m \sigma_i(A)^2 + \epsilon \|A\|_F^2.$$

Deshpande, Rademacher, Vempala and Wang [2] improved the sampling by replacing the probabilities for $c_i(A)$ with the new probabilities $\|c_i(A - A_k)\|^2 / \|A - A_k\|_F^2$.

Perhaps our algorithm can be combined with the above column sampling to obtain better results.
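
A minimal numpy sketch of this column-sampling distribution (the function names and interface are mine; the volume-sampling analysis of [2] is more involved than this simple resampling step):

```python
import numpy as np

def fkv_sample_columns(A, s, rng=None):
    """Sample s column indices with probability ||c_i(A)||^2 / ||A||_F^2, as in [8]."""
    rng = np.random.default_rng(rng)
    p = np.sum(A * A, axis=0)                 # squared column norms
    return rng.choice(A.shape[1], size=s, replace=True, p=p / p.sum())

def residual_sample_columns(A, A_k, s, rng=None):
    """Resample with probabilities ||c_i(A - A_k)||^2 / ||A - A_k||_F^2 (the modification of [2])."""
    return fkv_sample_columns(A - A_k, s, rng)
```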

References

[1] O. Alter, P.O. Brown and D. Botstein, Singular value decomposition for genome-wide expression data processing and modelling, Proc. Nat. Acad. Sci. USA 97 (2000), 10101-10106.

[2] A. Deshpande, L. Rademacher, S. Vempala and G. Wang, Matrix approximation and projective clustering via volume sampling, SODA, 2006.

[3] S. Friedland, A new approach to generalized singular value decomposition, SIMAX 27 (2005), 434-444.

[4] S. Friedland, M. Kaveh, A. Niknejad and H. Zare, Fast Monte-Carlo low rank approximations for matrices, Proc. IEEE SoSE, 2006, 6 pp., to appear.

[5] S. Friedland, M. Kaveh, A. Niknejad and H. Zare, An algorithm for missing value estimation for DNA microarray data, Proceedings of ICASSP 2006, Toulouse, France, 4 pp., to appear.

[6] S. Friedland, A. Niknejad and L. Chihara, A simultaneous reconstruction of missing data in DNA microarrays, to appear in Linear Algebra and Its Applications.

[7] S. Friedland, J. Nocedal and M. Overton, The formulation and analysis of numerical methods for inverse eigenvalue problems, SIAM J. Numer. Anal. 24 (1987), 634-667.

[8] A. Frieze, R. Kannan and S. Vempala, Fast Monte-Carlo algorithms for finding low rank approximations, Proceedings of the 39th Annual Symposium on Foundations of Computer Science, 1998.

[9] G.H. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins Univ. Press, 3rd ed., 1996.