Sketching as a Tool for Numerical Linear Algebra (Graph Sparsification)

David P. Woodruff
Presented by Sepehr Assadi
o(n) Big Data Reading Group, University of Pennsylvania
April 2015
Goal

New survey by David Woodruff: Sketching as a Tool for Numerical Linear Algebra.

Topics:
- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds
Matrix Compression

Previously: compress a matrix $A \in \mathbb{R}^{n \times d}$ using linear sketches.

Example: subspace embedding.

Definition ($\ell_2$-subspace embedding). A $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for a matrix $A \in \mathbb{R}^{n \times d}$ is a matrix $S$ for which, for all $x \in \mathbb{R}^d$:
$$\|SAx\|_2^2 = (1 \pm \varepsilon)\, \|Ax\|_2^2$$

Typically $SA$ is an $\tilde{O}(d^2)$-size matrix.

Techniques:
- using random matrices $S$ (Gaussian, sign matrices, etc.)
- using leverage score sampling
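To make the definition concrete, here is a minimal numerical sketch (our own illustration, not from the survey): a rescaled Gaussian $S$ checked empirically against the subspace-embedding property. The sizes $n$, $d$, $k$ and the seed are arbitrary illustrative choices.

```python
# A minimal sketch (illustrative): a rescaled Gaussian sketching matrix S
# acting as an l2-subspace embedding for a tall matrix A.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 10_000, 10, 0.25
k = int(d / eps**2)                    # rows of S; O(d/eps^2) suffices here

A = rng.standard_normal((n, d))        # tall input matrix, n >> d
S = rng.standard_normal((k, n)) / np.sqrt(k)   # rescaled so E[S^T S] = I_n

SA = S @ A                             # the compressed matrix, k x d
for _ in range(3):
    x = rng.standard_normal(d)
    ratio = np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2
    print(ratio)                       # ~ 1 +- eps with high probability
```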
Graph Compression

Today: compress a graph $G(V, E)$ using linear sketches.

Example: sparsification.

Definition (cut sparsifier). A $(1 \pm \varepsilon)$ cut sparsifier of a graph $G(V, E)$ is a weighted subgraph $H$ of $G$ such that for any $S \subseteq V$:
$$W_H(S, \bar{S}) = (1 \pm \varepsilon)\, W_G(S, \bar{S})$$
where $W_G(S, \bar{S})$ is the weight of the cut between $S$ and $\bar{S}$ in $G$.

Typically $H$ is an $\tilde{O}(n)$-size graph.
Graph Compression (cont.)

Laplacian matrix of a graph $G(V, E)$: $L \in \mathbb{R}^{n \times n}$
- $L = D - A$, for degree matrix $D \in \mathbb{R}^{n \times n}$ and adjacency matrix $A$
- $L = \sum_{e \in E} L_e$, for edge-Laplacian matrices $L_e \in \mathbb{R}^{n \times n}$
- $L = B^T B$, for edge-vertex incidence matrix $B \in \mathbb{R}^{\binom{n}{2} \times n}$

For a set of vertices $S \subseteq V$ with characteristic vector $x \in \{0, 1\}^n$:
$$x^T L x = \sum_{e=(u,v) \in E} (x_u - x_v)^2 = \delta_G(S, \bar{S})$$

Any cut sparsifier $H$ of $G$ has a Laplacian $\tilde{L}$ such that for all $x \in \{0, 1\}^n$:
$$x^T \tilde{L} x = (1 \pm \varepsilon)\, x^T L x$$
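As a sanity check of the identity $x^T L x = \delta_G(S, \bar{S})$, the following toy example (our own, not from the slides) builds $L = B^T B$ for a small unweighted graph and compares the quadratic form against cut weights.

```python
# A toy check that x^T L x equals the cut weight delta_G(S, S-bar)
# for the characteristic vector x of S, using L = B^T B.
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # small unweighted graph

# Edge-vertex incidence matrix B: one row per edge, +1 and -1 on endpoints.
B = np.zeros((len(edges), n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0

L = B.T @ B                            # Laplacian: L = B^T B = D - A

for S in [{0}, {0, 1}, {0, 2}]:
    x = np.array([1.0 if v in S else 0.0 for v in range(n)])
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    assert np.isclose(x @ L @ x, cut)  # x^T L x = weight of cut (S, S-bar)
print("all cuts match")
```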
Spectral Sparsifier

Definition (spectral sparsifier). A $(1 \pm \varepsilon)$ spectral sparsifier of a graph $G(V, E)$ is a weighted subgraph $H$ of $G$ such that for any $x \in \mathbb{R}^n$:
$$x^T \tilde{L} x = (1 \pm \varepsilon)\, x^T L x$$
where $L$ (resp. $\tilde{L}$) is the Laplacian of $G$ (resp. $H$).

Originally proposed by Spielman and Teng [ST11]: $\tilde{O}(m)$ construction time and $\tilde{O}(n)$ size.
Spectral vs Cut Sparsifiers

Difference between spectral and cut sparsifiers: [figure omitted; from [ST11]]
Graph vs Matrix Compression

Matrix compression:
- $A \in \mathbb{R}^{n \times d}$ is a tall matrix, i.e., $n \gg d$
- compression guarantee of the form $\tilde{O}(d^2)$

Graph compression:
- $L \in \mathbb{R}^{n \times n}$ is a square matrix

But...
- $L = B^T B$ and $B$ is tall
- $x^T L x = x^T B^T B x = \|Bx\|_2^2$
- Spectral sparsification is a subspace embedding for $B$!
Spectral Sparsification and Subspace Embedding

A sampling-based subspace embedding: leverage score sampling.

Leverage score of the $i$-th row of $A = U \Sigma V^T$:
$$\ell_i = \|U_{(i)}\|_2^2$$

Leverage score sampling for $A \in \mathbb{R}^{m \times d}$:
$$S_{s \times m} = D_{s \times m} \cdot \Omega_{m \times m}$$
- $D_{s \times m}$: rescaling matrix (according to the sampling probabilities)
- $\Omega_{m \times m}$: sampling matrix (based on leverage scores)

Theorem (LS-sampling theorem). For $s = \Theta\!\left(\frac{d \log d}{\beta \varepsilon^2}\right)$, with probability 0.99, $S_{s \times m}$ is a subspace embedding matrix for $A_{m \times d}$.
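A minimal sketch of leverage score sampling, under the assumption that exact scores are computed from a full SVD (the theorem also tolerates $\beta$-approximate scores); the oversampling constant here is an arbitrary illustrative choice, not the theorem's.

```python
# A minimal sketch of leverage score sampling (sample rows, then rescale).
import numpy as np

rng = np.random.default_rng(1)
m, d, eps = 5_000, 8, 0.5
A = rng.standard_normal((m, d))

U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = np.sum(U**2, axis=1)             # l_i = ||U_(i)||_2^2, sums to d

s = int(4 * d * np.log(d) / eps**2)    # Theta(d log d / eps^2) rows
p = lev / d                            # sampling probabilities, sum to 1
rows = rng.choice(m, size=s, replace=True, p=p)
D = 1.0 / np.sqrt(s * p[rows])         # rescaling: E[(SA)^T (SA)] = A^T A
SA = D[:, None] * A[rows]              # S*A without forming S explicitly

x = rng.standard_normal(d)
print(np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2)  # ~ 1 +- eps
```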
Spectral Sparsification and Subspace Embedding (cont.)

Theorem. Sampling and reweighting $\tilde{O}(\varepsilon^{-2} n)$ edges of $G(V, E)$ according to the leverage scores of $B \in \mathbb{R}^{\binom{n}{2} \times n}$ results in a $(1 \pm \varepsilon)$ spectral sparsifier of $G$.

Proof. For any $x \in \mathbb{R}^n$, $x^T L x = \|Bx\|_2^2$; now apply LS-sampling for a subspace embedding of $B$.
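Illustratively, the leverage score of row $b_e$ of $B$ is the effective resistance $b_e^T L^{+} b_e$ of edge $e$, so this recipe becomes: sample edges by effective resistance and reweight. The following offline sketch (our own, not the streaming algorithm) demonstrates this on a random graph; all constants are illustrative.

```python
# Offline illustration: sample edges by effective resistance, i.e., by
# the leverage scores of the rows of B, to build a spectral sparsifier.
import numpy as np

rng = np.random.default_rng(2)
n, eps = 40, 0.5
edges = [(u, v) for u in range(n) for v in range(u + 1, n)
         if rng.random() < 0.3]        # random dense-ish graph
m = len(edges)

B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ B

Lp = np.linalg.pinv(L)
lev = np.einsum('ij,jk,ik->i', B, Lp, B)   # l_e = b_e^T L^+ b_e

s = int(10 * n * np.log(n) / eps**2)   # illustrative sample count
p = lev / lev.sum()
sample = rng.choice(m, size=s, replace=True, p=p)
w = np.bincount(sample, minlength=m) / (s * p)   # edge weights of H

Lh = B.T @ (w[:, None] * B)            # Laplacian of the sparsifier H
x = rng.standard_normal(n)
print((x @ Lh @ x) / (x @ L @ x))      # ~ 1 +- eps with high probability
```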
Linear Sketching for Spectral Sparsification

Theorem ([KLM+14]). There exists a distribution on $\varepsilon^{-2}\,\mathrm{polylog}(n) \times \binom{n}{2}$ dimensional matrices $S$ such that, with high probability, a $(1 \pm \varepsilon)$ spectral sparsifier of $G$ can be recovered from $S \cdot B$.

Key feature: linear sketch.

First single-pass spectral sparsifier for dynamic graph streams [KLM+14].
Introduction and Removal of Artificial Bases

Theorem ([LMP13]). Let $K$ be any PSD matrix with maximum eigenvalue $\lambda_u$ and minimum (non-zero) eigenvalue $\lambda_l$, and let $d = \lceil \log(\lambda_u / \lambda_l) \rceil$. For $\ell \in [d]$, define
$$\gamma(\ell) = \frac{\lambda_u}{2^\ell}$$
and consider the sequence of PSD matrices $K(0), \ldots, K(d)$, where $K(\ell) = K + \gamma(\ell) \cdot I$. Then:
1. $K \preceq_R K(d) \preceq_R 2K$
2. $K(\ell) \preceq_R K(\ell - 1) \preceq_R 2K(\ell)$ for $\ell \geq 1$
3. $K(0) \preceq_R 2\gamma(0) \cdot I \preceq_R 2K(0)$

(Here $\preceq_R$ denotes the PSD ordering restricted to the row space of $K$.)
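The chain of orderings can be verified numerically. The check below (our own illustration) uses a full-rank PSD $K$, for which the restricted ordering reduces to the usual PSD ordering, tested via the smallest eigenvalue of the difference.

```python
# Numerical sanity check of the chain K(l) = K + gamma(l) * I.
# K here is full rank, so <=_R reduces to the usual PSD ordering.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
K = M @ M.T                            # a (full-rank, w.h.p.) PSD matrix

evals = np.linalg.eigvalsh(K)
lam_u, lam_l = evals.max(), evals.min()
d = int(np.ceil(np.log2(lam_u / lam_l)))

def psd_leq(P, Q, tol=1e-8):
    """P <= Q in the PSD order iff Q - P has no negative eigenvalue."""
    return np.linalg.eigvalsh(Q - P).min() >= -tol

I = np.eye(6)
Ks = [K + (lam_u / 2**l) * I for l in range(d + 1)]    # K(0), ..., K(d)

assert psd_leq(K, Ks[d]) and psd_leq(Ks[d], 2 * K)     # K <= K(d) <= 2K
for l in range(1, d + 1):                              # K(l) <= K(l-1) <= 2K(l)
    assert psd_leq(Ks[l], Ks[l - 1]) and psd_leq(Ks[l - 1], 2 * Ks[l])
assert psd_leq(Ks[0], 2 * lam_u * I) and psd_leq(2 * lam_u * I, 2 * Ks[0])
print("chain verified for d =", d)
```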
Constructing a Spectral Sparsifier

Use the previous theorem!
- $d = O(\log n)$ for Laplacian matrices
- leverage scores of $K(\ell)$ $\approx$ leverage scores of $K(\ell + 1)$

Proof. On the board.
Sparse Recovery Algorithm

Theorem ([GLPS12]). There exists an algorithm $D$ and a distribution on matrices $\Phi$ of dimension $\varepsilon^{-2}\,\mathrm{polylog}(n) \times n$ such that, for any $x \in \mathbb{R}^n$, with high probability, $D(\Phi x, i)$ can detect whether $x_i = \Omega(\|x\|)$ or $x_i = o(\|x\|)$.

Heavy hitter detection!
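As a stand-in for the [GLPS12] scheme, here is a toy linear sketch in the same spirit: a CountSketch with median estimates, whose per-coordinate estimates make heavy coordinates stand out. The parameters (reps, buckets) are illustrative, and this is not the actual algorithm $D$.

```python
# A toy linear sketch for heavy hitter detection: CountSketch with
# median-of-repetitions estimates (a stand-in, not the [GLPS12] scheme).
import numpy as np

rng = np.random.default_rng(4)
n, reps, buckets = 10_000, 7, 50

h = rng.integers(0, buckets, size=(reps, n))   # bucket hash per repetition
g = rng.choice([-1.0, 1.0], size=(reps, n))    # sign hash per repetition

def sketch(x):
    """Phi @ x, computed implicitly; linear in x, reps*buckets counters."""
    y = np.zeros((reps, buckets))
    for r in range(reps):
        np.add.at(y[r], h[r], g[r] * x)
    return y

def estimate(y, i):
    """Median-of-repetitions estimate of x_i from the sketch y."""
    return np.median([g[r, i] * y[r, h[r, i]] for r in range(reps)])

x = 0.01 * rng.standard_normal(n)
x[42] = 5.0                                    # one heavy coordinate
y = sketch(x)
print(estimate(y, 42), estimate(y, 7))         # ~5.0 (heavy) vs ~0 (light)
```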
Constructing a Spectral Sparsifier via Linear Sketches

1. For $i = 1, \ldots, O(\log n)$: maintain $\Phi \cdot D_i \cdot B$ ($\Phi$ is the sparse recovery matrix, $D_i \in \mathbb{R}^{\binom{n}{2} \times \binom{n}{2}}$ is diagonal).
2. Repeat $O(\log n)$ times.

We are done!

Proof Sketch.
- Enough information to traverse the hierarchy of $K(0)$ to $K(d)$.
- At each level $\ell$, compute $\Phi \cdot D_i \cdot B \cdot K(\ell)^{+} \cdot b_e$ for every edge $e$.
- Run $D(\Phi \cdot D_i \cdot B \cdot K(\ell)^{+} \cdot b_e, e)$ to sample an edge $e$.
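The recovery step leans on the identity $\|B L^{+} b_e\|_2^2 = b_e^T L^{+} b_e$ (the Spielman-Srivastava observation), so a sketch of $B$ suffices to estimate edge leverage scores. The snippet below (our own illustration) demonstrates this with a Gaussian $\Phi$ standing in for the sparse-recovery sketch; all sizes are illustrative.

```python
# Illustrative recovery step: estimate the effective resistance of an edge
# from a sketch of B alone, using ||Phi B L^+ b_e||^2 ~ b_e^T L^+ b_e.
import numpy as np

rng = np.random.default_rng(5)
n = 30
edges = [(u, v) for u in range(n) for v in range(u + 1, n)
         if rng.random() < 0.3]
m = len(edges)

B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
L = B.T @ B
Lp = np.linalg.pinv(L)                 # L^+ (in the algorithm: via K(l))

k = 300                                # sketch dimension, illustrative
Phi = rng.standard_normal((k, m)) / np.sqrt(k)
PB = Phi @ B                           # the maintained sketch Phi * B, k x n

e = 0
b_e = B[e]                             # incidence vector of edge e
est = np.linalg.norm(PB @ (Lp @ b_e)) ** 2   # computed from the sketch only
true = b_e @ Lp @ b_e                  # true effective resistance of e
print(est, true)                       # close with high probability
```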
Questions?
References

[GLPS12] Anna C. Gilbert, Yi Li, Ely Porat, and Martin J. Strauss. Approximate sparse recovery: Optimizing time and measurements. SIAM J. Comput., 41(2):436-453, 2012.

[KLM+14] Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford. Single pass spectral sparsification in dynamic streams. In 55th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2014), pages 561-570, 2014.

[LMP13] Mu Li, Gary L. Miller, and Richard Peng. Iterative row sampling. In 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2013), pages 127-136, 2013.

[ST11] Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM J. Comput., 40(4):981-1025, 2011.