Sketching as a Tool for Numerical Linear Algebra




Sketching as a Tool for Numerical Linear Algebra (Graph Sparsification)

David P. Woodruff
presented by Sepehr Assadi
o(n) Big Data Reading Group, University of Pennsylvania
April 2015

Goal

New survey by David Woodruff: Sketching as a Tool for Numerical Linear Algebra.

Topics:
- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds


Matrix Compression

Previously: compress a matrix $A \in \mathbb{R}^{n \times d}$ using linear sketches.

Example: subspace embedding.

Definition ($\ell_2$-subspace embedding). A $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for a matrix $A \in \mathbb{R}^{n \times d}$ is a matrix $S$ for which, for all $x \in \mathbb{R}^d$:
$\|SAx\|_2^2 = (1 \pm \varepsilon) \, \|Ax\|_2^2$

Typically $SA$ is an $\tilde{O}(d^2)$-size matrix.

Techniques:
- using random matrices $S$ (Gaussian, sign matrices, etc.)
- using leverage score sampling
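
Below is a minimal numerical illustration of this definition, assuming Python with numpy; the sizes and the plain Gaussian sketch are chosen for illustration and are not tuned to the theory.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, eps = 10_000, 10, 0.1
    A = rng.standard_normal((n, d))

    # Gaussian sketch with k = O(d / eps^2) rows, scaled so that
    # E[||S y||^2] = ||y||^2 for any fixed vector y.
    k = int(d / eps ** 2)
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    SA = S @ A                           # k x d instead of n x d

    for _ in range(5):                   # spot-check the guarantee on random x
        x = rng.standard_normal(d)
        ratio = np.linalg.norm(SA @ x) ** 2 / np.linalg.norm(A @ x) ** 2
        print(f"||SAx||^2 / ||Ax||^2 = {ratio:.3f}")   # within [1 - eps, 1 + eps]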

Graph Compression

Today: compress a graph $G(V, E)$ using linear sketches.

Example: sparsification.

Definition (cut sparsifier). A $(1 \pm \varepsilon)$ cut sparsifier of a graph $G(V, E)$ is a weighted subgraph $H$ of $G$ such that for any $S \subseteq V$:
$W_H(S, \bar{S}) = (1 \pm \varepsilon) \cdot W_G(S, \bar{S})$
where $W_G(S, \bar{S})$ is the weight of the cut between $S$ and $\bar{S}$ in $G$.

Typically $H$ is an $\tilde{O}(n)$-size graph.

Graph Compression (cont.)

Laplacian matrix of a graph $G(V, E)$: $L \in \mathbb{R}^{n \times n}$
- $L = D - A$, with degree matrix $D \in \mathbb{R}^{n \times n}$ and adjacency matrix $A$
- $L = \sum_{e \in E} L_e$ for edge-Laplacian matrices $L_e \in \mathbb{R}^{n \times n}$
- $L = B^T B$ for the edge-vertex incidence matrix $B \in \mathbb{R}^{\binom{n}{2} \times n}$

For a set of vertices $S \subseteq V$ with characteristic vector $x \in \{0, 1\}^n$:
$x^T L x = \sum_{e=(u,v) \in E} (x_u - x_v)^2 = \delta_G(S, \bar{S})$

Any cut sparsifier $H$ of $G$ has a Laplacian $\tilde{L}$ such that for all $x \in \{0, 1\}^n$:
$x^T \tilde{L} x = (1 \pm \varepsilon) \, x^T L x$
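
A quick sanity check of these identities on a toy graph (a hypothetical 5-vertex example; Python with numpy assumed):

    import numpy as np

    n = 5
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]   # toy unweighted graph

    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    D = np.diag(A.sum(axis=1))
    L = D - A

    # Edge-vertex incidence matrix B: one row per edge, +1/-1 on its endpoints.
    B = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        B[i, u], B[i, v] = 1, -1
    assert np.allclose(L, B.T @ B)

    # For the characteristic vector x of S = {0, 1}: x^T L x = cut weight.
    x = np.array([1, 1, 0, 0, 0], dtype=float)
    cut = sum(1 for u, v in edges if x[u] != x[v])     # edges (0,2), (1,2) cross
    assert np.isclose(x @ L @ x, cut)                  # delta_G(S, S-bar) = 2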

Spectral Sparsifier

Definition (spectral sparsifier). A $(1 \pm \varepsilon)$ spectral sparsifier of a graph $G(V, E)$ is a weighted subgraph $H$ of $G$ such that for any $x \in \mathbb{R}^n$:
$x^T \tilde{L} x = (1 \pm \varepsilon) \, x^T L x$
where $L$ (resp. $\tilde{L}$) is the Laplacian of $G$ (resp. $H$).

Originally proposed by Spielman and Teng [ST11]: $\tilde{O}(m)$ construction time and $\tilde{O}(n)$ size.

Spectral vs Cut Sparsifiers

[Figure from [ST11] illustrating the difference between spectral and cut sparsifiers.]

Graph vs Matrix Compression

Matrix compression: $A \in \mathbb{R}^{n \times d}$
- $A$ is a tall matrix, i.e., $n \gg d$
- compression guarantee of the form $\tilde{O}(d^2)$

Graph compression: $L \in \mathbb{R}^{n \times n}$
- $L$ is a square matrix

But... $L = B^T B$ and $B$ is tall:
$x^T L x = x^T B^T B x = \|Bx\|_2^2$

Spectral sparsification is a subspace embedding for $B$!

Spectral Sparsification and Subspace Embedding

A sampling-based subspace embedding: leverage score sampling.

Leverage score of the $i$-th row of $A = U \Sigma V^T$:
$\ell_i = \|U_{(i)}\|_2^2$

Leverage score sampling for $A \in \mathbb{R}^{m \times d}$:
$S_{s \times m} = D_{s \times m} \cdot \Omega_{m \times m}$
- $D_{s \times m}$: rescaling matrix (according to the sampling probabilities)
- $\Omega_{m \times m}$: sampling matrix (based on leverage scores)

Theorem (LS-sampling theorem). For $s = \Theta\!\left(\frac{d \log d}{\beta \varepsilon^2}\right)$, with probability 0.99, $S_{s \times m}$ is a subspace embedding matrix for $A_{m \times d}$.
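
A minimal sketch of leverage score sampling under these definitions (Python with numpy; the sample size s below is illustrative rather than the $\Theta(d \log d / (\beta \varepsilon^2))$ of the theorem):

    import numpy as np

    rng = np.random.default_rng(1)
    m, d = 5_000, 8
    A = rng.standard_normal((m, d))

    U, _, _ = np.linalg.svd(A, full_matrices=False)    # thin SVD, A = U Sigma V^T
    lev = np.sum(U ** 2, axis=1)         # l_i = ||U_(i)||_2^2; the scores sum to d
    p = lev / d                          # sampling probabilities proportional to l_i

    # Sample s rows i.i.d. from p and rescale each by 1/sqrt(s p_i),
    # so that E[(SA)^T (SA)] = A^T A.
    s = 400
    idx = rng.choice(m, size=s, p=p)
    SA = A[idx] / np.sqrt(s * p[idx, None])

    x = rng.standard_normal(d)
    print(np.linalg.norm(SA @ x) / np.linalg.norm(A @ x))   # close to 1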

Spectral Sparsification and Subspace Embedding (cont.)

Theorem. Sampling and reweighting $\tilde{O}(\varepsilon^{-2} n)$ edges of $G(V, E)$ according to the leverage scores of $B \in \mathbb{R}^{\binom{n}{2} \times n}$ results in a $(1 \pm \varepsilon)$ spectral sparsifier of $G$.

Proof. For any $x \in \mathbb{R}^n$, $x^T L x = \|Bx\|_2^2$; now apply LS-sampling to obtain a subspace embedding of $B$.
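
The following sketch mirrors this proof idea on a small random graph: compute the leverage scores of the rows of $B$ (i.e., the effective resistances of the edges), sample and reweight edges accordingly, and compare the two quadratic forms. The graph, sample size, and tolerance are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 40
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.5]
    B = np.zeros((len(edges), n))
    for i, (u, v) in enumerate(edges):
        B[i, u], B[i, v] = 1, -1
    L = B.T @ B

    # Leverage score of edge e = effective resistance of e (unit weights).
    U, sv, _ = np.linalg.svd(B, full_matrices=False)
    r = np.sum(sv > 1e-10)               # rank of B (n - 1 for a connected graph)
    lev = np.sum(U[:, :r] ** 2, axis=1)
    p = lev / lev.sum()

    s = 20 * n                           # stands in for O(eps^-2 n log n) samples
    idx = rng.choice(len(edges), size=s, p=p)
    w = np.zeros(len(edges))
    for i in idx:
        w[i] += 1.0 / (s * p[i])         # reweighting keeps E[L_H] = L
    L_H = B.T @ (w[:, None] * B)         # Laplacian of the sampled subgraph H

    x = rng.standard_normal(n)
    print((x @ L_H @ x) / (x @ L @ x))   # roughly 1: x^T L_H x = (1 +- eps) x^T L x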

Linear Sketching for Spectral Sparsification

Theorem ([KLM+14]). There exists a distribution on $\varepsilon^{-2}\,\mathrm{polylog}(n) \times \binom{n}{2}$ dimensional matrices $S$ such that, with high probability, a $(1 \pm \varepsilon)$ spectral sparsifier of $G$ can be recovered from $S \cdot B$.

Key feature: linear sketch.

First single-pass spectral sparsifier for dynamic graph streams [KLM+14].

Introduction and Removal of Artificial Bases

Theorem ([LMP13]). Let $K$ be any PSD matrix with maximum eigenvalue $\lambda_u$, minimum non-zero eigenvalue $\lambda_l$, and $d = \log(\lambda_u / \lambda_l)$. For $l \in \{0, 1, \ldots, d\}$, define:
$\gamma(l) = \frac{\lambda_u}{2^l}$
Consider the sequence of PSD matrices $K(0), \ldots, K(d)$, where $K(l) = K + \gamma(l) \cdot I$. Then:
1. $K \preceq K(d) \preceq 2K$
2. $K(l) \preceq K(l-1) \preceq 2K(l)$ for $l \geq 1$
3. $K(0) \preceq 2\gamma(0) I \preceq 2K(0)$
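
The three Loewner-order claims can be checked numerically on an arbitrary full-rank PSD matrix; a small sketch (assuming numpy, with an illustrative 30 x 30 matrix $K$):

    import numpy as np

    rng = np.random.default_rng(3)
    M = rng.standard_normal((30, 30))
    K = M @ M.T                          # random full-rank PSD matrix

    evals = np.linalg.eigvalsh(K)
    lam_u, lam_l = evals.max(), evals.min()
    d = int(np.ceil(np.log2(lam_u / lam_l)))

    def preceq(X, Y, tol=1e-8):
        # X <= Y in the Loewner order iff Y - X is PSD
        return np.linalg.eigvalsh(Y - X).min() >= -tol

    I = np.eye(30)
    Ks = [K + (lam_u / 2 ** l) * I for l in range(d + 1)]    # K(l) = K + gamma(l) I

    assert preceq(K, Ks[d]) and preceq(Ks[d], 2 * K)                     # property 1
    assert all(preceq(Ks[l], Ks[l - 1]) and preceq(Ks[l - 1], 2 * Ks[l])
               for l in range(1, d + 1))                                 # property 2
    assert preceq(Ks[0], 2 * lam_u * I) and preceq(2 * lam_u * I, 2 * Ks[0])  # 3
    print("all three Loewner-order properties hold, d =", d)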

Constructing a Spectral Sparsifier

Use the previous theorem!
- $d = O(\log n)$ for Laplacian matrices
- leverage scores of $K(l)$ approximate the leverage scores of $K(l+1)$

Proof. On the board.

Sparse Recovery Algorithm

Theorem ([GLPS12]). There exists an algorithm $D$ and a distribution on matrices $\Phi$ of dimension $\varepsilon^{-2}\,\mathrm{polylog}(n) \times n$ such that, for any $x \in \mathbb{R}^n$, with high probability, $D(\Phi x, i)$ can detect whether $x_i = \Omega(\|x\|)$ or $x_i = o(\|x\|)$.

Heavy hitter detection!
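
As a toy stand-in for such a heavy hitter sketch, the following uses a basic CountSketch (not the [GLPS12] construction itself) to detect a planted heavy coordinate from linear measurements:

    import numpy as np

    rng = np.random.default_rng(4)
    n, w, r = 10_000, 64, 9              # w buckets per repetition, r repetitions

    bucket = rng.integers(0, w, size=(r, n))       # hash h_j(i)
    sign = rng.choice([-1.0, 1.0], size=(r, n))    # random sign s_j(i)

    x = rng.standard_normal(n) * 0.1
    x[42] = 20.0                          # plant one heavy coordinate

    # The sketch Phi x: r * w = polylog(n) linear measurements of x.
    sketch = np.zeros((r, w))
    for j in range(r):
        np.add.at(sketch[j], bucket[j], sign[j] * x)

    def estimate(i):
        # median over repetitions of s_j(i) times the bucket containing i
        return np.median([sign[j, i] * sketch[j, bucket[j, i]] for j in range(r)])

    print(estimate(42), np.linalg.norm(x))   # ~20, comparable to ||x|| ~ 22
    print(estimate(7))                        # near 0: not a heavy hitter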

Constructing a Spectral Sparsifier via Linear Sketches

1. For $i = 1, \ldots, O(\log n)$:
   (a) maintain $\Phi \cdot D_i \cdot B$ ($\Phi$ is the sparse recovery matrix; $D_i \in \mathbb{R}^{\binom{n}{2} \times \binom{n}{2}}$ is diagonal)
2. Repeat $O(\log n)$ times.

We are done!

Proof sketch.
- Enough information to traverse the hierarchy from $K(0)$ to $K(d)$.
- At each level $l$, compute $\Phi D_i B \, K(l)^{+} b_e$ for every edge $e$.
- Run $D(\Phi D_i B \, K(l)^{+} b_e, e)$ to sample an edge $e$.

Questions?

References

[GLPS12] Anna C. Gilbert, Yi Li, Ely Porat, and Martin J. Strauss. Approximate sparse recovery: Optimizing time and measurements. SIAM J. Comput., 41(2):436-453, 2012.

[KLM+14] Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford. Single pass spectral sparsification in dynamic streams. In 55th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2014), pages 561-570, 2014.

[LMP13] Mu Li, Gary L. Miller, and Richard Peng. Iterative row sampling. In 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2013), pages 127-136, 2013.

[ST11] Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM J. Comput., 40(4):981-1025, 2011.