CS 5614: (Big) Data Management Systems
B. Aditya Prakash
Lecture #18: Dimensionality Reduction
Dimensionality Reduction
Assumption: Data lies on or near a low d-dimensional subspace
Axes of this subspace are an effective representation of the data
Dimensionality Reduction
Compress / reduce dimensionality:
- 10^6 rows; 10^3 columns; no updates
- Random access to any cell(s); small error: OK
The example matrix is really 2-dimensional: all rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1]
Rank of a Matrix
Q: What is the rank of a matrix A?
A: The number of linearly independent columns of A
For example, the matrix

A =
  1  2  1
 -2 -3  1
  3  5  0

has rank r = 2. Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first equals the sum of the second and third), so the rank must be less than 3.
Why do we care about low rank? We can write A using two basis vectors, [1 2 1] and [-2 -3 1], and new coordinates for the rows: [1 0], [0 1], [1 -1].
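A quick numerical check of this example (a minimal sketch using NumPy; the matrix values are the ones above):

```python
import numpy as np

# The example matrix: row 1 = row 2 + row 3, so the rank is 2.
A = np.array([[ 1,  2, 1],
              [-2, -3, 1],
              [ 3,  5, 0]])
print(np.linalg.matrix_rank(A))  # 2

# Every row is a combination of the two basis vectors.
basis = np.array([[1, 2, 1], [-2, -3, 1]])
coords = np.array([[1, 0], [0, 1], [1, -1]])
print(np.allclose(coords @ basis, A))  # True
```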
Rank is "Dimensionality"
Cloud of points in 3D space: think of the point positions as a matrix, 1 row per point (A, B, C).
We can rewrite the coordinates more efficiently!
Old basis vectors: [1 0 0], [0 1 0], [0 0 1]
New basis vectors: [1 2 1], [-2 -3 1]
Then A has new coordinates [1 0], B: [0 1], C: [1 -1]
Notice: we reduced the number of coordinates!
Dimensionality Reduction
The goal of dimensionality reduction is to discover the axes of the data!
Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (its position along the line fit to the data). By doing this we incur a bit of error, as the points do not exactly lie on the line.
Why Reduce Dimensions?
- Discover hidden correlations/topics (e.g., words that commonly occur together)
- Remove redundant and noisy features (not all words are useful)
- Interpretation and visualization
- Easier storage and processing of the data
SVD - Definition
A [m x n] = U [m x r] Σ [r x r] (V [n x r])^T
- A: input data matrix; m x n matrix (e.g., m documents, n terms)
- U: left singular vectors; m x r matrix (m documents, r concepts)
- Σ: singular values; r x r diagonal matrix (strength of each "concept"), where r is the rank of the matrix A
- V: right singular vectors; n x r matrix (n terms, r concepts)
[Diagram: the m x n matrix A factored as U (m x r) times Σ (r x r) times V^T (r x n)]
[Diagram: A written as a sum of rank-1 terms, σ1 u1 v1^T + σ2 u2 v2^T + ..., where each σi is a scalar and ui, vi are vectors]
SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ V^T, where:
- U, Σ, V: unique
- U, V: column orthonormal; U^T U = I, V^T V = I (I: identity matrix); columns are orthogonal unit vectors
- Σ: diagonal; entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
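A minimal NumPy sketch checking these properties on a random matrix (note: np.linalg.svd returns V^T, and full_matrices=False gives the compact r = min(m, n) factorization):

```python
import numpy as np

A = np.random.rand(6, 4)

# Compact SVD: U is m x r, s holds the singular values, Vt is r x n
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U @ np.diag(s) @ Vt, A))     # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(len(s))))    # True: U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(len(s))))  # True: V^T V = I
print(np.all(np.diff(s) <= 0))                 # True: sorted decreasing
```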
SVD Example: Users-to-Movies
A = U Σ V^T - example: users to movies. Rows are users; the first three movies (Matrix, Alien, Serenity) are SciFi, the last two (Casablanca, Amelie) are Romance:

  Matrix  Alien  Serenity  Casablanca  Amelie
    1       1       1          0          0
    3       3       3          0          0
    4       4       4          0          0
    5       5       5          0          0
    0       2       0          4          4
    0       0       0          5          5
    0       1       0          2          2

"Concepts", AKA latent dimensions, AKA latent factors
SVD Example: Users-to-Movies
A = U Σ V^T: the ratings matrix above factors as

U =
  0.13  0.02 -0.01
  0.41  0.07 -0.03
  0.55  0.09 -0.04
  0.68  0.11 -0.05
  0.15 -0.59  0.65
  0.07 -0.73 -0.67
  0.07 -0.29  0.32

Σ =
  12.4   0     0
   0     9.5   0
   0     0     1.3

V^T =
  0.56  0.59  0.56  0.09  0.09
  0.12 -0.02  0.12 -0.69 -0.69
  0.40 -0.80  0.40  0.09  0.09
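These numbers can be reproduced numerically (a sketch; signs of matched columns of U and rows of V^T may flip, since the SVD only fixes them up to sign):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 1))        # [12.4  9.5  1.3  0.   0. ]  (rank is 3)
print(np.round(U[:, 0], 2))  # ~[0.13 0.41 0.55 0.68 0.15 0.07 0.07] (up to sign)
```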
SVD Example: Users-to-Movies (interpretation)
- The first column of U and the first row of V^T correspond to the SciFi-concept; the second column/row to the Romance-concept.
- σ1 = 12.4 is the "strength" of the SciFi-concept.
- U is the user-to-concept similarity matrix.
- V is the movie-to-concept similarity matrix.
SVD - Interpretation #1
"movies", "users" and "concepts":
- U: user-to-concept similarity matrix
- V: movie-to-concept similarity matrix
- Σ: its diagonal elements give the strength of each concept
DIMENSIONALITY REDUCTION WITH SVD
SVD Dimensionality Reduction
[Figure: points plotted with "Movie 1 rating" and "Movie 2 rating" as axes; v1, the first right singular vector, is the line through the data capturing most of its spread]
SVD - Interpretation #2
A = U Σ V^T: U is the user-to-concept matrix, V is the movie-to-concept matrix. The first right singular vector v1 is the axis along which the data has the largest variance ("spread") in movie-rating space.
SVD - Interpretation #2
U Σ gives the coordinates of the points (users) along the projection axes; the first column is the projection of each user onto the SciFi axis:

U Σ =
  1.61  0.19 -0.01
  5.08  0.66 -0.03
  6.82  0.85 -0.05
  8.43  1.04 -0.06
  1.86 -5.60  0.84
  0.86 -6.93 -0.87
  0.86 -2.75  0.41
SVD - Interpretation #2 (more details)
Q: How exactly is dimensionality reduction done?
A: Set the smallest singular values to zero. In the running example, the smallest singular value σ3 = 1.3 is zeroed out, which drops the third column of U and the third row of V^T.
The rank-2 factorization that remains is A ≈ U' Σ' V'^T, with

U' =
  0.13  0.02
  0.41  0.07
  0.55  0.09
  0.68  0.11
  0.15 -0.59
  0.07 -0.73
  0.07 -0.29

Σ' = diag(12.4, 9.5)

V'^T =
  0.56  0.59  0.56  0.09  0.09
  0.12 -0.02  0.12 -0.69 -0.69
Multiplying out gives the approximation B to A:

B =
  0.92  0.95  0.92  0.01  0.01
  2.91  3.01  2.91 -0.01 -0.01
  3.90  4.04  3.90  0.01  0.01
  4.82  5.00  4.82  0.03  0.03
  0.70  0.53  0.70  4.11  4.11
 -0.69  1.34 -0.69  4.78  4.78
  0.32  0.23  0.32  2.01  2.01

Frobenius norm: ‖M‖_F = √(Σij Mij²), and ‖A − B‖_F is small.
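The zeroing step and the resulting error in code (a sketch, redefining the ratings matrix A for self-containment):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0], [0, 2, 0, 4, 4], [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                      # keep the two largest singular values
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(B, 2))                      # matches the reconstruction above
print(np.linalg.norm(A - B, 'fro'))        # ~1.3: exactly the dropped sigma_3
```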
SVD - Best Low-Rank Approximation
[Diagram: A = U Σ V^T, and B = U S V^T, where S keeps only the largest singular values of Σ; B is the best approximation of A]
SVD - Best Low-Rank Approximation
Theorem: Let A = U Σ V^T and B = U S V^T, where S is the diagonal r x r matrix with s_i = σ_i for i = 1...k and s_i = 0 otherwise. Then B is a best rank-k approximation to A.
What do we mean by "best"? B is a solution to min_B ‖A − B‖_F subject to rank(B) = k.
SVD - Interpretation #2
Equivalent: spectral decomposition of the matrix:
A = [u1 u2] x diag(σ1, σ2) x [v1 v2]^T
SVD - Interpretation #2
Equivalent: spectral decomposition of the matrix:
A = σ1 u1 v1^T + σ2 u2 v2^T + ...
where each term is an m x 1 vector times a 1 x n vector, with k terms in total.
Assume: σ1 ≥ σ2 ≥ σ3 ≥ ... ≥ 0
Why is setting small σi to 0 the right thing to do? Vectors ui and vi are unit length, so σi scales them. Zeroing small σi therefore introduces only a small error.
SVD - Complexity
To compute the SVD: O(n m²) or O(n² m) (whichever is less)
But: less work if we just want the singular values, or if we want only the first k singular vectors, or if the matrix is sparse.
Implemented in linear algebra packages like LINPACK, Matlab, S-Plus, Mathematica, ...
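For the "less work" cases, a sparse truncated solver helps; a sketch assuming SciPy is available (scipy.sparse.linalg.svds computes only the top-k singular triplets of a sparse matrix):

```python
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# A large sparse matrix; svds avoids forming the full SVD
A = sparse_random(10_000, 1_000, density=0.001, format='csr', random_state=0)

U, s, Vt = svds(A, k=10)
s = s[::-1]            # svds returns singular values in ascending order
print(s[:3])           # the three largest singular values
```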
SVD - Conclusions so far
- SVD: A = U Σ V^T: unique
  - U: user-to-concept similarities
  - V: movie-to-concept similarities
  - Σ: strength of each concept
- Dimensionality reduction: keep the few largest singular values (80-90% of the "energy" Σ σi²)
- SVD picks up linear correlations
Relation to Eigen-decomposition
SVD gives us: A = U Σ V^T
Eigen-decomposition: A = X Λ X^T, where A is symmetric, U, V, X are orthonormal (U^T U = I), and Λ, Σ are diagonal.
Now let's calculate:
A A^T = U Σ V^T (U Σ V^T)^T = U Σ V^T V Σ^T U^T = U Σ² U^T, which has the form X Λ² X^T
A^T A = V Σ^T U^T (U Σ V^T) = V Σ² V^T, which again has the form X Λ² X^T
This shows how to compute the SVD using eigenvalue decomposition: the eigenvectors of A^T A are the right singular vectors V, and its eigenvalues are the squared singular values.
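A sketch of that route for a small dense matrix (np.linalg.eigh returns eigenvalues in ascending order, and eigenvector signs are arbitrary):

```python
import numpy as np

A = np.random.rand(6, 4)

# Eigen-decomposition of A^T A = V Sigma^2 V^T
evals, V = np.linalg.eigh(A.T @ A)        # ascending eigenvalues
order = np.argsort(evals)[::-1]           # re-sort largest-first
sigma = np.sqrt(np.clip(evals[order], 0, None))

_, s, _ = np.linalg.svd(A)
print(np.allclose(sigma, s))              # True: eigenvalues are sigma_i^2
```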
SVD: Properties
- A A^T = U Σ² U^T
- A^T A = V Σ² V^T
- (A^T A)^k = V Σ^(2k) V^T
  E.g.: (A^T A)² = V Σ² V^T V Σ² V^T = V Σ⁴ V^T
- (A^T A)^k ≈ σ1^(2k) v1 v1^T for k ≫ 1
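That last property is the basis of power iteration: repeatedly multiplying a random vector by A^T A converges to v1 (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 20))
M = A.T @ A

x = rng.random(20)
for _ in range(100):            # x converges toward v1
    x = M @ x
    x /= np.linalg.norm(x)      # renormalize each step

v1 = np.linalg.svd(A)[2][0]     # first right singular vector
print(abs(x @ v1))              # ~1.0: x aligns with v1 (up to sign)
```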
EXAMPLE OF SVD & CONCLUSION
Case study: How to query?
Q: Find users that like "Matrix"
A: Map the query into a "concept space". How?
(We use the users-to-movies ratings matrix and its factorization A = U Σ V^T from above.)
Case study: How to query?
Consider a query q = [5 0 0 0 0] over (Matrix, Alien, Serenity, Casablanca, Amelie): a user who rated only "Matrix".
Project q into the concept space: take the inner product of q with each concept vector vi.
[Figure: q plotted in the (v1, v2) plane; q·v1 is its coordinate along the SciFi axis]
Case study: How to query?
Compactly, we have: q_concept = q V
E.g.:
q = [5 0 0 0 0]  over (Matrix, Alien, Serenity, Casablanca, Amelie)

V (movie-to-concept similarities; columns = SciFi-concept, Romance-concept):
  0.56  0.12
  0.59 -0.02
  0.56  0.12
  0.09 -0.69
  0.09 -0.69

q V = [2.8 0.6]  (SciFi-concept score 2.8)
Case study: How to query?
How would a user d that rated ("Alien", "Serenity") be handled? The same way: d_concept = d V
E.g.: d = [0 4 5 0 0], and d V = [5.2 0.4]  (a strong SciFi-concept score)
Case study: How to query?
Observation: user d, who rated ("Alien", "Serenity"), will be similar to user q, who rated ("Matrix"), although d and q have zero ratings in common!
d = [0 4 5 0 0] -> concept space [5.2 0.4]
q = [5 0 0 0 0] -> concept space [2.8 0.6]
Zero ratings in common, but similarity ≠ 0 in concept space.
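Putting the query example in code (a sketch; V is the two-concept movie-to-concept matrix from the running example, so results differ slightly from the slide's rounding):

```python
import numpy as np

# Movie-to-concept matrix V (columns: SciFi-concept, Romance-concept)
V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

q = np.array([5, 0, 0, 0, 0])   # rated only 'Matrix'
d = np.array([0, 4, 5, 0, 0])   # rated 'Alien' and 'Serenity'

print(q @ V)                    # [2.8  0.6]
print(d @ V)                    # ~[5.2  0.5] with these rounded V entries

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos(q, d))                # 0.0: no ratings in common
print(cos(q @ V, d @ V))        # ~0.99: very similar in concept space
```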
SVD: Drawbacks
+ Optimal low-rank approximation in terms of the Frobenius norm
- Interpretability problem: a singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity: singular vectors are dense!
CUR DECOMPOSITION
CUR Decomposition
Goal: Express A as a product of matrices C, U, R such that ‖A − C U R‖_F is small (Frobenius norm: ‖X‖_F = √(Σij Xij²)).
Constraints on C and R: C is a set of actual columns of A and R is a set of actual rows of A; U is built from the pseudo-inverse of the intersection of C and R.
CUR: Provably Good Approximation to SVD
Let A_k be the best rank-k approximation to A (that is, A_k comes from the SVD of A).
Theorem [Drineas et al.]: CUR in O(m n) time achieves
‖A − CUR‖_F ≤ ‖A − A_k‖_F + ε ‖A‖_F
with probability at least 1 − δ, by picking O(k log(1/δ)/ε²) columns and O(k² log³(1/δ)/ε⁶) rows.
In practice: pick 4k columns/rows.
CUR: How it Works
Sampling columns (and similarly for rows): each column is sampled with probability proportional to its squared norm, then rescaled. Note this is a randomized algorithm: the same column can be sampled more than once.
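A minimal sketch of the sampling step (assuming, as in Drineas et al., squared-norm probabilities, sampling with replacement, and the standard 1/√(c·p_j) rescaling to keep the estimate unbiased):

```python
import numpy as np

def sample_columns(A, c, rng):
    """Sample c columns of A with probability proportional to squared norm."""
    col_norms_sq = (A ** 2).sum(axis=0)
    p = col_norms_sq / col_norms_sq.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    # Rescale each picked column by 1/sqrt(c * p_j)
    C = A[:, idx] / np.sqrt(c * p[idx])
    return C, idx

rng = np.random.default_rng(0)
A = rng.random((7, 5))
C, idx = sample_columns(A, 4, rng)   # e.g. pick 4 columns (the "4k" rule, k=1)
print(C.shape, idx)
```

Rows are sampled the same way, working on A^T.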
Computing U
Let W be the intersection of the sampled columns C and rows R.
Let the SVD of W be W = X Z Y^T.
Then: U = W⁺ = Y Z⁺ X^T, where Z⁺ takes reciprocals of the non-zero singular values (Z⁺ii = 1/Zii); W⁺ is the pseudoinverse.
Why does the pseudoinverse work? If W = X Z Y^T, then W⁻¹ = (Y^T)⁻¹ Z⁻¹ X⁻¹. Due to orthonormality, X⁻¹ = X^T and Y⁻¹ = Y^T, and since Z is diagonal, Z⁻¹ii = 1/Zii. Thus, if W is nonsingular, the pseudoinverse is the true inverse.
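The pseudoinverse step in code (a sketch on a hypothetical intersection matrix W; np.linalg.pinv performs exactly this SVD-based construction, shown explicitly for comparison):

```python
import numpy as np

W = np.array([[5.0, 0.0],
              [0.0, 4.0],
              [1.0, 2.0]])     # hypothetical intersection of sampled rows/cols

X, z, Yt = np.linalg.svd(W, full_matrices=False)     # W = X Z Y^T
z_plus = np.array([1.0 / v if v > 1e-10 else 0.0 for v in z])
U = Yt.T @ np.diag(z_plus) @ X.T                     # U = W^+ = Y Z^+ X^T

print(np.allclose(U, np.linalg.pinv(W)))             # True
```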
CUR: Pros & Cons
+ Easy interpretation: the basis vectors are actual columns and rows
+ Sparse basis: the basis vectors are actual columns and rows
- Duplicate columns and rows: columns of large norms will be sampled many times
[Figure: an actual column vs. a (dense) singular vector]
Solution
If we want to get rid of the duplicates:
- Throw them away
- Scale (multiply) the remaining columns/rows by the square root of the number of duplicates
This shrinks C and R (C_d, R_d with duplicates become smaller C_s, R_s), and we construct a small U. A sketch of the merge-and-scale step follows below.
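A sketch of the scale-and-merge idea (dedup_and_scale is a hypothetical helper applied to the raw sampled columns and their sampled indices):

```python
import numpy as np

def dedup_and_scale(C, idx):
    """Keep one copy of each distinct sampled column, scaled by sqrt(count)."""
    uniq, counts = np.unique(idx, return_counts=True)
    cols = []
    for j, k in zip(uniq, counts):
        first = np.where(idx == j)[0][0]      # one representative copy
        cols.append(np.sqrt(k) * C[:, first])
    return np.column_stack(cols), uniq
```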
SVD vs. CUR
SVD: A = U Σ V^T
- A: huge but sparse; U, V: big and dense; Σ: sparse and small
CUR: A = C U R
- A: huge but sparse; C, R: big but sparse; U: dense but small
SVD vs. CUR: Simple Experiment
DBLP bibliographic data: author-to-conference big sparse matrix
- A_ij: number of papers published by author i at conference j
- 428K authors (rows), 3659 conferences (columns); very sparse
Want to reduce dimensionality:
- How much time does it take?
- What is the reconstruction error?
- How much space do we need?
Results: DBLP - big sparse matrix
[Plots comparing SVD, CUR, and CUR with no duplicates on accuracy (1 − relative sum of squared errors), space ratio (#output matrix entries / #input matrix entries), and CPU time]
Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM '07.
What about the linearity assumption?
SVD is limited to linear projections: a lower-dimensional linear projection that preserves Euclidean distances.
Non-linear methods: Isomap
- Data lies on a nonlinear low-dimensional curve, aka manifold
- Use the distance as measured along the manifold
- How? (see the sketch after this list)
  - Build an adjacency graph
  - Geodesic distance is the graph distance
  - SVD/PCA the graph's pairwise distance matrix
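A sketch assuming scikit-learn is available (its Isomap builds the neighbor graph, computes graph-geodesic distances, and embeds them, as in the recipe above):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Points on a nonlinear 2-D manifold ("swiss roll") embedded in 3-D
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Neighbor graph -> graph-geodesic distances -> low-dimensional embedding
iso = Isomap(n_neighbors=10, n_components=2)
X2 = iso.fit_transform(X)    # "unrolled" 2-D coordinates
print(X2.shape)              # (1000, 2)
```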
Further Reading: CUR
- Drineas et al.: Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
- J. Sun, Y. Xie, H. Zhang, C. Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007.
- P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, P. Drineas: Intra- and interpopulation genotype reconstruction from tagging SNPs, Genome Research, 17(1), 96-107 (2007).
- M. W. Mahoney, M. Maggioni, P. Drineas: Tensor-CUR Decompositions For Tensor-Based Data, Proc. 12th Annual SIGKDD, 327-336 (2006).