# Lecture Topic: Low-Rank Approximations

## Transcription

2 Low-Rank Approximations

We have seen principal component analysis. The extraction of the first principal component can be seen as an approximation of the original matrix by a rank-1 matrix. In this chapter, we will consider problems where a sparse matrix is given and one hopes to find a structured (e.g., low-rank), dense matrix as close as possible to it, in some norm.

Jakub Mareček and Seán McGarraghy (UCD) Numerical Analysis and Software November 11, / 1

5 The Continuing Example

Consider the example of collaborative filtering: suppose we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$, corresponding to ratings by $m$ users of $n$ movies or books. There, the set $\mathcal{M}$ of candidate matrices could be the rank-$r$ matrices, motivated by the best possible transformation to a new coordinate system with $r$ axes, such as "likes horror films" and "likes romantic comedies". Notice that in collaborative filtering, each user may rate only 200 of the movies on offer.

11 Another Example

One may also consider estimating the positions of sensors from some of their pair-wise distances, which is known as sensor network localisation. In many applications, e.g. in the sewers, the sensors do not actually have a GPS signal, but they have low-power radios, which allow them to estimate their distance from a handful of the closest sensors. From these pair-wise measurements, you want to retrieve the positions of all sensors.

17 Yet Another Example

In the most striking result, we will see that for random rank-$r$ matrices, knowing $O(nr(\log n)^2)$ randomly drawn elements makes it possible to reconstruct the complete matrix of $O(n^2)$ elements without any error, with high probability.

This has far-reaching consequences. Consider, for instance, a digital camera: the price of sensors increases with the number of pixels, but many images are naturally low-rank. Although cameras with a single-pixel chip remain a curiosity, super-resolution techniques are widespread in medical imaging, where battery capacity is not a concern.

19 Key Concepts

A singular value and a pair of singular vectors of $A \in \mathbb{R}^{m \times n}$ are a scalar $\sigma \in \mathbb{R}$, $\sigma \ge 0$, and two non-zero vectors $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ such that $Av = \sigma u$ and $A^T u = \sigma v$.

In a matrix completion problem, with some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$ known, you solve:

$$\min_{M \in \mathbb{R}^{m \times n}} \operatorname{rank}(M) \quad \text{s.t.} \quad M_{i,j} = A_{i,j} \quad \forall (i, j) \in E. \tag{1.1}$$

21 Some Revision

Definition (Orthogonality). Two vectors $u, v \in \mathbb{R}^n$ are orthogonal if and only if their dot product $\sum_{i=1}^{n} u_i v_i$ is zero. This corresponds to an angle of 90 degrees between them. The columns and rows of an orthogonal matrix $U \in \mathbb{R}^{n \times n}$ are orthogonal unit vectors, i.e., $U^T U = U U^T = I$, where $I$ is the identity matrix.

23 Some More Intuition

The linear transformation $x \mapsto Qx$, for an orthogonal $Q$, is an isometry, i.e., it preserves the dot product of vectors. Imagine a rotation or a reflection.

26 Some Revision

Definition (Singular values and vectors of a matrix $A \in \mathbb{R}^{m \times n}$). For every matrix $A \in \mathbb{R}^{m \times n}$, there exists a decomposition $A = U \Sigma V^T$, where:

- $U$ is an $m \times m$ orthogonal matrix whose $m$ columns are left-singular vectors of $A$;
- $\Sigma$ is an $m \times n$ matrix with $\Sigma_{i,i} \ge 0$, $i \le \min\{m, n\}$, being the singular values of $A$ and all other elements $0$;
- $V$ is an $n \times n$ orthogonal matrix whose $n$ columns are right-singular vectors of $A$.
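
The definition above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix sizes and the random seed are arbitrary choices made for illustration and do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3  # illustrative sizes (assumption)
A = rng.standard_normal((m, n))

# full_matrices=True returns U (m x m), the singular values s, and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Embed the singular values in an m x n matrix Sigma.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# A = U Sigma V^T up to rounding error.
reconstruction_error = np.linalg.norm(A - U @ Sigma @ Vt)

# U and V are orthogonal.
u_orthogonal = np.allclose(U.T @ U, np.eye(m))
v_orthogonal = np.allclose(Vt @ Vt.T, np.eye(n))

# Each singular pair satisfies A v_i = sigma_i u_i (row i of Vt is v_i).
pairs_ok = all(np.allclose(A @ Vt[i], s[i] * U[:, i]) for i in range(len(s)))
```

NumPy returns $V^T$ rather than $V$, so the right-singular vectors are the rows of `Vt`.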

27 Some More Intuition

For a square $A$ with $\det(A) > 0$, $\Sigma$ is a scaling matrix and $U$, $V^T$ can be chosen to be rotation matrices. $U \Sigma V^T$ is then a composition of a rotation, a scaling, and another rotation.

28 Some More Intuition

Every matrix $A = U \Sigma V^T$ corresponds to a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$. There are orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ such that $T$ maps the $i$-th basis vector of $\mathbb{R}^n$ to a non-negative multiple of the $i$-th basis vector of $\mathbb{R}^m$, for $i = 1, \ldots, \min\{m, n\}$.

With respect to these bases, $T$ is represented by a diagonal matrix $\Sigma$ with non-negative real diagonal entries, which are the lengths of the semi-axes of the ellipsoid in $\mathbb{R}^m$ that results from applying $T$ to the unit sphere in $\mathbb{R}^n$.

Formally, $T(x) := Ax$ for $A = U \Sigma V^T$, $T : \mathbb{R}^n \to \mathbb{R}^m$, and

$$T(V_i) = \sigma_i U_i \ \text{ for all } i = 1, \ldots, \min\{m, n\}, \qquad T(V_i) = 0 \ \text{ for } i > \min\{m, n\},$$

where $U_i$ and $V_i$ denote the $i$-th columns of $U$ and $V$.

34 Singular Values: Perturbation Analysis

Much of the perturbation analysis we have seen for eigenvalues carries over. Let $0 < m \le n$, and let $A, B \in \mathbb{R}^{m \times n}$. The Weyl inequality, for example, reads:

$$\sigma_{i+j-1}(A + B) \le \sigma_i(A) + \sigma_j(B) \quad \text{for all } 1 \le i, j, \ i + j - 1 \le m. \tag{2.1}$$
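
A quick numerical sanity check of (2.1) can make the index bookkeeping concrete. This is only a sketch on one random pair of matrices, assuming NumPy; it illustrates the inequality and does not prove it. The sizes and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6  # m <= n, as in the statement (sizes are an assumption)
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

# Singular values in non-increasing order.
sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A + B, compute_uv=False)

# Largest value of sigma_{i+j-1}(A+B) - sigma_i(A) - sigma_j(B)
# over all admissible 1-based index pairs (i, j).
worst = max(
    sAB[i + j - 2] - (sA[i - 1] + sB[j - 1])
    for i in range(1, m + 1)
    for j in range(1, m + 1)
    if i + j - 1 <= m
)
```

If the inequality holds, `worst` is non-positive up to rounding error.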

35 Some Revision

We have seen a variety of norms of $x \in \mathbb{R}^n$. Examples:

$\ell_1$ norm:

$$\|x\|_1 := \sum_{i=1}^{n} |x_i| \tag{3.1}$$

Maximum norm:

$$\|x\|_\infty := \max\{|x_1|, \ldots, |x_n|\}. \tag{3.2}$$

36 Some Revision

Let us consider a new concept, the conjugate norms $\|\cdot\|$ and $\|\cdot\|_*$. By definition,

$$\|z\|_* = \max_{\|y\| \le 1} y^T z. \tag{3.3}$$

In particular, the conjugate of $\|\cdot\|_2$ is $\|\cdot\|_2$ itself, and the conjugate of $\|\cdot\|_1$ is $\|\cdot\|_\infty$.

39 Some Revision

Definition (Matrix norm). $\|A\|$ is a norm of a matrix $A \in \mathbb{R}^{m \times n}$ if and only if:

- $\|A\| \ge 0$;
- $\|A\| = 0$ if and only if $A = 0$;
- $\|\alpha A\| = |\alpha| \, \|A\|$ for all $\alpha \in \mathbb{R}$ and $A \in \mathbb{R}^{m \times n}$;
- $\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in \mathbb{R}^{m \times n}$.

Definition (Trace of $A \in \mathbb{R}^{n \times n}$). $\operatorname{trace}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}$.

40 Some Revision

Nuclear norm:

$$\|A\|_* := \operatorname{trace}\left(\sqrt{A^T A}\right) = \sum_{i=1}^{\min\{m, n\}} \sigma_i. \tag{3.4}$$

Frobenius norm:

$$\|A\|_F := \sqrt{\operatorname{trace}(A^T A)} = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2} = \sqrt{\sum_{i=1}^{\min\{m, n\}} \sigma_i^2} \tag{3.5}$$

Spectral norm:

$$\|A\|_2 := \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A) \tag{3.6}$$

Here $\sqrt{A^T A}$ denotes the positive semidefinite matrix $B$ such that $BB = A^T A$. The Frobenius norm is its own conjugate, and the spectral norm is the conjugate of the nuclear norm.
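
The three norms can be computed directly from the singular values. The sketch below, assuming NumPy, compares that route with `np.linalg.norm` (whose `ord` values `'nuc'`, `'fro'` and `2` select the nuclear, Frobenius and spectral norms) and with the trace formula in (3.5); the test matrix is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))  # illustrative matrix (assumption)
sigma = np.linalg.svd(A, compute_uv=False)

# Norms expressed through the singular values, as in (3.4)-(3.6).
nuclear = sigma.sum()
frobenius = np.sqrt((sigma ** 2).sum())
spectral = sigma.max()

# The same norms as computed by NumPy.
nuclear_np = np.linalg.norm(A, 'nuc')
frobenius_np = np.linalg.norm(A, 'fro')
spectral_np = np.linalg.norm(A, 2)

# Frobenius norm via the trace formula.
frobenius_trace = np.sqrt(np.trace(A.T @ A))
```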

42 Some More Understanding

Previously, we have mentioned that all matrix norms are equivalent. For a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$:

$$\|A\|_2 \le \|A\|_F \le \sqrt{r} \, \|A\|_2$$

$$\|A\|_F \le \|A\|_* \le \sqrt{r} \, \|A\|_F$$
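
These two chains of inequalities can be illustrated on a random matrix of known rank. The sketch below assumes NumPy; the sizes, the rank, the seed and the small tolerance `tol` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 8, 6, 2  # illustrative sizes (assumption)
# A product of m x r and r x n Gaussian factors has rank r almost surely.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

spec = np.linalg.norm(A, 2)
fro = np.linalg.norm(A, 'fro')
nuc = np.linalg.norm(A, 'nuc')

tol = 1e-12  # slack for rounding error
chain_spectral_frobenius = (spec <= fro + tol) and (fro <= np.sqrt(r) * spec + tol)
chain_frobenius_nuclear = (fro <= nuc + tol) and (nuc <= np.sqrt(r) * fro + tol)
```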

44 Matrix Completion

In general, let us consider:

$$\min_{M \in \mathcal{M}} \|A - M\|_N$$

where $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ is some subset of $m \times n$ matrices and $\|\cdot\|_N$ is a matrix norm. In particular:

- $\|\cdot\|_2$ or $\|\cdot\|_F$, $\mathcal{M}$ is rank-$r$, $A$ is dense, $M$ is dense: SVD
- $\|\cdot\|_F$, $\mathcal{M}$ is rank-$r$, $A$ is sparse, $M$ is dense: NP-hard
- various $N$, $\mathcal{M}$ is rank-1 with sparsity: NP-hard

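45 Low-Rank Matrices

Theorem (Eckart and Young). Let us have a rank-$r$ matrix $A \in \mathbb{R}^{m \times n}$, $A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$. Consider $k < r$ and the so-called truncated singular value decomposition

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T.$$

Then the best approximation of rank at most $k$, in both the Frobenius and the spectral norm, is attained by $A_k$:

$$A_k \in \arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \le k}} \|A - B\|_F, \qquad A_k \in \arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \le k}} \|A - B\|_2. \tag{4.1}$$

More visually,

$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T, \tag{4.2}$$

$$A_k = U_1 \Sigma_1 V_1^T \tag{4.3}$$

where $\Sigma_1 \in \mathbb{R}^{k \times k}$, $U_1 \in \mathbb{R}^{m \times k}$, and $V_1 \in \mathbb{R}^{n \times k}$.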
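
The truncated SVD of the theorem takes only a few lines to form. The following sketch, assuming NumPy, builds $A_k = U_1 \Sigma_1 V_1^T$ and compares its error with the discarded singular values; the matrix, the value of `k` and the random competitor `B` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 5))  # illustrative matrix (assumption)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Truncated SVD: keep the k leading singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Approximation errors in the two norms of the theorem.
err_spectral = np.linalg.norm(A - A_k, 2)
err_frobenius = np.linalg.norm(A - A_k, 'fro')

# The errors are determined by the discarded singular values.
expected_spectral = s[k]                          # sigma_{k+1}
expected_frobenius = np.sqrt((s[k:] ** 2).sum())  # tail of the spectrum

# Any other matrix of rank at most k, here a random one, cannot do better.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
err_competitor = np.linalg.norm(A - B, 'fro')
```

The spectral error equals $\sigma_{k+1}$ and the Frobenius error equals the square root of the sum of the discarded $\sigma_i^2$.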

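46 Low-Rank Matrices

There are a number of proofs. One can use the Weyl inequality:

$$\sigma_{i+j-1}(A + B) \le \sigma_i(A) + \sigma_j(B) \quad \text{for all } 1 \le i, j, \ i + j - 1 \le m.$$

If $B$ has rank $k$, then $\sigma_{k+1}(B) = 0$. One applies the inequality to the pair $A - B$ and $B$ with $j = k + 1$, which gives $\sigma_{i+k}(A) \le \sigma_i(A - B)$ for every admissible $i$. For the spectral norm, $i = 1$ suffices: $\|A - B\|_2 = \sigma_1(A - B) \ge \sigma_{k+1}(A) = \|A - A_k\|_2$. For the Frobenius norm, one sums the squares of these inequalities over $i$.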

51 Sparse Low-Rank Matrices

Consider again the applications of low-rank matrix reconstruction:

- predicting ratings of movies by individual users, in collaborative filtering, where each user has rated only 200 of the movies on offer, or
- estimating positions of sensors from some of their pair-wise distances, in sensor network localisation, where one may know the distances to 4 or 5 sensors.

They share the property that we know only a very small number of entries of the matrix. Imputing 0 or similar for the unknown entries is a bad idea.

56 Sparse Low-Rank Matrices

Suppose we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$. Assume that there exists only one rank-$r$ matrix $M$ with those entries. Then, the search for the simplest explanation fitting the observed data is:

$$\min_{M \in \mathbb{R}^{m \times n}} \operatorname{rank}(M) \quad \text{s.t.} \quad M_{i,j} = A_{i,j} \quad \forall (i, j) \in E \tag{5.1}$$

The problem is:

- non-convex in $M$ and very hard;
- easy to reformulate in a number of ways.

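57 Sparse Low-Rank Matrices

Suppose we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$. Consider the fact that a rank-$r$ matrix can be written as $M = XY^T$, $X \in \mathbb{R}^{m \times r}$, $Y \in \mathbb{R}^{n \times r}$, and solve:

$$\arg\min_{\substack{X \in \mathbb{R}^{m \times r} \\ Y \in \mathbb{R}^{n \times r}}} \sum_{(i,j) \in E} \left( (XY^T)_{i,j} - A_{i,j} \right)^2 \tag{5.2}$$

The problem is:

- non-convex in $XY^T$, i.e., jointly in $X$ and $Y$;
- convex in either $X$ or $Y$, with the other one fixed.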

58 Sparse Low-Rank Matrices

A rank-$r$ matrix has exactly $r$ non-zero singular values. Rank can hence be seen as the $\ell_0$ norm of the spectrum. Considering we have seen the $\ell_0$ norm being replaced by the $\ell_1$ norm, Fazel proposed to replace the rank with the nuclear norm, which is the $\ell_1$ norm of the spectrum:

$$\arg\min_{M \in \mathbb{R}^{m \times n}} \|M\|_* \quad \text{subject to} \quad \sum_{(i,j) \in E} \left( M_{i,j} - A_{i,j} \right)^2 = 0, \tag{5.3}$$

where the constraint is equivalent to $M_{i,j} = A_{i,j}$ for all $(i, j) \in E$, as in (5.1).

The problem is:

- convex in $M$ and possible to solve using interior-point methods;
- such that the optimum of the convex problem coincides with the global optimum of the non-convex problem (!) with high probability:
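
The analogy between rank and the $\ell_0$ norm, and between the nuclear norm and the $\ell_1$ norm, can be made concrete on the vector of singular values. This is a minimal sketch assuming NumPy; the relative threshold `tol` used to decide which singular values count as non-zero is an assumption of the example, as are the sizes and the seed.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, r = 7, 5, 2  # illustrative sizes (assumption)
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r almost surely

# The spectrum: the vector of singular values of M.
s = np.linalg.svd(M, compute_uv=False)

# "l0 norm" of the spectrum: number of singular values above a small threshold.
tol = 1e-10 * s.max()
rank_as_l0 = int(np.count_nonzero(s > tol))

# l1 norm of the spectrum: the nuclear norm.
nuclear_as_l1 = float(np.abs(s).sum())
```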

61 Sparse Low-Rank Matrices

Theorem (Candès and Recht). Let us assume $M \in \mathbb{R}^{m \times n}$ of rank $r$ is sampled from the random orthogonal model. Suppose we observe entries of $M$ with locations $E$ sampled uniformly at random. Then there are numerical constants $C_1$ and $C_2$ such that if

$$|E| \ge C_1 \, r \, (\max\{m, n\})^{5/4} \log(\max\{m, n\}), \tag{5.4}$$

the minimiser of the nuclear-norm minimisation problem is unique and equal to $M$ with probability at least $1 - C_2 (\max\{m, n\})^{-3}$.

62 Sparse Low-Rank Matrices

The result of the previous theorem is of considerable theoretical and practical interest. It has been cited more than 1800 times.

Although the nuclear-norm minimisation problem is possible to approximate within any fixed precision in polynomial time, this is limited to modest $n \approx 1000$ in practice. Notice that the interior-point method needs to invert the Hessian, where even the matrix variable is $n \times n$. One would hence like to find more efficient algorithms.

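68 Sparse Low-Rank Matrices

Alternating Minimisation:

1. Partition $E = E_1 \cup E_2 \cup \ldots \cup E_{k_{\max}}$.
2. Compute the SVD $\sum_{i=1}^{\min\{m, n\}} \sigma_i x_i y_i^T$ considering only $E_1$.
3. Initialise $X^1$ and $Y^1$ from the leading $r$ terms, rescaled by $\frac{mn}{|E_1|}$: the columns of $X^1$ are $\frac{mn}{|E_1|} \sigma_i x_i$ and the columns of $Y^1$ are $\frac{mn}{|E_1|} \sigma_i y_i$.
4. For each iteration $k = 1, \ldots, k_{\max}$, with $k_{\max} \in O(\log n)$:

$$X^{k+1} = \arg\min_{X \in \mathbb{R}^{m \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X (Y^k)^T)_{i,j} - A_{i,j} \right)^2 \tag{5.5}$$

$$Y^{k+1} = \arg\min_{Y \in \mathbb{R}^{n \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X^{k+1} Y^T)_{i,j} - A_{i,j} \right)^2 \tag{5.6}$$

This:

- solves linear least squares twice in each iteration, in dimensions $mr$ and $nr$;
- generally takes $O((mr)^2)$ and $O((nr)^2)$, but thanks to the partially separable structure, it is $O(|E| r^2)$ for each of the two problems.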
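
The alternating scheme can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the exact algorithm of the slide: it reuses all observed entries in every iteration instead of a fresh block $E_{k+1}$ of a partition, it splits each singular value as $\sqrt{\sigma_i}$ between the two factors at initialisation, and the function name `alternating_minimisation`, the boolean `mask` encoding of $E$, the iteration count and the test problem are all hypothetical choices made for the example.

```python
import numpy as np

def alternating_minimisation(A, mask, r, iters=50):
    """Fit a rank-r factorisation X @ Y.T to the entries of A where mask is True."""
    m, n = A.shape
    # Initialise from the SVD of the zero-filled matrix, rescaled by mn/|E|.
    scale = m * n / mask.sum()
    U, s, Vt = np.linalg.svd(scale * np.where(mask, A, 0.0), full_matrices=False)
    X = U[:, :r] * np.sqrt(s[:r])
    Y = Vt[:r, :].T * np.sqrt(s[:r])
    for _ in range(iters):
        # With Y fixed, each row of X solves its own small least-squares problem.
        for i in range(m):
            cols = mask[i]
            if cols.any():
                X[i] = np.linalg.lstsq(Y[cols], A[i, cols], rcond=None)[0]
        # With X fixed, each row of Y solves its own small least-squares problem.
        for j in range(n):
            rows = mask[:, j]
            if rows.any():
                Y[j] = np.linalg.lstsq(X[rows], A[rows, j], rcond=None)[0]
    return X, Y

# Hypothetical test problem: a random rank-2 matrix with about 60% of entries observed.
rng = np.random.default_rng(5)
m, n, r = 30, 20, 2
A_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.6

X, Y = alternating_minimisation(A_true, mask, r, iters=100)
observed_rmse = np.sqrt(np.mean((X @ Y.T - A_true)[mask] ** 2))
relative_error = np.linalg.norm(X @ Y.T - A_true) / np.linalg.norm(A_true)
```

The row-by-row solves reflect the partially separable structure mentioned above: with one factor fixed, the objective decouples across the rows of the other, so each update only touches the observed entries of one row or column.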

69 Sparse Low-Rank Matrices

Theorem (Keshavan et al.) Let us assume $M = X^* (Y^*)^T + W$, with $X^* \in \mathbb{R}^{m \times r}$, $Y^* \in \mathbb{R}^{n \times r}$, $W \in \mathbb{R}^{m \times n}$, with the elements of $W$, $X^*$, and $Y^*$ being bounded i.i.d. random variables, zero-mean for $X^*$ and $Y^*$, and with $W$ satisfying, among others:

$$\theta = \sigma_{\max}(W), \quad \mathbb{P}\left( |W_{i,j} - \mathbb{E} W_{i,j}| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2\omega^2} \right). \tag{5.7}$$

There exist constants $C_1, C_2$ such that for $k_{\max} = C_1 \log n$, $|E| \ge C_2 \kappa^8 n r (\log n)^2$, and $E$ uniformly distributed over all sets of size $|E|$, with probability larger than $1 - 1/n^4$, one has:

$$\lVert M - X^k (Y^k)^T \rVert_F \le 6 r \, 2^{-2k} + C_2 r \kappa^2 (\theta + n \omega) \, \epsilon \tag{5.8}$$

where $\kappa = \max\{\sigma_{\min}(X^*)^{-1}, \sigma_{\min}(Y^*)^{-1}\}$.

70 Regularisations of PCA

Alternating minimisation is a very general approach to optimisation problems. For example, consider a generalisation of PCA:

$$\max_{x \in \mathbb{R}^n} \{ \lVert Ax \rVert_v : \lVert x \rVert_2 \le 1, \lVert x \rVert_s \le k \}, \tag{6.1}$$

where standard PCA corresponds to $v$ being $\ell_2$ and no norm $s$.

- $v$ of $\ell_1$ norm works better, in terms of perturbation analysis (stability, robustness).
- $s$ such as $\ell_1$ improves interpretability (sparsity in the loading vector) by approximating $\ell_0$.

As we have seen in the previous chapter, the $\ell_1$ norm is non-smooth.
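As a quick check of the special case in (6.1), here is a short derivation. It relies only on the standard singular value decomposition $A = U \Sigma V^T$, which is an assumption brought in for illustration and is not spelled out on the slide. With $v = \ell_2$ and the constraint in the norm $s$ dropped, orthogonality of $U$ and $V$ gives

$$\max_{\lVert x \rVert_2 \le 1} \lVert Ax \rVert_2 = \max_{\lVert z \rVert_2 \le 1} \lVert \Sigma z \rVert_2 = \sigma_1(A), \qquad z = V^T x,$$

and the maximum is attained at the first right singular vector of $A$, that is, at the loading vector of the first principal component.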


75 Regularisations

Richtárik et al. summarise 8 possible regularisations of the problem of computing the first PC by combining:

- two norms for measuring variance ($\ell_1$, $\ell_2$) and
- two sparsity-inducing norms (cardinality $\ell_0$ and $\ell_1$),
- either in a constraint or in a penalty term.

All have the form

$$\mathrm{OPT} = \max_{x \in X} f(x), \tag{6.2}$$

with a feasible set $X \subset \mathbb{R}^n$ and an objective $f$.


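79 Regularisations

| # | $v$ | $s$ | $s$ use | $X$ | $f(x)$ |
|---|---|---|---|---|---|
| 1 | $L_2$ | $L_0$ | constraint | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_0 \le s\}$ | $\lVert Ax \rVert_2$ |
| 2 | $L_1$ | $L_0$ | constraint | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_0 \le s\}$ | $\lVert Ax \rVert_1$ |
| 3 | $L_2$ | $L_1$ | constraint | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_1 \le s\}$ | $\lVert Ax \rVert_2$ |
| 4 | $L_1$ | $L_1$ | constraint | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_1 \le s\}$ | $\lVert Ax \rVert_1$ |
| 5 | $L_2$ | $L_0$ | penalty | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\lVert Ax \rVert_2^2 - \gamma \lVert x \rVert_0$ |
| 6 | $L_1$ | $L_0$ | penalty | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\lVert Ax \rVert_1^2 - \gamma \lVert x \rVert_0$ |
| 7 | $L_2$ | $L_1$ | penalty | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\lVert Ax \rVert_2 - \gamma \lVert x \rVert_1$ |
| 8 | $L_1$ | $L_1$ | penalty | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\lVert Ax \rVert_1 - \gamma \lVert x \rVert_1$ |

Table 1: Eight regularisations of PCA, cited verbatim from Richtárik et al.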

80 Regularisations

Let $Y := \{y \in \mathbb{R}^m : \lVert y \rVert_2 \le 1\}$ for the $\ell_2$ norm and $Y := \{y \in \mathbb{R}^m : \lVert y \rVert_\infty \le 1\}$ for the $\ell_1$ norm, and let $F(x, y)$ be the function obtained from $f(x)$ after replacing $\lVert Ax \rVert$ with $y^T A x$ (resp. $\lVert Ax \rVert^2$ with $(y^T A x)^2$). Then, in view of the above, (6.2) takes on the equivalent form

$$\mathrm{OPT} = \max_{x \in X} \max_{y \in Y} F(x, y). \tag{6.3}$$

That is, the 8 problems can be reformulated into the form (6.3).
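The replacement of a norm by an inner maximisation rests on norm duality, which can be checked numerically. The sketch below is illustrative only: the helper names (`matvec`, `best_y_l2`, `best_y_linf`, `bilinear`) are hypothetical, and the maximisers used are the standard ones for the unit ball of the 2-norm and of the max-norm.

```python
import math

def matvec(A, x):
    """Return the product A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def best_y_l2(A, x):
    """Maximiser of y^T A x over the 2-norm unit ball: y = Ax / ||Ax||_2."""
    z = matvec(A, x)
    norm = math.sqrt(sum(t * t for t in z))
    return [t / norm for t in z] if norm > 0 else z

def best_y_linf(A, x):
    """Maximiser of y^T A x over the max-norm unit ball: y = sign(Ax).

    Where an entry of Ax is zero, any value in [-1, 1] is optimal;
    copysign simply picks +1 or -1 there.
    """
    return [math.copysign(1.0, t) for t in matvec(A, x)]

def bilinear(A, x, y):
    """Evaluate y^T A x."""
    return sum(yi * zi for yi, zi in zip(y, matvec(A, x)))
```

For any fixed $x$, `bilinear(A, x, best_y_l2(A, x))` recovers the 2-norm of $Ax$ and `bilinear(A, x, best_y_linf(A, x))` recovers its 1-norm, which is why the inner maximisation over $Y$ loses nothing.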


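83 Regularisations

| # | $X$ | $Y$ | $F(x, y)$ |
|---|---|---|---|
| 1 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_0 \le s\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_2 \le 1\}$ | $y^T A x$ |
| 2 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_0 \le s\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_\infty \le 1\}$ | $y^T A x$ |
| 3 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_1 \le s\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_2 \le 1\}$ | $y^T A x$ |
| 4 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1, \lVert x \rVert_1 \le s\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_\infty \le 1\}$ | $y^T A x$ |
| 5 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_2 \le 1\}$ | $(y^T A x)^2 - \gamma \lVert x \rVert_0$ |
| 6 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_\infty \le 1\}$ | $(y^T A x)^2 - \gamma \lVert x \rVert_0$ |
| 7 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_2 \le 1\}$ | $y^T A x - \gamma \lVert x \rVert_1$ |
| 8 | $\{x \in \mathbb{R}^n : \lVert x \rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y \rVert_\infty \le 1\}$ | $y^T A x - \gamma \lVert x \rVert_1$ |

Table 2: Reformulations of the problems from Table 1. Cited verbatim from Richtárik et al.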

84 Generalising the Power Method

Alternating minimisation, applied here as alternating maximisation to the regularised problem (6.3), is:

$$y^k = \arg\max_{y \in Y} F(x^k, y) \tag{6.4}$$

$$x^{k+1} = \arg\max_{x \in X} F(x, y^k). \tag{6.5}$$

As it turns out, there are closed-form solutions for the two sub-problems for all the variants above. Notice that Hotelling's deflation is no longer guaranteed to work, although there are replacements.
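For one of the eight variants, the two closed-form steps are easy to write down. The sketch below covers only the variant with $\ell_2$ variance and an $\ell_0$ constraint: the $y$-step normalises $Ax$, and the $x$-step keeps the $s$ largest-magnitude entries of $A^T y$ and normalises. Both follow from the Cauchy-Schwarz inequality and are derived here for illustration, not quoted from Richtárik et al.; the function names and the all-ones starting point are assumptions.

```python
import math

def matvec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, y):
    """Return A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def normalise(v):
    """Scale v to unit 2-norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v] if norm > 0 else list(v)

def hard_threshold(c, s):
    """Keep the s largest-magnitude entries of c and zero out the rest."""
    order = sorted(range(len(c)), key=lambda j: abs(c[j]), reverse=True)
    keep = set(order[:s])
    return [c[j] if j in keep else 0.0 for j in range(len(c))]

def sparse_pc(A, s, iters=100):
    """Alternating maximisation of y^T A x with at most s nonzeros in x."""
    # Assumed starting point; it becomes s-sparse after the first x-step.
    x = normalise([1.0] * len(A[0]))
    for _ in range(iters):
        y = normalise(matvec(A, x))                       # step (6.4)
        x = normalise(hard_threshold(rmatvec(A, y), s))   # step (6.5)
    return x
```

With `s` equal to the number of columns, the thresholding does nothing and the loop reduces to the ordinary power method on $A^T A$, which is the sense in which the scheme generalises it.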


87 A Summary Overall, we have seen that there are NP-hard problems for which one can retrieve the global optimum with high probability. Leading solvers based on alternating minimisation can tackle gigabyte-sized instances in minutes.



CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

### Section 4.4 Inner Product Spaces

Section 4.4 Inner Product Spaces In our discussion of vector spaces the specific nature of F as a field, other than the fact that it is a field, has played virtually no role. In this section we no longer

### MAT 200, Midterm Exam Solution. a. (5 points) Compute the determinant of the matrix A =

MAT 200, Midterm Exam Solution. (0 points total) a. (5 points) Compute the determinant of the matrix 2 2 0 A = 0 3 0 3 0 Answer: det A = 3. The most efficient way is to develop the determinant along the

### MATH 551 - APPLIED MATRIX THEORY

MATH 55 - APPLIED MATRIX THEORY FINAL TEST: SAMPLE with SOLUTIONS (25 points NAME: PROBLEM (3 points A web of 5 pages is described by a directed graph whose matrix is given by A Do the following ( points

### 3 Orthogonal Vectors and Matrices

3 Orthogonal Vectors and Matrices The linear algebra portion of this course focuses on three matrix factorizations: QR factorization, singular valued decomposition (SVD), and LU factorization The first

### Solving Linear Systems, Continued and The Inverse of a Matrix

, Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### 1 2 3 1 1 2 x = + x 2 + x 4 1 0 1

(d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which

### Linear Algebra: Determinants, Inverses, Rank

D Linear Algebra: Determinants, Inverses, Rank D 1 Appendix D: LINEAR ALGEBRA: DETERMINANTS, INVERSES, RANK TABLE OF CONTENTS Page D.1. Introduction D 3 D.2. Determinants D 3 D.2.1. Some Properties of