Lecture Topic: Low-Rank Approximations
Low-Rank Approximations
We have seen principal component analysis. Extracting the first principal eigenvalue, together with its eigenvector, can be seen as approximating the original matrix by a rank-1 matrix. In this chapter, we consider problems where a sparse matrix is given and one hopes to find a structured (e.g., low-rank) dense matrix as close as possible to it in some norm.
Jakub Mareček and Seán McGarraghy (UCD), Numerical Analysis and Software, November 11
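A minimal NumPy sketch of that remark, assuming a symmetric positive semidefinite matrix such as a sample covariance (the random data and the helper name `rank1_from_top_eigenpair` are hypothetical): the leading eigenpair $(\lambda_1, v_1)$ gives the rank-1 matrix $\lambda_1 v_1 v_1^T$, and the spectral-norm error that remains is the second-largest eigenvalue.

```python
import numpy as np

def rank1_from_top_eigenpair(S):
    """Rank-1 approximation lambda_1 * v_1 v_1^T of a symmetric PSD matrix S."""
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]   # leading eigenvalue and its unit eigenvector
    return lam * np.outer(v, v)

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 5))      # hypothetical data: 200 samples, 5 features
S = np.cov(data, rowvar=False)            # 5 x 5 sample covariance matrix
S1 = rank1_from_top_eigenpair(S)

lam_sorted = np.linalg.eigvalsh(S)        # eigenvalues of S, ascending
print(np.linalg.matrix_rank(S1))                                # 1
print(np.isclose(np.linalg.norm(S - S1, 2), lam_sorted[-2]))    # True
```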
The Continuing Example
Consider the example of collaborative filtering: suppose we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$, corresponding to ratings of $n$ movies or books by $m$ users. There, the set $\mathcal{M}$ of admissible approximations could be the rank-$r$ matrices, motivated by the best possible transformation to a new coordinate system with $r$ axes, such as "likes horror films" and "likes romantic comedies". Notice that in collaborative filtering, each user may have rated only 200 of the movies on offer.
Another Example
One may also consider estimating the positions of sensors from some of their pair-wise distances, which is known as sensor network localisation. In many applications, e.g. in sewers, the sensors do not have a GPS signal, but they have low-power radios, which allow them to estimate their distances to a handful of the closest sensors. From these pair-wise measurements, one wants to retrieve the positions of all sensors.
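As a sketch of why this is a low-rank problem, assume for the moment that all pair-wise distances are known exactly (unlike the sewer setting, where only a few are). Then classical multidimensional scaling, a technique not named in the lecture, recovers the positions up to rotation, reflection and translation, because the centred Gram matrix of points in $\mathbb{R}^d$ has rank at most $d$. The helper `positions_from_distances` and the random positions are hypothetical.

```python
import numpy as np

def positions_from_distances(D2, d=2):
    """Classical MDS: coordinates (up to rotation, reflection, translation)
    from a complete matrix D2 of squared pair-wise distances of points in R^d."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    G = -0.5 * J @ D2 @ J                    # Gram matrix of centred points, rank <= d
    eigvals, eigvecs = np.linalg.eigh(G)
    idx = np.argsort(eigvals)[::-1][:d]      # indices of the d largest eigenvalues
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))

rng = np.random.default_rng(1)
P = rng.uniform(0, 100, size=(10, 2))        # hypothetical sensor positions in metres
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
Q = positions_from_distances(D2)
D2_rec = ((Q[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(D2, D2_rec))               # True: all distances are reproduced
```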
Yet Another Example
In the most striking result, we will see that for random rank-$r$ matrices, knowing $O(nr(\log n)^2)$ randomly drawn elements makes it possible to reconstruct the complete matrix of $O(n^2)$ elements without any error, with high probability. This has far-reaching consequences. Consider, for instance, a digital camera: the price of sensors increases with the number of pixels, but many images are naturally low-rank. Although cameras with a single-pixel chip remain a curiosity, super-resolution techniques are actually widespread in medical imaging, where battery capacity is not a concern.
Key Concepts
A singular value and a pair of singular vectors of $A \in \mathbb{R}^{m \times n}$ are a scalar $\sigma \in \mathbb{R}$, $\sigma \ge 0$, and two non-zero vectors $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ such that $Av = \sigma u$ and $A^T u = \sigma v$.
In a matrix completion problem, with some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$ known, one solves:
$$\min_{M} \operatorname{rank}(M) \quad \text{s.t.} \quad M \in \mathbb{R}^{m \times n}, \quad M_{i,j} = A_{i,j} \;\; \forall (i, j) \in E. \qquad (1.1)$$
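A quick numerical check of the definition, as a sketch using NumPy's `np.linalg.svd` on a hypothetical random matrix: every triple $(\sigma_i, u_i, v_i)$ it returns satisfies both $Av = \sigma u$ and $A^T u = \sigma v$. Note that NumPy returns $V^T$, so the right-singular vectors are the rows of `Vt`.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))             # hypothetical 4 x 3 matrix
U, s, Vt = np.linalg.svd(A)                 # s: singular values, largest first

for i, sigma in enumerate(s):
    u, v = U[:, i], Vt[i, :]                # i-th left / right singular vectors
    assert np.allclose(A @ v, sigma * u)    # A v = sigma u
    assert np.allclose(A.T @ u, sigma * v)  # A^T u = sigma v
print(s)
```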
Some Revision
Definition (Orthogonality). Two vectors $u, v \in \mathbb{R}^n$ are orthogonal if and only if their dot product $\sum_{i=1}^{n} u_i v_i$ is zero. Geometrically, this corresponds to an angle of 90 degrees between them. The columns and rows of an orthogonal matrix $U \in \mathbb{R}^{n \times n}$ are orthogonal unit vectors, i.e., $U^T U = U U^T = I$, where $I$ is the identity matrix.
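To see the definition numerically, here is a sketch that assumes a QR factorisation of a random square matrix is an acceptable way to produce an orthogonal matrix (the matrix itself is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # U is a 4 x 4 orthogonal matrix
I = np.eye(4)

print(np.isclose(U[:, 0] @ U[:, 1], 0.0))          # distinct columns: dot product is zero
print(np.allclose(U.T @ U, I), np.allclose(U @ U.T, I))
```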
Some More Intuition
The linear transformation $x \mapsto Qx$, for an orthogonal $Q$, is an isometry, i.e., it preserves the dot product of vectors (and hence lengths and angles). Imagine a rotation or a reflection.
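A sketch of the isometry property, again building a hypothetical random orthogonal $Q$ via QR: dot products, and hence lengths, are unchanged, and $|\det Q| = 1$, with $+1$ for a rotation and $-1$ for a reflection.

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random 3 x 3 orthogonal matrix
x, y = rng.standard_normal(3), rng.standard_normal(3)

same_dot = np.isclose((Q @ x) @ (Q @ y), x @ y)                  # dot product preserved
same_len = np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))  # so lengths are too
det_Q = np.linalg.det(Q)                                         # +1 or -1
print(same_dot, same_len, round(det_Q))
```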
Some Revision
Definition (Singular values and vectors of a matrix $A \in \mathbb{R}^{m \times n}$). For every matrix $A \in \mathbb{R}^{m \times n}$, there exists a decomposition $A = U \Sigma V^T$, where:
- $U$ is an $m \times m$ orthogonal matrix whose $m$ columns are the left-singular vectors of $A$;
- $\Sigma$ is an $m \times n$ matrix whose entries $\Sigma_{i,i} \ge 0$, $i \le \min\{m, n\}$, are the singular values of $A$, and all of whose other elements are 0;
- $V$ is an $n \times n$ orthogonal matrix whose $n$ columns are the right-singular vectors of $A$ (equivalently, the rows of $V^T$).
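A sketch of the full decomposition in NumPy for a hypothetical $5 \times 3$ matrix. `np.linalg.svd` returns the singular values as a vector, so $\Sigma$ has to be assembled as an $m \times n$ matrix, and it returns $V^T$ rather than $V$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: m x m, Vt = V^T: n x n

Sigma = np.zeros((m, n))
p = min(m, n)
Sigma[:p, :p] = np.diag(s)                        # singular values on the diagonal, zeros elsewhere

print(U.shape, Sigma.shape, Vt.shape)             # (5, 5) (5, 3) (3, 3)
print(np.allclose(A, U @ Sigma @ Vt))             # True
```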
Some More Intuition
For a square $A$ with $\det(A) > 0$, $\Sigma$ is a scaling matrix and $U$, $V^T$ can be chosen to be rotation matrices. $U \Sigma V^T$ is then a composition of a rotation, a scaling, and another rotation.
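A sketch of how the rotations can be enforced, assuming a square $A$ with $\det(A) > 0$ (the helper name `rotation_scaling_rotation` and the example matrix are hypothetical). If the computed $U$ happens to be a reflection, then so is $V$, and flipping the signs of $u_1$ and $v_1$ together leaves $\sigma_1 u_1 v_1^T$, and hence $A$, unchanged.

```python
import numpy as np

def rotation_scaling_rotation(A):
    """Write a square A with det(A) > 0 as U @ diag(s) @ Vt with det(U) = det(Vt) = +1."""
    U, s, Vt = np.linalg.svd(A)
    if np.linalg.det(U) < 0:
        # Flip the pair (u_1, v_1): u_1 s_1 v_1^T is unchanged, and both
        # reflections become rotations.
        U[:, 0] *= -1
        Vt[0, :] *= -1
    return U, s, Vt

A = np.array([[2.0, 1.0], [0.5, 3.0]])    # det(A) = 5.5 > 0
U, s, Vt = rotation_scaling_rotation(A)
print(np.linalg.det(U), s, np.linalg.det(Vt))
```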
Some More Intuition
Every matrix $A = U \Sigma V^T$ corresponds to a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$. There are orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ such that $T$ maps the $i$-th basis vector of $\mathbb{R}^n$ to a non-negative multiple of the $i$-th basis vector of $\mathbb{R}^m$, for $i = 1, \dots, \min\{m, n\}$, and the remaining basis vectors of $\mathbb{R}^n$ to zero. With respect to these bases, $T$ is represented by a diagonal matrix $\Sigma$ with non-negative real diagonal entries, which are the lengths of the semi-axes of the ellipsoid in $\mathbb{R}^m$ that results from applying $T$ to the unit sphere in $\mathbb{R}^n$. Formally, with $T(x) := Ax$ for $A = U \Sigma V^T$, and $V_i$, $U_i$ denoting the $i$-th columns of $V$ and $U$:
$$T(V_i) = \sigma_i U_i \quad \text{for all } i = 1, \dots, \min\{m, n\}, \qquad T(V_i) = 0 \quad \text{for } i > \min\{m, n\}.$$
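A numerical sketch of $T(V_i) = \sigma_i U_i$ for a hypothetical $2 \times 4$ matrix, where $\min\{m, n\} = 2$, so the last two columns of $V$ are mapped to zero:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 2, 4
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A)          # full SVD: U is 2 x 2, Vt is 4 x 4
V = Vt.T                             # columns of V are the right-singular vectors

for i in range(n):
    image = A @ V[:, i]              # T(V_i)
    if i < min(m, n):
        assert np.allclose(image, s[i] * U[:, i])   # sigma_i U_i
    else:
        assert np.allclose(image, 0.0)              # mapped to zero
```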
Singular Values: Perturbation Analysis
Much of the perturbation analysis we have seen for eigenvalues carries over. Let $0 \le m \le n$, let $A, B \in \mathbb{R}^{m \times n}$, and order the singular values non-increasingly, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_m$. The Weyl inequality, for example, states:
$$\sigma_{i+j-1}(A + B) \le \sigma_i(A) + \sigma_j(B) \quad \text{for all } i, j \ge 1,\; i + j - 1 \le m. \qquad (2.1)$$
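A sketch that spot-checks (2.1) for random (hypothetical) matrices. With 0-based indices the condition reads `sAB[i + j] <= sA[i] + sB[j]` whenever `i + j < min(m, n)`; this only illustrates the inequality and proves nothing.

```python
import numpy as np

def weyl_holds(A, B, tol=1e-10):
    """Check sigma_{i+j-1}(A + B) <= sigma_i(A) + sigma_j(B), 1-based, for all valid i, j.
    With 0-based indices i, j this becomes sAB[i + j] <= sA[i] + sB[j]."""
    sA = np.linalg.svd(A, compute_uv=False)
    sB = np.linalg.svd(B, compute_uv=False)
    sAB = np.linalg.svd(A + B, compute_uv=False)
    p = len(sAB)                                  # min(m, n)
    return all(sAB[i + j] <= sA[i] + sB[j] + tol
               for i in range(p) for j in range(p) if i + j < p)

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((4, 6))
print(weyl_holds(A, B))                           # True
```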
Some Revision
We have seen a variety of norms of $x \in \mathbb{R}^n$, for example:
$$\ell_1 \text{ norm:} \quad \|x\|_1 := \sum_{i=1}^{n} |x_i| \qquad (3.1)$$
$$\text{Maximum norm:} \quad \|x\|_\infty := \max\{|x_1|, \dots, |x_n|\}. \qquad (3.2)$$
Some Revision
Let us consider a new concept: the conjugate (dual) norm $\|\cdot\|_*$ of a norm $\|\cdot\|$. By definition,
$$\|z\|_* = \max_{\|y\| \le 1} y^T z. \qquad (3.3)$$
In particular, $\|\cdot\|_{2*} = \|\cdot\|_2$ and $\|\cdot\|_{1*} = \|\cdot\|_\infty$.
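A sketch of (3.3) for the $\ell_1$ and $\ell_\infty$ norms, with hypothetical helper names. A linear function attains its maximum over the $\ell_1$ unit ball at one of the vertices $\pm e_i$, and over the $\ell_\infty$ unit ball at $y = \operatorname{sign}(z)$, which gives $\|z\|_\infty$ and $\|z\|_1$ respectively.

```python
import numpy as np

def dual_of_l1(z):
    """max y^T z over the l1 unit ball: attained at a vertex +-e_i."""
    n = len(z)
    vertices = np.vstack([np.eye(n), -np.eye(n)])
    return np.max(vertices @ z)

def dual_of_linf(z):
    """max y^T z over the l_inf unit ball: attained at y = sign(z)."""
    return np.sign(z) @ z

z = np.array([3.0, -5.0, 1.0])
print(dual_of_l1(z), np.linalg.norm(z, np.inf))   # 5.0 5.0
print(dual_of_linf(z), np.linalg.norm(z, 1))      # 9.0 9.0
```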
Some Revision
Definition (Matrix norm). $\|\cdot\|$ is a norm on matrices $A \in \mathbb{R}^{m \times n}$ if and only if:
- $\|A\| \ge 0$;
- $\|A\| = 0$ if and only if $A = 0$;
- $\|\alpha A\| = |\alpha| \, \|A\|$ for all $\alpha \in \mathbb{R}$ and $A \in \mathbb{R}^{m \times n}$;
- $\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in \mathbb{R}^{m \times n}$.
Definition (Trace of $A \in \mathbb{R}^{n \times n}$). $\operatorname{trace}(A) = a_{11} + a_{22} + \dots + a_{nn} = \sum_{i=1}^{n} a_{ii}$.
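Some Revision
$$\text{Nuclear norm:} \quad \|A\|_* := \operatorname{trace}\left(\sqrt{A^T A}\right) = \sum_{i=1}^{\min\{m,n\}} \sigma_i \qquad (3.4)$$
$$\text{Frobenius norm:} \quad \|A\|_F := \sqrt{\operatorname{trace}(A^T A)} = \left(\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2\right)^{1/2} = \left(\sum_{i=1}^{\min\{m,n\}} \sigma_i^2\right)^{1/2} \qquad (3.5)$$
$$\text{Spectral norm:} \quad \|A\|_2 := \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A) \qquad (3.6)$$
where $\sqrt{A^T A}$ denotes the positive semidefinite matrix $B$ such that $B^2 = A^T A$. The Frobenius norm is its own conjugate, $\|\cdot\|_{F*} = \|\cdot\|_F$, and the spectral norm is the conjugate of the nuclear norm.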
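A sketch that computes (3.4) to (3.6) from the singular values of a hypothetical random matrix and compares them with NumPy's built-in matrix norms (`'nuc'`, `'fro'`, `2`). The last lines illustrate the conjugacy: with the thin SVD $A = U \Sigma V^T$, the matrix $Y = UV^T$ has spectral norm 1 and $\operatorname{trace}(Y^T A) = \|A\|_*$.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD

nuclear = s.sum()                      # (3.4)
frobenius = np.sqrt((s ** 2).sum())    # (3.5)
spectral = s.max()                     # (3.6)

assert np.isclose(nuclear, np.linalg.norm(A, 'nuc'))
assert np.isclose(frobenius, np.linalg.norm(A, 'fro'))
assert np.isclose(frobenius, np.sqrt(np.trace(A.T @ A)))
assert np.isclose(spectral, np.linalg.norm(A, 2))

# Conjugacy: Y = U V^T lies in the spectral-norm unit ball and attains the nuclear norm.
Y = U @ Vt
assert np.isclose(np.linalg.norm(Y, 2), 1.0)
assert np.isclose(np.trace(Y.T @ A), nuclear)
```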
Some More Understanding
Previously, we have mentioned that all matrix norms are equivalent. For a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$:
$$\|A\|_2 \le \|A\|_F \le \sqrt{r}\,\|A\|_2, \qquad \|A\|_F \le \|A\|_* \le \sqrt{r}\,\|A\|_F.$$
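A sketch that spot-checks these inequalities on a hypothetical rank-2 matrix built as a product of thin random factors (a small tolerance absorbs rounding):

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, r = 6, 5, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r (hypothetical)
two, fro, nuc = (np.linalg.norm(A, p) for p in (2, 'fro', 'nuc'))
tol = 1e-12 * nuc                                               # absorbs rounding

print(two <= fro <= np.sqrt(r) * two + tol)    # True
print(fro <= nuc <= np.sqrt(r) * fro + tol)    # True
```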
Matrix Completion
In general, let us consider
$$\min_{M \in \mathcal{M}} \|A - M\|_N,$$
where $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ is some subset of $m \times n$ matrices and $\|\cdot\|_N$ is a matrix norm. In particular:
- $\|\cdot\|_2$ or $\|\cdot\|_F$, $\mathcal{M}$ the rank-$r$ matrices, $A$ dense, $M$ dense: solved by the SVD;
- $\|\cdot\|_F$, $\mathcal{M}$ the rank-$r$ matrices, $A$ sparse, $M$ dense: NP-hard;
- various $\|\cdot\|_N$, $\mathcal{M}$ the rank-1 matrices with sparsity: NP-hard.
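Low-Rank Matrices
Theorem (Eckart and Young). Let $A \in \mathbb{R}^{m \times n}$ be a rank-$r$ matrix, $A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$. Consider $k < r$ and the so-called truncated singular value decomposition
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T.$$
Then
$$A_k \in \arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \le k}} \|A - B\|_F \quad \text{and} \quad A_k \in \arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \le k}} \|A - B\|_2, \qquad (4.1)$$
with optimal values $\left(\sum_{i=k+1}^{r} \sigma_i^2\right)^{1/2}$ and $\sigma_{k+1}$, respectively. (In the Frobenius norm, $A_k$ is the unique minimiser when $\sigma_k > \sigma_{k+1}$; in the spectral norm it is a minimiser, but not unique in general.) More visually,
$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T, \qquad (4.2)$$
$$A_k = U_1 \Sigma_1 V_1^T, \qquad (4.3)$$
where $\Sigma_1 \in \mathbb{R}^{k \times k}$, $U_1 \in \mathbb{R}^{m \times k}$, and $V_1 \in \mathbb{R}^{n \times k}$.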
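A NumPy sketch of the truncated SVD (4.3), with a hypothetical helper `truncated_svd` and random data. It checks the optimal errors $\sigma_{k+1}$ and $(\sum_{i>k} \sigma_i^2)^{1/2}$, and that a few arbitrary rank-$k$ competitors do no better, which illustrates but does not prove the theorem.

```python
import numpy as np

def truncated_svd(A, k):
    """A_k = U_1 Sigma_1 V_1^T: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(9)
A = rng.standard_normal((8, 6))
s = np.linalg.svd(A, compute_uv=False)
k = 2
Ak = truncated_svd(A, k)

# Optimal errors predicted by Eckart and Young (0-based: s[k] is sigma_{k+1}).
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
assert np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt((s[k:] ** 2).sum()))

# Arbitrary rank-k competitors B = X Y^T are never closer.
for _ in range(5):
    B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
    assert np.linalg.norm(A - B, 'fro') >= np.linalg.norm(A - Ak, 'fro')
    assert np.linalg.norm(A - B, 2) >= np.linalg.norm(A - Ak, 2)
```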
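Low-Rank Matrices
There are a number of proofs. One can use the Weyl inequality
$$\sigma_{i+j-1}(A + B) \le \sigma_i(A) + \sigma_j(B) \quad \text{for all } i, j \ge 1,\; i + j - 1 \le m.$$
If $B$ has rank $k$, then $\sigma_{k+1}(B) = 0$. One applies the inequality to the pair $A - B$ and $B$ with $j = k + 1$:
$$\sigma_{i+k}(A) = \sigma_{i+k}\big((A - B) + B\big) \le \sigma_i(A - B) + \sigma_{k+1}(B) = \sigma_i(A - B).$$
For the spectral norm, $i = 1$ suffices: $\|A - B\|_2 = \sigma_1(A - B) \ge \sigma_{k+1}(A) = \|A - A_k\|_2$. For the Frobenius norm, one sums the squares over $i$: $\|A - B\|_F^2 \ge \sum_{i=1}^{m-k} \sigma_{i+k}(A)^2 = \|A - A_k\|_F^2$.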
Sparse Low-Rank Matrices
Consider again the applications of low-rank matrix reconstruction:
- predicting ratings of movies by individual users, in collaborative filtering, where each user has rated only 200 of the movies on offer, or
- estimating positions of sensors from some of their pair-wise distances, in sensor network localisation, where one may know the distances to only 4 or 5 other sensors.
They share the property that we know only a very small number of entries of the matrix. Imputing 0 (or a similar constant) for the unknown entries is a bad idea.
Sparse Low-Rank Matrices
Suppose we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$. Assume that there exists only one rank-$r$ matrix $M$ with those entries. Then, the search for the simplest explanation fitting the observed data is:
$$\min_{M} \operatorname{rank}(M) \quad \text{s.t.} \quad M \in \mathbb{R}^{m \times n}, \quad M_{i,j} = A_{i,j} \;\; \forall (i, j) \in E. \qquad (5.1)$$
The problem is:
- non-convex in $M$ and very hard;
- easy to reformulate in a number of ways.
Sparse Low-Rank Matrices
Suppose we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$. Consider the fact that any rank-$r$ matrix can be written as $M = XY^T$ with $X \in \mathbb{R}^{m \times r}$ and $Y \in \mathbb{R}^{n \times r}$, and solve:
$$\arg\min_{X \in \mathbb{R}^{m \times r},\, Y \in \mathbb{R}^{n \times r}} \sum_{(i,j) \in E} \left( (XY^T)_{i,j} - A_{i,j} \right)^2 \qquad (5.2)$$
The problem is:
- non-convex in $(X, Y)$ jointly;
- convex in either $X$ or $Y$ when the other is fixed.
Sparse Low-Rank Matrices
A rank-$r$ matrix has exactly $r$ non-zero singular values. Rank can hence be seen as the $\ell_0$ "norm" of the spectrum (the vector of singular values). Since we have seen the $\ell_0$ norm being replaced by the $\ell_1$ norm, Fazel proposed to replace the rank with the $\ell_1$ norm of the spectrum, i.e., the nuclear norm:
$$\arg\min_{M \in \mathbb{R}^{m \times n}} \|M\|_* \quad \text{s.t.} \quad M_{i,j} = A_{i,j} \;\; \forall (i, j) \in E. \qquad (5.3)$$
The problem is:
- convex in $M$ and possible to solve using interior-point methods (it can be cast as a semidefinite program);
- the optimum of the convex problem coincides with the global optimum of the non-convex problem (!) with high probability:
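Interior-point solvers for (5.3) are beyond a short example. As a hedged sketch of a different, first-order route, the code below uses singular value soft-thresholding, which is the proximal operator of $\tau\|\cdot\|_*$, inside a proximal-gradient loop for the penalised variant $\tfrac12 \sum_{(i,j) \in E} (M_{i,j} - A_{i,j})^2 + \lambda \|M\|_*$, in the spirit of the Soft-Impute and singular value thresholding methods, which the lecture does not cover. The penalised problem is not identical to (5.3), and the names `svt`, `soft_impute`, `lam` and the data are assumptions.

```python
import numpy as np

def svt(Z, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_impute(A, mask, lam, iters=200):
    """Proximal gradient for 0.5 * sum_{(i,j) in E} (M_ij - A_ij)^2 + lam * ||M||_*,
    where the boolean array mask marks E. Returns M and the objective history."""
    M = np.zeros_like(A)
    history = []
    for _ in range(iters):
        # Gradient step with step size 1 = 1/L: entries in E are reset to A,
        # entries outside E keep the current M; then apply the prox.
        M = svt(np.where(mask, A, M), lam)
        residual = np.where(mask, M - A, 0.0)
        history.append(0.5 * (residual ** 2).sum() + lam * np.linalg.norm(M, 'nuc'))
    return M, history

rng = np.random.default_rng(10)
m, n, r = 40, 30, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # hypothetical rank-2 data
mask = rng.random((m, n)) < 0.5                                  # E: roughly half the entries
M, history = soft_impute(A, mask, lam=0.1)
```

Because each step is a proximal-gradient step with step size 1 (the Lipschitz constant of the data term's gradient), the recorded objective should be non-increasing; how well the unobserved entries are recovered depends on $\lambda$ and on the sampling, so no particular error is claimed here.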
Sparse Low-Rank Matrices
Theorem (Candès and Recht). Let us assume $M \in \mathbb{R}^{m \times n}$ of rank $r$ is sampled from the random orthogonal model. Suppose we observe entries of $M$ with locations $E$ sampled uniformly at random. Then there are numerical constants $C_1$ and $C_2$ such that if
$$|E| \ge C_1\, r\, (\max\{m, n\})^{5/4} \log(\max\{m, n\}), \qquad (5.4)$$
the minimiser of the $\|\cdot\|_*$-minimisation problem is unique and equal to $M$ with probability at least $1 - C_2 (\max\{m, n\})^{-3}$.
Sparse Low-Rank Matrices
The result of the previous theorem is of considerable theoretical and practical interest; it has been cited more than 1800 times. Although the $\|\cdot\|_*$-minimisation problem can be approximated to within any fixed precision in polynomial time, in practice this is limited to modest $n \approx 1000$. Notice that an interior-point method needs to invert the Hessian, and since even the matrix variable is $n \times n$, the Hessian is of dimension $n^2 \times n^2$. One would hence like to find more efficient algorithms.
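Sparse Low-Rank Matrices
Alternating Minimisation:
1. Partition $E = E_1 \cup E_2 \cup \dots \cup E_{k_{\max}+1}$.
2. Compute the SVD $\sum_{i=1}^{\min\{m,n\}} \sigma_i x_i y_i^T$ of the matrix that contains the entries of $A$ in $E_1$ and zeros elsewhere.
3. Initialise the columns of $X^1$ and $Y^1$ as $\sqrt{\tfrac{mn}{|E_1|}\sigma_i}\, x_i$ and $\sqrt{\tfrac{mn}{|E_1|}\sigma_i}\, y_i$, for $i = 1, \dots, r$.
4. For each iteration $k = 1, \dots, k_{\max} = O(\log n)$:
$$X^{k+1} = \arg\min_{X \in \mathbb{R}^{m \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X (Y^k)^T)_{i,j} - A_{i,j} \right)^2 \qquad (5.5)$$
$$Y^{k+1} = \arg\min_{Y \in \mathbb{R}^{n \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X^{k+1} Y^T)_{i,j} - A_{i,j} \right)^2 \qquad (5.6)$$
This:
- solves linear least squares twice in each iteration, in dimensions $mr$ and $nr$;
- would in general take $O((mr)^2)$ and $O((nr)^2)$, but thanks to the partially separable structure (each row of $X$, and each row of $Y$, is an independent $r$-dimensional least-squares problem), it takes $O(|E| r^2)$ and $O(|E| r^2)$.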
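A minimal NumPy sketch of steps 2 to 4, under one simplifying assumption that departs from the slide: every iteration reuses all of $E$ instead of a fresh part $E_{k+1}$ (the partition matters for the analysis rather than for the mechanics). Each row of $X$, and then each row of $Y$, is an independent $r$-dimensional least-squares problem, which is the separable structure mentioned above. The names `als_complete`, `mask` and the test data are hypothetical.

```python
import numpy as np

def als_complete(A, mask, r, iters=50):
    """Alternating least squares for (5.2) on the observed entries (mask == True).
    Assumption: all of E is reused in every iteration (no partition E_1, E_2, ...)."""
    m, n = A.shape
    p = mask.mean()                                   # |E| / (mn)
    # Steps 2-3: SVD of the zero-filled matrix, rescaled by mn / |E|.
    U, s, Vt = np.linalg.svd(np.where(mask, A, 0.0) / p, full_matrices=False)
    X = U[:, :r] * np.sqrt(s[:r])                     # m x r
    Y = Vt[:r, :].T * np.sqrt(s[:r])                  # n x r
    history = []
    for _ in range(iters):
        for i in range(m):                            # (5.5), one row of X at a time
            cols = mask[i, :]
            X[i] = np.linalg.lstsq(Y[cols], A[i, cols], rcond=None)[0]
        for j in range(n):                            # (5.6), one row of Y at a time
            rows = mask[:, j]
            Y[j] = np.linalg.lstsq(X[rows], A[rows, j], rcond=None)[0]
        residual = np.where(mask, X @ Y.T - A, 0.0)
        history.append((residual ** 2).sum())         # objective (5.2)
    return X, Y, history

rng = np.random.default_rng(11)
m, n, r = 40, 30, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # hypothetical rank-2 data
mask = rng.random((m, n)) < 0.5                                  # E: roughly half the entries
X, Y, history = als_complete(A, mask, r)
rel_err = np.linalg.norm(X @ Y.T - A) / np.linalg.norm(A)
print(history[0], history[-1], rel_err)
```

Since each sweep minimises exactly over $X$ and then over $Y$, the objective (5.2) on $E$ cannot increase from one sweep to the next; recovery of the unobserved entries is not guaranteed by this code and depends on the sampling, as the following theorem quantifies.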
Sparse Low-Rank Matrices
Theorem (Keshavan et al.). Let us assume $M = X^*(Y^*)^T + W$, with $X^* \in \mathbb{R}^{m \times r}$, $Y^* \in \mathbb{R}^{n \times r}$, $W \in \mathbb{R}^{m \times n}$, the elements of $W$, $X^*$, and $Y^*$ being bounded i.i.d. random variables, $X^*$ and $Y^*$ zero-mean, and the expectation $\bar{W}$ of $W$ satisfying, among others, $\theta = \sigma_{\max}(\bar{W})$ and
$$P\left( |W_{i,j} - \bar{W}_{i,j}| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2\omega^2} \right). \qquad (5.7)$$
There exist constants $C_1, C_2$ such that if $k_{\max} = C_1 \log n$, $|E| \ge C_2 \kappa^8 n r (\log n)^2$, and $E$ is uniformly distributed over all sets of size $|E|$, then with probability larger than $1 - 1/n^4$, one has:
$$\|M - X^k (Y^k)^T\|_F \le \frac{6\sqrt{r}}{2^{2k}} + C_2\, r \kappa^2 \, \frac{\theta + \sqrt{n}\,\omega}{\epsilon}, \qquad (5.8)$$
where $\kappa = \max\{\sigma_{\min}(X^*)^{-1}, \sigma_{\min}(Y^*)^{-1}\}$.
70 Regularisations of PCA
Alternating minimisation is a very general approach to optimisation problems. For example, consider the following generalisation of PCA:
$$\max_{x \in \mathbb{R}^n} \{\|Ax\|_v : \|x\|_2 \le 1,\ \|x\|_s \le k\}. \quad (6.1)$$
When $\|\cdot\|_v$ is the $\ell_2$ norm and there is no norm $\|\cdot\|_s$, this reduces to computing the first principal component. Choosing the $\ell_1$ norm for $\|\cdot\|_v$ works better in terms of perturbation analysis (stability, robustness). A norm $\|\cdot\|_s$ such as $\ell_1$ improves interpretability (sparsity in the loading vector) by approximating $\ell_0$. As we have seen in the previous chapter, however, the $\ell_1$ norm is non-smooth.
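As a sanity check of the unregularised case of (6.1), the sketch below confirms numerically that the maximiser is the first right singular vector of A. It uses a random illustrative matrix A, which is an assumption and not data from the lecture. The corresponding value is $\sigma_{\max}(A)$, and no random unit vector does better.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 6))          # illustrative data matrix (an assumption)

# With v = l2 and no s-norm, (6.1) is max ||Ax||_2 over the unit l2 ball,
# attained at the first right singular vector of A.
_, sigma, Vt = np.linalg.svd(A)
x_star = Vt[0]
best = np.linalg.norm(A @ x_star)         # equals sigma[0], the spectral norm of A

# Sanity check: no random feasible point with ||x||_2 = 1 does better.
for _ in range(1000):
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    assert np.linalg.norm(A @ x) <= best + 1e-12
```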
75 Regularisations
Richtárik et al. summarise 8 possible regularisations of the problem of computing the first PC. They combine two norms for measuring variance ($\ell_1$, $\ell_2$) with two sparsity-inducing terms: the cardinality $\ell_0$ (not strictly a norm) and $\ell_1$. Each sparsity term is used either in a constraint or in a penalty term, giving 2 × 2 × 2 = 8 problems. All have the form
$$\mathrm{OPT} = \max_{x \in X} f(x), \quad (6.2)$$
with $X \subseteq \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$.
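79 Regularisations
| # | $\lVert\cdot\rVert_v$ | $\lVert\cdot\rVert_s$ | $\lVert\cdot\rVert_s$ used in | $X$ | $f(x)$ |
|---|---|---|---|---|---|
| 1 | $\ell_2$ | $\ell_0$ | constraint | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_0 \le s\}$ | $\lVert Ax\rVert_2$ |
| 2 | $\ell_1$ | $\ell_0$ | constraint | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_0 \le s\}$ | $\lVert Ax\rVert_1$ |
| 3 | $\ell_2$ | $\ell_1$ | constraint | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_1 \le \sqrt{s}\}$ | $\lVert Ax\rVert_2$ |
| 4 | $\ell_1$ | $\ell_1$ | constraint | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_1 \le \sqrt{s}\}$ | $\lVert Ax\rVert_1$ |
| 5 | $\ell_2$ | $\ell_0$ | penalty | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\lVert Ax\rVert_2^2 - \gamma\lVert x\rVert_0$ |
| 6 | $\ell_1$ | $\ell_0$ | penalty | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\lVert Ax\rVert_1^2 - \gamma\lVert x\rVert_0$ |
| 7 | $\ell_2$ | $\ell_1$ | penalty | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\lVert Ax\rVert_2 - \gamma\lVert x\rVert_1$ |
| 8 | $\ell_1$ | $\ell_1$ | penalty | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\lVert Ax\rVert_1 - \gamma\lVert x\rVert_1$ |
Table 1: Eight regularisations of PCA, reproduced verbatim from Richtárik et al.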
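The last column of the table of eight regularisations can be read directly as code. The sketch below evaluates $f(x)$ for a given row number. The function name `pca_objective` is hypothetical, and it does not check whether x lies in the feasible set X of that row.

```python
import numpy as np


def pca_objective(A, x, variant, gamma=0.0):
    """Evaluate f(x) from row `variant` (1-8) of the table; feasibility of x is not checked."""
    Ax = A @ x
    l2, l1 = np.linalg.norm(Ax, 2), np.linalg.norm(Ax, 1)   # variance norms
    card, x_l1 = np.count_nonzero(x), np.linalg.norm(x, 1)  # sparsity terms
    table = {
        1: l2, 2: l1, 3: l2, 4: l1,          # constraint variants: sparsity lives in X
        5: l2**2 - gamma * card,             # l0 penalty, squared l2 variance
        6: l1**2 - gamma * card,             # l0 penalty, squared l1 variance
        7: l2 - gamma * x_l1,                # l1 penalty, l2 variance
        8: l1 - gamma * x_l1,                # l1 penalty, l1 variance
    }
    return table[variant]


A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, 0.0])                     # Ax = [1, 3]
print([pca_objective(A, x, v, gamma=0.5) for v in range(1, 9)])
```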
80 Regularisations
Let $Y := \{y \in \mathbb{R}^m : \|y\|_2 \le 1\}$ for the $\ell_2$ norm and $Y := \{y \in \mathbb{R}^m : \|y\|_\infty \le 1\}$ for the $\ell_1$ norm. Let $F(x, y)$ be the function obtained from $f(x)$ after replacing $\|Ax\|$ with $y^T A x$ (resp. $\|Ax\|^2$ with $(y^T A x)^2$). Since $\|z\|_2 = \max_{\|y\|_2 \le 1} y^T z$ and $\|z\|_1 = \max_{\|y\|_\infty \le 1} y^T z$, (6.2) takes on the equivalent form
$$\mathrm{OPT} = \max_{x \in X} \max_{y \in Y} F(x, y). \quad (6.3)$$
That is, all 8 problems can be reformulated into the form (6.3).
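The step behind (6.3) is that each variance norm is a maximum of linear functions over the corresponding Y, and the maximiser has a closed form. For the $\ell_2$ ball it is $y = z/\|z\|_2$, and for the $\ell_\infty$ ball it is $y = \operatorname{sign}(z)$. The short NumPy check below uses $z = Ax$ with random illustrative A and x, and hypothetical helper names.

```python
import numpy as np


def argmax_l2_ball(z):
    """Maximiser of y^T z over {||y||_2 <= 1}; the maximum value is ||z||_2."""
    return z / np.linalg.norm(z)


def argmax_linf_ball(z):
    """Maximiser of y^T z over {||y||_inf <= 1}; the maximum value is ||z||_1."""
    return np.sign(z)


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
z = A @ x
print(argmax_l2_ball(z) @ z, np.linalg.norm(z, 2))    # agree up to rounding
print(argmax_linf_ball(z) @ z, np.linalg.norm(z, 1))  # agree up to rounding
```

For the squared variants, $\max_{y \in Y} (y^T z)^2$ is attained at the same y and equals the squared norm.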
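83 Regularisations
| # | $X$ | $Y$ | $F(x, y)$ |
|---|---|---|---|
| 1 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_0 \le s\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_2 \le 1\}$ | $y^T A x$ |
| 2 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_0 \le s\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_\infty \le 1\}$ | $y^T A x$ |
| 3 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_1 \le \sqrt{s}\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_2 \le 1\}$ | $y^T A x$ |
| 4 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1,\ \lVert x\rVert_1 \le \sqrt{s}\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_\infty \le 1\}$ | $y^T A x$ |
| 5 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_2 \le 1\}$ | $(y^T A x)^2 - \gamma\lVert x\rVert_0$ |
| 6 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_\infty \le 1\}$ | $(y^T A x)^2 - \gamma\lVert x\rVert_0$ |
| 7 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_2 \le 1\}$ | $y^T A x - \gamma\lVert x\rVert_1$ |
| 8 | $\{x \in \mathbb{R}^n : \lVert x\rVert_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \lVert y\rVert_\infty \le 1\}$ | $y^T A x - \gamma\lVert x\rVert_1$ |
Table 2: Reformulations of the problems from Table 1, reproduced verbatim from Richtárik et al.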
84 Generalising the Power Method
For the regularised problem (6.3), the alternating scheme becomes an alternating maximisation:
$$y^k = \arg\max_{y \in Y} F(x^k, y), \quad (6.4)$$
$$x^{k+1} = \arg\max_{x \in X} F(x, y^k). \quad (6.5)$$
As it turns out, both sub-problems have closed-form solutions for all the variants above. Take the unregularised case ($\ell_2$ variance, no sparsity term). There, they read $y^k = Ax^k/\|Ax^k\|_2$ and $x^{k+1} = A^T y^k/\|A^T y^k\|_2$, which is exactly the power method applied to $A^T A$. Notice that Hotelling's deflation is no longer guaranteed to work, although there are replacements.
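The sketch below implements (6.4)-(6.5) in NumPy for two of the eight variants, assuming the $\ell_2$ variance norm so that the y-step is $y = Ax/\|Ax\|_2$.
- Variant 1 uses an $\ell_0$ constraint. Its x-step keeps the s largest-magnitude entries of $A^T y$ and renormalises.
- Variant 7 uses an $\ell_1$ penalty. Its x-step soft-thresholds $A^T y$ at $\gamma$ and renormalises.

These closed forms are standard. The function names, the random starting point and the fixed iteration count are illustrative assumptions, not the code of Richtárik et al. With no effective sparsity (s = n, or $\gamma = 0$) the iteration reduces to the power method on $A^T A$.

```python
import numpy as np


def hard_threshold(c, s):
    """Keep the s >= 1 largest-magnitude entries of c and zero the rest."""
    out = np.zeros_like(c)
    keep = np.argsort(np.abs(c))[-s:]
    out[keep] = c[keep]
    return out


def soft_threshold(c, gamma):
    """Entrywise sign(c_i) * max(|c_i| - gamma, 0)."""
    return np.sign(c) * np.maximum(np.abs(c) - gamma, 0.0)


def normalise(v):
    """Scale v to unit l2 norm; the zero vector is returned unchanged."""
    nrm = np.linalg.norm(v)
    return v / nrm if nrm > 0 else v


def alt_max_sparse_pca(A, variant, param, iters=1000, seed=0):
    """Alternating maximisation (6.4)-(6.5) for variants 1 and 7 (hypothetical helper).

    variant 1: max ||Ax||_2 s.t. ||x||_2 <= 1, ||x||_0 <= s   (param = s)
    variant 7: max ||Ax||_2 - gamma*||x||_1 s.t. ||x||_2 <= 1  (param = gamma)
    """
    rng = np.random.default_rng(seed)
    x = normalise(rng.standard_normal(A.shape[1]))
    for _ in range(iters):
        y = normalise(A @ x)                          # (6.4): maximiser over the l2 ball
        c = A.T @ y
        if variant == 1:
            x = normalise(hard_threshold(c, param))   # (6.5) with the l0 constraint
        elif variant == 7:
            x = normalise(soft_threshold(c, param))   # (6.5) with the l1 penalty
        else:
            raise ValueError("this sketch covers variants 1 and 7 only")
    return x


rng = np.random.default_rng(3)
A = rng.standard_normal((30, 10))
x_sparse = alt_max_sparse_pca(A, variant=1, param=3)   # at most 3 non-zero loadings
print(np.count_nonzero(x_sparse), np.linalg.norm(A @ x_sparse))
```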
87 A Summary
Overall, we have seen that there are NP-hard problems for which one can nevertheless retrieve the global optimum with high probability, under suitable assumptions on the (random) input. Leading solvers based on alternating minimisation can tackle gigabyte-sized instances in minutes.
CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like
More informationSection 4.4 Inner Product Spaces
Section 4.4 Inner Product Spaces In our discussion of vector spaces the specific nature of F as a field, other than the fact that it is a field, has played virtually no role. In this section we no longer
More informationMAT 200, Midterm Exam Solution. a. (5 points) Compute the determinant of the matrix A =
MAT 200, Midterm Exam Solution. (0 points total) a. (5 points) Compute the determinant of the matrix 2 2 0 A = 0 3 0 3 0 Answer: det A = 3. The most efficient way is to develop the determinant along the
More informationMATH 551 - APPLIED MATRIX THEORY
MATH 55 - APPLIED MATRIX THEORY FINAL TEST: SAMPLE with SOLUTIONS (25 points NAME: PROBLEM (3 points A web of 5 pages is described by a directed graph whose matrix is given by A Do the following ( points
More information3 Orthogonal Vectors and Matrices
3 Orthogonal Vectors and Matrices The linear algebra portion of this course focuses on three matrix factorizations: QR factorization, singular valued decomposition (SVD), and LU factorization The first
More informationSolving Linear Systems, Continued and The Inverse of a Matrix
, Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More information1 2 3 1 1 2 x = + x 2 + x 4 1 0 1
(d) If the vector b is the sum of the four columns of A, write down the complete solution to Ax = b. 1 2 3 1 1 2 x = + x 2 + x 4 1 0 0 1 0 1 2. (11 points) This problem finds the curve y = C + D 2 t which
More informationLinear Algebra: Determinants, Inverses, Rank
D Linear Algebra: Determinants, Inverses, Rank D 1 Appendix D: LINEAR ALGEBRA: DETERMINANTS, INVERSES, RANK TABLE OF CONTENTS Page D.1. Introduction D 3 D.2. Determinants D 3 D.2.1. Some Properties of
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationSF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,
More information