Lecture Topic: Low-Rank Approximations


Low-Rank Approximations

We have seen principal component analysis. The extraction of the first principal component could be seen as an approximation of the original matrix by a rank-1 matrix. In this chapter, we will consider problems where a sparse matrix is given and one hopes to find a structured (e.g., low-rank), dense matrix as close as possible to it, in some norm.

Jakub Mareček and Seán McGarraghy (UCD), Numerical Analysis and Software, November 11
The Continuing Example

Consider the example of collaborative filtering: we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$, corresponding to ratings of $m$ users of $n$ movies or books. There, the set $\mathcal{M}$ could be the rank-$r$ matrices, motivated by the best possible transformation to a new coordinate system with $r$ axes, such as "likes horror films" and "likes romantic comedies". Notice that in collaborative filtering, each user may rate only some 200 of the movies on offer.
Another Example

One may also consider estimating the positions of sensors from some of their pairwise distances, which is known as sensor network localisation. In many applications, e.g. in sewers, the sensors do not actually have a GPS signal, but they have low-power radios, which allow them to estimate their distance from a handful of the closest sensors. From these pairwise measurements, one wants to retrieve the positions of all sensors.
Yet Another Example

In the most striking result, we will see that for random rank-$r$ matrices, knowing randomly drawn $O(nr(\log n)^2)$ elements makes it possible to reconstruct the complete matrix of $O(n^2)$ elements without any error, with high probability. This has far-reaching consequences: consider, for instance, a digital camera. The price of sensors increases with the number of pixels, but many images are naturally low-rank. Although cameras with a single-pixel chip remain a curiosity, super-resolution techniques are actually widespread in medical imaging, where battery capacity is not a concern.
Key Concepts

A singular value and pair of singular vectors of $A \in \mathbb{R}^{m \times n}$ are a scalar $\sigma \in \mathbb{R}$, $\sigma \ge 0$, and two nonzero vectors $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ such that $Av = \sigma u$ and $A^T u = \sigma v$.

In a matrix completion problem, with some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$ known, one solves:

$$\min_{M \in \mathbb{R}^{m \times n}} \operatorname{rank}(M) \quad \text{s.t.} \quad M_{i,j} = A_{i,j} \ \forall (i, j) \in E. \tag{1.1}$$
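The defining relations $Av = \sigma u$ and $A^T u = \sigma v$ can be checked numerically. A minimal sketch using numpy's SVD routine (the random matrix is purely illustrative):

```python
import numpy as np

# Check the definition of a singular value/vector pair on a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)  # A = U @ diag(s) @ Vt
sigma, u, v = s[0], U[:, 0], Vt[0, :]

assert sigma >= 0
assert np.allclose(A @ v, sigma * u)    # A v = sigma u
assert np.allclose(A.T @ u, sigma * v)  # A^T u = sigma v
```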
Some Revision

Definition (Orthogonality). Two vectors $u, v \in \mathbb{R}^n$ are orthogonal if and only if their dot product $\sum_{i=1}^n u_i v_i$ is zero. This suggests an angle of 90 degrees.

The columns and rows of an orthogonal matrix $U \in \mathbb{R}^{n \times n}$ are orthogonal unit vectors, i.e., $U^T U = U U^T = I$, where $I$ is the identity matrix.
Some More Intuition

The linear transformation $x \mapsto Qx$, for an orthogonal $Q$, is an isometry, i.e., it preserves the dot product of vectors. Imagine a rotation or reflection.
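The isometry property is easy to verify: a sketch that draws a random orthogonal $Q$ via a QR factorisation (one standard way to obtain one; the choice is illustrative) and checks that dot products and lengths survive the map:

```python
import numpy as np

# A random orthogonal Q, obtained from the QR factorisation of a Gaussian matrix.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

x, y = rng.standard_normal(5), rng.standard_normal(5)
assert np.allclose(Q.T @ Q, np.eye(5))                        # orthogonality
assert np.isclose((Q @ x) @ (Q @ y), x @ y)                   # dot product preserved
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))   # hence lengths too
```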
Some Revision

Definition (Singular values and vectors of a matrix $A \in \mathbb{R}^{m \times n}$). For every matrix $A \in \mathbb{R}^{m \times n}$, there exists a decomposition $A = U \Sigma V^T$, where:
- $U$ is an $m \times m$ orthogonal matrix whose $m$ columns are left-singular vectors of $A$;
- $\Sigma$ is an $m \times n$ matrix with $\Sigma_{i,i} \ge 0$ for $i \le \min\{m, n\}$ being the singular values of $A$, and all other elements $0$;
- $V$ is an $n \times n$ orthogonal matrix whose $n$ columns are right-singular vectors of $A$.
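The shapes in the definition can be reproduced directly; a sketch that builds the full (non-economy) factorisation and confirms the exact reconstruction:

```python
import numpy as np

# Full SVD with the shapes from the definition: U is m x m, Sigma is m x n, V is n x n.
rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)  # singular values on the diagonal, zeros elsewhere

assert U.shape == (m, m) and Vt.shape == (n, n)
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)  # nonnegative, sorted descending
assert np.allclose(U @ Sigma @ Vt, A)              # A = U Sigma V^T exactly
```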
Some More Intuition

For $A$ with $\det(A) > 0$, $\Sigma$ is a scaling matrix and $U$, $V^T$ are rotation matrices: $U \Sigma V^T$ is the composition of a rotation, a scaling, and another rotation.
Some More Intuition

Every matrix $A = U \Sigma V^T$ corresponds to a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$. There are orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ such that $T$ maps the $i$-th basis vector of $\mathbb{R}^n$ to a nonnegative multiple of the $i$-th basis vector of $\mathbb{R}^m$, for $i = 1, \dots, \min\{m, n\}$. With respect to these bases, $T$ is represented by a diagonal matrix $\Sigma$ with nonnegative real diagonal entries, which are the lengths of the semi-axes of the ellipsoid in $\mathbb{R}^m$ that results from applying $T$ to the unit sphere in $\mathbb{R}^n$.

Formally, $T(x) := Ax$ for $A = U \Sigma V^T$, $T : \mathbb{R}^n \to \mathbb{R}^m$, with $T(v_i) = \sigma_i u_i$ for all $i = 1, \dots, \min\{m, n\}$, and $T(v_i) = 0$ for $i > \min\{m, n\}$.
Singular Values: Perturbation Analysis

Much of the perturbation analysis we have seen for eigenvalues carries over. Let $0 < m \le n$, and let $A, B \in \mathbb{R}^{m \times n}$. The Weyl inequality, for example:

$$\sigma_{i+j-1}(A + B) \le \sigma_i(A) + \sigma_j(B) \quad \text{for all } 1 \le i, j, \ i + j - 1 \le m. \tag{2.1}$$
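The inequality (2.1) can be sanity-checked numerically over all admissible index pairs; a sketch on random matrices:

```python
import numpy as np

# Check sigma_{i+j-1}(A+B) <= sigma_i(A) + sigma_j(B) for all valid (i, j).
rng = np.random.default_rng(3)
m, n = 4, 6
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

sA, sB, sAB = (np.linalg.svd(X, compute_uv=False) for X in (A, B, A + B))
for i in range(1, m + 1):
    for j in range(1, m + 1):
        if i + j - 1 <= m:
            # indices shifted by one: sigma_k is s[k-1]
            assert sAB[i + j - 2] <= sA[i - 1] + sB[j - 1] + 1e-12
```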
Some Revision

We have seen a variety of norms of $x \in \mathbb{R}^n$:

Example.
$$\ell_1 \text{ norm:} \quad \|x\|_1 := \sum_{i=1}^n |x_i| \tag{3.1}$$
$$\text{Maximum norm:} \quad \|x\|_\infty := \max\{|x_1|, \dots, |x_n|\}. \tag{3.2}$$
Some Revision

Let us consider a new concept, the conjugate norms $\|\cdot\|$ and $\|\cdot\|^*$. By definition,

$$\|z\|^* = \max_{\|y\| \le 1} y^T z. \tag{3.3}$$

In particular, $\|\cdot\|_2^* = \|\cdot\|_2$ and $\|\cdot\|_1^* = \|\cdot\|_\infty$.
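The two special cases follow from exhibiting explicit maximisers of (3.3); a sketch: over the $\ell_1$ ball the maximiser is a signed coordinate vector (giving $\|z\|_\infty$), and over the $\ell_2$ ball it is $z / \|z\|_2$ (giving $\|z\|_2$ back):

```python
import numpy as np

z = np.array([3.0, -7.0, 2.0])

# Dual of l1: put all weight on the coordinate of largest magnitude.
k = np.argmax(np.abs(z))
y1 = np.zeros_like(z)
y1[k] = np.sign(z[k])          # ||y1||_1 = 1
assert np.isclose(y1 @ z, np.linalg.norm(z, np.inf))

# Dual of l2: align with z itself.
y2 = z / np.linalg.norm(z)     # ||y2||_2 = 1
assert np.isclose(y2 @ z, np.linalg.norm(z))
```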
Some Revision

Definition (Matrix norm). $\|A\|$ is a norm of a matrix $A \in \mathbb{R}^{m \times n}$ if and only if:
- $\|A\| \ge 0$;
- $\|A\| = 0$ if and only if $A = 0$;
- $\|\alpha A\| = |\alpha| \, \|A\|$ for all $\alpha \in \mathbb{R}$ and $A \in \mathbb{R}^{m \times n}$;
- $\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in \mathbb{R}^{m \times n}$.

Definition (Trace of $A \in \mathbb{R}^{n \times n}$). $\operatorname{trace}(A) = a_{11} + a_{22} + \dots + a_{nn} = \sum_{i=1}^n a_{ii}$.
Some Revision

$$\text{Nuclear norm:} \quad \|A\|_* := \operatorname{trace}\left(\sqrt{A^T A}\right) = \sum_{i=1}^{\min\{m,n\}} \sigma_i. \tag{3.4}$$
$$\text{Frobenius norm:} \quad \|A\|_F := \sqrt{\operatorname{trace}(A^T A)} = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2} = \sqrt{\sum_{i=1}^{\min\{m,n\}} \sigma_i^2} \tag{3.5}$$
$$\text{Spectral norm:} \quad \|A\|_2 := \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A) \tag{3.6}$$

where $\sqrt{A^T A}$ denotes the positive semidefinite $B$ such that $B^2 = A^T A$. The Frobenius norm is its own conjugate, and the spectral norm is the conjugate of the nuclear norm.
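Each of the three norms can be computed both from its trace definition and from the singular values, and the two routes must agree; a sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values

nuclear = s.sum()                         # sum of singular values
frobenius = np.sqrt((s ** 2).sum())       # root of sum of squares
spectral = s.max()                        # largest singular value

assert np.isclose(frobenius, np.sqrt(np.trace(A.T @ A)))  # trace definition
assert np.isclose(frobenius, np.linalg.norm(A, 'fro'))
assert np.isclose(spectral, np.linalg.norm(A, 2))
assert np.isclose(nuclear, np.linalg.norm(A, 'nuc'))
```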
Some More Understanding

Previously, we have mentioned that all matrix norms are equivalent. For a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$:

$$\|A\|_2 \le \|A\|_F \le \sqrt{r} \, \|A\|_2, \qquad \|A\|_F \le \|A\|_* \le \sqrt{r} \, \|A\|_F.$$
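Both chains of inequalities can be checked on a matrix of known rank; a sketch using a rank-$r$ product of Gaussian factors:

```python
import numpy as np

# Build a matrix of rank (almost surely) exactly r and check the norm chains.
rng = np.random.default_rng(5)
r = 2
A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))

two = np.linalg.norm(A, 2)
fro = np.linalg.norm(A, 'fro')
nuc = np.linalg.norm(A, 'nuc')

assert two <= fro <= np.sqrt(r) * two + 1e-12
assert fro <= nuc <= np.sqrt(r) * fro + 1e-12
```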
Matrix Completion

In general, let us consider:

$$\min_{M \in \mathcal{M}} \|A - M\|_N$$

where $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ is some subset of $m \times n$ matrices and $N$ is a matrix norm. In particular:
- $\|\cdot\|_2$ or $\|\cdot\|_F$, $\mathcal{M}$ is rank-$r$, $A$ is dense, $M$ is dense: SVD.
- $\|\cdot\|_F$, $\mathcal{M}$ is rank-$r$, $A$ is sparse, $M$ is dense: NP-hard.
- various $N$, $\mathcal{M}$ is rank-1 with sparsity: NP-hard.
Low-Rank Matrices

Theorem (Eckart and Young). Let $A \in \mathbb{R}^{m \times n}$ be a rank-$r$ matrix, $A = U \Sigma V^T = \sum_{i=1}^r \sigma_i u_i v_i^T$. Consider $k < r$ and the so-called truncated singular value decomposition $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$. Then

$$\arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \le k}} \|A - B\|_F = \arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \le k}} \|A - B\|_2 = A_k. \tag{4.1}$$

More visually,

$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T, \tag{4.2}$$
$$A_k = U_1 \Sigma_1 V_1^T, \tag{4.3}$$

where $\Sigma_1 \in \mathbb{R}^{k \times k}$, $U_1 \in \mathbb{R}^{m \times k}$, and $V_1 \in \mathbb{R}^{n \times k}$.
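The theorem is easy to exercise numerically: the truncated SVD attains a Frobenius error of exactly $\sqrt{\sigma_{k+1}^2 + \dots + \sigma_r^2}$ and a spectral error of $\sigma_{k+1}$, and no rank-$k$ competitor does better. A sketch (the random competitors are only a spot check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated SVD, rank k
err = np.linalg.norm(A - A_k, 'fro')

assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))        # Frobenius error
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])        # spectral error

# No randomly drawn rank-k matrix beats the truncated SVD.
for _ in range(20):
    B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
    assert np.linalg.norm(A - B, 'fro') >= err
```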
Low-Rank Matrices

There are a number of proofs. One can use the Weyl inequality:

$$\sigma_{i+j-1}(A + B) \le \sigma_i(A) + \sigma_j(B) \quad \text{for all } 1 \le i, j, \ i + j - 1 \le m.$$

If $B$ has rank $k$, then $\sigma_{k+1}(B) = 0$. One applies the inequality to $B$ and $A - B$, with $j = k + 1$. For the spectral norm, $i = 1$ suffices.
Sparse Low-Rank Matrices

Consider again the applications of low-rank matrix reconstruction:
- predicting ratings of movies by individual users, in collaborative filtering, where each user has rated only some 200 of the movies on offer, or
- estimating positions of sensors from some of their pairwise distances, in sensor network localisation, where one may know the distances to only 4 or 5 sensors.

They share the property that we know only a very small number of entries of the matrix. Imputing 0 or similar is a bad idea.
Sparse Low-Rank Matrices

Let us know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$. Assume that there exists only one rank-$r$ matrix $M$ with those entries. Then the search for the simplest explanation fitting the observed data is:

$$\min_{M \in \mathbb{R}^{m \times n}} \operatorname{rank}(M) \quad \text{s.t.} \quad M_{i,j} = A_{i,j} \ \forall (i, j) \in E. \tag{5.1}$$

The problem is:
- nonconvex in $M$ and very hard,
- easy to reformulate in a number of ways.
Sparse Low-Rank Matrices

Let us know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$. Consider the fact that a rank-$r$ matrix factors as $M = XY^T$ with $X \in \mathbb{R}^{m \times r}$, $Y \in \mathbb{R}^{n \times r}$, and:

$$\arg\min_{\substack{X \in \mathbb{R}^{m \times r} \\ Y \in \mathbb{R}^{n \times r}}} \sum_{(i,j) \in E} \left( (XY^T)_{i,j} - A_{i,j} \right)^2. \tag{5.2}$$

The problem is:
- nonconvex in $X$ and $Y$ jointly,
- convex in either $X$ or $Y$ alone.
Sparse Low-Rank Matrices

A rank-$r$ matrix has exactly $r$ nonzero singular values. Rank can hence be seen as the $\ell_0$ norm of the spectrum. Considering that we have seen the $\ell_0$ norm being replaced by the $\ell_1$ norm, Fazel proposed to replace rank with the nuclear norm, the $\ell_1$ norm of the spectrum:

$$\arg\min_{M \in \mathbb{R}^{m \times n}} \|M\|_* \quad \text{subject to} \quad M_{i,j} = A_{i,j} \ \forall (i, j) \in E. \tag{5.3}$$

The problem is:
- convex in $M$ and possible to solve using interior-point methods,
- such that the optimum of the convex problem coincides with the global optimum of the nonconvex problem (!) with high probability:
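One standard first-order method for this relaxation is singular value thresholding (SVT), whose core step is the proximal map of the nuclear norm: soft-thresholding the singular values. A minimal, untuned sketch on a small synthetic rank-1 matrix; the step size `delta` and threshold `tau` are illustrative choices, not the values recommended in the literature:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, r = 10, 8, 1
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-1 ground truth
observed = rng.random((m, n)) < 0.7                            # the index set E

def shrink(Y, tau):
    """Proximal map of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

tau, delta = 0.5, 1.0
Y = np.zeros((m, n))
for _ in range(500):
    M = shrink(Y, tau)                 # low-rank iterate
    Y += delta * observed * (A - M)    # correct only the observed entries

obs_err = np.linalg.norm(observed * (M - A)) / np.linalg.norm(observed * A)
```

On this small instance the observed entries are fitted closely and `M` is nearly rank-1, illustrating how the convex surrogate recovers the nonconvex optimum.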
Sparse Low-Rank Matrices

Theorem (Candès and Recht). Let us assume $M \in \mathbb{R}^{m \times n}$ of rank $r$ is sampled from the random orthogonal model. Suppose we observe entries of $M$ with locations $E$ sampled uniformly at random. Then there are numerical constants $C_1$ and $C_2$ such that if

$$|E| \ge C_1 r \left( \max\{m, n\} \right)^{5/4} \log\left( \max\{m, n\} \right), \tag{5.4}$$

the minimiser of the minimisation problem is unique and equal to $M$ with probability at least $1 - C_2 (\max\{m, n\})^{-3}$.
Sparse Low-Rank Matrices

The result of the previous theorem is of considerable theoretical and practical interest. It has been cited more than 1800 times. Although the minimisation problem can be approximated within any fixed precision in polynomial time, this is limited to modest $n \approx 1000$ in practice. Notice that the interior-point method needs to invert the Hessian, where even the matrix variable is $n \times n$. One would hence like to find more efficient algorithms.
Sparse Low-Rank Matrices

Alternating Minimisation:
1. Partition $E = E_1 \cup E_2 \cup \dots \cup E_{k_{\max}}$.
2. Compute the SVD $\sum_i \sigma_i x_i y_i^T$ considering only the entries in $E_1$.
3. Initialise $X^1 = \sqrt{\frac{mn}{|E_1|}} \left[ \sqrt{\sigma_i} \, x_i \right]_i$ and $Y^1 = \sqrt{\frac{mn}{|E_1|}} \left[ \sqrt{\sigma_i} \, y_i \right]_i$.
4. For each iteration $k = 1, \dots, k_{\max} \in O(\log n)$:

$$X^{k+1} = \arg\min_{X \in \mathbb{R}^{m \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X (Y^k)^T)_{i,j} - A_{i,j} \right)^2 \tag{5.5}$$
$$Y^{k+1} = \arg\min_{Y \in \mathbb{R}^{n \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X^{k+1} Y^T)_{i,j} - A_{i,j} \right)^2 \tag{5.6}$$

This:
- solves linear least squares twice in each iteration, in dimensions $mr$ and $nr$,
- generally takes $O((mr)^2)$ and $O((nr)^2)$, but for the partially separable structure it is $O(|E| r^2)$ in each case.
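The two least-squares subproblems (5.5) and (5.6) decouple row by row, which is the partially separable structure mentioned above. A compact sketch of the alternating scheme on a synthetic rank-2 matrix; for simplicity it reuses the full observed set in every iteration and starts from a random initialisation rather than the SVD-based one of step 3:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, r = 20, 15, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T  # rank-2 truth
observed = rng.random((m, n)) < 0.7                              # the index set E

def update_rows(F, target, obs_rows):
    """Fix one factor F; solve an independent least-squares problem per row."""
    G = np.empty((target.shape[0], F.shape[1]))
    for i in range(target.shape[0]):
        obs = obs_rows[i]
        G[i], *_ = np.linalg.lstsq(F[obs], target[i, obs], rcond=None)
    return G

X = rng.standard_normal((m, r))
Y = rng.standard_normal((n, r))
for _ in range(30):
    X = update_rows(Y, A, observed)        # (5.5): fix Y, solve for X
    Y = update_rows(X, A.T, observed.T)    # (5.6): fix X, solve for Y

rel_err = np.linalg.norm(observed * (X @ Y.T - A)) / np.linalg.norm(observed * A)
```

Each `lstsq` call is a problem of size (number of observations in the row) by $r$, matching the $O(|E| r^2)$ accounting on the slide.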
Sparse Low-Rank Matrices

Theorem (Keshavan et al.). Let us assume $M = X^* (Y^*)^T + W$, with $X^* \in \mathbb{R}^{m \times r}$, $Y^* \in \mathbb{R}^{n \times r}$, and $W \in \mathbb{R}^{m \times n}$, the elements of $W$, $X^*$, and $Y^*$ being bounded i.i.d. random variables, those of $X^*$ and $Y^*$ zero-mean, and $W$ satisfying, among others:

$$\theta = \sigma_{\max}(W), \quad \text{and} \quad P\left( |W_{i,j} - \mathbb{E} W_{i,j}| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2 \omega^2} \right). \tag{5.7}$$

There exist constants $C_1$, $C_2$ such that for $k_{\max} = C_1 \log n$, $|E| \ge C_2 \kappa^8 n r (\log n)^2$, and $E$ uniformly distributed over all sets of size $|E|$, one has, with probability larger than $1 - 1/n^4$:

$$\left\| M - X^k (Y^k)^T \right\|_F \le \frac{6 \sqrt{r}}{2^{2k}} + C_2 \sqrt{r} \, \kappa^2 \left( \theta + \sqrt{n} \, \omega \right) \le \epsilon, \tag{5.8}$$

where $\kappa = \max\{\sigma_{\min}(X^*)^{-1}, \sigma_{\min}(Y^*)^{-1}\}$.
70 Regularisations of PCA Alternating minimisation is a very general approach to optimisation problems. For example, consider a generalisation of PCA:
$$\max_{x \in \mathbb{R}^n} \{ \|Ax\|_v : \|x\|_2 \le 1,\ \|x\|_s \le k \}, \qquad (6.1)$$
with $v$ being $\ell_2$ and no norm $s$ recovering the usual PCA.
- Choosing $v$ to be the $\ell_1$ norm works better in terms of perturbation analysis (stability, robustness).
- A sparsity-inducing $s$ such as $\ell_1$ improves interpretability (sparsity in the loading vector) by approximating $\ell_0$.
- As we have seen in the previous chapter, the $\ell_1$ norm is non-smooth.
75 Regularisations Richtárik et al. summarise 8 possible regularisations of the problem of computing the first PC by combining:
- two norms for measuring variance ($\ell_1$, $\ell_2$) and
- two sparsity-inducing norms (cardinality $\ell_0$ and $\ell_1$), either in a constraint or in a penalty term.
All have the form
$$\mathrm{OPT} = \max_{x \in X} f(x), \qquad (6.2)$$
with $X \subseteq \mathbb{R}^n$ and a suitable $f$.
79 Regularisations
# | $v$ | $s$ | use | $X$ | $f(x)$
1 | $\ell_2$ | $\ell_0$ | constraint | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_0 \le s\}$ | $\|Ax\|_2$
2 | $\ell_1$ | $\ell_0$ | constraint | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_0 \le s\}$ | $\|Ax\|_1$
3 | $\ell_2$ | $\ell_1$ | constraint | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le s\}$ | $\|Ax\|_2$
4 | $\ell_1$ | $\ell_1$ | constraint | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le s\}$ | $\|Ax\|_1$
5 | $\ell_2$ | $\ell_0$ | penalty | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\|Ax\|_2^2 - \gamma \|x\|_0$
6 | $\ell_1$ | $\ell_0$ | penalty | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\|Ax\|_1^2 - \gamma \|x\|_0$
7 | $\ell_2$ | $\ell_1$ | penalty | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\|Ax\|_2 - \gamma \|x\|_1$
8 | $\ell_1$ | $\ell_1$ | penalty | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\|Ax\|_1 - \gamma \|x\|_1$
Table: Eight regularisations of PCA, cited verbatim from Richtárik et al.
80 Regularisations Let $Y := \{y \in \mathbb{R}^m : \|y\|_2 \le 1\}$ for the $\ell_2$ norm and $Y := \{y \in \mathbb{R}^m : \|y\|_\infty \le 1\}$ for the $\ell_1$ norm, and let $F(x, y)$ be the function obtained from $f(x)$ after replacing $\|Ax\|$ with $y^T A x$ (resp. $\|Ax\|^2$ with $(y^T A x)^2$). Then, in view of the above, (6.2) takes on the equivalent form
$$\mathrm{OPT} = \max_{x \in X} \max_{y \in Y} F(x, y). \qquad (6.3)$$
That is, the 8 problems can be reformulated into the form (6.3).
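The reformulation rests on norm duality: for a fixed $x$, maximising $y^T A x$ over the $\ell_\infty$-ball is attained at $y = \mathrm{sign}(Ax)$ and yields exactly $\|Ax\|_1$ (and, analogously, the $\ell_2$-ball yields $\|Ax\|_2$). A quick numerical sanity check of this identity; the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
x = rng.standard_normal(4)
x /= np.linalg.norm(x)

# For fixed x, the maximiser of y^T A x over ||y||_inf <= 1 is y = sign(Ax),
# and the maximum it attains is ||Ax||_1.
y = np.sign(A @ x)
attained = y @ (A @ x)
print(np.isclose(attained, np.linalg.norm(A @ x, 1)))  # True
```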
83 Regularisations
# | $X$ | $Y$ | $F(x, y)$
1 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_0 \le s\}$ | $\{y \in \mathbb{R}^m : \|y\|_2 \le 1\}$ | $y^T A x$
2 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_0 \le s\}$ | $\{y \in \mathbb{R}^m : \|y\|_\infty \le 1\}$ | $y^T A x$
3 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le s\}$ | $\{y \in \mathbb{R}^m : \|y\|_2 \le 1\}$ | $y^T A x$
4 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le s\}$ | $\{y \in \mathbb{R}^m : \|y\|_\infty \le 1\}$ | $y^T A x$
5 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \|y\|_2 \le 1\}$ | $(y^T A x)^2 - \gamma \|x\|_0$
6 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \|y\|_\infty \le 1\}$ | $(y^T A x)^2 - \gamma \|x\|_0$
7 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \|y\|_2 \le 1\}$ | $y^T A x - \gamma \|x\|_1$
8 | $\{x \in \mathbb{R}^n : \|x\|_2 \le 1\}$ | $\{y \in \mathbb{R}^m : \|y\|_\infty \le 1\}$ | $y^T A x - \gamma \|x\|_1$
Table: Reformulations of the problems from Table 1, cited verbatim from Richtárik et al.
84 Generalising the Power Method The alternating maximisation for the regularised problem (6.3) is:
$$y^k = \arg\max_{y \in Y} F(x^k, y) \qquad (6.4)$$
$$x^{k+1} = \arg\max_{x \in X} F(x, y^k). \qquad (6.5)$$
As it turns out, there are closed-form solutions to the two subproblems for all the variants above. Notice that Hotelling's deflation is no longer guaranteed to work, although there are replacements.
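As an illustration, a sketch of the alternating scheme (6.4)–(6.5) for variant 1 of the table ($\ell_2$ variance, $\ell_0$ constraint), where both subproblems indeed have closed forms: normalisation for $y$, and hard-thresholding to the $s$ largest-magnitude entries followed by normalisation for $x$. The function and parameter names are ours, not from the lecture; with $s = n$ the constraint is vacuous and the iteration reduces to the classical power method on $A^T A$.

```python
import numpy as np

def sparse_pca_l0(A, s, iters=500, seed=0):
    """Alternating maximisation of y^T A x over ||y||_2 <= 1 and
    over ||x||_2 <= 1, ||x||_0 <= s (variant 1 of the table)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y = None
    for _ in range(iters):
        # y-subproblem (6.4): maximise y^T (Ax) over the Euclidean ball.
        y = A @ x
        y /= np.linalg.norm(y)
        # x-subproblem (6.5): maximise (A^T y)^T x over the sparse ball;
        # closed form: keep the s largest-magnitude entries, then normalise.
        g = A.T @ y
        keep = np.argsort(np.abs(g))[-s:]
        x = np.zeros(n)
        x[keep] = g[keep]
        x /= np.linalg.norm(x)
    return x, y
```

With $s < n$ the same two closed-form steps apply; only the thresholding set changes, which is what makes this a generalisation of the power method rather than a new algorithm.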
87 A Summary Overall, we have seen that there are NP-hard problems for which one can retrieve the global optimum with high probability. Leading solvers based on alternating minimisation can tackle gigabyte-sized instances in minutes.
More informationLecture 1: Schur s Unitary Triangularization Theorem
Lecture 1: Schur s Unitary Triangularization Theorem This lecture introduces the notion of unitary equivalence and presents Schur s theorem and some of its consequences It roughly corresponds to Sections
More informationAn Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.
An Overview Of Software For Convex Optimization Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu In fact, the great watershed in optimization isn t between linearity
More informationMaximumMargin Matrix Factorization
MaximumMargin Matrix Factorization Nathan Srebro Dept. of Computer Science University of Toronto Toronto, ON, CANADA nati@cs.toronto.edu Jason D. M. Rennie Tommi S. Jaakkola Computer Science and Artificial
More informationQuadratic Functions, Optimization, and Quadratic Forms
Quadratic Functions, Optimization, and Quadratic Forms Robert M. Freund February, 2004 2004 Massachusetts Institute of echnology. 1 2 1 Quadratic Optimization A quadratic optimization problem is an optimization
More informationSolving polynomial least squares problems via semidefinite programming relaxations
Solving polynomial least squares problems via semidefinite programming relaxations Sunyoung Kim and Masakazu Kojima August 2007, revised in November, 2007 Abstract. A polynomial optimization problem whose
More informationReview Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 03 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
More information1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
More informationAbsolute Value Programming
Computational Optimization and Aplications,, 1 11 (2006) c 2006 Springer Verlag, Boston. Manufactured in The Netherlands. Absolute Value Programming O. L. MANGASARIAN olvi@cs.wisc.edu Computer Sciences
More informationPractical Numerical Training UKNum
Practical Numerical Training UKNum 7: Systems of linear equations C. Mordasini Max Planck Institute for Astronomy, Heidelberg Program: 1) Introduction 2) Gauss Elimination 3) Gauss with Pivoting 4) Determinants
More informationEigenvalues and eigenvectors of a matrix
Eigenvalues and eigenvectors of a matrix Definition: If A is an n n matrix and there exists a real number λ and a nonzero column vector V such that AV = λv then λ is called an eigenvalue of A and V is
More information