Lecture Topic: Low-Rank Approximations


Low-Rank Approximations

We have seen principal component analysis. The extraction of the first principal component can be seen as an approximation of the original matrix by a rank-1 matrix. In this chapter, we consider problems where a sparse matrix is given and one hopes to find a structured (e.g., low-rank), dense matrix as close as possible to it, in some norm.

The Continuing Example

Consider the example of collaborative filtering: we know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$, corresponding to the ratings of $m$ users for $n$ movies or books. There, the set $\mathcal{M}$ could be the set of rank-$r$ matrices, motivated by the best possible transformation to a new coordinate system with $r$ axes, such as "likes horrors" and "likes romantic comedies". Notice that in collaborative filtering, each user may rate only some 200 of the movies on offer.
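To make the setting concrete, here is a minimal sketch of such a partially observed ratings matrix; all numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, r = 100, 500, 3   # users, movies, hypothetical rank

# A hypothetical complete rating matrix of rank r (never observed in practice).
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Each user rates only a small sample of movies: the observed set E as a mask.
observed = rng.random((m, n)) < 0.05
print("average ratings per user:", observed.sum(axis=1).mean())
```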

Another Example

One may also consider estimating the positions of sensors from some of their pair-wise distances, a problem known as sensor network localisation. In many applications, e.g. in sewers, the sensors do not actually have a GPS signal, but they do have low-power radios, which allow them to estimate their distances to a handful of the closest sensors. From these pair-wise measurements, one wants to recover the positions of all sensors.

Yet Another Example

In the most striking result, we will see that for random rank-$r$ matrices, knowing $O(nr(\log n)^2)$ randomly drawn elements makes it possible to reconstruct the complete matrix of $O(n^2)$ elements without any error, with high probability. This has far-reaching consequences. Consider, for instance, a digital camera: the price of the sensor increases with the number of pixels, but many images are naturally low-rank. Although cameras with a single-pixel chip remain a curiosity, super-resolution techniques are actually wide-spread in medical imaging, where battery capacity is not a concern.

Key Concepts

A singular value and a pair of singular vectors of $A \in \mathbb{R}^{m \times n}$ are a scalar $\sigma \in \mathbb{R}$, $\sigma \geq 0$, and two non-zero vectors $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ such that $Av = \sigma u$ and $A^T u = \sigma v$.

In a matrix completion problem, with some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$ known, one solves:
$$\min_{M \in \mathbb{R}^{m \times n}} \operatorname{rank}(M) \quad \text{s.t.} \quad M_{i,j} = A_{i,j} \;\; \forall (i, j) \in E. \tag{1.1}$$
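As a quick sanity check of the definition, one can verify the singular triples returned by NumPy; a minimal sketch, where the matrix A is an arbitrary example:

```python
import numpy as np

# Arbitrary example matrix (chosen only for illustration).
A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])  # A in R^{3x2}

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Check A v = sigma u and A^T u = sigma v for each singular triple.
for i, sigma in enumerate(s):
    u, v = U[:, i], Vt[i, :]
    assert np.allclose(A @ v, sigma * u)
    assert np.allclose(A.T @ u, sigma * v)
print("singular values:", s)
```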

Some Revision

Definition (Orthogonality). Two vectors $u, v \in \mathbb{R}^n$ are orthogonal if and only if their dot product $\sum_{i=1}^n u_i v_i$ is zero. This suggests an angle of 90 degrees.

The columns and rows of an orthogonal matrix $U \in \mathbb{R}^{n \times n}$ are orthogonal unit vectors, i.e., $U^T U = U U^T = I$, where $I$ is the identity matrix.

Some More Intuition

The linear transformation $x \mapsto Qx$, for an orthogonal $Q$, is an isometry, i.e., it preserves the dot product of vectors. Imagine a rotation or a reflection.

Some Revision

Definition (Singular values and vectors of a matrix $A \in \mathbb{R}^{m \times n}$). For every matrix $A \in \mathbb{R}^{m \times n}$, there exists a decomposition $A = U \Sigma V^T$, where:
- $U$ is an $m \times m$ orthogonal matrix whose columns are the left-singular vectors of $A$;
- $\Sigma$ is an $m \times n$ matrix with $\Sigma_{i,i} = \sigma_i \geq 0$ for $i \leq \min\{m, n\}$ being the singular values of $A$, and all other elements zero;
- $V$ is an $n \times n$ orthogonal matrix whose columns are the right-singular vectors of $A$.

Some More Intuition

For a square $A$ with $\det(A) > 0$, $\Sigma$ is a scaling matrix and $U$, $V^T$ are rotation matrices: $U \Sigma V^T$ is a composition of a rotation, a scaling, and another rotation.

Some More Intuition

Every matrix $A = U \Sigma V^T$ corresponds to a linear map $T: \mathbb{R}^n \to \mathbb{R}^m$. There are orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ such that $T$ maps the $i$-th basis vector of $\mathbb{R}^n$ to a non-negative multiple of the $i$-th basis vector of $\mathbb{R}^m$, for $i = 1, \dots, \min\{m, n\}$. With respect to these bases, $T$ is represented by the diagonal matrix $\Sigma$ with non-negative real diagonal entries, which are the lengths of the semi-axes of the ellipsoid in $\mathbb{R}^m$ that results from applying $T$ to the unit sphere in $\mathbb{R}^n$.

Formally, $T(x) := Ax$ for $A = U \Sigma V^T$, $T: \mathbb{R}^n \to \mathbb{R}^m$, and
$$T(v_i) = \sigma_i u_i \;\text{ for } i = 1, \dots, \min\{m, n\}, \qquad T(v_i) = 0 \;\text{ for } i > \min\{m, n\},$$
where $u_i$ and $v_i$ denote the $i$-th columns of $U$ and $V$.

Singular Values: Perturbation Analysis

Much of the perturbation analysis we have seen for eigenvalues carries over. Let $0 < m \leq n$ and let $A, B \in \mathbb{R}^{m \times n}$. The Weyl inequality, for example:
$$\sigma_{i+j-1}(A + B) \leq \sigma_i(A) + \sigma_j(B) \quad \text{for all } 1 \leq i, j \text{ with } i + j - 1 \leq m. \tag{2.1}$$
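A quick numerical spot-check of (2.1); a minimal sketch with randomly drawn matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8
A, B = rng.standard_normal((m, n)), rng.standard_normal((m, n))

sA = np.linalg.svd(A, compute_uv=False)    # singular values, sorted descending
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A + B, compute_uv=False)

# Verify sigma_{i+j-1}(A+B) <= sigma_i(A) + sigma_j(B); 0-based indices i0+j0 <= m-1.
for i0 in range(m):
    for j0 in range(m - i0):
        assert sAB[i0 + j0] <= sA[i0] + sB[j0] + 1e-12
print("Weyl inequality holds on this sample")
```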

Some Revision

We have seen a variety of norms of $x \in \mathbb{R}^n$. For example:
$$\ell_1 \text{ norm:} \quad \|x\|_1 := \sum_{i=1}^n |x_i|, \tag{3.1}$$
$$\text{maximum norm:} \quad \|x\|_\infty := \max\{|x_1|, \dots, |x_n|\}. \tag{3.2}$$

Some Revision

Let us consider a new concept, the conjugate (dual) norm $\|\cdot\|^*$ of a norm $\|\cdot\|$. By definition,
$$\|z\|^* = \max_{\|y\| \leq 1} y^T z. \tag{3.3}$$
In particular, $\|\cdot\|_2^* = \|\cdot\|_2$ and $\|\cdot\|_1^* = \|\cdot\|_\infty$.
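To see (3.3) in action, one can evaluate the maximisation for the $\ell_1$ ball directly, whose dual norm should come out as the maximum norm; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(6)

# The l1 unit ball is the convex hull of {+-e_i}; a linear function attains
# its maximum at a vertex, so the dual of ||.||_1 evaluates to ||z||_inf.
dual_l1 = max(abs(z))

# Check against the definition (3.3) by maximising over the vertices +-e_i.
vertices = np.vstack([np.eye(6), -np.eye(6)])
assert np.isclose(dual_l1, max(v @ z for v in vertices))
print(dual_l1, np.linalg.norm(z, np.inf))
```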

Some Revision

Definition (Matrix norm). $\|A\|$ is a norm of a matrix $A \in \mathbb{R}^{m \times n}$ if and only if:
- $\|A\| \geq 0$, with $\|A\| = 0$ if and only if $A = 0$;
- $\|\alpha A\| = |\alpha| \, \|A\|$ for all $\alpha \in \mathbb{R}$ and $A \in \mathbb{R}^{m \times n}$;
- $\|A + B\| \leq \|A\| + \|B\|$ for all $A, B \in \mathbb{R}^{m \times n}$.

Definition (Trace of $A \in \mathbb{R}^{n \times n}$). $\operatorname{trace}(A) = a_{11} + a_{22} + \dots + a_{nn} = \sum_{i=1}^n a_{ii}$.

Some Revision

$$\text{Nuclear norm:} \quad \|A\|_* := \operatorname{trace}\left(\sqrt{A^T A}\right) = \sum_{i=1}^{\min\{m,n\}} \sigma_i. \tag{3.4}$$
$$\text{Frobenius norm:} \quad \|A\|_F := \sqrt{\operatorname{trace}(A^T A)} = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2} = \sqrt{\sum_{i=1}^{\min\{m,n\}} \sigma_i^2}. \tag{3.5}$$
$$\text{Spectral norm:} \quad \|A\|_2 := \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A), \tag{3.6}$$
where $\sqrt{A^T A}$ denotes the positive semidefinite $B$ such that $B^2 = A^T A$. The Frobenius norm is its own conjugate, and the spectral norm is the conjugate of the nuclear norm.
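All three norms fall out of a single SVD. A minimal NumPy sketch, checking (3.4)-(3.6) and the equivalence bounds on the following slide on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 7))
s = np.linalg.svd(A, compute_uv=False)
r = np.linalg.matrix_rank(A)

nuclear, frobenius, spectral = s.sum(), np.sqrt((s**2).sum()), s[0]
assert np.isclose(nuclear, np.linalg.norm(A, "nuc"))
assert np.isclose(frobenius, np.linalg.norm(A, "fro"))
assert np.isclose(spectral, np.linalg.norm(A, 2))

# Equivalence bounds for a rank-r matrix (next slide):
assert spectral <= frobenius <= np.sqrt(r) * spectral
assert frobenius <= nuclear <= np.sqrt(r) * frobenius
```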

Some More Understanding

Previously, we have mentioned that all matrix norms are equivalent. For a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$:
$$\|A\|_2 \leq \|A\|_F \leq \sqrt{r} \, \|A\|_2, \qquad \|A\|_F \leq \|A\|_* \leq \sqrt{r} \, \|A\|_F.$$

Matrix Completion

In general, let us consider:
$$\min_{M \in \mathcal{M}} \|A - M\|_N,$$
where $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ is some subset of $m \times n$ matrices and $N$ is a matrix norm. In particular:
- $N$ is $2$ or $F$, $\mathcal{M}$ is the rank-$r$ matrices, $A$ is dense, $M$ is dense: solvable by SVD;
- $N$ is $F$, $\mathcal{M}$ is the rank-$r$ matrices, $A$ is sparse, $M$ is dense: NP-Hard;
- various $N$, $\mathcal{M}$ is rank-1 with sparsity: NP-Hard.

Low-Rank Matrices

Theorem (Eckart and Young). Let us have a rank-$r$ matrix $A \in \mathbb{R}^{m \times n}$, $A = U \Sigma V^T = \sum_{i=1}^r \sigma_i u_i v_i^T$. Consider $k < r$ and the so-called truncated singular value decomposition $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$. Then
$$\arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \leq k}} \|A - B\|_F = \arg\min_{\substack{B \in \mathbb{R}^{m \times n} \\ \operatorname{rank}(B) \leq k}} \|A - B\|_2 = A_k. \tag{4.1}$$
More visually,
$$A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T, \tag{4.2}$$
$$A_k = U_1 \Sigma_1 V_1^T, \tag{4.3}$$
where $\Sigma_1 \in \mathbb{R}^{k \times k}$, $U_1 \in \mathbb{R}^{m \times k}$, and $V_1 \in \mathbb{R}^{n \times k}$.
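A minimal sketch of the truncated SVD (4.3) in NumPy, checking the optimal error values implied by (4.1): $\sigma_{k+1}$ in the spectral norm and $\sqrt{\sum_{i > k} \sigma_i^2}$ in the Frobenius norm:

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation A_k = U_1 Sigma_1 V_1^T (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
k = 2
Ak = truncated_svd(A, k)

s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
assert np.isclose(np.linalg.norm(A - Ak, "fro"), np.sqrt((s[k:] ** 2).sum()))
```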

Low-Rank Matrices

There are a number of proofs. One can use the Weyl inequality,
$$\sigma_{i+j-1}(A + B) \leq \sigma_i(A) + \sigma_j(B) \quad \text{for all } 1 \leq i, j \text{ with } i + j - 1 \leq m.$$
If $B$ has rank $k$, then $\sigma_{k+1}(B) = 0$. One applies the inequality to $B$ and $A - B$ with $j = k + 1$, which gives $\sigma_{i+k}(A) \leq \sigma_i(A - B)$. For the spectral norm, $i = 1$ suffices: $\|A - B\|_2 = \sigma_1(A - B) \geq \sigma_{k+1}(A) = \|A - A_k\|_2$.

Sparse Low-Rank Matrices

Consider again the applications of low-rank matrix reconstruction:
- predicting the ratings of movies by individual users in collaborative filtering, where each user may have rated some 200 of the movies on offer, or
- estimating the positions of sensors from some of their pair-wise distances in sensor network localisation, where one may know distances to only 4 or 5 sensors.

They share the property that we know only a very small number of entries of the matrix. Imputing 0 or similar is a bad idea.

Sparse Low-Rank Matrices

Let us know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$, and assume that there exists only one rank-$r$ matrix $M$ with those entries. Then the search for the simplest explanation fitting the observed data is:
$$\min_{M \in \mathbb{R}^{m \times n}} \operatorname{rank}(M) \quad \text{s.t.} \quad M_{i,j} = A_{i,j} \;\; \forall (i, j) \in E. \tag{5.1}$$
The problem is:
- non-convex in $M$ and very hard,
- easy to reformulate in a number of ways.

Sparse Low-Rank Matrices

Let us know only some elements $(i, j) \in E$ of a matrix $A \in \mathbb{R}^{m \times n}$. Consider the fact that a rank-$r$ matrix factors as $M = XY^T$ with $X \in \mathbb{R}^{m \times r}$, $Y \in \mathbb{R}^{n \times r}$, and:
$$\arg\min_{\substack{X \in \mathbb{R}^{m \times r} \\ Y \in \mathbb{R}^{n \times r}}} \sum_{(i,j) \in E} \left( (XY^T)_{i,j} - A_{i,j} \right)^2. \tag{5.2}$$
The problem is:
- non-convex in $X$ and $Y$ jointly,
- but convex in either $X$ or $Y$ separately.

Sparse Low-Rank Matrices

A rank-$r$ matrix has exactly $r$ non-zero singular values. Rank can hence be seen as the $\ell_0$ norm of the spectrum. Considering that we have seen the $\ell_0$ norm being replaced by the $\ell_1$ norm, Fazel proposed to replace rank with the nuclear norm, the $\ell_1$ norm of the spectrum:
$$\arg\min_{M \in \mathbb{R}^{m \times n}} \|M\|_* \quad \text{subject to} \quad M_{i,j} = A_{i,j} \;\; \forall (i, j) \in E. \tag{5.3}$$
The problem is:
- convex in $M$ and possible to solve using interior-point methods,
- such that the optimum of the convex problem coincides with the global optimum of the non-convex problem (!) with high probability:
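As a minimal sketch, the convex problem (5.3) can be written almost verbatim in the modelling language CVXPY; this assumes cvxpy and a conic solver are installed, and the small instance below is illustrative only:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 10, 12, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # hypothetical rank-r truth
mask = (rng.random((m, n)) < 0.5).astype(float)                # observed set E as a 0/1 mask

M = cp.Variable((m, n))
problem = cp.Problem(cp.Minimize(cp.norm(M, "nuc")),
                     [cp.multiply(mask, M - A) == 0])           # M_ij = A_ij on E
problem.solve()
print("relative error:", np.linalg.norm(M.value - A) / np.linalg.norm(A))
```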

Sparse Low-Rank Matrices

Theorem (Candès and Recht). Let us assume $M \in \mathbb{R}^{m \times n}$ of rank $r$ is sampled from the random orthogonal model, and suppose we observe its entries at locations $E$ sampled uniformly at random. Then there are numerical constants $C_1$ and $C_2$ such that if
$$|E| \geq C_1 \, r \, (\max\{m, n\})^{5/4} \log(\max\{m, n\}), \tag{5.4}$$
the minimiser of the nuclear-norm minimisation problem (5.3) is unique and equal to $M$ with probability at least $1 - C_2 (\max\{m, n\})^{-3}$.

Sparse Low-Rank Matrices

The result of the previous theorem is of considerable theoretical and practical interest; it has been cited more than 1800 times. Although the nuclear-norm minimisation problem can be approximated within any fixed precision in polynomial time, this is limited to modest $n \approx 1000$ in practice. Notice that the interior-point method needs to invert the Hessian, where even the matrix variable is $n \times n$. One would hence like to find more efficient algorithms.


Sparse Low-Rank Matrices

Alternating Minimisation:
1. Partition $E = E_1 \cup E_2 \cup \dots \cup E_{k_{\max}}$.
2. Compute the SVD $\sum_{i=1}^{\min\{m,n\}} \sigma_i x_i y_i^T$, considering only the entries in $E_1$.
3. Initialise $X^1$ from the leading vectors $\sigma_i x_i$ and $Y^1$ from the leading vectors $\sigma_i y_i$, rescaled by $\frac{mn}{|E_1|}$.
4. For each iteration $k = 1, \dots, k_{\max} = O(\log n)$:
$$X^{k+1} = \arg\min_{X \in \mathbb{R}^{m \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X (Y^k)^T)_{i,j} - A_{i,j} \right)^2, \tag{5.5}$$
$$Y^{k+1} = \arg\min_{Y \in \mathbb{R}^{n \times r}} \sum_{(i,j) \in E_{k+1}} \left( (X^{k+1} Y^T)_{i,j} - A_{i,j} \right)^2. \tag{5.6}$$

This:
- solves linear least squares twice in each iteration, in dimensions $mr$ and $nr$;
- generally takes $O((mr)^2)$ and $O((nr)^2)$, but for the partially separable structure here, it is $O(|E| r^2)$ and $O(|E| r^2)$.
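A minimal NumPy sketch of the least-squares steps (5.5)-(5.6), simplified relative to the slide: it reuses the full observed set $E$ in every iteration rather than fresh partitions $E_{k+1}$, and it adds a small ridge term `lam` (an addition, not in the slides) to keep under-determined rows well-posed:

```python
import numpy as np

def als_complete(A, mask, r, iters=30, lam=1e-6):
    """Alternating least squares for matrix completion, a sketch of (5.5)-(5.6)."""
    m, n = A.shape
    # Initialise from the SVD of the zero-filled, rescaled observations.
    scale = mask.size / mask.sum()
    U, s, Vt = np.linalg.svd(scale * np.where(mask, A, 0.0), full_matrices=False)
    X = U[:, :r] * np.sqrt(s[:r])
    Y = Vt[:r, :].T * np.sqrt(s[:r])
    for _ in range(iters):
        # Each row of X depends only on that row's observed entries (partial separability).
        for i in range(m):
            cols = mask[i]
            G = Y[cols].T @ Y[cols] + lam * np.eye(r)
            X[i] = np.linalg.solve(G, Y[cols].T @ A[i, cols])
        for j in range(n):
            rows = mask[:, j]
            G = X[rows].T @ X[rows] + lam * np.eye(r)
            Y[j] = np.linalg.solve(G, X[rows].T @ A[rows, j])
    return X, Y

rng = np.random.default_rng(5)
m, n, r = 40, 30, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.4
X, Y = als_complete(A, mask, r)
print("relative error:", np.linalg.norm(X @ Y.T - A) / np.linalg.norm(A))
```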

Sparse Low-Rank Matrices

Theorem (Keshavan et al.). Let us assume $M = X^* (Y^*)^T + W$ with $X^* \in \mathbb{R}^{m \times r}$, $Y^* \in \mathbb{R}^{n \times r}$, and $W \in \mathbb{R}^{m \times n}$, where the elements of $W$, $X^*$, and $Y^*$ are bounded i.i.d. random variables, zero-mean for $X^*$ and $Y^*$, and $W$ satisfies, among others, $\theta = \sigma_{\max}(\mathbb{E} W)$ and
$$P\left( |W_{i,j} - \mathbb{E} W_{i,j}| \geq t \right) \leq 2 \exp\left( -\frac{t^2}{2 \omega^2} \right). \tag{5.7}$$
There exist constants $C_1, C_2$ such that for $k_{\max} = C_1 \log n$ and $|E| \geq C_2 \kappa^8 n r (\log n)^2$, with $E$ uniformly distributed over all sets of that size, one has, with probability larger than $1 - 1/n^4$:
$$\left\| M - X^k (Y^k)^T \right\|_F \leq \frac{6 \sqrt{r}}{2^{2k}} + C_2 \, r \kappa^2 \left( \theta + \sqrt{n}\, \omega \right) \leq \epsilon, \tag{5.8}$$
where $\kappa = \max\{ \sigma_{\min}(X^*)^{-1}, \sigma_{\min}(Y^*)^{-1} \}$.

Regularisations of PCA

Alternating minimisation is a very general approach to optimisation problems. For example, consider a generalisation of PCA:
$$\max_{x \in \mathbb{R}^n} \left\{ \|Ax\|_v : \|x\|_2 \leq 1, \; \|x\|_s \leq k \right\}, \tag{6.1}$$
where PCA itself corresponds to $v$ being $\ell_2$ and there being no norm $s$. Here:
- $v$ of $\ell_1$ norm works better in terms of perturbation analysis (stability, robustness);
- $s$ such as $\ell_1$ improves interpretability (sparsity in the loading vector) by approximating $\ell_0$.

As we have seen in the previous chapter, the $\ell_1$ norm is non-smooth.

Regularisations

Richtárik et al. summarise 8 possible regularisations of the problem of computing the first principal component, obtained by combining:
- two norms for measuring variance ($\ell_1$, $\ell_2$) and
- two sparsity-inducing norms (cardinality $\ell_0$ and $\ell_1$), either in a constraint or in a penalty term.

All have the form
$$\mathrm{OPT} = \max_{x \in X} f(x), \tag{6.2}$$
with $X \subseteq \mathbb{R}^n$ and $f$ as in the table below.

Regularisations

# | v  | s  | use of s   | X                                    | f(x)
1 | L2 | L0 | constraint | {x in R^n : ||x||_2 <= 1, ||x||_0 <= s} | ||Ax||_2
2 | L1 | L0 | constraint | {x in R^n : ||x||_2 <= 1, ||x||_0 <= s} | ||Ax||_1
3 | L2 | L1 | constraint | {x in R^n : ||x||_2 <= 1, ||x||_1 <= s} | ||Ax||_2
4 | L1 | L1 | constraint | {x in R^n : ||x||_2 <= 1, ||x||_1 <= s} | ||Ax||_1
5 | L2 | L0 | penalty    | {x in R^n : ||x||_2 <= 1}               | ||Ax||_2^2 - gamma ||x||_0
6 | L1 | L0 | penalty    | {x in R^n : ||x||_2 <= 1}               | ||Ax||_1^2 - gamma ||x||_0
7 | L2 | L1 | penalty    | {x in R^n : ||x||_2 <= 1}               | ||Ax||_2 - gamma ||x||_1
8 | L1 | L1 | penalty    | {x in R^n : ||x||_2 <= 1}               | ||Ax||_1 - gamma ||x||_1

Table: Eight regularisations of PCA, cited verbatim from Richtárik et al.

Regularisations

Let $Y := \{y \in \mathbb{R}^m : \|y\|_2 \leq 1\}$ for the $\ell_2$-norm and $Y := \{y \in \mathbb{R}^m : \|y\|_\infty \leq 1\}$ for the $\ell_1$-norm, and let $F(x, y)$ be the function obtained from $f(x)$ after replacing $\|Ax\|$ with $y^T A x$ (resp. $\|Ax\|^2$ with $(y^T A x)^2$). Then, in view of the above, (6.2) takes on the equivalent form
$$\mathrm{OPT} = \max_{x \in X} \max_{y \in Y} F(x, y). \tag{6.3}$$
That is, the 8 problems can be reformulated into the form (6.3).

Regularisations

# | X                                       | Y                          | F(x, y)
1 | {x in R^n : ||x||_2 <= 1, ||x||_0 <= s} | {y in R^m : ||y||_2 <= 1}   | y^T A x
2 | {x in R^n : ||x||_2 <= 1, ||x||_0 <= s} | {y in R^m : ||y||_inf <= 1} | y^T A x
3 | {x in R^n : ||x||_2 <= 1, ||x||_1 <= s} | {y in R^m : ||y||_2 <= 1}   | y^T A x
4 | {x in R^n : ||x||_2 <= 1, ||x||_1 <= s} | {y in R^m : ||y||_inf <= 1} | y^T A x
5 | {x in R^n : ||x||_2 <= 1}               | {y in R^m : ||y||_2 <= 1}   | (y^T A x)^2 - gamma ||x||_0
6 | {x in R^n : ||x||_2 <= 1}               | {y in R^m : ||y||_inf <= 1} | (y^T A x)^2 - gamma ||x||_0
7 | {x in R^n : ||x||_2 <= 1}               | {y in R^m : ||y||_2 <= 1}   | y^T A x - gamma ||x||_1
8 | {x in R^n : ||x||_2 <= 1}               | {y in R^m : ||y||_inf <= 1} | y^T A x - gamma ||x||_1

Table: Reformulations of the problems from the previous table, cited verbatim from Richtárik et al.

Generalising the Power Method

The alternating maximisation for the regularised problem (6.3) is:
$$y^k = \arg\max_{y \in Y} F(x^k, y), \tag{6.4}$$
$$x^{k+1} = \arg\max_{x \in X} F(x, y^k). \tag{6.5}$$
As it turns out, there are closed-form solutions to the two sub-problems for all the variants above. Notice that Hotelling's deflation is no longer guaranteed to work, although there are replacements.
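A minimal sketch of (6.4)-(6.5) for variant 1 of the table ($\ell_2$ variance, $\ell_0$ constraint), where both sub-problems indeed have closed forms: the $y$-step is $y = Ax / \|Ax\|_2$, and the $x$-step keeps the $s$ largest entries of $A^T y$ in absolute value and renormalises; the instance below is an arbitrary example:

```python
import numpy as np

def sparse_pca_l2_l0(A, s, iters=100, seed=0):
    """Generalised power method for max ||Ax||_2 s.t. ||x||_2 <= 1, ||x||_0 <= s."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x
        y /= np.linalg.norm(y)             # closed form of the y-step (6.4)
        g = A.T @ y
        keep = np.argsort(np.abs(g))[-s:]  # hard-threshold to the s largest entries
        x = np.zeros(n)
        x[keep] = g[keep]
        x /= np.linalg.norm(x)             # closed form of the x-step (6.5)
    return x

A = np.random.default_rng(6).standard_normal((50, 20))
x = sparse_pca_l2_l0(A, s=5)
print("support:", np.nonzero(x)[0], "objective:", np.linalg.norm(A @ x))
```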

A Summary

Overall, we have seen that there are NP-Hard problems for which one can retrieve the global optimum with high probability. Leading solvers based on alternating minimisation can tackle gigabyte-sized instances in minutes.


More information

Recall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.

Recall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n. ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the n-dimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?

More information

ME128 Computer-Aided Mechanical Design Course Notes Introduction to Design Optimization

ME128 Computer-Aided Mechanical Design Course Notes Introduction to Design Optimization ME128 Computer-ided Mechanical Design Course Notes Introduction to Design Optimization 2. OPTIMIZTION Design optimization is rooted as a basic problem for design engineers. It is, of course, a rare situation

More information

MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).

MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued). MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0

More information

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear

More information

Lecture 3: Convex Sets and Functions

Lecture 3: Convex Sets and Functions EE 227A: Convex Optimization and Applications January 24, 2012 Lecture 3: Convex Sets and Functions Lecturer: Laurent El Ghaoui Reading assignment: Chapters 2 (except 2.6) and sections 3.1, 3.2, 3.3 of

More information

Summary of week 8 (Lectures 22, 23 and 24)

Summary of week 8 (Lectures 22, 23 and 24) WEEK 8 Summary of week 8 (Lectures 22, 23 and 24) This week we completed our discussion of Chapter 5 of [VST] Recall that if V and W are inner product spaces then a linear map T : V W is called an isometry

More information

LINEAR ALGEBRA. September 23, 2010

LINEAR ALGEBRA. September 23, 2010 LINEAR ALGEBRA September 3, 00 Contents 0. LU-decomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................

More information

NOTES on LINEAR ALGEBRA 1

NOTES on LINEAR ALGEBRA 1 School of Economics, Management and Statistics University of Bologna Academic Year 205/6 NOTES on LINEAR ALGEBRA for the students of Stats and Maths This is a modified version of the notes by Prof Laura

More information

Constrained Least Squares

Constrained Least Squares Constrained Least Squares Authors: G.H. Golub and C.F. Van Loan Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp.580-587 CICN may05/1 Background The least squares problem: min Ax b 2 x Sometimes,

More information

4: EIGENVALUES, EIGENVECTORS, DIAGONALIZATION

4: EIGENVALUES, EIGENVECTORS, DIAGONALIZATION 4: EIGENVALUES, EIGENVECTORS, DIAGONALIZATION STEVEN HEILMAN Contents 1. Review 1 2. Diagonal Matrices 1 3. Eigenvectors and Eigenvalues 2 4. Characteristic Polynomial 4 5. Diagonalizability 6 6. Appendix:

More information

The Trimmed Iterative Closest Point Algorithm

The Trimmed Iterative Closest Point Algorithm Image and Pattern Analysis (IPAN) Group Computer and Automation Research Institute, HAS Budapest, Hungary The Trimmed Iterative Closest Point Algorithm Dmitry Chetverikov and Dmitry Stepanov http://visual.ipan.sztaki.hu

More information

9.3 Advanced Topics in Linear Algebra

9.3 Advanced Topics in Linear Algebra 548 93 Advanced Topics in Linear Algebra Diagonalization and Jordan s Theorem A system of differential equations x = Ax can be transformed to an uncoupled system y = diag(λ,, λ n y by a change of variables

More information

Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain

Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain 1. Orthogonal matrices and orthonormal sets An n n real-valued matrix A is said to be an orthogonal

More information

Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

More information

Basics Inversion and related concepts Random vectors Matrix calculus. Matrix algebra. Patrick Breheny. January 20

Basics Inversion and related concepts Random vectors Matrix calculus. Matrix algebra. Patrick Breheny. January 20 Matrix algebra January 20 Introduction Basics The mathematics of multiple regression revolves around ordering and keeping track of large arrays of numbers and solving systems of equations The mathematical

More information

Principal Component Analysis Application to images

Principal Component Analysis Application to images Principal Component Analysis Application to images Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception http://cmp.felk.cvut.cz/

More information

5. Orthogonal matrices

5. Orthogonal matrices L Vandenberghe EE133A (Spring 2016) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

More information

LINEAR ALGEBRA W W L CHEN

LINEAR ALGEBRA W W L CHEN LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008 This chapter is available free to all individuals, on understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,

More information

CS3220 Lecture Notes: QR factorization and orthogonal transformations

CS3220 Lecture Notes: QR factorization and orthogonal transformations CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss

More information

Matrices, Determinants and Linear Systems

Matrices, Determinants and Linear Systems September 21, 2014 Matrices A matrix A m n is an array of numbers in rows and columns a 11 a 12 a 1n r 1 a 21 a 22 a 2n r 2....... a m1 a m2 a mn r m c 1 c 2 c n We say that the dimension of A is m n (we

More information

Linear Algebraic Equations, SVD, and the Pseudo-Inverse

Linear Algebraic Equations, SVD, and the Pseudo-Inverse Linear Algebraic Equations, SVD, and the Pseudo-Inverse Philip N. Sabes October, 21 1 A Little Background 1.1 Singular values and matrix inversion For non-smmetric matrices, the eigenvalues and singular

More information

LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12,

LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12, LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12, 2000 45 4 Iterative methods 4.1 What a two year old child can do Suppose we want to find a number x such that cos x = x (in radians). This is a nonlinear

More information

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression

The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The Singular Value Decomposition in Symmetric (Löwdin) Orthogonalization and Data Compression The SVD is the most generally applicable of the orthogonal-diagonal-orthogonal type matrix decompositions Every

More information

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001,

More information

Introduction to Convex Optimization for Machine Learning

Introduction to Convex Optimization for Machine Learning Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning

More information

Linear Least Squares

Linear Least Squares Linear Least Squares Suppose we are given a set of data points {(x i,f i )}, i = 1,...,n. These could be measurements from an experiment or obtained simply by evaluating a function at some points. One

More information

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α

More information

MATH36001 Background Material 2015

MATH36001 Background Material 2015 MATH3600 Background Material 205 Matrix Algebra Matrices and Vectors An ordered array of mn elements a ij (i =,, m; j =,, n) written in the form a a 2 a n A = a 2 a 22 a 2n a m a m2 a mn is said to be

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace

More information

Row and column operations

Row and column operations Row and column operations It is often very useful to apply row and column operations to a matrix. Let us list what operations we re going to be using. 3 We ll illustrate these using the example matrix

More information

x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.

x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0. Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability

More information

Lecture 4: Partitioned Matrices and Determinants

Lecture 4: Partitioned Matrices and Determinants Lecture 4: Partitioned Matrices and Determinants 1 Elementary row operations Recall the elementary operations on the rows of a matrix, equivalent to premultiplying by an elementary matrix E: (1) multiplying

More information

1 Norms and Vector Spaces

1 Norms and Vector Spaces 008.10.07.01 1 Norms and Vector Spaces Suppose we have a complex vector space V. A norm is a function f : V R which satisfies (i) f(x) 0 for all x V (ii) f(x + y) f(x) + f(y) for all x,y V (iii) f(λx)

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013 Notes on Orthogonal and Symmetric Matrices MENU, Winter 201 These notes summarize the main properties and uses of orthogonal and symmetric matrices. We covered quite a bit of material regarding these topics,

More information

A note on companion matrices

A note on companion matrices Linear Algebra and its Applications 372 (2003) 325 33 www.elsevier.com/locate/laa A note on companion matrices Miroslav Fiedler Academy of Sciences of the Czech Republic Institute of Computer Science Pod

More information

Lecture 1: Schur s Unitary Triangularization Theorem

Lecture 1: Schur s Unitary Triangularization Theorem Lecture 1: Schur s Unitary Triangularization Theorem This lecture introduces the notion of unitary equivalence and presents Schur s theorem and some of its consequences It roughly corresponds to Sections

More information

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt. An Overview Of Software For Convex Optimization Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu In fact, the great watershed in optimization isn t between linearity

More information

Maximum-Margin Matrix Factorization

Maximum-Margin Matrix Factorization Maximum-Margin Matrix Factorization Nathan Srebro Dept. of Computer Science University of Toronto Toronto, ON, CANADA nati@cs.toronto.edu Jason D. M. Rennie Tommi S. Jaakkola Computer Science and Artificial

More information

Quadratic Functions, Optimization, and Quadratic Forms

Quadratic Functions, Optimization, and Quadratic Forms Quadratic Functions, Optimization, and Quadratic Forms Robert M. Freund February, 2004 2004 Massachusetts Institute of echnology. 1 2 1 Quadratic Optimization A quadratic optimization problem is an optimization

More information

Solving polynomial least squares problems via semidefinite programming relaxations

Solving polynomial least squares problems via semidefinite programming relaxations Solving polynomial least squares problems via semidefinite programming relaxations Sunyoung Kim and Masakazu Kojima August 2007, revised in November, 2007 Abstract. A polynomial optimization problem whose

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

1 Introduction to Matrices

1 Introduction to Matrices 1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns

More information

Absolute Value Programming

Absolute Value Programming Computational Optimization and Aplications,, 1 11 (2006) c 2006 Springer Verlag, Boston. Manufactured in The Netherlands. Absolute Value Programming O. L. MANGASARIAN olvi@cs.wisc.edu Computer Sciences

More information

Practical Numerical Training UKNum

Practical Numerical Training UKNum Practical Numerical Training UKNum 7: Systems of linear equations C. Mordasini Max Planck Institute for Astronomy, Heidelberg Program: 1) Introduction 2) Gauss Elimination 3) Gauss with Pivoting 4) Determinants

More information

Eigenvalues and eigenvectors of a matrix

Eigenvalues and eigenvectors of a matrix Eigenvalues and eigenvectors of a matrix Definition: If A is an n n matrix and there exists a real number λ and a non-zero column vector V such that AV = λv then λ is called an eigenvalue of A and V is

More information