TRACE INEQUALITIES WITH APPLICATIONS TO ORTHOGONAL REGRESSION AND MATRIX NEARNESS PROBLEMS

I.D. COOPE AND P.F. RENAUD

Department of Mathematics & Statistics, University of Canterbury, Private Bag 4800, Christchurch, New Zealand (I.Coope@math.canterbury.ac.nz, P.Renaud@math.canterbury.ac.nz).

Abstract. Matrix trace inequalities are finding increased use in many areas such as analysis, where they can be used to generalise several well known classical inequalities, and computational statistics, where they can be applied, for example, to data fitting problems. In this paper we give simple proofs of two useful matrix trace inequalities and provide applications to orthogonal regression and matrix nearness problems.

Key words. Trace inequalities, projection matrices, total least squares, orthogonal regression, matrix nearness problems.

AMS subject classifications. 15A42, 15A45, 65F30

1. A Matrix Trace Inequality.

Theorem 1.1. Let $X$ be an $n \times n$ Hermitian matrix with $\mathrm{rank}(X) = r$ and let $Q_k$ be an $n \times k$ matrix, $k \le r$, with $k$ orthonormal columns. Then, for given $X$, $\mathrm{tr}(Q_k^* X Q_k)$ is maximized when $Q_k = V_k$, where $V_k = [v_1, v_2, \ldots, v_k]$ denotes a matrix of $k$ orthonormal eigenvectors of $X$ corresponding to the $k$ largest eigenvalues.

Proof. Let $X = V D V^*$ be a spectral decomposition of $X$ with $V$ unitary and $D = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n]$ the diagonal matrix of (real) eigenvalues, ordered so that

    $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.    (1.1)

Then

    $\mathrm{tr}(Q_k^* X Q_k) = \mathrm{tr}(Z_k^* D Z_k) = \mathrm{tr}(Z_k Z_k^* D) = \mathrm{tr}(PD)$,    (1.2)

where $Z_k = V^* Q_k$ and $P = Z_k Z_k^*$ is a projection matrix with $\mathrm{rank}(P) = k$. Clearly, the $n \times k$ matrix $Z_k$ has orthonormal columns if and only if $Q_k$ has orthonormal columns.

Now $\mathrm{tr}(PD) = \sum_{i=1}^{n} P_{ii}\lambda_i$ with $0 \le P_{jj} \le 1$, $j = 1, 2, \ldots, n$, and $\sum_{i=1}^{n} P_{ii} = k$, because $P$ is an Hermitian projection matrix with rank $k$. Hence $\mathrm{tr}(Q_k^* X Q_k) \le L$, where $L$ denotes the maximum value attained by the linear programming problem

    $\max_{p \in \mathbb{R}^n} \left\{ \sum_{i=1}^{n} p_i \lambda_i \;:\; 0 \le p_j \le 1,\; j = 1, 2, \ldots, n;\; \sum_{i=1}^{n} p_i = k \right\}$.    (1.3)

An optimal basic feasible solution to the LP problem (1.3) is easily identified (noting the ordering (1.1)) as $p_j = 1$, $j = 1, 2, \ldots, k$; $p_j = 0$, $j = k+1, k+2, \ldots, n$, with $L = \sum_{i=1}^{k} \lambda_i$. But $P = E_k E_k^*$ gives $\mathrm{tr}(PD) = L$, where $E_k$ is the matrix with orthonormal columns formed from the first $k$ columns of the $n \times n$ identity matrix; therefore (1.2) provides the required result that $Q_k = V E_k = V_k$ maximizes $\mathrm{tr}(Q_k^* X Q_k)$.

Corollary 1.2. Let $Y$ be an $m \times n$ matrix with $m \ge n$ and $\mathrm{rank}(Y) = r$, and let $Q_k \in \mathbb{R}^{n \times k}$, $k \le r$, be a matrix with $k$ orthonormal columns. Then the Frobenius trace-norm $\|Y Q_k\|_F^2 = \mathrm{tr}(Q_k^* Y^* Y Q_k)$ is maximized, for given $Y$, when $Q_k = V_k$, where $USV^*$ is a singular value decomposition of $Y$ and $V_k = [v_1, v_2, \ldots, v_k] \in \mathbb{R}^{n \times k}$ denotes a matrix of $k$ orthonormal right singular vectors of $Y$ corresponding to the $k$ largest singular values.

Corollary 1.3. If a minimum rather than a maximum is required, then substitute the $k$ smallest eigenvalues/singular values in the above results and reverse the ordering (1.1).

Theorem 1.1 is a special case of a more general result established in Section 3. Alternative proofs can be found in some linear algebra texts (see, for example, [3]). The special case above and Corollary 1.2 have applications in total least squares data fitting.
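The extremal property in Theorem 1.1 is easy to confirm numerically. The following short sketch (not part of the original paper; it assumes NumPy and randomly generated test data) compares $\mathrm{tr}(Q_k^* X Q_k)$ at the eigenvector choice $Q_k = V_k$ with its value at many random matrices with orthonormal columns:

    # Numerical sanity check of Theorem 1.1 (illustrative sketch only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 8, 3

    # Random n x n Hermitian test matrix X.
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    X = (M + M.conj().T) / 2

    # Eigenvectors ordered by descending eigenvalue; V_k spans the top-k eigenspace.
    lam, V = np.linalg.eigh(X)            # eigh returns ascending order
    lam, V = lam[::-1], V[:, ::-1]
    Vk = V[:, :k]
    best = np.real(np.trace(Vk.conj().T @ X @ Vk))   # equals lam[:k].sum() by Theorem 1.1

    # Compare against random n x k matrices with orthonormal columns.
    worst_gap = 0.0
    for _ in range(1000):
        Z = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
        Qk, _ = np.linalg.qr(Z)                      # orthonormal columns
        val = np.real(np.trace(Qk.conj().T @ X @ Qk))
        worst_gap = max(worst_gap, val - best)

    print("sum of k largest eigenvalues:", lam[:k].sum())
    print("trace at Q_k = V_k          :", best)
    print("largest excess over samples :", worst_gap)   # ~0 up to rounding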

2. An Application to Data Fitting. Suppose that data is available as a set of $m$ points in $\mathbb{R}^n$ represented by the columns of the $n \times m$ matrix $A$, and it is required to find the best $k$-dimensional linear manifold $\mathcal{L}_k \subset \mathbb{R}^n$ approximating the set of points, in the sense that the sum of squares of the distances of each data point from its orthogonal projection onto the linear manifold is minimized. A general point in $\mathcal{L}_k$ can be expressed in parametric form as

    $x(t) = z + Z_k t, \quad t \in \mathbb{R}^k$,    (2.1)

where $z$ is a fixed point in $\mathcal{L}_k$ and the columns of the $n \times k$ matrix $Z_k$ can be taken to be orthonormal. The problem is now to identify a suitable $z$ and $Z_k$. Now the orthogonal projection of a point $a \in \mathbb{R}^n$ onto $\mathcal{L}_k$ can be written as

    $\mathrm{proj}(a, \mathcal{L}_k) = z + Z_k Z_k^T (a - z)$,

and hence the Euclidean distance from $a$ to $\mathcal{L}_k$ is

    $\mathrm{dist}(a, \mathcal{L}_k) = \|a - \mathrm{proj}(a, \mathcal{L}_k)\|_2 = \|(I - Z_k Z_k^T)(a - z)\|_2$.

Therefore, the total least squares data-fitting problem is reduced to finding a suitable $z$ and corresponding $Z_k$ to minimize the sum-of-squares function

    $SS = \sum_{j=1}^{m} \|(I - Z_k Z_k^T)(a_j - z)\|_2^2$,

where $a_j$ is the $j$th data point ($j$th column of $A$). A necessary condition for $SS$ to be minimized with respect to $z$ is

    $0 = \sum_{j=1}^{m} (I - Z_k Z_k^T)(a_j - z) = (I - Z_k Z_k^T) \sum_{j=1}^{m} (a_j - z)$.

Therefore, $\sum_{j=1}^{m} (a_j - z)$ lies in the null space of $(I - Z_k Z_k^T)$ or, equivalently, the column space of $Z_k$. The parametric representation (2.1) shows that there is no loss of generality in letting $\sum_{j=1}^{m} (a_j - z) = 0$, or

    $z = \frac{1}{m} \sum_{j=1}^{m} a_j$.    (2.2)

Thus, a suitable $z$ has been determined, and it should be noted that the value (2.2) solves the zero-dimensional case corresponding to $k = 0$. It remains to find $Z_k$ when $k > 0$, which is the problem

    $\min \sum_{j=1}^{m} \|(I - Z_k Z_k^T)(a_j - z)\|_2^2$,    (2.3)

subject to the constraint that the columns of $Z_k$ are orthonormal and that $z$ satisfies equation (2.2). Using the properties of orthogonal projections and the definition of the vector 2-norm, (2.3) can be rewritten as

    $\min \sum_{j=1}^{m} (a_j - z)^T (I - Z_k Z_k^T)(a_j - z)$.    (2.4)

Ignoring the terms in (2.4) independent of $Z_k$ then reduces the problem to

    $\min \sum_{j=1}^{m} -(a_j - z)^T Z_k Z_k^T (a_j - z)$,

or equivalently

    $\max \sum_{j=1}^{m} \mathrm{tr}\left( (a_j - z)^T Z_k Z_k^T (a_j - z) \right)$.    (2.5)

The introduction of the trace operator in (2.5) is allowed because the argument to the trace function is a matrix with only one element. The commutative property of the trace then shows that problem (2.5) is equivalent to

    $\max \sum_{j=1}^{m} \mathrm{tr}\left( Z_k^T (a_j - z)(a_j - z)^T Z_k \right) \equiv \max \mathrm{tr}\left( Z_k^T \hat{A} \hat{A}^T Z_k \right)$,

where $\hat{A}$ is the matrix $\hat{A} = [a_1 - z, a_2 - z, \ldots, a_m - z]$. Theorem 1.1 and its corollary then show that the required matrix $Z_k$ can be taken to be the matrix of $k$ left singular vectors of the matrix $\hat{A}$ (right singular vectors of $\hat{A}^T$) corresponding to the $k$ largest singular values. This result shows, not unexpectedly, that the best point lies on the best line, which lies in the best plane, etc. Moreover, the total least squares problem described above clearly always has a solution, although it will not be unique if the $(k+1)$st largest singular value of $\hat{A}$ has the same value as the $k$th largest. For example, if the data points are the 4 vertices of the unit square in $\mathbb{R}^2$,

    $A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$,

then any line passing through the centroid of the square is a best line in the total least squares sense, because the matrix $\hat{A}$ for this data has two equal non-zero singular values.

The total least squares problem above (also referred to as orthogonal regression) has been considered by many authors and, as is pointed out in [4, p. 4], "...orthogonal regression has been discovered and rediscovered many times, often independently." The approach taken above differs from that in [1], [2], and [4] in that the derivation is more geometric, it does not require the Eckart-Young-Mirsky Matrix Approximation Theorem (see, for example, [4]), and it uses only simple properties of projections and the matrix trace operator.
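The construction of this section amounts to two lines of linear algebra: centre the data at the mean (2.2) and take the $k$ leading left singular vectors of $\hat{A}$. The sketch below (not from the paper; it assumes NumPy and uses the unit square data of the example above) carries this out for $k = 1$:

    # Orthogonal regression (total least squares) fit of a k-dimensional manifold,
    # following Section 2: z = centroid (2.2), Z_k = k leading left singular vectors of A-hat.
    import numpy as np

    def tls_manifold(A, k):
        """A is n x m with data points as columns; returns (z, Z_k, singular values)."""
        z = A.mean(axis=1, keepdims=True)     # equation (2.2)
        Ahat = A - z                          # columns a_j - z
        U, s, Vt = np.linalg.svd(Ahat)
        return z, U[:, :k], s

    # Unit square example from the text: A-hat has two equal non-zero singular values,
    # so every line through the centroid is a best line.
    A = np.array([[0., 1., 1., 0.],
                  [0., 0., 1., 1.]])
    z, Zk, s = tls_manifold(A, k=1)
    resid = (np.eye(2) - Zk @ Zk.T) @ (A - z)      # orthogonal residuals
    print("centroid z              :", z.ravel())  # [0.5, 0.5]
    print("singular values of A-hat:", s)          # equal pair
    print("sum of squared distances:", (resid ** 2).sum())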

3. A Stronger Result. The proof of Theorem 1.1 relies on maximizing $\mathrm{tr}(DP)$, where $D$ is a (fixed) real diagonal matrix and $P$ varies over all rank $k$ projections. Since any two rank $k$ projections are unitarily equivalent, the problem is now to maximize $\mathrm{tr}(D U^* P U)$ (for fixed $D$ and $P$) over all unitary matrices $U$. Generalizing from $P$ to a general Hermitian matrix leads to the following theorem.

Theorem 3.1. Let $A$, $B$ be $n \times n$ Hermitian matrices. Then

    $\max_{U \text{ unitary}} \mathrm{tr}(A U^* B U) = \sum_{i=1}^{n} \alpha_i \beta_i$,

where $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$ are the eigenvalues of $A$ and $B$ respectively, both similarly ordered.

Clearly, Theorem 1.1 can be recovered, since a projection of rank $k$ has eigenvalue 1, repeated $k$ times, and 0, repeated $n - k$ times.
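Before giving the proof, the identity can be checked numerically (an illustrative sketch, not part of the paper; it assumes NumPy and random Hermitian test matrices): random unitaries never exceed $\sum_i \alpha_i \beta_i$, and a unitary built from the two eigenvector bases attains it.

    # Numerical check of Theorem 3.1 (illustrative sketch only).
    import numpy as np

    rng = np.random.default_rng(1)
    n = 6

    def random_hermitian(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    A, B = random_hermitian(n), random_hermitian(n)
    alpha, UA = np.linalg.eigh(A)     # both ascending, hence similarly ordered
    beta, UB = np.linalg.eigh(B)
    bound = float(alpha @ beta)       # sum of similarly ordered products

    # Random unitaries stay below the bound ...
    best = -np.inf
    for _ in range(2000):
        Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        Q, _ = np.linalg.qr(Z)
        best = max(best, np.real(np.trace(A @ Q.conj().T @ B @ Q)))

    # ... and U = UB UA^* attains it: U^* B U is then diagonal in the eigenbasis of A.
    U = UB @ UA.conj().T
    attained = np.real(np.trace(A @ U.conj().T @ B @ U))

    print("bound sum(alpha_i beta_i) :", bound)
    print("best over random unitaries:", best)
    print("value at aligning U       :", attained)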

Proof. Let $\{e_i\}_{i=1}^{n}$ be an orthonormal basis of eigenvectors of $A$ corresponding to the eigenvalues $\{\alpha_i\}_{i=1}^{n}$, written in descending order. Then

    $\mathrm{tr}(A U^* B U) = \sum_{i} e_i^* A U^* B U e_i = \sum_{i} (A e_i)^* U^* B U e_i = \sum_{i} \alpha_i e_i^* U^* B U e_i$.

Let $B = V^* D V$, where $D$ is diagonal and $V$ is unitary. Writing $W = VU$ gives

    $\mathrm{tr}(A U^* B U) = \sum_{i} \alpha_i e_i^* W^* D W e_i = \sum_{i,j} p_{ij} \alpha_i \beta_j$,

where the $\beta_j$'s are the elements on the diagonal of $D$, i.e. the eigenvalues of $B$, and $p_{ij} = |(W e_i)_j|^2$. Note that since $W$ is unitary, the matrix $P = [p_{ij}]$ is doubly stochastic, i.e. it has nonnegative entries and its rows and columns sum to 1. The theorem will therefore follow once it is shown that, for $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$,

    $\max_{[p_{ij}]} \sum_{i,j} \alpha_i \beta_j p_{ij} = \sum_{i=1}^{n} \alpha_i \beta_i$,    (3.1)

where the maximum is taken over all doubly stochastic matrices $P = [p_{ij}]$. For fixed $P$ doubly stochastic, let

    $\chi = \sum_{i,j} \alpha_i \beta_j p_{ij}$.

If $P \ne I$, let $k$ be the smallest index $i$ such that $p_{ii} \ne 1$. (Note that for $l < k$, $p_{ll} = 1$ and therefore $p_{ij} = 0$ if $i < k$ and $i \ne j$, and also if $j < k$ and $i \ne j$.) Since $p_{kk} < 1$, for some $l > k$, $p_{kl} > 0$. Likewise, for some $m > k$, $p_{mk} > 0$. These imply that $p_{ml} < 1$. The inequalities above mean that we can choose $\epsilon > 0$ such that the matrix $\tilde{P} = [\tilde{p}_{ij}]$ is doubly stochastic, where

    $\tilde{p}_{kk} = p_{kk} + \epsilon, \quad \tilde{p}_{kl} = p_{kl} - \epsilon, \quad \tilde{p}_{mk} = p_{mk} - \epsilon, \quad \tilde{p}_{ml} = p_{ml} + \epsilon$,

and $\tilde{p}_{ij} = p_{ij}$ in all other cases. If we write

    $\tilde{\chi} = \sum_{i,j} \alpha_i \beta_j \tilde{p}_{ij}$,

then

    $\tilde{\chi} - \chi = \epsilon(\alpha_k \beta_k - \alpha_k \beta_l - \alpha_m \beta_k + \alpha_m \beta_l) = \epsilon(\alpha_k - \alpha_m)(\beta_k - \beta_l) \ge 0$,

which means that the term $\sum_{i,j} \alpha_i \beta_j p_{ij}$ is not decreased. Clearly $\epsilon$ can be chosen to reduce a non-diagonal term in $P$ to zero. After a finite number of iterations of this process it follows that $P = I$ maximizes this term. This proves (3.1) and hence Theorem 3.1.

Note that this theorem can also be regarded as a generalization of the classical result that if $\{\alpha_i\}_{i=1}^{n}$, $\{\beta_i\}_{i=1}^{n}$ are real sequences, then $\sum_{i} \alpha_i \beta_{\sigma(i)}$ is maximized over all permutations $\sigma$ of $\{1, 2, \ldots, n\}$ when $\{\alpha_i\}$ and $\{\beta_{\sigma(i)}\}$ are similarly ordered.

4. A Matrix Nearness Problem. Theorem 3.1 also allows us to answer the following problem. If $A$, $B$ are Hermitian $n \times n$ matrices, what is the smallest distance between $A$ and a matrix $B'$ unitarily equivalent to $B$? Specifically, we have:

Theorem 4.1. Let $A$, $B$ be Hermitian $n \times n$ matrices with ordered eigenvalues $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_n$ respectively. Let $\|\cdot\|$ denote the Frobenius norm. Then

    $\min_{Q \text{ unitary}} \|A - Q^* B Q\| = \sqrt{\sum_{i=1}^{n} (\alpha_i - \beta_i)^2}$.    (4.1)
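As with Theorem 3.1, the identity (4.1) is easy to test numerically before reading the proof. The sketch below (not part of the paper; it assumes NumPy and random Hermitian test matrices) also uses the minimizing choice $Q = V U^*$ discussed after the proof:

    # Numerical check of Theorem 4.1 (illustrative sketch only).
    import numpy as np

    rng = np.random.default_rng(2)
    n = 6

    def random_hermitian(n):
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (M + M.conj().T) / 2

    A, B = random_hermitian(n), random_hermitian(n)
    alpha, U = np.linalg.eigh(A)      # eigenvalues ascending for both, i.e. similarly ordered
    beta, V = np.linalg.eigh(B)
    rhs = np.sqrt(np.sum((alpha - beta) ** 2))   # right-hand side of (4.1)

    Q = V @ U.conj().T                # the optimal Q = V U^* described after the proof
    lhs = np.linalg.norm(A - Q.conj().T @ B @ Q, 'fro')

    # Random unitaries should never do better than (4.1).
    best_random = np.inf
    for _ in range(2000):
        Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        Q2, _ = np.linalg.qr(Z)
        best_random = min(best_random, np.linalg.norm(A - Q2.conj().T @ B @ Q2, 'fro'))

    print("sqrt(sum (alpha_i - beta_i)^2):", rhs)
    print("Frobenius norm at Q = V U^*   :", lhs)        # matches rhs up to rounding
    print("best over random unitaries    :", best_random)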

Proof. Since $A$ and $Q^* B Q$ are Hermitian,

    $\|A - Q^* B Q\|^2 = \mathrm{tr}\left( (A - Q^* B Q)^2 \right) = \mathrm{tr}(A^2) + \mathrm{tr}(B^2) - 2\,\mathrm{tr}(A Q^* B Q)$.

(Note that if $C$, $D$ are Hermitian, $\mathrm{tr}(CD)$ is real.) So by Theorem 3.1,

    $\min_{Q} \|A - Q^* B Q\|^2 = \mathrm{tr}(A^2) + \mathrm{tr}(B^2) - 2 \max_{Q} \mathrm{tr}(A Q^* B Q) = \sum_{i=1}^{n} \alpha_i^2 + \sum_{i=1}^{n} \beta_i^2 - 2 \sum_{i=1}^{n} \alpha_i \beta_i = \sum_{i=1}^{n} (\alpha_i - \beta_i)^2$,

and the result follows.

An optimal $Q$ for problem (4.1) is clearly given by $Q = V U^*$, where $U$, $V$ are orthonormal matrices of eigenvectors of $A$ and $B$ respectively (corresponding to similarly ordered eigenvalues). This follows because $A = U D_\alpha U^*$ and $B = V D_\beta V^*$, where $D_\alpha$, $D_\beta$ denote the diagonal matrices of eigenvalues $\{\alpha_i\}$, $\{\beta_i\}$ respectively, and so

    $\|A - Q^* B Q\|^2 = \|D_\alpha - U^* Q^* V D_\beta V^* Q U\|^2 = \sum_{i=1}^{n} (\alpha_i - \beta_i)^2 \quad \text{if } Q = V U^*$.

Problem (4.1) is a variation on the well-known Orthogonal Procrustes Problem (see, for example, [2]), where an orthogonal (unitary) matrix is sought to solve

    $\min_{Q \text{ unitary}} \|A - B Q\|$.

In this case $A$ and $B$ are no longer required to be Hermitian (or even square). A minimizing $Q$ for this problem can be obtained from a singular value decomposition of $B^* A$ [2, p. 601].

REFERENCES

[1] G. H. Golub and C. F. Van Loan, An analysis of the total least squares problem, SIAM J. Numer. Anal., 17 (1980), pp. 883-893.
[2] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 3rd ed., 1996.
[3] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, New York, 1985.
[4] S. Van Huffel and J. Vandewalle, The Total Least Squares Problem: Computational Aspects and Analysis, SIAM Publications, Philadelphia, PA, 1991.