Chapter 5
Inner products and orthogonality

Inner product spaces, norms, orthogonality, Gram-Schmidt process

Reading

The list below gives examples of relevant reading. (For full publication details, see Chapter 1.)

Leon, S.J., Linear Algebra with Applications. Chapter 5, Sections 5.1, 5.3, 5.5.
Ostaszewski, A., Advanced Mathematical Methods. 2.3, 2.4, 2.7 and 2.8.
Simon, C.P. and Blume, L., Mathematics for Economists. Chapter 10, Section 10.4.

Introduction

In this short chapter we examine more generally the concept of orthogonality, which has already been encountered in our work on orthogonal diagonalisation.

The inner product of two real n-vectors

For x, y ∈ R^n, the inner product (sometimes called the dot product or scalar product) is defined to be the number ⟨x, y⟩ given by

⟨x, y⟩ = x^T y = x_1 y_1 + x_2 y_2 + ... + x_n y_n.

Example: If x = (1, 2, 3)^T and y = (2, −1, 1)^T then

⟨x, y⟩ = 1(2) + 2(−1) + 3(1) = 3.
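The definition can be checked numerically. The following is a minimal sketch (the helper name `inner` is our own choice, not notation from the text) that computes the standard inner product directly from the defining sum, and reproduces the example above:

```python
# Illustrative sketch: the standard inner product on R^n,
# <x, y> = x_1 y_1 + x_2 y_2 + ... + x_n y_n.

def inner(x, y):
    """Standard inner product of two real n-vectors."""
    assert len(x) == len(y), "vectors must have the same length"
    return sum(xi * yi for xi, yi in zip(x, y))

x = (1, 2, 3)
y = (2, -1, 1)
print(inner(x, y))  # 1(2) + 2(-1) + 3(1) = 3
```

Note that the result is a single number, as the next paragraph stresses.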
It is important to realise that the inner product is just a number, not another vector or a matrix.

Inner products more generally

Suppose that V is a vector space (over the real numbers). An inner product on V is a mapping from (or operation on) pairs of vectors x, y to the real numbers, the result of which is a real number denoted ⟨x, y⟩, which satisfies the following properties:

(i) ⟨x, x⟩ ≥ 0 for all x ∈ V, and ⟨x, x⟩ = 0 if and only if x = 0, the zero vector of the vector space
(ii) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V
(iii) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ for all x, y, z ∈ V and all α, β ∈ R.

Some other basic facts follow immediately from this definition: for example, ⟨z, αx + βy⟩ = α⟨z, x⟩ + β⟨z, y⟩.

Activity 5.1 Prove this.

It is a simple matter to check that the inner product defined above for real vectors is indeed an inner product according to this more abstract definition, and we shall call it the standard inner product on R^n. The abstract definition, though, applies to more than just the vector space R^n, and there is some advantage in developing results in terms of the general notion of inner product. If a vector space has an inner product defined on it, we refer to it as an inner product space.

Example: Suppose that V is the vector space consisting of all real polynomial functions of degree at most n; that is, V consists of all functions of the form

p(x) = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n,

where a_0, a_1, ..., a_n ∈ R. The addition and scalar multiplication are, as usual, defined pointwise. Let x_1, x_2, ..., x_{n+1} be n + 1 fixed, different, real numbers, and define, for p, q ∈ V,

⟨p, q⟩ = Σ_{i=1}^{n+1} p(x_i) q(x_i).

Then this is an inner product. To see this, we check the properties in the definition of an inner product. Property (ii) is clear. For (i), we have

⟨p, p⟩ = Σ_{i=1}^{n+1} p(x_i)^2 ≥ 0.

Clearly, if p is the zero vector of the vector space (which is the identically-0 function), then ⟨p, p⟩ = 0. To finish verifying (i) we need to check that if ⟨p, p⟩ = 0 then p must be the zero function.
Now, ⟨p, p⟩ = 0 must mean that p(x_i) = 0 for i = 1, 2, ..., n + 1. So p has n + 1 different roots. But p has degree no more than n, so p must be the identically-zero function. (A non-zero polynomial of degree at most n has no more than n distinct roots.) Part (iii) is left to you:
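The polynomial inner product of this example can also be tried out numerically. In this sketch the evaluation points and the two sample polynomials are arbitrary choices of our own, made purely for illustration:

```python
# Sketch of the polynomial inner product <p, q> = sum over the n+1 fixed
# points x_i of p(x_i) q(x_i). Polynomials are held as coefficient tuples.

def poly_eval(coeffs, x):
    # coeffs = (a_0, a_1, ..., a_n) represents a_0 + a_1 x + ... + a_n x^n
    return sum(a * x**k for k, a in enumerate(coeffs))

def poly_inner(p, q, points):
    return sum(poly_eval(p, t) * poly_eval(q, t) for t in points)

points = [0, 1, 2]   # n + 1 = 3 distinct points, so degree at most n = 2
p = (1, 0, 1)        # p(x) = 1 + x^2, values 1, 2, 5 at the points
q = (0, 1, 0)        # q(x) = x, values 0, 1, 2 at the points
print(poly_inner(p, q, points))  # 1*0 + 2*1 + 5*2 = 12
print(poly_inner(q, p, points))  # property (ii): also 12
```

The symmetry property (ii) and the non-negativity of ⟨p, p⟩ are visible directly in the computation.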
Activity 5.2 Prove that, for any α, β ∈ R and any p, q, r ∈ V,

⟨αp + βq, r⟩ = α⟨p, r⟩ + β⟨q, r⟩.

Norms in a vector space

For any x in an inner product space V, the inner product ⟨x, x⟩ is non-negative (by definition). Now, because ⟨x, x⟩ ≥ 0, we may take its square root (obtaining a real number). We define the norm or length ||x|| of a vector x to be

||x|| = √⟨x, x⟩.

For example, for the standard inner product on R^n,

⟨x, x⟩ = x_1^2 + x_2^2 + ... + x_n^2,

(which is clearly non-negative since it is a sum of squares), and we obtain the standard Euclidean length of a vector:

||x|| = √(x_1^2 + x_2^2 + ... + x_n^2).

Orthogonality

Orthogonal vectors

We have already said (in the discussion of orthogonal diagonalisation) what it means for two vectors x, y in R^n to be orthogonal: it means that x^T y = 0. In other words, x, y are orthogonal if ⟨x, y⟩ = 0. We take this as the general definition of orthogonality in an inner product space:

Definition 5.1 Suppose that V is an inner product space. Then x, y ∈ V are said to be orthogonal if ⟨x, y⟩ = 0. We write x ⊥ y to mean that x, y are orthogonal.

Example: With the usual inner product on R^3, the vectors x = (1, 1, 0)^T and y = (2, −2, 3)^T are orthogonal.

Activity 5.3 Check this!

Geometrical interpretation

A geometrical interpretation can be given to the notion of orthogonality in R^n. Consider a very simple example with n = 2. Suppose that x = (1, 1)^T and y = (−1, 1)^T. Then x, y are orthogonal, as is easily seen. We can represent x, y geometrically on the standard two-dimensional (x, y)-plane: x is represented as an arrow from the origin (0, 0) to the point (1, 1); and y is represented as an arrow from the origin to the point (−1, 1). This is shown in the figure. It is clear that these arrows, the geometrical interpretations of x, y, are at right angles to each other: they are perpendicular.
[Figure: the arrows from the origin (0, 0) to the points (1, 1) and (−1, 1), representing x and y, meeting at a right angle.]

In fact, this geometrical interpretation is valid in R^n, for any n. This is because it turns out that if x, y ∈ R^n then the inner product ⟨x, y⟩ equals ||x|| ||y|| cos θ, where θ is the angle between the geometrical representations of the two vectors. If neither x nor y is the zero-vector, then the inner product is 0 if and only if cos θ = 0, which means that θ is π/2 or 3π/2 radians, in which case the angle between the vectors is a right angle.

Orthogonality and linear independence

If a set of (non-zero) vectors are pairwise orthogonal (that is, any two are orthogonal) then it turns out that the vectors are linearly independent:

Theorem 5.1 Suppose that V is an inner product space and that vectors v_1, v_2, ..., v_k ∈ V are pairwise orthogonal (v_i ⊥ v_j for i ≠ j), and none is the zero-vector. Then {v_1, v_2, ..., v_k} is a linearly independent set of vectors.

Proof We need to show that if

α_1 v_1 + α_2 v_2 + ... + α_k v_k = 0,

(the zero-vector), then α_1 = α_2 = ... = α_k = 0. Let i be any integer between 1 and k. Then

⟨v_i, α_1 v_1 + α_2 v_2 + ... + α_k v_k⟩ = ⟨v_i, 0⟩ = 0.

But, since ⟨v_i, v_j⟩ = 0 for j ≠ i,

⟨v_i, α_1 v_1 + ... + α_k v_k⟩ = α_1⟨v_i, v_1⟩ + α_2⟨v_i, v_2⟩ + ... + α_k⟨v_i, v_k⟩ = α_i⟨v_i, v_i⟩ = α_i ||v_i||^2.

So we have α_i ||v_i||^2 = 0. Since v_i ≠ 0, ||v_i||^2 ≠ 0 and hence α_i = 0. But i was any integer in the range 1 to k, so we deduce that

α_1 = α_2 = ... = α_k = 0,

as required.
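The relation ⟨x, y⟩ = ||x|| ||y|| cos θ can be illustrated numerically. The following sketch (helper names are our own) recovers the angle between the two vectors of the figure from the standard inner product:

```python
import math

# Sketch: the angle between two vectors of R^n, recovered from
# <x, y> = ||x|| ||y|| cos(theta) with the standard inner product.

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def angle(x, y):
    # Assumes neither x nor y is the zero-vector.
    return math.acos(inner(x, y) / (norm(x) * norm(y)))

x, y = (1, 1), (-1, 1)
print(inner(x, y))   # 0: the vectors are orthogonal
print(angle(x, y))   # pi/2 radians, a right angle
```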
Orthogonal matrices and orthonormal sets

We have already met the word orthogonal in a different context: we spoke of orthogonal matrices when considering orthogonal diagonalisation. Recall that a matrix P is orthogonal if P^T = P^{-1}. Now, this means that P^T P = I, the identity matrix. Suppose that the columns of P are x_1, x_2, ..., x_n. Then the fact that P^T P = I means that x_i^T x_j = 0 if i ≠ j and x_i^T x_i = 1. To help see this, consider the case n = 3. Then P = (x_1 x_2 x_3) and, since P^T P = I, we have

            ( x_1^T )                    ( x_1^T x_1   x_1^T x_2   x_1^T x_3 )
I = P^T P = ( x_2^T ) (x_1  x_2  x_3)  = ( x_2^T x_1   x_2^T x_2   x_2^T x_3 ),
            ( x_3^T )                    ( x_3^T x_1   x_3^T x_2   x_3^T x_3 )

where I is the 3 × 3 identity matrix. But, if i ≠ j, x_i^T x_j = 0 means precisely that the columns x_i, x_j are orthogonal. The second statement is that ||x_i||^2 = 1, which means (since ||x_i|| ≥ 0) that ||x_i|| = 1; that is, x_i is of length 1. This indicates the following characterisation: a matrix P is orthogonal if and only if, as vectors, its columns are pairwise orthogonal, and each has length 1.

When a set of vectors {x_1, x_2, ..., x_k} is such that any two are orthogonal and, furthermore, each has length 1, we say that the vectors form an orthonormal set (ONS) of vectors. So we can restate our previous observation as follows.

Theorem 5.2 A matrix P is orthogonal if and only if the columns of P form an orthonormal set of vectors.

The Cauchy-Schwarz inequality

This important inequality is as follows.

Theorem 5.3 (Cauchy-Schwarz inequality) Suppose that V is an inner product space. Then

|⟨x, y⟩| ≤ ||x|| ||y||

for all x, y ∈ V.

Proof Let x, y be any two vectors of V. For any real number α, we consider the vector αx + y. Certainly, ||αx + y||^2 ≥ 0 for all α. But

||αx + y||^2 = ⟨αx + y, αx + y⟩ = α^2⟨x, x⟩ + α⟨x, y⟩ + α⟨y, x⟩ + ⟨y, y⟩ = α^2 ||x||^2 + 2α⟨x, y⟩ + ||y||^2.

Now, this quadratic expression in α is non-negative for all α. Generally, we know that if a quadratic expression az^2 + bz + c is non-negative for all z then b^2 − 4ac ≤ 0. Applying this observation, we see that

(2⟨x, y⟩)^2 − 4 ||x||^2 ||y||^2 ≤ 0,

or

(⟨x, y⟩)^2 ≤ ||x||^2 ||y||^2.
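The characterisation of Theorem 5.2 can be checked computationally: forming the entries x_i^T x_j of P^T P for a matrix with orthonormal columns should give the identity matrix. A minimal sketch (the example matrix, a rotation, is our own choice):

```python
import math

# Sketch: verify that a matrix has orthonormal columns by computing
# the entries of P^T P, which should be 1 on the diagonal, 0 off it.

def col(P, j):
    return [row[j] for row in P]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

# An orthogonal 2x2 matrix: rotation through 45 degrees.
c = 1 / math.sqrt(2)
P = [[c, -c],
     [c,  c]]

n = len(P)
# gram[i][j] is x_i^T x_j for the columns x_1, ..., x_n of P.
gram = [[inner(col(P, i), col(P, j)) for j in range(n)] for i in range(n)]
print(gram)  # identity matrix, up to floating-point rounding
```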
Taking the square root of each side we obtain

|⟨x, y⟩| ≤ ||x|| ||y||,

which is what we need. (Recall that |⟨x, y⟩| denotes the absolute value of the inner product.)

For example, if we take V to be R^n and consider the standard inner product on R^n, then for all x, y ∈ R^n, the Cauchy-Schwarz inequality tells us that

| Σ_{i=1}^{n} x_i y_i | ≤ √(Σ_{i=1}^{n} x_i^2) √(Σ_{i=1}^{n} y_i^2).

Generalised Pythagoras theorem

A version of Pythagoras' theorem will no doubt be familiar to almost all of you: namely, that if a is the length of the longest side of a right-angled triangle, and b and c the lengths of the other two sides, then a^2 = b^2 + c^2. The generalised Pythagoras theorem is:

Theorem 5.4 (Generalised Pythagoras Theorem) In an inner product space V, if x, y ∈ V are orthogonal, then

||x + y||^2 = ||x||^2 + ||y||^2.

Proof This is easy to prove. We know that for any z, ||z||^2 = ⟨z, z⟩, simply from the definition of the norm. So,

||x + y||^2 = ⟨x + y, x + y⟩
           = ⟨x, x + y⟩ + ⟨y, x + y⟩
           = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
           = ||x||^2 + 2⟨x, y⟩ + ||y||^2
           = ||x||^2 + ||y||^2,

where the last line follows from the fact that, x, y being orthogonal, ⟨x, y⟩ = 0.

We also have the triangle inequality for norms.

Theorem 5.5 (Triangle inequality for norms) In an inner product space V, if x, y ∈ V, then

||x + y|| ≤ ||x|| + ||y||.

Proof We have

||x + y||^2 = ⟨x + y, x + y⟩ = ⟨x, x + y⟩ + ⟨y, x + y⟩
= ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
= ||x||^2 + 2⟨x, y⟩ + ||y||^2
≤ ||x||^2 + ||y||^2 + 2|⟨x, y⟩|
≤ ||x||^2 + ||y||^2 + 2||x|| ||y||
= (||x|| + ||y||)^2,

where the last inequality used is the Cauchy-Schwarz inequality. Thus ||x + y|| ≤ ||x|| + ||y||, as required.

Gram-Schmidt orthonormalisation process

The orthonormalisation procedure

Given a set of linearly independent vectors {v_1, v_2, ..., v_k}, the Gram-Schmidt orthonormalisation process is a way of producing k vectors that span the same space as is spanned by {v_1, v_2, ..., v_k}, and that form an orthonormal set. That is, the process produces a set {e_1, e_2, ..., e_k} such that:

Lin{e_1, e_2, ..., e_k} = Lin{v_1, v_2, ..., v_k}
{e_1, e_2, ..., e_k} is an orthonormal set.

It works as follows. First, we set

e_1 = v_1 / ||v_1||.

Then we define

u_2 = v_2 − ⟨v_2, e_1⟩ e_1,

and set

e_2 = u_2 / ||u_2||.

Next, we define

u_3 = v_3 − ⟨v_3, e_1⟩ e_1 − ⟨v_3, e_2⟩ e_2,

and set

e_3 = u_3 / ||u_3||.

Generally, when we have e_1, e_2, ..., e_i, we let

u_{i+1} = v_{i+1} − Σ_{j=1}^{i} ⟨v_{i+1}, e_j⟩ e_j,   e_{i+1} = u_{i+1} / ||u_{i+1}||.

It turns out that the resulting set {e_1, e_2, ..., e_k} has the required properties.

Example: In R^4, let us find an orthonormal basis for the linear span of the three vectors

v_1 = (1, 1, 1, 1)^T, v_2 = (−1, 4, 4, −1)^T, v_3 = (4, −2, 2, 0)^T.
First, we have

e_1 = v_1 / ||v_1|| = v_1 / √(1^2 + 1^2 + 1^2 + 1^2) = (1/2) v_1 = (1/2, 1/2, 1/2, 1/2)^T.

Next, we have

u_2 = v_2 − ⟨v_2, e_1⟩ e_1 = (−1, 4, 4, −1)^T − (3)(1/2, 1/2, 1/2, 1/2)^T = (−5/2, 5/2, 5/2, −5/2)^T,

and we set

e_2 = u_2 / ||u_2|| = (−1/2, 1/2, 1/2, −1/2)^T.

(Note: to do this last step, I merely noted that a normalised vector in the same direction as u_2 is also a normalised vector in the same direction as (−1, 1, 1, −1)^T, and this second vector is easier to work with.)

Continuing, we have

u_3 = v_3 − ⟨v_3, e_1⟩ e_1 − ⟨v_3, e_2⟩ e_2
    = (4, −2, 2, 0)^T − (2)(1/2, 1/2, 1/2, 1/2)^T − (−2)(−1/2, 1/2, 1/2, −1/2)^T
    = (2, −2, 2, −2)^T.

Then,

e_3 = u_3 / ||u_3|| = (1/2, −1/2, 1/2, −1/2)^T.

So

{e_1, e_2, e_3} = { (1/2, 1/2, 1/2, 1/2)^T, (−1/2, 1/2, 1/2, −1/2)^T, (1/2, −1/2, 1/2, −1/2)^T }.

Activity 5.4 Verify that the set {e_1, e_2, e_3} of this example is an orthonormal set.

Orthogonal diagonalisation when eigenvalues are not distinct

We have seen in an earlier chapter that if a symmetric matrix has distinct eigenvalues, then (since eigenvectors corresponding to different eigenvalues are orthogonal) it is orthogonally diagonalisable. But, in fact, n × n symmetric matrices are always orthogonally diagonalisable, even if they do not have n distinct eigenvalues. What we need for orthogonal diagonalisation is an orthonormal set of n eigenvectors. If it so happens that there are n different eigenvalues then any set of n corresponding eigenvectors form a pairwise orthogonal set of vectors, and all we need do to transform the set into an orthonormal set is normalise each vector. However, if we have repeated eigenvalues, more care is required. Suppose that λ_0 is a repeated eigenvalue of A, by which we mean that, for some k ≥ 2, (λ − λ_0)^k is a factor of the characteristic polynomial of A. The multiplicity of λ_0 is the largest k for which this is the case. The eigenspace corresponding to λ_0 is

E(λ_0) = {x : (A − λ_0 I)x = 0},

the subspace consisting of all eigenvectors corresponding to λ_0, together with the zero-vector 0.
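The procedure lends itself directly to a short program. The following is a minimal sketch of Gram-Schmidt (classical form, function names our own; it assumes the input vectors are linearly independent, so no u_i is the zero-vector), run on the vectors of the worked example:

```python
import math

# Minimal Gram-Schmidt sketch: orthonormalise a list of linearly
# independent vectors, following the procedure in the text.

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def gram_schmidt(vectors):
    es = []
    for v in vectors:
        # Subtract the projections onto the e_j found so far.
        u = list(v)
        for e in es:
            c = inner(v, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        n = norm(u)  # non-zero, by the independence assumption
        es.append([ui / n for ui in u])
    return es

v1 = (1, 1, 1, 1)
v2 = (-1, 4, 4, -1)
v3 = (4, -2, 2, 0)
e1, e2, e3 = gram_schmidt([v1, v2, v3])
print(e1)  # [0.5, 0.5, 0.5, 0.5]
print(e2)  # [-0.5, 0.5, 0.5, -0.5]
print(e3)  # [0.5, -0.5, 0.5, -0.5]
```

This reproduces the e_1, e_2, e_3 computed by hand above.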
An important fact, which we shall not prove here, is that, if A is symmetric, the dimension of E(λ_0) is exactly the multiplicity k of λ_0. This means that there is some basis {x_1, x_2, ..., x_k} of size k of the eigenspace E(λ_0). We can
use the Gram-Schmidt orthonormalisation process to produce an orthonormal basis of E(λ_0). Eigenvectors from different eigenspaces are orthogonal (and hence linearly independent). So if we compose a set of n vectors by taking orthonormal bases for each of the eigenspaces, the resulting set is orthonormal, and we can orthogonally diagonalise the matrix A by means of the matrix P with these vectors as its columns.

Learning outcomes

At the end of this chapter and the relevant reading, you should be able to:

explain what is meant by an inner product on a vector space
verify that a given inner product is indeed an inner product
compute norms in inner product spaces
explain why orthogonality of a set of vectors implies linear independence
explain what is meant by an orthonormal set of vectors
explain why an n × n matrix is orthogonal if and only if its columns form an orthonormal set of vectors
know and apply the Cauchy-Schwarz inequality, the Generalised Pythagoras Theorem, and the triangle inequality for norms
use the Gram-Schmidt orthonormalisation process

Sample examination questions

The following are typical exam questions, or parts of questions.

Question 5.1 Let V be the vector space of all m × n real matrices (with matrix addition and scalar multiplication). Define, for A = (a_ij) and B = (b_ij) ∈ V,

⟨A, B⟩ = Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij b_ij.

Prove that this is an inner product on V.

Question 5.2 Prove that in any inner product space V,

||x + y||^2 + ||x − y||^2 = 2||x||^2 + 2||y||^2,

for all x, y ∈ V.

Question 5.3 Suppose that v ∈ R^n. Prove that the set of vectors orthogonal to v,

W = {x ∈ R^n : x ⊥ v},

is a subspace of R^n. How would you describe this subspace geometrically? More generally, suppose that S is any (not necessarily finite) set of vectors in R^n and let S^⊥ denote the set

S^⊥ = {x ∈ R^n : x ⊥ v for all v ∈ S}.

Prove that S^⊥ is a subspace of R^n.
Question 5.4 Use the Gram-Schmidt process to find an orthonormal basis for the subspace of R^4 spanned by the vectors

v_1 = (1, 0, 1, 0)^T,  v_2 = (1, 2, 1, −1)^T,  v_3 = (0, 1, 2, 0)^T.

Sketch answers or comments on selected questions

Question 5.1 Property (i) of the definition of inner product is easy to check:

⟨A, A⟩ = Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij^2 ≥ 0,

and this equals zero if and only if, for every i and every j, a_ij = 0, which means that A is the zero matrix, which in this vector space is the zero vector. Property (ii) is easy to verify, as also is (iii).

Question 5.2 We have:

||x + y||^2 + ||x − y||^2 = ⟨x + y, x + y⟩ + ⟨x − y, x − y⟩
= ⟨x, x⟩ + 2⟨x, y⟩ + ⟨y, y⟩ + ⟨x, x⟩ − 2⟨x, y⟩ + ⟨y, y⟩
= 2⟨x, x⟩ + 2⟨y, y⟩
= 2||x||^2 + 2||y||^2.

Question 5.3 Suppose x, y ∈ W and α, β ∈ R. Because x ⊥ v and y ⊥ v, we have (by definition) ⟨x, v⟩ = ⟨y, v⟩ = 0. Therefore,

⟨αx + βy, v⟩ = α⟨x, v⟩ + β⟨y, v⟩ = α(0) + β(0) = 0,

and hence αx + βy ⊥ v; that is, αx + βy ∈ W. Therefore W is a subspace. In fact, W is the set {x : ⟨x, v⟩ = 0}, which is the hyperplane through the origin with normal vector v. (Hyperplanes are discussed again in a later chapter.) We omit the proof that S^⊥ is a subspace. This is a standard result, which can be found in the texts: S^⊥ is known as the orthogonal complement of S.

Question 5.4 To start with, e_1 = v_1 / ||v_1|| = (1/√2)(1, 0, 1, 0)^T. Then we let

u_2 = v_2 − ⟨v_2, e_1⟩ e_1 = (1, 2, 1, −1)^T − √2 (1/√2)(1, 0, 1, 0)^T = (0, 2, 0, −1)^T.

Then

e_2 = u_2 / ||u_2|| = (1/√5)(0, 2, 0, −1)^T.
Next,

u_3 = v_3 − ⟨v_3, e_2⟩ e_2 − ⟨v_3, e_1⟩ e_1 = (0, 1, 2, 0)^T − (2/√5)(1/√5)(0, 2, 0, −1)^T − √2 (1/√2)(1, 0, 1, 0)^T = (−1, 1/5, 1, 2/5)^T.

Normalising u_3 we obtain

e_3 = (1/√55)(−5, 1, 5, 2)^T.

The required basis is {e_1, e_2, e_3}.
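As a check on this answer, the basis can be verified numerically: each vector should have norm 1 and the pairwise inner products should vanish. A short illustrative sketch (helper name `inner` is our own):

```python
import math

# Numerical check of the Question 5.4 answer: the three vectors
# form an orthonormal set (norms 1, pairwise inner products 0).

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

s2, s5, s55 = math.sqrt(2), math.sqrt(5), math.sqrt(55)
e1 = [1 / s2, 0, 1 / s2, 0]
e2 = [0, 2 / s5, 0, -1 / s5]
e3 = [-5 / s55, 1 / s55, 5 / s55, 2 / s55]

for e in (e1, e2, e3):
    print(round(inner(e, e), 10))   # 1.0
print(round(inner(e1, e2), 10))     # 0.0
print(round(inner(e1, e3), 10))     # 0.0
print(round(inner(e2, e3), 10))     # 0.0
```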