Section 4.4 Inner Product Spaces

In our discussion of vector spaces the specific nature of $F$ as a field, other than the fact that it is a field, has played virtually no role. In this section we no longer consider vector spaces $V$ over arbitrary fields $F$; rather, we restrict $F$ to be the field of real or complex numbers. In the first case $V$ is called a real vector space, in the second, a complex vector space.

We all have had some experience with real vector spaces; in fact, both analytic geometry and the subject matter of vector analysis deal with these. What concepts used there can we carry over to a more abstract setting? To begin with, we had in these concrete examples the idea of length; secondly, we had the idea of perpendicularity, or, more generally, that of angle. These became special cases of the notion of a dot product (often called a scalar or inner product).

Let us recall some properties of the dot product as it pertained to the special case of three-dimensional real vectors. Given the vectors $v = (x_1, x_2, x_3)$ and $w = (y_1, y_2, y_3)$, where the $x$'s and $y$'s are real numbers, the dot product of $v$ and $w$, denoted by $v \cdot w$, was defined as

$$v \cdot w = x_1 y_1 + x_2 y_2 + x_3 y_3.$$

Note that the length of $v$ is given by $\sqrt{v \cdot v}$, and the angle $\theta$ between $v$ and $w$ is determined by

$$\cos \theta = \frac{v \cdot w}{\sqrt{v \cdot v}\,\sqrt{w \cdot w}}.$$

What formal properties does this dot product enjoy? We list a few:

(1) $v \cdot v \ge 0$ and $v \cdot v = 0$ if and only if $v = 0$;
(2) $v \cdot w = w \cdot v$;
(3) $u \cdot (\alpha v + \beta w) = \alpha(u \cdot v) + \beta(u \cdot w)$;

for any vectors $u, v, w$ and real numbers $\alpha, \beta$.

Everything that has been said can be carried over to complex vector spaces. However, to get geometrically reasonable definitions we must make some modifications. If we simply define $v \cdot w = x_1 y_1 + x_2 y_2 + x_3 y_3$ for $v = (x_1, x_2, x_3)$ and $w = (y_1, y_2, y_3)$, where the $x$'s and $y$'s are complex numbers, then it is quite possible that $v \cdot v = 0$ with $v \ne 0$; this is illustrated by the vector $v = (1, i, 0)$. In fact, $v \cdot v$ need not even be real. If, as in the real case, we should want $\sqrt{v \cdot v}$ to represent somehow the length of $v$, we should like that this length be real and that a nonzero vector should not have zero length.

We can achieve this much by altering the definition of dot product slightly. If $\bar\alpha$ denotes the complex conjugate of the complex number $\alpha$, returning to the $v$ and $w$ of the paragraph above let us define $v \cdot w = x_1 \bar y_1 + x_2 \bar y_2 + x_3 \bar y_3$. For real vectors this new definition coincides with the old one; on the other hand, for arbitrary complex vectors $v \ne 0$, not only is $v \cdot v$ real, it is in fact positive. Thus we have the possibility of introducing, in a natural way, a nonnegative length. However, we do lose something; for instance, it is no longer true that $v \cdot w = w \cdot v$. In fact, the exact relationship between these is $v \cdot w = \overline{w \cdot v}$. Let us list a few properties of this dot product:

(1) $v \cdot w = \overline{w \cdot v}$;
(2) $v \cdot v \ge 0$, and $v \cdot v = 0$ if and only if $v = 0$;
(3) $(\alpha u + \beta v) \cdot w = \alpha(u \cdot w) + \beta(v \cdot w)$;
(4) $u \cdot (\alpha v + \beta w) = \bar\alpha(u \cdot v) + \bar\beta(u \cdot w)$;

for all complex numbers $\alpha, \beta$ and all complex vectors $u, v, w$.
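To see the difference concretely, here is a small numerical sketch (Python with NumPy; the variable names are our own) comparing the unconjugated product with the conjugated one on the vector $v = (1, i, 0)$ discussed above.

```python
import numpy as np

v = np.array([1, 1j, 0])

# Naive "dot product" without conjugation: sum of x_k * y_k.
naive = np.sum(v * v)
print(naive)       # 0: a nonzero vector would have "zero length"

# Conjugated dot product: sum of x_k * conj(y_k).
conjugated = np.sum(v * np.conj(v))
print(conjugated)  # (2+0j): real and positive, as desired
```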
We reiterate that in what follows $F$ is either the field of real or complex numbers.

DEFINITION. The vector space $V$ over $F$ is said to be an inner product space if there is defined for any two vectors $u, v \in V$ an element $(u, v)$ in $F$ such that:

(1) $(u, v) = \overline{(v, u)}$;
(2) $(u, u) \ge 0$ and $(u, u) = 0$ if and only if $u = 0$;
(3) $(\alpha u + \beta v, w) = \alpha(u, w) + \beta(v, w)$;

for any $u, v, w \in V$ and $\alpha, \beta \in F$.

A few observations about properties (1), (2), and (3) are in order. A function satisfying them is called an inner product. If $F$ is the field of complex numbers, property (1) implies that $(u, u)$ is real, and so property (2) makes sense. Using (1) and (3), we see that

$$(u, \alpha v + \beta w) = \overline{(\alpha v + \beta w, u)} = \overline{\alpha(v, u) + \beta(w, u)} = \bar\alpha\,\overline{(v, u)} + \bar\beta\,\overline{(w, u)} = \bar\alpha(u, v) + \bar\beta(u, w).$$

We pause to look at some examples of inner product spaces.

EXAMPLE 4.4.1: In $F^{(n)}$ define, for $u = (\alpha_1, \ldots, \alpha_n)$ and $v = (\beta_1, \ldots, \beta_n)$,

$$(u, v) = \alpha_1\bar\beta_1 + \alpha_2\bar\beta_2 + \cdots + \alpha_n\bar\beta_n.$$

This defines an inner product on $F^{(n)}$.

EXAMPLE 4.4.2: In $F^{(2)}$ define, for $u = (\alpha_1, \alpha_2)$ and $v = (\beta_1, \beta_2)$,

$$(u, v) = 2\alpha_1\bar\beta_1 + \alpha_1\bar\beta_2 + \alpha_2\bar\beta_1 + \alpha_2\bar\beta_2.$$

It is easy to verify that this defines an inner product on $F^{(2)}$.

EXAMPLE 4.4.3: Let $V$ be the set of all continuous complex-valued functions on the closed unit interval $[0, 1]$. If $f(t), g(t) \in V$, define

$$(f(t), g(t)) = \int_0^1 f(t)\overline{g(t)}\,dt.$$

We leave it to the reader to verify that this defines an inner product on $V$.

For the remainder of this section $V$ will denote an inner product space.

DEFINITION: If $v \in V$ then the length of $v$ (or norm of $v$), written as $\|v\|$, is defined by $\|v\| = \sqrt{(v, v)}$.
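The axioms are easy to test numerically. The sketch below (the helper name `ip` is ours) checks properties (1), (2), and (3) for the less familiar inner product of Example 4.4.2 on random complex vectors, and computes a norm.

```python
import numpy as np

rng = np.random.default_rng(0)

def ip(u, v):
    """Inner product of Example 4.4.2 on F^(2):
    (u, v) = 2*a1*conj(b1) + a1*conj(b2) + a2*conj(b1) + a2*conj(b2)."""
    (a1, a2), (b1, b2) = u, np.conj(v)
    return 2*a1*b1 + a1*b2 + a2*b1 + a2*b2

u, v, w = (rng.normal(size=2) + 1j*rng.normal(size=2) for _ in range(3))
a, b = 1.5 - 2j, 0.25 + 1j

print(np.isclose(ip(u, v), np.conj(ip(v, u))))            # (1) conjugate symmetry
print(ip(u, u).real > 0, abs(ip(u, u).imag) < 1e-12)      # (2) positivity for u != 0
print(np.isclose(ip(a*u + b*v, w), a*ip(u, w) + b*ip(v, w)))  # (3) linearity in 1st slot
print(np.sqrt(ip(u, u).real))                             # the norm ||u||
```

Positivity here can also be seen directly: $(u, u) = |\alpha_1|^2 + |\alpha_1 + \alpha_2|^2$.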
LEMMA 4.4.1: If $u, v \in V$ and $\alpha, \beta \in F$, then

$$(\alpha u + \beta v, \alpha u + \beta v) = \alpha\bar\alpha(u, u) + \alpha\bar\beta(u, v) + \bar\alpha\beta(v, u) + \beta\bar\beta(v, v).$$

Proof: By property (3) defining an inner product space,

$$(\alpha u + \beta v, \alpha u + \beta v) = \alpha(u, \alpha u + \beta v) + \beta(v, \alpha u + \beta v).$$

But

$$(u, \alpha u + \beta v) = \bar\alpha(u, u) + \bar\beta(u, v)$$

and

$$(v, \alpha u + \beta v) = \bar\alpha(v, u) + \bar\beta(v, v).$$

Substituting these in the expression for $(\alpha u + \beta v, \alpha u + \beta v)$ we get the desired result.

COROLLARY: $\|\alpha u\| = |\alpha|\,\|u\|$.

Proof: We have $\|\alpha u\|^2 = (\alpha u, \alpha u) = \alpha\bar\alpha(u, u)$ by Lemma 4.4.1 (with $v = 0$). Since $\alpha\bar\alpha = |\alpha|^2$ and $(u, u) = \|u\|^2$, taking square roots yields $\|\alpha u\| = |\alpha|\,\|u\|$.

We digress for a moment, and prove a very elementary and familiar result about real quadratic equations.

LEMMA 4.4.2: If $a, b, c$ are real numbers such that $a > 0$ and $a\lambda^2 + 2b\lambda + c \ge 0$ for all real numbers $\lambda$, then $b^2 \le ac$.

Proof: Completing the square,

$$a\lambda^2 + 2b\lambda + c = \frac{1}{a}(a\lambda + b)^2 + \left(c - \frac{b^2}{a}\right).$$

Since it is greater than or equal to $0$ for all $\lambda$, in particular this must be true for $\lambda = -b/a$. Thus $c - b^2/a \ge 0$, and since $a > 0$ we get $b^2 \le ac$.

We now proceed to an extremely important inequality, usually known as the Schwarz inequality.

THEOREM 4.4.1: If $u, v \in V$, then $|(u, v)| \le \|u\|\,\|v\|$.

Proof: If $u = 0$, then both $(u, v) = 0$ and $\|u\|\,\|v\| = 0$, so that the result is true there.

Suppose, for the moment, that $(u, v)$ is real and $u \ne 0$. By Lemma 4.4.1, for any real number $\lambda$,

$$0 \le (\lambda u + v, \lambda u + v) = \lambda^2(u, u) + 2(u, v)\lambda + (v, v).$$

Let $a = (u, u)$, $b = (u, v)$, and $c = (v, v)$; for these the hypothesis of Lemma 4.4.2 is satisfied, so that $b^2 \le ac$. That is, $(u, v)^2 \le (u, u)(v, v)$; from this it is immediate that $|(u, v)| \le \|u\|\,\|v\|$.

If $\alpha = (u, v)$ is not real, then it certainly is not $0$, so that $u/\alpha$ is meaningful. Now,

$$\left(\frac{u}{\alpha}, v\right) = \frac{1}{\alpha}(u, v) = \frac{(u, v)}{(u, v)} = 1,$$

and so it is certainly real. By the case of the Schwarz inequality discussed in the paragraph above,

$$1 = \left|\left(\frac{u}{\alpha}, v\right)\right| \le \left\|\frac{u}{\alpha}\right\|\,\|v\|;$$

since $\|u/\alpha\| = \frac{1}{|\alpha|}\|u\|$, we get

$$1 \le \frac{\|u\|\,\|v\|}{|\alpha|},$$

whence $|\alpha| \le \|u\|\,\|v\|$. Putting in that $\alpha = (u, v)$, we obtain $|(u, v)| \le \|u\|\,\|v\|$, the desired result.
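As a sanity check, the inequality can be observed numerically for the standard inner product of Example 4.4.1. This sketch (helper names ours) also illustrates that equality holds when $u$ and $v$ are linearly dependent.

```python
import numpy as np

rng = np.random.default_rng(1)

def ip(u, v):
    # Inner product of Example 4.4.1: sum of a_k * conj(b_k).
    return np.sum(u * np.conj(v))

def norm(u):
    return np.sqrt(ip(u, u).real)

# Theorem 4.4.1 (Schwarz): |(u, v)| <= ||u|| ||v|| for random complex vectors.
for _ in range(5):
    u = rng.normal(size=4) + 1j * rng.normal(size=4)
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    assert abs(ip(u, v)) <= norm(u) * norm(v) + 1e-12

# Equality holds when u and v are linearly dependent:
u = rng.normal(size=4) + 1j * rng.normal(size=4)
v = (2 - 3j) * u
print(abs(ip(u, v)), norm(u) * norm(v))  # equal up to rounding
```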
Specific cases of the Schwarz inequality are themselves of great interest. We point out two of them.

(1) If $V = F^{(n)}$ with $(u, v) = \alpha_1\bar\beta_1 + \cdots + \alpha_n\bar\beta_n$, where $u = (\alpha_1, \ldots, \alpha_n)$ and $v = (\beta_1, \ldots, \beta_n)$, then Theorem 4.4.1 implies that

$$|\alpha_1\bar\beta_1 + \cdots + \alpha_n\bar\beta_n|^2 \le (|\alpha_1|^2 + \cdots + |\alpha_n|^2)(|\beta_1|^2 + \cdots + |\beta_n|^2).$$

(2) If $V$ is the set of all continuous, complex-valued functions on $[0, 1]$ with inner product defined by

$$(f(t), g(t)) = \int_0^1 f(t)\overline{g(t)}\,dt,$$

then Theorem 4.4.1 implies that

$$\left|\int_0^1 f(t)\overline{g(t)}\,dt\right|^2 \le \int_0^1 |f(t)|^2\,dt \int_0^1 |g(t)|^2\,dt.$$

The concept of perpendicularity is an extremely useful and important one in geometry. We introduce its analog in general inner product spaces.

DEFINITION: If $u, v \in V$, then $u$ is said to be orthogonal to $v$ if $(u, v) = 0$.

Note that if $u$ is orthogonal to $v$ then $v$ is orthogonal to $u$, for $(v, u) = \overline{(u, v)} = \bar 0 = 0$.

DEFINITION. If $W$ is a subspace of $V$, the orthogonal complement of $W$, $W^\perp$, is defined by

$$W^\perp = \{x \in V \mid (x, w) = 0 \text{ for all } w \in W\}.$$
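In coordinates, $W^\perp$ can be computed as a null space: $x$ is orthogonal to every vector of $W$ precisely when it is orthogonal to each member of a basis of $W$. A minimal sketch, assuming SciPy is available; the matrix `B` holding a basis of $W$ in its rows is our own example data.

```python
import numpy as np
from scipy.linalg import null_space

# Rows of B are a basis of the subspace W of C^4 (example data).
B = np.array([[1, 1j, 0, 0],
              [0, 0, 1, 1]], dtype=complex)

# x lies in W-perp iff (x, w) = sum x_k conj(w_k) = 0 for every basis row w,
# i.e. iff conj(B) @ x = 0.  null_space returns an orthonormal basis of that.
Wperp = null_space(np.conj(B))
print(Wperp.shape)                          # (4, 2): dim W + dim W-perp = dim V
print(np.allclose(np.conj(B) @ Wperp, 0))   # True: every column is orthogonal to W
```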
LEMMA 4.4.3: $W^\perp$ is a subspace of $V$.

Proof: If $a, b \in W^\perp$, then for all $\alpha, \beta \in F$ and all $w \in W$,

$$(\alpha a + \beta b, w) = \alpha(a, w) + \beta(b, w) = 0,$$

since $a, b \in W^\perp$.

Note that $W \cap W^\perp = (0)$, for if $w \in W \cap W^\perp$ it must be self-orthogonal, that is, $(w, w) = 0$. The defining properties of an inner product space rule out this possibility unless $w = 0$.

One of our goals is to show that $V = W + W^\perp$. Once this is done, the remark made above will become of some interest, for it will imply that $V$ is the direct sum of $W$ and $W^\perp$.

DEFINITION. The set of vectors $\{v_i\}$ in $V$ is an orthonormal set if

(1) each $v_i$ is of length 1 (i.e., $(v_i, v_i) = 1$);
(2) for $i \ne j$, $(v_i, v_j) = 0$.

LEMMA 4.4.4: If $\{v_i\}$ is an orthonormal set, then the vectors in $\{v_i\}$ are linearly independent. If $w = \alpha_1 v_1 + \cdots + \alpha_n v_n$, then $\alpha_i = (w, v_i)$ for $i = 1, 2, \ldots, n$.

Proof: Suppose that

$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0.$$

Therefore

$$0 = (\alpha_1 v_1 + \cdots + \alpha_n v_n, v_i) = \alpha_1(v_1, v_i) + \cdots + \alpha_n(v_n, v_i).$$

Since $(v_j, v_i) = 0$ for $j \ne i$ while $(v_i, v_i) = 1$, this equation reduces to $\alpha_i = 0$. Thus the $v_j$'s are linearly independent. If $w = \alpha_1 v_1 + \cdots + \alpha_n v_n$, then computing as above yields $(w, v_i) = \alpha_i$.

Similar in spirit and in proof to Lemma 4.4.4 is

LEMMA 4.4.5: If $\{v_1, \ldots, v_n\}$ is an orthonormal set in $V$ and if $w \in V$, then

$$u = w - (w, v_1)v_1 - (w, v_2)v_2 - \cdots - (w, v_i)v_i - \cdots - (w, v_n)v_n$$

is orthogonal to each of $v_1, v_2, \ldots, v_n$.

Proof: Computing $(u, v_i)$ for any $i \le n$, using the orthonormality of $v_1, \ldots, v_n$, yields the result.
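Both lemmas are easy to check numerically. In the sketch below (the orthonormal set and the vector $w$ are our own example data), the coefficients of a vector in the span are recovered as inner products, and the residue of Lemma 4.4.5 is orthogonal to each $v_i$.

```python
import numpy as np

def ip(u, v):
    # Inner product of Example 4.4.1.
    return np.sum(u * np.conj(v))

# An orthonormal set {v1, v2} in C^3 (example data).
v1 = np.array([1, 0, 0], dtype=complex)
v2 = np.array([0, 1, 1j], dtype=complex) / np.sqrt(2)

# Lemma 4.4.4: for a vector in the span, the coefficients are the (w, v_i).
span_w = (1 + 2j) * v1 + (3 - 1j) * v2
print(ip(span_w, v1), ip(span_w, v2))   # (1+2j) and (3-1j)

# Lemma 4.4.5: subtracting the projections leaves a vector orthogonal to each v_i.
w = np.array([2 - 1j, 3, 4j])
u = w - ip(w, v1) * v1 - ip(w, v2) * v2
print(np.isclose(ip(u, v1), 0), np.isclose(ip(u, v2), 0))  # True True
```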
The construction carried out in the proof of the next theorem is one which appears and reappears in many parts of mathematics. It is a basic procedure and is known as the Gram-Schmidt orthogonalization process. Although we shall be working in a finite-dimensional inner product space, the Gram-Schmidt process works equally well in infinite-dimensional situations.

THEOREM 4.4.2: Let $V$ be a finite-dimensional inner product space; then $V$ has an orthonormal set as a basis.

Proof: Let $V$ be of dimension $n$ over $F$ and let $v_1, \ldots, v_n$ be a basis of $V$. From this basis we shall construct an orthonormal set of $n$ vectors; by Lemma 4.4.4 this set is linearly independent, so it must form a basis of $V$.

We proceed with the construction. We seek $n$ vectors $w_1, \ldots, w_n$, each of length 1, such that for $i \ne j$, $(w_i, w_j) = 0$. In fact we shall finally produce them in the following form: $w_1$ will be a multiple of $v_1$, $w_2$ will be in the linear span of $w_1$ and $v_2$, $w_3$ in the linear span of $w_1$, $w_2$, and $v_3$, and, more generally, $w_i$ in the linear span of $w_1, w_2, \ldots, w_{i-1}, v_i$.

Let

$$w_1 = \frac{v_1}{\|v_1\|};$$

then

$$(w_1, w_1) = \left(\frac{v_1}{\|v_1\|}, \frac{v_1}{\|v_1\|}\right) = \frac{1}{\|v_1\|^2}(v_1, v_1) = 1,$$

whence $\|w_1\| = 1$.

We now ask: for what value of $\alpha$ is $\alpha w_1 + v_2$ orthogonal to $w_1$? All we need is that $(\alpha w_1 + v_2, w_1) = 0$, that is, $\alpha(w_1, w_1) + (v_2, w_1) = 0$. Since $(w_1, w_1) = 1$, $\alpha = -(v_2, w_1)$ will do the trick. Let $u_2 = -(v_2, w_1)w_1 + v_2$; $u_2$ is orthogonal to $w_1$; since $v_1$ and $v_2$ are linearly independent, $w_1$ and $v_2$ must be linearly independent, and so $u_2 \ne 0$. Let $w_2 = u_2/\|u_2\|$; then $\{w_1, w_2\}$ is an orthonormal set.

We continue. Let $u_3 = -(v_3, w_1)w_1 - (v_3, w_2)w_2 + v_3$. A simple check verifies that $(u_3, w_1) = (u_3, w_2) = 0$. Since $w_1$, $w_2$, and $v_3$ are linearly independent (for $w_1$, $w_2$ are in the linear span of $v_1$ and $v_2$), $u_3 \ne 0$. Let $w_3 = u_3/\|u_3\|$; then $\{w_1, w_2, w_3\}$ is an orthonormal set.

The road ahead is now clear. Suppose that we have constructed $w_1, w_2, \ldots, w_i$, in the linear span of $v_1, \ldots, v_i$, which form an orthonormal set. How do we construct the next one, $w_{i+1}$? Merely put

$$u_{i+1} = -(v_{i+1}, w_1)w_1 - (v_{i+1}, w_2)w_2 - \cdots - (v_{i+1}, w_i)w_i + v_{i+1}.$$

That $u_{i+1} \ne 0$ and that it is orthogonal to each of $w_1, \ldots, w_i$ we leave to the reader. Put $w_{i+1} = u_{i+1}/\|u_{i+1}\|$.

In this way, given $r$ linearly independent elements in $V$, we can construct an orthonormal set having $r$ elements. In particular, when $\dim V = n$, from any basis of $V$ we can construct an orthonormal set having $n$ elements. This provides us with the required basis for $V$.
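The proof is already an algorithm. A direct transcription into code might look as follows (a sketch, not a numerically robust implementation; in floating point one would typically reorthogonalize or use a QR factorization instead):

```python
import numpy as np

def gram_schmidt(vs):
    """Orthonormalize a list of linearly independent vectors in C^n,
    following the construction in the proof of Theorem 4.4.2."""
    ws = []
    for v in vs:
        # u = v minus its components along the already-built w's.
        u = v - sum(np.vdot(w, v) * w for w in ws)
        ws.append(u / np.sqrt(np.vdot(u, u).real))  # normalize to length 1
    return ws

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
ws = gram_schmidt(vs)
for i, wi in enumerate(ws):
    for j, wj in enumerate(ws):
        assert np.isclose(np.vdot(wi, wj), 1.0 if i == j else 0.0)
```

Note that `np.vdot(a, b)` conjugates its first argument, so `np.vdot(w, v)` equals $(v, w)$ in the notation used here.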
We illustrate the construction used in the last proof in a concrete case. Let $F$ be the real field and let $V$ be the set of polynomials, in a variable $x$, over $F$ of degree 2 or less. In $V$ we define an inner product by: if $p(x), q(x) \in V$, then

$$(p(x), q(x)) = \int_{-1}^1 p(x)q(x)\,dx.$$

Let us start with the basis $v_1 = 1$, $v_2 = x$, $v_3 = x^2$ of $V$. Following the construction used,

$$w_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{\int_{-1}^1 1^2\,dx}} = \frac{1}{\sqrt 2},$$

$$u_2 = -(v_2, w_1)w_1 + v_2,$$

which after the computations reduces to $u_2 = x$, and so

$$w_2 = \frac{u_2}{\|u_2\|} = \frac{x}{\sqrt{\int_{-1}^1 x^2\,dx}} = \sqrt{\frac{3}{2}}\,x;$$

finally,

$$u_3 = -(v_3, w_1)w_1 - (v_3, w_2)w_2 + v_3 = -\frac{1}{3} + x^2,$$

and so

$$w_3 = \frac{u_3}{\|u_3\|} = \frac{-\frac{1}{3} + x^2}{\sqrt{\int_{-1}^1 \left(-\frac{1}{3} + x^2\right)^2 dx}} = \frac{\sqrt{10}}{4}\left(-1 + 3x^2\right).$$

We mentioned the next theorem earlier as one of our goals. We are now able to prove it.

THEOREM 4.4.3: If $V$ is a finite-dimensional inner product space and if $W$ is a subspace of $V$, then $V = W + W^\perp$. More particularly, $V$ is the direct sum of $W$ and $W^\perp$.

Proof: Because of the highly geometric nature of the result, and because it is so basic, we give several proofs. The first will make use of Theorem 4.4.2 and some of the earlier lemmas. The second will be motivated geometrically.

First Proof: As a subspace of the inner product space $V$, $W$ is itself an inner product space (its inner product being that of $V$ restricted to $W$). Thus we can find an orthonormal set $w_1, \ldots, w_r$ in $W$ which is a basis of $W$. If $v \in V$, then by Lemma 4.4.5

$$v_0 = v - (v, w_1)w_1 - (v, w_2)w_2 - \cdots - (v, w_r)w_r$$

is orthogonal to each of $w_1, \ldots, w_r$ and so is orthogonal to $W$. Thus $v_0 \in W^\perp$, and since

$$v = v_0 + ((v, w_1)w_1 + \cdots + (v, w_r)w_r),$$

it follows that $v \in W^\perp + W$. Therefore $V = W + W^\perp$. Since $W \cap W^\perp = (0)$, this sum is direct.

Second Proof: In this proof we shall assume that $F$ is the field of real numbers. The proof works, in almost the same way, for the complex numbers; however, it entails a few extra details which might tend to obscure the essential ideas used.

Let $v \in V$; suppose that we could find a vector $w_0 \in W$ such that $\|v - w_0\| \le \|v - w\|$ for all $w \in W$. We claim that then $(v - w_0, w) = 0$ for all $w \in W$, that is, $v - w_0 \in W^\perp$.

If $w \in W$, then $w_0 + w \in W$, in consequence of which

$$(v - w_0, v - w_0) \le (v - (w_0 + w), v - (w_0 + w)).$$

However, the right-hand side is

$$(w, w) + (v - w_0, v - w_0) - 2(v - w_0, w),$$

leading to $2(v - w_0, w) \le (w, w)$ for all $w \in W$. If $m$ is any positive integer, since $w/m \in W$ we have that

$$\frac{2}{m}(v - w_0, w) = 2\left(v - w_0, \frac{w}{m}\right) \le \left(\frac{w}{m}, \frac{w}{m}\right) = \frac{1}{m^2}(w, w),$$

and so $2(v - w_0, w) \le (1/m)(w, w)$ for any positive integer $m$. However, $(1/m)(w, w) \to 0$ as $m \to \infty$, whence $2(v - w_0, w) \le 0$. Similarly, $-w \in W$, and so $-2(v - w_0, w) = 2(v - w_0, -w) \le 0$, yielding $(v - w_0, w) = 0$ for all $w \in W$. Thus $v - w_0 \in W^\perp$; hence $v \in w_0 + W^\perp \subset W + W^\perp$.

To finish the second proof we must prove the existence of a $w_0 \in W$ such that $\|v - w_0\| \le \|v - w\|$ for all $w \in W$. We indicate sketchily two ways of proving the existence of such a $w_0$.

Let $u_1, \ldots, u_k$ be a basis of $W$; thus any $w \in W$ is of the form $w = \lambda_1 u_1 + \cdots + \lambda_k u_k$. Let $\beta_{ij} = (u_i, u_j)$ and let $\gamma_i = (v, u_i)$ for $v \in V$. Thus

$$(v - w, v - w) = (v - \lambda_1 u_1 - \cdots - \lambda_k u_k,\; v - \lambda_1 u_1 - \cdots - \lambda_k u_k) = (v, v) + \sum \lambda_i\lambda_j\beta_{ij} - 2\sum \lambda_i\gamma_i.$$

This quadratic function in the $\lambda$'s is nonnegative and so, by results from the calculus, has a minimum. The $\lambda$'s for this minimum, $\lambda_1^{(0)}, \lambda_2^{(0)}, \ldots, \lambda_k^{(0)}$, give us the desired vector $w_0 = \lambda_1^{(0)} u_1 + \cdots + \lambda_k^{(0)} u_k$ in $W$.

A second way of exhibiting such a minimizing $w_0$ is as follows. In $V$ define a metric $\zeta$ by $\zeta(x, y) = \|x - y\|$; one shows that $\zeta$ is a proper metric on $V$, and $V$ is now a metric space. Let $S = \{w \in W \mid \|v - w\| \le \|v\|\}$. In this metric $S$ is a compact set (prove!), and so the continuous function $f(w) = \|v - w\|$ defined for $w \in S$ takes on a minimum at some point $w_0 \in S$. We leave it to the reader to verify that $w_0$ is the desired vector satisfying $\|v - w_0\| \le \|v - w\|$ for all $w \in W$.
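The first proof translates directly into a computation of the decomposition $v = w_0 + v_0$ with $w_0 \in W$ and $v_0 \in W^\perp$. A sketch (the subspace and the vector are our own example data); note that the projection computed here is exactly the minimizing vector $w_0$ of the second proof, the point of $W$ closest to $v$.

```python
import numpy as np

def ip(u, v):
    return np.sum(u * np.conj(v))

def gram_schmidt(vs):
    ws = []
    for v in vs:
        u = v - sum(ip(v, w) * w for w in ws)
        ws.append(u / np.sqrt(ip(u, u).real))
    return ws

# A subspace W of R^4 spanned by two vectors (example data).
w_basis = gram_schmidt([np.array([1.0, 2.0, 0.0, 1.0]),
                        np.array([0.0, 1.0, 1.0, 0.0])])

v = np.array([3.0, 1.0, -2.0, 5.0])

# As in the first proof: project v onto W; the residue v0 lies in W-perp.
w0 = sum(ip(v, w) * w for w in w_basis)
v0 = v - w0

print(np.allclose(v, w0 + v0))                       # v = (part in W) + (part in W-perp)
print([np.isclose(ip(v0, w), 0) for w in w_basis])   # v0 orthogonal to W
```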
COROLLARY: If $V$ is a finite-dimensional inner product space and $W$ is a subspace of $V$, then $(W^\perp)^\perp = W$.

Proof: If $w \in W$, then for any $u \in W^\perp$, $(w, u) = 0$, whence $W \subset (W^\perp)^\perp$. Now $V = W + W^\perp$ and $V = W^\perp + (W^\perp)^\perp$; from these we get, since the sums are direct, $\dim W = \dim (W^\perp)^\perp$. Since $W \subset (W^\perp)^\perp$ and is of the same dimension as $(W^\perp)^\perp$, it follows that $W = (W^\perp)^\perp$.
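The corollary, too, can be observed numerically by applying the null-space construction for the orthogonal complement twice (a sketch assuming SciPy; the example data and the helper `perp` are ours):

```python
import numpy as np
from scipy.linalg import null_space

def perp(B):
    """Orthonormal basis (as columns) of the orthogonal complement of the
    row space of B, via (x, w) = sum x_k conj(w_k) = 0 for each row w."""
    return null_space(np.conj(B))

# W = row space of B in C^4 (example data).
B = np.array([[1, 1j, 0, 0],
              [0, 0, 1, 1]], dtype=complex)

Wp = perp(B)        # basis of W-perp, shape (4, 2)
Wpp = perp(Wp.T)    # basis of (W-perp)-perp, shape (4, 2)

# (W-perp)-perp equals W: each basis vector of W lies in the span of Wpp.
coeffs, *_ = np.linalg.lstsq(Wpp, B.T, rcond=None)
print(np.allclose(Wpp @ coeffs, B.T))   # True
```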