Determinants, Matrix Norms, Inverse Mapping Theorem
G. B. Folland

The purpose of these notes is to present some useful facts about matrices and determinants and a proof of the inverse mapping theorem that is rather different from the one in Apostol.

Notation: M_n(R) denotes the set of all n × n real matrices.

Determinants: If A ∈ M_n(R), we can consider the rows of A: r_1, ..., r_n. These are elements of R^n, considered as row vectors. Conversely, given n row vectors r_1, ..., r_n, we can stack them up into an n × n matrix A. Thus we can think of a function f on matrix space M_n(R) as a function of n R^n-valued variables, or vice versa: f(A) = f(r_1, ..., r_n).

Basic Fact: There is a unique function det : M_n(R) → R (the "determinant") with the following three properties:

i. det is a linear function of each row when the other rows are held fixed: that is,

    det(αa + βb, r_2, ..., r_n) = α det(a, r_2, ..., r_n) + β det(b, r_2, ..., r_n),

and likewise for the other rows.

ii. If two rows of A are interchanged, det A is multiplied by −1:

    det(..., r_i, ..., r_j, ...) = −det(..., r_j, ..., r_i, ...).

iii. det(I) = 1, where I denotes the n × n identity matrix.

The uniqueness of the determinant follows from the discussion below; existence takes more work to establish. We shall not present the proof here but give the formulas. For n = 2 and n = 3 we have

    det [ a  b ]  = ad − bc,
        [ c  d ]

    det [ a  b  c ]
        [ d  e  f ]  = aei − afh + bfg − bdi + cdh − ceg.
        [ g  h  i ]

For general n,

    det A = Σ_σ (sgn σ) a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)},

where the sum is over all permutations σ of {1, ..., n}, and sgn σ is +1 or −1 depending on whether σ is obtained by an even or odd number of interchanges of two numbers. (This formula is a computational nightmare for large n, being a sum of n! terms, so it is of little use in practice. There are better ways to compute determinants, as we shall see shortly.)

An important consequence of properties (i) and (ii) is

iv. If one row of A is the zero vector, or if two rows of A are equal, then det A = 0.
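
To make the permutation-sum formula concrete, here is a small numerical sketch (my own illustration, not part of the original notes; it assumes Python with NumPy is available) that evaluates the n!-term sum for a random 3 × 3 matrix and compares it with the explicit formula above and with NumPy's built-in determinant.

    import itertools
    import numpy as np

    def sign(sigma):
        # sgn(sigma): +1 if sigma is an even permutation, -1 if odd (count inversions).
        inv = sum(1 for i in range(len(sigma))
                    for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
        return -1 if inv % 2 else 1

    def det_by_permutations(A):
        # The n!-term formula: det A = sum_sigma (sgn sigma) a_{1,sigma(1)} ... a_{n,sigma(n)}.
        n = A.shape[0]
        return sum(sign(s) * np.prod([A[i, s[i]] for i in range(n)])
                   for s in itertools.permutations(range(n)))

    a, b, c, d, e, f, g, h, i = np.random.rand(9)
    A = np.array([[a, b, c], [d, e, f], [g, h, i]])
    explicit = a*e*i - a*f*h + b*f*g - b*d*i + c*d*h - c*e*g
    print(det_by_permutations(A), explicit, np.linalg.det(A))   # all three agree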

Properties (i), (ii), and (iv) tell how the determinant of a matrix behaves under the elementary row operations:

- Multiplying a row by a scalar multiplies the determinant by that scalar.
- Interchanging two rows multiplies the determinant by −1.
- Adding a multiple of one row to another row leaves the determinant unchanged, because

    det(..., r_i + αr_j, ..., r_j, ...) = det(..., r_i, ..., r_j, ...) + α det(..., r_j, ..., r_j, ...)

  by (i), and the last term is zero by (iv).

This gives a reasonably efficient way to compute determinants. To wit, any n × n matrix A can be row-reduced either to a matrix with an all-zero row (whose determinant is 0) or to the identity matrix (whose determinant is 1). Just keep track of what happens to the determinant as you perform these row operations, and you will have calculated det A. (There are shortcuts for this procedure, but that's another story. We'll mostly be dealing with 2 × 2 and 3 × 3 matrices, for which one can just use the explicit formulas above.)

This observation is also the key to the main theoretical significance of the determinant. A matrix A is invertible (that is, the map f : R^n → R^n defined by f(x) = Ax is invertible, where elements of R^n are considered as column vectors) if and only if A can be row-reduced to the identity. Indeed, in this case, if one starts with the identity matrix I and performs the same sequence of row operations on it that row-reduces A to I, the result is A^{-1}. It follows that A is invertible if and only if det A ≠ 0.

If A is invertible, its inverse can be calculated in terms of determinants. Indeed, suppose A ∈ M_n(R), and let B_{ij} be the (n−1) × (n−1) matrix obtained by deleting the ith row and jth column of A. Then the ijth entry of A^{-1} is given by

    (A^{-1})_{ij} = (−1)^{i+j} det(B_{ji}) / det A.

This formula is computationally less efficient than calculating A^{-1} by row reduction, at least when the size of A is large. However, it is theoretically important. In particular, it shows explicitly how the condition det A ≠ 0 comes into the picture, and since determinants are polynomial functions of the entries of a matrix, it shows that the entries of A^{-1} are continuous functions of the entries of A, as long as det A ≠ 0.

By the way, there is nothing special about the real number system here. Everything we have said works equally well if R is replaced by C, or indeed by any field (except that the preceding remark about continuity only applies to fields where the notion of continuity makes sense).
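
Returning to the computational recipes above: the following sketch (again my own illustration assuming NumPy, not part of the notes) computes a determinant by row reduction, tracking only the sign changes from row interchanges and stopping at triangular form rather than going all the way to the identity, and it also builds A^{-1} entry by entry from the cofactor formula; both are checked against NumPy's built-in routines.

    import numpy as np

    def det_by_row_reduction(A):
        # Row-reduce A, tracking how each elementary operation changes the determinant:
        # a row interchange flips the sign, adding a multiple of one row to another changes nothing.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        det = 1.0
        for j in range(n):
            p = j + np.argmax(np.abs(A[j:, j]))        # choose a nonzero pivot (partial pivoting)
            if np.isclose(A[p, j], 0.0):
                return 0.0                             # reduces to a matrix with a zero row
            if p != j:
                A[[j, p]] = A[[p, j]]                  # row interchange: multiply det by -1
                det = -det
            for i in range(j + 1, n):
                A[i] -= (A[i, j] / A[j, j]) * A[j]     # row addition: det unchanged
        return det * np.prod(np.diag(A))               # det of the resulting triangular matrix

    def inverse_by_cofactors(A):
        # (A^{-1})_{ij} = (-1)^{i+j} det(B_{ji}) / det A, with B_{ji} a minor of A.
        n = A.shape[0]
        dA = np.linalg.det(A)
        inv = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                B = np.delete(np.delete(A, j, axis=0), i, axis=1)   # delete row j, column i
                inv[i, j] = (-1) ** (i + j) * np.linalg.det(B) / dA
        return inv

    A = np.random.rand(4, 4)
    print(np.isclose(det_by_row_reduction(A), np.linalg.det(A)))    # True
    print(np.allclose(inverse_by_cofactors(A), np.linalg.inv(A)))   # True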

Matrix Norms: It is often desirable to have a notion of the size of a matrix, like the norm or magnitude of a vector. One way to manufacture such a thing is simply to regard the n^2 entries of a matrix A ∈ M_n(R) as the components of a vector in R^{n^2} and take its Euclidean norm. The resulting quantity is usually called the Hilbert-Schmidt norm of the matrix; it can be denoted by ‖A‖_HS:

    ‖A‖_HS = [ Σ_{i,j=1}^n |A_{ij}|^2 ]^{1/2}.

However, if we think of a matrix as determining a linear transformation of R^n, namely f(x) = Ax, it is often better to use a different norm that is more closely related to the action of A on vectors. Namely, since the unit sphere {x : ‖x‖ = 1} in R^n is compact and the function x ↦ ‖Ax‖ from R^n to R is continuous on it, it has a maximum value, which we denote by ‖A‖_op:

    ‖A‖_op = max_{‖x‖=1} ‖Ax‖.

‖A‖_op is called the operator norm of A (the term "linear operator" being a common synonym for "linear transformation"). It is easy to check that both ‖·‖_HS and ‖·‖_op are indeed norms on the vector space M_n(R) in the proper sense of the word; that is, they satisfy

    ‖cA‖ = |c| ‖A‖ (c ∈ R),    ‖A + B‖ ≤ ‖A‖ + ‖B‖,    ‖A‖ = 0 ⟺ A = 0.

We observe that if x is any nonzero vector in R^n and u = x/‖x‖ is the corresponding unit vector, then

    ‖Ax‖ = ‖A(‖x‖ u)‖ = ‖x‖ ‖Au‖ ≤ ‖x‖ ‖A‖_op = ‖A‖_op ‖x‖,

with equality if x is the unit vector that achieves the maximum in the definition of ‖A‖_op. In other words, ‖A‖_op is the smallest constant C such that ‖Ax‖ ≤ C‖x‖ for all x ∈ R^n. This property of the operator norm makes it very handy for calculations involving the magnitudes of vectors. On the other hand, it is often not easy to calculate ‖A‖_op exactly in terms of the entries of A. The Hilbert-Schmidt norm is easier to compute, so it is good to know that the operator norm is no bigger than it:

    ‖A‖_op ≤ ‖A‖_HS.

(Equality is achieved for any matrix A with only one nonzero entry.) The reason is the following: Let r_1, ..., r_n be the rows of A; then the jth component of Ax is r_j · x, so by the Cauchy-Schwarz inequality,

    ‖Ax‖^2 = Σ_{j=1}^n |(Ax)_j|^2 = Σ_{j=1}^n (r_j · x)^2 ≤ Σ_{j=1}^n ‖r_j‖^2 ‖x‖^2 = Σ_{i,j=1}^n |A_{ij}|^2 ‖x‖^2 = ‖A‖_HS^2 ‖x‖^2.

There's also an inequality going the other way. The jth column of A is Ae_j, where e_j is the jth unit coordinate vector, so

    ‖A‖_HS^2 = Σ_{j=1}^n ‖Ae_j‖^2 ≤ Σ_{j=1}^n ‖A‖_op^2 ‖e_j‖^2 = n ‖A‖_op^2.

That is, ‖A‖_HS ≤ √n ‖A‖_op; equality is achieved when A is the identity matrix. The operator and Hilbert-Schmidt norms both have the useful property that the norm of a product is at most the product of the norms:

    ‖AB‖_op ≤ ‖A‖_op ‖B‖_op,    ‖AB‖_HS ≤ ‖A‖_HS ‖B‖_HS.
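
These inequalities are easy to test numerically. The sketch below (an illustration assuming NumPy, not from the notes) checks ‖A‖_op ≤ ‖A‖_HS ≤ √n ‖A‖_op and the product inequalities on random matrices, and then looks at the two equality cases mentioned above; NumPy's np.linalg.norm(A, 2) returns the operator norm and np.linalg.norm(A, 'fro') the Hilbert-Schmidt (Frobenius) norm.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    for _ in range(300):
        A = rng.standard_normal((n, n))
        B = rng.standard_normal((n, n))
        op, hs = np.linalg.norm(A, 2), np.linalg.norm(A, 'fro')
        assert op <= hs + 1e-12                                          # ||A||_op <= ||A||_HS
        assert hs <= np.sqrt(n) * op + 1e-12                             # ||A||_HS <= sqrt(n) ||A||_op
        assert np.linalg.norm(A @ B, 2) <= op * np.linalg.norm(B, 2) + 1e-12
        assert np.linalg.norm(A @ B, 'fro') <= hs * np.linalg.norm(B, 'fro') + 1e-12

    # Equality cases from the notes: a single nonzero entry, and the identity matrix.
    E = np.zeros((n, n)); E[2, 3] = 7.0
    print(np.linalg.norm(E, 2), np.linalg.norm(E, 'fro'))                # both equal 7
    I = np.eye(n)
    print(np.linalg.norm(I, 'fro'), np.sqrt(n) * np.linalg.norm(I, 2))   # both equal sqrt(5)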

For the operator norm this follows from the simple calculation

    ‖ABx‖ ≤ ‖A‖_op ‖Bx‖ ≤ ‖A‖_op ‖B‖_op ‖x‖,

and for the Hilbert-Schmidt norm it comes from the fact that the ijth entry of AB is the dot product of the ith row of A with the jth column of B, together with the Cauchy-Schwarz inequality; details are left to the reader.

The Inverse Mapping Theorem (or Inverse Function Theorem): This is Theorem 13.6 in Apostol. Rather than going through the proof there, which depends on a series of preliminary results (Theorems 13.2–13.5), I'm going to present the proof in Rudin's Principles of Mathematical Analysis, which is perhaps more elegant and certainly shorter. It uses the following three lemmas. In them, and throughout this proof, the norm of a matrix is always understood to be the operator norm.

Lemma 1 (The Mean Value Inequality). Let U be an open convex set in R^n, and let f : U → R^m be differentiable everywhere on U. If ‖Df(z)‖ ≤ C for all z ∈ U, then ‖f(y) − f(x)‖ ≤ C‖y − x‖ for all x, y ∈ U.

This follows from Theorem 12.9 of Apostol by taking a to be the unit vector in the direction of f(y) − f(x):

    ‖f(y) − f(x)‖ = a · [f(y) − f(x)] = a · [(Df(z))(y − x)] ≤ ‖a‖ ‖Df(z)‖ ‖y − x‖ ≤ C‖y − x‖.

(That theorem supplies a point z on the segment from x to y for which the second equality holds.)

Lemma 2 (The Fixed Point Theorem for Contractions). Let (M, d) be a complete metric space, and let φ : M → M be a map such that for some constant c < 1 we have d(φ(x), φ(y)) ≤ c d(x, y) for all x, y ∈ M. Then there is a unique x ∈ M such that φ(x) = x.

This is Theorem 4.48 in Apostol.

Lemma 3. Suppose A ∈ M_n(R) is invertible. If ‖B − A‖ < 1/‖A^{-1}‖, then B is invertible.

Proof. Observe that B = A + (B − A) = A[I + A^{-1}(B − A)]. If x ≠ 0, we have

    ‖A^{-1}(B − A)x‖ ≤ ‖A^{-1}‖ ‖B − A‖ ‖x‖ < ‖x‖;

hence A^{-1}(B − A)x ≠ −x; hence [I + A^{-1}(B − A)]x ≠ 0; hence Bx ≠ 0 since A is invertible. But for square matrices, invertibility is equivalent to the condition that {x : Bx = 0} = {0}, so B is invertible.

Theorem (Inverse Mapping Theorem). Let f be a mapping of class C^1 from an open set S ⊂ R^n into R^n. Suppose that a is a point in S such that Df(a) is invertible, and let b = f(a). Then there are open sets U and V in R^n with a ∈ U and b ∈ V such that f maps U one-to-one onto V. Moreover, the inverse mapping g : V → U, defined by the condition f(g(y)) = y for y ∈ V, is of class C^1 on V, and Dg(y) = [Df(g(y))]^{-1}.
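
Before turning to the proof, a brief numerical aside on Lemma 3 (my own illustration assuming NumPy; the singular-value facts used here are standard but are not stated in the notes): in the operator norm, 1/‖A^{-1}‖ equals the smallest singular value of A, which is exactly the distance from A to the nearest singular matrix, so the bound in Lemma 3 is sharp.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    Ainv_norm = np.linalg.norm(np.linalg.inv(A), 2)        # ||A^{-1}|| in the operator norm

    # Lemma 3 numerically: every B with ||B - A|| < 1/||A^{-1}|| remains invertible.
    for _ in range(200):
        E = rng.standard_normal((4, 4))
        E *= 0.99 / (Ainv_norm * np.linalg.norm(E, 2))     # scale so ||E|| < 1/||A^{-1}||
        assert np.linalg.svd(A + E, compute_uv=False)[-1] > 1e-12   # A + E is invertible

    # Sharpness: 1/||A^{-1}|| is the smallest singular value of A, and removing that
    # singular component produces a singular matrix at exactly that distance.
    U, s, Vt = np.linalg.svd(A)
    print(np.isclose(1.0 / Ainv_norm, s[-1]))              # True
    B = A - s[-1] * np.outer(U[:, -1], Vt[-1])
    print(np.linalg.norm(B - A, 2), np.linalg.det(B))      # distance s[-1], det essentially 0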

Proof. To simplify the notation, set A = Df(a); thus A is invertible. Given y ∈ R^n, define a function φ : S → R^n by

    φ(x) = x + A^{-1}(y − f(x)).                                     (1)

(We really should write φ_y instead of φ to indicate the dependence on y, but we won't.) The point of φ is that y = f(x) if and only if φ(x) = x, so solving f(x) = y amounts to finding a fixed point of φ.

Since Df is continuous, there is an open ball U ⊂ S centered at a such that

    ‖Df(x) − A‖ ≤ 1/(2‖A^{-1}‖) for x ∈ U.                           (2)

Let V = f(U). To prove the first assertion of the theorem, we need to show that f is one-to-one on U and that V is open. Since Dφ(x) = I − A^{-1}Df(x) = A^{-1}(A − Df(x)), we have ‖Dφ(x)‖ ≤ ‖A^{-1}‖ ‖A − Df(x)‖ ≤ 1/2 < 1, and hence, by Lemma 1,

    ‖φ(x_1) − φ(x_2)‖ ≤ (1/2)‖x_1 − x_2‖ for x_1, x_2 ∈ U.           (3)

It follows that f is one-to-one on U, for if f(x_1) = y = f(x_2) then x_1 and x_2 are both fixed points of φ, so ‖x_1 − x_2‖ ≤ (1/2)‖x_1 − x_2‖, which is possible only if x_1 = x_2.

Now, to show that V is open, suppose y_0 = f(x_0) ∈ V, and let B be an open ball of radius r centered at x_0, where r is small enough so that its closure B̄ lies in U. We will show that y ∈ V provided that ‖y − y_0‖ < r/(2‖A^{-1}‖). Taking such a y as the y in (1), we have

    ‖φ(x_0) − x_0‖ = ‖A^{-1}(y − y_0)‖ ≤ ‖A^{-1}‖ ‖y − y_0‖ < r/2.

Hence, by (3), if x ∈ B̄,

    ‖φ(x) − x_0‖ ≤ ‖φ(x) − φ(x_0)‖ + ‖φ(x_0) − x_0‖ < (1/2)‖x − x_0‖ + r/2 ≤ r,

that is, φ(x) ∈ B. Thus φ maps B̄ into itself, and by (3) it is a contraction on B̄ since B̄ ⊂ U. Moreover, B̄ is compact and hence complete. Hence, by Lemma 2, φ has a fixed point x ∈ B̄. That is, there is an x ∈ B̄ ⊂ U such that f(x) = y, and hence y ∈ f(U) = V, as claimed.
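
The map φ in (1) is not just a device for the proof: iterating it (which is how the standard proof of Lemma 2 produces the fixed point) gives a practical way to compute the local inverse numerically. The sketch below (my own example assuming NumPy; the particular map f and the point a are illustrative choices, not from the notes) recovers x with f(x) = y by repeatedly applying φ.

    import numpy as np

    def f(x):
        # A sample C^1 map R^2 -> R^2 (chosen for illustration).
        return np.array([x[0] + 0.2 * np.sin(x[1]),
                         x[1] + 0.2 * x[0] ** 2])

    a = np.array([0.0, 0.0])
    A = np.array([[1.0, 0.2],          # Df(a) for this particular f, computed by hand
                  [0.0, 1.0]])
    A_inv = np.linalg.inv(A)

    def g(y, iterations=60):
        # Solve f(x) = y for y near f(a) by iterating phi(x) = x + A^{-1}(y - f(x)).
        x = a.copy()
        for _ in range(iterations):
            x = x + A_inv @ (y - f(x))
        return x

    y = np.array([0.05, -0.03])
    x = g(y)
    print(x, f(x))                     # f(x) reproduces y to machine precision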

We have shown that f : U → V is invertible; we denote its inverse by g : V → U. To prove the last assertion of the theorem, suppose y ∈ V and y + k ∈ V, and set x = g(y) and h = g(y + k) − g(y); thus y = f(x) and y + k = f(x + h). Taking this y as the y in (1), we have

    φ(x + h) − φ(x) = h + A^{-1}[f(x) − f(x + h)] = h − A^{-1}k.

It follows from (3) that ‖h − A^{-1}k‖ ≤ (1/2)‖h‖, so ‖A^{-1}k‖ ≥ (1/2)‖h‖ and hence

    ‖h‖ ≤ 2‖A^{-1}k‖ ≤ 2‖A^{-1}‖ ‖k‖.                                (4)

By (2) and Lemma 3, Df(x) is invertible; let us denote its inverse by B. Since

    g(y + k) − g(y) − Bk = (x + h) − x − Bk = −Bk + h = −B[f(x + h) − f(x) − Df(x)h],

(4) shows that

    ‖g(y + k) − g(y) − Bk‖ / ‖k‖ ≤ 2‖B‖ ‖A^{-1}‖ ‖f(x + h) − f(x) − Df(x)h‖ / ‖h‖.

Now let k → 0; then by (4), h → 0 too. The numerator on the right is the error term in the definition of differentiability of f at x, so the quantity on the right tends to 0. Hence the quantity on the left does so too, and by definition of differentiability again, this means that g is differentiable at y and Dg(y) = B = [Df(x)]^{-1} = [Df(g(y))]^{-1}, as claimed.

Finally, g is continuous (since g is differentiable), Df is continuous (since f is of class C^1), and the inversion map A ↦ A^{-1} is continuous on the set of invertible matrices. Therefore Dg is continuous and hence g is of class C^1.

Note that the formula for Dg(y) is in accordance with the chain rule: since f ∘ g is the identity mapping, the chain rule says that

    I = D(f ∘ g)(y) = Df(g(y)) Dg(y) (matrix multiplication),

i.e., Dg(y) = [Df(g(y))]^{-1}.
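
The formula Dg(y) = [Df(g(y))]^{-1} can also be checked numerically. The self-contained sketch below (an illustration assuming NumPy, reusing the same sample map f as in the earlier sketch) compares a finite-difference Jacobian of g with the inverse of the analytic Jacobian of f.

    import numpy as np

    def f(x):
        # Same sample map R^2 -> R^2 as in the earlier sketch.
        return np.array([x[0] + 0.2 * np.sin(x[1]), x[1] + 0.2 * x[0] ** 2])

    def Df(x):
        # Analytic Jacobian of f.
        return np.array([[1.0, 0.2 * np.cos(x[1])],
                         [0.4 * x[0], 1.0]])

    def g(y, iterations=100):
        # Local inverse of f near a = 0, via the fixed-point iteration from the proof.
        A_inv = np.linalg.inv(Df(np.zeros(2)))
        x = np.zeros(2)
        for _ in range(iterations):
            x = x + A_inv @ (y - f(x))
        return x

    y = np.array([0.05, -0.03])
    eps = 1e-6
    Dg_fd = np.column_stack([(g(y + eps * e) - g(y - eps * e)) / (2 * eps)
                             for e in np.eye(2)])                    # finite-difference Dg(y)
    print(np.allclose(Dg_fd, np.linalg.inv(Df(g(y))), atol=1e-6))    # True: Dg(y) = [Df(g(y))]^{-1}
    print(Df(g(y)) @ Dg_fd)                                          # approximately the identity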