

Lec2p1, ORF363/COS323
Instructor: Amir Ali Ahmadi
Fall 2016
TAs: B. El Khadir, G. Hall, Z. Li, K. Wang, J. Ye, J. Zhang

This lecture: The goal of this lecture is to refresh your memory on some topics in linear algebra and multivariable calculus that will be relevant to this course. You can use this as a reference throughout the semester. The topics that we cover are the following:

- Inner products and norms
  - Formal definitions
  - Euclidean inner product and orthogonality
  - Vector norms
  - Matrix norms
  - Cauchy-Schwarz inequality
- Eigenvalues and eigenvectors
  - Definitions
  - Positive definite and positive semidefinite matrices
- Elements of differential calculus
  - Continuity
  - Linear, affine and quadratic functions
  - Differentiability and useful rules for differentiation
  - Gradients and level sets
  - Hessians
  - Taylor expansion
    - Little o and big O notation
    - Taylor expansion

Lec2p2, ORF363/COS323

Inner products and norms

Definition of an inner product
An inner product is a real-valued function $\langle \cdot, \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ that satisfies the following properties:
- Positivity: $\langle x, x \rangle \ge 0$ for all $x$, and $\langle x, x \rangle = 0$ iff $x = 0$.
- Symmetry: $\langle x, y \rangle = \langle y, x \rangle$.
- Additivity: $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$.
- Homogeneity: $\langle r x, y \rangle = r \langle x, y \rangle$ for all $r \in \mathbb{R}$.

Examples in small dimension
Here are some examples in $\mathbb{R}$ and $\mathbb{R}^2$ that you are already familiar with.

Example 1: Classical multiplication in $\mathbb{R}$: $\langle x, y \rangle = xy$. Check that this is indeed an inner product using the definition.

Example 2: The dot product in $\mathbb{R}^2$, defined geometrically as $\langle x, y \rangle = \|x\| \, \|y\| \cos\theta$, where $\theta$ is the angle between $x$ and $y$. This geometric definition is equivalent to the following algebraic one: $\langle x, y \rangle = x_1 y_1 + x_2 y_2$. Notice that the inner product is positive when $\theta$ is smaller than 90 degrees, negative when it is greater than 90 degrees, and zero when $\theta = 90$ degrees.

Lec2p3, ORF363/COS323

Euclidean inner product
The two previous examples are particular cases ($n = 1$ and $n = 2$) of the Euclidean inner product on $\mathbb{R}^n$:
$$\langle x, y \rangle = x^T y = \sum_{i=1}^n x_i y_i.$$
Check that this is an inner product using the definition.

Orthogonality
We say that two vectors $x$ and $y$ are orthogonal if $\langle x, y \rangle = 0$. Note that with this definition the zero vector is orthogonal to every other vector. But two nonzero vectors can also be orthogonal. For example, $x = (1, 1)^T$ and $y = (1, -1)^T$ are both nonzero and orthogonal.
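
As a small numerical illustration of the Euclidean inner product and orthogonality, here is a sketch in Python/NumPy (my addition, not part of the original notes; the example vectors are arbitrary):

```python
import numpy as np

x = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])
z = np.array([3.0, 4.0])

# Euclidean inner product <x, z> = x^T z
print(np.dot(x, z))            # 7.0

# x and y are nonzero but orthogonal: <x, y> = 0
print(np.dot(x, y))            # 0.0

# Angle between x and z via <x, z> = ||x|| ||z|| cos(theta)
cos_theta = np.dot(x, z) / (np.linalg.norm(x) * np.linalg.norm(z))
print(np.degrees(np.arccos(cos_theta)))   # about 8.13 degrees
```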

Lec2p4, ORF363/COS323

Norms
A vector norm is a real-valued function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ that satisfies the following properties:
- Positivity: $\|x\| \ge 0$ for all $x$, and $\|x\| = 0$ iff $x = 0$.
- Homogeneity: $\|r x\| = |r| \, \|x\|$ for all $r \in \mathbb{R}$.
- Triangle inequality: $\|x + y\| \le \|x\| + \|y\|$.

Basic examples of vector norms:
- The 1-norm: $\|x\|_1 = \sum_{i=1}^n |x_i|$.
- The 2-norm (Euclidean norm): $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$.
- The infinity norm: $\|x\|_\infty = \max_i |x_i|$.
Check that these are norms using the definition!

When no index is specified on a norm (e.g., $\|x\|$), it is considered to be the Euclidean norm. For the three norms above, we have the relation $\|x\|_\infty \le \|x\|_2 \le \|x\|_1$.

Given any inner product, one can construct a norm given by $\|x\| = \sqrt{\langle x, x \rangle}$. But not every norm comes from an inner product. (For example, one can show that the 1-norm and the infinity norm above don't.)

Cauchy-Schwarz inequality
For any two vectors $x$ and $y$ in $\mathbb{R}^n$, we have the so-called Cauchy-Schwarz inequality:
$$|x^T y| \le \|x\|_2 \, \|y\|_2.$$
Furthermore, equality holds iff $x$ and $y$ are linearly dependent, i.e., $y = \alpha x$ for some $\alpha \in \mathbb{R}$ (or $x = 0$).
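
The three basic norms, their ordering, and the Cauchy-Schwarz inequality can all be checked numerically. A minimal Python/NumPy sketch (with arbitrary example vectors of my choosing):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])

norm1   = np.linalg.norm(x, 1)        # |1| + |-2| + |3| = 6
norm2   = np.linalg.norm(x)           # sqrt(1 + 4 + 9) ~ 3.742
norminf = np.linalg.norm(x, np.inf)   # max(|1|, |-2|, |3|) = 3

# The relation ||x||_inf <= ||x||_2 <= ||x||_1
print(norminf <= norm2 <= norm1)      # True

# Cauchy-Schwarz: |x^T y| <= ||x||_2 ||y||_2
print(abs(np.dot(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y))  # True
```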

Lec2p5, ORF363/COS323

Matrix norms
(We skipped this topic in lecture. We'll come back to it as we need to.)

Similar to vector norms, one can define norms on matrices. These are functions $\|\cdot\| : \mathbb{R}^{m \times n} \to \mathbb{R}$ that satisfy exactly the same properties as in the definition of a vector norm (see page 36 of [CZ13]).

Induced norms
Consider any vector norm $\|\cdot\|$. The induced norm on the space of $m \times n$ matrices is defined as
$$\|A\| = \max_{\|x\| = 1} \|Ax\|.$$
Notice that the vector norm and the matrix norm have the same notation; it is for you to know which one we are talking about depending on the context. One can check that $\|A\|$ satisfies all properties of a norm.

Frobenius norm
The Frobenius norm is defined by
$$\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}.$$
The Frobenius norm is an example of a matrix norm that is not induced by a vector norm. Indeed, $\|I\| = 1$ for any induced norm (why?), but $\|I\|_F = \sqrt{n}$.

Submultiplicative norms
A matrix norm is submultiplicative if it satisfies the following inequality:
$$\|AB\| \le \|A\| \, \|B\|.$$
All induced norms are submultiplicative. The Frobenius norm is submultiplicative. Not every matrix norm is submultiplicative: take, for instance, the entrywise max norm $\|A\| = \max_{i,j} |a_{ij}|$ and $A = B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. Then $\|A\| = \|B\| = 1$, but $AB = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$, hence $\|AB\| = 2$ and $\|AB\| > \|A\| \, \|B\|$.
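
A short Python/NumPy sketch of these facts (the all-ones matrix and the entrywise max norm are illustrative choices; `np.linalg.norm` with `2` gives the induced 2-norm and with `'fro'` the Frobenius norm):

```python
import numpy as np

A = np.ones((2, 2))             # the all-ones 2x2 matrix
I = np.eye(3)

# Induced 2-norm vs. Frobenius norm of the identity:
print(np.linalg.norm(I, 2))     # 1.0        (any induced norm of I is 1)
print(np.linalg.norm(I, 'fro')) # sqrt(3) ~ 1.732, so Frobenius is not induced

# The entrywise max norm is not submultiplicative:
max_norm = lambda M: np.abs(M).max()
print(max_norm(A), max_norm(A @ A))   # 1.0 2.0  -> ||AB|| > ||A|| ||B||

# Induced norms are submultiplicative:
print(np.linalg.norm(A @ A, 2) <= np.linalg.norm(A, 2) ** 2)  # True
```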

Lec2p6, ORF363/COS323

Continuity
We first give the definition for a univariate function and then see that it generalizes in a straightforward fashion to multiple dimensions using the concept of a vector norm.

Definition in $\mathbb{R}$
A function $f : \mathbb{R} \to \mathbb{R}$ is continuous at a point $x$ if for all $\epsilon > 0$ there exists $\delta > 0$ such that for all $y$ with $|x - y| < \delta$ we have $|f(x) - f(y)| < \epsilon$. A function is said to be continuous if it is continuous at every point of its domain.

Definition in $\mathbb{R}^n$
A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $x \in \mathbb{R}^n$ if for all $\epsilon > 0$ there exists $\delta > 0$ such that $\|x - y\|_2 < \delta$ implies $\|f(x) - f(y)\|_2 < \epsilon$. (Figure in the notes: a function that is not continuous.) Once again, if $f$ is continuous at all points in its domain, then $f$ is said to be continuous.

Remarks.
- If in the above definition we replace the 2-norm with any other vector norm, the class of continuous functions would not change. This is because of "equivalence of norms in finite dimensions", a result we didn't prove.
- A function $f : \mathbb{R}^n \to \mathbb{R}^m$ given as $f(x) = (f_1(x), \ldots, f_m(x))^T$ is continuous if and only if each entry $f_i$ is continuous.

Lec2p7, ORF363/COS323

Linear, affine and quadratic functions

Linear functions
A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is called linear if:
- $f(x + y) = f(x) + f(y)$ for all $x, y \in \mathbb{R}^n$, and
- $f(\alpha x) = \alpha f(x)$ for all $x \in \mathbb{R}^n$ and all $\alpha \in \mathbb{R}$.
Any linear function can be represented as $f(x) = Ax$, where $A$ is an $m \times n$ matrix. The special case where $m = 1$ will be encountered a lot. In this case, linear functions take the form $f(x) = c^T x$ for some vector $c \in \mathbb{R}^n$.

(Figures in the notes: graphs of linear and affine functions.)

Affine functions
A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is affine if there exists a linear function $g : \mathbb{R}^n \to \mathbb{R}^m$ and a vector $b \in \mathbb{R}^m$ such that $f(x) = g(x) + b$. When $m = 1$, affine functions are functions of the form $f(x) = c^T x + b$, where $c \in \mathbb{R}^n$ and $b \in \mathbb{R}$.

Lec2p8, ORF363/COS323

Quadratic functions
A quadratic form is a function $f : \mathbb{R}^n \to \mathbb{R}$ that can be represented as $f(x) = x^T Q x$, where $Q$ is an $n \times n$ matrix that we can assume to be symmetric without loss of generality (i.e., $Q = Q^T$).

Why can we assume this without loss of generality? If $Q$ is not symmetric, then we can define $\tilde{Q} = \frac{1}{2}(Q + Q^T)$, which is a symmetric matrix (why?), and we would have $x^T Q x = x^T \tilde{Q} x$ for all $x$ (why?).

What do these functions look like in small dimensions? When $n = 1$ we have $f(x) = q x^2$, where $q \in \mathbb{R}$. When $n = 2$, $Q = \begin{pmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{pmatrix}$ and $f(x) = q_{11} x_1^2 + 2 q_{12} x_1 x_2 + q_{22} x_2^2$.

A quadratic function is a function that is the sum of a quadratic form and an affine function: $f(x) = x^T Q x + c^T x + b$.

(Figures in the notes: plots of a quadratic form, a quadratic function, and another quadratic form.)
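
A small Python/NumPy sketch (with random example data, my addition) illustrating that a quadratic form only depends on the symmetric part of Q:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 3))        # not symmetric in general
Q_sym = (Q + Q.T) / 2                  # symmetric part of Q

x = rng.standard_normal(3)

# The quadratic form only "sees" the symmetric part of Q:
print(x @ Q @ x)        # these two numbers agree
print(x @ Q_sym @ x)
```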

Lec2p9, ORF363/COS323

Eigenvalues and eigenvectors

Definition
Let $A$ be an $n \times n$ square matrix. A scalar $\lambda$ and a nonzero vector $v$ satisfying the equation $Av = \lambda v$ are respectively called an eigenvalue and an eigenvector of $A$. In general, both $\lambda$ and $v$ may be complex.

For $\lambda$ to be an eigenvalue it is necessary and sufficient for the matrix $\lambda I - A$ to be singular, that is, $\det(\lambda I - A) = 0$ ($I$ here is the $n \times n$ identity matrix). We call the polynomial $\det(\lambda I - A)$ the characteristic polynomial of $A$. The fundamental theorem of algebra tells us that the characteristic polynomial must have $n$ roots (counted with multiplicity). These roots are the eigenvalues of $A$. Once an eigenvalue $\lambda$ is computed, we can solve the linear system $(\lambda I - A)v = 0$ to obtain the eigenvectors. You should be comfortable with computing eigenvalues of matrices.

Eigenvalues and eigenvectors of a symmetric matrix
$A$ is a symmetric matrix if $A = A^T$.
- All eigenvalues of a symmetric matrix are real. (Proof given in the original notes.)
- Any real symmetric matrix has a set of real eigenvectors that are mutually orthogonal. (We did not prove this.)
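
A minimal Python/NumPy sketch (the example matrix is chosen arbitrarily) illustrating these facts for a symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # a symmetric matrix

# eigh is for symmetric matrices: real eigenvalues, orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)                          # [1. 3.]

# Check A v = lambda v for the first eigenpair
v = eigvecs[:, 0]
print(np.allclose(A @ v, eigvals[0] * v))            # True

# The eigenvectors are mutually orthogonal
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))   # True
```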

Lec2p10, ORF363/COS323

Positive definite and positive semidefinite matrices
A symmetric matrix $A$ is said to be:
- Positive semidefinite (psd), written $A \succeq 0$, if $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$.
- Positive definite (pd), written $A \succ 0$, if $x^T A x > 0$ for all $x \ne 0$.
- Negative semidefinite if $-A$ is positive semidefinite.
- Negative definite if $-A$ is positive definite.
- Indefinite if it is neither positive semidefinite nor negative semidefinite.

Note: The [CZ13] book uses the notation $A \ge 0$ instead of $A \succeq 0$ (and similarly for the other notions). We reserve the notation $A \ge 0$ for matrices whose entries are nonnegative numbers. The notation $A \succeq 0$ is much more common in the literature for positive semidefiniteness.

Link with the eigenvalues of the matrix
A symmetric matrix $A$ is positive semidefinite (resp. positive definite) if and only if all eigenvalues of $A$ are nonnegative (resp. positive). As a result, a symmetric matrix $A$ is negative semidefinite (resp. negative definite) if and only if all eigenvalues of $A$ are nonpositive (resp. negative).

Here is the easier direction of the proof (the other direction is also straightforward; see [CZ13]): if $A \succeq 0$ and $\lambda$ is an eigenvalue of $A$ with eigenvector $v \ne 0$, then $0 \le v^T A v = \lambda v^T v = \lambda \|v\|^2$, and hence $\lambda \ge 0$.
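
A sketch of the eigenvalue test in Python/NumPy; the helper `is_psd` and the example matrices are mine, for illustration only:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check positive semidefiniteness of a symmetric matrix via its eigenvalues."""
    return np.all(np.linalg.eigvalsh(A) >= -tol)

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])    # eigenvalues 1 and 3 -> positive definite
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])     # eigenvalues -1 and 3 -> indefinite

print(is_psd(A))   # True
print(is_psd(B))   # False
```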

Lec2p11, ORF363/COS323

Positive definite and positive semidefinite matrices (cont'd)

Sylvester's criterion
Sylvester's criterion provides another approach to testing positive definiteness or positive semidefiniteness of a matrix.

- A symmetric matrix $A$ is positive definite if and only if $\det(\Delta_1), \ldots, \det(\Delta_n)$ are positive, where $\Delta_1, \ldots, \Delta_n$ are the upper-left $1 \times 1, 2 \times 2, \ldots, n \times n$ submatrices of $A$ (defined as in the drawing in the original notes). These determinants are called the leading principal minors of the matrix. There are always $n$ leading principal minors.
- A symmetric matrix $A$ is positive semidefinite if and only if all scalars $\det(A_S)$ are nonnegative, where the $A_S$ are submatrices obtained by choosing a subset $S$ of the rows and the same subset of the columns of $A$. The scalars $\det(A_S)$ are called the principal minors of $A$.
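
A sketch comparing Sylvester's criterion with the eigenvalue test in Python/NumPy; the helper `leading_principal_minors` and the example matrix are illustrative choices of mine:

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the upper-left 1x1, 2x2, ..., nxn submatrices of A."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

minors = leading_principal_minors(A)
print(minors)                                 # approximately [2.0, 3.0, 4.0]
print(all(m > 0 for m in minors))             # True: A is positive definite
print(np.all(np.linalg.eigvalsh(A) > 0))      # True: agrees with the eigenvalue test
```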

Lec2p12, ORF363/COS323

Gradients, Jacobians, and Hessians

Partial derivatives
Recall that the partial derivative of a function $f : \mathbb{R}^n \to \mathbb{R}$ with respect to a variable $x_i$ is given by
$$\frac{\partial f}{\partial x_i}(x) = \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h},$$
where $e_i$ is the $i$-th standard basis vector in $\mathbb{R}^n$, i.e., the $i$-th column of the $n \times n$ identity matrix.

The Jacobian matrix
For a function $f : \mathbb{R}^n \to \mathbb{R}^m$ given as $f(x) = (f_1(x), \ldots, f_m(x))^T$, the Jacobian matrix is the $m \times n$ matrix of first partial derivatives whose $(i, j)$ entry is $\frac{\partial f_i}{\partial x_j}$. (The notation of the CZ book is $Df(x)$.) The first order approximation of $f$ near a point $x_0$ is obtained using the Jacobian matrix:
$$f(x) \approx f(x_0) + Df(x_0)(x - x_0).$$
Note that this is an affine function of $x$.

The gradient vector
The gradient of a real-valued function $f : \mathbb{R}^n \to \mathbb{R}$ is denoted by $\nabla f(x)$ and is given by
$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x) \right)^T.$$
This is a very important vector in optimization. As we will see later, at every point, the gradient vector points in a direction where the function grows most rapidly. (Image credit: [CZ13])
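
To make the definition concrete, here is a Python/NumPy sketch that checks an analytic gradient against the difference quotient from the definition of the partial derivative; the example function and the helper `finite_diff_grad` are made up for illustration:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad_f(x):
    # Analytic gradient: (2 x1 + 3 x2, 3 x1)
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

def finite_diff_grad(f, x, h=1e-6):
    # Approximate each partial derivative with (f(x + h e_i) - f(x)) / h
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x)) / h
    return g

x0 = np.array([1.0, 2.0])
print(grad_f(x0))                 # [8. 3.]
print(finite_diff_grad(f, x0))    # approximately [8. 3.]
```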

Lec2p13, ORF363/COS323

Level sets
For a scalar $c$, the $c$-level set of a function $f : \mathbb{R}^n \to \mathbb{R}$ is defined as
$$\{x \in \mathbb{R}^n \mid f(x) = c\},$$
and the $c$-sublevel set of $f$ is given by
$$\{x \in \mathbb{R}^n \mid f(x) \le c\}.$$
Fact: At any point $x$, the gradient vector $\nabla f(x)$ is orthogonal to the tangent to the level set going through $x$. See page 70 of [CZ13] for a proof.

(Figures in the notes: level sets and gradient vectors of a function; zooming in on the same picture to see orthogonality.)

The Hessian matrix
For a function $f : \mathbb{R}^n \to \mathbb{R}$ that is twice differentiable, the Hessian matrix $\nabla^2 f(x)$ is the $n \times n$ matrix of second derivatives whose $(i, j)$ entry is $\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$.

Remarks:
- If $f$ is twice continuously differentiable, the Hessian matrix is always a symmetric matrix. This is because partial derivatives commute: $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$.
- The [CZ13] book uses the notation $F(x)$ for the Hessian matrix.
- Second derivatives carry information about the "curvature" of the function.

Lec2p14, ORF363/COS323

Practical rules for differentiation

The sum rule
If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ are differentiable, then $D(f + g)(x) = Df(x) + Dg(x)$.

The product rule
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ be two differentiable functions. Define the function $h : \mathbb{R}^n \to \mathbb{R}$ by $h(x) = f(x)^T g(x)$. Then $h$ is also differentiable and
$$Dh(x) = f(x)^T Dg(x) + g(x)^T Df(x).$$

The chain rule
Let $f : (a, b) \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}$. We suppose that $g$ is differentiable on an open set containing the image of $f$ and that $f$ is differentiable on $(a, b)$. Then the composite function $h : (a, b) \to \mathbb{R}$ given by $h(t) = g(f(t))$ is differentiable on $(a, b)$ and
$$h'(t) = \nabla g(f(t))^T f'(t).$$

A special case that comes up a lot
Let $x$ and $d$ be two fixed vectors in $\mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable. Define a univariate function $g(\alpha) = f(x + \alpha d)$. Then
$$g'(\alpha) = d^T \nabla f(x + \alpha d).$$

Gradients and Hessians of affine and quadratic functions
- If $f(x) = c^T x + b$, then $\nabla f(x) = c$ and $\nabla^2 f(x) = 0$.
- If $f(x) = x^T Q x + c^T x + b$ and $Q$ is symmetric, then $\nabla f(x) = 2 Q x + c$ and $\nabla^2 f(x) = 2 Q$.
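
A quick numerical check of the last two formulas (a Python/NumPy sketch with arbitrary Q, c, b of my choosing):

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # symmetric
c = np.array([1.0, -1.0])
b = 3.0

f      = lambda x: x @ Q @ x + c @ x + b
grad_f = lambda x: 2.0 * Q @ x + c          # gradient of a quadratic function
hess_f = 2.0 * Q                            # its (constant) Hessian

# Sanity check of the gradient formula by finite differences at a point
x0 = np.array([0.3, -0.7])
h = 1e-6
fd = np.array([(f(x0 + h * e) - f(x0)) / h for e in np.eye(2)])
print(np.allclose(fd, grad_f(x0), atol=1e-4))   # True
```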

Lec2p15, ORF363/COS323

Taylor expansion

Little o and big O notation
(Fall '16: I am skipping the big O notation and anything that uses it.)
These notions are used to compare the growth rate of two functions near the origin.

Definition
Let $g$ be a function that does not vanish in a neighborhood around the origin, except possibly at the origin. Let $f$ be defined in a domain that includes the origin. Then we write:
- $f(x) = O(g(x))$ (pronounced "$f$ is big Oh of $g$") to mean that the quotient $\frac{|f(x)|}{|g(x)|}$ is bounded near 0; that is, there exist $K > 0$ and $\delta > 0$ such that if $\|x\| < \delta$, $x \ne 0$, then $\frac{|f(x)|}{|g(x)|} \le K$.
- $f(x) = o(g(x))$ (pronounced "$f$ is little oh of $g$") if $\lim_{x \to 0} \frac{|f(x)|}{|g(x)|} = 0$. Intuitively, this means that $f$ goes to zero faster than $g$.

Examples
- $x^3 = O(x^2)$ as $x \to 0$ (can take $K = 1$, $\delta = 1$) (why?)
- $x^3 = o(x^2)$ (why?)
- $x^2 + x^3 = O(x^2)$ but $x^2 + x^3 \ne o(x^2)$ (why?)

Lec2p16, ORF363/COS323

Little o and big O notation

Examples (cont'd)

Remarks
- We gave the definition of little o and big O for comparing growth rates around the origin. One can give similar definitions around any other point. In particular, in many areas of computing, these notations are used to compare growth rates of functions at infinity; i.e., as $x \to \infty$.
- If $f(x) = o(g(x))$, then $f(x) = O(g(x))$, but the converse is not necessarily true.

Lec2p17, ORF363/COS323

Taylor expansion

Taylor expansion in one variable
The idea behind Taylor expansion is to approximate a function around a given point by functions that are "simpler"; in this case by polynomials. As we increase the order of the Taylor expansion, we increase the degree of this polynomial and we reduce the error in our approximation. The little o and big O notation that we just introduced nicely capture how our error of approximation scales around the point we are approximating. Here are two theorems we can state for functions of a single variable:

Version 1
Assume that a function $f : \mathbb{R} \to \mathbb{R}$ is in $C^m$, i.e., $m$ times continuously differentiable (meaning that all derivatives $f', f'', \ldots, f^{(m)}$ exist and are continuous). Consider a point $x_0$ around which we will Taylor expand and define $h = x - x_0$. Then,
$$f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{f''(x_0)}{2!} h^2 + \cdots + \frac{f^{(m)}(x_0)}{m!} h^m + o(h^m).$$

Version 2
Assume that a function $f : \mathbb{R} \to \mathbb{R}$ is in $C^{m+1}$. Consider a point $x_0$ around which we will Taylor expand and define $h = x - x_0$. Then,
$$f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{f''(x_0)}{2!} h^2 + \cdots + \frac{f^{(m)}(x_0)}{m!} h^m + O(h^{m+1}).$$
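
A small Python/NumPy sketch illustrating Version 1 for the (illustrative) choice $f = \cos$ around $x_0 = 0$, whose second order expansion is $1 - h^2/2$:

```python
import numpy as np

x0 = 0.0
h_values = np.array([0.1, 0.01, 0.001])

# Second order Taylor expansion of cos around x0 = 0: cos(h) ~ 1 - h^2 / 2
taylor2 = 1.0 - h_values**2 / 2.0
errors = np.abs(np.cos(x0 + h_values) - taylor2)

# The error is o(h^2) (in fact O(h^4) for cos, since the h^3 term vanishes)
print(errors)
print(errors / h_values**2)   # these ratios go to 0 as h -> 0
```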

Lec2p18, ORF363/COS323

Extension to multiple variable functions
For functions of several variables, we will only care about first and second order Taylor expansions in this class. Here, the concepts of a gradient vector and a Hessian matrix come in to replace the first and second derivatives. We state four different variations of the theorem below. The point we are approximating the function around is denoted by $x_0$.

First order
- If $f$ is $C^1$: $f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + o(\|x - x_0\|)$.
- If $f$ is $C^2$: $f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + O(\|x - x_0\|^2)$.

Second order
- If $f$ is $C^2$: $f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac{1}{2} (x - x_0)^T \nabla^2 f(x_0) (x - x_0) + o(\|x - x_0\|^2)$.
- If $f$ is $C^3$: $f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac{1}{2} (x - x_0)^T \nabla^2 f(x_0) (x - x_0) + O(\|x - x_0\|^3)$.
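
A Python/NumPy sketch of the second-order expansion for an arbitrary example function (the function, its gradient, and its Hessian below are my illustrative choices, not from the notes):

```python
import numpy as np

def f(x):
    return np.exp(x[0]) + x[0] * x[1]**2

def grad_f(x):
    return np.array([np.exp(x[0]) + x[1]**2, 2.0 * x[0] * x[1]])

def hess_f(x):
    return np.array([[np.exp(x[0]), 2.0 * x[1]],
                     [2.0 * x[1],   2.0 * x[0]]])

x0 = np.array([0.0, 1.0])

def taylor2(x):
    # Second order Taylor expansion of f around x0
    d = x - x0
    return f(x0) + grad_f(x0) @ d + 0.5 * d @ hess_f(x0) @ d

for t in [0.1, 0.01]:
    x = x0 + t * np.array([1.0, -1.0])
    err = abs(f(x) - taylor2(x))
    print(t, err, err / np.linalg.norm(x - x0)**2)   # last column -> 0 as t -> 0
```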

Lec2p19, ORF363/COS323

Notes:
- The material here was a summary of the relevant parts of [CZ13] collected in one place for your convenience. The relevant sections for this lecture are chapters 2, 3, and 5, and more specifically sections 2.1; 3.1, 3.2, 3.4; 5.2, 5.3, 5.4, 5.5, 5.6.
- I filled in some more detail in class, with some examples and proofs given here and there.
- Your HW 1 will give you some practice with this material.

References:
- [CZ13] E.K.P. Chong and S.H. Zak. An Introduction to Optimization. Fourth edition. Wiley, 2013.