CO 367. Nonlinear Optimization. Dr. Dmitriy Drusvyatskiy Winter 2014 (1141) University of Waterloo


Contents

1 Lecture 1: Introduction to Nonlinear Optimization
  1.1 General formalism
  1.2 Unconstrained optimization
2 Lecture 4: Introduction to iterative methods for unconstrained optimization
  2.1 Line search method
  2.2 Trust region
3 Lecture 7: Trust region methods (a quick look)
  3.1 Trust region methods

Administrative. Instructor: Dmitriy Drusvyatskiy (MC 4012); TA: Ahmad Abdi. Webpage: learn.uwaterloo.ca. Textbook: The Mathematics of Nonlinear Programming by Peressini, Sullivan, and Uhl. Lecture notes are produced through a scribing system.

1 Lecture 1: Introduction to Nonlinear Optimization

1.1 General formalism

Notation: $\mathbb{R}^n$ is the set of ordered $n$-tuples $(x_1, \dots, x_n)$. The dot product will be denoted by
$$\langle x, y \rangle = x^T y = \sum_{i=1}^n x_i y_i,$$
and the norm on $\mathbb{R}^n$ is
$$\|x\| = \sqrt{\sum_{i=1}^n x_i^2}.$$

The general problem of nonlinear optimization: given $C^2$-smooth functions $f, g_1, \dots, g_m \colon \mathbb{R}^n \to \mathbb{R}$, find a minimizer of
$$\min f(x) \quad \text{s.t.} \quad g_i(x) \le 0 \text{ for } i = 1, \dots, m. \tag{$*$}$$
We call $f$ the objective function, the $g_i$ the constraint functions, and
$$D = \{x : g_i(x) \le 0 \text{ for all } i = 1, \dots, m\}$$
the feasible region. If $f, g_1, \dots, g_m$ are linear, then $(*)$ is called a linear program. This is a huge class of problems.

1.1 Example. Define $g_i(x) = x_i^2 - 1$ and $\hat{g}_i(x) = 1 - x_i^2$ for $i = 1, \dots, n$. Now $g_i(x) \le 0$ and $\hat{g}_i(x) \le 0$ together imply $x_i^2 = 1$, hence $x_i = \pm 1$. So smooth constraints can cut out a purely combinatorial feasible region.

There is a vast number of applications, both in applied math and engineering.

1.2 Definition. Consider $f \colon \mathbb{R}^n \to \mathbb{R}$ and a subset $D \subseteq \mathbb{R}^n$. Then a point $\bar{x}$ in $D$ is
- a global minimizer for $f$ on $D$ if $f(\bar{x}) \le f(x)$ for all $x \in D$;
- a strict global minimizer for $f$ on $D$ if $f(\bar{x}) < f(x)$ for all $x \in D$ with $x \ne \bar{x}$;
- a local minimizer for $f$ on $D$ if there exists $\epsilon > 0$ such that $f(\bar{x}) \le f(x)$ for all $x \in D \cap B_\epsilon(\bar{x})$.
The definition of a strict local minimizer is analogous.

Most algorithms in nonlinear programming are designed specifically to find local minimizers. Passing from local to global requires convexity.

1.3 Remark. One has to be careful with the constraints! This is illustrated by Whitney's theorem: for any closed set $D \subseteq \mathbb{R}^n$, there exists a $C^\infty$-smooth function $g \colon \mathbb{R}^n \to \mathbb{R}$ such that $D = \{x : g(x) = 0\}$.
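To make the formalism concrete, here is a minimal Python sketch, not from the course (the helper `is_feasible` and the toy objective are illustrative), that encodes an instance of $(*)$ as an objective plus a list of constraint functions and tests membership in the feasible region $D$, using the constraints of Example 1.1:

```python
# A minimal sketch of the general formalism (*): an instance is an objective f
# together with constraint functions g_i, and the feasible region is
# D = {x : g_i(x) <= 0 for all i}. All names here are illustrative.

def is_feasible(x, constraints, tol=1e-12):
    """Test membership in D = {x : g_i(x) <= 0 for all i}."""
    return all(g(x) <= tol for g in constraints)

n = 3
f = lambda x: sum(x)  # a toy objective to minimize over D

# Example 1.1: g_i(x) = x_i^2 - 1 and ghat_i(x) = 1 - x_i^2 together force x_i = +-1.
constraints = [lambda x, i=i: x[i] ** 2 - 1 for i in range(n)] + \
              [lambda x, i=i: 1 - x[i] ** 2 for i in range(n)]

print(is_feasible([1.0, -1.0, 1.0], constraints))  # True: every coordinate is +-1
print(is_feasible([0.5, -1.0, 1.0], constraints))  # False: 1 - 0.5**2 = 0.75 > 0
```

Every feasible point has all coordinates equal to $\pm 1$, so the smooth-looking program hides a combinatorial search space.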

So without further restrictions on the constraints, we might as well be optimizing a function over an arbitrary closed set. An easier problem is to design a procedure to check whether a point $\bar{x}$ is a local minimizer. Surprisingly, such a procedure can be used to design minimization algorithms! This brings us to the theme of the course: the interplay between optimality conditions and algorithm design. The constraints are really important; they need to be incorporated into everything, and we will also need conditions on how the constraints interact. Finding global minimizers is hopeless; we need to settle for local minimizers, unless convexity is present, in which case the two notions coincide.

1.2 Unconstrained optimization

The simplest situation is $f \colon \mathbb{R} \to \mathbb{R}$. The following is key.

1.4 Theorem (Taylor). Let $f \colon (a, b) \to \mathbb{R}$ be $C^2$-smooth. Then for any $\bar{x}$ and $x$ in $(a, b)$, there exists $z$ strictly between $\bar{x}$ and $x$ such that
$$f(x) = f(\bar{x}) + f'(\bar{x})(x - \bar{x}) + \frac{f''(z)}{2}(x - \bar{x})^2.$$

Proof. Bonus question on HW 1.

1.5 Corollary (Optimality Conditions I). Let $f \colon (a, b) \to \mathbb{R}$ be $C^2$-smooth. Then the following are true:
1. If $\bar{x}$ is a local minimizer of $f$, then $f'(\bar{x}) = 0$ and $f''(\bar{x}) \ge 0$.
2. If $\bar{x}$ satisfies $f'(\bar{x}) = 0$ and $f''(\bar{x}) > 0$, then $\bar{x}$ is a strict local minimizer.

1.6 Remark. If $f'(\bar{x}) = f''(\bar{x}) = 0$, then one cannot deduce anything about optimality. In some sense this situation is rare.

Proof (of Corollary 1.5).
1. Suppose $\bar{x}$ is a local minimizer.
Case 1: Suppose $f'(\bar{x}) > 0$. Take $x_i \uparrow \bar{x}$. The difference quotients $\frac{f(x_i) - f(\bar{x})}{x_i - \bar{x}}$ converge to $f'(\bar{x}) > 0$, so they are positive for large $i$; since $x_i - \bar{x} < 0$, this forces $f(x_i) < f(\bar{x})$ for large $i$, contradicting local minimality.
Case 2: Suppose $f'(\bar{x}) < 0$; argue similarly but take $x_i \downarrow \bar{x}$.
From this we deduce $f'(\bar{x}) = 0$. Now suppose $f''(\bar{x}) < 0$. There exists $\delta > 0$ such that $f''(x) < 0$ for all $x \in (\bar{x} - \delta, \bar{x} + \delta)$. Thus, for $x \in (\bar{x} - \delta, \bar{x} + \delta)$ with $x \ne \bar{x}$, Taylor gives a $z$ between $\bar{x}$ and $x$ such that
$$f(x) = f(\bar{x}) + \frac{f''(z)}{2}(x - \bar{x})^2 < f(\bar{x}),$$
a contradiction. So $f''(\bar{x}) \ge 0$.
2. Exercise. Follows directly from Taylor and the hypotheses.

For a real-valued function of several variables, the gradient of $f$ at $\bar{x}$ is
$$\nabla f(\bar{x}) = \left( \frac{\partial f}{\partial x_1}(\bar{x}), \dots, \frac{\partial f}{\partial x_n}(\bar{x}) \right)^T,$$
and the Hessian of $f$ at $\bar{x}$ is
$$\nabla^2 f(\bar{x}) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j}(\bar{x}) \right]_{i,j = 1, \dots, n}.$$

1.7 Theorem (Taylor II). Consider a $C^2$-smooth $f \colon U \to \mathbb{R}$ where $U$ is an open subset of $\mathbb{R}^n$. If $\bar{x}$ and $x$ are such that the segment $[\bar{x}, x] = \{\bar{x} + t(x - \bar{x}) : t \in [0, 1]\}$ is contained entirely in $U$, then there exists a point $z \in (\bar{x}, x)$ such that
$$f(x) = f(\bar{x}) + \langle \nabla f(\bar{x}), x - \bar{x} \rangle + \tfrac{1}{2} \langle \nabla^2 f(z)(x - \bar{x}), x - \bar{x} \rangle.$$
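The tests in Corollary 1.5 are easy to experiment with numerically. The following sketch, not from the notes, approximates $f'$ and $f''$ by central differences (the step sizes and tolerance are loose, conventional choices) and applies the second-order test at a candidate point:

```python
# A numerical illustration of Corollary 1.5 (a sketch; central differences with
# loosely chosen step sizes, not from the course materials).

def d1(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)              # approximates f'(x)

def d2(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2    # approximates f''(x)

def classify(f, x, tol=1e-6):
    """Apply the second-order test at a candidate point x."""
    if abs(d1(f, x)) > tol:
        return "not a critical point"
    if d2(f, x) > tol:
        return "strict local minimizer (f' = 0, f'' > 0)"
    if d2(f, x) < -tol:
        return "not a local minimizer (f'' < 0)"
    return "inconclusive (f' = 0, f'' = 0), as in Remark 1.6"

print(classify(lambda x: (x - 1) ** 2 + 3, 1.0))  # strict local minimizer
print(classify(lambda x: x ** 3, 0.0))            # inconclusive
```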

Proof. Choose $\epsilon > 0$ satisfying $\bar{x} + t(x - \bar{x}) \in U$ for all $t \in (-\epsilon, 1 + \epsilon)$. Define
$$\psi(t) = f(\bar{x} + t(x - \bar{x})).$$
A question on HW 1 will ask you to verify that
$$\psi'(t) = \langle \nabla f(\bar{x} + t(x - \bar{x})), x - \bar{x} \rangle, \qquad \psi''(t) = \langle \nabla^2 f(\bar{x} + t(x - \bar{x}))(x - \bar{x}), x - \bar{x} \rangle.$$
Now apply Taylor I to get $s$ such that the Taylor equation holds for $\psi$. Define $z = \bar{x} + s(x - \bar{x})$ and check that this works.

1.8 Definition (positive definite matrices). An $n \times n$ symmetric matrix $A$ is
- positive semidefinite ($A \succeq 0$) if $\langle Ax, x \rangle \ge 0$ for all $x \in \mathbb{R}^n$;
- positive definite ($A \succ 0$) if $\langle Ax, x \rangle > 0$ for all $0 \ne x \in \mathbb{R}^n$.

1.9 Theorem (Multivariate Optimality Conditions). Let $f \colon U \to \mathbb{R}$ be $C^2$-smooth on an open set $U$ in $\mathbb{R}^n$. Then the following are true:
1. If $\bar{x}$ is a local minimizer of $f$, then $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x}) \succeq 0$.
2. If $\bar{x}$ satisfies $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x}) \succ 0$, then $\bar{x}$ is a strict local minimizer of $f$.

Proof. Analogous to the one-dimensional case.

1.11 Definition (critical points). A point $\bar{x} \in U$ is a critical point of $f \colon U \to \mathbb{R}$ if $\nabla f(\bar{x})$ exists and satisfies $\nabla f(\bar{x}) = 0$.

1.12 Remark. Here is a naive recipe for minimizing $f \colon \mathbb{R}^n \to \mathbb{R}$:
1. Find all critical points $\bar{x}$ of $f$.
2. Check if $\nabla^2 f(\bar{x})$ is positive definite.

For this we need to be able to check whether a matrix is positive definite. The definition merely states that $A \succ 0$ iff $\langle Ax, x \rangle > 0$ for all $x \ne 0$; this is not a practical test. How to check in practice? Principal minors, or eigenvalues. Consider a symmetric matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}$$
and let $I$ be a subset of $\{1, \dots, n\}$ (e.g. $I = \{2, 4\}$). Let $A[I]$ be the restriction of $A$ to the rows and columns indexed by $I$.

1.13 Example. If $I = \{1, 3\}$, then $A[I] = \begin{pmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{pmatrix}$.

1.14 Definition. We have:
1. $\det A[I]$ is called a principal minor of $A$.
2. If $I = \{1, \dots, k\}$, then $\det A[I]$ is called a leading principal minor of $A$.
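Definitions 1.13-1.14 translate directly into code. Here is a sketch assuming NumPy; the example matrix is an arbitrary stand-in, since the matrix in the original Example 1.13 did not survive transcription:

```python
# Sketch of Definitions 1.13-1.14 with NumPy: A[I] is the submatrix of A on the
# rows and columns indexed by I, and det A[I] is the corresponding principal minor.
import numpy as np
from itertools import combinations

def principal_minor(A, I):
    I = list(I)
    return np.linalg.det(A[np.ix_(I, I)])    # det A[I]

def all_principal_minors(A):
    n = A.shape[0]
    return {I: principal_minor(A, I)
            for k in range(1, n + 1)
            for I in combinations(range(n), k)}

def leading_principal_minors(A):
    # Corresponds to I = {1, ..., k} in the notes' 1-based indexing.
    return [principal_minor(A, range(k + 1)) for k in range(A.shape[0])]

A = np.array([[2.0, -1.0, 0.0],       # an arbitrary symmetric example matrix
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(principal_minor(A, [0, 2]))     # I = {1, 3} in the notes' notation: det = 4
print(leading_principal_minors(A))    # [2.0, 3.0, 4.0]
```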

1.15 Theorem. We have:
1. $A \succeq 0$ if and only if all of its principal minors are $\ge 0$.
2. $A \succ 0$ if and only if all of its principal minors are $> 0$.
3. $A \succ 0$ if and only if all of its leading principal minors are $> 0$.

1.16 Remark. The analog of 3 for $\succeq 0$ is false! See the remark in the book.

Recall that $0 \ne v \in \mathbb{R}^n$ is an eigenvector of $A$ if there exists $\lambda \in \mathbb{R}$ such that $Av = \lambda v$. The number $\lambda$ is called an eigenvalue.

1.17 Theorem. We have:
1. $A \succeq 0$ iff all of its eigenvalues are $\ge 0$.
2. $A \succ 0$ iff all of its eigenvalues are $> 0$.

1.18 Example. Find the global and local minimizers of
$$f(x, y) = x^3 - 12xy + 8y^3.$$
We compute
$$\nabla f(x, y) = \begin{pmatrix} 3x^2 - 12y \\ -12x + 24y^2 \end{pmatrix}, \qquad \nabla^2 f(x, y) = \begin{pmatrix} 6x & -12 \\ -12 & 48y \end{pmatrix}.$$
Solving $\nabla f(x, y) = (0, 0)$ gives $(x, y) = (0, 0)$ or $(x, y) = (2, 1)$.
Case 1: If $(x, y) = (2, 1)$, then
$$\nabla^2 f(2, 1) = \begin{pmatrix} 12 & -12 \\ -12 & 48 \end{pmatrix}.$$
Observe $12 > 0$; this is the first leading principal minor. Also,
$$\det \begin{pmatrix} 12 & -12 \\ -12 & 48 \end{pmatrix} = 432 > 0 \implies \nabla^2 f(2, 1) \succ 0 \implies (2, 1) \text{ is a strict local minimizer.}$$
Case 2: If $(x, y) = (0, 0)$, then
$$\nabla^2 f(0, 0) = \begin{pmatrix} 0 & -12 \\ -12 & 0 \end{pmatrix}, \qquad \det \nabla^2 f(0, 0) = -144 < 0.$$
Is $(0, 0)$ a local maximizer or a local minimizer? Neither: $f(x, 0) = x^3$ takes values above and below $f(0, 0) = 0$ arbitrarily close to the origin. Moreover, $f(x, 0) = x^3 \to -\infty$, so $f$ has no global minimizer.

Question: When do minimizers of $f \colon \mathbb{R}^n \to \mathbb{R}$ exist?

1.19 Example. $f(x) = e^x$ is bounded below but has no minimizer.

1.20 Theorem (*). If $f \colon \mathbb{R}^n \to \mathbb{R}$ is continuous, then it has a global minimizer on any closed and bounded subset $D \subseteq \mathbb{R}^n$.

Proof. See the bonus question on the HW.

What about on all of $\mathbb{R}^n$?

1.21 Definition. A continuous function $f \colon \mathbb{R}^n \to \mathbb{R}$ is coercive if for any sequence $x_i$ with $\|x_i\| \to \infty$, it must be the case that $f(x_i) \to \infty$.

1.22 Example. We have:
- $f_1(x) = \|x\|^2$ is coercive.
- $f_2(x) = \langle Ax, x \rangle$ for $A \succ 0$ is coercive; see the problem on HW 2.
- $g(x) = x$ is not coercive.
- $h(x) = e^x$ is not coercive.

1.23 Theorem. A coercive function $f \colon \mathbb{R}^n \to \mathbb{R}$ always has a global minimizer.

Proof. Choose $r \in \mathbb{R}$ greater than the infimum of $f$, and consider $L := \{x : f(x) \le r\}$. Then $L$ is nonempty, bounded (because $f$ is coercive), and closed (because $f$ is continuous; note $L = f^{-1}((-\infty, r])$). By Theorem (*), there exists a minimizer of $f$ on $L$, call it $\bar{x}$. For any $x \in L$, we have $f(x) \ge f(\bar{x})$. For any $x \notin L$, we have $f(x) > r \ge f(\bar{x})$. Thus $\bar{x}$ is a global minimizer of $f$.
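Theorems 1.15(3) and 1.17 give two practical positive-definiteness tests. The sketch below, assuming NumPy, applies both to the two Hessians from Example 1.18:

```python
# Checking Example 1.18 numerically (a sketch, assuming NumPy): apply Theorem
# 1.15(3) (leading principal minors) and Theorem 1.17 (eigenvalues) to the two
# Hessians of f(x, y) = x^3 - 12xy + 8y^3.
import numpy as np

def hessian(x, y):
    return np.array([[6.0 * x, -12.0],
                     [-12.0, 48.0 * y]])

def is_pd_minors(A):
    # A > 0 iff every leading principal minor is > 0 (Theorem 1.15(3)).
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

def is_pd_eigen(A):
    # A > 0 iff all eigenvalues are > 0 (Theorem 1.17).
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

H1 = hessian(2.0, 1.0)  # at the critical point (2, 1)
H2 = hessian(0.0, 0.0)  # at the critical point (0, 0)
print(is_pd_minors(H1), is_pd_eigen(H1))  # True True -> strict local minimizer
print(is_pd_minors(H2), is_pd_eigen(H2))  # False False
print(np.linalg.eigvalsh(H2))  # eigenvalues -12, 12: neither local min nor max
```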

2 Lecture 4: Introduction to iterative methods for unconstrained optimization

Recall that we are interested in finding a minimizer of a $C^2$-smooth function $f \colon \mathbb{R}^n \to \mathbb{R}$.

Iterative method: a procedure that produces a sequence $\{x_k\}$ in $\mathbb{R}^n$ that we can expect to converge to a critical point of $f$. There are two fundamental strategies: line search methods and trust region methods.

2.1 Line search method

At each iteration $k$, you choose a direction $0 \ne v_k \in \mathbb{R}^n$ and then choose $\alpha_k \ge 0$ that approximately solves
$$\min_{\alpha > 0} f(x_k + \alpha v_k). \tag{$*$}$$
Then declare $x_{k+1} = x_k + \alpha_k v_k$. Note: finding the exact minimizer of $(*)$ is usually expensive and unnecessary. For these methods, the main points are how to choose a good direction $v_k$ and then a good $\alpha_k$.

2.2 Trust region

In each iteration, we construct (or update) a model of $f$: a simple function $m_k \colon \mathbb{R}^n \to \mathbb{R}$ that approximates $f$ well on a set $\Omega_k$ containing $x_k$. Then we compute the minimizer $\bar{x}$ of
$$\min_x m_k(x) \quad \text{such that} \quad x \in \Omega_k.$$
If $f(\bar{x})$ is close to $m_k(\bar{x})$, then we declare $x_{k+1} := \bar{x}$. If not, we shrink $\Omega_k$ and repeat. Usually $\Omega_k$ is a ball or a box around $x_k$.

The line search and trust region approaches differ in the order in which they choose a direction and a stepsize.

Comparing algorithms:
- Iteration count: the number of iterations needed to get within $\epsilon$ of an optimal solution.
- Cost of each iteration (e.g. the number of matrix-vector multiplications, eigenvalue decompositions, function calls, gradient evaluations).
These two criteria are often opposing. Designing an algorithm requires you to know what information can be gathered about the function. We will assume $f(x_k)$, $\nabla f(x_k)$, and $\nabla^2 f(x_k)$ are available.

Notation: $o(t)$ stands for any function satisfying
$$\lim_{t \to 0} \frac{o(t)}{t} = 0.$$

2.1 Example. $f(t) = t^2$ is $o(t)$; $g(t) = t$ is not $o(t)$.

For $f \colon \mathbb{R} \to \mathbb{R}$ that is $C^2$-smooth:
$$f(\bar{x} + t) = f(\bar{x}) + t f'(\bar{x}) + o(t),$$
$$f(\bar{x} + t) = f(\bar{x}) + t f'(\bar{x}) + \tfrac{1}{2} t^2 f''(\bar{x}) + o(t^2).$$
Multivariate version: for $f \colon \mathbb{R}^n \to \mathbb{R}$ that is $C^2$-smooth:
$$f(\bar{x} + tv) = f(\bar{x}) + t \langle \nabla f(\bar{x}), v \rangle + o(t),$$
$$f(\bar{x} + tv) = f(\bar{x}) + t \langle \nabla f(\bar{x}), v \rangle + \tfrac{1}{2} t^2 \langle \nabla^2 f(\bar{x}) v, v \rangle + o(t^2).$$

Line search methods in detail: there are two things to understand: (1) the direction $v_k$, and (2) the step size $\alpha_k$. We would like to ensure
$$f(x_k) > f(x_{k+1}) > f(x_{k+2}) > \dots$$
That means $v_k$ needs to define a direction of decrease.
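The notes leave the choice of $\alpha_k$ open at this point. As one standard concrete instance (a sketch; the constant $c = 10^{-4}$ and the halving factor are conventional defaults, not from the lecture), here is backtracking along the steepest descent direction $v_k = -\nabla f(x_k)$, anticipating Example 2.3 below: shrink $\alpha$ until a sufficient-decrease condition holds, which guarantees $f(x_{k+1}) < f(x_k)$.

```python
# A minimal line search iteration (a sketch): v_k = -grad f(x_k), with alpha_k
# found by backtracking until the sufficient-decrease (Armijo) condition
# f(x + alpha v) <= f(x) + c * alpha * <grad f(x), v> holds. The constants
# c = 1e-4 and the halving factor 0.5 are conventional, not from the notes.
import numpy as np

def backtracking_step(f, grad, x, c=1e-4, shrink=0.5, alpha0=1.0):
    g = grad(x)
    v = -g                      # steepest descent direction
    alpha = alpha0
    while f(x + alpha * v) > f(x) + c * alpha * np.dot(g, v):
        alpha *= shrink         # approximately solve min_{alpha>0} f(x_k + alpha v_k)
    return x + alpha * v

f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([1.0, 1.0])
for k in range(50):
    x = backtracking_step(f, grad, x)   # f(x_k) decreases at every iteration
print(x, f(x))                          # close to the minimizer (0, 0)
```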

2.2 Theorem. For any $v$ satisfying $\langle v, \nabla f(\bar{x}) \rangle < 0$, there exists $\delta > 0$ such that $f(\bar{x} + tv) < f(\bar{x})$ for all $t \in (0, \delta)$.

Proof. Choose $v \in \mathbb{R}^n$ such that $\langle v, \nabla f(\bar{x}) \rangle < 0$. Then $f(\bar{x} + tv) = f(\bar{x}) + t \langle \nabla f(\bar{x}), v \rangle + o(t)$, so
$$\frac{f(\bar{x} + tv) - f(\bar{x})}{t} = \langle \nabla f(\bar{x}), v \rangle + \frac{o(t)}{t} < 0 \quad \text{for all small } t > 0,$$
and hence $f(\bar{x} + tv) < f(\bar{x})$ for all small $t > 0$.

2.3 Example. $v = -\nabla f(\bar{x})$ works. In fact,
$$\frac{-\nabla f(\bar{x})}{\|\nabla f(\bar{x})\|}$$
is the unique minimizer of $\min \langle v, \nabla f(\bar{x}) \rangle$ subject to $\|v\| = 1$.

Proof. For any $v$ with $\|v\| = 1$, we have by the Cauchy-Schwarz inequality
$$\langle v, \nabla f(\bar{x}) \rangle \ge -\|v\| \|\nabla f(\bar{x})\| = -\|\nabla f(\bar{x})\|.$$
But
$$\left\langle \frac{-\nabla f(\bar{x})}{\|\nabla f(\bar{x})\|}, \nabla f(\bar{x}) \right\rangle = -\|\nabla f(\bar{x})\|,$$
and equality holds in Cauchy-Schwarz only for parallel vectors. This gives the result.

3 Lecture 7: Trust region methods (a quick look)

Last time:

3.1 Theorem (Convergence of Newton's Method). Suppose $f \colon \mathbb{R}^n \to \mathbb{R}$ is $C^2$-smooth, $\nabla^2 f(x^*)$ is positive definite, and $\nabla f(x^*) = 0$ (plus a minor technical condition). Consider the iterates $x_{k+1} = x_k + t_k v_N$, where $v_N = -[\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$ and the $t_k$ are chosen to satisfy the Wolfe conditions (with $c_1 < 1/2$). Then if the starting point $x_0$ is sufficiently close to $x^*$, we have:
1. $t_k = 1$ satisfies the Wolfe conditions;
2. $x_k$ converges to $x^*$;
3. if we choose $t_k = 1$, then we have quadratic convergence:
$$\|x_{k+1} - x^*\| \le r \|x_k - x^*\|^2, \qquad \|\nabla f(x_{k+1})\| \le r \|\nabla f(x_k)\|^2$$
for some $r \ge 0$.

In practice, to get global convergence of Newton's method, in each iteration $k$ you consider $\nabla^2 f(x_k)$. Remember
$$v_N = -[\nabla^2 f_k]^{-1} \nabla f_k \quad \text{when } \nabla^2 f_k \succ 0.$$
If $\nabla^2 f_k$ is not positive definite, then replace $\nabla^2 f_k$ by a close positive definite matrix in the formula for $v_k$. Two approaches (see the sketch below):
1. Set $v_k = -(\nabla^2 f_k + \lambda I)^{-1} \nabla f_k$ for $\lambda$ large. One choice is $\lambda = \delta - \lambda_1(\nabla^2 f_k)$, where $\lambda_1$ denotes the minimum eigenvalue and $\delta > 0$, so that $\nabla^2 f_k + \lambda I$ has minimum eigenvalue $\delta$.
2. Diagonalize
$$\nabla^2 f_k = U \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} U^T,$$
set all negative eigenvalues to $\delta > 0$, and multiply back by $U$.
Then run a line search to get $t_k$. If $\nabla^2 f_k \succ 0$, first check whether $t_k = 1$ is acceptable; otherwise run the line search.
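Here is a small sketch of the two Hessian fixes just described, assuming NumPy (the floor $\delta = 10^{-3}$ is an arbitrary illustrative choice). Both produce a positive definite matrix, hence a descent direction, even where the true Hessian is indefinite:

```python
# Sketch of the two Hessian fixes above (assuming NumPy; the floor delta = 1e-3
# is an arbitrary illustrative choice).
import numpy as np

def newton_direction_shift(H, g, delta=1e-3):
    # Approach 1: v = -(H + lambda I)^{-1} g with lambda = delta - lambda_min(H)
    # whenever H is not sufficiently positive definite.
    lam_min = np.linalg.eigvalsh(H)[0]          # eigvalsh returns ascending order
    lam = max(0.0, delta - lam_min)
    return -np.linalg.solve(H + lam * np.eye(H.shape[0]), g)

def newton_direction_clip(H, g, delta=1e-3):
    # Approach 2: diagonalize H = U diag(lambda_i) U^T and raise every
    # eigenvalue below delta up to delta before inverting.
    lams, U = np.linalg.eigh(H)
    H_pd = U @ np.diag(np.maximum(lams, delta)) @ U.T
    return -np.linalg.solve(H_pd, g)

H = np.array([[0.0, -12.0], [-12.0, 0.0]])  # the indefinite Hessian from Example 1.18
g = np.array([1.0, 1.0])
v1 = newton_direction_shift(H, g)
v2 = newton_direction_clip(H, g)
print(v1, v1 @ g)   # <v, grad f> < 0: a descent direction
print(v2, v2 @ g)
```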

3.1 Trust region methods

Given $f \colon \mathbb{R}^n \to \mathbb{R}$ and an iterate $x_k$, approximate $f$ by a local model function
$$m_k(v) = f(x_k) + \langle \nabla f(x_k), v \rangle + \tfrac{1}{2} \langle B_k v, v \rangle.$$
Then set $v_k$ to be the minimizer of
$$\min_v m_k(v) \quad \text{subject to} \quad \|v\| \le \Delta_k,$$
where $\Delta_k \ge 0$ is the trust region radius. Set $x_{k+1} = x_k + v_k$, then adjust $\Delta_k$ to get $\Delta_{k+1}$.

The key quantity (actual decrease over predicted decrease) is
$$\rho_k := \frac{f(x_k) - f(x_k + v_k)}{m_k(0) - m_k(v_k)}.$$

Algorithm (trust region). Given $\hat{\Delta} > 0$, $\Delta_0 \in (0, \hat{\Delta})$ and $\eta \in [0, 1/4)$:

    for k = 0, 1, 2, ...
        Obtain v_k by APPROXIMATELY solving min m_k(v) s.t. ||v|| <= Delta_k
        Evaluate rho_k
        if rho_k < 1/4
            Delta_{k+1} = (1/4) * Delta_k
        else if rho_k > 3/4
            Delta_{k+1} = min(2 * Delta_k, Delta_hat)
        else
            Delta_{k+1} = Delta_k
        if rho_k > eta
            x_{k+1} = x_k + v_k
        else
            x_{k+1} = x_k
    endfor

The art here is how to approximately solve the trust region subproblem
$$\min_{\|v\| \le \Delta_k} m_k(v).$$

[Here is the basic idea. Just as in the line search method, to define "approximate" we need some kind of baseline; there the baseline was the derivative. Here, too, we need a baseline: a quickly computable point that does reasonably well on the subproblem, so that any method doing at least as well is guaranteed to behave well. How can you find an OK solution, not the true minimizer, but a quick and dirty one? Steepest descent is the simplest thing available: take the steepest descent direction for $f$ and minimize $m_k$ along that direction, subject to the constraint $\|v\| \le \Delta_k$. That is not hard, and it gives a baseline called the Cauchy point. As long as your method achieves a fixed fraction of the improvement of the Cauchy point, it will do well. There is a 1000-page book just on trust region methods.]

Now we are going to go back to the second chapter, on convexity; read that too. However, we will not follow the book; we will follow the lecture notes of Stephen Boyd. This means that for a week or two nobody has to scribe anything. [The professor posts all lecture notes online, so these notes have been discontinued.]
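As a concrete rendering of the algorithm above, here is a runnable sketch assuming NumPy, with $B_k = \nabla^2 f(x_k)$ and the subproblem solved approximately at the Cauchy point from the bracketed aside: minimize $m_k$ along $-\nabla f(x_k)$ subject to $\|v\| \le \Delta_k$. The parameter values follow the pseudocode; the test function is from Example 1.18.

```python
# A runnable sketch of the trust region algorithm above (assuming NumPy), with
# B_k the exact Hessian and the subproblem solved approximately at the Cauchy
# point: the minimizer of m_k along -grad f(x_k) subject to ||v|| <= Delta_k.
import numpy as np

def cauchy_point(g, B, Delta):
    gBg = g @ B @ g
    ng = np.linalg.norm(g)
    # Minimize m_k(-tau * g / ||g||) over tau in [0, Delta].
    tau = Delta if gBg <= 0 else min(Delta, ng ** 3 / gBg)
    return -(tau / ng) * g

def trust_region(f, grad, hess, x0, Delta_hat=2.0, Delta0=0.5, eta=0.1, iters=100):
    x, Delta = np.asarray(x0, dtype=float), Delta0
    for _ in range(iters):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < 1e-10:
            break
        v = cauchy_point(g, B, Delta)
        m_drop = -(g @ v + 0.5 * v @ B @ v)   # m_k(0) - m_k(v_k) > 0
        rho = (f(x) - f(x + v)) / m_drop      # actual over predicted decrease
        if rho < 0.25:
            Delta *= 0.25
        elif rho > 0.75:
            Delta = min(2 * Delta, Delta_hat)
        if rho > eta:                         # accept the step
            x = x + v
    return x

# Example 1.18: f(x, y) = x^3 - 12xy + 8y^3, started near the minimizer (2, 1).
f = lambda x: x[0] ** 3 - 12 * x[0] * x[1] + 8 * x[1] ** 3
grad = lambda x: np.array([3 * x[0] ** 2 - 12 * x[1], -12 * x[0] + 24 * x[1] ** 2])
hess = lambda x: np.array([[6.0 * x[0], -12.0], [-12.0, 48.0 * x[1]]])
print(trust_region(f, grad, hess, [1.5, 1.5]))  # approximately (2, 1)
```

Since every accepted step here achieves the Cauchy decrease, any refinement of the subproblem solver that does at least as well inherits the same behavior, which is exactly the baseline idea from the aside.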
