Heavy ball method on the convex quadratic problem $\min_x \tfrac12 x^\top A x - b^\top x$


1 Heavy ball method on the convex quadratic problem $\min_x \tfrac12 x^\top A x - b^\top x$: a case study

Andersen Ang
Mathématique et recherche opérationnelle, UMONS, Belgium
manshun.ang@umons.ac.be
First draft: June 26, 2018. Last update: July 28, 2019.
Homepage: angms.science

2 Overview

1. Convex Quadratic Problem $\min_x \tfrac12 x^\top A x - b^\top x$
2. Gradient Descent and convergence rate
3. Polyak's Heavy Ball Method
4. Convergence of Heavy Ball Method
5. Summary

3 An inverse problem / unconstrained optimization problem

Given $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^{n \times 1}$, find $x \in \mathbb{R}^{n \times 1}$ by solving

$$(P_0): \ \min_x \tfrac12 \|Ax - b\|_2^2.$$

$(P_0)$ is equivalent to the quadratic problem $\min_x f(x) = \tfrac12 x^\top \tilde{A} x - \tilde{b}^\top x$:

$$
\begin{aligned}
\tfrac12 \|Ax - b\|_2^2
&= \tfrac12 (Ax - b)^\top (Ax - b) && \text{(expand)} \\
&= \tfrac12 \big( x^\top A^\top A x - x^\top A^\top b - b^\top A x + b^\top b \big) && (a^\top b = b^\top a) \\
&= \tfrac12 \big( x^\top A^\top A x - 2 b^\top A x + b^\top b \big) && (\tilde{A} := A^\top A, \ \tilde{b} := A^\top b) \\
&= \tfrac12 x^\top \tilde{A} x - \tilde{b}^\top x + \tfrac12 \|b\|_2^2.
\end{aligned}
$$

Ignoring the constant $\tfrac12 \|b\|_2^2$, and renaming $\tilde{A}$ as $A$ and $\tilde{b}$ as $b$, we focus on the equivalent problem

$$(P): \ \min_x f(x) = \tfrac12 x^\top A x - b^\top x.$$

4 The convex quadratic problem

$$(P): \ \min_x f(x) = \tfrac12 x^\top A x - b^\top x.$$

Properties of $f$:
- $f$ is convex with respect to (w.r.t.) $x$
- $f$ is differentiable w.r.t. $x$
- First-order derivative (gradient): $\nabla_x f(x) = Ax - b$
- Second-order derivative (Hessian): $\nabla_x^2 f(x) = A$

Assumption 1: $A$ is symmetric and positive definite.
Consequences of the assumption:
- all eigenvalues of $A$ are positive
- $A$ is nonsingular, so the optimal solution $x^*$ exists, and $x^* = A^{-1} b$

We can further assume $lI \preceq A \preceq LI$, i.e., the eigenvalues of $A$ lie in $[l, L]$ with $0 < l \le L$. A small numerical setup is sketched below.
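To make the setting concrete, here is a minimal NumPy sketch (my own addition, not from the original slides) that builds a random symmetric positive definite $A$, the gradient, and the closed-form solution. All variable names are hypothetical and are reused by the later snippets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)            # symmetric positive definite by construction
b = rng.standard_normal(n)

def grad_f(x):
    """Gradient of f(x) = 0.5 * x'Ax - b'x."""
    return A @ x - b

x_star = np.linalg.solve(A, b)     # optimal solution x* = A^{-1} b

eigvals = np.linalg.eigvalsh(A)    # eigenvalues in ascending order
l, L = eigvals[0], eigvals[-1]     # lI <= A <= LI
kappa = L / l                      # condition number of A
```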

5 Gradient descent

GD solves $(P): \min_x f(x) = \tfrac12 x^\top A x - b^\top x$ by generating the sequence $\{x_k\}_{k \in \mathbb{N}}$:

$$x_{k+1} = x_k - t_k \nabla_x f(x_k),$$

where $k = 1, 2, \dots$ is the iteration counter and $t_k$ is the step size.

The sequence $x_k$ converges to $x^*$ at a linear rate (in the optimization sense). The convergence is shown by proving that the distance $\|x_k - x^*\|$ decreases monotonically as $k$ increases, under a suitable step size $t_k$.

Theorem (GD converges at a linear rate). Consider the problem $\min_x \tfrac12 x^\top A x - b^\top x$ with $A$ positive definite and $lI \preceq A \preceq LI$. Then

$$\|x_k - x^*\|_2 \le \Big( \frac{\kappa - 1}{\kappa + 1} \Big)^k \|x_0 - x^*\|_2,$$

where $\kappa = L/l$ is the condition number of $A$.
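A direct sketch of this iteration, reusing the hypothetical names from the setup snippet above (grad_f, x_star, n, L, l, kappa); the step size $t = 2/(L+l)$ anticipates the choice made in the proof two slides below.

```python
def gradient_descent(x0, t, iters):
    """Run x_{k+1} = x_k - t * grad f(x_k); record distances to x*."""
    x, dists = x0.copy(), []
    for _ in range(iters):
        x = x - t * grad_f(x)
        dists.append(np.linalg.norm(x - x_star))
    return x, dists

x0 = np.zeros(n)
_, gd_dists = gradient_descent(x0, t=2.0 / (L + l), iters=100)
gd_rate = (kappa - 1) / (kappa + 1)   # theoretical contraction factor per step
```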

6 Useful material before the proof

As $x^* = A^{-1} b$, we have
$$b = A x^*. \tag{1}$$
As $\nabla_x f(x) = A x - b$, we have
$$x_k - t_k \nabla_x f(x_k) = x_k - t_k (A x_k - b). \tag{2}$$
Putting (1) into (2), we have
$$x_k - t_k \nabla_x f(x_k) = x_k - t_k (A x_k - A x^*) = (I - t_k A) x_k + t_k A x^*. \tag{3}$$
With these we can now prove the convergence, starting with the distance $\|x_{k+1} - x^*\|_2$.

7 Convergence rate of Gradient Descent in 1 slide

$$
\begin{aligned}
\|x_{k+1} - x^*\|_2
&= \|x_k - t_k \nabla_x f(x_k) - x^*\|_2 \\
&= \|(I - t_k A) x_k + t_k A x^* - x^*\|_2 && \text{by (3)} \\
&= \|(I - t_k A)(x_k - x^*)\|_2 \\
&\le \|I - t_k A\|_2 \, \|x_k - x^*\|_2 \\
&\le (1 - t_k l) \, \|x_k - x^*\|_2 \\
&\le (1 - t l)^k \, \|x_0 - x^*\|_2 \\
&= \Big( \frac{L - l}{L + l} \Big)^k \|x_0 - x^*\|_2 \\
&= \Big( \frac{\kappa - 1}{\kappa + 1} \Big)^k \|x_0 - x^*\|_2
\end{aligned}
$$

- 1st line: by the definition of GD, $x_{k+1} = x_k - t_k \nabla_x f(x_k)$
- 4th line: by the operator norm inequality $\|Ax\|_2 \le \|A\|_2 \|x\|_2$
- 5th line: by $lI \preceq A \preceq LI$, so $(1 - t_k L) I \preceq I - t_k A \preceq (1 - t_k l) I$; with $t_k \le 2/(L+l)$ the factor $1 - t_k l$ dominates $|1 - t_k L|$
- 6th line: a constant step size $t_k = t$ is used
- 7th line: pick $t = \dfrac{2}{L + l}$, so $1 - t l = \dfrac{L - l}{L + l}$
- 8th line: $\kappa = L/l \ge 1$ is the condition number of $A$
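A quick numerical sanity check (my own addition) that with $t = 2/(L+l)$ the contraction factor $\|I - tA\|_2$ equals $(L-l)/(L+l) = (\kappa-1)/(\kappa+1)$, reusing the hypothetical setup names:

```python
t = 2.0 / (L + l)
contraction = np.linalg.norm(np.eye(n) - t * A, 2)   # operator 2-norm
assert np.isclose(contraction, (L - l) / (L + l))    # = (kappa - 1)/(kappa + 1)
```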

8 Polyak's Heavy Ball Method (HBM)

HBM adds a momentum term to GD:

$$x_{k+1} = x_k - \alpha_k \nabla_x f(x_k) + \underbrace{\beta_k (x_k - x_{k-1})}_{\text{HBM momentum}},$$

i.e., gradient descent with momentum $\beta_k (x_k - x_{k-1})$.

- $\beta_k \ge 0$ is the momentum parameter / extrapolation parameter
- $\alpha_k$ acts as the step size $t_k$ in GD
- When $\beta_k = 0$, HBM reduces to GD
- As the update direction is perturbed by the momentum, HBM is not monotone: the objective function value may increase
- Nevertheless, overall HBM converges faster than GD (we will prove this soon); see the sketch below
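A minimal sketch of the HBM iteration, again assuming the hypothetical setup names (grad_f, x_star); the choices of $\alpha$ and $\beta$ are discussed later.

```python
def heavy_ball(x0, alpha, beta, iters):
    """Run x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1})."""
    x_prev, x, dists = x0.copy(), x0.copy(), []
    for _ in range(iters):
        # RHS uses the old x before the simultaneous assignment
        x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x
        dists.append(np.linalg.norm(x - x_star))
    return x, dists
```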

9 Comparing GD, HBM and Nesterov's acceleration

Compared to HBM, Nesterov's accelerated gradient computes the gradient after applying the momentum:

$$x_{k+1} = x_k - \alpha_k \nabla_x f\big( x_k + \beta_k (x_k - x_{k-1}) \big) + \beta_k (x_k - x_{k-1}).$$

Consider the following notations:
$$x^+ = x - t \nabla f(x), \qquad a_k = \beta_k (x_k - x_{k-1}).$$
Then:

- Cauchy Gradient Descent: $x_{k+1} = x_k^+$
- Polyak HBM: $x_{k+1} = x_k^+ + a_k$
- Nesterov acceleration: $x_{k+1} = (x_k + a_k)^+$

Open question: maybe $x_{k+1} = (x_k + a_k)^+ + b_k$, or $x_{k+1} = \big( (x_k + a_k)^+ + b_k \big)^+$, ...?
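For contrast, a sketch of the Nesterov update $x_{k+1} = (x_k + a_k)^+$ in the same hypothetical setup; the only difference from HBM is that the gradient is evaluated at the extrapolated point.

```python
def nesterov(x0, alpha, beta, iters):
    """Run x_{k+1} = y - alpha * grad f(y) with y = x_k + beta * (x_k - x_{k-1})."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)            # apply the momentum first ...
        x, x_prev = y - alpha * grad_f(y), x   # ... then take a gradient step at y
    return x
```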

10 Convergence of Heavy Ball Method

Consider $x_{k+1} - x^*$. By the definition of the HBM update:
$$x_{k+1} - x^* = x_k - \alpha_k \nabla_x f(x_k) + \beta_k (x_k - x_{k-1}) - x^*.$$
As $\nabla_x f(x_k) = A x_k - b$ and $b = A x^*$, we have $\nabla_x f(x_k) = A x_k - A x^*$, so
$$
\begin{aligned}
x_{k+1} - x^* &= x_k - \alpha_k (A x_k - A x^*) + \beta_k (x_k - x_{k-1}) - x^* \\
&= x_k - x^* - \alpha_k A (x_k - x^*) + \beta_k (x_k - x_{k-1}) \\
&= (I - \alpha_k A)(x_k - x^*) + \beta_k (x_k - x_{k-1} - x^* + x^*) \\
&= (I - \alpha_k A)(x_k - x^*) - \beta_k (x_{k-1} - x^*) + \beta_k (x_k - x^*) \\
&= \big( (1 + \beta_k) I - \alpha_k A \big)(x_k - x^*) - \beta_k (x_{k-1} - x^*).
\end{aligned}
$$
In this sense, we have to consider $x_k - x^*$ and $x_{k-1} - x^*$ at the same time:
$$
\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix}
=
\underbrace{\begin{bmatrix} (1 + \beta_k) I - \alpha_k A & -\beta_k I \\ I & 0 \end{bmatrix}}_{T_k(\alpha, \beta)}
\begin{bmatrix} x_k - x^* \\ x_{k-1} - x^* \end{bmatrix},
$$
where $T_k(\alpha, \beta)$ is the transition matrix.

11 Convergence of Heavy Ball Method - Transition matrix T

Compact expression:
$$\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} = T_k(\alpha, \beta) \begin{bmatrix} x_k - x^* \\ x_{k-1} - x^* \end{bmatrix}.$$
Take constant $\alpha_k = \alpha$ and $\beta_k = \beta$ in $T_k$, so $T_k = T$ and
$$\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} = T^k \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix}.$$
Take norms:
$$\left\| \begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} \right\|_2 = \left\| T^k \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|_2 \le \|T^k\|_2 \left\| \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|_2.$$
So if $\|T^k\|_2$ tends to zero, the sequence $x_k$ produced by HBM converges to $x^*$. A numerical sketch of $T$ follows below.
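To make $T$ concrete, a small sketch (my own, reusing the hypothetical setup names) that assembles $T(\alpha, \beta)$ and computes its spectral radius:

```python
def transition_matrix(alpha, beta):
    """T(alpha, beta) = [[(1+beta)I - alpha*A, -beta*I], [I, 0]]."""
    I = np.eye(n)
    top = np.hstack([(1 + beta) * I - alpha * A, -beta * I])
    bottom = np.hstack([I, np.zeros((n, n))])
    return np.vstack([top, bottom])

T = transition_matrix(alpha=1.0 / L, beta=0.5)   # placeholder parameter values
rho = np.max(np.abs(np.linalg.eigvals(T)))       # spectral radius of T
```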

12 Tools for bounding $\|T^k\|_2$

Recall:
- The spectrum (all eigenvalues) of a block diagonal matrix is the union of the eigenvalues of the diagonal blocks.
- The spectrum of a matrix consists of the roots of its characteristic equation.
- For a 2-by-2 matrix, the characteristic equation has the form $au^2 + bu + c = 0$, with roots $u = \frac{1}{2a}\big( -b \pm \sqrt{b^2 - 4ac} \big)$. The roots are complex conjugates iff $\Delta = b^2 - 4ac \le 0$.
- Complex-conjugate roots $a \pm ib$ share the same magnitude $\sqrt{a^2 + b^2}$.

We need two lemmas.

Lemma 1. For an $n \times n$ matrix $T$, there exists a sequence $\varepsilon_k \ge 0$ with $\varepsilon_k \to 0$ such that $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$.

Lemma 2. For $\beta \ge (1 - \sqrt{\alpha l})^2$, $\rho(T) \le \sqrt{\beta}$.

Here $\rho(T) = \max\{|\lambda_1|, |\lambda_2|, \dots, |\lambda_n|\}$ is the spectral radius of the matrix $T$, and the $\lambda_i$ are the eigenvalues of $T$.
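Lemma 1 is essentially Gelfand's formula $\|T^k\|^{1/k} \to \rho(T)$; a quick numerical illustration (my own addition), using T and rho from the previous snippet:

```python
# Gelfand's formula: ||T^k||^(1/k) -> rho(T) as k grows, which is what Lemma 1 encodes.
for k in (1, 10, 100, 1000):
    Tk = np.linalg.matrix_power(T, k)
    print(k, np.linalg.norm(Tk, 2) ** (1.0 / k), rho)
```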

13 The logic flow of bounding $\|T^k\|_2$

- Ultimate goal: show that the $x_k$ produced by HBM converge to $x^*$.
- The $x_k$ converge to $x^*$ if $\|T^k\|_2$ is suitably bounded.
- We use Lemma 1 to bound $\|T^k\|$. To use Lemma 1, we need $\rho(T)$, for which we use Lemma 2.
- We will not prove Lemma 1, only Lemma 2.

Lemma 1. For an $n \times n$ matrix $T$, there exists a sequence $\varepsilon_k \ge 0$ with $\lim_{k \to \infty} \varepsilon_k = 0$ such that
$$\|T^k\| \le (\rho(T) + \varepsilon_k)^k.$$

Proof. Skipped (too long).

14 The logic of bounding $\|T^k\|_2$

Lemma 2. For $\beta \ge (1 - \sqrt{\alpha l})^2$, $\rho(T) \le \sqrt{\beta}$.

Flow of proving Lemma 2:
- First show that $T$ can be decomposed into 2-by-2 blocks $T_i$.
- Then the spectrum of $T$ is the union of the eigenvalues of the $T_i$.
- As $\rho(T)$ concerns the magnitudes of eigenvalues, we consider the magnitudes of the eigenvalues of $T_i$.
- $T_i$ is a 2-by-2 matrix, so its eigenvalues are the roots of a characteristic equation of the form $u^2 + bu + c = 0$.
- The roots of $u^2 + bu + c = 0$ are complex conjugates, and hence share the same magnitude, iff $b^2 - 4c \le 0$.

15 Proving Lemma 2 - eigendecomposition

Lemma 2. For $\beta \ge (1 - \sqrt{\alpha l})^2$, $\rho(T) \le \sqrt{\beta}$.

Proof. First assume $\beta \ge (1 - \sqrt{\alpha l})^2$. As $A$ is positive definite, $A$ has the eigendecomposition $A = V \Lambda V^\top$ with $V$ orthogonal and $\Lambda$ diagonal. Then
$$T = \begin{bmatrix} (1 + \beta) I - \alpha A & -\beta I \\ I & 0 \end{bmatrix} = \begin{bmatrix} (1 + \beta) I - \alpha V \Lambda V^\top & -\beta I \\ I & 0 \end{bmatrix}.$$
As $\Lambda$ is diagonal and $V$ is orthogonal, conjugating by $\mathrm{blkdiag}(V, V)$ changes neither the spectrum nor the 2-norm, so without loss of generality we may take
$$T = \begin{bmatrix} (1 + \beta) I - \alpha \Lambda & -\beta I \\ I & 0 \end{bmatrix}.$$

16 Proving Lemma 2 - block decomposition of T

Note that $T$ is now built from diagonal blocks, so it can be decomposed. E.g., when $n = 2$, writing $x_i = 1 + \beta - \alpha \lambda_i$,
$$T = \begin{bmatrix} x_1 & 0 & -\beta & 0 \\ 0 & x_2 & 0 & -\beta \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$
As the spectrum and the norm are invariant under a simultaneous row/column permutation, swap row 3 with row 2 and column 3 with column 2:
$$\begin{bmatrix} x_1 & -\beta & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & x_2 & -\beta \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$
We can see that $T$ can be decomposed into a block diagonal matrix consisting of the $2 \times 2$ components $T_i$:
$$T = T_1 \oplus T_2 \oplus \cdots \oplus T_n, \qquad T_i = \begin{bmatrix} 1 + \beta - \alpha \lambda_i & -\beta \\ 1 & 0 \end{bmatrix}.$$

17 Proving Lemma 2 - block decomposition of T

(Figures omitted: the structure of $T$ in the general case, before and after the permutation.)

Hence, the spectrum of $T$ is the set of all the eigenvalues of all the blocks $T_i$.

18 Proving Lemma 2 - spectrum of T = eigenvalues of the $T_i$

$T_i$ is 2-by-2, so we find its eigenvalues via the roots of the characteristic equation, i.e., we solve $\det(T_i - uI) = 0$:
$$\det \begin{bmatrix} 1 + \beta - \alpha \lambda_i - u & -\beta \\ 1 & -u \end{bmatrix} = 0 \;\Longleftrightarrow\; u^2 - (1 + \beta - \alpha \lambda_i) u + \beta = 0,$$
$$u = \tfrac12 \Big( 1 + \beta - \alpha \lambda_i \pm \sqrt{(1 + \beta - \alpha \lambda_i)^2 - 4\beta} \Big).$$
The magnitudes of the two roots are the same iff the roots are complex. Let $\Delta = (1 + \beta - \alpha \lambda_i)^2 - 4\beta$; then $\sqrt{\Delta}$ is imaginary iff $\Delta \le 0$:
$$
\begin{aligned}
(1 + \beta - \alpha \lambda_i)^2 &\le 4\beta \\
1 + \beta - \alpha \lambda_i &\le 2\sqrt{\beta} \\
1 - 2\sqrt{\beta} + \beta &\le \alpha \lambda_i \\
(1 - \sqrt{\beta})^2 &\le \alpha \lambda_i \\
1 - \sqrt{\beta} &\le \sqrt{\alpha \lambda_i} \\
1 - \sqrt{\alpha \lambda_i} &\le \sqrt{\beta} \\
\beta &\ge (1 - \sqrt{\alpha \lambda_i})^2.
\end{aligned}
$$
(Here we use $0 \le \beta \le 1$; the other branch of $\Delta \le 0$, namely $-2\sqrt{\beta} \le 1 + \beta - \alpha \lambda_i$, i.e. $\sqrt{\alpha \lambda_i} \le 1 + \sqrt{\beta}$, holds under the parameter choice made on the convergence slide.)
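A tiny numerical check (my own addition) that when $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$ the two roots of $u^2 - (1 + \beta - \alpha \lambda_i)u + \beta = 0$ are complex conjugates of magnitude $\sqrt{\beta}$; the values of alpha and lam here are arbitrary placeholders.

```python
alpha, lam = 0.1, 4.0                                  # arbitrary placeholder values
beta = (1 - np.sqrt(alpha * lam)) ** 2 + 0.05          # satisfies beta >= (1 - sqrt(alpha*lam))^2
roots = np.roots([1.0, -(1 + beta - alpha * lam), beta])
assert np.allclose(np.abs(roots), np.sqrt(beta))       # both roots have magnitude sqrt(beta)
```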

19 Proving Lemma 2 - complex roots

Note: $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$ is automatically satisfied for every $i$, due to the assumptions $\beta \ge (1 - \sqrt{\alpha l})^2$ and $A \preceq LI$: for $\sqrt{\alpha \lambda_i} \le 1$ it follows from $\lambda_i \ge l$, and for $\sqrt{\alpha \lambda_i} > 1$ it follows from $\sqrt{\alpha \lambda_i} \le \sqrt{\alpha L} \le 1 + \sqrt{\beta}$, which the parameter choice on the convergence slide guarantees.

Hence $\Delta = (1 + \beta - \alpha \lambda_i)^2 - 4\beta \le 0$, so $\sqrt{\Delta}$ is imaginary and the roots
$$u = \tfrac12 \Big( 1 + \beta - \alpha \lambda_i \pm \sqrt{(1 + \beta - \alpha \lambda_i)^2 - 4\beta} \Big)$$
form a complex-conjugate pair $a \pm ib$, where
$$a = \tfrac12 (1 + \beta - \alpha \lambda_i), \qquad b = \tfrac12 \sqrt{4\beta - (1 + \beta - \alpha \lambda_i)^2}.$$

20 Proving Lemma 2 - magnitude of u is $\sqrt{\beta}$

The magnitude of $u = a \pm ib$ is $\sqrt{a^2 + b^2}$:
$$|u| = \sqrt{ \tfrac14 (1 + \beta - \alpha \lambda_i)^2 + \tfrac14 \big( 4\beta - (1 + \beta - \alpha \lambda_i)^2 \big) } = \sqrt{\beta}.$$
So the eigenvalues of $T_i$ (for every $i$) have magnitude $\sqrt{\beta}$. Therefore the largest eigenvalue magnitude (the spectral radius) of $T$ satisfies $\rho(T) \le \sqrt{\beta}$, and we finish the proof of Lemma 2. $\square$

21 Convergence of HBM

Assume $\beta \ge (1 - \sqrt{\alpha l})^2$. By Lemma 2 we have $\rho(T) = \max_i |\lambda_i(T)| \le \sqrt{\beta}$. By Lemma 1 we have $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$ with $\lim_{k \to \infty} \varepsilon_k = 0$.
Putting Lemma 2 into Lemma 1, we have
$$\|T^k\| \le (\sqrt{\beta} + \varepsilon_k)^k.$$
Lastly, taking $\alpha = \dfrac{4}{(\sqrt{L} + \sqrt{l})^2}$ and $\beta = \Big( \dfrac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}} \Big)^2$ in $T$, we have
$$\left\| \begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} \right\| \le \Big( \frac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}} + \varepsilon_k \Big)^k \left\| \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|,$$
or
$$\|x_k - x^*\| \le \Big( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon_k \Big)^k \|x_0 - x^*\|, \qquad \text{where } \kappa = L/l.$$
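Plugging these parameter choices into the earlier heavy_ball sketch (all names hypothetical, defined in the previous snippets):

```python
alpha_opt = 4.0 / (np.sqrt(L) + np.sqrt(l)) ** 2
beta_opt = ((np.sqrt(L) - np.sqrt(l)) / (np.sqrt(L) + np.sqrt(l))) ** 2

_, hb_dists = heavy_ball(x0, alpha_opt, beta_opt, iters=100)
hb_rate = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # asymptotic contraction factor
```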

22 Last page - summary

Gradient descent $x_{k+1} = x_k - t_k \nabla_x f(x_k)$ has convergence
$$\|x_k - x^*\| \le \Big( \frac{\kappa - 1}{\kappa + 1} \Big)^k \|x_0 - x^*\|.$$
The Heavy Ball Method $x_{k+1} = x_k - \alpha_k \nabla_x f(x_k) + \beta_k (x_k - x_{k-1})$ has convergence
$$\|x_k - x^*\| \le \Big( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon \Big)^k \|x_0 - x^*\|.$$
The improvement is from
$$\Big( \frac{\kappa - 1}{\kappa + 1} \Big)^k = \Big( 1 - \frac{2}{\kappa + 1} \Big)^k \qquad \text{to} \qquad \Big( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon \Big)^k = \Big( 1 - \frac{2}{\sqrt{\kappa} + 1} + \varepsilon \Big)^k;$$
a numerical comparison of the two factors is sketched below.
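As a closing illustration (my own, not part of the original slides), the two per-step factors at an example condition number $\kappa = 100$:

```python
kappa = 100.0                                       # example condition number
gd = (kappa - 1) / (kappa + 1)                      # ~ 0.980 per step
hb = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)    # ~ 0.818 per step
print(gd ** 100, hb ** 100)   # roughly 1.3e-1 vs 1.9e-9 after 100 steps
```

End of document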
