# Heavy ball method on the convex quadratic problem $\min_x \tfrac{1}{2}x^\top Ax - b^\top x$: a case study


Andersen Ang, Mathématique et recherche opérationnelle, UMONS, Belgium.
First draft: June 26, 2018. Last update: July 28, 2019. Homepage: angms.science

## Overview

1. Convex quadratic problem $\min_x \frac{1}{2}x^\top Ax - b^\top x$
2. Gradient descent and its convergence rate
3. Polyak's heavy ball method
4. Convergence of the heavy ball method
5. Summary

## An inverse problem / unconstrained optimization problem

Given $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^{n}$, find $x \in \mathbb{R}^{n}$ by solving

$$(P_0): \quad \min_x \tfrac{1}{2}\|Ax - b\|_2^2.$$

$(P_0)$ is equivalent to the quadratic problem $\min_x f(x) = \frac{1}{2}x^\top \tilde{A} x - \tilde{b}^\top x$:

$$
\begin{aligned}
\tfrac{1}{2}\|Ax - b\|_2^2
&= \tfrac{1}{2}(Ax - b)^\top (Ax - b) && \text{(expand)}\\
&= \tfrac{1}{2}\big( x^\top A^\top A x - x^\top A^\top b - b^\top A x + b^\top b \big) \\
&= \tfrac{1}{2}\big( x^\top A^\top A x - 2 b^\top A x + b^\top b \big) && (a^\top b = b^\top a)\\
&= \tfrac{1}{2}x^\top \tilde{A} x - \tilde{b}^\top x + \tfrac{1}{2}\|b\|_2^2 && (\tilde{A} = A^\top A,\ \tilde{b} = A^\top b)
\end{aligned}
$$

Ignoring the constant $\frac{1}{2}\|b\|_2^2$ and renaming $\tilde{A}$ as $A$ and $\tilde{b}$ as $b$, we now focus on the equivalent problem

$$(P): \quad \min_x f(x) = \tfrac{1}{2}x^\top A x - b^\top x.$$
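The equivalence above can be checked numerically. The sketch below (NumPy assumed; all variable names are illustrative) builds $\tilde{A} = A^\top A$ and $\tilde{b} = A^\top b$ from a least-squares instance and confirms that the least-squares minimizer coincides with the solution of the stationarity condition $\tilde{A}x = \tilde{b}$ of the quadratic problem.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))   # plays the role of A in (P0)
c = rng.standard_normal(5)        # plays the role of b in (P0)

# Quadratic-form data for (P): A~ = M^T M (symmetric PSD), b~ = M^T c
A = M.T @ M
b = M.T @ c

# The minimizer of 0.5*||Mx - c||^2 solves the normal equations M^T M x = M^T c,
# which is exactly the stationarity condition A~ x = b~ of (P).
x_ls = np.linalg.lstsq(M, c, rcond=None)[0]
x_qp = np.linalg.solve(A, b)

err = np.linalg.norm(x_ls - x_qp)
```

The two minimizers agree up to floating-point error.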

## The convex quadratic problem

$$(P): \quad \min_x f(x) = \tfrac{1}{2}x^\top A x - b^\top x.$$

Properties of $f$:

- $f$ is convex with respect to (w.r.t.) $x$
- $f$ is differentiable w.r.t. $x$
- First-order derivative (gradient): $\nabla_x f(x) = Ax - b$
- Second-order derivative (Hessian): $\nabla_x^2 f(x) = A$

Assumption 1: $A$ is symmetric and positive definite. Consequences of the assumption:

- all eigenvalues of $A$ are positive
- $A$ is nonsingular $\implies$ the optimal solution $x^*$ exists, namely $x^* = A^{-1}b$

We can further assume $lI \preceq A \preceq LI$, i.e. the eigenvalues of $A$ lie in $[l, L]$ with $0 < l \le L$.

## Gradient descent

GD solves $(P): \min_x f(x) = \frac{1}{2}x^\top Ax - b^\top x$ by generating the sequence $\{x_k\}_{k \in \mathbb{N}}$:

$$x_{k+1} = x_k - t_k \nabla_x f(x_k),$$

where $k = 1, 2, \dots$ is the iteration counter and $t_k$ is the step size. The sequence $\{x_k\}$ converges to $x^*$ at a linear rate (in the optimization sense). The convergence is shown by proving that the distance $\|x_k - x^*\|$ is monotonically decreasing as $k$ increases, under a suitable step size $t_k$.

**Theorem (GD converges at a linear rate).** Consider the problem $\min_x \frac{1}{2}x^\top Ax - b^\top x$ with $A$ positive definite and $lI \preceq A \preceq LI$. With the constant step size $t_k = \frac{2}{L+l}$ we have

$$\|x_k - x^*\|_2 \le \left( \frac{\kappa - 1}{\kappa + 1} \right)^k \|x_0 - x^*\|_2,$$

where $\kappa = \frac{L}{l}$ is the condition number of $A$.
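The theorem can be observed numerically. The sketch below (not part of the original slides; NumPy assumed) runs GD with $t = \frac{2}{L+l}$ on a small SPD quadratic whose eigenvalues are chosen explicitly, and checks the distance bound at every iteration.

```python
import numpy as np

# Construct a symmetric positive definite A with known eigenvalues in [l, L]
rng = np.random.default_rng(1)
eigvals = np.array([1.0, 2.0, 5.0, 10.0])   # l = 1, L = 10
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag(eigvals) @ Q.T
b = rng.standard_normal(4)

l, L = eigvals.min(), eigvals.max()
x_star = np.linalg.solve(A, b)

t = 2.0 / (L + l)                 # step size used in the theorem
rate = (L - l) / (L + l)          # = (kappa - 1) / (kappa + 1)

x = np.zeros(4)
dists = [np.linalg.norm(x - x_star)]
for _ in range(50):
    x = x - t * (A @ x - b)       # x_{k+1} = x_k - t * grad f(x_k)
    dists.append(np.linalg.norm(x - x_star))

# Check ||x_k - x*|| <= rate^k * ||x_0 - x*|| along the whole run
bound_holds = all(d <= rate**k * dists[0] + 1e-12
                  for k, d in enumerate(dists))
```

With $\kappa = 10$ the contraction factor per step is $\frac{9}{11}$, so 50 iterations shrink the error by several orders of magnitude.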

## Useful material before the proof

As $x^* = A^{-1}b$, we have

$$b = Ax^*. \tag{1}$$

As $\nabla_x f(x) = Ax - b$, we have

$$x_k - t_k \nabla_x f(x_k) = x_k - t_k(Ax_k - b). \tag{2}$$

Putting (1) into (2), we have

$$x_k - t_k \nabla_x f(x_k) = x_k - t_k(Ax_k - Ax^*) = (I - t_k A)x_k + t_k A x^*. \tag{3}$$

With these we can now prove convergence, starting from the distance $\|x_{k+1} - x^*\|_2$.

## Convergence rate of gradient descent in one slide

$$
\begin{aligned}
\|x_{k+1} - x^*\|_2
&= \|x_k - t_k \nabla_x f(x_k) - x^*\|_2 \\
&= \|(I - t_k A)x_k + t_k A x^* - x^*\|_2 && \text{by (3)}\\
&= \|(I - t_k A)(x_k - x^*)\|_2 \\
&\le \|I - t_k A\|_2 \, \|x_k - x^*\|_2 \\
&\le (1 - t_k l)\,\|x_k - x^*\|_2 \\
&\le (1 - t l)^k \,\|x_0 - x^*\|_2 \\
&= \left( \frac{L - l}{L + l} \right)^k \|x_0 - x^*\|_2 \\
&= \left( \frac{\kappa - 1}{\kappa + 1} \right)^k \|x_0 - x^*\|_2
\end{aligned}
$$

- 1st line: by the definition of GD, $x_{k+1} = x_k - t_k \nabla_x f(x_k)$
- 4th line: by the operator norm inequality $\|Ax\|_2 \le \|A\|_2 \|x\|_2$
- 5th line: by $lI \preceq A \preceq LI \implies (1 - t_k L)I \preceq I - t_k A \preceq (1 - t_k l)I$
- 6th line: if a constant step size $t_k = t$ is used
- 7th line: pick $t = \frac{2}{L + l}$, so $1 - tl = \frac{L - l}{L + l}$
- 8th line: $\kappa = \frac{L}{l} \ge 1$ is the condition number of $A$

## Polyak's Heavy Ball Method (HBM)

HBM adds a momentum term to GD:

$$x_{k+1} = x_k - \alpha_k \nabla_x f(x_k) + \underbrace{\beta_k (x_k - x_{k-1})}_{\text{HBM momentum}}$$

i.e. gradient descent with momentum $\beta_k (x_k - x_{k-1})$.

- $\beta_k \ge 0$ is the momentum parameter / extrapolation parameter
- $\alpha_k$ acts as the step size $t_k$ in GD
- When $\beta_k = 0$, HBM reduces to GD
- As the update direction is perturbed by the momentum, HBM is not monotone: the objective function value may increase
- However, overall HBM converges faster than GD (we will prove this shortly)
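The update above can be sketched in a few lines (NumPy assumed; the function name and parameter values are illustrative). Note how $\beta = 0$ recovers plain gradient descent.

```python
import numpy as np

def heavy_ball(A, b, alpha, beta, x0, iters):
    """Polyak heavy ball on f(x) = 0.5 x^T A x - b^T x (sketch)."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        grad = A @ x - b
        # gradient step plus the momentum term beta * (x_k - x_{k-1})
        x_next = x - alpha * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = Q @ np.diag([1.0, 4.0, 9.0]) @ Q.T
b = rng.standard_normal(3)
x_star = np.linalg.solve(A, b)

# beta = 0 is plain GD; beta = 0.5 adds momentum
x_gd = heavy_ball(A, b, alpha=0.1, beta=0.0, x0=np.zeros(3), iters=200)
x_hb = heavy_ball(A, b, alpha=0.1, beta=0.5, x0=np.zeros(3), iters=200)
err_gd = np.linalg.norm(x_gd - x_star)
err_hb = np.linalg.norm(x_hb - x_star)
```

Both runs converge on this well-conditioned instance; the payoff of momentum shows up on ill-conditioned problems, as quantified later.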

## Comparing GD, HBM and Nesterov's acceleration

Compared to HBM, Nesterov's accelerated gradient computes the gradient after applying the momentum:

$$x_{k+1} = x_k - \alpha_k \nabla_x f\big(x_k + \beta_k (x_k - x_{k-1})\big) + \beta_k (x_k - x_{k-1})$$

Consider the notation $x^+ = x - t\nabla f(x)$ and $a_k = \beta_k (x_k - x_{k-1})$. Then:

- Cauchy gradient descent: $x_{k+1} = x_k^+$
- Polyak HBM: $x_{k+1} = x_k^+ + a_k$
- Nesterov acceleration: $x_{k+1} = (x_k + a_k)^+$

Open question: maybe $x_{k+1} = (x_k + a_k)^+ + b_k$, or $x_{k+1} = \big((x_k + a_k)^+ + b_k\big)^+$, ...?

## Convergence of the Heavy Ball Method

Consider $x_{k+1} - x^*$. By the definition of the HBM update:

$$x_{k+1} - x^* = x_k - \alpha_k \nabla_x f(x_k) + \beta_k (x_k - x_{k-1}) - x^*$$

As $\nabla_x f(x_k) = Ax_k - b$ and $b = Ax^*$, we have $\nabla_x f(x_k) = Ax_k - Ax^*$, so

$$
\begin{aligned}
x_{k+1} - x^* &= x_k - \alpha_k (Ax_k - Ax^*) + \beta_k (x_k - x_{k-1}) - x^* \\
&= x_k - x^* - \alpha_k A(x_k - x^*) + \beta_k (x_k - x_{k-1}) \\
&= (I - \alpha_k A)(x_k - x^*) + \beta_k (x_k - x_{k-1} - x^* + x^*) \\
&= (I - \alpha_k A)(x_k - x^*) - \beta_k (x_{k-1} - x^*) + \beta_k (x_k - x^*) \\
&= \big( (1 + \beta_k)I - \alpha_k A \big)(x_k - x^*) - \beta_k (x_{k-1} - x^*)
\end{aligned}
$$

In this sense, we have to consider $x_k - x^*$ and $x_{k-1} - x^*$ at the same time:

$$
\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix}
= \underbrace{\begin{bmatrix} (1 + \beta_k)I - \alpha_k A & -\beta_k I \\ I & 0 \end{bmatrix}}_{T_k(\alpha, \beta)}
\begin{bmatrix} x_k - x^* \\ x_{k-1} - x^* \end{bmatrix},
$$

where $T_k(\alpha, \beta)$ is the transition matrix.

## Convergence of the Heavy Ball Method - the transition matrix $T$

Compact expression:

$$\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} = T_k(\alpha, \beta) \begin{bmatrix} x_k - x^* \\ x_{k-1} - x^* \end{bmatrix}$$

Take constant $\alpha_k = \alpha$ and $\beta_k = \beta$ in $T_k$, so $T_k = T$ and

$$\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} = T^k \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix}$$

Take norms:

$$\left\| \begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} \right\|_2 = \left\| T^k \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|_2 \le \|T^k\|_2 \left\| \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|_2$$

So if $\|T^k\|_2$ is suitably bounded, the sequence $\{x_k\}$ produced by HBM converges.
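The stacked-error recursion can be verified directly. The sketch below (NumPy assumed; parameter values are illustrative) builds the constant-parameter transition matrix $T$, runs ten heavy-ball steps with $x_{-1} = x_0$, and checks that $T^{10}$ applied to the initial stacked error reproduces the iterate error.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([1.0, 3.0, 6.0]) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

alpha, beta = 0.2, 0.4
I = np.eye(n)
# Transition matrix T acting on the stacked error [x_k - x*; x_{k-1} - x*]
T = np.block([[(1 + beta) * I - alpha * A, -beta * I],
              [I,                          np.zeros((n, n))]])

# Run 10 heavy-ball steps (with x_{-1} = x_0) ...
x0 = np.zeros(n)
x_prev, x = x0.copy(), x0.copy()
for _ in range(10):
    x_next = x - alpha * (A @ x - b) + beta * (x - x_prev)
    x_prev, x = x, x_next

# ... and compare against T^10 applied to the initial stacked error
e0 = np.concatenate([x0 - x_star, x0 - x_star])
e10 = np.linalg.matrix_power(T, 10) @ e0
mismatch = np.linalg.norm(e10[:n] - (x - x_star))
```

The mismatch is floating-point noise, confirming the recursion.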

## Tools for bounding $\|T^k\|_2$

Recall:

- The spectrum (all eigenvalues) of a block diagonal matrix is the union of the eigenvalues of the diagonal blocks.
- The spectrum of a matrix consists of the roots of its characteristic equation. For a 2-by-2 matrix, the characteristic equation has the form $au^2 + bu + c = 0$, with roots $u = \frac{1}{2a}\big({-b} \pm \sqrt{b^2 - 4ac}\big)$.
- The roots are complex conjugates if $\Delta = b^2 - 4ac \le 0$.
- Complex conjugate roots of the form $a \pm ib$ share the same magnitude $\sqrt{a^2 + b^2}$.

We need two lemmas.

**Lemma 1.** For an $n \times n$ matrix $T$, there exists a sequence $\varepsilon_k \ge 0$ with $\varepsilon_k \to 0$ such that $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$.

**Lemma 2.** For $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$, we have $\rho(T) \le \sqrt{\beta}$.

Here $\rho(T) = \max\{|\lambda_1|, |\lambda_2|, \dots, |\lambda_n|\}$ is the spectral radius of the matrix $T$, where the $\lambda_i$ are the eigenvalues of $T$.

## The logic flow of bounding $\|T^k\|_2$

- Ultimate goal: show that the $x_k$ produced by HBM converge to $x^*$.
- The $x_k$ converge to $x^*$ if $\|T^k\|_2$ is suitably bounded.
- We use Lemma 1 to bound $\|T^k\|$. To use Lemma 1, we need $\rho(T)$, for which we use Lemma 2.
- We will not prove Lemma 1, but we will prove Lemma 2.

**Lemma 1.** For an $n \times n$ matrix $T$, there exists a sequence $\varepsilon_k \ge 0$ with $\lim_{k \to \infty} \varepsilon_k = 0$ such that $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$.

Proof. Skipped (too long).

## The logic of proving Lemma 2

**Lemma 2.** For $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$, we have $\rho(T) \le \sqrt{\beta}$.

Flow of the proof of Lemma 2:

- First show that $T$ can be decomposed into 2-by-2 blocks $T_i$.
- The spectrum of $T$ is then the collection of eigenvalues of the $T_i$.
- As $\rho(T)$ depends only on the magnitudes of the eigenvalues, we consider the magnitudes of the eigenvalues of the $T_i$.
- Each $T_i$ is a 2-by-2 matrix, so its eigenvalues are the roots of a characteristic equation of the form $au^2 + bu + c = 0$.
- The roots of $au^2 + bu + c = 0$ are complex conjugates sharing the same magnitude if $b^2 - 4ac \le 0$.

## Proving Lemma 2 - eigendecomposition

Proof. Assume $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$ throughout. As $A$ is symmetric positive definite, it has an eigendecomposition $A = V \Lambda V^\top$ with $V$ orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Then

$$T = \begin{bmatrix} (1 + \beta)I - \alpha A & -\beta I \\ I & 0 \end{bmatrix} = \begin{bmatrix} (1 + \beta)I - \alpha V \Lambda V^\top & -\beta I \\ I & 0 \end{bmatrix}$$

Since $\Lambda$ is diagonal and the columns of $V$ form an orthonormal basis, conjugating each block by $V$ leaves the spectrum unchanged, so it suffices to study

$$\tilde{T} = \begin{bmatrix} (1 + \beta)I - \alpha \Lambda & -\beta I \\ I & 0 \end{bmatrix}$$

## Proving Lemma 2 - block decomposition of $\tilde{T}$

Every block of $\tilde{T}$ is diagonal, so $\tilde{T}$ can be permuted into block diagonal form. E.g. when $n = 2$, let $x_i = 1 + \beta - \alpha \lambda_i$:

$$\tilde{T} = \begin{bmatrix} x_1 & 0 & -\beta & 0 \\ 0 & x_2 & 0 & -\beta \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

As the spectrum is invariant under a simultaneous row/column permutation (a similarity), swap row 3 with row 2 and column 3 with column 2:

$$\begin{bmatrix} x_1 & -\beta & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & x_2 & -\beta \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

We can see that $\tilde{T}$ decomposes into a block diagonal matrix consisting of $2 \times 2$ components

$$T_i = \begin{bmatrix} 1 + \beta - \alpha \lambda_i & -\beta \\ 1 & 0 \end{bmatrix}, \quad i = 1, \dots, n.$$

## Proving Lemma 2 - block decomposition of $\tilde{T}$ in general

[Figure: structure of $\tilde{T}$ for general $n$, before and after the permutation.]

Hence the spectrum of $T$ equals the set of all eigenvalues of all the blocks $T_i$.

## Proving Lemma 2 - spectrum of $T$ = eigenvalues of the $T_i$

Each $T_i$ is 2-by-2, so we find its eigenvalues as the roots of the characteristic equation, i.e. we solve $\det(T_i - uI) = 0$:

$$\det \begin{bmatrix} 1 + \beta - \alpha \lambda_i - u & -\beta \\ 1 & -u \end{bmatrix} = 0 \iff u^2 - (1 + \beta - \alpha \lambda_i)u + \beta = 0$$

$$u = \frac{1}{2}\left( 1 + \beta - \alpha \lambda_i \pm \sqrt{(1 + \beta - \alpha \lambda_i)^2 - 4\beta} \right)$$

The magnitudes of the roots are the same iff the roots are complex. Let $\Delta = (1 + \beta - \alpha \lambda_i)^2 - 4\beta$; then $\sqrt{\Delta}$ is imaginary iff $\Delta \le 0$:

$$
\begin{aligned}
(1 + \beta - \alpha \lambda_i)^2 \le 4\beta
&\iff |1 + \beta - \alpha \lambda_i| \le 2\sqrt{\beta} \\
&\iff (1 - \sqrt{\beta})^2 \le \alpha \lambda_i \le (1 + \sqrt{\beta})^2 \\
&\iff |1 - \sqrt{\alpha \lambda_i}| \le \sqrt{\beta} \\
&\iff \beta \ge (1 - \sqrt{\alpha \lambda_i})^2
\end{aligned}
$$
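The complex-root regime can be checked numerically. The sketch below (NumPy assumed; $\alpha$, $\beta$, $\lambda$ are example values chosen so that $\beta \ge (1 - \sqrt{\alpha\lambda})^2$) builds one block $T_i$, confirms the discriminant is negative, and confirms both eigenvalues have modulus $\sqrt{\beta}$ — anticipating the computation on the next slide.

```python
import numpy as np

alpha, beta, lam = 0.1, 0.5, 4.0   # satisfies beta >= (1 - sqrt(alpha*lam))^2

# 2x2 block of the transition matrix for eigenvalue lam of A
Ti = np.array([[1 + beta - alpha * lam, -beta],
               [1.0,                     0.0]])

# Discriminant of u^2 - (1 + beta - alpha*lam) u + beta = 0
disc = (1 + beta - alpha * lam) ** 2 - 4 * beta

roots = np.linalg.eigvals(Ti)      # complex conjugate pair when disc < 0
moduli = np.abs(roots)             # both equal sqrt(beta)
```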

## Proving Lemma 2 - complex roots when $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$

Note: $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$ is automatically satisfied for every $i$, due to the assumption on $\beta$ together with $lI \preceq A \preceq LI$.

Then we have

$$1 + \beta - \alpha \lambda_i \ge 1 + (1 - \sqrt{\alpha \lambda_i})^2 - \alpha \lambda_i = 1 + (1 - 2\sqrt{\alpha \lambda_i} + \alpha \lambda_i) - \alpha \lambda_i = 2(1 - \sqrt{\alpha \lambda_i})$$

As $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$, the discriminant satisfies $(1 + \beta - \alpha \lambda_i)^2 - 4\beta \le 0$, and its square root is imaginary. Thus the roots

$$u = \frac{1}{2}\left( 1 + \beta - \alpha \lambda_i \pm \sqrt{(1 + \beta - \alpha \lambda_i)^2 - 4\beta} \right)$$

are complex numbers of the form $a \pm ib$, where

$$a = \frac{1 + \beta - \alpha \lambda_i}{2}, \qquad b = \frac{\sqrt{4\beta - (1 + \beta - \alpha \lambda_i)^2}}{2}.$$

## Proving Lemma 2 - magnitude of $u$ is $\sqrt{\beta}$

The magnitude of a complex number $u = a + ib$ is $\sqrt{a^2 + b^2}$, so

$$|u| = \sqrt{\frac{(1 + \beta - \alpha \lambda_i)^2}{4} + \frac{4\beta - (1 + \beta - \alpha \lambda_i)^2}{4}} = \sqrt{\beta}.$$

(Equivalently: complex conjugate roots of $u^2 - (1 + \beta - \alpha \lambda_i)u + \beta = 0$ have product $\beta$, so each has modulus $\sqrt{\beta}$.)

So the magnitudes of the eigenvalues of every $T_i$ equal $\sqrt{\beta}$. Therefore the largest eigenvalue magnitude (the spectral radius) of $T$ is at most $\sqrt{\beta}$, and we finish the proof of Lemma 2. $\square$

## Convergence of HBM

Assume $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$. By Lemma 2 we have $\rho(T) = \max_i |\lambda_i(T)| \le \sqrt{\beta}$. By Lemma 1, we have $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$ with $\lim_{k \to \infty} \varepsilon_k = 0$. Putting Lemma 2 into Lemma 1:

$$\|T^k\| \le (\sqrt{\beta} + \varepsilon_k)^k$$

Lastly, let $\alpha = \dfrac{4}{(\sqrt{L} + \sqrt{l})^2}$ and $\beta = \left( \dfrac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}} \right)^2$ in $T$, so that $\sqrt{\beta} = \dfrac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}}$. Then

$$\left\| \begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} \right\| \le \left( \frac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}} + \varepsilon_k \right)^k \left\| \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|$$

or

$$\|x_k - x^*\| \le \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon_k \right)^k \|x_0 - x^*\|, \quad \text{where } \kappa = \frac{L}{l}.$$
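The accelerated rate is visible numerically. The sketch below (NumPy assumed; not part of the original slides) compares GD with $t = \frac{2}{L+l}$ against HBM with the parameter choices above on an ill-conditioned quadratic ($\kappa = 100$): GD contracts at roughly $\frac{\kappa-1}{\kappa+1} \approx 0.98$ per step, while HBM contracts asymptotically at roughly $\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \approx 0.82$.

```python
import numpy as np

# Ill-conditioned quadratic: l = 1, L = 100, so kappa = 100
rng = np.random.default_rng(4)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)
l, L = 1.0, 100.0

# Parameter choices from the analysis
t_gd = 2.0 / (L + l)
alpha = 4.0 / (np.sqrt(L) + np.sqrt(l)) ** 2
beta = ((np.sqrt(L) - np.sqrt(l)) / (np.sqrt(L) + np.sqrt(l))) ** 2

x_gd = np.zeros(n)
x_prev, x_hb = np.zeros(n), np.zeros(n)
for _ in range(200):
    x_gd = x_gd - t_gd * (A @ x_gd - b)
    x_new = x_hb - alpha * (A @ x_hb - b) + beta * (x_hb - x_prev)
    x_prev, x_hb = x_hb, x_new

err_gd = np.linalg.norm(x_gd - x_star)
err_hb = np.linalg.norm(x_hb - x_star)
```

After 200 iterations the HBM error is many orders of magnitude below the GD error, consistent with the $\sqrt{\kappa}$ versus $\kappa$ dependence.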

## Summary

Gradient descent $x_{k+1} = x_k - t_k \nabla_x f(x_k)$ has convergence

$$\|x_k - x^*\| \le \left( \frac{\kappa - 1}{\kappa + 1} \right)^k \|x_0 - x^*\|.$$

The Heavy Ball Method $x_{k+1} = x_k - \alpha_k \nabla_x f(x_k) + \beta_k (x_k - x_{k-1})$ has convergence

$$\|x_k - x^*\| \le \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon_k \right)^k \|x_0 - x^*\|.$$

The improvement: from

$$\frac{\kappa - 1}{\kappa + 1} = 1 - \frac{2}{\kappa + 1} \quad \text{to} \quad \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon_k = 1 - \frac{2}{\sqrt{\kappa} + 1} + \varepsilon_k.$$

End of document.
