# Heavy ball method on the convex quadratic problem $\min_x \tfrac{1}{2}x^\top Ax - b^\top x$: a case study


Andersen Ang, Mathématique et recherche opérationnelle, UMONS, Belgium.
First draft: June 26, 2018. Last update: July 28, 2019. Homepage: angms.science

## Overview

1. Convex quadratic problem $\min_x \frac{1}{2}x^\top Ax - b^\top x$
2. Gradient descent and its convergence rate
3. Polyak's heavy ball method
4. Convergence of the heavy ball method
5. Summary

## An inverse problem / unconstrained optimization problem

Given $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^{n}$, find $x \in \mathbb{R}^{n}$ by solving

$$(P_0): \quad \min_x \tfrac{1}{2}\|Ax - b\|_2^2.$$

$(P_0)$ is equivalent to the quadratic problem $\min_x f(x) = \frac{1}{2}x^\top \tilde{A} x - \tilde{b}^\top x$:

$$
\begin{aligned}
\tfrac{1}{2}\|Ax - b\|_2^2
&= \tfrac{1}{2}(Ax - b)^\top (Ax - b) && \text{(expand)}\\
&= \tfrac{1}{2}\big( x^\top A^\top A x - x^\top A^\top b - b^\top A x + b^\top b \big) \\
&= \tfrac{1}{2}\big( x^\top A^\top A x - 2 b^\top A x + b^\top b \big) && (a^\top b = b^\top a)\\
&= \tfrac{1}{2}x^\top \tilde{A} x - \tilde{b}^\top x + \tfrac{1}{2}\|b\|_2^2 && (\tilde{A} = A^\top A,\ \tilde{b} = A^\top b)
\end{aligned}
$$

Ignoring the constant $\frac{1}{2}\|b\|_2^2$ and renaming $\tilde{A}$ as $A$ and $\tilde{b}$ as $b$, we now focus on the equivalent problem

$$(P): \quad \min_x f(x) = \tfrac{1}{2}x^\top A x - b^\top x.$$
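The equivalence above can be checked numerically. The sketch below (NumPy assumed; all variable names are illustrative) builds $\tilde{A} = A^\top A$ and $\tilde{b} = A^\top b$ from a least-squares instance and confirms that the least-squares minimizer coincides with the solution of the stationarity condition $\tilde{A}x = \tilde{b}$ of the quadratic problem.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))   # plays the role of A in (P0)
c = rng.standard_normal(5)        # plays the role of b in (P0)

# Quadratic-form data for (P): A~ = M^T M (symmetric PSD), b~ = M^T c
A = M.T @ M
b = M.T @ c

# The minimizer of 0.5*||Mx - c||^2 solves the normal equations M^T M x = M^T c,
# which is exactly the stationarity condition A~ x = b~ of (P).
x_ls = np.linalg.lstsq(M, c, rcond=None)[0]
x_qp = np.linalg.solve(A, b)

err = np.linalg.norm(x_ls - x_qp)
```

The two minimizers agree up to floating-point error.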

## The convex quadratic problem

$$(P): \quad \min_x f(x) = \tfrac{1}{2}x^\top A x - b^\top x.$$

Properties of $f$:

- $f$ is convex with respect to (w.r.t.) $x$
- $f$ is differentiable w.r.t. $x$
- First-order derivative (gradient): $\nabla_x f(x) = Ax - b$
- Second-order derivative (Hessian): $\nabla_x^2 f(x) = A$

Assumption 1: $A$ is symmetric and positive definite. Consequences of the assumption:

- all eigenvalues of $A$ are positive
- $A$ is nonsingular $\implies$ the optimal solution $x^*$ exists, namely $x^* = A^{-1}b$

We can further assume $lI \preceq A \preceq LI$, i.e. the eigenvalues of $A$ lie in $[l, L]$ with $0 < l \le L$.

## Gradient descent

GD solves $(P): \min_x f(x) = \frac{1}{2}x^\top Ax - b^\top x$ by generating the sequence $\{x_k\}_{k \in \mathbb{N}}$:

$$x_{k+1} = x_k - t_k \nabla_x f(x_k),$$

where $k = 1, 2, \dots$ is the iteration counter and $t_k$ is the step size. The sequence $\{x_k\}$ converges to $x^*$ at a linear rate (in the optimization sense). The convergence is shown by proving that the distance $\|x_k - x^*\|$ is monotonically decreasing as $k$ increases, under a suitable step size $t_k$.

**Theorem (GD converges at a linear rate).** Consider the problem $\min_x \frac{1}{2}x^\top Ax - b^\top x$ with $A$ positive definite and $lI \preceq A \preceq LI$. With the constant step size $t_k = \frac{2}{L+l}$ we have

$$\|x_k - x^*\|_2 \le \left( \frac{\kappa - 1}{\kappa + 1} \right)^k \|x_0 - x^*\|_2,$$

where $\kappa = \frac{L}{l}$ is the condition number of $A$.
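The theorem can be observed numerically. The sketch below (not part of the original slides; NumPy assumed) runs GD with $t = \frac{2}{L+l}$ on a small SPD quadratic whose eigenvalues are chosen explicitly, and checks the distance bound at every iteration.

```python
import numpy as np

# Construct a symmetric positive definite A with known eigenvalues in [l, L]
rng = np.random.default_rng(1)
eigvals = np.array([1.0, 2.0, 5.0, 10.0])   # l = 1, L = 10
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag(eigvals) @ Q.T
b = rng.standard_normal(4)

l, L = eigvals.min(), eigvals.max()
x_star = np.linalg.solve(A, b)

t = 2.0 / (L + l)                 # step size used in the theorem
rate = (L - l) / (L + l)          # = (kappa - 1) / (kappa + 1)

x = np.zeros(4)
dists = [np.linalg.norm(x - x_star)]
for _ in range(50):
    x = x - t * (A @ x - b)       # x_{k+1} = x_k - t * grad f(x_k)
    dists.append(np.linalg.norm(x - x_star))

# Check ||x_k - x*|| <= rate^k * ||x_0 - x*|| along the whole run
bound_holds = all(d <= rate**k * dists[0] + 1e-12
                  for k, d in enumerate(dists))
```

With $\kappa = 10$ the contraction factor per step is $\frac{9}{11}$, so 50 iterations shrink the error by several orders of magnitude.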

## Useful material before the proof

As $x^* = A^{-1}b$, we have

$$b = Ax^*. \tag{1}$$

As $\nabla_x f(x) = Ax - b$, we have

$$x_k - t_k \nabla_x f(x_k) = x_k - t_k(Ax_k - b). \tag{2}$$

Putting (1) into (2), we have

$$x_k - t_k \nabla_x f(x_k) = x_k - t_k(Ax_k - Ax^*) = (I - t_k A)x_k + t_k A x^*. \tag{3}$$

With these we can now prove convergence, starting from the distance $\|x_{k+1} - x^*\|_2$.

## Convergence rate of gradient descent in one slide

$$
\begin{aligned}
\|x_{k+1} - x^*\|_2
&= \|x_k - t_k \nabla_x f(x_k) - x^*\|_2 \\
&= \|(I - t_k A)x_k + t_k A x^* - x^*\|_2 && \text{by (3)}\\
&= \|(I - t_k A)(x_k - x^*)\|_2 \\
&\le \|I - t_k A\|_2 \, \|x_k - x^*\|_2 \\
&\le (1 - t_k l)\,\|x_k - x^*\|_2 \\
&\le (1 - t l)^k \,\|x_0 - x^*\|_2 \\
&= \left( \frac{L - l}{L + l} \right)^k \|x_0 - x^*\|_2 \\
&= \left( \frac{\kappa - 1}{\kappa + 1} \right)^k \|x_0 - x^*\|_2
\end{aligned}
$$

- 1st line: by the definition of GD, $x_{k+1} = x_k - t_k \nabla_x f(x_k)$
- 4th line: by the operator norm inequality $\|Ax\|_2 \le \|A\|_2 \|x\|_2$
- 5th line: by $lI \preceq A \preceq LI \implies (1 - t_k L)I \preceq I - t_k A \preceq (1 - t_k l)I$
- 6th line: if a constant step size $t_k = t$ is used
- 7th line: pick $t = \frac{2}{L + l}$, so $1 - tl = \frac{L - l}{L + l}$
- 8th line: $\kappa = \frac{L}{l} \ge 1$ is the condition number of $A$

## Polyak's Heavy Ball Method (HBM)

HBM adds a momentum term to GD:

$$x_{k+1} = x_k - \alpha_k \nabla_x f(x_k) + \underbrace{\beta_k (x_k - x_{k-1})}_{\text{HBM momentum}}$$

i.e. gradient descent with momentum $\beta_k (x_k - x_{k-1})$.

- $\beta_k \ge 0$ is the momentum parameter / extrapolation parameter
- $\alpha_k$ acts as the step size $t_k$ in GD
- When $\beta_k = 0$, HBM reduces to GD
- As the update direction is perturbed by the momentum, HBM is not monotone: the objective function value may increase
- However, overall HBM converges faster than GD (we will prove this shortly)
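The update above can be sketched in a few lines (NumPy assumed; the function name and parameter values are illustrative). Note how $\beta = 0$ recovers plain gradient descent.

```python
import numpy as np

def heavy_ball(A, b, alpha, beta, x0, iters):
    """Polyak heavy ball on f(x) = 0.5 x^T A x - b^T x (sketch)."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        grad = A @ x - b
        # gradient step plus the momentum term beta * (x_k - x_{k-1})
        x_next = x - alpha * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = Q @ np.diag([1.0, 4.0, 9.0]) @ Q.T
b = rng.standard_normal(3)
x_star = np.linalg.solve(A, b)

# beta = 0 is plain GD; beta = 0.5 adds momentum
x_gd = heavy_ball(A, b, alpha=0.1, beta=0.0, x0=np.zeros(3), iters=200)
x_hb = heavy_ball(A, b, alpha=0.1, beta=0.5, x0=np.zeros(3), iters=200)
err_gd = np.linalg.norm(x_gd - x_star)
err_hb = np.linalg.norm(x_hb - x_star)
```

Both runs converge on this well-conditioned instance; the payoff of momentum shows up on ill-conditioned problems, as quantified later.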

## Comparing GD, HBM and Nesterov's acceleration

Compared to HBM, Nesterov's accelerated gradient computes the gradient after applying the momentum:

$$x_{k+1} = x_k - \alpha_k \nabla_x f\big(x_k + \beta_k (x_k - x_{k-1})\big) + \beta_k (x_k - x_{k-1})$$

Consider the notation $x^+ = x - t\nabla f(x)$ and $a_k = \beta_k (x_k - x_{k-1})$. Then:

- Cauchy gradient descent: $x_{k+1} = x_k^+$
- Polyak HBM: $x_{k+1} = x_k^+ + a_k$
- Nesterov acceleration: $x_{k+1} = (x_k + a_k)^+$

Open question: maybe $x_{k+1} = (x_k + a_k)^+ + b_k$, or $x_{k+1} = \big((x_k + a_k)^+ + b_k\big)^+$, ...?

## Convergence of the Heavy Ball Method

Consider $x_{k+1} - x^*$. By the definition of the HBM update:

$$x_{k+1} - x^* = x_k - \alpha_k \nabla_x f(x_k) + \beta_k (x_k - x_{k-1}) - x^*$$

As $\nabla_x f(x_k) = Ax_k - b$ and $b = Ax^*$, we have $\nabla_x f(x_k) = Ax_k - Ax^*$, so

$$
\begin{aligned}
x_{k+1} - x^* &= x_k - \alpha_k (Ax_k - Ax^*) + \beta_k (x_k - x_{k-1}) - x^* \\
&= x_k - x^* - \alpha_k A(x_k - x^*) + \beta_k (x_k - x_{k-1}) \\
&= (I - \alpha_k A)(x_k - x^*) + \beta_k (x_k - x_{k-1} - x^* + x^*) \\
&= (I - \alpha_k A)(x_k - x^*) - \beta_k (x_{k-1} - x^*) + \beta_k (x_k - x^*) \\
&= \big( (1 + \beta_k)I - \alpha_k A \big)(x_k - x^*) - \beta_k (x_{k-1} - x^*)
\end{aligned}
$$

In this sense, we have to consider $x_k - x^*$ and $x_{k-1} - x^*$ at the same time:

$$
\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix}
= \underbrace{\begin{bmatrix} (1 + \beta_k)I - \alpha_k A & -\beta_k I \\ I & 0 \end{bmatrix}}_{T_k(\alpha, \beta)}
\begin{bmatrix} x_k - x^* \\ x_{k-1} - x^* \end{bmatrix},
$$

where $T_k(\alpha, \beta)$ is the transition matrix.

## Convergence of the Heavy Ball Method - the transition matrix $T$

Compact expression:

$$\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} = T_k(\alpha, \beta) \begin{bmatrix} x_k - x^* \\ x_{k-1} - x^* \end{bmatrix}$$

Take constant $\alpha_k = \alpha$ and $\beta_k = \beta$ in $T_k$, so $T_k = T$ and

$$\begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} = T^k \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix}$$

Take norms:

$$\left\| \begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} \right\|_2 = \left\| T^k \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|_2 \le \|T^k\|_2 \left\| \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|_2$$

So if $\|T^k\|_2$ is suitably bounded, the sequence $\{x_k\}$ produced by HBM converges.
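The stacked-error recursion can be verified directly. The sketch below (NumPy assumed; parameter values are illustrative) builds the constant-parameter transition matrix $T$, runs ten heavy-ball steps with $x_{-1} = x_0$, and checks that $T^{10}$ applied to the initial stacked error reproduces the iterate error.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([1.0, 3.0, 6.0]) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

alpha, beta = 0.2, 0.4
I = np.eye(n)
# Transition matrix T acting on the stacked error [x_k - x*; x_{k-1} - x*]
T = np.block([[(1 + beta) * I - alpha * A, -beta * I],
              [I,                          np.zeros((n, n))]])

# Run 10 heavy-ball steps (with x_{-1} = x_0) ...
x0 = np.zeros(n)
x_prev, x = x0.copy(), x0.copy()
for _ in range(10):
    x_next = x - alpha * (A @ x - b) + beta * (x - x_prev)
    x_prev, x = x, x_next

# ... and compare against T^10 applied to the initial stacked error
e0 = np.concatenate([x0 - x_star, x0 - x_star])
e10 = np.linalg.matrix_power(T, 10) @ e0
mismatch = np.linalg.norm(e10[:n] - (x - x_star))
```

The mismatch is floating-point noise, confirming the recursion.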

## Tools for bounding $\|T^k\|_2$

Recall:

- The spectrum (all eigenvalues) of a block diagonal matrix is the union of the eigenvalues of the diagonal blocks.
- The spectrum of a matrix consists of the roots of its characteristic equation. For a 2-by-2 matrix, the characteristic equation has the form $au^2 + bu + c = 0$, with roots $u = \frac{1}{2a}\big({-b} \pm \sqrt{b^2 - 4ac}\big)$.
- The roots are complex conjugates if $\Delta = b^2 - 4ac \le 0$.
- Complex conjugate roots of the form $a \pm ib$ share the same magnitude $\sqrt{a^2 + b^2}$.

We need two lemmas.

**Lemma 1.** For an $n \times n$ matrix $T$, there exists a sequence $\varepsilon_k \ge 0$ with $\varepsilon_k \to 0$ such that $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$.

**Lemma 2.** For $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$, we have $\rho(T) \le \sqrt{\beta}$.

Here $\rho(T) = \max\{|\lambda_1|, |\lambda_2|, \dots, |\lambda_n|\}$ is the spectral radius of the matrix $T$, where the $\lambda_i$ are the eigenvalues of $T$.

## The logic flow of bounding $\|T^k\|_2$

- Ultimate goal: show that the $x_k$ produced by HBM converge to $x^*$.
- The $x_k$ converge to $x^*$ if $\|T^k\|_2$ is suitably bounded.
- We use Lemma 1 to bound $\|T^k\|$. To use Lemma 1, we need $\rho(T)$, for which we use Lemma 2.
- We will not prove Lemma 1, but we will prove Lemma 2.

**Lemma 1.** For an $n \times n$ matrix $T$, there exists a sequence $\varepsilon_k \ge 0$ with $\lim_{k \to \infty} \varepsilon_k = 0$ such that $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$.

Proof. Skipped (too long).

## The logic of proving Lemma 2

**Lemma 2.** For $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$, we have $\rho(T) \le \sqrt{\beta}$.

Flow of the proof of Lemma 2:

- First show that $T$ can be decomposed into 2-by-2 blocks $T_i$.
- The spectrum of $T$ is then the collection of eigenvalues of the $T_i$.
- As $\rho(T)$ depends only on the magnitudes of the eigenvalues, we consider the magnitudes of the eigenvalues of the $T_i$.
- Each $T_i$ is a 2-by-2 matrix, so its eigenvalues are the roots of a characteristic equation of the form $au^2 + bu + c = 0$.
- The roots of $au^2 + bu + c = 0$ are complex conjugates sharing the same magnitude if $b^2 - 4ac \le 0$.

## Proving Lemma 2 - eigendecomposition

Proof. Assume $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$ throughout. As $A$ is symmetric positive definite, it has an eigendecomposition $A = V \Lambda V^\top$ with $V$ orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Then

$$T = \begin{bmatrix} (1 + \beta)I - \alpha A & -\beta I \\ I & 0 \end{bmatrix} = \begin{bmatrix} (1 + \beta)I - \alpha V \Lambda V^\top & -\beta I \\ I & 0 \end{bmatrix}$$

Since $\Lambda$ is diagonal and the columns of $V$ form an orthonormal basis, conjugating each block by $V$ leaves the spectrum unchanged, so it suffices to study

$$\tilde{T} = \begin{bmatrix} (1 + \beta)I - \alpha \Lambda & -\beta I \\ I & 0 \end{bmatrix}$$

## Proving Lemma 2 - block decomposition of $\tilde{T}$

Every block of $\tilde{T}$ is diagonal, so $\tilde{T}$ can be permuted into block diagonal form. E.g. when $n = 2$, let $x_i = 1 + \beta - \alpha \lambda_i$:

$$\tilde{T} = \begin{bmatrix} x_1 & 0 & -\beta & 0 \\ 0 & x_2 & 0 & -\beta \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

As the spectrum is invariant under a simultaneous row/column permutation (a similarity), swap row 3 with row 2 and column 3 with column 2:

$$\begin{bmatrix} x_1 & -\beta & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & x_2 & -\beta \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

We can see that $\tilde{T}$ decomposes into a block diagonal matrix consisting of $2 \times 2$ components

$$T_i = \begin{bmatrix} 1 + \beta - \alpha \lambda_i & -\beta \\ 1 & 0 \end{bmatrix}, \quad i = 1, \dots, n.$$

## Proving Lemma 2 - block decomposition of $\tilde{T}$ in general

[Figure: structure of $\tilde{T}$ for general $n$, before and after the permutation.]

Hence the spectrum of $T$ equals the set of all eigenvalues of all the blocks $T_i$.

## Proving Lemma 2 - spectrum of $T$ = eigenvalues of the $T_i$

Each $T_i$ is 2-by-2, so we find its eigenvalues as the roots of the characteristic equation, i.e. we solve $\det(T_i - uI) = 0$:

$$\det \begin{bmatrix} 1 + \beta - \alpha \lambda_i - u & -\beta \\ 1 & -u \end{bmatrix} = 0 \iff u^2 - (1 + \beta - \alpha \lambda_i)u + \beta = 0$$

$$u = \frac{1}{2}\left( 1 + \beta - \alpha \lambda_i \pm \sqrt{(1 + \beta - \alpha \lambda_i)^2 - 4\beta} \right)$$

The magnitudes of the roots are the same iff the roots are complex. Let $\Delta = (1 + \beta - \alpha \lambda_i)^2 - 4\beta$; then $\sqrt{\Delta}$ is imaginary iff $\Delta \le 0$:

$$
\begin{aligned}
(1 + \beta - \alpha \lambda_i)^2 \le 4\beta
&\iff |1 + \beta - \alpha \lambda_i| \le 2\sqrt{\beta} \\
&\iff (1 - \sqrt{\beta})^2 \le \alpha \lambda_i \le (1 + \sqrt{\beta})^2 \\
&\iff |1 - \sqrt{\alpha \lambda_i}| \le \sqrt{\beta} \\
&\iff \beta \ge (1 - \sqrt{\alpha \lambda_i})^2
\end{aligned}
$$
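The complex-root regime can be checked numerically. The sketch below (NumPy assumed; $\alpha$, $\beta$, $\lambda$ are example values chosen so that $\beta \ge (1 - \sqrt{\alpha\lambda})^2$) builds one block $T_i$, confirms the discriminant is negative, and confirms both eigenvalues have modulus $\sqrt{\beta}$ — anticipating the computation on the next slide.

```python
import numpy as np

alpha, beta, lam = 0.1, 0.5, 4.0   # satisfies beta >= (1 - sqrt(alpha*lam))^2

# 2x2 block of the transition matrix for eigenvalue lam of A
Ti = np.array([[1 + beta - alpha * lam, -beta],
               [1.0,                     0.0]])

# Discriminant of u^2 - (1 + beta - alpha*lam) u + beta = 0
disc = (1 + beta - alpha * lam) ** 2 - 4 * beta

roots = np.linalg.eigvals(Ti)      # complex conjugate pair when disc < 0
moduli = np.abs(roots)             # both equal sqrt(beta)
```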

## Proving Lemma 2 - complex roots when $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$

Note: $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$ is automatically satisfied for every $i$, due to the assumption on $\beta$ together with $lI \preceq A \preceq LI$.

Then we have

$$1 + \beta - \alpha \lambda_i \ge 1 + (1 - \sqrt{\alpha \lambda_i})^2 - \alpha \lambda_i = 1 + (1 - 2\sqrt{\alpha \lambda_i} + \alpha \lambda_i) - \alpha \lambda_i = 2(1 - \sqrt{\alpha \lambda_i})$$

As $\beta \ge (1 - \sqrt{\alpha \lambda_i})^2$, the discriminant satisfies $(1 + \beta - \alpha \lambda_i)^2 - 4\beta \le 0$, and its square root is imaginary. Thus the roots

$$u = \frac{1}{2}\left( 1 + \beta - \alpha \lambda_i \pm \sqrt{(1 + \beta - \alpha \lambda_i)^2 - 4\beta} \right)$$

are complex numbers of the form $a \pm ib$, where

$$a = \frac{1 + \beta - \alpha \lambda_i}{2}, \qquad b = \frac{\sqrt{4\beta - (1 + \beta - \alpha \lambda_i)^2}}{2}.$$

## Proving Lemma 2 - magnitude of $u$ is $\sqrt{\beta}$

The magnitude of a complex number $u = a + ib$ is $\sqrt{a^2 + b^2}$, so

$$|u| = \sqrt{\frac{(1 + \beta - \alpha \lambda_i)^2}{4} + \frac{4\beta - (1 + \beta - \alpha \lambda_i)^2}{4}} = \sqrt{\beta}.$$

(Equivalently: complex conjugate roots of $u^2 - (1 + \beta - \alpha \lambda_i)u + \beta = 0$ have product $\beta$, so each has modulus $\sqrt{\beta}$.)

So the magnitudes of the eigenvalues of every $T_i$ equal $\sqrt{\beta}$. Therefore the largest eigenvalue magnitude (the spectral radius) of $T$ is at most $\sqrt{\beta}$, and we finish the proof of Lemma 2. $\square$

## Convergence of HBM

Assume $\beta \ge \max\{(1 - \sqrt{\alpha l})^2,\ (1 - \sqrt{\alpha L})^2\}$. By Lemma 2 we have $\rho(T) = \max_i |\lambda_i(T)| \le \sqrt{\beta}$. By Lemma 1, we have $\|T^k\| \le (\rho(T) + \varepsilon_k)^k$ with $\lim_{k \to \infty} \varepsilon_k = 0$. Putting Lemma 2 into Lemma 1:

$$\|T^k\| \le (\sqrt{\beta} + \varepsilon_k)^k$$

Lastly, let $\alpha = \dfrac{4}{(\sqrt{L} + \sqrt{l})^2}$ and $\beta = \left( \dfrac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}} \right)^2$ in $T$, so that $\sqrt{\beta} = \dfrac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}}$. Then

$$\left\| \begin{bmatrix} x_{k+1} - x^* \\ x_k - x^* \end{bmatrix} \right\| \le \left( \frac{\sqrt{L} - \sqrt{l}}{\sqrt{L} + \sqrt{l}} + \varepsilon_k \right)^k \left\| \begin{bmatrix} x_1 - x^* \\ x_0 - x^* \end{bmatrix} \right\|$$

or

$$\|x_k - x^*\| \le \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon_k \right)^k \|x_0 - x^*\|, \quad \text{where } \kappa = \frac{L}{l}.$$
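The accelerated rate is visible numerically. The sketch below (NumPy assumed; not part of the original slides) compares GD with $t = \frac{2}{L+l}$ against HBM with the parameter choices above on an ill-conditioned quadratic ($\kappa = 100$): GD contracts at roughly $\frac{\kappa-1}{\kappa+1} \approx 0.98$ per step, while HBM contracts asymptotically at roughly $\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \approx 0.82$.

```python
import numpy as np

# Ill-conditioned quadratic: l = 1, L = 100, so kappa = 100
rng = np.random.default_rng(4)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)
l, L = 1.0, 100.0

# Parameter choices from the analysis
t_gd = 2.0 / (L + l)
alpha = 4.0 / (np.sqrt(L) + np.sqrt(l)) ** 2
beta = ((np.sqrt(L) - np.sqrt(l)) / (np.sqrt(L) + np.sqrt(l))) ** 2

x_gd = np.zeros(n)
x_prev, x_hb = np.zeros(n), np.zeros(n)
for _ in range(200):
    x_gd = x_gd - t_gd * (A @ x_gd - b)
    x_new = x_hb - alpha * (A @ x_hb - b) + beta * (x_hb - x_prev)
    x_prev, x_hb = x_hb, x_new

err_gd = np.linalg.norm(x_gd - x_star)
err_hb = np.linalg.norm(x_hb - x_star)
```

After 200 iterations the HBM error is many orders of magnitude below the GD error, consistent with the $\sqrt{\kappa}$ versus $\kappa$ dependence.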

## Summary

Gradient descent $x_{k+1} = x_k - t_k \nabla_x f(x_k)$ has convergence

$$\|x_k - x^*\| \le \left( \frac{\kappa - 1}{\kappa + 1} \right)^k \|x_0 - x^*\|.$$

The Heavy Ball Method $x_{k+1} = x_k - \alpha_k \nabla_x f(x_k) + \beta_k (x_k - x_{k-1})$ has convergence

$$\|x_k - x^*\| \le \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon_k \right)^k \|x_0 - x^*\|.$$

The improvement: from

$$\frac{\kappa - 1}{\kappa + 1} = 1 - \frac{2}{\kappa + 1} \quad \text{to} \quad \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} + \varepsilon_k = 1 - \frac{2}{\sqrt{\kappa} + 1} + \varepsilon_k.$$

End of document.
