Advanced Optimization


Advanced Optimization (10-801: CMU) Lecture 22 Fixed-point theory; nonlinear conic optimization 07 Apr 2014 Suvrit Sra

Fixed-point theory

Many optimization problems reduce to a fixed-point equation:

h(x) = 0 ⟺ x − h(x) = x ⟺ (I − h)(x) = x

Fixed-point theory

Fixed point: any x that solves x = f(x).

Three types of results:
1. Geometric: Banach contraction and relatives
2. Order-theoretic: Knaster-Tarski
3. Topological: Brouwer, Schauder-Leray, etc.

Fixed-point theory: main concerns

- existence of a solution
- uniqueness of a solution
- stability under small perturbations of parameters
- structure of the solution set (failing uniqueness)
- algorithms / approximation methods to obtain solutions
- rate-of-convergence analyses

Fixed-point theory: Banach contraction

Some conditions under which the nonlinear equation x = Tx, x ∈ M ⊆ X, can be solved by iterating x_{k+1} = T x_k, x_0 ∈ M, k = 0, 1, ...

Theorem (Banach 1922). Suppose
(i) T : M ⊆ X → M;
(ii) M is a closed, nonempty set in a complete metric space (X, d);
(iii) T is q-contractive, i.e., d(Tx, Ty) ≤ q d(x, y) for all x, y ∈ M, with a constant 0 ≤ q < 1.
Then we have the following:
(i) Tx = x has exactly one solution x* (T has a unique fixed point in M)
(ii) the sequence {x_k} converges to the solution for any x_0 ∈ M
(iii) a priori error estimate: d(x_k, x*) ≤ q^k (1 − q)^{-1} d(x_0, x_1)
(iv) a posteriori error estimate: d(x_{k+1}, x*) ≤ q (1 − q)^{-1} d(x_k, x_{k+1})
(v) (global) linear rate of convergence: d(x_{k+1}, x*) ≤ q d(x_k, x*)
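The a posteriori bound gives a practical stopping rule for the iteration x_{k+1} = T x_k. A minimal Python sketch (the helper name `banach_iterate` is ours, not from the lecture; cos is q-contractive on [0, 1] with q = sin(1) < 1):

```python
import math

def banach_iterate(T, x0, q, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = T(x_k), stopping when the a posteriori bound
    d(x_{k+1}, x*) <= q/(1-q) * d(x_k, x_{k+1}) drops below tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if q / (1.0 - q) * abs(x_next - x) <= tol:
            return x_next
        x = x_next
    return x

# cos maps [0, 1] into itself and |cos'(x)| <= sin(1) < 1 there,
# so the theorem guarantees a unique fixed point (approx. 0.739).
x_star = banach_iterate(math.cos, 0.5, q=math.sin(1.0))
```

The a priori bound q^k (1 − q)^{-1} d(x_0, x_1) would instead let us fix the iteration count before starting.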

Fixed-point theory: Banach contraction

If X is a Banach space with distance d(x, y) = ‖x − y‖, the contraction condition reads

‖Tx − Ty‖ ≤ q ‖x − y‖, 0 ≤ q < 1 (contraction)

If the inequality holds only with q = 1, we call the map nonexpansive: d(Tx, Ty) ≤ d(x, y). Example: x ↦ x + 1 is nonexpansive, but has no fixed points!

A map is called contractive (or weakly contractive) if d(Tx, Ty) < d(x, y) for all x ≠ y in M.

Several other variations of such maps have been studied for Banach spaces (see e.g. the book by Bauschke and Combettes (2012)).

Banach contraction: proof (blackboard)

Summary:
- d must be positive definite, i.e., d(x, y) = 0 iff x = y
- (X, d) must be complete (every Cauchy sequence converges in X)
- T : M → M, and M must be closed
- But M need not be compact!
- Contraction is often a rare luxury; nonexpansive maps are more common (we have already seen several)

More general fixed-point theorems

Theorem (Brouwer FP). Every continuous function from a convex compact subset M ⊆ R^d to M itself has a fixed point.

Note: proves existence only! Generalizes the intermediate value theorem.

Theorem (Schauder FP). Every continuous function from a convex compact subset M of a Banach space to M itself has a fixed point.

Remarks: Brouwer FPs are very hard to compute; there are exponential lower bounds for finding Brouwer fixed points (Hirsch, Papadimitriou, Vavasis, 1988). Any algorithm for computing a Brouwer FP based on function evaluations only must, in the worst case, perform a number of function evaluations exponential in both the number of digits of accuracy and the dimension. Contrast this with n = 1, where bisection yields a point x̂ with |f(x̂) − x̂| ≤ 2^{-δ} in O(δ) function evaluations.
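To make the n = 1 contrast concrete: for a continuous self-map f of [a, b], g(x) = f(x) − x changes sign on the interval, so bisection halves the bracket at every step and each extra bit of accuracy costs one further evaluation. A small sketch (the helper name is ours, not from the lecture):

```python
import math

def brouwer_fp_1d(f, a, b, iters=40):
    """Bisect on g(x) = f(x) - x. Since f maps [a, b] into itself,
    g(a) >= 0 and g(b) <= 0, so the sign change brackets a fixed point;
    each iteration halves the bracket width."""
    lo, hi = a, b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_hat = brouwer_fp_1d(math.cos, 0.0, 1.0)  # cos maps [0, 1] into itself
```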

Kakutani fixed-point theorem

An FP theorem for set-valued mappings (recall x ∈ (I − f)(x)).

Set-valued map: F : M → 2^M, i.e., x ∈ M is mapped to F(x) ∈ 2^M, so F(x) ⊆ M.
Closed graph: {(x, y) : y ∈ F(x)} is a closed subset of X × X (i.e., x_k → x, y_k → y, and y_k ∈ F(x_k) imply y ∈ F(x)).

Theorem (S. Kakutani 1941). Let M ⊆ R^n be nonempty, convex, and compact. Let F : M → 2^M be a set-valued map with a closed graph; additionally, for all x ∈ M, let F(x) be nonempty and convex. Then F has a fixed point.

Application: see the proof of existence of Nash equilibria (e.g., on Wikipedia).

Brouwer FP example

Consider a Markov transition matrix A ∈ R_+^{n×n}.
Column stochastic: a_ij ≥ 0 and Σ_i a_ij = 1 for 1 ≤ j ≤ n.

Claim. There is a probability vector x that is an eigenvector of A.
Prove: there exists x ≥ 0 with xᵀ1 = 1 such that Ax = x.

- Let Δ_n be the probability simplex (a compact, convex subset of R^n)
- Verify that if x ∈ Δ_n then Ax ∈ Δ_n
- Thus A : Δ_n → Δ_n; A is obviously continuous
- Hence, by Brouwer's FP theorem, there is an x ∈ Δ_n such that Ax = x
- How to compute such an x?
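One answer to the computational question, at least when A is strictly positive, is plain power iteration on the simplex. A short NumPy sketch (variable names ours; convergence here rests on Perron-Frobenius, which the Brouwer argument itself does not provide):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.random((n, n)) + 0.01   # strictly positive entries
A /= A.sum(axis=0)              # normalize columns: column stochastic

x = np.full(n, 1.0 / n)         # start on the probability simplex
for _ in range(500):            # power iteration; A maps the simplex to itself
    x = A @ x

# x is now (numerically) the stationary probability vector with Ax = x.
```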

Conic optimization

Some definitions

Let K be a cone in a real vector space V.

Let y ∈ K and x ∈ V. We say y dominates x if αy ≼_K x ≼_K βy for some α, β ∈ R.

Max-min gauges:
M_K(x/y) := inf {β ∈ R : x ≼_K βy}
m_K(x/y) := sup {α ∈ R : αy ≼_K x}
(Shorthand: ≼ for ≼_K.)

Parts: x ∼_K y is an equivalence relation on K, holding when x dominates y and vice versa. The equivalence classes are called the parts of the cone.

Hilbert projective metric

If x ∼_K y with y ≠ 0, then there exist α, β > 0 such that αy ≼ x ≼ βy.

Def. (Hilbert metric.) Let x ∼_K y and y ≠ 0. Then

d_H(x, y) := log ( M(x/y) / m(x/y) )

Proposition. Let K be a cone in V; (K, d_H) satisfies:
- d_H(x, y) ≥ 0, and d_H(x, y) = d_H(y, x) for all x, y ∈ K
- d_H(x, z) ≤ d_H(x, y) + d_H(y, z) for all x ∼_K y ∼_K z
- d_H(αx, βy) = d_H(x, y) for all α, β > 0 and x, y ∈ K
- If K is closed, then d_H(x, y) = 0 iff x = λy for some λ > 0. In this case, if X ⊆ K is such that for each x ∈ K \ {0} there is a unique λ > 0 with λx ∈ X, and P is a part of K, then (P ∩ X, d_H) is a genuine metric space.

Proof: on blackboard.
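On the interior of K = R^n_+ the gauges specialize to M(x/y) = max_i x_i/y_i and m(x/y) = min_i x_i/y_i, so d_H takes two lines of NumPy (a sketch valid only for strictly positive vectors; the function name is ours):

```python
import numpy as np

def hilbert_metric(x, y):
    """d_H on int(R^n_+): log of (max_i x_i/y_i) / (min_i x_i/y_i)."""
    r = x / y
    return np.log(r.max() / r.min())

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x
# Scale invariance: d_H(x, 2x) = 0, as the proposition predicts.
```

Note that d_H is only a pseudo-metric on the cone; it becomes a genuine metric after normalizing, e.g. restricting to the simplex.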

Nonexpansive maps with d_H

Def. (OPSH maps.) Let K ⊆ V and K′ ⊆ V′ be closed cones. A map f : K → K′ is called order preserving if x ≼_K y implies f(x) ≼_{K′} f(y). It is homogeneous of degree r if f(λx) = λ^r f(x) for all x ∈ K and λ > 0. It is subhomogeneous if λf(x) ≼ f(λx) for all x ∈ K and 0 < λ < 1.

Exercise: prove that if f : K → K′ is order preserving and homogeneous of degree r > 0, then d_H(f(x), f(y)) ≤ r d_H(x, y).

In particular, if r = 1, then f is nonexpansive (in d_H).

Birkhoff's theorem

Let L be a linear operator on a cone K (L : K → K).

Contraction ratio: κ(L) := inf {λ ≥ 0 : d_H(Lx, Ly) ≤ λ d_H(x, y) for all x ∼_K y in K}.

Theorem (Birkhoff.) Let Δ(L) := sup {d_H(Lx, Ly) : Lx ∼_K Ly} be the projective diameter of L. Then

κ(L) = tanh( Δ(L) / 4 )

If Δ(L) < ∞, then we have a strict contraction!
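Birkhoff's formula can be sanity-checked numerically on K = R^2_+, where the image cone of a positive matrix is spanned by its two columns, so the projective diameter is attained on the images of the extreme rays (a numerical illustration, not a proof; all names ours):

```python
import numpy as np

def d_H(x, y):
    r = x / y
    return np.log(r.max() / r.min())

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # positive, so A maps int(R^2_+) into itself

delta = d_H(A[:, 0], A[:, 1])    # projective diameter: images of extreme rays
kappa = np.tanh(delta / 4.0)     # Birkhoff contraction ratio

# Sample positive pairs and check d_H(Ax, Ay) <= kappa * d_H(x, y).
rng = np.random.default_rng(1)
for _ in range(100):
    x, y = rng.random(2) + 0.1, rng.random(2) + 0.1
    assert d_H(A @ x, A @ y) <= kappa * d_H(x, y) + 1e-12
```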

Application to the Pagerank eigenvector

Markov transition matrix A ∈ R_+^{n×n}; column stochastic: a_ij ≥ 0 and Σ_i a_ij = 1 for 1 ≤ j ≤ n.

- Consider the cone K = R_+^n
- Suppose Δ(A) < ∞ (next slide)
- Then d_H(Ax, Ay) ≤ κ(A) d_H(x, y): a strict contraction
- Need to argue that (Δ_n, d_H) is a complete metric space
- Invoke the Banach contraction theorem
- Linear rate of convergence
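The chain of arguments above predicts linear convergence of the power iteration in d_H; the following short check (a hypothetical setup with a strictly positive A, so that Δ(A) < ∞) watches the successive d_H gaps shrink geometrically:

```python
import numpy as np

def d_H(x, y):
    r = x / y
    return np.log(r.max() / r.min())

rng = np.random.default_rng(2)
n = 4
A = rng.random((n, n)) + 0.1   # strictly positive => finite projective diameter
A /= A.sum(axis=0)             # column stochastic

x = np.full(n, 1.0 / n)        # start on the probability simplex
x_prev = x
gaps = []
for _ in range(12):
    x = A @ x
    gaps.append(d_H(x, x_prev))
    x_prev = x

# Each gap is at most kappa(A) < 1 times the previous one (strict contraction).
ratios = [gaps[k + 1] / gaps[k] for k in range(len(gaps) - 1)]
```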

Application to the Pagerank eigenvector (continued)

Let K = R_+^n and K′ = R_+^m, and A ∈ R^{m×n}.

- A(K) ⊆ K′ iff a_ij ≥ 0
- x ∼_K y is equivalent to I_x := {i : x_i > 0} = {i : y_i > 0}
- In this case, we obtain d_H(x, y) = log ( max_{i,j ∈ I_x} (x_i y_j) / (x_j y_i) )

Lemma. Let A ∈ R_+^{m×n}. If there exists J ⊆ [n] such that Ae_i ∼_{K′} Ae_j for all i, j ∈ J, and Ae_i = 0 for all i ∉ J, then the projective diameter Δ(A) = max_{i,j ∈ J} d_H(Ae_i, Ae_j) < ∞.

More applications

- Geometric optimization on the psd cone: Sra, Hosseini (2013), "Conic geometric optimisation on the manifold of positive definite matrices", arXiv:1312.1039
- MDPs, stochastic games, nonlinear eigenvalue problems, etc.

References

- E. Zeidler, Nonlinear Functional Analysis, Vol. 1 (Fixed-Point Theorems).
- Lemmens, Nussbaum, Nonlinear Perron-Frobenius Theory (2013).