Convexity I: Sets and Functions

Similar documents
3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions

2.3 Convex Constrained Optimization Problems

Optimisation et simulation numérique.

Duality in General Programs. Ryan Tibshirani Convex Optimization /36-725

What is Linear Programming?

Convex analysis and profit/cost/support functions

3. Linear Programming and Polyhedral Combinatorics

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Notes on Symmetric Matrices

CHAPTER 9. Integer Programming

Lecture 5 Principal Minors and the Hessian

10. Proximal point method

Linear Algebra Notes for Marsden and Tromba Vector Calculus

Nonlinear Programming Methods.S2 Quadratic Programming

A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION

Duality of linear conic problems

1 Norms and Vector Spaces

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8

(Basic definitions and properties; Separation theorems; Characterizations) 1.1 Definition, examples, inner description, algebraic properties

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

NP-hardness of Deciding Convexity of Quartic Polynomials and Related Problems

1 if 1 x 0 1 if 0 x 1

MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set.

Solutions Of Some Non-Linear Programming Problems BIJAN KUMAR PATEL. Master of Science in Mathematics. Prof. ANIL KUMAR

NP-hardness of Deciding Convexity of Quartic Polynomials and Related Problems

TOPIC 4: DERIVATIVES

Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.

Lecture 1: Schur s Unitary Triangularization Theorem

. P. 4.3 Basic feasible solutions and vertices of polyhedra. x 1. x 2

Notes from Week 1: Algorithms for sequential prediction

Lecture 7: Finding Lyapunov Functions 1

PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include

Homework # 3 Solutions

Linear Maps. Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 5, 2007)

Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.

1. Prove that the empty set is a subset of every set.

Chapter 6. Cuboids. and. vol(conv(p ))

Pattern Analysis. Logistic Regression. 12. Mai Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Linear Algebra I. Ronald van Luijk, 2012

Increasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.

Convex Optimization. Lieven Vandenberghe University of California, Los Angeles

Proximal mapping via network optimization

Completely Positive Cone and its Dual

Convex Rationing Solutions (Incomplete Version, Do not distribute)

ALMOST COMMON PRIORS 1. INTRODUCTION

Lectures notes on orthogonal matrices (with exercises) Linear Algebra II - Spring 2004 by D. Klain

Lecture 2: August 29. Linear Programming (part I)

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Additional Exercises for Convex Optimization

Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points

LS.6 Solution Matrices

Adaptive Online Gradient Descent

Linear Programming I

F Matrix Calculus F 1

1 VECTOR SPACES AND SUBSPACES

Largest Fixed-Aspect, Axis-Aligned Rectangle

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0,

Continuity of the Perron Root

Nonconvex? NP! (No Problem!) Ryan Tibshirani Convex Optimization /36-725

Mathematical finance and linear programming (optimization)

How To Prove The Dirichlet Unit Theorem

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

Ideal Class Group and Units

Section 1.1. Introduction to R n

Lecture 13 Linear quadratic Lyapunov theory

1 Solving LPs: The Simplex Algorithm of George Dantzig

Mathematics Course 111: Algebra I Part IV: Vector Spaces

1 Homework 1. [p 0 q i+j p i 1 q j+1 ] + [p i q j ] + [p i+1 q j p i+j q 0 ]

ORIENTATIONS. Contents

FACTORIZATION OF TROPICAL POLYNOMIALS IN ONE AND SEVERAL VARIABLES. Nathan B. Grigg. Submitted to Brigham Young University in partial fulfillment

Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the school year.

Section 13.5 Equations of Lines and Planes

Metric Spaces. Chapter Metrics

Classification of Cartan matrices

1 Lecture: Integration of rational functions by decomposition

Math 4310 Handout - Quotient Vector Spaces

Master s Theory Exam Spring 2006

Discuss the size of the instance for the minimum spanning tree problem.

Solving Systems of Linear Equations

Lecture 4: BK inequality 27th August and 6th September, 2007

Limits and Continuity

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

9 MATRICES AND TRANSFORMATIONS

Numerical Analysis Lecture Notes

Linear Programming: Theory and Applications

Several Views of Support Vector Machines

Mathematical Methods of Engineering Analysis

Inner product. Definition of inner product

Understanding Basic Calculus

1 Introduction to Matrices

Math 120 Final Exam Practice Problems, Form: A

MATH 425, PRACTICE FINAL EXAM SOLUTIONS.

6.207/14.15: Networks Lecture 15: Repeated Games and Cooperation

Actually Doing It! 6. Prove that the regular unit cube (say 1cm=unit) of sufficiently high dimension can fit inside it the whole city of New York.

On Minimal Valid Inequalities for Mixed Integer Conic Programs

ON TORI TRIANGULATIONS ASSOCIATED WITH TWO-DIMENSIONAL CONTINUED FRACTIONS OF CUBIC IRRATIONALITIES.

19 LINEAR QUADRATIC REGULATOR

BX in ( u, v) basis in two ways. On the one hand, AN = u+

Date: April 12, Contents

Transcription:

Convexity I: Sets and Functions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 See supplements for reviews of basic real analysis basic multivariate calculus basic linear algebra

Last time: why convexity? Why convexity? Simply put: because we can broadly understand and solve convex optimization problems Nonconvex problems are mostly treated on a case by case basis Reminder: a convex optimization problem is of the form min x D f(x) subject to g i (x) 0, i = 1,... m h j (x) = 0, j = 1,... r where f and g i, i = 1,... m are all convex, and h j, j = 1,... r are affine. Special property: any local minimizer is a global minimizer 2

Outline Today: Convex sets Examples Key properties Operations preserving convexity Same, for convex functions 3

Convex sets Convex set: C R n such that x, y C = tx + (1 t)y C for all 0 t 1 In words, line segment joining any two elements lies entirely in set Convex combination of x 1,... x k R n : any linear combination θ 1 x 1 +... + θ k x k with θ i 0, i = 1,... k, and k i=1 θ i = 1. Convex hull of a set C, conv(c), is all convex combinations of elements. Always convex 4

Examples of convex sets Trivial ones: empty set, point, line Norm ball: {x : x r}, for given norm, radius r Hyperplane: {x : a T x = b}, for given a, b Halfspace: {x : a T x b} Affine space: {x : Ax = b}, for given A, b 5

Polyhedron: {x : Ax b}, where inequality is interpreted componentwise. Note: the set {x : Ax b, Cx = d} is also a polyhedron (why?) a1 a2 P a5 a3 a4 Simplex: special case of polyhedra, given by conv{x 0,... x k }, where these points are affinely independent. The canonical example is the probability simplex, conv{e 1,... e n } = {w : w 0, 1 T w = 1} 6

Cone: C R n such that Cones x C = tx C for all t 0 Convex cone: cone that is also convex, i.e., x 1, x 2 C = t 1 x 1 + t 2 x 2 C for all t 1, t 2 0 x1 x2 0 Conic combination of x 1,... x k R n : any linear combination θ 1 x 1 +... + θ k x k with θ i 0, i = 1,... k. Conic hull collects all conic combinations 7

Examples of convex cones Norm cone: {(x, t) : x t}, for a norm. Under l 2 norm 2, called second-order cone Normal cone: given any set C and point x C, we can define N C (x) = {g : g T x g T y, for all y C} This is always a convex cone, regardless of C Positive semidefinite cone: S n + = {X S n : X 0}, where X 0 means that X is positive semidefinite (and S n is the set of n n symmetric matrices) 8

Key properties of convex sets Separating hyperplane theorem: two disjoint convex sets have a separating between hyperplane them a T x b a T x b D C a Formally: if C, D are nonempty convex sets with C D =, then there exists a, b such that C {x : a T x b} D {x : a T x b} 9

Supporting hyperplane theorem: a boundary point of a convex set has a supporting hyperplane passing through it Formally: if C is a nonempty convex set, and x 0 bd(c), then there exists a such that C {x : a T x a T x 0 } Both of the above theorems (separating and supporting hyperplane theorems) have partial converses; see Section 2.5 of BV 10

11 Operations preserving convexity Intersection: the intersection of convex sets is convex Scaling and translation: if C is convex, then is convex for any a, b ac + b = {ax + b : x C} Affine images and preimages: if f(x) = Ax + b and C is convex then f(c) = {f(x) : x C} is convex, and if D is convex then is convex f 1 (D) = {x : f(x) D}

12 Example: linear matrix inequality solution set Given A 1,... A k, B S n, a linear matrix inequality is of the form x 1 A 1 + x 2 A 2 +... + x k A k B for a variable x R k. Let s prove the set C of points x that satisfy the above inequality is convex Approach 1: directly verify that x, y C tx + (1 t)y C. This follows by checking that, for any v, v T ( B k (tx i + (1 t)y i )A i )v 0 i=1 Approach 2: let f : R k S n, f(x) = B k i=1 x ia i. Note that C = f 1 (S n +), affine preimage of convex set

13 Example: Fantope Given some integer k 0, the Fantope of order k is { } F k = Z S n : 0 Z I, tr(z) = k where recall the trace operator tr(z) = n i=1 Z ii is the sum of the diagonal entries. Let s prove that F k is convex Approach 1: verify that 0 Z, W I and tr(z) = tr(w ) = I implies the same for tz + (1 t)w Approach 2: recognize the fact that F k = {Z S n : Z 0} {Z S n : Z I} {Z S n : tr(z) = k} intersection of linear inequality and equality constraints (hence like a polyhedron but for matrices)

14 More operations preserving convexity Perspective images and preimages: the perspective function is P : R n R ++ R n (where R ++ denotes positive reals), P (x, z) = x/z for z > 0. If C dom(p ) is convex then so is P (C), and if D is convex then so is P 1 (D) Linear-fractional images and preimages: the perspective map composed with an affine function, f(x) = Ax + b c T x + d is called a linear-fractional function, defined on c T x + d > 0. If C dom(f) is convex then so if f(c), and if D is convex then so is f 1 (D)

15 Example: conditional probability set Let U, V be random variables over {1,... n} and {1,... m}. Let C R nm be a set of joint distributions for U, V, i.e., each p C defines joint probabilities p ij = P(U = i, V = j) Let D R nm contain corresponding conditional distributions, i.e., each q D defines q ij = P(U = i V = j) Assume C is convex. Let s prove that D is convex. Write { D = q R nm p } ij : q ij = n k=1 p, for some p C = f(c) kj where f is a linear-fractional function, hence D is convex

16 Convex functions Convex function: f : R n R such that dom(f) R n convex, and f(tx + (1 t)y) tf(x) + (1 t)f(y) for 0 t 1 and all x, y dom(f) (x, f(x)) (y, f(y)) In words, f lies below the line segment joining f(x), f(y) Concave function: opposite inequality above, so that f concave f convex

17 Important modifiers: Strictly convex: f ( tx + (1 t)y ) < tf(x) + (1 t)f(y) for x y and 0 < t < 1. In words, f is convex and has greater curvature than a linear function Strongly convex with parameter m > 0: f m 2 x 2 2 is convex. In words, f is at least as convex as a quadratic function Note: strongly convex strictly convex convex (Analogously for concave functions)

18 Examples of convex functions Univariate functions: Exponential function: e ax is convex for any a over R Power function: x a is convex for a 1 or a 0 over R + (nonnegative reals) Power function: x a is concave for 0 a 1 over R + Logarithmic function: log x is concave over R++ Affine function: a T x + b is both convex and concave Quadratic function: 1 2 xt Qx + b T x + c is convex provided that Q 0 (positive semidefinite) Least squares loss: y Ax 2 2 is always convex (since AT A is always positive semidefinite)

19 Norm: x is convex for any norm; e.g., l p norms, x p = ( n i=1 x p i ) 1/p for p 1, x = max i=1,...n x i and also operator (spectral) and trace (nuclear) norms, X op = σ 1 (X), X tr = r σ r (X) i=1 where σ 1 (X)... σ r (X) 0 are the singular values of the matrix X

20 Indicator function: if C is convex, then its indicator function { 0 x C I C (x) = x / C is convex Support function: for any set C (convex or not), its support function IC(x) = max y C xt y is convex Max function: f(x) = max{x 1,... x n } is convex

21 Key properties of convex functions A function is convex if and only if its restriction to any line is convex Epigraph characterization: a function f is convex if and only if its epigraph is a convex set epi(f) = {(x, t) dom(f) R : f(x) t} Convex sublevel sets: if f is convex, then its sublevel sets {x dom(f) : f(x) t} are convex, for all t R. The converse is not true

22 First-order characterization: if f is differentiable, then f is convex if and only if dom(f) is convex, and f(y) f(x) + f(x) T (y x) for all x, y dom(f). Therefore for a differentiable convex function f(x) = 0 x minimizes f Second-order characterization: if f is twice differentiable, then f is convex if and only if dom(f) is convex, and 2 f(x) 0 for all x dom(f) Jensen s inequality: if f is convex, and X is a random variable supported on dom(f), then f(e[x]) E[f(X)]

23 Operations preserving convexity Nonnegative linear combination: f 1,... f m convex implies a 1 f 1 +... + a m f m convex for any a 1,... a m 0 Pointwise maximization: if f s is convex for any s S, then f(x) = max s S f s (x) is convex. Note that the set S here (number of functions f s ) can be infinite Partial minimization: if g(x, y) is convex in x, y, and C is convex, then f(x) = min y C g(x, y) is convex

24 Example: distances to a set Let C be an arbitrary set, and consider the maximum distance to C under an arbitrary norm : f(x) = max y C x y Let s check this is convex: f y (x) = x y is convex in x for any fixed y, so by pointwise maximization rule, f is convex Now let C be convex, and consider the minimum distance to C: f(x) = min y C x y Let s check this is convex: g(x, y) = x y is convex in x, y jointly, and C is assumed convex, so apply partial minimization rule

25 More operations preserving convexity Affine composition: f convex implies g(x) = f(ax + b) convex General composition: suppose f = h g, where g : R n R, h : R R, f : R n R. Then: f is convex if h is convex and nondecreasing, g is convex f is convex if h is convex and nonincreasing, g is concave f is concave if h is concave and nondecreasing, g concave f is concave if h is concave and nonincreasing, g convex How to remember these? Think of the chain rule when n = 1: f (x) = h (g(x))g (x) 2 + h (g(x))g (x)

26 Vector composition: suppose that f(x) = h ( g(x) ) = h ( g 1 (x),... g k (x) ) where g : R n R k, h : R k R, f : R n R. Then: f is convex if h is convex and nondecreasing in each argument, g is convex f is convex if h is convex and nonincreasing in each argument, g is concave f is concave if h is concave and nondecreasing in each argument, g is concave f is concave if h is concave and nonincreasing in each argument, g is convex

27 Example: log-sum-exp function Log-sum-exp function: g(x) = log( k i=1 eat i x+b i ), for fixed a i, b i, i = 1,... k. Often called soft max, as it smoothly approximates max i=1,...k (a T i x + b i) How to show convexity? First, note it suffices to prove convexity of f(x) = log( n i=1 ex i ) (affine composition rule) Now use second-order characterization. Calculate i f(x) = 2 ijf(x) = e x i n l=1 ex l e x i e x i e x j n 1{i = j} l=1 ex l ( n l=1 ex l ) 2 Write 2 f(x) = diag(z) zz T, where z i = e x i /( n l=1 ex l). This matrix is diagonally dominant, hence positive semidefinite

28 References and further reading S. Boyd and L. Vandenberghe (2004), Convex optimization, Chapters 2 and 3 J.P. Hiriart-Urruty and C. Lemarechal (1993), Fundamentals of convex analysis, Chapters A and B R. T. Rockafellar (1970), Convex analysis, Chapters 1 10,