CONSTRAINED NONLINEAR PROGRAMMING
We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories:

1. TRANSFORMATION METHODS: In this approach the constrained nonlinear program is transformed into an unconstrained problem (or more commonly, a series of unconstrained problems) and solved using one (or some variant) of the algorithms we have already seen. Transformation methods may further be classified as
(1) Exterior Penalty Function Methods
(2) Interior Penalty (or Barrier) Function Methods
(3) Augmented Lagrangian (or Multiplier) Methods

2. PRIMAL METHODS: These are methods that work directly on the original problem.

We shall look at each of these in turn; we first start with transformation methods.
Exterior Penalty Function Methods

These methods generate a sequence of infeasible points whose limit is an optimal solution to the original problem. The original constrained problem is transformed into an unconstrained one (or a series of unconstrained problems). The constraints are incorporated into the objective by means of a "penalty parameter" which penalizes any constraint violation. Consider the problem

Min f(x)
st g_j(x) ≤ 0, j = 1,2,...,m,
   h_j(x) = 0, j = 1,2,...,p,
   x ∈ R^n

Define a penalty function p as follows:

p(x) = Σ_{j=1}^m [Max{0, g_j(x)}]^α + Σ_{j=1}^p |h_j(x)|^α

where α is some positive integer. It is easy to see that

If g_j(x) ≤ 0 then Max{0, g_j(x)} = 0
If g_j(x) > 0 then Max{0, g_j(x)} = g_j(x) (violation)

Now define the auxiliary function as P(x) = f(x) + µp(x), where µ >> 0. Thus if a constraint is violated, p(x) > 0 and a "penalty" of µp(x) is incurred.
The (UNCONSTRAINED) problem now is:

Minimize P(x) = f(x) + µp(x)
st x ∈ R^n

EXAMPLE: Minimize f(x) = x, st g(x) = 2 - x ≤ 0, x ∈ R. (Note that the minimum is at x* = 2.)

Let p(x) = [Max{0, g(x)}]^2, i.e.,

p(x) = (2-x)^2 if x < 2;  0 if x ≥ 2

If p(x) = 0, then the optimal solution to Minimize f(x) + µp(x) is at x* = -∞, and this is infeasible; so

f(x) + µp(x) = x + µ(2-x)^2

Minimize P(x) = f(x) + µp(x) has optimum solution x = 2 - 1/(2µ).

NOTE: As µ → ∞, 2 - 1/(2µ) → 2 (= x*).

[Figure: the penalty function µp and the auxiliary function f + µp, sketched for µ_1 = 0.5 and µ_2 = 1.5.]
In general:
1. We convert the constrained problem to an unconstrained one by using the penalty function.
2. The solution to the unconstrained problem can be made arbitrarily close to that of the original one by choosing a sufficiently large µ.

In practice, if µ is very large, too much emphasis is placed on feasibility. Often, algorithms for unconstrained optimization would stop prematurely because the step sizes required for improvement become very small. Usually, we solve a sequence of problems with successively increasing µ values; the optimal point from one iteration becomes the starting point for the next problem.

PROCEDURE: Choose the following initially:
1. a tolerance ε,
2. an increase factor β,
3. a starting point x^1, and
4. an initial µ_1.

At Iteration i:
1. Solve the problem Minimize f(x) + µ_i p(x), st x ∈ X (usually R^n). Use x^i as the starting point and let the optimal solution be x^{i+1}.
2. If µ_i p(x^{i+1}) ≤ ε then STOP; else let µ_{i+1} = βµ_i and start iteration (i+1).
EXAMPLE: Minimize f(x) = x_1^2 + 2x_2^2, st g(x) = 1 - x_1 - x_2 ≤ 0; x ∈ R^2

Define the penalty function

p(x) = (1 - x_1 - x_2)^2 if g(x) > 0;  0 if g(x) ≤ 0

The unconstrained problem is

Minimize x_1^2 + 2x_2^2 + µp(x)

If p(x) = 0, then the optimal solution is x* = (0,0) -- INFEASIBLE! So p(x) = (1 - x_1 - x_2)^2,

P(x) = x_1^2 + 2x_2^2 + µ(1 - x_1 - x_2)^2,

and the necessary conditions for the optimal solution (∇P(x) = 0) yield the following:

∂P/∂x_1 = 2x_1 + 2µ(1 - x_1 - x_2)(-1) = 0,
∂P/∂x_2 = 4x_2 + 2µ(1 - x_1 - x_2)(-1) = 0

Thus x_1 = 2µ/(2 + 3µ) and x_2 = µ/(2 + 3µ).

Starting with µ_1 = 0.1, β = 10 and x^1 = (0,0), and using a small tolerance ε, we have the following (values rounded, computed from the closed forms above):

Iter. (i)   µ_i    x^{i+1}            g(x^{i+1})   µ_i p(x^{i+1})
1           0.1    (0.087, 0.043)     0.870        0.0757
2           1      (0.4, 0.2)         0.400        0.1600
3           10     (0.625, 0.3125)    0.0625       0.0391
4           100    (0.6622, 0.3311)   0.0066       0.0044
5           1000   (0.666, 0.333)     0.00067      0.00044

Thus the optimal solution is x* = (2/3, 1/3).
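The same SUMT loop is easy to sketch in code. The following minimal example (assuming SciPy is available; the tolerance and the µ_1, β values are the illustrative choices used above) solves each unconstrained subproblem numerically:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + 2*x[1]**2

def g(x):                     # constraint in the form g(x) <= 0
    return 1 - x[0] - x[1]

def p(x):                     # exterior quadratic penalty (alpha = 2)
    return max(0.0, g(x))**2

mu, beta, eps = 0.1, 10.0, 1e-4
x = np.zeros(2)               # starting point x^1 = (0, 0)
while True:
    # unconstrained subproblem, warm-started at the previous solution
    x = minimize(lambda y: f(y) + mu * p(y), x).x
    if mu * p(x) < eps:       # stop when the weighted violation is small
        break
    mu *= beta                # increase the penalty parameter
print(x)                      # approaches x* = (2/3, 1/3)
```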
INTERIOR PENALTY (BARRIER) FUNCTION METHODS

These methods also transform the original problem into an unconstrained one; however, the barrier functions prevent the current solution from ever leaving the feasible region. They require that the interior of the feasible set be nonempty, which is impossible if equality constraints are present. Therefore, they are used with problems having only inequality constraints.

A barrier function B is one that is continuous and nonnegative over the interior of {x | g(x) ≤ 0}, i.e., over the set {x | g(x) < 0}, and approaches ∞ as the boundary is approached from the interior. Let φ(y) ≥ 0 if y < 0, with lim_{y→0^-} φ(y) = ∞. Then

B(x) = Σ_{j=1}^m φ[g_j(x)]

Usually,

B(x) = Σ_{j=1}^m -1/g_j(x)   or   B(x) = -Σ_{j=1}^m log(-g_j(x))

In both cases, note that B(x) → ∞ as g_j(x) → 0^-.

The auxiliary function is now f(x) + µB(x), where µ is a SMALL positive number.
Q. Why should µ be SMALL?
A. Ideally we would like B(x) = 0 if g_j(x) < 0 and B(x) = ∞ if g_j(x) = 0, so that we never leave the region {x | g(x) ≤ 0}. However, B(x) is then discontinuous. This causes serious computational problems during the unconstrained optimization.

Consider the same problem as the one we looked at with (exterior) penalty functions:

Minimize f(x) = x, st -x + 2 ≤ 0

Define the barrier function B(x) = -1/(-x+2). So the auxiliary function is

f(x) + µB(x) = x - µ/(-x+2),

and it has its optimum solution at x*_µ = 2 + √µ; as µ → 0, x*_µ → 2.

[Figure: the barrier function µB and the auxiliary function f + µB, sketched for µ_1 = 0.5 and µ_2 = 1.5.]

As with exterior penalty functions, we don't just choose one small value for µ; rather, we start with some µ_1 and generate a sequence of points.
PROCEDURE: Initially choose a tolerance ε, a decrease factor β, an interior starting point x^1, and an initial µ_1.

At Iteration i:
1. Solve the problem Minimize f(x) + µ_i B(x), st x ∈ X (usually R^n). Use x^i as the starting point and let the optimal solution be x^{i+1}.
2. If µ_i B(x^{i+1}) < ε then STOP; else let µ_{i+1} = βµ_i and start iteration (i+1).

Consider the previous example once again:

EXAMPLE: Minimize f(x) = x_1^2 + 2x_2^2, st g(x) = 1 - x_1 - x_2 ≤ 0; x ∈ R^2

Define the barrier function

B(x) = -log(-g(x)) = -log(x_1 + x_2 - 1)

The unconstrained problem is

Minimize x_1^2 + 2x_2^2 - µ log(x_1 + x_2 - 1)

The necessary conditions for the optimal solution (∇P(x) = 0) yield the following:

∂P/∂x_1 = 2x_1 - µ/(x_1 + x_2 - 1) = 0,
∂P/∂x_2 = 4x_2 - µ/(x_1 + x_2 - 1) = 0
Solving, we get

x_1 = (1 ± √(1+3µ))/3 and x_2 = (1 ± √(1+3µ))/6

Since the negative signs lead to infeasibility, we take

x_1 = (1 + √(1+3µ))/3 and x_2 = (1 + √(1+3µ))/6

Starting with µ_1 = 1, β = 0.1, an interior starting point, and a small tolerance ε, we have the following (values rounded, computed from the closed forms above):

Iter. (i)   µ_i      x^{i+1}            g(x^{i+1})   µ_i B(x^{i+1})
1           1        (1.0, 0.5)         -0.5         0.693
2           0.1      (0.714, 0.357)     -0.070       0.266
3           0.01     (0.672, 0.336)     -0.0074      0.049
4           0.001    (0.6672, 0.3336)   -0.00075     0.0072
5           0.0001   (0.6666, 0.3333)   -0.000075    0.00095

Thus the optimal solution is x* = (2/3, 1/3).
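A corresponding sketch for the barrier version is below (again assuming SciPy; the starting point must be strictly interior, so we use (1,1), where g < 0, and a derivative-free method since the barrier blows up at the boundary):

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + 2*x[1]**2

def g(x):                        # constraint in the form g(x) <= 0
    return 1 - x[0] - x[1]

def B(x):                        # log-barrier; +infinity outside the interior
    return -np.log(-g(x)) if g(x) < 0 else np.inf

mu, beta, eps = 1.0, 0.1, 1e-4
x = np.array([1.0, 1.0])         # strictly interior starting point
while True:
    x = minimize(lambda y: f(y) + mu * B(y), x, method="Nelder-Mead").x
    if mu * abs(B(x)) < eps:     # stop when the barrier term is negligible
        break
    mu *= beta                   # decrease the barrier parameter
print(x)                         # approaches x* = (2/3, 1/3) from inside
```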
Penalty Function Methods and Lagrange Multipliers

Consider the penalty function approach to the problem below:

Minimize f(x)
st g_j(x) ≤ 0; j = 1,2,...,m
   h_j(x) = 0; j = m+1, m+2,...,l
   x ∈ R^n.

Suppose we use the usual exterior penalty function described earlier, with α = 2. The auxiliary function that we minimize is then given by

P(x) = f(x) + µp(x) = f(x) + µ{Σ_j [Max(0, g_j(x))]^2 + Σ_j [h_j(x)]^2}

The necessary condition for this to have a minimum is that ∇P(x) = ∇f(x) + µ∇p(x) = 0, i.e.,

∇f(x) + Σ_j 2µ[Max(0, g_j(x))]∇g_j(x) + Σ_j 2µ[h_j(x)]∇h_j(x) = 0   (1)

Suppose that the solution to (1) for a fixed µ (say µ_k > 0) is given by x^k. Let us also designate

2µ_k [Max{0, g_j(x^k)}] = λ_j(µ_k); j = 1,2,...,m     (2)
2µ_k [h_j(x^k)] = λ_j(µ_k); j = m+1,...,l             (3)

so that for µ = µ_k we may rewrite (1) as
∇f(x) + Σ_j λ_j(µ_k)∇g_j(x) + Σ_j λ_j(µ_k)∇h_j(x) = 0   (4)

Now consider the Lagrangian for the original problem:

L(x,λ) = f(x) + Σ_j λ_j g_j(x) + Σ_j λ_j h_j(x)

The usual K-K-T necessary conditions yield

∇f(x) + Σ_{j=1..m} λ_j ∇g_j(x) + Σ_{j=m+1..l} λ_j ∇h_j(x) = 0   (5)

plus the original constraints, complementary slackness for the inequality constraints, and λ_j ≥ 0 for j = 1,...,m.

Comparing (4) and (5), we can see that when we minimize the auxiliary function using µ = µ_k, the λ_j(µ_k) values given by (2) and (3) estimate the Lagrange multipliers in (5). In fact, it may be shown that as the penalty function method proceeds with µ_k → ∞ and x^k → x*, the optimum solution, the values of λ_j(µ_k) → λ_j*, the optimum Lagrange multiplier value for constraint j.

Consider our earlier example once again. For this problem the Lagrangian is given by

L(x,λ) = x_1^2 + 2x_2^2 + λ(1 - x_1 - x_2).

The K-K-T conditions yield

∂L/∂x_1 = 2x_1 - λ = 0;  ∂L/∂x_2 = 4x_2 - λ = 0;  λ(1 - x_1 - x_2) = 0

Solving these results in x_1* = 2/3; x_2* = 1/3; λ* = 4/3 (λ = 0 yields an infeasible solution).
Recall that for fixed µ_k the optimum value of x^k was [2µ_k/(2+3µ_k), µ_k/(2+3µ_k)]^T. As we saw, when µ_k → ∞, these converge to the optimum solution x* = (2/3, 1/3). Now, suppose we use (2) to define

λ(µ) = 2µ[Max(0, g(x^k))] = 2µ[1 - {2µ/(2+3µ)} - {µ/(2+3µ)}]   (since g(x^k) > 0 if µ > 0)
     = 2µ[1 - {3µ/(2+3µ)}] = 4µ/(2+3µ)

Then it is readily seen that

Lim_{µ→∞} λ(µ) = Lim_{µ→∞} 4µ/(2+3µ) = 4/3 = λ*

Similar statements can also be made for the barrier (interior penalty) function approach; e.g., if we use the log-barrier function we have

P(x) = f(x) + µB(x) = f(x) + µ Σ_j -log(-g_j(x)), so that

∇P(x) = ∇f(x) - Σ_j {µ/g_j(x)}∇g_j(x) = 0   (6)

If (like before) for a fixed µ_k we denote the solution by x^k and define -µ_k/g_j(x^k) = λ_j(µ_k), then from (6) and (5) we see that λ_j(µ_k) approximates λ_j. Furthermore, as µ_k → 0 and x^k → x*, it can be shown that λ_j(µ_k) → λ_j*.

For our barrier example, g(x^k) = 1 - x_1 - x_2 = (1 - √(1+3µ_k))/2, so that

-µ_k/g(x^k) = 2µ_k/(√(1+3µ_k) - 1) = 2(1 + √(1+3µ_k))/3 → 4/3, as µ_k → 0.
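Both limits are easy to verify numerically from the closed forms just derived; a quick check:

```python
import numpy as np

# Multiplier estimates for the recurring example, where lambda* = 4/3.
for mu in [0.1, 1, 10, 100, 1000]:
    print(mu, 4*mu / (2 + 3*mu))              # exterior penalty: -> 4/3 as mu -> infinity
for mu in [1, 0.1, 0.01, 0.001]:
    print(mu, 2*(1 + np.sqrt(1 + 3*mu)) / 3)  # log-barrier: -> 4/3 as mu -> 0
```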
Penalty and barrier function methods have been referred to as SEQUENTIAL UNCONSTRAINED MINIMIZATION TECHNIQUES (SUMT), and were studied in detail by Fiacco and McCormick. While they are attractive in the simplicity of the principle on which they are based, they also possess several undesirable properties. When the parameters µ are very large in value, the penalty functions tend to be ill-behaved near the boundary of the constraint set, where the optimum points usually lie. Another problem is the choice of appropriate µ_1 and β values. The rate at which the µ_i change (i.e., the β values) can seriously affect the computational effort required to find a solution. Also, as µ increases, the Hessian of the unconstrained function often becomes ill-conditioned.

Some of these problems are addressed by the so-called "multiplier" methods. Here there is no need for µ to go to infinity, and the unconstrained function is better conditioned, with no singularities. Furthermore, they also have faster rates of convergence than SUMT.
Ill-Conditioning of the Hessian Matrix

Consider the Hessian matrix of the auxiliary function P(x) = x_1^2 + 2x_2^2 + µ(1 - x_1 - x_2)^2:

H = [ 2+2µ   2µ   ]
    [ 2µ     4+2µ ]

Suppose we want to find its eigenvalues by solving

|H - λI| = (2+2µ-λ)(4+2µ-λ) - 4µ^2 = λ^2 - (6+4µ)λ + (8+12µ) = 0

This quadratic equation yields

λ = (3+2µ) ± √(4µ^2 + 1)

Taking the ratio of the largest and the smallest eigenvalue yields

[(3+2µ) + √(4µ^2+1)] / [(3+2µ) - √(4µ^2+1)]

It should be clear that as µ → ∞, the limit of the preceding ratio also goes to ∞. This indicates that as the iterations proceed and we start to increase the value of µ, the Hessian of the unconstrained function that we are minimizing becomes increasingly ill-conditioned. This is a common situation and is especially problematic if we are using a method for the unconstrained optimization that requires the use of the Hessian.
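The growth in the condition number is easy to see numerically (a small check with NumPy; np.linalg.cond here equals the ratio of the extreme eigenvalues since H is symmetric positive definite):

```python
import numpy as np

for mu in [1.0, 10.0, 100.0, 1000.0, 10000.0]:
    H = np.array([[2 + 2*mu, 2*mu],
                  [2*mu, 4 + 2*mu]])
    print(mu, np.linalg.cond(H))   # grows without bound, roughly like 4*mu/3
```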
MULTIPLIER METHODS

Consider the problem: Min f(x), st g_j(x) ≤ 0, j = 1,2,...,m. In multiplier methods the auxiliary function is given by

P(x) = f(x) + ½ Σ_{j=1}^m µ_j [Max{0, g_j(x) + θ_j^i}]^2

where µ_j > 0 and θ_j^i ≥ 0 are parameters associated with the jth constraint (i = iteration number). This is also often referred to as the AUGMENTED LAGRANGIAN function. Note that if θ_j^i = 0 and µ_j = µ, this reduces to the usual exterior penalty function.

The basic idea behind multiplier methods is as follows: Let x^i be the minimum of the auxiliary function P_i(x) at some iteration i. From the optimality conditions we have

∇P_i(x^i) = ∇f(x^i) + Σ_{j=1}^m µ_j Max{0, g_j(x^i) + θ_j^i} ∇g_j(x^i) = 0

Now, Max{0, g_j(x^i) + θ_j^i} = θ_j^i + Max{-θ_j^i, g_j(x^i)}. So ∇P_i(x^i) = 0 implies

∇f(x^i) + Σ_{j=1}^m µ_j θ_j^i ∇g_j(x^i) + Σ_{j=1}^m µ_j Max{-θ_j^i, g_j(x^i)} ∇g_j(x^i) = 0

Let us assume for a moment that the θ_j^i are chosen at each iteration i so that they satisfy the following:

Max{-θ_j^i, g_j(x^i)} = 0   (1)

It is easily verified that (1) is equivalent to requiring that

θ_j^i ≥ 0, g_j(x^i) ≤ 0 and θ_j^i g_j(x^i) = 0   (2)
Therefore, as long as (2) is satisfied, we have for the point x^i that minimizes P_i(x)

∇P_i(x^i) = ∇f(x^i) + Σ_{j=1}^m µ_j θ_j^i ∇g_j(x^i) = 0   (3)

If we let µ_j θ_j^i = λ_j^i and use the fact that µ_j > 0, then (2) reduces to

λ_j^i ≥ 0, g_j(x^i) ≤ 0 and λ_j^i g_j(x^i) = 0, ∀ j   (A)

and (3) reduces to

∇f(x^i) + Σ_{j=1}^m λ_j^i ∇g_j(x^i) = 0.   (B)

It is readily seen that (A) and (B) are merely the KARUSH-KUHN-TUCKER necessary conditions for x^i to be a solution to the original problem below!!

Min f(x), st g_j(x) ≤ 0, j = 1,2,...,m.

We may therefore conclude that x^i = x* and µ_j θ_j^i → µ_j θ_j* = λ_j*. Here x* is the solution to the problem, and λ_j* is the optimum Lagrange multiplier associated with constraint j.

From the previous discussion it should be clear that at each iteration i, θ_j^i should be chosen in such a way that Max{-θ_j^i, g_j(x^i)} → 0, which results in x^i → x*. Note that x^i is obtained here by minimizing the auxiliary function with respect to x.
ALGORITHM: A general algorithmic approach for the multiplier method may now be stated as follows:

STEP 0: Set i = 0; choose vectors x^0, µ and θ^0. (Usually θ^0 is set to 0.)
STEP 1: Set i = i+1.
STEP 2: Starting at x^{i-1}, minimize P_i(x) to find x^i.
STEP 3: Check convergence criteria and go to Step 4 only if the criteria are not satisfied.
STEP 4: Modify θ^i based on satisfying (1). Also modify µ if necessary and go to Step 2.

One example of a formula for updating θ^i is the following:

θ_j^{i+1} = θ_j^i + Max{g_j(x^i), -θ_j^i}
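A minimal sketch of this scheme on the recurring example (assuming SciPy; the fixed weight µ = 10 and the iteration cap are illustrative choices, not part of the method):

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + 2*x[1]**2

def g(x):                                # constraint in the form g(x) <= 0
    return 1 - x[0] - x[1]

mu, theta = 10.0, 0.0                    # fixed penalty weight, shift theta^0 = 0
x = np.zeros(2)
for i in range(50):
    P = lambda y: f(y) + 0.5 * mu * max(0.0, g(y) + theta)**2
    x = minimize(P, x).x                 # minimize the augmented Lagrangian
    if abs(max(g(x), -theta)) < 1e-6:    # condition (1) approximately holds
        break
    theta += max(g(x), -theta)           # the update formula above
print(x, mu * theta)                     # x -> (2/3, 1/3), mu*theta -> lambda* = 4/3
```

Note that µ stays fixed at a moderate value; the shifts θ do the work, which is precisely what avoids the ill-conditioning of plain SUMT.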
PRIMAL METHODS

By a primal method we mean one that works directly on the original constrained problem:

Min f(x)
st g_j(x) ≤ 0, j = 1,2,...,m.

Almost all such methods are based on the following general strategy: Let x^i be the design at iteration i. A new design x^{i+1} is found by the expression x^{i+1} = x^i + α_i d^i, where α_i is a step size parameter and d^i is a search direction (vector). The direction vector d^i is typically determined by an expression of the form

d^i = β_0 ∇f(x^i) + Σ_{j∈J_ε} β_j ∇g_j(x^i)

where ∇f and ∇g_j are the gradient vectors of the objective and constraint functions, and J_ε is an "active set" given by J_ε = {j | g_j(x^i) = 0}, or more commonly, {j | g_j(x^i) + ε ≥ 0, ε > 0}.

Unlike with transformation methods, it is evident that here the gradient vectors of individual constraint functions need to be evaluated. This is a characteristic of all primal methods.
METHOD OF FEASIBLE DIRECTIONS

The first class of primal methods we look at are the feasible directions methods, where each x^i is within the feasible region. An advantage is that if the process is terminated before reaching the true optimum solution (as is often necessary with large-scale problems), the terminating point is feasible and hence acceptable.

The general strategy at iteration i is:
(1) Let x^i be a feasible point.
(2) Choose a direction d^i so that
    a) x^i + αd^i is feasible at least for some "sufficiently" small α > 0, and
    b) f(x^i + αd^i) < f(x^i).
(3) Do an unconstrained optimization Min_α f(x^i + αd^i) to obtain α* = α_i, and hence x^{i+1} = x^i + α_i d^i.

DEFINITION: For the problem Minimize f(x), st x ∈ X, d (≠ 0) is called a feasible direction at x' ∈ X if there exists δ > 0 such that (x' + αd) ∈ X for all α ∈ [0,δ).

[Figure: feasible and infeasible directions d at points x' of a set X.]
Further, if we also have f(x' + αd) < f(x') for all α ∈ (0,α'), α' < δ, then d is called an improving feasible direction. Under differentiability, recall that this implies ∇f^T(x')d < 0. Let

F_x' = {d | ∇f^T(x')d < 0}, and
D_x' = {d | there exists δ > 0 such that (x' + αd) ∈ X for all α ∈ [0,δ)}.

Thus F_x' is the set of improving directions at x', and D_x' is the set of feasible directions at x'. If x* is a local optimum, then F_x* ∩ D_x* = ∅, as long as a constraint qualification is met.

In general, for the problem Minimize f(x), st g_j(x) ≤ 0, j = 1,2,...,m, let

J_x' = {j | g_j(x') = 0}   (index set of active constraints)

We can show that the set of feasible directions at x' is

D_x' = {d | ∇g_j^T(x')d < 0 for all j ∈ J_x'}

NOTE: If g_j(x) is a linear function, then the strict inequality (<) used to define the set D_x' above can be relaxed to an inequality (≤).
Geometric Interpretation: For the problem Minimize f(x), st g_j(x) ≤ 0, j = 1,2,...,m:

[Figure: (i) a direction d at x* with d^T∇g_j(x*) < 0 points into the feasible region, so d is a feasible direction; (ii) a direction d with d^T∇g_j(x*) = 0 is tangent to the constraint boundary g_j(x) = 0. In (ii), if the boundary is curved, any positive step along d is infeasible (however, we will often consider this a feasible direction).]

FARKAS' LEMMA: Given A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, y ∈ R^m, the following statements are equivalent to each other:
1. y^T A ≤ 0 implies y^T b ≤ 0
2. There exists z such that Az = b, z ≥ 0

Geometrically: suppose H_j is the hyperplane through the origin that is orthogonal to A_j (the jth column of A), and let H_j^- be the closed half-space on the side of H_j that does not contain A_j. Then y^T A ≤ 0 means y ∈ H_j^- for every j, so the statement "y^T A ≤ 0 implies y^T b ≤ 0" simply says that (∩_j H_j^-) ⊆ H_b^-.

[Figure: vectors A_1, A_2, A_3 and b, with the hyperplanes H_1, H_2, H_3 and the intersection ∩_j H_j^- contained in H_b^-.]
Application to NLP: Let

b ← -∇f(x*),  A_j ← ∇g_j(x*),  y ← d (direction vector),  z_j ← λ_j, for j ∈ J_x* = {j | g_j(x*) = 0}

Farkas' Lemma then implies that

(1) d^T ∇g_j(x*) ≤ 0 ∀ j ∈ J_x*  implies  -d^T ∇f(x*) ≤ 0, i.e., d^T ∇f(x*) ≥ 0
(2) There exist λ_j ≥ 0 such that Σ_{j∈J_x*} λ_j ∇g_j(x*) = -∇f(x*)

are equivalent! Note that (1) indicates that directions satisfying d^T ∇g_j(x*) ≤ 0 ∀ j ∈ J_x* are feasible directions, while directions satisfying d^T ∇f(x*) ≥ 0 are ascent (non-improving) directions. On the other hand, (2) states the K-K-T conditions. They are equivalent! So at the optimum, there is no feasible direction that is improving.

[Figure: a feasible region with constraints g_1, g_2, g_3, where g_1 and g_3 are tight at x*. K-K-T implies that the steepest descent direction -∇f(x*) lies in the cone generated by the gradients of the tight constraints.]
Similarly, for the general NLP problem

Minimize f(x)
st g_j(x) ≤ 0, j = 1,2,...,m
   h_k(x) = 0, k = 1,2,...,p

the set of feasible directions is

D_x' = {d | ∇g_j^T(x')d < 0 ∀ j ∈ J_x', and ∇h_k^T(x')d = 0 ∀ k}

====================================================

In a feasible directions algorithm we have the following:

STEP 1 (direction finding): To generate an improving feasible direction from x', we need to find d such that F_x' ∩ D_x' ≠ ∅, i.e.,

∇f^T(x')d < 0, ∇g_j^T(x')d < 0 for j ∈ J_x', and ∇h_k^T(x')d = 0 ∀ k

We therefore solve the subproblem

Minimize ∇f^T(x')d
st ∇g_j^T(x')d < 0 for j ∈ J_x'   (≤ 0 for j such that g_j is linear)
   ∇h_k^T(x')d = 0 for all k,

along with some normalizing constraints such as -1 ≤ d_i ≤ 1 for i = 1,2,...,n, or ‖d‖^2 = d^T d ≤ 1. If the objective of this subproblem is negative, we have an improving feasible direction d^i.
In practice, the actual subproblem we solve is slightly different, since strict inequality (<) constraints are difficult to handle. Let z = Max[∇f^T(x')d, ∇g_j^T(x')d for j ∈ J_x']. Then we solve:

Minimize z
st ∇f^T(x')d ≤ z
   ∇g_j^T(x')d ≤ z for j ∈ J
   ∇h_k^T(x')d = 0 for k = 1,2,...,p,
   -1 ≤ d_i ≤ 1 for i = 1,2,...,n

If (z*, d*) is the optimum solution for this, then
a) z* < 0: d^i = d* is an improving feasible direction
b) z* = 0: x' is a Fritz John point (or, if a CQ is met, a K-K-T point)
(Note that z* can never be greater than 0.)

STEP 2 (line search for step size): Assuming that z* from the first step is negative, we now solve

Min f(x^i + αd^i) st g_j(x^i + αd^i) ≤ 0, α ≥ 0

This can be rewritten as

Min f(x^i + αd^i) st 0 ≤ α ≤ α_max

where α_max = supremum{α | g_j(x^i + αd^i) ≤ 0, j = 1,2,...,m}. Let the optimum solution to this be at α* = α_i. Then we set x^{i+1} = x^i + α_i d^i and return to Step 1.
NOTE:
(1) In practice the set J_x' is usually defined as J_x' = {j | g_j(x^i) + ε_i ≥ 0}, where the tolerance ε_i is referred to as the "constraint thickness," which may be reduced as the algorithm proceeds.
(2) This procedure is usually attributed to ZOUTENDIJK.

EXAMPLE (a QP):

Minimize 2x_1^2 + x_2^2 - 2x_1x_2 - 4x_1 - 6x_2
st  x_1 + x_2 ≤ 8
    -x_1 + 2x_2 ≤ 10
    -x_1 ≤ 0
    -x_2 ≤ 0

For this, we have

∇g_1(x) = (1, 1)^T, ∇g_2(x) = (-1, 2)^T, ∇g_3(x) = (-1, 0)^T, ∇g_4(x) = (0, -1)^T,

and ∇f(x) = (4x_1 - 2x_2 - 4, 2x_2 - 2x_1 - 6)^T. Let us begin with x^0 = (0, 0)^T, with f(x^0) = 0.

ITERATION 1: J = {3,4}. STEP 1 yields

Min z
st ∇f^T(x^0)d ≤ z:   d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) ≤ z
   ∇g_3^T(x^0)d ≤ z:  -d_1 ≤ z
   ∇g_4^T(x^0)d ≤ z:  -d_2 ≤ z
   d_1, d_2 ∈ [-1,1]
i.e., Min z st -4d_1 - 6d_2 ≤ z, -d_1 ≤ z, -d_2 ≤ z, d_1, d_2 ∈ [-1,1],

yielding z* = -1 (< 0), with d* = d^0 = (1, 1)^T.

STEP 2 (line search): x^0 + αd^0 = (α, α)^T. Thus

α_max = sup{α | 2α ≤ 8, α ≤ 10, -α ≤ 0, -α ≤ 0} = 4

We therefore solve: Minimize f(x^0 + αd^0), st 0 ≤ α ≤ 4, yielding α* = α_0 = 4, so that

x^1 = x^0 + α_0 d^0 = (4, 4)^T, with f(x^1) = -24.

ITERATION 2: J = {1}. STEP 1 yields

Min z
st d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) ≤ z
   d_1 + d_2 ≤ z
   -1 ≤ d_1, d_2 ≤ 1

i.e., Min z st 4d_1 - 6d_2 ≤ z, d_1 + d_2 ≤ z, d_1, d_2 ∈ [-1,1],

yielding z* = -4 (< 0), with d* = d^1 = (-1, 0)^T.

STEP 2 (line search): x^1 + αd^1 = (4-α, 4)^T.
Thus α_max = sup{α | 8-α ≤ 8, 4+α ≤ 10, α-4 ≤ 0, -4 ≤ 0} = 4

We therefore solve: Minimize 2(4-α)^2 + 4^2 - 2(4-α)(4) - 4(4-α) - 6(4), st 0 ≤ α ≤ 4, yielding α* = α_1 = 1, so that

x^2 = x^1 + α_1 d^1 = (3, 4)^T, with f(x^2) = -26.

ITERATION 3: J = ∅. STEP 1 yields

Min z
st d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) ≤ z
   -1 ≤ d_1, d_2 ≤ 1

i.e., Min z st -4d_2 ≤ z, d_1, d_2 ∈ [-1,1],

yielding z* = -4 (< 0), with d* = d^2 = (0, 1)^T.

STEP 2 (line search): x^2 + αd^2 = (3, 4+α)^T. Thus

α_max = sup{α | 7+α ≤ 8, 5+2α ≤ 10, -3 ≤ 0, -4-α ≤ 0} = 1

We therefore solve: Minimize f(x^2 + αd^2), st 0 ≤ α ≤ 1, i.e.,

Minimize 2(3)^2 + (4+α)^2 - 2(3)(4+α) - 4(3) - 6(4+α), st 0 ≤ α ≤ 1,

yielding α* = α_2 = 1, so that x^3 = x^2 + α_2 d^2 = (3, 5)^T, with f(x^3) = -29.
ITERATION 4: J = {1}. STEP 1 yields

Min z
st d_1(4x_1 - 2x_2 - 4) + d_2(2x_2 - 2x_1 - 6) ≤ z
   d_1 + d_2 ≤ z
   -1 ≤ d_1, d_2 ≤ 1

i.e., Min z st -2d_1 - 2d_2 ≤ z, d_1 + d_2 ≤ z, d_1, d_2 ∈ [-1,1],

yielding z* = 0, with d* = (0, 0)^T. STOP.

Therefore the optimum solution is given by x* = (3, 5)^T, with f(x*) = -29.

[Figure: the feasible region bounded by x_1 + x_2 = 8, -x_1 + 2x_2 = 10 and the nonnegativity constraints, showing the path followed by the algorithm: (0,0) → (4,4) → (3,4) → (3,5).]
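The direction-finding subproblem at each step is just a small LP, so it can be handed to any LP solver. A sketch of Iteration 1 above (assuming SciPy; the variables are (d_1, d_2, z)):

```python
import numpy as np
from scipy.optimize import linprog

# Direction-finding LP at x^0 = (0,0): minimize z subject to
#   grad_f . d - z <= 0  and  grad_gj . d - z <= 0 for active j in J = {3,4}.
grad_f = np.array([-4.0, -6.0])            # grad f at (0,0)
active = [np.array([-1.0, 0.0]),           # grad g3
          np.array([0.0, -1.0])]           # grad g4

c = np.array([0.0, 0.0, 1.0])              # minimize z
A_ub = [np.append(grad_f, -1.0)] + [np.append(a, -1.0) for a in active]
b_ub = np.zeros(len(A_ub))
bounds = [(-1, 1), (-1, 1), (None, None)]  # normalization on d; z is free

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
print(res.x)                               # d = (1, 1), z* = -1, as in Iteration 1
```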
GRADIENT PROJECTION METHOD (ROSEN)

Recall that the steepest descent direction is -∇f(x). However, for constrained problems, moving along -∇f(x) may destroy feasibility. Rosen's method works by projecting -∇f(x) onto the hyperplane tangent to the set of active constraints. By doing so it tries to improve the objective while simultaneously maintaining feasibility. It uses d = -P∇f(x) as the search direction, where P is a projection matrix.

Properties of P (an n×n matrix):
1) P^T = P and P^T P = P (i.e., P is idempotent)
2) P is positive semidefinite
3) P is a projection matrix if, and only if, I-P is also a projection matrix.
4) Let Q = I-P and p = Px_1, q = Qx_2, where x_1, x_2 ∈ R^n. Then
   (a) p^T q = q^T p = 0 (p and q are orthogonal), and
   (b) any x ∈ R^n can be uniquely expressed as x = p + q, where p = Px, q = (I-P)x.

[Figure: given x, we can find the orthogonal components p and q.]
Consider the simpler case where all constraints are linear. Assume that there are two linear constraints intersecting along a line, as shown below.

[Figure: the constraint boundaries g_1(x) = 0 and g_2(x) = 0, with -∇f, its projection P(-∇f), and the component (I-P)(-∇f) at a point on the intersection.]

Obviously, moving along -∇f would take us outside the feasible region. Say we project -∇f onto the feasible region using the matrix P.

THEOREM: As long as P is a projection matrix and P∇f ≠ 0, d = -P∇f will be an improving direction.

PROOF: ∇f^T d = ∇f^T(-P∇f) = -∇f^T P^T P∇f = -‖P∇f‖^2 < 0. (QED)

For -P∇f to also be feasible we must have ∇g_1^T(-P∇f) ≤ 0 and ∇g_2^T(-P∇f) ≤ 0 (by definition).
Say we pick P such that ∇g_1^T(-P∇f) = ∇g_2^T(-P∇f) = 0, so that -P∇f is a feasible direction. Thus if we denote by M the (2×n) matrix whose Row 1 is ∇g_1^T and whose Row 2 is ∇g_2^T, then we have

M(-P∇f) = 0   (*)

To find the form of the matrix P, we make use of Property 4(b) to write the vector -∇f as

-∇f = P(-∇f) + [I-P](-∇f) = -P∇f - [I-P]∇f

Now, -∇g_1 and -∇g_2 are both orthogonal to -P∇f, and so is [I-P](-∇f). Hence we must have

[I-P](-∇f) = -[I-P]∇f = λ_1∇g_1 + λ_2∇g_2,

i.e., -[I-P]∇f = M^T λ, where λ = (λ_1, λ_2)^T. Therefore

-M[I-P]∇f = MM^T λ
-(MM^T)^{-1} M[I-P]∇f = (MM^T)^{-1} MM^T λ = λ
λ = -{(MM^T)^{-1} M∇f - (MM^T)^{-1} MP∇f}
λ = -{(MM^T)^{-1} M∇f + (MM^T)^{-1} M(-P∇f)}
λ = -(MM^T)^{-1} M∇f   (from (*) above, M(-P∇f) = 0)

Hence we have

-∇f = -P∇f + M^T λ = -P∇f - M^T(MM^T)^{-1} M∇f

i.e., P∇f = ∇f - M^T(MM^T)^{-1} M∇f = [I - M^T(MM^T)^{-1} M]∇f

i.e., P = [I - M^T(MM^T)^{-1} M]
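A quick numerical check of this formula (using, for illustration, the gradients at the point x = (4,4) of the previous example, where g_1 is the active constraint):

```python
import numpy as np

def projection(M):
    # P = I - M^T (M M^T)^(-1) M projects onto the null space of M.
    n = M.shape[1]
    return np.eye(n) - M.T @ np.linalg.inv(M @ M.T) @ M

M = np.array([[1.0, 1.0]])         # single row: grad g1 for g1 = x1 + x2 - 8
grad_f = np.array([4.0, -6.0])     # grad f at (4,4)
d = -projection(M) @ grad_f
print(d, M @ d, grad_f @ d)        # d = (-5, 5); M d = 0 (stays on the boundary);
                                   # grad_f . d = -50 < 0 (improving)
```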
Question: What if P∇f(x) = 0?
Answer: Then

0 = P∇f = [I - M^T(MM^T)^{-1}M]∇f = ∇f + M^T w, where w = -(MM^T)^{-1}M∇f.

If w ≥ 0, then x satisfies the Karush-Kuhn-Tucker conditions and we may stop; if not, a new projection matrix P could be identified such that d = -P∇f is an improving feasible direction. In order to identify this matrix P, we merely pick any component of w that is negative and drop the corresponding row from the matrix M. Then we use the usual formula to obtain P.

====================

Now consider the general problem

Minimize f(x)
st g_j(x) ≤ 0, j = 1,2,...,m
   h_k(x) = 0, k = 1,2,...,p

The gradient projection algorithm can be generalized to nonlinear constraints by projecting the gradient onto the tangent hyperplane at x (rather than onto the surface of the feasible region itself).
STEP 1: SEARCH DIRECTION

Let x^i be a feasible point and let J be the "active set," i.e., J = {j | g_j(x^i) = 0}, or more practically, J = {j | g_j(x^i) + ε ≥ 0}. Suppose each j ∈ J is approximated using the Taylor series expansion by

g_j(x) ≅ g_j(x^i) + (x - x^i)^T ∇g_j(x^i).

Notice that this approximation is a linear function of x with slope ∇g_j(x^i). Let

M = the matrix whose rows are ∇g_j^T(x^i) for j ∈ J (active constraints) and ∇h_k^T(x^i) for k = 1,2,...,p (equality constraints)

Let P = [I - M^T(MM^T)^{-1}M] (the projection matrix). If M does not exist (no active constraints), let P = I. Let d^i = -P∇f(x^i).

a) If d^i = 0 and M does not exist, STOP.
b) If d^i = 0 and M exists, find

w = (u, v)^T = -(MM^T)^{-1}M∇f(x^i)

where u corresponds to g (inequality constraints) and v to h (equality constraints). If u ≥ 0, STOP; else delete the row of M corresponding to some u_j < 0 and return to Step 1.
c) If d^i ≠ 0, go to Step 2.
STEP 2: STEP SIZE (Line Search)

Solve

Minimize_{0 ≤ α ≤ α_max} f(x^i + αd^i)

where α_max = supremum{α | g_j(x^i + αd^i) ≤ 0, h_j(x^i + αd^i) = 0}. Call the optimal solution α* and find x^{i+1} = x^i + α*d^i.

If all constraints are not linear, make a "correction move" to return x to the feasible region. The need for this is shown below. Generally, it could be done by solving g(x) = 0 starting at x^{i+1} using the Newton-Raphson approach, or by moving orthogonally from x^{i+1}, etc.

[Figure: two cases with objective function contours. With linear constraints g_1, g_2, the projected step -P∇f stays feasible and NO CORRECTION is REQUIRED; with a nonlinear constraint g_1, the step along the tangent leaves the feasible region and REQUIRES a CORRECTION move back to g_1 = 0.]
LINEARIZATION METHODS

The general idea of linearization methods is to replace the solution of the NLP by the solution of a series of linear programs which approximate the original NLP.

The simplest method is RECURSIVE LP: Transform Min f(x), st g_j(x) ≤ 0, j = 1,2,...,m, into the LP

Min {f(x^i) + (x - x^i)^T ∇f(x^i)}
st g_j(x^i) + (x - x^i)^T ∇g_j(x^i) ≤ 0, for all j.

Thus we start with some initial x^i where the objective and constraints (usually only the tight ones) are linearized, and then solve the LP to obtain a new x^{i+1}, and continue...

Note that f(x^i) is a constant, and setting x - x^i = d shows that the problem is equivalent to

Min d^T ∇f(x^i)
st d^T ∇g_j(x^i) ≤ -g_j(x^i), for all j,

which is a direction-finding LP problem; x^{i+1} = x^i + d implies that the step size is 1.0.

EXAMPLE: Min f(x) = 4x_1 - x_2^2 - 12, st

g_1(x) = x_1^2 + x_2^2 - 25 ≤ 0,
g_2(x) = -x_1^2 + 10x_1 - x_2^2 + 10x_2 - 34 ≥ 0

If this is linearized at the point x^i = (2,4), then since
∇f(x^i) = (4, -2x_2)^T = (4, -8)^T;  ∇g_1(x^i) = (2x_1, 2x_2)^T = (4, 8)^T;  ∇g_2(x^i) = (-2x_1+10, -2x_2+10)^T = (6, 2)^T

it yields the following linear program (VERIFY):

Min f~(x) = 4x_1 - 8x_2 + 4,
st g~_1(x) = 4x_1 + 8x_2 - 45 ≤ 0,
   g~_2(x) = 6x_1 + 2x_2 - 14 ≥ 0

[Figure: contours f = 0, -10, -20, -30 with the nonlinear constraints g_1 = 0, g_2 = 0, their linearizations, and the true optimum x*.]

The method, however, has limited application since it does not converge if the local minimum occurs at a point that is not a vertex of the feasible region; in such cases it oscillates between adjacent vertices. To avoid this one may limit the step size (for instance, the Frank-Wolfe method minimizes f between x^i and x^i + d to get x^{i+1}).
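The linearized LP at (2,4) from the example above can be handed directly to an LP solver. A sketch (assuming SciPy; the default variable bounds keep x ≥ 0, and the constant +4 in the objective is dropped since it does not affect the minimizer):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([4.0, -8.0])          # linearized objective 4 x1 - 8 x2
A_ub = np.array([[4.0, 8.0],       # linearized g1: 4 x1 + 8 x2 <= 45
                 [-6.0, -2.0]])    # linearized g2: 6 x1 + 2 x2 >= 14, in <= form
b_ub = np.array([45.0, -14.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub)   # default bounds: x >= 0
print(res.x)                             # x^{i+1} = (0.55, 5.35)
```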
RECURSIVE QUADRATIC PROGRAMMING (RQP)

RQP is similar in logic to RLP, except that it solves a sequence of quadratic programming problems. RQP is a better approach because the second-order terms help capture curvature information:

Min d^T ∇L(x^i) + ½ d^T B d
st d^T ∇g_j(x^i) ≤ -g_j(x^i) ∀ j

where B is the Hessian (or an approximation of the Hessian) of the Lagrangian function at the point x^i. Notice that this is a QP in d with linear constraints. Once the above QP is solved to get d^i, we obtain x^{i+1} by finding an optimum step size along x^i + αd^i and continue. The matrix B is never explicitly computed --- it is usually updated by schemes such as the BFGS approach, using gradient information at x^i and x^{i+1} only (recall the quasi-Newton methods for unconstrained optimization).
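In practice one rarely codes RQP from scratch; quasi-Newton SQP implementations are available in standard libraries. As an illustration (assuming SciPy, whose SLSQP routine is a sequential quadratic programming method of this general family), the recurring example of these notes can be solved directly:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 2*x[1]**2
# SLSQP expects inequality constraints in the form fun(x) >= 0,
# so g(x) = 1 - x1 - x2 <= 0 becomes x1 + x2 - 1 >= 0.
cons = [{"type": "ineq", "fun": lambda x: x[0] + x[1] - 1}]

res = minimize(f, np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)    # (2/3, 1/3)
```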
KELLEY'S CUTTING PLANE METHOD

One of the better linearization methods is the cutting plane algorithm of J. E. Kelley, developed for convex programming problems. Suppose we have a set of nonlinear constraints g_j(x) ≤ 0 for j = 1,2,...,m that determine the feasible region G. Suppose further that we can find a polyhedral set H that entirely contains the region determined by these constraints, i.e., G ⊆ H.

[Figure: a nonlinear feasible region G defined by g_1, g_2, g_3, contained in a polyhedral set H defined by the linear constraints h_1,...,h_4.]

In general, a cutting plane algorithm would operate as follows:
1. Solve the problem with the linear constraints.
2. If the optimal solution to this problem is feasible in the original (nonlinear) constraints, then STOP; it is also optimal for the original problem.
3. Otherwise, add an extra linear constraint (i.e., an extra line/hyperplane) so that the current optimum point (for the linear constraints) becomes infeasible for the new problem with the added linear constraint (so we "cut out" part of the infeasibility). Now return to Step 1.
JUSTIFICATION

Step 1: It is in general easier to solve problems with linear constraints than ones with nonlinear constraints.

Step 2: G is determined by g_j(x) ≤ 0, j = 1,...,m, where each g_j is convex, while H is determined by h_k ≤ 0 ∀ k, where each h_k is linear. G is wholly contained inside the polyhedral set H. Thus every point that satisfies g_j(x) ≤ 0 automatically satisfies h_k(x) ≤ 0, but not vice-versa. Thus G places "extra" constraints on the design variables, and therefore the optimal solution over G can never be any better than the optimal solution over H. Therefore, if for some region H the optimal solution x* is also feasible in G, it MUST be optimal for G also. (NOTE: In general, the optimal solution for H at some iteration will not be feasible in G.)

Step 3 (Generating a CUT): In Step 2, let g~(x^i) = Maximum_{j∈J} g_j(x^i), where J is the set of violated constraints, J = {j | g_j(x^i) > 0}; i.e., g~(x^i) is the value of the most violated constraint at x^i. By the Taylor series expansion around x^i,

g~(x) = g~(x^i) + (x - x^i)^T ∇g~(x^i) + ½ (x - x^i)^T H(x̄)(x - x^i)

so that

g~(x) ≥ g~(x^i) + (x - x^i)^T ∇g~(x^i), for every x ∈ R^n (since g~ is convex).
If ∇g~(x^i) = 0, then the inequality above yields g~(x) ≥ g~(x^i). But since g~ is violated at x^i, g~(x^i) > 0. Therefore g~(x) > 0 ∀ x ∈ R^n; g~ is violated for all x, and the original problem is infeasible.

If ∇g~(x^i) ≠ 0, then consider the following linear constraint:

(x - x^i)^T ∇g~(x^i) ≤ -g~(x^i), i.e., h(x) = g~(x^i) + (x - x^i)^T ∇g~(x^i) ≤ 0   (**)

(Note that x^i, g~(x^i) and ∇g~(x^i) are all known.)

The current point x^i is obviously infeasible in this constraint --- h(x^i) ≤ 0 would imply that g~(x^i) + (x^i - x^i)^T ∇g~(x^i) = g~(x^i) ≤ 0, which contradicts the fact that g~ is violated at x^i (i.e., g~(x^i) > 0). So, if we add this extra linear constraint, then the optimal solution would change, because the current optimal solution would no longer be feasible for the new LP with the added constraint. Hence (**) defines a suitable cutting plane.

NOTE: If the objective for the original NLP was linear, then at each iteration we merely solve a linear program!
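A minimal sketch of the whole loop on a small illustrative problem (not from these notes): minimize the linear objective -x_1 - x_2 over the convex region g(x) = x_1^2 + x_2^2 - 1 ≤ 0, starting from a box H that contains it (assuming SciPy):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0])              # linear objective
g  = lambda x: x[0]**2 + x[1]**2 - 1.0  # convex constraint g(x) <= 0
dg = lambda x: 2.0 * x                  # gradient of g

A, b = [], []                           # accumulated cutting planes
bounds = [(-2, 2), (-2, 2)]             # initial polyhedron H (a box containing G)

for k in range(50):
    res = linprog(c, A_ub=np.array(A) if A else None,
                  b_ub=np.array(b) if b else None, bounds=bounds)
    x = res.x
    if g(x) <= 1e-6:                    # LP optimum feasible in G: optimal, stop
        break
    # cut (**): g(x) + dg(x).(y - x) <= 0, i.e. dg(x).y <= dg(x).x - g(x)
    A.append(dg(x))
    b.append(dg(x) @ x - g(x))
print(x)                                # approaches (1/sqrt(2), 1/sqrt(2))
```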
An example (in R^2) of how a cutting plane algorithm might proceed:

[Figure: four panels showing the polyhedral set H shrinking around G as cutting planes are added. Panel 1: nonnegativity + 3 linear constraints, with the LP optimum x̂ and the true optimum x* for G. Panel 2: nonnegativity + 4 linear constraints, after adding the cut h_4. Panel 3: nonnegativity + 5 linear constraints, after adding h_5. Panel 4: nonnegativity + 6 linear constraints, after adding h_6, where the LP optimum x̂ = x*.]