Preprint. Ayalew Getachew Mersha, Stephan Dempe: Feasible Direction Method for Bilevel Programming Problem. ISSN
TU Bergakademie Freiberg, Fakultät für Mathematik und Informatik, Institut für Numerische Mathematik und Optimierung, Akademiestr. 6 (Mittelbau), Freiberg.

Herausgeber: Dekan der Fakultät für Mathematik und Informatik. Herstellung: Medienzentrum der TU Bergakademie Freiberg.
FEASIBLE DIRECTION METHOD FOR BILEVEL PROGRAMMING PROBLEM

AYALEW GETACHEW MERSHA AND STEPHAN DEMPE

Abstract. In this paper we investigate the application of a feasible direction method to an optimistic nonlinear bilevel programming problem. The convex lower level problem of the optimistic nonlinear bilevel programming problem is replaced by relaxed KKT conditions. The feasible direction method developed by Topkis and Veinott [22] is applied to the auxiliary problem to obtain a Bouligand stationary point of the optimistic bilevel programming problem.

1. Introduction

In this paper we consider the following bilevel programming problem:

(1.1)  min_x { F(x, y) : G(x) ≤ 0, y ∈ Ψ(x) }

with

(1.2)  Ψ(x) := argmin_y { f(x, y) : g(x, y) ≤ 0 },

where F, f : ℝ^(n+m) → ℝ, g : ℝ^(n+m) → ℝ^p and G : ℝ^n → ℝ^l are at least twice continuously differentiable functions. The following assumption is made throughout this paper.

Assumption 1. f(x, ·) and g_i(x, ·), i = 1, ..., p, are convex for every x.

If the solution of the lower level problem (1.2) corresponding to some parameter x is not unique, the upper level problem (1.1) is not a well defined optimization problem; in this case the bilevel programming problem may not be solvable. This situation is illustrated by examples constructed in [2] and [4, example on page 121]. The following lemma substantiates this claim.

Lemma 1.1. [3] If Ψ(x) is not a singleton for all parameters x, the leader may not achieve his infimum objective function value.

To overcome this unpleasant situation, three strategies are available to the leader. The first is to replace min by inf in the formulation of problem (1.1) and to define ε-optimal solutions. The second is to allow cooperation between the two players, the upper level decision maker (the leader) and the lower level decision maker (the follower); this results in the so-called optimistic or weak bilevel programming problem. The third is a conservative strategy.
In this case the leader is forced to bound the damage caused by the follower's unfavorable choice.

(The work of the first author was supported by DAAD (Deutscher Akademischer Austauschdienst) with a scholarship grant.)

Define the following sets:

M := {(x, y) : G(x) ≤ 0, g(x, y) ≤ 0},
M(X) := {x ∈ ℝ^n : ∃ y such that (x, y) ∈ M},
M(x) := {y ∈ ℝ^m : g(x, y) ≤ 0},
I(x*, y*) := {i : g_i(x*, y*) = 0}.

Definition 1.2. The lower level problem (1.2) is said to satisfy the Mangasarian-Fromovitz constraint qualification (MFCQ) at (x*, y*), y* ∈ Ψ(x*), if

{ r ∈ ℝ^m : ∇_y g_i(x*, y*) r < 0, i ∈ I(x*, y*) } ≠ ∅.

The Lagrangian function of the lower level problem is given by

L(x, y, λ) := f(x, y) + λᵀ g(x, y).

Consider the set of Lagrange multipliers

Λ(x, y) := { λ : λ ≥ 0, λᵀ g(x, y) = 0, ∇_y L(x, y, λ) = 0 }

and let

J(λ) := { j : λ_j > 0 }.

Definition 1.3. The lower level problem (1.2) satisfies the strong sufficient optimality condition of second order (SSOC) at a point (x*, y*) if for each λ ∈ Λ(x*, y*) and for every nonzero element r of the set

{ r ∈ ℝ^m : ∇_y g_i(x*, y*) r = 0, i ∈ J(λ) }

we have rᵀ ∇²_yy L(x*, y*, λ) r > 0.

Definition 1.4. The constant rank constraint qualification (CRCQ) is valid for problem (1.2) at a point (x*, y*) if there is an open neighborhood W_ε(x*, y*), ε > 0, of (x*, y*) such that for each subset I ⊆ I(x*, y*) the family of gradient vectors { ∇_y g_i(x, y) : i ∈ I } has the same rank for all (x, y) ∈ W_ε(x*, y*).

Assume the following is satisfied:

Assumption 2. The set M is nonempty and compact.

2. Optimistic bilevel programming

If the follower agrees to cooperate with the leader in the decision making process, we obtain the weak bilevel programming problem

(2.1)  min_x { φ(x) : G(x) ≤ 0 },

where

(2.2)  φ(x) := min_y { F(x, y) : y ∈ Ψ(x) }.

As can be seen from this formulation, the problem is weaker than the original bilevel programming problem.
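To make the bilevel formulation concrete, the following small Python sketch solves a toy instance of our own (not taken from the paper) in which the lower level solution is unique for every x, so that Ψ(x) is a singleton and the problem reduces to minimizing F(x, y(x)) over the upper level feasible set:

```python
# Toy bilevel instance (illustrative assumption, not from the paper):
# lower level  min_y (y - x)^2  has the unique solution y(x) = x,
# upper level  min_x { F(x, y(x)) : 0 <= x <= 2 }  with F(x, y) = (x - 1)^2 + y^2.

def Psi(x):
    return x  # unique lower level solution, so Psi(x) is a singleton

def F(x, y):
    return (x - 1.0) ** 2 + y ** 2  # upper level objective

# brute-force grid search over the upper level feasible set [0, 2]
xs = [2.0 * i / 1000 for i in range(1001)]
best_x = min(xs, key=lambda x: F(x, Psi(x)))

# F(x, x) = (x - 1)^2 + x^2 is minimized at x = 1/2
assert abs(best_x - 0.5) < 1e-3
```

Because Ψ(x) is single valued here, the optimistic and conservative strategies coincide; the difficulties discussed above arise precisely when this uniqueness fails.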
Definition 2.1. [4] A point (x*, y*) ∈ ℝ^n × ℝ^m is a local solution of problem (2.1), (2.2) if x* ∈ {x : G(x) ≤ 0}, y* ∈ Ψ(x*) with F(x*, y*) ≤ F(x*, y) for all y ∈ Ψ(x*), and there exists a sufficiently small positive number ε > 0 with φ(x*) ≤ φ(x) for all x ∈ B_ε(x*). If ε = ∞, then (x*, y*) is a global solution.

We now replace the optimistic bilevel programming problem by another version of the hierarchical problem. Consider the problem

(2.3)  min_{x,y} { F(x, y) : G(x) ≤ 0, y ∈ Ψ(x) }

with the properties of F, G, g explained at the beginning of this paper. The relation between problem (2.3) and problem (2.1), (2.2) is as follows. If Assumption 2 and MFCQ are satisfied, the global optimal solutions of (2.3) and of (2.1), (2.2) coincide [4]. With respect to local optimal solutions, Dutta and Dempe [6] showed that every local optimal solution of the optimistic bilevel problem is a local optimal solution of problem (2.3), as the following result shows.

Theorem 2.2. [6] Let the lower level problem (1.2) satisfy Assumption 2 and MFCQ and let x̄ be a local optimal solution of problem (2.1) with corresponding ȳ ∈ argmin_y { F(x̄, y) : y ∈ Ψ(x̄) }. Then (x̄, ȳ) is a local optimal solution of problem (2.3).

Problem (2.3) is used to derive necessary optimality conditions for the optimistic bilevel programming problem (2.1), (2.2) because it can be reformulated as a single level problem, provided that the lower level problem is convex and satisfies the Mangasarian-Fromovitz constraint qualification.

3. Piecewise continuously differentiable functions

Definition 3.1. A function y : ℝ^n → ℝ^m is a PC¹ function at x⁰ if it is continuous and there exist an open neighborhood V of x⁰ and a finite number of continuously differentiable functions yⁱ : V → ℝ^m, i = 1, ..., b, such that y(x) ∈ {y¹(x), ..., y^b(x)} for all x ∈ V. The function y is a PC¹ function on an open set O provided it is a PC¹ function at every point x ∈ O.

Theorem 3.2.
[18] Let the lower level problem (1.2) at x = x̄ be convex and satisfy MFCQ, SSOC and CRCQ at a stationary solution ȳ. Then the locally uniquely determined function y(x) ∈ Ψ(x), and hence F(x, y(x)), is a PC¹ function.

It has been shown in [11] that PC¹ functions are locally Lipschitz continuous. The directional differentiability of the solution function y(x), and hence of F(x, y(x)), is a byproduct of the result in [16]. The following result states that PC¹ functions are directionally differentiable and locally Lipschitz continuous.

Theorem 3.3. [4, Theorem 4.8] Let f : ℝ^n → ℝ be a PC¹ function. Then f is directionally differentiable and locally Lipschitz continuous.
Define K(i) := {x : y(x) = yⁱ(x)} and the index set of essentially active selection functions

I_s(x) := { i ∈ {1, ..., b} : x ∈ cl int K(i) }.

The usual tangent cone to the set K(i) is defined by

T_K(i)(x) := { d : ∃ x_k → x with x_k ∈ K(i) for all k and ∃ t_k ↓ 0 such that (x_k − x)/t_k → d }.

Once the set of essentially active selection functions is known, it is shown in Scholtes [20, Proposition A.4.1] that the generalized Jacobian of y(·) is given by

∂y(x) = conv{ ∇yⁱ(x) : i ∈ I_s(x) }.

The generalized directional derivative of the solution function y(·) is given by

y°(x; d) = sup{ ⟨d, z⟩ : z ∈ ∂y(x) }.

4. Optimality

A point (x*, y*) is a local optimal solution of a given optimization problem if it is the best point in a small neighborhood of (x*, y*).

Definition 4.1. [4] A feasible point (x*, y*) is a local optimal solution of problem (2.3) if there exists an open neighborhood U(x*, y*) such that F(x*, y*) ≤ F(x, y) for all (x, y) ∈ U(x*, y*) with y ∈ Ψ(x), G(x) ≤ 0.

Recall that the validity of MFCQ, SSOC and CRCQ for the lower level problem implies that the solution function y(x) is a PC¹ function: a continuous selection of continuously differentiable functions y¹, ..., y^b. The so-called Clarke and Bouligand stationarity concepts are defined as follows.

Definition 4.2. [20] Let the lower level problem satisfy MFCQ, SSOC and CRCQ at a point (x*, y*), y* ∈ Ψ(x*). The point (x*, y*) is said to be a Clarke stationary point of the optimistic bilevel programming problem if for every d with ∇G_i(x*) d ≤ 0, i : G_i(x*) = 0, there exists i ∈ I_s(x*) with d ∈ T_K(i)(x*) such that

∇_x F(x*, y*) d + ∇_y F(x*, y*) ∇yⁱ(x*) d ≥ 0.

Definition 4.3. Let MFCQ, SSOC and CRCQ be satisfied for the lower level problem under consideration. A point (x*, y*) is said to be a Bouligand stationary point of problem (1.1), (1.2) if

(4.1)  ∇_x F(x*, y*) d + ∇_y F(x*, y*) y′(x*; d) ≥ 0  for all d : ∇G_i(x*) d ≤ 0, i : G_i(x*) = 0,

or, in other words, if for all j ∈ I_s(x*)

(4.2)  ∇_x F(x*, y*) d + ∇_y F(x*, y*) ∇yʲ(x*) d ≥ 0  for all d : ∇G_i(x*) d ≤ 0, i : G_i(x*) = 0,

where I_s(x*) is the set of essentially active indices.
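The selection-based conditions (4.1), (4.2) can be illustrated on a toy PC¹ function of our own choosing (not an example from the paper): y(x) = |x| has the selections y¹(x) = x and y²(x) = −x, both essentially active at x* = 0.

```python
# Directional derivative of h(x) = F(x, y(x)) for F(x, y) = y - x and y(x) = |x|
# at x* = 0: h'(0; d) = grad_x F * d + grad_y F * y'(0; d), where y'(0; d) = |d|
# because the active selection is y1 (gradient +1) for d > 0 and y2 (gradient -1)
# for d < 0.

def h_dir_deriv(d):
    ydir = d if d >= 0 else -d   # y'(0; d) = |d| via the selection gradients
    return -d + 1.0 * ydir       # grad_x F = -1, grad_y F = 1

# h'(0; d) >= 0 for every direction d, so x* = 0 is Bouligand stationary
# in the sense of (4.1) for this toy instance.
assert all(h_dir_deriv(d) >= 0 for d in (-1.0, -0.5, 0.5, 1.0))
```

Note that both selections must be tested, which is exactly the quantifier "for all j ∈ I_s(x*)" in (4.2).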
If the lower level problem (1.2) is convex and satisfies MFCQ, it can be replaced by its necessary and sufficient optimality conditions, which yields the following problem:

(4.3)  min_{x,y,λ} F(x, y)
       s.t.  G(x) ≤ 0,
             ∇_y L(x, y, λ) = 0,
             g(x, y) ≤ 0,
             λ ≥ 0,
             λ_i g_i(x, y) = 0, i = 1, ..., p.

Problems (1.1), (1.2) and problem (4.3) are equivalent if the lower level problem is convex and satisfies MFCQ at each feasible point, the solution function is uniquely determined and strongly stable in the sense of Kojima [12], and global solutions are considered. Without the convexity assumption, problem (4.3) has a larger feasible set; hence even the global solutions of the two problems may not coincide. The following simple relation can be established between problem (1.1), (1.2) and problem (4.3).

Theorem 4.4. Let (x̄, ȳ, λ̄) be a feasible point of (4.3). If the lower level problem (1.2) is convex and satisfies MFCQ, then the primal part (x̄, ȳ) is feasible for (1.1).

Proof. Let (x̄, ȳ, λ̄) be a feasible point of problem (4.3). Then G(x̄) ≤ 0 and, by the validity of MFCQ in the lower level problem, the pair (ȳ, λ̄) solves the system

∇_y L(x̄, y, λ) = 0,  g(x̄, y) ≤ 0,  λ ≥ 0,  λ_i g_i(x̄, y) = 0, i = 1, ..., p.

Due to the convexity of the parametric lower level problem this system is a necessary and sufficient condition for ȳ ∈ Ψ(x̄). Hence (x̄, ȳ) is a feasible point of problem (1.1). □

Problem (4.3) belongs to a special class of mathematical programming problems called mathematical programs with equilibrium constraints (MPECs). MPECs are nonlinear programming problems with equilibrium conditions among the constraints; that is, expressions of the type

θ₁(x) ≥ 0,  θ₂(x) ≥ 0,  θ₁ⁱ(x) θ₂ⁱ(x) = 0, i = 1, ..., m,

appear in the constraints, where θ_k : ℝ^n → ℝ^m, k = 1, 2, are differentiable functions. MPECs are considered ill conditioned problems in optimization theory due to the presence of the complementarity term.
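To make the structure of (4.3) concrete, the following minimal Python sketch checks feasibility of a triple (x, y, λ) for a toy instance of our own (lower level min_y {(y − x)² : −y ≤ 0} with upper level constraint G(x) = −x − 5 ≤ 0; all names and the instance itself are illustrative assumptions, not from the paper):

```python
# Feasibility check for the KKT reformulation (4.3) on a toy instance:
# lower level  min_y { (y - x)^2 : -y <= 0 },  Lagrangian L = (y - x)^2 + lam*(-y),
# upper level constraint G(x) = -x - 5 <= 0.

def kkt_feasible(x, y, lam, tol=1e-9):
    G = -x - 5.0                   # upper level constraint, must be <= 0
    g = -y                         # lower level constraint, must be <= 0
    grad_yL = 2.0 * (y - x) - lam  # stationarity: grad_y L(x, y, lam) = 0
    return (G <= tol and g <= tol and abs(grad_yL) <= tol
            and lam >= -tol and abs(lam * g) <= tol)  # lam >= 0, complementarity

# For x = -1 the lower level solution is y = 0 with multiplier lam = 2,
# so (x, y, lam) = (-1, 0, 2) is feasible for (4.3):
assert kkt_feasible(-1.0, 0.0, 2.0)
# A point violating the KKT system is rejected:
assert not kkt_feasible(-1.0, 1.0, 2.0)
```

The complementarity check `lam * g == 0` is exactly the term that destroys the standard constraint qualifications discussed next.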
Neither known regularity conditions such as MFCQ, LICQ and the Slater constraint qualification nor their nonsmooth variants are satisfied at any point of the feasible set of an MPEC [19]. MPECs are used to model many applications in economics and mechanics [17]. A number of investigations have been carried out during the last couple of decades, and many results have been reported. A notable contribution we mention here is the successful attempt to solve MPECs as ordinary nonlinear programming problems [7, 21]. A number of papers also address the question of deriving optimality conditions (see for instance [19]). The interested reader may consult the monographs [14, 17].
Scholtes [21] investigated a general version of problem (4.3) via a regularization method: he relaxed the complementarity constraint and then solved the resulting parametrized nonlinear programming problem for a decreasing sequence of the relaxation parameter. The validity of MFCQ in the lower level problem implies that the multiplier set is a nonempty convex compact polyhedron [10]; however, it is not necessarily a singleton. To obtain a stationary solution of an optimistic bilevel programming problem with convex lower level problem, we have to show the nonexistence of descent directions at the current incumbent stationary solution of the corresponding MPEC. As clearly shown in [4, Theorem 5.15], a stationary solution of the corresponding MPEC may fail to be a local solution of the optimistic bilevel programming problem even under strong assumptions: convexity and validity of regularity conditions for the lower level problem.

We approach the ill-conditioned problem (4.3) by relaxing the stationarity condition ∇_y L(x, y, λ) = 0 and the complementarity condition λ_i g_i(x, y) = 0, i = 1, ..., p. Consequently, we consider the following parametric auxiliary problem NLP(ε) for ε > 0:

(4.5)  min_{x,y,λ} F(x, y)
       s.t.  G(x) ≤ 0,
             ‖∇_y L(x, y, λ)‖ ≤ ε,
             g(x, y) ≤ 0,
             λ ≥ 0,
             λ_i g_i(x, y) ≥ −ε, i = 1, ..., p.

Before we proceed, we establish the relation between problem (1.1), (1.2) and problem (4.5).

Theorem 4.5. Let the parametric lower level problem (1.2) be convex and satisfy MFCQ. If (x*, y*) is a local optimal solution of problem (1.1), (1.2), then (x*, y*, λ*) is a local optimal solution of problem (4.5) for ε = 0 and for all λ* ∈ Λ(x*, y*) := { λ : λ ≥ 0, λᵀ g(x*, y*) = 0, ∇_y L(x*, y*, λ) = 0 }.

Proof. Assume that there exists λ ∈ Λ(x*, y*) such that (x*, y*, λ) is not a local optimal solution of problem (4.5) with ε = 0.
Then there exists a sequence {(x_k, y_k, λ_k)} converging to (x*, y*, λ) with

(4.6)  F(x_k, y_k) < F(x*, y*),  G(x_k) ≤ 0,  g(x_k, y_k) ≤ 0,  ∇_y L(x_k, y_k, λ_k) = 0,  λ_k ∈ Λ(x_k, y_k).

Since the KKT conditions are necessary and sufficient for problem (1.2), we have y_k ∈ Ψ(x_k) for all k. This implies that (x_k, y_k) is feasible for (1.1), (1.2). Therefore (x*, y*) is not a local optimal solution of problem (1.1), (1.2), which contradicts the hypothesis of the theorem. □

We make the following assumption on the upper level constraints.
Assumption 3. The upper level constraint qualification is said to be satisfied at a point x if

{ d : ∇G_i(x) d < 0, i : G_i(x) = 0 } ≠ ∅.

The existence of a descent direction at points in the neighborhood of a nonstationary point is shown in the next theorem.

Theorem 4.6. Let (x̄, ȳ, λ̄) be a feasible solution of problem (4.5) for ε = 0. Let the lower level problem (1.2) be convex and satisfy MFCQ, SSOC and CRCQ. Assume that the upper level constraint set satisfies Assumption 3. If (x̄, ȳ) is not a Bouligand stationary point of problem (1.1), then there exist d with ∇_x G_i(x̄) d < 0, i : G_i(x̄) = 0, and δ > 0 such that

∇_x F(x, y) d + ∇_y F(x, y) ∇yʲ(x) d < 0  for all (x, y) ∈ B_δ(x̄, ȳ)

for some j ∈ {1, ..., w}.

Proof. Let the lower level problem be convex and satisfy MFCQ, SSOC and CRCQ, and let (x̄, ȳ, λ̄) be a feasible solution of problem (4.5) for ε = 0. If (x̄, ȳ) is not a Bouligand stationary point of problem (1.1), then there exists a direction d satisfying

∇G_i(x̄) d ≤ 0, i : G_i(x̄) = 0,

and

∇_x F(x̄, ȳ) d + ∇_y F(x̄, ȳ) y′(x̄; d) < 0.

Due to MFCQ, SSOC and CRCQ the function y(·) ∈ {y¹(·), ..., y^w(·)} is a PC¹ function and hence directionally differentiable (Theorem 3.3). Therefore there is j ∈ {1, ..., w} such that

y′(x̄; d) = ∇yʲ(x̄) d.

Hence for this j we have

∇_x F(x̄, ȳ) d + ∇_y F(x̄, ȳ) ∇yʲ(x̄) d < 0.

Since yʲ(·) is continuously differentiable, there exists δ > 0 such that

∇_x F(x, y) d + ∇_y F(x, y) ∇yʲ(x) d < 0  for all (x, y) ∈ B_δ(x̄, ȳ). □

The following theorem is one of the key results relating the MPEC (problem (4.3)) to its relaxed nonlinear programming problem NLP(ε).

Theorem 4.7. Let (x_k, y_k, λ_k) be a local optimal point of problem (4.5) for ε = ε_k, where ε_k ↓ 0. Let (x*, y*, λ*) be a limit point of the sequence {(x_k, y_k, λ_k)}. If

(4.7)  { (d, r) : ∇G_i(x*) d < 0, i : G_i(x*) = 0;  ∇g_j(x*, y*)(d, r) < 0, j : g_j(x*, y*) = 0 } ≠ ∅

holds, then (x*, y*, λ*) is a Bouligand stationary solution of problem (4.5) for ε = 0.
Proof. Suppose (x*, y*, λ*) is not a Bouligand stationary solution of problem (4.5) for ε = 0. Then by definition the system

∇_x F(x*, y*) d + ∇_y F(x*, y*) r < 0,
∇_x G_i(x*) d ≤ 0, i : G_i(x*) = 0,
∇(∇_y L(x*, y*, λ*))(d, r, γ) = 0,
∇_x g_j(x*, y*) d + ∇_y g_j(x*, y*) r = 0, j : g_j(x*, y*) = 0,
γ_i ≥ 0, i : λ*_i = 0,
∇(λ*_i g_i(x*, y*))(d, r, γ) = 0

has a solution (d, r, γ). Let ε > 0 be arbitrarily small. Due to assumption (4.7) there is (d̂, r̂, γ̂) with ‖(d, r) − (d̂, r̂)‖ sufficiently small such that

∇_x F(x*, y*) d̂ + ∇_y F(x*, y*) r̂ < 0,
G_i(x*) + ∇_x G_i(x*) d̂ < 0 for all i,
‖∇(∇_y L(x*, y*, λ*))(d̂, r̂, γ̂)‖ ≤ ε/2,
g_j(x*, y*) + ∇_x g_j(x*, y*) d̂ + ∇_y g_j(x*, y*) r̂ < 0, j : g_j(x*, y*) = 0,
λ*_i + γ̂_i > 0 for all i,
|∇(λ*_i g_i(x*, y*))(d̂, r̂, γ̂)| ≤ ε/2.

Since {(x_k, y_k, λ_k)} converges to (x*, y*, λ*), there exists k̄ > 0 such that the following holds for all k ≥ k̄:

∇_x F(x_k, y_k) d̂ + ∇_y F(x_k, y_k) r̂ < 0,
G_i(x_k) + ∇_x G_i(x_k) d̂ < 0 for all i,
‖∇(∇_y L(x_k, y_k, λ_k))(d̂, r̂, γ̂)‖ ≤ ε/2,
g_j(x_k, y_k) + ∇_x g_j(x_k, y_k) d̂ + ∇_y g_j(x_k, y_k) r̂ < 0, j : g_j(x*, y*) = 0,
(λ_k)_i + γ̂_i > 0 for all i,
|∇((λ_k)_i g_i(x_k, y_k))(d̂, r̂, γ̂)| ≤ ε/2,

which implies that (x_k, y_k, λ_k) is not a local solution of problem (4.5), a contradiction. □

The converse of the above theorem may not hold, as the following simple example shows.

Example 4.8. [21] Consider the problem

min −y  s.t.  xy ≤ ε,  x, y ≥ 0.

Clearly, every point on the positive x-axis is a local minimizer of NLP(0), but the relaxed problem NLP(ε), ε > 0, has no local solution.

It is clear that the feasible set of NLP(ε) does not expand as ε ↓ 0. Hence the following theorem holds.

Theorem 4.9. Let {(x_k, y_k, λ_k)} be a sequence of local optimal solutions of problem (4.5) corresponding to ε_k. If ε_k ↓ 0, then there is a monotonically increasing subsequence of the sequence {F(x_k, y_k)}.
Proof. If the sequence {ε_k} converges to zero, then there is a subsequence {k_i} of {k} such that ε_{k_i} > ε_{k_{i+1}} for all i, which implies S(ε_{k_i}) ⊇ S(ε_{k_{i+1}}) for all i, where S(ε_k) is the feasible set of problem (4.5) corresponding to ε_k. The theorem follows from the properties of the min operator. □

The results presented above are the basis of the convergence analysis of the algorithm given next. The verification of optimality or stationarity is based on the size of the directional derivative.

Algorithm 4.1: Feasible direction algorithm applied to a bilevel programming problem with convex lower level problem.

Initialization: Initialize with (x⁰, y⁰) ∈ {(x, y) : G(x) ≤ 0, g(x, y) ≤ 0}, k := 0, s̄ < w < 0, δ, σ, α ∈ (0, 1), an initial relaxation parameter ε > 0 and a stopping tolerance ε̄ > 0.

Step 1: Choose a maximal set of indices I_k ⊆ {i : g_i(x_k, y_k) = 0} such that { ∇_y g_i(x_k, y_k) : i ∈ I_k } is linearly independent. Go to Step 2.

Step 2: Solve the following system of inequalities to find a multiplier:

(4.8)  ‖∇_y L(x_k, y_k, λ)‖ ≤ ε,  λ_i g_i(x_k, y_k) ≥ −ε, i = 1, ..., p,  λ ≥ 0,  λ_j = 0, j ∉ I_k.

Step 3: Find a direction (d_k, r_k) using the direction finding problem (4.11) (see below) with s_k < s̄. If there is no solution, go to Step 6; otherwise proceed to Step 4.

Step 4: Using the Armijo rule, find a step size t which satisfies

(4.9)  F(x_k + t d_k, y_k + t r_k) ≤ F(x_k, y_k) + δ t s_k,
       G_i(x_k + t d_k) ≤ 0, i = 1, ..., l,
       g_j(x_k + t d_k, y_k + t r_k) ≤ 0, j = 1, ..., p.

Step 5: Set x_{k+1} = x_k + t_k d_k, y_{k+1} = y_k + t_k r_k, put k := k + 1 and go to Step 1.

Step 6: Put s̄ := s̄/2. If s̄ ≤ w, go to Step 3; otherwise go to Step 7.

Step 7: Choose a new set I_k(σ) ⊆ {i : −σ ≤ g_i(x_k, y_k) ≤ 0} such that { ∇_y g_i(x_k, y_k) : i ∈ I_k(σ) } is linearly independent. If all such sets I_k(σ) have already been tried, go to Step 9; otherwise go to Step 8.

Step 8: Solve the optimization problem

(4.10)  min_λ ‖∇_y L(x_k, y_k, λ)‖²  s.t.  λ_i = 0, i ∉ I_k(σ),

and go to Step 3.

Step 9: Put ε := αε. If ε ≤ ε̄, stop; otherwise go to Step 2.
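The Armijo rule of Step 4 can be sketched as a simple backtracking loop. The Python below is a minimal illustration with our own function names and a toy unconstrained check; the authors' actual implementation uses AMPL scripts (see Section 5):

```python
def armijo_step(F, G_list, g_list, x, y, d, r, s, delta=0.3, t_min=1e-12):
    """Step 4 of Algorithm 4.1 (sketch): try t in {1/2, 1/4, 1/8, ...} until
    the sufficient decrease condition F(x+td, y+tr) <= F(x,y) + delta*t*s holds
    and all upper and lower level constraints remain satisfied; s < 0 is the
    optimal value of the direction finding problem."""
    F0 = F(x, y)
    t = 0.5
    while t > t_min:
        xn = [xi + t * di for xi, di in zip(x, d)]
        yn = [yi + t * ri for yi, ri in zip(y, r)]
        if (F(xn, yn) <= F0 + delta * t * s
                and all(Gi(xn) <= 0 for Gi in G_list)
                and all(gj(xn, yn) <= 0 for gj in g_list)):
            return t
        t *= 0.5
    return 0.0

# Unconstrained toy check: F(x, y) = x^2 + y^2 at (1, 1) along d = r = (-1,)
# with predicted decrease s = -2 accepts the first trial step t = 1/2.
t = armijo_step(lambda x, y: x[0] ** 2 + y[0] ** 2, [], [],
                [1.0], [1.0], [-1.0], [-1.0], s=-2.0)
assert t == 0.5
```

Theorem 4.11 below guarantees that this backtracking does not drive t_k to zero as long as the optimal values s_k stay bounded away from zero.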
If the directional derivative is sufficiently small in absolute value, we compute an approximate Lagrange multiplier by considering the nearly active constraints of the lower level problem. The solution of problem (4.10) is uniquely determined and converges to the true multiplier λ* ∈ Λ(x*, y*) for the sequence {(x_k, y_k)} produced by the algorithm with limit point (x*, y*), provided that I_k(σ) ∈ { I : {j : λ*_j > 0} ⊆ I ⊆ I(x*, y*) } with { ∇_y g_i(x_k, y_k) : i ∈ I_k(σ) } linearly independent. The main purpose of Step 7 is to exclude Clarke stationary
points that are not Bouligand stationary points. However, the choice of σ in Step 7 appears to be difficult. The direction in Step 3 of Algorithm 4.1 is found by solving the following linear optimization problem:

(4.11a)  min_{d,r,s,γ} s
(4.11b)  ∇_x F(x_k, y_k) d + ∇_y F(x_k, y_k) r ≤ s,
(4.11c)  G_i(x_k) + ∇_x G_i(x_k) d ≤ s, i = 1, ..., l,
(4.11d)  ∇_y L(x_k, y_k, λ_k) + ∇²_yx L(x_k, y_k, λ_k) d + ∇²_yy L(x_k, y_k, λ_k) r + ∇_y g(x_k, y_k)ᵀ γ ≤ ε,
(4.11e)  −∇_y L(x_k, y_k, λ_k) − ∇²_yx L(x_k, y_k, λ_k) d − ∇²_yy L(x_k, y_k, λ_k) r − ∇_y g(x_k, y_k)ᵀ γ ≤ ε,
(4.11f)  λᵀ g(x_k, y_k) + γᵀ g(x_k, y_k) + λᵀ (∇_x g(x_k, y_k) d + ∇_y g(x_k, y_k) r) ≥ −ε,
(4.11g)  ∇_x g_i(x_k, y_k) d + ∇_y g_i(x_k, y_k) r = 0, i ∈ I_k,
(4.11h)  g_j(x_k, y_k) + ∇_x g_j(x_k, y_k) d + ∇_y g_j(x_k, y_k) r ≤ 0, j ∉ I_k,
(4.11i)  λ + γ ≥ 0,
(4.11j)  ‖d‖ ≤ 1,
(4.11k)  ‖r‖ ≤ 1,
(4.11l)  γ_j = 0, j ∉ I_k.

Problem (4.11) has a negative optimal objective function value at all nonstationary iterates of the feasible sequence {(x_k, y_k)}, provided that a suitable index set I_k is chosen in Step 1 of the algorithm.

Theorem 4.10. Let the lower level problem be convex and satisfy MFCQ, CRCQ and SSOC. Let (x̄, ȳ, λ̄) be a feasible point of problem (4.3). If (x, y) ∈ B_δ(x̄, ȳ) ∩ {(x, y) : y ∈ Ψ(x)}, δ > 0, then there exist ε > 0 and I ⊆ {i : g_i(x̄, ȳ) = 0} with { ∇_y g_i(x̄, ȳ) : i ∈ I } linearly independent such that the following system is satisfied:

(4.12)  ‖∇_y L(x, y, λ)‖ ≤ ε,  λ ≥ 0,  λ_i = 0, i ∉ I.

Proof. Since the lower level problem satisfies MFCQ, the set Λ(x̄, ȳ) is nonempty and bounded for ȳ ∈ Ψ(x̄). Moreover, ȳ ∈ Ψ(x̄) and convexity imply that

(4.13)  ∇_y L(x̄, ȳ, λ) = 0,  λᵀ g(x̄, ȳ) = 0,  λ ≥ 0

for λ ∈ Λ(x̄, ȳ). Hence there exists an index set I such that

∇_y L(x̄, ȳ, λ) = 0,  λ_i = 0, i ∉ I,

with {i : λ_i > 0} ⊆ I ⊆ I(x̄, ȳ). Consider a sequence {(x_k, y_k)} ⊆ B_δ(x̄, ȳ) ∩ {(x, y) : y ∈ Ψ(x)}
with {(x_k, y_k)} converging to (x̄, ȳ). Fix such an index set I. Then we have

∇_y L(x_k, y_k, λ) = ∇_y f(x_k, y_k) + Σ_{i∈I} λ_i ∇_y g_i(x_k, y_k)
  = ∇_y f(x̄, ȳ) + Σ_{i∈I} λ_i ∇_y g_i(x̄, ȳ)
    + ∇(∇_y f(x̄, ȳ) + Σ_{i∈I} λ_i ∇_y g_i(x̄, ȳ)) ((x_k, y_k) − (x̄, ȳ))
    + o(‖(x_k, y_k) − (x̄, ȳ)‖),

where ∇_y f(x̄, ȳ) + Σ_{i∈I} λ_i ∇_y g_i(x̄, ȳ) = ∇_y L(x̄, ȳ, λ) = 0. Putting

ε = ‖∇(∇_y f(x̄, ȳ) + Σ_{i∈I} λ_i ∇_y g_i(x̄, ȳ)) ((x_k, y_k) − (x̄, ȳ)) + o(‖(x_k, y_k) − (x̄, ȳ)‖)‖

completes the proof. □

The following result is adapted from a result reported in [5]. It shows that the sequence of step size parameters is bounded away from zero if the sequence of optimal function values of problem (4.11) converges to a nonzero number.

Theorem 4.11. Let the lower level problem be convex and satisfy MFCQ. Let {(x_k, y_k, λ_k, d_k, r_k, ε_k, s_k, t_k, I_k)} be a sequence produced by the algorithm with ε_k ↓ 0. If the sequence {(x_k, y_k, λ_k)} converges to (x*, y*, λ*) and lim_{k→∞} s_k = s* < 0, then lim_{k→∞} t_k > 0.

Proof. Assume that the sequence of positive step sizes converges to zero. Then the sequence of maximal possible step sizes τ_k must tend to zero too, where τ_k is defined by

τ_k = min{ max{t : G_i(x_k + t d_k) ≤ 0}, max{t : g_j(x_k + t d_k, y_k + t r_k) ≤ 0} }.

Then there exists

i : G_i(x_k) = G_i(x_k + τ_k d_k) = 0, or

(4.14)  j : g_j(x_k, y_k) = g_j(x_k + τ_k d_k, y_k + τ_k r_k) = 0.

If the sequences {x_k} and {d_k} with ‖d_k‖ = 1 converge to x̄ and d̄ respectively, then the sequence {(G_i(x_k + τ_k d_k) − G_i(x_k))/τ_k} converges to ∇G_i(x̄) d̄. Observe that for the above i we have (G_i(x_k + τ_k d_k) − G_i(x_k))/τ_k = 0. But ∇G_i(x_k) d_k ≤ s_k for G_i(x̄) = 0. This implies

0 = lim_{k→∞, τ_k↓0} (G_i(x_k + τ_k d_k) − G_i(x_k))/τ_k = ∇G_i(x̄) d̄ ≤ s* < 0,

which is a contradiction. The other case, namely condition (4.14), can be shown analogously. □
The next result shows that the directional derivative converges to zero.

Theorem 4.12. Let F be bounded below, let the lower level problem be convex and satisfy MFCQ, SSOC and CRCQ, and let the upper level constraints satisfy {d : ∇G_i(x) d < 0, i : G_i(x) = 0} ≠ ∅. Let {(x_k, y_k, λ_k, d_k, r_k, ε_k, s_k, t_k, I_k)} be a sequence computed by the algorithm with ε_k ↓ 0. Then the sequence {s_k} has zero as its only accumulation point.

Proof. Suppose there exists a subsequence that does not converge to zero, i.e. it converges to some number s̄ < 0. Assume without loss of generality that this subsequence is {s_k}. From Theorem 4.11 we have lim_{k→∞} t_k > 0. But from Step 4 of the algorithm we have

F(x_k + t_k d_k, y_k + t_k r_k) − F(x_k, y_k) ≤ δ s_k t_k for all k,

which implies F(x_k, y_k) → −∞ as k → ∞. This contradicts the fact that F is bounded below. □

Assume that λ is obtained from Step 2 of the proposed algorithm. Let (x̄, ȳ, λ̄) be a solution of (4.3) and let s = 0 be the corresponding optimal objective function value of problem (4.11). Then we have the following result.

Theorem 4.13. Let (x, y) be a point satisfying G(x) ≤ 0, y ∈ Ψ(x). Let the lower level problem satisfy MFCQ, SSOC and CRCQ. Moreover, let the optimal objective function value of problem (4.11) with ε = 0 be zero and let λ be obtained by solving (4.8). Then (x, y) is a Clarke stationary point of problem (2.3).

Proof. From y ∈ Ψ(x) we have Λ(x, y) ≠ ∅. That means the system

λᵀ g(x, y) = 0,  ∇_y L(x, y, λ) = 0,  λ ≥ 0

has a solution. Moreover, if ε = 0, then from (4.11d), (4.11e) we have

(4.15)  ∇²_yy L(x, y, λ) r + ∇_y g(x, y)ᵀ γ = −∇²_yx L(x, y, λ) d.

Also, λ_i > 0 implies g_i(x, y) = 0. In this case (4.11f) gives

λ_i ∇_x g_i(x, y) d + λ_i ∇_y g_i(x, y) r ≥ 0

and hence

∇_x g_i(x, y) d + ∇_y g_i(x, y) r ≥ 0

for ε = 0. On the other hand, from (4.11h) and since the optimal objective function value is zero, g_i(x, y) = 0 implies

∇_x g_i(x, y) d + ∇_y g_i(x, y) r ≤ 0.
Therefore

(4.16)  ∇_x g_i(x, y) d = −∇_y g_i(x, y) r for all i : λ_i > 0.

From (4.15) and (4.16) we derive λ ∈ argmax{ ∇_x L(x, y, λ) d : λ ∈ Λ(x, y) } and r = y′(x; d). Again, g_i(x, y) < 0 implies λ_i = 0. This means that from (4.11i) we get γ_i ≥ 0. But with ε = 0, (4.11f) results in γ_i g_i(x, y) ≥ 0, giving γ_i ≤ 0.
Therefore γ_i = 0. Since for every d with G_i(x) + ∇_x G_i(x) d < 0 there is λ ∈ Λ(x, y) such that r = y′(x; d) is computed by solving (4.15), (4.16), the system

∇_x F(x, y) d + ∇_y F(x, y) y′(x; d) < 0,
G_i(x) + ∇_x G_i(x) d < 0

has no solution. In other words, for all d with G_i(x) + ∇_x G_i(x) d < 0, i = 1, ..., l, we have

∇_x F(x, y) d + ∇_y F(x, y) y′(x; d) ≥ 0.

Hence (x, y) is a Clarke stationary point of problem (2.3) in the sense of Definition 4.2. □

Remark 4.14. In the above theorem one cannot guarantee that (x̄, ȳ) is a Bouligand stationary point because the computed direction depends on the fixed multiplier. If the multiplier changes, we cannot guarantee the nonexistence of a descent direction.

Theorem 4.15. Let ε_k ↓ 0 and let {(x_k, y_k, λ_k, γ_k, s_k)} be a solution of problem (4.5) for ε = ε_k. Let {(x_k, y_k, λ_k, γ_k, s_k)} converge to (x*, y*, λ*, γ*, 0). If (x*, y*) satisfies SSOC, MFCQ and CRCQ and there exists d ∈ {d : ∇_x G_i(x*) d < 0, i : G_i(x*) = 0}, then (x*, y*) is a Bouligand stationary point of the optimistic bilevel programming problem (2.3).

Proof. Suppose (x*, y*) is not a Bouligand stationary point. Then there must exist d* ∈ {d : ∇_x G_i(x*) d < 0, i : G_i(x*) = 0} with

∇_x F(x*, y*) d* + ∇_y F(x*, y*) y′(x*; d*) < 0.

Since the lower level problem satisfies MFCQ, SSOC and CRCQ, there is λ* ∈ Λ(x*, y*) such that (r*, γ*) with r* = y′(x*; d*) is the solution of the following system of linear equations and inequalities:

∇²_yy L(x*, y*, λ*) r + ∇_y g(x*, y*)ᵀ γ + ∇²_yx L(x*, y*, λ*) d* = 0,
∇_x g_i(x*, y*) d* + ∇_y g_i(x*, y*) r = 0, i : λ*_i > 0,
∇_x g_i(x*, y*) d* + ∇_y g_i(x*, y*) r ≤ 0, i : g_i(x*, y*) = λ*_i = 0,
γ_i ≥ 0, γ_i (∇_x g_i(x*, y*) d* + ∇_y g_i(x*, y*) r) = 0, i : g_i(x*, y*) = λ*_i = 0.

Since (x*, y*) satisfies MFCQ, from the definition of Λ(·, ·) we have

∇_y L(x*, y*, λ*) = 0,  λ* ≥ 0,  (λ*)ᵀ g(x*, y*) = 0 for λ* ∈ Λ(x*, y*),

which implies that the optimal objective function value of (4.11) for ε = 0 is s* < 0. Since (x_k, y_k) → (x*, y*) for ε_k ↓ 0, ε_k > 0, there exists k̄ such that for all k ≥ k̄

‖∇_y L(x_k, y_k, λ*)‖ ≤ ε,  λ* ≥ 0,  (λ*)ᵀ g(x_k, y_k) ≥ −ε.
Then the following system is satisfied:

∇_x g_i(x_k, y_k) d* + ∇_y g_i(x_k, y_k) r* = 0, i ∈ I_k,
g_i(x_k, y_k) + ∇_x g_i(x_k, y_k) d* + ∇_y g_i(x_k, y_k) r* ≤ s*, i ∉ I_k,
‖∇²_yy L(x_k, y_k, λ*) r* + ∇²_yx L(x_k, y_k, λ*) d* + ∇_y g(x_k, y_k)ᵀ γ*‖ ≤ ε,
λ*_i g_i(x_k, y_k) + γ*_i g_i(x_k, y_k) + λ*_i (∇_x g_i(x_k, y_k) d* + ∇_y g_i(x_k, y_k) r*) ≥ −ε,
g_i(x_k, y_k) + γ*_i (∇_x g_i(x_k, y_k) d* + ∇_y g_i(x_k, y_k) r*) ≤ 0,
γ*_i ≥ 0, i : λ*_i = 0, |g_i(x_k, y_k)| ≤ ε,

which implies that the point (d*, r*, λ*, γ*) is feasible for problem (4.11). But then the optimal objective function value of (4.11) satisfies s_k ≤ s* < 0 for all k ≥ k̄, which implies that the sequence {s_k} does not converge to zero. This contradicts the fact that s_k → 0. Hence the limit (x*, y*) is Bouligand stationary for the bilevel programming problem (2.3). □

5. Numerical experience

In this section we present our experience with numerical experiments for the proposed algorithm, based on the relaxed KKT reformulation NLP(ε), in order to substantiate the theory established in the previous sections. We present two kinds of experiments. The first concerns verifying literature solutions by directly solving problem (4.5) for a decreasing sequence of the relaxation parameter ε. The second concerns the application of the feasible direction method and is purely an implementation of our algorithm. The numerical tests were conducted on exemplary test problems from Leyffer's MacMPEC collection [13], a collection of mathematical programs with equilibrium constraints coded in AMPL. We have recoded them in AMPL since we consider the relaxed problem (4.5). The experiments were conducted on a Medion MD 95800, Intel Pentium M processor 730, 1.6 GHz, with 512 MB RAM, under the Microsoft Windows operating system.
We used the optimization solvers SNOPT, DONLP2, LOQO and MINOS, which can be downloaded along with the AMPL student edition.

5.1. Relaxation method. We consider problem (4.5) and the solution approach of Scholtes [21], solved by the same solver for a decreasing sequence of relaxation parameters ε. On most instances of the test problems, both our relaxation method and Scholtes' regularization produce a sequence of iterates that converges to a local solution of the corresponding optimistic bilevel programming problem reported in the literature, with the exception of problems ex9.1.3, ex9.1.7 and ex. Our method produced a sequence of iterates that converges to the local solution of the optimistic bilevel programming problem ex9.1.7 reported in the original literature; however, with the same initial point, the regularization approach of Scholtes [21] fails to converge to the literature solution of the optimistic bilevel programming problem. We obtained better solutions for problems ex9.1.3, ex9.1.8 and bard3. Table 1 compares the best function values: the first column shows the serial number of the considered problem, the second column the name of the problem, and the third, fourth and fifth columns show, respectively, the best function values from the literature, using our approach, and using Scholtes' regularization scheme [21]. The results seem to confirm the relevance of relaxing the stationarity constraint of problem (4.5).
Table 1. Comparison of our approach with Scholtes' regularization scheme (columns: No., problem name, best function value from the literature, from our approach and from Scholtes' approach, together with the source and comments; the test problems comprise the bard problems [1], the ex9 problems [8], TuMiPh [23], MD [15] and FLZl [9]).

5.2. Feasible direction method. Algorithm 4.1 was initially implemented in Matlab 7. Matlab's programming facilities make it suitable for such an iterative method; however, Matlab's optimization solvers performed poorly, at least on our problems: we observed that wrong multipliers were calculated from problems (4.8) and (4.10). Having observed this limitation of the Matlab solvers, the algorithm was re-implemented in AMPL using AMPL's scripting facility. The algorithm solves three problems alternately: the multiplier problem, the direction finding problem, and the choice of the step size parameter. The step size t is chosen as the maximum of {1/2, 1/4, 1/8, ...} such that the system

F(x_k + t d_k, y_k + t r_k) ≤ F(x_k, y_k) + δ t s_k,
G_i(x_k + t d_k) ≤ 0, i = 1, ..., l,
g_j(x_k + t d_k, y_k + t r_k) ≤ 0, j = 1, ..., p

is satisfied. For the current incumbent primal point (x_k, y_k), the algorithm first computes the corresponding dual point λ_k either from the linear inequalities (4.8) or from the quadratic optimization problem (4.10), depending on the size of s at hand. Once the appropriate triple (x_k, y_k, λ_k) is found, the direction finding problem (4.11) is solved by CPLEX, a linear programming solver, which can be downloaded from the AMPL web
site, and problem (4.10) is solved by Spellucci's DONLP2, a nonlinear programming solver. In our AMPL implementation, this is done by alternately solving the different problems. For the test problems listed in Table 1, the literature solutions are obtained. However, for some problems, such as ex9.2.7, slow convergence was observed. This indicates that further research and more experiments are needed in order to improve the performance of this algorithm. To illustrate the method, we consider the following example, taken from the monograph [8].

Example 5.1. Consider

(5.1)    min_{x≥0, y≥0} (x − 5)² + (2y + 1)²

where y solves

         min_y  (y − 1)² − 1.5xy
         s.t.   −3x + y + 3 ≤ 0,
                 x − 0.5y − 4 ≤ 0,
                 x + y − 7 ≤ 0.

The convex lower level problem is replaced by its KKT conditions, and the stationarity condition is then relaxed. The feasible direction method is applied to the problem

(5.2)    min   (x − 5)² + (2y + 1)²
         s.t.   2(y − 1) − 1.5x + λ1 − 0.5λ2 + λ3 − λ4 ≤ ε,
               −2(y − 1) + 1.5x − λ1 + 0.5λ2 − λ3 + λ4 ≤ ε,
               −3x + y + 3 ≤ 0,
                x − 0.5y − 4 ≤ 0,
                x + y − 7 ≤ 0,
               −y ≤ 0,
                λ1(−3x + y + 3) ≥ −ε,
                λ2(x − 0.5y − 4) ≥ −ε,
                λ3(x + y − 7) ≥ −ε,
               −λ4 y ≥ −ε,
                x, λ1, λ2, λ3, λ4 ≥ 0.
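As a sanity check on the relaxed system (5.2), the following illustrative sketch (not the authors' code) verifies that the literature solution (x, y) = (1, 0), together with multipliers λ = (3.5, 0, 0, 0) computed by hand for this sketch, satisfies every condition of (5.2) for any ε ≥ 0:

```python
# Illustrative verification of the relaxed KKT system (5.2) at the literature
# solution of Example 5.1.  At x = 1 the lower-level feasible set is {0}, so
# y = 0, and the stationarity equation -3.5 + lambda1 - lambda4 = 0 is solved
# by the (hand-computed, hypothetical) choice lambda = (3.5, 0, 0, 0).

def residuals(x, y, lam, eps):
    """Return a list of values that must all be <= 0 for (5.2) to hold."""
    l1, l2, l3, l4 = lam
    g = (-3*x + y + 3, x - 0.5*y - 4, x + y - 7, -y)     # g_i(x, y) <= 0
    stat = 2*(y - 1) - 1.5*x + l1 - 0.5*l2 + l3 - l4     # d/dy of the lower-level Lagrangian
    viol = [abs(stat) - eps]                             # relaxed stationarity
    viol.extend(g)                                       # feasibility
    viol.extend(-li*gi - eps for li, gi in zip(lam, g))  # relaxed complementarity: li*gi >= -eps
    viol.extend(-li for li in lam)                       # lambda >= 0
    return viol

v = residuals(1.0, 0.0, (3.5, 0.0, 0.0, 0.0), eps=1e-6)
assert max(v) <= 0.0
print("all", len(v), "conditions of the relaxed system hold at (1, 0)")
```

The check confirms that the relaxation does not cut off the literature solution: the active constraints −3x + y + 3 ≤ 0 and −y ≤ 0 hold with equality, and every relaxed condition is satisfied with margin ε.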
The direction-finding problem is given by

         min_{d,r,γ,s}  s
         s.t.   2(x − 5)d + 4(2y + 1)r ≤ s,
               −x − d ≤ s,
                2(y − 1) − 1.5x + λ1 − 0.5λ2 + λ3 − λ4 − 1.5d + 2r + γ1 − 0.5γ2 + γ3 − γ4 ≤ ε,
               −2(y − 1) + 1.5x − λ1 + 0.5λ2 − λ3 + λ4 + 1.5d − 2r − γ1 + 0.5γ2 − γ3 + γ4 ≤ ε,
               −3d + r = 0                     if −3x + y + 3 = 0,
               −3x + y + 3 − 3d + r ≤ s        if −3x + y + 3 < 0,
                d − 0.5r = 0                   if x − 0.5y − 4 = 0,
                x − 0.5y − 4 + d − 0.5r ≤ s    if x − 0.5y − 4 < 0,
                d + r = 0                      if x + y − 7 = 0,
                x + y − 7 + d + r ≤ s          if x + y − 7 < 0,
                λ1(−3x + y + 3) + γ1(−3x + y + 3) + λ1(−3d + r) ≥ −ε,
                λ2(x − 0.5y − 4) + γ2(x − 0.5y − 4) + λ2(d − 0.5r) ≥ −ε,
                λ3(x + y − 7) + γ3(x + y − 7) + λ3(d + r) ≥ −ε,
               −λ4 y − γ4 y − λ4 r ≥ −ε,
               −λ1 − γ1 − s ≤ 0,
               −λ2 − γ2 − s ≤ 0,
               −λ3 − γ3 − s ≤ 0,
               −λ4 − γ4 − s ≤ 0.

For this particular problem the initial point is taken to be (x, y) = (1.1, 0.3) and (λ1, λ2, λ3, λ4) = (0, 0, 0, 0), with σ = 10⁻¹ and δ = 0.3. The solution reported in the literature is (1, 0) (see [8, Problem 9.3.8]). The first step of each iteration is the computation of the multiplier; then the direction-finding problem is solved using CPLEX, and the step-size parameter is chosen in the sense of Armijo. These three alternating phases of computation continue until the stopping criterion (ε = 10⁻⁶ for this problem) is satisfied. For this particular problem, the algorithm converges to the literature solution mentioned above after several iterations.

6. Conclusion

In this paper, a descent-like algorithm based on the framework of a feasible direction method is proposed to compute Bouligand stationary points of optimistic bilevel programming problems with convex lower level problems. It has been shown, and numerically evidenced, that in addition to relaxing the complementarity conditions, relaxing the stationarity conditions of the auxiliary MPEC problem (4.3) has a significant advantage for computing the appropriate multiplier, and hence for finding Bouligand stationary points of optimistic bilevel programming problems. The algorithm has been implemented for a number of bilevel programming problems; its performance, though still at an early stage, is promising.
References

[1] J. F. Bard. Convex two-level optimization. Mathematical Programming, 40:15–27, 1988.
[2] J. F. Bard. Some properties of the bilevel programming problem. Journal of Optimization Theory and Applications, 68: ,
[3] J. F. Bard. Practical Bilevel Optimization: Algorithms and Applications. Kluwer Academic Publishers, Dordrecht,
[4] S. Dempe. Foundations of Bilevel Programming. Kluwer Academic Publishers, Dordrecht et al.,
[5] S. Dempe and H. Schmidt. On an algorithm solving two-level programming problems with nonunique lower level solutions. Computational Optimization and Applications, 6: ,
[6] J. Dutta and S. Dempe. Bilevel programming with convex lower level problem. In S. Dempe and V. Kalashnikov, editors, Optimization with Multivalued Mappings: Theory, Applications and Algorithms, volume 2 of Springer Optimization and Its Applications, pages . Springer US,
[7] R. Fletcher and S. Leyffer. Numerical experience with solving MPECs as NLPs. Technical Report NA/210, Department of Mathematics, University of Dundee,
[8] C. A. Floudas, P. M. Pardalos, C. S. Adjiman, W. R. Esposito, Z. H. Gümüs, S. T. Harding, J. L. Klepeis, C. A. Meyer, and C. A. Schweiger. Handbook of Test Problems in Local and Global Optimization. Number 33 in Nonconvex Optimization and Its Applications. Kluwer Academic Publishers, Dordrecht,
[9] C. A. Floudas and S. Zlobec. Optimality and duality in parametric convex lexicographic programming. In Multilevel Optimization: Algorithms and Applications, volume 20, pages . Kluwer Academic Publishers, Dordrecht,
[10] J. Gauvin. A necessary and sufficient regularity condition to have bounded multipliers in nonconvex programming. Mathematical Programming, 12: ,
[11] W. W. Hager. Lipschitz continuity for constrained processes. SIAM Journal on Control and Optimization, 17: ,
[12] M. Kojima. Strongly stable stationary solutions in nonlinear programs. In S. M. Robinson, editor, Analysis and Computation of Fixed Points, pages . Academic Press, New York,
[13] S. Leyffer.
MacMPEC: AMPL collection of Mathematical Programs with Equilibrium Constraints. Argonne National Laboratory, MacMPEC/.
[14] Z.-Q. Luo, J.-S. Pang, and D. Ralph. Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge,
[15] A. Getachew Mersha and S. Dempe. Linear bilevel programming with upper level constraints depending on the lower level solution. Applied Mathematics and Computation, 180: ,
[16] R. Mifflin. Semismooth and semiconvex functions in constrained optimization. SIAM Journal on Control and Optimization, 15: ,
[17] J. Outrata, M. Kočvara, and J. Zowe. Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. Kluwer Academic Publishers, Dordrecht,
[18] D. Ralph and S. Dempe. Directional derivatives of the solution of a parametric nonlinear program. Mathematical Programming, 70: ,
[19] H. Scheel and S. Scholtes. Mathematical programs with equilibrium constraints: stationarity, optimality, and sensitivity. Mathematics of Operations Research, 25:1–22,
[20] S. Scholtes. Introduction to piecewise differentiable equations. Preprint, Universität Karlsruhe, Karlsruhe,
[21] S. Scholtes. Convergence properties of a regularization scheme for mathematical programs with complementarity constraints. SIAM Journal on Optimization, 11: ,
[22] D. M. Topkis and A. F. Veinott. On the convergence of some feasible direction algorithms for nonlinear programming. SIAM Journal on Control and Optimization, 5: ,
[23] H. Tuy, A. Migdalas, and N. T. Hoai-Phuong. A novel approach to bilevel nonlinear programming. Journal of Global Optimization, 38: , 2007.
Department of Mathematics and Computer Science, Technical University Bergakademie Freiberg, Germany
E-mail address:

Department of Mathematics and Computer Science, Technical University Bergakademie Freiberg, Germany
E-mail address:
Equilibrium computation: Part 1 Nicola Gatti 1 Troels Bjerre Sorensen 2 1 Politecnico di Milano, Italy 2 Duke University, USA Nicola Gatti and Troels Bjerre Sørensen ( Politecnico di Milano, Italy, Equilibrium
More informationA HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT
New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:
More informationEquilibria and Dynamics of. Supply Chain Network Competition with Information Asymmetry. and Minimum Quality Standards
Equilibria and Dynamics of Supply Chain Network Competition with Information Asymmetry in Quality and Minimum Quality Standards Anna Nagurney John F. Smith Memorial Professor and Dong (Michelle) Li Doctoral
More information