

 Warren McDowell
 1 years ago
 Views:
Transcription
1 A Globally Convergent PrimalDual Interior Point Method for Constrained Optimization Hiroshi Yamashita 3 Abstract This paper proposes a primaldual interior point method for solving general nonlinearly constrained optimization problems. The method is based on solving the barrier KarushKuhnTucker conditions for optimality by the Newton method. To globalize the iteration we introduce the barrierpenalty fucntion and the optimality condition for minimizing this function. Our basic iteration is the Newton iteration for solving the optimality conditions with respect to the barrierpenalty function which coincides with the Newton iteration for the barrier KarushKuhn Tucker conditions if the penalty parameter is suciently large. It is proved that the method is globally convergent from an arbitrary initial point that strictly satises the bounds on the variables. Implementations of the given algorithm are done for small dense nonlinear programs. The method solves all the problems in Hock and Schittkowski's textbook eciently. Thus it is shown that the method given in this paper possesses a good theoretical convergence property and is ecient in practice. Key words: interior point method, primaldual method, constrained optimization, nonlinear programming 3 Mathematical Systems, Inc., 243, Shinjuku, Shinjukuku, Tokyo, Japan 1
2 1 Introduction In this paper we propose a primaldual interior point method that solves general nonlinearly constrained optimization problems. To obtain a fast algorithm for nonlinear optimization problems it is fairly clear from various experiences (for example the well known success of the SQP method) that we should eventually solve the KarushKuhnTucker conditions for optimality by a Newtonlike method. However, solving the optimality conditions simply as a system of equations does not give an algorithm for solving optimization problems in general except for convex problems. Therefore it is not appropriate to treat the primal and dual variables equally to obtain globally convergent methods for general nonlinear optimization problems. Observing that there is a nice relation between the parameterized optimality conditions (barrier KKT condition) and the classical logarithmic barrier function of the primal variables, we can develop a globally convergent method for general nonlinear optimization problems. To prevent the possible divergence of the dual variables, we introduce the barrier penalty function and solve the optimality conditions by a Newtonlike method. For nonlinear programs it has been believed long time that interior point methods are not practical because of inevitable numerical diculties which occur at the nal stage of iterations. See Fiacco and McCormick [4], Fletcher [5] and other standard textbooks on nonlinear optimization. Thus the state of the art method for nonlinear programs today is the SQP method (see Powell [7], Fletcher [5]) which can also be interpreted as a Newtonlike method for KarushKuhnTucker conditions near a solution. It is known that the SQP method is ecient and stable for wide range of problems. The SQP method requires the quadratic programming subproblems which handle the combinatorial aspect of the problem caused by inequality constraints. A solution of the quadratic programming problem itself requires rather expensive cost especially for large scale problems. Therefore it is desired to have an ecient and stable interior point method for nonlinear problems in the light of success of that in the eld of large scale linear programs. This paper shows that it is in fact possible to construct such a method that is globally convergent theoretically, and ecient and stable in practice. The method is tested on 115 test problems in Hock and Schittkowski's collection [6]. For these problems our method solves all the problems with 2 to 3 function evaluations per problem and about 2 iterations per problem. Details are described in Section 5. A preliminary report of this paper has appeared in [11]. In this paper, problems to be solved are restricted to small to medium ones because we do not exploit the sparsity of the matrices here. However, recent report by Yamashita, Yabe and Tanabe [13] studies a trust region type method to utilize the sparsity of the Hessian of the Lagrangian. They report eciency of the method that uses the barrier penalty function as in this paper. See also [1] for a trust region type interior point method. It is to be noted that the recent studies by Yamashita and Yabe [12] and Yabe and Yamashita [9] show the superlinear and/or quadratic convergence of a class of primaldual interior point methods that use the Newton or quasinewton iteration for solving the barrier KKT conditions. Other reports on the local behavior of primaldual interior point methods include [2] and [3]. In Section 2, we describe basic concepts in the primaldual interior point method. The 2
3 barrier penalty function which plays a key role in the method of this paper is introduced in Section 3, and analyzed there. A line search method that minimizes the barrier penalty function is described in Section 4, and is proved to be globally convergent. Section 5 reports the results of numerical experiment. Notation. The subscript k denotes an iteration count. Subscripts i and j denote components of vectors and matrices. The superscript t denotes transposes of vectors and matrices. The vector e denotes the vector of all ones and the matrix I the identity matrix. For simplicity of description, we assume k1k denotes the l 2 norm for vectors. The symbol R n denotes the n dimensional real vector space. The set R n + is dened by R n = fx + 2 jx> g. Rn 2 Primaldual interior point method In this paper, we consider the following constrained optimization problem: (1) minimize f(x); x 2 R n ; subject to g(x) = ; x ; where we assume that the functions f : R n! R 1 and g : R n! R m are twice continuously dierentiable. Let the Lagrangian function of the above problem be dened by (2) L(w) =f(x) y t g(x) z t x; where w =(x; y; z) t, and y 2 R m and z 2 R n are the Lagrange multiplier vectors which correspond to the equality and inequality constraints respectively. Then KarushKuhn Tucker (KKT) conditions for optimality of the above problem are given by 1 1 r (w) r xl(w) g(x) CA = (3) CA XZe and (4) where x ; z ; r x L(w) = rf(x) A(x) t y z; A(x) = rg 1(x) t. CA ; rg m (x) t 1 X = diag (x 1 ; 111;x n ) ; Z = diag (z 1 ; 111;z n ) : Now we approximate problem (1) by introducing the barrier function F B (; ) :R n +! R 1, (5) minimize subject to g(x) = ; F B (x; ) =f(x) np 3 log(x i );x 2 R n +
4 where the barrier parameter > is a given constant. It is well known that, if is suciently small, problem (5) is a good approximation to original problem (1) (see Fiacco and McCormick [4]). The optimality conditions for (5) are given by (6) rf (x) A(x) t y X 1 e = ; g(x) = and x>; where y 2 R m is the Lagrange multiplier for the equality constraints. If we introduce an auxiliary variable z 2 R n which is to be equal to X 1 e, then the above conditions become the conditions (7) and (8) r(w; ) r xl(w) g(x) XZe e x>; 1 CA = z>: The introduction of the variable z is essential to the numerical success of the barrier based algorithm in this paper. In this paper we call conditions (7) the barrier KKT conditions, and a point w() = (x();y();z()) that satises these conditions is called the barrier KKT point. We will use an interior point method for searching a point that approximately satises the above conditions, and nally obtain a point that satises the KarushKuhnTucker conditions by letting #. This means that we force x and z be strictly positive during iterations. Therefore we delete inequality conditions (8) hereafter, and always assume that x and z are strictly positive in what follows. Here we note that (9) where r(w; ) =r (w) ^e; ^e = e An algorithm of this paper approximately solves the sequence of conditions (7) with a decreasing sequence of the barrier parameter that tends to, and thus obtains an approximate solution to KKT conditions. For deniteness, we describe a prototype of such algorithm as follows. Algorithm IP Step. (Initialize) Set ">, M c > and k =. Let a positive sequence f k g ; k # be given. 1 CA : Step 1. (Termination) If kr (w)k ", then stop. 1 CA ; 4
5 Step 2. (Approximate barrier KKT point) Find a point w k+1 that satises (1) kr(w k+1 ; k )km c k : Step 3. (Update) Set k := k + 1 and go to Step 1. 2 The following theorem shows the global convergence property of Algorithm IP. Theorem 1 Let fw k g be an innite sequence generated by Algorithm IP. Suppose that the sequences fx k g and fy k g are bounded. Then fz k g is bounded, and any accumulation point of fw k g satises KKT conditions (3) and (4). Proof. Assume that there exists an i such that (z k ) i!1. Equation (1) yields (rf(x k) A(x k ) t y k ) i (z k ) i 1 M c k1 (z k ) i ; which isacontradiction because of the boundedness of fx k g and fy k g.thus the sequence fz k g is bounded. Let ^w be any accumulation point offw k g. Since the sequences fw k g and f k g satisfy (1) for each k and k approaches zero, r (^w) = follows from the denition of r(w; ). Therefore the proof is complete. 2 We note that the barrier parameter sequence f k g in Algorithm IP need not be determined beforehand. The value of each k may be set adaptively as the iteration proceeds. An example of updating method of k is described in Section 5. We call condition (1) the approximate barrier KKT condition, and call a point that satises this condition the approximate barrier KKT point. To nd an approximate barrier KKT point for a given >, we use the Newtonlike method in this paper. Let 1w =(1x; 1y; 1z) t be dened by a solution of (11) where (12) J(w) = J(w)1w = r(w; ); G A(x)t I A(x) Z X Then the basic iteration of the Newtonlike method may be described as 1 CA : (13) w k+1 = w k +3 k 1w k ; where 3 k = diag( xk I n ; yk I m ; zk I n ) is composed of step sizes in x, y and z variables. If G = r 2 L(w), then 1w becomes Newton's direction for solving (7). To solve (11), we x split the equations into two groups. Thus we solve! G + X 1 Z A(x) t 1x!! = r xl(w)+x 1 e z (14) ; A(x) 1y g(x) 5
6 for (1x; 1y) t, then we obtain 1z by the third equation in (11). If G + X 1 Z is positive denite and A(x) is of full rank, then the coecient matrix in (14) is nonsingular. It will be useful to note that (14) can be written as (15) where (16) G + X 1 Z A(x) t A(x)! 1x ey ~y = y +1y:! = rf(x)+x1 e g(x) In this paper, we will deal with the case in which the matrix G can be assumed nonnegative denite. Therefore, to solve the general nonlinear problems, we will use a positive denite quasinewton approximation to the Hessian matrix of the Lagrangian function to obtain the desired property of the matrix G.! ; 3 Barrier penalty function To globalize the convergence property ofaninterior point algorithm based on the above iteration, we introduce two auxiliary problems. Firstly mp we dene the following problem: minimize F P (x; ) =f(x)+ jg i (x)j; x 2 R n ; (17) subject to x ; where the penalty parameter is a given positive constant. The necessary conditions for optimality of this problem are (see 14.2 of Fletcher [5]) r x L(w) = ; ( mx ) (18) y jg i (x)j ; XZe = ; x ;z ; where the means the subdierential of the function in the braces with respect to g. In our case the second condition in (18) is equivalent to (19) y i ; g i (x) =; y i = ; g i (x) > ; y i =; g i (x) < ; for each i =1; 111;m. This condition can be expressed as jg i (x)j = y i g i (x); y i ; i =1; 111;m; or jg i (x)j + y ig i (x) =; y i ; i =1; 111;m: 6
7 Therefore conditions (18) can be written as r (w) = r xl(w) (2) r E (w) XZe and (21) where 1 CA = x ; z ; y i ; i =1; 111;m; 1 CA r E (w) i = jg i (x)j + y ig i (x) ;; 111;m: Note that we are using the same symbol r (w) to denote the residual vector of the optimality conditions as in Section 2 for simplicity. Ifkyk 1 <, conditions (18) are equivalent to conditions (3) and (4). In this sense, problem (17) is equivalent to problem (1). Next we introduce the barrier penalty function F (; ; ) :R n +! R1 by (22) F (x; ; ) =f(x) nx log x i + mx jg i (x)j ; where and are given positive constants. This function plays an essential role in the method given in this paper. Let us approximate problem (17) by the following problem (23) minimize F (x; ; ); x 2 R n + : The necessary conditions for optimality of a solution to the above problem are r x L(w) = ; ( mx ) (24) y jg i (x)j XZe = e; x > ; z>; where we introduce an auxiliary variable z 2 R n as in (7). As above, conditions (24) can be written as r(w; ) xl(w) (25) r E (w) CA = CA XZe e and (26) 1 x>; z>; y i ; i =1; 111;m; where we use the same symbol r(w; ) as in Section 2 for simplicity. If>kyk 1 then conditions (25) and (26) coincide with (7) and (8). We call a point w() =(x();y();z()) 2 R n 2 R m 2 R n that satises (24) for a given > the barrier KKT point for this as before. We can use Algorithm IP to solve (17). The following theorem shows the global convergence property of Algorithm IP for solving (17). 1 7
8 Theorem 2 Let fw k g be an innite sequence generated byalgorithm IP for solving (17). Suppose that the sequences fx k g is bounded. Then fz k g is bounded, and any accumulation point of fw k g satises the optimality conditions (2) and (21) for problem (17). 2 Now we formulate a Newtonlike iteration for solving the above conditions (25). Thus we calculate the rst order change of (24) with respect to a change in w. This gives (G + X 1 Z)1x A(x) t 1y = r x L(w)+X 1 e z; ( mx ~y y +1y g i (x)+rg i (x) 1x ) (27) t ; 1z = X 1 Z1x + X 1 e z: Following lemma gives a basic property of the iteration vector 1w =(1x; 1y; 1z) t. Lemma 1 Suppose that 1w satises (27) at an interior point w. (i) If >keyk 1, then 1w is identical to the one given by (11). (ii) If 1w =, then the point w isabarrier KKT point that satises (24). (iii) If 1x =, then the point (x; y +1y; z +1z) is a barrier KKT point that satises (24). 2 If we consider the subproblem: (28) minimize 1 2 1xt (G+X 1 Z)1x+(rf(x)X 1 e) t 1x+ mx g i (x)+rg i (x) t 1x ; 1x 2 R n ; the solution vector 1x and the corresponding multiplier vector ~y that satisfy the necessary conditions for optimality also satisfy conditions (27). If G + X 1 Z is positive denite we can solve the problem (28) by a straightforward active set method which starts with an active set that contains all the constraints g i (x)+rg i (x) t 1x =;; 111m. An example of the procedure is described in Fletcher [5]. It is important to note that, if the vectors 1x and ~y obtained by solving (15) satisfy >k~yk 1, then the desired iteration vector that satises (27) are also obtained. The procedure described above is devised instead of the simple Newton iteration given in Section 2 in order to show away of preventing possible divergence of the dual variable y. However for the practical purpose it seems sucient to solve only equation (11) once per iteration as shown in sections below in which practical experiences obtained by the author on general nonlinear programming problems are described. Therefore this procedure is of only theoretical importance at present. As shown above, optimality conditions (2) and (21) are identical to optimality conditions (3) and (4) if we set = 1. Also, Newtonlike iteration (27) coincides with (11) when = 1. Therefore we will regard that the method explained below includes a method for solving (1) when = 1 for simplicity of exposition. We note that the penalty parameter which will appear below should be nite even if = 1. Another interesting point to be noted in the above iteration is that it can give a solution to an infeasible problem. Even if the constraints in problem (1) are incompatible, the 8
9 method described above will give a solution to problem (17) as shown below. A solution to problem (17) with infeasible constraints may give useful information about problem (1). Now we proceed to an analysis of the properties of the barrier penalty function. The directional derivative F (x; ; ; s) of the function F (x; ; ) along an arbitrary given direction s is dened by F F (x + s; ; ) F (x; ; ) (x; ; ; s) = lim # = rf(x) X X t s e t X 1 s + rg i (x) t s + + rg i (x) s X t rg i (x) t s; where the summations in the above equation are to be understood as X X a i = a i X X ; a i = a i X X ; a i = a i : + g i > We introduce a rst order approximation F l of F (x + s; ; ) by F l (x; ; ; s) f(x)+rf (x) t s log(x i )+ s i (29) + mx g i = nx g i (x)+rg i (x) t s; g i < x i and an estimate of the rst order change 1F l of F by (3) 1F l (x; ; ; s) F l (x; ; ; s) F (x; ; ); = rf(x) t s e t X 1 s + mx jg i (x)+rg i (x)sj mx jg i (x)j for an arbitrary given direction s. We have following properties for the quantities dened above which give an extension to the similar properties in the case of dierentiable functions (see for example Yamashita [1]). Lemma 2 Let >, > and s 2 R n be given. Then the following assertions hold. (i) The function F l (x; ; ; s) is convex with respect to the variable. (ii) There holds the relation (31) F (x; ; )+F (x; ; ; s) F l (x; ; ; s): (iii) Further, there exists a 2 (; 1) such that (32) F (x + s; ; ) F (x; ; )+F (x + s; ; ; s); whenever x + s>. 9
10 Proof. The rst statement of the lemma is obvious. If > is suciently small, we have F l (x; ; ; s) = F (x; ; )+F (x; ; ; s) = F (x; ; )+F (x; ; ; s): Since F l (x; ; ; s) isconvex with respect to and coincides with a linear function of > when is suciently small, we obtain (31). Now we show (32). If we consider F (x + s; ; ; s) as a function of the variable 2 [; 1], the number of discontinuous points of F is nite. Therefore there exists a 2 (; 1) such that F (x + s; ; ; s) Z1 F (x + s; ; ; s)d = F (x + s; ; ) F (x; ; ): This completes the proof. 2 Lemma 3 Let " 2 (; 1) be a given constant and s 2 R n be given. If 1F l (x; ; ; s) <, then (33) F (x + s; ; ) F (x; ; ) " 1F l (x; ; ; s); for suciently small >. Proof. From (32), there exists a 2 (; 1) such that (34) F (x + s; ; ) F (x; ; ) F (x + s; ; ; s) = F (x + s; ; ; s): From (31) we have (35) F (x + s; ; ; s) 1F l (x + s; ; ; s): Because 1F l (; ; ; s) is continuous, and 1F l (x; ; ; s) < by the assumption, we obtain (36) 1F l (x + s; ; ; s) " 1F l (x; ; ; s); for suciently small ksk. From (34)  (36) we obtain (33) for suciently small >. 2 Lemma 4 Suppose that 1w satises (27). If <, then (37) 1F l (x; ; ;1x) 1x t (G + X 1 Z)1x mx ( j~y i j)jg i (x)j: Further if, G is positive semidenite and keyk 1 1F l (x; ; ;1x) =yields 1x =., then 1F l (x; ; ;1x), and 1
11 Proof. From (27) and (3) we have 1F l (x; ; ;1x) =1x t (G+X 1 Z)1x+1x t A(x) t ~y+ The ith components in the summations in the last three terms give mx g i (x)+rg i (x) t 1x mx jg i (x)j: (38) ~y i rg i (x) t 1x + g i (x)+rg i (x) 1x t jg i (x)j ~y i rg i (x) t 1x + g i (x)+rg i (x) 1x t jg i (x)j = ~y i g i (x) jg i (x)j ; where the inequality in the second line follows from, and the equality in the third line follows from the property ~y i ; g i (x)+rg i (x) t 1x =; ~y i = ; g i (x)+rg i (x) t 1x>; ~y i =; g i (x)+rg i (x) t 1x<: The relation (38) gives (37). 2 4 Line search algorithm To obtain a globally convergent algorithm to a barrier KKT point for a xed >, it is necessary to modify the basic Newton iteration with the unit step length somehow. Our iterations consist of (39) x k+1 = x k + xk 1x k ; y k+1 = y k + yk 1y k ; z k+1 = z k + zk 1z k ; where xk, yk and zk are step sizes determined by the line search procedures described below. The main iteration is to decrease the value of the barrier penalty function for xed. Thus the step size of the primal variable x is determined by the sucient decrease rule of the merit function. The step size of the dual variable z is determined so as to stabilize the iteration. The explicit rules follow in order. We adopt Armijo's rule as the line search rule for the variable x. At the point x k,we calculate the maximum allowed step to the boundary of the feasible region by (4) kmax = min i ( (x k) i (1x k ) i (1x k) i < i.e., the step size kmax gives an innitely large value of the barrier penalty function F if it exists, because of the barrier terms, and a step size 2 [; kmax ) gives a strictly feasible primal variable. A step to the next iterate is given by ) ; (41) xk = k l k ; k = min f kmax ; 1g ; 11
12 where 2 (; 1) and 2 (; 1) are xed constants and l k is the smallest nonnegative integer such that (42) F (x k + k l k 1x k; ; ) F (x k ; ; ) " k l k 1F l(x k ; ; ;1x k ); where " 2 (; 1). Typical values of these parameters are = :5, = :9995 and " =1 6. Therefore we will try the sequence x k +:9995 kmax 1x k ; x k +:5 2 :9995 kmax 1x k ; x k +:25 2 :9995 kmax 1x k ; 111 for example, and will nd a step size that satises (42). If G is positive semidenite, then 1F l (x k ; ; ;1x k ) by Lemma 4, and therefore the existence of such steps is assured by Lemma 3. For the variable z, we adopt the box constraints rule, i.e., we force x and z to satisfy the condition (43) c Lki ((x k ) i + xk (1x k ) i )((z k ) i + zk (1z k ) i ) c Uki ; i =1; 111;n at the end of each iteration, where the bounds c Lk and c Uk satisfy (44) <c Lki <<c Uki ; i =1; 111;n: To this end, we let (45) c Lki = minn M L ; ((x k ) i + xk (1x k ) i )(z k ) io ; c Uki = max fm U ; ((x k ) i + xk (1x k ) i )(z k ) i g ; where M L > 1 and M U > 1 are given constants. The construction of the above bounds shows that current z satises (46) (47) c Lki ((x k ) i + xk (1x k ) i ) (z k) i The step size z is determined by zk = min ( min i ( max i ( i c Uki ((x k ) i + xk (1x k ) i ) ; c Lki ((x k ) i + xk (1x k ) i ) c Uki ((z k ) i + i (1z k ) i ) ((x k ) i + xk (1x k ) i ) i =1; 111;n: )) ) ; 1 : The rule (47) means that the step size z is the maximal allowed step that satises the box constraints with the restriction of being not greater than the unit step length. Lemma 5 Suppose that an innite sequence fw k g is generated for xed >. Then if lim inf k!1(x k ) i > and lim sup k!1(x k ) i < 1, then lim inf k!1(c Lk ) i > and lim sup k!1(c Uk ) i < 1 for i =1; 111;n. 12
13 Proof. Suppose that (c Lk ) i! for an i and some subsequence K f; 1; 2; 111g. Then by the denition of (c Lk ) i in (45), (z k ) i! ;k 2 K. However, in order for a subsequence of f(z k ) i g to tend to, there must be an iteration k at which the lower bound (c Lk ) i =(x k+1 ) i of (z k ) i is arbitrary small and the value of (z k ) i at the iteration is strictly larger than that bound, i.e. at the iteration the value of (z k ) i decreases to a strictly smaller value. This means that at iteration k, (c Lk ) i = =M L from the denition (45), and therefore the value of (x k+1 ) i must be arbitrary large because =M L < (x k+1 ) i (z k ) i and (z k ) i! ;k 2 K. This is impossible because of the assumption of the lemma. The proof of the boundedness of (c Uk ) i is similar. 2 (48) In actual calculation we modify the direction 1z k (1z k ) i = 8>< >: ; if (z k) i = c Lki=(x k+1) i and (1z k ) i < ; ; if (z k ) i = c Uki =(x k+1 ) i and (1z k ) i > ; (1z k ) i ; otherwise: This modication means that we project the direction along the boundary of the box constraints if the point z k is on that boundary and the direction 1z k points outward of the box. This procedure is adopted because it gives better numerical results. The global convergence results shown in the following are equally valid for both unmodied and modied directions. For the variable y, there exist three obvious choices for the step length: by (49) yk =1 or xk or zk : The global convergence property given below holds for these choices. We choose yk = zk from numerical experiments. The following algorithm describes the iteration for xed > and >. We note that this algorithm corresponds to Step 2 of Algorithm IP in Section 2. Algorithm LS Step. (Initialize) Let w 2 R n + 2 Rm 2 R n +, and >, >. Set " >, 2 (; 1), 2 (; 1), " 2 (; 1), M L > 1 and M U > 1. Let k =. Step 1. (Termination) If kr(w k ;)k " ; then stop. Step 2. (Compute direction) Calculate the direction 1w k by (27). Step 3. (Stepsize) Set kmax = min i ( (x k) i ) (1x k ) i (1x k) i < ; k = min f kmax ; 1g : Find the smallest nonnegative integer l k that satises F (x k + k l k 1x k ;;) F (x k ;;) " k l k 1F l (x k ; ; ;1x k ): Calculate xk = k l k ; 13
14 c Lki = min ; (x k + xk 1x k ) i (z k ) i ; M L c Uki = max fm U ; (x k + xk 1x k ) i (z k ) i g ; ( ( ( i c Lki zk = min min max i i ((x k ) i + xk (1x k ) i c Uki )) ) ((z k ) i + i (1z k ) i ) ; 1 ; ((x k ) i + xk (1x k ) i ) yk = zk ; 3 k = diagf xk I n ; yk I m ; zk I n g: Step 4. (Update variables) Set w k+1 = w k +3 k 1w k : Step 5. Set k := k + 1 and go to Step 1. 2 To prove global convergence of Algorithm LS, we need the following assumptions. Assumption G (1) The functions f and g i ;i =1; :::; m, are twice continuously dierentiable. (2) The level set of the barrier penalty function at an initial point x 2 R n +, which is dened by nx 2 R n jf (x; ; ) + F (x ; ; )o, is compact for given >. (3) The matrix A(x) is of full rank on the level set dened in (2). (4) The matrix G k is positive semidenite and uniformly bounded. (5) The penalty parameter satises ky k +1y k k 1 for each k =; 1; :::. 2 We note that if a quasinewton approximation is used for computing the matrix G k, then we need the continuity of only the rst order derivatives of functions in Assumption G(1). We also note that if 1F l (x k ; ; ;1x k ) =, at an iteration k, then the step sizes xk = yk = zk = 1 are adopted and (x k+1 ;y k+1 ;z k+1 ) gives a barrier KKT point from Lemma 1 and Lemma 4. The following theorem gives a convergence of an innite sequence generated by Algorithm LS. Theorem 3 Let an innite sequence fw k g be generated byalgorithm LS. Then there exists at least one accumulation point of fw k g, and any accumulation point of the sequence fw k g is a barrier KKT point. 14
15 Proof. First we note that each component of the sequence fx k g is bounded away from zero and bounded above by the assumption and the existence of the log barrier term. Therefore the sequence fx k g has at least one accumulation point. The sequence fz k g also has these properties by Lemma 5. Thus there exists a positive number M such that (5) kpk 2 M pt (G k + X 1 k Z k)p M kpk 2 ; 8p 2 R n ; by the assumption. From (37) and (5), we have (51) and from (42), 1F l (x k ; ; ;1x k ) k1x kk 2 M < ; (52) F (x k+1 ; ; ) F (x k ; ; ) " k l k 1F l(x k ; ; ;1x k ) " k l k k1x kk 2 M < : Because the sequence ff (x k ; ; )g is decreasing and bounded below, the left hand side of (52) converges to. Since lim inf k!1(x k ) i > ;i =1; 111;n,wehave lim inf k!1 k >. Suppose that there exists a subsequence K f; 1; 111g and a such that (53) lim inf k!1 k1x kk>; k 2 K: Then we have l k!1;k 2 K from (52) because the left most expression tends to zero, and therefore we can assume l k > for suciently large k 2 K without loss of generality. If l k > then the point x k + xk 1x k = does not satisfy the condition (42). Thus, we have (54) F (x k + xk 1x k =; ; ) F (x k ; ; ) >" xk 1F l (x k ; ; ;1x k )=: By (32) and (31), there exists a k 2 (; 1) such that (55) F (x k + xk 1x k =; ; ) F (x k ; ; ) xk F (x k + k xk 1x k =; ; ;1x k )= xk 1F l (x k + k xk 1x k =; ; ;1x k )=; k 2 K: Then, from (54) and (55), we have " 1F l (x k ; ; ;1x k ) < 1F l (x k + k xk 1x k =; ; ;1x k ): This inequality yields (56) 1F l (x k + k xk 1x k =; ; ;1x k ) 1F l (x k ; ; ;1x k ) > (" 1)1F l (x k ; ; ;1x k ) > : Because 1x k is a solution of problem (28) and there holds (5), k1x k k is uniformly bounded above. Then by the property l k! 1,we have k k xk 1x k =k!;k 2 K. 15
16 Thus the left hand side of (56) and therefore 1F l (x k ; ; ;1x k ) converges to zero when k!1;k 2 K. This contradicts the assumption (53) because we have 1x k! ;k 2 K from (51). Therefore we proved (57) lim k1x k k =: k!1;k2k Let an arbitrary accumulation point of the sequence fx k g be ^x 2 R n + and let x k! ^x; k 2 K for K f; 1; 111g.Thus (58) x k! ^x; 1x k! ; x k+1! ^x; k 2 K: Because nx 1 Z k ko is bounded, we have lim k!1;k2kz k +1z k X e 1 = k from (27). If we dene ^z = ^X 1 e where ^X = diag(^x 1 ; 111; ^x n ), then we have Hence from (45) we have z k +1z k! ^z; k 2 K: (c Lk ) i M L (x k+1 ) i (z k +1z k ) i M U (C Uk ) i ; i =1; 111;n for k 2 K suciently large, which shows that the point z k +1z k is always accepted as z k+1 for suciently large k 2 K. Since zk = 1 is accepted for k 2 K suciently large, so is yk =1. Therefore we obtain lim r x L(^x; y k +1y k ; ^z) = ; k!1;k2k lim y k +1y k k!1;k2k ( nx ) jg i (^x)j : Because the matrix A(^x) is of full rank, the sequence fy k +1y k g ;k 2 K converges to a point ^y 2 R m which satises r x L(^x; ^y; ^z) = ; mx! ^y jg i (^x)j ; ^X ^z = e; ^x >; ^z >: This completes the proof because we proved that there exists at least one accumulation point offx k g, and for an arbitrary accumulation point ^x of fx k g, there exist unique ^y and ^z that satisfy the above. 2 16
17 5 Numerical Result In this section, we report numerical results of an implementation of the algorithm given in this paper for nonlinear programming problems. We set = 1 in this experiment. The software is called NUOPT and the code is written bytakahito Tanabe. In order to havean appropriate positive semidenite matrix G by a reasonable cost for nonlinear problems, we resort to a quasinewton approximation to the Hessian matrix of the Lagrangian function. We use updating formula suggested by Powell[7] for the SQP method: G k+1 = G k G ks k s t k G k s t k G ks k + u ku t k s t u ; k k where u k is calculated by s k = x k+1 x k ; v k = r x L(x k+1 ;y k+1 ;z k+1 ) r x L(x k ;y k+1 ;z k+1 ); u k = k v k +(1 k )G k s k ; k = 8< : :8s t k G k s k s t k G k s ks t k v k 1; s t k v k :2s t k G ks k ; ; s t k v k :2s t k G ks k ; to satisfy s t u k k > for the hereditary positive deniteness of the update. Method for updating the barrier parameter k is as follows. Suppose we have an approximate barrier KKT point w k+1 that satises kr(w k+1 ; k )km c k ; in Step 2 of Algorithm IP. Then k+1 is dened by k+1 = max( kr(wk+1 ; k ) )k ; k ; M M where < M c < M and M > 1 should be satised. In our experiment weset M c =3;M =4;M = 5. As in the SQP method, we expect fast local convergence of the method if is suciently small near a solution because it is based on the Newton iteration for the optimality conditions and a quasinewton approximation to the second derivative matrix. As noted in the above, this expectation is proved by Yamashita and Yabe [12]. Linear equation (14 ) is solved by using the BunchParlett factorization. The test problems for nonlinear problems are adopted from the book by Hock and Schittkowski [6]. The results are summarized in Table 1 at the end of this paper. Following list explains the notations used in Table 1: n= number of variables. m=number of constraints. obj=nal objective function value. res=norm of nal KKT condition residual. itr=iteration count. 17
A Perfect Example for The BFGS Method
A Perfect Example for The BFGS Method YuHong Dai Abstract Consider the BFGS quasinewton method applied to a general nonconvex function that has continuous second derivatives. This paper aims to construct
More informationA Study on SMOtype Decomposition Methods for Support Vector Machines
1 A Study on SMOtype Decomposition Methods for Support Vector Machines PaiHsuen Chen, RongEn Fan, and ChihJen Lin Department of Computer Science, National Taiwan University, Taipei 106, Taiwan cjlin@csie.ntu.edu.tw
More informationA Tutorial on Support Vector Machines for Pattern Recognition
c,, 1 43 () Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. A Tutorial on Support Vector Machines for Pattern Recognition CHRISTOPHER J.C. BURGES Bell Laboratories, Lucent Technologies
More informationSubspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity
Subspace Pursuit for Compressive Sensing: Closing the Gap Between Performance and Complexity Wei Dai and Olgica Milenkovic Department of Electrical and Computer Engineering University of Illinois at UrbanaChampaign
More informationTHE PROBLEM OF finding localized energy solutions
600 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997 Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Reweighted Minimum Norm Algorithm Irina F. Gorodnitsky, Member, IEEE,
More informationRELAXATION HEURISTICS FOR THE SET COVERING PROBLEM
Journal of the Operations Research Society of Japan 2007, Vol. 50, No. 4, 350375 RELAXATION HEURISTICS FOR THE SET COVERING PROBLEM Shunji Umetani University of ElectroCommunications Mutsunori Yagiura
More informationNew Support Vector Algorithms
LETTER Communicated by John Platt New Support Vector Algorithms Bernhard Schölkopf Alex J. Smola GMD FIRST, 12489 Berlin, Germany, and Department of Engineering, Australian National University, Canberra
More information11 Projected Newtontype Methods in Machine Learning
11 Projected Newtontype Methods in Machine Learning Mark Schmidt University of British Columbia Vancouver, BC, V6T 1Z4 Dongmin Kim University of Texas at Austin Austin, Texas 78712 schmidtmarkw@gmail.com
More informationIterative Methods for Optimization
Iterative Methods for Optimization C.T. Kelley North Carolina State University Raleigh, North Carolina Society for Industrial and Applied Mathematics Philadelphia Contents Preface xiii How to Get the Software
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1 ActiveSet Newton Algorithm for Overcomplete NonNegative Representations of Audio Tuomas Virtanen, Member,
More informationMichael Ulbrich. Nonsmooth Newtonlike Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces
Michael Ulbrich Nonsmooth Newtonlike Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces Technische Universität München Fakultät für Mathematik June 21, revised
More informationMaxPlanckInstitut für biologische Kybernetik Arbeitsgruppe Bülthoff
MaxPlanckInstitut für biologische Kybernetik Arbeitsgruppe Bülthoff Spemannstraße 38 7276 Tübingen Germany Technical Report No. 44 December 996 Nonlinear Component Analysis as a Kernel Eigenvalue Problem
More informationRECENTLY, there has been a great deal of interest in
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 47, NO. 1, JANUARY 1999 187 An Affine Scaling Methodology for Best Basis Selection Bhaskar D. Rao, Senior Member, IEEE, Kenneth KreutzDelgado, Senior Member,
More informationSupport Vector Machines
CS229 Lecture notes Andrew Ng Part V Support Vector Machines This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best)
More informationChoosing Multiple Parameters for Support Vector Machines
Machine Learning, 46, 131 159, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Choosing Multiple Parameters for Support Vector Machines OLIVIER CHAPELLE LIP6, Paris, France olivier.chapelle@lip6.fr
More informationOptimization by Direct Search: New Perspectives on Some Classical and Modern Methods
SIAM REVIEW Vol. 45,No. 3,pp. 385 482 c 2003 Society for Industrial and Applied Mathematics Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods Tamara G. Kolda Robert Michael
More informationChapter 5. Banach Spaces
9 Chapter 5 Banach Spaces Many linear equations may be formulated in terms of a suitable linear operator acting on a Banach space. In this chapter, we study Banach spaces and linear operators acting on
More informationThe Set Covering Machine
Journal of Machine Learning Research 3 (2002) 723746 Submitted 12/01; Published 12/02 The Set Covering Machine Mario Marchand School of Information Technology and Engineering University of Ottawa Ottawa,
More informationHow to Use Expert Advice
NICOLÒ CESABIANCHI Università di Milano, Milan, Italy YOAV FREUND AT&T Labs, Florham Park, New Jersey DAVID HAUSSLER AND DAVID P. HELMBOLD University of California, Santa Cruz, Santa Cruz, California
More informationFast Solution of l 1 norm Minimization Problems When the Solution May be Sparse
Fast Solution of l 1 norm Minimization Problems When the Solution May be Sparse David L. Donoho and Yaakov Tsaig October 6 Abstract The minimum l 1 norm solution to an underdetermined system of linear
More informationChapter 3 OneStep Methods 3. Introduction The explicit and implicit Euler methods for solving the scalar IVP y = f(t y) y() = y : (3..) have an O(h) global error which istoolow to make themofmuch practical
More informationFrom Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images
SIAM REVIEW Vol. 51,No. 1,pp. 34 81 c 2009 Society for Industrial and Applied Mathematics From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images Alfred M. Bruckstein David
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289. Compressed Sensing. David L. Donoho, Member, IEEE
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL 2006 1289 Compressed Sensing David L. Donoho, Member, IEEE Abstract Suppose is an unknown vector in (a digital image or signal); we plan to
More informationA Googlelike Model of Road Network Dynamics and its Application to Regulation and Control
A Googlelike Model of Road Network Dynamics and its Application to Regulation and Control Emanuele Crisostomi, Steve Kirkland, Robert Shorten August, 2010 Abstract Inspired by the ability of Markov chains
More informationSampling 50 Years After Shannon
Sampling 50 Years After Shannon MICHAEL UNSER, FELLOW, IEEE This paper presents an account of the current state of sampling, 50 years after Shannon s formulation of the sampling theorem. The emphasis is
More informationWorking Set Selection Using Second Order Information for Training Support Vector Machines
Journal of Machine Learning Research 6 (25) 889 98 Submitted 4/5; Revised /5; Published /5 Working Set Selection Using Second Order Information for Training Support Vector Machines RongEn Fan PaiHsuen
More informationON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION. A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT
ON THE DISTRIBUTION OF SPACINGS BETWEEN ZEROS OF THE ZETA FUNCTION A. M. Odlyzko AT&T Bell Laboratories Murray Hill, New Jersey ABSTRACT A numerical study of the distribution of spacings between zeros
More informationControllability and Observability of Partial Differential Equations: Some results and open problems
Controllability and Observability of Partial Differential Equations: Some results and open problems Enrique ZUAZUA Departamento de Matemáticas Universidad Autónoma 2849 Madrid. Spain. enrique.zuazua@uam.es
More informationDecoding by Linear Programming
Decoding by Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles, CA 90095
More informationAn Introduction to the NavierStokes InitialBoundary Value Problem
An Introduction to the NavierStokes InitialBoundary Value Problem Giovanni P. Galdi Department of Mechanical Engineering University of Pittsburgh, USA Rechts auf zwei hohen Felsen befinden sich Schlösser,
More information