Michael Ulbrich. Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces


Michael Ulbrich
Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces
Technische Universität München, Fakultät für Mathematik
June 2001, revised February 2002
Table of Contents

1. Introduction
   Examples of Applications; Optimal Control Problems; Variational Inequalities; Motivation of the Method; Finite-Dimensional Variational Inequalities; Infinite-Dimensional Variational Inequalities; Organization
2. Elements of Finite-Dimensional Nonsmooth Analysis
   Generalized Differentials; Semismoothness; Semismooth Newton's Method; Higher Order Semismoothness; Examples of Semismooth Functions; The Euclidean Norm; The Fischer-Burmeister Function; Piecewise Differentiable Functions; Extensions
3. Newton Methods for Semismooth Operator Equations
   Introduction; Newton Methods for Abstract Semismooth Operators; Semismooth Operators in Banach Spaces; Basic Properties; Semismooth Newton's Method; Inexact Newton's Method; Projected Inexact Newton's Method; Alternative Regularity Conditions; Semismooth Newton Methods for Superposition Operators; Assumptions; A Generalized Differential; Semismoothness of Superposition Operators; Illustrations; Proof of the Main Theorems; Semismooth Newton Methods; Semismooth Composite Operators and Chain Rules; Further Properties of the Generalized Differential
4. Smoothing Steps and Regularity Conditions
   Smoothing Steps; A Newton Method without Smoothing Steps; Sufficient Conditions for Regularity
5. Variational Inequalities and Mixed Problems
   Application to Variational Inequalities; Problems with Bound-Constraints; Pointwise Convex Constraints; Mixed Problems; Karush-Kuhn-Tucker Systems; Connections to the Reduced Problem; Relations between Full and Reduced Newton System; Smoothing Steps; Regularity Conditions
6. Trust-Region Globalization
   The Trust-Region Algorithm; Global Convergence; Implementable Decrease Conditions; Transition to Fast Local Convergence
7. Applications
   Distributed Control of a Nonlinear Elliptic Equation; Black-Box Approach; All-at-Once Approach; Finite Element Discretization; Discrete Black-Box Approach; Efficient Solution of the Newton System; Discrete All-at-Once Approach; Numerical Results; Using Multigrid Techniques; Black-Box Approach; All-at-Once Approach; Nested Iteration; Discussion of the Results; Obstacle Problems; Dual Problem; Regularized Dual Problem; Discretization; Numerical Results
8. Optimal Control of the Incompressible Navier-Stokes Equations
   Introduction; Functional Analytic Setting of the Control Problem; Function Spaces; The Control Problem; Analysis of the Control Problem; State Equation; Control-to-State Mapping; Adjoint Equation; Properties of the Reduced Objective Function; Application of Semismooth Newton Methods
9. Optimal Control of the Compressible Navier-Stokes Equations
   Introduction; The Flow Control Problem; Adjoint-Based Gradient Computation; Semismooth BFGS-Newton Method; Quasi-Newton BFGS-Approximations; The Algorithm; Numerical Results
A. Appendix
   A.1 Adjoint Approach for Optimal Control Problems
   A.1.1 Adjoint Representation of the Reduced Gradient
   A.1.2 Adjoint Representation of the Reduced Hessian
   A.2 Several Inequalities
   A.3 Elementary Properties of Multifunctions
   A.4 Nemytskij Operators
Notations
References
Acknowledgments

It is my great pleasure to thank Prof. Dr. Klaus Ritter for his constant support and encouragement over the past ten years. Furthermore, I would like to thank Prof. Dr. Johann Edenhofer, who stimulated my interest in optimal control of PDEs. My scientific work benefited significantly from two very enjoyable and fruitful research stays at the Department of Computational and Applied Mathematics (CAAM) and the Center for Research on Parallel Computation (CRPC), Rice University, Houston, Texas. These visits were made possible by Prof. John Dennis and Prof. Matthias Heinkenschloss. I am very thankful to both of them for their hospitality and support. During my second stay at Rice University, I laid the foundation of a large part of this work. The visits were funded by the Forschungsstipendium Ul157/11 and the Habilitandenstipendium Ul157/31 of the Deutsche Forschungsgemeinschaft, and by CRPC grant CCR. This support is gratefully acknowledged.

The computational results in chapter 9 for the boundary control of the compressible Navier-Stokes equations build on joint work with Prof. Scott Collis, Prof. Matthias Heinkenschloss, Dr. Kaveh Ghayour, and Dr. Stefan Ulbrich as part of the Rice AeroAcoustic Control (RAAC) project, which is directed by Scott Collis and Matthias Heinkenschloss. I thank all RAAC group members for allowing me to use their contributions to the project for my computations. In particular, Scott Collis' Navier-Stokes solver was very helpful. The computations for chapter 9 were performed on an SGI Origin 2000 at Rice University, which was purchased with the aid of an NSF SCREMS grant. I am very thankful to Matthias Heinkenschloss for giving me access to this machine. Furthermore, I would like to thank Prof. Dr. Folkmar Bornemann for the opportunity to use his SGI Origin 2000 for computations.

I also would like to acknowledge the Zentrum Mathematik, Technische Universität München, for providing a very pleasant and professional working environment.
In particular, I am thankful to the members of our Rechnerbetriebsgruppe, Dr. Michael Nast, Dr. Andreas Johann, and Rolf Schöne, for their good system administration and their helpfulness. In making the ideas for this work concrete, I profited from an inspiring conversation with Prof. Liqun Qi, Prof. Danny Ralph, and PD Dr. Christian Kanzow during the ICCP99 meeting in Madison, Wisconsin, which I would like to acknowledge. Finally, I wish to thank my parents, Margot and Peter, and my brother Stefan for always being there for me.
1. Introduction

A central theme of applied mathematics is the design of accurate mathematical models for a variety of technical, financial, medical, and many other applications, and the development of efficient numerical algorithms for their solution. Often, these models contain parameters that should be adjusted in an optimal way, either to maximize the accuracy of the model (parameter identification) or to control the simulated system in a desired way (optimal control). Since optimization with simulation constraints is more challenging than simulation alone (which can already be very involved on its own), the development and analysis of efficient optimization methods is crucial for the viability of this approach. Besides the optimization of systems, minimization problems and variational inequalities often arise already in the process of building mathematical models; this applies, e.g., to contact problems, free boundary problems, and elastoplastic problems [47, 62, 63, 97, 98, 117].

Most of the variational problems mentioned so far share the property that they are continuous in time and/or space, so that infinite-dimensional function spaces provide the appropriate setting for their analysis. Since essential information on the problem to solve is carried by the properties of the underlying infinite-dimensional spaces, the successful design of robust and mesh-independent optimization methods requires a thorough convergence analysis in this infinite-dimensional function space setting.

The purpose of this work is to develop and analyze a class of Newton-type methods for the solution of optimization problems and variational inequalities that are posed in function spaces and contain pointwise inequality constraints. A representative prototype of the problems we consider here is the following:

Bound-Constrained Variational Inequality Problem (VIP): Find $u \in L^p(\Omega)$ such that
\[
u \in B \stackrel{\text{def}}{=} \{ v \in L^p(\Omega) : a \le v \le b \text{ on } \Omega \}, \qquad \langle F(u), v - u \rangle \ge 0 \quad \text{for all } v \in B. \tag{1.1}
\]

Here, $\langle u, v \rangle = \int_\Omega u(\omega) v(\omega)\, d\omega$, and $F : L^p(\Omega) \to L^{p'}(\Omega)$ with $p, p' \in (1, \infty]$, $1/p + 1/p' \le 1$, is an (in general nonlinear) operator, where $L^p(\Omega)$ is the usual Lebesgue space on the bounded Lebesgue measurable set $\Omega \subset \mathbb{R}^n$. We assume that $\Omega$ has positive Lebesgue measure, so that $0 < \mu(\Omega) < \infty$. These requirements on $\Omega$ are assumed throughout this work. In case this is needed (e.g., for embeddings) but not explicitly stated, we assume that $\Omega$ is nonempty, open, and bounded with sufficiently smooth boundary $\partial\Omega$.

The lower and upper bound functions $a$ and $b$ may be present only on measurable parts $\Omega_a$ and $\Omega_b$ of $\Omega$, which is achieved by setting $a|_{\Omega \setminus \Omega_a} = -\infty$ and $b|_{\Omega \setminus \Omega_b} = +\infty$, respectively. We assume that the natural extensions by zero of $a|_{\Omega_a}$ and $b|_{\Omega_b}$ to $\Omega$ are elements of $L^p(\Omega)$. We also require a minimum distance $\nu > 0$ of the bounds from each other, i.e., $b - a \ge \nu$ on $\Omega$. In the definition of $B$, and throughout this work, relations between measurable functions are meant to hold pointwise almost everywhere on $\Omega$ in the Lebesgue sense. Various extensions of problem (1.1) will also be considered and are discussed below.

In many situations, the VIP (1.1) describes the first-order necessary optimality conditions of the bound-constrained minimization problem
\[
\text{minimize } j(u) \quad \text{subject to } u \in B. \tag{1.2}
\]
In this case, $F$ is the Fréchet derivative $j' : L^p(\Omega) \to L^{p'}(\Omega)$ of the objective functional $j : L^p(\Omega) \to \mathbb{R}$.

The methods we are going to investigate are best explained by considering the unilateral case with lower bounds $a \equiv 0$. The resulting problem is called nonlinear complementarity problem (NCP):
\[
u \in L^p(\Omega), \quad u \ge 0, \qquad \langle F(u), v - u \rangle \ge 0 \quad \text{for all } v \in L^p(\Omega),\ v \ge 0. \tag{1.3}
\]
As we will see, and as might be obvious to the reader, (1.3) is equivalent to the pointwise complementarity system
\[
u \ge 0, \quad F(u) \ge 0, \quad u F(u) = 0 \quad \text{on } \Omega. \tag{1.4}
\]
The basic idea, which was developed in the nineties for the numerical solution of finite-dimensional NCPs, consists in the observation that (1.3) is equivalent to the operator equation
\[
\Phi(u) = 0, \quad \text{where } \Phi(u)(\omega) = \phi\big(u(\omega), F(u)(\omega)\big), \quad \omega \in \Omega. \tag{1.5}
\]
Here, $\phi : \mathbb{R}^2 \to \mathbb{R}$ is an NCP-function, i.e.,
\[
\phi(x) = 0 \iff x_1 \ge 0,\ x_2 \ge 0,\ x_1 x_2 = 0.
\]
We will develop a semismoothness concept that is applicable to the operators arising in (1.5) and that allows us to develop a class of Newton-type methods for the solution of (1.5).
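To make the equivalence of (1.4) and (1.5) tangible, the following minimal sketch samples two functions on a grid and applies the NCP-function $\phi(x) = \min\{x_1, x_2\}$ pointwise. The grid, the sample functions, and the helper names `phi_min` and `is_complementary` are illustrative assumptions and not part of the original text.

```python
import numpy as np

def phi_min(x1, x2):
    """NCP-function phi(x) = min{x1, x2}, applied elementwise."""
    return np.minimum(x1, x2)

def is_complementary(u, Fu, tol=1e-12):
    """Check u >= 0, F(u) >= 0, u * F(u) = 0 at every grid point, cf. (1.4)."""
    return bool(np.all(u >= -tol) and np.all(Fu >= -tol)
                and np.all(np.abs(u * Fu) <= tol))

# Grid samples standing in for functions on Omega = (0, 1) (illustrative only).
omega = np.linspace(0.0, 1.0, 11)
u_good = np.maximum(omega - 0.5, 0.0)   # zero on [0, 0.5], positive afterwards
F_good = np.maximum(0.5 - omega, 0.0)   # positive before 0.5, zero afterwards
u_bad = u_good + 0.1                    # now u > 0 where F > 0: not complementary

# Pointwise superposition Phi(u)(omega) = phi(u(omega), F(u)(omega)), cf. (1.5).
res_good = phi_min(u_good, F_good)
res_bad = phi_min(u_bad, F_good)
```

In this sketch `res_good` vanishes at every grid point, while `res_bad` does not, mirroring the statement that $\Phi(u) = 0$ holds exactly when the complementarity system is satisfied.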
The resulting algorithms have, like their finite-dimensional counterparts, the semismooth Newton methods, several remarkable properties:

(a) The methods are locally superlinearly convergent, and they converge with q-rate $> 1$ under slightly stronger assumptions.

(b) Although an inequality constrained problem is solved, only one linear operator equation has to be solved per iteration. Thus, the cost per iteration is comparable to that of Newton's method for smooth operator equations. We remark that sequential quadratic programming (SQP) algorithms, which are very efficient in practice, require the solution of an inequality constrained quadratic program per iteration, which can be significantly more expensive. Thus, it is also attractive to combine SQP methods with the class of Newton methods we describe here, either by using the Newton method for solving subproblems, or by rewriting the complementarity conditions in the Kuhn-Tucker system as an operator equation.

(c) The convergence analysis does not require a strict complementarity condition to hold. Therefore, we can prove fast convergence also for the case where the set $\{\omega : \bar u(\omega) = 0,\ F(\bar u)(\omega) = 0\}$ has positive measure at the solution $\bar u$.

(d) The systems that have to be solved in each iteration are of the form
\[
[d_1 \cdot I + d_2 \cdot F'(u)]\, s = -\Phi(u), \tag{1.6}
\]
where $I : u \mapsto u$ is the identity and $F'$ denotes the Fréchet derivative of $F$. Further, $d_1, d_2$ are nonnegative $L^\infty$-functions that are chosen depending on $u$ and satisfy $0 < \gamma_1 < d_1 + d_2 < \gamma_2$ on $\Omega$ uniformly in $u$. More precisely, $(d_1, d_2)$ is a measurable selection of the measurable multifunction
\[
\omega \in \Omega \mapsto \partial\phi\big(u(\omega), F(u)(\omega)\big),
\]
where $\partial\phi$ is Clarke's generalized gradient of $\phi$. As we will see, in typical applications the system (1.6) can be symmetrized and is not much harder to solve than a system involving only the operator $F'(u)$, which would arise for the unconstrained problem $F(u) = 0$. In particular, fast solvers like multigrid methods, preconditioned iterative solvers, etc., can be applied to solve (1.6).

(e) The method is not restricted to the problem class (1.1). Among the possible extensions we also investigate variational inequality problems of the form (1.1), but with the feasible set $B$ replaced by
\[
\mathcal{C} = \{ u \in L^p(\Omega)^m : u(\omega) \in C \text{ on } \Omega \}, \quad C \subset \mathbb{R}^m \text{ closed and convex}.
\]
Furthermore, we will consider mixed problems, where $F(u)$ is replaced by $F(y, u)$ and where we have the additional operator equation $E(y, u) = 0$.
In particular, such problems arise as the first-order necessary optimality conditions (Karush-Kuhn-Tucker or KKT-conditions) of optimization problems with optimal control structure
\[
\text{minimize } J(y, u) \quad \text{subject to } E(y, u) = 0, \quad u \in \mathcal{C}.
\]

(f) Other extensions are possible that we do not cover in this work. For instance, certain quasivariational inequalities [12, 13], i.e., variational inequalities for which the feasible set depends on $u$ (e.g., $a = A(u)$, $b = B(u)$), can be solved by our class of semismooth Newton methods.

For illustration, we begin with examples of two problem classes that fit in the above framework.
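The structure of the Newton system (1.6) can be illustrated in finite dimensions. The sketch below is a hedged toy version, not the method as analyzed in this work: it assumes an affine map $F(u) = Au + q$ with a small tridiagonal matrix $A$, uses the NCP-function $\phi = \min$, and picks the selection $d_1 = 1, d_2 = 0$ where $u \le F(u)$ and $d_1 = 0, d_2 = 1$ elsewhere. The function name `semismooth_newton_ncp` and all data are hypothetical.

```python
import numpy as np

def semismooth_newton_ncp(A, q, u0, tol=1e-12, max_iter=50):
    """Semismooth Newton iteration for min(u, A u + q) = 0 (toy sketch)."""
    u = np.asarray(u0, dtype=float).copy()
    for k in range(max_iter):
        Fu = A @ u + q
        Phi = np.minimum(u, Fu)                  # Phi(u) with phi = min
        if np.linalg.norm(Phi, np.inf) <= tol:
            return u, k
        # One element (d1, d2) of the generalized gradient of min per component.
        d1 = (u <= Fu).astype(float)
        d2 = 1.0 - d1
        # Discrete analogue of [d1 * I + d2 * F'(u)] s = -Phi(u), with F'(u) = A.
        M = np.diag(d1) + d2[:, None] * A
        s = np.linalg.solve(M, -Phi)
        u = u + s
    raise RuntimeError("semismooth Newton iteration did not converge")

# Small illustrative instance (assumed data).
A = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
q = np.array([-1.0, 2.0, -1.0, 2.0, -1.0])
u_star, iters = semismooth_newton_ncp(A, q, np.zeros(5))
```

Each iteration solves exactly one linear system, in line with property (b); the inequality constraints enter only through the pointwise choice of `d1` and `d2`.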
1.1 Examples of Applications

1.1.1 Optimal Control Problems

Let the state space $Y$ (a Banach space), the control space $U = L^p(\Omega)$, and the set $B \subset U$ of admissible or feasible controls as defined in (1.1) be given. The state $y \in Y$ of the system under consideration is governed by the state equation
\[
E(y, u) = 0, \tag{1.7}
\]
where $E : Y \times U \to W^*$ and $W^*$ denotes the dual of a reflexive Banach space $W$. In our context, the state equation usually is given by the weak formulation of a partial differential equation (PDE), including all boundary conditions that are not already contained in the definition of $Y$.

Suppose that, for every control $u \in U$, the state equation (1.7) possesses a unique solution $y = y(u) \in Y$. The control problem consists in finding a control $\bar u$ such that the pair $(y(\bar u), \bar u)$ minimizes a given objective function $J : Y \times U \to \mathbb{R}$ among all feasible controls $u \in B$. Thus, the control problem is
\[
\underset{y \in Y,\ u \in U}{\text{minimize}}\ J(y, u) \quad \text{subject to (1.7) and } u \in B. \tag{1.8}
\]
Alternatively, we can use the state equation to express the state in terms of the control, $y = y(u)$, and write the control problem in the equivalent reduced form
\[
\text{minimize } j(u) \quad \text{subject to } u \in B, \tag{1.9}
\]
with the reduced objective function $j(u) \stackrel{\text{def}}{=} J(y(u), u)$.

By the implicit function theorem, the continuous differentiability of $y(u)$ in a neighborhood of $\bar u$ follows if $E$ is continuously differentiable and $E_y(y(\bar u), \bar u)$ is continuously invertible. Further, if in addition $J$ is continuously differentiable in a neighborhood of $(y(\bar u), \bar u)$, then $j$ is continuously differentiable in a neighborhood of $\bar u$. In the same way, differentiability of higher order can be ensured.

For problem (1.9), the gradient $j'(u) \in U^*$ is given by
\[
j'(u) = J_u(y, u) + y_u(u)^* J_y(y, u),
\]
with $y = y(u)$. Alternatively, $j'$ can be represented via the adjoint state $w = w(u) \in W$, which is the solution of the adjoint equation
\[
E_y(y, u)^* w = -J_y(y, u),
\]
where $y = y(u)$.
As discussed in more detail in appendix A.1, the gradient of $j$ can be written in the form
\[
j'(u) = J_u(y, u) + E_u(y, u)^* w.
\]
Adjoint-based expressions for the second derivative $j''$ are also available, see appendix A.1.
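The adjoint representation can be checked numerically on a small discrete model. The sketch below is an illustration under stated assumptions, not the discretization used in this work: the state equation is taken as $E(y, u) = Ay - u = 0$ with $A$ a 1D finite-difference Laplacian, the objective is a tracking-type functional with grid-weighted sums, and the names `solve_state`, `j`, and `reduced_gradient` are hypothetical. For this choice, $E_u^* w = -w$, so the adjoint formula gives $j'(u) = \lambda u - w$ with $A^T w = y_d - y$.

```python
import numpy as np

# Assumed toy setup: 1D grid on (0, 1), finite-difference Laplacian A.
n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
lam = 1e-3                      # regularization parameter (assumed value)
y_d = np.sin(np.pi * x)         # target state (assumed)

def solve_state(u):
    """State equation E(y, u) = A y - u = 0."""
    return np.linalg.solve(A, u)

def j(u):
    """Reduced objective j(u) = J(y(u), u) with h-weighted sums as integrals."""
    y = solve_state(u)
    return 0.5 * h * np.sum((y - y_d) ** 2) + 0.5 * lam * h * np.sum(u ** 2)

def reduced_gradient(u):
    """Adjoint-based gradient: E_y^* w = -J_y, then j'(u) = J_u + E_u^* w."""
    y = solve_state(u)
    w = np.linalg.solve(A.T, y_d - y)   # adjoint equation
    return lam * u - w                  # E_u^* w = -w for E(y, u) = A y - u

# Compare with a central difference quotient in an arbitrary direction v.
u = np.ones(n)
v = x * (1.0 - x)
eps = 1e-2
fd = (j(u + eps * v) - j(u - eps * v)) / (2.0 * eps)
ad = h * reduced_gradient(u) @ v        # h-weighted pairing <j'(u), v>
```

Since `j` is quadratic in this toy model, the central difference `fd` agrees with the adjoint-based value `ad` up to rounding, at the cost of one state solve and one adjoint solve per gradient.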
We now make the example more concrete and consider as state equation the Poisson problem with distributed control on the right-hand side,
\[
-\Delta y = u \ \text{ on } \Omega, \qquad y = 0 \ \text{ on } \partial\Omega, \tag{1.10}
\]
and an objective function of tracking type
\[
J(y, u) = \frac{1}{2} \int_\Omega (y - y_d)^2\, dx + \frac{\lambda}{2} \int_\Omega u^2\, dx.
\]
Here, $\Omega \subset \mathbb{R}^n$ is a nonempty and bounded open set, $y_d \in L^2(\Omega)$ is a target state that we would like to achieve as well as possible by controlling $u$, and the second term is for the purpose of regularization (the parameter $\lambda > 0$ is typically very small, e.g., $\lambda = 10^{-3}$). We incorporate the boundary conditions into the state space by choosing $Y = H_0^1(\Omega)$, the Sobolev space of functions vanishing on $\partial\Omega$. For the control space we choose $U = L^2(\Omega)$. The control problem thus is
\[
\underset{y \in H_0^1(\Omega),\ u \in L^2(\Omega)}{\text{minimize}}\ \frac{1}{2} \int_\Omega (y - y_d)^2\, dx + \frac{\lambda}{2} \int_\Omega u^2\, dx \quad \text{subject to } -\Delta y = u, \quad u \in B. \tag{1.11}
\]
Defining the operator
\[
E : Y \times U \to W^* \stackrel{\text{def}}{=} Y^*, \qquad E(y, u) = -\Delta y - u,
\]
we can write the state equation in the form (1.7). We identify $L^2(\Omega)$ with its dual and introduce the Gelfand triple
\[
H_0^1(\Omega) = Y \hookrightarrow U = L^2(\Omega) \hookrightarrow Y^* = H^{-1}(\Omega).
\]
Then
\[
J_y(y, u) = y - y_d, \quad J_u(y, u) = \lambda u, \quad E_u(y, u) v = -v \ \ \forall\, v \in U, \quad E_y(y, u) z = -\Delta z \ \ \forall\, z \in Y.
\]
Therefore, the adjoint state $w \in W = W^{**} = H_0^1(\Omega)$ is given by
\[
-\Delta w = y_d - y \ \text{ on } \Omega, \qquad w = 0 \ \text{ on } \partial\Omega, \tag{1.12}
\]
where $y$ solves (1.10). Note that in (1.12) the boundary conditions could also be omitted because they are already enforced by $w \in H_0^1(\Omega)$. The gradient of the reduced objective function $j$ thus is
\[
j'(u) = J_u(y, u) + E_u(y, u)^* w = \lambda u - w,
\]
with $y = y(u)$ and $w = w(u)$ solutions of (1.10) and (1.12), respectively.

This problem has the following properties that are common to many control problems and will be of use later on:
- The mapping $u \mapsto w(u)$ possesses a smoothing property. In fact, $w$ is a smooth (in this simple example even affine linear and bounded) mapping from $U = L^2(\Omega)$ to $W = H_0^1(\Omega)$, which is continuously embedded in $L^p(\Omega)$ for appropriate $p > 2$. If the boundary of $\Omega$ is sufficiently smooth, elliptic regularity results even imply that the mapping $u \mapsto w(u)$ maps smoothly into $H_0^1(\Omega) \cap H^2(\Omega)$.
- The solution $\bar u$ is contained in $L^p(\Omega) \subset U$ (note that $\Omega$ is bounded) for appropriate $p \in (2, \infty]$ if the bounds satisfy $a|_{\Omega_a} \in L^p(\Omega_a)$, $b|_{\Omega_b} \in L^p(\Omega_b)$. In fact, let $p \in (2, \infty]$ be such that $H_0^1(\Omega) \hookrightarrow L^p(\Omega)$. As we will see shortly, $j'(\bar u) = \lambda \bar u - \bar w$ vanishes on $\Omega_0 = \{\omega : a(\omega) < \bar u(\omega) < b(\omega)\}$. Thus, using $\bar w \in H_0^1(\Omega) \hookrightarrow L^p(\Omega)$, we conclude $\bar u|_{\Omega_0} = \lambda^{-1} \bar w|_{\Omega_0} \in L^p(\Omega_0)$. On $\Omega_a \setminus \Omega_0$ we have $\bar u = a$, and on $\Omega_b \setminus \Omega_0$ we have $\bar u = b$. Hence, $\bar u \in L^p(\Omega)$.

Therefore, the reduced problem (1.9) is of the form (1.2). Due to strict convexity of $j$, it can be written in the form (1.1) with $F = j'$, and it enjoys the following properties. There exist $p, p' \in (2, \infty]$ such that:

- $F : L^2(\Omega) \to L^2(\Omega)$ is continuously differentiable (here even continuous affine linear).
- $F$ has the form $F(u) = \lambda u + G(u)$, where $G : L^2(\Omega) \to L^{p'}(\Omega)$ is locally Lipschitz continuous (here even continuous affine linear).
- The solution is contained in $L^p(\Omega)$.

This problem arises as a special case in the class of nonlinear elliptic control problems that we discuss in detail in section 7.1.

The distributed control of the right-hand side can be replaced by a variety of other control mechanisms. One alternative is Neumann boundary control. To describe this briefly, let us assume that the boundary $\partial\Omega$ is sufficiently smooth with positive and finite Hausdorff measure. We consider the problem
\[
\begin{aligned}
\underset{y \in H^1(\Omega),\ u \in L^2(\partial\Omega)}{\text{minimize}}\quad & \frac{1}{2} \int_\Omega (y - y_d)^2\, dx + \frac{\lambda}{2} \int_{\partial\Omega} u^2\, dS \\
\text{subject to}\quad & -\Delta y + y = f \ \text{ on } \Omega, \qquad \frac{\partial y}{\partial n} = u \ \text{ on } \partial\Omega, \qquad u \in B,
\end{aligned} \tag{1.13}
\]
where $B \subset U = L^2(\partial\Omega)$, $f \in W^* = H^1(\Omega)^*$, and $\partial/\partial n$ denotes the outward normal derivative.
The state equation in weak form reads
\[
\forall\, v \in Y : \quad (\nabla y, \nabla v)_{L^2(\Omega)^2} + (y, v)_{L^2(\Omega)} = \langle f, v \rangle_{H^1(\Omega)^*, H^1(\Omega)} + (u, v|_{\partial\Omega})_{L^2(\partial\Omega)},
\]
where $Y = H^1(\Omega)$. This can be written in the form $E(y, u) = 0$ with $E : H^1(\Omega) \times L^2(\partial\Omega) \to H^1(\Omega)^*$. A calculation similar to the one above yields for the reduced objective function
\[
j'(u) = \lambda u - w|_{\partial\Omega},
\]
where the adjoint state $w = w(u) \in W = H^1(\Omega)$ is given by
\[
-\Delta w + w = y_d - y \ \text{ on } \Omega, \qquad \frac{\partial w}{\partial n} = 0 \ \text{ on } \partial\Omega.
\]
Using standard results on Neumann problems, we see that the mappings
\[
u \in L^2(\partial\Omega) \mapsto y(u) \in H^1(\Omega) \mapsto w(u) \in H^1(\Omega)
\]
are continuous affine linear, and thus so is
\[
u \in L^2(\partial\Omega) \mapsto w(u)|_{\partial\Omega} \in H^{1/2}(\partial\Omega) \hookrightarrow L^p(\partial\Omega)
\]
for appropriate $p > 2$. Therefore, we have a scenario comparable to the distributed control problem, but now posed on the boundary of $\Omega$.

1.1.2 Variational Inequalities

As a further application, we discuss a variational inequality arising from obstacle problems. For $q \in [2, \infty)$, let $g \in H^{2,q}(\Omega)$ represent a (lower) obstacle located over the nonempty bounded open set $\Omega \subset \mathbb{R}^2$ with sufficiently smooth boundary, denote by $y \in H_0^1(\Omega)$ the position of a membrane, and by $f \in L^q(\Omega)$ external forces. For compatibility we assume $g \le 0$ on $\partial\Omega$. Then $y$ solves the problem
\[
\underset{y \in H_0^1(\Omega)}{\text{minimize}}\ \frac{1}{2} a(y, y) - (f, y)_{L^2} \quad \text{subject to } y \ge g, \tag{1.14}
\]
where
\[
a(y, z) = \sum_{i,j} \int_\Omega a_{ij} \frac{\partial y}{\partial x_i} \frac{\partial z}{\partial x_j}\, dx, \qquad a_{ij} = a_{ji} \in C^1(\bar\Omega),
\]
and $a$ is $H_0^1$-elliptic. Let $A \in \mathcal{L}(H_0^1, H^{-1})$ be the operator induced by $a$, i.e., $a(y, z) = \langle y, Az \rangle_{H_0^1, H^{-1}}$. It can be shown, see section 7.3 and [22], that (1.14) possesses a unique solution $\bar y \in H_0^1(\Omega)$ and that, in addition, $\bar y \in H^{2,q}(\Omega)$. Using Fenchel-Rockafellar duality [49], an equivalent dual problem can be derived, which (written as a minimization problem) assumes the form
\[
\underset{u \in L^2(\Omega)}{\text{minimize}}\ \frac{1}{2} \big(f + u, A^{-1}(f + u)\big)_{L^2} - (g, u)_{L^2} \quad \text{subject to } u \ge 0. \tag{1.15}
\]
The dual problem admits a unique solution $\bar u \in L^2(\Omega)$, which in addition satisfies $\bar u \in L^q(\Omega)$. From the dual solution $\bar u$ we can recover the primal solution $\bar y$ via
\[
\bar y = A^{-1}(f + \bar u).
\]
Obviously, the objective function in (1.15) is not $L^2$-coercive, which we compensate by adding a regularization. This yields the objective function
\[
j_\lambda(u) = \frac{1}{2} \big(f + u, A^{-1}(f + u)\big)_{L^2} - (g, u)_{L^2} + \frac{\lambda}{2} \| u - u_d \|_{L^2}^2,
\]
where $\lambda > 0$ is a (small) parameter and $u_d \in L^p(\Omega)$, $p \in [2, \infty)$, is chosen appropriately. We will show in section 7.3 that the solution $\bar u_\lambda$ of the regularized problem
\[
\underset{u \in L^2(\Omega)}{\text{minimize}}\ j_\lambda(u) \quad \text{subject to } u \ge 0 \tag{1.16}
\]
lies in $L^p(\Omega)$ and satisfies $\| \bar u_\lambda - \bar u \|_{H^{-1}} = o(\lambda^{1/2})$, which implies $\| \bar y_\lambda - \bar y \|_{H_0^1} = o(\lambda^{1/2})$, where $\bar y_\lambda = A^{-1}(f + \bar u_\lambda)$.

Since $j_\lambda$ is strictly convex, problem (1.16) can be written in the form (1.1) with $F = j_\lambda'$. We have
\[
F(u) = \lambda u + A^{-1}(f + u) - g - \lambda u_d \stackrel{\text{def}}{=} \lambda u + G(u).
\]
Using that $A \in \mathcal{L}(H_0^1, H^{-1})$ is a homeomorphism, and that $H_0^1(\Omega) \hookrightarrow L^p(\Omega)$ for all $p \in [1, \infty)$, we conclude that the operator $G$ maps $L^2(\Omega)$ continuously affine linearly into $L^p(\Omega)$. Therefore, we see:

- $F : L^2(\Omega) \to L^2(\Omega)$ is continuously differentiable (here even continuous affine linear).
- $F$ has the form $F(u) = \lambda u + G(u)$, where $G : L^2(\Omega) \to L^p(\Omega)$ is locally Lipschitz continuous (here even continuous affine linear).
- The solution is contained in $L^p(\Omega)$.

A detailed discussion of this problem including numerical results is given in section 7.3. In a similar way, obstacle problems on the boundary can be treated. Furthermore, time-dependent parabolic variational inequality problems can be reduced, by semidiscretization in time, to a sequence of elliptic variational inequality problems.

1.2 Motivation of the Method

The class of methods for solving (1.1) that we consider here is based on the following equivalent formulation of (1.1) as a system of pointwise inequalities:
\[
\text{(i) } a \le u \le b, \qquad \text{(ii) } (u - a) F(u) \le 0, \qquad \text{(iii) } (u - b) F(u) \le 0 \quad \text{on } \Omega. \tag{1.17}
\]
On $\Omega \setminus \Omega_a$, condition (ii) has to be interpreted as $F(u) \le 0$, and on $\Omega \setminus \Omega_b$ condition (iii) means $F(u) \ge 0$. The equivalence of (1.1) and (1.17) is easily verified. In fact, if $u$ is a solution of (1.1) then (i) holds. Further, if (ii) is violated on a set $\Omega'$ of positive measure, we define $v \in B$ by $v = a$ on $\Omega'$ and $v = u$ on $\Omega \setminus \Omega'$, and obtain the contradiction
\[
\langle F(u), v - u \rangle = \int_{\Omega'} F(u)(a - u)\, d\omega < 0.
\]
In the same way, (iii) can be shown to hold.
Conversely, if $u$ solves (1.17) then (i)-(iii) imply that $\Omega$ is the union of the disjoint sets $\{a < u < b,\ F(u) = 0\}$, $\Omega_\ge = \{u = a,\ F(u) \ge 0\}$, and $\Omega_\le = \{u = b,\ F(u) \le 0\}$. Now, for arbitrary $v \in B$, we have
\[
\langle F(u), v - u \rangle = \int_{\Omega_\ge} F(u)(v - a)\, d\omega + \int_{\Omega_\le} F(u)(v - b)\, d\omega \ge 0,
\]
so that $u$ solves (1.1).

As already mentioned, an important special case, which will provide our main example throughout, is the nonlinear complementarity problem (NCP), which corresponds to $a \equiv 0$ and $b \equiv +\infty$. Obviously, unilateral problems can be converted to an NCP via the transformation $\tilde u = u - a$, $\tilde F(\tilde u) = F(\tilde u + a)$ in the case of lower bounds, and $\tilde u = b - u$, $\tilde F(\tilde u) = -F(b - \tilde u)$ in the case of upper bounds. For NCPs, (1.17) reduces to (1.4).

In finite dimensions, the NCP and, more generally, the box-constrained variational inequality problem (which is also called mixed complementarity problem, MCP) have been extensively investigated, and there exists a significant, rapidly growing body of literature on numerical algorithms for their solution, see section […]. Here, a major role is played by devices that allow one to reformulate the problem equivalently as a system of (nonsmooth) equations. We begin with a description of these concepts in the framework of finite-dimensional MCPs and NCPs.

1.2.1 Finite-Dimensional Variational Inequalities

Although we consider finite-dimensional problems throughout this section 1.2.1, we will work with the same notations as in the function space setting ($a$, $b$, $u$, $F$, etc.), since there is no danger of ambiguity. In analogy to (1.4), the finite-dimensional mixed complementarity problem consists in finding $u \in \mathbb{R}^m$ such that
\[
a_i \le u_i \le b_i, \quad (u_i - a_i) F_i(u) \le 0, \quad (u_i - b_i) F_i(u) \le 0, \quad i = 1, \ldots, m, \tag{1.18}
\]
where $a, b \in \mathbb{R}^m$ and $F : \mathbb{R}^m \to \mathbb{R}^m$ are given. We begin with an early approach by Eaves [48], who observed (in the more general framework of VIPs on closed convex sets) that (1.18) can be equivalently written in the form
\[
u - P_{[a,b]}(u - F(u)) = 0, \tag{1.19}
\]
where $P_{[a,b]}(u) = \max\{a, \min\{u, b\}\}$ (componentwise) is the Euclidean projection onto $[a, b] = \prod_{i=1}^m [a_i, b_i]$. Note that if the function $F$ is $C^k$ then the left-hand side of (1.19) is piecewise $C^k$ and thus, as we will see, semismooth.
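The equivalence of (1.18) and (1.19) can be tried out on a small example. The following sketch assumes finite bounds and the simple map $F(u) = u - c$ (the gradient of $\tfrac12 \|u - c\|^2$), for which the solution is the projection of $c$ onto the box; the helper names `project_box`, `natural_residual`, and `satisfies_mcp` and all data are illustrative assumptions.

```python
import numpy as np

def project_box(z, a, b):
    """Euclidean projection P_[a,b](z) = max{a, min{z, b}} (componentwise)."""
    return np.maximum(a, np.minimum(z, b))

def natural_residual(u, Fu, a, b):
    """Left-hand side of (1.19): u - P_[a,b](u - F(u))."""
    return u - project_box(u - Fu, a, b)

def satisfies_mcp(u, Fu, a, b, tol=1e-12):
    """Check the componentwise conditions (1.18), assuming finite bounds."""
    return bool(np.all(u >= a - tol) and np.all(u <= b + tol)
                and np.all((u - a) * Fu <= tol)
                and np.all((u - b) * Fu <= tol))

# Assumed toy data: F(u) = u - c on the box [0, 1]^3.
a = np.zeros(3)
b = np.ones(3)
c = np.array([-0.5, 0.3, 2.0])
F = lambda u: u - c

u_sol = project_box(c, a, b)          # candidate solution: (0, 0.3, 1)
u_other = np.array([0.5, 0.5, 0.5])   # a feasible point that is not a solution

res_sol = natural_residual(u_sol, F(u_sol), a, b)
res_other = natural_residual(u_other, F(u_other), a, b)
```

Here `res_sol` vanishes and `u_sol` satisfies (1.18), whereas `u_other` is feasible but violates the sign conditions and yields a nonzero residual.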
The reformulation (1.19) can be embedded in a more general framework. To this end, we interpret (1.18) as a system of $m$ conditions of the form
\[
\alpha \le x_1 \le \beta, \quad (x_1 - \alpha) x_2 \le 0, \quad (x_1 - \beta) x_2 \le 0, \tag{1.20}
\]
which have to be fulfilled by $x = (u_i, F_i(u))$ for $[\alpha, \beta] = [a_i, b_i]$, $i = 1, \ldots, m$. Given any function $\phi_{[\alpha,\beta]} : \mathbb{R}^2 \to \mathbb{R}$ with the property
\[
\phi_{[\alpha,\beta]}(x) = 0 \iff \text{(1.20) holds}, \tag{1.21}
\]
we can write (1.18) equivalently as
\[
\phi_{[a_i, b_i]}(u_i, F_i(u)) = 0, \quad i = 1, \ldots, m. \tag{1.22}
\]
A function with the property (1.21) is called MCP-function for the interval $[\alpha, \beta]$ (also the name BVIP-function is used, where BVIP stands for box constrained variational inequality problem). The link between (1.19) and (1.22) consists in the fact that the function
\[
\phi^E_{[\alpha,\beta]} : \mathbb{R}^2 \to \mathbb{R}, \quad \phi^E_{[\alpha,\beta]}(x) = x_1 - P_{[\alpha,\beta]}(x_1 - x_2) \quad \text{with } P_{[\alpha,\beta]}(t) = \max\{\alpha, \min\{t, \beta\}\} \tag{1.23}
\]
defines an MCP-function for the interval $[\alpha, \beta]$.

The reformulation of NCPs requires only an MCP-function for the interval $[0, \infty)$. As already said, such functions are called NCP-functions. According to (1.21), $\phi : \mathbb{R}^2 \to \mathbb{R}$ is an NCP-function if and only if
\[
\phi(x) = 0 \iff x_1 \ge 0,\ x_2 \ge 0,\ x_1 x_2 = 0. \tag{1.24}
\]
The corresponding reformulation of the NCP then is
\[
\Phi(u) \stackrel{\text{def}}{=} \begin{pmatrix} \phi(u_1, F_1(u)) \\ \vdots \\ \phi(u_m, F_m(u)) \end{pmatrix} = 0, \tag{1.25}
\]
and the NCP-function $\phi^E_{[0,\infty)}$ can be written in the form
\[
\phi^E(x) = \phi^E_{[0,\infty)}(x) = \min\{x_1, x_2\}.
\]

A further important reformulation, which is due to Robinson [127], uses the normal map
\[
F_{[a,b]}(z) = F(P_{[a,b]}(z)) + z - P_{[a,b]}(z).
\]
It is not difficult to see that any solution $z$ of the normal map equation
\[
F_{[a,b]}(z) = 0 \tag{1.26}
\]
gives rise to a solution $u = P_{[a,b]}(z)$ of (1.18), and, conversely, that, for any solution $u$ of (1.18), the vector $z = u - F(u)$ solves (1.26). Therefore, the MCP (1.18) and the normal map equation (1.26) are equivalent. Again, the normal map is piecewise $C^k$ if $F$ is $C^k$. In contrast to the reformulation based on NCP- and MCP-functions, the normal map approach evaluates $F$ only at feasible points, which can be advantageous in certain situations.

Many modern algorithms for finite-dimensional NCPs and MCPs are based on reformulations by means of the Fischer-Burmeister NCP-function
\[
\phi^{FB}(x) = x_1 + x_2 - \sqrt{x_1^2 + x_2^2}, \tag{1.27}
\]
which was introduced by Fischer [55]. This function is Lipschitz continuous and 1-order semismooth on $\mathbb{R}^2$ (the definition of semismoothness is given below and, in more detail, in chapter 2). Further, $\phi^{FB}$ is $C^\infty$ on $\mathbb{R}^2 \setminus \{0\}$, and $(\phi^{FB})^2$ is continuously differentiable on $\mathbb{R}^2$. The latter property implies that, if $F$ is continuously differentiable, the function $\frac{1}{2} \Phi^{FB}(u)^T \Phi^{FB}(u)$ can serve as a continuously differentiable merit function for (1.25). It is also possible to obtain 1-order semismooth MCP-functions from the Fischer-Burmeister function, see [18, 54] and section […].

The described reformulations were successfully used as a basis for the development of locally superlinearly convergent Newton-type methods for the solution of (mixed) nonlinear complementarity problems [18, 38, 39, 45, 5, 52, 53, 54, 88, 89, 93, 116, 124, 14]. This is remarkable, since all these reformulations are nonsmooth systems of equations. However, the underlying functions are semismooth, a concept introduced by Mifflin [113] for real-valued functions on $\mathbb{R}^n$, and extended to mappings between finite-dimensional spaces by Qi [12] and Qi and Sun [122]. Here (details are given in chapter 2) a function $f : \mathbb{R}^l \to \mathbb{R}^m$ is called semismooth at $x \in \mathbb{R}^l$ if it is Lipschitz continuous near $x$, directionally differentiable at $x$, and if
\[
\sup_{M \in \partial f(x + h)} \| f(x + h) - f(x) - M h \| = o(\|h\|) \quad \text{as } h \to 0,
\]
where the set-valued function $\partial f : \mathbb{R}^l \rightrightarrows \mathbb{R}^{m \times l}$,
\[
\partial f(x) = \operatorname{co}\{ M \in \mathbb{R}^{m \times l} : x_k \to x,\ f \text{ is differentiable at } x_k \text{ and } f'(x_k) \to M \},
\]
denotes Clarke's generalized Jacobian ("co" is the convex hull). It can be shown that piecewise $C^1$ functions are semismooth, see section […]. Further, it is easy to prove that Newton's method (where in Newton's equation the Jacobian is replaced by an arbitrary element of $\partial f$) converges superlinearly in a neighborhood of a CD-regular ("CD" for Clarke-differential) solution $x^*$, i.e., a solution where all elements of $\partial f(x^*)$ are invertible. More details on semismoothness in finite dimensions can be found in chapter 2.

It should be mentioned that continuously differentiable NCP-functions can also be constructed.
In fact, already in the seventies, Mangasarian [11] proved the equivalence of the NCP to a system of equations, which, in our terminology, he obtained by choosing the NCP-function
\[
\phi^M(x) = \theta(|x_2 - x_1|) - \theta(x_2) - \theta(x_1),
\]
where $\theta : \mathbb{R} \to \mathbb{R}$ is any strictly increasing function with $\theta(0) = 0$. Maybe the most straightforward choice is $\theta(t) = t$, which gives $\phi^M = -2\phi^E$. If, in addition, $\theta$ is $C^1$ with $\theta'(0) = 0$, then $\phi^M$ is $C^1$. This is, e.g., satisfied by $\theta(t) = t|t|$.

Nevertheless, most modern approaches prefer nondifferentiable, semismooth reformulations. This has a good reason. In fact, consider (1.25) with a differentiable NCP-function. Then the Jacobian of $\Phi$ is given by
\[
\Phi'(u) = \operatorname{diag}\big(\phi_{x_1}(u_i, F_i(u))\big) + \operatorname{diag}\big(\phi_{x_2}(u_i, F_i(u))\big) F'(u).
\]
Now, since $\phi(t, 0) = 0 = \phi(0, t)$ for all $t \ge 0$, we see that $\phi'(0, 0) = 0$. Thus, if strict complementarity is violated for the $i$th component, i.e., if $u_i = 0 = F_i(u)$, then the $i$th row of $\Phi'(u)$ is zero, and thus Newton's method is not applicable if strict complementarity is violated at the solution. This can be avoided by using nonsmooth NCP-functions, because they can be constructed in such a way that any element of the generalized gradient $\partial\phi(x)$ is bounded away from zero at any point $x \in \mathbb{R}^2$. For the Fischer-Burmeister function, e.g., we have $(\phi^{FB})'(x) = (1, 1) - x^T / \|x\|_2$ for all $x \ne 0$ and thus $\|g\|_2 \ge \sqrt{2} - 1$ for all $g \in \partial\phi^{FB}(x)$ and all $x \in \mathbb{R}^2$.

The development of nonsmooth Newton methods [12, 13, 12, 122, 118], especially the unifying notion of semismoothness [12, 122], has led to considerable research on numerical methods for the solution of finite-dimensional VIPs that are based on semismooth reformulations [18, 38, 39, 5, 52, 53, 54, 88, 89, 93, 116, 14]. These investigations confirm that this approach admits an elegant and general theory (in particular, no strict complementarity assumption is required) and leads to very efficient numerical algorithms [54, 115, 116].

Related approaches

The research on semismoothness-based methods is still in progress. Promising new directions of research are provided by Jacobian smoothing methods and continuation methods [31, 29, 92]. Here, a family of functions $(\phi_\mu)_{\mu \ge 0}$ is introduced such that $\phi_0$ is a semismooth NCP- or MCP-function, $\phi_\mu$, $\mu > 0$, is smooth, and $\phi_\mu \to \phi_0$ in a suitable sense as $\mu \to 0$. These functions are used to derive a family of equations $\Phi_\mu(u) = 0$ in analogy to (1.25). In the continuation approach [29], a sequence $(u_k)$ of approximate solutions corresponding to parameter values $\mu = \mu_k$ with $\mu_k \to 0$ is generated such that $u_k$ converges to a solution of the equation $\Phi_0(u) = 0$. Steps are usually obtained by solving the smoothed Newton equation
\[
\Phi_{\mu_k}'(u_k)\, s_k^c = -\Phi_{\mu_k}(u_k),
\]
yielding centering steps towards the central path $\{x : \Phi_\mu(x) = 0 \text{ for some } \mu > 0\}$, or by solving the Jacobian smoothing Newton equation
\[
\Phi_{\mu_k}'(u_k)\, s_k = -\Phi_0(u_k),
\]
yielding fast steps towards the solution set of $\Phi_0(u) = 0$. The latter steps are also used as trial steps in the recently developed Jacobian smoothing methods [31, 92].
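As a concrete instance of such a family $(\phi_\mu)$, the sketch below uses a commonly cited smoothing of the Fischer-Burmeister function, $\phi_\mu(x) = x_1 + x_2 - \sqrt{x_1^2 + x_2^2 + 2\mu}$. This particular choice is an assumption made for illustration; the text above does not fix a specific smoothing. The code checks two elementary facts that follow directly from the formula: the uniform bound $|\phi_\mu - \phi_0| \le \sqrt{2\mu}$, and that $\phi_\mu$ vanishes on the curve $x_1 x_2 = \mu$, $x_1, x_2 > 0$.

```python
import numpy as np

def phi_fb(x1, x2):
    """Fischer-Burmeister function (1.27)."""
    return x1 + x2 - np.hypot(x1, x2)

def phi_fb_smoothed(x1, x2, mu):
    """Assumed smoothing of phi_fb; smooth for mu > 0 and equal to phi_fb for mu = 0."""
    return x1 + x2 - np.sqrt(x1**2 + x2**2 + 2.0 * mu)

mu = 1e-2
grid = np.linspace(-2.0, 2.0, 41)
X1, X2 = np.meshgrid(grid, grid)

# Uniform distance between the smoothed and the original function on the grid.
max_gap = np.max(np.abs(phi_fb_smoothed(X1, X2, mu) - phi_fb(X1, X2)))

# Points with x1 * x2 = mu and x1, x2 > 0 are zeros of phi_mu.
t = np.array([0.1, 1.0, 10.0])
path_values = phi_fb_smoothed(t, mu / t, mu)
```

The largest gap is attained at the origin, where $\phi_0$ has its kink; away from the origin the two functions differ by much less than $\sqrt{2\mu}$.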
Since the limit operator $\Phi_0$ is semismooth, the analysis of these smoothing methods heavily relies on the properties of $\partial\Phi_0$ and the semismoothness of $\Phi_0$.

The smoothing approach is also used in the development of algorithms for mathematical programs with equilibrium constraints (MPECs) [51, 57, 9, 19]. In this difficult class of problems, an objective function $f(u, v)$ has to be minimized under the constraint $u \in S(v)$, where $S(v)$ is the solution set of a VIP that is parameterized by $v$. Under suitable conditions on this inner problem, $S(v)$ can be characterized equivalently by its KKT conditions. These, however, when taken as constraints for the outer problem, violate any standard constraint qualification. Alternatively, the KKT conditions can be rewritten as a system of semismooth equations by means of an NCP-function. This, however, introduces the (mainly numerical) difficulty of nonsmooth constraints, which can be circumvented by replacing the NCP-function with a smoothing NCP-function and considering a sequence of solutions of the smoothed MPEC corresponding to $\mu = \mu_k$, $\mu_k \to 0$.

In conclusion, semismooth Newton methods are at the heart of many modern algorithms in finite-dimensional optimization, and hence should also be investigated
in the framework of optimal control and infinite-dimensional VIPs. This is the goal of the present manuscript.

1.2.2 Infinite-Dimensional Variational Inequalities

A main concern of this work is to extend the concept of semismooth Newton methods to a class of nonsmooth operator equations sufficiently rich to cover appropriate reformulations of the infinite-dimensional VIP (1.1). In a first step we derive analogues of the reformulations in section 1.2.1, but now in the function space setting. We begin with the NCP (1.4). Replacing componentwise operations by pointwise (a.e.) operations, we can apply an NCP-function $\phi$ pointwise to the pair of functions $(u, F(u))$ to define the superposition operator

\[ \Phi(u)(\omega) = \phi\big(u(\omega), F(u)(\omega)\big), \tag{1.28} \]

which, under appropriate assumptions, defines a mapping $\Phi : L^p(\Omega) \to L^r(\Omega)$, $r \ge 1$, see chapter 3. Obviously, (1.4) is equivalent to the nonsmooth operator equation

\[ \Phi(u) = 0. \tag{1.29} \]

In the same way, the more general problem (1.1) can be converted into an equivalent nonsmooth equation. To this end, we use a semismooth NCP-function $\phi$ and a semismooth MCP-function $\phi_{[\alpha,\beta]}$, $-\infty < \alpha < \beta < +\infty$. Now, we define the operator $\Phi : L^p(\Omega) \to L^r(\Omega)$,

\[
\Phi(u)(\omega) =
\begin{cases}
F(u)(\omega), & \omega \in \Omega \setminus (\Omega_a \cup \Omega_b), \\
\phi\big(u(\omega) - a(\omega), F(u)(\omega)\big), & \omega \in \Omega_a \setminus \Omega_b, \\
-\phi\big(b(\omega) - u(\omega), -F(u)(\omega)\big), & \omega \in \Omega_b \setminus \Omega_a, \\
\phi_{[a(\omega), b(\omega)]}\big(u(\omega), F(u)(\omega)\big), & \omega \in \Omega_a \cap \Omega_b.
\end{cases}
\tag{1.30}
\]

Again, $\Phi$ is a superposition operator on the four different subsets of $\Omega$ distinguished in (1.30). Along the same line, the normal map approach can be generalized to the function space setting. We will concentrate on NCP-function based reformulations and their generalizations.
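To see how the four-case superposition operator acts pointwise, it can be evaluated on a grid. The sketch below rests on several assumptions that are not fixed by the text: $\Omega = (0, 1)$ is sampled at equidistant points, $\Omega_a$ and $\Omega_b$ are taken to be the sets where the lower bound $a$ and the upper bound $b$ are finite, the NCP-function is $\phi(x) = \min\{x_1, x_2\}$, the MCP-function is the projection-based $\phi_{[\alpha,\beta]}(x) = x_1 - P_{[\alpha,\beta]}(x_1 - x_2)$, and the array `Fu` is a hypothetical stand-in for the values $F(u)(\omega)$.

```python
import numpy as np

def phi_min(x1, x2):
    """NCP-function phi(x) = min{x1, x2} (illustrative choice)."""
    return np.minimum(x1, x2)

def phi_box(x1, x2, alpha, beta):
    """MCP-function x1 - P_[alpha, beta](x1 - x2) (illustrative choice)."""
    return x1 - np.clip(x1 - x2, alpha, beta)

def superposition_operator(u, Fu, a, b):
    """Evaluate Phi(u) at the grid points, case by case.

    a and b hold the bounds at the grid points; -inf / +inf mark
    points without a lower / upper bound.
    """
    has_a, has_b = np.isfinite(a), np.isfinite(b)
    free = ~has_a & ~has_b    # no bound active in the definition
    lower = has_a & ~has_b    # only a lower bound
    upper = ~has_a & has_b    # only an upper bound
    both = has_a & has_b      # two-sided bounds
    Phi = np.empty_like(u)
    Phi[free] = Fu[free]
    Phi[lower] = phi_min(u[lower] - a[lower], Fu[lower])
    Phi[upper] = -phi_min(b[upper] - u[upper], -Fu[upper])
    Phi[both] = phi_box(u[both], Fu[both], a[both], b[both])
    return Phi

# Hypothetical data on a grid over (0, 1); all four cases occur.
omega = np.linspace(0.0, 1.0, 201)
a = np.where(omega < 0.5, 0.0, -np.inf)
b = np.where((omega > 0.25) & (omega < 0.75), 1.0, np.inf)
u = 1.5 * np.sin(2.0 * np.pi * omega)
Fu = u - np.cos(3.0 * omega)  # stand-in for F(u)(omega)

Phi = superposition_operator(u, Fu, a, b)
print("max |Phi(u)| on the grid:", np.max(np.abs(Phi)))
```

With these particular projection-based functions the four cases collapse into the single expression $u - P_{[a,b]}(u - F(u))$ with infinite bounds allowed, which gives a convenient consistency check. Other semismooth NCP- and MCP-functions, such as the Fischer-Burmeister function, do not reduce to one formula in this way.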
Our approach is applicable whenever it is possible to write the problem under consideration as an operator equation in which the underlying operator is obtained by superposition $\Psi = \psi \circ G$ of a Lipschitz continuous and semismooth function $\psi$ and a continuously Fréchet differentiable operator $G$ with reasonable properties, which maps into a direct product of Lebesgue spaces. We will show that the results for finite-dimensional semismooth equations can be extended to superposition operators in function spaces. To this end, we first develop a general semismoothness concept for operators in Banach spaces and then use these results to analyze superlinearly convergent Newton methods for semismooth operator equations. Then we apply this theory to superposition operators in function spaces of the form $\Psi = \psi \circ G$. We work with a set-valued generalized differential $\partial\Psi$ that is motivated by Qi's
finite-dimensional C-subdifferential. The semismoothness result we establish is an estimate of the form

\[ \sup_{M \in \partial\Psi(y+s)} \|\Psi(y+s) - \Psi(y) - Ms\|_{L^r} = o(\|s\|_Y) \quad \text{as } \|s\|_Y \to 0. \]

We also prove semismoothness of order $\alpha > 0$, which means that the above estimate holds with $o(\|s\|_Y)$ replaced by $O(\|s\|_Y^{1+\alpha})$. This semismoothness result enables us to apply the class of semismooth Newton methods that we analyzed in the abstract setting. If applied to nonsmooth reformulations of variational inequality problems, these methods can be regarded as infinite-dimensional analogues of finite-dimensional semismooth Newton methods for this class of problems. As a consequence, we can adjust to the function space setting many of the ideas that were developed for finite-dimensional VIPs in recent years.

1.3 Organization

We now give an overview of the organization of this work. In chapter 2 we recall important results of finite-dimensional nonsmooth analysis. Several generalized differentials known from the literature (Clarke's generalized Jacobian, B-differential, and Qi's C-subdifferential) and their properties are considered. Furthermore, finite-dimensional semismoothness is discussed and semismooth Newton methods are introduced. Finally, we give important examples of semismooth functions, e.g., piecewise smooth functions, and discuss finite-dimensional generalizations of the semismoothness concept.

In the first part of chapter 3 we establish semismoothness results for operator equations in Banach spaces. The definition is based on a set-valued generalized differential and requires an approximation condition to hold. Furthermore, semismoothness of higher order is introduced. It is shown that continuously differentiable operators are semismooth with respect to their Fréchet derivative, and that the sum, composition, and direct product of semismooth operators are again semismooth.
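The approximation condition behind semismoothness can be observed numerically in the simplest finite-dimensional case. The sketch below is only an analogue of the function space estimate, under the assumption that the Fischer-Burmeister function $\phi_{FB}(x) = x_1 + x_2 - \|x\|_2$ on $\mathbb{R}^2$ plays the role of $\Psi$ and that the derivative is evaluated at the perturbed point $y + s$, as in the estimate. The base points and the direction are arbitrary illustrative choices.

```python
import numpy as np

def phi_fb(x):
    """Fischer-Burmeister function on R^2."""
    return x[0] + x[1] - np.hypot(x[0], x[1])

def grad_fb(x):
    """Gradient of phi_FB at x != 0 (the only points needed here)."""
    return np.array([1.0, 1.0]) - x / np.hypot(x[0], x[1])

def semismooth_ratio(y, direction, t):
    """|phi(y+s) - phi(y) - g.s| / ||s|| for s = t*direction, g = grad phi(y+s)."""
    y = np.asarray(y, dtype=float)
    s = t * np.asarray(direction, dtype=float)
    x = y + s
    remainder = phi_fb(x) - phi_fb(y) - grad_fb(x) @ s
    return abs(remainder) / np.linalg.norm(s)

direction = np.array([0.6, -0.8])
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    at_kink = semismooth_ratio([0.0, 0.0], direction, t)
    at_smooth = semismooth_ratio([1.0, 0.0], direction, t)
    print(f"t = {t:.0e}: ratio at the kink {at_kink:.2e}, at a smooth point {at_smooth:.2e}")
```

At the nondifferentiable point $y = 0$ the remainder vanishes up to rounding, because $\phi_{FB}$ is positively homogeneous; at the smooth point the ratio decays roughly linearly in $\|s\|$, which is the behavior expected for semismoothness of order 1.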
The semismoothness concept is used to develop a Newton method for semismooth operator equations that is superlinearly convergent (with q-order $1 + \alpha$ in the case of $\alpha$-order semismoothness). Several variants of this method are considered, including an inexact version that allows one to work with approximate generalized differentials in the Newton system, and a version that includes a projection in order to stay feasible with respect to a given closed convex set containing the solution.

In the second part of chapter 3 this abstract semismoothness concept is applied to the concrete situation of operators obtained by superposition of a Lipschitz continuous semismooth function and a smooth operator mapping into a product of Lebesgue spaces. This class of operators is of significant practical importance as it contains reformulations of variational inequalities by means of semismooth NCP-, MCP-, and related functions. We first develop a suitable generalized differential that has simple structure and is closely related to the finite-dimensional C-subdifferential. Then
we show that the considered superposition operators are semismooth with respect to this differential. We also develop results to establish semismoothness of higher order. The theory is illustrated by applications to the NCP. The established semismoothness of superposition operators enables us, via nonsmooth reformulations, to develop superlinearly convergent Newton methods for the solution of the NCP (1.4), and, as we show in chapter 5, for the solution of the VIP (1.1) and even more general problems. Finally, further properties of the generalized differential are considered.

In chapter 4 we investigate two ingredients that are needed in the analysis of chapter 3. In chapter 3 it becomes apparent that in general a smoothing step is required to close a gap between two different $L^p$-norms. This necessity was already observed in similar contexts [95, 143]. In section 4.1 we describe a way to construct smoothing steps, which is based on an idea by Kelley and Sachs [95]. Furthermore, in section 4.2 we investigate a particular choice of the MCP-function that leads to reformulations for which no smoothing step is required. The analysis of semismooth Newton methods in chapter 3 relies on a regularity condition that ensures the uniform invertibility (between appropriate spaces) of the generalized differentials in a neighborhood of the solution. In section 4.3 we develop sufficient conditions for this regularity assumption.

In chapter 5 we show how the developed concepts can be applied to solve more general problems than NCPs. In particular, we propose semismooth reformulations for bound-constrained VIPs and, more generally, for VIPs with pointwise convex constraints. These reformulations allow us to apply semismooth Newton methods for their solution. Furthermore, we discuss how semismooth Newton methods can be applied to solve mixed problems, i.e., systems of VIPs and smooth operator equations.
Here, we concentrate on mixed problems arising as the Karush-Kuhn-Tucker (KKT) conditions of constrained optimization problems with optimal control structure. A close relationship is established between reformulations based on the black-box approach, in which the reduced problem is considered, and reformulations based on the all-at-once approach, where the full KKT-system is considered. We observe that the generalized differentials of the black-box reformulation appear as Schur complements in the generalized differentials of the all-at-once reformulation. This can be used to relate regularity conditions of both approaches. We also describe how smoothing steps can be computed.

In chapter 6 we describe a way to make the developed class of semismooth Newton methods globally convergent by embedding them in a trust-region method. To this end, we propose three variants of minimization problems such that solutions of the semismooth operator equation are critical points of the minimization problem. Then we develop and analyze a class of nonmonotone trust-region methods for the resulting optimization problems in a general Hilbert space setting. The trial steps have to fulfill a model decrease condition, which, as we show, can be implemented by means of a generalized fraction of Cauchy decrease condition. For this algorithm global convergence results are established. Further, it is shown how semismooth Newton steps can be used to compute trial steps and it is proved that, under