Michael Ulbrich. Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces
Michael Ulbrich

Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces

Technische Universität München, Fakultät für Mathematik

June 2001, revised February 2002
Table of Contents

1. Introduction
     Examples of Applications
         Optimal Control Problems
         Variational Inequalities
     Motivation of the Method
         Finite-Dimensional Variational Inequalities
         Infinite-Dimensional Variational Inequalities
     Organization
2. Elements of Finite-Dimensional Nonsmooth Analysis
     Generalized Differentials
     Semismoothness
     Semismooth Newton's Method
     Higher Order Semismoothness
     Examples of Semismooth Functions
         The Euclidean Norm
         The Fischer–Burmeister Function
         Piecewise Differentiable Functions
     Extensions
3. Newton Methods for Semismooth Operator Equations
     Introduction
     Newton Methods for Abstract Semismooth Operators
         Semismooth Operators in Banach Spaces
         Basic Properties
         Semismooth Newton's Method
         Inexact Newton's Method
         Projected Inexact Newton's Method
         Alternative Regularity Conditions
     Semismooth Newton Methods for Superposition Operators
         Assumptions
         A Generalized Differential
         Semismoothness of Superposition Operators
         Illustrations
         Proof of the Main Theorems
         Semismooth Newton Methods
         Semismooth Composite Operators and Chain Rules
         Further Properties of the Generalized Differential
4. Smoothing Steps and Regularity Conditions
     Smoothing Steps
     A Newton Method without Smoothing Steps
     Sufficient Conditions for Regularity
5. Variational Inequalities and Mixed Problems
     Application to Variational Inequalities
         Problems with Bound-Constraints
         Pointwise Convex Constraints
     Mixed Problems
         Karush–Kuhn–Tucker Systems
         Connections to the Reduced Problem
         Relations between Full and Reduced Newton System
         Smoothing Steps
         Regularity Conditions
6. Trust-Region Globalization
     The Trust-Region Algorithm
     Global Convergence
     Implementable Decrease Conditions
     Transition to Fast Local Convergence
7. Applications
     Distributed Control of a Nonlinear Elliptic Equation
         Black-Box Approach
         All-at-Once Approach
         Finite Element Discretization
         Discrete Black-Box Approach
         Efficient Solution of the Newton System
         Discrete All-at-Once Approach
         Numerical Results
     Using Multigrid Techniques
         Black-Box Approach
         All-at-Once Approach
         Nested Iteration
         Discussion of the Results
     Obstacle Problems
         Dual Problem
         Regularized Dual Problem
         Discretization
         Numerical Results
8. Optimal Control of the Incompressible Navier–Stokes Equations
     Introduction
     Functional Analytic Setting of the Control Problem
         Function Spaces
         The Control Problem
     Analysis of the Control Problem
         State Equation
         Control-to-State Mapping
         Adjoint Equation
         Properties of the Reduced Objective Function
     Application of Semismooth Newton Methods
9. Optimal Control of the Compressible Navier–Stokes Equations
     Introduction
     The Flow Control Problem
     Adjoint-Based Gradient Computation
     Semismooth BFGS-Newton Method
         Quasi-Newton BFGS-Approximations
         The Algorithm
     Numerical Results
A. Appendix
     A.1 Adjoint Approach for Optimal Control Problems
         A.1.1 Adjoint Representation of the Reduced Gradient
         A.1.2 Adjoint Representation of the Reduced Hessian
     A.2 Several Inequalities
     A.3 Elementary Properties of Multifunctions
     A.4 Nemytskij Operators
Notations
References
Acknowledgments

It is my great pleasure to thank Prof. Dr. Klaus Ritter for his constant support and encouragement over the past ten years. Furthermore, I would like to thank Prof. Dr. Johann Edenhofer, who stimulated my interest in optimal control of PDEs.

My scientific work benefited significantly from two very enjoyable and fruitful research stays at the Department of Computational and Applied Mathematics (CAAM) and the Center for Research on Parallel Computation (CRPC), Rice University, Houston, Texas. These visits were made possible by Prof. John Dennis and Prof. Matthias Heinkenschloss. I am very thankful to both of them for their hospitality and support. During my second stay at Rice University, I laid the foundation of a large part of this work. The visits were funded by the Forschungsstipendium Ul157/1-1 and the Habilitandenstipendium Ul157/3-1 of the Deutsche Forschungsgemeinschaft, and by CRPC grant CCR. This support is gratefully acknowledged.

The computational results in chapter 9 for the boundary control of the compressible Navier–Stokes equations build on joint work with Prof. Scott Collis, Prof. Matthias Heinkenschloss, Dr. Kaveh Ghayour, and Dr. Stefan Ulbrich as part of the Rice AeroAcoustic Control (RAAC) project, which is directed by Scott Collis and Matthias Heinkenschloss. I thank all RAAC group members for allowing me to use their contributions to the project for my computations. In particular, Scott Collis' Navier–Stokes solver was very helpful. The computations for chapter 9 were performed on an SGI Origin 2000 at Rice University, which was purchased with the aid of an NSF SCREMS grant. I am very thankful to Matthias Heinkenschloss for giving me access to this machine. Furthermore, I would like to thank Prof. Dr. Folkmar Bornemann for the opportunity to use his SGI Origin 2000 for computations.

I would also like to acknowledge the Zentrum Mathematik, Technische Universität München, for providing a very pleasant and professional working environment. In particular, I am thankful to the members of our Rechnerbetriebsgruppe, Dr. Michael Nast, Dr. Andreas Johann, and Rolf Schöne, for their good system administration and their helpfulness.

In making the ideas for this work concrete, I profited from an inspiring conversation with Prof. Liqun Qi, Prof. Danny Ralph, and PD Dr. Christian Kanzow during the ICCP99 meeting in Madison, Wisconsin, which I would like to acknowledge.

Finally, I wish to thank my parents, Margot and Peter, and my brother Stefan for always being there for me.
1. Introduction

A central theme of applied mathematics is the design of accurate mathematical models for a variety of technical, financial, medical, and many other applications, and the development of efficient numerical algorithms for their solution. Often, these models contain parameters that should be adjusted in an optimal way, either to maximize the accuracy of the model (parameter identification), or to control the simulated system in a desired way (optimal control). Since optimization with simulation constraints is more challenging than simulation alone (which already can be very involved in its own right), the development and analysis of efficient optimization methods is crucial for the viability of this approach. Besides the optimization of systems, minimization problems and variational inequalities often arise already in the process of building mathematical models; this, e.g., applies to contact problems, free boundary problems, and elastoplastic problems [47, 62, 63, 97, 98, 117].

Most of the variational problems mentioned so far share the property that they are continuous in time and/or space, so that infinite-dimensional function spaces provide the appropriate setting for their analysis. Since essential information on the problem to solve is carried by the properties of the underlying infinite-dimensional spaces, the successful design of robust and mesh-independent optimization methods requires a thorough convergence analysis in this infinite-dimensional function space setting.

The purpose of this work is to develop and analyze a class of Newton-type methods for the solution of optimization problems and variational inequalities that are posed in function spaces and contain pointwise inequality constraints. A representative prototype of the problems we consider here is the following:

Bound-Constrained Variational Inequality Problem (VIP): Find $u \in L^p(\Omega)$ such that

    $u \in B \stackrel{\mathrm{def}}{=} \{v \in L^p(\Omega) : a \le v \le b \text{ on } \Omega\}$,
    $\langle F(u), v - u \rangle \ge 0$ for all $v \in B$.   (1.1)

Hereby, $\langle u, v \rangle = \int_\Omega u(\omega)v(\omega)\,d\omega$, and $F : L^p(\Omega) \to L^{p'}(\Omega)$ with $p, p' \in (1, \infty]$, $1/p + 1/p' \le 1$, is an (in general nonlinear) operator, where $L^p(\Omega)$ is the usual Lebesgue space on the bounded Lebesgue measurable set $\Omega \subset \mathbb{R}^n$. We assume that $\Omega$ has positive Lebesgue measure, so that $0 < \mu(\Omega) < \infty$. These requirements on $\Omega$ are assumed throughout this work. In case this is needed (e.g., for embeddings), but not explicitly stated, we assume that $\Omega$ is nonempty, open, and bounded with
sufficiently smooth boundary $\partial\Omega$. The lower and upper bound functions $a$ and $b$ may be present only on measurable parts $\Omega_a$ and $\Omega_b$ of $\Omega$, which is achieved by setting $a|_{\Omega\setminus\Omega_a} = -\infty$ and $b|_{\Omega\setminus\Omega_b} = +\infty$, respectively. We assume that the natural extensions by zero of $a|_{\Omega_a}$ and $b|_{\Omega_b}$ to $\Omega$ are elements of $L^p(\Omega)$. We also require a minimum distance $\nu > 0$ of the bounds from each other, i.e., $b - a \ge \nu$ on $\Omega$. In the definition of $B$, and throughout this work, relations between measurable functions are meant to hold pointwise almost everywhere on $\Omega$ in the Lebesgue sense. Various extensions of problem (1.1) will also be considered and are discussed below.

In many situations, the VIP (1.1) describes the first-order necessary optimality conditions of the bound-constrained minimization problem

    minimize $j(u)$ subject to $u \in B$.   (1.2)

In this case, $F$ is the Fréchet derivative $j' : L^p(\Omega) \to L^{p'}(\Omega)$ of the objective functional $j : L^p(\Omega) \to \mathbb{R}$.

The methods we are going to investigate are best explained by considering the unilateral case with lower bounds $a \equiv 0$. The resulting problem is called a nonlinear complementarity problem (NCP):

    $u \in L^p(\Omega)$, $u \ge 0$, $\langle F(u), v - u \rangle \ge 0$ for all $v \in L^p(\Omega)$, $v \ge 0$.   (1.3)

As we will see, and as might be obvious to the reader, (1.3) is equivalent to the pointwise complementarity system

    $u \ge 0$, $F(u) \ge 0$, $uF(u) = 0$ on $\Omega$.   (1.4)

The basic idea, which was developed in the nineties for the numerical solution of finite-dimensional NCPs, consists in the observation that (1.3) is equivalent to the operator equation $\Phi(u) = 0$, where

    $\Phi(u)(\omega) = \phi\bigl(u(\omega), F(u)(\omega)\bigr)$, $\omega \in \Omega$.   (1.5)

Hereby, $\phi : \mathbb{R}^2 \to \mathbb{R}$ is an NCP-function, i.e., $\phi(x) = 0 \iff x_1 \ge 0$, $x_2 \ge 0$, $x_1 x_2 = 0$. We will develop a semismoothness concept that is applicable to the operators arising in (1.5) and that allows us to develop a class of Newton-type methods for the solution of (1.5). The resulting algorithms have, as do their finite-dimensional counterparts, the semismooth Newton methods, several remarkable properties:

(a) The methods are locally superlinearly convergent, and they converge with q-rate $> 1$ under slightly stronger assumptions.

(b) Although an inequality constrained problem is solved, only one linear operator equation has to be solved per iteration. Thus, the cost per iteration is comparable to that of Newton's method for smooth operator equations. We remark that sequential quadratic programming (SQP) algorithms, which are very efficient in
practice, require the solution of an inequality constrained quadratic program per iteration, which can be significantly more expensive. Thus, it is also attractive to combine SQP methods with the class of Newton methods we describe here, either by using the Newton method for solving subproblems, or by rewriting the complementarity conditions in the Kuhn–Tucker system as an operator equation.

(c) The convergence analysis does not require a strict complementarity condition to hold. Therefore, we can prove fast convergence also for the case where the set $\{\omega : \bar u(\omega) = 0,\ F(\bar u)(\omega) = 0\}$ has positive measure at the solution $\bar u$.

(d) The systems that have to be solved in each iteration are of the form

    $[d_1 I + d_2 F'(u)]s = -\Phi(u)$,   (1.6)

where $I : u \mapsto u$ is the identity and $F'$ denotes the Fréchet derivative of $F$. Further, $d_1, d_2$ are nonnegative $L^\infty$-functions that are chosen depending on $u$ and satisfy $0 < \gamma_1 < d_1 + d_2 < \gamma_2$ on $\Omega$ uniformly in $u$. More precisely: $(d_1, d_2)$ is a measurable selection of the measurable multifunction $\omega \in \Omega \mapsto \partial\phi\bigl(u(\omega), F(u)(\omega)\bigr)$, where $\partial\phi$ is Clarke's generalized gradient of $\phi$. As we will see, in typical applications the system (1.6) can be symmetrized and is not much harder to solve than a system involving only the operator $F'(u)$, which would arise for the unconstrained problem $F(u) = 0$. In particular, fast solvers like multigrid methods, preconditioned iterative solvers, etc., can be applied to solve (1.6); a small finite-dimensional sketch is given after this list.

(e) The method is not restricted to the problem class (1.1). Among the possible extensions we also investigate variational inequality problems of the form (1.1), but with the feasible set $B$ replaced by

    $\mathcal{C} = \{u \in L^p(\Omega)^m : u(\omega) \in C \text{ on } \Omega\}$, $C \subset \mathbb{R}^m$ closed and convex.

Furthermore, we will consider mixed problems, where $F(u)$ is replaced by $F(y, u)$ and where we have the additional operator equation $E(y, u) = 0$. In particular, such problems arise as the first-order necessary optimality conditions (Karush–Kuhn–Tucker or KKT-conditions) of optimization problems with optimal control structure

    minimize $J(y, u)$ subject to $E(y, u) = 0$, $u \in \mathcal{C}$.

(f) Other extensions are possible that we do not cover in this work. For instance, certain quasivariational inequalities [12, 13], i.e., variational inequalities for which the feasible set depends on $u$ (e.g., $a = A(u)$, $b = B(u)$), can be solved by our class of semismooth Newton methods.

For illustration, we begin with examples of two problem classes that fit in the above framework.
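To make the structure of the iteration and of the system (1.6) concrete, the following minimal sketch applies the reformulation (1.5) to a discretized NCP with an affine operator $F(u) = Au + b$. It uses the Fischer–Burmeister NCP-function introduced in section 1.2.1 below; the test data, the selection of the generalized gradient at the kink, and the absence of any globalization are illustrative assumptions, not part of the function space method developed in this work.

```python
import numpy as np

def phi_fb(a, b):
    # Fischer-Burmeister NCP-function: phi(a, b) = 0  <=>  a >= 0, b >= 0, ab = 0
    return a + b - np.hypot(a, b)

def clarke_selection_fb(a, b):
    # One measurable selection (d1, d2) of Clarke's generalized gradient of
    # phi_fb, evaluated componentwise; at the kink (0, 0) we pick the element
    # corresponding to the direction (1, 1)/sqrt(2).
    r = np.hypot(a, b)
    kink = (r == 0.0)
    r = np.where(kink, 1.0, r)
    d1 = np.where(kink, 1.0 - 1.0 / np.sqrt(2.0), 1.0 - a / r)
    d2 = np.where(kink, 1.0 - 1.0 / np.sqrt(2.0), 1.0 - b / r)
    return d1, d2

def semismooth_newton_ncp(A, b, u, tol=1e-12, maxit=50):
    # Solve u >= 0, F(u) >= 0, u*F(u) = 0 with F(u) = Au + b via Phi(u) = 0.
    # Each iteration solves one linear system of the form (1.6):
    #   [diag(d1) + diag(d2) A] s = -Phi(u).
    for _ in range(maxit):
        Fu = A @ u + b
        Phi = phi_fb(u, Fu)
        if np.linalg.norm(Phi) < tol:
            break
        d1, d2 = clarke_selection_fb(u, Fu)
        u = u + np.linalg.solve(np.diag(d1) + d2[:, None] * A, -Phi)
    return u

# Toy data: A = (discrete 1D Laplacian) + I is symmetric positive definite.
n = 50
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.random.default_rng(0).standard_normal(n)
u = semismooth_newton_ncp(A, b, np.zeros(n))
print(np.linalg.norm(phi_fb(u, A @ u + b)))  # ~ 0 at the computed solution
```

Each iteration requires exactly one linear solve, mirroring property (b) above; globally convergent variants would combine such steps with a trust-region safeguard, as developed in chapter 6.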
1.1 Examples of Applications

1.1.1 Optimal Control Problems

Given are the state space $Y$ (a Banach space), the control space $U = L^p(\Omega)$, and the set $B \subset U$ of admissible or feasible controls as defined in (1.1). The state $y \in Y$ of the system under consideration is governed by the state equation

    $E(y, u) = 0$,   (1.7)

where $E : Y \times U \to W^*$ and $W^*$ denotes the dual of a reflexive Banach space $W$. In our context, the state equation usually is given by the weak formulation of a partial differential equation (PDE), including all boundary conditions that are not already contained in the definition of $Y$. Suppose that, for every control $u \in U$, the state equation (1.7) possesses a unique solution $y = y(u) \in Y$. The control problem consists in finding a control $\bar u$ such that the pair $(y(\bar u), \bar u)$ minimizes a given objective function $J : Y \times U \to \mathbb{R}$ among all feasible controls $u \in B$. Thus, the control problem is

    minimize over $y \in Y$, $u \in U$: $J(y, u)$ subject to (1.7) and $u \in B$.   (1.8)

Alternatively, we can use the state equation to express the state in terms of the control, $y = y(u)$, and to write the control problem in the equivalent reduced form

    minimize $j(u)$ subject to $u \in B$,   (1.9)

with the reduced objective function $j(u) \stackrel{\mathrm{def}}{=} J(y(u), u)$. By the implicit function theorem, the continuous differentiability of $y(u)$ in a neighborhood of $\bar u$ follows if $E$ is continuously differentiable and $E_y(y(\bar u), \bar u)$ is continuously invertible. Further, if in addition $J$ is continuously differentiable in a neighborhood of $(y(\bar u), \bar u)$, then $j$ is continuously differentiable in a neighborhood of $\bar u$. In the same way, differentiability of higher order can be ensured.

For problem (1.9), the gradient $j'(u) \in U^*$ is given by

    $j'(u) = J_u(y, u) + y_u(u)^* J_y(y, u)$, with $y = y(u)$.

Alternatively, $j'$ can be represented via the adjoint state $w = w(u) \in W$, which is the solution of the adjoint equation

    $E_y(y, u)^* w = -J_y(y, u)$, where $y = y(u)$.

As discussed in more detail in appendix A.1, the gradient of $j$ can then be written in the form

    $j'(u) = J_u(y, u) + E_u(y, u)^* w$.

Adjoint-based expressions for the second derivative $j''$ are also available, see appendix A.1.
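The following minimal sketch illustrates the adjoint recipe for a discretized linear-quadratic instance: the state equation is $E(y, u) = Ay - u = 0$ with an invertible matrix $A$, and $J(y, u) = \frac{1}{2}\|y - y_d\|^2 + \frac{\lambda}{2}\|u\|^2$. All data are hypothetical; the point is only the order of operations (state solve, adjoint solve, gradient assembly) and the finite-difference sanity check.

```python
import numpy as np

# Discretized model problem (hypothetical data):
#   state equation  E(y, u) = A y - u = 0
#   objective       J(y, u) = 0.5*||y - y_d||^2 + 0.5*lam*||u||^2
n, lam = 100, 1e-3
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # invertible stand-in
rng = np.random.default_rng(1)
y_d = rng.standard_normal(n)

def j(u):
    # Reduced objective j(u) = J(y(u), u)
    y = np.linalg.solve(A, u)
    return 0.5 * np.dot(y - y_d, y - y_d) + 0.5 * lam * np.dot(u, u)

def reduced_gradient(u):
    y = np.linalg.solve(A, u)              # state solve:   y = y(u)
    w = np.linalg.solve(A.T, -(y - y_d))   # adjoint solve: E_y^* w = -J_y
    return lam * u - w                     # j'(u) = J_u + E_u^* w, with E_u = -I

# Finite-difference check of one component of the gradient.
u = rng.standard_normal(n)
h, e0 = 1e-6, np.zeros(n)
e0[0] = 1.0
print(reduced_gradient(u)[0], (j(u + h * e0) - j(u - h * e0)) / (2 * h))
```

The cost of one gradient evaluation is thus one state solve plus one adjoint solve, independent of the number of control variables; this is what makes the adjoint approach attractive for large-scale control problems.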
We now make the example more concrete and consider as state equation the Poisson problem with distributed control on the right hand side,

    $-\Delta y = u$ on $\Omega$, $y = 0$ on $\partial\Omega$,   (1.10)

and an objective function of tracking type

    $J(y, u) = \frac{1}{2}\int_\Omega (y - y_d)^2\,dx + \frac{\lambda}{2}\int_\Omega u^2\,dx$.

Hereby, $\Omega \subset \mathbb{R}^n$ is a nonempty and bounded open set, $y_d \in L^2(\Omega)$ is a target state that we would like to achieve as well as possible by controlling $u$, and the second term is for the purpose of regularization (the parameter $\lambda > 0$ is typically very small, e.g., $\lambda = 10^{-3}$). We incorporate the boundary conditions into the state space by choosing $Y = H^1_0(\Omega)$, the Sobolev space of $H^1$-functions vanishing on $\partial\Omega$. For the control space we choose $U = L^2(\Omega)$. The control problem thus is

    minimize over $y \in H^1_0(\Omega)$, $u \in L^2(\Omega)$: $\frac{1}{2}\int_\Omega (y - y_d)^2\,dx + \frac{\lambda}{2}\int_\Omega u^2\,dx$
    subject to $-\Delta y = u$, $u \in B$.   (1.11)

Defining the operator

    $E : Y \times U \to W^* \stackrel{\mathrm{def}}{=} Y^*$, $E(y, u) = -\Delta y - u$,

we can write the state equation in the form (1.7). We identify $L^2(\Omega)$ with its dual and introduce the Gelfand triple

    $H^1_0(\Omega) = Y \hookrightarrow U = L^2(\Omega) \hookrightarrow Y^* = H^{-1}(\Omega)$.

Then

    $J_y(y, u) = y - y_d$, $J_u(y, u) = \lambda u$,
    $E_u(y, u)v = -v$ $\forall v \in U$, $E_y(y, u)z = -\Delta z$ $\forall z \in Y$.

Therefore, the adjoint state $w \in W = H^1_0(\Omega)$ is given by

    $-\Delta w = y_d - y$ on $\Omega$, $w = 0$ on $\partial\Omega$,   (1.12)

where $y$ solves (1.10). Note that in (1.12) the boundary conditions could also be omitted because they are already enforced by $w \in H^1_0(\Omega)$. The gradient of the reduced objective function $j$ thus is

    $j'(u) = J_u(y, u) + E_u(y, u)^* w = \lambda u - w$

with $y = y(u)$ and $w = w(u)$ solutions of (1.10) and (1.12), respectively. This problem has the following properties that are common to many control problems and will be of use later on:
- The mapping $u \mapsto w(u)$ possesses a smoothing property. In fact, $w$ is a smooth (in this simple example even affine linear and bounded) mapping from $U = L^2(\Omega)$ to $W = H^1_0(\Omega)$, which is continuously embedded in $L^p(\Omega)$ for appropriate $p > 2$. If the boundary of $\Omega$ is sufficiently smooth, elliptic regularity results even imply that the mapping $u \mapsto w(u)$ maps smoothly into $H^1_0(\Omega) \cap H^2(\Omega)$.

- The solution $\bar u$ is contained in $L^p(\Omega) \subset U$ (note that $\Omega$ is bounded) for appropriate $p \in (2, \infty]$ if the bounds satisfy $a|_{\Omega_a} \in L^p(\Omega_a)$, $b|_{\Omega_b} \in L^p(\Omega_b)$. In fact, let $p \in (2, \infty]$ be such that $H^1_0(\Omega) \hookrightarrow L^p(\Omega)$. As we will see shortly, $j'(\bar u) = \lambda\bar u - w$ vanishes on $\Omega_0 = \{\omega : a(\omega) < \bar u(\omega) < b(\omega)\}$. Thus, using $w \in H^1_0(\Omega) \hookrightarrow L^p(\Omega)$, we conclude $\bar u|_{\Omega_0} = \lambda^{-1} w|_{\Omega_0} \in L^p(\Omega_0)$. On $\Omega_a \setminus \Omega_0$ we have $\bar u = a$, and on $\Omega_b \setminus \Omega_0$ holds $\bar u = b$. Hence, $\bar u \in L^p(\Omega)$.

Therefore, the reduced problem (1.9) is of the form (1.2). Due to strict convexity of $j$, it can be written in the form (1.1) with $F = j'$, and it enjoys the following properties: There exist $p, p' \in (2, \infty]$ such that

- $F : L^2(\Omega) \to L^2(\Omega)$ is continuously differentiable (here even continuous affine linear).
- $F$ has the form $F(u) = \lambda u + G(u)$, where $G : L^2(\Omega) \to L^{p'}(\Omega)$ is locally Lipschitz continuous (here even continuous affine linear).
- The solution is contained in $L^p(\Omega)$.

This problem arises as a special case in the class of nonlinear elliptic control problems that we discuss in detail in section 7.1. The distributed control of the right hand side can be replaced by a variety of other control mechanisms. One alternative is Neumann boundary control. To describe this briefly, let us assume that the boundary $\partial\Omega$ is sufficiently smooth with positive and finite Hausdorff measure. We consider the problem

    minimize over $y \in H^1(\Omega)$, $u \in L^2(\partial\Omega)$: $\frac{1}{2}\int_\Omega (y - y_d)^2\,dx + \frac{\lambda}{2}\int_{\partial\Omega} u^2\,ds$
    subject to $-\Delta y + y = f$ on $\Omega$, $\frac{\partial y}{\partial n} = u$ on $\partial\Omega$, $u \in B$,   (1.13)

where $B \subset U = L^2(\partial\Omega)$, $f \in W^* = H^1(\Omega)^*$, and $\partial/\partial n$ denotes the outward normal derivative. The state equation in weak form reads

    $\forall v \in Y$: $(\nabla y, \nabla v)_{L^2(\Omega)^n} + (y, v)_{L^2(\Omega)} = \langle f, v \rangle_{H^1(\Omega)^*, H^1(\Omega)} + (u, v|_{\partial\Omega})_{L^2(\partial\Omega)}$,

where $Y = H^1(\Omega)$. This can be written in the form $E(y, u) = 0$ with $E : H^1(\Omega) \times L^2(\partial\Omega) \to H^1(\Omega)^*$. A calculation similar to the one above yields for the reduced objective function $j'(u) = \lambda u - w|_{\partial\Omega}$, where the adjoint state $w = w(u) \in W = H^1(\Omega)$ is given by
    $-\Delta w + w = y_d - y$ on $\Omega$, $\frac{\partial w}{\partial n} = 0$ on $\partial\Omega$.

Using standard results on Neumann problems, we see that the mappings $u \in L^2(\partial\Omega) \mapsto y(u) \in H^1(\Omega) \mapsto w(u) \in H^1(\Omega)$ are continuous affine linear, and thus so is $u \in L^2(\partial\Omega) \mapsto w(u)|_{\partial\Omega} \in H^{1/2}(\partial\Omega) \hookrightarrow L^p(\partial\Omega)$ for appropriate $p > 2$. Therefore, we have a scenario comparable to the distributed control problem, but now posed on the boundary of $\Omega$.

1.1.2 Variational Inequalities

As a further application, we discuss a variational inequality arising from obstacle problems. For $q \in [2, \infty)$, let $g \in H^{2,q}(\Omega)$ represent a (lower) obstacle located over the nonempty bounded open set $\Omega \subset \mathbb{R}^2$ with sufficiently smooth boundary, denote by $y \in H^1_0(\Omega)$ the position of a membrane, and by $f \in L^q(\Omega)$ external forces. For compatibility we assume $g \le 0$ on $\partial\Omega$. Then $y$ solves the problem

    minimize over $y \in H^1_0(\Omega)$: $\frac{1}{2} a(y, y) - (f, y)_{L^2}$ subject to $y \ge g$,   (1.14)

where

    $a(y, z) = \sum_{i,j} \int_\Omega a_{ij}\, \frac{\partial y}{\partial x_i} \frac{\partial z}{\partial x_j}\,dx$, $a_{ij} = a_{ji} \in C^1(\bar\Omega)$,

and $a$ being $H^1_0$-elliptic. Let $A \in \mathcal{L}(H^1_0, H^{-1})$ be the operator induced by $a$, i.e., $a(y, z) = \langle y, Az \rangle_{H^1_0, H^{-1}}$. It can be shown, see section 7.3 and [22], that (1.14) possesses a unique solution $\bar y \in H^1_0(\Omega)$ and that, in addition, $\bar y \in H^{2,q}(\Omega)$. Using Fenchel–Rockafellar duality [49], an equivalent dual problem can be derived, which (written as a minimization problem) assumes the form

    minimize over $u \in L^2(\Omega)$: $\frac{1}{2}(f + u, A^{-1}(f + u))_{L^2} - (g, u)_{L^2}$ subject to $u \ge 0$.   (1.15)

The dual problem admits a unique solution $\bar u \in L^2(\Omega)$, which in addition satisfies $\bar u \in L^q(\Omega)$. From the dual solution $\bar u$ we can recover the primal solution $\bar y$ via $\bar y = A^{-1}(f + \bar u)$. Obviously, the objective function in (1.15) is not $L^2$-coercive, which we compensate by adding a regularization. This yields the objective function

    $j_\lambda(u) = \frac{1}{2}(f + u, A^{-1}(f + u))_{L^2} - (g, u)_{L^2} + \frac{\lambda}{2}\|u - u_d\|_{L^2}^2$,
where $\lambda > 0$ is a (small) parameter and $u_d \in L^p(\Omega)$, $p \in [2, \infty)$, is chosen appropriately. We will show in section 7.3 that the solution $\bar u_\lambda$ of the regularized problem

    minimize over $u \in L^2(\Omega)$: $j_\lambda(u)$ subject to $u \ge 0$   (1.16)

lies in $L^p(\Omega)$ and satisfies $\|\bar u_\lambda - \bar u\|_{H^{-1}} = o(\lambda^{1/2})$, which implies $\|\bar y_\lambda - \bar y\|_{H^1_0} = o(\lambda^{1/2})$, where $\bar y_\lambda = A^{-1}(f + \bar u_\lambda)$. Since $j_\lambda$ is strictly convex, problem (1.16) can be written in the form (1.1) with $F = j_\lambda'$. We have

    $F(u) = \lambda u + A^{-1}(f + u) - g - \lambda u_d \stackrel{\mathrm{def}}{=} \lambda u + G(u)$.

Using that $A \in \mathcal{L}(H^1_0, H^{-1})$ is a homeomorphism, and that $H^1_0(\Omega) \hookrightarrow L^p(\Omega)$ for all $p \in [1, \infty)$, we conclude that the operator $G$ maps $L^2(\Omega)$ continuously affine linearly into $L^p(\Omega)$. Therefore, we see:

- $F : L^2(\Omega) \to L^2(\Omega)$ is continuously differentiable (here even continuous affine linear).
- $F$ has the form $F(u) = \lambda u + G(u)$, where $G : L^2(\Omega) \to L^p(\Omega)$ is locally Lipschitz continuous (here even continuous affine linear).
- The solution is contained in $L^p(\Omega)$.

A detailed discussion of this problem including numerical results is given in section 7.3. In a similar way, obstacle problems on the boundary can be treated. Furthermore, time-dependent parabolic variational inequality problems can be reduced, by semidiscretization in time, to a sequence of elliptic variational inequality problems.

1.2 Motivation of the Method

The class of methods for solving (1.1) that we consider here is based on the following equivalent formulation of (1.1) as a system of pointwise inequalities:

    (i) $a \le u \le b$, (ii) $(u - a)F(u) \le 0$, (iii) $(u - b)F(u) \le 0$ on $\Omega$.   (1.17)

On $\Omega \setminus \Omega_a$, condition (ii) has to be interpreted as $F(u) \le 0$, and on $\Omega \setminus \Omega_b$ condition (iii) means $F(u) \ge 0$. The equivalence of (1.1) and (1.17) is easily verified. In fact, if $u$ is a solution of (1.1) then (i) holds. Further, if (ii) is violated on a set $\Omega'$ of positive measure, we define $v \in B$ by $v = a$ on $\Omega'$, and $v = u$ on $\Omega \setminus \Omega'$, and obtain the contradiction

    $\langle F(u), v - u \rangle = \int_{\Omega'} F(u)(a - u)\,d\omega < 0$.

In the same way, (iii) can be shown to hold. Conversely, if $u$ solves (1.17) then (i)–(iii) imply that $\Omega$ is the union of the disjoint sets $\Omega_0 = \{a < u < b,\ F(u) = 0\}$, $\Omega_- = \{u = a,\ F(u) \ge 0\}$, and $\Omega_+ = \{u = b,\ F(u) \le 0\}$. Now, for arbitrary $v \in B$, we have

    $\langle F(u), v - u \rangle = \int_{\Omega_-} F(u)(v - a)\,d\omega + \int_{\Omega_+} F(u)(v - b)\,d\omega \ge 0$,
so that $u$ solves (1.1).

As already mentioned, an important special case, which will provide our main example throughout, is the nonlinear complementarity problem (NCP), which corresponds to $a \equiv 0$ and $b \equiv +\infty$. Obviously, unilateral problems can be converted to an NCP via the transformation $\tilde u = u - a$, $\tilde F(\tilde u) = F(\tilde u + a)$ in the case of lower bounds, and $\tilde u = b - u$, $\tilde F(\tilde u) = -F(b - \tilde u)$ in the case of upper bounds. For NCPs, (1.17) reduces to (1.4).

In finite dimensions, the NCP and, more generally, the box-constrained variational inequality problem (which is also called mixed complementarity problem, MCP) have been extensively investigated, and there exists a significant, rapidly growing body of literature on numerical algorithms for their solution. Hereby, a major role is played by devices that allow one to reformulate the problem equivalently in the form of a system of (nonsmooth) equations. We begin with a description of these concepts in the framework of finite-dimensional MCPs and NCPs.

1.2.1 Finite-Dimensional Variational Inequalities

Although we consider finite-dimensional problems throughout this section 1.2.1, we will work with the same notations as in the function space setting ($a$, $b$, $u$, $F$, etc.), since there is no danger of ambiguity. In analogy to (1.4), the finite-dimensional mixed complementarity problem consists in finding $u \in \mathbb{R}^m$ such that

    $a_i \le u_i \le b_i$, $(u_i - a_i)F_i(u) \le 0$, $(u_i - b_i)F_i(u) \le 0$, $i = 1, \ldots, m$,   (1.18)

where $a, b \in \mathbb{R}^m$ and $F : \mathbb{R}^m \to \mathbb{R}^m$ are given. We begin with an early approach by Eaves [48], who observed (in the more general framework of VIPs on closed convex sets) that (1.18) can be equivalently written in the form

    $u - P_{[a,b]}(u - F(u)) = 0$,   (1.19)

where $P_{[a,b]}(u) = \max\{a, \min\{u, b\}\}$ (componentwise) is the Euclidean projection onto $[a, b] = \prod_{i=1}^m [a_i, b_i]$. Note that if the function $F$ is $C^k$ then the left hand side of (1.19) is piecewise $C^k$ and thus, as we will see, semismooth.

The reformulation (1.19) can be embedded in a more general framework. To this end, we interpret (1.18) as a system of $m$ conditions of the form

    $\alpha \le x_1 \le \beta$, $(x_1 - \alpha)x_2 \le 0$, $(x_1 - \beta)x_2 \le 0$,   (1.20)

which have to be fulfilled by $x = (u_i, F_i(u))$ for $[\alpha, \beta] = [a_i, b_i]$, $i = 1, \ldots, m$. Given any function $\phi_{[\alpha,\beta]} : \mathbb{R}^2 \to \mathbb{R}$ with the property

    $\phi_{[\alpha,\beta]}(x) = 0 \iff$ (1.20) holds,   (1.21)

we can write (1.18) equivalently as

    $\phi_{[a_i,b_i]}(u_i, F_i(u)) = 0$, $i = 1, \ldots, m$.   (1.22)
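As a quick numerical illustration of the equivalence between (1.18) and Eaves' projection equation (1.19), the following sketch evaluates both for a toy MCP whose solution is known in closed form; the data are hypothetical.

```python
import numpy as np

def proj_box(z, a, b):
    # Componentwise Euclidean projection onto the box [a, b]
    return np.maximum(a, np.minimum(z, b))

def eaves_residual(u, F, a, b):
    # Left hand side of the reformulation (1.19)
    return u - proj_box(u - F(u), a, b)

def mcp_violation(u, F, a, b):
    # Violation of the conditions (1.18): box feasibility plus the two sign
    # conditions (u_i - a_i) F_i(u) <= 0 and (u_i - b_i) F_i(u) <= 0
    Fu = F(u)
    feas = np.maximum(a - u, 0.0) + np.maximum(u - b, 0.0)
    sign = np.maximum((u - a) * Fu, 0.0) + np.maximum((u - b) * Fu, 0.0)
    return float(np.max(feas + sign))

# Toy MCP with F(u) = u - c on [0, 1]^n; its solution is u* = P_[0,1](c).
n = 6
c = np.random.default_rng(2).uniform(-1.0, 2.0, n)
F = lambda u: u - c
a, b = np.zeros(n), np.ones(n)
u_star = proj_box(c, a, b)
print(np.linalg.norm(eaves_residual(u_star, F, a, b)),  # 0: (1.19) holds
      mcp_violation(u_star, F, a, b))                   # 0: (1.18) holds
```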
A function with the property (1.21) is called an MCP-function for the interval $[\alpha, \beta]$ (also the name BVIP-function is used, where BVIP stands for box constrained variational inequality problem). The link between (1.19) and (1.22) consists in the fact that the function

    $\phi^E_{[\alpha,\beta]} : \mathbb{R}^2 \to \mathbb{R}$, $\phi^E_{[\alpha,\beta]}(x) = x_1 - P_{[\alpha,\beta]}(x_1 - x_2)$ with $P_{[\alpha,\beta]}(t) = \max\{\alpha, \min\{t, \beta\}\}$   (1.23)

defines an MCP-function for the interval $[\alpha, \beta]$.

The reformulation of NCPs requires only an MCP-function for the interval $[0, \infty)$. As already said, such functions are called NCP-functions. According to (1.21), $\phi : \mathbb{R}^2 \to \mathbb{R}$ is an NCP-function if and only if

    $\phi(x) = 0 \iff x_1 \ge 0$, $x_2 \ge 0$, $x_1 x_2 = 0$.   (1.24)

The corresponding reformulation of the NCP then is

    $\Phi(u) \stackrel{\mathrm{def}}{=} \begin{pmatrix} \phi(u_1, F_1(u)) \\ \vdots \\ \phi(u_m, F_m(u)) \end{pmatrix} = 0$,   (1.25)

and the NCP-function $\phi^E_{[0,\infty)}$ can be written in the form

    $\phi^E(x) = \phi^E_{[0,\infty)}(x) = \min\{x_1, x_2\}$.

A further important reformulation, which is due to Robinson [127], uses the normal map

    $F_{[a,b]}(z) = F(P_{[a,b]}(z)) + z - P_{[a,b]}(z)$.

It is not difficult to see that any solution $z$ of the normal map equation

    $F_{[a,b]}(z) = 0$   (1.26)

gives rise to a solution $u = P_{[a,b]}(z)$ of (1.18), and, conversely, that, for any solution $u$ of (1.18), the vector $z = u - F(u)$ solves (1.26). Therefore, the MCP (1.18) and the normal map equation (1.26) are equivalent. Again, the normal map is piecewise $C^k$ if $F$ is $C^k$. In contrast to the reformulations based on NCP- and MCP-functions, the normal map approach evaluates $F$ only at feasible points, which can be advantageous in certain situations.

Many modern algorithms for finite-dimensional NCPs and MCPs are based on reformulations by means of the Fischer–Burmeister NCP-function

    $\phi^{FB}(x) = x_1 + x_2 - \sqrt{x_1^2 + x_2^2}$,   (1.27)

which was introduced by Fischer [55]. This function is Lipschitz continuous and 1-order semismooth on $\mathbb{R}^2$ (the definition of semismoothness is given below, and, in more detail, in chapter 2). Further, $\phi^{FB}$ is $C^\infty$ on $\mathbb{R}^2 \setminus \{0\}$, and $(\phi^{FB})^2$ is continuously differentiable on $\mathbb{R}^2$. The latter property implies that, if $F$ is continuously
differentiable, the function $\frac{1}{2}\Phi^{FB}(u)^T \Phi^{FB}(u)$ can serve as a continuously differentiable merit function for (1.25). It is also possible to obtain 1-order semismooth MCP-functions from the Fischer–Burmeister function, see [18, 54].

The described reformulations were successfully used as a basis for the development of locally superlinearly convergent Newton-type methods for the solution of (mixed) nonlinear complementarity problems [18, 38, 39, 45, 5, 52, 53, 54, 88, 89, 93, 116, 124, 14]. This is remarkable, since all these reformulations are nonsmooth systems of equations. However, the underlying functions are semismooth, a concept introduced by Mifflin [113] for real-valued functions on $\mathbb{R}^n$, and extended to mappings between finite-dimensional spaces by Qi [12] and Qi and Sun [122]. Hereby — details are given in chapter 2 — a function $f : \mathbb{R}^l \to \mathbb{R}^m$ is called semismooth at $x \in \mathbb{R}^l$ if it is Lipschitz continuous near $x$, directionally differentiable at $x$, and if

    $\sup_{M \in \partial f(x+h)} \|f(x + h) - f(x) - Mh\| = o(\|h\|)$ as $h \to 0$,

where the set-valued function $\partial f : \mathbb{R}^l \rightrightarrows \mathbb{R}^{m \times l}$,

    $\partial f(x) = \mathrm{co}\{M \in \mathbb{R}^{m \times l} : \exists (x_k) : x_k \to x$, $f$ is differentiable at $x_k$, and $f'(x_k) \to M\}$,

denotes Clarke's generalized Jacobian ($\mathrm{co}$ is the convex hull). It can be shown that piecewise $C^1$ functions are semismooth, see section 2.5.3. Further, it is easy to prove that Newton's method (where in Newton's equation the Jacobian is replaced by an arbitrary element of $\partial f$) converges superlinearly in a neighborhood of a CD-regular ("CD" for Clarke-differential) solution $\bar x$, i.e., a solution where all elements of $\partial f(\bar x)$ are invertible. More details on semismoothness in finite dimensions can be found in chapter 2.

It should be mentioned that also continuously differentiable NCP-functions can be constructed. In fact, already in the seventies, Mangasarian [11] proved the equivalence of the NCP to a system of equations, which, in our terminology, he obtained by choosing the NCP-function

    $\phi^M(x) = \theta(|x_2 - x_1|) - \theta(x_2) - \theta(x_1)$,

where $\theta : \mathbb{R} \to \mathbb{R}$ is any strictly increasing function with $\theta(0) = 0$. Maybe the most straightforward choice is $\theta(t) = t$, which gives $\phi^M = -2\phi^E$. If, in addition, $\theta$ is $C^1$ with $\theta'(0) = 0$, then $\phi^M$ is $C^1$. This is, e.g., satisfied by $\theta(t) = t|t|$.

Nevertheless, most modern approaches prefer nondifferentiable, semismooth reformulations. This has a good reason. In fact, consider (1.25) with a differentiable NCP-function $\phi$. Then the Jacobian of $\Phi$ is given by

    $\Phi'(u) = \mathrm{diag}\bigl(\phi_{x_1}(u_i, F_i(u))\bigr) + \mathrm{diag}\bigl(\phi_{x_2}(u_i, F_i(u))\bigr) F'(u)$.

Now, since $\phi(t, 0) = 0 = \phi(0, t)$ for all $t \ge 0$, we see that $\phi'(0, 0) = 0$. Thus, if strict complementarity is violated in the $i$th component, i.e., if $u_i = 0 = F_i(u)$, then the $i$th row of $\Phi'(u)$ is zero, and thus Newton's method is not applicable if strict complementarity is violated at the solution. This can be avoided by using nonsmooth
NCP-functions, because they can be constructed in such a way that every element of the generalized gradient $\partial\phi(x)$ is bounded away from zero at any point $x \in \mathbb{R}^2$. For the Fischer–Burmeister function, e.g., holds $\nabla\phi^{FB}(x) = (1, 1) - x^T/\|x\|_2$ for all $x \ne 0$, and thus $g \ne 0$ for all $g \in \partial\phi^{FB}(x)$ and all $x \in \mathbb{R}^2$.

The development of nonsmooth Newton methods [12, 13, 12, 122, 118], especially the unifying notion of semismoothness [12, 122], has led to considerable research on numerical methods for the solution of finite-dimensional VIPs that are based on semismooth reformulations [18, 38, 39, 5, 52, 53, 54, 88, 89, 93, 116, 14]. These investigations confirm that this approach admits an elegant and general theory (in particular, no strict complementarity assumption is required) and leads to very efficient numerical algorithms [54, 115, 116].

Related approaches

The research on semismoothness-based methods is still in progress. Promising new directions of research are provided by Jacobian smoothing methods and continuation methods [31, 29, 92]. Hereby, a family of functions $(\phi_\mu)_{\mu \ge 0}$ is introduced such that $\phi_0$ is a semismooth NCP- or MCP-function, $\phi_\mu$, $\mu > 0$, is smooth, and $\phi_\mu \to \phi_0$ in a suitable sense as $\mu \to 0$. These functions are used to derive a family of equations $\Phi_\mu(u) = 0$ in analogy to (1.25). In the continuation approach [29], a sequence $(u^k)$ of approximate solutions corresponding to parameter values $\mu = \mu_k$ with $\mu_k \to 0$ is generated such that $u^k$ converges to a solution of the equation $\Phi_0(u) = 0$. Steps are usually obtained by solving the smoothed Newton equation $\Phi_{\mu_k}'(u^k) s_c^k = -\Phi_{\mu_k}(u^k)$, yielding centering steps towards the central path $\{x : \Phi_\mu(x) = 0$ for some $\mu > 0\}$, or by solving the Jacobian smoothing Newton equation $\Phi_{\mu_k}'(u^k) s^k = -\Phi_0(u^k)$, yielding fast steps towards the solution set of $\Phi_0(u) = 0$. The latter steps are also used as trial steps in the recently developed Jacobian smoothing methods [31, 92]. Since the limit operator $\Phi_0$ is semismooth, the analysis of these methods relies heavily on the properties of $\partial\Phi_0$ and the semismoothness of $\Phi_0$.

The smoothing approach is also used in the development of algorithms for mathematical programs with equilibrium constraints (MPECs) [51, 57, 9, 19]. In this difficult class of problems, an objective function $f(u, v)$ has to be minimized under the constraint $u \in S(v)$, where $S(v)$ is the solution set of a VIP that is parameterized by $v$. Under suitable conditions on this inner problem, $S(v)$ can be characterized equivalently by its KKT conditions. These, however, when taken as constraints for the outer problem, violate any standard constraint qualification. Alternatively, the KKT conditions can be rewritten as a system of semismooth equations by means of an NCP-function. This, however, introduces the (mainly numerical) difficulty of nonsmooth constraints, which can be circumvented by replacing the NCP-function with a smoothing NCP-function and considering a sequence of solutions of the smoothed MPEC corresponding to $\mu = \mu_k$, $\mu_k \to 0$.

In conclusion, semismooth Newton methods are at the heart of many modern algorithms in finite-dimensional optimization, and hence should also be investigated
in the framework of optimal control and infinite-dimensional VIPs. This is the goal of the present manuscript.

1.2.2 Infinite-Dimensional Variational Inequalities

A main concern of this work is to extend the concept of semismooth Newton methods to a class of nonsmooth operator equations sufficiently rich to cover appropriate reformulations of the infinite-dimensional VIP (1.1). In a first step we derive analogues of the reformulations of section 1.2.1, but now in the function space setting. We begin with the NCP (1.4). Replacing componentwise operations by pointwise (a.e.) operations, we can apply an NCP-function $\phi$ pointwise to the pair of functions $(u, F(u))$ to define the superposition operator

    $\Phi(u)(\omega) = \phi\bigl(u(\omega), F(u)(\omega)\bigr)$,   (1.28)

which, under appropriate assumptions, defines a mapping $\Phi : L^p(\Omega) \to L^r(\Omega)$, $r \ge 1$, see chapter 3. Obviously, (1.4) is equivalent to the nonsmooth operator equation

    $\Phi(u) = 0$.   (1.29)

In the same way, the more general problem (1.1) can be converted into an equivalent nonsmooth equation. To this end, we use a semismooth NCP-function $\phi$ and a semismooth MCP-function $\phi_{[\alpha,\beta]}$, $-\infty < \alpha < \beta < +\infty$. Now, we define the operator $\Phi : L^p(\Omega) \to L^r(\Omega)$,

    $\Phi(u)(\omega) = \begin{cases} F(u)(\omega), & \omega \in \Omega \setminus (\Omega_a \cup \Omega_b), \\ \phi\bigl(u(\omega) - a(\omega),\ F(u)(\omega)\bigr), & \omega \in \Omega_a \setminus \Omega_b, \\ \phi\bigl(b(\omega) - u(\omega),\ -F(u)(\omega)\bigr), & \omega \in \Omega_b \setminus \Omega_a, \\ \phi_{[a(\omega),b(\omega)]}\bigl(u(\omega), F(u)(\omega)\bigr), & \omega \in \Omega_a \cap \Omega_b. \end{cases}$   (1.30)

Again, $\Phi$ is a superposition operator on each of the four subsets of $\Omega$ distinguished in (1.30). Along the same lines, the normal map approach can be generalized to the function space setting.

We will concentrate on NCP-function based reformulations and their generalizations. Our approach is applicable whenever it is possible to write the problem under consideration as an operator equation in which the underlying operator is obtained by superposition $\Psi = \psi \circ G$ of a Lipschitz continuous and semismooth function $\psi$ and a continuously Fréchet differentiable operator $G$ with reasonable properties, which maps into a direct product of Lebesgue spaces. We will show that the results for finite-dimensional semismooth equations can be extended to superposition operators in function spaces. To this end, we first develop a general semismoothness concept for operators in Banach spaces and then use these results to analyze superlinearly convergent Newton methods for semismooth operator equations. Then we apply this theory to superposition operators in function spaces of the form $\Psi = \psi \circ G$. We work with a set-valued generalized differential $\partial\Psi$ that is motivated by Qi's
finite-dimensional C-subdifferential. The semismoothness result we establish is an estimate of the form

    $\sup_{M \in \partial\Psi(y+s)} \|\Psi(y + s) - \Psi(y) - Ms\|_{L^r} = o(\|s\|_Y)$ as $\|s\|_Y \to 0$.

We also prove semismoothness of order $\alpha > 0$, which means that the above estimate holds with $o(\|s\|_Y)$ replaced by $O(\|s\|_Y^{1+\alpha})$. This semismoothness result enables us to apply the class of semismooth Newton methods that we analyzed in the abstract setting. If applied to nonsmooth reformulations of variational inequality problems, these methods can be regarded as infinite-dimensional analogues of finite-dimensional semismooth Newton methods for this class of problems. As a consequence, we can adjust to the function space setting many of the ideas that were developed for finite-dimensional VIPs in recent years.

1.3 Organization

We now give an overview of the organization of this work.

In chapter 2 we recall important results of finite-dimensional nonsmooth analysis. Several generalized differentials known from the literature (Clarke's generalized Jacobian, the B-differential, and Qi's C-subdifferential) and their properties are considered. Furthermore, finite-dimensional semismoothness is discussed and semismooth Newton methods are introduced. Finally, we give important examples of semismooth functions, e.g., piecewise smooth functions, and discuss finite-dimensional generalizations of the semismoothness concept.

In the first part of chapter 3 we establish semismoothness results for operator equations in Banach spaces. The definition is based on a set-valued generalized differential and requires an approximation condition to hold. Furthermore, semismoothness of higher order is introduced. It is shown that continuously differentiable operators are semismooth with respect to their Fréchet derivative, and that the sum, composition, and direct product of semismooth operators is again semismooth. The semismoothness concept is used to develop a Newton method for semismooth operator equations that is superlinearly convergent (with q-order $1 + \alpha$ in the case of $\alpha$-order semismoothness). Several variants of this method are considered, including an inexact version that allows one to work with approximate generalized differentials in the Newton system, and a version that includes a projection in order to stay feasible with respect to a given closed convex set containing the solution.

In the second part of chapter 3 this abstract semismoothness concept is applied to the concrete situation of operators obtained by superposition of a Lipschitz continuous semismooth function and a smooth operator mapping into a product of Lebesgue spaces. This class of operators is of significant practical importance as it contains reformulations of variational inequalities by means of semismooth NCP-, MCP-, and related functions. We first develop a suitable generalized differential that has a simple structure and is closely related to the finite-dimensional C-subdifferential. Then
we show that the considered superposition operators are semismooth with respect to this differential. We also develop results to establish semismoothness of higher order. The theory is illustrated by applications to the NCP. The established semismoothness of superposition operators enables us, via nonsmooth reformulations, to develop superlinearly convergent Newton methods for the solution of the NCP (1.4), and, as we show in chapter 5, for the solution of the VIP (1.1) and even more general problems. Finally, further properties of the generalized differential are considered.

In chapter 4 we investigate two ingredients that are needed in the analysis of chapter 3. In chapter 3 it becomes apparent that in general a smoothing step is required to close a gap between two different $L^p$-norms. This necessity was already observed in similar contexts [95, 143]. In section 4.1 we describe a way in which smoothing steps can be constructed, which is based on an idea by Kelley and Sachs [95]. Furthermore, in section 4.2 we investigate a particular choice of the MCP-function that leads to reformulations for which no smoothing step is required. The analysis of semismooth Newton methods in chapter 3 relies on a regularity condition that ensures the uniform invertibility (between appropriate spaces) of the generalized differentials in a neighborhood of the solution. In section 4.3 we develop sufficient conditions for this regularity assumption.

In chapter 5 we show how the developed concepts can be applied to solve more general problems than NCPs. In particular, we propose semismooth reformulations for bound-constrained VIPs and, more generally, for VIPs with pointwise convex constraints. These reformulations allow us to apply semismooth Newton methods for their solution. Furthermore, we discuss how semismooth Newton methods can be applied to solve mixed problems, i.e., systems of VIPs and smooth operator equations. Hereby, we concentrate on mixed problems arising as the Karush–Kuhn–Tucker (KKT) conditions of constrained optimization problems with optimal control structure. A close relationship between reformulations based on the black-box approach, in which the reduced problem is considered, and reformulations based on the all-at-once approach, where the full KKT-system is considered, is established. We observe that the generalized differentials of the black-box reformulation appear as Schur complements in the generalized differentials of the all-at-once reformulation. This can be used to relate regularity conditions of both approaches. We also describe how smoothing steps can be computed.

In chapter 6 we describe a way to make the developed class of semismooth Newton methods globally convergent by embedding them in a trust-region method. To this end, we propose three variants of minimization problems such that solutions of the semismooth operator equation are critical points of the minimization problem. Then we develop and analyze a class of nonmonotone trust-region methods for the resulting optimization problems in a general Hilbert space setting. The trial steps have to fulfill a model decrease condition, which, as we show, can be implemented by means of a generalized fraction of Cauchy decrease condition. For this algorithm global convergence results are established. Further, it is shown how semismooth Newton steps can be used to compute trial steps, and it is proved that, under
appropriate conditions, eventually always Newton steps are taken. Therefore, the rate of local convergence to regular solutions is at least q-superlinear.

In chapter 7 the developed algorithms are applied to concrete problems. Section 7.1 discusses in detail the applicability of semismooth Newton methods to a nonlinear elliptic control problem with bounds on the control. Furthermore, a finite element discretization is discussed and it is shown that the application of finite-dimensional semismooth Newton methods to the discretized problem can be viewed as a discretization of the infinite-dimensional semismooth Newton method. Furthermore, it is discussed how multigrid methods can be used to solve the semismooth Newton system efficiently. The efficiency of the method is documented by various numerical tests. Hereby, both the black-box and the all-at-once approach are tested. Furthermore, a nested iteration is proposed that first solves the problem approximately on a coarse grid to obtain a good initial point on the next finer grid, and proceeds in this way until the finest grid is reached. As a second application we investigate the obstacle problem of section 1.1.2 in detail. An equivalent dual problem is derived, which is augmented by a regularization term to make it coercive. An error estimate for the regularized solution is established in terms of the regularization parameter. We then show that our class of semismooth Newton methods is applicable to the regularized dual problem. Numerical results for a finite element discretization are presented. In the implementation we again use multigrid methods to solve the semismooth Newton system.

In chapter 8 we show that our class of semismooth Newton methods can be applied to solve control-constrained distributed optimal control problems governed by the incompressible Navier–Stokes equations. To this end, differentiability and local Lipschitz continuity properties of the control-to-state mapping are investigated. Furthermore, results for the adjoint equation are established that allow us to prove a smoothing property of the reduced gradient mapping. These results show that semismooth Newton methods can be applied to the flow control problem and that these methods converge superlinearly in a neighborhood of regular critical points.

In chapter 9 we present applications of our method to the boundary control of the time-dependent compressible Navier–Stokes equations. Hereby, we control the normal velocity of the fluid on part of the boundary (suction and blowing), subject to pointwise lower and upper bounds. As control objective, the terminal kinetic energy is minimized. In the algorithm, the Hessian is approximated by BFGS matrices. This problem is very large scale, with over 750,000 unknown controls and over 29,000,000 state variables. The numerical results show that our approach is viable and efficient also for very large scale, state-of-the-art control problems.

The appendix contains some useful supplementary material. In appendix A.1 we describe the adjoint-based gradient and Hessian representation for the reduced objective function of optimal control problems. Appendix A.2 collects several frequently used inequalities. In appendix A.3 we state elementary properties of multifunctions. Finally, in appendix A.4, the differentiability properties of Nemytskij operators are considered.
2. Elements of Finite-Dimensional Nonsmooth Analysis

In this chapter we collect several results of finite-dimensional nonsmooth analysis that are required for our investigations. In particular, finite-dimensional semismoothness and semismooth Newton methods are considered. The concepts introduced in this section will serve as a motivation and guideline for the developments in subsequent sections. All generalized differentials considered here are set-valued functions (or multifunctions). Basic properties of multifunctions, like upper semicontinuity, can be found in appendix A.3.

Throughout, we denote by $\|\cdot\|$ arbitrary, but fixed norms on the respective $\mathbb{R}^n$-spaces as well as the induced matrix norms. The open unit ball $\{x \in \mathbb{R}^n : \|x\| < 1\}$ is denoted by $B_n$.

2.1 Generalized Differentials

On the nonempty open set $V \subset \mathbb{R}^n$, we consider the function $f : V \to \mathbb{R}^m$ and denote by $D_f \subset V$ the set of all $x \in V$ at which $f$ admits a (Fréchet-) derivative $f'(x) \in \mathbb{R}^{m \times n}$. Now suppose that $f$ is Lipschitz continuous near $x \in V$, i.e., that there exists an open neighborhood $V(x) \subset V$ of $x$ on which $f$ is Lipschitz continuous. Then, according to Rademacher's theorem [149], $V(x) \setminus D_f$ has Lebesgue measure zero. Hence, the following constructions make sense.

Definition 2.1. [32, 118, 122] Let $V \subset \mathbb{R}^n$ be open and $f : V \to \mathbb{R}^m$ be Lipschitz continuous near $x \in V$. The set

    $\partial_B f(x) \stackrel{\mathrm{def}}{=} \{M \in \mathbb{R}^{m \times n} : \exists (x_k) \subset D_f : x_k \to x,\ f'(x_k) \to M\}$

is called the B-subdifferential ("B" for Bouligand) of $f$ at $x$. Moreover,

    $\partial f(x) \stackrel{\mathrm{def}}{=} \mathrm{co}(\partial_B f(x))$

is Clarke's generalized Jacobian of $f$ at $x$, and

    $\partial_C f(x) \stackrel{\mathrm{def}}{=} \partial f_1(x) \times \cdots \times \partial f_m(x)$

denotes Qi's C-subdifferential.
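As a small numerical illustration of Definition 2.1 (with a toy function chosen here for convenience), one can approximate elements of $\partial_B f(0)$ for $f(x) = \|x\|_2$ on $\mathbb{R}^2$ by evaluating $f'$ at differentiable points approaching $0$: every such limit has unit norm, anticipating the exact computation of this B-subdifferential in section 2.5.1.

```python
import numpy as np

def grad(x):
    # f(x) = ||x||_2 is differentiable at every x != 0 with f'(x) = x^T/||x||_2
    return x / np.linalg.norm(x)

# Gradients f'(x_k) at points x_k -> 0 along random directions: their limits
# sweep out the unit sphere, i.e., the B-subdifferential of f at 0; Clarke's
# generalized Jacobian is the convex hull, here the closed unit ball.
rng = np.random.default_rng(3)
for k in range(5):
    d = rng.standard_normal(2)
    x_k = 10.0 ** (-k) * d / np.linalg.norm(d)
    print(grad(x_k), np.linalg.norm(grad(x_k)))  # always unit norm
```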
The differentials $\partial_B f$, $\partial f$, and $\partial_C f$ have the following properties.

Proposition 2.2. Let $V \subset \mathbb{R}^n$ be open and $f : V \to \mathbb{R}^m$ be locally Lipschitz continuous. Then for $x \in V$ there holds:
(a) $\partial_B f(x)$ is nonempty and compact.
(b) $\partial f(x)$ and $\partial_C f(x)$ are nonempty, compact, and convex.
(c) The set-valued mappings $\partial_B f$, $\partial f$, and $\partial_C f$, respectively, are locally bounded and upper semicontinuous.
(d) $\partial_B f(x) \subset \partial f(x) \subset \partial_C f(x)$.
(e) If $f$ is continuously differentiable in a neighborhood of $x$ then $\partial_C f(x) = \partial f(x) = \partial_B f(x) = \{f'(x)\}$.

Proof. The results for $\partial_B f(x)$ and $\partial f(x)$ as well as (d) are established in [32, Prop. 2.6.2]. Part (e) immediately follows from the definition of the respective differentials. The remaining assertions on $\partial_C f$ are immediate consequences of the properties of the $\partial f_i(x)$.

The following chain rule holds:

Proposition 2.3. [32] Let $V \subset \mathbb{R}^n$ and $W \subset \mathbb{R}^l$ be nonempty open sets, let $g : V \to W$ be Lipschitz continuous near $x \in V$, and $h : W \to \mathbb{R}^m$ be Lipschitz continuous near $g(x)$. Then $f = h \circ g$ is Lipschitz continuous near $x$ and, for all $v \in \mathbb{R}^n$, it holds that

    $\partial f(x)v \subset \mathrm{co}\bigl(\partial h(g(x))\, \partial g(x)v\bigr) = \mathrm{co}\{M_h M_g v : M_h \in \partial h(g(x)),\ M_g \in \partial g(x)\}$.

If, in addition, $h$ is continuously differentiable near $g(x)$, then, for all $v \in \mathbb{R}^n$,

    $\partial f(x)v = h'(g(x))\, \partial g(x)v$.

If $f$ is real-valued (i.e., if $m = 1$), then in both chain rules the vector $v$ can be omitted.

In particular, choosing $h(y) = e_i^T y = y_i$ and $g = f$, where $e_i$ is the $i$th unit vector, we see that:

Corollary 2.4. Let $V \subset \mathbb{R}^n$ be open and $f : V \to \mathbb{R}^m$ be Lipschitz continuous near $x \in V$. Then

    $\partial f_i(x) = e_i^T \partial f(x) = \{M_i : M_i$ is the $i$th row of some $M \in \partial f(x)\}$.

2.2 Semismoothness

The notion of semismoothness was introduced by Mifflin [113] for real-valued functions defined on finite-dimensional spaces, and extended to mappings between finite-dimensional spaces by Qi [12] and Qi and Sun [122]. The importance of semismooth equations results from the fact that, although the underlying mapping is in general nonsmooth, Newton's method is still applicable and converges locally with q-superlinear rate to a regular solution.
Definition 2.5. [113, 118, 122] Let $V \subset \mathbb{R}^n$ be nonempty and open. The function $f : V \to \mathbb{R}^m$ is semismooth at $x \in V$ if it is Lipschitz continuous near $x$ and if the following limit exists for all $s \in \mathbb{R}^n$:

    $\lim_{\substack{M \in \partial f(x + \tau d) \\ d \to s,\ \tau \to 0^+}} Md$.

If $f$ is semismooth at all $x \in V$, we call $f$ semismooth (on $V$).

Note that we include the local Lipschitz condition in the definition of semismoothness. Hence, if $f$ is semismooth at $x$, it is also Lipschitz continuous near $x$. Semismoothness admits different, yet equivalent, characterizations. To formulate them, we first recall directional and Bouligand- (or B-) differentiability.

Definition 2.6. Let the function $f : V \to \mathbb{R}^m$ be defined on the open set $V$.
(a) $f$ is directionally differentiable at $x \in V$ if the directional derivative

    $f'(x, s) \stackrel{\mathrm{def}}{=} \lim_{\tau \to 0^+} \frac{f(x + \tau s) - f(x)}{\tau}$

exists for all $s \in \mathbb{R}^n$.
(b) $f$ is B-differentiable at $x \in V$ if $f$ is directionally differentiable at $x$ and

    $\|f(x + s) - f(x) - f'(x, s)\| = o(\|s\|)$ as $s \to 0$.

(c) $f$ is $\alpha$-order B-differentiable at $x \in V$, $0 < \alpha \le 1$, if $f$ is directionally differentiable at $x$ and

    $\|f(x + s) - f(x) - f'(x, s)\| = O(\|s\|^{1+\alpha})$ as $s \to 0$.

Note that $f'(x, \cdot)$ is positive homogeneous. Furthermore, it is known that directional differentiability and B-differentiability are equivalent for locally Lipschitz continuous mappings between finite-dimensional spaces [133]. The following proposition gives alternative definitions of semismoothness.

Proposition 2.7. Let $f : V \to \mathbb{R}^m$ be defined on the open set $V \subset \mathbb{R}^n$. Then for $x \in V$ the following statements are equivalent:
(a) $f$ is semismooth at $x$.
(b) $f$ is Lipschitz continuous near $x$, $f'(x, \cdot)$ exists, and

    $\sup_{M \in \partial f(x+s)} \|Ms - f'(x, s)\| = o(\|s\|)$ as $s \to 0$.

(c) $f$ is Lipschitz continuous near $x$, $f'(x, \cdot)$ exists, and

    $\sup_{M \in \partial f(x+s)} \|f(x + s) - f(x) - Ms\| = o(\|s\|)$ as $s \to 0$.   (2.1)
Proof. Concerning the equivalence of (a) and (b), see [122, Thm. 2.3]. If $f$ is Lipschitz continuous near $x$ and directionally differentiable at $x$, then, as noted above, $f$ is also B-differentiable at $x$. Hence, it is now easily seen that (b) and (c) are equivalent, since for all $M \in \partial f(x+s)$

    $\bigl|\, \|f(x + s) - f(x) - Ms\| - \|Ms - f'(x, s)\| \,\bigr| \le \|f(x + s) - f(x) - f'(x, s)\| = o(\|s\|)$ as $s \to 0$.

The version (c) is especially well suited for the analysis of Newton-type methods. To give a first example of semismooth functions, we note the following immediate consequence of Proposition 2.7:

Proposition 2.8. Let $V \subset \mathbb{R}^n$ be open. If $f : V \to \mathbb{R}^m$ is continuously differentiable in a neighborhood of $x \in V$ then $f$ is semismooth at $x$ and $\partial f(x) = \partial_B f(x) = \{f'(x)\}$.

Further, the class of semismooth functions is closed under composition:

Proposition 2.9. [56, Lem. 18] Let $V \subset \mathbb{R}^n$ and $W \subset \mathbb{R}^l$ be open sets. Let $g : V \to \mathbb{R}^l$ be semismooth at $x \in V$ with $g(V) \subset W$, and let $h : W \to \mathbb{R}^m$ be semismooth at $g(x)$. Then the composite map $f \stackrel{\mathrm{def}}{=} h \circ g : V \to \mathbb{R}^m$ is semismooth at $x$. Moreover, $f'(x, \cdot) = h'(g(x), g'(x, \cdot))$.

It is natural to ask if $f$ is semismooth if its component functions are semismooth, and vice versa. This is in fact true:

Proposition 2.10. The function $f : V \to \mathbb{R}^m$, $V \subset \mathbb{R}^n$ open, is semismooth at $x \in V$ if and only if its component functions are semismooth at $x$.

Proof. We use the characterization of semismoothness given in Proposition 2.7. If $f$ is semismooth at $x$ then the functions $f_i$ are Lipschitz continuous near $x$ and directionally differentiable at $x$. Furthermore, by Corollary 2.4,

    $\sup_{v \in \partial f_i(x+s)} |f_i(x + s) - f_i(x) - vs| = \sup_{M \in \partial f(x+s)} |e_i^T (f(x + s) - f(x) - Ms)| = o(\|s\|)$ as $s \to 0$,

which proves the semismoothness of $f_i$ at $x$. The reverse direction is an immediate consequence of the inclusion $\partial f(x) \subset \partial_C f(x)$.

2.3 Semismooth Newton's Method

We now analyze the following Newton-like method for the solution of the equation

    $f(x) = 0$,   (2.2)

where $f : V \to \mathbb{R}^n$, $V \subset \mathbb{R}^n$ open, is semismooth at the solution $\bar x \in V$:
Algorithm 2.11 (Semismooth Newton's Method).
0. Choose an initial point $x^0$ and set $k = 0$.
1. If $f(x^k) = 0$, then STOP.
2. Choose $M_k \in \partial f(x^k)$ and compute $s^k$ from

    $M_k s^k = -f(x^k)$.

3. Set $x^{k+1} = x^k + s^k$, increment $k$ by one, and go to step 1.

Under a regularity assumption on the matrices $M_k$, this iteration converges locally q-superlinearly:

Proposition 2.12. Let $f : V \to \mathbb{R}^n$ be defined on the open set $V \subset \mathbb{R}^n$ and denote by $\bar x \in \mathbb{R}^n$ a solution of (2.2). Assume that:
(a) Estimate (2.1) holds at $x = \bar x$ (which, in particular, is satisfied if $f$ is semismooth at $\bar x$).
(b) One of the following conditions holds:
  (i) There exists a constant $C > 0$ such that, for all $k$, the matrices $M_k$ are nonsingular with $\|M_k^{-1}\| \le C$.
  (ii) There exist constants $\eta > 0$ and $C > 0$ such that, for all $x \in \bar x + \eta B_n$, every $M \in \partial f(x)$ is nonsingular with $\|M^{-1}\| \le C$.
  (iii) The solution $\bar x$ is CD-regular ("CD" for Clarke-differential), i.e., every $M \in \partial f(\bar x)$ is nonsingular.
Then there exists $\delta > 0$ such that, for all $x^0 \in \bar x + \delta B_n$, (i) holds and Algorithm 2.11 either terminates with $x^k = \bar x$ or generates a sequence $(x^k)$ that converges q-superlinearly to $\bar x$.

Various results of this type can be found in the literature [12, 13, 118, 12, 122]. In particular, Kummer [13] develops a general abstract framework of essentially two requirements (CA) and (CI), under which Newton's method is well-defined and converges superlinearly. The condition (2.1) is a special case of the approximation condition (CA), whereas (CI) is a uniform injectivity condition, which, in our context, corresponds to assumption (b) (ii). Since the proof of Proposition 2.12 is not difficult and quite helpful in getting familiar with the notion of semismoothness, we include it here.

Proof. First, we prove (iii) $\Rightarrow$ (ii). Assume that (ii) does not hold. Then there exist sequences $x^i \to \bar x$ and $M^i \in \partial f(x^i)$ such that, for any $i$, either $M^i$ is singular or $\|(M^i)^{-1}\| \ge i$. Since $\partial f$ is upper semicontinuous and compact-valued, we can select a subsequence such that $M^i \to M \in \partial f(\bar x)$. Due to the properties of the matrices $M^i$, $M$ cannot be invertible, and thus (iii) does not hold.

Further, observe that (ii) implies (i) whenever $x^k \in \bar x + \eta B_n$ for all $k$. Therefore, if one of the conditions in (b) holds, we have (i) at hand as long as $x^k \in \bar x + \delta B_n$ and $\delta > 0$ is sufficiently small. Denoting the error by $v^k = x^k - \bar x$ and using $M_k s^k = -f(x^k)$, $f(\bar x) = 0$, we obtain for such $x^k$
    $M_k v^{k+1} = M_k (s^k + v^k) = -f(x^k) + M_k v^k = -[f(\bar x + v^k) - f(\bar x) - M_k v^k]$.   (2.3)

Invoking (2.1) yields

    $\|M_k v^{k+1}\| = o(\|v^k\|)$ as $v^k \to 0$.   (2.4)

Hence, for sufficiently small $\delta > 0$, we have

    $\|M_k v^{k+1}\| \le \frac{1}{2C}\|v^k\|$,

and thus by (i)

    $\|v^{k+1}\| \le \|M_k^{-1}\|\,\|M_k v^{k+1}\| \le \frac{1}{2}\|v^k\|$.

This shows $x^{k+1} \in \bar x + (\delta/2) B_n$ and inductively $x^k \to \bar x$ (in the nontrivial case $x^k \ne \bar x$ for all $k$). Now we conclude from (2.4) that the rate of convergence is q-superlinear.

2.4 Higher Order Semismoothness

The rate of convergence of the semismooth Newton method can be improved if instead of (2.1) an estimate of higher order is available. This leads to the following definition of higher order semismoothness, which can be interpreted as a semismooth relaxation of Hölder-continuous differentiability.

Definition 2.13. [122] Let the function $f : V \to \mathbb{R}^m$ be defined on the open set $V \subset \mathbb{R}^n$. Then, for $0 < \alpha \le 1$, $f$ is called $\alpha$-order semismooth at $x \in V$ if $f$ is Lipschitz continuous near $x$, $f'(x, \cdot)$ exists, and

    $\sup_{M \in \partial f(x+s)} \|Ms - f'(x, s)\| = O(\|s\|^{1+\alpha})$ as $s \to 0$.

If $f$ is $\alpha$-order semismooth at all $x \in V$, we call $f$ $\alpha$-order semismooth (on $V$).

For $\alpha$-order semismooth functions, a counterpart of Proposition 2.7 can be established.

Proposition 2.14. Let $f : V \to \mathbb{R}^m$ be defined on the open set $V \subset \mathbb{R}^n$. Then for $x \in V$ and $0 < \alpha \le 1$ the following statements are equivalent:
(a) $f$ is $\alpha$-order semismooth at $x$.
(b) $f$ is Lipschitz continuous near $x$, $\alpha$-order B-differentiable at $x$, and

    $\sup_{M \in \partial f(x+s)} \|f(x + s) - f(x) - Ms\| = O(\|s\|^{1+\alpha})$ as $s \to 0$.   (2.5)

Proof. According to results in [122], $\alpha$-order semismoothness at $x$ implies $\alpha$-order B-differentiability at $x$. Now we can proceed as in the proof of Proposition 2.7.
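Anticipating Proposition 2.18 below, the following one-dimensional experiment (a hypothetical toy problem) runs Algorithm 2.11 on $f(x) = \phi^{FB}(x, e^x - 1)$. The solution $\bar x = 0$ violates strict complementarity ($\bar x = 0 = e^{\bar x} - 1$), yet $f$ is 1-order semismooth and CD-regular at $\bar x$, and the observed errors drop with q-rate $2$; the selection at the kink is an illustrative choice.

```python
import numpy as np

def phi_fb(a, b):
    # Fischer-Burmeister NCP-function (1.27)
    return a + b - np.hypot(a, b)

def f(x):
    return phi_fb(x, np.exp(x) - 1.0)

def generalized_derivative(x):
    # One element M = d1 + d2 * F'(x) of the generalized derivative of f,
    # with (d1, d2) in Clarke's gradient of phi_fb and F(x) = exp(x) - 1.
    a, b = x, np.exp(x) - 1.0
    r = np.hypot(a, b)
    if r == 0.0:  # kink at the solution: pick the element for y = (1,1)/sqrt(2)
        d1 = d2 = 1.0 - 1.0 / np.sqrt(2.0)
    else:
        d1, d2 = 1.0 - a / r, 1.0 - b / r
    return d1 + d2 * np.exp(x)

x = 0.5
for k in range(6):
    x = x - f(x) / generalized_derivative(x)
    print(k, abs(x))  # errors decrease roughly quadratically (q-rate 1 + alpha = 2)
```

CD-regularity is easy to verify here: every element of $\partial f(0)$ has the form $d_1 + d_2$ with $(d_1, d_2) = (1,1) - y$, $\|y\|_2 \le 1$, so it lies in $[2 - \sqrt 2,\ 2 + \sqrt 2]$ and is bounded away from zero.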
Of course, α-Hölder continuously differentiable functions are α-order semismooth. More precisely, we have:
Proposition 2.15. Let V ⊂ R^n be open. If f : V → R^m is differentiable in a neighborhood of x ∈ V with α-Hölder continuous derivative, 0 < α ≤ 1, then f is α-order semismooth at x and ∂f(x) = ∂_B f(x) = {f′(x)}.
The class of α-order semismooth functions is closed under composition:
Proposition 2.16. [56, Thm. 21] Let V ⊂ R^n and W ⊂ R^l be open sets and 0 < α ≤ 1. Let g : V → W be α-order semismooth at x ∈ V and h : W → R^m be α-order semismooth at g(x), with g(V) ⊂ W. Then the composite map f := h ∘ g : V → R^m is α-order semismooth at x. Moreover, f′(x, ·) = h′(g(x), g′(x, ·)).
Further, we obtain by a straightforward modification of the proof of Proposition 2.10:
Proposition 2.17. Let V ⊂ R^n be open. The function f : V → R^m is α-order semismooth at x ∈ V, 0 < α ≤ 1, if and only if its component functions are α-order semismooth at x.
Concerning the rate of convergence of Algorithm 2.11, the following holds:
Proposition 2.18. Let the assumptions in Proposition 2.12 hold, but assume that instead of (2.1) the stronger condition (2.5), with 0 < α ≤ 1, holds at the solution x̄. Then there exists δ > 0 such that, for all x^0 ∈ x̄ + δB^n, Algorithm 2.11 either terminates with x^k = x̄ or generates a sequence (x^k) that converges to x̄ with q-rate 1 + α.
Proof. In light of Proposition 2.12, we only have to establish the improved rate of convergence. But from v^k → 0, (2.3), and (2.5) follows immediately ‖v^{k+1}‖ = O(‖v^k‖^{1+α}). □
2.5 Examples of Semismooth Functions
2.5.1 The Euclidean Norm
The Euclidean norm e : x ∈ R^n ↦ ‖x‖_2 = (x^T x)^{1/2} is an important example of a 1-order semismooth function that arises, e.g., as the nonsmooth part of the Fischer–Burmeister function. Obviously, e is Lipschitz continuous on R^n, and C^∞ on R^n \ {0} with
e′(x) = x^T / ‖x‖_2.
Therefore,
∂e(x) = ∂_B e(x) = {x^T / ‖x‖_2} for x ≠ 0,
∂_B e(0) = {v^T : v ∈ R^n, ‖v‖_2 = 1}, and ∂e(0) = {v^T : v ∈ R^n, ‖v‖_2 ≤ 1}.
By Proposition 2.15, e is 1-order semismooth on R^n \ {0}, since it is smooth there. On the other hand, for all s ∈ R^n \ {0} and v ∈ ∂e(s) holds v = s^T / ‖s‖_2 and
e(s) − e(0) − vs = ‖s‖_2 − 0 − ‖s‖_2 = 0.
Hence, e is also 1-order semismooth at 0.
2.5.2 The Fischer–Burmeister Function
The Fischer–Burmeister function was already defined in (1.27):
φ_FB : R^2 → R, φ_FB(x) = x_1 + x_2 − (x_1^2 + x_2^2)^{1/2}.
φ = φ_FB is the difference of the linear function f(x) = x_1 + x_2 and the 1-order semismooth and Lipschitz continuous function ‖x‖_2, see section 2.5.1. Therefore, φ is Lipschitz continuous and 1-order semismooth by Propositions 2.15 and 2.16. Further, from the definition of ∂_B φ and ∂φ, it is immediately clear that
∂_B φ(x) = f′(x) − ∂_B ‖x‖_2, ∂φ(x) = f′(x) − ∂‖x‖_2.
Hence, for x ≠ 0,
∂φ(x) = ∂_B φ(x) = {(1, 1) − x^T / ‖x‖_2},
and
∂_B φ(0) = {(1, 1) − y^T : ‖y‖_2 = 1}, ∂φ(0) = {(1, 1) − y^T : ‖y‖_2 ≤ 1}.
From this one can see that for all x ∈ R^2 and all v ∈ ∂φ_FB(x) holds
0 ≤ v_1, v_2 ≤ 2, 2 − √2 ≤ v_1 + v_2 ≤ 2 + √2,
showing that all generalized gradients are bounded above (a consequence of the global Lipschitz continuity) and are bounded away from zero.
2.5.3 Piecewise Differentiable Functions
Piecewise continuously differentiable functions are an important subclass of semismooth functions. We refer to Scholtes [132] for a thorough treatment of the topic, where the results of this section can be found. For the reader's convenience, we include selected proofs.
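Before turning to piecewise differentiable functions, the bounds just derived for ∂φ_FB can be checked numerically. The following snippet is a sanity check of ours, not part of the original text; it samples random points and verifies 0 ≤ v_1, v_2 ≤ 2 and 2 − √2 ≤ v_1 + v_2 ≤ 2 + √2.

    import numpy as np

    def phi_fb(x):
        # Fischer-Burmeister function: x1 + x2 - ||x||_2
        return x[0] + x[1] - np.hypot(x[0], x[1])

    def grad_fb(x):
        # An element of del phi_FB(x): (1,1) - x^T/||x||_2 for x != 0;
        # at x = 0 any (1,1) - y^T with ||y||_2 <= 1 works (we take y = 0).
        n = np.hypot(x[0], x[1])
        return np.array([1.0, 1.0]) - (x / n if n > 0 else np.zeros(2))

    rng = np.random.default_rng(0)
    for _ in range(10000):
        v = grad_fb(rng.normal(size=2))
        assert -1e-12 <= v.min() and v.max() <= 2 + 1e-12
        assert 2 - np.sqrt(2) - 1e-12 <= v.sum() <= 2 + np.sqrt(2) + 1e-12

In particular, the lower bound 2 − √2 > 0 on v_1 + v_2 reflects that the generalized gradients stay away from zero, which is what makes φ_FB attractive for Newton-type methods.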
Definition 2.19. [132] A function f : V → R^m defined on the open set V ⊂ R^n is called a PC^k-function ("P" for piecewise), 1 ≤ k ≤ ∞, if f is continuous and if at every point x̄ ∈ V there exist a neighborhood W ⊂ V of x̄ and a finite collection of C^k-functions f^i : W → R^m, i = 1, ..., N, such that
f(x) ∈ {f^1(x), ..., f^N(x)} for all x ∈ W.
We say that f is a continuous selection of {f^1, ..., f^N} on W. The set
I(x) = {i : f(x) = f^i(x)}
is the active index set at x ∈ W and
I_e(x) = {i ∈ I(x) : x ∈ cl(int{y ∈ W : f(y) = f^i(y)})}
is the essentially active index set at x.
The following is obvious.
Proposition 2.20. The class of PC^k-functions is closed under composition, finite summation, and multiplication (in case the respective operations make sense).
Example 2.21. The functions t ∈ R ↦ |t|, x ∈ R^2 ↦ max{x_1, x_2}, and x ∈ R^2 ↦ min{x_1, x_2} are PC^∞-functions. As a consequence, the projection onto the interval [α, β],
P_{[α,β]}(t) = max{α, min{t, β}},
is PC^∞, and thus also the MCP-function φ^E_{[α,β]} defined in (1.23).
Proposition 2.22. Let the PC^k-function f : V → R^m be a continuous selection of the C^k-functions {f^1, ..., f^N} on the open set V ⊂ R^n. Then, for x ∈ V, there exists a neighborhood W of x on which f is also a continuous selection of {f^i : i ∈ I_e(x)}.
Proof. Assume the contrary. Then the open sets
V_r = {y ∈ V : ‖y − x‖ < 1/r, f(y) ≠ f^i(y) for all i ∈ I_e(x)}
are nonempty for all r ∈ N. Let {i_1, ..., i_q} enumerate the set {1, ..., N} \ I_e(x). Set V_r^0 = V_r, and, for l = 1, ..., q, generate the open sets
V_r^l = V_r^{l−1} ∩ {y ∈ V : f(y) ≠ f^{i_l}(y)}.
Since for all y ∈ V there exists i ∈ I_e(x) ∪ {i_1, ..., i_q} with f(y) = f^i(y), we see that V_r^q = ∅. Hence, there exists a maximal l_r with V_r^{l_r} ≠ ∅. With j_r = i_{l_r + 1} we have
V_r^{l_r} ⊂ {y ∈ V : f(y) = f^{j_r}(y)}.
We can select a constant subsequence (j_r)_{r ∈ K}, i.e., j_r = j ∉ I_e(x) for all r ∈ K. Now
∪_{r ∈ K} V_r^{l_r} ⊂ {y ∈ V : f(y) = f^j(y)},
the set on the left being open and having x as an accumulation point. Therefore, j ∈ I_e(x), which is a contradiction. □
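The interval projection from Example 2.21 makes the selection structure of Definition 2.19 concrete: on a neighborhood of any t, P_{[α,β]} is a continuous selection of the three C^∞-functions t ↦ α, t ↦ t, t ↦ β. The following sketch (an illustration of ours, assuming α < β) lists the derivatives of the essentially active selections; that this set is exactly ∂_B P_{[α,β]}(t) is the content of Proposition 2.25 below.

    def proj(t, alpha, beta):
        """P_[alpha,beta](t) = max(alpha, min(t, beta)), a PC^infinity function."""
        return max(alpha, min(t, beta))

    def proj_bdiff(t, alpha, beta):
        """Derivatives (f^i)'(t) of the essentially active selections among
        f^1 = alpha, f^2 = id, f^3 = beta (assumes alpha < beta)."""
        if alpha < t < beta:
            return {1.0}               # only the identity is active
        if t < alpha or t > beta:
            return {0.0}               # only a constant selection is active
        return {0.0, 1.0}              # kink: two selections essentially active

    print(proj_bdiff(0.0, 0.0, 1.0))   # {0.0, 1.0} at the lower kink t = alpha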
Proposition 2.23. [132] Every PC^1-function f : V → R^m, V ⊂ R^n open, is locally Lipschitz continuous.
Proposition 2.24. Let the PC^1-function f : V → R^m, V ⊂ R^n open, be a continuous selection of the C^1-functions {f^1, ..., f^N} in a neighborhood W of x ∈ V. Then f is B-differentiable at x and, for all y ∈ R^n,
f′(x, y) ∈ {(f^i)′(x)y : i ∈ I_e(x)}.
Further, if f is differentiable at x then
f′(x) ∈ {(f^i)′(x) : i ∈ I_e(x)}.
Proof. The first part restates a result from [132]. Now assume that f is differentiable at x. Then, for all y ∈ R^n, f′(x)y ∈ {(f^i)′(x)y : i ∈ I_e(x)}. Denote by q ≥ 1 the cardinality of I_e(x). Now choose l = q(n − 1) + 1 vectors y_r ∈ R^n, r = 1, ..., l, such that every selection of n of these vectors is linearly independent (the vectors y_r can be obtained, e.g., by choosing l pairwise different numbers t_r ∈ R and setting y_r = (1, t_r, t_r^2, ..., t_r^{n−1})^T). For every r, choose i_r ∈ I_e(x) such that f′(x)y_r = (f^{i_r})′(x)y_r. Since r ranges from 1 to q(n − 1) + 1 and i_r can assume only q different values, we can find n pairwise different indices r_1, ..., r_n such that i_{r_1} = ... = i_{r_n} = j. Since the columns of Y = (y_{r_1}, ..., y_{r_n}) are linearly independent and f′(x)Y = (f^j)′(x)Y, we conclude that f′(x) = (f^j)′(x). □
Proposition 2.25. Let the PC^1-function f : V → R^m, V ⊂ R^n open, be a continuous selection of the C^1-functions {f^1, ..., f^N} in a neighborhood of x ∈ V. Then
∂_B f(x) = {(f^i)′(x) : i ∈ I_e(x)}, (2.6)
∂f(x) = co{(f^i)′(x) : i ∈ I_e(x)}. (2.7)
Proof. We know from Proposition 2.23 that f is locally Lipschitz continuous, so that the subdifferentials are well defined. By Proposition 2.22, f is a continuous selection of {f^i : i ∈ I_e(x)} in a neighborhood W of x. Further, for M ∈ ∂_B f(x), there exists a sequence x^k → x in W of points where f is differentiable such that f′(x^k) → M. Among the functions f^i, i ∈ I_e(x), exactly those with indices i ∈ I_e(x) ∩ I_e(x^k) are essentially active at x^k. Hence, by Proposition 2.22, f is a continuous selection of {f^i : i ∈ I_e(x) ∩ I_e(x^k)} in a neighborhood of x^k. Proposition 2.24 now yields that f′(x^k) = (f^{i_k})′(x^k) for some i_k ∈ I_e(x) ∩ I_e(x^k). Now we select a subsequence k ∈ K on which i_k is constant with value i ∈ I_e(x). Since (f^i)′ is continuous, this proves M = (f^i)′(x), and thus "⊂" in (2.6). For every i ∈ I_e(x) there exists, by definition, a sequence x^k → x such that f ≡ f^i in an open neighborhood of every x^k. In particular, f is differentiable at x^k (since f^i is C^1), and f′(x^k) = (f^i)′(x^k) → (f^i)′(x). This completes the proof of (2.6). Assertion (2.7) is an immediate consequence of (2.6). □
We now establish the semismoothness of PC^1-functions.
Proposition 2.26. Let f : V → R^m be a PC^1-function on the open set V ⊂ R^n. Then f is semismooth. If f is a PC^2-function, then f is 1-order semismooth.
Proof. The local Lipschitz continuity and B-differentiability of f are guaranteed by Propositions 2.23 and 2.24. Now consider x ∈ V. In a neighborhood W of x, f is a continuous selection of C^1-functions {f^1, ..., f^N} and, without restriction, we may assume that all f^i are active at x. For all x + s ∈ W and all M ∈ ∂f(x + s) we have, by Proposition 2.25,
M = Σ_{i ∈ I_e(x+s)} λ_i (f^i)′(x + s), λ_i ≥ 0, Σ_i λ_i = 1.
Hence, by Taylor's theorem, using f^i(x + s) = f(x + s) for all i ∈ I_e(x + s),
‖f(x + s) − f(x) − Ms‖ ≤ Σ_{i ∈ I_e(x+s)} λ_i ‖f^i(x + s) − f^i(x) − (f^i)′(x + s)s‖
≤ max_{i ∈ I_e(x+s)} ∫_0^1 ‖(f^i)′(x + τs)s − (f^i)′(x + s)s‖ dτ = o(‖s‖),
which establishes the semismoothness of f. If the f^i are C^2, we obtain
‖f(x + s) − f(x) − Ms‖ ≤ max_{i ∈ I_e(x+s)} ∫_0^1 τ ‖s^T (f^i)″(x + τs)s‖ dτ = O(‖s‖^2),
showing that f is 1-order semismooth in this case. □
2.6 Extensions
It is obvious that useful semismoothness concepts can also be obtained for other suitable generalized derivatives. This was investigated in a general, finite-dimensional framework by Jeyakumar [85, 86]. He introduced the concept of ∂f-semismoothness, where ∂f is an approximate Jacobian [87]. For the definition of approximate Jacobians we refer to [87]; in the sequel, it is sufficient to know that an approximate Jacobian of f : R^n → R^m is a closed-valued multifunction ∂f : R^n ⇉ R^{m×n} and that ∂_B f, ∂f, and ∂_C f are approximate Jacobians. To avoid confusion with the infinite-dimensional semismoothness concept introduced later (which essentially corresponds to weak J-semismoothness), we denote Jeyakumar's semismoothness concept by J-semismoothness ("J" for Jeyakumar).
Definition 2.27. Let f : R^n → R^m be a function with approximate Jacobian ∂f.
(a) The function f is called weakly ∂f-J-semismooth at x if it is continuous near x and
sup_{M ∈ co ∂f(x+s)} ‖f(x + s) − f(x) − Ms‖ = o(‖s‖) as s → 0. (2.8)
(b) The function f is ∂f-J-semismooth at x if
(i) f is B-differentiable at x (e.g., locally Lipschitz continuous near x and directionally differentiable at x, see [133]), and
(ii) f is weakly ∂f-J-semismooth at x.
Obviously, we can define weak ∂f-J-semismoothness of order α by requiring the order O(‖s‖^{1+α}) in (2.8), and ∂f-J-semismoothness of order α by the additional requirement that f be α-order B-differentiable at x.
Note that for locally Lipschitz continuous functions ∂_B f-, ∂f-, and ∂_C f-J-semismoothness all coincide with the usual semismoothness, cf. Proposition 2.10 in the case of ∂_C f-J-semismoothness. The same holds true for α-order semismoothness.
Algorithm 2.11 can be extended to weakly ∂f-J-semismooth equations by choosing M_k ∈ ∂f(x^k) in step 2. The proof of Proposition 2.12 can be left unchanged, with the only difference that in assumption (b) (iii) we have to require that ∂f is compact-valued and upper semicontinuous at x̄. If f is weakly ∂f-J-semismooth of order α at x̄, then an analogue of Proposition 2.18 holds.
3. Newton Methods for Semismooth Operator Equations
3.1 Introduction
It was shown in chapter 1 that semismooth NCP- and MCP-functions can be used to reformulate the VIP (1.1) as (one or more) nonsmooth operator equation(s) of the form
Φ(u) = 0, where Φ(u)(ω) = φ(G(u)(ω)) on Ω, (3.1)
with G mapping u ∈ L^p(Ω) to a vector of Lebesgue functions. In particular, for NCPs we have G(u) = (u, F(u)) with F : L^p(Ω) → L^{p′}(Ω), p, p′ ∈ (1, ∞].
In finite dimensions this reformulation technique is well investigated and yields a semismooth system of equations, which can be solved by semismooth Newton methods. Naturally, the question arises if it is possible to develop a similar semismoothness theory for operators of the form (3.1). This question is of significant practical importance since the performance of numerical methods for infinite-dimensional problems is intimately related to the infinite-dimensional problem structure. In particular, it is desirable that the numerical method can be viewed as a discrete version of a well-behaved abstract algorithm for the infinite-dimensional problem. Then, for increasing accuracy of discretization, the convergence properties of the numerical algorithm can be expected to be (and usually are) predicted very well by the infinite-dimensional convergence analysis. Therefore, the investigation of algorithms in the original infinite-dimensional problem setting is very helpful for the development of robust, efficient, and mesh-independent numerical algorithms.
In the following, we carry out such an analysis for semismooth Newton methods that are applicable to operator equations of the form (3.1). We split our investigations in two parts. First, we develop:
- A general semismoothness concept for operators f : Y ⊃ V → Z in Banach spaces, which is based on a set-valued generalized differential ∂f.
- A locally q-superlinearly convergent Newton-like method for the solution of ∂f-semismooth operator equations.
- Extensions of these methods that (a) allow inexact computations and (b) incorporate a projection to stay feasible with respect to a closed convex set containing the solution.
- α-order ∂f-semismoothness and, based on this, convergence with q-rate 1 + α for the developed Newton methods.
- Results on the (α-order) semismoothness of the sum, composition, and direct product of semismooth operators with respect to suitable generalized differentials.
In the second part, which follows [139] and constitutes the major part of this chapter, we fill these abstract concepts with life by considering the concrete case of superposition operators in function spaces. Hereby, we investigate operators of the form Ψ(y)(ω) = ψ(G(y)(ω)), a class that includes the operators arising in reformulations (3.1) of VIPs. In particular:
- We introduce a suitable generalized differential ∂°Ψ that is easy to compute and has a natural finite-dimensional counterpart.
- We prove that, under suitable assumptions, the operators Ψ are ∂°Ψ-semismooth; under additional assumptions, we establish α-order semismoothness.
- We apply the general semismoothness theory to develop locally fast convergent Newton-type methods for the operator equation Ψ(y) = 0.
In carrying out this program, we want to achieve a reasonable compromise between generality and applicability of the developed concepts. Concerning generality, it is possible to pose abstract conditions on an operator and its generalized differential such that superlinearly convergent Newton-type methods can be developed. We refer to Kummer [13], where a nice such framework is developed. Similarly, on the abstract level, we work with the following general concept: Given an operator f : Y ⊃ V → Z (V open) between Banach spaces and a set-valued mapping ∂f : V ⇉ L(Y, Z), we say that f is ∂f-semismooth at y ∈ V if f is continuous near y and
sup_{M ∈ ∂f(y+s)} ‖f(y + s) − f(y) − Ms‖_Z = o(‖s‖_Y) as s → 0 in Y.
If the remainder term is of the order O(‖s‖_Y^{1+α}), 0 < α ≤ 1, we call f α-order ∂f-semismooth at y. The class of ∂f-semismooth operators allows a relatively straightforward development and analysis of Newton-type methods.
The reader should be aware that in view of section 2.6 it would be more precise to use the term weakly ∂f-semismooth instead of semismooth, since we do not require the B-differentiability of f at y. Nevertheless, we prefer the term semismooth for brevity. Therefore, our definition of semismoothness is slightly weaker than finite-dimensional semismoothness, but, as already said, still powerful enough to admit the design of superlinearly convergent Newton-type methods, which is our main objective. It is also weaker than the abstract semismoothness concept that, independently of the present work, was recently proposed by Chen, Nashed and Qi [3]; to avoid ambiguity, we call this concept CNQ-semismoothness ("CNQ" for Chen, Nashed and Qi). Hereby [3], the notions of a slanting function f° and of slant differentiability of f are introduced, and a generalized derivative ∂_S f(y), the slant derivative, is obtained as the collection of all possible limits lim_{y^k → y} f°(y^k). CNQ-semismoothness is then defined by imposing appropriate conditions on the approximation properties of the slanting function and the slant derivative. These conditions
are equivalent [3, Thm. 3.3] to the requirements that (i) f is slantly differentiable in a neighborhood of y, (ii) f is ∂_S f-semismooth at y, and (iii) f is B-differentiable at y, i.e., the directional derivative f′(y, s) = lim_{t → 0+} (f(y + ts) − f(y))/t exists and satisfies
‖f(y + s) − f(y) − f′(y, s)‖_Z = o(‖s‖_Y) as s → 0 in Y.
For ∂f-semismooth equations we develop Newton-like methods and prove q-superlinear convergence. Hereby, we impose regularity assumptions that are similar to their finite-dimensional counterparts (e.g., those in Proposition 2.12). For α-order ∂f-semismooth equations, convergence of order 1 + α is established. In view of our applications to reformulations of the VIP, and, more generally, semismooth superposition operators, it is advantageous to formulate and analyze the Newton method in a two-norm framework, which requires augmenting the Newton iteration by a smoothing step. Further, we allow for inexactness in the computations and also analyze a projected version of the algorithm which generates iterates that stay within a prescribed closed convex set.
Unfortunately, from the viewpoint of applications, the abstract framework of ∂f-semismoothness (as well as other general approaches) leaves two important questions unanswered:
(a) Given a particular operator f, how should ∂f be chosen?
(b) Is there an easy way to verify that f is ∂f-semismooth?
The same questions arise in the case of CNQ-semismoothness. Then (a) consists in finding an appropriate slanting function, and part (b) becomes even more involved since CNQ-semismoothness is stronger than ∂_S f-semismoothness.
The major, second part of this chapter is intended to develop satisfactory answers to these two questions for a class of nonsmooth operators which includes the mappings Φ arising from reformulations of NCPs and MCPs, see (3.1). More precisely, we consider superposition operators of the form
Ψ : Y → L^r(Ω), Ψ(y)(ω) = ψ(G(y)(ω)), (3.2)
with mappings ψ : R^m → R and G : Y → Π_{i=1}^m L^{r_i}(Ω), where 1 ≤ r ≤ r_i < ∞, Y is a real Banach space, and Ω ⊂ R^n is a bounded measurable set with positive Lebesgue measure. Essentially, our working assumptions are that ψ is Lipschitz continuous and semismooth, and that G is continuously Fréchet differentiable. The detailed assumptions are given below.
As generalized differential for Ψ we introduce an appropriate multifunction ∂°Ψ : Y ⇉ L(Y, L^r) (the superscript ° is used to indicate that ∂° is designed especially for superposition operators), which is easy to compute and is motivated by Qi's finite-dimensional C-subdifferential [121]; this addresses question (a) raised above. In our main result we establish the ∂°Ψ-semismoothness of Ψ:
sup_{M ∈ ∂°Ψ(y+s)} ‖Ψ(y + s) − Ψ(y) − Ms‖_{L^r} = o(‖s‖_Y) as s → 0 in Y. (3.3)
This answers question (b) for superposition operators of the form (3.2). We also give conditions under which Ψ is α-order ∂°Ψ-semismooth, 0 < α ≤ 1.
Based on (3.3), we use the abstract results of the first part to develop a locally q-superlinearly convergent Newton method for the nonsmooth operator equation
Ψ(y) = 0. (3.4)
Moreover, in the case where Ψ is α-order semismooth we prove convergence with q-rate 1 + α. As was already observed earlier in the context of related local convergence analyses in function space [95, 143], we have to incorporate a smoothing step to overcome the non-equivalence of norms. We also give an example showing that this smoothing step can be indispensable.
Although the differentiability properties of superposition operators with smooth ψ are well investigated, see, e.g., the expositions [9] and [1], this is not the case for nonsmooth functions ψ. Further, even if ψ is smooth, for operator equations of the form (3.4) the availability of local convergence results for Newton-like methods appears to be very limited.
As already said, an important application of our results, which motivates our investigations, are reformulations of VIPs (1.1) posed in function spaces. Throughout this chapter, our investigations of the operator Ψ will be accompanied by illustrations at the example of NCP-function based reformulations of the nonlinear complementarity problem (NCP), which, briefly recalled, consists in finding u ∈ L^p(Ω) such that almost everywhere on Ω holds
u ≥ 0, F(u) ≥ 0, u F(u) = 0, (3.5)
where the operator F : L^p(Ω) → L^{p′}(Ω), 1 < p, p′ ≤ ∞, is given. As always, Ω ⊂ R^n is assumed to be bounded and measurable with positive Lebesgue measure. Using a Lipschitz continuous, semismooth NCP-function φ : R^2 → R, (3.5) is equivalent to the operator equation (3.1). Obviously, choosing
Y = L^p(Ω), r_2 = r ∈ [1, p′) ∩ [1, p), r_1 ∈ [r, p), ψ ≡ φ, and G : u ∈ L^p(Ω) ↦ (u, F(u)),
we have Ψ ≡ Φ with Ψ as in (3.2). Our focus on the NCP as the main example rather than reformulations of the more general VIP is just for notational convenience. In fact, as can be seen from (1.3), the general VIP requires to use different reformulations on different parts of Ω, depending on the kind of bounds (none, only lower, only upper, lower and upper bounds), a burden we want to avoid in this chapter.
To establish the semismoothness of Ψ we have to choose an appropriate vector-valued generalized differential. Although the available literature on generalized differentials and subdifferentials is mainly focused on real-valued functions, see, e.g., [2, 32, 33, 13] and the references therein, several authors have proposed and analyzed generalized differentials for nonlinear operators between infinite-dimensional spaces [37, 61, 84, 123, 135]. In our approach, we work with a generalized differential that exploits the structure of Ψ. Roughly speaking, our general guidance hereby is to transcribe, at least formally, componentwise operations in R^k to pointwise operations in function spaces. To sketch the idea, note that the finite-dimensional analogue of the operator Ψ is the mapping
Ψ^f : R^k → R^l, Ψ^f_j(x) = ψ(G^j(x)), j = 1, ..., l,
with ψ as above and C^1-mappings G^j : R^k → R^m. We have the correspondences
ω ∈ Ω ↔ j ∈ {1, ..., l}, y ∈ Y ↔ x ∈ R^k, and G(y)(ω) ↔ G^j(x).
Componentwise application of the chain rule for Clarke's generalized gradient [32] shows that the C-subdifferential of Ψ^f consists of matrices M ∈ R^{l×k} having rows of the form
M_j = Σ_{i=1}^m d_i^j (G_i^j)′(x), with d^j ∈ ∂ψ(G^j(x)).
For completeness, let us note that, conversely, every such matrix is an element of ∂_C Ψ^f if, e.g., ψ is regular. Carrying out the same construction for Ψ in a purely formal manner suggests to choose a generalized differential for Ψ consisting of operators of the form
v ∈ Y ↦ Σ_{i=1}^m d_i · (G_i′(y)v) with (d_1, ..., d_m)(ω) ∈ ∂ψ(G(y)(ω)) a.e. on Ω,
where the inclusion on the right is meant in the sense of measurable selections.
One advantage of this approach, which motivates our choice of the generalized differential ∂°Ψ, is that it consists of relatively concrete objects as compared to those investigated in, e.g., [37, 61, 84, 123, 135], which necessarily are more abstract since they are not restricted to a particular structure of the underlying operator. It is not the objective of this chapter to investigate the connections between the generalized differential ∂°Ψ and other generalized differentials. There are close relationships, but we leave it as a topic for future research. Here, we concentrate on the development of a semismoothness concept based on ∂°Ψ, a related nonsmooth Newton's method, and the relations to the respective finite-dimensional analogues.
As already mentioned, the literature on Newton-like methods for the solution of nonlinear complementarity problems or, closely related, bound-constrained optimization problems posed in function spaces is very limited. Hereby, we call an iteration Newton-like if each iteration essentially requires the solution of a linear operator equation. We point out that in this sense sequential quadratic programming (SQP) methods for problems involving inequality constraints [2, 3, 4, 5, 6, 76, 138] are not Newton-like, since each iteration requires the solution of a quadratic programming problem (or, put differently, a linearized generalized equation), which is in general significantly more expensive than solving a linear operator equation. Therefore, instead of applying the methods considered in this chapter directly to the nonlinear problem, they also could be of interest as subproblem solvers for SQP methods.
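For the NCP case, where ψ = φ_FB and G^j(x) = (x_j, F(x)_j), the row construction M_j = Σ_i d_i^j (G_i^j)′(x) above becomes completely explicit. The following sketch of ours builds one element of the C-subdifferential of the discrete reformulation Ψ^f_j(x) = φ_FB(x_j, F(x)_j); all helper names are our own illustrative choices.

    import numpy as np

    def fb_subgrad(a, b):
        """An element (d1, d2) of del phi_FB(a, b)."""
        n = np.hypot(a, b)
        if n > 0:
            return 1.0 - a / n, 1.0 - b / n
        return 1.0, 1.0      # at the origin: (1,1) - y^T with y = 0, ||y||_2 <= 1

    def c_subdiff_element(x, F, Fprime):
        """Row-wise construction M_j = d1_j e_j^T + d2_j (F'(x))_j."""
        Fx, J = F(x), Fprime(x)
        M = np.zeros((x.size, x.size))
        for j in range(x.size):
            d1, d2 = fb_subgrad(x[j], Fx[j])
            M[j] = d2 * J[j]
            M[j, j] += d1
        return M

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    M = c_subdiff_element(np.array([0.0, 0.5]),
                          lambda x: A @ x - 1.0, lambda x: A)

Solving with such a matrix M is precisely the finite-dimensional semismooth Newton step of chapter 2; the function-space construction below mimics this pointwise.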
Probably the investigations closest related to ours are the analysis of Bertsekas' projected Newton method by Kelley and Sachs [95], and the investigation of affine-scaling interior-point Newton methods by Ulbrich and Ulbrich [143]. Both papers deal with bound-constrained minimization problems in function spaces and establish the local q-superlinear convergence of their respective Newton-like methods. In both approaches the convergence results are obtained by estimating directly the remainder terms appearing in the analysis of the Newton iteration. Hereby, specific properties of the solution are exploited, and a strict complementarity condition is assumed in both papers. We develop our results for the general problem class (3.4) and derive the applicability to nonlinear complementarity problems as a simple, but important special case. In the context of NCPs and optimization, we do not have to assume any strict complementarity condition.
Notation. In this chapter we equip product spaces Π_i Y_i with the norm ‖y‖_{Π_i Y_i} = Σ_i ‖y_i‖_{Y_i}. Further, for convenience, we write Σ_i and Π_i instead of Σ_{i=1}^m and Π_{i=1}^m.
3.2 Newton Methods for Abstract Semismooth Operators
3.2.1 Semismooth Operators in Banach Spaces
In the previous section we have already outlined the following abstract semismoothness concept for general operators between Banach spaces:
Definition 3.1. Let f : Y ⊃ V → Z be defined on an open subset V of the Banach space Y with images in the Banach space Z. Further, let be given a set-valued mapping ∂f : V ⇉ L(Y, Z), and let y ∈ V.
(i) We say that f is ∂f-semismooth at y if f is continuous near y and
sup_{M ∈ ∂f(y+s)} ‖f(y + s) − f(y) − Ms‖_Z = o(‖s‖_Y) as s → 0 in Y.
(ii) We say that f is α-order ∂f-semismooth at y, 0 < α ≤ 1, if f is continuous near y and
sup_{M ∈ ∂f(y+s)} ‖f(y + s) − f(y) − Ms‖_Z = O(‖s‖_Y^{1+α}) as s → 0 in Y.
(iii) The multifunction ∂f is called generalized differential of f.
Remark 3.2. The mapping y ∈ Y ↦ ∂f(y) ⊂ L(Y, Z) can be interpreted as a set-valued point-based approximation, see Robinson [128], Kummer [13], and Xu [146].
3.2.2 Basic Properties
We begin by establishing several fundamental properties of semismooth operators. First, it is important to know that continuously differentiable operators f are f′-semismooth, i.e., semismooth with respect to ∂f = {f′}. More precisely:
Proposition 3.3. Let f : Y ⊃ V → Z be differentiable on the neighborhood V of y with its derivative f′ being continuous near y. Then f is f′-semismooth at y. If f′ is α-Hölder continuous near y, 0 < α ≤ 1, then f is α-order f′-semismooth at y.
Proof. We have by the fundamental theorem of calculus
‖f(y + s) − f(y) − f′(y + s)s‖_Z ≤ ∫_0^1 ‖(f′(y + ts) − f′(y + s))s‖_Z dt
≤ sup_{0 ≤ t ≤ 1} ‖f′(y + ts) − f′(y + s)‖_{Y,Z} ‖s‖_Y = o(‖s‖_Y) as s → 0 in Y.
Thus f is f′-semismooth at y. If f′ is α-Hölder continuous near y, we obtain
sup_{0 ≤ t ≤ 1} ‖f′(y + ts) − f′(y + s)‖_{Y,Z} ≤ sup_{0 ≤ t ≤ 1} O(‖(t − 1)s‖_Y^α) = O(‖s‖_Y^α) as s → 0 in Y,
which establishes the α-order f′-semismoothness of f at y. □
We proceed by establishing the semismoothness of the sum of semismooth operators.
Proposition 3.4. Let V ⊂ Y be open and let f_i : V → Z be (α-order) ∂f_i-semismooth at y ∈ V, i = 1, ..., m. Consider the operator
f : Y ⊃ V → Z, f(y) = f_1(y) + ... + f_m(y).
Further, define the generalized differential ∂f := ∂f_1 + ... + ∂f_m : V ⇉ L(Y, Z) as follows:
∂f(y) = {M_1 + ... + M_m : M_i ∈ ∂f_i(y), i = 1, ..., m}.
Then f is (α-order) ∂f-semismooth at y.
Proof. By the ∂f_i-semismoothness of the f_i,
sup_M ‖f(y + s) − f(y) − Ms‖_Z ≤ Σ_i sup_{M_i} ‖f_i(y + s) − f_i(y) − M_i s‖_Z = o(‖s‖_Y) as s → 0 in Y,
where the suprema are taken over M ∈ ∂f(y + s) and M_i ∈ ∂f_i(y + s), respectively. In the case of α-order semismoothness, we can replace o(‖s‖_Y) by O(‖s‖_Y^{1+α}). □
The next result shows that the direct product of semismooth operators is itself semismooth with respect to the direct product of the generalized differentials of the components.
Proposition 3.5. Let V ⊂ Y be open and assume that the operators f_i : V → Z_i, i = 1, ..., m, are (α-order) ∂f_i-semismooth at y ∈ V with generalized differentials ∂f_i : V ⇉ L(Y, Z_i). Then the operator
f = (f_1, ..., f_m) : y ∈ V ↦ (f_1(y), ..., f_m(y)) ∈ Z := Z_1 × ... × Z_m
is (α-order) (∂f_1 × ... × ∂f_m)-semismooth at y, where (∂f_1 × ... × ∂f_m)(y) is the set of all operators M ∈ L(Y, Z) of the form M : v ↦ (M_1 v, ..., M_m v) with M_i ∈ ∂f_i(y), i = 1, ..., m.
Proof. Let ∂f = ∂f_1 × ... × ∂f_m. Then for all M ∈ ∂f(y + s) there exist M_i ∈ ∂f_i(y + s) with Mv = (M_1 v, ..., M_m v). Hence, using the norm ‖z‖_Z = ‖z_1‖_{Z_1} + ... + ‖z_m‖_{Z_m}, and writing sup_M and sup_{M_i} for suprema taken over M ∈ ∂f(y + s) and M_i ∈ ∂f_i(y + s), respectively, we obtain
sup_M ‖f(y + s) − f(y) − Ms‖_Z = Σ_{i=1}^m sup_{M_i} ‖f_i(y + s) − f_i(y) − M_i s‖_{Z_i} = o(‖s‖_Y) as s → 0 in Y.
In the case of α-order semismoothness, the above holds with o(·) replaced by O(·^{1+α}). □
Remark 3.6. We stress that the construction of ∂f_1 × ... × ∂f_m from the ∂f_i is analogous to that of the C-subdifferential ∂_C f from the ∂f_i.
Next, we give conditions under which the composition of two semismooth operators is semismooth.
Proposition 3.7. Let U ⊂ X and V ⊂ Y be open. Further, let f_1 : U → Y be Lipschitz continuous near x ∈ U and (α-order) ∂f_1-semismooth at x. Further, let f_2 : V → Z be (α-order) ∂f_2-semismooth at y = f_1(x) with ∂f_2 being bounded near y. Let f_1(U) ⊂ V and consider the operator
f := f_2 ∘ f_1 : X ⊃ U → Z, f(x) = f_2(f_1(x)).
Further, define the generalized differential ∂f := ∂f_2 ∘ ∂f_1 : U ⇉ L(X, Z) as follows:
∂f(x) = (∂f_2 ∘ ∂f_1)(x) = {M_2 M_1 : M_1 ∈ ∂f_1(x), M_2 ∈ ∂f_2(f_1(x))}.
Then f is (α-order) ∂f-semismooth at x.
Proof. We set h = f_1(x + s) − f_1(x), x + s ∈ U. For all x + s ∈ U and all M ∈ ∂f(x + s) there exist M_1 ∈ ∂f_1(x + s) and M_2 ∈ ∂f_2(f_1(x + s)) = ∂f_2(y + h) with M = M_2 M_1. Due to the Lipschitz continuity of f_1 near x, we have
‖h‖_Y = ‖f_1(x + s) − f_1(x)‖_Y = O(‖s‖_X) as s → 0 in X. (3.6)
Further, since ∂f_2 is bounded near y, we can use the semismoothness of f_1, f_2 and (3.6) to see that for all sufficiently small ‖s‖_X holds
sup_M ‖f(x + s) − f(x) − Ms‖_Z = sup_{M_1,M_2} ‖f_2(y + h) − f_2(y) − M_2 M_1 s‖_Z
≤ sup_{M_1,M_2} (‖f_2(y + h) − f_2(y) − M_2 h‖_Z + ‖M_2 (h − M_1 s)‖_Z)
≤ o(‖h‖_Y) + sup_{M_2} ‖M_2‖_{Y,Z} sup_{M_1} ‖f_1(x + s) − f_1(x) − M_1 s‖_Y
= o(‖h‖_Y) + o(‖s‖_X) = o(‖s‖_X) as s → 0 in X,
where the suprema are taken over M ∈ ∂f(x + s), M_1 ∈ ∂f_1(x + s), and M_2 ∈ ∂f_2(y + h), respectively. Therefore, f is ∂f-semismooth at x. In the case of α-order semismoothness, we can replace o(·) with O(·^{1+α}) in the above calculations, which yields the α-order ∂f-semismoothness of f at x. □
Remark 3.8. The established results provide a variety of ways to combine semismooth operators to construct new semismooth operators.
3.2.3 Semismooth Newton's Method
In analogy to Algorithm 2.11, we now consider a Newton-like method for the solution of the operator equation
f(y) = 0, (3.7)
which uses the generalized differential ∂f. Hereby, we will assume that f : V → Z, V ⊂ Y open, is ∂f-semismooth at the solution ȳ ∈ V of (3.7). As we will see, it is important for applications to incorporate an additional device, the smoothing step, in the algorithm, which enables us to work with two-norm techniques. To this end, we introduce a further Banach space Y_0, in which Y is continuously and densely embedded, and augment the iteration by a smoothing step:
Algorithm 3.9 (Semismooth Newton's Method).
0. Choose an initial point y^0 ∈ V and set k = 0.
1. Choose M_k ∈ ∂f(y^k), compute s^k ∈ Y_0 from
M_k s^k = −f(y^k),
and set y_0^{k+1} = y^k + s^k.
2. Perform a smoothing step: Y_0 ∋ y_0^{k+1} ↦ y^{k+1} = S_k(y_0^{k+1}) ∈ Y.
3. If y^{k+1} = y^k, then STOP with result y* = y^{k+1}.
4. Increment k by one and go to step 1.
Remark 3.10. The stopping test in step 3 is certainly not standard. In fact, we could remove step 3 and perform the following simpler test at the beginning of step 1: If f(y^k) = 0, then STOP with result y* = y^k. But then we only could prove that y* is a solution of (3.7); we would not know if y* = ȳ or not. For Algorithm 3.9, however, we are able to prove that y* = ȳ holds in the case of finite termination.
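In code, Algorithm 3.9 is a standard Newton loop with one extra stage. The following schematic Python sketch is ours and is meant for a discretized problem; the callback names are assumptions, not a fixed API. For Y_0 = Y one simply passes the identity as smoothing step.

    import numpy as np

    def semismooth_newton(f, newton_solve, smoothing_step, y, tol=1e-12, max_iter=30):
        """Sketch of Algorithm 3.9. newton_solve(y, r) returns s with M_k s = r
        for some M_k in del f(y); smoothing_step plays the role of S_k."""
        for k in range(max_iter):
            y0_next = y + newton_solve(y, -f(y))   # step 1: Newton step (in Y_0)
            y_next = smoothing_step(y0_next)       # step 2: smooth back into Y
            if np.linalg.norm(y_next - y) <= tol:  # steps 3/4 (with a tolerance)
                return y_next
            y = y_next
        return y

The exact equality test of step 3 is replaced by a tolerance here, as is unavoidable in floating-point arithmetic.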
Before we establish fast local convergence of this algorithm, a comment on the smoothing step is in order. First, it is clear that the smoothing step can be eliminated from the algorithm by choosing Y_0 = Y and S_k(y_0^{k+1}) = y_0^{k+1}. However, as we will see later, in many important situations the operators M_k are not continuously invertible in L(Y, Z). Fortunately, the following framework, which turns out to be widely applicable, provides an escape from this difficulty:
Assumption 3.11. The space Y is continuously and densely embedded in a Banach space Y_0 such that:
(i) (Regularity condition) The operators M_k map Y_0 continuously into Z with bounded inverses, and there exists a constant C_{M^{-1}} > 0 such that ‖M_k^{−1}‖_{Z,Y_0} ≤ C_{M^{-1}}.
(ii) (Smoothing condition) The smoothing steps in step 2 satisfy
‖S_k(y_0^{k+1}) − ȳ‖_Y ≤ C_S ‖y_0^{k+1} − ȳ‖_{Y_0}
for all k ≥ 0, where ȳ ∈ Y solves (3.7).
Theorem 3.12. Let f : Y ⊃ V → Z be an operator between Banach spaces, defined on the open set V, with generalized differential ∂f : V ⇉ L(Y, Z). Denote by ȳ ∈ V a solution of (3.7) and let Assumption 3.11 hold. Then holds:
(i) If f is ∂f-semismooth at ȳ, then there exists δ > 0 such that, for all y^0 ∈ ȳ + δB_Y, Algorithm 3.9 either terminates with y* = ȳ or generates a sequence (y^k) ⊂ V that converges q-superlinearly to ȳ in Y.
(ii) If in (i) the mapping f is α-order ∂f-semismooth at ȳ, 0 < α ≤ 1, then the rate of convergence is at least 1 + α.
The proof is similar to that of Proposition 2.12.
Proof. (i): Denote the errors before/after smoothing by v_0^{k+1} = y_0^{k+1} − ȳ and v^{k+1} = y^{k+1} − ȳ, respectively. Now let δ > 0 be so small that ȳ + δB_Y ⊂ V and consider y^k ∈ ȳ + δB_Y. Using M_k s^k = −f(y^k) and f(ȳ) = 0, we obtain
M_k v_0^{k+1} = M_k (s^k + v^k) = −f(y^k) + M_k v^k = −[f(ȳ + v^k) − f(ȳ) − M_k v^k]. (3.8)
This and the ∂f-semismoothness of f at ȳ yield
‖M_k v_0^{k+1}‖_Z = o(‖v^k‖_Y) as v^k → 0 in Y. (3.9)
Hence, for sufficiently small δ > 0, we have
‖M_k v_0^{k+1}‖_Z ≤ (1/(2 C_{M^{-1}} C_S)) ‖v^k‖_Y, (3.10)
and thus by Assumption 3.11 (i)
‖v_0^{k+1}‖_{Y_0} ≤ ‖M_k^{−1}‖_{Z,Y_0} ‖M_k v_0^{k+1}‖_Z ≤ (1/(2 C_S)) ‖v^k‖_Y.
Therefore, using Assumption 3.11 (ii),
‖v^{k+1}‖_Y ≤ C_S ‖v_0^{k+1}‖_{Y_0} ≤ (1/2) ‖v^k‖_Y. (3.11)
This shows
y^{k+1} ∈ ȳ + (‖v^k‖_Y / 2) B_Y ⊂ ȳ + (δ/2) B_Y ⊂ V. (3.12)
If the algorithm terminates in step 3, then ‖v^k‖_Y = ‖v^{k+1}‖_Y ≤ (1/2) ‖v^k‖_Y, hence v^k = 0, and thus y* = y^k = ȳ. On the other hand, if the algorithm runs infinitely, then (3.12) inductively yields V ∋ y^k → ȳ in Y. Now we conclude from the derived estimates and (3.9)
‖v^{k+1}‖_Y ≤ C_S ‖v_0^{k+1}‖_{Y_0} ≤ C_S ‖M_k^{−1}‖_{Z,Y_0} ‖M_k v_0^{k+1}‖_Z ≤ C_S C_{M^{-1}} ‖M_k v_0^{k+1}‖_Z = o(‖v^k‖_Y), (3.13)
which completes the proof of (i).
(ii): If, in addition, f is α-order ∂f-semismooth at ȳ, then we can write O(‖v^k‖_Y^{1+α}) on the right hand side of (3.9) and obtain as in (3.13)
‖v^{k+1}‖_Y = O(‖v^k‖_Y^{1+α}). □
3.2.4 Inexact Newton's Method
From a computational point of view, due to discretization and finite precision arithmetics, in general we can only compute approximate elements of ∂f. We address this issue by allowing a certain amount of inexactness in the operators M_k.¹ We incorporate the possibility of inexact computations in our algorithm by modifying step 1 of Algorithm 3.9 as follows:
Algorithm 3.13 (Inexact Semismooth Newton's Method). As Algorithm 3.9, but with step 1 replaced by:
1. Choose a boundedly invertible operator B_k ∈ L(Y_0, Z), compute s^k ∈ Y_0 from
B_k s^k = −f(y^k),
and set y_0^{k+1} = y^k + s^k.
¹ We stress that inexact solutions of a linear operator equation Ms = b, M ∈ L(Y_0, Z), can always be interpreted as exact solutions of a system with inexact operator: If Md = b + e, then holds (M + δM)d = b with, e.g., δMv = −⟨w, v⟩_{Y_0′,Y_0} e for all v ∈ Y_0, where w ∈ Y_0′ is chosen such that ⟨w, d⟩_{Y_0′,Y_0} = 1.
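The construction in the footnote is easy to verify numerically. The following check is ours; the rank-one choice w = d/‖d‖_2^2 is one admissible realization of the functional w.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5
    M = rng.normal(size=(n, n)) + n * np.eye(n)             # well-conditioned operator
    b = rng.normal(size=n)
    d = np.linalg.solve(M, b) + 1e-3 * rng.normal(size=n)   # inexact solution
    e = M @ d - b                                           # residual: M d = b + e

    w = d / np.dot(d, d)            # <w, d> = 1
    dM = -np.outer(e, w)            # dM v = -<w, v> e
    print(np.linalg.norm((M + dM) @ d - b))   # ~ 1e-16: (M + dM) d = b exactly

Thus every inexact solve can be charged to a perturbed operator B_k = M_k + δM_k, which is what makes the Dennis-Moré-type condition below the natural way to quantify admissible inexactness.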
On the operators B_k we pose a Dennis-Moré-type condition [4, 42, 112, 125], which we formulate in two versions, a weaker one required for superlinear convergence and a stronger variant to prove convergence with rate 1 + α.
Assumption 3.14.
(i) There exist operators M_k ∈ ∂f(y^k) such that
‖(B_k − M_k) s^k‖_Z = o(‖s^k‖_{Y_0}) as s^k → 0 in Y_0, (3.14)
where s^k ∈ Y_0 is the step computed in step 1.
(ii) Condition (i) holds with (3.14) replaced by
‖(B_k − M_k) s^k‖_Z = O(‖s^k‖_{Y_0}^{1+α}) as s^k → 0 in Y_0.
Theorem 3.15. Let f : Y ⊃ V → Z be an operator between Banach spaces, defined on the open set V, with generalized differential ∂f : V ⇉ L(Y, Z). Let ȳ ∈ V be a solution of (3.7) and let f be Lipschitz continuous near ȳ. Further, let the Assumptions 3.11 and 3.14 (i) hold. Then:
(i) If f is ∂f-semismooth at ȳ, then there exists δ > 0 such that, for all y^0 ∈ ȳ + δB_Y, Algorithm 3.13 either terminates with y* = ȳ or generates a sequence (y^k) ⊂ V that converges q-superlinearly to ȳ in Y.
(ii) If in (i) the mapping f is α-order ∂f-semismooth at ȳ, 0 < α ≤ 1, and if Assumption 3.14 (ii) is satisfied, then the q-order of convergence is at least 1 + α.
Proof. We use the same notations as in the proof of Theorem 3.12 and set µ_k = ‖(B_k − M_k) s^k‖_Z. Throughout, consider y^k ∈ ȳ + δB_Y and let δ > 0 be so small that f is Lipschitz continuous on ȳ + δB_Y ⊂ V with modulus L > 0. Then holds
‖f(y^k)‖_Z ≤ L ‖v^k‖_Y.
We estimate the Y_0-norm of s^k:
‖s^k‖_{Y_0} ≤ ‖M_k^{−1}‖_{Z,Y_0} (‖B_k s^k‖_Z + ‖(M_k − B_k) s^k‖_Z) ≤ C_{M^{-1}} (‖f(y^k)‖_Z + µ_k) ≤ C_{M^{-1}} (L ‖v^k‖_Y + µ_k). (3.15)
By reducing δ, we achieve that C_{M^{-1}} µ_k ≤ ‖s^k‖_{Y_0} / 2. Hence,
‖s^k‖_{Y_0} ≤ 2 C_{M^{-1}} L ‖v^k‖_Y. (3.16)
Next, using f(ȳ) = 0 and B_k s^k = −f(y^k) = −f(ȳ + v^k), we derive
M_k v_0^{k+1} = M_k (s^k + v^k) = (M_k − B_k) s^k + B_k s^k + M_k v^k = (M_k − B_k) s^k − [f(ȳ + v^k) − f(ȳ) − M_k v^k]. (3.17)
This, Assumption 3.14 (i), the ∂f-semismoothness of f at ȳ, and (3.16) yield
‖M_k v_0^{k+1}‖_Z = o(‖s^k‖_{Y_0}) + o(‖v^k‖_Y) = o(‖v^k‖_Y) as v^k → 0 in Y. (3.18)
Now we can proceed as in the proof of Theorem 3.12 (i) to establish assertion (i).
(ii): If, in addition, f is α-order ∂f-semismooth at ȳ and Assumption 3.14 (ii) holds, then we can improve (3.18) to
‖M_k v_0^{k+1}‖_Z = O(‖s^k‖_{Y_0}^{1+α}) + O(‖v^k‖_Y^{1+α}) = O(‖v^k‖_Y^{1+α}) as v^k → 0 in Y.
Now we can proceed as in the proof of Theorem 3.12 (ii). □
3.2.5 Projected Inexact Newton's Method
As a last variant of semismooth Newton methods, we develop a projected version of Algorithm 3.13 that is applicable to the constrained semismooth operator equation
f(y) = 0 subject to y ∈ K, (3.19)
where K ⊂ Y is a closed convex set. Hereby, let f : Y ⊃ V → Z be defined on the open set V and assume that (3.19) possesses a solution ȳ ∈ V ∩ K.
Sometimes it is desirable to have an algorithm for (3.19) that stays feasible with respect to K. To achieve this, we augment Algorithm 3.13 by a projection onto K. We assume that an operator P_K : Y → K ⊂ Y is available with the following properties:
Assumption 3.16.
(i) P_K is a projection onto K, i.e., for all y ∈ Y holds ‖P_K(y) − y‖_Y = min_{v ∈ K} ‖v − y‖_Y.
(ii) For all y in an Y-neighborhood of ȳ holds
‖P_K(y) − ȳ‖_Y ≤ L_P ‖y − ȳ‖_Y
with a constant L_P > 0.
These two requirements are easily seen to be satisfied in all situations we encounter in this work. In particular, it holds with L_P = 1 if Y is a Hilbert space or if K = B and Y = L^p(Ω), p ∈ [1, ∞]. In the latter case, we use
P_B(u)(ω) = P_{[a(ω),b(ω)]}(u(ω)) = max{a(ω), min{u(ω), b(ω)}} on Ω,
which satisfies the assumptions (for p ∈ [1, ∞), P_B is the unique metric projection onto B).
We are now in a position to formulate the algorithm:
Algorithm 3.17 (Projected Inexact Semismooth Newton's Method).
0. Choose an initial point y^0 ∈ V ∩ K and set k = 0.
1. Choose a boundedly invertible operator B_k ∈ L(Y_0, Z), compute s^k ∈ Y_0 from
B_k s^k = −f(y^k),
and set y_0^{k+1} = y^k + s^k.
2. Perform a smoothing step: Y_0 ∋ y_0^{k+1} ↦ y_1^{k+1} = S_k(y_0^{k+1}) ∈ Y.
3. Project onto K: y^{k+1} = P_K(y_1^{k+1}).
4. If y^{k+1} = y^k, then STOP with result y* = y^{k+1}.
5. Increment k by one and go to step 1.
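For box constraints B = {u : a ≤ u ≤ b}, the projection used above is a pointwise clamp and costs essentially nothing per iteration. A discrete sketch of ours, including a check of Assumption 3.16 (ii) with L_P = 1:

    import numpy as np

    def proj_box(u, a, b):
        """P_B(u)(omega) = max(a(omega), min(u(omega), b(omega))) for grid functions."""
        return np.clip(u, a, b)

    rng = np.random.default_rng(2)
    a, b = -np.ones(100), np.ones(100)
    ubar = proj_box(rng.normal(size=100), a, b)      # a feasible point
    u = rng.normal(size=100)
    # clamping is nonexpansive pointwise, hence in every l^p norm:
    assert np.linalg.norm(proj_box(u, a, b) - ubar) <= np.linalg.norm(u - ubar) + 1e-12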
Remark 3.18.
(i) Since y^0 ∈ K and all iterates y^k, k ≥ 1, are obtained by projection onto K, we have y^k ∈ K for all k.
(ii) It is interesting to observe that by composing the smoothing step and the projection step, we obtain a step S_k^P(y_0^{k+1}) = P_K(S_k(y_0^{k+1})) that has the smoothing property in an Y_0-neighborhood of ȳ. In fact, for y_0^{k+1} near ȳ (in Y_0) holds by Assumptions 3.11 and 3.16
‖S_k^P(y_0^{k+1}) − ȳ‖_Y ≤ L_P ‖S_k(y_0^{k+1}) − ȳ‖_Y ≤ C_S L_P ‖y_0^{k+1} − ȳ‖_{Y_0}.
Theorem 3.19. Let f : Y ⊃ V → Z be an operator between Banach spaces, defined on the open set V, with generalized differential ∂f : V ⇉ L(Y, Z). Let K ⊂ Y be closed and convex with corresponding projection operator P_K and let ȳ ∈ V ∩ K be a solution of (3.19). Further, assume that f is Lipschitz continuous on K near ȳ and let the Assumptions 3.11, 3.14 (i), and 3.16 hold. Then:
(i) If f is ∂f-semismooth at ȳ, then there exists δ > 0 such that, for all y^0 ∈ (ȳ + δB_Y) ∩ K, Algorithm 3.17 either terminates with y* = ȳ or generates a sequence (y^k) ⊂ V ∩ K that converges q-superlinearly to ȳ in Y.
(ii) If in (i) the mapping f is α-order ∂f-semismooth at ȳ, 0 < α ≤ 1, and if Assumption 3.14 (ii) is satisfied, then the q-order of convergence is at least 1 + α.
Proof. We only sketch the modifications required to adjust the proof of Theorem 3.15 to the present situation. We choose δ > 0 sufficiently small to ensure that f is Lipschitz on K_δ = (ȳ + δB_Y) ∩ K. Then, for all y^k ∈ K_δ we can establish (3.15), (3.16), and, by reducing δ, (3.17) and (3.18). A further reduction of δ yields, instead of (3.10),
‖M_k v_0^{k+1}‖_Z ≤ (2 C_{M^{-1}} C_S L_P)^{−1} ‖v^k‖_Y
and thus, analogous to (3.11),
‖v_1^{k+1}‖_Y ≤ C_S ‖v_0^{k+1}‖_{Y_0} ≤ C_{M^{-1}} C_S ‖M_k v_0^{k+1}‖_Z ≤ (2 L_P)^{−1} ‖v^k‖_Y,
where v_1^{k+1} = y_1^{k+1} − ȳ. Hence, for δ small enough, Assumption 3.16 (ii) can be used to derive
‖v^{k+1}‖_Y ≤ L_P ‖v_1^{k+1}‖_Y ≤ ‖v^k‖_Y / 2.
The rest of the proof, including the one for part (ii), can be transcribed directly from Theorem 3.15. □
3.2.6 Alternative Regularity Conditions
In the convergence theorems we used the regularity condition of Assumption 3.11 (i), which requires uniform invertibility in L(Y_0, Z) of all operators M_k. Since M_k ∈ ∂f(y^k), we also could require the uniform invertibility of all M ∈ ∂f(y) on a neighborhood of ȳ; more precisely:
Assumption 3.20. There exist η > 0 and C_{M^{-1}} > 0 such that, for all y ∈ ȳ + ηB_Y, every M ∈ ∂f(y) is an invertible element of L(Y_0, Z) with ‖M^{−1}‖_{Z,Y_0} ≤ C_{M^{-1}}.
Then obviously holds:
Theorem 3.21. Let the operator f : Y → Z and a corresponding generalized differential ∂f : Y ⇉ L(Y, Z) be given. Denote by ȳ ∈ Y a solution of (3.7) and let Assumption 3.20 hold. Further assume that y^k ∈ ȳ + ηB_Y for all k. Then Assumption 3.11 (i) holds. In particular, the Theorems 3.12, 3.15, and 3.19 remain true if Assumption 3.11 (i) is replaced by Assumption 3.20.
Proof. The first part follows directly from the fact that M_k ∈ ∂f(y^k). The proofs of the Theorems 3.12, 3.15, and 3.19 can be applied without change as long as y^k ∈ ȳ + ηB_Y. In particular it follows for y^k ∈ ȳ + δB_Y and δ ∈ (0, η] small enough that y^{k+1} ∈ ȳ + (δ/2)B_Y ⊂ ȳ + ηB_Y, see, e.g., (3.12). Therefore, all iterates remain in ȳ + ηB_Y, and the proofs are applicable without change. □
Remark 3.22. For the projected Newton method, the requirement of Assumption 3.20 can be restricted to all y ∈ (ȳ + ηB_Y) ∩ K.
A further variant, which corresponds to the finite-dimensional CD-regularity, is obtained by restricting the bounded invertibility to all M ∈ ∂f(ȳ).
Assumption 3.23. The multifunction y ∈ Y ↦ ∂f(y) ⊂ L(Y_0, Z) is upper semicontinuous at ȳ, and there exists C_{M^{-1}} > 0 such that every M ∈ ∂f(ȳ) is invertible in L(Y_0, Z) with ‖M^{−1}‖_{Z,Y_0} ≤ C_{M^{-1}}.
Theorem 3.24. Assumption 3.23 implies Assumption 3.20. In particular, the Theorems 3.12, 3.15, and 3.19 remain true if Assumption 3.11 (i) is replaced by Assumption 3.23.
Proof. Let Assumption 3.23 hold and choose ε = 1/(2 C_{M^{-1}}). By upper semicontinuity there exists η > 0 such that ∂f(y) ⊂ ∂f(ȳ) + εB_{L(Y_0,Z)} for all y ∈ ȳ + ηB_Y. Now consider any y ∈ ȳ + ηB_Y and any M ∈ ∂f(y). Then there exists M̄ ∈ ∂f(ȳ) with
‖M − M̄‖_{Y_0,Z} < ε = 1/(2 C_{M^{-1}}) ≤ 1/(2 ‖M̄^{−1}‖_{Z,Y_0}).
Therefore, by Banach's theorem [91, p. 155], M is invertible in L(Y_0, Z) with
‖M^{−1}‖_{Z,Y_0} ≤ ‖M̄^{−1}‖_{Z,Y_0} / (1 − ‖M̄^{−1}‖_{Z,Y_0} ‖M̄ − M‖_{Y_0,Z}) ≤ C_{M^{-1}} / (1 − C_{M^{-1}}/(2 C_{M^{-1}})) = 2 C_{M^{-1}}.
Thus, Assumption 3.20 holds with C_{M^{-1}} replaced by 2 C_{M^{-1}}. □
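The quantitative perturbation bound used in this proof (Banach's lemma) is easy to test numerically. The following small check is ours; it uses the spectral norm, and the perturbation level 0.4 is an arbitrary illustrative choice.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 6
    Mbar = rng.normal(size=(n, n)) + n * np.eye(n)
    inv_bar = np.linalg.norm(np.linalg.inv(Mbar), 2)    # ||Mbar^-1||
    E = rng.normal(size=(n, n))
    E *= 0.4 / (inv_bar * np.linalg.norm(E, 2))         # ||Mbar^-1|| ||E|| = 0.4 < 1
    bound = inv_bar / (1.0 - inv_bar * np.linalg.norm(E, 2))
    assert np.linalg.norm(np.linalg.inv(Mbar + E), 2) <= bound + 1e-10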
Remark 3.25. Theorem 3.24 is conveniently applicable in finite dimensions. In the general Banach space setting, however, upper semicontinuity of ∂f with respect to the operator norm topology is a quite strong requirement. More realistic is usually upper semicontinuity with respect to the weak operator topology on the image space, which is generated by the seminorms M ↦ |⟨w, My⟩_{Z′,Z}|, w ∈ Z′, y ∈ Y_0. However, this weak form of upper semicontinuity is (except for the finite-dimensional case) not strong enough to establish results like in Theorem 3.24. In conclusion, we observe that in the infinite-dimensional setting the regularity conditions stated in Assumption 3.11 (i) and in Assumption 3.20 are much more widely applicable than Assumption 3.23.
3.3 Semismooth Newton Methods for Superposition Operators
We now concentrate on nonsmooth superposition operators of the form
Ψ : Y → L^r(Ω), Ψ(y)(ω) = ψ(G(y)(ω)), (3.20)
with mappings ψ : R^m → R and G : Y → Π_{i=1}^m L^{r_i}(Ω). Throughout we assume that 1 ≤ r ≤ r_i < ∞, Y is a real Banach space, and Ω ⊂ R^n is a bounded measurable set with positive Lebesgue measure.
Remark 3.26. Since all our investigations are of local nature, it would be sufficient if G were only defined on a nonempty open subset of Y. Having this in mind, we prefer to work on Y to avoid notational inconveniences.
Throughout, our investigations are illustrated by applications to the reformulated NCP
Φ(u) = 0, where Φ(u)(ω) = φ(u(ω), F(u)(ω)) on Ω, (3.21)
with F : L^p(Ω) → L^{p′}(Ω), p, p′ ∈ (1, ∞]. As already observed, Φ can be cast in the form Ψ.
3.3.1 Assumptions
In the rest of the chapter, we will impose the following assumptions on G and ψ:
Assumption 3.27. There are 1 ≤ r ≤ r_i < q_i ≤ ∞, 1 ≤ i ≤ m, such that
(a) The operator G : Y → Π_i L^{r_i}(Ω) is continuously Fréchet differentiable.
(b) The mapping y ∈ Y ↦ G(y) ∈ Π_i L^{q_i}(Ω) is locally Lipschitz continuous, i.e., for all y ∈ Y there exist an open neighborhood U = U(y) and a constant L_G = L_G(U) such that
Σ_i ‖G_i(y^1) − G_i(y^2)‖_{L^{q_i}} ≤ L_G ‖y^1 − y^2‖_Y for all y^1, y^2 ∈ U.
(c) The function ψ : R^m → R is Lipschitz continuous of rank L_ψ > 0, i.e.,
|ψ(x^1) − ψ(x^2)| ≤ L_ψ ‖x^1 − x^2‖_1 for all x^1, x^2 ∈ R^m.
(d) ψ is semismooth.
Remark 3.28. Since by assumption the set Ω is bounded, we have the continuous embedding L^q(Ω) ↪ L^p(Ω) whenever 1 ≤ p ≤ q ≤ ∞.
Remark 3.29. It is important to note that the norm of the image space in (b) is stronger than in (a).
For semismoothness of order > 0 we will strengthen Assumption 3.27 as follows:
Assumption 3.30. As Assumption 3.27, but with (a) and (d) replaced by: There exists α ∈ (0, 1] such that
(a) The operator G : Y → Π_i L^{r_i}(Ω) is Fréchet differentiable with locally α-Hölder continuous derivative.
(d) ψ is α-order semismooth.
Note that for the special case Y = Π_i L^{q_i}(Ω) and G = I we have Ψ : y ∈ Y ↦ ψ(y), and it is easily seen that the Assumptions 3.27 and 3.30, respectively, reduce to their parts (c) and (d).
Under Assumption 3.27, the operator Ψ defined in (3.20) is well defined and locally Lipschitz continuous.
Proposition 3.31. Let Assumption 3.27 hold. Then for all q with 1 ≤ q ≤ q_i, 1 ≤ i ≤ m, and thus in particular for q = r, the operator Ψ defined in (3.20) maps Y locally Lipschitz continuously into L^q(Ω).
Proof. Using Lemma A.4, we first prove Ψ(Y) ⊂ L^q(Ω), which follows from
‖Ψ(y)‖_{L^q} = ‖ψ(G(y))‖_{L^q} ≤ ‖ψ(0)‖_{L^q} + ‖ψ(G(y)) − ψ(0)‖_{L^q}
≤ c_{q,∞}(Ω) |ψ(0)| + L_ψ Σ_i ‖G_i(y)‖_{L^q} ≤ c_{q,∞}(Ω) |ψ(0)| + L_ψ Σ_i c_{q,q_i}(Ω) ‖G_i(y)‖_{L^{q_i}}.
To establish the local Lipschitz continuity, denote by L_G the local Lipschitz constant in Assumption 3.27 (b) on the set U and let y^1, y^2 ∈ U be arbitrary. Then, again by Lemma A.4,
‖Ψ(y^1) − Ψ(y^2)‖_{L^q} ≤ L_ψ Σ_i ‖G_i(y^1) − G_i(y^2)‖_{L^q} ≤ L_ψ Σ_i c_{q,q_i}(Ω) ‖G_i(y^1) − G_i(y^2)‖_{L^{q_i}}
≤ L_ψ L_G (max_{1 ≤ i ≤ m} c_{q,q_i}(Ω)) ‖y^1 − y^2‖_Y. □
For the special case Φ in (3.21), the nonsmooth NCP-reformulation, and the choices
Y = L^p(Ω), q_1 = p, q_2 = p′, r_2 = r ∈ [1, p′) ∩ [1, p), r_1 ∈ [r, p), ψ ≡ φ, G(u) = (u, F(u)), (3.22)
we have Ψ ≡ Φ, and Assumption 3.27 can be expressed in the following simpler form:
Assumption 3.32. There exists r ∈ [1, p) ∩ [1, p′) such that
(a) The mapping u ∈ L^p(Ω) ↦ F(u) ∈ L^r(Ω) is continuously Fréchet differentiable.
(b) The operator F : L^p(Ω) → L^{p′}(Ω) is locally Lipschitz continuous.
(c) The function φ : R^2 → R is Lipschitz continuous.
(d) φ is semismooth.
In fact, (a) and the continuous embedding L^p(Ω) ↪ L^{r_1}(Ω) imply 3.27 (a). Further, (b) and the Lipschitz continuity of the identity u ∈ L^p(Ω) ↦ u ∈ L^p(Ω) yield 3.27 (b). Finally, (c), (d) imply 3.27 (c), (d).
In the same way, Assumption 3.30 for Φ becomes
Assumption 3.33. As Assumption 3.32, but with (a) and (d) replaced by: There exist r ∈ [1, p) ∩ [1, p′) and α ∈ (0, 1] such that
(a) The operator F : L^p(Ω) → L^r(Ω) is Fréchet differentiable with locally α-Hölder continuous derivative.
(d) φ is α-order semismooth.
Remark 3.34. The three different L^p-spaces deserve an explanation. Usually, we have the following scenario: F : L^2(Ω) → L^2(Ω) is (often even twice) continuously differentiable and has the property that there exist p, p′ > 2 such that the mapping u ∈ L^p(Ω) ↦ F(u) ∈ L^{p′}(Ω) is locally Lipschitz continuous. A typical example arises from optimal control problems such as the problem (1.11) that we discussed in section 1.1.1. In this problem, which in view of many applications can be considered to be typical, F = j′ is the reduced gradient of the control problem, which, in adjoint representation, is given by F(u) = λu − w(u), where w(u) is the adjoint state. The mapping u ↦ w(u) is locally Lipschitz continuous (for the problem under consideration even affine linear) from L^2(Ω) to H_0^1(Ω) and thus, via continuous embedding, also to L^{p′}(Ω) for suitable p′ > 2. Hence, for any p ≥ p′, F maps L^p(Ω) locally Lipschitz continuously to L^{p′}(Ω). Often, we can invoke regularity results for the adjoint equation to prove the local Lipschitz continuity of the mapping u ∈ L^2(Ω) ↦ w(u) ∈ H_0^1(Ω) ∩ H^2(Ω), which allows to choose p′ even larger, if desired. Therefore, as a rule of thumb, usually we are dealing with the case where F is smooth as a mapping L^2(Ω) → L^2(Ω) and locally Lipschitz continuous as a mapping L^p(Ω) → L^{p′}(Ω), p, p′ > 2. Obviously, these conditions imply the weaker Assumption 3.32 for 1 ≤ r ≤ 2 and p, p′ > 2 as specified.
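The NCP reformulation (3.21)-(3.22) is easy to realize on a grid. The following toy sketch of ours (all data are illustrative choices, not from the text) evaluates Φ(u) = φ_FB(u, F(u)) pointwise and confirms that a constructed u solves the complementarity system:

    import numpy as np

    def phi_fb(a, b):
        return a + b - np.hypot(a, b)       # Fischer-Burmeister, applied pointwise

    def Phi(u, F):
        """Discrete version of (3.21): Phi(u)(omega) = phi(u(omega), F(u)(omega))."""
        return phi_fb(u, F(u))

    omega = np.linspace(0.0, 1.0, 101)
    F = lambda u: u + omega - 0.5           # toy operator on Omega = (0, 1)
    u = np.maximum(0.0, 0.5 - omega)        # candidate solution
    print(np.abs(Phi(u, F)).max())          # ~ 1e-16: u >= 0, F(u) >= 0, u F(u) = 0

The completely pointwise character of Φ is exactly what the generalized differential of the next subsection exploits.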
3.3.2 A Generalized Differential
For the development of a semismoothness concept for the operator Ψ defined in (3.20) we have to choose an appropriate generalized differential. As we already mentioned in the introduction, our aim is to work with a differential that is as closely connected to finite-dimensional generalized Jacobians as possible. Hence, we will propose a generalized differential ∂°Ψ in such a way that its natural finite-dimensional discretization contains Qi's C-subdifferential.
Our construction is motivated by a formal pointwise application of the chain rule. In fact, suppose for the moment that the operator y ∈ Y ↦ G(y) ∈ C(Ω̄)^m is continuously differentiable, where C(Ω̄) denotes the space of continuous functions equipped with the max-norm. Then for fixed ω ∈ Ω the function f : y ↦ G(y)(ω) is continuously differentiable with derivative
f′(y) ∈ L(Y, R^m), f′(y) : v ↦ (G′(y)v)(ω).
The chain rule for generalized gradients [32] applied to the real-valued mapping y ↦ Ψ(y)(ω) = ψ(f(y)) yields
∂(Ψ(y)(ω)) ⊂ ∂ψ(f(y)) ∘ f′(y) = {g ∈ Y′ : ⟨g, v⟩_{Y′,Y} = Σ_i d_i(ω) (G_i′(y)v)(ω), d(ω) ∈ ∂ψ(G(y)(ω))}. (3.23)
Furthermore, we can replace ⊂ by = if ψ is regular (e.g., convex or concave) or if the linear operator f′(y) is onto, see [32].
Inspired by the idea of the finite-dimensional C-subdifferential, and following the above motivation, we return to the general setting of Assumption 3.27, and define the generalized differential ∂°Ψ(y) in such a way that for all M ∈ ∂°Ψ(y), the linear form v ↦ (Mv)(ω) is an element of the right hand side in (3.23):
Definition 3.35. Let Assumption 3.27 hold. For Ψ as defined in (3.20) we define the generalized differential ∂°Ψ : Y ⇉ L(Y, L^r),
∂°Ψ(y) := {M ∈ L(Y, L^r) : M : v ↦ Σ_i d_i · (G_i′(y)v), d measurable selection of ∂ψ(G(y))}. (3.24)
Remark 3.36. The superscript ° is chosen to indicate that this generalized differential is designed for superposition operators.
The generalized differential ∂°Ψ(y) is nonempty. To show this, we first prove:
Lemma 3.37. Let Assumption 3.27 (a) hold and let d ∈ L^∞(Ω)^m be arbitrary. Then the operator M : v ∈ Y ↦ Σ_i d_i · (G_i′(y)v) is an element of L(Y, L^r) and
‖M‖_{Y,L^r} ≤ Σ_i c_{r,r_i}(Ω) ‖d_i‖_{L^∞} ‖G_i′(y)‖_{Y,L^{r_i}}. (3.25)
Proof. By Assumption 3.27 (a) and Lemma A.4,
‖Mv‖_{L^r} = ‖Σ_i d_i (G_i′(y)v)‖_{L^r} ≤ Σ_i ‖d_i‖_{L^∞} ‖G_i′(y)v‖_{L^r}
≤ (Σ_i c_{r,r_i}(Ω) ‖d_i‖_{L^∞} ‖G_i′(y)‖_{Y,L^{r_i}}) ‖v‖_Y
for all v ∈ Y, which shows that (3.25) holds and M ∈ L(Y, L^r). □
In a next step, we show that the multifunction ∂ψ(G(y)) : ω ∈ Ω ⇉ ∂ψ(G(y)(ω)) ⊂ R^m is measurable (see Definition A.7 or [129, p. 16]).
Lemma 3.38. Any closed-valued, upper semicontinuous multifunction Γ : R^k ⇉ R^l is Borel measurable.
Proof. Let C ⊂ R^l be compact. We show that Γ^{−1}(C) is closed. To this end, let x^k ∈ Γ^{−1}(C) be arbitrary with x^k → x*. Then there exist z^k ∈ Γ(x^k) ∩ C, and, due to the compactness of C, we achieve by transition to a subsequence that z^k → z* ∈ C. Since x^k → x*, upper semicontinuity yields that there exist ẑ^k ∈ Γ(x*) with (z^k − ẑ^k) → 0 and thus ẑ^k → z*. Therefore, since Γ(x*) is closed, we obtain z* ∈ Γ(x*) ∩ C. Hence, x* ∈ Γ^{−1}(C), which proves that Γ^{−1}(C) is closed and therefore a Borel set. □
Corollary 3.39. The multifunction ∂ψ(G(y)) : Ω ⇉ R^m is measurable.
Proof. By Lemma 3.38, the compact-valued and upper semicontinuous multifunction ∂ψ is Borel measurable. Now, for all closed sets C ⊂ R^m, we have, setting w = G(y) ∈ Π_i L^{r_i}(Ω),
∂ψ(G(y))^{−1}(C) = {ω ∈ Ω : w(ω) ∈ (∂ψ)^{−1}(C)}.
This set is measurable, since (∂ψ)^{−1}(C) is a Borel set and w is a (class of equivalent) measurable function(s). □
The next result is a direct consequence of Lipschitz continuity, see [32, 2.1.2].
Lemma 3.40. Under Assumption 3.27 (c) holds ∂ψ(x) ⊂ [−L_ψ, L_ψ]^m for all x ∈ R^m.
Combining this with Corollary 3.39 yields:
Lemma 3.41. Let Assumption 3.27 hold. Then for all y ∈ Y, the set
K(y) = {d : Ω → R^m : d measurable selection of ∂ψ(G(y))} (3.26)
is a nonempty subset of L_ψ B_{L^∞}^m ⊂ L^∞(Ω)^m.
Proof. By the Theorem on Measurable Selections [129, Cor. 1C] and Corollary 3.39, ∂ψ(G(y)) admits at least one measurable selection d : Ω → R^m, i.e., d(ω) ∈ ∂ψ(G(y)(ω)) a.e. on Ω. From Lemma 3.40 follows d ∈ L_ψ B_{L^∞}^m. □
We now can prove:
Proposition 3.42. Under Assumption 3.27, for all y ∈ Y the generalized differential ∂°Ψ(y) is nonempty and bounded in L(Y, L^r).
Proof. Lemma 3.41 ensures that there exist measurable selections d of ∂ψ(G(y)) and that all these d are contained in L_ψ B_{L^∞}^m. Hence, Lemma 3.37 shows that M : v ↦ Σ_i d_i (G_i′(y)v) is in L(Y, L^r). The boundedness of ∂°Ψ(y) follows from (3.25). □
We now have everything at hand to introduce a semismoothness concept that is based on the generalized differential ∂°Ψ. We postpone the investigation of further properties of ∂°Ψ to sections 3.4 and 3.5. There, we will establish chain rules, the convex-valuedness, weak compact-valuedness, and the weak graph closedness of ∂°Ψ.
3.3.3 Semismoothness of Superposition Operators
In this section, we prove the main result of this chapter, which asserts that under Assumption 3.27 the operator Ψ is ∂°Ψ-semismooth. Under Assumption 3.30 and a further condition we establish ∂°Ψ-semismoothness of order > 0. For convenience, we will use the term semismoothness instead of ∂°Ψ-semismoothness in the sequel. Therefore, applying the general Definition 3.1 to the current situation, we have:
Definition 3.43. The operator Ψ is called (∂°Ψ-)semismooth at y ∈ Y if it is continuous near y and
sup_{M ∈ ∂°Ψ(y+s)} ‖Ψ(y + s) − Ψ(y) − Ms‖_{L^r} = o(‖s‖_Y) as s → 0 in Y. (3.27)
Ψ is α-order (∂°Ψ-)semismooth at y ∈ Y, 0 < α ≤ 1, if it is continuous near y and
sup_{M ∈ ∂°Ψ(y+s)} ‖Ψ(y + s) − Ψ(y) − Ms‖_{L^r} = O(‖s‖_Y^{1+α}) as s → 0 in Y. (3.28)
In the following main theorems we establish the semismoothness and the β-order semismoothness, respectively, of the operator Ψ.
Theorem 3.44. Under Assumption 3.27, the operator Ψ is semismooth on Y.
Under slightly stronger assumptions, we can also establish β-order semismoothness of Ψ:
Theorem 3.45. Let Assumption 3.30 hold and let y ∈ Y. Assume that there exists γ > 0 such that the set
Ω_ε = {ω : max_{‖h‖_1 ≤ ε} (ρ(G(y)(ω), h) − ε^{−α} ‖h‖_1^{1+α}) > 0}, ε > 0,
with the residual function ρ : R^m × R^m → R given by
ρ(x, h) = max_{z^T ∈ ∂ψ(x+h)} |ψ(x + h) − ψ(x) − z^T h|,
has the following decrease property:
µ(Ω_ε) = O(ε^γ) as ε → 0^+. (3.29)
Then the operator Ψ is β-order semismooth at y with
β = min{γν / (1 + γ/q), αγν / (α + γν)}, where q = min_{1 ≤ i ≤ m} q_i, ν = (q − r)/(qr) if q < ∞, ν = 1/r if q = ∞. (3.30)
The proofs of both theorems will be presented in section 3.3.5.
Remark 3.46. Condition (3.29) requires the measurability of the set Ω_ε, which will be verified in the proof.
Remark 3.47. As we will see in Lemma 3.54, it would be sufficient to require only the local β-order Hölder continuity of G′ in Assumption 3.30 (a), with β ≤ α as defined in (3.30).
It might be helpful to give an explanation of the abstract condition (3.29). For convenient notation, let x = G(y)(ω). Due to the α-order semismoothness of ψ provided by Assumption 3.30, we have
ρ(x, h) = O(‖h‖_1^{1+α}) as h → 0.
In essence, Ω_ε is the set of all ω ∈ Ω where there exists h ∈ εB_1^m for which this asymptotic behavior is not yet observed, because the remainder term ρ(x, h) exceeds ‖h‖_1^{1+α} by a factor of at least ε^{−α}, which grows infinitely as ε → 0. From the continuity of the Lebesgue measure it is clear that µ(Ω_ε) → 0 as ε → 0. The decrease condition (3.29) essentially states that the measure of the set Ω_ε where G(y) takes "bad" values, i.e., values at which the radius of small residual is very small, decreases with the rate ε^γ.
The following subsection applies Theorem 3.44 and Theorem 3.45 to reformulated nonlinear complementarity problems. Furthermore, it provides a very concrete interpretation of condition (3.29).
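Estimating the exponent γ in (3.29) is straightforward for concrete data. For NCPs the bad set will turn out to be {0 < |u| + |F(u)| < ε} (made precise in the next subsection); the following Monte Carlo sketch of ours treats a toy pair on Ω = (0, 1) with |u| + |F(u)| = |ω − 1/2|, for which the measure scales like 2ε, i.e., γ = 1:

    import numpy as np

    rng = np.random.default_rng(4)
    omega = rng.uniform(0.0, 1.0, size=2_000_000)
    g = np.abs(omega - 0.5)                  # plays the role of |u| + |F(u)|
    for eps in [0.1, 0.01, 0.001]:
        mu = np.mean((0 < g) & (g < eps))    # estimate of mu({0 < g < eps})
        print(eps, mu)                       # mu ~ 2*eps, hence gamma = 1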
56 Application to NCPs 3.3 Semismooth Newton Methods for Superposition Operators 51 We apply the semismoothness result to the operator Φ that arises in the reformulation (3.21) of nonlinear complementarity problems (3.5). In this situation, Assumption 3.27 can be expressed in form of Assumption Hence, Theorem 3.44 becomes Theorem Under Assumption 3.27, the operator Φ : L p (Ω) L r (Ω) defined in (3.21) is semismooth on L p (Ω). Remark Due to the structure of Φ, we have for all M Φ(u) and v L p (Ω) Mv = d 1 v + d 2 (F (y)v ), (3.31) where d L (Ω) 2 is a measurable selection of φ ( u, F (u) ). Theorem 3.45 is applicable as well. Once we have chosen a particular NCP-function, condition (3.29) can be made very concrete, so that we can write Theorem 3.45 in a more elegant form. We discuss this for the Fischer Burmeister function φ = φ F B, which is Lipschitz continuous and 1-order semismooth, and thus satisfies Assumptions 3.3 (c) and (d) with α = 1. Then holds Theorem 3.5. Let the Assumptions 3.33 (a), (b) hold and consider the operator Φ with φ = φ F B. Assume that for u L p (Ω) there exists γ > such that µ ({ < u + F (u) < ε}) = O(ε γ ) as ε. (3.32) Then Φ is β-order semismooth at u with { } γν β = min 1 + γ/q, αγν, where α + γν q = min{p, p }, ν = q r if q <, qr ν = 1 r if q =. (3.33) Proof. We only have to establish the equivalence of (3.29) and (3.32). Obviously, this follows easily when we have established the following relation: { { < G(u) 1 < ε} Ω ε < G(u) 1 < ( /2) } ε (3.34) with G(u) = ( u, F (u) ). The function φ = φ F B is C on R 2 \ {}, see section 2.5.2, with derivative φ (x) = (1, 1) x T / x 2. To show the first inclusion in (3.34), let ω be such that x = G(u)(ω) satisfies < x 1 < ε. We observe that, for all λ R, there holds and thus, for all σ >, φ(λx) = λ(x 1 + x 2 ) λ x 2,
57 52 3. Newton Methods for Semismooth Operator Equations ρ(x, (1 + σ)x) = σ x 2 + x 2 + (1 + σ) xt x x = 2 x 2. Hence, for the choice h = tx with t (1, 2) such that h 1 ε, we obtain ρ(x, h) = 2 x x 1 = t h 1 > h 1 ε α h 1+α 1. This implies ω Ω ε and thus proves the first inclusion. Next, we prove the second inclusion in (3.34). On R 2 \ {} there holds φ (x) = 1 ( ) x 2 2 x 1 x 2 x 3 x 2 1 x 2 x 2. 1 The eigenvalues of φ (x) are and x 1 2. In particular, we see that φ (x) 2 = x 1 2 explodes as x. If / [x, x + h], then Taylor expansion of φ(x) about x + h yields with appropriate τ [, 1] ρ(x, h) = φ(x + h) φ(x) φ (x + h)h = 1 2 ht φ (x + τh)h h x + τh 2. Further, ρ(, h) = and ρ(x, ) =. Now consider any ω Ω that is not contained in the right hand side of (3.34) and set x = G(u)(ω). If x = then certainly ω / Ω ε, since then ρ(x, ). If on the other hand x 1 ( /2) ε then we have for all h ε B 2 1 ρ(x, h) h x + τh 2 h x + τh 1 ε 1 h 2 1 ε α h 1+α 1, and thus ω / Ω ε. Remark The meaning of (3.29), which was shown to be equivalent to (3.32), can be interpreted in the following way: The set { < u + F (u) < ε} on which the decrease rate in measure is assumed is the set of all ω where strict complementarity holds, but is less than ε. In a neighborhood of these points the curvature of φ is very large since φ (G(u)(ω)) 2 = G(u)(ω) 1 2 is big. This requires that G(u + s)(ω) G(u)(ω) must be very small in order to have a sufficiently small residual ρ ( G(u)(ω), G(u + s)(ω) G(u)(ω) ). We stress that a violation of strict complementarity, i.e., u(ω) = F (u)(ω) = does not cause any problems since then ρ(g(u)(ω), ) ρ(, ) Illustrations In this section we give two examples to illustrate the above analysis by pointing out the necessity of the main assumptions and by showing that the derived results cannot be improved in several respects:
58 3.3 Semismooth Newton Methods for Superposition Operators 53 Example 3.52 shows the necessity of the norm gap between the L q i - and L r - norms. Example 3.53 discusses the sharpness of our order of semismoothness β in Theorem 3.44 for varying values of γ. In order to prevent our examples from being too academical, we will not work with the simplest choices possible. Rather, we will throughout use reformulations of NCPs based on the Fischer Burmeister function. In the proofs of Theorem 3.44 and Theorem 3.45, more precisely in the derivation of (3.41) and (3.42), we need the gap between the L q i - and L r -norms in order to apply Hölder s inequality. The following example illustrates that both theorems do in general not hold if we drop the condition r i < q i in the Assumptions 3.27 and 3.3. Example 3.52 (Necessity of the L q i -L r -norm gap). We consider the operator Φ arising in semismooth reformulations of the NCP by means of the Fischer Burmeister function. Theorem 3.48 ensures that, under Assumption 3.32, Φ is semismooth. Our aim here is to show that the requirement r < q = min{p, p } is indispensable in the sense that in general (3.27) (with Ψ Φ) is violated for r q. In section 3.2 we developed and analyzed semismooth Newton methods. A central requirement for superlinear convergence is the semismoothness of the underlying operator at the solution. Hence, we will construct a simple NCP with a unique solution for which (3.27) fails to hold whenever r q. Let 1 < p be arbitrary, choose Ω = (, 1), and set F (u)(ω) = u(ω) + ω. Obviously, ū is the unique solution of the NCP. Choosing p = p, φ = φ F B, and α = 1, the Assumptions 3.27 and 3.3 are satisfied for all r [1, p). To show that the requirement r < p is really necessary to obtain the semismoothness of Φ we will investigate the residual R(s) def = Φ(ū + s) Φ(ū) Ms, M Φ(ū + s), (3.35) at ū with s L (Ω), s, s. Our aim is to show that for all r [1, ] holds R(s) L r = o( s L p) as s in L = r < p. (3.36) Setting σ = s(ω), we have for all ω (, 1) (Ms)(ω) = d 1 (ω)s(ω) + d 2 (ω)(f ()s)(ω) = d 1 (ω)σ + d 2 (ω)σ d(ω) φ ( s(ω), F (s)(ω) ) = φ(σ, σ + ω) = {φ (σ, σ + ω)}, with where we have used σ + ω > and that φ is smooth at x. Hence, with e = (1, 1) T, noting that the linear part of φ cancels in R(s)(ω), we derive
59 54 3. Newton Methods for Semismooth Operator Equations R(s)(ω) = φ(σ, σ + ω) φ(, ω) φ (σ, σ + ω)σe = (σ, σ + ω) 2 + (, ω) 2 + = ω σ2 + (σ + ω) 2 σ(2σ + ω) (σ, σ + ω) 2 σ(σ, σ + ω)e (σ, σ + ω) 2 = ω ω(σ + ω) (2σ 2 + 2σω + ω 2 ) 1/2. def Now let < ε < 1. For the special choice s ε = ε1 (,ε), i.e., s ε (ω) = ε for ω (, ε) and s ε (ω) =, otherwise, we obtain s ε L p = ε p+1 p (1 < p < ), s ε L = ε. In particular, s ε in L as ε. For < ω < ε holds ( R(s ε )(ω) ω 1 sup <t<1 ) 1 + t 2 + 2t + t 2 Hence, R(s ε ) L ε 1 s ε Lp, and for all r [p, ) 1 = ω ω 5 1. R(s ε ) L r 1 1 ( ε ) 1 ω r r ε r+1 r dω = 1(r + 1) 1 r s ε L p. 1(r + 1) 1 r Therefore, (3.36) is proven. This shows that in (3.27) the norm on the left must be stronger than on the right. Next, we show that, at least in the case q (1 + α)r, the order of our semismoothness result is sharp. By showing this for varying values of γ, we also observe that decreasing values of γ reduce the maximum order of semismoothness exactly as stated in Theorem Hence, our result does not overestimate the role of γ. Example 3.53 (Order of semismoothness and its dependence on γ). We consider the following NCP, which generalizes the one in Example 3.52: Let 1 < p be arbitrary, set Ω = (, 1), and choose F (y)(ω) = u(ω) + ω θ, θ >. Obviously, ū is the unique solution of the NCP. Choosing p = p, φ = φ F B, and α = 1, Assumption 3.3 is satisfied for all r [1, p). From F (ū)(ω) = (, ω θ ) follows that γ = 1/θ is the maximum value for which condition (3.32), and thus the equivalent condition (3.29), is satisfied. With the residual R(s) as defined in (3.35) we obtain R(s)(ω) = ω θ ω θ (s(ω) + ω θ ) 2s(ω)2 + 2s(ω)ω θ + ω 2θ. def For ε (, 1) and s ε = ε θ 1 (,ε) we have
60 Further, for < ω < ε holds 3.3 Semismooth Newton Methods for Superposition Operators 55 s ε L p = ε pθ+1 p (1 < p < ), s ε L = ε θ. R(s ε )(ω) ω θ (1 sup <t<1 Hence, for all r [1, p) ) 1 + t 2 + 2t + t 2 = ω θ ωθ 5 1. R(s ε ) L r 1 ( ε ) 1 ω rθ r ε rθ+1 r dω = 1 1(rθ + 1) 1 r prθ+p s prθ+r ε L p 1(rθ + 1) 1 r γν 1+γ/q = s ε 1+ L p 1(rθ + 1) 1 r with q = p = p, γ = 1/θ and ν as in (3.3). This shows that the value of β given in Theorem 3.44 is sharp for all values of θ (and thus γ) at least as long as q (1 + α)r, which in the current setting can be written as p (1 + α)r. We think that in the case q > (1+α)r our value of β could still be slightly improved by splitting Ω in more than the two parts Ω βε and Ωβε c by choosing different values ε k for ε that correspond to different powers of v Πi L q i. In order to keep the analysis as clear as possible, we do not pursue this idea any further in the current work Proof of the Main Theorems We can simplify the analysis by exploiting the following fact. Lemma Let the Assumptions 3.27 hold and suppose that the operator Λ : u i Lq i (Ω) ψ(u) L r (Ω) is semismooth. Then the operator Ψ : Y L r (Ω) defined in (3.2) is also semismooth. Further, if the Assumptions 3.3 hold and Λ is α-order semismooth then Ψ is α-order semismooth. Proof. We first observe that, given any M Ψ(y +s), there is M Λ Λ ( G(y + s) ) such that M = M Λ G (y + s). In fact, there exists a measurable selection d L (Ω) m of ψ(ω) such that M = i d i G i (y + s), and obviously M Λ : v i d iv i yields an element of Λ ( G(y + s) ) with the desired property. A more general chain rule will be established in Theorem Setting g = G(y), v = G(y + s) G(y), and w = G(y + s), we have sup Ψ(y + s) Ψ(y) Ms L r M Ψ(y+s) sup Λ(w) Λ(g) M Λ G (y + s)s L r M Λ Λ(w)
61 56 3. Newton Methods for Semismooth Operator Equations sup Λ(w) Λ(g) M Λ v L r M Λ Λ(w) + sup ( M Λ G(y + s) G(y) G (y + s)s ) L r M Λ Λ(w) def = ρ Λ + ρ MG. By the local Lipschitz continuity of G and the semismoothness of Λ, we obtain ρ Λ = o( v Πi L q i ) = o( s Y ) as s in Y. Further, since d L ψ Bm L by Lemma 3.41, we have by Assumption 3.27 (a) ρ MG L r L ψ i G i(y + s) G i (y) G i (y + s)s L r L ψ i c r,r i (Ω) G i (y + s) G i (y) G i(y + s)s L r i = o( s Y ) as s in Y. This proves the first result. Now let the Assumptions 3.3 hold and Λ be α-order semismooth. Then ρ Λ and ρ MG are both of the order O ( s 1+α ) Y, which implies the second assertion. For the proof of Theorems 3.44 and 3.45 we need, as a technical intermediate result, the Borel measurability of the function ρ : R m R m R, ρ(x, h) = max ψ(x + h) ψ(x) z T h. (3.37) z T ψ(x+h) We prove this by showing that ρ is upper semicontinuous. Readers familiar with this type of results might want to skip the proof of Lemma Recall that a function f : R l R is upper semicontinuous at x if lim sup f(x ) f(x). x x Equivalently, f is upper semicontinuous if and only if {x : f(x) a} is closed for all a R. Lemma Let f : (x, z) R l R m R be upper semicontinuous. Moreover, let the multifunction Γ : R l R m be upper semicontinuous and compact-valued. Then the function g : R l R, g(x) = max z Γ (x) f(x, z), is well-defined and upper semicontinuous. Proof. For x R l, let (z k ) Γ (x) be such that lim f(x, z k) = sup f(x, z). k z Γ (x)
62 3.3 Semismooth Newton Methods for Superposition Operators 57 Since Γ (x) is compact, we may assume that z k z (x) Γ (x). Now, by upper semicontinuity of f, f ( x, z (x) ) lim sup f(x, z k ) = sup f(x, z) f ( x, z (x) ). k z Γ (x) Thus, g is well-defined and there exists z : R l R m with g(x) = f ( x, z (x) ). We now prove the upper semicontinuity of g at x. Let (x k ) R l tend to x in such a way that lim g(x k) = lim sup g(x ), k x x and set z k = z (x k ) Γ (x k ). By the upper semicontinuity of Γ there exists (ẑ k ) Γ (x) with (ẑ k z k ) as k. Since Γ (x) is compact, a subsequence can be selected such that the sequence (ẑ k ), and thus (z k ), converges to some ẑ Γ (x). Now, using that f is upper semicontinuous and ẑ Γ (x), lim sup g(x ) = lim g(x k) = lim f(x k, z k ) x x k k = lim sup f(x k, z k ) f(x, ẑ) g(x). k Therefore, g is upper semicontinuous at x. Lemma Let ψ : R m R be locally Lipschitz continuous. Then the function ρ defined in (3.37) is well-defined and upper semicontinuous. Proof. Since ψ is upper semicontinuous and compact-valued, the multifunction (x, h) R m R m ψ(x + h) is upper semicontinuous and compact-valued as well. Further, the mapping (x, h, z) ψ(x + h) ψ(x) z T h is continuous, and we may apply Lemma 3.55, which yields the assertion. Proof of Theorem 3.44 By Lemma 3.54, it suffices to prove the semismoothness of the operator Λ : u i Lq i (Ω) ψ(u) L r (Ω). (3.38) In Lemma 3.56 we showed that the function ρ : R m R m R, ρ(x, h) = max ψ(x + h) ψ(x) z T h, z T ψ(x+h) is upper semicontinuous and thus Borel measurable. Hence, for u, v i Lr i (Ω), the function ρ(u, v) is measurable. We define the measurable function
63 58 3. Newton Methods for Semismooth Operator Equations a = ρ(u, v) v {v=}. Since ρ ( u(ω), v(ω) ) = whenever v(ω) =, we obtain Furthermore, a(ω) = ρ ( u(ω), v(ω) ) v(ω) {v=} (ω) = ρ(u, v) = a v 1. Due to the Lipschitz continuity of ψ, we have ( ) o v(ω) 1 as v(ω). v(ω) {v=} (ω) (3.39) ρ(x, h) 2L ψ h 1, (3.4) which implies a 2L ψ BL. Now let (v k ) tend to zero in the space i Lq i (Ω) and set a k = a v=vk. Then every subsequence of (v k ) contains itself a subsequence (v k ) such that v k a.e. on Ω. By (3.39), this implies a k a.e. on Ω. Since (a k ) is bounded in L (Ω), we conclude lim k a k L t = for all t [1, ). Hence, in L t (Ω), 1 t <, zero is an accumulation point of every subsequence of (a k ). This proves a k in all spaces L t (Ω), 1 t <. Since the sequence (v k ), v k, was arbitrary, we thus have proven that for all 1 t < holds a L t as v Πi L q i. Now we can use Hölder s inequality to obtain ρ(u, v) Lr (Ω) av i L r a L i i p i v i L q i ( max a ) L p i v Πi L q i 1 i m = o ( ) v Πi L q i as v Πi L q i, (3.41) q ir where p i = q i r if q i < and p i = r if q i =. Note that here we exploited the fact that r < q i. This proves the semismoothness of Λ. Proof of Theorem 3.45 Also here, by Lemma 3.54, it suffices to prove the β-order semismoothness of the operator Λ defined in (3.38). We now suppose that the Assumption 3.3 and, in addition, (3.29) hold. First, note that for fixed ε > the function (x, h) R m R m ρ(x, h) ε α h 1+α 1
64 3.3 Semismooth Newton Methods for Superposition Operators 59 is upper semicontinuous and that the multifunction x R m ε B m 1 is compact-valued and upper semicontinuous. Hence, by Lemma 3.55, the function ( x R m max ρ(x, h) ε α h 1+α ) 1 h 1 ε is upper semicontinuous and therefore Borel measurable. This proves the measurability of the set Ω ε appearing in (3.29). For ε > and < β α we define the set Ω βε = {ω : ρ ( u(ω), v(ω) ) > ε β v(ω) 1+β 1 }, and observe that Ω βε Ω ε { v 1 > ε} def = Ω ε Ω ε. In fact, let ω Ω βε be arbitrary. The nontrivial case is v(ω) 1 ε. We then obtain for h = v(ω) ρ ( u(ω), h ) > ε β h 1+β 1 = ε α ε α β h 1+β 1 ε α h α β 1 h 1+β 1 = ε α h 1+α 1, and thus, since h 1 ε, ( ( ) max ρ u(ω), h ε α h 1+α ) 1 >, h 1 ε showing that ω Ω ε. In the case q = min q i < we derive the estimate 1 i m µ(ω ε ) = µ ({ v 1 > ε}) ε 1 v 1 q L q (Ω ε ( ) ) ε q max c q,q i (Ω ε ) q v q Π i i L q i = ε q O ( v q ) Π i L q i. If we choose ε = v λ Π i L q i, < λ < 1, then ( ) ( µ(ω βε ) µ(ω ε ) + µ(ω ε) = O v γλ Π i L q i + O v (1 λ)q Π i L q i This estimate is also true in the case q = since then µ(ω ε ) = as soon as v Πi L q i < 1. This can be seen by noting that then for a.a. ω Ω holds v(ω) 1 v 1 L v Πi L q i v λ Π i L q i = ε. Introducing ν = q r q if q r < and ν = 1/r, otherwise, for all < β α, we obtain, using (3.4) and Lemma A.4 ).
65 6 3. Newton Methods for Semismooth Operator Equations ρ(u, v) Lr (Ω βε ) 2L ψ v 1 Lr (Ω βε ) 2L ψ c r,q (Ω βε ) v L q (Ωβε ) m 2L ψ µ(ω βε ) ν v L q (Ωβε ) ( ) ( m = O v 1+γλν Π i L q i + O v 1+(1 λ)νq Π i L q i ). (3.42) Again, we have used here the fact that r < q q i, which allowed us to take advantage of the smallness of the set Ω βε. Finally, on Ωβε c, (1 +β)r q, < β α, holds with our choice ε = v λ Π i L q i ρ(u, v) Lr (Ωβε c ) ε β v 1+β 1 Lr (Ωβε c ) c q r, (Ωc 1+β βε) v βλ ( ) = O v 1+β(1 λ) Π i L q i. Π i L q i v 1+β L q (Ωβε c )m Therefore, ρ(u, v) L r ( = O v 1+γλν Π i L q i ) ( + O v 1+(1 λ)νq Π i L q i ) ( + O v 1+β(1 λ) Π i L q i ). We now choose < λ < 1 and β > with β α, (1 + β)r q in such a way that the order of the right hand side is maximized. In the case (1 + α)r q the minimum of all three exponents is maximized for the choice β = q r = νq r and λ = q γ+q. Then all three exponents are equal to 1 + γνq γ+q and thus ρ(u, v) L r ( = O v 1+ γνq γ+q Π i L q i ). (3.43) If, on the other hand, (1 + α)r < q then the third exponent is smaller than the second one for all < λ < 1 and < β α. Further, it is not difficult to see that under these constraints the first and third exponent become maximal for β = α and λ = α α+γν and attain the value 1 + αγν α+γν. Hence, ρ(u, v) L r = O ( v αγν 1+ α+γν Π i L q i ). (3.44) Combining (3.43) and (3.44) proves the β-order semismoothness of Λ with β as in (3.3) Semismooth Newton Methods The developed semismoothness results can be used to derive a superlinearly convergent Newton-type methods for the solution of the nonsmooth operator equation Ψ(y) = (3.45) with Ψ as defined in (3.2). In fact, any of the three variants of Newton methods that we developed and analyzed in section can be applied. We just have to choose Z = L r (Ω), f Ψ, and f Ψ. With these settings, the Algorithms 3.9, 3.13, and 3.17 are applicable to (3.45) and their convergence properties are stated in
66 3.3 Semismooth Newton Methods for Superposition Operators 61 the Theorems 3.12, 3.15, and 3.19, respectively. The semismoothness requirements on Ψ are ensured by Theorems 3.44 and 3.45 under Assumptions 3.27 and 3.3, respectively. For illustration, we restate the most general of these methods, Algorithm 3.17, when applied to reformulations (3.21) of the NCP (3.5). We also recall the local convergence properties of the resulting method. The results equally well hold for bilaterally constrained problems; the only difference is that the reformulation then requires an MCP-function instead of an NCP-function. For the reformulation of the NCP we work with an NCP-function φ which, together with the operator F, satisfies Assumption Further, we assume that we are given an admissible set K = {u L p (Ω) : a K u b K on Ω}, which contains the solution ū L p (Ω), and in which all iterates generated by the algorithm should stay. The requirements on the bounds a K and b K are: There exist measurable sets Ωa K, Ωb K Ω such that: a K = on Ω \ Ω K a, b K = + on Ω \ Ω K b, a K Ω K a L p (Ω K a ), b K Ω K b L p (Ω K b ). (3.46) Natural choices for K are K = L p (Ω) or K = B = {u L p (Ω) : u }. We define the projection P K : L p (Ω) K, P K (u) = P [ak (ω),b K (ω)](u) = max{a K (ω), min{u(ω), b K (ω)}}, which is easily seen to assign to each u L p (Ω) a function P K (u) K that is nearest to u in L p (for p <, P K (u) is the unique metric projection). Since P K (u) P K (v) u v pointwise on Ω, we see that P K (u) P K (v) L p u v L p for all u, v L p (Ω). In particular, since ū K, we see that P K (u) ū L p u ū L p for all u L p (Ω). Therefore, K and P K satisfy the Assumptions In section we developed Newton-like methods that are formulated in a twonorm framework by incorporating an additional space Y with Y Y. However, so far a rigorous justification for the necessity of two-norm techniques is still missing. We are now in a position to give this justification. In the current setting, we have Y = L p (Ω), and, as we will see, it is appropriate to choose Y = L r (Ω). Algorithm 3.17 then becomes:
67 62 3. Newton Methods for Semismooth Operator Equations Algorithm 3.57 (Projected Inexact Newton s Method for NCP).. Choose an initial point u K and set k =. 1. Choose an invertible operator B k L(L r (Ω), L r (Ω)), compute s k L r (Ω) from B k s k = Φ(u k ), and set u k+1 = u k + s k. 2. Perform a smoothing step: u k+1 Lr (Ω) u 1 k+1 = S k(u k+1 ) Lp (Ω). 3. Project onto K: u k+1 = P K (u 1 k+1 ). 4. If u k+1 = u k, then STOP with result u = u k Increment k by one and go to step 1. To discuss the role of the two-norm technique and the smoothing step, it is convenient to consider the special case of the semismooth Newton method with smoothing step as described in Algorithm 3.9, which is obtained by choosing K = L p (Ω) and B k = M k Φ(u k ). For well-definedness of the method, it is reasonable to require that the Newton equation M k s k = Φ(u k ) in step 1 always possesses a unique solution. Further, in the convergence analysis an estimate is needed that bounds the norm of s k in terms of Φ(u k ) L r. It turns out that the L p -norm is too strong for this purpose. In fact, recall that every operator M Φ(u) assumes the form M = d 1 I + d 2 F (u), with d L (Ω) 2, d(ω) φ ( u(ω), F (u)(ω) ). Now define Then for all ω Ω 1 holds Ω 1 = {ω Ω : d 2 (ω) = }. (Mv)(ω) = d 1 (ω)v(ω). This shows that Mv is in general not more regular (in the L q -sense) than v and vice versa. Therefore, it is not appropriate to assume that M Φ(u) is continuously invertible in L(L p, L r ), as the norm on L p is stronger than on L r. However, it is reasonable to assume that M is an L r -homeomorphism. This leads to regularity conditions of the form stated in Assumption 3.11 (i) or in Assumption 3.2 with Y = L r (Ω). As a consequence, in the convergence analysis we only have available the uniform boundedness of M 1 k Z,Y, and this makes a smoothing step necessary, as can be seen from the following chain of implications that we used in the proof of Theorem 3.12 (and its generalizations): M k s k = Φ(u k ), Φ(ū) =, v k = u k ū, v k = u k ū = M k v k+1 = ( Φ(ū + v k ) Φ(ū) M k v k )
68 3.3 Semismooth Newton Methods for Superposition Operators 63 = M k v k+1 L r = o( v k L p) (semismoothness) = vk+1 1 Lr Mk L r,l r M kvk+1 L r = o( v k L p) (regularity) = v k+1 L p = S k (u k+1) ū L p = O( v k+1 L r) = o( v k L p) (smoothing step). Therefore, we see that the two-norm framework of our abstract analysis in section is fully justified. Adapted to the current setting, the Assumptions 3.14 and 3.11 required to apply Theorem 3.19 now read as follows: Assumption 3.58 (Dennis-Moré condition for B k ). (i) There exist operators M k Φ(u k + s k ) such that (B k M k )s k L r = o( s k L r) as s k L p, (3.47) where s k L r (Ω) is the step computed in step 1. (ii) Condition (i) holds with (3.47) replaced by Assumption (B k M k )s k L r = O( s k 1+α L r ) as s k L p. (i) (Regularity condition) One of the following conditions holds: (a) The operators M k map L r (Ω) continuously into itself with bounded inverses, and there exists a constant C M 1 > such that M 1 k L r,l r C M 1. (b) There exist constants η > and C M 1 > such that, for all u (ū + ηb L p) K, every M Φ(u) is an invertible element of L(L r, L r ) with M 1 L r,l r C M 1. (ii) (Smoothing condition) The smoothing steps in step 1 satisfy for all k, where ū K solves (3.1). S k (u k+1 ) ū L p C S u k+1 ū L r Remark 3.6. In section 4.3 we establish sufficient conditions for regularity that are widely applicable and easy to apply. Remark In section 4.1 we discuss how smoothing steps can be computed. Further, in section 4.2 we propose a choice for φ which allows to get rid of the smoothing step. Since Φ is semismooth by Theorem 3.44 and locally Lipschitz continuous by Proposition 3.31, we can applying Theorem 3.19 to the current situation and obtain the following local convergence result:
69 64 3. Newton Methods for Semismooth Operator Equations Theorem Denote by ū K a solution of (3.1). Further, let the Assumptions 3.32, 3.58 (i), and 3.59 hold. Then: (i) There exists δ > such that, for all u (ū+δb L p) K, Algorithm 3.13 either terminates with u k = ū or generates a sequence (u k ) K that converges q- superlinearly to ū in L p (Ω). (ii) If in (i) the mapping Φ is α-order semismooth at ū, < α 1, and if Assumption 3.58 (ii) is satisfied, then the q-order of convergence is at least 1 + α Semismooth Composite Operators and Chain Rules This section considers the semismoothness of composite operators. There is a certain overlap with the result of the abstract Proposition 3.7, but we think it is helpful to study the properties of the generalized differential Φ in some more detail. We consider the scenario where G = H 1 H 2 is a composition of the operators H 1 : X i Lr i (Ω), H 2 : Y X, with X a Banach space, and where ψ = ψ 1 ψ 2 is a composition of the functions ψ 1 : R l R, ψ 2 : R m R l. We impose assumptions on ψ 1, ψ 2, H 1, and H 2 to ensure that G and ψ satisfy Assumption Here is one way to do this: Assumption There are 1 r r i < q i, 1 i m, such that (a) The operators H 1 : X i Lr i (Ω) and H 2 : Y X are continuously Fréchet differentiable. (b) The operator H 1 maps X locally Lipschitz continuously into L q i (Ω). (c) The functions ψ 1 and ψ 2 are Lipschitz continuous. (d) ψ 1 and ψ 2 are semismooth. It is straightforward to strengthen these assumptions such that they imply the Assumptions 3.3. For brevity, we will not discuss the extension of the next theorem to semismoothness of order β, which is easily established by slight modifications of the assumptions and the proofs. Theorem Let the Assumptions 3.63 hold and let G = H 1 H 2 and ψ = ψ 1 ψ 2. Then (i) G and ψ satisfy the Assumptions (ii) Ψ as defined in (3.2) is semismooth. (iii) The operator Ψ 1 : z X ψ ( H 1 (z) ) L r (Ω) is semismooth and the following chain rule holds: Ψ(y) = Ψ 1 ( H2 (y) ) H 2 (y) = {M 1H 2 (y) : M 1 Ψ 1 ( H2 (y) ) }.
70 3.3 Semismooth Newton Methods for Superposition Operators 65 (iv) If l = 1 and ψ 1 is strictly differentiable [32, p. 3] then the operator Ψ 2 : y Y ψ 2 ( G(y) ) L r (Ω) is semismooth and the following chain rule holds: Ψ(y) = ψ 1( Ψ2 (y) ) Ψ 2 (y) = {ψ 1( Ψ2 (y) ) M 2 : M 2 Ψ 2 (y)}. Proof. (i): 3.63 (a) implies 3.27 (a), 3.27 (b) follows from 3.63 (a),(b), 3.63 (c) implies 3.27 (c), and 3.27 (d) holds by 3.63 (d), since the composition of semismooth functions is semismooth. (ii): By (i), we can apply Theorem (iii): The Assumptions 3.63 imply the Assumptions 3.27 with H 1 and X instead of G and Y. Hence, Ψ 1 is semismooth by Theorem For the proof of the part of the chain rule, let M Ψ(y) be arbitrary. By definition, there exists a measurable selection d of ψ ( G(y) ) such that Now, since G i (y) = H 1i( H2 (y) ) H 2(y), M = i d i G i (y). M = d i H 1i( H2 (y) ) H 2 (y) = M 1H 2 (y), where i M 1 = i d i H 1i( H2 (y) ). (3.48) Obviously, we have M 1 Ψ 1 ( H2 (y) ). To prove the reverse inclusion, note that any M 1 Ψ 1 ( H2 (y) ) assumes the form (3.48) with appropriate measurable selection d ψ ( G(y) ). Then M 1 H 2(y) = i d i (H 1i( H2 (y) ) H 2(y) ) = i d i G i(y), which shows M 1 H 2(y) Ψ(y). (iv): Certainly, G and ψ 2 satisfy the Assumptions 3.27 (with ψ replaced by ψ 2 ). Hence, Theorem 3.44 yields the semismoothness of Ψ 2. We proceed by noting that a.e. on Ω holds ψ 1( Ψ2 (y)(ω) ) ψ 2 ( G(y)(ω) ) = ψ ( G(y)(ω) ), (3.49) where we have applied the chain rule for generalized gradients [32, Thm ] and the identity ψ 1 = {ψ 1 }, see [32, Prop ]. We first prove the direction of the chain rule. Let M 2 Ψ 2 be arbitrary. It assumes the form M 2 = ˆd i G i (y), i where ˆd ( ) L (Ω) m is a measurable selection of ψ 2 G(y). Now for any operator M contained in the right hand side of the assertion we have with d def = ψ 1( Ψ2 (y) ) ˆd M = ψ 1( Ψ2 (y) ) M 2 = i d i G i (y).
71 66 3. Newton Methods for Semismooth Operator Equations Obviously, d L (Ω) m and, by (3.49), d is a measurable selection of ψ ( G(y) ). Hence, M Ψ(y). Conversely, to prove, let M Ψ(y) be arbitrary and denote by d L (Ω) m the corresponding measurable selection of ψ ( G(y) ). Now let d L (Ω) m be a measurable selection of ψ 2 ( G(y) ) and define ˆd L (Ω) m by ˆd(ω) = d(ω) on Ω = {ω : ψ 1( Ψ2 (y)(ω) ) = }, ˆd(ω) = ψ 1 d(ω) ( Ψ2 (y)(ω) ) on Ω \ Ω. Then ˆd is measurable and d = ψ 1( Ψ2 (y) ) ( ) ˆd. Further, ˆd(ω) = d(ω) ψ2 G(y) on Ω and, using (3.49), ( d(ω) ˆd(ω) = ( ψ 1 Ψ2 (y)(ω) ) ψ 1 Ψ2 (y)(ω) ) ( ) ψ 2 G(y) ( ψ 1 Ψ2 (y)(ω) ) = ψ ( ) 2 G(y) on Ω \ Ω. Thus, ˆd is a measurable selection of ψ2 ( G(y) ), and consequently also ˆd L (Ω) m due to the Lipschitz continuity of ψ 2. Therefore, M 2 = i ˆd i G i(y) Ψ 2 (y) and thus M ψ 1( Ψ2 (y) ) Ψ 2 (y) as asserted Further Properties of the Generalized Differential We now establish that our generalized differential is convex-valued, weak compactvalued and weakly graph closed. These properties can provide a basis for future research on the connections between Ψ and other generalized differentials, in particular the Thibault generalized differential [135] and the Ioffe Ralph generalized differential [84, 123]. As weak topology on L(Y, L r ) we use the weak operator topology, which is defined by the seminorms M w, Mv Ω, v Y, w L r (Ω), the dual space of L r (Ω). The following result will be of importance. Lemma Under Assumption 3.27, the set K(y) defined in (3.26) is convex and weak sequentially compact in L (Ω) m for all y Y. Proof. From Lemma 3.41 we know that K(y) L ψ Bm L is nonempty and bounded. Further, the convexity of ψ(x) implies the convexity of K(y). Now let s k K(y) tend to s in L 2 (Ω) m. Then for a subsequence holds s k (ω) s(ω) for a.a. ω Ω. Since ψ ( u(ω) ) is compact, this implies that for a.a. ω Ω holds s(ω) ψ ( u(ω) ) and thus s K(y). Hence, K(y) is a bounded, closed, and convex subset of L 2 (Ω) m and therefore weak sequentially compact in L 2 (Ω) m. Therefore, K(y) is also weak sequentially closed in L (Ω) m, for, if (s k ) K(y) converges weakly to s in L (Ω) m, then w, s k s Ω for all w L 1 (Ω) m L 2 (Ω) m, showing that s k s weakly in L 2 (Ω) m. Thus, K(y) is weak sequentially closed and bounded in L (Ω) m. Since L 1 (Ω) m is separable, this yields that K(y) is weak sequentially compact.
72 3.3 Semismooth Newton Methods for Superposition Operators 67 Convexity and Weak Compactness As further useful properties of Ψ we establish the convexity and weak compactness of its images: Theorem Under the Assumptions 3.27, the generalized differential Ψ(y) is nonempty, convex, and weakly sequentially compact for all y Y. If Y is separable, then Ψ(y) is also weakly compact for all y Y. Proof. The nonemptyness was already stated in Theorem The convexity follows immediately from the convexity of the set K(y) derived in Lemma We now prove weak sequential compactness. Let (M k ) Ψ(y) be any sequence. Then M k = i d ki G i (y) with d k K(y), see (3.26). Lemma 3.65 yields that K(y) is weak sequentially compact in L (Ω) m. Hence, we can select a subsequence such that (d k ) converges weak to d K(y) in L (Ω) m. Define M = i d i G i(y) and observe that M Ψ(y), since d K(y). It remains to prove that M k M weakly. Let w L r (Ω) = L r (Ω) and v Y be arbitrary. We set z i = w G i (y)v and note that z i L 1 (Ω). Hence, w, (M k M )v Ω i w, (d k d ) i G i (y)v Ω = i z i, (d k d ) i Ω as k. (3.5) Therefore, the weak sequential compactness is shown. By Lemma 3.37, Ψ(y) is contained in a closed ball in L(Y, L r ), on which the weak topology is metrizable if Y is separable (note that 1 r < implies that L r (Ω) is separable). Hence, in this case the weak compactness follows from the weak sequential compactness. Weak Graph Closedness of the Generalized Differential Finally, we prove that the multifunction Ψ is weakly graph closed: Theorem Let the Assumptions 3.27 be satisfied and let (y k ) Y and (M k ) L(Y, L r (Ω)) be sequences such that M k Ψ(y k ) for all k, y k y in Y, and M k M weakly in L(Y, L r (Ω)). Then holds M Ψ(y ). If, in addition, Y is separable, then the above assertion also holds if we replace the sequences (y k ) and (M k ) by nets. Proof. Let y k y in Y and Ψ(y k ) M k M weakly. We have the representations M k = i d ki G i (y k) with measurable selections d k of ψ(u k ), where u k = G(y k ). We also introduce u = G(y ). The multifunction ω Ω ψ ( u (ω) ) is closed-valued (even compact-valued) and measurable. Furthermore, the function
73 68 3. Newton Methods for Semismooth Operator Equations (ω, h) d k (ω) h 2 is a normal integrand on Ω R m [129, Cor. 2P]. Hence, by [129, Thm. 2K], the multifunctions S k : Ω R m, S k (ω) = arg min d k (ω) h 2 h ψ(u (ω)) are closed-valued (even compact-valued) and measurable. We choose measurable selections s k of S k. The sequence (s k ) is contained in the, by Lemma 3.65, sequentially weak compact set K(y ) L (Ω) m. Further, by Lemma 3.41, we have d k L ψ Bm L. Hence, by transition to subsequences we achieve s k s K(y ) weak in L (Ω) m and d k d L ψ Bm L weak in L (Ω) m. Therefore, (d k s k ) ( d s) weak in L (Ω) m and thus also weakly in L 2 (Ω) m. Since u k u in i Lq i (Ω), we achieve by transition to a further subsequence that u k u a.e. on Ω. Hence, since d k (ω) ψ ( u k (ω) ) for a.a. ω Ω and ψ is upper semicontinuous, we obtain from the construction of s k that (d k s k ) a.e. on Ω. The sequence (d k s k ) is bounded in L (Ω) m and thus the Lebesgue convergence theorem yields (d k s k ) in L 2 (Ω) m. From (d k s k ) and (d k s k ) ( d s) weakly in L 2 (Ω) m we see d = s. We thus have d k d = s K(y ) weak in L (Ω) m. def This shows that M = d i i G i (y ) Ψ(y ). It remains to prove that M k M weakly. To show this, let w L r (Ω) = L r (Ω) and v Y be arbitrary. Then with z ki = w G i (y k)v and z i = w G i (y )v holds z ki, z i L 1 (Ω) and z ki z i L 1 w L r G i(y k )v G i(y )v L r as k. Hence, we obtain similar as in (3.5) w, (M k M)v Ω w, d ki G i i (y k)v d i G i (y )v Ω = (dki, z ki Ω d i, z i Ω i i ( di d ki, z i Ω + d ki L z i z ki L 1) as k. This implies M = M Ψ(y ) and completes the proof of the first assertion. Now let (y κ ) Y and (M κ ) L(Y, L r (Ω)) be nets such that M κ Ψ(y κ ) for all κ, y κ y in Y, and M κ M weakly in L(Y, L r (Ω)). Since (y κ ) finally stays in any neighborhood of y and since G is continuous, we see from (3.25) that w.l.o.g. we may assume that (M κ ) is contained in a bounded ball B L(Y, L r ). Since, due to the assumed separability of Y, B is metrizable with respect to the weak topology, we see that we can work with sequences instead of nets.
74 4. Smoothing Steps and Regularity Conditions The analysis of semismooth Newton methods used three ingredients, semismoothness, a smoothing step and a regularity condition. In this chapter we show how smoothing steps can be obtained in practice and also describe a particular method that does not require a smoothing step at all. Furthermore, we establish sufficient conditions that imply the regularity condition stated in Assumption Smoothing Steps We consider the VIP (1.1) under the assumptions stated there. It was already observed in earlier work [95, 143], and it can be verified by considering the applications encountered so far, that many problems of practical interest can be stated as a VIP (1.1) with the operator F meeting the following requirement: Assumption 4.1. The operator F has the form F (u) = λu + G(u), where λ is positive and and G : L r (Ω) L p (Ω), p > r, is locally Lipschitz continuous. Note that G(u) lives in a smoother space than its preimage u, since L p (Ω) L r (Ω) (using that Ω is bounded) with nonequivalent norms. This form of G arises, e.g., in the first-order necessary optimality conditions of a large class of optimal control problems with bounds on the control and L 2 -regularization [95, 141, 143]. For obtaining smoothing steps, we borrow an idea from Kelley and Sachs [95]. Since φ E [α,β] (x) = x 1 P [α,β] (x 1 x 2 ) is an MCP-function, we know that ū L p (Ω) solves the VIP (1.1) if and only if S(ū) = ū, where S(u) def = P B (u λ 1 F (u) ), P B (u) = max{a, min{u, b}}. (4.1) Further, for all u L r (Ω) we have u λ 1 F (u) = λ 1 G(u) L p (Ω), and therefore S(u) = P B ( λ 1 G(u) ). We now use that for all v, w L p (Ω) there holds pointwise P B (v) P B (w) v w, and thus P B (v) P B (w) L p v
75 7 4. Smoothing Steps and Regularity Conditions w L p. Further, G is Lipschitz continuous (with rank L G ) on an L r -neighborhood of ū. Hence, for all u L r (Ω) in this neighborhood, we obtain S(u) ū L p = S(u) S(ū) L p = P B ( λ 1 G(u)) P B ( λ 1 G(ū)) L p This shows λ 1 G(u) G(ū) L p L G λ 1 u ū L r. Theorem 4.2. Let the Assumption 4.1 hold and define S by (4.1). Then in any L r - neighborhood of ū on which G is Lipschitz continuous (with rank L G ) the mapping u k L r def (Ω) u k = S(u k) L p (Ω) is a smoothing step in the sense of Assumption 3.11 (ii) with constant C S = L G /λ. The applicability of this approach to concrete problems is discussed in the application chapters 7 and 8. Here we only consider the introductory example control problem (1.11) of section There, see Remark 3.34, we have F (u) = λu w(u), where w(u) H 1 (Ω) is the adjoint state, which depends continuously and affine linearly on u L 2 (Ω). Since H 1(Ω) Lp (Ω) for appropriate p > 2, the described scenario is given with G(u) = w(u), and r = A Newton Method without Smoothing Steps We now describe how a variant of the MCP-function φ E can be used to derive a semismooth reformulation of VIP to which a Newton method without smoothing step can be applied. In fact, the very same idea used in the construction of smoothing steps can be adopted. Hereby, we assume that F has the same structure as in the previous section 4.1. The simple idea is to reformulate (1.1) equivalently as u S(u) =, (4.2) and to establish the semismoothness of the operator u L r (Ω) u S(u) L r (Ω). Remark 4.3. In the recent report [78], Hintermüller, Ito, and Kunisch observe in the context of bound-constrained linear-quadratic control problems that semismooth Newton methods applied to (4.2) are identical to the class of primal dual methods developed in [14, 15]. Numerical tests in these papers have proven the excellent efficiency of this class of methods, and thus underlines the potential and importance of semismooth Newton methods. These positive results are confirmed by all our numerical tests, see chapter 7. Theorem 4.4. Let F : L r (Ω) L r (Ω) be continuously differentiable and let Assumption 4.1 hold. Define the operator Φ : u L r (Ω) u S(u) L r (Ω),
76 4.2 A Newton Method without Smoothing Steps 71 with S as defined in (4.1). Then Φ is locally Lipschitz continuous and Φ-semismooth, with Φ(u) consisting of all M L(L r, L r ) of the form with d L (Ω), M = I + λ 1 dg (u), d(ω) P [a(ω),b(ω)] ( λ 1 G(u)(ω)), ω Ω. (4.3) If F is α-order Hölder continuous, α (, 1], then Φ is β-order semismooth with β as given in Theorem Proof. We introduce the disjoint measurable partitioning Ω = Ω f Ω l Ω u Ω lu, Ω f = Ω \ (Ω a Ω b ), Ω l = Ω a \ Ω b, Ω u = Ω b \ Ω a, Ω lu = Ω a Ω b. Now, set ā = a on Ω a and ā =, otherwise, b = b on Ω b and b = 1, otherwise. Since Ψ f (u) = λ 1 G(u) maps L r (Ω) continuously differentiable to L r (Ω), Ψ f is locally Lipschitz continuous and ( λ 1 G )-semismooth. Further, we have S(u) = Ψ f (u) on Ω f. Hence, by Proposition 3.7, 1 Ω f S is locally Lipschitz continuous and [1 Ω f ( λ 1 G )]-semismooth. Obviously, this generalized differential consists of all operators of the form [1 Ω f ( λ 1 dg )] with d as in (4.3). Next, we set ψ l (t) = max{, t} and define Ψ l : L r (Ω) L r (Ω), Ψ l (u) = ψ l( λ 1 G(u) ā ). By Proposition 3.31 and Theorem 3.44, this operator is locally Lipschitz continuous and Ψ l -semismooth. Furthermore, there holds S(u) = ā + Ψ l (u) on Ω l, and thus 1 Ω l S is locally Lipschitz continuous and (1 Ω l Ψ l )-semismooth by Propositions 3.4 and 3.7. Looking at the structure of Ψ l we see that (1 Ω l Ψ l ) is the set of all operators 1 Ω l [ λ 1 dg (u)], where d L (Ω) satisfies (4.3). In fact, for ω Ω l holds with α = ā(ω) = a(ω) and thus P [a(ω),b(ω)] (t) = max{α, t} = α + max{, t α} = α + ψ l (t α), P [α, ) (t) = ψ l (t α). In a completely analogous way, we see that 1 Ω u S is locally Lipschitz continuous and (1 Ω u Ψ u )-semismooth, where the latter differential is the set of all operators with d L (Ω) as in (4.3). 1 Ω u [ λ 1 dg (u)]
77 72 4. Smoothing Steps and Regularity Conditions Finally, we consider ω Ω lu. For α = ā(ω) = a(ω), β = b(ω) = b(ω) we have P [a(ω),b(ω)] (t) = max{α, min{t, β}} = α + max{, min{t α, β α} ( ) t α = α + (β α)ψ lu β α with ψ lu (t) = max{, min{t, 1}} = P [,1] (t). We conclude for ω Ω lu [ ( )] ( ) t α t α P [a(ω),b(ω)] (t) = (β α) t ψ lu = ψ lu. (4.4) β α β α Now define ( ) Ψ lu (u) = ψ lu λ 1 G(u) + ā. b ā By Proposition 3.31 and Theorem 3.44, this operator is locally Lipschitz continuous and Ψ lu -semismooth. Furthermore, there holds 1 Ω lu S = 1 Ω lu [ā + ( b ā) Ψ lu ]. We use once again Propositions 3.4 and 3.7 to conclude that 1 Ω lu S is locally Lipschitz continuous and (1 Ω lu ( b ā) Ψ lu )-semismooth. From (4.4) we see that this differential is the set of all operators 1 Ω lu [ λ 1 dg (u)], where d L (Ω) satisfies (4.3). Now, since u S(u) = u 1 Ω f S(u) 1 Ω l S(u) 1 Ω u S(u) 1 Ω lu S(u), we can apply Proposition 3.4 to complete the proof of the first assertion. If F is α-hölder continuous, then it is straightforward to modify the proof to establish semismoothness of order β >. Therefore, we can apply the Newton methods of section to solve the reformulation (4.2) of the VIP. A smoothing step is not required, since Φ is semismooth as a mapping from L r = L 2 into itself, and, as we will demonstrate for NCPs in section 4.3, it is appropriate to use Assumption 3.59 (i), i.e., the uniformly bounded invertibility of the generalized differentials in L(L 2, L 2 ) as regularity condition. 4.3 Sufficient Conditions for Regularity In this section we establish a sufficient condition for solutions of the NCP (1.4), posed in the usual setting of (1.1), that implies the following regularity condition: Assumption 4.5. There exist constants η > and C M 1 > such that, for all u ū + ηb L p, every M Φ(u) is an invertible element of L(L 2, L 2 ) with M 1 L 2,L 2 C M 1.
78 4.3 Sufficient Conditions for Regularity 73 Hereby, Φ = φ(u, F (u)) is the superposition operator arising in the semismooth reformulation via the NCP-function φ. We consider problems where F has the form F (u) = λu + G(u), and G has a smoothing property. In this setting, we show that, in broad terms, regularity is implied by L 2 -coercivity of F (ū) on the tangent space of the strongly active constraints. An alternative sufficient condition for regularity, which does not require special structure of F, but assumes that F (ū) is L 2 -coercive on the whole space, can be found in the author s paper [141]. We work under the following assumptions: Assumption 4.6. There exist p [2, ] and p (2, ] such that: (a) F (u) = λu + G(u), λ L (Ω), λ λ >, (b) G : L 2 (Ω) L 2 (Ω) is Fréchet differentiable with derivative G (u). (c) u L p (Ω) G (u) L(L 2 (Ω), L 2 (Ω)) is continuous near ū. (d) For u near ū in L p (Ω), the L 2 -endomorphisms G (u) and G (u) are contained in L(L 2 (Ω), L p (Ω)) with their norms uniformly bounded by a constant C G. (e) There exists a constant ν > such that for F (ū) = λi + G (u) holds (v, F (ū)v) L2 (Ω) ν v 2 L 2 (Ω) for all v L 2 (Ω) with v = on {ω Ω : F (ū)(ω) = }. (f) φ is Lipschitz continuous and semismooth. (g) There exists a constant θ > such that for all x R 2 and all g φ(x) holds g 1 g 2, g 1 + g 2 θ. (h) For x (, ) {} holds φ(x) {} R, and for x {} (, ) holds φ(x) R {}. Remark 4.7. In the case of a minimization problem, i.e., F = j, condition (e) can be interpreted as a strong second order sufficient condition: The Hessian operator j (ū) has to be coercive on the tangent space of the strongly active constraints. Similar conditions can be found in, e.g., Dunn, Tian [46] and Ulbrich, Ulbrich [143]. Strong second order sufficient conditions are also essential for proving fast convergence of finite-dimensional algorithms, see, e.g., [19, 77, 15]. Observe that Assumption 4.6 with p > 2 implies Assumption 3.32 with r = 2 and p = min{p, p } on an L p -neighborhood of ū. Hence, Φ : L p (Ω) L 2 (Ω) is semismooth at ū by Theorem In fact, (a) (c) imply Assumption 3.32 (a). Further, for u, u + v L p (Ω) near ū holds with s = min{p, p }, using (d), F (u + v) F (u) L s 1 F (u + tv)v L sdt c λ L v L p + c sup G (u + tv) L p,l v p L p t [,1] c( λ L + C G ) v L p,
79 74 4. Smoothing Steps and Regularity Conditions which implies Assumption 3.32 (b) for p = s. Finally (f) ensures Assumption 3.32 (c),(d). Next, we illustrate the Assumptions 4.6 by verifying them for the control problem (1.11). There, F (u) = j (u) = λu w(u), where w(u) = 1 (y y d ) = 1 ( 1 u + y d ) H 1 (Ω) (4.5) is the adjoint state. The mapping u L 2 (Ω) w(u) H 1 (Ω) is continuous and affine linear. Thus, choosing p > 2 such that H 1(Ω) Lp (Ω), F has the form as in assumption (a) with G : u L 2 (Ω) w(u) L p (Ω) being continuous affine linear. Therefore, G is smooth and G L(L 2, L p ) is constant. From (4.5) we see that G (u) = 1 1, hence G (u) = G (u) and, with z H 1 (Ω) solution of z = v, we have (F (u)v, v) L 2 = (G (u)v, v) L 2 + (λv, v) L 2 z 2 L 2 + λ v 2 L 2 λ v 2 L 2. Taking all together, we see that (a) (e) are satisfied for any p [2, ]. We now establish our sufficient condition for regularity: Theorem 4.8. If Assumption 4.6 holds at a solution ū L p (Ω) of the NCP (1.4) then there exists ρ > such that Assumption 4.5 is satisfied. Proof. For convenience, we set (, ) = (, ) L2 (Ω) and = L2 (Ω). Any element M Φ(u) can be written in the form M = d 1 I + d 2 F (u), d i L (Ω), (d 1, d 2 ) φ(u). (4.6) Due to the Lipschitz continuity of φ, the functions d 1, d 2 are bounded in L (Ω) uniformly in u. We define d 2 c =, (4.7) d 1 + λd 2 which, since by assumption d 1 d 2, θ d 1 +d 2, and λ λ >, is well-defined and uniformly bounded in L (Ω) for all u L p (Ω). Using F (u) = λi + G (u), we see that M = (d 1 + λd 2 ) (I + c G (u)). Since (d 1 + λd 2 ) and (d 1 + λd 2 ) 1 are uniformly bounded in L (Ω) for all u L p (Ω), the operators M Φ(u) are continuously invertible in L(L 2 (Ω), L 2 (Ω)) on an L p -neighborhood of ū with uniformly bounded inverses if and only if the same holds true for the operators T = I + c G (u). Next, consider any M Φ(ū) with corresponding functions d 1, d 2, c L (Ω) according to (4.6) and (4.7). Define the sets Ω 1 = {(ū, F (ū)) }, Ω 2 = {ū =, F (ū) = }, and consider the function e L (Ω), e = c on Ω 1, e = c on Ω 2. (4.8)
80 We first prove that, for arbitrary t [1, ), 4.3 Sufficient Conditions for Regularity 75 c e L t as u ū in L p (Ω). (4.9) Assume that this is not true. Then there exist t 1, ε > and a sequence (u k ) L p (Ω) with u k ū in L p (Ω) and corresponding differentials M k Φ(u k ) such that c k e k L t ε k. (4.1) Hereby, we denote by d 1k, d 2k, c k, and e k the associated functions defined in (4.6), (4.7), and (4.8). From u k ū follows F (u k ) F (ū) in L min{p,p } (Ω). Hence, there exists a subsequence such that (u k, F (u k )) (ū, F (ū)) a.e. on Ω. Since ūf (ū) =, we have the disjoint partitioning Ω 1 = Ω 11 Ω 12 with Ω 11 = {F (ū) } = {ū =, F (ū) }, Ω 12 = {ū } = {ū, F (ū) = }. On the set Ω 11 we have (a.e.) u k, F (u k ) F (ū) and thus, by the upper semicontinuity of φ and the assumptions on φ, d 1k d 1, d 2k, which implies c k = c on Ω 11. Since Ω has finite measure and the sequence (c k ) is bounded in L (Ω), the Lebesgue convergence theorem implies c k c Lt (Ω 11 ). (4.11) On the set Ω 12 holds u k ū, F (u k ) F (ū) = and thus, again using the properties of φ, d 1k = d 1, d 2k d 2, which implies c k 1/λ = c. Invoking Lebesgue s convergence theorem once again we see that c k c L t (Ω 12 ). (4.12) Then it is an immediate consequence of (4.11) and (4.12) that c k e k L t (Ω) = c k c L t (Ω 1 ) c k c L t (Ω 11 ) + c k c L t (Ω 12 ), which contradicts (4.1). Thus, (4.9) is proved. We now consider the operators T = I + c G (u) and S = I + e G (ū). For all v L 2 (Ω) holds (with 2p /(p 2) to be interpreted as 2 if p = ) T v Sv (c e) G (ū)v + c (G (u)v G (ū)v) This proves c e c e L L 2p p 2 2p p 2 G (ū)v L p + c L G (u)v G (ū)v G (ū) L2,L p v + c L G (u) G (ū) L 2,L 2 v. T S L 2,L 2 as u ū in Lp (Ω). (4.13)
81 76 4. Smoothing Steps and Regularity Conditions Next, we prove S v γ v v L 2 (Ω), (4.14) where γ = 1 if G (ū) = and γ = min{νκ, 1/2}, κ = 1 2 G (ū) L2,L 2 if G (ū). The assertion is trivial if G (ū) =. To prove the assertion for G (ū), we set w = ev and distinguish two cases. Case 1: w κ v. Then S v = v + G (ū) (ev) v G (ū) w (1 κ G (ū) L2,L 2) v 1 v γ v. 2 Case 2: w > κ v : Since w = ev and e = c = on Ω 11, we have w = on Ω 11 and thus, by (e), (w, (λi + G (ū) )w) ν w 2. In the calculations to follow we will use that 1 λe = 1 on Ω 11, 1 λe = 1 λ c = on Ω 12, 1 λe = 1 λc = d 1 + λd 2 λd 2 d 1 + λd 2 = In particular, 1 λe on Ω, and thus w S v (w, S v) = (w, v) + (w, G (ū) w) d 1 d 1 + λd 2 on Ω 2. (w, v) + ν w 2 (w, λw) = (w, (1 λe)v) + ν w 2 = (v, e(1 λe)v) + ν w 2 ν w 2 νκ w v γ w v. Hence, (4.14) is proved. In particular, S is injective. Moreover, S has closed range. In fact, let S v k z. Then v k v l γ 1 S v k S v l as k, l. Therefore, v k v and S v k S v, hence z = S v. By the closed range theorem [91, Ch. XII], the injectivity of S now implies the surjectivity of S. We proceed by showing the injectivity of S. Consider any v L 2 (Ω) with Sv =. Let us introduce the function z L p (Ω), Observing that z = on Ω 11, z = G (ū)v on Ω 12 Ω 2. (4.15)
82 and e = on Ω 11, we see that 4.3 Sufficient Conditions for Regularity 77 v = Sv e G (ū)v = e G (ū)v on Ω, v = ez on Ω, and that v vanishes on Ω 11. Therefore, using (e), = (z, Sv) = (z, v) + (z, e G (ū)v) = (z, v) + (ez, G (ū)v) = (z, v) + (v, G (ū)v) (z, v) + ν v 2 (v, λv) = ν v 2 + (z λez, ez) = ν v 2 + (z, (1 λe)ez) ν v 2, since (1 λe)e. This implies v =, which proves the injectivity of S. We thus have shown the S L(L 2 (Ω), L 2 (Ω)) is bijective and hence, by the open mapping theorem, continuously invertible. Furthermore, for all v L 2 (Ω) we have v = S (S ) 1 v γ (S ) 1 v, and thus S 1 L2,L 2 = (S ) 1 L2,L 2 1 γ. By (4.13), there exists ρ > such that for all u L p (Ω), u ū L p ρ, holds T S L2,L 2 γ/2. Therefore, by Banach s theorem [91, Ch. V.4.6], T L(L 2 (Ω), L 2 (Ω)) is invertible with T 1 L2,L 2 S 1 L2,L 2 1 S 1 L2,L 2 T S L 2,L 2 2 γ. The sufficient condition of Theorem 4.8 and the sufficient condition for regularity established in [141] are very helpful in establishing regularity for concrete applications.
83 5. Variational Inequalities and Mixed Problems So far, we have demonstrated the applicability of semismooth Newton methods mainly for the NCP (1.4). We now discuss several applications to more general classes of problems. First, we show how the semismooth reformulation approach that we investigated in detail for the NCP can be extended to the larger problem class of bound-constrained VIPs (1.1). In addition, we describe how semismooth reformulations can be obtained for even more general problems than the bound-constrained VIP. The second extension considers mixed problems consisting of VIPs and and additional operator equations. In particular, the first order necessary (Karush Kuhn Tucker, KKT) conditions of very general optimization problems can be written in this form. 5.1 Application to Variational Inequalities Problems with Bound-Constraints We now describe how our treatment of the NCP can be carried over to the boundconstrained VIP (1.1). One possibility was already described in section 4.2, where we presented a semismooth reformulation that does not require a smoothing step. Here, we describe a similar approach for which general NCP- and MCP-functions can be used. For the derivation of a semismooth reformulation, let be given an NCP-function φ and MCP-functions φ [α,β] for all compact intervals. We now define the operator F (u)(ω) on Ω f = Ω \ (Ω a Ω b ), φ ( u(ω) a(ω), F (u)(ω) ) on Ω l = Ω a \ Ω b, Φ(u)(ω) = φ ( b(ω) u(ω), F (u)(ω) ) on Ω u = Ω b \ Ω a, ( ) φ [a(ω),b(ω)] u(ω), F (u)(ω) on Ω lu = Ω a Ω b. It was shown in section 1.2 that u L p (Ω) solves (1.1) if and only if (5.1) Φ(u) =. (5.2) Our aim is to establish the semismoothness of Φ and to characterize its generalized differential. Hereby, we require:
84 8 5. Variational Inequalities and Mixed Problems Assumption 5.1. There exist r [1, p) [1, p ) such that (a) The mapping u L p (Ω) F (u) L r (Ω) is continuously differentiable. (b) The operator F : L p (Ω) L p (Ω) is locally Lipschitz continuous. (c) The function φ : R 2 R is Lipschitz continuous and semismooth. (d) The function x ψ [x1,x 2 ](x 3, x 4 ) is Lipschitz continuous and semismooth. For semismoothness of higher order we need slightly stronger requirements. Assumption 5.2. There exists r [1, p) [1, p ) and α (, 1] such that (a) The mapping u L p (Ω) F (u) L r (Ω) is differentiable with locally α- Hölder continuous derivative. (b) The operator F : L p (Ω) L p (Ω) is locally Lipschitz continuous. (c) The function φ : R 2 R is Lipschitz continuous and α-order semismooth. (d) The function x ψ [x1,x 2 ](x 3, x 4 ) is Lipschitz continuous and α-order semismooth. Remark 5.3. At this point it would be more convenient if we had established semismoothness results for superposition operators of the form ψ(ω, G(u)(ω)). This is certainly possible, but not really needed in this work. Instead, the trick we will use here is to build superposition operators with the inner operator given by u (ā, b, u, F (u)), where ā, b are cutoff versions of a and b to make them finite. A different approach would be to transform the problem such that [a, b] [, 1] on Ω a Ω b and [a, b] [, ) on (Ω a Ω b )\(Ω a Ω b ). There is however, a certain danger that this transformation affects the scaling of the problem in a negative way. The latter approach was implicitly used in the proof of Theorem 4.4. Theorem 5.4. Under Assumption 5.1 the operator Φ : L p (Ω) L r (Ω) is locally Lipschitz continuous and Φ-semismooth, where Φ(u) consists of all operators M L(L p, L r ) of the form M = d 1 I + d 2 F (u), with d 1, d 2 L (Ω), {(, 1)} on Ω f, φ ( u(ω) a(ω), F (u)(ω) ) on Ω l, (d 1, d 2 )(ω) φ ( b(ω) u(ω), F (u)(ω) ) on Ω u, ( ) φ [a(ω),b(ω)] u(ω), F (u)(ω) on Ω lu. (5.3) Under Assumption 5.2 the operator Φ is even β-order semismooth, where β > is as in Theorem Proof. Let us define ā, b L p (Ω) by ā = a on Ω a, ā =, otherwise, b = b on Ω b, b =, otherwise. Further, let
85 ψ f (x) = x 4, ψ l (x) = φ(x 3 x 1, x 4 ), 5.1 Application to Variational Inequalities 81 ψ u (x) = φ(x 2 x 3, x 4 ), ψ lu (x) = φ [x1,x 2 ](x 3, x 4 ), which are Lipschitz continuous and semismooth. Define T : u L p (Ω) (ā, b, u, F (u)) L r (Ω) 4, which is continuously differentiable with derivative T (u) = ( I F (u)), and locally Lipschitz continuous as a mapping L p (Ω) L p (Ω) 3 L p (Ω). Next, for γ {f, l, u, lu}, we introduce the superposition operators Ψ γ : L p (Ω) L r (Ω), Ψ γ (u)(ω) = ψ γ( T (u)(ω) ). By Proposition 3.31 and Theorem 3.44, these operators are Ψ γ -semismooth; hereby, the operator M γ L(L r, L r ) is an element of Ψ γ (u) if and only if M γ = (d γ a, dγ b, dγ 1, dγ 2 ) T (u) = d γ 1 I + dγ 2 F (u), where d γ a, dγ b, dγ 1, dγ 2 L (Ω) satisfy (d γ a, dγ b, dγ 1, dγ 2 ) ψγ (T (u)) on Ω. We now use [32, Prop ], a direct consequence of Proposition 2.3, to conclude (x3,x 4 )ψ γ (x) {g R 2 : h R 2 : (h, g) ψ γ (x)}. Now let d 1, d 2 L (Ω) be arbitrary such that (5.3) holds. Then holds (d 1, d 2 ) (x3,x 4 )ψ γ (T (u)) on Ω γ. Therefore, using Filippov s theorem [11, Thm ], we conclude that there exist d γ a, d γ b L (Ω) with This shows (d γ a, d γ b, d 1, d 2 ) ψ γ (T (u)) on Ω γ, γ {f, l, u, lu}. Finally, we define H L([L r ] 4, L r ), and observe that 1 Ω γ [d 1 I + d 2 F (u)] 1 Ω γ Ψ γ (u). (5.4) Hv = 1 Ω f v Ω lv Ω uv Ω luv 4. Φ(u) = H ( Ψ f (u), Ψ l (u), Ψ u (u), Ψ lu (u) ). Thus, Φ is locally Lipschitz continuous. Application of the direct product rule and the chain rule, Propositions 3.5 and 3.7 (note that H H is bounded), we conclude that Φ is H ( Ψ f Ψ l Ψ u Ψ lu )-semismooth and that, by (5.4), this generalized differential contains all M L(L r, L r ) of the form M = d 1 I + d 2 F (u), where d 1, d 2 L (Ω) satisfy (5.3). If Assumption 5.2 holds, then it is straightforward to modify the proof to establish semismoothness of order β >.
86 82 5. Variational Inequalities and Mixed Problems It should be immediately clear from our detailed discussion of NCPs in previous sections how the semismooth reformulation (5.2) can be used to apply our class of semismooth Newton methods. The resulting algorithm looks exactly like Algorithm 3.57, with the only difference that Φ is defined by (5.1). Also the regularity condition of Assumption 3.59 is appropriate and the assertions of Theorem 3.62 can be established as well. We now discuss ways of choosing φ and φ [α,β]. Consider any NCP-function φ that is positive on (, ) 2 and negative on R 2 \ [, ) 2, Then the following construction, which was proposed by Billups [18] for φ = φ F B, can be used to obtain an MCP-function φ [α,β], < α < β < + : φ [α,β] (x) = φ ( x 1 α, φ(β x 1, x 2 ) ). (5.5) Proposition 5.5. Let φ be an NCP-function that is positive on (, ) 2 and negative on R 2 \ [, ) 2. Then, for any interval [α, β], < α < β <, the function φ [α,β] (x) defined in (5.5) is an MCP-function. Proof. We have to show that φ [α,β] (x) = holds if and only if α x 1 β, (x 1 α)x 2, (x 1 β)x 2. (5.6) To this end, observe that φ [α,β] (x) = is equivalent to x 1 α, φ(β x 1, x 2 ), (x 1 α)φ(β x 1, x 2 ) =, (5.7) where we have used the fact that φ is an NCP-function. For x 1 < α, (5.6) and (5.7) are both violated. For x 1 = α, we use the assumptions on φ to obtain Finally, for x 1 > α, Then (5.6) x 2 φ(β α, x 2 ) (5.7). (5.6) x 1 β, x 2, (x 1 β)x 2 φ(β x 1, x 2 ) = (5.7). We demonstrate this construction for φ(x) = φ E (x) = x 1 P [, ) (x 1 x 2 ) = min{x 1, x 2 }. φ [α,β] (x) = min{x 1 α, min{β x 1, x 2 }} = min{x 1 α, max{x 1 β, x 2 }} = x 1 P [α,β] (x 1 x 2 ) = φ E [α,β] (x). Therefore, starting with the projection-based NCP-function φ E, we obtain the projection-based MCP-function φ E [α,β]. Concerning the concrete calculation of φe and φ E [α,β], we have
87 5.1 Application to Variational Inequalities 83 Proposition 5.6. The function φ E is piecewise affine linear on R 2 and affine linear on the sets {x : x 1 < x 2 }, {x : x 1 > x 2 }. There holds: φ E (x) = B φ E (x) = {φ E (x)} = {(1, )} for x 1 < x 2, φ E (x) = B φ E (x) = {φ E (x)} = {(, 1)} for x 1 > x 2, B φ E (x) = {(1, ), (, 1)}, φ E (x) = {(t, 1 t) : t 1} for x 1 = x 2. The function φ E [α,β] is piecewise affine linear on R2 and affine linear on the connected components of {x : x 1 x 2 α, x 1 x 2 β}. There holds: φ E [α,β] (x) = Bφ E [α,β] (x) = {φe [α,β](x)} = {(1, )} for φ E [α,β] (x) = Bφ E [α,β] (x) = {φe [α,β](x)} = {(, 1)} } B φ E [α,β](x) = {(1, ), (, 1)}, (x) = {(t, 1 t) : t 1} φ E [α,β] x 1 x 2 / [α, β], for x 1 x 2 (α, β), for x 1 x 2 {α, β}. Proof. This is an immediate consequence of Proposition The generalized differential of φ F B was already derived in section In a similar way, it is possible to obtain formulas for the generalized differential of φ F [α,β] B, see [54] Pointwise Convex Constraints More general than bound constraints, we can consider pointwise convex constraints, i.e., the feasible set C is given by C = {u L p (Ω) m : u(ω) C on Ω}, (5.8) with p > 1, where C R m is a nonempty closed convex set and, as throughout this work, Ω is bounded and measurable with µ(ω) >. Equally well, we could consider sets C consisting of all u L p (Ω) m with u(ω) C(ω) on Ω, with the multifunction C having suitable properties. For convenience, however, we restrict our discussion to the case (5.8). We wish to solve the following problem: Variational Inequality with Pointwise Convex Constraints: u C, F (u), v u v C, (5.9) with the same assumptions as in (1.1), but F being an operator between m-dimensional spaces, i.e., F : L p (Ω) m L p (Ω) m, 1/p + 1/p 1 and u, v = Ω u(ω)t v(ω)dω. The set C is defined in (5.8). Suppose that a continuous function π : R m R m R m is available with the property π(x 1, x 2 ) = x 1 = P C (x 1 x 2 ), (5.1)
88 84 5. Variational Inequalities and Mixed Problems where P C is the Euclidean projection onto C. We will prove that (5.9) is equivalent to the operator equation Remark 5.7. The function Π(u) =, where Π(u)(ω) = π ( u(ω), F (u)(ω) ). (5.11) π E (x 1, x 2 ) = x 1 P C (x 1 x 2 ) (5.12) satisfies (5.1). It generalizes the projection-based NCP-function φ E. Proposition 5.8. Let the function π : R m R m R m satisfy (5.1) and define Π by (5.11). Then u solves (5.9) if and only if (5.11) is satisfied. Proof. The projection x P = P C (x) is characterized by x P C, (x P x) T (z x P ) z C. (5.13) Now, if Π(u) =, then u(ω) = P C ( u(ω) F (u)(ω) ) a.e. on Ω. In particular, u(ω) C and, by (5.13), for all v C, ( u(ω) [u(ω) F (u)(ω)] ) T (v(ω) u(ω)), where we have used v(ω) C. Integrating this over Ω shows that u solves (5.9). Conversely, assume that Π(u). If u / C, then u does not solve (5.9). Otherwise, u C and the set Ω = {ω : u(ω) P C ( u(ω) F (u)(ω) ) } has positive measure. Set z = u F (u) and v = u + σw, where, for ω Ω, w(ω) = P C (z(ω)) u(ω), σ(ω) = Then holds v C, w, and, F (u)(ω) T (v(ω) u(ω)) = σ(ω)f (u)(ω) T w(ω) = σ(ω) ( w(ω) + F (u)(ω) ) T w(ω) σ(ω) w(ω) max{1, w(ω) 2 }. = σ(ω) ( P C (z(ω)) z(ω) ) T ( PC (z(ω)) u(ω) ) σ(ω) w(ω) 2 2 σ(ω) w(ω) 2 2 min{ w(ω) 2, w(ω) 2 2}. Integration over Ω yields F (u), v u <. Therefore, since v C, u is not a solution of (5.9).
89 5.1 Application to Variational Inequalities 85 The reformulation (5.11) is an operator equation involving the superposition operator Π. The application of semismooth Newton methods is attractive if a function π can be found that is (a) Lipschitz continuous and (b) semismooth, and for which (c) π and C π can be computed efficiently. Requirement (a) holds, e.g., for φ = φ E, since the Euclidean projection is nonexpansive. (b) depends on the set C; if, e.g., C is a polyhedron, then P C is piecewise affine linear, see [132, Prop ], and thus 1- order semismooth. Also (c) depends on the set C. We will give an example below. Requirements (a) and (b) are essential for proving the semismoothness of Π. As a preparation for the treatment of mixed problems, we will prove the semismoothness of a slightly more general class of operators than those defined in (5.11). Hereby, we consider operators Π(z, u) that arise from the reformulation of problems (5.9) where F depends on an additional parameter z Z, where Z is a Banach space: F : Z L p (Ω) m L p (Ω) m. For z Z we then consider the problem u C, F (z, u), v u v C, (5.14) which can be interpreted as a class of problems (5.9) that is parameterized by z. Hereby, C is defined by (5.8). Remark 5.9. The problem (5.9) is contained in the class (5.14) by choosing Z = {} and F (, u) = F (u). By Proposition 5.8 we can use a function π satisfying (5.1) to reformulate (5.14) equivalently as Π(z, u) =, where Π(z, u)(ω) = π ( u(ω), F (z, u)(ω) ), ω Ω. (5.15) Now suppose that the following holds: Assumption 5.1. There are 1 r < min{p, p } such that (a) F : Z L p (Ω) m L r (Ω) m is continuously Fréchet differentiable. (b) (z, u) Z L p (Ω) m F (z, u) L p (Ω) m is locally Lipschitz continuous. (c) The function π is Lipschitz continuous. (d) π is semismooth. Then we obtain: Theorem Under Assumption 5.1 the operator Π : Z L p (Ω) m L r (Ω) m defined in (5.15) is locally Lipschitz continuous and C Π-semismooth, where the generalized differential C Π(u) consists of all operators M L(Z [Lp ] m, [L r ] m ) of the form
90 86 5. Variational Inequalities and Mixed Problems M(v, w) = D 1 w + D 2 (F (z, u)(v, w)) (v, w) Z L p (Ω) m, (5.16) where D i L (Ω) m m and D = (D 1 D 2 ) satisfies D(ω) C π ( u(ω), F (z, u)(ω) ), ω Ω. (5.17) Proof. Consider the ith component Π i (z, u) = π i ( u, F (z, u) ) of Π. Obviously, Assumption 5.1 implies Assumption 3.27 with Y = Z L p (Ω) m, G(z, u) = (u, F (z, u)), r i = r, i = 1,..., 2m, q i = p, i = 1,..., m, q i = p, i = m + 1,..., 2m, and ψ = π i. Therefore, by Proposition 3.31 and Theorem 3.44, the operator Π i : Z L p (Ω) m L r (Ω) is locally Lipschitz continuous and Π i - semismooth. Hence, we can apply Proposition 3.5 to conclude that Π : Z L p (Ω) m L r (Ω) m is C Π-semismooth, where C Π = Π 1 Π m. From the definition of the C-subdifferential it is clear that C Π(z, u) can be characterized by (5.16) and (5.17). We also can prove semismoothness of higher order: Assumption As Assumption 5.12, but with (a), (d) replaced by: There exists α (, 1] such that (a) F : Z L p (Ω) m L r (Ω) m is continuously Fréchet differentiable with locally α-hölder continuous derivative. (d) π is α-order semismooth. Under these strengthened assumptions we can use Theorem 3.45 to prove: Theorem Under the Assumption 5.12 the assertions of Theorem 5.11 hold true and, in addition, the operator Π is β-order C Π-semismooth, where β can be determined as in Theorem The established semismoothness results allow to solve problem (5.9) by applying the semismooth Newton methods of section to the reformulation (5.11). The resulting methods are of the same form as Algorithm 3.57 for NCPs, only Φ has to be replaced by Π and all L p -spaces are now m-dimensional. Smoothing steps can be obtained as described in section 4.1. An appropriate regularity condition is obtained by requiring that all M k are elements of L([L r ] m, [L r ] m ) with uniformly bounded inverses. In section 4.2 we described a situation where, through an appropriate choice of the MCP-function, the smoothing step can be avoided. This approach can be generalized to the current situation: Assumption The operator F has the form F (z, u) = λu+g(z, u) with λ > and there exist 1 r < p such that (a) G : Z L r (Ω) m L r (Ω) m is continuously Fréchet differentiable.
91 5.1 Application to Variational Inequalities 87 (b) (z, u) Z L r (Ω) m G(z, u) L p (Ω) m is locally Lipschitz continuous. (c) The function π is defined by π(x 1, x 2 ) = x 1 P C (x 1 λ 1 x 2 ), where P C is the projection on C. (d) The projection P C is semismooth. Under these assumptions we can prove: Theorem Let the Assumption 5.14 hold. Then, we have Π(z, u)(ω) = u(ω) P C ( λ 1 G(z, u)(ω) ), and Π : Z L r (Ω) m L r (Ω) m is C Π-semismooth. Hereby, CΠ(z, u) is the set of all M L(Z L r (Ω) m, L r (Ω) m ) of the form M = ( λ 1 DG z (z, u) I + λ 1 DG u (z, u) ), (5.18) with D L (Ω) m m, D(ω) C P C ( λ 1 G(z, u)(ω) ) on Ω. Proof. We set T (z, u) = λ 1 G(z, u), ψ(x) = P C (x). Then T : Z L r (Ω) m L r (Ω) m is continuously differentiable and maps locally Lipschitz continuous into L p (Ω) m. Further, ψ is Lipschitz continuous and semismooth. Therefore, we can apply Theorem 3.44 componentwise (with Y = Z L r (Ω) m, r i = r, q i = p ) and obtain that Ψ i : (z, u) Z L r (Ω) m ψ i (T (z, u)) L r (Ω) is Ψ i -semismooth. Therefore, by Proposition 3.5, we see that Ψ : Z L r (Ω) m L r (Ω) m is C Ψ-semismooth. Now, using the ( I)-semismoothness of (z, u) u and the sum rule for semismooth operators, Proposition 3.4, we see that Π : Z L r (Ω) m L r (Ω) m is C Π-semismooth with C Π = ( I) CΨ. It is straightforward to see that the elements of Π are characterized by (5.18). The situation typically arising in practice is r = 2. Under the (reasonable) regularity requirement M k L([L r ] m, ([L r ] m ) with uniformly bounded inverses, superlinear convergence of the semismooth Newton method can be established as for the case of bound-constraints, see section 4.2. Finally, we give an example how a function π and its differential can be obtained in a concrete situation. Example Models for the flow of Bingham fluids [62, 63] involve VIPs of the form (5.14), where C = {x : x 2 1}.
92 88 5. Variational Inequalities and Mixed Problems We now derive explicit formulas for π E (x 1, x 2 ) = x 1 P C (x 1 x 2 ) and its differentials B π E, π E, and C π E. First, observe that P C (x) = 1 max {1, x 2 } x, is Lipschitz continuous and PC on R m. Further, P C is C on {x : x 2 1} with P C (x) = I for x 2 < 1, P C (x) = 1 x 2 I xxt x 3 2 for x 2 > 1. This shows that π E is Lipschitz continuous and PC on R m. Hence, π E is 1-order semismooth and where, with w = x 1 x 2, B π E (x 1, x 2 ) = {(I S S) : S M B }, π E (x 1, x 2 ) = {(I S S) : S M}, C π E (x 1, x 2 ) = {(I S S) : S M C }, M B = M = M C = {I} for w 2 < 1, { } 1 M B = M = M C = I wwt w 2 w 3 for w 2 > 1, 2 M B = {I, I ww T }, M = {I tww T } : t 1}, M C = {I diag(t 1,..., t m )ww T for w 2 = 1. : t 1,..., t m 1} 5.2 Mixed Problems So far we have considered variational inequalities in an L p -setting. Often, the problem to solve is not given in this particular form, because the original problem formulation contains additional unknowns (e.g., the state) and additional operator equality constraints (e.g., the state equation). In the case of control problems with unique control-to-state mapping u y(u) (induced by the state equation) we demonstrated how, by using the dependence y = y(u), a reduced problem can be obtained that only depends on the control. This reduction method is called black-box approach. Having the advantage of reducing the problem dimension, the black-box approach nevertheless suffers from several disadvantages: The evaluation of the objective function requires the solution of the (possibly nonlinear) state equation. Further, the blackbox approach is only viable if the state equation admits a unique solution y(u) for every control u. Therefore, it can be advantageous to employ the all-at-once approach, i.e., to solve for u and y simultaneously. In the following we describe how the developed ideas can be extended to the all-at-once approach.
93 5.2 Mixed Problems Karush Kuhn Tucker Systems Consider the optimization problem (with control structure) minimize J(y, u) subject to E(y, u) = and u C. (5.19) Hereby, let C U be a nonempty closed convex set and assume that the operator E : Y U W and the objective function J : Y U R are twice continuously differentiable. Further, let the control space U and the state space Y be Banach spaces and W a reflexive Banach space with dual W. Now consider a local solution (ȳ, ū) Y U of (5.19) at which Robinson s regularity condition [126] holds. More precisely, this means that int {( E (ȳ, ū)(v, w), ū + w u ) : v Y, w U, u C }, or, which turns out to be equivalent, int {E (ȳ, ū)(v, u ū) : v Y, u C}. (5.2) In particular, (5.2) is satisfied if E y (ȳ, ū) is onto, which holds true for many control problems. If the regularity condition (5.2) holds at a local solution (ȳ, ū), then there exists a Lagrange multiplier w W such that the triple (ȳ, ū, w) satisfies the KKTconditions, cf., e.g., [15]: ū C, J u (ȳ, ū) + E u (ȳ, ū) w, v ū U,U v C, (5.21) J y (ȳ, ū) + E y (ȳ, ū) w =, (5.22) E(ȳ, ū) =. (5.23) This system consists of a variational inequality (parameterized by z = (ȳ, w)) of the form (5.14) with F (y, u, w) = J u (y, u) + E u (y, u) w (except that the space U and the convex set C are not yet specified) and two operator equations. For convenient notation, we introduce the Lagrange function L : Y U W R, L(y, u, w) = J(y, u) + w, E(y, u) W,W. Then the operators appearing in (5.21) (5.23) are L u (ȳ, ū, w), L y (ȳ, ū, w), and L w (ȳ, ū, w), respectively. Therefore, we can write (5.21) (5.23) in the form ū C, L u (ȳ, ū, w), v ū U,U v C, (5.24) L y (ȳ, ū, w) =, (5.25) E(ȳ, ū) =. (5.26) Our aim is to reformulate the variational inequality as an equivalent nonsmooth operator equation. To this end, we consider U = L p (Ω) m, p (1, ], Ω bounded with µ(ω) >, and assume that C has appropriate structure. In the following we analyze the case where C is described by pointwise convex constraints of the form
94 9 5. Variational Inequalities and Mixed Problems (5.8) and assume that a continuous function π : R m R m R m with the property (5.1) is available. Note that this problem class includes the NCP and the boundconstrained VIP in normal form as special cases. According to Proposition 5.8, we can reformulate (5.24) as Π(ȳ, ū, w) =, where Π(y, u, w)(ω) = π ( u(ω), L u (y, u, w)(ω) ), ω Ω, and thus (ȳ, ū, w) is a KKT-triple if and only if it is a solution to the system Σ(y, u, w) def = L y(y, u, w) Π(y, u, w) =. (5.27) E(y, u) We continue by considering two approaches, parallel to the situations in Assumption 5.1 and Assumption 5.14, respectively. The first approach requires the following hypotheses: Assumption There exist 1 r < min{p, p } such that (a) E : Y L p (Ω) m W and J : Y L p (Ω) m R are twice continuously differentiable. (b) The operator (y, u, w) Y L p (Ω) m W L u (y, u, w) L r (Ω) m is well-defined and continuously differentiable. (c) The operator (y, u, w) Y L p (Ω) m W L u (y, u, w) L p (Ω) m is well-defined and locally Lipschitz continuous. (d) π is Lipschitz continuous and semismooth. Remark Variants of Assumption 5.17 are possible. We obtain: Theorem Let the Assumption 5.17 hold. Then the operator Σ : Y L p (Ω) m W Y L r (Ω) m W defined in (5.27) is locally Lipschitz continuous and C Σ-semismooth with C Σ = L y C Π E. More precisely, C Σ(y, u, w) is the set of all M L(Y [L p ] m W, Y L r (Ω) m W ) of the form L yy (y, u, w) L yu (y, u, w) E y (y, u) M = D 2 L uy (y, u, w) D 1 I + D 2 L uu (y, u, w) D 2 E u (y, u), (5.28) E y (y, u) E u (y, u) where D i L (Ω) m m, (D 1 D 2 )(ω) C π ( u(ω), L u (y, u, w)(ω) ). Proof. We set Z = Y W and F (y, w, u) = L u (y, u, w). Assumption 5.17 then implies Assumption 5.1, and thus Π is locally Lipschitz continuous and C Π-semismooth by Theorem From the differentiability requirements in Assumption 5.17 we obtain the local Lipschitz continuity and, by Proposition 3.3, the L y- and E -semismoothness of the second and third component of Σ, respectively. Proposition 3.5 now yields the local Lipschitz continuity and the C Σ- semismoothness of Σ for C Σ = L y C Π E. The elements of C Σ(y, u, w) are easily seen to be given by (5.28).
95 5.2 Mixed Problems 91 In Example 5.23, we apply Theorem 5.19 to a control problem. A second approach for establishing the semismoothness of Π relies on the following hypotheses: Assumption 5.2. There exist 1 r < p such that: (i) E : Y L r (Ω) m W and J : Y L r (Ω) m R are twice continuously differentiable. (ii) L u has the form L u (y, u, w) = λu + G(y, u, w) with λ > and: (a) G : Y L r (Ω) m W L r (Ω) m is continuously Fréchet differentiable. (b) The operator (y, u, w) Y L r (Ω) m W G(y, u, w) L p (Ω) m is locally Lipschitz continuous. (iii) The function π is defined by π(x 1, x 2 ) = x 1 P C (x 1 λ 1 x 2 ) and the projection P C on C is semismooth. Theorem Let the Assumption 5.2 hold. Then we have Π(y, u, w)(ω) = u(ω) P C ( λ 1 G(y, u, w)(ω) ), and Σ : Y L r (Ω) m W Y L r (Ω) m W is locally Lipschitz continuous and C Σ-semismooth. Hereby, C Σ(y, u, w) is the set of all M L(Y L r (Ω) m W, Y L r (Ω) m W ) of the form M = L yy(y, u, w) L yu (y, u, w) E y (y, u) λ 1 DG y (y, u, w) I + λ 1 DG u (y, u, w) λ 1 DG w (y, u, w) (5.29) E y (y, u) E u (y, u) with D L (Ω) m m, D(ω) C P C ( λ 1 G(y, u, w)(ω) ) on Ω. (5.3) Proof. Assumption 5.2 implies Assumption 5.14 for Z = Y W and F (y, w, u) = L u (y, u, w). Theorem 5.15 is applicable and yields the local Lipschitz continuity and C Π-semismoothness of Π : Y Lr (Ω) m W L r (Ω) m, where C Π(y, u, w) is the set of all M Π L(Y L r (Ω) m W, L r (Ω) m ) of the form M Π = ( λ 1 DG y (y, u, w) I + λ 1 DG u (y, u, w) λ 1 DG w (y, u, w) ), where D is as in the Theorem. From Assumption 5.2 and Proposition 3.3 follow the local Lipschitz continuity as well as the L y - and E -semismoothness of the second and third component of Σ, respectively. Therefore, the operator Σ : Y L r (Ω) m W Y L r (Ω) m W is locally Lipschitz continuous and, by Proposition 3.5, C Σ-semismooth with C Σ = L y Π E. It is straightforward to verify that the elements of C Σ(y, u, w) are exactly the operators M in (5.29).
96 92 5. Variational Inequalities and Mixed Problems Remark If P C is α-order semismooth, it is easy to modify Assumption 5.2 and Theorem 5.21 such that higher order semismoothness of Π can be established. The following example illustrates how Theorem 5.19 and 5.21 can be applied in practice. Example Let Ω R n be a bounded Lipschitz domain and consider the control problem 1 minimize y d (x)) y H 1(Ω),u L2 (Ω) 2 Ω(y(x) 2 dx + λ u(x) 2 dx 2 Ω (5.31) subject to y = f + gu on Ω β 1 u β 2 on Ω. This is a problem of the form (5.19) with U = L 2 (Ω), Y = H 1 (Ω), W = H 1 (Ω), W = H 1 (Ω), C = [β 1, β 2 ], C defined in (5.8), and J(y, u) = 1 y d (x)) 2 Ω(y(x) 2 dx + λ u(x) 2 dx, 2 Ω E(y, u) = y f gu. We assume < β 1 < β 2 < +, y d L 2 (Ω), λ >, f H 1 (Ω), and g L (Ω). Observe that (a) J is strictly convex, (b) {(y, u) : y = f + gu, u [β 1, β 2 ]} H 1 (Ω) L2 (Ω) is closed, convex, and bounded. In (b) we have used that L(H 1, H 1 ) is a homeomorphism. Hence, by a standard result [49, Prop. II.1.2], there exists a unique solution (ȳ, ū) H 1(Ω) L 2 (Ω) to the problem. Since C max{ β 1, β 2 } B L, we have ū L p (Ω) for all p [1, ]. Hence, instead of considering (5.31) as a problem posed in H 1 (Ω) L 2 (Ω) we can equally well treat it in Y U = H 1(Ω) Lp (Ω), with arbitrary p [2, ], which we will do in the following. The continuous invertibility of E y (y, u) = L(H 1, H 1 ) guarantees that Robinson s regularity condition (5.2) is satisfied, so that the solution (ȳ, ū) is characterized by (5.24) (5.26), where w W = H 1 (Ω) is the Lagrange multiplier. Using integration by parts, we have for y, w H 1(Ω) y, w H 1,H 1 = y(x) w(x)dx = w, y H 1,H 1. Hence, Therefore, Ω L(y, u, w) = J(y, u) + w, y H 1,H 1 (f + gu, w) L 2.
97 L y (y, u, w) = y y d w, L u (y, u, w) = λu gw, 5.2 Mixed Problems 93 and (5.24) (5.26) are satisfied by the triple (ȳ, ū, w) if and only if it solves the system u L p (Ω), u C, (λu gw, v u) L 2 v L p (Ω), v C, (5.32) y y d w = (5.33) y = f + gu. (5.34) Now, let q be arbitrary with q (2, ] if n = 1, q (2, ) if n = 2, and q (2, 2n/(n 2)] if n 3. Then the continuous embedding H 1 (Ω) Lq (Ω) implies that the operator (y, u, w) Y L p (Ω) W L u (y, u, w) = λu gw L q (Ω) is continuous linear and thus C for all p q. It is now straightforward to see that Assumption 5.17 (a) (c) holds for any p (2, ], p (2, min{p, q}] with q > 2 as specified, and any r [2, p ). For π we can choose any Lipschitz continuous and semismooth MCP-function for the interval [β 1, β 2 ] to meet Assumption 5.17 (d). This makes Theorem 5.19 applicable. Now we turn to the situation of Assumption 5.2. Obviously, for r = 2 and p = q, Assumptions 5.2 (i), (ii) hold with G(y, u, w) = gw. Further, P C (x) = max{β 1, min{x, β 2 }} is 1-order semismooth, so that also Assumption 5.2 (iii) holds. Hence, Theorem 5.21 is applicable. Having established the semismoothness of the operator Σ, we can apply the (projected) semismooth Newton method (Algorithm 3.13 or 3.17) for the solution of (5.27). For the superlinear convergence results, Theorem 3.15 and 3.19, respectively, the regularity condition of Assumption 3.14 or one of its variants, Assumption 3.2 or 3.23, respectively, has to be satisfied. Essentially, these assumptions require the bounded invertibility of some or all elements of C Σ, viewed as operators between appropriate spaces, near the solution. In the next section we establish a relation between C Σ and the generalized differential of the reformulated reduced problem. This relation can then be used to show that regularity conditions for the reduced problem imply regularity of the full problem (5.27). Further, we discuss how smoothing steps can be constructed for the scenario of Assumption As we will see, in the setting of Assumption 5.2 no smoothing step is required Connections to the Reduced Problem We consider the problem (5.19) and, in parallel, the reduced problem minimize j(u) subject to u C, (5.35) where j(u) = J(y(u), u) and y(u) Y is such that
98 94 5. Variational Inequalities and Mixed Problems E(y(u), u) =. (5.36) We assume that y(u) exists uniquely for all u in a neighborhood V of C (this can be relaxed, see Remark 5.24) and that E y (y(u), u) is continuously invertible. Then, by the implicit function theorem, the mapping u U y(u) Y is twice continuously differentiable. The adjoint representation of the gradient j (u) U is given by j (u) = J u (y(u), u) + E u (y(u), u) w(u), where w = w(u) W solves the adjoint equation E y (y(u), u) w = J y (y(u), u), (5.37) see appendix A.1. In terms of the Lagrange function this can be written as where w(u) satisfies L(y, u, w) = J(y, u) + w, E(y, u) W,W j (u) = L u (y(u), u, w(u)), (5.38) L y (y(u), u, w(u)) =. (5.39) Any solution ū U of (5.35) satisfies the first-order necessary optimality conditions for (5.35): ū C, j (ū), v ū U,U v C. (5.4) Now, setting ȳ = y(ū) and combining (5.4) with (5.38), (5.39), and (5.36), we can write (5.4) equivalently as ū C, L u (ȳ, ū, w, v ū U,U v C, L y (ȳ, ū, w) = E(ȳ, ū) =. These are exactly the KKT-conditions (5.24) (5.26) of problem (5.19). Therefore, if ū U is a critical point of (5.35), i.e. if ū U satisfies (5.4), then (ȳ, ū, w) = (y(ū), ū, w(ū)) is a KKT-triple of (5.19), i.e., (ȳ, ū, w) satisfies (5.24) (5.26). Conversely, if (ȳ, ū, w) is a KKT-triple of (5.19), then there holds ȳ = y(ū), w = w(ū), and ū is a critical point of (5.35). Remark We have assumed that y(u) exists uniquely with E y (y(u), u) being continuously invertible for all u in a neighborhood of C. This requirement can be relaxed. In fact, let (ȳ, ū, w) be a KKT-triple of (5.19) and assume that E y (ȳ, ū) is continuously invertible. Then, by the implicit function theorem there exist neighborhoods V U of ū and V Y of ȳ and a unique mapping u V U y(u) V Y with y(ū) = ȳ and E y (y(u), u) = for all u V U. Furthermore, y(u) is twice continuously differentiable. Introducing j(u) = J(y(u), u), u V U, we see as above that (5.24) (5.26) and (5.4) are equivalent. Due to this equivalence of the optimality systems for (5.19) and (5.35) we expect to find close relations between Newton methods for the solution of (5.24) (5.26) and those for the solution of (5.4). This is the objective of the next section.
99 5.2.3 Relations between Full and Reduced Newton System We now return to problems (5.19) with U = L p (Ω) m and C = {u L p (Ω) m : u(ω) C, ω Ω}, 5.2 Mixed Problems 95 where C R m is closed and convex. As in Remark 5.24, let us suppose that (ȳ, ū, w) is a KKT-triple with continuously invertible operator E y (ȳ, ū) and denote by y(u) the locally unique control-to-state mapping with y(ū) = ȳ. We consider the reformulation (5.27) of (5.24) (5.26) under the Assumption If we work with exact elements M of the generalized differential C Σ(y, u, w), the semismooth Newton method for the solution of (5.27) requires to solve systems of the form Ms = Σ(y, u, w). According to Theorem 5.19, these systems assume the form L yy L yu Ey ρ 1 D 2 L uy D 1 I + D 2 L uu D 2 Eu ρ 2, (5.41) E y E u ρ 3 where we have omitted the arguments (y, u, w) and (y, u). By the Banach theorem, E y (y, u) is continuously invertible in a neighborhood of (ȳ, ū) with uniformly bounded inverse. Using this, we can perform the following block elimination: L yy L yu Ey ρ 1 D 2 L uy D 1 I + D 2 L uu D 2 Eu ρ 2 E y E u ρ 3 where (Row 1 L yy E 1 y Row 3) L yu L yy Ey 1E u Ey ρ 1 L yy E 1 D 2 L uy D 1 I + D 2 L uu D 2 Eu ρ 2 E y E u ρ 3 y ρ 3 (Row 2 D 2 L uy E 1 y Row 3) L yu L yy Ey 1E u Ey ρ 1 L yy Ey 1ρ 3 D 1 I + D 2 (L uu L uy Ey 1 E u ) D 2 Eu ρ 2 D 2 L uy Ey 1 ρ 3 E y E u ρ 3 (Row 2 D 2 E u(e y) 1 Row 1) L yu L yy Ey 1E u Ey ρ 1 L yy E 1 D 1 I + D 2 H ρ 2 E y E u ρ 3 y ρ 3 H(y, u, w) = L uu L uy E 1 y E u E u(e y) 1 L yu + E u (E y ) 1 L yy E 1 y E u, (5.42),
100 96 5. Variational Inequalities and Mixed Problems ρ 2 = ρ 2 D 2 E u(e y) 1 ρ 1 + D 2 (E u(e y) 1 L yy L uy )E 1 y ρ 3. The operator H can be written in the form H = T ( Lyy L uy L yu L uu ) T, T (y, u) = ( E 1 y I Therefore, the continuous invertibility of M is closely related to the continuous invertibility of the operator D 1 I + D 2 H. We now consider the reduced objective function j(u) = J(y(u), u) in a neighborhood of ū. It is shown in appendix A.1 that the Hessian j (u) can be represented in the form ( ) j (u) = T (y, u) Lyy (y, u, w) L yu (y, u, w) T (y, u), L uy (y, u, w) L uu (y, u, w) ( Ey (y, u) 1 ) E u (y, u) T (y, u) =, I where y = y(u), and w = w(u) is the adjoint state, given by the adjoint equation (5.37), which can also be written in the form (5.39). Therefore, we see that j (u) = H(y(u), u, w(u)) and, hence, j (ū) = H(ȳ, ū, w), since ȳ = y(ū) and w = w(ū). For (y, u, w) = (y(u), u, w(u)) we have L u (y(u), u, w(u)) = j (u) by (5.38). Hence, with D = (D 1 D 2 ), D(ω) C π ( u(ω), L u (y(u), u, w(u))(ω) ) D(ω) C π ( u(ω), j (u)(ω) ). Thus, by Theorems 5.11 and 5.19, for any (y, u, w) = (y(u), u, w(u)) and all operators M of the form (5.28) the Schur complement satisfies where E u M R = D 1 I + D 2 H(y(u), u, w(u)) CΠ R (u), Π R (u)(ω) = π ( u(ω), j (u)(ω) ). For the application of the class of (projected) semismooth Newton methods to problem (5.27) we need the invertibility of M k C Σ(y k, u k, w k ) as operator between appropriate spaces. We already observed that for the reduced problem it is appropriate to require the uniformly bounded invertibility of Mk R C ΠR (u k ) in L([L r ] m, [L r ] m ). In agreement with this we now require: Assumption At least one of the following conditions holds: (a) The operators M k C Σ(y k, u k, w k ) are continuously invertible elements of L(Y [L r ] m W, Y [L r ] m W ) with the norms of their inverses bounded by a constant C M 1. (b) There exist constants η > and C M 1 > such that, for all (y, u, w) (ȳ, ū, w) + ηb Y [L p ] m W, every M C Σ(y k, u k, w k ) is an invertible element of L(Y [L r ] m W, Y [L r ] m W ) with the norm of its inverse bounded by C M 1. ).
101 5.2 Mixed Problems 97 This assumption corresponds to Assumption 3.11 (i) with Y = Y [L r ] m W. Under Assumptions 5.17, 5.25 and 3.11 (ii) (ensuring the availability of a smoothing step), we can apply Algorithm 3.9 or its projected version, Algorithm 3.17, (with, B k = M k and, e.g., K = C) for f = Σ, f = C Σ, Y = Y [L p ] m W, Z = Y [L r ] m W, and Y = Y [L r ] m W. The Theorems 3.12 and 3.19 then guarantee superlinear convergence since, by Theorem 5.19, Σ is C Σ-semismooth. In section we will propose a way of constructing smoothing steps. In the same way, we can consider reformulations arising under the Assumption 5.2. In this case we have L u (y, u, w) = λu + G(y, u, w), π(x) = x 1 P C (x 1 λ 1 x 2 ). Further, for all M C Σ(y, u, w), there exists D L (Ω) m m with D C P C ( λ 1 G(y, u, w)) such that L yy L yu Ey M = λ 1 DG y I + λ 1 DG u λ 1 DG w E y E u L yy L yu Ey = λ 1 DL uy I + λ 1 D(L uu λi) λ 1 DEu E y E u L yy L yu Ey = D 2 L uy D 1 I + D 2 L uu D 2 Eu, E y E u with D 1 = I D and D 2 = λ 1 D. Note that (D 1, D 2 ) C π(u, L u (y, u, w)) and, hence, for these choices of D 1 and D 2, the operator M assumes the form (5.28). Thus, we can apply the same transformations to the Newton system as before and obtain again that for (y, u, w) = (y(u), u, w(u)) the generalized differentials of the reduced semismooth reformulation appear as Schur complement of the full system. As regularity condition we choose: Assumption At least one of the following conditions holds: (a) The operators M k C Σ(y k, u k, w k ) are continuously invertible elements of L(Y [L r ] m W, Y [L r ] m W ) with the norms of their inverses uniformly bounded by a constant C M 1. (b) There exist constants η > and C M 1 > such that, for all (y, u, w) (ȳ, ū, w) + ηb Y [Lr ] m W, every M C Σ(y k, u k, w k ) is an invertible element of L(Y [L r ] m W, Y [L r ] m W ) with the norm of its inverse bounded by C M 1. This assumption corresponds to Assumption 3.11 (i) with Y = Y = Y [L r ] m W. Now, under Assumptions 5.2 and 5.26, we can apply Algorithm 3.9 or its projected version, Algorithm 3.17, for f = Σ, f = C Σ, Y = Y = Y [L r ] m W,
102 98 5. Variational Inequalities and Mixed Problems and Z = Y [L r ] m W. Since Y = Y, we do not need a smoothing step. Theorems 3.12 and 3.19 establish superlinear convergence since, by Theorem 5.21, Σ is C Σ-semismooth Smoothing Steps In addition to Assumption 5.17, we require: Assumption The derivative L u has the form L u (y, u, w) = λu + G(y, u, w), with (y, u, w) Y L r (Ω) m W G(y, u, w) L p (Ω) m being locally Lipschitz continuous. Example We verify this assumption for the control problem of Example There, we had Y = W = H 1, U = Lp with p 2 arbitrary, and L u (y, u, w) = λu gw = λu + G(y, u, w) with G(y, u, w) = gw. Since g L and w H 1 L q for all q [1, ] if n = 1, all q [1, ) if n = 2, and all q [1, 2n/(n 2)] if n 3, we see that G maps L r, with r 2 arbitrary, linear and continuous to L q. Thus, Assumption 5.27 holds for all p (2, q]. We can show: Theorem Let the Assumptions 5.17 and 5.27 hold. Then the operator defines a smoothing step. Proof. We first note that so that S : Y L r (Ω) m W Y L p (Ω) m W, y S(y, u, w) = P C (u λ 1 L u (y, u, w)), w x 1 = P C (x 1 λ 1 x 2 ) x 1 = P C (x 1 x 2 ) π(x) =, u = P C ( u λ 1 L u (y, u, w) ) Π(y, u, w) =. Hence, for any solution (ȳ, ū, w) of (5.27), we have S(ȳ, ū, w) = (ȳ, ū, w). Furthermore, as in section 4.1, pointwise on Ω holds
103 5.2 Mixed Problems 99 ( P C u λ 1 L u (y, u, w) ) ū 2 ( = P C u λ 1 L u (y, u, w) ) (ū P C λ 1 L u (ȳ, ū, w) ) 2 ( = P C λ 1 G(y, u, w) ) ( P C λ 1 G(ȳ, ū, w) ) 2 λ 1 G(y, u, w) G(ȳ, ū, w) 2, and thus, with C G denoting the local Lipschitz constant of G near (ȳ, ū, w), P C ( u λ 1 L u (y, u, w) ) ū [Lp ] m C G cλ 1 (y, u, w) (ȳ, ū, w) Y [Lr ] m W, where c depends on m only. The proof is complete, since S(y, u, w) (ȳ, ū, w) Y [Lp ] m W c ( (y, w) (ȳ, w) Y W + P C ( u λ 1 L u (y, u, w) ) ū [Lp ] m ) Regularity Conditions We already observed that the all-at-once Newton system is closely related to the black-box Newton system. In this section we show how the regularity of the all-atonce Newton system can be reduced to regularity conditions on its Schur complement. Since, for (y, u, w) = (y(u), u, w(u)), this Schur complement coincides with the operator of the black-box Newton system, sufficient conditions for regularity can then be developed along the lines of section 4.3. In the following, we restrict our investigations to the situation of Assumptions 5.2 and Our hypothesis on the Schur complement is: Assumption 5.3. There exist constants η > and CM R > such that, for all 1 (y, u, w) (ȳ, ū, w) + ηb Y [Lr ] m W holds: (i) E y (y, u, w) L(Y [L r ] m W, Y [L r ] m W ) is continuously invertible with uniformly bounded inverse. (ii) For all D satisfying (5.3), the Schur complement D 1 + D 2 H, with D 1 = I D, D 2 = λ 1 D, and H as defined in (5.42), is an invertible element of L([L r ] m, [L r ] m ) with M 1 [Lr ] m,[l r ] m CR M 1. Theorem Let the Assumptions 5.2 and 5.3 hold. Then the regularity condition of Assumption 5.26 (b) holds. Proof. Let (y, u, w) (ȳ, ū, w) + ηb Y [Lr ] m W and M C Σ(y, u, w) be arbitrary. Then there exists D satisfying (5.3) such that M assumes the form (5.29).
104 1 5. Variational Inequalities and Mixed Problems Now consider any ρ = (ρ 1, ρ 2, ρ 3 ) T Y [L r ] m W. Then, according to section 5.2.3, solving the system M(s y, s u, s w ) T = ρ is equivalent to (D 1 I + D 2 H)s u = ρ 2 D 2 E u (E y ) 1 ρ 1 + D 2 (Eu(E y) 1 L yy L uy )Ey 1 ρ 3, (5.43) E y s y = ρ 3 E u s u, (5.44) Ey s w = ρ 1 L yy Ey 1 ρ 3 (L yu L yy Ey 1 E u)s u. (5.45) The assumptions ensure twice continuous differentiability of L and uniformly bounded invertibility of E y and D 1 + D 2 H. Furthermore, D and thus D 1, D 2 are uniformly bounded in [L ] m m due to the Lipschitz continuity of P C. This and (5.43) (5.45) show that, possibly after shrinking η, there exists C M 1 > such that s Y [Lr ] m W C M 1 s Y [L r ] m W, holds uniformly on (ȳ, ū, w) + ηb Y [Lr ] m W.
105 6. Trust-Region Globalization So far, we have concentrated on locally convergent Newton-type methods. We now propose a class of trust-region algorithms which are globally convergent and use (projected) Newton steps as candidates for trial steps. Hereby, we restrict ourself to the case where the problem is posed in Hilbert space, which, from a practical point of view, is not very restrictive. To motivate our approach, we consider (1.1) with U = L 2 (Ω) and continuously differentiable function F : U U. Using an MCP/NCP-function φ, we reformulate the problem in the form Φ(u) =. (6.1) Let the Assumption 5.1 hold with r = 2 and some p, p (2, ]. Then the operator Φ : L p (Ω) L 2 (Ω) is semismooth by Theorem 5.4. Alternatively, if F assumes the form F (u) = λu + G(u) and G has the smoothing property of section 4.2, and if Φ(u) = u P B (u λ 1 G(u)) is chosen, then by Theorem 4.4, Φ : L 2 (Ω) L 2 (Ω) is is locally Lipschitz continuous and semismooth. For globalization, we need a minimization problem whose solutions or critical points correspond to solutions of (6.1). We propose three different approaches to obtain these minimization reformulations: Most naturally, we can choose the squared residual h(u) = 1 2 Φ(u) 2 L 2 as objective function. In fact, any global solution of h is a solution to Φ(u) = and vice versa. Therefore, (6.1) is equivalent to the minimization problem minimize u L 2 (Ω) h(u). (6.2) We will show that, for appropriate choices of φ, the function h(u) = Φ(u) 2 L 2 /2 is continuously differentiable. This makes (6.2) a C 1 problem posed in the Hilbert space L 2 (Ω). As was discussed in the context of the projected semismooth Newton method (Algorithm 3.17), it is often desirable that the algorithm stays feasible with respect to a given closed convex set K L p (Ω) which contains the solution ū L p (Ω). Usually K = B is chosen. We consider sets of the general form K = {a K u b K } with lower and upper bound functions satisfying the conditions (3.46). Then the constrained minimization problem
106 12 6. Trust-Region Globalization minimize u L 2 (Ω) h(u) subject to u K (6.3) is equivalent to (6.1) in the sense that any global solution ū K of (6.3) solves (6.1) and vice versa. Finally, we come to a third possibility of globalization, which can be used if the VIP is obtained from the first-order necessary optimality conditions of the constrained minimization problem minimize j(u) subject to u B (6.4) with B = {u L 2 (Ω) : a u b} as in (1.1). Then we can use the problem (6.4) itself for the purpose of globalization. In all three approaches, (6.2), (6.3), and (6.4), we obtain a minimization problem of the form minimize u L 2 (Ω) f(u) subject to u K. (6.5) For the development and analysis of the trust-region method, rather than working in L 2, we prefer to choose a general Hilbert space setting. This has the advantage of covering also the finite-dimensional case, and many other situations, e.g., the reformulation of mixed problems, see section 5.2. Therefore, in the following we consider the problem minimize f(u) subject to u K, (6.6) u U where f : U R is a continuously differentiable function that is defined on the Hilbert space U. The feasible set K U is assumed to be nonempty, closed, and convex. In particular, there exists a unique metric projection P K : U K, P K (u) = argmin v u U. v K We identify the dual U of U with U, i.e., we use, U,U = (, ) U Our idea is to use projected semismooth Newton steps as trial step candidates for a trust-region globalization based on (6.6). In general, the presence of the smoothing step in the semismooth Newton method makes it difficult to prove rigorously transition to fast local convergence. There are ways to do this, but the approach would be highly technical, and thus we will prove transition to fast local convergence only for the case where the semismooth Newton method converges superlinearly without a smoothing step. This is justified for two reasons: As we will see in our numerical tests, experience shows that we usually observe fast convergence without incorporating a smoothing step in the algorithm. One reason for this is that a discretization would have to be very fine to resolve functions that yield an excessively big L p/ L 2-ratio. Second, in section 4.2 we have developed a reformulation to which the semismooth Newton method is applicable without a smoothing step. For unconstrained problems, global convergence usually means that the method converges to a critical point, i.e., a point u U such that f (u) = in the sense that at least lim inf k f (u k ) U =. In the constrained context, we have to clarify what we mean by a critical point.
107 Definition 6.1. We call u U a critical point of (6.6) if The following result is important: Lemma Trust-Region Globalization 13 u K and (f (u), v u) U v K. (6.7) (i) Let u be a local solution of (6.6); more precisely, u K and there exists δ > such that f(v) f(u) for all v (u + δb U ) K. Then u is a critical point of (6.6). (ii) The following statements are equivalent: (a) u is a critical point of (6.6). (b) u P K (u f (u)) =. (c) u P K (u tf (u)) = for some t >. (d) u P K (u tf (u)) = for all t. Proof. (see also [66, 8]). (i): For any v K, there holds v(t) = u + t(y u) (u + δb U ) K for sufficiently small t > and thus [f(v(t)) f(u)]/t (f (u), v u) U as t +. (ii): Let t > be arbitrary. Condition (6.7) is equivalent to u K, (u (u tf (u)), v u) U v K, which is the same as u = P K (u tf (u)). This proves the equivalence of (a) (d). Next, we introduce the concept of criticality measures. Definition 6.3. A continuous function χ : K [, ) with the property χ(u) = u is a critical point of problem (6.6) (6.8) is called criticality measure for (6.6). Example 6.4. By Lemma 6.2, for any t >, the function χ P,t (u) = u P K (u tf (u)) U is a criticality measure for (6.6). For t = 1, the resulting criticality measure is the norm of the projected gradient. χ P (u) = χ P,1 (u) = u P K (u f (u)) U
108 14 6. Trust-Region Globalization The algorithm that we present in this chapter uses ideas developed in the author s paper [14] on trust-region methods for finite-dimensional semismooth equations. Other trust-region approaches for the solution of finite-dimensional NCPs and VIPs can be found in, e.g., [88, 93, 119]. Trust-region algorithms for infinite-dimensional constrained optimization problems are investigated in, e.g., [96, 136, 144]. The method we propose allows for nonmonotonicity of the sequence of generated function values. This has proven advantageous to avoid convergence to local, but nonglobal solutions of the problem [28, 65, 93, 137, 14]. Before we describe the trust-region algorithm, we show that for appropriate choice of φ the function h(u) = Φ(u) 2 L 2 /2 is continuously differentiable. We begin with the following result: Lemma 6.5. Let ψ : V R be locally Lipschitz continuous on the nonempty open set V R m. Assume that ψ is continuously differentiable on V \ ψ 1 (). Then the function ψ 2 is continuously differentiable on V. Moreover, (ψ 2 ) (x) = 2ψ(x)g for all g ψ(x) and all x V. The simple proof can be found in [14]. Lemma 6.6. Let ψ : R m R be Lipschitz continuous on R m and continuously differentiable on R m \ ψ 1 (). Further, let G : U L 2 (Ω) m be continuously differentiable. Then the function h : u U 1 2 Ψ(u) 2 L 2 (Ω) m with Ψ(u)(ω) = ψ(g(u)(ω)), ω Ω, is continuously differentiable with h (u) = M Ψ(u) M Ψ(u). Remark 6.7. Note that Ψ(u) L(U, L 2 ) by Lemma Proof. Using Lemma 6.5, η = ψ 2 /2 is continuously differentiable with η (x) = ψ(x)g for all g ψ(x). The Lipschitz continuity of ψ implies η (x) 2 = ψ(x) g 2 L( ψ() + ψ(x) ψ() ) L ψ() + L 2 x 2. Hence, by Proposition A.1, the superposition operator T : w L 2 (Ω) m η(w) L 1 (Ω) m is continuously differentiable with derivative (T (w)v)(ω) = η (w(ω))v(ω) = ψ(w(ω))g T v(ω) g T ψ(w(ω)). From this and the chain rule we see that H : u U (G(u)) L 1 (Ω) m is continuously differentiable with (H (u)v)(ω) = η (G(u)(ω))(G (u)v)(ω) = ψ(g(u)(ω))g T (G (u)v)(ω) g T ψ(g(u)(ω)).
109 6.1 The Trust-Region Algorithm 15 Hence, H (u) = Ψ(u) M M Ψ(u). Thus, we see that h : u U H(u)(ω)dω is continuously differentiable with Ω (h (u), v) U = H (u)(ω)v(ω)dω = Ψ(u)(ω)(Mv)(ω)dω = (M Ψ(u), v) U Ω Ω for all M Ψ(u). Remark 6.8. The Fischer Burmeister function φ F B meets all requirements of Lemma 6.6. Hence, if F : L 2 (Ω) L 2 (Ω) is continuously differentiable, then h(u) = Φ(u) 2 L /2 with Φ(u) = φ ( u, F (u) ) is continuously differentiable. The 2 same holds true for the MCP-function φ F [α,β] B defined in (5.5). 6.1 The Trust-Region Algorithm We use the continuous differentiability of f to build an at least first-order accurate quadratic model q k (s) = (g k, s) U (s, B ks) U def of f(u k + s) f(u k ) at the current iterate u k, where g k = f (u k ) U is the gradient of f at u k. The self-adjoint operator B k L(U, U) can be viewed as an approximation of the Hessian operator of f (if it exists). We stress, however, that the proposed trust-region method is globally convergent for very general choices of B k, including B k =. In each iteration of the trust-region algorithm, a trial step s k is computed as approximate solution of the Trust-Region Subproblem: minimize q k (s) subject to u k + s K, s U k. (6.9) We will assume that the trial steps meet the following two requirements: Feasibility Condition: Reduction Condition: u k + s k K and s k U β 1 k, (6.1) pred k (s k ) def = q k (s k ) β 2 χ(u k ) min{ k, χ(u k )} (6.11) with constants β 1 1 and β 2 > independent of k. Hereby, χ is a suitably chosen criticality measure, see Definition 6.3. Usually, the update of the trust-region radius k is controlled by the ratio of actual reduction ared k (s) def = f(u k ) f(u k + s)
110 16 6. Trust-Region Globalization and predicted reduction pred k (s) def = q k (s). It has been observed [28, 65, 93, 137] that the performance of nonlinear programming algorithms can be significantly improved by using nonmonotone line search- or trust-region techniques. Hereby, in contrast to the traditional approach, the monotonicity f(u k+1 ) f(u k ) of the function values is not enforced in every iteration. To achieve this, we generalize a nonmonotone trust-region technique that was recently introduced by the author [14] in the context of finite-dimensional semismooth equations. For this algorithm all global convergence results for monotone, finite-dimensional trust-region methods remain valid. However, the decrease requirement is significantly relaxed. Before we describe this approach and the corresponding reduction ratio ρ k (s) in detail, we first state the basic trust-region algorithm. Algorithm 6.9 (Trust-Region Algorithm). 1. Initialization: Choose η 1 (, 1), min, and a criticality measure χ. Choose u K, > such that min, and a model Hessian B L(U, U). Choose an integer m 1 and fix λ (, 1/m] for the computation of ρ k. Set k := and i := Compute χ k := χ(u k ). If χ k =, then STOP. 3. Compute a trial step s k satisfying the conditions (6.1) and (6.11). 4. Compute the reduction ratio ρ k := ρ k (s k ) by calling Algorithm 6.11 with m k := min{i + 1, m}. 5. Compute the new trust-region radius k+1 by invoking Algorithm If ρ k η 1, then reject the step s k, i.e., set u k+1 := u k, B k+1 := B k, increment k by 1, and go to Step Accept the step: Set u k+1 := u k + s k and choose a new model Hessian B k+1 L(U, U). Set j i+1 := k, increment k and i by 1 and go to Step 2. The increasing sequence (j i ) i, enumerates all indices of accepted steps. Moreover, u k = u ji j i 1 < k j i, i 1. (6.12) Conversely, if k j i for all i, then s k was rejected. In the following we denote the set of all these successful indices j i by S: S def = {j i : i } = {k : trial step s k is accepted}. Sometimes, accepted steps will also be called successful. We will repeatedly use the fact that {u k : k } = {u k : k S}. The trust-region updates are implemented as usual. We deal with two different flavors of update rules simultaneously by introducing a nonnegative parameter min. We require that after successful steps k+1 min holds. If min = is chosen, this is automatically satisfied. For min >, however, it is an additional feature that allows for special proof techniques.
111 Algorithm 6.1 (Update of the Trust-Region Radius). 6.1 The Trust-Region Algorithm 17 min and η 1 (, 1) are the constants defined in Step 1 of Algorithm 6.9. Let η 1 < η 2 < 1, and γ < γ 1 < 1 < γ 2 be fixed. 1. If ρ k η 1, then choose k+1 (γ k, γ 1 k ]. 2. If ρ k (η 1, η 2 ), then choose k+1 [γ 1 k, max{ min, k }] [ min, ). 3. If ρ k η 2, then choose k+1 ( k, max{ min, γ 2 k }] [ min, ). We still have to describe how the reduction ratios ρ k (s) are defined. Here is a detailed description: Algorithm 6.11 (Computation of Relaxed Reduction Ratio). 1. Choose scalars λ kr λ, r =,..., m k 1, m k 1 r= λ kr = Compute the relaxed actual reduction rared k := rared k (s k ), where rared k (s) def = max { f(u k ), m k 1 r= λ kr f(u ji r ) 3. Compute the reduction ratio ρ k := ρ k (s k ) according to ρ k (s) def = rared k(s) pred k (s). } f(u k + s). (6.13) Remark At the very beginning of Algorithm 6.9, Step 4 invokes Algorithm 6.11 with m k =. In this case the sum in (6.13) is empty and thus rared k (s) = max {f(u k ), } f(u k + s) = f(u k ) f(u k + s) = ared k (s). The idea behind the above update rule is the following: Instead of requiring that f(u k +s k ) be smaller than f(u k ), it is only required that f(u k +s k ) is either less than f(u k ) or less than the weighted mean of the function values at the last m k = min{i+ 1, m} successful iterates. Of course, if m = 1, then rared k (s) = ared k (s) and the usual reduction ratio is recovered. Our approach is a slightly stronger requirement than the straightforward idea to replace ared k with rared k (s) = max f(u ji r ) f(u k + s). r<m k
112 18 6. Trust-Region Globalization Unfortunately, for this latter choice it does not seem to be possible to establish all the global convergence results that are available for the monotone case. For our approach, however, this is possible without making the theory substantially more difficult. Moreover, we can approximate rared k arbitrarily accurately by rared k if we choose λ sufficiently small, in each iteration select r k < m k satisfying f(u ji rk ) = max f(u ji r ), and set r<m k λ kr = λ if r r k, λ krk = 1 (m k 1)λ. (6.14) 6.2 Global Convergence For the global convergence analysis we rely on the following Assumption (i) The objective function f is continuously differentiable on an open neighborhood of the nonempty closed convex set K. (ii) The function f is bounded below on K. (iii) The norms of the model Hessians are uniformly bounded: B k U,U C B for all k. Throughout this section, Assumption 6.13 is required to hold. We first prove an important decrease property of the function values f(u k ). Lemma Let u k, s k, k, j i, etc., be generated by Algorithm 6.9. Then for all computed indices i 1 holds i 2 f(u ji ) < f(u ) η 1 λ pred jr (s jr ) η 1 pred ji 1 (s ji 1 ) < f(u ). (6.15) r= Proof. We will use the short notations ared k = ared k (s k ), rared k = rared k (s k ), and pred k = pred k (s k ). First, let us note that (6.11) implies pred k > whenever u k is not critical. Therefore, the second inequality holds. The proof of the first inequality is by induction. For i = 1 we have by (6.12) and using ρ j (s j ) > η 1 f(u j1 ) = f(u j +1) = f(u j ) ared j < f(u j ) η 1 pred j = f(u ) η 1 pred j. Now assume that (6.15) holds for 1,..., i. If rared ji = ared ji then, using (6.15) and λ 1, f(u ji+1 ) = f(u ji +1) = f(u ji ) ared ji = f(u ji ) rared ji i 2 < f(u ) η 1 λ pred jr η 1 pred ji 1 η 1 pred ji r= i 1 f(u ) η 1 λ pred jr η 1 pred ji. r=
113 6.2 Global Convergence 19 If rared ji ared ji then rared ji > ared ji, and with q = min{i, m 1} we obtain f(u ji+1 ) = f(u ji +1) = < q p= λ ji p q λ ji pf(u ji p ) rared ji p= ( i p 2 ) f(u ) η 1 λ pred jr η 1 pred ji p 1 η 1 pred ji. r= Using λ ji + + λ ji q = 1, λ ji p λ, and {,..., q} {,..., i q 2} {(p, r) : p q, r i p 2}, we can proceed f(u ji+1 ) < f(u ) η 1 λ η 1 λ i q 2 r= ( q p= λ ji p ) pred jr q pred ji p 1 η 1 pred ji p= i q 2 f(u ) η 1 λ pred jr η 1 λ r= i 1 = f(u ) η 1 λ pred jr η 1 pred ji. r= i 1 r=i q 1 pred jr η 1 pred ji Lemma Let u k, s k, k, etc., be generated by Algorithm 6.9. Then for arbitrary u K with χ(u) and < η < 1 there exist > and δ > such that ρ k η holds whenever u k u U δ and k are satisfied. Proof. Since χ(u), by continuity there exist δ > and ε > such that χ(u k ) ε for all k with u k u U δ. Now, for < ε and any k with u k u U δ and < k, we obtain from the decrease condition (6.11): pred k (s k ) = q k (s k ) β 2 χ(u k ) min { k, χ(u k )} β 2 ε k. In particular, by (6.1) s k U β 1 k β 1 β 2 ε pred k (s k). (6.16)
114 11 6. Trust-Region Globalization Further, with appropriate y k = u k + τ k s k, τ k [, 1], by the intermediate value theorem ared k (s k ) = f(u k ) f(u k + s k ) = (f (y k ), s k ) U = q k (s) + (g k f (y k ), s k ) U (s k, B k s k ) U pred k (s k ) ( g k f (y k ) U + 12 ) B ks k U s k U. Since f is continuous, there exists δ > such that f (u ) f (u) U (1 η) β 2ε 4β 1 for all u K with u u U < δ. Further, since B k U,U C B by Assumption 6.13 (iii), choosing sufficiently small yields 1 2 B ks k U (1 η) β 2ε. 2β 1 for all k with k. By reducing and δ, if necessary, such that δ + β 1 < δ we achieve, using (6.1), that for all k with u k u U δ and < k y k u U u k u U + τ k s k U δ + β 1 < δ, u k u U δ < δ. Hence, for all these indices k, g k f (y k ) U g k f (u) U + f (u) f (y k ) U (1 η) β 2ε 2β 1, and thus by (6.16) ( g k f (y k ) U + 12 B ks k U ) s k U (1 η) β 2ε β 1 s k U (1 η)pred k (s k ). This implies that for all these k there holds rared k (s k ) ared k (s k ) pred k (s k ) ( g k f (y k ) U + 12 ) B ks k U s k U ηpred k (s k ). The proof is complete. Lemma Algorithm 6.9 either terminates after finitely many steps with a critical point u k of (6.6) or generates an infinite sequence (s ji ) of accepted steps.
115 6.2 Global Convergence 111 Proof. Assume that Algorithm 6.9 neither terminates nor generates an infinite sequence (s ji ) of accepted steps. Then there exists a smallest index k such that all steps s k are rejected for k k. In particular, u k = u k, k k, and the sequence of trust-region radii k tends to zero as k, because k +j γ j 1 k. Since the algorithm does not terminate, we know that χ(u k ). But now Lemma 6.15 with u = u k yields that s k is accepted as soon as k becomes sufficiently small. This contradicts our assumption. Therefore, the assertion of the Lemma is true. Lemma Assume that Algorithm 6.9 generates infinitely many successful steps s ji and that there exists S S with Then lim inf S k χ(u k) =. k S k =. (6.17) Proof. Let the assumptions of the lemma hold and assume that the assertion is wrong. Then there exists ε > such that χ(u k ) ε for all k S S. From (6.17) follows that S is not finite. For all k S holds by (6.11) pred k (s k ) β 2 χ(u k ) min { k, χ(u k )} β 2 ε min { k, ε}. From this estimate, the fact that f is bounded below on K, see Assumption 6.13 (ii), and Lemma 6.14 we obtain for all j S, using λ 1 f(u ) f(u j ) > η 1 λ k S k<j η 1 λβ 2 ε k S k<j pred k (s k ) η 1 λ k S k<j min { k, ε} pred k (s k ) (as j ). This is a contradiction. Therefore, the assumption was wrong and the lemma is proved. We now have everything at hand that we need to establish our first global convergence result. It is applicable in the case γ >, min > and says that accumulation points are critical points of (6.6). Theorem Let γ > and min >. Assume that Algorithm 6.9 does not terminate after finitely many steps with a critical point u k of (6.6). Then the algorithm generates infinitely many accepted steps (s ji ). Moreover, every accumulation point of (u k ) is a critical point of (6.6).
116 Trust-Region Globalization Proof. Suppose that Algorithm 6.9 does not terminate after a finite number of steps. Then according to Lemma 6.16 infinitely many successful steps (s ji ) are generated. Assume that ū is an accumulation point of (u k ) that is not a critical point of (6.6). Since χ(ū), invoking Lemma 6.15 with u = ū yields > and δ > such that k S holds for all k with u k ū δ and k. Since ū is an accumulation point, there exists an infinite increasing sequence j i S, i, of indices such u j i ū δ and u j i ū. If (j i 1) S, then j i min. Otherwise, s j i 1 was rejected, which, since then u j i 1 = u j i, is only possible if j i 1 >, and therefore j i γ j i 1 > γ. We conclude that for all i holds j i min{ min, γ }. Now Lemma 6.17 is applicable with S = {j i : i } and yields χ(ū) = lim χ(u j ) = lim inf χ(u i i j ) =, i i where we have used the continuity of χ. This is a contradiction. Therefore, the assumption χ(ū) was wrong. Next, we prove a result that holds also for min =. Moreover, the existence of accumulation points is not required. Theorem Let γ > or min = hold. Assume that Algorithm 6.9 does not terminate after finitely many steps with a critical point u k of (6.6). Then the algorithm generates infinitely many accepted steps (s ji ). Moreover, lim inf k χ(u k) =. (6.18) In particular, if u k converges to ū, then ū is a critical point of (6.6). Proof. By Lemma 6.16, infinitely many successful steps (s ji ) are generated. Now assume that (6.18) is wrong, i.e., lim inf k χ(u k) >. (6.19) Then we obtain from Lemma 6.17 that k <. (6.2) k S In particular, (u ji ) is a Cauchy sequence by (6.1) and (6.12). Therefore, (u k ) converges to some limit ū, at which according to (6.19) and the continuity of χ holds χ(ū). Case 1: min >. Then by assumption also γ >, and Theorem 6.18 yields χ(ū) =, which is a contradiction. Case 2: min =. Lemma 6.15 with u = ū and η = η 2 yields > and δ > such that k S and k+1 k holds for all k with u k ū δ and k. Since u k ū,
117 6.2 Global Convergence 113 there exists k with u k ū δ for all k k. Case 2.1: There exists k k with k for all k k. Then k S and (inductively) k k for all k k. This contradicts (6.2). Case 2.2: For infinitely many k holds k >. By (6.2) there exists k k with ji for all j i k. Now, for each j i k, there exists an index k i > j i such that k, j i k < k i, and ki >. If k i S, set j i = k i, thus obtaining j i S with j i >. If k i / S, we have j i def = k i 1 j i k, and thus j i S, since by construction j i. Moreover, < ki γ 2 j i (here min = is used) implies that j i > /γ 2. By this construction, we obtain an infinitely increasing sequence (j i ) S with j i > /γ 2. Again, this yields a contradiction to (6.2). Therefore, in all cases we obtain a contradiction. Thus, the assumption was wrong and the proof of (6.18) is complete. Finally, if u k ū, the continuity of χ and (6.18) imply χ(ū) =. Therefore, ū is a critical point of (6.6). The next result shows that under appropriate assumptions the lim inf in (6.18) can be replaced by lim. Theorem 6.2. Let γ > or min = hold. Assume that Algorithm 6.9 does not terminate after finitely many steps with a critical point u k of (6.6). Then the algorithm generates infinitely many accepted steps (s ji ). Moreover, if there exists a set O that contains (u k ) and on which χ is uniformly continuous, then lim χ(u k) =. (6.21) k Proof. In view of Theorem 6.19 we only have to prove (6.21). Thus, let us assume that (6.21) is not true. Then there exists ε > such that χ(u k ) 2ε for infinitely many k S. Since (6.18) holds, we thus can find increasing sequences (j i ) i and (k i ) i with j i < k i < j i+1 and χ(u j i ) 2ε, χ(u k ) > ε k S with j i < k < k i, χ(u k i ) ε. Setting S = i= S i with S i = {k S : j i k < k i }, we have Therefore, with Lemma 6.17 lim inf χ(u k) ε. S k k <. k S In particular, k S i k as i, and thus, using (6.1) and (6.12), u k i u j i U s k U β 1 k (as i ). k S i k S i This is a contradiction to the uniform continuity of χ, since lim (u k u i i j i ) =, but χ(u k i ) χ(u j i ) ε i. Therefore, the assumption was wrong and the assertion is proved.
118 Trust-Region Globalization 6.3 Implementable Decrease Conditions Algorithm 6.9 requires the computation of trial steps that satisfy the conditions (6.1) and (6.11). We now describe how these condition can be implemented by means of a generalized Cauchy point which is based on the projected gradient path. As criticality measure we can use any criticality measure χ that is majorized by the projected gradient in the following sense: θχ(u) χ P (u) def = u P K (u f (u)) U (6.22) with a fixed parameter θ >. For u k K and t, we introduce the projected gradient path π k (t) = P K (u k tg k ) u k. and define the generalized Cauchy point s c k as follows: s c k = π k(σ k ), with σ k {1, 2 1, 2 2,...} chosen maximal such that q k (π k (σ k )) γ(g k, π k (σ k )) U, (6.23) π k (σ k ) U k, (6.24) where γ (, 1) is a fixed parameter. Our aim is to show that the following condition ensures that (6.11) is satisfied with a constant β 2 independent of u k. Fraction of Cauchy Decrease Condition: pred k (s k ) β 3 pred k (s c k ), (6.25) where β 3 (, 1] is fixed. We first establish several useful properties of the projected gradient path. Lemma Let u k K. Then for all t (, 1] and all s 1 holds π k (t) U π k (st) U s π(t) U, (6.26) (g k, π k (t)) U 1 t π k(t) 2 U χp (u k ) π k (t) U tχ P (u k ) 2, (6.27) Proof. The first inequality in (6.26) is well known, see, e.g., [136, Lem. 2]. The second inequality is proved in [27]. For (6.27), we use that (P K (v) v, u P K (v)) U u K, v U, (6.28) since w = P K (v) minimizes w v 2 U on K. We set v k(t) = u k tg k and derive (tg k, π k (t)) U = (π k (t) + [(v k (t) P K (v k (t))], π k (t)) U = π k (t) 2 + (v k (t) P K (v k (t)), P K (v k (t)) u k ) π k (t) 2, where we have used (6.28) in the last step. From χ P (u k ) = π k (1) and (6.26) follow the remaining assertions.
119 6.3 Implementable Decrease Conditions 115 This allows us to prove the well-definedness of the generalized Cauchy point. Lemma For all u k K, the condition (6.23) is satisfied whenever { } < σ k σ 2(1 γ) def = min 1,. C B Furthermore, the condition (6.24) holds for all σ k (, 1] with σ k g k U k. Proof. For all < t σ holds by Assumption 6.13 (iii) and (6.27) q k (π k (t)) = (g k, π k (t)) U (π k(t), B k π k (t)) U (g k, π k (t)) U + C B 2 π k(t) 2 U (1 C B t/2)(g k, π k (t)) γ(g k, π k (t)) U. Furthermore, (6.24) is met by all σ k (, 1] satisfying σ k g k U k, since holds for all t [, 1], see (6.27). π k (t) U t g k U Lemma Let s k satisfy the feasibility condition (6.1) and the fraction of Cauchy decrease condition (6.25). Then s k satisfies the reduction condition (6.11) for any criticality measure χ verifying (6.22) and any < β 2 1 { } 2 β 3γθ 2 2(1 γ) min 1,. C B Proof. 1. If σ k = 1, then by (6.23) and (6.27) pred k (s c k ) = q k(π k (σ k )) γ(g k, π k (1)) U γχ P (u k ) If σ k < 1, then for τ k = 2σ k either holds π k (τ k ) U > k or q k (π k (τ k )) > γ(g k, π k (τ k )) U. In the second case we must have τ k > σ by Lemma 6.22, and thus, using (6.26), Therefore, in both cases, π k (τ k ) U τ k π k (1) U σ χ P (u k ). π k (σ k ) U = π k (τ k /2) U 1 2 π k(τ k ) U 1 2 min{σ χ P (u k ), k }. Now, we obtain from (6.23) and (6.27) pred k (s c k ) = q k(π k (σ k )) γ(g k, π k (σ k )) U γχ P (u k ) π k (σ k ) U γ 2 χp (u k ) min{σ χ P (u k ), k }. As shown in 1, this also holds for the case σ k = 1. The proof is completed by using (6.22) and (6.25).
120 Trust-Region Globalization Remark Obviously, the generalized Cauchy point s c k satisfies (6.1) and (6.25). Since s c k is computed by an Armijo-type projected line search, we thus have an easily implementable way of computing an admissible trial step by choosing s k = s c k. 6.4 Transition to Fast Local Convergence We now return to the problem of solving the semismooth operator equation Φ(u) =. We assume that any ū U with Φ(ū) = is a critical point of the minimization problem (6.6). Especially the smoothing step makes it theoretically difficult to prove that close to a regular solution projected semismooth Newton steps satisfy the reduction condition (6.11) (or (6.25)). In order to prevent our discussion from becoming too technical, we avoid the consideration of smoothing steps by assuming that Φ : U U is Φ-semismooth. In the framework of MCPs this is, e.g., satisfied for U = L 2 (Ω) and Φ(u) = u P B (u λ 1 F (u)) if F has the form F (u) = λu+g(u) and G : L 2 (Ω) L p (Ω) is locally Lipschitz continuous, see section 4.2. Therefore, the assumptions of this section are: Assumption In addition to Assumption 6.13, let the following hold: (i) The operator Φ : U U is continuous with generalized differential Φ. (ii) The criticality measure χ satisfies v k K, lim Φ(v k) U = = lim χ(v k) =. k k Remark Assumption (ii) implies that any u U with Φ(u) = is a critical point of (6.6). In order to cover the different variants (6.2) (6.4) of minimization problems that can be used for globalization of (1.1), we propose the following hybrid method: Algorithm 6.27 (Trust-Region Projected Newton Algorithm). 1. Initialization: Choose η 1 (, 1), min, ν (, 1), and a criticality measure χ. Choose u K, > min, and a model Hessian B L(U, U). Choose an integer m 1 and fix λ (, 1/m] for the computation of ρ k. Compute ζ 1 := Φ(u ) U and set l 1 := 1, r := 1, k :=, i := 1, and i n := Compute χ k := χ(u k ). If χ k =, then STOP. 3. Compute a model Hessian B k L(U, U) and a differential M k Φ(u k ). 4. Try to compute s n,1 k U by solving M k s n,1 k = Φ(u k ). If this fails, then go to Step 11. Otherwise, set s n,2 k := P K (u k + s n,1 k ) u k.
121 5. Compute s n k := min {1, k s n,2 k U 6.4 Transition to Fast Local Convergence 117 } s n,2 k and ζ k := Φ(u k + s n k ) U. 6. If ζ k νζ lr, then set s k := s n k. Otherwise, go to Step If s k fails to satisfy (6.11), then go to Step Call Algorithm 6.11 with m k = min{i i n, m} to compute ρ k := ρ k (s k ). If ρ k η 1 then go to Step 9. Otherwise, obtain a new trust-region radius k+1 by invoking Algorithm 6.1, set l r+1 := k, increment r by 1 and go to Step Set u k+1 := u k + s k, k+1 := max{ min, k }, j i+1 := k, l r+1 := k, and i n := i + 1. Increment k, r, and i by 1 and go to Step If s k = s n k satisfies (6.11), then set s k := s n k and go to Step Compute a trial step s k satisfying the conditions (6.1) and (6.11). 12. Compute the reduction ratio ρ k := ρ k (s k ) by calling Algorithm 6.11 with m k = min{i i n, m}. 13. Compute the new trust-region radius k+1 by invoking Algorithm If ρ k η 1 then reject the step s k : Set u k+1 := u k, B k+1 := B k, and M k+1 := M k. If the computation of s n,2 k was successful, then set s n,2 k+1 := sn,2 k, increment k by 1, and go to Step 5. Otherwise, increment k by 1 and go to Step Accept the step: Set u k+1 := u k + s k and j i+1 := k. Increment k and i by 1 and go to Step 2. In each iteration, a semismooth Newton step s n,1 k for the equation Φ(u) = is computed. This step is projected onto K and scaled to lie in the trust-region; the resulting step is s n k. In Step 6 a test is performed to decide if sn k can be accepted right away or not. If the outcome is positive, the step s n k is accepted in any case (either in step 9 or, via step 8, in step 15, see below), the index k is stored in l r+1, and r is incremented. Therefore, the sequence l < l 2 < lists all iterations at which the test in Step 6 was successful and, thus, the semismooth Newton step was accepted. The resulting residual ζ lr = Φ(u lr + s n l r ) U is stored in ζ lr, and ζ l 1 holds the initial residual Φ(u ) U. The test in Step 6 ensures that ζ lr νζ lr 1 ν r+1 ζ l 1 = ν r+1 Φ(u ) U. After a positive outcome of the test in Step 6, it is first checked if the step s k = s n k also passes the ordinary (relaxed) reduction-ratio-based acceptance test. This is done to embed the new acceptance criterion as smoothly as possible in the trustregion framework. If s k = s n k satisfies the reduction-ratio-based test, then s k is treated as any other step that is accepted by the trust-region mechanism. If it does not, the step is nevertheless accepted (in Step 9), but now i n is set to i + 1, which has the consequence that in the next iteration we have m k =, which results in a restart of the rared-nonmonotonicity mechanism. If the test ζ k νζ lr in Step 6 fails,
122 Trust-Region Globalization then s n k is chosen as ordinary trial step if it satisfies the condition (6.11); note that (6.1) is satisfied automatically. Otherwise, a different trial step is computed. The global convergence result of Theorem 6.19 can now easily be generalized to Algorithm Theorem Let the Assumption 6.25 hold and let γ > or min =. Assume that Algorithm 6.27 does not terminate after finitely many steps with a critical point u k of (6.6). Then the algorithm generates infinitely many accepted steps (s ji ). Moreover, lim inf k χ(u k) =. In particular, if u k converges to ū, then ū is a critical point of (6.6). Proof. The well-definedness of Algorithm 6.27 follows immediately from the welldefinedness of Algorithm 6.9, which was established in Lemma Therefore, if Algorithm 6.27 does not terminate finitely, the sequences (s ji ) of accepted steps is infinite. If r remains bounded during the algorithm, i.e., if only finitely many steps s n k pass the test in Step 6, then Algorithm 6.27 eventually turns into Algorithm 6.9. In fact, if Step 9 is never entered, then all accepted steps pass the reductionratio-based test and thus Algorithm 6.27 behaves like Algorithm 6.9 from the very beginning. Otherwise, let k = j i be the last iteration at which Step 9 is entered. Then k +1 min and i n = i + 1 for all k > k. In particular, m k = for all j i < k j i +1. Thus, Algorithm 6.27 behaves like an instance of Algorithm 6.9 started at u = u k +1 with = k +1. Hence, the assertion follows from Theorem If, on the other hand, r during the algorithm, then we have inductively Φ(u lr +1) U = ζ lr νζ lr 1 ν r+1 Φ(u ) U as r. By Assumption 6.25 (ii) this implies χ(u lr +1). Since χ is continuous, we see that u k ū implies that ū is a critical point of (6.6). Remark Various generalizations can be incorporated. For instance, it is possible not to reset m k to zero after acceptance of s n k in Step 9. Hereby, we would have to generalize Lemma 6.14 along the lines of [142]. Further, we could allow for nonmonotonicity of the residuals ζ lr in a similar way as for the function values f(u ji ). We now come to the proof of transition to fast local convergence. Theorem 6.3. Let the Assumption 6.25 hold and let min >. Assume that Algorithm 6.27 generates an infinite sequence (u k ) of iterates that converges to a point ū U with Φ(ū) =. Let Φ be Φ-semismooth at ū and Lipschitz continuous near ū. Further, assume that M k is invertible with M 1 k U,U C M 1 whenever u k is sufficiently close to ū. Then (u k ) converges q-superlinearly to ū. If Φ is even α-order semismooth at ū, < α 1, then the q-rate of convergence is at least 1 + α.
123 6.4 Transition to Fast Local Convergence 119 Proof. Using the assumptions, the abstract local convergence result of Theorem 3.19 for projected semismooth Newton methods is applicable with S k (u) = u and yields Therefore, u k + s n,2 k ū U = o( u k ū U ) (as u k ū). (6.29) s n,2 k U u k ū U + u k + s n,2 k ū U 3 2 u k ū U, (6.3) s n,2 k U u k ū U u k + s n,2 k ū U 1 2 u k ū U (6.31) for all u k in a neighborhood of ū, and thus 1 2 u k ū U s n,2 k U s n,1 k We conclude that for u k near ū holds U = M 1 k Φ(u k) U C M 1 Φ(u k ) U. Φ(u k + s n,2 k ) U L u k + s n,2 k ū U = o( u k ū U ) = o( Φ(u k ) U ), (6.32) where L is the Lipschitz constant of Φ near ū. Since u k ū, we see from (6.3) and (6.32) that there exists K with s n,2 k U min, Φ(u k + s n,2 k ) U ν Φ(u k ) U k K. The mechanism of updating k implies k min whenever k 1 S. Hence, for all k K with k 1 S we have s n k = sn,2 k and thus ζ k ν Φ(u k ) U. Now assume that none of the steps s n k, k K, passes the test in Step 6. Then r and thus ζ lr > remain unchanged for all k K. But since Φ(u k ) as k, there exists k K with k 1 S and Φ(u k ) U ζ lr. Thus s n k would satisfy the test in Step 6, which is a contradiction. Hence, there exists k K for which s n k satisfies the test in Step 6 and thus is accepted. Then, in iteration k = k + 1, we have k min, s n k = sn,2 k, and ζ k ν Φ(u k ) U = νζ k, so that s n k again passes the test in Step 6 and therefore is accepted. Inductively, all steps s n k = sn,2 k, k k, are accepted. The superlinear convergence now follows from (6.29). If Φ is α-order semismooth, then (6.29) holds with o( u k ū U ) replaced by O( u k ū 1+α U ) and the rate of convergence is thus at least 1 + α. The reason why we require convergence u k ū instead of considering an accumulation point ū is that, although we can show that ζ k = o( Φ(u k ) U ) for k 1 S and u k close to ū, it could be that ζ lr is so small that nevertheless ζ k > νζ lr. However, depending on the choice of the objective function f, it often is easy to establish that there exists a constant C Φ > with Φ(u k ) U C Φ Φ(u lr ) U for all iterations k and corresponding r. (6.33) This holds, e.g., for f(u) = Φ(u) 2 U /2 if the amount of nonmonotonicity of f(u l r ) is slightly restricted. If (6.33) holds, we can prove the following more general result:
124 12 6. Trust-Region Globalization Theorem Let the Assumption 6.25 hold and let min >. Assume that Algorithm 6.27 generates an infinite sequence (u k ) of iterates that has an accumulation point ū U with Φ(ū) =. Let Φ be Φ-semismooth at ū and Lipschitz continuous near ū. Further, assume that M k is invertible with M 1 k U,U C M 1 whenever u k is sufficiently close to ū. Finally, assume that (6.33) holds. Then (u k ) converges q-superlinearly to ū. If Φ is even α-order semismooth at ū, < α 1, then the q-rate of convergence is at least 1 + α. Proof. As in the proof of Theorem 6.3 we can show that (6.29) holds. We then can proceed similar as above to show that there exists δ > such that for all k with k 1 S and u k ū + δb U holds s n k = sn,2 k, u k + s n k ū + δb U, ζ k = Φ(u k + s n k) U ν Φ(u k ) U ν Φ(u lr ) U = νζ lr, C Φ where we have used (6.33). Let k be any of those k. Then the step s n k satisfies the test in Step 6 and hence is accepted. Furthermore, k = k +1 again satisfies k 1 S and u k ū + δb U, so that also s n k is accepted. Inductively, sn k is accepted for all k k. Superlinear convergence to ū and convergence with rate 1+α now follow as in the proof of Theorem 6.3.
125 7. Applications 7.1 Distributed Control of a Nonlinear Elliptic Equation Let Ω R n be a nonempty and bounded open domain with sufficiently smooth boundary and consider the nonlinear control problem 1 minimize y d (x)) y H 1(Ω),u L2 (Ω) 2 Ω(y(x) 2 dx + λ (u(x) u d (x)) 2 dx 2 Ω (7.1) subject to y + ϕ(y) = f + gu on Ω, β 1 u β 2 on Ω. We assume y d L 2 (Ω), u d L (Ω) (L q with q > 2 would also be possible) f L 2 (Ω), g L (Ω), β 1 < β 2 + ; λ > is the regularization parameter. Further, let ϕ : R R be nondecreasing and twice continuously differentiable with ϕ (τ) c 1 + c 2 τ s 3, (7.2) where c 1, c 2 are constants and s > 3 is fixed with s (3, ] for n = 1, s (3, ) for n = 2, and s (3, 2n/(n 2)] for n = 3, 4, 5. We set U = L 2 (Ω), Y = H 1 (Ω), W = H 1 (Ω), W = H 1 (Ω), C = [β 1, β 2 ], C = {u U : u(x) C on Ω}, and define J(y, u) = 1 y d (x)) 2 Ω(y(x) 2 dx + λ (u(x) u d (x)) 2 dx, 2 Ω (7.3) E(y, u) = y + ϕ(y) f gu. (7.4) Then we can write (7.1) in the form minimize y Y,u U J(y, u) subject to E(y, u) =, u C. (7.5) We now begin with our investigation of the control problem. Lemma 7.1. The operator E : Y U W defined in (7.4) is twice continuously differentiable with derivatives
126 Applications E y (y, u) = + ϕ (y)i, E u (y, u) = gi, E yu (y, u) =, E uy (y, u) =, E uu (y, u) = E yy (y, u)(v 1, v 2 ) = ϕ (y)v 1 v 2. Proof. By Proposition A.11 and (7.2), the superposition operator u L s (Ω) ϕ(u) L s (Ω), is twice continuously differentiable, since 1 s + 1 s = 1, s 2s The choice of s implies the embeddings s = s s 2 = s 3. H 1 (Ω) L s (Ω), L s (Ω) H 1 (Ω). Therefore, the operator y H 1(Ω) ϕ(y) H 1 (Ω) is twice continuously differentiable, too, and thus also E. The form of the derivatives is obvious, see Propositions A.1 and A.11. Lemma 7.2. For every u U, the state equation E(y, u) = possesses a unique solution y = y(u) Y. Proof. Integrating (7.2) twice, we see that there exists constants C i, C i with ϕ (τ) C 1 + C 2 τ s 2, ϕ(τ) C 1 + C 2 τ s 1. (7.6) Therefore, by Proposition A.9, y L t (Ω) ϕ(y) L t s 1 (Ω) is continuous for all s 1 < t <, (7.7) y L t (Ω) ϕ (y) L t s 2 (Ω) is continuous for all s 2 < t <. (7.8) Now, let θ(t) = t ϕ(τ)dτ. Then θ (t) = ϕ(t), and from (7.6) and Proposition A.11 follows that the mapping y L t θ(y) L t/s is twice continuously differentiable for all s t < with first derivative v ϕ(y)v and second derivative (v, w) ϕ (y)vw. Since H 1 Ls, this also holds for y H 1 θ(y) L1 (Ω). Now consider, for fixed u C, the function e : H 1 R, e(y) = 1 y(x) y(x)dx + θ(y(x))dx (f + gu, y) L 2. 2 Ω This function is twice continuously differentiable with Ω
127 7.1 Distributed Control of a Nonlinear Elliptic Equation 123 e (y) = y + ϕ(y) f gu = E(y, u), e (y)(v, v) = v, v H 1,H 1 + ϕ (y(x))v(x)v(x)dx v 2 H. 1 Ω Therefore, by standard existence and uniqueness results for strongly convex optimization problems, see, e.g., [147, Prop ], there exists a unique solution y = y(u) H 1 (Ω) of E(y, u) =. Thus, for all u, there exists a unique solution y = y(u) of the state equation. Next, we discuss the existence of solutions of the control problem for the cases n = 1, 2, 3. To simplify the presentation, we assume s (3, 4] in the case n = 3. Lemma 7.3. Let n = 1, 2, 3, and, assume s (3, 4] in the case n = 3. Then the control problem (7.5) admits a solution. Proof. By Lemma 7.2 there exists a (feasible) minimizing sequence (y k, u k ) for the control problem, which, due to the structure of J, is bounded in L 2 L 2. Note that in the case β 1, β 2 R the particular form of C even implies that u k L max{ β 1, β 2 }, but we do not need this here. From E(y k, u k ) = and (ϕ(y) ϕ())y we obtain y k 2 H y 1 k, y k H 1,H 1 + [ϕ(y k )(x) ϕ()]y k (x)dx = (f + gu k ϕ(), y k ) L 2 Ω ( f L 2 + g L u k L 2 + µ(ω) 1/2 ϕ() ) y k L 2 C ( f L 2 + g L u k L 2 + ϕ() ) y k H 1. This implies that (y k ) is bounded in H 1. Since H1 Lt for all 1 t if n = 1, 1 t < if n = 2, and all 1 t 2n/(n 2) = 6 if n = 3, we conclude from (7.8) that ϕ(y k ) is bounded in all spaces L t, 1 t if n = 1, 1 t < if n = 2, 1 t 6/(s 1) 2 if n = 3. Thus, y k is bounded in L 2, and therefore, using regularity results (we assume that the boundary of Ω is sufficiently nice ), y k is bounded in H 1 H2. Since H 1 H2 is compactly embedded in L and also in H 1, we can extract a subsequence with y k y strongly in H 1 and strongly in L, and, due to the boundedness of u k in L 2 and the weak sequential closedness of C, u k u C weakly in L 2. Hence, ϕ(y k ) ϕ(y ) strongly in L. Now f + gu k f + gu weakly in L 2, f + gu k = y k + ϕ(y k ) y + ϕ(y ) strongly in H 1 shows E(y, u ) =. Therefore, (y, u ) is feasible. Further, J is continuous and convex, and thus weakly lower semicontinuous. From the weak convergence (y k, u k ) (y, u ) we thus conclude that (y, u ) solves the problem.
128 Applications Black-Box Approach In Lemma 7.2 it was proved that the state equation admits a unique solution y(u). Therefore, we can introduce the reduced objective function j(u) = J(y(u), u) and consider the equivalent reduced problem minimize u U j(u) subject to u C. (7.9) From Lemma 7.1 we know that E is twice continuously differentiable. Our next aim is to apply the implicit function theorem to prove that y(u) is twice continuously differentiable. To this end we observe: Lemma 7.4. For all y Y and u U, the partial derivative is a homeomorphism with E y (y, u) = + ϕ (y)i L(Y, W ) = L(H 1, H 1 ) E y (y, u) 1 W,Y 1. Proof. Since ϕ is nondecreasing, we have ϕ and thus for all v H 1 E y (y, u)v, v H 1,H 1 = (v, v) H 1 + ϕ (y)v 2 dx v 2 H. 1 Ω Therefore, by the Lax Milgram theorem, E y (y, u) L(H 1, H 1 ) = L(Y, W ) is a homeomorphism with E y (y, u) 1 W,Y 1. Therefore, we can apply the implicit function theorem to obtain Lemma 7.5. The mapping u U y(u) Y is twice continuously differentiable. Since the objective function J is quadratic, we thus have Lemma 7.6. The reduced objective function j : U R is twice continuously differentiable. Finally, we establish the following structural result for the reduced gradient: Lemma 7.7. The reduced gradient j (u) has the form j (u) = λu + G(u), G(u) = gw(u) λu d, where w = w(u) solves the adjoint equation w + ϕ (y)w = y d y(u). (7.1) The mapping u U G(u) L p (Ω) is continuously differentiable, and thus locally Lipschitz continuous, for all p [2, ] if n = 1, p [2, ) if n = 2, and p [2, 2n/(n 2)] if n 3. As a consequence, the mapping u L p (Ω) j (u) L r (Ω) is continuously differentiable for all p [2, ] and all r [1, min{p, p }].
129 7.1 Distributed Control of a Nonlinear Elliptic Equation 125 Proof. Using the adjoint representation of j, we see that j (u) = J u (y(u), u) + E u (y(u), u) w(u) = λ(u u d ) gw(u), where w = w(u) solves the adjoint equation E y (y(u), u) w = J y (y(u), u), which has the form (7.1). Since E y (y(u), u) is a homeomorphism by Lemma 7.4, the adjoint state w(u) is unique. Further, since E y, y(u), and J y are continuously differentiable, we can use the implicit function theorem to prove that the mapping u U w(u) W is continuously differentiable, and thus, in particular, locally Lipschitz continuous. For p as given in the Lemma, the embedding W = H 1 Lp implies that the operator G(u) = gw(u) λu d is continuously differentiable, and thus locally Lipschitz continuous, as a mapping from U to L p. The last assertion of the Lemma follows immediately. Our aim is to apply our class of semismooth Newton methods to compute critical points of problem (7.9), i.e., to solve the VIP u C, (j (u), v u) L 2 = v C. (7.11) The solutions of (7.11) enjoy the following regularity property: Lemma 7.8. Every solution ū U of (7.11) satisfies ū L (Ω) if β 1, β 2 R, and ū L p (Ω) with p as in Lemma 7.7, otherwise. Proof. For β 1, β 2 R we have C L (Ω) and the assertion is obvious. For β 1 =, β 2 = + follows from (7.11) = j (ū) = λū + G(ū), and thus ū = λ 1 G(ū) L p (Ω) by Lemma 7.7. For β 1 >, β 2 = + we conclude in the same way 1 {ū β1 }j (ū) =, and thus 1 {ū β1 }ū = λ 1 1 {ū β1 }G(ū) L p (Ω). Furthermore, 1 {ū=β1 }ū = β 1 1 {ū=β1 } L (Ω). The case β 1 =, β 2 < + can be treated in the same way. With the results developed above we have everything at hand to prove the semismoothness of the superposition operator Π arising from equation reformulations Π(u) =, Π(u) def = π(u, j (u)) (7.12) of problem (7.11), where π is an MCP-function for the interval [β 1, β 2 ]. In the following, we distinguishing the two variants of reformulations that were discussed in section
130 Applications First Reformulation Here, we discuss reformulations based on a general MCP-function π = φ [β1,β 2 ] for the interval C = [β 1, β 2 ]. Theorem 7.9. The problem assumptions imply that Assumption 5.1 (a), (b) (with Z = {}) is satisfied with F = j for any p [2, ], any p p with p [2, ] if n = 1, p [2, ) if n = 2, and p [2, 2n/(n 2)] if n 3, and any r [1, p ]. In particular, if π satisfies the Assumption 5.1 (c), (d), then Theorem 5.11 yields the Π-semismoothness of the operator Π. Hereby, the differential Π(u) consists of all operators M L(L p, L r ), M = d 1 I + d 2 j (u), d L (Ω) 2, d π ( u, j (u) ) on Ω. (7.13) Proof. The assertions follow immediately from the boundedness of Ω, Lemma 7.7, and Theorem Concerning higher order semismoothness, we have: Theorem 7.1. Suppose that the operator y H 1 (Ω) ϕ(y) H 1 (Ω) is three times continuously differentiable. This can, e.g., be satisfied if ϕ has suitable properties. Then the Assumption 5.12 (a), (b) with Z = {} and α = 1 is satisfied by F = j for r = 2, any p (2, ], and all p p with p (2, ] if n = 1, p (2, ) if n = 2, and p (2, 2n/(n 2)] if n 3. In particular, if π satisfies the Assumption 5.12 (c), (d), then Theorem 5.13 yields the β-order Π-semismoothness of the operator Π(u) = π(u, j (u)), where β is given by Theorem The differential Π(u) consists of all operators M L(L p, L 2 ) of the form (7.13). Proof. If y H 1 ϕ(y) H 1 is three times continuously differentiable, then also E and thus, by the implicit function theorem, y(u) is three times continuously differentiable. Hence, j : L 2 L 2 is twice continuously differentiable and therefore its derivative is locally Lipschitz continuous. The same then holds true for u L p j (u) L r. The assertions now follow from the boundedness of Ω, Lemma 7.7, and Theorem Remark The Hessian operator j can be obtained via the adjoint representation in appendix A.1. In section it is described how finite element discretizations of j, j, j and Φ, etc., can be computed. Second Reformulation We now consider the case where Π(u) = u P [β1,β 2 ](u λ 1 j (u)) is chosen to reformulate the problem as equation Π(u) =.
131 7.1 Distributed Control of a Nonlinear Elliptic Equation 127 Theorem The problem assumptions imply that Assumption 5.14 (a), (b) (with Z = {}) is satisfied with F = j for r = 2 and any p (2, ] if n = 1, p (2, ) if n = 2, and p (2, 2n/(n 2)] if n 3. In particular, Theorem 5.15 yields the Π-semismoothness of the operator Π. Hereby, the differential Π(u) consists of all operators M L(L r, L r ), M = I + λ 1 d G u (u), d L (Ω), d P [β1,β 2 ]( λ 1 G(u) ) on Ω. (7.14) Proof. The assertions follow immediately from the boundedness of Ω, Lemma 7.7, and Theorem A result establishing higher order semismoothness analogous to Theorem 7.1 can also be established, but we do not formulate it here. Remark Since j (u) = λi + G u (u), the adjoint representation of appendix A.1 can be used to compute G u (u). Regularity For the application of semismooth Newton methods, a regularity condition like in Assumption 3.59 (i) has to hold. For the problem under consideration, we can establish regularity by using the sufficient condition of Theorem 4.8. Since this condition was established for NCPs (but can be extended to other situations), we consider the case of the NCP, i.e., β 1 =, β 2 =. To apply Theorem 4.8, we have to verify the conditions of Assumption 4.6. The assumptions (a) (d) follow immediately from Lemma 7.7 for p as in the Lemma and any p [p, ]. Note hereby that G (u) = j (u) λi is selfadjoint. Assumptions (e) requires that the Hessian operator j (ū) is coercive on the tangent space of the strongly active constraints, which is an infinite-dimensional analogue of the strong second order sufficient condition for optimality. The remaining assumptions (f) (h) only concern the NCP-function and are satisfied for φ = φ F B as well as φ(x) = x 1 P [, ) (x 1 λ 1 x 2 ), the NCP-function used in the second reformulation. Application of Semismooth Newton Methods In conclusion, we have shown that problem (7.1) satisfies all assumptions that are required to prove superlinear convergence of our class of (projected) semismooth Newton methods. Hereby, both types of reformulations are appropriate, the one of section and the semismooth reformulation of section 4.2, the latter yielding a smoothing-step-free method. Numerical results are given in section 7.2.
132 Applications All-at-Once Approach We now describe, in some less detail, how mixed semismooth Newton methods can be applied to solve the all-at-once KKT-system. The continuous invertibility of E y (y, u) = + ϕ (y)i L(H 1, H 1 ) guarantees that Robinson s regularity condition is satisfied, so that every solution (ȳ, ū) satisfies the KKT-conditions (5.24) (5.26), where w W = H 1 (Ω) is a multiplier. The Lagrange function L : Y U W R is given by L(y, u, w) = J(y, u) + E(y, u), w H 1,H 1 = J(y, u) + w, y H 1,H 1 + ϕ(y), w H 1,H 1 (f, w) L 2 (gu, w) L 2. Now, using the results of the previous sections, we obtain Lemma The Lagrange function L is twice continuously differentiable with derivatives L y (y, u, w) = J y (y, u) + E y (y, u) w = y y d w + ϕ (y)w, L u (y, u, w) = J u (y, u) + E u (y, u) w = λ(u u d ) gw, L w (y, u, w) = E(y, u), L yy (y, u, w) = (1 + ϕ (y)w)i, L yu (y, u, w) =, L uy (y, u, w) =, L uu (y, u, w) =. Since L w = E, we have L wy = E y, etc., see Lemma 7.1 for formulas. Furthermore, L u can be written in the form L u (y, u, w) = λu + G(y, u, w), G(y, u, w) = gw λu d. The mapping (y, u, w) Y U W G(y, u, w) L p (Ω) is continuous affine linear for all p [2, ] if n = 1, p [2, ) if n = 2, and p [2, 2n/(n 2)] if n 3. As a consequence, the mapping (y, u, w) Y L p (Ω) W L u (y, u, w) L r (Ω) is continuous affine linear for all p [2, ] and all r [1, min{p, p }]. Proof. The differentiability properties and the form of the derivatives is an immediate consequence of Lemma 7.1. The mapping properties of L u are due to the fact that the embedding H 1 is continuous. Lp For KKT-triples we have the following regularity result: Lemma Every KKT-triple (ȳ, ū, w) Y U W of (7.11) satisfies ū L (Ω) if β 1, β 2 R, and ū L p (Ω) with p as in Lemma 7.14, otherwise. Proof. The proof of Lemma 7.8 can be easily adjusted.
133 7.1 Distributed Control of a Nonlinear Elliptic Equation 129 From Lemma 7.14 we conclude that Assumption 5.17 (a) (c) is satisfied for r = 2, all p [2, ], and all p p as in the lemma. Hence, using an MCP-function π that satisfies Assumption 5.17 (d), we can write the KKT conditions in the form (5.27), and Theorem 5.19 yields the semismoothness of Σ. Furthermore, Lemma 7.14 implies that Assumption 5.27 is satisfied for p = p, and we thus can compute smoothing steps as described in Theorem Therefore, if the generalized differential is regular near the KKT-triple (ȳ, ū, w) Y L p (Ω) W, p = p, (cf. Lemma 7.15), the semismooth Newton methods of section are applicable and converge superlinearly. In a similar way, we can deal with the second mixed reformulation, which is based on Assumption Finite Element Discretization For the discretization of the state equation, we follow [62, Ch. IV.2.5], [63, App ]. Let Ω R 2 be a bounded polygonal domain and let T h be a regular triangulation of Ω: T h = {Ti h : Ti h is a triangle, i = 1,..., m h }. T h T T h = Ω, int T h h i int Tj h = for all i j. For all i j, Ti h Tj h is either a common edge or a common vertex or the empty set. The parameter h denotes the length of the longest edge of all triangles in the triangulation. Now, we define V h = {v h C ( Ω) : v h T affine linear for all T T h }, V h = {v h V h : v h Ω = }. Further, denote by Σ h the set of all vertices in the triangulation T h and by Σ h = {P Σh : P / Γ } the set of all interior vertices of T h. For any P Σ h there exists a unique function βh P V h with βh P (P ) = 1 and βp h (Q) = for all Q Σh, Q P. The set β h = {βp h : P Σh } is a basis of V h, and we can write any v h V h uniquely in the form v h = vp h βh P, with vh P = vh (P ). P Σ h The space H h L (Ω) is defined by H h = {u h L (Ω) : u h T constant for all T T }.
134 13 7. Applications Hereby, the specific values of u h on the edges of the triangles (which are null sets) are not relevant. The set of functions η h = {ηt h : T T h }, ηt h = 1 on T and ηt h =, otherwise, forms a basis of Hh, and for all u h H h holds u h = u h T ηh T, where uh T u h T. T T h For any P Σ h, let Ωh P be the polygon around P whose boundary connects midpoints of edges emanating from P with midpoints of triangles containing P and this edge. By χ h P, we denote the characteristic function of Ω P, being equal to one on ΩP h and vanishing on Ω \ Ω P. Finally, we introduce the linear operator L h : C ( Ω) H 1(Ω) L (Ω), L h v = v(p )χ h P. P Σ h Obviously, L h v is constant on int Ω P with value v(p ). We choose H h for the discrete control space and V h for the discrete state space. Now, we discretize the state equation as follows: (y h, v h ) H 1 + ϕ(l h y h )(L h v h )dx = (f + gu h, v h ) L 2 v h V h. (7.15) Ω It is easy to see that ϕ(l h y h )(L h βp h )dx = ϕ(yp h )(L h βp h, L h βp h ) L 2 Ω = µ(ω P )ϕ(y h P ) = 1 3 The objective function J is discretized by J h (y h, u h ) = 1 2 Ω(L h y h y d ) 2 dx + λ 2 µ(t )ϕ(yp h ). T P Ω (u h u d ) 2 dx. Remark For the first integral in J h we also could have used (y h y d ) 2 dx, Ω but in coordinate form this would result in a quadratic term of the form 1 2 yht ˆMh y h, with non-diagonal matrix ˆM h, ˆM h ij = (βh i, βh j ) L2, which would make the numerical computations more expensive. The discrete feasible set is C h = H h C. Thus, we can write down the fully discrete control problem:
135 7.1 Distributed Control of a Nonlinear Elliptic Equation minimize y h V h,uh H h 2 Ω(L h y h y d ) 2 dx + λ (u h u d ) 2 dx 2 Ω subject to (y h, v h ) H 1 + (ϕ(l h y h ), L h v h ) L 2 (7.16) = (f + gu h, v h ) L 2 v h V h u h C h. Next, we intend to write (7.16) in coordinate form. To this end, let Σ h = { P1 h,..., P h } n, β h h i = β h P, η h i h l = η h Tl h. Further, we write y h R nh for the coordinates of y h V h with respect to the basis β h = {β h i } and uh R mh for the coordinates of u h H h with respect to the basis η h = {η h l }. We define the matrices Ah, S h R nh n h, A h ij = (β h i, β h j ) H 1, S h ij = (L h β h i, L h β h j ) L 2, (7.17) (note that S h is diagonal and positive definite), the vectors f h, ϕ(y h ) R nh, and the matrix G h R nh m h, f h i = (β h i, f) L 2, ϕ(yh ) i = ϕ(y h i ), G h il = (βh i, gηh l ) L 2. Then (7.15) is equivalent to the nonlinear system of equations Further, in coordinates we can write J h as A h y h + S h ϕ(y h ) = f h + G h u h. (7.18) J h (y h, u h ) = 1 2 yht S h y h y h d T S h y h + λ 2 uht M h u h λu h d T M h u h + γ, where the mass matrix M h R mh m h, the vectors yd h, u h Rnh d, and the Rmh scalar γ are defined by M h kl = (ηk, h ηl h ) L 2, (yd) h 1 i = y d (x)dx, µ(ω Pi ) Ω Pi (M h u h d) l = (η h l, u d ) L 2, γ = 1 2 y d 2 L 2 + λ 2 u d 2 L 2. Finally, we note that u h C h if and only if its η h -coordinates u h satisfy u h C h, where C h = {u h R mh : u h l C, l = 1,..., m h }. Thus, we can write down the fully discrete control problem in coordinate form:
136 Applications J h (y h, u h ) minimize y h R nh,u h R mh subject to A h y h + S h ϕ(y h ) = f h + G h u h, u h C h. (7.19) It is advisable to consider problem (7.19) only in conjunction with the coordinatefree version (7.16), since (7.16) still contains all the information on the underlying function spaces while problem (7.19) does not. To explain this in more detail, we give a very simple example (readers familiar with discretizations of control problems can skip the example): Example Let us consider the trivial problem minimize u L 2 (Ω) j(u) def = 1 2 u 2 L 2. Since j (u) = u, from any point u L 2 a gradient step with stepsize 1 brings us to the solution u. Of course, for a proper discretization of this problem, we expect a similar behavior. Discretizing U = L 2 (Ω) by H h as above, and j by j h (u h ) = j(u h ) = u h 2 L 2 /2, we have j h (u h ) = u h and thus, after one gradient step with stepsize 1, we have found the solution. Consequently, if u h are the η h -coordinates of u h, then the η h -coordinates j h (u h ) of j h (u h ) = u h are j h (u h ) = u h, and the step j h (u h ) brings us from u h to the solution. However, the following approach yields a completely different result: In coordinate form, the discretized problem reads minimize u h R mh j h (u h ) with j h (u h ) = 1 2 uht M h u h. Differentiating j h (u h ) with respect to u h yields d du h jh (u h ) = M h u h = M h j h (u h ). Since M h = O(h 2 ), this Euclidean gradient is very short and a gradient step of stepsize one will provide almost no progress. Therefore, it is crucial to work with gradients that are represented with respect to the correct inner product, in our case the one induced by the matrix M h, which corresponds to the inner product of H h, the discretization of L Discrete Black-Box-Approach We proceed by discussing the black-box approach, applied to the discrete control problem (7.16). It is straightforward to derive analogues of Lemmas for the discrete control problem. In particular, the discrete state equation (7.15) possesses a unique solution operator u h H h y h (u h ) V h which is twice continuously differentiable. The reduced objective function is j h (u h ) = J h (y h (u h ), u h ) where
137 7.1 Distributed Control of a Nonlinear Elliptic Equation 133 y h = y h (u h ) solves (7.15), or, in coordinate form, j h (u h ) = J h (y h (u h ), u h ), where y h = y h (u h ) solves (7.18). The discrete adjoint equation is given by the variational equation v h V h : (v h, w h ) H 1 + (ϕ (L h y h )L h v h, L h w h ) L 2 = J h y h (y h, u h ), v h H 1,H 1. The coordinates w h R n h of the discrete adjoint state w h V h ( A h + T h (y h ) ) w h = S h (y h yd h ), are thus given by where T h (y h ) = S h diag ( ϕ (y h 1 ),..., ϕ (y h n h ) ). The discrete reduced gradient j h (u h ) H h satisfies (j h (u h ), z h ) L 2 = (J h u h (y h, u h ), z h ) L 2 + (w h, gz h ) L 2 = (λ(u h u d ) gw h, z h ) L 2. Now observe that ( k (Mh 1 G ht w h ) k η h k, l ηh l zh l ) = z ht G ht w h = (w h, gz h ) L 2 = (gw h, z h ) L 2. Hence, the η h -coordinates of j h (u h ) are L 2 = z ht M h M h 1 G ht w h j h (u h ) = λ(u h u h d ) Mh 1 G ht w h. As already illustrated in Example 7.17, the vector j h (u h ) is not the usual gradient of j h (u h ) with respect to u h, which corresponds to the gradient representation with respect to the Euclidean inner product. In fact, we have d du h jh (u h ) = λm h (u h u h d) G ht w h = M h j h (u h ). (7.2) Rather, j h (u h ) is the gradient representation with respect to the inner product of H h, which is represented by the matrix M h. Writing down the first-order necessary conditions for the discrete reduced problem (7.16), we obtain u h C h, (j h (u h ), v h u h ) L 2 v h C h. (7.21) In coordinate form, this becomes u h C h, j h (u h ) T M h (v h u h ) v h C h. (7.22)
138 Applications Since M h is diagonal positive definite, we can write (7.21) equivalently as u h l P C(u h l jh (u h ) l ) =, l = 1,..., m h. This is the discrete analogue of the condition u P C (u j (u)) =, which we used to express the continuous problem in the form Π(u) def = π(u, j (u)) =, (7.23) where π = φ [α,β] is a continuous MCP-function for the interval [α, β]. As in the function space context, we apply an MCP-function π = φ [α,β] to reformulate (7.22) equivalently in the form Π h (u h ) def = π ( u h 1, jh (u h ) 1 ). π ( u h m h, j h (u h ) m h =. (7.24) ) This is the discrete version of the equation reformulation (7.12). If π is semismooth then, due to the continuous differentiability of j h, also Π h is semismooth and finitedimensional semismooth Newton methods can be applied. We expect a close relationship between the resulting discrete semismooth Newton method and the semismooth Newton method for the original problem in function space. This relation is established in the following considerations: First, we have to identify the discrete correspondent to the generalized differential Π(u) in Theorem 7.9. Let B Π(u). Then there exists d (L ) 2 with d(x) π(u(x), j (u)(x)) on Ω such that B = d 1 I + d 2 j (u). Replacing u by u h and j by j h, a suitable discretization of B is obtained by B h = d h 1I + d h 2 j h (u h ), (7.25) d h i Hh, d h (x) π ( u h (x), j h (u h )(x) ), x Ω. (7.26) Since u h and j h (u h ) are elements of H h, they are constant on any triangle T l T h with values u h l and j h (u h ) l, respectively. Denoting by d h i the η h -coordinates of d h i Hh, the functions d h i are constant on any triangle T l with values d h il. Therefore, (7.26) is equivalent to (d h 1l, d h 2l) π ( u h l, j h (u h ) l ), 1 l m h. Let j h (u h ) R mh m h denote the matrix representation of j h (u h ) with respect to the H h -inner product. More precisely, j h (u h )z h are the η h -coordinates of j h (u h )z h ; thus, for all z h, z h H h and corresponding coordinate vectors z h, z h, we have
139 7.1 Distributed Control of a Nonlinear Elliptic Equation 135 (z h, j h (u h ) z h ) L 2 = z ht M h j h (u h ) z h. The matrix representation of B h with respect to the H h inner product is B h = D h 1 + D h 2j h (u h ), where D h i = diag(dh i ). In fact, (η h l, B h z h ) L 2 = (η h l, d h 1 z h ) L 2 + (η h l, d h 2 j h (u h )z h ) L 2 = (η h l, dh 1l zh ) L 2 + (η h l, dh 2l jh (u h )z h ) L 2 = ( M h (d h 1lz h ) ) l + ( M h (d h 2lj h (u h )z h ) ) l. Therefore, the matrix representation of the discrete correspondent to Π(u) is Π h (u h ), the set consisting of all matrices B h R mh m h with B h = D h 1 + Dh 2 jh (u h ), (7.27) where D h 1 and Dh 2 are diagonal matrices such that ( (D h 1 ) ll, (D h 2 ) ) ( ) ll π u h l, j h (u h ) l, l = 1,..., m h. Next, we show that there is a very close relationship between Π h and finitedimensional subdifferentials of the function Π h. To establish this relation, let us first note that the coordinate representation j h (u h ) of j h (u h ) satisfies j h (u h ) = d du h jh (u h ). In fact, we have for all z h, z h H h and corresponding coordinate vectors z h, z h z ht M h j h (u h ) z h = (z h, j h (u h ) z h ) L 2 = z ht d2 du h2 jh (u h ) z h = z ht d du h (Mh j h )(u h ) z h = z ht M h d du h jh (u h ) z h, where we have used (7.2). This shows that for the rows of Π h holds Π h l = π d ( u h ) l du h l j h (u h ) l in the sense of Proposition 3.7 and that, by Propositions 3.3 and 3.7, Π h l is Π h l - semismooth if π is semismooth. Therefore, Π h is Π h -semismooth by Proposition 3.5. If π is α-order semismooth and j h is differentiable with α-hölder continuous derivative, then the above reasoning yields that Π h is even α-order Π h - semismooth. Finally, there is also a close relationship between Π h and C Π h. In fact, by the chain rule for Clarke s generalized gradient we have
140 Applications C Π h (u h ) Π h (u h ). Under additional conditions (e.g., if π or π is regular), equality holds. If we do not have equality, working with the differential Π h has the advantage that π and the derivatives of its arguments can be computed independently of each other, whereas in general the calculation of C Π h (u h ) is more difficult. We collect the obtained results in the following theorem: Theorem The discretization of the equation reformulation (7.23) of (7.1) in coordinate form is given by (7.24). Further, the multifunction Π h, where Π h (u h ) consists of all B h R mh m h defined in (7.27), is the discrete analogue of the generalized differential Π. We have C Π h (u h ) Π h (u h ) with equality if, e.g., π or π is regular. If π is semismooth, then Π h is Π h -semismooth and also semismooth in the usual sense. Further, if π is α-order semismooth and if j h (and thus j h ) is twice continuously differentiable with α-hölder continuous second derivative, then Π h is α-order Π h -semismooth and also α-order semismooth in the usual sense. Having established the Π h -semismoothness of Π h, we can use any variant of the semismooth Newton methods in sections to solve the semismooth equation (7.24). We stress that in finite dimensions no smoothing step is required to obtain fast local convergence. However, since the finite-dimensional problem (7.24) is a discretization of the continuous problem (7.12), we should, if necessary, incorporate a discrete version of a smoothing step to ensure that the algorithm exhibits mesh independent behavior. The resulting instance of Algorithm 3.9 then becomes: Algorithm Inexact Semismooth Newton s Method. Choose an initial point u h Rm h and set k =. 1. Compute the discrete state y h k Rnh by solving the discrete state equation A h y h k + S h ϕ(y h k) = f h + G h u h k. 2. Compute the discrete adjoint state w h k Rnh by solving the discrete adjoint equation ( A h + T h (y h k) ) w h k = S h (y h y h d). 3. Compute the discrete reduced gradient j h k = λ(u h k u h d ) Mh 1 G ht w h and the vector Π h k Rnh, (Π h k ) l = π ( (u h k ) l, j h k l). 4. If (Π h k T M h Π h k )1/2 ε, then STOP with result u h = u h k.
141 7.1 Distributed Control of a Nonlinear Elliptic Equation Compute B h k Π h (u h k ) (details are given below). 6. Compute s h k Rmh by solving the semismooth Newton system (details are given below) B h ks h k = Π h k, and set u h, k+1 = uh k + sh k. 7. Perform a smoothing step (if necessary): u h, k+1 uh k Increment k by one and go to step 1. Remark 7.2. (a) We can allow for inexactness in the matrices B h k, which results in an instance of Algorithm In fact, as was shown in Theorem 3.15, besides the uniformly bounded invertibility of the matrices B h k we only need that inf (B B h B Π h (u h k ) k )sh k = o( sh k ) as s h k to achieve superlinear convergence. (b) We also can achieve that the iteration stays feasible with respect to a closed convex set K h which contains the solution of (7.24). This can be achieved by incorporating a projection onto K h in the algorithm after the smoothing step and results in an instance of Algorithm In the following, we only consider the projection-free algorithm and the projected version with projection onto C h, which is given by coordinatewise projection onto C. (c) The efficiency of the algorithm crucially depends on the efficient solvability of the Newton equation in step 6. We propose an efficient method in section (d) We observed in Lemma 7.7 that j (u) = λu + G(u), where u U G(u) = gw(u) λu d L p (Ω) is locally Lipschitz continuous with p > 2. We concluded that a smoothing step is given by the scaled projected gradient step u P C (u λ 1 j (u)) = P C (u d + λ 1 gw(u)). Therefore, a discrete version of the smoothing step is given by u h P C ( u h λ 1 j h (u h ) ) = P C ( u h d + λ 1 M h 1 G ht w h). (7.28) Due to the smoothing property of G we also can apply a smoothing-step-free semismooth Newton method by choosing for the reformulation, which results in π(x) = x 1 P C (x 1 λ 1 x 2 )
142 Applications Π(u) = u P C ( λ 1 G(u) ) = u P C ( ud + λ 1 gw(u) ). In the discrete algorithm, this corresponds to Π h (u h ) = u h P C ( u h λ 1 j h (u h ) ) = u h P C ( u h d + λ 1 M h 1 G ht w h). (7.29) In section 7.2, we present numerical results for both variants, the one with general MCP-function π and smoothing step (7.28), and the smoothing-step-free algorithm with Π h as defined in (7.29) Efficient Solution of the Newton System We recall that a matrix B h k m h Rmh is contained in Π h (u h k ) if and only if B h k = D h k1 + D h k2j h (u h k), where D h k1 and Dh k2 are diagonal matrices such that ( (D h k1 ) ll, (D h k2) ll ) π ( (u h k ) l, j h (u h k) l ). (7.3) Further, for the choices of functions π we are going to use, namely φ F C B and φe,σ C : x φ E C (x 1, σx 2 ), σ >, the computation of π, and thus of the matrices D h ki, is straightforward. Concerning the calculation of φ E,σ C, see Proposition 5.6; for the computation of φ F C B, we refer to [54]. In both cases, there exist constants c i > such that for all x R 2 and all d π(x) holds d 1, d 2 c 1, d 1 + d 2 c 2. In particular, the matrices D h ki are positive semidefinite with uniformly bounded norms, and D h k1 + Dh k2 is positive definite with uniformly bounded inverse. We observed earlier the relation d2 j h (u h ) = M h 1 du h2 jh (u h ). For the computation of the right hand side we use the adjoint representation of appendix A.1, applied to problem (7.19). The state equation for this problem is E h (y h, u h ) = with E h (y h, u h ) = A h y h + S h ϕ(y h ) f h G h u h, and the Lagrange function is given by Observe that L h (y h, u h ) = J h (u h ) + w ht E h (y h, u h ).
143 7.1 Distributed Control of a Nonlinear Elliptic Equation 139 d dy h Eh (y h, u h ) = A h + T h (y h ), d du h Eh (y h, u h ) = G h, d 2 L h d(y h, u h ) 2 (yh, u h, w h ) = Therefore, introducing the diagonal matrix ( S h + S h diag(ϕ (y h )) diag(w h ) λm h Z h (y h, w h ) = S h( I + diag(ϕ (y h )) diag(w h ) ), and omitting the arguments for brevity, we obtain by the adjoint formula (( ) 1 ) T (( d 2 du h2 jh (u h de h de h d 2 L h de h ) = dy h du h I d(y h, u h ) 2 I ) 1 de h dy h du h = G ht (A h + T h (y h )) 1 Z h (y h, w h )(A h + T h (y h )) 1 G h + λm h. The Hessian j h (u h ) with respect to the inner product of H h is thus given by j h (u h ) = M h 1 G ht (A h + T h (y h )) 1 Z h (y h, w h )(A h + T h (y h )) 1 G h + λi. Therefore, the matrices B h Π h (u h ) are given by B h = D h + D h 2 Mh 1 G ht (A h + T h (y h )) 1 Z h (y h, w h )(A h + T h (y h )) 1 G h, ) ). where D h 1 and Dh 2 satisfy (7.3) and D h def = D h 1 + λdh 2. Note that D h is diagonal, positive definite, and D h as well as D h 1 are bounded uniformly in u h. Since computing (A h + T h (y h )) 1 v h means solving the linearized state equation, it is not a priori clear that Newton s equation in step 6 of Algorithm 7.19 can be solved efficiently. It is also important to observe that the main difficulties are caused by the structure of the Hessian j h, not so much by the additional factors D h 1 and D h 2 appearing in B h. In other words, it is also not straightforward how the Newton system for the unconstrained reduced control problem can be solved efficiently. However, the matrix B h is a discretization of the operator (d 1 + λd 2 )I + d 2 g ( + ϕ I) 1 [(1 + ϕ w)i]( + ϕ I) 1 (gi). Hence, one possibility to solve the discretized semismooth Newton system efficiently is to use the compactness of the operator ( + ϕ I) 1 [(1 + ϕ w)i]( + ϕ I) 1 [gi]
144 14 7. Applications to apply multigrid methods of the second kind [72, Ch. 16]. These methods are suitable for solving problems of the form u = Ku + f, where K : U V U (compact embedding). The application of ( + ϕ I) 1 to a function, i.e., application of (A h + T h (y h )) 1 to a vector, can be done efficiently by using, once again, multigrid methods. We believe that this approach has computational potential. In our computations however, we use a different strategy that we describe now. To develop this approach, we consider the Newton system B h s h = Π h (u h ) (7.31) and derive an equivalent system of equations that, under certain assumptions, can be solved efficiently. Hereby, we use the relations that we observed in section between the semismooth Newton system of the reduced Newton system and semismooth Newton system obtained for the all-at-once approach. To this end, consider the system d 2 dy h 2 L h D h 2 Mh 1 d 2 d2 L h d2 L h dy h du h dy h dw h L h D h du h dy h 1 + Dh 2 Mh 1 d 2 du h 2 L h D h 2 Mh 1 d 2 L h Π h du h dw. h d 2 dw h dy h L h d2 L h dw h du h Using the particular form of L h, this becomes Performing the transformation yields the equivalent system Z h A h + T h D h D h 2M h 1 G ht Π h. A h + T h G h Row 1 Row 1 Z h (A h + T h ) 1 Row 3 d2 dw h 2 L h Z h (A h + T h ) 1 G h A h + T h D h D h 2 Mh 1 G ht Π h, (7.32) A h + T h G h and by the transformation we arrive at Row 2 Row 2 + (D h 2 Mh 1 G ht )(A h + T h ) 1 Row 1,
145 7.1 Distributed Control of a Nonlinear Elliptic Equation 141 Z h (A h + T h ) 1 G h A h + T h B h Π h. A h + T h G h This shows that B h appears as a Schur complement of (7.32). Hence, if we solve (7.32), we also have a solution of the Newton system (7.31). For deriving an efficient strategy for solving (7.32), we first observe that D h is diagonal and nonsingular. Further the diagonal matrix Z h is invertible if and only if ϕ (y h ) i w h i 1 l = 1,..., nh. (7.33) In particular, this holds true if ϕ (y h ) i wi h is small for all i. If, e.g., the state equation is linear, then ϕ. Further, if y h is sufficiently close to the data yd h, then the right hand side of the adjoint equation is small and thus w h is small. Both cases result in a positive definite diagonal matrix Z h. If (7.33) happens to be violated, we can perform a small perturbation of Z h (but sufficiently large to avoid numerical instabilities) to make it nonsingular. With D h and Z h being invertible, we transform (7.32) according to Row 3 Row 3 + (A h + T h )Z h 1 Row 1 G h D h 1 Row 2, and obtain where Z h A h + T h D h D h 2 Mh 1 G ht Π h, Q h G h D h 1 Π h Q h = G h D h 1 D h 2 Mh 1 G ht + (A h + T h )Z h 1 (A h + T h ). The matrix D h 1 D h 2M h 1 is diagonal and positive definite. Hence, Q h is symmetric positive definite if Z h is positive definite. Furthermore, Q h can be interpreted as the discretization of the differential operator d 2 g 2 ( I + ( + ϕ (y)i) d 1 + λd ϕ (y)w I ) ( + ϕ (y)i), which is elliptic if (1 + ϕ (y)w) is positive on Ω. Hence, fast solvers (multigrid, preconditioned conjugate gradient, etc.) can be used to solve the system Q h v h = G h D h 1 Π h. (7.34) Then, the solution s h of the Newton system (7.31) is obtained as s h = Π h + D h 1 D h 2M h 1 G ht v h.
146 Applications Discrete All-at-Once Approach The detailed considerations of the black-box approach can be carried out in a similar way for semismooth reformulations of the KKT-system of the discretized control problem. We think there is no need to discuss this in detail. In the discrete all-atonce approach, L h u = M h 1 (d/du h )L h plays the role of j h, and the resulting h system to solve has the structure Z h A h + T h (d/dy h )L h D h D h 2M h 1 G ht Π h, A h + T h G h (d/dw h )L h see section If a globalization is used, it is important to formulate the merit function by means of the correct norms: 1 2 [ dl h dy h ] T h 1 dlh A dy h ΠhT M h Π h [ dl h dy h and to represent gradients with respect to the correct inner products. ] T h 1 dlh A dw h, 7.2 Numerical Results We now present numerical results for problem (7.1). Hereby, the domain is the unit square Ω = (, 1) (, 1). For ϕ we choose ϕ(y) = y 3, which satisfies the growth condition with s = 4. The choice of the other data is oriented on [14, Ex ] (therein, however, the state equation is linear and corresponds to ϕ ): β 1 =, β 2 =, y d (x) = 1 6 sin(2πx 1) sin(2πx 2 )e 2x 1, (7.35) u d, λ = 1 3. Figure 7.1 shows the computed optimal control on T 1/32 and Figure 7.2 the corresponding state. The code was implemented in Matlab Version 6 Release 12, using sparse matrix computations. Although Matlab is quite efficient, it usually cannot compete with Fortran or C implementations, which should be kept in mind when evaluating the runtimes given below. The computations were performed under Solaris 8 on a Sun SPARC Ultra workstation with a sparcv9 processor operating at 36 MHz. We present results for 1. Reformulations of the black-box VIP (7.11), 2. Reformulations of the all-at-once KKT-system (5.24) (5.26), to which we apply two variants of the semismooth Newton method,
147 7.2 Numerical Results Figure 7.1 Optimal control ū (h = 1/32). 1. Algorithm 3.9 (no constraints), 2. Algorithm 3.17 with K = C, In both cases we consider the following choices of MCP-functions: 1. π(x) = x 1 P (,] (x 1 λ 1 x 2 ) (smoothing-step-free algorithm). 2. π(x) = φ F B ( x). We obtain eight (actually six, see below) variants of algorithms, which are denoted by A111 A222, where the three numbers express the choices for the three criteria given above. For instance, A221 stands for Algorithm 3.17, applied to the KKTsystem, with K = C and π(x) = x 1 P (,] (x 1 λ 1 x 2 ). Since in the class Axy2 we compute smoothing steps as described in section 4.1, and the smoothing step contains already a projection onto C, we have A112=A122, A212=A222. We will use the names A112 and A212 in the sequel Using Multigrid Techniques For the efficient solution of the discrete state equation (needed in the black-box approach), and the linearized state equation (needed in the all-at-once approach), we use a conjugate gradient method that is preconditioned by one multigrid (MG)
148 Applications Figure 7.2 Optimal state y(ū) (h = 1/32). V-cycle with one red-black Gauß-Seidel iteration as presmoother and one adjoint red-black Gauß-Seidel iteration as postsmoother. Standard references on multigrid methods include [23, 72, 73, 145]. Our semismooth Newton methods with MGpreconditioned conjugate gradient solver of the Newton systems belong to the class of Newton multilevel methods [44]. For other multigrid approaches to variational inequalities we refer to [21, 82, 83, 99, 1, 11]. For the solution of the semismooth Newton system we solve the Schur complement equation (7.34) by a multigrid-preconditioned conjugate gradient method as just described. The grid hierarchy is generated as follows: The coarsest triangulation T 1 is shown in Figure 7.3. Given T 2h, the next finer triangulation T h is obtained by replacing any triangle in T 2h with four triangles, introducing the edge midpoints of the coarse triangles as new vertices, see Figure 7.4, which displays T 1/2. Table 7.1 shows the resulting number of interior vertices and the number of triangles for each triangulation level. There is a second strategy to use the multilevel philosophy: We can perform a nested iteration over the discrete control problems on the grid hierarchy: We first (approximately) solve the discrete control problem on the coarsest level. We then interpolate this solution to obtain an initial point for the discrete con-
149 7.2 Numerical Results 145 Figure 7.3 Coarsest triangulation T 1. Figure 7.4 Second triangulation T 1/2. Number of Number of h interior vertices triangles 1/ / / / / Table 7.1 Degrees of Freedom for different mesh sizes. trol problem on the next finer level, which we again solve approximately, and so forth. As we will see, this approach is very efficient Black-Box Approach We now present numerical results for semismooth Newton methods applied to the first-order necessary conditions of the reduced problem (7.9). We thus consider the three algorithms A111, A121 and A112. The initial point is u 1. We do not use a globalization since (as it is often the case for control problems) the undamped semismooth Newton method converges without difficulties. We stress that if the nonmonotone trust-region method of section 6.4 is used, the globalization parameters can be chosen in such a way that the method essentially behaves like the pure Newton method. To be independent of the choice of the MCP-function, we work with the termination condition χ(u k ) = u k P C (u k j (u k )) L 2 ε, or, in terms of the discretized problem, [ ] u h k P C (u h k j h T [ ] k ) M h u h k P C (u h k j h k ) ε 2.
150 Applications h k u k ū L 2 u k ū L χ(u k ) e e e e e+ 1.22e e e e e e e e e+ 4.85e e e e e e e e e e e e+ 4.87e e e e e e e e e e e e+ 4.88e e e e e e e e 8 1.1e e e e e e e+ 4.88e e e e e e e e e e e e e 15 Table 7.2 Iteration history of algorithm A111. Except from this, the method we use agrees with Algorithm We work with ε = 1 8. Smaller values can be chosen as well, but it does not appear to be very reasonable to choose ε much smaller than the discretization error. The nonlinear state equation is solved by a Newton iteration, where, in each iteration, a linearized state equation has to be solved. For the computation of j we solve the adjoint equation. All PDE solves are done by a multigrid-cg method as described above. In our first set of tests we choose λ =.1 and consider problems on the triangulations T h for h = 2 k, k = 4, 5, 6, 7, 8. See Table 7.1 for the corresponding number of triangles and interior nodes, respectively. The results are collected in Tables Hereby, Table 7.2 contains the results for A111, Table 7.3 the results for A121, and Table 7.4 the results for A112. Listed are the iteration k, the L 2 -distance to the (discrete) solution ( u k ū L 2), the L - distance to the (discrete) solution ( u k ū L ), and the norm of the projected gradient (χ(u k )). For all three variants of the algorithm we observe mesh-independent convergence behavior, and superlinear rate of convergence of order >1. Only 3 4 iterations are needed until termination. Table 7.5 shows for all three algorithms the total number of iterations (Iter.), of state equation solves (State), of linearized state equation solves (Lin. State), and of adjoint equation solves (Adj. State), and the total solution time in seconds (Time). The total number of solves of the semismooth Newton system coincides with the number of iterations Iter. All solves of the linearized state equations are performed
151 h k u k ū L 2 u k ū L χ(u k ) e e e e e+ 1.22e e e e e e e e e+ 4.85e e e e e e e e e e e e+ 4.87e e e e e e e e e e e e+ 4.88e e e e e e e e e e e e+ 4.88e e e e e e e e e e 1 Table 7.3 Iteration history of algorithm A Numerical Results 147 within the Newton method for the solution of the state equation. For algorithms A111 and A121, a total of Iter+1 state solves and Iter+1 adjoint state solves are required. Algorithm A112 requires in addition one state solves and one adjoint state solve per iteration for the computation of the smoothing step. We see that usually two Newton iterations are sufficient to solve the nonlinear state equation. Observe that the total computing time increases approximately linearly with the degrees of freedom. This shows that we indeed achieve multigrid efficiency. We note that algorithms A111 and A121 are superior to A112 in computing time. The main reason for this is that A112 requires the extra state equation and adjoint equation solves for the smoothing step. In a second test we focus on the importance of the smoothing step. To this end, we have run the Algorithms A112 and A122 without smoothing steps (A112 is without projection whereas A122 contains a projection). The results are shown in Table 7.6. We see that A112 without smoothing steps needs an average of 7 iterations, whereas the regular Algorithm A112, see Table 7.5, needs only 4 iterations in average. This shows that the smoothing step has indeed benefits, but that the algorithm still exhibits reasonable efficiency if the smoothing step is removed. If we do not perform a smoothing step, but include a projection (A122 without smoothing step), the performance of the algorithm is not affected by omitting the smoothing step, at least for the problem under consideration. We recall that the role of the smoothing step is to avoid large discrepancies between u k ū L p and u k ū L r, i.e., to avoid large (peak-like) deviations of u k from ū on small sets, see Example It
152 Applications h k u k ū L 2 u k ū L χ(u k ) 1.623e e e e e+ 1.63e e e e e e e e e e e e+ 4.85e e e e e e e e e e e e e e e+ 4.87e e e e e e e e 3 7.4e e e e e e e+ 4.88e e e e e e e e e e e e e e e+ 4.88e e e e e e e e e e e e e 1 Table 7.4 Iteration history of algorithm A112. is intuitively clear that a projection step can help in cutting off such peaks (but there is no guarantee). In our next test we show that lack of strict complementarity does not affect the superlinear convergence of the algorithms. Denoting by j the reduced objective function for the data (7.35) and by ū the corresponding solution, we now choose u d = λ 1 j (ū). With these new data, the (new) gradient vanishes identically on Ω at ū so that strict complementarity is violated. A representative run for this degenerated problem is shown in Table 7.7 (A111, h = 1/128). Hereby, u h d was obtained from the discrete solution and the discrete gradient. Similar as in the nondegenerate case, the algorithms show mesh independent behavior, see Table 7.8. We have not included further tables for this problem since they would look essentially like those for the nondegenerate problem All-at-Once Approach We now present numerical experiments for semismooth Newton methods applied to the all-at-once approach. Since the state equation is nonlinear, the advantage of this approach is that we do not have to solve the state equation in every iteration. On
153 7.2 Numerical Results 149 Alg. h Iter. State Lin. State Adj. State Time 1/ s 1/ s A111 1/ s 1/ s 1/ s 1/ s 1/ s A121 1/ s 1/ s 1/ s 1/ s 1/ s A112 1/ s 1/ s 1/ s Table 7.5 Performance summary for the algorithms A111, A121, and A112. h Iter. State Lin. State Adj. State Time Algorithm A112 without smoothing step 1/ s 1/ s 1/ s 1/ s 1/ s Algorithm A122 without smoothing step. 1/ s 1/ s 1/ s 1/ s 1/ s Table 7.6 Performance summary for algorithms A112 and A122 without smoothing step. the other hand, the main work is solving the Newton system so that an increase of iterations in the semismooth Newton method can compensate this win of time. We choose u 1, y, w. Better choices for y and w are certainly possible. Our termination condition is χ(y k, u k, w k ) = ( L u (y k, u k, w k ) P C (u k L u (y k, u k, w k )) 2 L 2 + L y (y k, u k, w k ) 2 H 1 + E(y k, u k ) 2 H 1 ) 1/2 ε with ε = 1 8. The all-at-once semismooth Newton system is solved by reducing it to the same Schur complement as was used for solving the black-box Newton equation, and by applying MG-preconditioned cg. Only the right hand side is different. Table 7.9 shows two representative runs of algorithm A212. Furthermore, Table 7.1
154 15 7. Applications h k u k ū L 2 u k ū L χ(u k ) e e+ 2.53e e e+ 1.6e e e e e e e-17 Table 7.7 Iteration history of algorithm A111 for a degenerate problem. h Iter. State Lin. State Adj. State Time 1/ s 1/ s 1/ s 1/ s 1/ s Table 7.8 Performance summary of algorithm A111 for a degenerate problem. h k u k ū L 2 u k ū L χ(y k, u k, w k ) 1.628e e+ 1.93e e e e e e-1 4.7e e e e e e e e e e e e+ 1.93e e e e e e e e e e e e e e e e-11 Table 7.9 Iteration history of algorithm A212. contains information on the performance of the algorithms A211, A221, and A212 for different mesh sizes. In comparison with the black-box algorithms, we see that all-at-once approach and black-box approach are comparably efficient. As an advantage of the all-at-once approach we note that the smoothing step can be performed with minimum additional cost, whereas in the black-box approach it requires one additional solve of both, state and and adjoint equation. We believe that the more expensive to solve the state equation is (due to nonlinearity), the more favorable is the all-at-once approach Nested Iteration Next, we present numerical results for the nested iteration approach. Hereby, we start on the grid T 1/2, solve the problem with termination threshold ε = 1 5 and compute from its solution an initial point for the problem on the next finer grid
155 7.2 Numerical Results 151 Algorithm h A211 A221 A212 Iter. Time Iter. Time Iter. Time 1/ s s s 1/ s s s 1/ s s s 1/ s s s 1/ s s s Table 7.1 Performance summary for the algorithms A211, A221, and A212. T 1/4, and so on. On the finest level we solve with termination threshold ε = 1 8. Table 7.11 shows the number of iterations per level and the total execution time Lin. Adj. Lin. Adj. h Iter. State State State h Iter. State State State 1/ / / / / / / / Total Time: 36 s Table 7.11 Performance summary for nested iteration version of algorithm A111. for the nested version of algorithm A111. Comparison with Table 7.11 shows that the nested version of A111 needs less than half the time to solve the problem than the unnested version (33 vs. 935 seconds). The use of nested iteration is thus very promising. Furthermore, it is very robust since, except for the coarsest problem, the Newton iteration is started with a very good initial point Discussion of the Results From the presented numerical results we draw the following conclusions: The proposed methods allow us to use fast iterative solvers for their implementation. This leads to runtimes of optimal order in the sense that they are approximately proportional to the number of unknowns. The class of semismooth Newton methods performs very efficiently and exhibits mesh-independent behavior. We observe superlinear convergence as predicted by our theory. Both, black-box and all-at-once approach lead to efficient and robust algorithms which are comparable in runtime. If smoothing steps are used, the all-at-once approach is advantageous since it does not require additional state and adjoint state solves to compute the smoothing step.
156 Applications Lack of strict complementarity does not affect the fast convergence of the algorithms. This confirms our theory, which does not require strict complementarity. The choice of the MCP-function π(x) = x 1 P C (x 1 λ 1 x 2 ) appears to be preferable to π(x) = φ F B ( x) for this class of problems, at least in the blackbox-approach. The main reason for this is the additional cost of the smoothing step. The performance of the φ F B -based algorithms, which from a theoretical point of view require a smoothing step, degrades by a certain margin if the smoothing step is turned off. This, however, is compensated if we turn on the projection step. Our numerical experience indicates that this effect is problem dependent. It should be mentioned that so far we never observed a severe deterioration of performance when switching off the smoothing step. But we stress that pathological situations like the one in Example 3.52 can occur, and that they result in a stagnation of convergence on fine grids (we have tried this, but do not include numerical results here). We conclude this section by noting that many other control problems can be handled in a similar way. In particular, Neumann boundary control can be used instead of distributed control. Furthermore, the control of other types of PDEs by semismooth Newton methods is possible, e.g., Neumann boundary control of the wave equation [14] and Neumann boundary control of the heat equation [24, 143]. The optimal control of the incompressible Navier Stokes equations is considered in section Obstacle Problems In this section we study the class of obstacle problems described in section Obstacle problems of this or similar type arise in many applications, e.g., potential flow of perfect fluids, lubrication, wake problems, etc., see, e.g., [63] and the references therein. We describe the problem in terms of the obstacle problem for an elastic membrane. For q [2, ), let g H 2,q (Ω) represent a (lower) obstacle located over the nonempty bounded open set Ω R 2 with sufficiently smooth boundary, denote by y H 1 (Ω) the position of a membrane, and by f Lq (Ω) external forces. For compatibility we assume g on Ω, which is assumed to be sufficiently smooth. Then y H 1 (Ω) solves the variational inequality y g on Ω, a(y, v y) (f, v y) L 2 v H 1 (Ω), v g on Ω, (7.36) where a : H 1 (Ω) H1 (Ω) R, a(y, z) = i,j y z a ij, x i x j a ij = a ji C 1 ( Ω), and a being H 1 -elliptic, i.e.,
157 7.3 Obstacle Problems 153 a(y, y) ν y 2 H 1 y H 1 (Ω) with a constant ν >. The bounded bilinear form a induces a bounded linear operator A L(H 1, H 1 ) via a(v, w) = v, Aw H 1,H for all v, w 1 H1 (Ω). The ellipticity of a and the Lax Milgram theorem imply that A L(H 1, H 1 ) is a homeomorphism with A 1 H 1,H 1 ν 1, and regularity results imply that A 1 L(L 2, H 2 ). Introducing the closed convex set F = {y H 1 (Ω) : y g on Ω} and the objective function J : H 1 (Ω) R, J(y) def = 1 2 a(y, y) (f, y) L 2, we can write (7.36) equivalently as optimization problem minimize J(y) subject to y F. (7.37) The ellipticity of a implies that J is strictly convex with J(y) as y H 1. Hence, using that F is a closed and convex subset of the Hilbert space H 1 (Ω), we see that (7.37) possesses a unique solution ȳ F [49, Prop. II.1.2]. Further, regularity results [22, Thm. I.1] ensure that ȳ H 1 (Ω) H 2,q (Ω) Dual Problem Since (7.37) is not posed in an L p -setting, we derive an equivalent dual problem, which, as we will see, is posed in L 2 (Ω). Denoting by I F : H 1 (Ω) R {+ }, the indicator function of F, i.e., I F (y)(x) = for x F and I F (y)(x) = + for x / F, we can write (7.37) in the form inf J(y) + I F (y). (7.38) y H 1(Ω) The corresponding (Fenchel Rockafellar) dual problem [49, Ch. III.4] (we choose F = I F, G = J, Λ = I, u = y, and p = u in the terminology of [49]) is sup J (u) IF ( u), (7.39) u H 1 (Ω) where J : H 1 (Ω) R {+ } and IF conjugate functions of F and I F, respectively: : H 1 (Ω) R {+ } are the J (u) = I F (u) = sup y, u H 1 y H 1(Ω),H 1 J(u), (7.4) sup y, u H 1 y H 1(Ω),H I F(y). (7.41) 1 (7.42)
158 Applications Let y H 1 (Ω) be such that I F (y ) =, e.g., y = ȳ. Then J is continuous at y and I F is bounded at y. Furthermore, since I F, the ellipticity implies J(y) + I F (y) as y H 1. Therefore, [49, Thm. III.4.2] applies so that (7.38) and (7.39) possess solutions ȳ (this we knew already) and ū, respectively, and for any pair of solutions holds J(ȳ) + I F (ȳ) + J (ū) + I F( ū) =. Further, the following extremality relations hold: J(ȳ) + J (ū) ū, ȳ H 1,H 1 I F (ȳ) + I F( ū) + ū, ȳ H 1,H 1 =, (7.43) =. (7.44) This implies In our case J is smooth, which yields ū J(ȳ), (7.45) ū I F (ȳ). (7.46) ū = J (ȳ) = Aȳ f. (7.47) We know that the primal solution ȳ is unique, and thus the dual solution ū is unique, too, by (7.47). Further, by regularity, ȳ H 1 (Ω) H2,q (Ω), which, via (7.47), implies ū L q (Ω). The supremum in the definition of J, see (7.4), is attained for y = A 1 (f +u), with value J (u) = u, y H 1,H y, Ay H 1,H 1 + f, y H 1,H 1 For u L 2 (Ω) we can write = 1 2 f + u, A 1 (f + u) H 1,H 1. J (u) = 1 2 (f + u, A 1 (f + u)) L 2. Further, see also [22, p. 19] and [49, Ch. IV.4], For u L 2 (Ω) we have IF(u) = sup u, y H 1,H 1 I F(y) = sup u, y y H 1 H 1,H 1. y F I F (u) = sup(u, y) L 2 = y F { (g, u)l 2 if u on Ω, + otherwise. Therefore, using the regularity of ȳ and ū, we can write (7.39) in the form
159 7.3 Obstacle Problems 155 maximize u L 2 (Ω) 1 2 (f + u, A 1 (f + u)) L 2 + (g, u) L 2 subject to u, (7.48) and we know that ū L q (Ω). We recall that from the dual solution ū we can recover the primal solution ȳ from the identity (7.47): ȳ = A 1 (f + ū). In the following we prefer to write (7.48) as a minimization problem: minimize u L 2 (Ω) 1 2 (f + u, A 1 (f + u)) L 2 (g, u) L 2 subject to u. (7.49) Example In the case A = the primal problem is 1 minimize y H 1(Ω) 2 y 2 H (f, y) 1 L 2 subject to y g, and the dual (minimization) problem reads minimize u L 2 (Ω) 1 2 f + u 2 H (g, u) 1 L2 subject to u, where u H 1 = 1 u H 1 is the norm dual to H 1. We collect our results in the following theorem. Theorem Under the problem assumptions, the obstacle problem (7.36) possesses a unique solution ȳ H 1 (Ω), and this solution is contained in H2,q (Ω). The dual problem (7.39) possesses a unique solution ū H 1 (Ω) as well. Primal and dual solution are linked via the equation Aȳ = f + ū. In particular, ū L q (Ω), and the dual (minimization) problem can be written in the form (7.49) Regularized Dual Problem Problem (7.49) is not coercive in the sense that for u L 2 the objective function tends to +. Hence, we consider the regularized problem minimize u L 2 (Ω) subject to j λ (u) def = 1 2 (f + u, A 1 (f + u)) L 2 + λ 2 u u d 2 L (g, u) 2 L 2 u on Ω (7.5) with u d L p (Ω), p (2, ), and (small) regularization parameter λ >. This problem has the following properties:
160 Applications Theorem The objective function of problem (7.5) is strongly convex and j λ (u) as u L 2. In particular, (7.5) possesses a unique solution ū λ L 2 (Ω), and this solution lies in L p (Ω). The derivative of j λ has the form j λ (u) = λ(u u d) + A 1 (f + u) g def = λu + G(u). (7.51) Hereby, the mapping G(u) = A 1 (f + u) g λu d maps L 2 (Ω) continuously and affine linearly into L p (Ω). Proof. Obviously, j λ is a smooth quadratic function and with z = A 1 (f + u), j λ (u) = λ 2 u u d 2 L a(z, z) (g, u) L 2 λ 2 u u d 2 L 2 g L 2 u L 2 as u L 2. Therefore, since {u L 2 (Ω) : u } is closed and convex, we see that (7.5) possesses a unique solution ū λ L 2 (Ω). Certainly, j λ (u) is given by (7.51), and the fact that A L(H1, H 1 ) implies that G : u L 2 (Ω) A 1 (f + u) g λu d H 1 (Ω) + L p (Ω) L p (Ω) is continuous affine linear. From the optimality conditions for (7.5) we conclude j λ (ū λ) = on {x Ω : ū λ (x) }. Hence, ū λ = 1 {ūλ }ū λ = λ 1 1 {ūλ }G(ū λ ) L p (Ω). Corollary Under the problem assumptions, F = j λ satisfies Assumption 3.33 (a), (b) for any p [2, ), any p [2, ) with p p and u d L p (Ω), and any 1 r < p. Furthermore, F satisfies Assumption 4.1 for r = 2 and all p (2, ) with u d L p (Ω). Finally, F also satisfies Assumption 4.6 (a) (e) for all p [2, ) and all p (2, ). Proof. The Corollary is an immediate consequence of Theorem 7.23 and the L 2 - coercivity of j λ. Remark Corollary 7.24 establishes all assumptions that are needed to establish the semismoothness of NCP-function based reformulations. In fact, for general NCP-functions Theorem 3.45 is applicable, whereas for the special choice π(x) = x 1 P [, ) (x 1 λ 1 x 2 ) we can use Theorem 4.4. Furthermore, the sufficient condition for regularity of Theorem 4.8 is applicable. Hence, we can apply our class of semismooth Newton methods to solve problem (7.5).
161 7.3 Obstacle Problems 157 Next, we derive bounds for the approximation errors ū λ ū H 1 and ȳ λ ȳ H 1, where ȳ λ = A 1 (f + ū λ ). Theorem Let ū and ū λ denote the solutions of (7.49) and (7.5), respectively. Then ȳ = A 1 (f+ū) solves the obstacle problem (7.36) and with ȳ λ = A 1 (f+ū λ ) holds, as λ + : ū λ ū H 1 = o(λ 1/2 ), (7.52) ȳ λ ȳ H 1 = o(λ 1/2 ). (7.53) Proof. By Theorems 7.22 and 7.23 we know that the dual problem (7.49) and the regularized dual problem (7.5) possess unique solutions ū, ū λ L p (Ω). Now j λ (ū λ ) j λ (ū) = j(ū) + λ 2 ū u d 2 L 2 j(ū λ) + λ 2 ū u d 2 L 2 = j λ (ū λ ) + λ 2 ( ū ud 2 L 2 ū λ u d 2 L 2 ). This proves Further, ū λ u d L 2 ū u d L 2. (7.54) j(ū) j(ū λ ) = j λ (ū λ ) λ 2 ū λ u d 2 L 2 j λ(ū) λ 2 ū λ u d 2 L 2 Therefore, = j(ū) + λ 2 ( ū ud 2 L 2 ū λ u d 2 L 2 ) j(ū) + λ 2 ū u d 2 L 2. (7.55) j(ū λ ) j(ū) λ 2 ū u d 2 L 2 = O(λ) as λ +. Now let λ k +. Since M = {u L 2 (Ω) : u, u u d L 2 ū u d L 2} is closed, convex, and bounded, there exists a subsequence and a point ũ M such that u λk ũ weakly in L 2. Since j is convex and continuous, it is weakly lower semicontinuous, so that j(ū) j(ũ) lim inf j(u [ λ k k ) = lim inf j(ū) + O(λk ) ] = j(ū). k Hence ũ is a solution of (7.49) and therefore ũ = ū, since ū is the unique solution. By a subsequence-subsequence argument we conclude ū λ ū weakly in L 2 (Ω) as λ +. (7.56) Since u u u d L 2 is convex and continuous, hence weakly lower semicontinuous, we obtain from (7.54) and (7.56)
162 Applications which proves ū u d L 2 lim inf ū λ u d L 2, λ + ū u d L 2 lim sup ū λ u d L 2, λ + ū λ u d L 2 ū u d L 2 as λ +. (7.57) Since L 2 is a Hilbert space, (7.56) and (7.57) imply Hence, (7.55) implies ū λ ū in L 2 as λ +. (7.58) j(ū λ ) j(ū) = o(λ). Since ū solves (7.49), there holds (j (ū), ū λ ū) L 2. Therefore, j(ū λ ) j(ū) = (j (ū), ū λ ū) L (ū λ ū, j (ū)(ū λ ū)) L (ū λ ū, j (ū)(ū λ ū)) L 2 = 1 2 (ū λ ū, A 1 (ū λ ū)) L 2. Hence, with v = ū λ ū and w = A 1 v, v 2 H = 1 Aw 2 H 1 A 2 H w 2 1,H 1 H A 2 1 H κ 1 w, Aw 1,H 1 H 1,H 1 κ 1 A 2 H v, w 1,H 1 L 2 2κ 1 A 2 H (j(ū 1 λ ) j(ū)),h 1 = 2κ 1 A 2 H 1,H 1 o(λ). This proves (7.52). The solution of the obstacle problem is ȳ = A 1 (f + ū). For ȳ λ = A 1 (f + ū λ ) holds: ȳ λ ȳ 2 H 1 = A 1 (ū λ ū) 2 H = w 2 1 H κ 1 w, Aw 1 H 1,H 1 = κ 1 (ū λ ū, A 1 (ū λ ū)) L 2 2κ 1 (j(ū λ ) j(ū)) = 2κ 1 o(λ). The proof is complete. Remark The parameter λ has to be chosen sufficiently small to ensure that the error is not larger than the discretization error. Our approach will be to successively reduce λ Discretization We use the same finite element spaces as in section A straightforward discretization yields the discrete obstacle problem (in coordinate form) minimize y h R nh 1 2 yht A h y h f ht y h subject to y h g h. (7.59)
163 7.3 Obstacle Problems 159 Hereby, g h R nh, gi h = g(pi h), approximates the obstacle. Furthermore, f i h = (βi h, f) L 2, and Ah ij = (Aβh i, βh j ) H 1,H 1. The corresponding dual problem is minimize 2 (f h + S h u h ) T A h 1 (f h + S h u h ) g ht S h u h subject to u h. u h R nh 1 (7.6) Hereby, S h R nh n h is defined as in (7.17). The discrete regularized dual problem then is given by minimize uh R nh jh λ (uh ) def = 1 2 (f h + S h u h ) T A h 1 (f h + S h u h ) + λ 2 (uh u h d) T S h (u h u h d) g ht S h u h (7.61) subject to u h, where, e.g., [S h u h d ] i = (L h βi h, Lh u d ) L 2. From the solution ū h λ of (7.61) we compute yλ h via Ah ȳλ h = f h + S h ū h λ. The gradient of j h λ and the Hessian j h λ of j h λ with respect to the S h -inner product are given by j h λ (u h ) = A h 1 (f h + S h u h ) + λ(u h u h d ) gh, j h λ (u h ) = A h 1 S h + λi. Choosing a Lipschitz continuous and semismooth NCP-function φ, we reformulate (7.61) in the form Φ h (u h ) def = φ ( u h 1, jh ) λ (u h ) 1. ) φ ( u h n h, j h λ (u h ) n h =. (7.62) This is the discrete counterpart of the semismooth reformulation in function space Φ(u) def = φ ( u, j λ (u)) =. As in section 7.1.4, we can argue that an appropriate discretization of Φ is Φ h (u h ), the set of all matrices B h R nh n h with B h = D h 1 + Dh 2 jh λ (u h ), (7.63) where D h 1 and Dh 2 are diagonal matrices such that ( (D h 1 ) ll, (D h 2 ) ) ( ll φ u h l, j h ) λ (u h ) l, l = 1,..., n h. Again, we have the inclusion
164 16 7. Applications C Φ h (u h ) Φ h (u h ) with equality if φ or φ is regular. With the same argumentation as in the derivation of Theorem 7.18 we can show that Φ h is Φ h -semismooth (and thus also semismooth in the usual sense). Semismoothness of higher order can be proved analogously. Hence, we can apply our semismooth Newton methods to solve (7.62). The details of the resulting algorithm, which are not given here, parallel Algorithm The central task is to solve the semismooth Newton system (we suppress the subscript k) [D h 1 + Dh 2 jh λ (u h )]s h = Φ h (u h ). Using the structure of j h λ and that (D h 1 + λd h 2 ) is diagonal and positive definite for our choices of φ, we see that this is equivalent to s h = S h 1 A h v h, where v h solves [A h + S h (D h 1 + λdh 2 ) 1 D h 2 ]vh = S h (D h 1 + λdh 2 ) 1 Φ h (u h ). This can be viewed as a discretization of the the PDE Av + d 2 d 1 + λd 2 v = 1 d 1 + λd 2 Φ(u). Therefore, we can apply a multigrid method to compute v h, from which s h can be obtained easily Numerical Results We consider the following problem: Ω = (, 1) (, 1), g = sin(πx 1) sin(πx 2 ) ( ) 1 f = 5 sin(2πx 1 ) sin(2πx 2 ) 2 + e2x 1+x 2. (7.64) The triangulation is the same as in section Again, the code was implemented in Matlab Version 6 Release 12, using sparse matrix computations, and was run under Solaris 8 on a Sun SPARC Ultra workstation with a sparcv9 processor operating at 36 MHz. To obtain sufficiently accurate solutions, the regularization parameter has to be chosen appropriately. Hereby, we use a nested iteration approach and determine λ in dependence on the current mesh size. It is known [63, App. I.3] that, under appropriate conditions, the described finite element discretization leads to approximation errors ȳ h ȳ H 1 = O(h). Since we have shown in Theorem 7.26 that ȳ λ ȳ H 1 = o(λ 1/2 ), we choose λ of the order h 2, more precisely, we work with λ = λ h = h2 1.
165 7.3 Obstacle Problems 161 We then solve problem (7.61) for h = 1/2 until χ(u k ) = u k P [, ) (u k j λ (u k)) L 2 ε (7.65) with ε = 1 5 (in the corresponding discrete norms), interpolate this coarse solution to obtain an initial point on T 1/4, solve this problem (now with λ = λ 1/4 ) until (7.65) is satisfied, interpolate again, and repeat this procedure until we have reached the finest grid on which we iterate until (7.65) holds with ε = 1 8. To further reduce the effect of regularization, we always use as u d the interpolated solution from the next coarser grid (the same point that we use as initial point). On T 1/2 we choose u d = u. The obstacle is shown in Figure 7.5, the state solution ȳ λ for PDE PDE h λ Iter. Solves h λ Iter. Solves h final = 1/64 1/2 2.5e / e /4 6.25e / e / e / e y ȳ H 1 = 2.375e 3 y ȳ λ H 1 = 1.978e 1 Total Time: 13.5 s h final = 1/128 1/2 2.5e / e /4 6.25e / e / e / e / e y ȳ H 1 = 8.671e 4 y ȳ λ H 1 = 3.572e 1 Total Time: 54.6 s h final = 1/256 1/2 2.5e / e /4 6.25e / e / e / e / e / e y ȳ H 1 = 3.24e 4 y ȳ λ H 1 = 5.594e 11 Total Time: s Table 7.12 Performance summary for nested iteration version of algorithm A111. λ = λ 1/64 is displayed in Figure 7.6, and the dual solution ū λ is depicted in Figure 7.7. Note that {x : ū(x) } is the contact region, and that for our choice of λ the
166 Applications Algorithm A111 k y k ȳ λ H 1 y k ȳ H 1 χ(u k ) 1.71e e e e e e e e e e e e e e e-11 Table 7.13 Iteration history of algorithm A111 on the final level h = h final = 1/ Figure 7.5 The obstacle g (h = 1/64). solution ū is approximated up to a fraction of the discretization error by ū λ. It can be seen that ū is discontinuous at the boundary of the contact region. In the numerical tests it turned out that it is not advantageous to let λ 1 become too large in the smoothing steps. Hence, we set γ = min{1 5, λ 1 } and work with smoothing steps of the form S k (u) = P [, ) (u γj λ (u)). On the other hand, even very small λ does not cause any problems in the NCP-function φ(x) = x 1 P [, ) (x 1 λ 1 ). We consider two methods: The smoothing-step-free Algorithm A111 with φ(x) = x 1 P [, ) (x 1 λ 1 ), and Algorithm A112 with φ F B and smoothing step as just described. It turns out that without globalization the
167 7.3 Obstacle Problems 163 PDE PDE h λ Iter. Solves h λ Iter. Solves h final = 1/64 1/2 2.5e / e /4 6.25e / e / e / e y ȳ H 1 = 2.374e 3 y ȳ λ H 1 = 1.631e 7 Total Time: 29.3 s h final = 1/128 1/2 2.5e / e /4 6.25e / e / e / e / e y ȳ H 1 = 8.67e 4 y ȳ λ H 1 = 3.69e 8 Total Time: s h final = 1/256 1/2 2.5e / e /4 6.25e / e / e / e / e / e y ȳ H 1 = 3.24e 4 y ȳ λ H 1 = 2.69e 11 Total Time: s Table 7.14 Performance summary for nested iteration version of algorithm A112. projected variant A121 tends to cycle when λ becomes very small. Since incorporating a globalization requires additional evaluations of j λ and/or its gradient, which is expensive due to the presence of A 1, we do not present numerical results for a globalized version of A121. In Table 7.12 (A111) and Table 7.14 (A112) we show, for each level of the nested iteration, the value of λ, the number of iterations performed on this level (Iter), and the number of PDE solves. Furthermore, the (discrete) distance y ȳ H 1 of the (discrete) computed solution y to the (discrete) solution ȳ corresponding to λ = and the (discrete) distance y ȳ λ H 1 of the (discrete) computed solution y to the (discrete) solution ȳ λ corresponding to λ = h 2 final /1 are shown. The total runtime is also given. We see that on each level only a few Newton iterations are performed. In Table 7.13 the iteration history of A111 on the finest level is shown for h final = 1/256. Obviously, the convergence is superlinear with rate >1, and we observe mesh-independent performance of the methods. Furthermore, the runtime
168 Applications Figure 7.6 Computed state ȳ λ (h = 1/64). increases approximately linearly with the number of unknowns. In conclusion, it can be seen that, similar as for the control problem in section 7.1, the algorithms offer all the favorable properties that are predicted by our theory. For this application, the smoothing-step-free algorithm with the projection-based NCP-function leads to significantly shorter solution times than the algorithm with Fischer Burmeister function and smoothing step. This is mainly caused by the additional PDE solves needed for the smoothing steps. As for the use of multigrid methods, it would be interesting to investigate if instead of multilevel Newton methods also nonlinear multigrid methods can successfully be used and investigated. Furthermore, we stress that many other variational inequalities can be treated in a similar way. In particular, this applies to certain kinds of the following problems: problems with constraints on the boundary, time-dependent VIPs, quasivariational inequalities [12, 13], and VIPs of the second kind.
169 7.3 Obstacle Problems Figure 7.7 Computed dual solution ū λ (h = 1/64).
170 8. Optimal Control of the Incompressible Navier Stokes Equations 8.1 Introduction The Navier Stokes equations describe viscous fluid flow and are thus of central interest for many simulations of practical importance (e.g., in aerodynamics, hydrodynamics, medicine, weather forecast, environmental and ocean sciences). Currently, significant efforts are made to develop and analyze optimal control techniques for the Navier-Stokes equations. In particular, control of the incompressible Navier-Stokes equations has been investigated intensively in, e.g., [1, 16, 17, 43, 58, 67, 68, 69, 7, 71, 75, 79, 8]. Our aim is to show that our class of semismooth Newton methods can be applied to the constrained distributed control of the incompressible Navier-Stokes equations. We consider instationary incompressible flow in two space dimensions. The set Ω R 2 occupied by the fluid is assumed to be nonempty, open, and bounded with sufficiently smooth boundary Ω. By t [, T ], T >, we denote time and by x = (x 1, x 2 ) T the spatial position. For the time-space domain we introduce the notation Q = (, T ) Ω. The state of the fluid is determined by its velocity field y = (y 1, y 2 ) T and its pressure P, both depending on t and x. Throughout, we work in dimensionless form. The Navier Stokes equations can be written in the form y t ν y + (y )y + P = Ru + f in Q, y = in Q, y = in (, T ) Ω, y(, ) = y in Ω. (8.1) Hereby, ν = 1/Re, where Re > is the Reynolds number, y is a given initial state at time t = satisfying y =, u(t, x) is the control, R is a linear operator and f(t, x) are given data. The precise functional analytic setting is given in section 8.2 below. In (8.1) the following notation is used: ( ) u1 u = (u 1 ) x1 + (u 2 ) x2, u = u 2 ( ) ( ) u1 (v (u )v = 1 ) x1 + u 2 (v 1 ) x2 Px1, P =. u 1 (v 2 ) x1 + u 2 (v 2 ) x2 P x2 = ( (u1 ) x1 x 1 + (u 1 ) x2 x 2 (u 2 ) x1 x 1 + (u 2 ) x2 x 2 ),
171 Optimal Control of the Incompressible Navier Stokes Equations We perform time-dependent control on the right hand side. To this end, let be given a nonempty and bounded open set Ω c R k and a control operator R L(L 2 (Ω c ) l, H 1 (Ω) 2 ), and choose as control space U = L 2 (Q c ) l, Q c = (, T ) Ω c. Example 8.1. For time-dependent control of the right hand side on a subset Ω c Ω of the spatial domain, we can choose R L(L 2 (Ω c ) 2, H 1 (Ω) 2 ), (Rv)(x) = v(x) for x Ω c, (Rv)(x) =, otherwise. Given a closed convex feasible set C U, the control problem consists in finding a control u C which, together with the corresponding solution (y, P ) of the state equation (8.1), minimizes the objective function J(y, u). Specifically, we consider tracking-type objective functions of the form J(y, u) = 1 T Ny z d 2 2 2dxdt + λ T u u d 2 Ω 2 2dωdt. (8.2) Ω c Hereby, N : H 1 (Ω) 2 L 2 (Ω) m, m 1, is a bounded linear operator, z d L 2 (Q) m is a desired candidate state observation to which we would like Ny to drive by optimal control, λ > is a regularization parameter, and u d L p (Q c ) l, p > 2, are given data. 8.2 Functional Analytic Setting of the Control Problem In our analysis we will consider weak solutions of the Navier Stokes equations. To make this precise, we first introduce several function spaces which provide a standard framework for the analysis of the Navier-Stokes equations [6, 17, 134] Function Spaces We work in the following spaces: V = {v C (Ω) 2 : v = }, H = closure of V in L 2 (Ω) 2, V = closure of V in H 1 (Ω) 2, L p (X) = L p (, T ; X), W = {v L 2 (V ) : v t L 2 (V )}, C(X) = C(, T ; X) = {v : [, T ] X, v continuous}. with inner products and norms (v, w) H = (v, w) L 2 (Ω) 2 = (v, w) V = (v, w) H 1 (Ω) 2 = ( ) 1/p T y L p (X) = y(t) p X dt Ω Ω ( i v iw i ) dx ( i,j [v i] xj [w i ] xj ), y L (X) = ess sup y(t) X, <t<t
172 v W = ( T 8.2 Functional Analytic Setting of the Control Problem 169 ) 1/2 ( v 2 V + v t 2 ) V dt, y C(X) = sup y(t) X. t T Hereby, the dual space V of V is chosen in such a way that V H = H V is a Gelfand triple. The following relations between the introduced spaces hold: W C(H) L (H), L p (V ) = L q (V 1 ), p + 1 = 1, 1 < p, q <, q L p (V ) L q (V ), 1 q p, The Control Problem For the state space and control space, respectively, we choose Y = W state space, U = L 2 (Q c ) l control space. The data of the control problem are: the initial state y H. the right hand side data f L 2 (H 1 (Ω) 2 ). the right hand side control operator R L(L 2 (Ω c ) l, H 1 (Ω) 2 ) such that 4/3 def w W = {v L 2 (V ) : v t L 4/3 (V )} R w L p (Q c ) l is well defined and continuous with p > 2. the objective function J : Y U R as defined in (8.2), with data z d L 2 (Q) m, u d L p (Q c ) l, observation operator N L(H 1(Ω), L2 (Ω) m ), and regularization parameter λ >. the feasible set C U, which is nonempty closed, and convex. In order to apply the semismooth Newton method, we will assume later in this chapter that where C R l is a closed convex set. C = {u U : u(t, ω) C, (t, ω) Q c }, (8.3) Remark 8.2. For the choice of R discussed in Example 8.1 and 2 < p < 7/2, we can use the embedding W 4/3 L p (Ω) 2 established in Lemma 8.12 below, to see that w W 4/3 R w = w Qc L p (Q) 2 is continuous.
173 17 8. Optimal Control of the Incompressible Navier Stokes Equations For the weak formulation of the Navier-Stokes equations it is convenient to introduce the trilinear form b : V V V R, b(u, v, w) = w T (u )vdx = Ω Ω w T v x udx = Ω i,j u i(v j ) xi w j dx, The variational form of (8.1) is obtained by applying test functions v V to the momentum equation: d dt (y, v) H + ν(y, v) V + b(y, y, v) = Ru + f, v H 1 (Ω) 2,H 1 (Ω)2 v V in (, T ), (8.4) y(, ) = y in Ω. (8.5) Note hereby that the incompressibility condition y = is absorbed in the definition of the state space W. Further, the pressure term drops out since v = and thus integration by parts yields P, v H 1 (Ω) 2,H 1 (Ω)2 = (P, v) L 2 (Ω) 2 =. Furthermore, the initial condition (8.5) makes sense for y W, since C(H) W. For the well-definedness of (8.4), and also for our analysis, it is important to know the following facts about the trilinear form b. Lemma 8.3. There exists a constant c > such that, for all u, v, w V, b(u, v, w) = b(u, w, v), (8.6) b(u, v, w) c u L4 (Ω) 2 v V w L4 (Ω) 2, (8.7) b(u, v, w) c u 1/2 H u 1/2 V v V w 1/2 H w 1/2 V c u V v V w V. (8.8) Proof. (sketched) Equation (8.6) results from integration by parts and using u =, (8.7) follows by applying Hölder s inequality, see [134, Ch. III Lem. 3.4], and (8.8) follows from V H and the estimate [134, Ch. III Lem. 3.3] v L4 (Ω) 2 1/4 v 1/2 L 2 (Ω) v 1/2 L 2 (Ω) v H 1 2 (Ω). (8.9) Equations (8.4) and (8.5) can be written as operator equation E(y, u) = (8.1) with E : W U Z, Z def = L 2 (V ) H. For convenience, we introduce the following operators: For all y, v, w V, all u L 2 (Ω c ) l, and all z L 2 (Ω) m
174 A L(V, V ), Av, w V,V = (v, w) V, B L(V, L(V, V )), R π L(L 2 (Ω c ) l, V ), 8.3 Analysis of the Control Problem 171 B(y)v, w V,V = b(y, v, w), R π u, v V,V = Ru, v H 1 (Ω) 2,H 1 (Ω)2, N π L(V, L 2 (Ω) m ), (N π v, z) L2 (Ω) m = (Nv, z) L 2 (Ω) m. Further, we define f π L 2 (V ) by f π, v V,V = f, v H 1 (Ω) 2,H 1 (Ω)2 v V. Using these notations, the operator E assumes the form ( ) ( E1 (y, u) yt + νay + B(y)y R E(y, u) = = π u f π E 2 (y, u) y(, ) y ). Thus, we can write the optimal control problem in abstract form: minimize J(y, u) subject to E(y, u) = and u C. (8.11) 8.3 Analysis of the Control Problem State Equation Concerning existence and uniqueness of solutions to the state equation (8.4) and (8.5), we have: Proposition 8.4. For all u U and y H, there exists a unique y = y(u) W such that E(y, u) =. Furthermore, with r(u) = R π u + f π, y C(H) y H + 1 ν r(u) L2 (V ), (8.12) y L2 (V ) 1 ν y H + 1 ν r(u) L 2 (V ), (8.13) y W c ( y H + r(u) L 2 (V ) + y 2 H + r(u) 2 L 2 (V )). (8.14) The constant c depends only on ν. Proof. The existence and uniqueness is established in, e.g., [17, Thm. 3.3], together with the following energy equality 1 2 y(t) 2 H + ν t y(s) 2 V ds = 1 2 y 2 H + t r(u)(s), y(s) V,V ds, (8.15) which holds for all t [, T ] and is obtained by choosing v = y(t) as test function in (8.4), integrating from to t, and using
175 Optimal Control of the Incompressible Navier Stokes Equations 2 t y t (s), y(s) V,V ds = y(t) 2 H y() 2 H. By the Cauchy Schwarz and Young inequalities we have t t r(u)(s), y(s) V,V ds r(u)(s) V y(s) V,V ds Hence, (8.15) yields 1 2ν t r(u)(s) 2 V ds + ν 2 t y(t) 2 H + ν y(s) 2 V ds y 2 H + 1 ν t t y(s) 2 V ds. r(u)(s) 2 V ds, which proves (8.12) and (8.13). The state equation (8.4) yields for all v L 2 (V ), using (8.6), (8.8), and Hölder s inequality T T ( yt, v V,V dt ν (y, v)v + b(y, y, v) + r(u), v V,V ) dt T ( ) ν y V + c y H y V + r(u) V v V dt ( ν y L2 (V ) + c y L (H) y L2 (V ) + r(u) L2 (V )) v L2 (V ). With the Young inequality, (8.12), and (8.13) follows (8.14). We know already that the state equation possesses a unique solution y(u). Our aim is to show that the reduced control problem minimize j(u) def = J(y(u), u) subject to u B (8.16) can be solved by the semismooth Newton method. In particular, we must show that j is twice continuously differentiable. This will be done based on the implicit function theorem, which requires to investigate the differentiability properties of the operator E. In this context, it is convenient to introduce the trilinear form β : V V V R, β(u, v, w) = b(u, v, w) + b(v, u, w). (8.17) The following estimates are used several times. In their derivation, and throughout the rest of this chapter (if not stated differently), c denotes a generic constant that may differ from instance to instance. From (8.6), (8.8), and V H follows for all u, v, w V β(u, v, w) b(u, w, v) + b(v, w, u) c u 1/2 H u 1/2 V v 1/2 H v 1/2 V w V (8.18) c u 1/2 H u 1/2 V v V w V. (8.19)
176 8.3 Analysis of the Control Problem 173 Further, (8.18) and Hölder s inequality with exponents (, 4,, 4, 2) yield for all u, v L 2 (V ) L (H) W and all w L 2 (V ) T T β(u, v, w) dt c u 1/2 H u 1/2 V v 1/2 H v 1/2 V w V dt c u 1/2 L (H) u 1/2 L 2 (V ) v 1/2 L (H) v 1/2 L 2 (V ) w L 2 (V ). (8.2) In particular, for all u, v W and w L 2 (V ), T β(u, v, w) dt c u W v W w L 2 (V ). (8.21) Finally, (8.19) and Hölder s inequality with exponents (, 4, 4, 2) give for all u L 2 (V ) L (H), v L 4 (V ), and w L 2 (V ) T T β(u, v, w) dt c u 1/2 H u 1/2 V v V w V dt c u 1/2 L (H) u 1/2 L 2 (V ) v L 4 (V ) w L2 (V ). (8.22) We now prove that the state equation is infinitely Fréchet differentiable. Proposition 8.5. Let y H and (y, u) W U. Then the operator E : W U Z is twice continuously differentiable with Lipschitz continuous first derivative, constant second derivative, and vanishing third and higher derivatives. The derivatives are given by: E 1(y, u)(v, w) = v t + νav + B(y)v + B(v)y R π w, (8.23) E 2(y, u)(v, w) = v(, ), (8.24) E 1 (y, u)(v, w)(ˆv, ŵ) = B(ˆv)v + B(v)ˆv, (8.25) E 2 (y, u)(v, w)(ˆv, ŵ) =. (8.26) Proof. Since E 2 is linear and continuous, the assertions on E 2 and E 2 are obvious. Thus, we only have to consider E 1. If E 1 is differentiable, then formal differentiation shows that E 1 has the form stated in (8.23). This operator maps (v, w) W U continuously to L 2 (V ). In fact, for all z L 2 (V ), we obtain using (8.21) T vt + νav + B(y)v + B(v)y R π w, z V,V dt T ( vt V z V + ν v V z V + β(y, v, z) + R π w V z V ) dt ( v t L2 (V ) + ν v L2 (V ) + c y W v W + R π U,L2 (V ) w U ) z L2 (V ). Next, we show that E 1 is differentiable with its derivative given by (8.23). Using the linearity of A, B(v), v B(v), and R π, we obtain for all y, v W, u, w U
177 Optimal Control of the Incompressible Navier Stokes Equations E 1 (y + v, u + w) E 1 (y, u) (v t + νav + B(y)v + B(v)y R π w) = B(y + v)(y + v) B(y)y B(y)v B(v)y = B(v)v. For all z L 2 (V ) holds by (8.6), (8.8), and Hölder s inequality T T B(v)v, z V,V dt = b(v, v, z) dt T c v L 2 (V ) v L (H) z L 2 (V ) c v 2 W z L 2 (V ), c v V v H z V dt which proves the Fréchet differentiability of E 1. Note that E 1 depends affine linearly on (y, u) W U. It remains to show that the mapping E 1 : W U L(W U, L 2 (V )) is continuous at (, ). But this follows from E 1 (y, u)(v, w) E 1 (, )(v, w), z V,V = β(y, v, z) c y W v W z L 2 (V ). for all y, v W, all u, w U, and all z L 2 (V ), where we have used (8.21). As a consequence, E 1 is affine linear and continuous, thus Lipschitz, and E 1 is twice continuously differentiable with constant second derivative as given in (8.25). Further, since E is constant, it follows that E (k) = for all k 3. The next result concerns the linearized state equation The proof can be obtained by standard methods; the interested reader is referred to [79, 8]. Proposition 8.6. Let y H and (y, u) W U. Then the operator E y (y, u) L(W, Z ) is a homeomorphism, or, in more detail: For all y W, g L 2 (V ), and v H, the linearized Navier-Stokes equations v t + νav + B(y)v + B(v)y = g in L 2 (V ) v(, ) = v in H possess a unique solution v W. Furthermore, the following estimate holds: (8.27) v t L2 (V ) + v L2 (V ) + v L (H) c v W (8.28) c( y L2 (V ), y L (H))( g L2 (V ) + v H ) (8.29) c( y W )( g L2 (V ) + v H ), (8.3) where the functions c( ) depend locally Lipschitz on their arguments. Proposition 8.7. The mapping (y, u) W U E y (y, u) 1 L(Z, W ) is Lipschitz continuous on bounded sets. More precisely, there exists a locally Lipschitz continuous function c such that, for all (y i, u i ) W U, i = 1, 2, the following holds: E y (y 1, u 1 ) 1 E y (y 2, u 2 ) 1 Z,W c( y 1 W, y 2 W ) y 1 y 2 W.
178 8.3 Analysis of the Control Problem 175 Proof. Let z = (g, v ) Z = L 2 (V ) H be arbitrary and set, for i = 1, 2, v i = E y (y i, u i ) 1 z. Then, with y 12 = y 1 y 2, u 12 = u 1 u 2, and v 12 = v 1 v 2, we have v 12 () = and Therefore, = (E 1 ) y (y 1, u 1 )v 1 (E 1 ) y (y 2, u 2 )v 2 = (v 12 ) t + νav 12 + B(y 1 )v 1 + B(v 1 )y 1 B(y 2 )v 2 B(v 2 )y 2 = (v 12 ) t + νav 12 + B(y 2 )v 12 + B(v 12 )y 2 + B(y 12 )v 1 + B(v 1 )y 12 = (E 1 ) y (y 2, u 12 )v 12 + B(y 12 )v 1 + B(v 1 )y 12, = (E 2 ) y (y 1, u 1 )v 1 (E 2 ) y (y 1, u 1 )v 2 = v 12 (, ). ( ) B(y12 )v 1 B(v 1 )y 12 E y (y 2, u 12 )v 12 =, and thus, by Proposition 8.6 and (8.21) v 12 W c( y 2 W )( B(y 12 )v 1 + B(v 1 )y 12 L2 (V )) c( y 2 W ) v 1 W y 12 W c( y 2 W )c( y 1 W )( g L2 (V ) + v H ) y 12 W c( y 1 W, y 2 W ) y 12 W z Z, where c( ) are locally Lipschitz continuous functions Control-to-State Mapping In this section we show that the control-to-state mapping u U y(u) W is infinitely differentiable and that y(u), y (u), and y (u) are Lipschitz continuous on bounded sets. Theorem 8.8. The solution operator u U y(u) W of (8.1) is infinitely continuously differentiable. Further, there exist locally Lipschitz continuous functions c( ) such that for all u, u 1, u 2, v, w U holds y(u) W c( y H, r L2 (V )), (8.31) y (u) W c( y H, r L2 (V )), (8.32) y 1 y 2 W c( y H, r 1 L2 (V ), r 2 L2 (V )) u 1 u 2 U, (8.33) (y 1 y 2)v W c( y H, r 1 L2 (V ), r 2 L2 (V )) R π (u 1 u 2 ) L2 (V ) R π v L2 (V ), (8.34) (y 1 y 2 )(v, w) W c( y H, r 1 L2 (V ), r 2 L2 (V )) R π (u 1 u 2 ) L2 (V ) R π v L2 (V ) R π w L2 (V ), (8.35) with r = R π u + f π, r i = R π u i + f π, y i = y(u i ), y i = y (u i ), and y i = y (u i ).
179 Optimal Control of the Incompressible Navier Stokes Equations Proof. Since E is infinitely continuously differentiable by Proposition 8.5 and the partial derivative E y (y(u), u) L(W, Z ) is a homeomorphism according to Proposition 8.6, the implicit function theorem yields that u U y(u) W is infinitely continuously differentiable. The estimate (8.31) is just a restatement of (8.14) in Proposition 8.4. Using (8.31) and Proposition 8.6, we see that the derivative u U y (u) L(U, W ) satisfies, setting y = y(u), for all v U, y (u)v W = E y (y, u) 1 E u (y, u)v W E y (y, u) 1 Z,W E u (y, u)v Z c( y W ) E u (y, u)v Z c( y H, r L 2 (V )) R π v L 2 (V ) with c( ) being locally Lipschitz. This proves (8.32). Using (8.32), we obtain for all u 1, u 2 U, setting u 12 = u 1 u 2 and u(τ) = τu 1 + (1 τ)u 2, y 1 y 2 W = 1 1 y (u(τ))u 12 W dτ c ( y H, r(u(τ)) L2 (V )) R π u 12 L2 (V )dτ c ( y H, r 1 L2 (V ), r 2 L2 (V )) R π (u 1 u 2 ) L2 (V ) with a locally Lipschitz function c. Therefore, (8.33) is shown. From Proposition 8.7, (8.31), and (8.33), we obtain, for all v U, (y 1 y 2 )v W = E y (y 1, u 1 ) 1 E u (y 1, u 1 )v E y (y 2, u 2 ) 1 E u (y 2, u 2 )v W c( y 1 W, y 2 W ) y 1 y 2 W R π v L2 (V ) c( y H, r 1 L2 (V ), r 2 L2 (V )) R π (u 1 u 2 ) L2 (V ) R π v L2 (V ) with c( ) being locally Lipschitz continuous. This establishes (8.34). Finally, differentiating the equation E(y(u), u) = twice yields, for all u, v, w U, with y = y(u), E y (y, u)y (u)(v, w) + E yy (y, u)(y (u)v, y (u)w) + E yu (y, u)(y (u)v, w) + E uy (y, u)(v, y (u)w) + E uu (y, u)(v, w) =. Now, we use that E u v = ( R π v, ) T is constant to conclude that y (u)(v, w) = E y (y, u) 1 E yy (y, u)(y (u)v, y (u)w) = E y (y, u) 1( B(y (u)v)y (u)w + B(y (u)w)y (u)v ). From this, Proposition 8.7, (8.33), and (8.34) we see that (8.35) holds true Adjoint Equation Next, given a control u U and a state y W, we analyze the adjoint equation
180 8.3 Analysis of the Control Problem 177 ( ) E y (y, u) w = g, (8.36) h which can be used for the representation of the gradient j (u). In fact, see appendix A.1, we have with y = y(u) ( ) ( ) w w j (u) = J u (y, u) + E u (y, u), where E y (y, u) = J y (y, u). h h Proposition For every u U and y W, the adjoint equation (8.36) possesses a unique solution (w, h) Z = L 2 (V ) H for all g W. Moreover, w L2 (V ) + h H c (w, h) Z c( y W ) g W, (8.37) where c( ) is locally Lipschitz. 2. Assume now that g L 4/3 (V ) W. Then the adjoint equation can be written in the form d dt (w, v) H + ν(w, v) V + β(y, v, w) = g, v V,V v V on (, T ), (8.38) Furthermore, w t L 4/3 (V ) W, w C(V ), and w(t, ) = on Ω, (8.39) h w(, ) = on Ω. (8.4) w t W c( y W ) g W, (8.41) w t L 4/3 (V ) c( y W ) g W + g L 4/3 (V ) (8.42) with c( ) being locally Lipschitz continuous. Proof. 1. From Proposition 8.6 we know that E y (y, u) L(W, Z ) is a homeomorphism and thus also E y (y, u) L(Z, W ) is a homeomorphism. Hence, the adjoint equation possesses a unique solution (w, h) Z = L 2 (V ) H that depends linearly and continuously on g W. More precisely, Proposition 8.6 yields w L2 (V ) + h H c (w, h) Z = c (E y (y, u) ) 1 g Z c (E y (y, u) ) 1 W,Z g W = c E y (y, u) 1 Z,W g W c( y W ) g W, where c( ) depends locally Lipschitz on y W. 2. For the rest of the proof we assume g W L 4/3 (V ). We proceed by showing that the adjoint equation coincides with (8.38). Using the trilinear form β defined in (8.17), the adjoint state (w, h) L 2 (V ) H satisfies for all v W : T ( ) vt, w V,V +ν(v, w) V +β(y, v, w) g, v V,V dt+(v(), h)h =. (8.43)
181 Optimal Control of the Incompressible Navier Stokes Equations In particular, we obtain for v W replaced by ϕv with ϕ C (, T ) and v V : d dt (w, v) H + ν(w, v) V + β(y, v, w) = g, v V,V v V on (, T ), in the sense of distributions, which is (8.38). As a result of (8.22), we have that z L 4 (V ) β(y, z, w) is linear and continuous and therefore an element of L 4 (V ) = L 4/3 (V ). For v V this implies β(y, v, w) L 4/3 (, T ). Further, g, v V,V L 4/3 (, T ) and (w, v) V L 2 (, T ), hence d dt (w, v) H = ν(w, v) V + β(y, v, w) g, v V,V L 4/3 (, T ). This shows that (w, v) H H 1,4/3 (, T ). For all v V and all ϕ C ([, T ]) holds ϕv W. We choose these particular test functions in (8.43) and integrate by parts (which is allowed since C ([, T ]) H 1,4 (, T )) This gives = = T T ( (v, w) H ϕ + ( ) ) ν(v, w) V + β(y, v, w) g, v V,V ϕ dt + (v, h) H ϕ() ( d dt (w, v) ) H + ν(w, v) V + β(y, v, w) g, v V,V ϕdt + (v, h w()) H ϕ() + (v, w(t )) H ϕ(t ) The integral vanishes, since (8.38) was already shown to hold. Considering all ϕ C ([, T ]) with ϕ() = proves (8.39), whereas (8.4) follows by considering all ϕ C ([, T ]) with ϕ(t ) =. Finally, we solve (8.38) for w t and apply (8.21) to derive, for all z W, w t, z W,W T Further, for all z L 4 (V ), T w t, z V,V dt ( ν (w, z)v + β(y, z, w) ) dt + g, z W,W ν w L2 (V ) z L2 (V ) + c y W w L2 (V ) z W + g W z W. T ν(w, z)v + β(y, z, w) g, z V,V dt ( ν w L 4/3 (V ) + c y W w L2 (V ) + g L 4/3 (V )) z L4 (V ), where we have used Hölder s inequality and (8.22). Application of (8.37) completes the proof of (8.41) and (8.42). The assertion w C(V ) follows from the embedding {w L 2 (V ) : w t L 4/3 (V )} C(V ). Our next aim is to estimate the distance of two adjoint states (w i, h i ), i = 1, 2, that correspond to different states y i and right hand sides g i.
182 8.3 Analysis of the Control Problem 179 Proposition 8.1. For given y i W and g i W L 4/3 (V ), i = 1, 2, let (w i, h i ) L 2 (V ) H denote the corresponding solutions of the adjoint equation (8.36) with state y i and right hand side g i. Then w i L 2 (V ) C(V ), (w i ) t W L 4/3 (V ), h i = w i (), and w 1 w 2 L 2 (V ) + (w 1 w 2 ) t L 4/3 (V ) + h 1 h 2 H c( y 1 W, y 2 W ) ( g 1 g 2 W + g 1 W y 1 y 2 W ) + g 1 g 2 L 4/3 (V ), (8.44) where c( ) is locally Lipschitz continuous. Proof. The existence and regularity results are those stated in Proposition 8.9. Introducing the differences w 12 = w 1 w 2, h 12 = h 1 h 2, y 12 = y 1 y 2, and g 12 = g 1 g 2, we have w 12 (T ) = and h 12 = w 12 () on Ω and, on (, T ), d dt (w 12, v) H + ν(w 12, v) V + β(y 1, v, w 1 ) β(y 2, v, w 2 ) = g 12, v V,V. Rearranging terms yields d dt (w 12, v) H + ν(w 12, v) V + β(y 2, v, w 12 ) = g 12, v V,V β(y 12, v, w 1 ). Therefore, (w 12, h 12 ) is solution of the adjoint equation for the state y 2 and the right hand side g = g 12 l, l : v β(y 12, v, w 1 ). From (8.21), (8.22) we know that l W L 4/3 (V ) and Therefore, by Proposition 8.9 l W + l L 4/3 (V ) c y 12 W w 1 L 2 (V ). w 12 L2 (V ) + (w 12 ) t L 4/3 (V ) + h 12 H c( y 2 W ) g W + g L 4/3 (V ) c( y 2 W ) ( ) g 12 W + c w 1 L2 (V ) y 12 W + g12 L 4/3 (V ) + c w 1 L 2 (V ) y 12 W c( y 2 W ) ( ) g 12 W + w 1 L 2 (V ) y 12 W + g12 L 4/3 (V ) c( y 2 W ) ( ) g 12 W + c( y 1 W ) g 1 W y 12 W + g12 L 4/3 (V ) c( y 1 W, y 2 W ) ( ) g 12 W + g 1 W y 12 W + g12 L 4/3 (V ), where c( ) is locally Lipschitz. The proof is complete Properties of the Reduced Objective Function We will now show that the reduced objective function j meets all requirements that are needed to apply semismooth Newton methods for the solution of the control problem (8.16). We have, since J is quadratic,
183 18 8. Optimal Control of the Incompressible Navier Stokes Equations J u (y, u) = λ(u u d ), J y (y, u) = N π (N π y z d ), J uu (y, u) = λi, J uy (y, u) =, J yu (y, u) =, J yy (y, u) = N π N π. Since, u U y(u) W is infinitely differentiable and y, y, and y are Lipschitz continuous on bounded sets, see Theorem 8.8, we obtain that j(u) = J(y(u), u) is infinitely differentiable with j, j, and j being Lipschitz continuous on bounded sets. Further, using the adjoint representation of the gradient, and the fact that E u v = ( R π v, ) T, we have, with y = y(u), j (u) = J u (y, u) R π w = λ(u u d ) R w, (8.45) where w solves the adjoint equation (8.38), (8.39) with right hand side g = J y (y, u) = N π (N π y z d ) L 2 (V ) W L 4/3 (V ). (8.46) Therefore, we have: Theorem The reduced objective function j : U = L 2 (Q c ) l R is infinitely differentiable with j, j, and j being Lipschitz continuous on bounded sets. The reduced gradient has the form j (u) = λu + G(u), G(u) = R w λu d, where w is the adjoint state. In particular, the operator G maps L 2 (Q c ) l Lipschitz continuously on bounded sets to L p (Q c ) l. Further, G : L 2 (Q c ) l L 2 (Q c ) l is continuously differentiable with G (u) = G (u) being bounded on bounded sets in L(L 2 (Q c ) l, L p (Q c ) l ) Proof. The properties of j follow from Theorem 8.8 and (8.45). The Lipschitz continuity assertion on G follows from (8.44), (8.33), and (8.46). Further, G(u) = j (u) λu is, considered as a mapping L 2 (Q c ) l L 2 (Q c ) l, continuously differentiable with derivative G (u) = j (u) λi. In particular, we see that G is self-adjoint. Now consider G (u) for all u B ρ = ρb L2 (Q c ) l. On this set G maps Lipschitz continuously into L p (Q c ) l. Denoting the Lipschitz rank by L ρ, we now prove G (u) L2 (Q c ) l,l p (Q c ) L l ρ for all u B ρ. In fact, for all u B ρ and all v L 2 (Q c ) l we have u + tv B ρ for t > small enough and thus G (u)v L p (Q c ) l = lim t + t 1 G(u + tv) G(u) L p (Q c ) l L ρ v L2 (Q c ) l. For illustration, we consider the case where Ω c Ω, l = 2, and (Rv)(x) = v(x) for x Ω c, (Rv)(x) =, otherwise. We need the following embedding:
184 8.4 Application of Semismooth Newton Methods 181 Lemma For all 1 p < 7/2 and all v L 2 (V ) with v t L 4/3 (V ) holds v Lp (Q) 2 c( v t L 4/3 (V ) + v L2 (V )). Proof. In [7] it is proved that for all 1 q < 8 holds W 4/3 = {v L 2 (V ) : v t L 4/3 (V )} L q (H) (the embedding is even compact). We proceed by showing that for all p [1, 7/2) there exists q [1, 8) such that L q (H) L 2 (V ) L p (Q) 2. Due to the boundedness of Q it suffices to consider all p [2, 7/2). Recall that V L s (Ω) 2 for all s [1, ). Now let r = 4, r = 4/3, Then holds θ = [1/4, 4/7) and s = [2, ). 2p 7 2p θ θ s = 1 p, 1 r + 1 r = 1, q = θpr = 4p 6 [2, 8), (1 θ)pr = 2. Thus, we can apply the interpolation inequality and Hölder s inequality to conclude v p L p (Q) 2 = c T ( T v p L p (Ω) 2 dt c v θpr L 2 (Ω) 2 dt T ) 1/r ( T = c v θp L q (H) 2 v (1 θ)p L 2 (L s (Ω) 2 ) For 2 < p < 7/2 we thus have that c ( v t L 4/3 (V ) + v t L2 (V ) c ( p. v t L 4/3 (V ) + v t L2 (V )) v θp L 2 (Ω) 2 v (1 θ)p L s (Ω) 2 dt c v (1 θ)pr L s (Ω) dt 2 ) θp v (1 θ)p L 2 (V ) w W 4/3 L p (Q) 2 R w = w Qc L p (Q c ) 2 is continuous, so that Theorem 8.11 is applicable. ) 1/r 8.4 Application of Semismooth Newton Methods We now consider the reduced problem (8.16) with feasible set of the form (8.3), and reformulate its first order necessary optimality conditions in form of the nonsmooth operator equation
185 Optimal Control of the Incompressible Navier Stokes Equations Π(u) =, Π(u)(t, ω) = u(t, ω) P C ( u(t, ω) λ 1 j (u)(t, ω) ), (t, ω) Q c. Let us assume that P C is semismooth. Then, for r = 2 and any p as specified, Theorem 8.11 shows that Assumption 5.14 is satisfied by F = j. Therefore, Theorem 5.15 is applicable and yields the C Π-semismoothness of Π : L2 (Q c ) l L 2 (Q c ) l. If we prefer to work with a reformulation by means of a different Lipschitz continuous and semismooth function π, in the form π(x) = x 1 P C (x 1 x 2 ) =, π ( u, j (u) ) =, we can use Theorem 5.11 to establish the semismoothness of the resulting operator as a mapping L p (Q c ) l L 2 (Q c ) l for any p p. Therefore, our class of semismooth Newton methods is applicable to both reformulations. We also can apply the sufficient condition for regularity of Theorem 4.8. Since this condition was established in the framework of NCPs, we consider now the case U = L 2 (Q c ) and C = [, ). Then, we immediately see that Theorem 8.11 provides everything to verify Assumption 4.6, provided that j (ū) is coercive on the tangent space of the strongly active constraints as assumed in (e) and that the used NCP-function π = φ satisfies (f) (h). The coercivity condition can be interpreted as a strong second order sufficient condition for optimality, see [46, 143]. We are currently working on a finite-element discretization of the flow control problem and hope to have numerical results available soon. In the implementation of the method we plan to use a preconditioned iterative method (gmres, cg, etc.) for the solution of the semismooth Newton system. Hereby, depending on the particular problem, reduction techniques can be used to symmetrize the semismooth Newton system, which makes conjugate gradient methods applicable. The encouraging numerical results by Hinze and Kunisch [8] for second-order methods applied to the unconstrained control of the Navier-Stokes equation make us confident that the semismooth Newton method can be solved efficiently.
9. Optimal Control of the Compressible Navier Stokes Equations

9.1 Introduction

In this chapter we apply our class of semismooth Newton methods to a boundary control problem governed by the time-dependent compressible Navier Stokes equations. The underlying Navier Stokes solver and the adjoint code for the computation of the reduced gradient were developed in joint work with Scott Collis (Rice University), Matthias Heinkenschloss (Rice University), Kaveh Ghayour (Rice University), and Stefan Ulbrich (TU München) as part of the Rice AeroAcoustic Control (RAAC) project, which was initiated and is directed by Scott Collis and Matthias Heinkenschloss. The major aim of this project is to put forward an optimal control framework for the control of aeroacoustic noise where the acoustic source is predicted by the unsteady, compressible Navier Stokes equations. A particularly interesting application is the control of the sound arising from Blade-Vortex Interaction (BVI), which can occur for rotorcraft under certain flight conditions (e.g., during landing). Hereby, vortices shed by a preceding blade hit a subsequent blade, which results in a high amplitude, impulsive noise. This loud noise restricts civil rotorcraft use severely, and thus makes active noise control on the blade surface highly desirable. For more details we refer to [34, 35] and the references therein.

9.2 The Flow Control Problem

In the following, we will not consider noise control. Rather, we content ourselves with solving a model problem to investigate the viability of our approach for controlling the compressible Navier Stokes equations. This model consists of two counter-rotating viscous vortices above an infinite wall which, due to the self-induced velocity field, propagate downward and interact with the wall. As control mechanism we use suction and blowing on part of the wall, i.e., we control the normal velocity of the fluid on this part of the wall.

As computational domain we use a rectangle Ω = (−L_1, L_1) × (0, L_2). The wall is located at x_2 = 0, whereas the left, right, and upper parts of the boundary are transparent in the sense that we pose nonreflecting boundary conditions there. Ω is occupied by a compressible fluid whose state is described by y = (ρ, v_1, v_2, θ)
with density ρ(t, x), velocities v_i(t, x), i = 1, 2, and temperature θ(t, x). Hereby, t ∈ I := (0, T) is the time and x = (x_1, x_2) denotes the spatial location. The state satisfies the compressible Navier Stokes equations (CNS), which we have written in conservative form:

  ∂/∂t F(y) + Σ_{i=1,2} ∂/∂x_i F_i(y) = Σ_{i=1,2} ∂/∂x_i G_i(y, ∇y) on I × Ω,
  y(0, ·) = y_0 on Ω.

Boundary conditions are specified below. We have used the following notation:

  F(y) = (ρ, ρv_1, ρv_2, ρE)^T,
  F_1(y) = (ρv_1, ρv_1^2 + p, ρv_1 v_2, (ρE + p)v_1)^T,
  F_2(y) = (ρv_2, ρv_1 v_2, ρv_2^2 + p, (ρE + p)v_2)^T,
  G_i(y, ∇y) = (1/Re) ( 0, τ_{1i}, τ_{2i}, τ_{1i} v_1 + τ_{2i} v_2 + κ/((γ − 1)M^2 Pr) θ_{x_i} )^T.

The pressure p, the total energy per unit mass E, and the stress tensor τ are given by

  p = ρθ/(γM^2),   E = θ/(γ(γ − 1)M^2) + (1/2)(v_1^2 + v_2^2),
  τ_{ii} = 2µ(v_i)_{x_i} + λ(∇ · v),   τ_{12} = τ_{21} = µ((v_1)_{x_2} + (v_2)_{x_1}).

Here µ and λ are the first and second coefficients of viscosity, κ is the thermal conductivity, M is the reference Mach number, Pr is the reference Prandtl number, and Re is the reference Reynolds number. The boundary conditions on the wall are

  ∂θ/∂n = 0,   v_1 = 0,   v_2 = u on Σ_c = I × (−L_1, L_1) × {0},

and on the rest of the boundary we pose nonreflecting boundary conditions that are derived from inviscid characteristic boundary conditions.

At the initial time t = 0, two counter-rotating viscous vortices are located in the center of Ω. Without control (v_2 = u = 0), the vortices move downward and interact with the wall, which causes them to bounce back, see Figure 9.1. Our aim is to perform control by suction and blowing on the wall in such a way that the terminal kinetic energy is minimized. To this end, we choose the objective function

  J(y, u) = [ ∫_Ω (ρ/2)(v_1^2 + v_2^2) dx ]_{t=T} + (α/2) ‖u‖^2_{H^1(Σ_c)}.

The first term is the kinetic energy at the final time t = T, whereas the second term is an H^1-regularization with respect to (t, x_1). Here, we write α > 0 for the regularization parameter to avoid confusion with the second coefficient of viscosity.
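Although purely definitional, the closure relations above translate directly into code; the following sketch (added here for illustration) evaluates p, E, and the stress components on a grid. The parameter values, the array layout, the Stokes-hypothesis choice λ = −(2/3)µ, and the simple differencing are assumptions of this sketch only, not taken from the text or from the actual high order solver.

    import numpy as np

    gamma, Mach, Pr, mu = 1.4, 0.5, 1.0, 1.0
    lam_visc = -2.0 / 3.0 * mu   # Stokes hypothesis; an assumption of this sketch

    def pressure_energy(rho, v1, v2, theta):
        # p = rho*theta/(gamma*M^2),  E = theta/(gamma*(gamma-1)*M^2) + (v1^2 + v2^2)/2
        p = rho * theta / (gamma * Mach**2)
        E = theta / (gamma * (gamma - 1.0) * Mach**2) + 0.5 * (v1**2 + v2**2)
        return p, E

    def stress(v1, v2, dx1, dx2):
        # tau_ii = 2*mu*(v_i)_{x_i} + lam*(div v),  tau_12 = mu*((v1)_{x2} + (v2)_{x1})
        d1v1 = np.gradient(v1, dx1, axis=1)   # x1 varies along axis 1 (a convention here)
        d2v1 = np.gradient(v1, dx2, axis=0)
        d1v2 = np.gradient(v2, dx1, axis=1)
        d2v2 = np.gradient(v2, dx2, axis=0)
        div = d1v1 + d2v2
        tau11 = 2.0 * mu * d1v1 + lam_visc * div
        tau22 = 2.0 * mu * d2v2 + lam_visc * div
        tau12 = mu * (d2v1 + d1v2)
        return tau11, tau12, tau22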
As control space, we choose U = H^1(I, H^1(−L_1, L_1)). We stress that the mathematical existence and uniqueness theory for the compressible Navier Stokes equations, see [81, 18, 111] for state of the art references, does not yet seem to be complete enough to admit a rigorous control theory. Therefore, our choice of the control space is guided more by formal and heuristic arguments than by rigorous control theory. If the H^1-regularization is omitted or replaced by an L^2-regularization, the control exhibits increasingly heavy oscillations in time and space during the course of the optimization, which indicates that the problem is ill-posed without a sufficiently strong regularization.

So far, we considered in the RAAC project only the unconstrained flow control problem and worked with a nonlinear conjugate gradient method for its solution. In the following, we want to solve the same problem, but with the control subject to pointwise bound constraints. We then apply our inexact semismooth Newton methods and use BFGS-updates [41, 42] to approximate the Hessian of the reduced objective function. Therefore, in the following we restrict the control by pointwise bound constraints (with the realistic interpretation that we are only allowed to inject or draw off fluid with a certain maximum speed), and arrive at the following flow control problem:

  minimize J(y, u) := [ ∫_Ω (ρ/2)(v_1^2 + v_2^2) dx ]_{t=T} + (α/2) ‖u‖^2_{H^1(Σ_c)}
  subject to y solves CNS for the boundary conditions associated with u,
             u_min ≤ u ≤ u_max.                                        (9.1)

9.3 Adjoint-Based Gradient Computation

For our computations we use the following results that were obtained jointly with Scott Collis, Kaveh Ghayour, Matthias Heinkenschloss, and Stefan Ulbrich [34, 35]:

1. A Navier Stokes solver, written in Fortran90 by Scott Collis [36], was ported to the parallel computer SGI Origin 2000 and adjusted to the requirements of optimal control. For the space discretization, finite differences are used which are sixth order accurate in the interior of the domain. The time discretization is done by an explicit Runge Kutta method. The code was parallelized on the basis of OpenMP.

2. Two different variants of adjoint-based gradient computation were considered:

(a) The first approach derives the adjoint Navier Stokes equations including adjoint wall boundary conditions [35]. The derivation of adjoint boundary conditions for the nonreflecting boundary conditions turns out to be a delicate matter and is not yet completely done. Hence, in this approach we have used the (appropriately augmented) adjoint boundary conditions of the Euler equations. The gradient calculation then requires the solution of the Navier Stokes equations, followed by the solution of the adjoint Navier Stokes equations backward in
time. Since the discretized adjoint equation is usually not the exact adjoint of the discrete state equation, this approach, which is usually called optimize, then discretize (OD), only yields inexact discrete gradients in general.

(b) In a second approach we have investigated the adjoint-based computation of gradients by applying the reverse mode of automatic differentiation (AD). Hereby, we used the AD-software TAMC [59], a source-to-source compiler, which translates Fortran90 routines to their corresponding adjoint Fortran90 routines. This approach yields exact (up to roundoff errors) discrete gradients and is termed discretize, then optimize (DO).

For the computational results shown below, the DO method described in (b) was used. This approach has the advantage of providing exact discrete gradients, which is very favorable when doing optimization. In fact, descent methods based on inexact gradients require a control mechanism for the amount of inexactness, which is not a trivial task in OD based approaches. Secondly, the use of exact gradients is very helpful in verifying the correctness of the adjoint code, since potential errors can usually be found immediately by comparing directional derivatives with the corresponding finite difference quotients. When working with the OD approach, which has the advantage that the source code of the CNS-solver is not required, the discretizations of the state equation, the adjoint equation, and the objective function have to be compatible (in a sense not discussed here, see, e.g., [34, 74]) to obtain gradients that are good approximations (i) of the infinite-dimensional gradients and (ii) of the exact discrete gradients. Hereby, requirement (ii) is important for a successful solution of the discrete control problem, whereas (i) crucially influences the quality of the computed discrete optimal control, measured in terms of the infinite-dimensional control problem. This second issue also applies to the DO approach, but for DO it is only important to use compatible discretizations for the state equation and the objective function. With respect to this interesting topic, we have used [74] as a guideline, to which we refer for further reference.
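The derivative check mentioned above (directional derivatives versus finite difference quotients) takes only a few lines; in the sketch below, a dense quadratic stands in for the discrete reduced objective, and `j` and `grad_j` are placeholders for the solver's objective and adjoint-gradient routines.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    A = rng.standard_normal((n, n))
    Q = A.T @ A / n                           # symmetric stand-in Hessian
    j = lambda u: 0.5 * u @ (Q @ u)           # stand-in for the discrete objective
    grad_j = lambda u: Q @ u                  # stand-in for the adjoint gradient

    u = rng.standard_normal(n)
    v = rng.standard_normal(n); v /= np.linalg.norm(v)
    dd = grad_j(u) @ v                        # directional derivative <j'(u), v>
    for eps in (1e-2, 1e-4, 1e-6):
        fd = (j(u + eps * v) - j(u)) / eps    # one-sided difference quotient
        print(f"eps={eps:.0e}  fd={fd:+.8e}  rel.err={abs(fd - dd) / abs(dd):.2e}")

With exact discrete gradients the relative error decays linearly in eps until roundoff takes over; a stagnating error at all eps usually points to a bug in the adjoint code.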
9.4 Semismooth BFGS-Newton Method

The implementation of the semismooth Newton method uses BFGS-approximations of the Hessian matrix. The resulting semismooth Newton systems have a similar structure to those arising in the step computation of the successful limited-memory BFGS method L-BFGS-B by Byrd, Lu, Nocedal, and Zhu [25, 148]. Hence, in our implementation we decided to follow the design of L-BFGS-B (the computations for this chapter were done before we developed our trust-region theory in section 6).

9.4.1 Quasi-Newton BFGS-Approximations

In this section, we focus on the use of BFGS-approximations in semismooth Newton methods for the discretized control problem. We stress, however, that convergence results for quasi-Newton methods in infinite-dimensional Hilbert spaces are available [64, 94, 131].

Using a similar notation as in chapter 7, the semismooth Newton system for the discrete control problem assumes the form (written in coordinates in the discrete L^2-space)

  ( [D_1^h]_k + [D_2^h]_k H_k^h ) s_k^h = −Φ^h(u_k^h)

with H_k^h = (j^h)″(u_k^h) and diagonal matrices [D_i^h]_k, ([D_1^h]_k + [D_2^h]_k)_{jj} ≥ κ. For the approximation of the Hessian H_k^h we work with limited-memory BFGS-matrices (l ≥ 1):

  B_k^h = B_0^h − W_k^h Z_k^h (W_k^h)^T ∈ R^{n_h × n_h},   W_k^h ∈ R^{n_h × 2l},   Z_k^h ∈ R^{2l × 2l},

where we have used the compact representation of [26], to which we refer for details. The matrix B_0^h is the initial BFGS-matrix and should be chosen such that (a) the product (B_0^h)^{−1} v^h can be computed reasonably efficiently, since this is needed in the BFGS-updates, and (b) the inner product induced by B_0^h approximates the original infinite-dimensional inner product on U sufficiently well. In the case of our flow control problem, we have U = H^1(I, H^1(−L_1, L_1)), and we use a finite difference approximation of the underlying Laplace operator to obtain B_0^h. Compared with the state and adjoint solves, the cost of the solution of the 2-D Helmholtz equation required to compute (B_0^h)^{−1} v^h is negligible.

The inverse of M_k^h = [D_1^h]_k + [D_2^h]_k B_k^h can be computed by the Sherman Morrison Woodbury formula:

  (M_k^h)^{−1} = C_k^h + C_k^h [D_2^h]_k W_k^h ( I − Z_k^h (W_k^h)^T C_k^h [D_2^h]_k W_k^h )^{−1} Z_k^h (W_k^h)^T C_k^h,

where C_k^h = ( [D_1^h]_k + [D_2^h]_k B_0^h )^{−1}.
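The following sketch verifies this inverse formula numerically on random stand-in data; B_0^h is taken as a diagonal matrix so that C_k^h is cheap to apply, whereas in the actual setting B_0^h is the discrete H^1-type operator and applying C_k^h involves a solve. All names, sizes, and values below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    n, l = 200, 5
    d1 = rng.uniform(0.2, 1.0, n)                  # diagonal of [D1]
    d2 = rng.uniform(0.2, 1.0, n)                  # diagonal of [D2]
    b0 = 2.0 + rng.uniform(0.0, 1.0, n)            # diagonal stand-in for B0
    W = rng.standard_normal((n, 2 * l))
    Z = 1e-3 * np.eye(2 * l)                       # small stand-in middle matrix
    B = np.diag(b0) - W @ Z @ W.T                  # compact limited-memory form
    M = np.diag(d1) + np.diag(d2) @ B              # M = D1 + D2 * B

    c = 1.0 / (d1 + d2 * b0)                       # C = (D1 + D2 B0)^{-1}, diagonal here
    D2W = d2[:, None] * W                          # D2 * W
    K = np.eye(2 * l) - Z @ (W.T @ (c[:, None] * D2W))   # small 2l x 2l system

    def apply_Minv(v):
        cv = c * v
        return cv + c * (D2W @ np.linalg.solve(K, Z @ (W.T @ cv)))

    v = rng.standard_normal(n)
    print(np.linalg.norm(apply_Minv(v) - np.linalg.solve(M, v)))   # ~ machine precision

The point of the formula is that only the small 2l x 2l matrix K has to be factorized per iteration, besides applications of C, W, and Z.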
9.4.2 The Algorithm

We now give a sketch of the algorithm:

1. The Hessian matrix of the discrete objective function is approximated by limited-memory BFGS-matrices. Hereby, we choose B_0^h such that it represents a finite difference approximation of the inner product on U.

2. The globalization is similar to that in the well-accepted L-BFGS-B method of Byrd, Lu, Nocedal, and Zhu [25, 148]:

i. At the current point u_k^h ∈ B^h, the objective function j^h is approximated by a quadratic model q_k^h.

ii. Starting from u_k^h, a generalized Cauchy point u_k^{h,C} ∈ B^h is computed by an Armijo-type linesearch for q_k^h along the projected gradient path P_{B^h}(u_k^h − t ∇j^h(u_k^h)), t ≥ 0 (see the sketch after this list).

iii. The semismooth Newton method is used to compute a Newton point u_k^{h,N}.

iv. By approximate minimization of q_k^h along the projected path P_{B^h}(u_k^{h,C} + t(u_k^{h,N} − u_k^{h,C})), t ∈ [0, 1], the point u_k^{h,Q} is computed.

v. The new iterate u_{k+1}^h is obtained by approximate minimization of j^h on the line segment [u_k^h, u_k^{h,Q}], using the algorithm by Moré and Thuente [114].
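The Armijo-type search of step ii can be sketched in a few lines for a box B^h = [lo, hi] and a quadratic model q(s) = g^T s + s^T H s / 2; the initial step size, backtracking factor, and acceptance constant below are illustrative choices (L-BFGS-B itself examines the projected path breakpoint by breakpoint rather than by backtracking).

    import numpy as np

    def generalized_cauchy_point(u, g, H, lo, hi, t0=1.0, beta=0.5, c=1e-4, maxit=30):
        """Armijo-type backtracking along the projected gradient path P_B(u - t g)."""
        q = lambda s: g @ s + 0.5 * s @ (H @ s)      # quadratic model q_k
        t = t0
        for _ in range(maxit):
            s = np.clip(u - t * g, lo, hi) - u       # projected step P_B(u - t g) - u
            if q(s) <= c * (g @ s):                  # sufficient decrease for the model
                break
            t *= beta
        return u + s

    # tiny usage example on random data
    rng = np.random.default_rng(4)
    n = 20
    A = rng.standard_normal((n, n)); H = A @ A.T / n + np.eye(n)
    u = np.zeros(n); g = rng.standard_normal(n)
    lo, hi = -0.2 * np.ones(n), 0.2 * np.ones(n)
    print(generalized_cauchy_point(u, g, H, lo, hi)[:5])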
Remark 9.1. We should mention that we have not developed a convergence theory for the above algorithm. We also point out that the control problem under consideration does not fit directly into the framework under which we analyzed semismooth Newton methods. In particular, the problem is not posed in L^p. Nevertheless, we think that the developed theory is encouraging enough to try to apply the method also to problems for which a complete theory is not (yet) available.

9.5 Numerical Results

We now present numerical results for the described semismooth BFGS-Newton method when applied to the flow control problem (9.1). Here are the main facts about problem and implementation:

- The space discretization is done by a high order finite difference method on a Cartesian mesh. For the time discretization the standard 4-stage Runge Kutta method is used, with 6 time steps and T = 24. This allows parallelization within each time step.
- We compute exact discrete gradients by solving the adjoint of the discrete state equation, which is obtained by the reverse mode of automatic differentiation using TAMC [59].
- As optimization method, we use the semismooth BFGS-Newton method described above.
- Parameters: Re = 5, Pr = 1, M = 0.5, γ = 1.4; regularization parameter α = 0.5; bounds u_min = −0.2, u_max = 0.2.
- As NCP-function we use a variant of the penalized Fischer Burmeister function [28].

The resulting problem has over 75,000 control variables and over 29,000,000 state variables and thus is very large scale. The computations were performed on an SGI Origin 2000 with 16 R12000 processors and 1GB memory. We used four processors.

Figure 9.1 displays the state (the density ρ is shown) of the uncontrolled system (v_2|_{Σ_c} = u = 0). We see that the vortices hit the wall and bounce back. The terminal state, at which we evaluate the kinetic energy, is shown in the last, magnified picture. The resulting terminal kinetic energy in the no-control case is

  No control (v_2|_{Σ_c} = u = 0):   E_kin|_{t=T} = J(y(0), 0) = 7.9.

Figure 9.2 shows the state (represented by the density ρ) when the optimal control is applied. Hereby, the optimal control was obtained by 1 iterations of the BFGS-Newton method. The resulting terminal kinetic energy in the optimal control case and the objective function value (E_kin|_{t=T} + regularization), respectively, are

  Optimal control (v_2|_{Σ_c} = u*):   E_kin|_{t=T} = 0.59,   J(y(u*), u*) = 0.85,

where u* denotes the computed optimal control, which is displayed in Figure 9.3. It can be seen in Figure 9.3 that the lower bound becomes active. In fact, the upper bound is also active at a few points, but this is not apparent from the picture.

[Figure 9.3: Computed optimal control u*.]

By applying the optimal control the vortices are successfully absorbed. If we had displayed the kinetic energy instead of the density, the vortices would be almost invisible at the terminal time in the optimal control case, since the optimal control reduces the terminal kinetic energy to less than one hundredth of its value without control. In comparison with our computational experience for the unconstrained control problem, the semismooth Newton method performs comparably efficiently. This shows the efficiency of semismooth Newton methods for the solution of very large scale problems.
A. Appendix

A.1 Adjoint Approach for Optimal Control Problems

In this appendix we describe the adjoint approach for the computation of the gradient and Hessian of the reduced objective function. Hereby, we consider the abstract optimal control problem

  minimize_{y ∈ Y, u ∈ U} J(y, u) subject to E(y, u) = 0, u ∈ U_ad   (A.1)

with feasible set U_ad ⊂ U, objective function and state equation operator

  J : Y × U → R,   E : Y × U → W^*.

The control space U and the state space Y are Banach spaces, and W^* is the dual of a reflexive Banach space W. We assume the existence of a neighborhood V of U_ad such that, for all u ∈ V, the state equation E(y, u) = 0 possesses a unique solution y = y(u). Then the control problem (A.1) is equivalent to the reduced control problem

  minimize j(u) subject to u ∈ U_ad,   (A.2)

where j : U ⊃ V → R, j(u) = J(y(u), u) is the reduced objective function.

A.1.1 Adjoint Representation of the Reduced Gradient

We now describe the adjoint approach for the computation of j′(u). To this end, we assume that J and E are Fréchet differentiable near (y(u), u) and that u ↦ y(u) is Fréchet differentiable near u. According to the implicit function theorem, the latter holds, e.g., if E is continuously differentiable near (y(u), u) and if the partial derivative E_y(y(u), u) is continuously invertible. Under the given hypotheses the function j is differentiable near u. We introduce a Lagrange multiplier w ∈ W for the state equation in (A.1) and define the Lagrange function

  L : Y × V × W → R,   L(y, u, w) = J(y, u) + ⟨E(y, u), w⟩_{W^*, W}.
Since E(y(u), u) = 0 on V, we have

  L(y(u), u, w) = J(y(u), u) = j(u)   ∀ u ∈ V, w ∈ W.

Hence,

  j′(u) = y_u(u)^* L_y(y(u), u, w) + L_u(y(u), u, w)   ∀ u ∈ V, w ∈ W.   (A.3)

The idea now is to choose w ∈ W such that L_y(y(u), u, w) = 0. This equation is called the adjoint equation and its solution w = w(u) ∈ W is the adjoint state. Thus, written in detail, the adjoint state w = w(u) is the solution of the adjoint equation

  J_y(y(u), u) + E_y(y(u), u)^* w = 0.

If we assume that E_y(y(u), u) is continuously invertible, the adjoint state w is uniquely determined. For w = w(u) we obtain

  j′(u) = y_u(u)^* L_y(y(u), u, w(u)) + L_u(y(u), u, w(u)) = L_u(y(u), u, w(u))
        = J_u(y(u), u) + E_u(y(u), u)^* w(u).

The identity

  j′(u) = J_u(y(u), u) + E_u(y(u), u)^* w(u)

is called the adjoint representation of the reduced gradient j′(u). Therefore, the derivative j′(u) can be computed as follows:

1. Compute the state y = y(u) ∈ Y by solving the state equation E(y, u) = 0.
2. Compute the adjoint state w = w(u) ∈ W by solving the adjoint equation E_y(y, u)^* w = −J_y(y, u).
3. Compute j′(u) = J_u(y, u) + E_u(y, u)^* w.

Remark A.1. If the state equation is an initial value problem, then the adjoint equation is reverse in time. For the derivation of adjoint equations for various types of control problems governed by PDEs, we refer to Lions [16].
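For a discretized linear-quadratic model problem the three steps above read as follows. The data E(y, u) = Ay − Bu, J(y, u) = ‖y − y_d‖^2/2 + (λ/2)‖u‖^2 and all names are illustrative stand-ins, so that E_y = A, E_u = −B, J_y = y − y_d, J_u = λu:

    import numpy as np

    rng = np.random.default_rng(3)
    ny, nu, lam = 80, 20, 1e-2
    A = np.eye(ny) + 0.1 * rng.standard_normal((ny, ny))   # invertible stand-in for E_y
    B = rng.standard_normal((ny, nu))
    yd = rng.standard_normal(ny)

    def j(u):                                    # reduced objective j(u) = J(y(u), u)
        y = np.linalg.solve(A, B @ u)
        return 0.5 * np.sum((y - yd) ** 2) + 0.5 * lam * (u @ u)

    def reduced_gradient(u):
        y = np.linalg.solve(A, B @ u)            # 1. state equation  E(y, u) = 0
        w = np.linalg.solve(A.T, -(y - yd))      # 2. adjoint equation E_y^* w = -J_y
        return lam * u - B.T @ w                 # 3. j'(u) = J_u + E_u^* w

    # finite difference check in a random direction
    u = rng.standard_normal(nu); v = rng.standard_normal(nu)
    eps = 1e-6
    print(reduced_gradient(u) @ v, (j(u + eps * v) - j(u)) / eps)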
A.1.2 Adjoint Representation of the Reduced Hessian

The adjoint approach can be continued to obtain adjoint formulas for the Hessian operator j″(u). To this end, we assume that J and E are twice differentiable near (y(u), u) and that u ↦ y(u) is twice differentiable near u. By (A.3), we have, for all w ∈ W and all v_1, v_2 ∈ U, writing y = y(u),

  j″(u)(v_1, v_2) = L_{yu}(y, u, w)(y_u(u)v_1, v_2) + L_{yy}(y, u, w)(y_u(u)v_1, y_u(u)v_2)
    + ⟨L_y(y, u, w), y_{uu}(u)(v_1, v_2)⟩_{Y^*, Y}
    + L_{uy}(y, u, w)(v_1, y_u(u)v_2) + L_{uu}(y, u, w)(v_1, v_2).

If we choose w = w(u), then L_y(y(u), u, w) = 0, and thus

  j″(u) = T(u)^* L″_{(y,u)}(y(u), u, w(u)) T(u),   (A.4)

where L″_{(y,u)} denotes the second partial derivative with respect to (y, u), and

  T(u) = ( y_u(u), I_U )^T = ( −E_y(y(u), u)^{−1} E_u(y(u), u), I_U )^T.

Hereby, in the second expression for T(u) we assume that E_y(y(u), u) is continuously invertible and use that, since E(y(·), ·) ≡ 0, there holds E_y(y(u), u) y_u(u) + E_u(y(u), u) = 0.

Remark A.2. It is interesting to note that in the case where E_y(y(u), u) is continuously invertible, the mapping T(u) is a continuous linear homeomorphism from U onto the null space of E′(y(u), u). In fact, it is obvious that E′(y(u), u) T(u) = 0. Conversely, if E_y(y(u), u) h + E_u(y(u), u) v = 0, then h = −E_y(y(u), u)^{−1} E_u(y(u), u) v, and thus (h, v)^T = T(u) v. Therefore, j″(u) is the restriction of the Hessian L″_{(y,u)}(y(u), u, w(u)) of the Lagrangian to the null space of E′(y(u), u), parameterized by v ∈ U ↦ T(u)v.

Usually, the formula (A.4) is not used to compute the complete Hessian operator. Rather, it is used to compute directional derivatives j″(u)v of j′. Here is the required procedure:

1. Compute the state y = y(u) ∈ Y by solving the state equation E(y, u) = 0.
2. Compute the adjoint state w = w(u) ∈ W by solving the adjoint equation E_y(y, u)^* w = −J_y(y, u).
3. Compute z = z(u) ∈ Y as the solution of the linearized state equation E_y(y, u) z = −E_u(y, u) v.
4. Compute h = h(u) ∈ W by solving the adjoint system E_y(y, u)^* h = −L_{yy}(y, u, w) z − L_{yu}(y, u, w) v.
5. Set j″(u)v := E_u(y, u)^* h + L_{uy}(y, u, w) z + L_{uu}(y, u, w) v.
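For the same linear-quadratic stand-in as in the previous sketch (E(y, u) = Ay − Bu, so E_y = A, E_u = −B, L_{yy} = I, L_{yu} = L_{uy} = 0, L_{uu} = λI; here steps 1 and 2 do not enter the second derivative terms), the five-step procedure collapses to steps 3 to 5. A check against the explicit reduced Hessian B^T A^{-T} A^{-1} B + λI is included:

    import numpy as np

    rng = np.random.default_rng(3)
    ny, nu, lam = 80, 20, 1e-2
    A = np.eye(ny) + 0.1 * rng.standard_normal((ny, ny))
    B = rng.standard_normal((ny, nu))

    def reduced_hessvec(v):
        z = np.linalg.solve(A, B @ v)      # 3. linearized state: E_y z = -E_u v
        h = np.linalg.solve(A.T, -z)       # 4. second adjoint:   E_y^* h = -L_yy z - L_yu v
        return -B.T @ h + lam * v          # 5. j''(u)v = E_u^* h + L_uy z + L_uu v

    v = rng.standard_normal(nu)
    Href = B.T @ np.linalg.solve(A.T, np.linalg.solve(A, B)) + lam * np.eye(nu)
    print(np.linalg.norm(reduced_hessvec(v) - Href @ v))   # ~ machine precision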
A.2 Several Inequalities

For convenience, we recall several well-known inequalities, which are frequently used throughout this work.

Lemma A.3 (Hölder's inequality). Let p_i ∈ [1, ∞], i = 1, ..., n, and p ∈ [1, ∞] satisfy

  1/p_1 + ··· + 1/p_n = 1/p.

Then, for all f_i ∈ L^{p_i}(Ω), there holds f = f_1 f_2 ··· f_n ∈ L^p(Ω) and

  ‖f‖_{L^p} ≤ ‖f_1‖_{L^{p_1}} ··· ‖f_n‖_{L^{p_n}}.

The following estimate is frequently used in chapter 3. It follows immediately from Hölder's inequality.

Lemma A.4. Let Ω be bounded, 1 ≤ p ≤ q ≤ ∞, and

  c_{p,q}(Ω) := µ(Ω)^{(q−p)/(pq)} if p < q < ∞,
  c_{p,∞}(Ω) := µ(Ω)^{1/p} if p < ∞,
  c_{p,q}(Ω) := 1 if p = q.

Then for all v ∈ L^q(Ω) there holds

  ‖v‖_{L^p} ≤ c_{p,q}(Ω) ‖v‖_{L^q}.

Lemma A.5 (Young's inequality). Let a, b, η > 0 and p, q ∈ (1, ∞) with 1/p + 1/q = 1 be given. Then there holds

  ab ≤ (η/p) a^p + (η^{−q/p}/q) b^q.

A.3 Elementary Properties of Multifunctions

A multifunction Γ : X ⊃ V ⇉ Y between Banach spaces X and Y assigns to every x ∈ V a subset Γ(x) ⊂ Y of Y, which can be empty. Γ is called closed-valued (compact-valued, nonempty-valued, etc.) if for all x ∈ V the image set Γ(x) is closed (compact, nonempty, etc.).

Definition A.6. [32, 129] A multifunction Γ : V ⇉ R^l defined on V ⊂ R^k is upper semicontinuous at x ∈ V if for all ε > 0 there exists δ > 0 such that

  Γ(x′) ⊂ {z + h : z ∈ Γ(x), ‖h‖ < ε} for all x′ ∈ V, ‖x′ − x‖ < δ.
Definition A.7. [32, 129] A multifunction Γ : V ⇉ R^l defined on the measurable set V ⊂ R^k is called measurable [129, p. 16] if it is closed-valued and if for all closed (or open, or compact, see [129, Prop. 1A]) sets C ⊂ R^l the preimage

  Γ^{−1}(C) = {x ∈ V : Γ(x) ∩ C ≠ ∅}

is measurable.

The following theorem is important:

Theorem A.8 (Measurable Selection). [32] Let Γ : V ⊂ R^k ⇉ R^l be measurable and nonempty-valued. Then there exists a measurable function γ : V → R^l such that γ(x) ∈ Γ(x) for all x ∈ V.

Further results on set-valued analysis can be found in [11, 32, 129].

A.4 Nemytskij Operators

In this appendix we establish several results on superposition (or Nemytskij) operators involving differentiable outer functions. These results are used in the proof of the continuous differentiability of the merit function u ↦ ‖Φ(u)‖^2_{L^2}/2 in section 6 as well as in the analysis of the nonlinear elliptic control problem in section 7.1. Concerning Nemytskij operators, we also refer to [8, 9, 1].

Proposition A.9. Let Ω ⊂ R^n be measurable with finite measure and 1 ≤ p, q < ∞. Let f : R^m → R be continuous and consider F(u)(x) = f(u(x)) for u ∈ L^p(Ω)^m. Assume that

  |f(u)| ≤ c_1 + c_2 |u|_2^{p/q}   ∀ u ∈ R^m   (A.5)

with constants c_i ≥ 0. Then F : L^p(Ω)^m → L^q(Ω) is continuous and bounded with

  ‖F(u)‖_{L^q} ≤ C_1 + C_2 ‖u‖^{p/q}_{[L^p]^m}

with constants C_i ≥ 0.

Proof. See [147, Prop. 26.6].

Proposition A.10. Let Ω ⊂ R^n be measurable with finite measure and 1 ≤ q < p < ∞. Let f : R^m → R be continuously differentiable and consider F(u)(x) = f(u(x)) for u ∈ L^p(Ω)^m. Assume that

  |f′(u)|_2 ≤ c_1 + c_2 |u|_2^{(p−q)/q}   ∀ u ∈ R^m   (A.6)

with constants c_i ≥ 0. Then F : L^p(Ω)^m → L^q(Ω) is continuously Fréchet differentiable with F′(u)v = f′(u)v.
Proof. We have

  |f(u)| ≤ |f(0)| + ∫_0^1 |f′(tu)u| dt ≤ |f(0)| + |u|_2 ∫_0^1 ( c_1 + c_2 |tu|_2^{(p−q)/q} ) dt
    ≤ |f(0)| + c_1 |u|_2 + (q/p) c_2 |u|_2^{p/q} ≤ c̃_1 + c̃_2 |u|_2^{p/q}

with constants c̃_i. Hence, by Proposition A.9, F : L^p → L^q is continuous. Further, with r = pq/(p − q) there holds

  p/r = (p − q)/q,

so that u ∈ L^p(Ω)^m ↦ f_{u_i}(u) ∈ L^r(Ω) is continuous by Proposition A.9. Hence,

  ‖f′(u)v‖_{L^q} ≤ C ‖f′(u)‖_{[L^r]^m} ‖v‖_{[L^p]^m},

showing that M(u) : v ∈ [L^p]^m ↦ f′(u)v ∈ L^q satisfies M(u) ∈ L([L^p]^m, L^q). The estimate

  ‖f′(u_1)v − f′(u_2)v‖_{L^q} ≤ C ‖f′(u_1) − f′(u_2)‖_{[L^r]^m} ‖v‖_{[L^p]^m}

proves that M : [L^p]^m → L([L^p]^m, L^q) is continuous. Further,

  ‖F(u + v) − F(u) − M(u)v‖_{L^q} = ‖f(u + v) − f(u) − f′(u)v‖_{L^q}
    = ‖ ∫_0^1 [f′(u + tv) − f′(u)] v dt ‖_{L^q} ≤ ∫_0^1 ‖[f′(u + tv) − f′(u)] v‖_{L^q} dt
    ≤ C ∫_0^1 ‖f′(u + tv) − f′(u)‖_{[L^r]^m} ‖v‖_{[L^p]^m} dt = o(‖v‖_{[L^p]^m}) as ‖v‖_{[L^p]^m} → 0,

so that F is continuously Fréchet differentiable with F′ = M.

Proposition A.11. Let Ω ⊂ R^n be measurable with finite measure and 1 ≤ p, q < ∞, p > 2q. Let f : R → R be twice continuously differentiable and consider F(u)(x) = f(u(x)) for u ∈ L^p(Ω). Assume that

  |f″(u)| ≤ c_1 + c_2 |u|^{(p−2q)/q}   (A.7)

with constants c_i ≥ 0. Then F : L^p(Ω) → L^q(Ω) is twice continuously Fréchet differentiable with

  F′(u)v = f′(u)v,   F″(u)(v, w) = f″(u)vw.   (A.8)

Proof. As in the proof of Proposition A.10 we obtain constants c̃_i with

  |f′(u)| ≤ c̃_1 + c̃_2 |u|^{(p−q)/q}.
Hence, by Proposition A.10, F : L^p → L^q is continuously differentiable with derivative F′(u)v = f′(u)v. Now consider g(u) = f′(u). From (A.7) and Proposition A.10 we obtain that for r = pq/(p − q) > q the operator

  G : L^p(Ω) → L^r(Ω),   G(u) = g(u(·)) = f′(u(·)),

is continuously differentiable with derivative G′(u)v = g′(u)v = f″(u)v. Now, define the operator b(u; v, w) = f″(u)vw. Then

  ‖b(u; v, w)‖_{L^q} ≤ ‖f″(u)v‖_{L^r} ‖w‖_{L^p} ≤ ‖G′(u)‖_{L^p, L^r} ‖v‖_{L^p} ‖w‖_{L^p}.

Therefore, b(u; ·, ·) is a continuous bilinear operator L^p × L^p → L^q that depends continuously on u ∈ L^p. Further,

  ‖F′(u + w)v − F′(u)v − b(u; v, w)‖_{L^q} = ‖f′(u + w)v − f′(u)v − f″(u)vw‖_{L^q}
    ≤ ‖f′(u + w) − f′(u) − f″(u)w‖_{L^r} ‖v‖_{L^p}
    = ‖G(u + w) − G(u) − G′(u)w‖_{L^r} ‖v‖_{L^p} = o(‖w‖_{L^p}) ‖v‖_{L^p} as ‖w‖_{L^p} → 0.

This proves that F : L^p → L^q is twice continuously differentiable with derivatives as in (A.8).
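As a concrete instance of Propositions A.10 and A.11 (an added illustration, not part of the original text), take m = 1, f(u) = u^3, p = 6, and q = 2. Then

    |f'(u)| = 3|u|^2 = 3\,|u|^{(p-q)/q}, \qquad
    |f''(u)| = 6|u| = 6\,|u|^{(p-2q)/q},

so that (A.6) and (A.7) hold and the superposition operator F(u) = u^3 is twice continuously Fréchet differentiable from L^6(Ω) to L^2(Ω) with F′(u)v = 3u^2 v and F″(u)(v, w) = 6u vw.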
Notations

General Notations

‖·‖_Y   Norm of the Banach space Y.
(·, ·)_Y   Inner product of the Hilbert space Y.
Y^*   Dual space of the Banach space Y.
⟨·, ·⟩_{Y^*, Y}   Dual pairing of the Banach space Y and its dual space Y^*.
⟨·, ·⟩   Dual pairing ⟨u, v⟩ = ∫_Ω u(ω)v(ω) dω.
L(X, Y)   Space of bounded linear operators M : X → Y from the Banach space X to the Banach space Y, equipped with the norm ‖·‖_{X,Y}.
‖·‖_{X,Y}   Strong operator norm on L(X, Y), i.e., ‖M‖_{X,Y} = sup{‖Mx‖_Y : x ∈ X, ‖x‖_X = 1}.
M^*   Adjoint operator of M ∈ L(X, Y), i.e., M^* ∈ L(Y^*, X^*) and ⟨Mx, y⟩_{Y, Y^*} = ⟨x, M^* y⟩_{X, X^*} for all x ∈ X, y ∈ Y^*.
B_Y   Open unit ball about 0 in the Banach space Y.
B̄_Y   Closed unit ball about 0 in the Banach space Y.
B_p^n   Open unit ball about 0 in (R^n, ‖·‖_p).
B̄_p^n   Closed unit ball about 0 in (R^n, ‖·‖_p).
∂Ω   Boundary of the domain Ω.
cl M   Topological closure of the set M.
co M   Convex hull of the set M.
c̄o M   Closed convex hull of the set M.
µ   Lebesgue measure.
1_{Ω′}   Characteristic function of a measurable set Ω′ ⊂ Ω, taking the value one on Ω′ and zero on its complement Ω \ Ω′.
Derivatives

F′   Fréchet derivative of the operator F : X → Y, i.e., F′(x) ∈ L(X, Y) and ‖F(x + s) − F(x) − F′(x)s‖_Y = o(‖s‖_X) as ‖s‖_X → 0.
F_x   Partial Fréchet derivative of the operator F : X × Y → Z with respect to x ∈ X.
F″   Second Fréchet derivative.
F_{xy}   Second partial Fréchet derivative.
∂_B f   B-differential of the locally Lipschitz function f : R^n → R^m.
∂f   Clarke's generalized Jacobian of the locally Lipschitz continuous function f : R^n → R^m.
∂_C f   Qi's C-subdifferential of the locally Lipschitz function f : R^n → R^m.
∂f   Generalized differential of an operator f : X → Y, see section 3.2.
∂Ψ   Generalized differential of a superposition operator Ψ(u) = ψ(g(u)), see section 3.3.

Function Spaces

L^p(Ω)   p ∈ [1, ∞); Banach space of equivalence classes of Lebesgue measurable functions u : Ω → R such that ‖u‖_{L^p} := ( ∫_Ω |u(x)|^p dx )^{1/p} < ∞. L^2(Ω) is a Hilbert space with inner product (u, v)_{L^2} = ∫_Ω u(x)v(x) dx.
L^∞(Ω)   Banach space of equivalence classes of Lebesgue measurable functions u : Ω → R that are essentially bounded on Ω, i.e., ‖u‖_{L^∞} := ess sup_{x ∈ Ω} |u(x)| < ∞.
C_0^∞(Ω)   Space of infinitely differentiable functions u : Ω → R, Ω ⊂ R^n open, with compact support cl{x : u(x) ≠ 0} ⊂ Ω.
H^{k,p}(Ω)   k ≥ 0, p ∈ [1, ∞]; Sobolev space of functions u ∈ L^p(Ω), Ω ⊂ R^n open, such that D^α u ∈ L^p(Ω) for all weak derivatives up to order k, i.e., for all |α| ≤ k. Hereby D^α = ∂^{|α|}/(∂x_1^{α_1} ··· ∂x_n^{α_n}) and |α| = α_1 + ··· + α_n. H^{k,p}(Ω) is a Banach space with norm ‖u‖_{H^{k,p}} = ( Σ_{|α| ≤ k} ‖D^α u‖^p_{L^p} )^{1/p}, and similarly for p = ∞.
H^k(Ω)   k ≥ 0; short notation for the Hilbert space H^{k,2}(Ω).
H_0^k(Ω)   k ≥ 1; closure of C_0^∞(Ω) in H^k(Ω).
H^{−k}(Ω)   k ≥ 1; dual space of H_0^k(Ω) with respect to the distributional dual pairing.

Several vector-valued function spaces are introduced in section 8.2.
202 References [1] F. Abergel and R. Temam, On some control problems in fluid mechanics, Theor. Comput. Fluid Dyn., 1 (1986), pp [2] W. Alt, The Lagrange-Newton method for infinite-dimensional optimization problems, Numer. Funct. Anal. Optim., 11 (199), pp [3], Parametric optimization with applications to optimal control and sequential quadratic programming, Bayreuth. Math. Schr., (1991), pp [4], Sequential quadratic programming in Banach spaces, in Advances in optimization (Lambrecht, 1991), Springer, Berlin, 1992, pp [5] W. Alt and K. Malanowski, The Lagrange-Newton method for nonlinear optimal control problems, Comput. Optim. Appl., 2 (1993), pp [6] W. Alt, R. Sontag, and F. Tröltzsch, An SQP method for optimal control of weakly singular Hammerstein integral equations, Appl. Math. Optim., 33 (1996), pp [7] H. Amann, Compact embeddings of vector-valued Sobolev and Besov spaces, Glas. Mat. Ser. III, 35(55) (2), pp [8] J. Appell, Upper estimates for superposition operators and some applications, Ann. Acad. Sci. Fenn. Ser. A I Math., 8 (1983), pp [9], The superposition operator in function spaces a survey, Exposition. Math., 6 (1988), pp [1] J. Appell and P. P. Zabrejko, Nonlinear superposition operators, Cambridge University Press, Cambridge, 199. [11] J.-P. Aubin and H. Frankowska, Set-valued analysis, Birkhäuser Boston Inc., Boston, MA, 199. [12] C. Baiocchi and A. Capelo, Variational and quasivariational inequalities, John Wiley & Sons Inc., New York, [13] A. Bensoussan and J.-L. Lions, Impulse control and quasivariational inequalities, Gauthier-Villars, Montrouge, [14] M. Bergounioux, M. Haddou, M. Hintermüller, and K. Kunisch, A comparison of a Moreau Yosida-based active set strategy and interior point methods for constrained optimal control problems, SIAM J. Optim., 11 (2), pp [15] M. Bergounioux, K. Ito, and K. Kunisch, Primal-dual strategy for constrained optimal control problems, SIAM J. Control Optim., 37 (1999), pp [16] T. Bewley, R. Temam, and M. Ziane, Existence and uniqueness of optimal control to the Navier-Stokes equations, C. R. Acad. Sci. Paris Sér. I Math., 33 (2), pp [17] T. R. Bewley, R. Temam, and M. Ziane, A general framework for robust control in fluid mechanics, Phys. D, 138 (2), pp [18] S. C. Billups, Algorithms for complementarity problems and generalized equations, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 1995.
203 22 References [19] J. F. Bonnans and C. Pola, A trust region interior point algorithm for linearly constrained optimization, SIAM J. Optim., 7 (1997), pp [2] J. M. Borwein and Q. J. Zhu, A survey of subdifferential calculus with applications, Nonlinear Anal., 38 (1999), pp [21] A. Brandt and C. W. Cryer, Multigrid algorithms for the solution of linear complementarity problems arising from free boundary problems, SIAM J. Sci. Statist. Comput., 4 (1983), pp [22] H. Brézis, Problèmes unilatéraux, J. Math. Pures Appl. (9), 51 (1972), pp [23] W. L. Briggs, V. E. Henson, and S. F. McCormick, A multigrid tutorial, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second ed., 2. [24] J. Burger and M. Pogu, Functional and numerical solution of a control problem originating from heat transfer, J. Optim. Theory Appl., 68 (1991), pp [25] R. H. Byrd, P. Lu, J. Nocedal, and C. Y. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16 (1995), pp [26] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-newton matrices and their use in limited memory methods, Math. Programming, 63 (1994), pp [27] P. H. Calamai and J. J. Moré, Projected gradient methods for linearly constrained problems, Math. Programming, 39 (1987), pp [28] B. Chen, X. Chen, and C. Kanzow, A penalized Fischer-Burmeister NCP-function, Math. Program., 88 (2), pp [29] B. Chen and N. Xiu, A global linear and local quadratic noninterior continuation method for nonlinear complementarity problems based on Chen-Mangasarian smoothing functions, SIAM J. Optim., 9 (1999), pp [3] X. Chen, Z. Nashed, and L. Qi, Smoothing methods and semismooth methods for nondifferentiable operator equations, SIAM J. Numer. Anal., 38 (2), pp [31] X. Chen, L. Qi, and D. Sun, Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities, Math. Comp., 67 (1998), pp [32] F. H. Clarke, Optimization and nonsmooth analysis, John Wiley & Sons Inc., New York, [33] F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski, Nonsmooth analysis and control theory, Springer-Verlag, New York, [34] S. S. Collis, K. Ghayour, M. Heinkenschloss, M. Ulbrich, and S. Ulbrich, Towards adjoint-based methods for aeroacoustic control, in 39th Aerospace Science Meeting & Exhibit, January 8 11, 21, Reno, Nevada, AIAA Paper , 21. [35], Numerical solution of optimal control problems governed by the compressible Navier Stokes equations, in Proceedings of the International Conference on Optimal Control of Complex Structures, G. Leugering, J. Sprekels, and F. Tröltzsch, eds., Birkhäuser Verlag, 21, to appear. [36] S. S. Collis and S. K. Lele, A computational investigation of recepvitity in high-speed flow near a swept leading-edge, Technical Report TF-71, Flow Physics and Computation Division, Department of Mechanical Engineering, Stanford University, Stanford, California, [37] B. D. Craven and B. M. Glover, An approach to vector subdifferentials, Optimization, 38 (1996), pp [38] T. De Luca, F. Facchinei, and C. Kanzow, A semismooth equation approach to the solution of nonlinear complementarity problems, Math. Programming, 75 (1996), pp [39], A theoretical and numerical comparison of some semismooth algorithms for complementarity problems, Comput. Optim. Appl., 16 (2), pp [4] J. E. Dennis, Jr. and J. J. Moré, A characterization of superlinear convergence and its application to quasi-newton methods, Math. Comp., 28 (1974), pp
204 References 23 [41] J. E. Dennis, Jr. and J. J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp [42] J. E. Dennis, Jr. and R. B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations, Prentice-Hall Inc., Englewood Cliffs, N.J., [43] M. Desai and K. Ito, Optimal controls of Navier-Stokes equations, SIAM J. Control Optim., 32 (1994), pp [44] P. Deuflhard and M. Weiser, Local inexact Newton multilevel FEM for nonlinear elliptic problems, in Computational science for the 21st Century, M.-O. Bristeau, G. Etgen, W. Fitzigibbon, J.-L. Lions, J. Periaux, and M. Wheeler, eds., Wiley, 1997, pp [45] S. P. Dirkse and M. C. Ferris, The PATH solver: A non-monotone stabilization scheme for mixed complementarity problems, Optimization Methods and Software, 5 (1995), pp [46] J. C. Dunn and T. Tian, Variants of the Kuhn-Tucker sufficient conditions in cones of nonnegative functions, SIAM J. Control Optim., 3 (1992), pp [47] G. Duvaut and J.-L. Lions, Inequalities in mechanics and physics, Springer-Verlag, Berlin, Grundlehren der Mathematischen Wissenschaften, 219. [48] B. C. Eaves, On the basic theorem of complementarity, Math. Programming, 1 (1971), pp [49] I. Ekeland and R. Temam, Convex analysis and variational problems, North-Holland Publishing Co., Amsterdam, [5] F. Facchinei, A. Fischer, and C. Kanzow, Regularity properties of a semismooth reformulation of variational inequalities, SIAM J. Optim., 8 (1998), pp [51] F. Facchinei, H. Jiang, and L. Qi, A smoothing method for mathematical programs with equilibrium constraints, Math. Program., 85 (1999), pp [52] F. Facchinei and C. Kanzow, A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems, Math. Programming, 76 (1997), pp [53] F. Facchinei and J. Soares, A new merit function for nonlinear complementarity problems and a related algorithm, SIAM J. Optim., 7 (1997), pp [54] M. C. Ferris, C. Kanzow, and T. S. Munson, Feasible descent algorithms for mixed complementarity problems, Math. Programming, (1999), pp [55] A. Fischer, A special Newton-type optimization method, Optimization, 24 (1992), pp [56], Solution of monotone complementarity problems with locally lipschitzian functions, Math. Programming, 76 (1997), pp [57] M. Fukushima and J.-S. Pang, Some feasibility issues in mathematical programs with equilibrium constraints, SIAM J. Optim., 8 (1998), pp [58] A. V. Fursikov, Optimal control of distributed systems. Theory and applications, American Mathematical Society, Providence, RI, 2. [59] R. Giering and T. Kaminski, Recipes for adjoint code construction, ACM Transactions of Mathematical Software, 24 (1998), pp [6] V. Girault and P.-A. Raviart, Finite element methods for Navier-Stokes equations, Springer-Verlag, Berlin, [61] B. M. Glover and D. Ralph, First order approximations to nonsmooth mappings with application to metric regularity, Numer. Funct. Anal. Optim., 15 (1994), pp [62] R. Glowinski, Numerical methods for nonlinear variational problems, Springer- Verlag, New York, [63] R. Glowinski, J.-L. Lions, and R. Trémolières, Numerical analysis of variational inequalities, North-Holland Publishing Co., Amsterdam, [64] A. Griewank, The local convergence of Broyden-like methods on Lipschitzian problems in Hilbert spaces, SIAM J. Numer. Anal., 24 (1987), pp
205 24 References [65] L. Grippo, F. Lampariello, and S. Lucidi, A nonmonotone line search technique for Newton s method, SIAM J. Numer. Anal., 23 (1986), pp [66] W. A. Gruver and E. Sachs, Algorithmic methods in optimal control, Pitman (Advanced Publishing Program), Boston, Mass., [67] M. D. Gunzburger, L. Hou, and T. P. Svobodny, Analysis and finite element approximation of optimal control problems for the stationary Navier-Stokes equations with distributed and Neumann controls, Math. Comp., 57 (1991), pp [68] M. D. Gunzburger, L. S. Hou, and T. P. Svobodny, Analysis and finite element approximation of optimal control problems for the stationary Navier-Stokes equations with Dirichlet controls, RAIRO Modél. Math. Anal. Numér., 25 (1991), pp [69] M. D. Gunzburger and S. Manservisi, The velocity tracking problem for Navier-Stokes flows with bounded distributed controls, SIAM J. Control Optim., 37 (1999), pp [7], Analysis and approximation of the velocity tracking problem for Navier-Stokes flows with distributed control, SIAM J. Numer. Anal., 37 (2), pp [71], The velocity tracking problem for Navier-Stokes flows with boundary control, SIAM J. Control Optim., 39 (2), pp [72] W. Hackbusch, Multigrid methods and applications, Springer-Verlag, Berlin, [73] W. Hackbusch and U. Trottenberg (eds.), Multigrid methods, Springer-Verlag, Berlin, [74] W. W. Hager, Runge-Kutta methods in optimal control and the transformed adjoint system, Numer. Math., 87 (2), pp [75] M. Heinkenschloss, Formulation and analysis of a sequential quadratic programming method for the optimal Dirichlet boundary control of Navier-Stokes flow, in Optimal control (Gainesville, FL, 1997), Kluwer Acad. Publ., Dordrecht, 1998, pp [76] M. Heinkenschloss and F. Tröltzsch, Analysis of the Lagrange-SQP-Newton method for the control of a phase field equation, Control Cybernet., 28 (1999), pp [77] M. Heinkenschloss, M. Ulbrich, and S. Ulbrich, Superlinear and quadratic convergence of affine-scaling interior-point Newton methods for problems with simple bounds without strict complementarity assumption, Math. Program., 86 (1999), pp [78] M. Hintermüller, K. Ito, and K. Kunisch, The primal-dual active set strategy as semismooth newton method, Bericht Nr. 214 des Spezialforschungsbereichs F3 Optimierung und Kontrolle, Karl-Franzens Universität Graz, Austria, 21. [79] M. Hinze, Optimal and instantaneous control of the instationary Navier Stokes equations, Habilitationsschrift, Fachbereich Mathematik, Technische Universität Berlin, Berlin, Germany, 2. [8] M. Hinze and K. Kunisch, Second order methods for optimal control of time-dependent fluid flow, Bericht Nr. 165 des Spezialforschungsbereichs F3 Optimierung und Kontrolle, Karl-Franzens Universität Graz, Austria, [81] D. Hoff, Discontinuous solutions of the Navier Stokes equations for multidimensional flows of heat-conducting fluids, Arch. Rational Mech. Anal., 139 (1997), pp [82] R. H. W. Hoppe, Une méthode multigrille pour la solution des problèmes d obstacle, RAIRO Modél. Math. Anal. Numér., 24 (199), pp [83] R. H. W. Hoppe and R. Kornhuber, Adaptive multilevel methods for obstacle problems, SIAM J. Numer. Anal., 31 (1994), pp [84] A. D. Ioffe, Nonsmooth analysis: differential calculus of nondifferentiable mappings, Trans. Amer. Math. Soc., 266 (1981), pp [85] V. 
Jeyakumar, Simple characterizations of superlinear convergence for semismooth equations via approximate Jacobians, Applied Mathematics Research Report AMR98/28, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, 1998.
206 References 25 [86], Solving B-differentiable equations, Applied Mathematics Research Report AMR98/27, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, [87] V. Jeyakumar and D. T. Luc, Approximate Jacobian matrices for nonsmooth continuous maps and C 1 -optimization, SIAM J. Control Optim., 36 (1998), pp [88] H. Jiang, M. Fukushima, L. Qi, and D. Sun, A trust region method for solving generalized complementarity problems, SIAM J. Optim., 8 (1998), pp [89] H. Jiang and L. Qi, A new nonsmooth equations approach to nonlinear complementarity problems, SIAM J. Control Optim., 35 (1997), pp [9] H. Jiang and D. Ralph, Smooth SQP methods for mathematical programs with nonlinear complementarity constraints, SIAM J. Optim., 1 (2), pp [91] L. V. Kantorovich and G. P. Akilov, Functional analysis, Pergamon Press, Oxford, second ed., [92] C. Kanzow and H. Pieper, Jacobian smoothing methods for nonlinear complementarity problems, SIAM J. Optim., 9 (1999), pp [93] C. Kanzow and M. Zupke, Inexact trust-region methods for nonlinear complementarity problems, in Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods (Lausanne, 1997), M. Fukushima and L. Qi, eds., Kluwer Acad. Publ., Dordrecht, 1999, pp [94] C. T. Kelley and E. W. Sachs, A new proof of superlinear convergence for Broyden s method in Hilbert space, SIAM J. Optim., 1 (1991), pp [95], Multilevel algorithms for constrained compact fixed point problems, SIAM J. Sci. Comput., 15 (1994), pp [96], A trust region method for parabolic boundary control problems, SIAM J. Optim., 9 (1999), pp Dedicated to John E. Dennis, Jr., on his 6th birthday. [97] N. Kikuchi and J. T. Oden, Contact problems in elasticity: a study of variational inequalities and finite element methods, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, [98] D. Kinderlehrer and G. Stampacchia, An introduction to variational inequalities and their applications, Academic Press Inc., New York, 198. [99] R. Kornhuber, Monotone multigrid methods for elliptic variational inequalities. I, Numer. Math., 69 (1994), pp [1], Monotone multigrid methods for elliptic variational inequalities. II, Numer. Math., 72 (1996), pp [11], Adaptive monotone multigrid methods for nonlinear variational problems, B. G. Teubner, Stuttgart, [12] B. Kummer, Newton s method for nondifferentiable functions, in Advances in Mathematical Optimization, J. Guddat et al., eds., Akademie-Verlag, Berlin, 1988, pp [13], Newton s method based on generalized derivatives for nonsmooth functions: convergence analysis, in Advances in Optimization (Lambrecht, 1991), W. Oettli and D. Pallaschke, eds., Springer, Berlin, 1992, pp [14] I. Lasiecka and R. Triggiani, Regularity theory of hyperbolic equations with nonhomogeneous Neumann boundary conditions. II. General boundary data, J. Differential Equations, 94 (1991), pp [15] C.-J. Lin and J. J. Moré, Newton s method for large bound-constrained optimization problems, SIAM J. Optim., 9 (1999), pp Dedicated to John E. Dennis, Jr., on his 6th birthday. [16] J.-L. Lions, Optimal control of systems governed by partial differential equations., Springer-Verlag, New York, [17] P.-L. Lions, Mathematical topics in fluid mechanics. Vol. 1, The Clarendon Press Oxford University Press, New York, 1996.
207 26 References [18], Mathematical topics in fluid mechanics. Vol. 2, The Clarendon Press Oxford University Press, New York, [19] Z.-Q. Luo, J.-S. Pang, and D. Ralph, Mathematical programs with equilibrium constraints, Cambridge University Press, Cambridge, [11] O. L. Mangasarian, Equivalence of the complementarity problem to a system of nonlinear equations, SIAM J. Appl. Math., 31 (1976), pp [111] A. Matsumura and T. Nishida, The initial value problem for the equations of motion of viscous and heat-conductive gases, J. Math. Kyoto Univ., 2 (198), pp [112] G. P. McCormick and K. Ritter, Methods of conjugate directions versus quasi-newton methods, Math. Programming, 3 (1972), pp [113] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM J. Control Optim., 15 (1977), pp [114] J. J. Moré and D. J. Thuente, Line search algorithms with guaranteed sufficient decrease, ACM Trans. Math. Software, 2 (1994), pp [115] T. S. Munson, Algorithms and Environments for Complementarity, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2. [116] T. S. Munson, F. Facchinei, M. C. Ferris, A. Fischer, and C. Kanzow, The Semismooth algorithm for large scale complementarity problems, Mathematical Programming Technical Report MP-TR-99-7, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, [117] P. D. Panagiotopoulos, Inequality problems in mechanics and applications. Convex and nonconvex energy functions, Birkhäuser Boston Inc., Boston, Mass., [118] J.-S. Pang and L. Qi, Nonsmooth equations: motivation and algorithms, SIAM J. Optim., 3 (1993), pp [119] H.-D. Qi, L. Qi, and D. Sun, Solving KKT systems via the trust region and the conjugate gradient methods, Applied Mathematics Research Report AMR99/19, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, [12] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Math. Oper. Res., 18 (1993), pp [121], C-differential operators, C-differentiability and generalized Newton methods, Research Report AMR96/5, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, [122] L. Qi and J. Sun, A nonsmooth version of Newton s method, Math. Programming, 58 (1993), pp [123] D. Ralph, Rank-1 support functionals and the rank-1 generalized Jacobian, piecewise linear homeomorphisms, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 199. [124], Global convergence of damped Newton s method for nonsmooth equations via the path search, Math. Oper. Res., 19 (1994), pp [125] K. Ritter, A quasi-newton method for unconstrained minimization problems, in Nonlinear programming, 2 (Proc. Special Interest Group Math. Programming Sympos., Univ. Wisconsin, Madison, Wis., 1974), Academic Press, New York, 1974, pp [126] S. M. Robinson, Stability theory for systems of inequalities. II. Differentiable nonlinear systems, SIAM J. Numer. Anal., 13 (1976), pp [127], Normal maps induced by linear transformations, Math. Oper. Res., 17 (1992), pp [128], Newton s method for a class of nonsmooth functions, Set-Valued Anal., 2 (1994), pp [129] R. T. Rockafellar, Integral functionals, normal integrands and measurable selections, in Nonlinear Operators and the Calculus of Variations (Summer School, Univ. Libre Bruxelles, Brussels, 1975), J. P. Gossez et al., eds., Springer, Berlin, 1976, pp Lecture Notes in Math., Vol. 543.
208 References 27 [13] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, Springer-Verlag, Berlin, [131] E. W. Sachs, Broyden s method in Hilbert space, Math. Programming, 35 (1986), pp [132] S. Scholtes, Introduction to piecewise differentiable equations, Habilitationsschrift, Institut für Statistik und Mathematische Wirtschaftstheorie, Universität Karlsruhe, Karlsruhe, Germany, [133] A. Shapiro, On concepts of directional differentiability, J. Optim. Theory Appl., 66 (199), pp [134] R. Temam, Navier Stokes equations, North-Holland Publishing Co., Amsterdam, third ed., [135] L. Thibault, On generalized differentials and subdifferentials of Lipschitz vector-valued functions, Nonlinear Anal., 6 (1982), pp [136] P. L. Toint, Global convergence of a class of trust-region methods for nonconvex minimization in Hilbert space, IMA J. Numer. Anal., 8 (1988), pp [137], Non-monotone trust-region algorithms for nonlinear optimization subject to convex constraints, Math. Programming, 77 (1997), pp [138] F. Tröltzsch, An SQP method for the optimal control of a nonlinear heat equation, Control Cybernet., 23 (1994), pp [139] M. Ulbrich, Semismooth Newton methods for operator equations in function spaces, Technical Report TR-11, Department of Computational and Applied Mathematics, Rice University, Houston, Texas , 2. Accepted for publication (in revised form) in SIAM J. Optimization. [14], Non-monotone trust-region methods for bound-constrained semismooth equations with applications to nonlinear mixed complementarity problems, SIAM J. Optim., 11 (21), pp [141], On a nonsmooth Newton method for nonlinear complementarity problems in function space with applications to optimal control, in Complementarity: Applications, Algorithms and Extensions, M. C. Ferris, O. L. Mangasarian, and J.-S. Pang, eds., Kluwer Acad. Publ., Dordrecht, 21, pp [142] M. Ulbrich and S. Ulbrich, Non-monotone trust region methods for nonlinear equality constrained optimization without a penalty function, Technical Report, Fakultät für Mathematik, Technische Universität München, 829 München, Germany, 2. [143], Superlinear convergence of affine-scaling interior-point Newton methods for infinite-dimensional nonlinear problems with pointwise bounds, SIAM J. Control Optim., 38 (2), pp [144] M. Ulbrich, S. Ulbrich, and M. Heinkenschloss, Global convergence of trust-region interior-point algorithms for infinite-dimensional nonconvex minimization subject to pointwise bounds, SIAM J. Control Optim., 37 (1999), pp [145] P. Wesseling, An introduction to multigrid methods, John Wiley & Sons Ltd., Chichester, [146] H. Xu, Set-valued approximations and Newton s methods, Math. Program., 84 (1999), pp [147] E. Zeidler, Nonlinear functional analysis and its applications. II/B, Springer-Verlag, New York, 199. [148] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, ACM Trans. Math. Software, 23 (1997), pp [149] W. P. Ziemer, Weakly differentiable functions. Sobolev spaces and functions of bounded variation, Springer-Verlag, Berlin, [15] J. Zowe and S. Kurcyusz, Regularity and stability for the mathematical programming problem in Banach spaces, Appl. Math. Optim., 5 (1979), pp
1 VECTOR SPACES AND SUBSPACES
1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES CHRISTOPHER HEIL 1. Cosets and the Quotient Space Any vector space is an abelian group under the operation of vector addition. So, if you are have studied
A QUICK GUIDE TO THE FORMULAS OF MULTIVARIABLE CALCULUS
A QUIK GUIDE TO THE FOMULAS OF MULTIVAIABLE ALULUS ontents 1. Analytic Geometry 2 1.1. Definition of a Vector 2 1.2. Scalar Product 2 1.3. Properties of the Scalar Product 2 1.4. Length and Unit Vectors
Properties of BMO functions whose reciprocals are also BMO
Properties of BMO functions whose reciprocals are also BMO R. L. Johnson and C. J. Neugebauer The main result says that a non-negative BMO-function w, whose reciprocal is also in BMO, belongs to p> A p,and
Big Data - Lecture 1 Optimization reminders
Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
Reference: Introduction to Partial Differential Equations by G. Folland, 1995, Chap. 3.
5 Potential Theory Reference: Introduction to Partial Differential Equations by G. Folland, 995, Chap. 3. 5. Problems of Interest. In what follows, we consider Ω an open, bounded subset of R n with C 2
No: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics
No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results
A PRIORI ESTIMATES FOR SEMISTABLE SOLUTIONS OF SEMILINEAR ELLIPTIC EQUATIONS. In memory of Rou-Huai Wang
A PRIORI ESTIMATES FOR SEMISTABLE SOLUTIONS OF SEMILINEAR ELLIPTIC EQUATIONS XAVIER CABRÉ, MANEL SANCHÓN, AND JOEL SPRUCK In memory of Rou-Huai Wang 1. Introduction In this note we consider semistable
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
Linear Algebra Notes for Marsden and Tromba Vector Calculus
Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of
Basic Concepts of Point Set Topology Notes for OU course Math 4853 Spring 2011
Basic Concepts of Point Set Topology Notes for OU course Math 4853 Spring 2011 A. Miller 1. Introduction. The definitions of metric space and topological space were developed in the early 1900 s, largely
ALMOST COMMON PRIORS 1. INTRODUCTION
ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type
CONTROLLABILITY. Chapter 2. 2.1 Reachable Set and Controllability. Suppose we have a linear system described by the state equation
Chapter 2 CONTROLLABILITY 2 Reachable Set and Controllability Suppose we have a linear system described by the state equation ẋ Ax + Bu (2) x() x Consider the following problem For a given vector x in
Several Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
Extremal equilibria for reaction diffusion equations in bounded domains and applications.
Extremal equilibria for reaction diffusion equations in bounded domains and applications. Aníbal Rodríguez-Bernal Alejandro Vidal-López Departamento de Matemática Aplicada Universidad Complutense de Madrid,
How To Find Out How To Calculate A Premeasure On A Set Of Two-Dimensional Algebra
54 CHAPTER 5 Product Measures Given two measure spaces, we may construct a natural measure on their Cartesian product; the prototype is the construction of Lebesgue measure on R 2 as the product of Lebesgue
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine Genon-Catalot To cite this version: Fabienne
A FIRST COURSE IN OPTIMIZATION THEORY
A FIRST COURSE IN OPTIMIZATION THEORY RANGARAJAN K. SUNDARAM New York University CAMBRIDGE UNIVERSITY PRESS Contents Preface Acknowledgements page xiii xvii 1 Mathematical Preliminaries 1 1.1 Notation
3. INNER PRODUCT SPACES
. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.
Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points
Introduction to Algebraic Geometry Bézout s Theorem and Inflection Points 1. The resultant. Let K be a field. Then the polynomial ring K[x] is a unique factorisation domain (UFD). Another example of a
NOTES ON LINEAR TRANSFORMATIONS
NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all
Computing a Nearest Correlation Matrix with Factor Structure
Computing a Nearest Correlation Matrix with Factor Structure Nick Higham School of Mathematics The University of Manchester [email protected] http://www.ma.man.ac.uk/~higham/ Joint work with Rüdiger
Mathematical Methods of Engineering Analysis
Mathematical Methods of Engineering Analysis Erhan Çinlar Robert J. Vanderbei February 2, 2000 Contents Sets and Functions 1 1 Sets................................... 1 Subsets.............................
3. Linear Programming and Polyhedral Combinatorics
Massachusetts Institute of Technology Handout 6 18.433: Combinatorial Optimization February 20th, 2009 Michel X. Goemans 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the
Section 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction
IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL R. DRNOVŠEK, T. KOŠIR Dedicated to Prof. Heydar Radjavi on the occasion of his seventieth birthday. Abstract. Let S be an irreducible
FOUNDATIONS OF ALGEBRAIC GEOMETRY CLASS 22
FOUNDATIONS OF ALGEBRAIC GEOMETRY CLASS 22 RAVI VAKIL CONTENTS 1. Discrete valuation rings: Dimension 1 Noetherian regular local rings 1 Last day, we discussed the Zariski tangent space, and saw that it
Date: April 12, 2001. Contents
2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........
I. GROUPS: BASIC DEFINITIONS AND EXAMPLES
I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called
How To Know If A Domain Is Unique In An Octempo (Euclidean) Or Not (Ecl)
Subsets of Euclidean domains possessing a unique division algorithm Andrew D. Lewis 2009/03/16 Abstract Subsets of a Euclidean domain are characterised with the following objectives: (1) ensuring uniqueness
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
Load Balancing and Switch Scheduling
EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: [email protected] Abstract Load
Solutions of Equations in One Variable. Fixed-Point Iteration II
Solutions of Equations in One Variable Fixed-Point Iteration II Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011
The Goldberg Rao Algorithm for the Maximum Flow Problem
The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }
CHAPTER 1 BASIC TOPOLOGY
CHAPTER 1 BASIC TOPOLOGY Topology, sometimes referred to as the mathematics of continuity, or rubber sheet geometry, or the theory of abstract topological spaces, is all of these, but, above all, it is
Interior Point Methods and Linear Programming
Interior Point Methods and Linear Programming Robert Robere University of Toronto December 13, 2012 Abstract The linear programming problem is usually solved through the use of one of two algorithms: either
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem
The Heat Equation. Lectures INF2320 p. 1/88
The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)
CHAPTER 9. Integer Programming
CHAPTER 9 Integer Programming An integer linear program (ILP) is, by definition, a linear program with the additional constraint that all variables take integer values: (9.1) max c T x s t Ax b and x integral
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
T ( a i x i ) = a i T (x i ).
Chapter 2 Defn 1. (p. 65) Let V and W be vector spaces (over F ). We call a function T : V W a linear transformation form V to W if, for all x, y V and c F, we have (a) T (x + y) = T (x) + T (y) and (b)
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α
Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen
(für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained
4: SINGLE-PERIOD MARKET MODELS
4: SINGLE-PERIOD MARKET MODELS Ben Goldys and Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2015 B. Goldys and M. Rutkowski (USydney) Slides 4: Single-Period Market
The Henstock-Kurzweil-Stieltjes type integral for real functions on a fractal subset of the real line
The Henstock-Kurzweil-Stieltjes type integral for real functions on a fractal subset of the real line D. Bongiorno, G. Corrao Dipartimento di Ingegneria lettrica, lettronica e delle Telecomunicazioni,
The equivalence of logistic regression and maximum entropy models
The equivalence of logistic regression and maximum entropy models John Mount September 23, 20 Abstract As our colleague so aptly demonstrated ( http://www.win-vector.com/blog/20/09/the-simplerderivation-of-logistic-regression/
Notes V General Equilibrium: Positive Theory. 1 Walrasian Equilibrium and Excess Demand
Notes V General Equilibrium: Positive Theory In this lecture we go on considering a general equilibrium model of a private ownership economy. In contrast to the Notes IV, we focus on positive issues such
Research Article Stability Analysis for Higher-Order Adjacent Derivative in Parametrized Vector Optimization
Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 2010, Article ID 510838, 15 pages doi:10.1155/2010/510838 Research Article Stability Analysis for Higher-Order Adjacent Derivative
Section 6.1 - Inner Products and Norms
Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,
SOLUTIONS TO EXERCISES FOR. MATHEMATICS 205A Part 3. Spaces with special properties
SOLUTIONS TO EXERCISES FOR MATHEMATICS 205A Part 3 Fall 2008 III. Spaces with special properties III.1 : Compact spaces I Problems from Munkres, 26, pp. 170 172 3. Show that a finite union of compact subspaces
Low upper bound of ideals, coding into rich Π 0 1 classes
Low upper bound of ideals, coding into rich Π 0 1 classes Antonín Kučera the main part is a joint project with T. Slaman Charles University, Prague September 2007, Chicago The main result There is a low
Stationarity Results for Generating Set Search for Linearly Constrained Optimization
SANDIA REPORT SAND2003-8550 Unlimited Release Printed October 2003 Stationarity Results for Generating Set Search for Linearly Constrained Optimization Tamara G. Kolda, Robert Michael Lewis, and Virginia
Linear Maps. Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 5, 2007)
MAT067 University of California, Davis Winter 2007 Linear Maps Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 5, 2007) As we have discussed in the lecture on What is Linear Algebra? one of
Vector and Matrix Norms
Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty
Math 4310 Handout - Quotient Vector Spaces
Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
ISOMETRIES OF R n KEITH CONRAD
ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x
Duality in Linear Programming
Duality in Linear Programming 4 In the preceding chapter on sensitivity analysis, we saw that the shadow-price interpretation of the optimal simplex multipliers is a very useful concept. First, these shadow
1 Local Brouwer degree
1 Local Brouwer degree Let D R n be an open set and f : S R n be continuous, D S and c R n. Suppose that the set f 1 (c) D is compact. (1) Then the local Brouwer degree of f at c in the set D is defined.
Convex Programming Tools for Disjunctive Programs
Convex Programming Tools for Disjunctive Programs João Soares, Departamento de Matemática, Universidade de Coimbra, Portugal Abstract A Disjunctive Program (DP) is a mathematical program whose feasible
Notes on Factoring. MA 206 Kurt Bryan
The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor
A new continuous dependence result for impulsive retarded functional differential equations
CADERNOS DE MATEMÁTICA 11, 37 47 May (2010) ARTIGO NÚMERO SMA#324 A new continuous dependence result for impulsive retarded functional differential equations M. Federson * Instituto de Ciências Matemáticas
TD(0) Leads to Better Policies than Approximate Value Iteration
TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 [email protected] Abstract
Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 [email protected].
Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 [email protected] This paper contains a collection of 31 theorems, lemmas,
