Nested iteration methods for nonlinear matrix problems


Nested iteration methods for nonlinear matrix problems

Geneste iteratie methoden voor niet-lineaire matrix problemen
(with a summary in Dutch)

Dissertation submitted for the degree of Doctor at Utrecht University, by authority of the Rector Magnificus, Prof. dr. W. H. Gispen, pursuant to the decision of the Doctorate Board, to be defended in public on Monday 22 September 2003 in the morning

by

Jasper van den Eshof

born on 21 January 1975 in Utrecht

Promotor: Prof. Dr. H. A. van der Vorst, Faculteit der Wiskunde en Informatica, Universiteit Utrecht
Co-promotor: Dr. G. L. G. Sleijpen, Faculteit der Wiskunde en Informatica, Universiteit Utrecht

The research described in this thesis was made financially possible by the Netherlands Organisation for Scientific Research (NWO).

Mathematics Subject Classification: 65F15, 65F10, 65F50.

Van den Eshof, Jasper
Nested iteration methods for nonlinear matrix problems
Dissertation, Utrecht University. With a summary in Dutch.
ISBN

Contents

1 Introduction
   The eigenvalue problem
   Chapter 2: The subspace extraction
   Chapter 3: Simple vector iterations
   The overlap operator in quantum chromodynamics
   Chapter 4: Numerical methods for the overlap operator
   Chapter 5: Inexact Krylov subspace methods

2 Eigenvector approximations from a subspace
   Introduction
   Rayleigh-Ritz approximations
   A priori error bounds for the Ritz pair
   A well-known upper bound
   A sharp upper bound
   Some results based on Theorem
   Discussion
   Harmonic Rayleigh-Ritz approximations
   Useful properties of harmonic Rayleigh-Ritz
   A minmax characterization for harmonic Ritz values
   Optimal inclusion intervals for eigenvalues
   The concept of ρ-values
   Harmonic Rayleigh-Ritz and Krylov subspaces
   A connection with Gauss-Radau quadrature
   Comparing harmonic and refined Rayleigh-Ritz
   Refined Rayleigh-Ritz approximations
   The optimal value of ξ in refined Rayleigh-Ritz
   The optimal value of σ in harmonic Rayleigh-Ritz
   Illustration
   Discussion
   Numerical experiments
   A priori error bounds
   A posteriori error estimation
   A condition for the minimizing shift
   A condition for the shift σ_m
   Discussion
   The selection of a harmonic Ritz pair
   The selection strategies
   Numerical experiments
   Summary and outlook

3 Subspace expansion using simple vector iterations
   Introduction
   Rayleigh quotient iteration
   The Jacobi-Davidson correction equation
   Illustration
   Discussion
   The iterative solution of the correction equation
   Discussion
   Numerical experiments
   Summary and outlook

4 Numerical methods for the QCD overlap operator
   Introduction
   A Krylov subspace framework
   The Chebyshev approach
   Methods based on the Lanczos reduction
   Lanczos approximations
   Smooth convergence with Lanczos on Q
   The quality of the polynomials
   Error estimation
   Practical implementations
   The PFE/CG method
   Error estimation
   The choice of the rational approximation
   Removing converged systems
   Discussion
   Numerical experiments
   Summary and outlook

5 Inexact Krylov subspace methods for linear systems
   Introduction
   Krylov subspace methods
   Derivation from Krylov decompositions
   Inexact Krylov subspace methods
   Relaxation strategies
   The analysis of inexact Krylov subspace methods
   A general expression for the residual gap
   Inexact Richardson iteration
   Discussion
   Inexact Chebyshev iteration
   Discussion
   The inexact Conjugate Gradient method
   The case of T_k positive definite
   The case of T_k indefinite
   The behavior of the computed residuals
   Variants of the Conjugate Gradient method
   Numerical experiments
   Discussion
   Inexact FOM and GMRES
   The behavior of the computed residuals
   Practical aspects of relaxation
   Nested inexact Krylov subspace methods
   The outer iteration: Richardson iteration
   The outer iteration: flexible GMRES
   Choosing the precisions ξ_j
   Discussion
   Numerical experiments
   Summary and outlook

A Subspaces and their bases
   A.1 The Krylov subspace

B Definitions from QCD

C The overlap operator in computer arithmetic
   C.1 Boriçi's method for the overlap operator
   C.2 The effect of rounding errors in the Lanczos method
   C.3 An alternative implementation
   C.3.1 Dealing with the partial fraction expansion

Nederlandse samenvatting (summary in Dutch)

Dankwoord (acknowledgements)

Curriculum Vitae


Chapter 1

Introduction

Today's applications in scientific computing require an increasingly complex coupling of various building blocks dedicated to specific visualization and numerical tasks. The Numlab scientific computing workbench [85] aims at providing its users with the possibility to rapidly and conveniently construct applications for scientific computing and visualization problems. A key issue here is the availability of flexible (and relevant) modules. Focusing in particular on the numerical part, we see that in large-scale problems iterative solution methods often play a pivotal role. Iteration methods are solution methods that proceed by starting with an initial guess and repeatedly improving the last obtained approximation until it satisfies the required accuracy. Traditionally, iteration methods are seen as the opposite of so-called direct methods (although there is a growing consensus among researchers that clever combinations of the two classes can lead to very good solvers). Whereas for direct methods there is now a vast amount of literature studying the efficiency, stability and accuracy of various methods, the situation for iterative methods is not as advanced. General-purpose implementations of iterative methods are often not readily available. In fact, in practice experts are often needed to tune the specific methods on a problem-by-problem basis.

The general goal of this thesis is to prepare iterative methods for use in a scientific computing laboratory. As a concrete starting point we will study the numerical solution of two matrix problems:

- the computation of eigenvectors, and their corresponding eigenvalues, of large sparse matrices;
- the multiplication of the Green's function with a source vector in simulations in quantum chromodynamics with overlap fermions.

Both these problems are nonlinear and they are frequently solved by nesting iteration methods.

This means that in each iteration step of a certain method a second iteration method is invoked to solve some subproblem. The nesting aspect is the focus of this thesis and is what makes these problems particularly interesting for a scientific computing laboratory, because of the necessity of coupling different iteration methods (or computational kernels).

There are various reasons for nesting iteration methods. For example, in some iterative methods the cost of an iteration step grows linearly with the iteration number. This can happen if, for acceleration purposes, a subspace is formed that is spanned by the approximations computed so far. The dimension of this subspace increases with every step of the iteration method, thereby increasing with every step the amount of work involved in the construction of an appropriate basis for this subspace. Introducing additional information into the process may reduce the number of required iterations and therefore result in a significant reduction of the overall cost. The necessary information might be obtained by partly solving the original problem, or some relevant subproblem, with a second iteration method. This raises the question of how to couple the two levels of iteration or, posed differently, how accurately the embedded solver has to solve the subproblem. As a concrete example, we study, in the first part of this thesis, modern iterative solvers for computing a few eigenvectors and eigenvalues of large sparse matrices.

A different situation where nested iteration schemes appear naturally is when the problem to be solved consists of relatively simple problems that are coupled. An example where this occurs is the so-called Stokes problem from computational fluid dynamics. The Stokes problem consists of two coupled linear systems of equations: the velocity can be computed by solving a linear system which involves the unknown pressure, and the pressure depends again, in some discretizations, on the velocity through a second linear system. Here, two-level iteration methods are sometimes used in practical solution strategies. As a different and concrete example, we will study a linear problem that occurs in large scale simulations in quantum chromodynamics, the physical theory that describes the strong interaction between elementary particles. The challenge here is that the system of equations is only given implicitly by a matrix function that must be computed with an iterative method. The solution method that is used in practical simulations for this problem is a standard iteration method for linear systems that invokes a second iteration method for dealing with the matrix function. Again, we have to ask ourselves how accurately we have to compute this matrix function, given a required and predefined precision for the whole problem.

In this thesis we study the two separate building blocks that make up a two-level iteration method for both problems and, in particular, the tuning of the coupling between the two levels of iteration, which is important for a scientific working environment. Our study should lead to strategies for automatically optimizing this coupling. This also puts emphasis on the individual components; for example, we need good termination criteria for the iterative methods.

The outline of this thesis is as follows. Chapters 2 and 3 are dedicated to the eigenvalue problem.

Chapters 4 and 5 are concerned with simulations in quantum chromodynamics with overlap fermions. In the remainder of this chapter we give a short summary of the work in this thesis and summarize some of our contributions.

1.1 The eigenvalue problem

The origins of eigenvalue problems are diverse and range from search engines for searching the Internet to the stability analysis of large structures. The problem can often be formulated as an algebraic eigenvalue problem, where one tries to find a nonzero vector x and a scalar λ such that Ax = λx. Usually only a very small number of eigenvectors and their corresponding eigenvalues are needed. In this case iterative projection methods (sometimes called subspace methods) come into the picture. These methods iteratively compute an approximation to the desired eigenpair by building up a subspace, and in every iteration step they extract an approximation to the sought-after eigenpair from this subspace. Two components can be identified. First, there is the computation of appropriate vectors to expand the subspace, which may require the (approximate) solution of a linear system. If this is done iteratively we refer to this part as the inner iteration. The other key ingredient is the collection of the expansion vectors in a subspace and the subsequent extraction of good approximations to the wanted eigenpair by an extraction technique. It generally involves an orthogonalization method for constructing an orthogonal basis for the subspace. We term this the outer iteration. The identified structure of iterative projection methods is also reflected by the outline of the first part of this thesis. In Chapter 2 we focus on the extraction of useful eigenvector/eigenvalue approximations from a given subspace; issues concerning the computation of the expansion vectors are treated in Chapter 3.

Chapter 2: The subspace extraction

We ask ourselves the following main question in Chapter 2: suppose a given subspace contains a good approximation to the eigenvector, how can we extract eigenvector approximations from that subspace? We will review the Rayleigh-Ritz method and prove that for eigenvalues that are in some sense in the exterior of the spectrum, the approximations generated by the Rayleigh-Ritz method are guaranteed to be useful. We study this in more detail, for the situation that A is real symmetric, by deriving a priori error bounds for the eigenvector approximations expressed in terms of the eigenvalues of the matrix A and the angle of the subspace with the eigenvector of interest.

For eigenvalues that are in the interior of the spectrum, i.e., interior eigenvalues, the Rayleigh-Ritz method always constructs good approximations to the eigenvalues, but the approximations to the eigenvectors might be useless. This means that alternative extraction methods must be considered for this type of eigenvector. One such alternative is the recently proposed harmonic Rayleigh-Ritz method, which can be seen as a variant of the Rayleigh-Ritz method. This method involves a parameter that should be chosen appropriately, depending on the location of the part of the spectrum that is of interest. In the literature, many numerical experiments have been reported showing that harmonic Rayleigh-Ritz indeed resolves the problems of standard Rayleigh-Ritz for interior eigenvalues. Despite this practical success, the effect of the parameter on the method is complex and not well understood. This raises the question of how to choose this parameter when we are interested, for example, in the eigenpair with its eigenvalue close to some target value. In Chapter 2 we address this question by showing that the harmonic Rayleigh-Ritz approximations of interest are equal to the approximations of the classical Rayleigh-Ritz method when applied to the transformed eigenvalue problem

(A − τI)^2 x = (λ − τ)^2 x,

for a specific value of τ. We also use this relation to give a comparison of harmonic Rayleigh-Ritz and an alternative extraction method, specially designed for interior eigenvalues, known as the refined Rayleigh-Ritz method. For more theoretical purposes, we use the demonstrated equivalence to derive a posteriori and a priori error bounds for the eigenpair approximations of this method.

The harmonic Rayleigh-Ritz method generates a whole set of approximations to eigenpairs of the matrix A, just as the Rayleigh-Ritz method does. From this set an appropriate approximation to the eigenpair of interest must be selected. We conclude Chapter 2 by discussing a new criterion for doing this.

Chapter 3: Simple vector iterations

A reliable and robust extraction method results in an approximation from the subspace to the eigenpair of interest. Based on this approximation, we want to construct a vector with which to expand our subspace. This is typically accomplished by iteratively solving some linear system. This is the inner iteration of the iterative projection method and is the subject of Chapter 3.

As a basis for an expansion strategy we will discuss simple vector iterations like inverse iteration and Rayleigh quotient iteration. These methods repeatedly compute a new approximation to an eigenvector based only on the approximation from the previous step. The important observation is that we can apply one step of some simple iteration scheme to the extracted approximation from the subspace and expand the subspace with the resulting vector.

Many powerful subspace methods are based on this idea, and sometimes the resulting methods are seen as accelerated versions of the simpler iterations. Therefore, we study in Chapter 3 simple vector iterations. Of particular importance is Rayleigh quotient iteration, which computes a new approximation u' based on a known approximation u by solving the linear system

(A − ϑI) u' = u   with   ϑ = (u^T A u) / (u^T u).   (1.1.1)

The matrix on the left is ill conditioned if ϑ is close to an eigenvalue, and therefore accurate approximations to u' are often too expensive to determine in practice. Rayleigh quotient iteration has appealing local convergence properties that are, unfortunately, lost when the matrix A in (1.1.1) is replaced by a nearby matrix that allows a cheaper computation of an approximate u'. In Chapter 3 we propose a simple vector iteration that is based on the correction equation of the Jacobi-Davidson method. This iteration is mathematically equivalent to Rayleigh quotient iteration if the correction equation is solved exactly. However, it is observed in the literature that this correction equation is more robust with respect to replacing the exact matrix A with a nearby matrix. We will explain this by relating this iteration scheme to Rayleigh quotient iteration on a nearby matrix that possesses an eigenvector that comes, with every step of the iteration, increasingly close to the wanted eigenvector. This connection leads to convergence bounds for the simple iteration when the matrix A is replaced with some nearby matrix.

The correction equation may be solved with a (preconditioned) iterative solver for linear systems. Iterative solvers are usually terminated when a given relative residual precision for the linear system has been obtained. We discuss the effect of this criterion when used in the simple iteration. This confirms the results of Dembo et al. [26] for the more general class of inexact Newton methods. Their results show that higher order convergence can be achieved by working with an increasingly smaller tolerance. As a consequence, this gives a suitable sequence of tolerances for use in the Jacobi-Davidson method.

In our final section we discuss some numerical experiments where this strategy is applied to the full Jacobi-Davidson method, that is, an additional outer iteration is added to the simple iteration which contains the subspace acceleration. In these cases a trade-off has to be made between the amount of work that is spent in the inner iterations and in the outer iteration. Obviously, solving the correction equation very accurately is not efficient. Conversely, if the correction equation is solved less accurately then the number of outer iterations grows and therefore, for example, the cost for the orthogonalization of the basis for the subspace increases. We show by several numerical experiments that an improved condition, as discussed for the simple iteration, might also be useful for the complete method.
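As a small, self-contained illustration of the Rayleigh quotient iteration (1.1.1) discussed above, the following NumPy sketch runs the iteration on a dense symmetric test matrix. It is not code from the thesis: the function name and test setup are chosen here for illustration only, and the ill-conditioned linear system is solved exactly with a dense solver, which is precisely what is too expensive in the large sparse setting studied in Chapter 3.

```python
import numpy as np

def rayleigh_quotient_iteration(A, u, tol=1e-12, maxit=20):
    """Basic Rayleigh quotient iteration (1.1.1) for a symmetric matrix A."""
    u = u / np.linalg.norm(u)
    theta = u @ A @ u
    for _ in range(maxit):
        theta = u @ A @ u                              # Rayleigh quotient (u has unit norm)
        if np.linalg.norm(A @ u - theta * u) < tol:    # stop on a small eigenpair residual
            break
        # solve (A - theta I) u' = u; near convergence this system is very
        # ill conditioned, which is exactly the difficulty discussed in the text
        u_new = np.linalg.solve(A - theta * np.eye(A.shape[0]), u)
        u = u_new / np.linalg.norm(u_new)
    return theta, u

# toy usage on a random symmetric matrix
rng = np.random.default_rng(1)
B = rng.standard_normal((100, 100))
A = (B + B.T) / 2
theta, u = rayleigh_quotient_iteration(A, rng.standard_normal(100))
print(theta, np.linalg.norm(A @ u - theta * u))
```

For symmetric matrices this iteration typically converges cubically once it is close to an eigenpair, which is why only a handful of steps are needed in the toy run above.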

1.2 The overlap operator in quantum chromodynamics

In the second part of this thesis we start our discussion on numerical techniques for the overlap formulation in quantum chromodynamics (QCD), the physical theory that describes the strong interaction between elementary particles. This overlap formulation initiated a lot of research in solving linear systems of the form

(r G_5 − sign(Q)) x = b,   r ≥ 1,   (1.2.1)

where Q and G_5 are sparse Hermitian indefinite matrices. In today's simulations the dimension of Q and G_5 is on the order of one to ten million. The matrix sign(Q) is the so-called matrix sign function or, more precisely, if we have the eigenvalue/eigenvector decomposition Q = X D X^*, with D = diag(λ_1, ..., λ_n), then the matrix sign function is defined as

sign(Q) := X sign(D) X^* = X diag(sign(λ_1), ..., sign(λ_n)) X^*,

where sign(t) is the standard sign function. Solving the full problem, that is, solving (1.2.1) for x given G_5 and Q, requires the solution of a simple linear system coupled to the nonlinear problem of computing the matrix sign function. Although the matrix Q is very sparse, the matrix of the linear system in (1.2.1) is dense.

The solution method that we consider, which is the method of choice in practical simulations, consists of applying a standard iterative solver for linear systems to (1.2.1). This is the outer iteration. One of the main advantages of using an iterative solver for this problem is that the matrix r G_5 − sign(Q) does not have to be known and stored explicitly, which would, due to the density and large dimension of this matrix, not be feasible. Instead, we need to compute the product of this matrix with some vector in every outer iteration step (nevertheless still a computationally demanding task). Vector iteration methods for computing the product of sign(Q) with a generic vector are discussed in Chapter 4. In Chapter 5 we study the impact of an approximate matrix-vector product on various iterative solvers for linear systems, which should lead to strategies for tuning the precision of the matrix sign function times a vector.

Chapter 4: Numerical methods for the overlap operator

In Chapter 4 we focus on the computation of the product of the matrix sign function with a generic vector, say y. The methods that we will consider are vector iteration methods that compute, in step k, an approximation of the form

sign(Q) y ≈ p(Q) y,

where p is a polynomial of degree less than k.
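Before turning to these iterative approximations, the defining formula sign(Q) = X sign(D) X^* given above can be made concrete with a small NumPy sketch. This is an illustration written for this summary (not code from the thesis), and it uses the full eigendecomposition of a tiny Hermitian matrix; for the matrices arising in QCD this is exactly what is infeasible and has to be replaced by the methods of Chapters 4 and 5.

```python
import numpy as np

def sign_times_vector(Q, y):
    """Evaluate sign(Q) y via the full eigendecomposition Q = X diag(d) X^*.

    Only feasible for small dense Hermitian Q; illustration only."""
    d, X = np.linalg.eigh(Q)                          # eigenvalues d (real), unitary X
    return X @ (np.sign(d) * (X.conj().T @ y))        # X sign(D) X^* y

# small Hermitian indefinite test matrix
rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Q = (B + B.conj().T) / 2
y = rng.standard_normal(6)
print(sign_times_vector(Q, y))
# sanity check: sign(Q)^2 = I when Q has no zero eigenvalues, so applying it twice returns y
print(np.allclose(sign_times_vector(Q, sign_times_vector(Q, y)), y))
```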

We give a unified treatment of, and propose various improvements to, a number of methods that have been considered previously in the literature. We consider, among others, methods based on Chebyshev polynomials, Lanczos approximations, and methods exploiting the multi-shift Conjugate Gradient method. Special emphasis is put on explicit accuracy bounds on the inner iterations. This is important in order to be able to tune the precision of the computed matrix-vector product in every outer iteration step with the strategies that we will propose in Chapter 5. We develop procedures for various approximation methods that guarantee a given accuracy for the matrix-vector product.

In one particular method, frequently used by physicists, the matrix sign function is approximated by a rational matrix function written as a sum of poles; this gives

sign(Q) y ≈ Σ_{i=1}^{m} ω_i Q (Q^2 + τ_i I)^{−1} y.

The choice of the shifts τ_i, the weights ω_i and the number of poles, m, depends on the type of rational approximation used, the location of the eigenvalues of Q and the required precision. This scheme reduces the problem to solving m so-called shifted linear systems, which may be efficiently accomplished with a method from the class of multi-shift Krylov subspace methods, which are variants of the standard iterative methods designed for solving families of shifted systems. The cost of this method depends, besides the standard cost of the Conjugate Gradient method, on the number of shifted systems to be solved. We will improve this method considerably by reducing the number of poles. First, we propose a new rational approximation based on the work of Zolotarev. This leads to a significant reduction of the number of necessary poles compared to the rational approximations previously used in the computation of the overlap operator. Furthermore, we propose a modification of the multi-shift iterative solver that saves computational work by using an individual tolerance for each shifted system. Again, we develop a procedure to guarantee a given accuracy. Chapter 4 is concluded with a comparative study, with realistic configurations, of the various improved methods on a parallel cluster computer. This shows that our new multi-shift approach based on Zolotarev's work, in combination with early termination of converged shifted systems, is the most efficient.
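The structure of this partial fraction evaluation is easy to see in a toy setting. The sketch below is an illustration only: the weights and shifts are placeholder values rather than the Zolotarev coefficients used in the thesis, and the shifted systems are solved with dense solvers. In practice Q is huge and sparse and the m shifted systems are solved simultaneously with a multi-shift CG iteration.

```python
import numpy as np

def pfe_sign_times_vector(Q, y, omega, tau):
    """Evaluate the partial fraction form  sum_i omega_i Q (Q^2 + tau_i I)^(-1) y.

    omega and tau must come from a rational approximation to sign(t); here they
    are left as inputs, and dense solves are used purely for illustration."""
    n = Q.shape[0]
    z = np.zeros(n, dtype=complex)
    for w, t in zip(omega, tau):
        z += w * np.linalg.solve(Q @ Q + t * np.eye(n), y)   # one shifted system per pole
    return Q @ z

# illustration with hypothetical (untuned) weights and shifts
rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Q = (B + B.conj().T) / 2
y = rng.standard_normal(6)
omega, tau = [0.5, 0.3], [0.1, 1.0]        # placeholder values, not Zolotarev coefficients
print(pfe_sign_times_vector(Q, y, omega, tau))
```

With properly chosen coefficients the same loop structure is what makes the multi-shift approach attractive: all m systems share the matrix Q^2, so a single Krylov subspace can serve all poles at once.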

Chapter 5: Inexact Krylov subspace methods

Matrix-vector products are an essential ingredient of iterative solvers for linear systems, in particular of the so-called Krylov subspace methods. In Chapter 5 we discuss the impact of an approximately computed matrix-vector product on a variety of iterative solvers for linear systems. Although this problem was motivated by the overlap formulation in quantum chromodynamics, we will give a very general treatment of this problem for linear systems of the form Ax = b. Following nomenclature often used in the literature, we will refer to Krylov subspace methods with approximate matrix-vector products as inexact Krylov subspace methods.

The errors in the matrix-vector products essentially have two consequences: the accuracy of the iterative method is limited and, secondly, the convergence speed is altered. We investigate both aspects by studying the convergence behavior and the smallest attainable value of the true residual, defined as b − A x_k, where x_k is the computed approximation in step k of the iterative method. A consequence of working with an inexact matrix-vector product is that the computed residual in step k, r_k, usually is no longer a residual corresponding to the computed approximation x_k, hence r_k ≠ b − A x_k. We have that

‖b − A x_k‖_2 ≤ ‖r_k − (b − A x_k)‖_2 + ‖r_k‖_2,

where the term on the left is the norm of the true residual, the first quantity on the right is commonly referred to as the norm of the residual gap, and the last term is the norm of the computed residual. This simple inequality forms the basis of our analysis. We argue that the attainable accuracy is determined by the norm of the residual gap, whereas the convergence speed is determined by the computed residuals. In Chapter 5 we study the residual gap and the convergence behavior of the computed residuals for various Krylov subspace methods, including stationary methods, like Chebyshev iteration, as well as several non-stationary methods such as the Conjugate Gradient method and the GMRES method.

Bouras and Frayssé present in a recent technical report [14] a large number of numerical experiments in which, in step k of the GMRES method, the matrix-vector product is computed with a relative precision given by

ε / ‖b − A x_{k−1}‖_2.

The value of ε is chosen in the order of the required residual precision. They empirically observe that, for this choice of the precision, the attainable precision of the inexact method is about ε. Furthermore, they notice from their numerical experiments that the convergence speed of the perturbed method is approximately as fast as for the exact GMRES method. They refer to this choice for the relative precision as a relaxation strategy, since it results in very accurate matrix-vector products in the early iterations but this precision is relaxed during the iteration process as soon as x_{k−1} becomes a better approximation to the exact solution. Our analysis in Chapter 5 explains the success of this strategy and shows that it is essentially correct and optimal. Furthermore, we point out when a similar strategy is appropriate for other Krylov subspace methods as well, and for some Krylov methods an even more aggressive relaxation strategy is proposed.
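The following NumPy sketch, written for this summary rather than taken from the thesis, mimics this situation for the simplest possible method, a Richardson iteration: the matrix-vector product at step k is perturbed with relative precision ε/‖r_{k−1}‖_2, and at the end the computed residual, the true residual and the residual gap are compared. The computed residual keeps decreasing while the true residual stagnates at a small multiple of ε, which is the behavior described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
A = np.eye(n) + 0.01 * np.diag(np.arange(n))       # SPD test matrix, eigenvalues in [1, 1.99]
b = rng.standard_normal(n)
omega, eps = 0.66, 1e-8                            # damping parameter and target precision

x = np.zeros(n)
r = b.copy()                                       # recursively updated (computed) residual
for k in range(60):
    eta = eps / np.linalg.norm(r)                  # relaxed relative precision for this step
    Ar = A @ r
    e = rng.standard_normal(n)
    q = Ar + (eta * np.linalg.norm(Ar) / np.linalg.norm(e)) * e   # inexact product, ||error|| = eta ||A r||
    x = x + omega * r
    r = r - omega * q                              # the residual recursion uses the inexact product

true_res = np.linalg.norm(b - A @ x)
gap = np.linalg.norm(r - (b - A @ x))              # residual gap
print("computed:", np.linalg.norm(r), " true:", true_res, " gap:", gap)
```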

In the second part of Chapter 5 we discuss the computational advantages and drawbacks of the use of a relaxation strategy. We argue that the drawbacks can be overcome by preconditioning an inexact Krylov subspace method by another inexact Krylov subspace method set to a larger precision. This means, for example for the QCD problem, that we get, in total, a three-level iteration scheme. The nesting of inexact Krylov subspace methods can be a very effective tool in reducing the total cost of the matrix-vector multiplications; we demonstrate this for a Schur complement system that stems from a model that describes the steady barotropic flow in a homogeneous ocean with constant depth.


Chapter 2

Eigenvector approximations from a subspace

The research in this chapter is published as part of:

G. L. G. Sleijpen, J. van den Eshof, and P. Smit. Optimal a priori error bounds for the Rayleigh-Ritz method. Math. Comp., 72.

G. L. G. Sleijpen and J. van den Eshof. On the use of harmonic Ritz pairs in approximating internal eigenpairs. Linear Algebra Appl., 358(1-3), 2003.

2.1 Introduction

In many scientific computations it is at some point necessary to compute an eigenvector corresponding to some eigenvalue of a matrix A. Or, in other words, one wants to find an approximation to a pair (λ, x) (with x ≠ 0) that satisfies Ax = λx. Often the matrix A is of very large dimension but contains only a few nonzero elements, and only a small subset of the eigenvalues and eigenvectors is required. Iterative projection methods are designed for solving these large sparse eigenvalue problems, and well-known examples of methods in this class include the Lanczos method [94, Chapter 13], the Davidson method [24] and Jacobi-Davidson [110], to mention only a few.

There are two distinct aspects of this type of projection method. The first is the step-by-step construction of a subspace that contains approximations to the sought-after eigenvectors. The second aspect is the extraction of good eigenvector approximations from that subspace by using a projection technique. The subspace projection is sometimes viewed as a way to accelerate the convergence of a simple iteration method, in a similar fashion as, for example, GMRES for systems of linear equations can be seen as an accelerated version of Richardson iteration. However, the situation for eigenvalue methods is often more delicate, because frequently an approximate eigenpair from the subspace is used in the computation of a vector to expand the subspace, or for restart purposes. For this reason the success of the solution method crucially depends on the success of extracting a good eigenvector approximation to a relevant eigenpair.

In this chapter we focus on the extraction phase. The expansion of the subspace is the subject of Chapter 3. This means that in this chapter we assume that we are given some subspace that contains a reasonable approximation to the eigenvector of interest to us, which depends on the particular application. In the remainder of this section we outline the organization of this chapter.

The best-known method for forming approximations from a given subspace is the Rayleigh-Ritz method, which we discuss in Section 2.2. We then review a result that says that, if the subspace contains a good approximation to the wanted eigenvector, the Rayleigh-Ritz method constructs at least one approximate eigenpair for which the approximate eigenvalue is close to the eigenvalue of interest. We consider how good the associated approximate eigenvector (called a Ritz vector) is as an approximation to the eigenvector. This is the central question that we ask ourselves for Rayleigh-Ritz, since it teaches us for which type of eigenvalues the Rayleigh-Ritz method is guaranteed to be an appropriate method. We will show that, in order for the eigenvector approximation to be relevant, it is sufficient that the target eigenvalue is in some sense an outlier in the spectrum. We will say that this eigenvalue is in the exterior of the spectrum.

To get some insight into the behavior of the Ritz vectors as a function of the quality of the given subspace, we work out the details for the symmetric case by deriving error bounds for the Rayleigh-Ritz approximation to the eigenpair with the smallest eigenvalue.

The bounds are expressed in terms of the eigenvalues of A and the angle between the subspace and the eigenvector of interest. We may therefore call these bounds truly a priori. (Obviously, all results can be transformed to statements about the largest eigenvalue and corresponding eigenvector by replacing A with −A.) This is the subject of Section 2.3.

In practical applications one is often searching for an eigenpair with the eigenvalue in some relevant region of the complex plane. For example, one is interested in the smallest eigenvalue or the one closest to some target value in the interior of the spectrum. Unfortunately, Rayleigh-Ritz is less suitable in this latter case. We discuss this in more detail further on in this chapter. In particular for symmetric matrices there have been various efforts to overcome the difficulties with finding interior eigenpairs. For example, Scott [102] argues that working with a shifted and inverted operator in Rayleigh-Ritz is preferable. Morgan points out in [86] that the necessary expensive inversion of the operator can be handled implicitly with a particular choice for the subspace. The resulting method has been given the name harmonic Rayleigh-Ritz in [93]. Independently of this work, the eigenvalue approximations of this method (the harmonic Ritz values) had already received considerable attention in the special case that the subspace is a so-called Krylov subspace. Then the harmonic Ritz values are equal to the roots of kernel polynomials, which play an important role in the theory of iterative minimal residual methods for linear systems; see [35, 84] and [32, Section 2.5] for some recent work and references. For general subspaces, harmonic Ritz values have also been studied in the context of Lehmann's optimal inclusion intervals for eigenvalues [81, 82, 94, 7]. The connection between these different areas of research was made in [93]. In Section 2.4 we give a definition of harmonic Rayleigh-Ritz with respect to some shift parameter, and we summarize some useful properties in Section 2.5.

Subsequently, in Section 2.6, we compare harmonic Rayleigh-Ritz to refined Rayleigh-Ritz. Refined Rayleigh-Ritz, popularized by Jia [72], is another method to compute approximations from a subspace, specially designed for eigenvectors with eigenvalues in the interior of the spectrum. There we give a relation that shows that both methods are equivalent in some sense. Although the relation between these two approaches is of interest in its own right, it turns out to be also useful in the rest of this chapter. If we vary the shift in the harmonic Rayleigh-Ritz method, then the angle between the eigenvector approximation (the harmonic Ritz vector) and the target eigenvector changes. As an application of the relation between harmonic and refined Rayleigh-Ritz, we also discuss in Section 2.6 the question of which shift for harmonic Rayleigh-Ritz minimizes this angle. This should provide insight into the issue of choosing this shift parameter.

The subject of Section 2.7 is a priori error bounds for the harmonic Rayleigh-Ritz method. We generalize well-known error bounds for Rayleigh-Ritz to the harmonic Rayleigh-Ritz context and discuss some of their limitations. A posteriori error bounds for the harmonic Ritz values are discussed in Section 2.8.

By changing the shift in harmonic Rayleigh-Ritz, different intervals can be obtained. Each interval contains at least one eigenvalue. We give a condition for a posteriori choosing a new shift that results in a smaller inclusion interval. Repeatedly relocating the shift using this condition will ultimately result in an, evidently appealing, optimal interval with respect to the given information. This interval can be used as an a posteriori error estimator.

So far we have assumed that we were able to identify the harmonic Ritz pair that has its approximate eigenvector close to the wanted eigenvector. When searching for the smallest and largest eigenvalues of a symmetric matrix with the Rayleigh-Ritz method, this is indeed not a difficult problem. However, when searching with harmonic Rayleigh-Ritz for an eigenpair with its eigenvalue closest to some target, this is less obvious. For a particular shift, the harmonic Rayleigh-Ritz method produces a set of harmonic Ritz vectors. In practice, the eigenvector is unknown, and it is not obvious how to tell which vector from this set forms the best approximation to the target eigenvector. The problem of selecting a well-suited harmonic Ritz vector for a given shift is treated in Section 2.9.

Although some of the results in this chapter have practical applications, the purpose of this chapter is to provide insight rather than algorithms.

2.2 Rayleigh-Ritz approximations

Let A ∈ C^{n×n} be a general matrix with eigenpairs (λ, x), and let V ∈ C^{n×k} be a matrix whose columns form an orthonormal basis for the k-dimensional subspace V. We are interested in techniques that compute approximations to eigenpairs from a subspace. The most important method in this class is the Rayleigh-Ritz method. The Rayleigh-Ritz method obtains k approximate eigenpairs (ϑ, u), the so-called Ritz pairs, by imposing the Ritz-Galerkin condition

Au − ϑu ⊥ V with u ∈ V\{0}, or equivalently, V^* A V z − ϑz = 0 with u := Vz ≠ 0.   (2.2.1)

The value ϑ can be seen as an approximation to an eigenvalue of A and is called a Ritz value. The associated vector u (the Ritz vector) forms an approximation to an eigenvector of A. (According to B. N. Parlett, the terms Ritz value and Ritz vector are, for historical reasons, not correct in the non-Hermitian case. He proposed to overcome this problem of nomenclature by adding quotation marks in the non-Hermitian case, i.e., using the terms "Ritz value" and "Ritz vector". We will, however, not follow this suggestion.) From (2.2.1) it follows that ϑ equals the so-called Rayleigh quotient, ρ(u), of the vector u:

ϑ = ρ(u), where ρ(v) := (v^* A v) / (v^* v).
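As a concrete, purely illustrative rendering of this extraction step (not code from the thesis), the following NumPy sketch computes all Ritz pairs of a random symmetric matrix with respect to a random 10-dimensional subspace; it restricts to the symmetric case so that numpy.linalg.eigh can be used for the small projected problem.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 10
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                                   # symmetric test matrix

V, _ = np.linalg.qr(rng.standard_normal((n, k)))    # orthonormal basis of a k-dimensional subspace
H = V.T @ A @ V                                     # projected matrix V^* A V
theta, Z = np.linalg.eigh(H)                        # Ritz values theta, coefficient vectors z
U = V @ Z                                           # Ritz vectors u = V z (unit norm)

# residual norms ||A u - theta u||_2 of the k Ritz pairs
res = np.linalg.norm(A @ U - U * theta, axis=0)
print(theta)
print(res)
```

The residual norms ‖Au − ϑu‖_2 indicate which Ritz pairs, if any, are already good approximations to eigenpairs of A.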

We will assume throughout this chapter that ‖u‖_2 = 1. In this chapter we assume that we are looking for some approximation to a particular eigenpair that we denote by (λ, x). In order to be able to construct robust algorithms for eigenvector computation, we need reliable methods for extracting from a subspace approximations to this eigenvector x, and similarly for the eigenvalue. Therefore, we consider the following question: suppose that we are searching for an eigenpair (λ, x) and that ∠(V, x) is small; is there then a Ritz pair (ϑ, u) such that |ϑ − λ| and ∠(u, x) are small? For the Ritz values this question is answered positively by the following result.

Theorem (Stewart and Jia [74]). There exists a Ritz value ϑ such that

|ϑ − λ| ≤ 4 ‖A‖_2 (tan ∠(V, x))^{1/k} (2 + tan ∠(V, x))^{1 − 1/k}.

This shows that if the angle between the subspace V and the unknown eigenvector x decreases, there is always a Ritz value getting closer and closer to the eigenvalue λ. For the Ritz vectors the following result is well known. It was originally proved by Saad [98] for real symmetric matrices and later extended by Stewart to the general case [119].

Theorem (Stewart [119]). Let u be a Ritz vector with respect to the space V. Let W be the orthogonal complement of u in V. Then

sin^2 ∠(u, x) ≤ (1 + η^2 / α^2) sin^2 ∠(V, x),   (2.2.2)

where η := ‖V V^* A (I − V V^*)‖_2 and α := inf_{‖z‖_2 = 1} ‖(W^* A W) z − λ z‖_2, with W an arbitrary orthonormal basis for W.

The problem with this bound is that the value α in general cannot be bounded from below a priori. It can be shown [74] that a similar result holds with λ in the expression for α replaced by ϑ. This gives the possibility of a posteriori checking the quality of the Ritz vector. Nevertheless, it can happen in practice that a good subspace V (i.e., ∠(V, x) small) results in a Ritz value close to the eigenvalue of interest, λ, but the theorem does not guarantee that the corresponding Ritz vector is a good approximation to x, because α is small. Unfortunately, in practical situations it is observed that in these cases the Ritz vector can be totally irrelevant. We return to this later in this chapter (see also [102, 74, 86]).

This theorem does not exclude that there are eigenvalues λ for which we can say beforehand that we can safely use the Ritz vector corresponding to ϑ as an approximation to x.

This means that we have to show that α is bounded from below if ϑ is close to the target eigenvalue. This quantity α is unfortunately difficult to assess, since it requires knowledge of the unknown Ritz vector. Therefore, we give the following variant of the above theorem that involves a quantity γ(·).

Theorem. Let (ϑ, u) be a Ritz pair with respect to the space V. If γ(ϑ) > 0, where

γ(µ) := min_{z ⊥ x, z ≠ 0} | (z^* A z) / (z^* z) − µ |,

then

tan ∠(u, x) ≤ (η / γ(ϑ)) tan ∠(V, x),   (2.2.3)

where η := ‖(I − x x^*)(A − ϑ I)(I − x x^*)‖_2.

Proof. Without loss of generality we can assume that the matrix A is upper triangular and of the form

A = [ λ, r ; 0, R ],   (2.2.4)

with r a 1×(n−1) row vector and R an (n−1)×(n−1) upper triangular matrix. Hence, the eigenvector of interest is simply the first standard basis vector: x = e_1. Let x_V be the projection of x onto the space V. For the moment we assume that we can write

û := (e_1^* u)^{−1} u = [ 1 ; e ]   and   x̂_V := (e_1^* x_V)^{−1} x_V = [ 1 ; f ].   (2.2.5)

The residual of the Ritz vector u,

A û − ϑ û = [ λ − ϑ + r e ; (R − ϑ I) e ],

is by definition orthogonal to u and to x_V. This results in the two equations

0 = λ − ϑ + r e + e^* (R − ϑ I) e,
0 = λ − ϑ + r e + f^* (R − ϑ I) e.

Equating both expressions and taking the absolute value on both sides gives

γ(ϑ) ‖e‖_2^2 ≤ | e^* (R − ϑ I) e | = | f^* (R − ϑ I) e | ≤ ‖f‖_2 ‖e‖_2 ‖R − ϑ I‖_2,

from which (2.2.3) follows. It remains to be checked that u is not perpendicular to x, or equivalently, that u is not of the form

u = [ 0 ; ẽ ]   for some ẽ with ‖ẽ‖_2 = 1.

This also implies that x_V is nonzero. The proof is by contradiction: writing out u^*(Au − ϑu) = 0 for such a u gives ẽ^* (R − ϑ I) ẽ = 0, which implies that ẽ = 0 since γ(ϑ) > 0, contradicting ‖ẽ‖_2 = 1. This concludes the proof.

It follows from this theorem that if γ(λ) > 0, then the Ritz vector associated with ϑ is a good approximation to x if ϑ is close enough to λ. The theorem of Stewart and Jia shows that there is a Ritz value arbitrarily close to λ if the quality of the subspace is high (that is, ∠(V, x) is small). Hence, for this type of eigenvalue we can safely use the Rayleigh-Ritz approximation without extra precautions. The condition γ(λ) > 0 means that λ is, in some sense, an extreme eigenvalue. For example, if A is normal it says that λ is outside the convex hull of the other eigenvalues of A. In particular, for the real symmetric case it means that we can expect sensible approximations for the smallest and largest eigenpair.

The bound in the theorem above does not provide a true a priori error bound, since it requires knowledge of ϑ. Therefore, it is difficult to interpret how the quality of the Ritz vector precisely depends on the quality of the subspace (∠(V, x)). In the next section we derive a priori error bounds for the Ritz vector in the real symmetric case when approximating the smallest eigenvalue.

2.3 A priori error bounds for the Ritz pair

From now on we assume that A ∈ R^{n×n} is symmetric and V is real. The eigenpairs (λ_i, x_i) of A are numbered such that λ_1 ≤ λ_2 ≤ ... ≤ λ_n, and we index the Ritz values in a similar fashion: ϑ_1 ≤ ϑ_2 ≤ ... ≤ ϑ_{k−1} ≤ ϑ_k. From the results of the previous section we know that the Rayleigh-Ritz method can be safely used for finding an approximation to the first eigenpair. In this section we want to make this statement more precise, and we are interested in the Ritz pair (ϑ_V, u_V) for which sin^2 ∠(u_V, x_1) is minimal over all Ritz vectors u_i. This is the pair with the Ritz vector that makes the smallest angle with x_1 over all Ritz vectors. In the ideal case we would have that u_V is a multiple of x_V, where x_V is the normalized projection of x_1 on V. This would give sin^2 ∠(u_V, x_1) = sin^2 ∠(V, x_1), which is optimal. Unfortunately, the approximation u_V is not a multiple of x_V in general.

In this section we derive optimal upper bounds for the first Ritz pair. We will moreover show that ϑ_V equals ϑ_1, given that the subspace contains a sufficiently accurate approximation; this will be shown in the subsection on the sharp upper bound. For the convenience of the reader and for comparison purposes, we start in the next subsection by discussing some classical bounds for the first Ritz pair that can be found in the literature.

Besides our theoretical interest in a priori error bounds, the new, sharper bounds can be used to improve a priori convergence bounds for iterative eigenvalue methods. Often, the analysis of these methods can be split into the construction of an upper bound on sin^2 ∠(V, x_1) and the analysis of the error contributed by the Rayleigh-Ritz method.

For example, Theorem 1 in [98] gives a bound for the angle between x_1 and Krylov subspaces. Combining this with the classical and well-known error bounds discussed in the next section gives precisely the bound of Kaniel [75] for the first eigenvector in the Lanczos method. In the literature, these bounds are often improved by (implicitly) constructing better bounds for sin^2 ∠(V, x_1). In this section we focus on error bounds for the Rayleigh-Ritz method, and our results are not restricted to a specific method.

A well-known upper bound

A first approach for obtaining a true a priori bound is suggested at the end of Section 11.9 in [94], where the elegant bounds of Kaniel [75] (see also [94]) are the starting point. Using the notation ε := sin^2 ∠(V, x_1), these bounds are summarized by the following theorem.

Theorem (Kaniel [75]).

ϑ_1 − λ_1 ≤ (λ_n − λ_1) ε,   (2.3.1)

sin^2 ∠(u_1, x_1) ≤ (ϑ_1 − λ_1) / (λ_2 − λ_1).   (2.3.2)

Furthermore, both inequalities are sharp.

We recall that for more general matrices we gave an error bound, (2.2.3), for the Ritz vector in case the corresponding Ritz value is close to an extreme eigenvalue λ, that is, γ(λ) > 0. An interesting question is whether we, for this more general situation, can also derive a bound in terms of the eigenvalues of the matrix and |λ − ϑ| as in (2.3.2). It turns out that this is not possible, not even for γ(λ) > 0. To see this, let u be some vector with ρ(u) = λ and let u and A be decomposed as in (2.2.5) and (2.2.4), respectively. Then ρ(u) = λ gives

r e + e^* (R − λ I) e = 0.

This equality does not imply that e = 0, even if γ(λ) > 0. Therefore, it follows from this example of the Rayleigh-Ritz method with a one-dimensional subspace that it is in general not possible to derive error bounds for the Ritz vector in terms of the quantity |ϑ − λ| and the eigenvalues of the matrix only (unless r in (2.2.4) is zero).

We return to the issue of deriving a priori error bounds for the Ritz vectors in the symmetric case. From the theorem of Kaniel we can easily obtain an error bound for the first Ritz vector that is truly a priori; in other words, it is expressed in terms of ε and the eigenvalues of A. The proof of this statement is a straightforward combination of (2.3.1) and (2.3.2).

Theorem.

sin^2 ∠(u_1, x_1) ≤ ((λ_n − λ_1) / (λ_2 − λ_1)) ε = (1 + (λ_n − λ_2) / (λ_2 − λ_1)) ε.   (2.3.3)

Although (2.3.3) is a combination of the sharp bounds (2.3.1) and (2.3.2), there is no guarantee that this bound is sharp itself. Since (2.3.2) attains equality if u_1 has a component in the direction of x_2, while for (2.3.1) equality is attained when there is a component in the direction of x_n, it is suggested that (2.3.3) may not be sharp. Indeed, in the next subsection we improve this bound and construct a sharp bound for ε < (λ_2 − λ_1)/(λ_n − λ_1). Notice that (2.3.3) is not useful when this condition on ε is not fulfilled. Another question that we address is whether ϑ_V equals ϑ_1. This is important for the selection problem, i.e., at some point it is necessary to select the Ritz vector that makes the smallest angle with x_1.

A sharp upper bound

In his PhD thesis [117] and in the technical report [116], Smit addressed the problem of obtaining optimal bounds for the Rayleigh-Ritz process. He derived such bounds for the case dim(V) = 2 and generated approximations for the k-dimensional case (k > 2) by numerical experiments. On the basis of his numerical results, he conjectured that when ε < (λ_2 − λ_1)/(λ_n − λ_1), the optimal bound for the k-dimensional case equals the optimal bound for the two-dimensional case. In this section we prove that this is indeed correct.

For convenience we use the following notation. Let δ_V := min_j sin^2 ∠(u_j, x_1), where the minimum is taken over all Ritz vectors u_j with respect to V. Put ε_V := sin^2 ∠(V, x_1). For ε > 0 we define

δ_k(ε) := max { δ_V : dim(V) = k, ε_V ≤ ε }.

The following lemma is an adaptation of Theorem 4.1 in [116]. We give a shorter proof and have added the statement that ϑ_V = ϑ_1 in case ε < (λ_2 − λ_1)/(λ_n − λ_1), which we need in the remainder of this section.

Lemma. If dim(V) = 2 and 0 ≤ ε < (λ_2 − λ_1)/(λ_n − λ_1), then ϑ_V = ϑ_1 < λ_2. Furthermore, with κ := (λ_n − λ_2)^2 / ((λ_n − λ_1)(λ_2 − λ_1)),

δ_2(ε) = (1/2)(1 + ε) − (1/2) √((1 − ε)^2 − κ ε)   if ε < (λ_2 − λ_1)/(λ_n − λ_1),
δ_2(ε) = (1/2)(1 + ε)                               if ε ≥ (λ_2 − λ_1)/(λ_n − λ_1).
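To close this summary of the classical bounds, here is a small NumPy check, written for this text rather than taken from the thesis, of Kaniel's bounds (2.3.1) and (2.3.2) and of the combined bound (2.3.3). The test matrix, spectrum and subspace are assumptions chosen for illustration: a diagonal matrix (so that x_1 = e_1) and a two-dimensional subspace containing a perturbed copy of x_1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
lam = np.concatenate(([1.0, 2.0], np.linspace(2.5, 10.0, n - 2)))   # lambda_1 <= ... <= lambda_n
A = np.diag(lam)                        # without loss of generality diagonal, so x_1 = e_1
x1 = np.zeros(n); x1[0] = 1.0

# two-dimensional subspace containing a reasonable approximation to x_1
v1 = x1 + 0.02 * rng.standard_normal(n)
v2 = rng.standard_normal(n)
V, _ = np.linalg.qr(np.column_stack([v1, v2]))

eps = 1.0 - np.linalg.norm(V.T @ x1) ** 2          # sin^2 angle(V, x_1)

# Rayleigh-Ritz extraction on V
theta, Z = np.linalg.eigh(V.T @ A @ V)
u1 = V @ Z[:, 0]                                   # Ritz vector for the smallest Ritz value
sin2_u1 = 1.0 - (x1 @ u1) ** 2                     # sin^2 angle(u_1, x_1)

print(theta[0] - lam[0], "<=", (lam[-1] - lam[0]) * eps)               # (2.3.1)
print(sin2_u1, "<=", (theta[0] - lam[0]) / (lam[1] - lam[0]))          # (2.3.2)
print(sin2_u1, "<=", (lam[-1] - lam[0]) / (lam[1] - lam[0]) * eps)     # (2.3.3)
```

All three printed inequalities hold for any subspace, but they become informative only when ε is small relative to the gap ratio (λ_2 − λ_1)/(λ_n − λ_1), as discussed above.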


More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J. Olver 6. Eigenvalues and Singular Values In this section, we collect together the basic facts about eigenvalues and eigenvectors. From a geometrical viewpoint,

More information

Linear Algebra I. Ronald van Luijk, 2012

Linear Algebra I. Ronald van Luijk, 2012 Linear Algebra I Ronald van Luijk, 2012 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents 1. Vector spaces 3 1.1. Examples 3 1.2. Fields 4 1.3. The field of complex numbers. 6 1.4.

More information

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

The Heat Equation. Lectures INF2320 p. 1/88

The Heat Equation. Lectures INF2320 p. 1/88 The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)

More information

LINEAR ALGEBRA W W L CHEN

LINEAR ALGEBRA W W L CHEN LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008 This chapter is available free to all individuals, on understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,

More information

Chapter 17. Orthogonal Matrices and Symmetries of Space

Chapter 17. Orthogonal Matrices and Symmetries of Space Chapter 17. Orthogonal Matrices and Symmetries of Space Take a random matrix, say 1 3 A = 4 5 6, 7 8 9 and compare the lengths of e 1 and Ae 1. The vector e 1 has length 1, while Ae 1 = (1, 4, 7) has length

More information

Chapter 20. Vector Spaces and Bases

Chapter 20. Vector Spaces and Bases Chapter 20. Vector Spaces and Bases In this course, we have proceeded step-by-step through low-dimensional Linear Algebra. We have looked at lines, planes, hyperplanes, and have seen that there is no limit

More information

Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

More information

x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.

x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0. Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number

More information

1.3. DOT PRODUCT 19. 6. If θ is the angle (between 0 and π) between two non-zero vectors u and v,

1.3. DOT PRODUCT 19. 6. If θ is the angle (between 0 and π) between two non-zero vectors u and v, 1.3. DOT PRODUCT 19 1.3 Dot Product 1.3.1 Definitions and Properties The dot product is the first way to multiply two vectors. The definition we will give below may appear arbitrary. But it is not. It

More information

On the Subspace Projected Approximate Matrix method. J. H. Brandts and R. Reis da Silva

On the Subspace Projected Approximate Matrix method. J. H. Brandts and R. Reis da Silva NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2011; 00:1 23 Published online in Wiley InterScience (www.interscience.wiley.com). On the Subspace Projected Approximate Matrix method

More information

Orthogonal Diagonalization of Symmetric Matrices

Orthogonal Diagonalization of Symmetric Matrices MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding

More information

Notes on Symmetric Matrices

Notes on Symmetric Matrices CPSC 536N: Randomized Algorithms 2011-12 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.

More information

ESSAYS ON MONTE CARLO METHODS FOR STATE SPACE MODELS

ESSAYS ON MONTE CARLO METHODS FOR STATE SPACE MODELS VRIJE UNIVERSITEIT ESSAYS ON MONTE CARLO METHODS FOR STATE SPACE MODELS ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad Doctor aan de Vrije Universiteit Amsterdam, op gezag van de rector magnificus

More information

Lecture Topic: Low-Rank Approximations

Lecture Topic: Low-Rank Approximations Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original

More information

University of Lille I PC first year list of exercises n 7. Review

University of Lille I PC first year list of exercises n 7. Review University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients

More information

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom. Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.com This paper contains a collection of 31 theorems, lemmas,

More information

MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).

MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued). MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0

More information

Elasticity Theory Basics

Elasticity Theory Basics G22.3033-002: Topics in Computer Graphics: Lecture #7 Geometric Modeling New York University Elasticity Theory Basics Lecture #7: 20 October 2003 Lecturer: Denis Zorin Scribe: Adrian Secord, Yotam Gingold

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

Orthogonal Projections

Orthogonal Projections Orthogonal Projections and Reflections (with exercises) by D. Klain Version.. Corrections and comments are welcome! Orthogonal Projections Let X,..., X k be a family of linearly independent (column) vectors

More information

Metric Spaces. Chapter 7. 7.1. Metrics

Metric Spaces. Chapter 7. 7.1. Metrics Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

More information

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 10

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 10 Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T. Heath Chapter 10 Boundary Value Problems for Ordinary Differential Equations Copyright c 2001. Reproduction

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

More information

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8 Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e

More information

Inner products on R n, and more

Inner products on R n, and more Inner products on R n, and more Peyam Ryan Tabrizian Friday, April 12th, 2013 1 Introduction You might be wondering: Are there inner products on R n that are not the usual dot product x y = x 1 y 1 + +

More information

Recall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.

Recall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n. ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the n-dimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?

More information

Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

More information

Matrix Representations of Linear Transformations and Changes of Coordinates

Matrix Representations of Linear Transformations and Changes of Coordinates Matrix Representations of Linear Transformations and Changes of Coordinates 01 Subspaces and Bases 011 Definitions A subspace V of R n is a subset of R n that contains the zero element and is closed under

More information

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems

Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Numerical Methods I Solving Linear Systems: Sparse Matrices, Iterative Methods and Non-Square Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001,

More information

Fairness in Routing and Load Balancing

Fairness in Routing and Load Balancing Fairness in Routing and Load Balancing Jon Kleinberg Yuval Rabani Éva Tardos Abstract We consider the issue of network routing subject to explicit fairness conditions. The optimization of fairness criteria

More information

160 CHAPTER 4. VECTOR SPACES

160 CHAPTER 4. VECTOR SPACES 160 CHAPTER 4. VECTOR SPACES 4. Rank and Nullity In this section, we look at relationships between the row space, column space, null space of a matrix and its transpose. We will derive fundamental results

More information

Model order reduction via dominant poles

Model order reduction via dominant poles Model order reduction via dominant poles NXP PowerPoint template (Title) Template for presentations (Subtitle) Joost Rommes [joost.rommes@nxp.com] NXP Semiconductors/Corp. I&T/DTF/Mathematics Joint work

More information

Solving Systems of Linear Equations

Solving Systems of Linear Equations LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how

More information

What is Linear Programming?

What is Linear Programming? Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to

More information

Separation Properties for Locally Convex Cones

Separation Properties for Locally Convex Cones Journal of Convex Analysis Volume 9 (2002), No. 1, 301 307 Separation Properties for Locally Convex Cones Walter Roth Department of Mathematics, Universiti Brunei Darussalam, Gadong BE1410, Brunei Darussalam

More information

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α

More information

Continuity of the Perron Root

Continuity of the Perron Root Linear and Multilinear Algebra http://dx.doi.org/10.1080/03081087.2014.934233 ArXiv: 1407.7564 (http://arxiv.org/abs/1407.7564) Continuity of the Perron Root Carl D. Meyer Department of Mathematics, North

More information

3.1 State Space Models

3.1 State Space Models 31 State Space Models In this section we study state space models of continuous-time linear systems The corresponding results for discrete-time systems, obtained via duality with the continuous-time models,

More information

24. The Branch and Bound Method

24. The Branch and Bound Method 24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

[1] Diagonal factorization

[1] Diagonal factorization 8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:

More information

MAT 200, Midterm Exam Solution. a. (5 points) Compute the determinant of the matrix A =

MAT 200, Midterm Exam Solution. a. (5 points) Compute the determinant of the matrix A = MAT 200, Midterm Exam Solution. (0 points total) a. (5 points) Compute the determinant of the matrix 2 2 0 A = 0 3 0 3 0 Answer: det A = 3. The most efficient way is to develop the determinant along the

More information

Mechanics 1: Vectors

Mechanics 1: Vectors Mechanics 1: Vectors roadly speaking, mechanical systems will be described by a combination of scalar and vector quantities. scalar is just a (real) number. For example, mass or weight is characterized

More information

1 Teaching notes on GMM 1.

1 Teaching notes on GMM 1. Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

More information

Derivative Free Optimization

Derivative Free Optimization Department of Mathematics Derivative Free Optimization M.J.D. Powell LiTH-MAT-R--2014/02--SE Department of Mathematics Linköping University S-581 83 Linköping, Sweden. Three lectures 1 on Derivative Free

More information

Operation Count; Numerical Linear Algebra

Operation Count; Numerical Linear Algebra 10 Operation Count; Numerical Linear Algebra 10.1 Introduction Many computations are limited simply by the sheer number of required additions, multiplications, or function evaluations. If floating-point

More information

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d). 1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

More information

Inner product. Definition of inner product

Inner product. Definition of inner product Math 20F Linear Algebra Lecture 25 1 Inner product Review: Definition of inner product. Slide 1 Norm and distance. Orthogonal vectors. Orthogonal complement. Orthogonal basis. Definition of inner product

More information

State of Stress at Point

State of Stress at Point State of Stress at Point Einstein Notation The basic idea of Einstein notation is that a covector and a vector can form a scalar: This is typically written as an explicit sum: According to this convention,

More information

5. Orthogonal matrices

5. Orthogonal matrices L Vandenberghe EE133A (Spring 2016) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

More information

Practical Guide to the Simplex Method of Linear Programming

Practical Guide to the Simplex Method of Linear Programming Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear

More information

Notes on Determinant

Notes on Determinant ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

More information

So let us begin our quest to find the holy grail of real analysis.

So let us begin our quest to find the holy grail of real analysis. 1 Section 5.2 The Complete Ordered Field: Purpose of Section We present an axiomatic description of the real numbers as a complete ordered field. The axioms which describe the arithmetic of the real numbers

More information

Factorization Theorems

Factorization Theorems Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

More information

Lecture 5: Singular Value Decomposition SVD (1)

Lecture 5: Singular Value Decomposition SVD (1) EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system

More information