A perturbed two-level preconditioner for the solution of three-dimensional heterogeneous Helmholtz problems with applications to geophysics


Institut National Polytechnique de Toulouse (INP Toulouse)
Mathématiques, Informatique et Télécommunications (MITT), CERFACS
Xavier Pinel, Tuesday, May 18, 2010

A perturbed two-level preconditioner for the solution of three-dimensional heterogeneous Helmholtz problems with applications to geophysics

Jury: Hélène Barucq (referee), Henri Calandra (jury member), Iain Duff (jury member), Andreas Frommer (referee), Serge Gratton (thesis advisor), Cornelis Oosterlee (referee), Xavier Vasseur (co-advisor).
Thesis advisor: Serge Gratton. Referees: Hélène Barucq, Andreas Frommer and Cornelis Oosterlee.

Dissertation for the degree of doctor in Mathematics, Computer Science and Telecommunications (ED MITT)

A perturbed two-level preconditioner for the solution of three-dimensional heterogeneous Helmholtz problems with applications to geophysics

Xavier Pinel (PhD student, CERFACS and INPT)

Hélène Barucq, Research director, INRIA and University of Pau, France (referee)
Henri Calandra, Senior advisor, TOTAL, France (member of jury)
Iain Duff, Professor, RAL and CERFACS, UK and France (member of jury)
Andreas Frommer, Professor, University of Wuppertal, Germany (referee)
Serge Gratton, Professor, ENSEEIHT and INPT/IRIT, France (PhD advisor)
Cornelis Oosterlee, Professor, Delft University of Technology and CWI Amsterdam, The Netherlands (referee)
Xavier Vasseur, Senior researcher, CERFACS, France (PhD co-advisor)

July 23, 2010


Acknowledgements

First of all, I would like to thank the energy group TOTAL for funding my thesis through CERFACS, as well as the members of my thesis jury. In particular, I wish to express my gratitude to my thesis advisor, Professor Serge Gratton, and to my co-advisor, Doctor Xavier Vasseur, without whom this work would not have been possible. The same goes for the referees of my thesis: research director Hélène Barucq, Professor Andreas Frommer and Professor Kees Oosterlee. I am grateful to all the members of the ALGO team at CERFACS and to its head, Professor Iain Duff, for having been at my side during these last four years: Anke, Antoine, Audrey, Azzam, Bora, Brigitte, Caroline, Mme Chatelain, Fabian, François, Jean, Kamer, Léon, Marc, Martin, Mélodie, Milagros, Mohamed, Nicole, Pablo, Pavel, Phillip, Rafael, Riad, Selime, Tzvetomila, Xueping. I would likewise like to salute the APO team at ENSEEIHT and the MUMPS team, whose help and advice have been invaluable to me. I also wish to thank the people whose collaboration allowed me to carry out this project: Henri Calandra and Pierre-Yves Aquilanti of TOTAL, Luc Giraud, Julien Langou, as well as the high-performance computing centres whose supercomputers I used: CINES, CSC-IT Espoo, IDRIS and the Jülich Forschungszentrum. Finally, I wish to express my gratitude to my parents and friends: M. and Mme Pinel, Franzi, Philippe, Laetitia, Pépé, Mamie, Tatie Joe, Gérard, Constance, Bernie, Tatie Anne, Caroline, Marion, Elizabeth, Henri Pinel, Clément, the descendants of Jeannot, Jako, Glup, Coco, Biquet, Dani, Marc, Otto, Julie, Pierre, Célia, Manu, Poncho, Vincent, Kévin...

Thesis Summary

The topic of this PhD thesis is the development of iterative methods for the solution of large sparse linear systems of equations, possibly with multiple right-hand sides given at once. These methods are used for a specific application in geophysics, seismic migration, related to the simulation of wave propagation in the subsurface of the Earth. Here the three-dimensional Helmholtz equation written in the frequency domain is considered. When high frequencies are considered, the finite difference discretization of the Helmholtz equation with the Perfectly Matched Layer formulation produces a complex linear system which is large, non-symmetric, non-Hermitian, indefinite and sparse. Thus we propose to study preconditioned flexible Krylov subspace methods, especially minimum residual norm methods, to solve this class of problems. As a preconditioner we consider multi-level techniques, and we especially focus on a two-level method. This two-level preconditioner has proven efficient for two-dimensional applications, and the purpose of this thesis is to extend it to the challenging three-dimensional case. This leads us to propose and analyze a perturbed two-level preconditioner for a flexible Krylov subspace method, where Krylov methods are used both as a smoother and as an approximate coarse grid solver.

Contents

1 Introduction
2 Krylov subspace methods
   Introduction
      Notations
   General Minimum RESidual (GMRES)
      Restarted GMRES with right preconditioning
   Flexible GMRES
      Spectrum analysis in the Flexible GMRES method
   GMRES with deflated restarting
   Flexible GMRES with deflated restarting
      Analysis of a cycle
      Algorithm and computational aspects
      Numerical experiments
   Block Krylov methods
      Principles of block Krylov methods
      Block FGMRES
      Block FGMRES with deflation
      Numerical experiments
   Conclusions
3 A three-dimensional geometric two-level method applied to Helmholtz problems
   Introduction
   Short introduction to three-dimensional geometric multigrid
      Basic geometric multigrid components
      Geometric multigrid algorithms
   Rigorous and Local Fourier Analysis of a two-grid method
      Rigorous Fourier Analysis (RFA) of a two-grid method
      Local Fourier Analysis (LFA) of a two-grid method
   A perturbed two-level preconditioner
      Approximation of the convergence factor of a perturbed two-grid method
      Smoother selection
   Spectrum analysis of the perturbed two-level method in the Flexible GMRES framework
      Algorithm of the perturbed two-level preconditioner for the three-dimensional Helmholtz problem
      Influence of the approximate coarse solution on the convergence of the Krylov method
      Spectrum analysis in the flexible GMRES framework for three-dimensional homogeneous Helmholtz problems
   Conclusions
4 Numerical experiments - Applications to geophysics
   Introduction
   Three-dimensional homogeneous Helmholtz problems with a single right-hand side
      PRACE experiments: Cray XT4 at Espoo (Finland)
      PRACE experiments: IBM Blue Gene/P at Jülich (Germany)
   Three-dimensional heterogeneous Helmholtz problems with a single right-hand side
      SEG/EAGE Salt dome model problem
      SEG/EAGE Overthrust model problem
   Three-dimensional heterogeneous Helmholtz problems with multiple right-hand sides
      SEG/EAGE Salt dome model problem
      SEG/EAGE Overthrust model problem
   Conclusions
5 Conclusions
A Three-dimensional Helmholtz equation in the frequency domain with a PML formulation
   A.1 Continuous formulation
   A.2 Discrete formulation
B Résumé en Français

List of Figures

2.1 Histories of convergence for the convection-diffusion problem of FGMRES(5) preconditioned by full GMRES(m_inner) for different values of m_inner
2.2 Plot of Λ(H_{m+1}(m_inner)) with the convection-diffusion problem, for FGMRES(5) preconditioned by a full GMRES(m_inner) for different values of m_inner
2.3 Histories of convergence for the FIDAP-ex11 matrix of FGMRES(5) preconditioned by a diagonally preconditioned full GMRES(m_inner) for different values of m_inner
2.4 Plot of Λ(H_{m+1}(m_inner)) for the FIDAP-ex11 matrix, with FGMRES(5) preconditioned by a diagonally preconditioned full GMRES(m_inner) for different values of m_inner
2.5 Histories of convergence of block methods when solving the Poisson problem with p = 5 canonical right-hand sides (Table 2.6)
2.6 Histories of convergence of block methods when solving the Poisson problem with p = 10 canonical right-hand sides (Table 2.6)
2.7 Histories of convergence of block methods when solving the Poisson problem with p = 5 random right-hand sides (Table 2.7)
2.8 Histories of convergence of block methods when solving the Poisson problem with p = 10 random right-hand sides (Table 2.7)
2.9 Histories of convergence of block methods when solving the convection-diffusion problem for p = 5 right-hand sides (Table 2.8)
2.10 Histories of convergence of block methods when solving the convection-diffusion problem for p = 10 right-hand sides (Table 2.8)
3.1 A 3D fine grid with standard geometric coarsening (•: coarse grid point)
3.2 Fine grid for a 3D trilinear interpolation (•: coarse grid points)
3.3 Weightings for 3D interpolation on a cube face (•: coarse grid points)
3.4 Two-grid V-cycle
3.5 F-cycles for two, three and four grids (from left to right)
3.6 Spectra of L^{(0)}_1(β) for two values of β, (β = 0, ω_r = 0.8) (left) and (β = 0.6, ω_r = 0.3) (right), considering a 64^3 grid for a wavenumber k = π/(6h)
3.7 History of convergence of GMRES(5) preconditioned by a two-grid cycle using two Gauss-Seidel iterations as pre- and post-relaxations (ν_1 = ν_2 = 2) to solve a one-dimensional Helmholtz problem with PML (1/h = 1024, kh = π/6) for four values of β (0, 0.5, 0.6, 0.7). Convergence is achieved only in the case β = 0 here
3.8 Spectra of A^{(0)}_1(β) (1/h = 1024, kh = π/6) using two Gauss-Seidel iterations as pre- and post-relaxations (ν_1 = ν_2 = 2) for four values of β, from left to right and from top to bottom, β = 0.5, β = 0.6, β = 0.7 and β = 0 respectively. The unit circle centered at one (in blue) is used to scale the spectra
3.9 History of convergence of GMRES(5) preconditioned by a two-grid cycle using two Gauss-Seidel iterations as pre- and post-relaxations (ν_1 = ν_2 = 2) to solve a one-dimensional Helmholtz problem with PML (1/h = 1024, kh = π/6) for four values of β (−0.7, −0.6, −0.5, 0)
3.10 Spectra of A^{(0)}_1(β) (1/h = 1024, kh = π/6) using two Gauss-Seidel iterations as pre- and post-relaxations (ν_1 = ν_2 = 2) for four values of β, from left to right and from top to bottom, β = −0.5, β = −0.6, β = −0.7 and β = 0 respectively. The unit circle centered at one (in blue) is used to scale the spectra
3.11 Histories of convergence of FGMRES(5) preconditioned by a three-grid V-cycle with two iterations of lexicographical forward Gauss-Seidel as pre- and post-smoother (ν_1 = ν_2 = 2) for a wavenumber k = π/(6h)
3.12 Spectra of L^{(0)}_1(ε_2) for two values of ε_2 (ε_2 = 0 (left) and ε_2 = 0.1 (right)), considering Helmholtz problems with Dirichlet boundary conditions on a 64^3 grid for a wavenumber k = π/(6h) and two iterations of Jacobi as a smoother (ν_1 = ν_2 = 1) with relaxation parameter ω_r
3.13 Slice of the initial error (y = 0.5) in the plane (x, z) for the 64^3 grid, built with the Matlab random number generator rand('seed', 0)
3.14 Slices of the error (y = 0.5) in the plane (x, z) after one iteration of Gauss-Seidel (GS_LEX(1), left) and two iterations of Gauss-Seidel (GS_LEX(2), right) for the 64^3 grid (k = 33.51) on two processors
3.15 Slices of the error (y = 0.5) in the plane (x, z) after one iteration of symmetric Gauss-Seidel (GS_SYM(1), left) and two iterations of symmetric Gauss-Seidel (GS_SYM(2), right) for the 64^3 grid (k = 33.51) on two processors
3.16 Slices of the error (y = 0.5) in the plane (x, z) after one iteration of GMRES (GMRES(1), left) and two iterations of GMRES (GMRES(2), right) for the 64^3 grid (k = 33.51) on two processors
3.17 Slices of the error (y = 0.5) in the plane (x, z) after one iteration of GMRES preconditioned by one iteration of symmetric Gauss-Seidel (GMRES(ν)/GS_SYM(1), left) and two iterations of GMRES preconditioned by one iteration of symmetric Gauss-Seidel (GMRES(ν)/GS_SYM(2), right) for the 64^3 grid (k = 33.51) on two processors
3.18 From right to left: Λ(H_{m+1}) for different coarse tolerances ε_2, m = 5, on a grid with kh = π/6 and PML
3.19 Number of iterations needed by GMRES(10) preconditioned by a reverse symmetric Gauss-Seidel cycle to converge to 0.6, with respect to the FGMRES(5) current iteration
3.20 From right to left: Λ(H_{m+1}) spectrum using 100 coarse iterations of GMRES(10) preconditioned by a reverse symmetric Gauss-Seidel cycle, for two grids and wavenumbers, to converge to 10^{-6} with FGMRES(5)
4.1 Number of iterations (It) of Table 4.1 for both single and double precision arithmetic with respect to the wavenumber k
4.2 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 2.5 Hz (right) and the SEG/EAGE Salt dome velocity field (left)
4.3 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 5 Hz (right) and the SEG/EAGE Salt dome velocity field (left)
4.4 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 10 Hz (right) and the SEG/EAGE Salt dome velocity field (left)
4.5 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 20 Hz (right) and the SEG/EAGE Salt dome velocity field (left)
4.6 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 3.75 Hz (right) and the SEG/EAGE Overthrust velocity field (left)
4.7 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 7.5 Hz (right) and the SEG/EAGE Overthrust velocity field (left)
4.8 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 15 Hz (right) and the SEG/EAGE Overthrust velocity field (left)
4.9 Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 30 Hz (right) and the SEG/EAGE Overthrust velocity field (left)
A.1 Slice of a three-dimensional solution (Ω = [0, 1]^3, h = 1/512, kh = π/6). The source term is located at (1/2, 1/2, L_PML + h). Red lines represent the interface between the interior and the PML zone
A.2 Cartesian stencil (7 points) for a Laplacian-like operator
A.3 Pattern of the Helmholtz matrix with a lexicographical ordering of the unknowns

List of Tables

2.1 Computational cost of a generic cycle of FGMRES(m), GMRES-DR(m, k) and FGMRES-DR(m, k)
2.2 Storage required for GMRES-DR(m, k) and FGMRES-DR(m, k)
2.3 Performance of FGMRES(m) and FGMRES-DR(m, k) to satisfy the convergence threshold (2.18); Mv is the total number of matrix-vector products, dot the total number of dot products, and r_ops and r_mem are the ratios of floating-point operations and memory respectively, where the reference method is full FGMRES (see Equation (2.19))
2.4 Cost of the block Arnoldi and the classical Arnoldi process according to the matrix dimension n, its number of non-zero elements nnz(A), the Krylov subspace restart parameter m and the number of right-hand sides p
2.5 Storage required for BFGMRES(m), BFGMRESD(m) and BFGMREST(m, p_f) considering a block size p and a problem dimension n
2.6 Number of iterations (It) and operation ratio (r_ops) for the Poisson problem for p canonical basis right-hand sides
2.7 Number of iterations (It) and operation ratio (r_ops) for the Poisson problem for p random right-hand sides
2.8 Number of iterations (It) and operation ratio (r_ops) for the convection-diffusion problem for p random right-hand sides
3.1 Smoothing factors μ_loc((S_h^{Jac(ω_r)})^ν) of the Jacobi smoother S_h^{Jac(ω_r)}, ω_r = 0.3, for two values of ν and four grid sizes, considering the shifted 3D Helmholtz operator (β = 0.6) for a wavenumber k = π/(6h)
3.2 Smoothing factors μ_loc((S_h^{Jac(ω_r)})^ν) of the Jacobi smoother S_h^{Jac(ω_r)}, ω_r = 0.8, for two values of ν and four grid sizes, considering the original 3D Helmholtz operator (β = 0) for a wavenumber k = π/(6h). Smoothing factors larger than one are indicated in brackets
3.3 Smoothing factors μ_loc((S_h^{GS,forw})^ν) of the Gauss-Seidel-lex smoother S_h^{GS,forw} for two values of ν and four grid sizes, considering the shifted 3D Helmholtz operator (β = 0.6) for a wavenumber k = π/(6h). Smoothing factors larger than one are indicated in brackets
3.4 Smoothing factors μ_loc((S_h^{GS,forw})^ν) of the Gauss-Seidel-lex smoother S_h^{GS,forw} for two values of ν and four grid sizes, considering the original 3D Helmholtz operator (β = 0) for a wavenumber k = π/(6h). Smoothing factors larger than one are indicated in brackets
3.5 Theoretical estimation of the convergence factor (ρ̄(T_h)) and experimental convergence factors ρ_Exp(T_h) for several coarse tolerances ε_2
3.6 Number of iterations needed to reach 10^{-6} for FGMRES(5) preconditioned by a two-grid cycle, considering several smoothers and grids (1/h^3) at wavenumbers k = π/(6h)
3.7 Number of iterations (It) of FGMRES(5) with respect to the coarse problem normalized tolerance (ε_2) for wavenumbers k = π/(6h)
3.8 Number of iterations of FGMRES(5) required to reach 10^{-6}, performing 100 iterations of preconditioned GMRES(10) on the coarse level at each iteration of FGMRES(5), for two wavenumbers
4.1 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumber k such that kh = π/6. The results are shown for both single precision (sp) and double precision (dp) arithmetic
4.2 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumber k such that kh = π/6. τ = (T_ref/T)/(P/P_ref) is a scaled speed-up, where T and P denote the elapsed time and the number of cores of a given experiment respectively
4.3 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumber k such that kh = π/6
4.4 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumber k such that kh = π/6. τ = (T_ref/T)/(P/P_ref) is a scaled speed-up, where T and P denote the elapsed time and corresponding number of cores of a given experiment respectively
4.5 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model with mesh grid size h such that h = min_{(x,y,z) ∈ Ω} v(x, y, z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.6 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model with mesh grid size h such that h = min_{(x,y,z) ∈ Ω} v(x, y, z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the memory. τ = (T_ref/T)/(P/P_ref) is a scaled speed-up, where T and P denote the elapsed time and corresponding number of cores of a given experiment respectively
4.7 Two-grid preconditioned Flexible GMRES(5) performing 200 coarse iterations per cycle for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model with mesh grid size h such that h = min_{(x,y,z) ∈ Ω} v(x, y, z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the memory. τ = (T_ref/T)/(P/P_ref) is a scaled speed-up, where T and P denote the elapsed time and corresponding number of cores of a given experiment respectively
4.8 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the SEG/EAGE Overthrust model with mesh grid size h such that h = min_{(x,y,z) ∈ Ω} v(x, y, z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.9 Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the SEG/EAGE Overthrust model with mesh grid size h such that h = min_{(x,y,z) ∈ Ω} v(x, y, z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the memory. τ = T_ref/T is a scaled speed-up, where T denotes a computational time
4.10 Perturbed two-grid preconditioned block methods for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model and f = 2.5 Hz (h = 50 m), with 8, 16 and 32 right-hand sides at once. The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.11 Perturbed two-grid preconditioned block methods for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model and f = 5 Hz (h = 25 m), with 8, 16 and 32 right-hand sides at once. The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.12 Perturbed two-grid preconditioned block methods for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model and f = 10 Hz (h = 12.5 m), with 8, 16 and 32 right-hand sides at once. The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.13 Different strategies using BFGMREST preconditioned by a perturbed two-grid method in order to solve the Helmholtz equation for the SEG/EAGE Salt dome model with 16 right-hand sides at three frequencies. The parameter p denotes the number of right-hand sides taken at once, #runs the number of times BFGMREST is used, T the computational time, It the number of iterations and M the requested memory
4.14 Perturbed two-grid preconditioned block methods for the solution of the Helmholtz equation for the SEG/EAGE Overthrust model and f = 3.64 Hz (h = 50 m), with 4, 8 and 16 right-hand sides at once. The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.15 Perturbed two-grid preconditioned block methods for the solution of the Helmholtz equation for the SEG/EAGE Overthrust model and f = 7.27 Hz (h = 25 m), with 4, 8 and 16 right-hand sides at once. The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.16 Perturbed two-grid preconditioned block methods for the solution of the Helmholtz equation for the SEG/EAGE Overthrust model (h = 12.5 m), with 4, 8 and 16 right-hand sides at once. The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory
4.17 Different strategies using BFGMREST preconditioned by a perturbed two-grid method in order to solve the Helmholtz equation for the SEG/EAGE Overthrust model with 8 right-hand sides at three frequencies. The parameter p denotes the number of right-hand sides taken at once, #runs the number of times BFGMREST is launched, T the computational time, It the number of preconditioner applications and M the requested memory
A.1 Wavenumbers corresponding to h = 1/2^p, p ∈ N, for adimensional model problems
A.2 Grid sizes for different frequencies such that they verify the stability condition (A.3) for the SEG/EAGE Salt dome velocity field with minimum velocity 1500 m/s, taking 16 points in the PML layer on each side of the physical domain


List of Algorithms

1 Restarted GMRES (GMRES(m))
2 Arnoldi process with Modified Gram-Schmidt (MGS): computation of V_{m+1} and H̄_m
3 Flexible GMRES (FGMRES(m))
4 Flexible Arnoldi process: computation of V_{m+1}, Z_m and H̄_m
5 Right-preconditioned GMRES with deflated restarting: GMRES-DR(m, k)
6 GMRES-DR(m, k): computation of V^new_{k+1} and H̄^new_k
7 Flexible GMRES with deflated restarting: FGMRES-DR(m, k)
8 FGMRES-DR(m, k): computation of V^new_{k+1}, Z^new_k and H̄^new_k
9 Flexible block Arnoldi process (MGS implementation): computation of V_{j+1}, Z_j and H̄_j for j ≤ m
10 Block Flexible GMRES (BFGMRES(m))
11 Block Flexible GMRES with SVD-based deflation (BFGMRESD(m))
12 Block Flexible GMRES with SVD-based truncation (BFGMREST(m, p_f))
13 Two-grid cycle TG(L_h, u_h, b_h)
14 Multigrid cycle MG(L_h, u_h, b_h)
15 Perturbed two-grid cycle to solve L_h u_h = b_h
16 Perturbed two-grid cycle to solve approximately L_h z_h = v_h


Chapter 1

Introduction

The target industrial application of this PhD thesis is related to the solution of wave propagation problems in seismics [24]. At a given frequency, a source is triggered at a certain position on the Earth's surface. As a consequence, a pressure wave propagates from the source. When a wave encounters discontinuities, it is scattered and propagated back to the surface. The pressure field is then recorded at several receiver locations on the Earth's surface. This experimental process is repeated over a given range of frequencies and with multiple source locations. The main aim of the numerical simulation is thus to reproduce these wave propagation phenomena occurring in heterogeneous media. This leads to an interpretative map of the subsoil that helps to detect both the location and the thickness of the reflecting layers. The resulting frequency-domain problem is then solved using efficient solvers, able to take advantage of the structure of the system on modern parallel architectures. Afterwards, an inverse Fast Fourier Transform is employed to obtain the time-domain solution from the set of frequency-domain solutions. This time-domain solution is of great importance in oil exploration for correctly predicting the structure of the subsurface.

In this thesis, the wave propagation is modeled by the Helmholtz equation,

−Δu − k^2 u = s,

where u denotes the wave pressure, k the wavenumber and s a given source term. Absorbing boundary conditions are used to simulate an infinite domain and to limit spurious reflections. A key point for an efficient migration thus relies on a robust and fast solution method for the heterogeneous Helmholtz problem at high wavenumbers with multiple sources. For each considered frequency, the discretization of the Helmholtz operator by finite difference or finite element techniques leads to a linear system of equations of the following type:

A X = B,

where A ∈ C^{n×n} is a square matrix which is sparse, usually non-Hermitian, non-symmetric, large and indefinite at high wavenumbers, and X ∈ C^{n×p}, B ∈ C^{n×p}, where p is the number of sources.

These large and indefinite linear systems can be handled very efficiently up to a certain point by sparse direct methods [29, 30] (e.g. sparse Gaussian elimination, LU factorization). In the indefinite non-symmetric case, pre-processing (permutation and scaling) can be performed before the factorization phase to minimize the fill-in and improve the accuracy of the factorization, e.g. obtaining matrices with a zero-free diagonal [34]. In the two-dimensional case, these methods have proved efficient [65], since they enable both the solution of linear systems to machine precision and the reuse of the LU factorization in multi-source situations. However, their memory requirement grows quickly with the size of the problem, compromising their use on a parallel distributed memory computer for large three-dimensional problems. In [89], the authors have used MUMPS [2, 3, 4] to solve three-dimensional Helmholtz problems formulated with a compact 27-point stencil discretization scheme. They report that the memory complexity of the LU factorization is O(35 n^{4/3}), that the number of floating-point operations during the factorization phase is O(n^2), and that the computational complexity of the solution phase is O(n^{4/3}). Despite this computational cost, the approach was used with success when solving a Helmholtz problem at 10 Hz on the SEG/EAGE Overthrust model [5], allocating 450 GB of memory. To alleviate the memory constraint, iterative methods can be considered.
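The indefiniteness of A at high wavenumbers is easy to observe on a model problem. The following sketch is an illustration only, not the solver studied in this thesis: it assembles a one-dimensional Helmholtz operator with Dirichlet conditions instead of the three-dimensional PML-closed operator, so the matrix stays real and symmetric; function and variable names are ours.

import numpy as np
import scipy.sparse as sp

def helmholtz_1d(n, k):
    """A = -D2 - k^2 I on an n-point interior grid of [0, 1], h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    lap = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n)) / h**2
    return (lap - k**2 * sp.identity(n)).tocsr()

# kh = pi/6 (about 12 grid points per wavelength), as in the model problems.
n = 255
k = np.pi / (6.0 / (n + 1))
A = helmholtz_1d(n, k)
eigs = np.linalg.eigvalsh(A.toarray())
print(eigs.min(), eigs.max())   # both signs appear: the matrix is indefinite

With PML the diagonal entries become complex and the symmetry is lost, which is why the thesis works with general non-Hermitian Krylov methods.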
One of the key points then becomes the design of an efficient preconditioner to obtain fast convergence. Similarly, the choice of the Krylov

method [101], especially in the multiple right-hand side situation, has to be addressed. First, we describe some preconditioning techniques from the literature for Helmholtz problems.

Incomplete factorizations (ILU) [8] are popular preconditioning techniques, which may however lead to unstable, highly ill-conditioned incomplete factors in the indefinite case. Some remedies have been proposed to manage these issues when considering Helmholtz problems. In [52] a specific factorization is designed that aims at performing an analytic incomplete factorization (AILU); this approach is however difficult to extend to the heterogeneous case. We also note that incomplete LU factorization with threshold (ILUT [100]) is recommended in [68] for a finite element discretization of the Helmholtz operator (Galerkin Least Squares (GLS)). Finally, another approach consists in performing an incomplete factorization of a complex shifted Helmholtz operator as a preconditioner for the original Helmholtz problem [80, 90] (see Equation (1.1) and details hereafter). However, the convergence of ILU-preconditioned Krylov methods is generally slow at high wavenumbers, and storing the ILU factors may not always be affordable. Furthermore, it is recognized that ILU methods are difficult to parallelize [8, 63].

Another important class of preconditioners relies on domain decomposition techniques [94, 111, 114]. These methods solve the original problem by splitting the physical domain into smaller subdomains where the solution of the local problems is affordable with direct methods. For elliptic definite problems, their convergence rate becomes independent of the number of subdomains if a coarse space correction is included. Due to their indefiniteness at high wavenumbers, Helmholtz-type problems are challenging for domain decomposition preconditioners for two main reasons. First, in order to be effective, a rather fine coarse space has to be considered; consequently, this leads to large coarse problems. Secondly, local Dirichlet or Neumann problems may be close to singular. We refer the reader to Section 11.5.2 in [114] for further comments and references. When nonoverlapping domain decomposition methods are considered, it is advocated to use Sommerfeld-like conditions on the subdomain boundaries to obtain well-posed local problems [10, 26, 51]. An efficient domain decomposition preconditioner for the indefinite Helmholtz problem is FETI-H [46], where an auxiliary coarse problem based on plane waves is considered. This approach has been improved in [44, 45] by introducing a dual-primal variant of FETI-H (FETI-DPH), which allows one to solve Helmholtz scattering problems at middle-range frequencies on a large number of cores [44]. To the best of our knowledge, the most recent theoretical result related to domain decomposition preconditioners for homogeneous Helmholtz problems with Dirichlet boundary conditions (discretized with standard finite element techniques) is due to Li and Tu [76]. A bound for the condition number of the preconditioned operator A M^{-1} has been proven for the case of exact local solvers:

κ(A M^{-1}) ≤ C (1 + k^2 h^2)(1 + k^2 H^2)(1 + log(H/h))^2,

where C is a positive constant independent of the element diameter h and of the maximal diameter H of the subdomains. Consequently, κ(A M^{-1}) is found to grow like k^4. Obviously this is a major drawback when considering high wavenumbers. Recently, an algebraic formulation has been proposed for Helmholtz problems in [62, 120].
It consists in an algebraic additive Schwarz preconditioner and enables the solution of problems for frequencies up to 12 Hz in a reasonable time on a real-life velocity model (SEG/EAGE Salt dome) on 2000 Blue Gene/P processors¹. However, a drawback of this method is its high memory cost.

¹ http://

Multigrid methods [15, 20, 61, 115] can also be used as a preconditioner for Helmholtz problems. Nevertheless, they also encounter difficulties in coping with such indefinite problems. Regarding Helmholtz problems, classical multigrid ingredients such as standard smoothing and coarse grid correction are found to be ineffective [7, 19, 37, 42, 70]. First, smoothers cannot smooth error components on the intermediate grids. Second, the wavenumber k in the discrete Helmholtz operator makes its approximations poor on coarse meshes, the effect of the coarse grid correction being then deteriorated. In [31, 37, 42, 70, 78], strategies have been proposed to adapt the multigrid technique to the solution of Helmholtz problems. A first strategy consists of using few grids in the hierarchy of the multigrid preconditioner [31, 37, 70], such that the grid approximation is effective on the considered grids. If more than two grids are considered, non-standard smoothers (Krylov-based, such as GMRES [102]) should be used on the coarser levels to alleviate the weakness of standard smoothers on intermediate grids [37]. However, in three

dimensions, a reduced number of grids in the multigrid hierarchy could lead to a coarse problem whose factorization is prohibitive in terms of computational resources. A second approach is to solve Helmholtz problems with a wave-ray multigrid algorithm [77]. These methods are based on two representations of the error on the coarse grids of the hierarchy. These representations then enable both the smoother and the coarse grid corrections to be efficient. This method performs well in the homogeneous case [74, 78, 118] but, in the heterogeneous case, ray functions must be computed. This implies solving large eigenvalue problems [122, 123], which may be expensive in terms of computational resources. Lately, a third multigrid preconditioner, considered as a significant breakthrough, has been proposed in [42, 43]; it is not applied directly to the discrete Helmholtz operator but to a complex shifted one, defined as:

−Δu − (1 − iβ)k^2 u,     (1.1)

where β denotes the shift parameter. This shift parameter makes standard multigrid efficient on the preconditioning problem [42]. This solution method has proved efficient for relatively high wavenumbers on both homogeneous and heterogeneous problems [42, 95, 96]. However, the complexity of the method remains high (see [95] in two dimensions and [96] in three dimensions respectively). More recently, an algebraic multi-level preconditioner based on this shifted approach has been proposed in [14]. An incomplete LDL^T factorization is performed on each level of the multi-level hierarchy, taking advantage of modern direct methods for sparse symmetric indefinite matrices [35, 103]. This method has been shown to improve the convergence of Krylov methods for both two-dimensional and three-dimensional heterogeneous problems, but its complexity is still relatively high. Yet this class of multi-level preconditioners raises the question of the determination of the shift parameter β. Indeed, it depends on the multilevel components [14, 42] and of course on the discretization of the Helmholtz operator [116]. Therefore, the choice of the shift parameter is not obvious and often relies on extensive numerical experiments and/or on a Fourier analysis [15].

The choice of a shift parameter can be avoided if a two-grid preconditioner is applied to the original discrete Helmholtz operator [31], where a sparse direct method is employed for the coarse solution phase of the two-grid algorithm. As said before, the computational cost of an LU factorization in three dimensions, even on the coarse grid, is too severe. Consequently, an iterative method seems to be the natural choice for solving the coarse grid problem. Thus, in this thesis, we consider a perturbed two-grid preconditioner applied to the original Helmholtz operator, where the coarse problem is solved only approximately. The efficiency of such a preconditioner relies on both its controllable memory requirements and its good preconditioning properties when using a really large convergence threshold on the coarse grid. This last point will be analyzed in the Fourier analysis framework and illustrated both by numerical experiments and by a spectrum analysis. Moreover, we advocate the use of a preconditioned Krylov method on the coarse level of the two-level method. This leads us to the choice of flexible GMRES (FGMRES [99]) as the outer Krylov method, since the two-level preconditioner varies from one iteration to the next. In this work, we have extended GMRES with deflated restarting [85] to the flexible case (FGMRES-DR [53]).
This method has shown itself efficient for two-dimensional Helmholtz problems with Dirichlet boundary conditions, but relatively less efficient with absorbing boundary conditions.

Another challenging issue in the geophysics application is the efficient treatment of multiple sources (up to a few thousand). The design of efficient block Krylov methods to process several sources at once is then of crucial interest. In this thesis, starting from existing references related to block GMRES methods [59, 72, 73, 79, 97, 112, 121], we have developed efficient variants of Block Flexible GMRES (BFGMRES) implementing the deflation of the block residual at the restart: Block Flexible GMRES with SVD-based Deflation (BFGMRESD) and Block Flexible GMRES with SVD-based Truncation (BFGMREST). Both methods perform an SVD of the block residual (R = U Σ W^H [54]) at each restart. BFGMRESD uses as initial block vector at each restart the singular vectors corresponding to the largest singular values, as defined by a threshold, whereas BFGMREST keeps as initial block residual a fixed number of singular vectors corresponding to the largest singular values. Finally, all these methods have been evaluated in a parallel distributed memory environment. Extensive numerical experiments have shown the robustness and efficiency of the perturbed two-level preconditioner on both homogeneous and heterogeneous problems on thousands of cores. Moreover, the interest of

BFGMRESD and BFGMREST clearly appears when multiple right-hand sides are considered.

The outline of the thesis is as follows.

In Chapter 2, Krylov methods for both the single and the multiple right-hand side situations are presented. First, a brief description of GMRES and Flexible GMRES (FGMRES) is given, introducing a spectrum analysis tool in the FGMRES context. Then the flexible GMRES method with spectral deflation at the restart is introduced. Finally, block flexible Krylov methods are presented. We describe some strategies to take advantage of the multiple right-hand side context: deflation of the residual (computation of the numerical rank of the block residual at each restart) and truncation of the residual (use of a part of the block residual to compute the block solution corresponding to the whole block residual).

In Chapter 3, we focus on multi-level methods used as preconditioners for three-dimensional Helmholtz problems. First, basic elements of three-dimensional geometric multigrid are introduced. Then, a Fourier analysis is described for three-dimensional Helmholtz problems. A smoothing analysis is performed for both the original and the shifted Helmholtz operators. It is followed by an analysis of a two-level cycle where a preconditioned Krylov method is used on the coarse level. This analysis shows that the convergence factor of a two-grid method is nearly the same whether the coarse solution is exact or whether the coarse problem is solved within a rather large convergence threshold. This behavior is numerically confirmed using a perturbed two-level method as a preconditioner. Finally, a spectrum analysis is included to show the evolution of the spectrum of the preconditioned Helmholtz operator for several coarse tolerances.

In Chapter 4, numerical experiments on parallel distributed memory computers are presented. First, three-dimensional homogeneous Helmholtz problems are considered. Using the perturbed two-level method described in Chapter 3 as a preconditioner for FGMRES, a strong scalability property is obtained (growing numbers of cores for a fixed problem size), with experiments on up to 65,536 cores. Concerning weak scalability (the number of cores growing linearly with the size of the problem), the number of iterations of the method is found to grow linearly with the frequency parameter. Then, heterogeneous problems are considered, on two public domain velocity fields: the SEG/EAGE Salt dome and the SEG/EAGE Overthrust. The two-level preconditioner is found efficient for heterogeneous problems, even if it does not scale as well as in the homogeneous case for a large number of cores (more than 2048). Finally, we present numerical results in the multiple right-hand side context for heterogeneous problems. We show that using the block methods presented in Chapter 2 in combination with the two-level preconditioner can greatly improve the overall number of iterations required when solving multiple right-hand side problems.
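As a concrete preview of the residual deflation and truncation strategies mentioned above, here is a minimal sketch of the SVD step that BFGMRESD and BFGMREST perform on the block residual at each restart. This is our own illustrative code, not the thesis implementation: the function name and the relative threshold convention are assumptions.

import numpy as np

def deflate_block_residual(R, tol=None, p_f=None):
    """SVD of the block residual at a restart: R = U @ diag(s) @ W^H.

    BFGMRESD-style deflation keeps the left singular vectors whose singular
    values exceed tol * s[0]; BFGMREST-style truncation keeps a fixed number
    p_f of them (sketch; the relative threshold is an assumption)."""
    U, s, Wh = np.linalg.svd(R, full_matrices=False)
    if p_f is not None:            # truncation (BFGMREST)
        kept = p_f
    else:                          # deflation (BFGMRESD)
        kept = int(np.sum(s > tol * s[0]))
    return U[:, :kept], s, Wh      # restart the block Arnoldi process with U[:, :kept]

# Example: a block residual of 5 nearly linearly dependent columns.
rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 2))
R = base @ rng.standard_normal((2, 5)) + 1e-10 * rng.standard_normal((1000, 5))
V0, s, _ = deflate_block_residual(R, tol=1e-8)
print(V0.shape)   # (1000, 2): the numerical rank of the block residual

The point of the step is that the cost of a block cycle is driven by the block width, so working with the numerical rank of the residual rather than with all p columns can reduce the work substantially.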

Chapter 2

Krylov subspace methods

2.1 Introduction

In this chapter we focus on a class of iterative methods called Krylov subspace methods for solving linear systems of the following type:

A x = b,  A ∈ C^{n×n},  b, x ∈ C^n,

where A is complex, non-symmetric, non-Hermitian, sparse and non-singular. The most robust way to solve linear systems is to use direct methods. For this class of problems, they consist in performing an LU factorization of the matrix and forward and backward substitutions to obtain the solution of the linear system. Once the LU factorization is obtained, they make it easy to solve several linear systems involving the same matrix (multiple right-hand side situation). However, a direct method may need substantial computational resources. Indeed, the LU factors must be stored, and they are in general less sparse than the matrix A [29, 30]. Furthermore, it has a computational cost of O(n^{4/3}) for a Laplacian-like operator. Iterative methods can remedy these drawbacks: their memory requirement is generally low and can be controlled, and matrix-vector and dot products are their dominant operations in flops. Their principle is to look for the solution in a Krylov subspace. Krylov subspaces, denoted by K_m(A, r_0), are vector subspaces of C^n spanned by monomials of A applied to the initial residual vector r_0 = b − A x_0, where x_0 is the initial solution guess:

K_m(A, r_0) = span{ r_0, A r_0, A^2 r_0, ..., A^{m−1} r_0 }.

The parameter m is then an upper bound for the dimension of the space K_m(A, r_0), since it is generated by m vectors. The most popular Krylov methods for the non-Hermitian case are BiCGSTAB [117], GMRES [102] and QMR [50]. We focus on the GMRES (General Minimum RESidual) family of methods. In the first half of this chapter, GMRES methods for a single right-hand side are presented. First, classical GMRES-type methods are described: restarted GMRES [102] with and without preconditioning, and FGMRES (Flexible GMRES) [99]. Then come methods implementing spectral deflation at the restart (deflated restarting): GMRES with deflated restarting [85] (GMRES-DR) and its flexible variant FGMRES-DR [53]. The second half of this chapter is devoted to block Krylov methods, i.e. Krylov methods for multiple right-hand side problems. First, a state-of-the-art bibliographical description is proposed. Then block Flexible GMRES (BFGMRES) is introduced, followed by two methods that implement residual deflation (BFGMRESD) and residual truncation (BFGMREST) respectively.

2.1.1 Notations

We denote by ||·|| the Euclidean norm, by I_k ∈ C^{k×k} the identity matrix of dimension k, and by 0_{i×j} ∈ C^{i×j} the zero rectangular matrix with i rows and j columns. The operator ^T denotes transposition, whereas ^H represents the Hermitian transpose. Given a vector d ∈ C^k with components d_i, D = diag(d_1, ..., d_k) is the diagonal matrix D ∈ C^{k×k} such that D_ii = d_i. Given a matrix Q, we denote by q_j its j-th column. The vector e_m ∈ C^m denotes the m-th canonical basis vector of C^m. Finally, we denote by λ(A) the spectrum of the matrix A. Regarding the algorithmic parts, we adopt Matlab-like notations in the presentation. For instance, Q(i, j) denotes the (i, j) entry of the matrix Q, and Q(1:m, 1:j) refers to the submatrix made of the first m rows and first j columns of Q.
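Before turning to GMRES, a small illustration of the Krylov subspace definition above may be useful (names are ours). The sketch forms the monomial basis of K_m(A, r_0) explicitly and shows why it is never used as such in practice: although it has full rank, it becomes extremely ill-conditioned as m grows, which motivates the orthonormalization performed by the Arnoldi process of the next section.

import numpy as np

def krylov_basis(A, r0, m):
    """Columns r0, A r0, ..., A^{m-1} r0 spanning K_m(A, r0) (monomial basis)."""
    K = np.empty((r0.size, m), dtype=complex)
    K[:, 0] = r0
    for j in range(1, m):
        K[:, j] = A @ K[:, j - 1]
    return K

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
r0 = rng.standard_normal(50)
K = krylov_basis(A, r0, 10)
# Full rank, but a huge condition number: the columns align with the dominant
# eigendirections of A, so the basis is numerically almost dependent.
print(np.linalg.matrix_rank(K), np.linalg.cond(K))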

2.2 General Minimum RESidual (GMRES)

This method consists in finding the solution in the space x_0 + K_m(A, r_0) that minimizes the two-norm of the residual b − A x. It can be formulated as follows: find x_m ∈ x_0 + K_m(A, r_0) such that it realizes

min_{x ∈ x_0 + K_m(A, r_0)} ||b − A x||.

Any vector x_m ∈ x_0 + K_m(A, r_0) can be written as x_m = x_0 + V_m y_m, where y_m is a vector of dimension m and V_m is a unitary n×m matrix whose columns span K_m(A, r_0). The matrix V_m results from the orthogonalization of a basis of K_m(A, r_0). This orthogonalization is usually performed with an Arnoldi process using Modified Gram-Schmidt (MGS), so that V_m satisfies the Arnoldi relation:

A V_m = V_{m+1} H̄_m,     (2.1)

where H̄_m ∈ C^{(m+1)×m} is a Hessenberg matrix containing the orthogonalization coefficients. Writing the residual b − A x with these matrices and denoting β = ||r_0|| leads to:

b − A x_m = b − A(x_0 + V_m y_m) = r_0 − A V_m y_m     (2.2)
          = β v_1 − V_{m+1} H̄_m y_m = V_{m+1} (β e_1 − H̄_m y_m).     (2.3)

As V_{m+1} is unitary, the residual norm is thus:

J(y_m) = ||b − A x_m|| = ||b − A(x_0 + V_m y_m)|| = ||β e_1 − H̄_m y_m||.

Minimizing ||b − A x|| for x ∈ C^n is thus equivalent to minimizing ||β e_1 − H̄_m y|| for y ∈ C^m. Therefore, the GMRES algorithm can be divided into two parts:

1. Orthogonalization (Arnoldi process) of the basis of K_m, which yields H̄_m and V_m.
2. Minimization of ||β e_1 − H̄_m y||. Its minimizer y_m is then used to compute x_m = x_0 + V_m y_m.

In practice, the computational and memory costs of GMRES increase with m. Indeed, the computational cost is O(m^2 n) because of the Arnoldi process, and the memory cost is O(mn), which may be prohibitive for large m (the dimension n of the problem being fixed). To remedy these problems, restarting GMRES after a few iterations can be a satisfactory solution: GMRES is started again after m iterations, with x_m replacing the initial guess x_0. The weakness of this method is that convergence is not as easily characterized as, e.g., in the case of full GMRES, where a Krylov space of dimension n (the matrix dimension) contains the solution A^{-1} b [126]. Algorithm 1 describes the classical restarted GMRES algorithm.

Remark 1. In Algorithm 1, line 7, convergence is checked on the Arnoldi residual norm (||c − H̄_j y_j||), normalized by the norm of the right-hand side. However, ||c − H̄_j y_j|| can differ from the norm of the true residual ||b − A x_j||. Furthermore, the convergence should rather be checked on the backward error ||b − A x_j|| / (||b|| + ||A|| ||x_j||), to scale the matrix and right-hand side entries and thus to ensure convergence up to a threshold tol with machine precision ψ. Nevertheless, in [28] the authors show that the backward stability of GMRES using MGS is verified at each step if the matrix is real and if its smallest singular value is much larger than n^2 ψ ||A||_F. These results have led us to consider a convergence criterion based on the Arnoldi residual.

2.2.1 Restarted GMRES with right preconditioning

The convergence of restarted GMRES is not guaranteed in general (unless Re(x^H A x) > 0 for all x ≠ 0), but it can hopefully be improved with preconditioning. Preconditioning consists in improving the numerical properties of the matrix. Some desirable properties to be satisfied by the preconditioning matrix, denoted by M, are given in [101]: it has to approximate the original matrix, it has to be non-singular, and solving the linear system M x = b should not be too expensive. In restarted GMRES with right preconditioning, the preconditioning phase appears both in the matrix-vector product needed by the Arnoldi process and in the computation of the solution.
In order to obtain the right-preconditioned variant of restarted GMRES, line 2 of Algorithm 2 is replaced by w = A M^{-1} v_j, and lines 8 and 11 of Algorithm 1 by x_j = x_0 + M^{-1} V_j y_j and x_m = x_0 + M^{-1} V_m y_m respectively.

Algorithm 1 Restarted GMRES (GMRES(m))
1: Choose m > 0, itermax > 0, tol > 0, x_0 ∈ C^n. Let r_0 = b − A x_0, β = ||r_0||, c = [β, 0_{1×m}]^T where c ∈ C^{m+1}, v_1 = r_0/β.
2: for iter = 1, itermax do
3:   Set β = ||r_0||, c = [β, 0_{1×m}]^T and v_1 = r_0/β.
4:   for j = 1, m do
5:     Completion of V_{j+1} and H̄_j: apply Algorithm 2 from line 2 to 8 to obtain V_{j+1} ∈ C^{n×(j+1)} and the upper Hessenberg matrix H̄_j ∈ C^{(j+1)×j} such that A V_j = V_{j+1} H̄_j with V_{j+1}^H V_{j+1} = I_{j+1}.
6:     Compute y_j = argmin_{y ∈ C^j} ||β e_1 − H̄_j y||;
7:     if ||c − H̄_j y_j|| / ||b|| ≤ tol then
8:       x_j = x_0 + V_j y_j; stop;
9:     end if
10:  end for
11:  Compute x_m = x_0 + V_m y_m;
12:  Set x_0 = x_m, r_0 = b − A x_0;
13:  Return to line 2.
14: end for

Algorithm 2 Arnoldi process with Modified Gram-Schmidt (MGS): computation of V_{m+1} and H̄_m
1: for j = 1, m do
2:   w = A v_j
3:   for i = 1, j do
4:     h_{i,j} = w^H v_i
5:     w = w − h_{i,j} v_i
6:   end for
7:   h_{j+1,j} = ||w||, v_{j+1} = w / h_{j+1,j}
8:   Define V_{j+1} = [v_1, ..., v_{j+1}], H̄_j = {h_{i,l}}_{1≤i≤j+1, 1≤l≤j}
9: end for

In fact, right-preconditioned restarted GMRES is equivalent to solving the linear system (A M^{-1}) t = b with restarted GMRES and computing the solution x of the original system A x = b via x = M^{-1} t. When preconditioning is considered, M^{-1} v_j is computed at each iteration, which is equivalent to computing the solution z_j of the linear system M z_j = v_j. Preconditioners can be divided into two classes: explicit and implicit preconditioners. For explicit preconditioners, the preconditioning matrix M is built. Diagonal preconditioning, M = diag(A), incomplete LU factorization (ILU) [8], M = L_inc U_inc, and domain decomposition techniques with exact local solvers [94, 111, 114] are such preconditioners. Implicit preconditioners are solution methods aiming at solving A z_j = v_j approximately. M is never explicitly formed, but it has to be non-variable to be used in GMRES with right preconditioning. Iterative methods like relaxation methods and standard multigrid [115] are such preconditioners.

One may want to use GMRES itself to precondition restarted GMRES; this is not possible with right-preconditioned GMRES. Indeed, the GMRES solution x_m does not depend linearly on the right-hand side b (unlike standard multigrid, for instance), except if x_m satisfies A x_m = b. As a consequence, the solution cannot be computed as in line 11 of Algorithm 1. However, various methods have been developed to use variable operators as preconditioners: this is the class of flexible Krylov methods. These methods allow a different preconditioner at each preconditioning step. The next section describes one of these flexible methods: the flexible variant of GMRES (FGMRES).
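Before moving on to the flexible variant, a compact transcription of Algorithms 1 and 2 with right preconditioning may help fix ideas. The sketch below is ours (dense NumPy, no breakdown handling): an MGS Arnoldi loop followed by the small least-squares problem min ||β e_1 − H̄_m y||. For a fixed M, storing the preconditioned directions z_j = M^{-1} v_j is only a convenience, since then Z_m = M^{-1} V_m and the update matches line 11.

import numpy as np

def gmres_restarted(A, b, M_solve, m=5, tol=1e-6, itermax=200, x0=None):
    """Right-preconditioned restarted GMRES; M_solve(v) applies M^{-1} to v."""
    n = b.size
    x = np.zeros(n, dtype=complex) if x0 is None else x0.astype(complex)
    for _ in range(itermax):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta / np.linalg.norm(b) <= tol:
            return x
        V = np.zeros((n, m + 1), dtype=complex)
        Z = np.zeros((n, m), dtype=complex)       # preconditioned directions M^{-1} v_j
        H = np.zeros((m + 1, m), dtype=complex)   # Hessenberg matrix \bar{H}_m
        V[:, 0] = r / beta
        for j in range(m):                        # Arnoldi with Modified Gram-Schmidt
            Z[:, j] = M_solve(V[:, j])
            w = A @ Z[:, j]
            for i in range(j + 1):
                H[i, j] = np.vdot(V[:, i], w)     # h_{i,j} = v_i^H w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
        c = np.zeros(m + 1, dtype=complex); c[0] = beta
        y, *_ = np.linalg.lstsq(H, c, rcond=None)  # minimize ||c - \bar{H}_m y||
        x = x + Z @ y                              # x_m = x_0 + M^{-1} V_m y_m
    return x

# Example: diagonal (Jacobi) right preconditioning of a diagonally dominant matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 200)) + 50.0 * np.eye(200)
b = rng.standard_normal(200)
x = gmres_restarted(A, b, lambda v: v / np.diag(A), m=10)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))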

2.3 Flexible GMRES

FGMRES is a minimum residual norm subspace method based on the GMRES approach that allows variable preconditioning [99]. We denote by M_j the non-singular matrix that represents the preconditioner at step j of the method. Algorithm 3 depicts the FGMRES(m) method. Starting from an initial guess x_0, it is based on the flexible Arnoldi relation, with Z_m ∈ C^{n×m}, V_{m+1} ∈ C^{n×(m+1)} and the upper Hessenberg matrix H̄_m ∈ C^{(m+1)×m} defined below:

Definition 1. The matrices computed with the FGMRES algorithm [99] satisfy the so-called flexible Arnoldi relation:

A Z_j = V_{j+1} H̄_j,

where Z_j ∈ C^{n×j} and V_{j+1} ∈ C^{n×(j+1)} are such that V_{j+1}^H V_{j+1} = I_{j+1}, and H̄_j ∈ C^{(j+1)×j}.

FGMRES computes an approximation of the solution in a j-dimensional affine space x_0 + Z_j y_j, where y_j ∈ C^j. An approximate solution x_m ∈ C^n is then found by minimizing the residual norm ||b − A(x_0 + Z_m y)|| over the space x_0 + range(Z_m), the corresponding residual being r_m = b − A x_m ∈ C^n with r_m ∈ range(V_{m+1}). A relation similar to the GMRES relation (2.2) is obtained. However, it has to be noted that, on the one hand, FGMRES(m) has a greater memory cost than GMRES(m): the preconditioned solutions must be stored in Z_m, i.e. m additional vectors of length n have to be stored. On the other hand, convergence results for GMRES cannot be extended to FGMRES, since range(Z_m) is a subspace which is not generated by a single fixed matrix. Nevertheless, a breakdown analysis of FGMRES can be found in [99]. These considerations lead us to develop a practical tool for a better understanding of the convergence of FGMRES, based on a spectrum analysis; this is the topic of the next section.

Algorithm 3 Flexible GMRES (FGMRES(m))
1: Choose m > 0, itermax > 0, tol > 0, x_0 ∈ C^n. Let r_0 = b − A x_0, β = ||r_0||, c = [β, 0_{1×m}]^T where c ∈ C^{m+1}, v_1 = r_0/β.
2: for iter = 1, itermax do
3:   Set β = ||r_0||, c = [β, 0_{1×m}]^T and v_1 = r_0/β.
4:   for j = 1, m do
5:     Completion of V_{j+1}, Z_j and H̄_j: apply Algorithm 4 from line 2 to 8 with preconditioning to obtain V_{j+1} ∈ C^{n×(j+1)}, Z_j ∈ C^{n×j} and the upper Hessenberg matrix H̄_j ∈ C^{(j+1)×j} such that A Z_j = V_{j+1} H̄_j with V_{j+1}^H V_{j+1} = I_{j+1}.
6:     Compute y_j = argmin_{y ∈ C^j} ||β e_1 − H̄_j y||;
7:     if ||c − H̄_j y_j|| / ||b|| ≤ tol then
8:       x_j = x_0 + Z_j y_j; stop;
9:     end if
10:  end for
11:  Compute x_m = x_0 + Z_m y_m;
12:  Set x_0 = x_m, r_0 = b − A x_0;
13:  Return to line 2.
14: end for

2.3.1 Spectrum analysis in the Flexible GMRES method

It is known that unpreconditioned GMRES(m) converges for any m when the eigenvalues of the matrix A lie in a convex set, called the field of values, located in a half plane of the complex plane [101]. This property can be partly studied by computing approximations of the extremal eigenvalues of A [38], thanks to Ritz values (λ(H_m)) or harmonic Ritz values (λ(H_m + h_{m+1,m}^2 H_m^{-H} e_m e_m^T)) [55], [9], where H_m = H̄_m(1:m, 1:m) and λ(H_m) denotes the spectrum of H_m. However, in the flexible case, since the Arnoldi relation is A Z_m = V_{m+1} H̄_m (see Definition 1), the Ritz or harmonic Ritz values then do not correspond to approximate eigenvalues of A [53].

Algorithm 4 Flexible Arnoldi process: computation of V_{m+1}, Z_m and H̄_m
1: for j = 1, m do
2:   z_j = M_j^{-1} v_j
3:   w = A z_j
4:   for i = 1, j do
5:     h_{i,j} = w^H v_i
6:     w = w − h_{i,j} v_i
7:   end for
8:   h_{j+1,j} = ||w||, v_{j+1} = w / h_{j+1,j}
9:   Define Z_j = [z_1, ..., z_j], V_{j+1} = [v_1, ..., v_{j+1}], H̄_j = {h_{i,l}}_{1≤i≤j+1, 1≤l≤j}
10: end for

Proposition 1. At the end of a restart of FGMRES, the Ritz or harmonic Ritz values approximate eigenvalues of a certain matrix A' ∈ C^{n×n} which can be expressed as:

A' = A Z_m V_m^H + X V̄^H,

where X is an n×(n−m) matrix and V̄ is an n×(n−m) matrix whose columns span the orthogonal complement of span{V_m}. Note that A' changes at each restart.

Proof. Indeed, A' satisfies the GMRES Arnoldi relation: A' V_m = (A Z_m V_m^H + X V̄^H) V_m = A Z_m, hence A' V_m = V_{m+1} H̄_m, and so the GMRES method applied to A' produces the same iterates as FGMRES applied to A. Furthermore, we note that FGMRES requires neither V̄ nor X for the computation of the solution. We can therefore choose them appropriately for our convergence analysis.

Proposition 2. We propose to choose V̄ = X = [v_{m+1}, V̂] in A', where v_{m+1} is the (m+1)-th column of V_{m+1} and span{V̂} ⊥ span{V_{m+1}}. The spectrum of A', where multiple eigenvalues are not repeated, is the spectrum of H_{m+1} = [H̄_m, e_{m+1}].

Proof. We have

A' = [A Z_m, V̄] [V_m, V̄]^H = [V_{m+1} H̄_m, V̄] [V_m, V̄]^H = [V_{m+1} [H̄_m, e_{m+1}], V̂] [V_m, v_{m+1}, V̂]^H
   = [V_{m+1}, V̂] [H_{m+1}, 0_{(m+1)×(n−m−1)}; 0_{(n−m−1)×(m+1)}, I_{n−m−1}] [V_{m+1}, V̂]^H.

Thus, since H_{m+1} = [H̄_m, e_{m+1}], A' is similar to the matrix

[H_{m+1}, 0_{(m+1)×(n−m−1)}; 0_{(n−m−1)×(m+1)}, I_{n−m−1}],

because [V_{m+1}, V̂] is orthonormal. The spectrum of A' is then equal to the spectrum of H_{m+1}:

λ(A') = λ(H_{m+1}).

Therefore, a spectrum analysis is possible when considering the matrices A' at each restart. It requires computing the eigenvalues of H_{m+1}. We propose to compute the eigenvalues of H_{m+1} at the end of each restart and to display them all on the same plot. The distribution of these eigenvalues gives information related to the performance of a given flexible preconditioner.

We now test the relevance of this spectrum analysis. When preconditioners of different quality are considered, FGMRES, for the same restart parameter, may need more or fewer iterations to converge, depending on the quality of the preconditioner [108]. We intend to show the correlation between the spectrum distributions and the histories of convergence. The easiest way to generate variable preconditioners of different quality is to use full GMRES with different prescribed numbers of iterations as an inner solver. We denote by m_inner this number of iterations, and full GMRES with a Krylov subspace of size m_inner by GMRES(m_inner). Therefore, we use FGMRES(m) preconditioned by GMRES(m_inner), denoted by FGMRES(m)/GMRES(m_inner). We compute the eigenspectra of H_{m+1} at each restart of FGMRES for all m_inner values. We denote by H^{(i)}_{m+1} the Hessenberg matrix corresponding to the i-th restart and by λ(H^{(i)}_{m+1}(m_inner)) its eigenspectrum corresponding to the inner restart parameter m_inner. Finally, we denote by Λ(H_{m+1}(m_inner)) the union of all λ(H^{(i)}_{m+1}(m_inner)) for i ≥ 1:

Λ(H_{m+1}(m_inner)) = ∪_i λ(H^{(i)}_{m+1}(m_inner)).

This represents the spectrum to be analyzed in our study. We consider one academic test case and one real-life test case from the University of Florida matrix collection [32]. For both test cases, the iterative method is stopped when the normalized residual is below 10^{-6}:

||b − A x_j|| / ||b|| ≤ 10^{-6}.

Example 1: a two-dimensional convection-diffusion problem

We consider a two-dimensional convection-diffusion problem with Dirichlet boundary conditions in the unit square [0, 1]^2 = Ω ∪ ∂Ω with Ω = (0, 1)^2, such that:

−ε Δu + c u_x + d u_y = g in Ω,  u = 1 on ∂Ω.     (2.4)

This problem is discretized with a second-order finite difference scheme for a vertex-centered location of the unknowns. The Péclet condition ([115], Equation (7.1.9)) is satisfied: (h/ε) max(|c|, |d|) = 2, where h = 1/(N − 1) is the mesh size and N the number of points per direction. For the spectrum study, we consider a grid of mesh size h = 1/256 with c = d = 512 and ε = 1. The matrix is real, sparse and non-symmetric. The right-hand side is b = A e, where e is a vector of ones. We consider FGMRES(5) as the outer solver and five values of m_inner: 1, 2, 3, 4, 5. Histories of convergence are plotted in Figure 2.1. Each symbol on the convergence curves corresponds to one application of the variable preconditioner. We first notice that the value of m_inner has a direct impact on the preconditioner quality: a large value of m_inner implies a smaller number of iterations for FGMRES(5). Then, looking at the spectra in Figure 2.2, we remark that the better the quality of the preconditioner, the larger the minimum value of Λ(H_{m+1}(m_inner)) on the real axis. Therefore, for this model problem, there is a correlation between the quality of the inner preconditioner and the distribution of Λ(H_{m+1}(m_inner)).
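The spectrum analysis above is easy to reproduce. The following sketch is our own code: it runs FGMRES(m) preconditioned by inner full GMRES(m_inner) and accumulates Λ(H_{m+1}) = ∪_i λ([H̄_m, e_{m+1}]) over the restarts, as in Proposition 2. The demo assembles a simplified centered-difference variant of problem (2.4) on a much coarser grid than in the text, so the iteration counts are not those of Figures 2.1 and 2.2.

import numpy as np
import scipy.sparse as sp

def arnoldi_flexible(A, prec, r0, m):
    """One cycle of the flexible Arnoldi process (Algorithm 4):
    returns V_{m+1}, Z_m, \bar{H}_m with A Z_m = V_{m+1} \bar{H}_m."""
    n = r0.size
    V = np.zeros((n, m + 1), dtype=complex)
    Z = np.zeros((n, m), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        Z[:, j] = prec(V[:, j])            # variable preconditioner M_j^{-1} v_j
        w = A @ Z[:, j]
        for i in range(j + 1):
            H[i, j] = np.vdot(V[:, i], w)
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, Z, H

def fgmres_spectrum(A, b, prec, m=5, tol=1e-6, itermax=500):
    """FGMRES(m) that also accumulates Lambda(H_{m+1}) over the restarts."""
    x = np.zeros(b.size, dtype=complex)
    spectrum = []
    for _ in range(itermax):
        r = b - A @ x
        if np.linalg.norm(r) / np.linalg.norm(b) <= tol:
            break
        V, Z, H = arnoldi_flexible(A, prec, r, m)
        c = np.zeros(m + 1, dtype=complex); c[0] = np.linalg.norm(r)
        y, *_ = np.linalg.lstsq(H, c, rcond=None)
        x = x + Z @ y
        e = np.zeros((m + 1, 1)); e[m, 0] = 1.0          # e_{m+1}
        spectrum.extend(np.linalg.eigvals(np.hstack([H, e])))  # H_{m+1} = [\bar{H}_m, e_{m+1}]
    return x, np.array(spectrum)

def inner_gmres(A, m_inner):
    """Full GMRES(m_inner) used as a (variable) preconditioner."""
    def apply(v):
        V, Z, H = arnoldi_flexible(A, lambda u: u, v, m_inner)
        c = np.zeros(m_inner + 1, dtype=complex); c[0] = np.linalg.norm(v)
        y, *_ = np.linalg.lstsq(H, c, rcond=None)
        return Z @ y
    return apply

# Demo: centered-difference convection-diffusion on a 32 x 32 interior grid,
# with the convection coefficient chosen so that h*c/eps = 2 (Peclet condition).
N = 32; h = 1.0 / (N + 1); eps = 1.0; cc = 2.0 * eps / h
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N)) / h**2
Dx = sp.diags([-1.0, 1.0], [-1, 1], shape=(N, N)) / (2 * h)
I = sp.identity(N)
A = (eps * (sp.kron(I, T) + sp.kron(T, I)) + cc * (sp.kron(I, Dx) + sp.kron(Dx, I))).tocsr()
b = A @ np.ones(N * N)
x, lam = fgmres_spectrum(A, b, inner_gmres(A, 3), m=5)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b), lam.real.min())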

Example 2: a three-dimensional Navier-Stokes problem

We now consider a matrix from the FIDAP group in the University of Florida collection, the ex11 matrix¹. This matrix is real, sparse and non-symmetric. Its dimension is 16,614 and it has 1,096,948 non-zero entries. It models a three-dimensional fully coupled Navier-Stokes problem. As advised in [108], we use a diagonal preconditioner for the inner GMRES. The right-hand side is b = A e, where e is a vector of ones. We perform the same tests as for the convection-diffusion problem: we consider FGMRES(5) as the outer solver and five values of m_inner: 1, 2, 3, 4, 5. Histories of convergence are plotted in Figure 2.3. Increasing m_inner tends to decrease the number of iterations significantly. However, a different behavior can be observed for m_inner = 2. Indeed, although FGMRES(5)/GMRES(2) converges faster than FGMRES(5)/GMRES(1) at first, a large plateau appears close to convergence. Such behaviors have already been remarked in [39], yet that analysis was done for small restart parameters and GMRES without preconditioning. Our spectrum analysis gives information about this behavior in a more general framework. Indeed, looking at Figure 2.4, for m_inner = 2 the minimum value of Λ(H_{m+1}(2)) on the real axis is negative, whereas the one related to Λ(H_{m+1}(1)) is positive. It seems then that GMRES applied to a matrix with a spectrum distribution such as Λ(H_{m+1}(2)) would converge more slowly than when applied to a matrix with a spectrum distribution such as Λ(H_{m+1}(1)). For the other values of m_inner, we remark that the better the quality of the preconditioner, the larger the minimum value on the real axis of Λ(H_{m+1}(m_inner)) and the fewer the eigenvalues close to zero.

Thus, this spectrum analysis can give some indication of why a preconditioner is efficient or not, by looking at the matrices H_{m+1} along the restarts. Indeed, if the minimal real part of Λ(H_{m+1}) is positive, or if its maximal real part is negative, and far from the origin, the preconditioner may improve the convergence. Notwithstanding, if the spectrum of H_{m+1} has values with a negative real part and values with a positive real part, convergence may be slow even if preconditioning is performed. This study points out once again the practical importance of the spectrum distribution for Krylov methods, even if a flexible preconditioner is used. However, as in the non-flexible case, this result has to be balanced with the theoretical result by Greenbaum, Ptak and Strakos [56]: any convergence curve can be generated by GMRES applied to a matrix having any desired eigenvalues. Nevertheless, since the right-hand side is fixed, it remains a useful tool to understand the convergence of FGMRES in more detail. Besides, this approximate spectral information can be computed cheaply from the Hessenberg matrix (Ritz, harmonic Ritz vectors). It could even be used to improve the convergence properties of FGMRES. This has already been realized for GMRES with GMRES-DR [84], which preserves spectral information from one restart to the next. Thus, we propose to extend such a technique to the flexible case. Therefore, after describing GMRES-DR in the next section, we will present the flexible variant of the GMRES-DR method: FGMRES-DR [53].

¹ http://

Figure 2.1: Histories of convergence for the convection-diffusion problem of FGMRES(5) preconditioned by full GMRES($m_{inner}$) for different values of $m_{inner}$.
Figure 2.2: Plot of $\Lambda(H_{m+1}(m_{inner}))$ for the convection-diffusion problem, with FGMRES(5) preconditioned by full GMRES($m_{inner}$) for different values of $m_{inner}$.

Figure 2.3: Histories of convergence for the FIDAP-ex11 matrix of FGMRES(5) preconditioned by a diagonally preconditioned full GMRES($m_{inner}$) for different values of $m_{inner}$.
Figure 2.4: Plot of $\Lambda(H_{m+1}(m_{inner}))$ for the FIDAP-ex11 matrix, with FGMRES(5) preconditioned by a diagonally preconditioned full GMRES($m_{inner}$) for different values of $m_{inner}$.

2.4 GMRES with deflated restarting
Krylov subspace methods with standard restarting implement a scheme where the maximal dimension of the approximation subspace is fixed ($m$ here). After $m$ steps, the method is then restarted, in order to control both the memory requirements and the computational cost of the orthogonalization scheme of the method. In the case of GMRES($m$) it means in practice that the orthonormal basis is thrown away after $m$ steps. Since some information is discarded at the restart, the convergence is expected to be slower compared to full GMRES. Nevertheless, more sophisticated procedures have been proposed to enhance the convergence properties of restarted Krylov subspace methods. Basically these methods fall into the category of augmented or deflated methods, and we refer the reader to [109, Sections 8 and 9] for a review and detailed references. In this section we focus on GMRES with deflated restarting, and more particularly on one of those methods, referred to as GMRES-DR [84]. This method aims at using spectral information at a restart mainly to improve the convergence of restarted GMRES. A subspace of dimension $k$ (with $k < m$) spanned by harmonic Ritz vectors (and not only the approximate solution with minimum residual norm) is retained in this restarting scheme. Property 1 describes how this subspace of dimension $k$ is obtained in GMRES with deflated restarting, when a fixed right preconditioning matrix noted $M$ is considered. Before introducing the principle of GMRES-DR, we recall the definition of a harmonic Ritz pair [91, 110], since this notion plays an important role when considering deflated restarting.
Definition 2. Harmonic Ritz pair. Consider a subspace $\mathcal{U}$ of $\mathbb{C}^n$. Given a matrix $B \in \mathbb{C}^{n \times n}$, $\lambda \in \mathbb{C}$ and $y \in \mathcal{U}$, $(\lambda, y)$ is a harmonic Ritz pair of $B$ with respect to $\mathcal{U}$ if and only if
$$By - \lambda y \perp B\,\mathcal{U}$$
or equivalently, for the canonical scalar product,
$$\forall w \in \mathrm{range}(B\,\mathcal{U}), \quad w^H (By - \lambda y) = 0.$$
We call $y$ a harmonic Ritz vector associated with the harmonic Ritz value $\lambda$.
Property 1. GMRES with deflated restarting relies on the computation of $k$ harmonic Ritz vectors $Y_k = V_m G_k$ of $A M^{-1} V_m V_m^H$ with respect to $\mathrm{range}(V_m)$, with $Y_k \in \mathbb{C}^{n \times k}$ and $G_k \in \mathbb{C}^{m \times k}$.
Proof. Let us denote $Y_k = [y_1, \ldots, y_k]$ and $G_k = [g_1, \ldots, g_k]$. Since $y_j = V_m g_j$ is a harmonic Ritz vector of $A M^{-1} V_m V_m^H$ with respect to $\mathrm{range}(V_m)$, the following relation holds (see Definition 2):
$$(A M^{-1} V_m V_m^H V_m)^H (A M^{-1} V_m V_m^H y_j - \lambda_j y_j) = 0 \quad (2.5)$$
which is equivalent to
$$(A M^{-1} V_m)^H (A M^{-1} V_m g_j - \lambda_j V_m g_j) = 0. \quad (2.6)$$
Thanks to the Arnoldi relation $A M^{-1} V_m = V_{m+1} \bar{H}_m$ we deduce
$$\bar{H}_m^H \left( \bar{H}_m g_j - \lambda_j \begin{bmatrix} g_j \\ 0 \end{bmatrix} \right) = 0. \quad (2.7)$$
Since $\bar{H}_m \in \mathbb{C}^{(m+1) \times m}$ has the following form
$$\bar{H}_m = \begin{bmatrix} H_m \\ h_{m+1,m}\, e_m^T \end{bmatrix}$$
where $H_m \in \mathbb{C}^{m \times m}$ is supposed to be non-singular, the eigenvalue problem then becomes
$$(H_m + h_{m+1,m}^2 H_m^{-H} e_m e_m^T)\, g_j = \lambda_j g_j \quad (2.8)$$
which corresponds to the formulation originally proposed by Morgan [84].
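As an illustration of Property 1 (a small Python/NumPy check on synthetic Hessenberg data, not part of the original development), the two formulations (2.7) and (2.8) of the harmonic Ritz problem can be verified to produce the same spectrum:

import numpy as np
import scipy.linalg as sla

m = 8
Hbar = np.triu(np.random.rand(m + 1, m), -1)      # random (m+1) x m Hessenberg
Hm, h = Hbar[:m, :], Hbar[m, m - 1]
em = np.zeros(m); em[m - 1] = 1.0

# Formulation (2.7): Hbar^H (Hbar g - lambda [g; 0]) = 0, a generalized problem
E = np.vstack([np.eye(m), np.zeros((1, m))])
lam1, G1 = sla.eig(Hbar.conj().T @ Hbar, Hbar.conj().T @ E)

# Formulation (2.8): (H_m + h^2 H_m^{-H} e_m e_m^T) g = lambda g
Hmod = Hm + h**2 * np.linalg.solve(Hm.conj().T, np.outer(em, em))
lam2, G2 = np.linalg.eig(Hmod)

print(np.sort_complex(lam1))                       # the two spectra coincide
print(np.sort_complex(lam2))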

Next, the QR factorization of the following $(m+1) \times (k+1)$ matrix
$$\left[ \begin{bmatrix} G_k \\ 0_{1 \times k} \end{bmatrix}, \; c - \bar{H}_m y \right] \quad \text{with} \quad r_0 = V_{m+1} (c - \bar{H}_m y)$$
is performed, where $c \in \mathbb{C}^{m+1}$ and $y \in \mathbb{C}^m$. This allows one to compute new matrices $V_{k+1}^{new} \in \mathbb{C}^{n \times (k+1)}$ and $\bar{H}_k^{new} \in \mathbb{C}^{(k+1) \times k}$ such that
$$A M^{-1} V_k^{new} = V_{k+1}^{new} \bar{H}_k^{new}, \quad (V_{k+1}^{new})^H V_{k+1}^{new} = I_{k+1}, \quad \mathrm{range}([Y_k, r_0]) = \mathrm{range}(V_{k+1}^{new})$$
where $\bar{H}_k^{new}$ is a $(k+1) \times k$ rectangular matrix. GMRES-DR then carries out $m - k$ Arnoldi steps with fixed preconditioning and starting vector $v_{k+1}^{new}$ to eventually build $V_{m+1}$ and $\bar{H}_m$. At the end of the GMRES cycle with deflated restarting we have a final relation similar to the Arnoldi relation (2.1), with $V_{m+1} \in \mathbb{C}^{n \times (m+1)}$ and $\bar{H}_m \in \mathbb{C}^{(m+1) \times m}$:
$$A M^{-1} V_m = V_{m+1} \bar{H}_m \quad \text{with} \quad V_{m+1}^H V_{m+1} = I_{m+1}$$
where $\bar{H}_m$ is no longer upper Hessenberg after the first cycle. An approximate solution $x_m \in \mathbb{C}^n$ is then found by minimizing the residual norm $\|b - A(x_0 + M^{-1} V_m y)\|$ over the space $x_0 + M^{-1} \mathrm{range}(V_m)$, the corresponding residual being $r_m = b - A x_m \in \mathbb{C}^n$ with $r_m \in \mathrm{range}(V_{m+1})$. An optimality property is thus also obtained. We refer the reader to [84, 98] for further comments on the algorithm and computational details. This approach has proved efficient on many academic examples [84]. We note that GMRES with deflated restarting is equivalent to GMRES with eigenvectors [82] and to implicitly restarted GMRES [83]. Details of the method are given in Algorithms 5 and 6 respectively. GMRES-DR($m$, $k$) requires only $m - k$ matrix-vector products and preconditioning operations per cycle, while GMRES($m$) needs $m$. Finally, we note that Krylov subspace methods with deflated restarting have been developed exclusively in the case of a fixed preconditioner. In Section 2.5 we extend the GMRES-DR method to the case of variable preconditioning.
Algorithm 5 Right-preconditioned GMRES with deflated restarting: GMRES-DR($m$, $k$)
1: Initialization: Choose $m > 0$, $k > 0$, $tol > 0$, $x_0 \in \mathbb{C}^n$. Let $r_0 = b - A x_0$; $\beta = \|r_0\|$, $c = [\beta, 0_{1 \times m}]^T \in \mathbb{C}^{m+1}$, $v_1 = r_0 / \beta$.
2: Computation of $V_{m+1}$ and $\bar{H}_m$: Apply $m$ steps of the Arnoldi procedure (Algorithm 2) with right preconditioning to obtain $V_{m+1} \in \mathbb{C}^{n \times (m+1)}$ and the upper Hessenberg matrix $\bar{H}_m \in \mathbb{C}^{(m+1) \times m}$ such that: $A M^{-1} V_m = V_{m+1} \bar{H}_m$ with $V_{m+1}^H V_{m+1} = I_{m+1}$.
Loop
3: Minimum norm solution: Compute the minimum norm solution $x_m \in \mathbb{C}^n$ in the affine space $x_0 + M^{-1} \mathrm{range}(V_m)$; that is, $x_m = x_0 + M^{-1} V_m y$ where $y = \mathrm{argmin}_{y \in \mathbb{C}^m} \|c - \bar{H}_m y\|$. Set $x_0 = x_m$ and $r_0 = b - A x_0$.
4: Check the convergence criterion: if $\|c - \bar{H}_m y\| / \|b\| \leq tol$, exit.
5: Computation of $V_{k+1}^{new}$ and $\bar{H}_k^{new}$: see Algorithm 6. At the end of this step the following relations hold: $A M^{-1} V_k^{new} = V_{k+1}^{new} \bar{H}_k^{new}$ with $(V_{k+1}^{new})^H V_{k+1}^{new} = I_{k+1}$ and $r_0 \in \mathrm{range}(V_{k+1}^{new})$.
6: Arnoldi procedure: Set $V_{k+1} = V_{k+1}^{new}$, $\bar{H}_k = \bar{H}_k^{new}$ and apply $(m - k)$ steps of the Arnoldi procedure with right preconditioning and starting vector $v_{k+1}$ to build $V_{m+1} \in \mathbb{C}^{n \times (m+1)}$ and $\bar{H}_m \in \mathbb{C}^{(m+1) \times m}$ such that: $A M^{-1} V_m = V_{m+1} \bar{H}_m$ with $V_{m+1}^H V_{m+1} = I_{m+1}$.
7: Setting: Set $c = V_{m+1}^H r_0$.
End of loop
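The restart mechanism of step 5 (detailed in Algorithm 6 below) can be sketched in a few lines of Python/NumPy. This is a hedged illustration on data already satisfying an Arnoldi relation, and the choice of keeping the harmonic Ritz values smallest in modulus is one common option, not prescribed by the algorithm itself:

import numpy as np

def deflated_restart(V, Hbar, c_minus_Hy, k):
    """Given A M^{-1} V_m = V_{m+1} Hbar_m, return V_new (n x (k+1)) and
    Hbar_new ((k+1) x k) with A M^{-1} V_new[:, :k] = V_new @ Hbar_new."""
    m = Hbar.shape[1]
    Hm, h = Hbar[:m, :], Hbar[m, m - 1]
    em = np.zeros(m); em[m - 1] = 1.0
    # k harmonic Ritz eigenvectors of H_m + h^2 H_m^{-H} e_m e_m^T  (eq. 2.8)
    lam, G = np.linalg.eig(Hm + h**2 * np.linalg.solve(Hm.conj().T,
                                                       np.outer(em, em)))
    Gk = G[:, np.argsort(np.abs(lam))[:k]]        # keep k smallest in modulus
    # augment with the residual direction and orthonormalize (Algorithm 6)
    Gk1 = np.column_stack([np.vstack([Gk, np.zeros((1, k))]), c_minus_Hy])
    P, _ = np.linalg.qr(Gk1)                      # P_{k+1}, of size (m+1) x (k+1)
    V_new = V @ P                                 # dense BLAS-3 product only
    H_new = P.conj().T @ Hbar @ P[:m, :k]         # P_{k+1}^H Hbar_m P_k
    return V_new, H_new

No matrix-vector product with A is needed: the new relation is recovered purely from dense products with the small factors, which is the point of the construction.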

Algorithm 6 GMRES-DR($m$, $k$): computation of $V_{k+1}^{new}$ and $\bar{H}_k^{new}$
1: Input: $A$, $V_{m+1}$ such that $A M^{-1} V_m = V_{m+1} \bar{H}_m$, and $c - \bar{H}_m y$ such that $r_0 = V_{m+1} (c - \bar{H}_m y)$.
2: Settings: Define $h_{m+1,m} = \bar{H}_m(m+1, m)$ and $H_m \in \mathbb{C}^{m \times m}$ as $H_m = \bar{H}_m(1{:}m, 1{:}m)$.
3: Compute $k$ harmonic Ritz vectors: Compute $k$ independent eigenvectors $g_i$ of the matrix $H_m + h_{m+1,m}^2 H_m^{-H} e_m e_m^T$. Set $G_k = [g_1, \ldots, g_k] \in \mathbb{C}^{m \times k}$.
4: Augmentation of $G_k$: Define $G_{k+1} \in \mathbb{C}^{(m+1) \times (k+1)}$ as
$$G_{k+1} = \left[ \begin{bmatrix} G_k \\ 0_{1 \times k} \end{bmatrix}, \; c - \bar{H}_m y \right].$$
5: Orthonormalization of the columns of $G_{k+1}$: Perform a QR factorization of $G_{k+1}$ as $G_{k+1} = P_{k+1} \Gamma_{k+1}$. Define $P_k \in \mathbb{C}^{m \times k}$ as $P_k = P_{k+1}(1{:}m, 1{:}k)$.
6: Settings and final relation: Set $V_{k+1}^{new} = V_{m+1} P_{k+1}$ and $\bar{H}_k^{new} = P_{k+1}^H \bar{H}_m P_k$. At the end of this step the following relations are satisfied:
$$A M^{-1} V_m P_k = V_{m+1} P_{k+1} P_{k+1}^H \bar{H}_m P_k; \quad \text{i.e.,} \quad A M^{-1} V_k^{new} = V_{k+1}^{new} \bar{H}_k^{new}$$
where $\bar{H}_k^{new}$ is generally a dense matrix.
2.5 Flexible GMRES with deflated restarting
In this section we present the new subspace method that allows deflated restarting and variable preconditioning simultaneously. We suppose that a flexible Arnoldi relation holds ($A Z_m = V_{m+1} \bar{H}_m$) and analyze one cycle of this method.
Analysis of a cycle
We now discuss the two main points related to the extension of GMRES-DR to a flexible setting: what harmonic Ritz information is recovered at the restart, and is it still possible, as in GMRES-DR, to restart the flexible Arnoldi relation at low computational cost? Both questions will be answered in this section.
Harmonic Ritz formulation
Property 2 presents the harmonic Ritz formulation used in the flexible variant of GMRES with deflated restarting. It is a straightforward adaptation of Property 1 to the case of flexible preconditioning.
Property 2. Flexible GMRES with deflated restarting relies on the computation of $k$ harmonic Ritz vectors $Y_k = V_m G_k$ of $A Z_m V_m^H$ with respect to $\mathrm{range}(V_m)$, with $Y_k \in \mathbb{C}^{n \times k}$ and $G_k \in \mathbb{C}^{m \times k}$ respectively.
Proof. Following Definition 2, each harmonic Ritz pair $(\lambda_k, V_m g_k)$ satisfies the following relation:
$$\forall w \in \mathrm{range}(A Z_m V_m^H V_m), \quad w^H (A Z_m V_m^H V_m g_k - \lambda_k V_m g_k) = 0,$$
or equivalently, since $V_m^H V_m = I_m$,
$$\forall w \in \mathrm{range}(A Z_m), \quad w^H (A Z_m g_k - \lambda_k V_m g_k) = 0, \quad (2.9)$$
where $\lambda_k$ denotes the harmonic Ritz value associated with $V_m g_k$. Exploiting the flexible Arnoldi relation $A Z_m = V_{m+1} \bar{H}_m$ leads to the following eigenvalue problem
$$\bar{H}_m^H \left( \bar{H}_m g_j - \lambda_j \begin{bmatrix} g_j \\ 0 \end{bmatrix} \right) = 0$$
or equivalently
$$(H_m + h_{m+1,m}^2 H_m^{-H} e_m e_m^T)\, g_j = \lambda_j g_j \quad (2.10)$$

which is the same as in GMRES with deflated restarting (see relation (2.8)). Due to relation (2.9) we also note that the harmonic residual vectors $A Z_m V_m^H V_m g_k - \lambda_k V_m g_k \in \mathrm{range}(V_{m+1})$ are orthogonal to a subspace of dimension $m$ spanned by the columns of $A Z_m$.
In Lemma 1 we detail a useful relation satisfied by the harmonic Ritz vectors.
Lemma 1. In flexible GMRES with deflated restarting, the harmonic Ritz vectors are given by $Y_k = V_m G_k$ with corresponding harmonic Ritz values $\lambda_k$. $G_k \in \mathbb{C}^{m \times k}$ satisfies the following relation:
$$A Z_m G_k = V_{m+1} \left[ \begin{bmatrix} G_k \\ 0_{1 \times k} \end{bmatrix}, \; \rho_m \right] \begin{bmatrix} \mathrm{diag}(\lambda_1, \ldots, \lambda_k) \\ \alpha_{1 \times k} \end{bmatrix} \quad (2.11)$$
where $\rho_m \in \mathbb{C}^{m+1}$ is such that $r_0 = V_{m+1} \rho_m = V_{m+1} (c - \bar{H}_m y)$ and $\alpha_{1 \times k} = [\alpha_1, \ldots, \alpha_k] \in \mathbb{C}^{1 \times k}$.
Proof. The harmonic residual vectors $A Z_m V_m^H V_m g_i - \lambda_i V_m g_i$ and the residual vector $r_0$ all reside in a subspace of dimension $m+1$ (spanned by the columns of $V_{m+1}$) and are orthogonal to the same subspace of dimension $m$ (spanned by the columns of $A Z_m$, a subspace of $\mathrm{range}(V_{m+1})$), so they must be collinear. Consequently there exist $k$ coefficients noted $\alpha_i \in \mathbb{C}$ with $1 \leq i \leq k$ such that
$$\forall i \in \{1, \ldots, k\}, \quad A Z_m g_i - \lambda_i V_m g_i = \alpha_i r_0 = \alpha_i V_{m+1} \rho_m. \quad (2.12)$$
Setting $\alpha_{1 \times k} = [\alpha_1, \ldots, \alpha_k] \in \mathbb{C}^{1 \times k}$, the collinearity expression (2.12) can be written in matrix form
$$A Z_m G_k = V_{m+1} \left[ \begin{bmatrix} G_k \\ 0_{1 \times k} \end{bmatrix}, \; \rho_m \right] \begin{bmatrix} \mathrm{diag}(\lambda_1, \ldots, \lambda_k) \\ \alpha_{1 \times k} \end{bmatrix}.$$
Flexible Arnoldi relation
Let us further denote by $G_k = P_k \Gamma_k$ the QR factorization of $G_k$, where $P_k \in \mathbb{C}^{m \times k}$ has orthonormal columns and $\Gamma_k \in \mathbb{C}^{k \times k}$ is a non-singular upper triangular matrix. We denote by $G_{k+1} \in \mathbb{C}^{(m+1) \times (k+1)}$ the following matrix that appears in Lemma 1:
$$G_{k+1} = \left[ \begin{bmatrix} G_k \\ 0_{1 \times k} \end{bmatrix}, \; \rho_m \right]. \quad (2.13)$$
Proposition 3 shows that a flexible Arnoldi relation can be recovered at low computational cost when restarting with some harmonic information; i.e., without involving any matrix-vector product with $A$ as in [23].
Proposition 3. At each restart of flexible GMRES with deflated restarting, the flexible Arnoldi relation
$$A Z_k^{new} = V_{k+1}^{new} \bar{H}_k^{new}$$
holds with $Z_k^{new} = Z_m P_k$, $V_{k+1}^{new} = V_{m+1} P_{k+1}$ and $\bar{H}_k^{new} = P_{k+1}^H \bar{H}_m P_k$.
Proof. After orthogonalization of the vector $\rho_m$ against the columns of $\begin{bmatrix} P_k \\ 0_{1 \times k} \end{bmatrix}$ we obtain the unit norm vector $p_{k+1} \in \mathbb{C}^{m+1}$ that satisfies
$$p_{k+1} = \tilde{p}_{k+1} / \|\tilde{p}_{k+1}\| \quad \text{with} \quad \tilde{p}_{k+1} = \rho_m - \begin{bmatrix} P_k \\ 0_{1 \times k} \end{bmatrix} \begin{bmatrix} P_k \\ 0_{1 \times k} \end{bmatrix}^H \rho_m.$$

We note $a = \|\tilde{p}_{k+1}\|$ and define $u_{k \times 1} \in \mathbb{C}^k$ as
$$u_{k \times 1} = \begin{bmatrix} P_k \\ 0_{1 \times k} \end{bmatrix}^H \rho_m, \quad \text{so that} \quad \rho_m = \left[ \begin{bmatrix} P_k \\ 0_{1 \times k} \end{bmatrix}, \; p_{k+1} \right] \begin{bmatrix} u_{k \times 1} \\ a \end{bmatrix}.$$
Consequently the QR factorization $G_{k+1} = P_{k+1} \Gamma_{k+1}$ can be written as
$$\left[ \begin{bmatrix} G_k \\ 0_{1 \times k} \end{bmatrix}, \; \rho_m \right] = \left[ \begin{bmatrix} P_k \\ 0_{1 \times k} \end{bmatrix}, \; p_{k+1} \right] \begin{bmatrix} \Gamma_k & u_{k \times 1} \\ 0_{1 \times k} & a \end{bmatrix}.$$
From relation (2.11) of Lemma 1 we deduce
$$A Z_m P_k = V_{m+1} P_{k+1} \Gamma_{k+1} \begin{bmatrix} \mathrm{diag}(\lambda_1, \ldots, \lambda_k) \\ \alpha_{1 \times k} \end{bmatrix} \Gamma_k^{-1}. \quad (2.14)$$
Using the flexible Arnoldi relation $A Z_m = V_{m+1} \bar{H}_m$ and $P_{k+1}^H P_{k+1} = I_{k+1}$ we obtain
$$P_{k+1}^H \bar{H}_m P_k = \Gamma_{k+1} \begin{bmatrix} \mathrm{diag}(\lambda_1, \ldots, \lambda_k) \\ \alpha_{1 \times k} \end{bmatrix} \Gamma_k^{-1}.$$
If we denote $Z_k^{new} = Z_m P_k$, $V_{k+1}^{new} = V_{m+1} P_{k+1}$ and
$$\bar{H}_k^{new} = \Gamma_{k+1} \begin{bmatrix} \mathrm{diag}(\lambda_1, \ldots, \lambda_k) \\ \alpha_{1 \times k} \end{bmatrix} \Gamma_k^{-1} = P_{k+1}^H \bar{H}_m P_k,$$
Equation (2.14) can be written as the flexible Arnoldi relation $A Z_k^{new} = V_{k+1}^{new} \bar{H}_k^{new}$.
Next, setting $Z_k = Z_k^{new}$, $V_{k+1} = V_{k+1}^{new}$ and $\bar{H}_k = \bar{H}_k^{new}$ respectively, flexible GMRES with deflated restarting then carries out $(m - k)$ Arnoldi steps with flexible preconditioning and starting vector $v_{k+1}$, leading to
$$A Z_m = V_{m+1} \bar{H}_m,$$
where $Z_m \in \mathbb{C}^{n \times m}$, $V_{m+1} \in \mathbb{C}^{n \times (m+1)}$ and $\bar{H}_m \in \mathbb{C}^{(m+1) \times m}$.
Algorithm and computational aspects
Details of flexible GMRES with deflated restarting are given in Algorithms 7 and 8 respectively. We will call this algorithm FGMRES-DR($m$, $k$) and compare this method with both FGMRES($m$) and GMRES-DR($m$, $k$) from a computational and storage point of view.
Computational cost
We summarize in Table 2.1 the main computational costs associated with each generic cycle of FGMRES($m$), GMRES-DR($m$, $k$) and FGMRES-DR($m$, $k$). We have only included the costs proportional to the size of the original problem $n$, which is supposed to be much larger than $m$ and $k$. We denote by $op_A$ and $op_M$ the floating-point operation counts of the matrix-vector product and of the preconditioner application respectively. The main computational differences between FGMRES and FGMRES-DR lie in the calculation of $V_{k+1}$ and $Z_k$. In FGMRES-DR those vectors are computed using dense matrix-matrix operations efficiently implemented in BLAS-3 libraries, while in FGMRES they are obtained through a sequence of matrix-vector products, possibly sparse, depending on the nature of $A$ and the preconditioners.

Algorithm 7 Flexible GMRES with deflated restarting: FGMRES-DR($m$, $k$)
1: Initialization: Choose $m > 0$, $k > 0$, $tol > 0$, $x_0 \in \mathbb{C}^n$. Let $r_0 = b - A x_0$; $\beta = \|r_0\|$, $c = [\beta, 0_{1 \times m}]^T \in \mathbb{C}^{m+1}$, $v_1 = r_0 / \beta$.
2: Computation of $V_{m+1}$, $Z_m$ and $\bar{H}_m$: Apply $m$ steps of the Arnoldi procedure with flexible preconditioning (Algorithm 4) to obtain $V_{m+1} \in \mathbb{C}^{n \times (m+1)}$, $Z_m \in \mathbb{C}^{n \times m}$ and the upper Hessenberg matrix $\bar{H}_m \in \mathbb{C}^{(m+1) \times m}$ such that: $A Z_m = V_{m+1} \bar{H}_m$ with $V_{m+1}^H V_{m+1} = I_{m+1}$.
Loop
3: Minimum norm solution: Compute the minimum norm solution $x_m \in \mathbb{C}^n$ in the affine space $x_0 + \mathrm{range}(Z_m)$; that is, $x_m = x_0 + Z_m y$ where $y = \mathrm{argmin}_{y \in \mathbb{C}^m} \|c - \bar{H}_m y\|$. Set $x_0 = x_m$ and $r_0 = b - A x_0$.
4: Check the convergence criterion: if $\|c - \bar{H}_m y\| / \|b\| \leq tol$, exit.
5: Computation of $V_{k+1}^{new}$, $Z_k^{new}$ and $\bar{H}_k^{new}$: see Algorithm 8. At the end of this step the following relations hold:
$$A Z_k^{new} = V_{k+1}^{new} \bar{H}_k^{new} \quad \text{with} \quad (V_{k+1}^{new})^H V_{k+1}^{new} = I_{k+1} \quad \text{and} \quad r_0 \in \mathrm{range}(V_{k+1}^{new}). \quad (2.15)$$
6: Arnoldi procedure: Set $V_{k+1} = V_{k+1}^{new}$, $Z_k = Z_k^{new}$, $\bar{H}_k = \bar{H}_k^{new}$ and apply $(m - k)$ steps of the Arnoldi procedure with flexible preconditioning and starting vector $v_{k+1}$ to build $V_{m+1} \in \mathbb{C}^{n \times (m+1)}$, $Z_m \in \mathbb{C}^{n \times m}$ and $\bar{H}_m \in \mathbb{C}^{(m+1) \times m}$ such that: $A Z_m = V_{m+1} \bar{H}_m$ with $V_{m+1}^H V_{m+1} = I_{m+1}$.
7: Setting: Set $c = V_{m+1}^H r_0$.
End of loop
Algorithm 8 FGMRES-DR($m$, $k$): computation of $V_{k+1}^{new}$, $Z_k^{new}$ and $\bar{H}_k^{new}$
1: Input: $A$, $Z_m$, $V_{m+1}$ such that $A Z_m = V_{m+1} \bar{H}_m$, and $c - \bar{H}_m y$ such that $r_0 = V_{m+1} (c - \bar{H}_m y)$.
2: Settings: Define $h_{m+1,m} = \bar{H}_m(m+1, m)$ and $H_m \in \mathbb{C}^{m \times m}$ as $H_m = \bar{H}_m(1{:}m, 1{:}m)$.
3: Compute $k$ harmonic Ritz vectors: Compute $k$ independent eigenvectors $g_i$ of the matrix $H_m + h_{m+1,m}^2 H_m^{-H} e_m e_m^T$. Set $G_k = [g_1, \ldots, g_k] \in \mathbb{C}^{m \times k}$.
4: Augmentation of $G_k$: Define $G_{k+1} \in \mathbb{C}^{(m+1) \times (k+1)}$ as
$$G_{k+1} = \left[ \begin{bmatrix} G_k \\ 0_{1 \times k} \end{bmatrix}, \; c - \bar{H}_m y \right]. \quad (2.16)$$
5: Orthonormalization of the columns of $G_{k+1}$: Perform a QR factorization of $G_{k+1}$ as $G_{k+1} = P_{k+1} \Gamma_{k+1}$. Define $P_k \in \mathbb{C}^{m \times k}$ as $P_k = P_{k+1}(1{:}m, 1{:}k)$.
6: Settings and final relation: Set $V_{k+1}^{new} = V_{m+1} P_{k+1}$, $Z_k^{new} = Z_m P_k$ and $\bar{H}_k^{new} = P_{k+1}^H \bar{H}_m P_k$, so that the following relations are satisfied:
$$A Z_m P_k = V_{m+1} P_{k+1} P_{k+1}^H \bar{H}_m P_k; \quad \text{i.e.,} \quad A Z_k^{new} = V_{k+1}^{new} \bar{H}_k^{new} \quad (2.17)$$
where $\bar{H}_k^{new}$ is generally a dense matrix.
For the deflating variants, a further reduction of the total cost is possible. The right-hand side $c$ of the least-squares problem is computed as $c = V_{m+1}^H r_0$, which involves $2n(m+1)$ operations, as shown in Table 2.1. This cost can first be reduced by observing that the residual $r_0$ belongs to the subspace spanned by the columns of $V_{k+1}$; consequently only its first $(k+1)$ entries are non-zero. These quantities can be obtained by computing $V_{k+1}^H r_0$, which requires only $2n(k+1)$ operations. This has notably been investigated in [98]. The calculation of $c$ can be reduced even further, as described in Proposition 4.
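The flexible restart of Algorithm 8 differs from the fixed-preconditioner sketch given after Algorithm 5 only in that the preconditioned basis $Z_m$ is transformed as well; a minimal Python/NumPy fragment, under the same assumptions (inputs already satisfying a flexible Arnoldi relation):

import numpy as np

def flexible_deflated_restart(Z, V, Hbar, Gk, rho):
    """Restart of Algorithm 8: given A Z_m = V_{m+1} Hbar_m, harmonic Ritz
    eigenvectors G_k and rho = c - Hbar_m y, return Z_new, V_new, H_new with
    A Z_new = V_new H_new (Proposition 3), using only dense BLAS-3 products."""
    m, k = Gk.shape
    Gk1 = np.column_stack([np.vstack([Gk, np.zeros((1, k))]), rho])
    P, _ = np.linalg.qr(Gk1)                        # P_{k+1}, (m+1) x (k+1)
    Z_new = Z @ P[:m, :k]                           # Z_m P_k
    V_new = V @ P                                   # V_{m+1} P_{k+1}
    H_new = P.conj().T @ Hbar @ P[:m, :k]           # P_{k+1}^H Hbar_m P_k
    return Z_new, V_new, H_new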

Computation         FGMRES(m)                        GMRES-DR(m,k)                           FGMRES-DR(m,k)
V_m(:,1:k+1)        k op_A + nk(2k+5)                2n(m+1)(k+1)                            2n(m+1)(k+1)
Z_m(:,1:k)          k op_M                           -                                       2nmk
V_m(:,k+2:m+1)      (m-k) op_A + n(m-k)(2m+2k+5)     (m-k)(op_A + op_M) + n(m-k)(2m+2k+5)    (m-k) op_A + n(m-k)(2m+2k+5)
Z_m(:,k+1:m)        (m-k) op_M                       -                                       (m-k) op_M
c                   2n                               2n(m+1)                                 2n(m+1)
Table 2.1: Computational cost of a generic cycle of FGMRES($m$), GMRES-DR($m$, $k$) and FGMRES-DR($m$, $k$).
Proposition 4. The first $(k+1)$ components of the right-hand side $c$ of the next least-squares problem are given by the last column of $\Gamma_{k+1}$, the triangular factor of the QR factorization of the matrix $G_{k+1}$ defined in relation (2.13).
Proof. In Proposition 3 we have shown that $\rho_m = P_{k+1} \begin{bmatrix} u_{k \times 1} \\ a \end{bmatrix}$. Consequently
$$r_0 = V_{m+1} \rho_m = V_{k+1}^{new} \begin{bmatrix} u_{k \times 1} \\ a \end{bmatrix}.$$
Thus the right-hand side of the new least-squares problem is given by
$$c = V_{m+1}^H r_0 = V_{m+1}^H V_{k+1}^{new} \begin{bmatrix} u_{k \times 1} \\ a \end{bmatrix} = \begin{bmatrix} u_{k \times 1} \\ a \\ 0_{(m-k) \times 1} \end{bmatrix}.$$
We note that Proposition 4 holds for both GMRES-DR($m$, $k$) and FGMRES-DR($m$, $k$).
Storage requirements
Regarding storage, we have only included the storage proportional to the size of the original problem $n$, which is supposed to be much larger than $m$ and $k$.
Standard. With this convention FGMRES-DR($m$, $k$) requires the storage of $Z_m$, $V_{m+1}$ and at most $k+1$ additional vectors to store in turn $V_{k+1}^{new}$ and $Z_k^{new}$. Thus FGMRES-DR($m$, $k$) requires the storage of $(2m+k+2)$ vectors of length $n$.
Buffered. If an extra memory block of size buffsize can be allocated, a blocked matrix-matrix product can be implemented to perform $V_{k+1}^{new} = V_{m+1} P_{k+1}$ and $Z_k^{new} = Z_m P_k$, computing these matrices block-row by block-row before overwriting the result in the data structure allocated for $V_{m+1}$ ($Z_m$ respectively). The definition of this block size can be governed by the BLAS-3 performance of the targeted computer.
Economic. A further reduction of storage is still possible. It can indeed be remarked that $Z_k^{new}$ and $V_{k+1}^{new}$ can overwrite $Z_k$ and $V_{k+1}$. This can be accomplished by performing the matrix multiplications $V_{k+1} \leftarrow V_{m+1} P_{k+1}$ and $Z_k \leftarrow Z_m P_k$ of step 6 in Algorithm 8 in place, i.e., within the arrays $V_{m+1}$ and $Z_m$. Here we exploit the fact that multiplications involving triangular factors can be done in place. It is therefore advisable to perform an LU factorization with complete pivoting of $P_{k+1}$ to obtain a very good approximation $\Pi P_{k+1} \Sigma = LU$, and then to perform successively the operations $X \leftarrow XL$ and $X \leftarrow XU$ and the corresponding permutations, e.g. for $X$ being $V$. This approach leads to a storage of $(2m+1)$ vectors of length $n$ only. It clearly saves a lot of memory when $k$ is close to $m$, but may introduce additional round-off errors that can hopefully be monitored by inspecting the quantity $\|\Pi P_{k+1} \Sigma - LU\| / \|P_{k+1}\|$.
Table 2.2 summarizes the storage requirements of both GMRES-DR($m$, $k$) and FGMRES-DR($m$, $k$). We note that the economic variant of FGMRES-DR($m$, $k$) needs the same amount of memory as FGMRES($m$) and that flexible variants require $m$ additional vectors with respect to non-flexible variants.

Strategy     GMRES-DR(m,k)        FGMRES-DR(m,k)
Standard     n(m+k+2)             n(2m+k+2)
Buffered     n(m+1) + buffsize    n(2m+1) + buffsize
Economic     n(m+1)               n(2m+1)
Table 2.2: Storage required for GMRES-DR($m$, $k$) and FGMRES-DR($m$, $k$).
Numerical experiments
In this section we investigate the numerical behavior of the FGMRES-DR($m$, $k$) algorithm on academic problems. We consider sparse matrices in both real and complex arithmetic. All the examples include a detailed comparison with FGMRES($m$). This allows us to show the effect of incorporating the deflation strategy in the flexible preconditioning framework. In the following experiments, the right-hand sides are computed as $b = A \mathbf{1}$ where $\mathbf{1}$ is the vector of appropriate dimension with all components equal to one. A zero initial iterate $x_0$ is considered as an initial guess and the following stopping criterion is used:
$$\frac{\|b - A x_j\|}{\|b\|} \leq tol \quad (2.18)$$
where $j$ represents the step at which the iterations are stopped. The choice of such a small tolerance relies on the fact that the methods have to be restarted to be compared. Indeed, spectral deflation occurs only when the methods are restarted. If a larger convergence threshold were chosen, convergence could occur before restarting and no relevant comparison could be made.
Harwell-Boeing and Matrix Market test problems
In order to illustrate the numerical behavior of FGMRES-DR($m$, $k$), we first consider a few test matrices from the Harwell-Boeing [33] and Matrix Market [13] libraries, so that any reader can reproduce these experiments. The sparse matrices named Sherman4, Saylor4 and Young1c have been chosen. Sherman4 and Saylor4 are real matrices, whereas Young1c is complex-valued. They represent challenging sparse matrices coming from realistic applications (reservoir modeling, acoustics) that are often used to analyze the behavior of numerical algorithms. For these experiments, the preconditioner consists of five steps of preconditioned full GMRES, where the preconditioner is based on an ILU(0) factorization. In the case of Sherman4 only, the inner solver corresponds to five steps of unpreconditioned full GMRES. In Table 2.3, we report the total number of matrix-vector products performed in the inner and outer parts of the solver (Mv) and the total number of dot products (dot) for several flexible methods. We also display the ratios of total memory and total floating-point operations, where the reference is the corresponding quantity of the full FGMRES method; i.e.,
$$r_{ops} = \frac{flops(\text{Krylov solver})}{flops(\text{full FGMRES})} \quad \text{and} \quad r_{mem} = \frac{mem(\text{Krylov solver})}{mem(\text{full FGMRES})}, \quad (2.19)$$
where we assume that the memory allocated for full FGMRES is exactly what is needed to store $Z_j$ and $V_{j+1}$, $j$ being the step at which convergence is achieved. In order to illustrate the possible benefit of using the economic implementation presented above, we consider different combinations of restart parameters and numbers of harmonic Ritz values for the flexible methods. Indeed, the performance of FGMRES-DR(5,3) can be compared with FGMRES(5) if the economic variant is implemented, or with FGMRES(7) if a standard implementation is considered (see Table 2.2). The total amount of floating-point operations spent in matrix-vector products, dot products, preconditioning and basis orthogonalization has been computed for each solution method, excluding however the cost of the ILU(0) factorization, which is identical for each proposed method. We have also included the results for full FGMRES as a reference solution method; i.e., when memory is not constrained.
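For reproducibility, a hedged Python/SciPy setup for the Sherman4 case is sketched below (the file name is an assumption; the matrix can be obtained from the Matrix Market site). Since FGMRES-DR is not available in SciPy, only the FGMRES(5)/GMRES(5) baseline is shown, reusing the fgmres_spectrum routine sketched earlier in this chapter; the tolerance value is likewise an assumption consistent with (2.18):

import numpy as np
import scipy.io
import scipy.sparse as sp

A = sp.csr_matrix(scipy.io.mmread("sherman4.mtx"))   # assumed local file
b = A @ np.ones(A.shape[0])                          # b = A*1 as in the tests
x, spec = fgmres_spectrum(A, b, m=5, m_inner=5, tol=1e-10)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))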

Table 2.3: Performance of FGMRES($m$) and FGMRES-DR($m$, $k$) (rows: FGMRES-DR(5,3), FGMRES(5), FGMRES(7), FGMRES-DR(10,5), FGMRES(10), FGMRES(13) and full FGMRES) on the SHERMAN4, SAYLOR4 and YOUNG1C matrices, for the convergence threshold (2.18); Mv is the total number of matrix-vector products, dot the total number of dot products, and $r_{ops}$ and $r_{mem}$ are the ratios of floating-point operations and memory respectively, the reference method being full FGMRES (see Equation (2.19)).
It can be noticed that flexible methods with deflated restarting enable faster convergence than those with standard restarting. This also results in a faster calculation, since a significant amount of floating-point operations is saved. Moreover, we note that the performance of FGMRES-DR(10,5) in terms of floating-point operations is close to that of full flexible GMRES, especially for the Sherman4 and Saylor4 matrices. These results also highlight the benefit of deflated restarting, as it may lead to important memory savings. FGMRES-DR has also been found efficient on both two-dimensional wave propagation problems (Helmholtz equation with Dirichlet boundary conditions) and three-dimensional electromagnetic problems related to Maxwell's equations. We refer the reader to [53] for further details.
Thus, to improve standard Krylov methods, reusing spectral information can be of great interest to save time and memory when solving a linear system. These improvements could have a deeper impact in the multiple right-hand side case. Indeed, the spectral information would be shared by all linear systems; this will be the topic of future work. In the next section, Krylov methods for multiple right-hand side situations are considered, but without any deflated restarting. A generalization of FGMRES to the block case will be presented, and several strategies benefiting from the multiple right-hand side situation will be investigated. First, the notation and principles of block methods are presented; we then introduce the related state-of-the-art techniques.
2.6 Block Krylov methods
In this section, we consider block Krylov methods for the solution of linear systems with multiple right-hand sides given at once:
$$A X = B, \quad \text{with} \quad A \in \mathbb{C}^{n \times n}, \; B \in \mathbb{C}^{n \times p}, \; X \in \mathbb{C}^{n \times p},$$
where $p$ is the number of right-hand sides. Their principle relies on the same idea as standard Krylov methods, except that the subspace to be considered is a block Krylov subspace:
$$\mathcal{K}_m(A, R_0) = \mathrm{span}\{R_0, A R_0, A^2 R_0, \ldots, A^{m-1} R_0\},$$
where $R_0 = B - A X_0$ is the initial block residual and $X_0$ the initial block iterate. The product $mp$ is then an upper bound on the dimension of the space $\mathcal{K}_m(A, R_0)$, since $m-1$ is the highest degree of the monomials and $p$ is the number of columns of $R_0$. Most Krylov subspace methods for the non-Hermitian case have a block counterpart (block GMRES (BGMRES) [121], block BiCGStab (BBiCGStab) [58] and block QMR [49]).
Principles of block Krylov methods
The idea of block Krylov methods is to solve several linear systems simultaneously in order to save computational time. This idea is justified by at least two reasons. The first one is enabling matrix-vector products

involving several vectors. Indeed, applying the matrix to a block of vectors instead of to each vector independently may reduce, depending on the sparsity of $A$, the number of accesses to memory ([6], [72]). On parallel computers, this may also reduce the number of messages sent by MPI and therefore the latency cost. The second reason lies in the fact that the solution of each linear system is sought in a larger Krylov subspace. In fact, a block Krylov subspace contains all Krylov subspaces generated by each initial residual, $\mathcal{K}_m(A, R_0(:,i))$ for $1 \leq i \leq p$, and all possible linear combinations of the vectors contained in these subspaces. Indeed, block Krylov subspaces can be expressed as
$$\mathcal{K}_m(A, R_0) = \left\{ \sum_{k=0}^{m-1} A^k R_0\, \gamma_k \;:\; \gamma_k \in \mathbb{C}^{p \times p}, \; 0 \leq k \leq m-1 \right\} \subset \mathbb{C}^{n \times p}$$
and as the Cartesian product ($\times$) of the sum of the $p$ Krylov subspaces $\mathcal{B}_m(A, R_0) = \sum_{i=1}^{p} \mathcal{K}_m(A, R_0(:,i))$ [60]:
$$\mathcal{K}_m(A, R_0) = \underbrace{\mathcal{B}_m(A, R_0) \times \cdots \times \mathcal{B}_m(A, R_0)}_{p \text{ times}}.$$
Thus each column $X_m(:,i)$ of the block solution $X_m$ is sought in the space $\mathcal{B}_m(A, R_0)$, whereas the solution obtained with Krylov methods for a single right-hand side is sought in $\mathcal{K}_m(A, R_0(:,i)) \subset \mathcal{B}_m(A, R_0)$. Therefore, block Krylov subspace methods have more information available to obtain the solution of each linear system than the Krylov subspace for a single right-hand side. However, the extra orthogonalization cost of the block Krylov subspace can make these methods more expensive in terms of flops than a Krylov method solving the linear systems one after the other, unless the gain in iteration count is large enough. This is clearly highlighted when considering the operation counts of the block Arnoldi process ([67, 121]) and of the classical Arnoldi process applied $p$ times (see Table 2.4).
Operations           Block Arnoldi cost                                   p times Arnoldi cost
mvp                  2 nnz(A) mp                                          2 nnz(A) mp
Orthogonalization    (4np^2 + np) m(m+1)/2 + (m+1)(5np + 2np^2)           nm(2m+5)p
Table 2.4: Cost of the block Arnoldi and the classical Arnoldi process according to the matrix dimension $n$, its number of non-zero elements nnz($A$), the Krylov subspace restart parameter $m$ and the number of right-hand sides $p$.
The use of block operations in certain steps can accelerate the block methods, but this speed-up essentially depends on the sparsity of $A$. The denser the matrix, the larger the speed-up, since memory accesses to the entries of the matrix are made more efficient by an appropriate use of the memory hierarchy implemented on most modern supercomputers. Nevertheless, as $\mathcal{B}_m(A, R_0)$ may not be a direct sum, it seems natural to improve block Krylov methods by removing from the block Krylov subspaces information that is useless for convergence. This technique is called deflation in what follows.
The first strategy to remove useless information from a block Krylov subspace is initial deflation. It consists in detecting linear dependency in the right-hand side block $B$ and/or in the initial residual block $R_0$ ([59, Section 12] and [72, Section 3.7.2]). This requires computing the numerical rank thanks to a rank-revealing QR factorization [21] or a singular value decomposition [54], according to a certain deflation tolerance [64]. Linear dependency in the block residual can also be detected at each iteration of the block Krylov method. This was first implemented for the symmetric case in block CG [86] and for non-symmetric problems in Lanczos and Arnoldi methods [1, 27]. It has then been extended to GMRES, FOM [97] and GCR [73].
A variant of block GCR with deflation that is cheap in memory is also proposed in [112]; this method builds the block solution using only one column of its block residual (the one with maximal two-norm), keeping only the residual with maximal norm over the residual norms from one iteration to the next. Deflation can also be performed at each initial computation of the residual block when a restarted method is used ([79], [59, Section 14]). The advantage of such methods is that they save some rank-revealing QR factorizations or singular value decompositions and can in some cases be as efficient as methods based on deflation at each iteration.
In the next sections, we will focus on certain methods derived from block flexible GMRES. This choice is governed by the fact that algorithms using a constant preconditioner can easily be deduced from the

variants available for a variable preconditioner. This is also a natural extension of the methods investigated in the single right-hand side case. We will first present block flexible GMRES and then two versions of block FGMRES algorithms with deflation at the restart. We will finally show some numerical experiments.
Block FGMRES
In this section, we present the block flexible GMRES algorithm, a combination of block GMRES [121] and FGMRES [99]. Block GMRES (BGMRES) was presented for the first time in Vital's thesis [121]. Since then, numerous variants have been proposed. We refer to [36, 66, 67, 75, 105, 106, 107] for different variants of block GMRES and to [57, 85] for block GMRES with deflated restarting. However, we will focus on algorithms which stay close to FGMRES (for a single right-hand side). Indeed, a block version of the modified Gram-Schmidt method (MGS) is used as a block Arnoldi process and convergence is detected from the Arnoldi residual. The block MGS orthogonalization scheme is described in Algorithm 9.
Algorithm 9 Flexible block Arnoldi process (MGS implementation): computation of $V_{j+1}$, $Z_j$ and $\bar{H}_j$ for $j \leq m$
1: for $j = 1, \ldots, m$ do
2:   $Z_j = M_j^{-1} V_j$
3:   $W = A Z_j$
4:   for $i = 1, \ldots, j$ do
5:     $H_{i,j} = V_i^H W$
6:     $W = W - V_i H_{i,j}$
7:   end for
8:   Compute the QR decomposition $W = QR$; set $V_{j+1} = Q$, $H_{j+1,j} = R$
9:   Set $H_{i,j} = 0_{p \times p}$ for $i > j+1$
10:  Define $Z_j = [Z_1, \ldots, Z_j]$, $V_{j+1} = [V_1, \ldots, V_{j+1}]$, $\bar{H}_j = (H_{k,l})_{1 \leq k \leq j+1,\, 1 \leq l \leq j}$
11: end for
As in the standard flexible Arnoldi process (Algorithm 4), the flexible block Arnoldi process produces matrices $Z_j \in \mathbb{C}^{n \times jp}$, $V_{j+1} \in \mathbb{C}^{n \times (j+1)p}$ and $\bar{H}_j \in \mathbb{C}^{(j+1)p \times jp}$ which satisfy a flexible Arnoldi relation, for $j \leq m$:
$$A Z_j = V_{j+1} \bar{H}_j. \quad (2.20)$$
Combining the expressions of $W$ in Algorithm 9, we obtain, for all $j$ such that $1 \leq j \leq m$,
$$W = V_{j+1} H_{j+1,j} = A Z_j - \sum_{i=1}^{j} V_i H_{i,j},$$
which can be written as
$$A Z_j = [V_1, V_2, \ldots, V_{j+1}] \begin{bmatrix} H_{1,j} \\ H_{2,j} \\ \vdots \\ H_{j+1,j} \end{bmatrix}.$$
Finally, we generalize this expression for all $j$ such that $1 \leq j \leq m$:
$$A\, [Z_1, \ldots, Z_j] = [V_1, V_2, \ldots, V_{j+1}] \begin{bmatrix} H_{1,1} & H_{1,2} & \cdots & H_{1,j} \\ H_{2,1} & H_{2,2} & \cdots & H_{2,j} \\ 0_{p \times p} & H_{3,2} & \cdots & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ 0_{p \times p} & \cdots & 0_{p \times p} & H_{j+1,j} \end{bmatrix}.$$

Using the definitions of $Z_j$, $V_{j+1}$ and $\bar{H}_j$ given at line 10 of Algorithm 9, we deduce the block flexible Arnoldi relation:
$$A Z_j = V_{j+1} \bar{H}_j.$$
Remark 2. It should be noticed that $\bar{H}_j$ is no longer a Hessenberg matrix but a block Hessenberg matrix. Indeed, its block sub-diagonal consists of blocks of size $p \times p$.
We now present the block flexible GMRES algorithm, which we derive from the algorithm involving a constant preconditioner [105, 121].
Algorithm 10 Block flexible GMRES (BFGMRES($m$))
1: Choose a convergence threshold $tol$, the restart size $m$ and the maximum number of iterations $itermax$.
2: Choose an initial guess $X_0$;
3: Compute the initial block residual $R_0 = B - A X_0$;
4: for $iter = 1, \ldots, itermax$ do
5:   Compute the QR decomposition $R_0 = QT$; set $V_1 = Q$ and $B_j = \begin{bmatrix} T \\ 0_{jp \times p} \end{bmatrix}$, $1 \leq j \leq m$.
6:   for $j = 1, \ldots, m$ do
7:     Completion of $V_{j+1}$, $Z_j$ and $\bar{H}_j$: Apply Algorithm 9 from line 2 to line 10 with flexible preconditioning ($Z_j = M_j^{-1} V_j$, $1 \leq j \leq m$) to obtain $V_{j+1} \in \mathbb{C}^{n \times (j+1)p}$, $Z_j \in \mathbb{C}^{n \times jp}$ and the matrix $\bar{H}_j \in \mathbb{C}^{(j+1)p \times jp}$ such that $A Z_j = V_{j+1} \bar{H}_j$ with $V_{j+1}^H V_{j+1} = I_{(j+1)p}$.
8:     Solve the minimization problem $Y_j = \mathrm{argmin}_{Y \in \mathbb{C}^{jp \times p}} \|B_j - \bar{H}_j Y\|_F$;
9:     if $\|B_j(:,l) - \bar{H}_j Y_j(:,l)\|_2 / \|B(:,l)\|_2 \leq tol$ for all $l$, $1 \leq l \leq p$, then
10:      compute $X_j = X_0 + Z_j Y_j$; stop
11:    end if
12:  end for
13:  Compute $X_m = X_0 + Z_m Y_m$ and $R_m = B - A X_m$;
14:  Set $R_0 = R_m$ and $X_0 = X_m$;
15: end for
In the following propositions, we first derive the relation between the true residual $R_j = B - A X_j$ and the Arnoldi residual $B_j - \bar{H}_j Y_j$ that holds in the block case (Proposition 5). Then, we prove that BFGMRES minimizes the Euclidean norm of each residual (Proposition 6).
Proposition 5. At the end of the restart or at convergence in Algorithm 10, the computed solution $X_j$ and the least-squares solution $Y_j$ satisfy the following block relation for $j$ such that $1 \leq j \leq m$:
$$R_j = B - A X_j = V_{j+1} (B_j - \bar{H}_j Y_j).$$
Proof. We first recall that the initial residual can be written as (see Algorithm 10, line 5) $R_0 = V_{j+1} B_j$. We then deduce the proposed relation using this last equality and the block flexible Arnoldi relation (2.20):
$$B - A (X_0 + Z_j Y_j) = R_0 - V_{j+1} \bar{H}_j Y_j = V_{j+1} (B_j - \bar{H}_j Y_j).$$
Proposition 6. Algorithm 10 minimizes the Euclidean norm of the residual of each linear system.

Proof. This is a direct consequence of Proposition 5 and of properties of the Frobenius norm:
$$\|B - A X_j\|_F^2 = \|B_j - \bar{H}_j Y_j\|_F^2 = \min_{Y \in \mathbb{C}^{jp \times p}} \|B_j - \bar{H}_j Y\|_F^2 = \sum_{l=1}^{p} \min_{Y(:,l) \in \mathbb{C}^{jp}} \|B_j(:,l) - \bar{H}_j Y(:,l)\|_2^2$$
$$= \sum_{l=1}^{p} \min_{Y(:,l) \in \mathbb{C}^{jp}} \|R_0(:,l) - A Z_j Y(:,l)\|_2^2 = \sum_{l=1}^{p} \min_{Y(:,l) \in \mathbb{C}^{jp}} \|B(:,l) - A (X_0(:,l) + Z_j Y(:,l))\|_2^2.$$
This last equality proves the proposition.
Corollary 1. The convergence of BFGMRES is monotone in the Euclidean norm of the residual of each linear system.
Proof. Since
$$\min_{y \in \mathbb{C}^{jp}} \|B_j(:,l) - \bar{H}_j y\|_2 \leq \left\| B_j(:,l) - \bar{H}_j \begin{bmatrix} y_{j-1} \\ 0_{p \times 1} \end{bmatrix} \right\|_2 = \|B_{j-1}(:,l) - \bar{H}_{j-1} y_{j-1}\|_2$$
where $y_{j-1} = \mathrm{argmin}_{y \in \mathbb{C}^{(j-1)p}} \|B_{j-1}(:,l) - \bar{H}_{j-1} y\|_2$, we have, for all $l$ such that $1 \leq l \leq p$,
$$\min_{y \in \mathbb{C}^{jp}} \|B_j(:,l) - \bar{H}_j y\|_2 \leq \min_{y \in \mathbb{C}^{(j-1)p}} \|B_{j-1}(:,l) - \bar{H}_{j-1} y\|_2.$$
Since $\|B(:,l) - A(X_0(:,l) + Z_j y)\|_2 = \|B_j(:,l) - \bar{H}_j y\|_2$, the corollary is proved.
Corollary 2. In Algorithm 10, detecting convergence on the true residual is equivalent, in exact arithmetic, to detecting convergence on the Arnoldi residual:
$$\frac{\|B(:,l) - A X_j(:,l)\|_2}{\|B(:,l)\|_2} \leq tol \;\; \forall l,\, 1 \leq l \leq p \quad \Longleftrightarrow \quad \frac{\|B_j(:,l) - \bar{H}_j Y_j(:,l)\|_2}{\|B(:,l)\|_2} \leq tol \;\; \forall l,\, 1 \leq l \leq p.$$
Proof. This is a direct consequence of Proposition 6.
Remark 3. The stopping criterion in Algorithm 10 (line 9) has been chosen in view of Corollary 2. The Frobenius norm could be used to check convergence ($\|R_j\|_F \leq \sqrt{p}\,\varepsilon$) instead of the Euclidean norm of each residual, since
$$\max_{1 \leq l \leq p} \|R_j(:,l)\|_2^2 \leq \|R_j\|_F^2 \leq p \max_{1 \leq l \leq p} \|R_j(:,l)\|_2^2.$$
Although the Frobenius norm would be convenient for detecting convergence of all systems at once, it can be too crude to detect the convergence of each right-hand side at the right time. Indeed, if one right-hand side converges much earlier than the others, the Frobenius norm cannot detect it. Thus, even simple strategies, like removing converged solutions, cannot be implemented using a Frobenius norm.
Remark 4. In Algorithm 10, the true residual is computed explicitly at each restart, whereas it could be obtained from Proposition 5: $R_m = V_{m+1} (B_m - \bar{H}_m Y_m)$. Indeed, it is usually cheaper to compute $R_m = B - A X_m$ explicitly for a sparse matrix $A$ ($2\,\mathrm{nnz}(A)\,p + np$ operations) than to evaluate $V_{m+1} (B_m - \bar{H}_m Y_m)$ explicitly ($2n(m+1)p^2$ operations).
There exist many applications in the literature for which traditional block methods are very efficient, but these methods are not consistently profitable; floating-point operations and memory have to be considered carefully. In the next section, we derive block methods found efficient on the numerical tests addressed in this thesis.
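One cycle of BFGMRES($m$) built on Algorithm 9 can be sketched as follows (a Python/NumPy illustration; the identity preconditioner and the random test system are assumptions):

import numpy as np

def prec(V):                      # placeholder for the variable M_j^{-1} V_j
    return V.copy()

def bfgmres_cycle(A, B, X0, m):
    n, p = B.shape
    R0 = B - A @ X0
    Q, T = np.linalg.qr(R0)                       # R_0 = Q T
    V = [Q]; Zs = []
    H = np.zeros(((m + 1) * p, m * p), dtype=A.dtype)
    for j in range(m):
        Zj = prec(V[j]); Zs.append(Zj)
        W = A @ Zj
        for i in range(j + 1):                    # block modified Gram-Schmidt
            Hij = V[i].conj().T @ W
            H[i*p:(i+1)*p, j*p:(j+1)*p] = Hij
            W = W - V[i] @ Hij
        Qj, Rj = np.linalg.qr(W)                  # W = V_{j+1} H_{j+1,j}
        H[(j+1)*p:(j+2)*p, j*p:(j+1)*p] = Rj
        V.append(Qj)
    Bm = np.vstack([T, np.zeros((m * p, p))])     # right-hand side of the LS problem
    Y, *_ = np.linalg.lstsq(H, Bm, rcond=None)    # min || B_m - Hbar_m Y ||_F
    return X0 + np.hstack(Zs) @ Y

# usage on a small random system (assumption: a well-conditioned A)
n, p, m = 200, 4, 10
A = np.eye(n) + 0.1 * np.random.rand(n, n)
B = np.random.rand(n, p)
X = bfgmres_cycle(A, B, np.zeros((n, p)), m)
print(np.linalg.norm(B - A @ X) / np.linalg.norm(B))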

Block FGMRES with deflation
There exists an elegant but complex way to introduce deflation during each iteration of block GMRES, due to Robbé and Sadkane [97]. It consists in detecting linear dependency in the block of residuals at each iteration. Of course, this requires additional operations at each iteration, but it can really improve convergence [69] at the same memory cost as BGMRES. However, since small restart parameters are considered in practice for memory reasons, we propose a simpler algorithm implementing deflation at the restart. It consists in detecting linear dependency in the true block residual $B - AX$ at the beginning of each restart. The main ideas of this method are presented in ([59, Section 14], [79]); it is a generalization of initial deflation techniques [72]. Thanks to a small restart parameter $m$, linear dependencies in the block residual can then be detected nearly when they occur. This detection is performed with a rank-revealing QR factorization or an SVD of the block residual. Of course, since exact deflation never occurs in practice, a deflation tolerance has to be selected. This deflation tolerance introduces a numerical error which may badly influence the convergence [59]; the question of its choice will be discussed later in this section.
Block flexible GMRES with deflation (Algorithm 11) introduces deflation at the restart in BFGMRES. This method is a direct adaptation of block FGMRES (Algorithm 10) to the case of deflation. It uses deflation techniques very close to those depicted in [59] for BGMRES and in [79] for BQMR. The deflation is performed thanks to the SVD of the upper triangular factor arising from the QR factorization of the true block residual $B - AX$. It consists in selecting the $p_d$ singular vectors corresponding to the $p_d$ singular values larger than a deflation tolerance $\varepsilon_d \cdot tol$, where $\varepsilon_d \in (0, 1]$ and $tol$ is the convergence threshold for the linear systems. The philosophy behind this process is to detect linear combinations of the columns of the block residual which have converged. The value of $\varepsilon_d$ has to be carefully chosen to guarantee convergence to a tolerance $tol$. To make this choice easier, a "quality of convergence" criterion $\varepsilon_q \in [0, 1]$ is introduced. The parameter $\varepsilon_q$ sets a convergence criterion for the small residual $\rho_j$. We will see that if $\varepsilon_d$ and $\varepsilon_q$ are chosen such that $\varepsilon_d + \varepsilon_q \leq 1$, convergence is guaranteed on the true residual. In order to keep the same scaled stopping criterion as in BFGMRES ($\|B(:,l) - AX(:,l)\|_2 / \|B(:,l)\|_2 \leq tol$, $\forall l \leq p$) and to avoid the scaling in the deflation condition (Algorithm 11, lines 8 and 24), Algorithm 11 deals with scaled right-hand sides and scaled initial solutions (Algorithm 11, line 6). Moreover, in this section, we assume that the singular values in the SVD ($T = U_T \Sigma_T W_T^H$, $T \in \mathbb{C}^{p \times p}$) are sorted in non-increasing order:
$$\Sigma_T(l+1, l+1) \leq \Sigma_T(l, l), \quad \forall l,\; 1 \leq l < p.$$
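The deflation step performed at lines 6 to 9 (and 22 to 25) of Algorithm 11 below can be sketched as follows (a Python/NumPy illustration; the function and variable names are local choices):

import numpy as np

def deflate_residual(R, B_norms, eps_d, tol):
    """R: true block residual (n x p); B_norms: the norms ||B(:,l)||_2.
    Returns V1 (n x p_d) and the SVD factors needed to reconstruct X."""
    Q, T = np.linalg.qr(R / B_norms)       # R D_B^{-1} = Q T
    U, sig, WH = np.linalg.svd(T)          # T = U diag(sig) W^H, sig descending
    p_d = int(np.sum(sig >= eps_d * tol))  # directions that have not converged
    V1 = Q @ U[:, :p_d]                    # starting block of the next cycle
    return V1, U, sig, WH, p_d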

Algorithm 11 Block flexible GMRES with SVD-based deflation (BFGMRESD($m$))
1: In this algorithm, we consider $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{n \times p}$.
2: Choose a convergence threshold $tol$, a deflation criterion $\varepsilon_d$, a convergence criterion $\varepsilon_q$, the restart size $m$ and the maximum number of iterations $itermax$.
3: Choose an initial guess $X_0 \in \mathbb{C}^{n \times p}$;
4: Compute $D_B(l,l) = \|B(:,l)\|_2$ for all $l$ such that $1 \leq l \leq p$;
5: Compute the initial block residual $R_0 = B - A X_0$;
6: Compute the QR decomposition $R_0 D_B^{-1} = QT$;
7: Compute the SVD of $T$: $T = U_T \Sigma_T W_T^H$;
8: Select the $p_d$ singular values of $T$ such that $\Sigma_T(l,l) \geq \varepsilon_d \cdot tol$ for all $l$ such that $1 \leq l \leq p_d$;
9: Compute $V_1 = Q\, U_T(:, 1{:}p_d)$.
10: for $iter = 1, \ldots, itermax$ do
11:   Let $B_j = \begin{bmatrix} I_{p_d} \\ 0_{jp_d \times p_d} \end{bmatrix}$, $1 \leq j \leq m$.
12:   for $j = 1, \ldots, m$ do
13:     Completion of $V_{j+1}$, $Z_j$ and $\bar{H}_j$ (see Algorithm 9): Apply Algorithm 9 from line 2 to line 10 with flexible preconditioning ($Z_j = M_j^{-1} V_j$, $1 \leq j \leq m$) to obtain $V_{j+1} \in \mathbb{C}^{n \times (j+1)p_d}$, $Z_j \in \mathbb{C}^{n \times jp_d}$ and the matrix $\bar{H}_j \in \mathbb{C}^{(j+1)p_d \times jp_d}$ such that $A Z_j = V_{j+1} \bar{H}_j$ with $V_{j+1}^H V_{j+1} = I_{(j+1)p_d}$.
14:     Solve the minimization problem $Y_j = \mathrm{argmin}_{Y \in \mathbb{C}^{jp_d \times p_d}} \|B_j - \bar{H}_j Y\|_F$;
15:     Compute $\rho_j = (B_j - \bar{H}_j Y_j)\, \Sigma_T(1{:}p_d, 1{:}p_d)\, W_T(1{:}p, 1{:}p_d)^H$;
16:     if $\|\rho_j(:,l)\|_2 \leq tol \cdot \varepsilon_q$ for all $l$, $1 \leq l \leq p$, then
17:       Compute $X_j = X_0 + Z_j Y_j \Sigma_T(1{:}p_d, 1{:}p_d) W_T(1{:}p, 1{:}p_d)^H D_B$; stop;
18:     end if
19:   end for
20:   $X_m = X_0 + Z_m Y_m \Sigma_T(1{:}p_d, 1{:}p_d) W_T(1{:}p, 1{:}p_d)^H D_B$,
21:   $R_m = B - A X_m$,
22:   Compute the QR decomposition $R_m D_B^{-1} = QT$;
23:   Compute the SVD of $T$: $T = U_T \Sigma_T W_T^H$;
24:   Select the $p_d$ singular values of $T$ such that $\Sigma_T(l,l) \geq \varepsilon_d \cdot tol$ for all $l$ such that $1 \leq l \leq p_d$;
25:   Compute $V_1 = Q\, U_T(:, 1{:}p_d)$;
26:   Set $R_0 = R_m$ and $X_0 = X_m$.
27: end for
Proposition 7 generalizes Proposition 5 to the deflation case (relation between the true residual and the Arnoldi one). In order to simplify the notation, we set $U_+ = U_T(:, 1{:}p_d)$, $\Sigma_+ = \Sigma_T(1{:}p_d, 1{:}p_d)$, $W_+ = W_T(:, 1{:}p_d)$ and $U_- = U_T(:, p_d{+}1{:}p)$, $\Sigma_- = \Sigma_T(p_d{+}1{:}p, p_d{+}1{:}p)$, $W_- = W_T(:, p_d{+}1{:}p)$.
Proposition 7. At the end of one restart or at convergence in Algorithm 11, the block true residual $R_j = B - A X_j$ and the small residual $\rho_j = (B_j - \bar{H}_j Y_j) \Sigma_+ W_+^H$ satisfy the following property for $j$ such that $1 \leq j \leq m$:
$$R_j = V_{j+1}\, \rho_j\, D_B \quad \text{if } p_d = p, \qquad R_j = \left[ V_{j+1}\, \rho_j + Q U_- \Sigma_- W_-^H \right] D_B \quad \text{if } p_d < p.$$
Proof. The first equality is a direct consequence of the fact that Algorithm 11 without deflation is equivalent to block FGMRES (Algorithm 10). To obtain the second equality, we expand $R_j$:
$$R_j = B - A X_j = B - A (X_0 + Z_j Y_j \Sigma_+ W_+^H D_B) = R_0 - V_{j+1} \bar{H}_j Y_j \Sigma_+ W_+^H D_B = \left[ Q U_T \Sigma_T W_T^H - V_{j+1} \bar{H}_j Y_j \Sigma_+ W_+^H \right] D_B.$$

Since $V_1 = Q U_+$ and $V_{j+1} B_j = V_1$, we have
$$R_j = \left[ V_{j+1} (B_j - \bar{H}_j Y_j) \Sigma_+ W_+^H + Q U_- \Sigma_- W_-^H \right] D_B = \left[ V_{j+1}\, \rho_j + Q U_- \Sigma_- W_-^H \right] D_B.$$
Proposition 8. In Algorithm 11, for any $\varepsilon_d \in (0, 1]$, the Frobenius norm of the block residual is decreasing from one iteration to the next. This holds even if $\varepsilon_d$ is allowed to vary at each restart.
Proof. Proposition 7 gives:
$$\|R_j D_B^{-1}\|_F^2 = \|V_{j+1} (B_j - \bar{H}_j Y_j) \Sigma_+ W_+^H + Q U_- \Sigma_- W_-^H\|_F^2$$
$$= \mathrm{tr}\big( (V_{j+1} (B_j - \bar{H}_j Y_j) \Sigma_+ W_+^H + Q U_- \Sigma_- W_-^H)\,(V_{j+1} (B_j - \bar{H}_j Y_j) \Sigma_+ W_+^H + Q U_- \Sigma_- W_-^H)^H \big)$$
$$= \|(B_j - \bar{H}_j Y_j) \Sigma_+\|_F^2 + \|\Sigma_-\|_F^2 + \mathrm{tr}\big( V_{j+1} (B_j - \bar{H}_j Y_j) \Sigma_+ W_+^H W_- \Sigma_- U_-^H Q^H \big) + \mathrm{tr}\big( Q U_- \Sigma_- W_-^H W_+ \Sigma_+ (B_j - \bar{H}_j Y_j)^H V_{j+1}^H \big)$$
$$= \|(B_j - \bar{H}_j Y_j) \Sigma_+\|_F^2 + \|\Sigma_-\|_F^2,$$
since $W_T = [W_+, W_-]$ is unitary. Furthermore, the definition of the Frobenius norm yields:
$$\|B_j - \bar{H}_j Y_j\|_F^2 = \min_{Y \in \mathbb{C}^{jp_d \times p_d}} \|B_j - \bar{H}_j Y\|_F^2 = \sum_{l=1}^{p_d} \min_{y \in \mathbb{C}^{jp_d}} \|B_j(:,l) - \bar{H}_j y\|_2^2.$$
Thus, $Y_j$ also minimizes $\|(B_j - \bar{H}_j Y) \Sigma_+\|_F$, because
$$\|(B_j - \bar{H}_j Y) \Sigma_+\|_F^2 = \sum_{l=1}^{p_d} \|(B_j(:,l) - \bar{H}_j Y(:,l))\, \Sigma_+(l,l)\|_2^2 = \sum_{l=1}^{p_d} \|B_j(:,l) - \bar{H}_j Y(:,l)\|_2^2\, \Sigma_+(l,l)^2.$$
Therefore, according to the proof of Corollary 1, we have
$$\|R_j D_B^{-1}\|_F^2 = \|(B_j - \bar{H}_j Y_j) \Sigma_+\|_F^2 + \|\Sigma_-\|_F^2 \leq \|(B_{j-1} - \bar{H}_{j-1} Y_{j-1}) \Sigma_+\|_F^2 + \|\Sigma_-\|_F^2.$$
Coming back to the first equality of the proof, written for $j-1$ instead of $j$, it follows that

45 30 [ UΣW(l, :) H Tus, to minimize 0 jpd,p d ] [ ] U + 0 pd, jp d 0 jpd,p d I jpd, jp d H j YΣ + W + (l, :) H 2 over Y C jp d p d, is not equivalent to minimize te true residual R j (:, l) Euclidean norm since [Q, V 2,..., V j+1 ] is not ortogonal. [Q, V 2,..., V j+1 ] as ten to be taken into account in te minimization problem. Te next corollaries (Corollaries 3 and 4) give upper bounds of te individual residual norms wic guarantee convergence on te true residual wen deflation as occurred and wen te norm of ρ j (:, l) is less tan tol for all l p. Corollary 3. In Algoritm 11, at te end of te restart or at convergence, wen deflation as occurred (p d < p), R j and ρ j satisfy te following property: R j (:, l) V j+1 ρ j (:, l) D B (l, l) 2 Σ T (p d + 1, p d + 1) D B (l, l), l 1 l p. Proof. Tis is a direct consequence of Proposition 7 and te SVD properties, indeed, we ave R j (:, l) V j+1 ρ j (:, l) D B (l, l) 2 = QU Σ W T (l, p d + 1 : p) H D B (l, l) 2, = Σ W T (l, p d + 1 : p) H 2 D B (l, l), Σ T (p d + 1, p d + 1) D B (l, l). Tis corollary makes te relation between te norm of te true residual and te norm of te small residual ρ j explicit. It sows tat te norm of teir difference is always lower tan te largest deflated singular value Σ T (p d + 1, p d + 1) multiplied by te corresponding rigt-and side norm D B (l, l). It means tat if te deflation tolerance is well cosen, wen deflation occurs te residual ρ j will be close to be te ortogonal projection of R j onto V j+1. Te next corollary is a reformulation of Corollary 3. It will be used in Remark 6 to discuss possible coices for te deflation criterion. Corollary 4. Wen Algoritm 11 restarts and deflation as occurred (p d < p), te block true residual R j verifies R j (:, l) 2 D B (l, l) ρ j (:, l) 2 + Σ T (p d + 1, p d + 1), l 1 l p. Furtermore, if Algoritm 11 as converged and deflation as occurred, R j verifies R j (:, l) 2 D B (l, l) Proof. Corollary 3 gives for all l suc tat 1 l p: It follows tat tol ε q + Σ T (p d + 1, p d + 1), l 1 l p. Σ T (p d + 1, p d + 1) D B (l, l) R j (:, l) 2 ρ j (:, l) 2 D B (l, l). R j (:, l) 2 D B (l, l) ρ j (:, l) 2 + Σ T (p d + 1, p d + 1). Tis sows te first inequality of te corollary. Te second inequality is straigtforward: if ρ j (:, l) 2 D B (l, l) 1 ε q tol, we ave: R j (:, l) 2 D B (l, l) ε q tol + Σ T (p d + 1, p d + 1). Remark 6. A way to insure tat convergence is well detected is ten to coose a fixed quality convergence criterion ε q (0, 1) suc tat ε q + ε d = 1 wic means ε q = 1 ε d. Indeed, if suc a criterion is cosen, we ave: tol ε q + Σ T (p d + 1, p d + 1) (ε d + ε q ) tol, tol.

A variable quality-of-convergence criterion $\varepsilon_q$ can also be chosen at each restart. It aims at obtaining a higher convergence tolerance on the Arnoldi residual $\rho_j$ and thus at saving some iterations. Considering Corollary 4, if at each restart $\varepsilon_q$ is taken such that
$$\varepsilon_q = 1 - \frac{\Sigma_T(p_d+1, p_d+1)}{tol},$$
convergence will be guaranteed and the convergence condition on $\rho_j$ will be weaker.
The choice of the deflation tolerance (Remark 6) remains an open question. It can be chosen such that, even if the true residual norm is guaranteed to be less than $tol$ when $\|\rho_j(:,l)\|_2 \leq tol$ for all $l$, $1 \leq l \leq p$, it improves convergence and achieves it, even if $\varepsilon_d = \varepsilon_q = 1$. It can also be chosen differently at each restart, accounting in some way for a convergence criterion of the current restart. However, BFGMRESD requires as much memory as BFGMRES (Table 2.5). From Proposition 8 we know that BFGMRESD has monotone convergence in the Frobenius norm for any deflation tolerance criterion. This memory requirement issue and this last observation lead us to propose a truncation strategy: instead of having a deflation tolerance, we keep the size of the block constant from one restart to the next. In Table 2.5, we see that the memory requirement is significantly lower than for the other block methods when the fixed block size is chosen such that $p_f < p$. Of course, this method will often need more iterations to converge than BFGMRES with SVD-based deflation, but this has to be balanced against its memory requirements. The method is depicted in Algorithm 12.
Method    BFGMRES(m)        BFGMRESD(m)       BFGMREST(m,p_f)
Storage   n(2m+1)p + 3np    n(2m+1)p + 3np    n(2m+1)p_f + 3np
Table 2.5: Storage required for BFGMRES($m$), BFGMRESD($m$) and BFGMREST($m$, $p_f$) considering a block size $p$ and a problem dimension $n$.
As previously said, the convergence of such an algorithm is monotone in the Frobenius norm, and there is no rule about how the individual residual norms vary along the iterations. We only know that this is governed by the largest deflated singular value (Corollary 4). Indeed, Remark 6 states that convergence would be detected if the convergence threshold on $\|\rho_j(:,l)\|_2$ were chosen equal to $tol - \Sigma_T(p_b+1, p_b+1)$. Unfortunately, since singular vectors are chosen by truncation at the beginning of the restart in Algorithm 12, there is no guarantee that $\Sigma_T(p_b+1, p_b+1)$ is close to $tol$. The quantity $tol - \Sigma_T(p_b+1, p_b+1)$ then has many chances to be negative. Thereby, we consider as a convergence threshold on $\|\rho_j(:,l)\|_2$ the minimum between $tol$ and $|tol - \Sigma_T(p_b+1, p_b+1)|$ (Algorithm 12, line 17) and, to make sure that convergence is achieved, the convergence criterion is checked on the true residual (Algorithm 12, line 20).
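The corresponding truncation rule of Algorithm 12 below, including the modified threshold just discussed, can be sketched as follows (a Python/NumPy illustration; names are local choices):

import numpy as np

def truncate_residual(R, B_norms, eps_d, tol, p_f):
    Q, T = np.linalg.qr(R / B_norms)
    U, sig, WH = np.linalg.svd(T)
    p_d = int(np.sum(sig >= eps_d * tol))   # deflation count as in Algorithm 11
    p_b = min(p_d, p_f)                     # truncation to the fixed block size
    V1 = Q @ U[:, :p_b]
    # threshold used at line 17; |tol - sigma_{p_b+1}| may be tighter than tol
    thr = tol if p_b >= len(sig) else min(tol, abs(tol - sig[p_b]))
    return V1, p_b, thr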

Algorithm 12 Block flexible GMRES with SVD-based truncation (BFGMREST($m$, $p_f$))
1: In this algorithm, we consider $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{n \times p}$.
2: Choose a convergence threshold $tol$, a deflation criterion $\varepsilon_d$, a fixed block size $p_f < p$, a restart size $m$ and the maximum number of iterations $itermax$.
3: Choose an initial guess $X_0 \in \mathbb{C}^{n \times p}$;
4: Compute $D_B(l,l) = \|B(:,l)\|_2$ for all $l$ such that $1 \leq l \leq p$;
5: Compute the initial block residual $R_0 = B - A X_0$;
6: Compute the QR decomposition $R_0 D_B^{-1} = QT$;
7: Compute the SVD of $T$: $T = U_T \Sigma_T W_T^H$;
8: Compute the number $p_d$ of singular values of $T$ such that $\Sigma_T(l,l) \geq \varepsilon_d \cdot tol$ for all $l$ such that $1 \leq l \leq p_d$;
9: Compute $p_b = \min(p_d, p_f)$;
10: Compute $V_1 = Q\, U_T(:, 1{:}p_b)$.
11: for $iter = 1, \ldots, itermax$ do
12:   Let $B_j = \begin{bmatrix} I_{p_b} \\ 0_{jp_b \times p_b} \end{bmatrix}$, $1 \leq j \leq m$.
13:   for $j = 1, \ldots, m$ do
14:     Completion of $V_{j+1}$, $Z_j$ and $\bar{H}_j$ (see Algorithm 9): Apply Algorithm 9 from line 2 to line 10 with flexible preconditioning ($Z_j = M_j^{-1} V_j$, $1 \leq j \leq m$) to obtain $V_{j+1} \in \mathbb{C}^{n \times (j+1)p_b}$, $Z_j \in \mathbb{C}^{n \times jp_b}$ and the matrix $\bar{H}_j \in \mathbb{C}^{(j+1)p_b \times jp_b}$ such that $A Z_j = V_{j+1} \bar{H}_j$ with $V_{j+1}^H V_{j+1} = I_{(j+1)p_b}$.
15:     Solve the least-squares problem $Y_j = \mathrm{argmin}_{Y \in \mathbb{C}^{jp_b \times p_b}} \|B_j - \bar{H}_j Y\|_F$;
16:     Compute $\rho_j = (B_j - \bar{H}_j Y_j)\, \Sigma_T(1{:}p_b, 1{:}p_b)\, W_T(1{:}p, 1{:}p_b)^H$;
17:     if $\|\rho_j(:,l)\|_2 \leq \min(tol, |tol - \Sigma_T(p_b+1, p_b+1)|)$ for all $l \leq p$ then
18:       Compute $X_j = X_0 + Z_j Y_j \Sigma_T(1{:}p_b, 1{:}p_b) W_T(1{:}p, 1{:}p_b)^H D_B$;
19:       Compute $R_j = B - A X_j$;
20:       if $\|R_j(:,l)\|_2 / D_B(l,l) \leq tol$ for all $l \leq p$ then
21:         stop;
22:       else
23:         go to line 29;
24:       end if
25:     end if
26:   end for
27:   $X_m = X_0 + Z_m Y_m \Sigma_T(1{:}p_b, 1{:}p_b) W_T(1{:}p, 1{:}p_b)^H D_B$,
28:   $R_m = B - A X_m$,
29:   Compute the QR decomposition $R_m D_B^{-1} = QT$;
30:   Compute the SVD of $T$: $T = U_T \Sigma_T W_T^H$;
31:   Select the $p_d$ singular values of $T$ such that $\Sigma_T(l,l) \geq \varepsilon_d \cdot tol$ for all $l$ such that $1 \leq l \leq p_d$;
32:   Compute $p_b = \min(p_d, p_f)$;
33:   Compute $V_1 = Q\, U_T(:, 1{:}p_b)$;
34:   Set $R_0 = R_m$ and $X_0 = X_m$;
35: end for
However, for all the test cases we have considered, the individual convergence in the Euclidean norm was always monotone for block methods with deflation or truncation. The next section is dedicated to numerical experiments in Matlab.
Numerical experiments
The aim of our experiments is to compare different flexible block methods with respect to both memory requirements and numerical efficiency, using Matlab on academic test cases. The first method is the most natural way to solve a linear system with many right-hand sides using an iterative method. It consists

in using FGMRES (Algorithm 3) for each right-hand side and solving the linear systems one after the other. We call it FGMRES in sequence. This strategy is the cheapest in memory, but it does not benefit from the multiple right-hand side situation. The second method is a traditional block method (BFGMRES, Algorithm 10). The memory requirement of this method is quite high, but we expect improved convergence due to a larger search space. The third method is a block method using deflation (BFGMRESD, Algorithm 11) with a deflation tolerance equal to the convergence threshold ($\varepsilon_d = 1$). The stopping criterion on the small residual $\rho_j$ is the one described at the end of Remark 6:
$$\varepsilon_q = 1 - \frac{\Sigma_T(p_d+1, p_d+1)}{tol}.$$
This method is still expensive in memory but should behave at least as well as BFGMRES. The fourth and fifth methods are block methods using both truncation and deflation (BFGMREST, Algorithm 12) for two different truncated block sizes. The first size is equal to the number of right-hand sides divided by 2, rounded up (in Matlab, ceil(p/2)); the second one is the number of right-hand sides divided by 3, rounded up (ceil(p/3)). This involves a cheaper memory cost, but the convergence may be worse than for BFGMRES.
The choice of flexible methods is governed by the fact that one restart of GMRES(5), which cannot be represented by a single matrix valid for every right-hand side, will be our preconditioning strategy. For block methods, the preconditioner is applied to each block vector one after the other. This allows us to compare all the methods with the same preconditioner. The restart size $m$ is taken equal to 5 for all the preconditioned methods. Several block sizes (numbers of right-hand sides processed at once), denoted by $p$, will be considered in order to determine the best block size for each test case. The values of $p$ are taken equal to 5, 10, 20, 40, 80 and 160 respectively. Although the last two values would not be very relevant in practice, for both memory requirement reasons and orthogonalization costs, they have been chosen to show the effect of using a large block size. For all the experiments, the algorithms are stopped when the Euclidean norm of each residual, normalized by the corresponding right-hand side norm, is below $10^{-6}$:
$$\frac{\|B(:,l) - A X(:,l)\|_2}{\|B(:,l)\|_2} \leq 10^{-6}, \quad \forall l,\; 1 \leq l \leq p.$$
The block methods are compared according to the number of iterations (equivalent to the number of applications of the preconditioner) and to the number of floating-point operations (flops) needed to achieve convergence. This last comparison criterion ensures that the method with the smallest flop count will be the fastest. However, these two measures do not take into account the possible computational speed-up of block methods, especially the block matrix-vector acceleration. Timings would show such a behavior, but since Matlab timings are not reliable, we have decided not to provide this information. In Chapter 4, experiments with block flexible methods will be performed on a geophysical application in Fortran and timings will be reported; this will emphasize the real capabilities of block methods.
Poisson problem
The first test case is a two-dimensional Poisson problem in the unit square $[0,1]^2 = \Omega \cup \partial\Omega$, where $\Omega = (0,1)^2$, with Dirichlet boundary conditions:
$$-\Delta u = g \ \text{in} \ \Omega, \qquad u = 0 \ \text{on} \ \partial\Omega. \quad (2.21)$$
It is discretized with a second-order finite difference scheme for a vertex-centered grid arrangement. The mesh grid size is taken equal to $1/128$. Thus, the size of the matrix is $n = 127^2 = 16129$; it is sparse and symmetric with a five-banded structure.
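For reference, the test matrix can be reproduced as follows (a hedged Python/SciPy sketch of the standard five-point discretization; the scaling by $1/h^2$ is the usual convention and an assumption about the exact setup):

import scipy.sparse as sp

N = 127                                        # interior points per direction
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N), format="csr")
I = sp.identity(N, format="csr")
A = (sp.kron(I, T) + sp.kron(T, I)) * 128.0**2  # -Laplacian, n = N^2 = 16129
print(A.shape, A.nnz)                           # five-banded sparse structure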
Canonical right-hand sides. First we take as right-hand sides the canonical basis vectors: $B(:, 1{:}p) = [e_1, e_2, \ldots, e_p]$. The right-hand side block has orthonormal columns, so no initial deflation can be performed. The initial iterate $X_0$ is set to zero.

Numerical results. Results are reported in Table 2.6, whose rows correspond to FGMRES(5) in sequence, BFGMRES(5), BFGMRESD(5), BFGMREST(5, ceil($p/2$)) and BFGMREST(5, ceil($p/3$)), and whose columns give, for each number of right-hand sides $p \in \{5, 10, 20, 40, 80, 160\}$, the iteration count It and the operation ratio $r_{ops}$.
Table 2.6: Number of iterations (It) and operation ratio ($r_{ops}$) for the Poisson problem with $p$ canonical basis right-hand sides.
In Table 2.6, the parameter $p$ is the number of columns of the right-hand side block. The quantity It is the number of preconditioner applications required to converge. The quantity $r_{ops}$ is the ratio of the number of operations, including the preconditioning operations, needed by each method to the number of operations needed by FGMRES(5) in sequence. For instance, $r_{ops}$ in the second row is
$$r_{ops} = \frac{ops(\text{BFGMRES}(5))}{ops(\text{FGMRES}(5) \text{ in sequence})},$$
where $ops(\text{method})$ denotes the number of operations needed by the relevant method to converge.
FGMRES in sequence. We first observe that the number of iterations needed by FGMRES(5) in sequence increases almost linearly with the block size $p$; it is multiplied by a factor of about two from one column of the table to the next. This means that FGMRES converges in more or less the same number of iterations for each right-hand side of the sequence.
BFGMRES. Looking at the second row of Table 2.6, it can be remarked that BFGMRES(5) requires a reduced number of iterations with respect to FGMRES(5) in sequence: the larger $p$, the smaller the number of iterations needed to converge. For $p = 160$, BFGMRES(5) needs only slightly more than half as many iterations as FGMRES(5) in sequence (2240 iterations). However, although BFGMRES(5) needs fewer iterations, its computational cost is especially high (up to twelve times more, when $p = 160$). In fact, the extra orthogonalizations make BFGMRES slower than solving the linear systems in sequence. Furthermore, we note that the operations performed by the preconditioner are taken into account in the computational cost calculation. Since the cost of the preconditioner application is low, improving the number of iterations significantly does not decrease the total computational cost significantly. However, if the block matrix-vector product computations significantly speed up the block method, it could still be interesting to use such a method on this problem.
BFGMRESD. Notwithstanding, we also remark that BFGMRES(5) is greatly improved by deflation. Indeed, BFGMRESD(5) diminishes the number of iterations significantly. The ratio between the numbers of iterations of FGMRES(5) in sequence and BFGMRESD(5) greatly increases with the block size $p$: it starts above 2 for $p = 5$ and ends above 4 for $p = 160$. However, despite such behavior, the operation ratio $r_{ops}$ does not vary similarly. The best situation for BFGMRESD(5) is met when $p = 5$ ($r_{ops} = 0.56$), while $r_{ops}$ is larger than one for $p = 40$, larger than two for $p = 80$ and larger than four for $p = 160$. This behavior is once again due to the extra orthogonalization and the cheap cost of the preconditioning operations.
BFGMREST. When truncation is used (rows 4 and 5), the numbers of iterations of BFGMREST are quite close to those of BFGMRESD(5). Of course, the number of iterations of BFGMREST is always larger than that of BFGMRESD, since it lacks information compared to BFGMRESD. Moreover, the same behavior is observed between the fourth and fifth rows: the larger the fixed block size parameter $p_f$, the smaller the number of iterations.
Nevertheless, on this test case, truncated methods require fewer operations to converge than a traditional block deflated method, and the smaller p_f, the smaller r_ops. This is once again a direct consequence of the extra block orthogonalization cost: decreasing the size of the block has a direct impact on the operation cost. Indeed, the operation cost of the block Arnoldi process involves the square of the block size (see Table 2.5). Since its number of iterations stays close to that of BFGMRESD, BFGMREST has a cheaper operation cost.

Comments on histories of convergence

Histories of convergence for the first two values of p (p = 5, p = 10) are plotted in Figures 2.5 and 2.6 respectively. We do not show the histories of convergence for the other values of p because the plots for larger p would be too overloaded. Indeed, the histories are plotted for each right-hand side and for each method on the same figure.

How to read the plots. Each method is associated with a color and a symbol: magenta for FGMRES(5) sequence, (black, square) for BFGMRES(5), (blue, triangle) for BFGMRESD(5), (red, +) for BFGMREST(5, ceil(p/2)) and green for BFGMREST(5, ceil(p/3)). Concerning FGMRES(5) sequence, histories of convergence are drawn for each right-hand side. Once the history of convergence for a right-hand side has been plotted, the history of convergence for the next right-hand side is plotted from the abscissa where the previous history ends. For BFGMRES(5), histories of convergence are also plotted for each right-hand side, but the normalized norm of each residual is plotted against block iterations. Thereby p squares appear in a group at each iteration in the history of convergence of BFGMRES(5). Histories of convergence of BFGMRESD(5) are plotted in almost the same way. The only difference happens when deflation occurs. Indeed, since the block size decreases (p_d < p) due to deflation, we note that at the end of the BFGMRESD(5) convergence, p_d triangles appear instead of p. In the truncation case, the small residuals ρ_j do not give information on the true block residual R_j. The true block residual is only computed at the end of the restart. Thus, the normalized norms of each true residual are plotted against one block restart iteration. Therefore, residual norms are plotted in groups of size m·p_b, where p_b is the minimum between the size of the truncated block p_f and the number of singular vectors significant for the convergence p_d.

The main purpose of these plotting conventions is to illustrate the ranking in Table 2.6. Of particular interest are the histories of convergence of deflated and truncated block methods. Indeed, we notice in both Figures 2.5 and 2.6 how the block size varies for BFGMRESD(5). At the beginning, there is no deflation and BFGMRESD(5) behaves like BFGMRES(5), but after the first restart, singular vectors are removed and the block size p_d of BFGMRESD(5) decreases along the solution phase. At the end, in Figures 2.5 and 2.6, the block size p_d of BFGMRESD(5) is found to be one.

The histories of convergence of truncated methods point out an interesting behavior. Some residual norms are not decreasing during the first restart whereas the others reach 10^-4 for both values of p (Figures 2.5 and 2.6). Then, the lower residuals from the first restart decrease a little during the second restart (p_f = ceil(p/2) in Figure 2.5) or not at all (p_f = ceil(p/3) in Figure 2.5 and both values of p_f in Figure 2.6), whereas the higher residuals from the first restart decrease a lot. Finally, the residual decreases uniformly during the third restart. This particular behavior is due to the structure of the right-hand side. Indeed, the initial block residual is

    R_0 = [ I_{p×p} ; 0_{(n-p)×p} ],

and so coincides with the first V_1. Then, the first restart only deals with the first p_f right-hand sides and does not manage the last p - p_f ones. The second restart handles these last p - p_f right-hand sides and does not affect the first p_f right-hand sides, except for p = 5 and p_f = ceil(p/2) = 3. In fact, for p_f = 3, one column of the second restart V_1 contains a singular vector related to the three residuals of the first restart.
The corresponding residual norms are then decreasing along the second restart. For all values of p_f, the third restart deals with a V_1 whose columns are singular vectors related to all the right-hand side residuals. Therefore, all the residual norms decrease along this restart, and the convergence continues similarly along the next restarts.
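To make these plotting conventions concrete, the following minimal matplotlib sketch reproduces the two main rules (sequence histories laid end to end, block histories with p markers per block iteration). The containers res_seq and res_blk are hypothetical holders of the residual histories, not quantities defined in the text:

    import numpy as np
    import matplotlib.pyplot as plt

    # res_seq: list of p arrays, res_seq[i][j] = normalized residual of
    #          right-hand side i at preconditioner application j (sequence run)
    # res_blk: array of shape (iters, p), block residual norms per block iteration

    offset = 0
    for hist in res_seq:                     # sequence: histories laid end to end
        it = offset + np.arange(len(hist))
        plt.semilogy(it, hist, color='m')
        offset += len(hist)

    its = np.arange(res_blk.shape[0])
    for i in range(res_blk.shape[1]):        # block: p squares per block iteration
        plt.semilogy(its, res_blk[:, i], 'ks')

    plt.xlabel('preconditioner applications')
    plt.ylabel('normalized residual norm')
    plt.show()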

Figure 2.5: Histories of convergence of block methods when solving the Poisson problem with p = 5 canonical right-hand sides (Table 2.6).

Figure 2.6: Histories of convergence of block methods when solving the Poisson problem with p = 10 canonical right-hand sides (Table 2.6).

Random right-hand sides

Now, random vectors are chosen as right-hand sides for the Poisson problem. They are generated in Matlab using the seed random number generator (rand('seed',0)) and the command B = rand(n, p_max), where p_max = 160. The right-hand side block then no longer has orthonormal columns, but it has full rank. Once again, the initial iterate X_0 is set to zero.
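Outside Matlab, an equivalent NumPy construction might look as follows (NumPy's generator and seeding differ from Matlab's legacy 'seed' generator, so the entries themselves will not match; this only mirrors the shapes and usage, and the problem size is illustrative):

    import numpy as np

    n, p_max = 16129, 160              # illustrative problem size
    rng = np.random.default_rng(0)     # stands in for rand('seed',0)
    B = rng.random((n, p_max))         # full-rank random right-hand side block
    X0 = np.zeros((n, p_max))          # zero initial block iterate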

Numerical results

We report the results in Table 2.7.

Table 2.7: Number of iterations (It) and operation ratio (r_ops) for the Poisson problem for p random right-hand sides (same methods and block sizes as in Table 2.6).

First we remark that this problem is more difficult than the previous one.

FGMRES sequence. For each block size, FGMRES(5) sequence needs almost twice the number of iterations used in the test case with canonical basis vectors. However, the number of iterations still increases nearly linearly with the block size p.

BFGMRES. BFGMRES(5) behaves unexpectedly on this problem. First, BFGMRES(5) does not improve the convergence: it performs more iterations than FGMRES(5) sequence for p = 5. This result is the consequence of the block convergence detection: BFGMRES(5) stops when every solution has converged, even if some solutions have converged earlier than others. Afterwards, the number of iterations is similar for p = 20 and p = 40 (It = 400). This phenomenon could be explained by the fact that the union of the Krylov subspaces generated at each restart for p = 40 contains the union of those generated for p = 20 after only 20 block iterations. For larger values of p, the numbers of iterations are in the range of those of deflated methods.

BFGMRESD. BFGMRESD behaves similarly as for the canonical basis right-hand sides. Deflation always improves the number of iterations and reduces the number of operations, at least for the low values of p (10, 20, 40). Besides, the number of iterations of FGMRES(5) sequence for the last value of p (p = 160) is more than eight times that of BFGMRESD.

BFGMREST. The number of iterations of truncated methods is not as good as in the previous example. They converge in more iterations than BFGMRES for p = 40, 80, and BFGMREST(5, ceil(p/3)) converges in more iterations than BFGMRES for p = 160. Since BFGMRES works exceptionally well for this example, these results seem reasonable. Another unusual behavior can be observed for p = 80: BFGMREST(5, ceil(p/3)) converges in slightly fewer iterations than BFGMREST(5, ceil(p/2)). The only possible explanation of this behavior would be that the main information for the convergence is contained in the first ceil(p/3) columns of the block right-hand sides. However, the numbers of operations of truncated methods are still lower than those of the deflated ones. Besides, BFGMREST(5, ceil(p/3)) nearly always improves the number of operations; it only fails for the largest block size p = 160.

Comments on the histories of convergence. As in the previous example, histories of convergence are plotted for only two values of p (p = 5, p = 10), in Figures 2.7 and 2.8 respectively. Despite the scaling of the plot, we observe the same phenomena as for the previous set of right-hand sides. BFGMRESD starts like BFGMRES and then achieves convergence in half of the iterations of BFGMRES. It can also be noticed that the block size at the end of the BFGMRESD convergence is again found to be one for both values of p. BFGMREST behaves in a slightly different manner than in the previous example. Contrary to the canonical right-hand sides case, all the residual norms have decreased after the first restart. This must be a consequence of the non-orthogonality of the block right-hand side. Indeed, the first p_b singular vectors seem to provide information about all the initial residuals.
However, at the second restart, as in the orthonormal right-hand sides case, some residuals decrease more slowly than others, except for p = 5 and p_b = ceil(p/2). This behavior could be an effect of the size of p_b: the truncated method seems to lack information at the second restart to make all the residuals converge in a uniform way. However, after the first two restarts, the convergence rate is nearly the same for each residual, whatever the p_b parameter is. Both examples illustrate the efficiency of block methods with deflation or truncation on this Poisson example. It has to be stressed that no computational speed-up, such as block matrix-vector product acceleration, is taken into account in this comparison. Nevertheless, both deflation and truncation strategies have shown themselves efficient compared with a plain block flexible method and the sequence strategy.

Figure 2.7: Histories of convergence of block methods when solving the Poisson problem with p = 5 random right-hand sides (Table 2.7).

Figure 2.8: Histories of convergence of block methods when solving the Poisson problem with p = 10 random right-hand sides (Table 2.7).

Convection-diffusion problem

In this section, we focus on the convection-diffusion problem (see Equation (2.4)). For these numerical experiments, the mesh grid size is taken equal to 1/128. The parameters c and d are taken equal to 256, and

ε equal to 1; the Péclet condition is then satisfied. The resulting matrix is non-symmetric and five-banded. Since Dirichlet boundary conditions are included in the linear system, the right-hand side B has to be generated such that A^{-1}B satisfies them. Thus, in order to build the right-hand side, we first generate the solution X. The solution X is a random matrix whose values on the boundaries of the domain are set to one to satisfy the boundary conditions. We still use the seed random number generator (rand('seed',0)) in Matlab. The solution is then multiplied by A to obtain the right-hand side: B = AX (a sketch of this construction is given after this subsection). The initial iterate X_0 is first set to zero in the interior, whereas its values on the boundaries of the domain are set to one.

Numerical results

Numerical results are displayed in Table 2.8.

Table 2.8: Number of iterations (It) and operation ratio (r_ops) for the convection-diffusion problem for p random right-hand sides (same methods and block sizes as in Table 2.6).

FGMRES sequence. In Table 2.8, we notice again that FGMRES(5) needs, for each right-hand side, almost always the same number of iterations to reach convergence. It can also be noticed that block methods still improve convergence. Nevertheless, they are not as efficient as in Table 2.7. Indeed, despite a lower number of iterations than FGMRES(5) sequence, the operation ratios are in most cases greater than one. Only truncated methods can improve the number of operations, and only for the lower values of p (5 and 10). However, it can be noticed that BFGMREST(5, ceil(p/3)) needs more iterations than BFGMRES for the largest value of p. BFGMRESD(5) remains the method which always needs the lowest number of iterations; the best iteration ratio, compared with FGMRES(5) sequence, is nearly 4 for p = 160.

Comments on the histories of convergence. Histories of convergence for p = 5 and p = 10 are once again plotted in Figures 2.9 and 2.10 respectively. Most of the comments related to Figures 2.7 and 2.8 (Poisson problem with random right-hand sides) remain valid for this experiment. Indeed, the block size p_d of BFGMRESD(5) at the end of the convergence is found to be only one. The truncated method still does not converge uniformly during the first two restarts for the same values of p and p_f as in the previous example (p = 5, p_f = 2 and p = 10, p_f = 5, 4). But then, it behaves nearly similarly for all right-hand sides.
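The right-hand side construction described above admits a direct NumPy transcription; a minimal sketch, assuming a 2D grid with n points per direction and a previously assembled matrix A of size n² × n² (the assembly itself is not shown):

    import numpy as np

    def dirichlet_rhs(A, n, p, seed=0):
        """Build B = A X: X is random in the interior and equal to one on the
        boundary, since the Dirichlet values are part of the linear system."""
        rng = np.random.default_rng(seed)      # plays the role of rand('seed',0)
        X = rng.random((n, n, p))
        X[0, :, :] = X[-1, :, :] = 1.0         # boundary rows set to one
        X[:, 0, :] = X[:, -1, :] = 1.0         # boundary columns set to one
        X = X.reshape(n * n, p)
        X0 = np.zeros_like(X)                  # zero interior initial iterate ...
        X0[(X == 1.0)] = 1.0                   # ... with ones on the boundary
        return A @ X, X, X0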

Figure 2.9: Histories of convergence of block methods when solving the convection-diffusion problem for p = 5 right-hand sides (Table 2.8).

Figure 2.10: Histories of convergence of block methods when solving the convection-diffusion problem for p = 10 right-hand sides (Table 2.8).

Conclusions

In this chapter a flexible variant of GMRES with deflated restarting has been presented. Its principle relies on injecting harmonic Ritz vectors into the Krylov subspace at each restart. Since this method allows the preconditioner to vary from one iteration to the next, the harmonic Ritz vectors are approximate eigenvectors of a preconditioned matrix that differs from one restart to the next. This method has proven efficient on both academic test cases and real-life applications.

In Section 2.6, we have also illustrated numerically that block methods can greatly improve the convergence of single right-hand side methods. A decrease in the number of iterations is observed, especially when information useless for the convergence is removed along the solution phase (BFGMRESD, BFGMREST). However, on the previous examples, the block methods often require more operations than solving the problems in sequence. This behavior is the consequence of two main features: the additional orthogonalization required in the block methods, and the cheap computational cost of the preconditioning technique (GMRES(5)). Indeed, in this configuration, reducing the number of iterations significantly does not have a direct impact on the number of operations. Therefore, a favorable situation for block methods would be to use an expensive preconditioner.

Nevertheless, no timing or memory estimates appeared in the presented comparisons. These quantities are of crucial interest in real-life applications. Furthermore, considering these quantities, rather than iterations and operations, when comparing these methods could highlight the interest of block methods. Indeed, on one hand, giving elapsed times could highlight the speed-up obtained when gathering matrix-vector products. On the other hand, in a memory-constrained parallel environment, block methods would also appear not as expensive as in sequential experiments, especially when using truncated methods. All these quantities will be analyzed on a real-life geophysics application in Chapter 4.

Besides, block methods can be numerically improved using spectral information. This would be the purpose of BFGMRESD with deflated restarting (BFGMRESD-DR) or BFGMREST with deflated restarting (BFGMREST-DR). Such methods will be the object of future work since their derivation and implementation cannot be straightforwardly deduced from Algorithms 11, 12 and 7 respectively. Moreover, another kind of block Arnoldi process (the Ruhe variant [85]) should be used to guarantee a choice of the parameter k independent of the current block size p_d.


Chapter 3

A three-dimensional geometric two-level method applied to Helmholtz problems

3.1 Introduction

In this chapter, we propose a multigrid preconditioner for the solution of the three-dimensional Helmholtz equation at high wavenumbers with absorbing boundary conditions in a bounded domain Ω:

    -Δu - k²u = s,

where u denotes the wave pressure, k the wavenumber and s a source term. The discretization of such problems is detailed in Appendix A: a second-order finite difference discretization scheme is used and the absorbing boundary conditions are formulated with a Perfectly Matched Layer (PML [11, 12]). The finite difference discretization of the Helmholtz problem at high wavenumbers leads to a linear system Ax = b where A is a large sparse matrix. This matrix is complex, non-symmetric, indefinite and generally ill-conditioned.

For some years there has been considerable interest in multigrid methods [15, 20, 61, 115] for Helmholtz problems (see also references therein). Nevertheless the indefiniteness of the Helmholtz problem has prevented multigrid methods from being as efficient as they are for symmetric positive-definite problems. Multigrid methods encounter difficulties both in the smoothing procedure and in the coarse grid correction [7, 19, 37, 42, 70]. On the one hand, standard smoothers cannot smooth error components on the intermediate grids. On the other hand, on coarse or very coarse meshes, the approximation of the discrete Helmholtz operator is relatively poor and this creates a difficulty for the coarse grid correction. Remedies have been proposed and analyzed in the case of homogeneous problems [31, 37, 42, 70, 78]. They can be split into three groups.

In [31, 37, 70], it is advised to use both few grids in the multigrid hierarchy and non-standard smoothers (GMRES) on the coarser levels [37]. However, using few grids in the multigrid hierarchy can be a bottleneck in three dimensions since the coarsest linear system can still be large. Indeed, the solution of the coarse problem might not be affordable in terms of computational resources.

A second approach is to use a wave-ray multigrid algorithm [77]. It consists in using ray grids in addition to the wave grids to represent the error on the coarser grids of the hierarchy. Thanks to this representation, it is possible to obtain good smoothing properties on the intermediate grids and an efficient coarse grid correction. This method is found efficient for homogeneous problems in both geometric [74, 78] and algebraic multigrid [118]. Notwithstanding, extending this approach to real-life applications requires computing ray functions by possibly solving large eigenvalue problems [122, 123]. Once again, this strategy is expensive in terms of computational resources.

More recently a third multigrid strategy has been proposed for the numerical solution of the Helmholtz equation [42, 43]. The multigrid method is not directly applied to the discrete Helmholtz operator but to a complex shifted one defined as:

    -Δu - (1 - iβ)k²u,

where β denotes the shift parameter. Using this shifted operator avoids both the indefiniteness and the coarse grid correction problems [42]. Thus it is possible to build a robust multigrid method with standard multigrid components that is used as a preconditioner for a Krylov subspace method. This solution method has been evaluated on model and realistic geophysical applications involving highly variable coefficients and relatively high wavenumbers. Nevertheless we note that the complexity of the method for pure Helmholtz problems was found to be relatively high at high wavenumbers; see for example the recent analysis on a realistic dataset in geophysics ([95] in two dimensions and [96] in three dimensions respectively).

In [14], the authors apply the shifted strategy to an algebraic multilevel method. Introducing pivoting based on weighted graph matching [35, 103], they perform an incomplete LDL^T factorization on each level of their hierarchy. This multilevel method applied to a shifted Helmholtz operator is then used as a preconditioner for the original Helmholtz problem. It has been evaluated on realistic geophysical data for both two-dimensional and three-dimensional problems. It has shown itself efficient in improving the convergence of Krylov methods, but its complexity and memory requirements are still relatively high since the LDL^T factors must be built and stored.

Yet for both geometric and algebraic multilevel preconditioners, an important question is how to determine the shift parameter β. For the algebraic multilevel preconditioner [14], it is advised to use β = 0.1; this choice is supported by extensive numerical experiments. For the geometric multigrid [42, 43], in two dimensions, it is advised to take β = 0.5, a choice led by Fourier analysis [15], and β = 0.4 when a fourth-order discretization scheme is used for the Helmholtz operator [116]. In three dimensions [96], β is also taken equal to 0.5; the two-dimensional geometric preconditioner [42] is used plane by plane: plane smoothers [88] and semicoarsening [124] are used. However, when a Fourier analysis is performed in the three-dimensional context, choosing β = 0.6 would lead to improved results (see Section 3.3.2). Thus, the choice of the shift parameter is really an open question; it depends on the multilevel components and of course on the discretization scheme chosen for the Helmholtz operator.

In the two-dimensional case, the use of a two-grid preconditioner applied to the original Helmholtz operator makes it possible to avoid the choice of a shift parameter [31], the coarse solution phase of the two-grid algorithm being handled with a sparse direct method. However, this cannot be extended to three dimensions easily; indeed the computational cost of an LU factorization, even on the coarse grid, may be too severe. Therefore, an iterative method has to be considered on the coarse grid. We are then considering a two-grid cycle with an approximate coarse solution, which we call a perturbed two-grid cycle. In this chapter, we will show that a perturbed two-grid cycle can be as efficient as a two-grid method with an exact coarse solution, even when using a really large coarse grid convergence threshold.

Therefore, the purpose of this chapter is to introduce the perturbed two-grid method and to motivate its use for three-dimensional Helmholtz problems. First, we will give some basic information about three-dimensional multigrid and Fourier analysis. We will then perform a smoothing analysis in the Local Fourier Analysis (LFA) sense. Then, a Rigorous Fourier Analysis (RFA) of a perturbed two-grid cycle will be performed.
Finally, after a practical smoother selection for the three-dimensional Helmholtz operator with PML, we will analyze the spectrum of this operator preconditioned by one cycle of the perturbed two-level method in the flexible GMRES framework (see Section 2.3.1).

In this chapter, we mainly refer to two monographs related to multigrid: "Multigrid" by U. Trottenberg, C. Oosterlee and A. Schüller [115] and "Multi-Grid Methods and Applications" by W. Hackbusch [61]. Nevertheless, even if they are not cited in the text, we have also found the following books relevant and helpful: "Multigrid Methods" by S. F. McCormick [81], "A Multigrid Tutorial" by W. L. Briggs, V. E. Henson and S. F. McCormick [20] and "Multigrid Methods: Fundamental Algorithms, Model Problem Analysis and Applications" by K. Stüben and U. Trottenberg [113].

3.2 Short introduction to three-dimensional geometric multigrid

The multigrid method is a very efficient multi-scale method for the solution of linear systems arising from the discretization of elliptic partial differential equations. It exploits discretizations with different mesh sizes of a given problem to obtain an optimal convergence factor using standard relaxation techniques (Jacobi, Gauss-Seidel, ...). This method enjoys two main favorable convergence properties for elliptic problems:

the complexity of its algorithm is O(N), where N is the total number of unknowns, and the convergence factor of a multigrid cycle is essentially independent of the size of the finest grid.

Constant efforts have been made to extend these properties to a larger class of problems. Since we are considering finite difference discretization schemes on structured grids in this chapter, geometric multigrid is a natural choice. Therefore coarse grid operators are deduced using the same discretization scheme as for the fine grid operator, considering a coarser mesh size (direct coarse grid approximation). We are using the most standard coarsening: the coarse mesh size (2h) is double the fine mesh size.

The multigrid method is mainly built on four components: smoothing, restriction, prolongation and coarse grid solution. We enumerate their main roles in multigrid. Smoothing aims at removing the high frequency components of the error: a few iterations of a relaxation method, such as Jacobi, are used to smooth them. Relaxation methods are not efficient in smoothing low-frequency components, but these components correspond to high frequency components on a coarse grid. Consequently a hierarchy of grids is used to reduce the low-frequency components efficiently, and transfer operators (restriction, interpolation) are needed to move from one grid to another. Restriction enables passing from a fine grid level to a coarse one. Prolongation enables passing from a coarse grid level to a fine one. The final element of multigrid is the solution method on the coarsest grid level. Here, direct or iterative methods can be used to solve this coarse linear system. When several grids are considered in a multigrid hierarchy, using a direct method is the most natural way to solve the coarse problem as it is of reduced size. However, in the three-dimensional case, when few grids are considered in the multigrid hierarchy, direct methods may be prohibitive in terms of computational resources and iterative methods have to be used. This question will be discussed later in this chapter. In the next subsections, basic components of a geometric multigrid algorithm in three dimensions are described.

3.2.1 Basic geometric multigrid components

Standard smoothers

As previously said, standard smoothers are often relaxation methods such as Jacobi, forward/backward Gauss-Seidel, Red-Black Gauss-Seidel [115], or symmetric Gauss-Seidel [93, Section 4.2.6]. The easiest and most general way to write them is first to split the system matrix in the following way:

    A = D - E - F,

where D is the diagonal of A, E its strictly lower part and F its strictly upper part respectively. With these notations, we give the expression of a Jacobi, a Gauss-Seidel and a symmetric Gauss-Seidel iteration as in ([101], Chapter 4). Besides, we denote by b the right-hand side, u the exact solution verifying Au = b, u^m the current approximate solution and u^{m+1} the next one.

Jacobi. According to the previous notations, a Jacobi iteration can be written as

    u^{m+1} = u^m + ω_r D^{-1}(b - Au^m),

where ω_r denotes a relaxation parameter (0 < ω_r < 2), and its iteration matrix S is deduced from

    u - u^{m+1} = S(u - u^m) = (I - ω_r D^{-1}A)(u - u^m),

where I denotes the identity matrix.

Forward Gauss-Seidel. With the same notations as above, a forward Gauss-Seidel iteration can be written as

    u^{m+1} = (D - E)^{-1}(Fu^m + b).

Its iteration matrix S is then given by

    u - u^{m+1} = S(u - u^m) = (D - E)^{-1}F(u - u^m).

Backward Gauss-Seidel. With the same notations as above, a backward Gauss-Seidel iteration can be written as

    u^{m+1} = (D - F)^{-1}(Eu^m + b).

Its iteration matrix S is

    u - u^{m+1} = S(u - u^m) = (D - F)^{-1}E(u - u^m).

Symmetric Gauss-Seidel. A symmetric Gauss-Seidel iteration consists of a first iteration of forward Gauss-Seidel and a second iteration of backward Gauss-Seidel. With the same notations as above, a symmetric Gauss-Seidel iteration can be written as:

    u^{m+1} = (D - F)^{-1}(E((D - E)^{-1}(Fu^m + b)) + b).

Its iteration matrix S is then

    u - u^{m+1} = S(u - u^m) = (D - F)^{-1}E(D - E)^{-1}F(u - u^m).

Example 1. We consider the matrix A resulting from the discretization of three-dimensional Helmholtz type operators with Dirichlet boundary conditions (discretized with a classical second-order finite difference scheme for a vertex-centered arrangement), also denoted by L_h:

    L_h = -Δ_h - κ²I, with κ ∈ C.

The stencil of L_h is, using the stencil notation defined in ([115], Section 1.3.4), the seven-point stencil with value 6/h² - κ² at the center and -1/h² at the six neighbors in the three coordinate directions, where κ denotes a term proportional to the wavenumber k (see Appendix A). If κ = k, the original three-dimensional Helmholtz operator is obtained. If κ² = (1 - iβ)k², β ∈ [0, 1], the shifted three-dimensional Helmholtz operator is obtained ([40], Chapter 7). The stencils of D, E and F follow accordingly: D carries the central coefficient 6/h² - κ², while E and F carry the three lower and three upper off-diagonal coefficients, equal to 1/h² with the sign convention A = D - E - F. It has to be noticed that if the wavenumber κ is taken equal to 0, the different expressions hold for the negative Laplacian operator with Dirichlet boundary conditions.
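As an illustration, here is a minimal NumPy/SciPy sketch of these sweeps built on the splitting A = D - E - F with dense matrices (the thesis experiments rely on other environments; this transcription is only for exposition):

    import numpy as np
    from scipy.linalg import solve_triangular

    def split(A):
        """Return D, E, F with A = D - E - F."""
        D = np.diag(np.diag(A))
        E = -np.tril(A, -1)        # strictly lower part of A, negated
        F = -np.triu(A, 1)         # strictly upper part of A, negated
        return D, E, F

    def jacobi(A, u, b, omega=0.8):
        """One weighted Jacobi sweep: u + omega * D^{-1}(b - A u)."""
        return u + omega * (b - A @ u) / np.diag(A)

    def gs_forward(A, u, b):
        """One forward Gauss-Seidel sweep: (D - E)^{-1}(F u + b)."""
        D, E, F = split(A)
        return solve_triangular(D - E, F @ u + b, lower=True)

    def gs_backward(A, u, b):
        """One backward Gauss-Seidel sweep: (D - F)^{-1}(E u + b)."""
        D, E, F = split(A)
        return solve_triangular(D - F, E @ u + b, lower=False)

    def gs_symmetric(A, u, b):
        """Forward sweep followed by a backward sweep."""
        return gs_backward(A, gs_forward(A, u, b), b)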

Restriction

The aim of restriction is to transfer information from a fine grid to a coarser one. We denote by Ω_ι, ι ∈ R, the grid defined by Ω_ι = G_ι ∩ Ω, where G_ι denotes the infinite grid:

    G_ι = { (x, y, z) | (x, y, z) = (iι, jι, kι); (i, j, k) ∈ Z³ },    (3.1)

and Ω ⊂ R³ is a closed bounded parallelepiped domain. In geometric multigrid (vertex-centered case), only one out of two points per direction of the fine grid Ω_h remains on the coarse grid Ω_2h. Considering Figure 3.1, the remaining points after restriction are the ones marked with a bullet.

Figure 3.1: A 3D fine grid with standard geometric coarsening (bullets: coarse grid points).

A natural choice for restriction could be injection. However, this choice is not very practicable in a multigrid context: it implies choosing a high-order prolongation merely to maintain convergence. This is discussed in Remark 2.7.1 in [115], where a relation between the orders of the prolongation and restriction and the order of the differential operator is given to guarantee the efficiency of multigrid on SPD problems. The order m_P of the prolongation is defined as the highest degree plus one of polynomials that are interpolated exactly [61, Section 3.4.3]. Similarly the order m_R of the restriction is defined as the highest degree plus one of polynomials that are restricted exactly. In order to obtain an efficient multigrid algorithm, the sum of m_R and m_P must be larger than the order m_PDE of the differential operator (highest derivative degree in the partial differential equation (PDE)):

    m_R + m_P > m_PDE.

Since the order of injection is zero, a quadratic interpolation should then be used for a Helmholtz type of PDE. Thus, a frequent choice for restriction is the full weighting (FW) operator; its order is equal to two (m_R = 2). Its principle relies on weighting fine grid values around the neighboring coarse grid points. Considering coordinates (x, y, z) ∈ Ω_2h, the FW restriction function I_h^{2h} applied to a fine grid function r_h in the three-dimensional case is:

    I_h^{2h}(r_h(x, y, z)) = (1/64) [ 8 r_h(x, y, z)
        + 4 r_h(x+h, y, z) + 4 r_h(x-h, y, z) + 4 r_h(x, y+h, z)
        + 4 r_h(x, y-h, z) + 4 r_h(x, y, z+h) + 4 r_h(x, y, z-h)
        + 2 r_h(x+h, y+h, z) + 2 r_h(x-h, y-h, z) + 2 r_h(x+h, y-h, z) + 2 r_h(x-h, y+h, z)
        + 2 r_h(x, y+h, z+h) + 2 r_h(x, y-h, z-h) + 2 r_h(x, y+h, z-h) + 2 r_h(x, y-h, z+h)
        + 2 r_h(x+h, y, z+h) + 2 r_h(x-h, y, z-h) + 2 r_h(x+h, y, z-h) + 2 r_h(x-h, y, z+h)
        + r_h(x+h, y+h, z+h) + r_h(x+h, y+h, z-h) + r_h(x+h, y-h, z+h) + r_h(x-h, y+h, z+h)
        + r_h(x+h, y-h, z-h) + r_h(x-h, y+h, z-h) + r_h(x-h, y-h, z+h) + r_h(x-h, y-h, z-h) ].
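A compact NumPy sketch of this 27-point full weighting on a cube, under the assumptions that n is even, the array r holds the (n-1)³ interior values of a grid of mesh size h = 1/n, and zero padding stands in for homogeneous Dirichlet boundary values:

    import numpy as np

    def fw_restrict(r):
        """Full-weighting restriction of a 3D fine-grid function to the
        (n/2 - 1)^3 interior coarse points (mesh size 2h)."""
        n = r.shape[0] + 1
        rp = np.pad(r, 1)                       # zero boundary values
        out = np.zeros(((n // 2) - 1,) * 3)
        w = {0: 8.0, 1: 4.0, 2: 2.0, 3: 1.0}    # weight by |dx|+|dy|+|dz|
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    wgt = w[abs(dx) + abs(dy) + abs(dz)] / 64.0
                    out += wgt * rp[2 + dx:n - 1 + dx:2,
                                    2 + dy:n - 1 + dy:2,
                                    2 + dz:n - 1 + dz:2]
        return out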

Prolongation

Prolongation transfers information from a coarse grid (2h) to a fine grid (h). The prolongation will be based on trilinear interpolation (m_P = 2). Considering Figures 3.2 and 3.3, it can be seen how fine grid points (empty polygons) are deduced from coarse grid points (bullets). The coefficients used for the points on the cube faces are described in Figure 3.3 for the trilinear interpolation case. For the cube center, represented by an empty disk in Figure 3.2, the values at the eight corners are each weighted by a factor of 1/8. Coarse grid points keep their associated values. In fact, the trilinear interpolation is the adjoint of the full weighting (FW) restriction.

Figure 3.2: Fine grid for a 3D trilinear interpolation (bullets: coarse grid points).

Figure 3.3: Weightings for 3D interpolation on a cube face (bullets: coarse grid points).

In the next section, we describe how to assemble all these components to obtain a geometric multigrid algorithm.

3.2.2 Geometric multigrid algorithms

In this section, we introduce notations that will be used later in this chapter. We denote by I_h^{2h} the restriction operator, I_{2h}^h the prolongation operator, and by L_h and L_2h the fine grid and coarse grid operators. The vector u_h is the exact solution satisfying b_h = L_h u_h, ū_h the current approximate solution at the fine level, u_2h the solution of the current coarse problem, and b_2h the coarse right-hand side. We denote by ν_1 and ν_2 the number of pre-

and post-smoothing iterations respectively, and by S the smoothing procedure. Algorithm 13 depicts a classical two-grid cycle.

Algorithm 13 Two-grid cycle TG(L_h, u_h, b_h).
1: Presmoothing: u_h := S(L_h, u_h, b_h, ν_1)
2: Compute the residual r_h: r_h = b_h - L_h u_h
3: Restrict the residual: b_2h = I_h^{2h} r_h
4: Solve on Ω_2h: L_2h u_2h = b_2h
5: Interpolate the coarse solution u_2h to obtain a correction of the fine solution u_h: I_{2h}^h u_2h
6: Add this correction to the solution: u_h := u_h + I_{2h}^h u_2h
7: Postsmoothing: u_h := S(L_h, u_h, b_h, ν_2)

Thus, a two-grid cycle consists first of a smoothing step (presmoothing); the residual b_h - L_h u_h is then restricted, which gives the coarse right-hand side; the coarse problem is solved; the coarse solution is interpolated and the obtained correction is added to the fine solution; a final smoothing step (postsmoothing) ends the cycle. Figure 3.4 represents a V-cycle in the two-grid case, which corresponds to Algorithm 13.

Figure 3.4: Two-grid V-cycle.

The two-grid cycle is the simplest form of a multigrid cycle. This process can be generalized to any number of grid levels, as the next recursive algorithm shows, denoted by MG(L_h, u_h, b_h) in Algorithm 14.

Algorithm 14 Multigrid cycle MG(L_h, u_h, b_h).
1: Presmoothing: u_h := S(L_h, u_h, b_h, ν_1)
2: Compute the residual r_h: r_h = b_h - L_h u_h
3: Restrict the residual: b_2h = I_h^{2h} r_h
4: Set u_2h := 0
5: for it = 1:γ do
6:    u_2h := MG(L_2h, u_2h, b_2h)
7: end for
8: Interpolate the coarse solution u_2h to obtain a correction of the fine solution u_h: I_{2h}^h u_2h
9: Add this correction to the solution: u_h := u_h + I_{2h}^h u_2h
10: Postsmoothing: u_h := S(L_h, u_h, b_h, ν_2)

In fact, the shape of the multigrid cycle depends on the parameter γ. The shape of the multigrid cycle can be changed in order to possibly improve the convergence behavior, combining the multigrid components iteratively in a different way. A W-cycle is obtained with γ = 2, an F-cycle with a combination of γ = 1 and γ = 2, and a V-cycle with γ = 1 (Figure 3.5).
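A matrix-based NumPy sketch of Algorithm 13, assuming that the restriction I_h^{2h} and the prolongation I_{2h}^h have been assembled as explicit matrices R and P, that smooth performs one relaxation sweep (for instance gs_forward from the sketch above), and that a dense exact solve plays the role of step 4:

    import numpy as np

    def two_grid_cycle(L_h, L_2h, R, P, u, b, smooth, nu1=1, nu2=1):
        """One two-grid cycle TG(L_h, u, b) with an exact coarse solve."""
        for _ in range(nu1):                 # presmoothing
            u = smooth(L_h, u, b)
        b_2h = R @ (b - L_h @ u)             # restrict the residual
        u_2h = np.linalg.solve(L_2h, b_2h)   # solve on the coarse grid
        u = u + P @ u_2h                     # prolongate and correct
        for _ in range(nu2):                 # postsmoothing
            u = smooth(L_h, u, b)
        return u

Replacing the exact coarse solve by a few iterations of a Krylov method yields the perturbed two-grid cycle studied later in this chapter.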

Figure 3.5: F-cycles for two, three and four grids (from left to right).

Notwithstanding, we will only consider a two-grid cycle in the remainder of this chapter. This is motivated by our application. In the next section, we present a method to analyze the convergence properties of a three-dimensional geometric two-grid cycle. This technique is named Fourier analysis.

3.3 Rigorous and Local Fourier Analysis of a two-grid method

First, we write the iteration matrix M_h of a classical two-grid method with the notations of Section 3.2.2:

    M_h(u_h - ū_h) = S^{ν_2}(I - I_{2h}^h L_2h^{-1} I_h^{2h} L_h) S^{ν_1}(u_h - ū_h).    (3.2)

Fourier analysis aims at obtaining an estimation of the norm and spectral radius of M_h and at analyzing the smoothing behavior of relaxation procedures. In fact, the two-norm of M_h governs the convergence of the two-grid cycle. Indeed, since the (m+1)-th iterate u_h^{m+1} satisfies u_h - u_h^{m+1} = M_h(u_h - u_h^m), it follows that

    ||u_h - u_h^{m+1}||_2 ≤ ||M_h||_2 ||u_h - u_h^m||_2, with ||M_h||_2 = sqrt(ρ(M_h^H M_h)).

Furthermore the spectral radius ρ(M_h) of M_h is equal to the asymptotic convergence factor of the two-grid cycle. This last quantity plays an important role in multigrid convergence theory, approximating the asymptotic behavior of the two-level cycle. Fourier analysis implements techniques to block diagonalize the operator M_h in a Fourier basis [61, p. 25]. This block diagonal representation of M_h then makes it easy to deduce the two-norm of M_h.

In this section, we will present two different Fourier analyses. The first one is the Rigorous Fourier Analysis (RFA) [115, Section 3.3]: the two-grid convergence factor can be deduced in the situations enumerated in [115, Section 3.4.3]. We will focus on the case where the operator satisfies Dirichlet boundary conditions and where a Jacobi smoother is used. The second one is the Local Fourier Analysis (LFA) [115, Chapter 4] or local mode analysis [15]: the influence of boundary conditions is not taken into account and smoothers such as Gauss-Seidel can be analyzed.

3.3.1 Rigorous Fourier Analysis (RFA) of a two-grid method

We now introduce some of the main elements of RFA to study the two-grid convergence. First, we consider the orthogonal basis of the fine grid space Ω_h = G_h ∩ [0, 1]³ spanned by the eigenfunctions of L_h:

    φ_h^{l1,l2,l3}(x, y, z) = sin(l1 π x) sin(l2 π y) sin(l3 π z), for l1, l2, l3 = 1, ..., n - 1 and (x, y, z) ∈ Ω_h,

where n denotes the inverse of the mesh grid size, n = 1/h. These functions are eigenfunctions of the Helmholtz operator with Dirichlet boundary conditions (see Example 1). We then introduce the at most

eight-dimensional spaces of harmonics, for l1, l2, l3 = 1, ..., n/2 [115, Equation (3.4.1)]:

    E_h^{l1,l2,l3} = span[ φ_h^{l1,l2,l3}, φ_h^{n-l1,n-l2,n-l3}, φ_h^{n-l1,l2,l3}, φ_h^{l1,n-l2,n-l3},
                           φ_h^{l1,n-l2,l3}, φ_h^{n-l1,l2,n-l3}, φ_h^{l1,l2,n-l3}, φ_h^{n-l1,n-l2,l3} ],

which allow us to block diagonalize [61, p. 25] the two-grid iteration matrix M_h in the Fourier basis Q_h defined by:

    Q_h = [ E_h^{l1,l2,l3} ]_{l1,l2,l3 = 1,...,n/2}.    (3.3)

In fact, M_h leaves the harmonic spaces E_h^{l1,l2,l3} invariant for l1, l2, l3 = 1, ..., n/2 if e.g. Jacobi or Red-Black Gauss-Seidel is used as a smoother. The spaces E_h^{l1,l2,l3} are eight-, four-, two- and one-dimensional with respect to the values of l1, l2, l3:

    dim(E_h^{l1,l2,l3}) = 8 if l1, l2, l3 < n/2,
                          4 if exactly one of l1, l2, l3 equals n/2,
                          2 if exactly two of l1, l2, l3 equal n/2,
                          1 if l1 = l2 = l3 = n/2.

Similarly as on the fine grid, we introduce the eigenfunctions on the coarse grid space Ω_2h = G_2h ∩ [0, 1]³:

    φ_2h^{l1,l2,l3}(x, y, z) = sin(l1 π x) sin(l2 π y) sin(l3 π z), for l1, l2, l3 = 1, ..., n/2 - 1 and (x, y, z) ∈ Ω_2h.

On Ω_2h, the spaces E_2h^{l1,l2,l3} are one-dimensional only. Indeed, the eigenfunctions spanning E_h^{l1,l2,l3} coincide up to their sign on Ω_2h for l1, l2, l3 = 1, ..., n/2:

    φ_2h^{l1,l2,l3} = -φ_2h^{n-l1,n-l2,n-l3} = -φ_2h^{n-l1,l2,l3} = φ_2h^{l1,n-l2,n-l3}
                    = -φ_2h^{l1,n-l2,l3} = φ_2h^{n-l1,l2,n-l3} = -φ_2h^{l1,l2,n-l3} = φ_2h^{n-l1,n-l2,l3} on Ω_2h.

Practically, this means that E_2h^{l1,l2,l3} = span[φ_2h^{l1,l2,l3}].

Later in this section, we denote operators written in the Fourier basis Q_h with a hat. Thus, denoting by Q_h the matrix whose columns span the Fourier basis, we obtain M̂_h = Q_h^H M_h Q_h. To simplify these notations, we introduce the symbol =̂ and write M_h =̂ M̂_h. For each triplet (l1, l2, l3) we have M̂_h(l1, l2, l3) = M_h restricted to E_h^{l1,l2,l3}. Thus, we have a block diagonal representation of M_h in the Fourier basis:

    M_h =̂ M̂_h = [ M̂_h(l1, l2, l3) ]_{l1,l2,l3 = 1,...,n/2}.

In the following, we will deduce a representation of the two-grid iteration matrix (Equation (3.2)) with respect to the spaces E_h^{l1,l2,l3}, considering a Jacobi smoother. We first give the representation with respect to E_h^{l1,l2,l3} of the three-dimensional Helmholtz operator with Dirichlet boundary conditions (Example 1), both on the fine and the coarse grid. We then detail the Fourier representations of the trilinear interpolation and of the full weighting restriction, denoted by Î_{2h}^h and Î_h^{2h} respectively. This enables us to obtain a representation in the Fourier basis of the coarse grid correction operator:

    K_h^{2h} = I - I_{2h}^h L_2h^{-1} I_h^{2h} L_h.

For that purpose, we introduce ξ, η and γ; these parameters will be used to write the different operators in the Fourier basis more compactly:

    ξ = sin²(l1 π h / 2), η = sin²(l2 π h / 2), γ = sin²(l3 π h / 2).    (3.4)

Lemma 1. The harmonic spaces E_h^{l1,l2,l3}, for l1, l2, l3 = 1, ..., n/2, are invariant under Helmholtz type operators L_h with Dirichlet boundary conditions:

    L_h : E_h^{l1,l2,l3} → E_h^{l1,l2,l3}, for l1, l2, l3 = 1, ..., n/2.

The operator L_h can be represented in the Fourier basis as a block diagonal matrix. Its representation with respect to the spaces E_h^{l1,l2,l3} consists of the diagonal blocks described below, using the notations of Equation (3.4).

For l1, l2, l3 = 1, ..., n/2 - 1, we obtain the following 8 × 8 block:

    L̂_h(l1, l2, l3) = diag( (4/h²)(ξ + η + γ) - κ², (4/h²)(3 - ξ - η - γ) - κ²,
                            (4/h²)(1 - ξ + η + γ) - κ², (4/h²)(2 + ξ - η - γ) - κ²,
                            (4/h²)(1 + ξ - η + γ) - κ², (4/h²)(2 - ξ + η - γ) - κ²,
                            (4/h²)(1 + ξ + η - γ) - κ², (4/h²)(2 - ξ - η + γ) - κ² ).

For l1 = n/2, l2, l3 < n/2, or l2 = n/2, l1, l3 < n/2, or l3 = n/2, l1, l2 < n/2, we obtain the following 4 × 4 block:

    L̂_h(l1, l2, l3) = diag( (4/h²)(ξ + η + γ) - κ², (4/h²)(3 - ξ - η - γ) - κ²,
                            (4/h²)(1 - ξ + η + γ) - κ², (4/h²)(2 + ξ - η - γ) - κ² ).

For l1 = l2 = n/2, l3 < n/2, or l1 = l3 = n/2, l2 < n/2, or l2 = l3 = n/2, l1 < n/2, we obtain the following 2 × 2 block:

    L̂_h(l1, l2, l3) = diag( (4/h²)(ξ + η + γ) - κ², (4/h²)(3 - ξ - η - γ) - κ² ).

For l1 = l2 = l3 = n/2, we obtain the following 1 × 1 block:

    L̂_h(l1, l2, l3) = (4/h²)(ξ + η + γ) - κ².
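The fine-grid eigenvalue formula is easy to check numerically; a small sketch, assuming a unit cube with n = 8 and the seven-point discretization of Example 1 (here with κ = 0, the negative Laplacian case):

    import numpy as np

    n, h = 8, 1.0 / 8
    kappa2 = 0.0
    m = n - 1                       # interior points per direction

    # 1D Dirichlet second-difference matrix, then 3D operator via Kronecker sums
    T = (np.diag(2 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2
    I = np.eye(m)
    L = (np.kron(np.kron(T, I), I) + np.kron(np.kron(I, T), I)
         + np.kron(np.kron(I, I), T) - kappa2 * np.eye(m**3))

    # analytic eigenvalues (4/h^2)(xi + eta + gamma) - kappa^2
    l = np.arange(1, n)
    s = np.sin(l * np.pi * h / 2) ** 2
    ana = ((4 / h**2) * (s[:, None, None] + s[None, :, None]
           + s[None, None, :]) - kappa2).ravel()

    print(np.allclose(np.sort(np.linalg.eigvalsh(L)), np.sort(ana)))  # True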

Proof. Obviously, since the eigenfunctions spanning E_h^{l1,l2,l3} are eigenfunctions of L_h, the harmonic spaces are invariant under L_h. The representation of L_h with respect to the harmonic space E_h^{l1,l2,l3} is obtained by calculating the image of each of its basis functions using trigonometric formulas:

    L_h φ_h^{l1,l2,l3} = ((4/h²)(ξ + η + γ) - κ²) φ_h^{l1,l2,l3},
    L_h φ_h^{n-l1,n-l2,n-l3} = ((4/h²)(3 - ξ - η - γ) - κ²) φ_h^{n-l1,n-l2,n-l3},
    L_h φ_h^{n-l1,l2,l3} = ((4/h²)(1 - ξ + η + γ) - κ²) φ_h^{n-l1,l2,l3},
    L_h φ_h^{l1,n-l2,n-l3} = ((4/h²)(2 + ξ - η - γ) - κ²) φ_h^{l1,n-l2,n-l3},
    L_h φ_h^{l1,n-l2,l3} = ((4/h²)(1 + ξ - η + γ) - κ²) φ_h^{l1,n-l2,l3},
    L_h φ_h^{n-l1,l2,n-l3} = ((4/h²)(2 - ξ + η - γ) - κ²) φ_h^{n-l1,l2,n-l3},
    L_h φ_h^{l1,l2,n-l3} = ((4/h²)(1 + ξ + η - γ) - κ²) φ_h^{l1,l2,n-l3},
    L_h φ_h^{n-l1,n-l2,l3} = ((4/h²)(2 - ξ - η + γ) - κ²) φ_h^{n-l1,n-l2,l3}.

Considering the different values of l1, l2, l3, we obtain the results proposed in Lemma 1, taking into account the dimensions of the spaces E_h^{l1,l2,l3}.

Lemma 2. On the coarse grid space Ω_2h, E_2h^{l1,l2,l3} is invariant under the coarse three-dimensional Helmholtz operator L_2h:

    L_2h : E_2h^{l1,l2,l3} → E_2h^{l1,l2,l3}, for l1, l2, l3 = 1, ..., n/2,

and its representation with respect to E_2h^{l1,l2,l3} is

    L̂_2h(l1, l2, l3) = (4/h²)((1 - ξ)ξ + (1 - η)η + (1 - γ)γ) - κ².

Proof. The proof is similar to that of Lemma 1:

    L_2h φ_2h^{l1,l2,l3} = [ (4/h²)((1 - ξ)ξ + (1 - η)η + (1 - γ)γ) - κ² ] φ_2h^{l1,l2,l3}.

We now focus on the grid transfer operators: the full weighting restriction and the trilinear interpolation. Once the representation in the Fourier basis of the restriction is obtained, the representation of the interpolation is deduced straightforwardly since it is its adjoint [115, Remark 3.3.5].

Lemma 3. The range of I_h^{2h}(E_h^{l1,l2,l3}) coincides with the coarse harmonic space E_2h^{l1,l2,l3} for l1, l2, l3 = 1, ..., n/2 - 1:

    I_h^{2h} : E_h^{l1,l2,l3} → span[φ_2h^{l1,l2,l3}], for l1, l2, l3 = 1, ..., n/2 - 1.

The full weighting restriction can be block diagonalized in the Fourier basis and has the following block representation:

For l1, l2, l3 = 1, ..., n/2 - 1, we have the following 1 × 8 block:

    Î_h^{2h}(l1, l2, l3) = [ (1-ξ)(1-η)(1-γ), -ξηγ, -ξ(1-η)(1-γ), (1-ξ)ηγ,
                             -(1-ξ)η(1-γ), ξ(1-η)γ, -(1-ξ)(1-η)γ, ξη(1-γ) ].

For l1 = n/2 or l2 = n/2 or l3 = n/2, Î_h^{2h}(l1, l2, l3) = 0.

Proof. First, we apply the full weighting restriction to the basis functions of E_h^{l1,l2,l3} for l1, l2, l3 = 1, ..., n/2 - 1, using trigonometric identities:

    I_h^{2h} φ_h^{l1,l2,l3} = (1-ξ)(1-η)(1-γ) φ_2h^{l1,l2,l3},
    I_h^{2h} φ_h^{n-l1,n-l2,n-l3} = -ξηγ φ_2h^{l1,l2,l3},
    I_h^{2h} φ_h^{n-l1,l2,l3} = -ξ(1-η)(1-γ) φ_2h^{l1,l2,l3},
    I_h^{2h} φ_h^{l1,n-l2,n-l3} = (1-ξ)ηγ φ_2h^{l1,l2,l3},
    I_h^{2h} φ_h^{l1,n-l2,l3} = -(1-ξ)η(1-γ) φ_2h^{l1,l2,l3},
    I_h^{2h} φ_h^{n-l1,l2,n-l3} = ξ(1-η)γ φ_2h^{l1,l2,l3},
    I_h^{2h} φ_h^{l1,l2,n-l3} = -(1-ξ)(1-η)γ φ_2h^{l1,l2,l3},
    I_h^{2h} φ_h^{n-l1,n-l2,l3} = ξη(1-γ) φ_2h^{l1,l2,l3}.

These equalities prove that I_h^{2h}(E_h^{l1,l2,l3}) = span[φ_2h^{l1,l2,l3}] for l1, l2, l3 = 1, ..., n/2 - 1, and give the block representation of Î_h^{2h} in the Fourier basis. Furthermore, if l1 = n/2 or l2 = n/2 or l3 = n/2, the coarse eigenfunctions φ_2h^{l1,l2,l3} are zero. Indeed, the definition of φ_2h^{l1,l2,l3} gives, with (j1, j2, j3) ∈ N³,

    φ_2h^{l1,l2,l3}(j1 2h, j2 2h, j3 2h) = sin(l1 π j1 2h) sin(l2 π j2 2h) sin(l3 π j3 2h), for (j1 2h, j2 2h, j3 2h) ∈ Ω_2h;

then, if l1 = n/2, we have

    φ_2h^{n/2,l2,l3}(j1 2h, j2 2h, j3 2h) = sin((n/2) π j1 2h) sin(l2 π j2 2h) sin(l3 π j3 2h),

and since j1 is an integer, it follows that sin((n/2) π j1 2h) = sin((n/2) π j1 (2/n)) = sin(π j1) = 0. Therefore, we have φ_2h^{n/2,l2,l3}(j1 2h, j2 2h, j3 2h) = 0. The proof is similar for l2 = n/2 and l3 = n/2.

Lemma 4. The range of I_{2h}^h(φ_2h^{l1,l2,l3}) lies in E_h^{l1,l2,l3}, for l1, l2, l3 = 1, ..., n/2 - 1:

    I_{2h}^h : span[φ_2h^{l1,l2,l3}] → E_h^{l1,l2,l3}, for l1, l2, l3 = 1, ..., n/2 - 1.

The trilinear interpolation can be block diagonalized in the Fourier basis with the following block representation for l1, l2, l3 = 1, ..., n/2 - 1:

    Î_{2h}^h(l1, l2, l3) = ( Î_h^{2h}(l1, l2, l3) )^T.

Proof. Using trilinear interpolation and trigonometric identities, it follows that

    I_{2h}^h φ_2h^{l1,l2,l3} = (1-ξ)(1-η)(1-γ) φ_h^{l1,l2,l3} - ξηγ φ_h^{n-l1,n-l2,n-l3}
        - ξ(1-η)(1-γ) φ_h^{n-l1,l2,l3} + (1-ξ)ηγ φ_h^{l1,n-l2,n-l3}
        - (1-ξ)η(1-γ) φ_h^{l1,n-l2,l3} + ξ(1-η)γ φ_h^{n-l1,l2,n-l3}
        - (1-ξ)(1-η)γ φ_h^{l1,l2,n-l3} + ξη(1-γ) φ_h^{n-l1,n-l2,l3}.

Thus I_{2h}^h φ_2h^{l1,l2,l3} lies in E_h^{l1,l2,l3} for l1, l2, l3 = 1, ..., n/2 - 1 and, from the representation of Î_h^{2h}(l1, l2, l3) obtained in Lemma 3, comes the fact that Î_{2h}^h(l1, l2, l3) = (Î_h^{2h}(l1, l2, l3))^T.

We now give in Theorem 1 the representation of the coarse grid correction operator K_h^{2h} with respect to the harmonic spaces E_h^{l1,l2,l3} for l1, l2, l3 = 1, ..., n/2.

Theorem 1. We consider the three-dimensional Helmholtz operators (fine grid L_h, coarse grid L_2h) as defined in Example 1, a trilinear interpolation I_{2h}^h, and its adjoint as restriction I_h^{2h}. With these components, the harmonic spaces E_h^{l1,l2,l3} are invariant under the coarse grid correction operator K_h^{2h} = I - I_{2h}^h L_2h^{-1} I_h^{2h} L_h for l1, l2, l3 = 1, ..., n/2:

    K_h^{2h} : E_h^{l1,l2,l3} → E_h^{l1,l2,l3} for l1, l2, l3 = 1, ..., n/2.

K_h^{2h} can also be block diagonalized in the Fourier basis and its representation with respect to the harmonic spaces E_h^{l1,l2,l3} for l1, l2, l3 = 1, ..., n/2 is

    K̂_h^{2h}(l1, l2, l3) = I_8 - [b_i c_j]_{8,8} / Λ, if l1, l2, l3 < n/2,
                          = I_4, if l1 = n/2 or l2 = n/2 or l3 = n/2,
                          = I_2, if l1 = l3 = n/2 or l1 = l2 = n/2 or l2 = l3 = n/2,
                          = I_1, if l1 = l2 = l3 = n/2,    (3.5)

where I_j is the j × j identity matrix,

    Λ = (4/h²)((1 - ξ)ξ + (1 - η)η + (1 - γ)γ) - κ²,

with

    b_1 = (1-ξ)(1-η)(1-γ),   c_1 = (1-ξ)(1-η)(1-γ) ((4/h²)(ξ + η + γ) - κ²),
    b_2 = -ξηγ,              c_2 = -ξηγ ((4/h²)(3 - ξ - η - γ) - κ²),
    b_3 = -ξ(1-η)(1-γ),      c_3 = -ξ(1-η)(1-γ) ((4/h²)(1 - ξ + η + γ) - κ²),
    b_4 = (1-ξ)ηγ,           c_4 = (1-ξ)ηγ ((4/h²)(2 + ξ - η - γ) - κ²),
    b_5 = -(1-ξ)η(1-γ),      c_5 = -(1-ξ)η(1-γ) ((4/h²)(1 + ξ - η + γ) - κ²),
    b_6 = ξ(1-η)γ,           c_6 = ξ(1-η)γ ((4/h²)(2 - ξ + η - γ) - κ²),
    b_7 = -(1-ξ)(1-η)γ,      c_7 = -(1-ξ)(1-η)γ ((4/h²)(1 + ξ + η - γ) - κ²),
    b_8 = ξη(1-γ),           c_8 = ξη(1-γ) ((4/h²)(2 - ξ - η + γ) - κ²).

Proof. Gathering the results of Lemmas 1, 2, 3 and 4 respectively, we first have, for l1, l2, l3 = 1, ..., n/2,

    L_h : E_h^{l1,l2,l3} → E_h^{l1,l2,l3},
    L_2h : span[φ_2h^{l1,l2,l3}] → span[φ_2h^{l1,l2,l3}],
    I_h^{2h} : E_h^{l1,l2,l3} → span[φ_2h^{l1,l2,l3}],
    I_{2h}^h : span[φ_2h^{l1,l2,l3}] → E_h^{l1,l2,l3}.

Thus, it follows that

    K_h^{2h} : E_h^{l1,l2,l3} → E_h^{l1,l2,l3}.

Furthermore, combining the representations of L_h, L_2h, I_h^{2h} and I_{2h}^h with respect to E_h^{l1,l2,l3}, we obtain

    [ L̂_2h^{-1} Î_h^{2h} L̂_h ](l1, l2, l3) = (1/Λ)[c_i]_{i=1,...,8} for l1, l2, l3 = 1, ..., n/2 - 1,

and thus

    [ Î_{2h}^h L̂_2h^{-1} Î_h^{2h} L̂_h ](l1, l2, l3) = (1/Λ)[b_i c_j]_{i,j=1,...,8} for l1, l2, l3 = 1, ..., n/2 - 1.

If l1 = n/2 or l2 = n/2 or l3 = n/2, since Î_h^{2h}(l1, l2, l3) = 0, K̂_h^{2h} reduces to the identity matrix with a dimension corresponding to the dimension of E_h^{l1,l2,l3}.

The representation of a Jacobi smoother J_h in the Fourier basis is now introduced.

Lemma 5. The harmonic spaces E_h^{l1,l2,l3} for l1, l2, l3 = 1, ..., n/2 are invariant under the Jacobi smoother J_h with damping parameter ω_r (Example 1) for the Helmholtz operator L_h:

    J_h : E_h^{l1,l2,l3} → E_h^{l1,l2,l3}, for l1, l2, l3 = 1, ..., n/2.

The operator J_h can be represented in the Fourier basis as a diagonal matrix. Its representation with respect to the spaces E_h^{l1,l2,l3} consists of the diagonal blocks described below, for l1, l2, l3 = 1, ..., n/2:

    Ĵ_h(l1, l2, l3) = I - (ω_r / (6/h² - κ²)) L̂_h(l1, l2, l3).

Proof. We first recall the expression of the Jacobi iteration matrix with a relaxation parameter ω_r:

    J_h = I - (ω_r / (6/h² - κ²)) L_h.

The range J_h(E_h^{l1,l2,l3}) is then E_h^{l1,l2,l3}, since E_h^{l1,l2,l3} is invariant under L_h. Besides, the representation of J_h in the Fourier basis is, for l1, l2, l3 = 1, ..., n/2,

    Ĵ_h(l1, l2, l3) = I - (ω_r / (6/h² - κ²)) L̂_h(l1, l2, l3).

We now give the representation of M_h (Equation (3.2)) with respect to the harmonic spaces E_h^{l1,l2,l3}, assuming that the smoother leaves the spaces of harmonics E_h^{l1,l2,l3} invariant.

Corollary 5. Considering the three-dimensional fine and coarse grid Helmholtz operators (L_h, L_2h), a trilinear interpolation I_{2h}^h, its adjoint as restriction I_h^{2h}, and a smoother S_h which leaves E_h^{l1,l2,l3} invariant, the spaces E_h^{l1,l2,l3} are invariant under M_h. This last operator has the following representation in the Fourier basis:

    M̂_h(l1, l2, l3) = [ Ŝ_h^{ν_2}(l1, l2, l3) K̂_h^{2h}(l1, l2, l3) Ŝ_h^{ν_1}(l1, l2, l3) ]_{l1,l2,l3 = 1,...,n/2},

where K̂_h^{2h} is given in Theorem 1.

Proof. This is a direct consequence of Theorem 1.

Therefore, the norm ||M_h||_2 can be computed thanks to Corollary 5:

    ||M_h||_2 = max { ||M̂_h(l1, l2, l3)||_2 | 1 ≤ max(l1, l2, l3) ≤ n/2 }.    (3.6)

The quantity ||M_h||_2 is obtained by computing numerically ||M̂_h(l1, l2, l3)||_2 for all l1, l2, l3 = 1, ..., n/2.

In certain situations, a two-grid cycle can be used as a preconditioner for a Krylov method. Indeed, in our application it is found that the two-grid method is not convergent for Helmholtz problems at high wavenumbers, but it can be used as a preconditioner for a Krylov method [31]. As said in Chapter 2, the distribution of the spectrum of a preconditioned operator in the complex plane can influence the convergence of a Krylov method. Moreover, in the symmetric case, the spectrum governs its convergence [101]. RFA makes it possible to obtain the spectrum of this preconditioned operator [125]. We therefore perform this spectrum study in the RFA framework using preconditioning.

As said in Section 2.2.1, a preconditioning matrix M must approximate the inverse of the linear system matrix A. We focus on the case where A is the original Helmholtz matrix L_h^{(0)} = L_h for κ = k ∈ R (see Example 1) and M is the two-grid iteration applied to a possibly shifted Helmholtz operator L_h^{(β)} = L_h for κ² = (1 - iβ)k², where β denotes the shift parameter lying in [0, 1]. It corresponds to the preconditioners depicted in [31] and [42] in the two-dimensional case. Each preconditioning step requires the solution of the linear system L_h^{(β)} z = v. One cycle of a geometric two-grid method is used to approximate the inverse of L_h^{(β)}. Let (L̃_h^{(β)})^{-1} denote this approximation. The convergence of the Krylov subspace method is thus related to the spectrum of the matrix L_h^{(0)}(L̃_h^{(β)})^{-1}. If only one cycle is performed, the iteration matrix of the preconditioning phase is equal to the iteration matrix of the multigrid procedure, that is:

    M_h = I - (L̃_h^{(β)})^{-1} L_h^{(β)}, or (L̃_h^{(β)})^{-1} L_h^{(β)} = I - M_h,    (3.7)

where M_h is the two-grid iteration matrix (see Equation (3.2)). From Equation (3.7) the following relation can be deduced:

    L_h^{(0)}(L̃_h^{(β)})^{-1} = L_h^{(0)} (I - M_h) (L_h^{(β)})^{-1}.    (3.8)

Since all operators in Equation (3.8) are block diagonalizable in the Fourier basis (Corollary 5), the spectrum of L_h^{(0)}(L̃_h^{(β)})^{-1} can be computed by solving eigenvalue problems of small size (8 × 8 at most) only. Therefore, we compute thanks to RFA the spectrum of L_h^{(0)}(L̃_h^{(β)})^{-1} for two values of β, considering the same two-grid method as in Corollary 5 and two Jacobi iterations as a smoother (ν_1 = ν_2 = 2). In Figure 3.6, the spectra of L_h^{(0)}(L̃_h^{(0)})^{-1} and L_h^{(0)}(L̃_h^{(0.6)})^{-1} are plotted considering a 64³ grid for a wavenumber k = π/(6h) and relaxation parameters ω_r = 0.8 and ω_r = 0.3 respectively. The choice of the parameters β and ω_r is discussed in Section 3.3.2.

Figure 3.6: Spectra of L_h^{(0)}(L̃_h^{(β)})^{-1} for two values of β, (β = 0, ω_r = 0.8) (left) and (β = 0.6, ω_r = 0.3) (right), considering a 64³ grid for a wavenumber k = π/(6h).

Both spectra plotted in Figure 3.6 look favorable for the convergence of a Krylov method. Indeed, on one hand, using the two-grid method on the original Helmholtz operator gives a spectrum with a cluster around one and a few isolated eigenvalues with positive or negative real parts. On the other hand, when the two-grid method is applied to the shifted Helmholtz operator (β = 0.6), the spectrum lies in the positive real part of the complex plane with a few eigenvalues close to zero. Moreover, it has to be noticed that the shapes of the spectra are similar to the two-dimensional case; see Figure 1 in [31] for the original Helmholtz operator and Figure 7 in [42] for the shifted Helmholtz operator.

Nevertheless, the lack of generality of the Rigorous Fourier Analysis (RFA) concerning the assumptions on the smoother and on the boundary conditions leads us to investigate the Local Fourier Analysis (LFA). Indeed, LFA enables the analysis of general smoothers (see [115, Table 4.4]). Furthermore it does not take boundary conditions into account because it linearizes locally any discrete operator with a constant stencil. We introduce some elements of LFA in Section 3.3.2 before presenting a smoothing analysis for the three-dimensional Helmholtz operator. In fact, this LFA introduction can be seen as an extension to the three-dimensional case of Sections 4.2 and 4.3 in [115].

3.3.2 Local Fourier analysis (LFA) of a two-grid method

We first introduce three-dimensional Helmholtz type operators with periodic boundary conditions in Ω_h = G_h ∩ [0, 1]³:

    -Δu - κ²u in (0, 1)³,
    u(0, y, z) = u(1, y, z), (y, z) ∈ [0, 1]²,
    u(x, 0, z) = u(x, 1, z), (x, z) ∈ [0, 1]²,
    u(x, y, 0) = u(x, y, 1), (x, y) ∈ [0, 1]².

We then introduce the eigenfunctions of this operator:

    φ_h^{l1,l2,l3}(x, y, z) = e^{2iπ l1 x} e^{2iπ l2 y} e^{2iπ l3 z}, for -n/2 ≤ l1, l2, l3 < n/2 and (x, y, z) ∈ Ω_h.

The LFA relies on these grid functions; however, instead of the discrete space Θ,

    Θ = { (2π l1 h, 2π l2 h, 2π l3 h) | -n/2 ≤ l1, l2, l3 < n/2 } ⊂ [-π, π)³,

the continuous space [-π, π)³ is now considered. The LFA then uses the following grid functions:

    φ_h^{θ1,θ2,θ3}(x, y, z) = e^{iθ1 x/h} e^{iθ2 y/h} e^{iθ3 z/h}, for (θ1, θ2, θ3) ∈ [-π, π)³ and (x, y, z) ∈ G_h.

The grid functions φ_h^{θ1,θ2,θ3}(x, y, z) are linearly independent for distinct (θ1, θ2, θ3) ∈ [-π, π)³. Thus, they form a basis of G_h, called once again a Fourier basis.

The infinite grid G_h is considered here in order to obtain a representation of any linear operator with a constant stencil (a stencil which does not depend on (x, y, z)) in this Fourier basis [115, Lemma 4.2.1]. For instance, if the stencil of the three-dimensional Helmholtz operator (Example 1) is considered, the following relation holds:

    L_h φ_h^{θ1,θ2,θ3}(x, y, z) = L̃_h(θ1, θ2, θ3) φ_h^{θ1,θ2,θ3}(x, y, z),

where

    L̃_h(θ1, θ2, θ3) = (1/h²)(6 - e^{iθ1} - e^{-iθ1} - e^{iθ2} - e^{-iθ2} - e^{iθ3} - e^{-iθ3}) - κ².

L̃_h(θ1, θ2, θ3) is named the representation of L_h in the Fourier basis. Similarly to Section 3.3.1 (RFA), we assume that the coarse grid is obtained by standard geometric coarsening. This assumption implies that, on the infinite coarse grid G_2h, for each (θ1, θ2, θ3) ∈ [-π/2, π/2)³, there are seven other values (θ1^{(j1)}, θ2^{(j2)}, θ3^{(j3)}) ∈ [-π, π)³, with (j1, j2, j3) ∈ {0, 1}³ \ {(0, 0, 0)}, such that

    φ_2h^{θ1,θ2,θ3}(x, y, z) = φ_2h^{θ1^{(j1)},θ2^{(j2)},θ3^{(j3)}}(x, y, z), for (x, y, z) ∈ G_2h and (j1, j2, j3) ∈ {0, 1}³ \ {(0, 0, 0)},

where, for i = 1, 2, 3, θ_i^{(0)} and θ_i^{(1)} are defined as:

    θ_i^{(0)} := θ_i,
    θ_i^{(1)} := θ_i + π if θ_i < 0, θ_i - π if θ_i ≥ 0.

Thus, only the frequency components φ_2h^{θ1,θ2,θ3} for (θ1, θ2, θ3) ∈ [-π/2, π/2)³ are visible on G_2h. This leads us to define low and high frequency components of φ_h^{θ1,θ2,θ3}.

Definition 3. Low and high frequency components of φ_h^{θ1,θ2,θ3}, (θ1, θ2, θ3) ∈ [-π, π)³:

    φ_h^{θ1,θ2,θ3} is a low frequency component ⇔ (θ1, θ2, θ3) ∈ Θ_low := [-π/2, π/2)³;
    φ_h^{θ1,θ2,θ3} is a high frequency component ⇔ (θ1, θ2, θ3) ∈ Θ_high := [-π, π)³ \ [-π/2, π/2)³.

The coarse level thus only deals with low frequencies; high frequencies are only handled on the fine level. In a two-grid algorithm, this means that high frequency components of the error will be managed by smoothing, whereas low frequencies are managed by the coarse grid correction operator. In the next section, we focus on the computation of the smoothing factor.

Smoothing analysis

As said, in the LFA framework it is possible to analyze the smoothing behavior of Gauss-Seidel with lexicographic ordering (Gauss-Seidel-lex). This was not possible in the RFA framework. This is of great interest since Gauss-Seidel-lex is a classical relaxation method that will be used hereafter. In order to use LFA to analyze the properties of a given smoother, we have to assume that the relaxation method satisfies the following splitting:

    L_h^+ u_h^{m+1} + L_h^- u_h^m = b_h, with L_h^+ + L_h^- = L_h,    (3.9)

where u_h^m denotes the previous approximation of u_h (before the smoothing step) and u_h^{m+1} the new approximation of u_h (after the smoothing step).

Remark 7. Considering the same notations as in Example 1, L_h = D_h - E_h - F_h, the expressions of L_h^+ and L_h^- for Jacobi, forward Gauss-Seidel-lex and backward Gauss-Seidel-lex are:

Jacobi: (D_h/ω_r) u_h^{m+1} + (-D_h/ω_r + L_h) u_h^m = b_h, so L_h^+ = D_h/ω_r and L_h^- = -D_h/ω_r + L_h.

Forward Gauss-Seidel-lex: (D_h - E_h) u_h^{m+1} - F_h u_h^m = b_h, so L_h^+ = D_h - E_h and L_h^- = -F_h.

Backward Gauss-Seidel-lex: (D_h - F_h) u_h^{m+1} - E_h u_h^m = b_h, so L_h^+ = D_h - F_h and L_h^- = -E_h.

We now define the errors e_h^{m+1} = u_h - u_h^{m+1} and e_h^m = u_h - u_h^m at iterations (m+1) and m respectively, denoting by u_h the discrete solution verifying L_h u_h = b_h. It follows that:

    L_h^+ e_h^{m+1} + L_h^- e_h^m = 0.

This means that e_h^{m+1} = S_h e_h^m, where S_h denotes the smoothing operator (see [115, Lemma 4.3.1]). Thus, since any linear operator with a constant stencil can be written in the Fourier basis { φ_h^{θ1,θ2,θ3}, (θ1, θ2, θ3) ∈ [-π, π)³ }, we use the Fourier representations of L_h^+ and L_h^- to obtain the Fourier representation of the smoothing operator S_h. Indeed, the Fourier representations L̃_h^+(θ1, θ2, θ3) and L̃_h^-(θ1, θ2, θ3) can be easily deduced by applying L_h^+ and L_h^- to the basis functions φ_h^{θ1,θ2,θ3}(x, y, z). Therefore, if a smoother can be expressed as in Equation (3.9), the Fourier representation of the smoothing operator is (assuming that L̃_h^+(θ1, θ2, θ3) ≠ 0 for all (θ1, θ2, θ3) ∈ [-π, π)³)

    S̃_h(θ1, θ2, θ3) = - L̃_h^-(θ1, θ2, θ3) / L̃_h^+(θ1, θ2, θ3).

We now give the representation of a Jacobi iteration in the Fourier basis.

Example 2. For the Jacobi iteration (Jac(ω_r)), we have

    L_h^+ φ_h^{θ1,θ2,θ3}(x, y, z) = (6 - (κh)²)/(h² ω_r) φ_h^{θ1,θ2,θ3}(x, y, z),
    L_h^- φ_h^{θ1,θ2,θ3}(x, y, z) = (1/h²)( (6 - (κh)²)(ω_r - 1)/ω_r - e^{iθ1} - e^{-iθ1} - e^{iθ2} - e^{-iθ2} - e^{iθ3} - e^{-iθ3} ) φ_h^{θ1,θ2,θ3}(x, y, z).

Therefore, the Fourier representations L̃_h^+, L̃_h^- and S̃_h are

    L̃_h^+(θ1, θ2, θ3) = (6 - (κh)²)/(h² ω_r),
    L̃_h^-(θ1, θ2, θ3) = (1/h²)( (6 - (κh)²)(ω_r - 1)/ω_r - e^{iθ1} - e^{-iθ1} - e^{iθ2} - e^{-iθ2} - e^{iθ3} - e^{-iθ3} ),
    S̃_h^{(Jac(ω_r))}(θ1, θ2, θ3) = 1 - ω_r (6 - (κh)² - e^{iθ1} - e^{-iθ1} - e^{iθ2} - e^{-iθ2} - e^{iθ3} - e^{-iθ3}) / (6 - (κh)²).

We now give the representation of a forward and a backward Gauss-Seidel-lex iteration in the Fourier basis.

Example 3. For the forward Gauss-Seidel-lex iteration (GS-forw),

    L_h^+ φ_h^{θ1,θ2,θ3}(x, y, z) = ( (1/h²)(6 - e^{-iθ1} - e^{-iθ2} - e^{-iθ3}) - κ² ) φ_h^{θ1,θ2,θ3}(x, y, z),
    L_h^- φ_h^{θ1,θ2,θ3}(x, y, z) = -(1/h²)(e^{iθ1} + e^{iθ2} + e^{iθ3}) φ_h^{θ1,θ2,θ3}(x, y, z).

Therefore, the Fourier representations L̃_h^+, L̃_h^- and S̃_h are

    L̃_h^+(θ1, θ2, θ3) = (1/h²)(6 - e^{-iθ1} - e^{-iθ2} - e^{-iθ3}) - κ²,
    L̃_h^-(θ1, θ2, θ3) = -(1/h²)(e^{iθ1} + e^{iθ2} + e^{iθ3}),
    S̃_h^{(GS-forw)}(θ1, θ2, θ3) = (e^{iθ1} + e^{iθ2} + e^{iθ3}) / (6 - (κh)² - e^{-iθ1} - e^{-iθ2} - e^{-iθ3}).

Remark 8. We then deduce the Fourier representation of a backward Gauss-Seidel-lex iteration (GS-back):

$\hat{L}_h^+(\theta_1, \theta_2, \theta_3) = \frac{1}{h^2}\left(6 - e^{i\theta_1} - e^{i\theta_2} - e^{i\theta_3}\right) - \kappa^2, \qquad \hat{L}_h^-(\theta_1, \theta_2, \theta_3) = -\frac{1}{h^2}\left(e^{-i\theta_1} + e^{-i\theta_2} + e^{-i\theta_3}\right),$

$\hat{S}_h^{(GS-back)}(\theta_1, \theta_2, \theta_3) = \frac{e^{-i\theta_1} + e^{-i\theta_2} + e^{-i\theta_3}}{6 - (\kappa h)^2 - e^{i\theta_1} - e^{i\theta_2} - e^{i\theta_3}}.$

A symmetric Gauss-Seidel-lex (GS-sym) iteration consists of one iteration of forward Gauss-Seidel-lex followed by one iteration of backward Gauss-Seidel-lex:

$S_h^{(GS-sym)} = S_h^{(GS-back)} S_h^{(GS-forw)}.$

Its Fourier representation can be deduced as:

$\hat{S}_h^{(GS-sym)}(\theta_1, \theta_2, \theta_3) = \hat{S}_h^{(GS-back)}(\theta_1, \theta_2, \theta_3)\, \hat{S}_h^{(GS-forw)}(\theta_1, \theta_2, \theta_3).$

This Fourier representation enables us to define the smoothing factor $\mu_{loc}(S_h)$. It is the supremum of the absolute value of the smoother components in the Fourier basis, $\hat{S}_h(\theta_1, \theta_2, \theta_3)$, for $(\theta_1, \theta_2, \theta_3)$ in the space of high frequencies $\Theta_{high}$ (see Definition 3 and [115, Figure 4.1]).

Definition 4. Smoothing factor $\mu_{loc}(S_h)$:

$\mu_{loc}(S_h) = \sup_{(\theta_1,\theta_2,\theta_3) \in \Theta_{high}} \left|\hat{S}_h(\theta_1, \theta_2, \theta_3)\right|.$

Thus, $\mu_{loc}(S_h)$ can be obtained by solving a maximization problem in $(\theta_1, \theta_2, \theta_3)$.

Remark 9. Considering Definition 4, when the original Helmholtz operator is considered, $\kappa = k \in \mathbb{R}$, the smoothing factor of a symmetric Gauss-Seidel-lex (GS-sym) iteration is equal to the smoothing factor of two iterations of Gauss-Seidel-lex (see Remark 8). Indeed, we have

$\left|\hat{S}_h^{(GS-sym)}(\theta_1, \theta_2, \theta_3)\right| = \left|\hat{S}_h^{(GS-forw)}(\theta_1, \theta_2, \theta_3)\right|^2.$

However, the smoothing behaviors of Gauss-Seidel-lex and symmetric Gauss-Seidel-lex can differ on the original Helmholtz operator with PML, since that operator is non-symmetric. This will be noticed in Section 3.4.2.

In Tables 3.1 and 3.2 we present the smoothing factors $\mu_{loc}$ of the Jacobi smoother $S_h^{(Jac(\omega_r))}$ for the shifted and the original 3D Helmholtz operators respectively, considering wavenumbers $k$ that satisfy the stability condition $kh = \pi/6$ on the fine level (see Relation A.4 in Appendix A). The smoothing factors $\mu_{loc}((S_h^{(Jac(\omega_r))})^\nu)$ are given on four grids of the multigrid hierarchy and for two numbers of iterations, $\nu = 1$ and $\nu = 2$. The shift parameter $\beta$ (with $\kappa^2 = (1 - \beta i)k^2$) and the relaxation parameter $\omega_r$ are chosen such that the smoothing factor is smaller than one on the third grid $(1/4h)^3$ with $\beta$ as small as possible. Extensive computations of the smoothing factor of the shifted operator led us to the following combination of values:

$\omega_r = 0.3, \qquad 1 - \beta i = 1 - 0.6i.$

Table 3.1: Smoothing factors $\mu_{loc}((S_h^{(Jac(\omega_r))})^\nu)$ of the Jacobi smoother $S_h^{(Jac(\omega_r))}$, $\omega_r = 0.3$, for two values of $\nu$ and four grid sizes ($(1/h)^3$ down to $(1/8h)^3$), considering the shifted 3D Helmholtz operator ($\beta = 0.6$) for a wavenumber $k$ such that $kh = \pi/6$.
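Smoothing factors of this kind can be reproduced, up to sampling accuracy, by a direct numerical maximization of Definition 4. The sketch below (Python/numpy; our own illustration, with a coarse uniform sampling of $\Theta_{high}$ in place of a true optimizer) evaluates $\mu_{loc}$ and $\mu_{loc}^2$ (i.e. $\nu = 1, 2$) for the Jacobi smoother with the parameters of Table 3.1 on the four grids of the hierarchy.

```python
import numpy as np

def mu_loc(symbol, n=64):
    """Estimate Definition 4 by sampling Theta_high on a uniform grid."""
    t = np.linspace(-np.pi, np.pi, n, endpoint=False)
    t1, t2, t3 = np.meshgrid(t, t, t, indexing="ij")
    high = np.maximum(np.maximum(abs(t1), abs(t2)), abs(t3)) >= np.pi / 2
    return np.abs(symbol(t1, t2, t3)[high]).max()

def jacobi(kh2, omega_r):
    d = 6.0 - kh2
    return lambda t1, t2, t3: 1 - omega_r * (d - 2*(np.cos(t1) + np.cos(t2) + np.cos(t3))) / d

kh = np.pi / 6                                 # fine-level stability condition
for level in range(4):                         # grids h, 2h, 4h, 8h: kh doubles per level
    kh2 = (1 - 0.6j) * (kh * 2**level)**2      # shifted operator, beta = 0.6
    mu = mu_loc(jacobi(kh2, omega_r=0.3))
    print(level, mu, mu**2)                    # nu = 1 and nu = 2
```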

For the original Helmholtz operator ($\beta = 0$), the smoothing factor on the third grid is larger than one for any value of $\omega_r$, as was already observed in the two-dimensional case [42]. We therefore choose the relaxation parameter such that the smoothing factor on the fine level is as small as possible; we found $\omega_r = 0.8$ numerically.

Table 3.2: Smoothing factors $\mu_{loc}((S_h^{(Jac(\omega_r))})^\nu)$ of the Jacobi smoother $S_h^{(Jac(\omega_r))}$, $\omega_r = 0.8$, for two values of $\nu$ and four grid sizes, considering the original 3D Helmholtz operator ($\beta = 0$) for a wavenumber $k$ such that $kh = \pi/6$. Smoothing factors larger than one are indicated in brackets; on the $(1/4h)^3$ grid they range from (2.72) to (2.77) for $\nu = 1$ and from (7.39) to (7.70) for $\nu = 2$.

We remark in Tables 3.1 and 3.2 that the smoothing factors are similar on a given level of the hierarchy when the ratio between the wavenumber $k$ and the mesh grid size $h$ is kept constant. Thus, as in the two-dimensional case [42], the Jacobi method is found efficient to smooth high frequencies for the shifted Helmholtz operator on each grid of the multigrid hierarchy, whereas it is not possible to obtain a smoothing factor smaller than one on the third grid for the original Helmholtz operator. Nevertheless, it has to be noticed that the smoothing factors in Table 3.1 are obtained for a larger shift parameter ($\beta = 0.6$) than in the two-dimensional case ($\beta = 0.5$). Furthermore, these smoothing factors are higher than in the two-dimensional case (0.81 in two dimensions versus 0.87 in three dimensions for two Jacobi iterations). Consequently, we deduce that a multigrid cycle on the three-dimensional shifted Helmholtz operator with a Jacobi smoother cannot precondition the original Helmholtz operator as efficiently as in the two-dimensional case [42, 43]. In [40, 96], the authors advise the use of a plane smoother [88] in combination with semicoarsening [124] to address this issue. In the next tables, we show that improved smoothing factors can be obtained for three-dimensional Helmholtz problems, at least on the two finest grids.

In Tables 3.3 and 3.4, we present the smoothing factors $\mu_{loc}$ of the Gauss-Seidel-lex smoother $S_h^{(GS-forw)}$ for the shifted and the original 3D Helmholtz operators respectively, considering wavenumbers $k$ such that $kh = \pi/6$ (see Relation A.4 in Appendix A). The smoothing factors $\mu_{loc}((S_h^{(GS-forw)})^\nu)$ are given on four grids of the multigrid hierarchy and for two numbers of iterations, $\nu = 1$ and $\nu = 2$, as previously. For this smoother, no shift parameter enables a smoothing factor smaller than one on the third grid $(1/4h)^3$; the shift parameter $\beta$ is then once again taken equal to 0.6.

Table 3.3: Smoothing factors $\mu_{loc}((S_h^{(GS-forw)})^\nu)$ of the Gauss-Seidel-lex smoother $S_h^{(GS-forw)}$ for two values of $\nu$ and four grid sizes, considering the shifted 3D Helmholtz operator ($\beta = 0.6$) for a wavenumber $k$ such that $kh = \pi/6$. Smoothing factors larger than one are indicated in brackets; on the $(1/4h)^3$ grid they range from (5.84) to (8.91) for $\nu = 1$ and from (34.12) to (79.47) for $\nu = 2$.

Table 3.4: Smoothing factors $\mu_{loc}((S_h^{(GS-forw)})^\nu)$ of the Gauss-Seidel-lex smoother $S_h^{(GS-forw)}$ for two values of $\nu$ and four grid sizes, considering the original 3D Helmholtz operator ($\beta = 0$) for a wavenumber $k$ such that $kh = \pi/6$. Smoothing factors larger than one are indicated in brackets; on the $(1/4h)^3$ grid they reach (23) to (794) for $\nu = 1$ and (525) to (630998) for $\nu = 2$.

Tables 3.3 and 3.4 point out that the lexicographic Gauss-Seidel method is more efficient than the Jacobi method at smoothing the high frequency components on the finer grids for both 3D Helmholtz operators (both $\beta = 0$ and $\beta = 0.6$). However, after extensive experiments we conclude that this smoother cannot succeed in smoothing on the third grid for any shift parameter. Therefore, when a shifted Helmholtz operator is considered, a three-dimensional geometric multigrid method could be improved by using Gauss-Seidel on the finer grids ($(1/h)^3$, $(1/2h)^3$) and Jacobi on the coarser ones. Nevertheless, obtaining an efficient coarse Jacobi smoother requires a large shift ($\beta = 0.6$), which can imply a loss of efficiency of a multigrid iteration. Furthermore, as in the two-dimensional case, the relaxation parameter or the shift parameter has to be changed for heterogeneous Helmholtz problems in order to obtain an efficient multi-level preconditioner. Finally, we note that the boundary conditions (PML) can also influence the determination of the shift parameter. A numerical illustration is given in the next section, where we analyze spectra and histories of convergence for the one-dimensional Helmholtz operator with absorbing boundary conditions preconditioned by a two-level method. These absorbing boundary conditions are formulated with a Perfectly Matched Layer (PML) [11].

One-dimensional Helmholtz operator with PML

Regarding the formulation and discretization of the Helmholtz operator, we refer to Section A.2 in Appendix A. The use of a PML formulation implies variable coefficients in the Helmholtz operator $A_h(x, y, z)$, $(x, y, z) \in \Omega$. A "frozen" analysis [61, 104] could then be performed to deduce an upper bound of the convergence factor in the Fourier analysis framework. Indeed, since the operator $A_h(x, y, z)$ has variable coefficients, the coefficients of its representation in the Fourier basis $\hat{A}_h(x, y, z)$ depend on the coordinates $(x, y, z) \in \Omega$. The "frozen" analysis consists in finding an upper bound of the spectral radius of a multigrid iteration matrix $M_h(x, y, z)$ for $A_h(x, y, z)$ as

$\rho(M_h(x, y, z)) \leq \max_{(x,y,z) \in \Omega} \rho(\hat{M}_h(x, y, z)).$

Yet it cannot be used to deduce the spectrum of the preconditioned operator, since it computes spectra of $A_h(x, y, z)$ for each $(x, y, z) \in \Omega$ separately. Besides, it is not possible to find the analytic expression of the eigenfunctions of the Helmholtz operator with PML. The only way to analyze the spectrum of the preconditioned operator $A_h(0)\, M_h^{-1}(\beta)$ is its explicit computation, where $A_h(0)$ denotes the original Helmholtz operator with PML and $M_h^{-1}(\beta)$ the operator representing the action of a two-level preconditioner applied to the shifted Helmholtz operator with PML, $A_h(\beta)$. Obviously, the computation of this spectrum is not affordable in three dimensions. Notwithstanding, since the three-dimensional spectra computed with an RFA (Section 3.3.1) were very similar to the spectra obtained in two dimensions [40, 42] (see Figure 3.6), it is expected that computing the spectrum of $A_h(0)\, M_h^{-1}(\beta)$ in one dimension can provide us some information [41].
Thus, considering two Gauss-Seidel iterations as pre- and post-relaxations ($\nu_1 = \nu_2 = 2$), we compute the spectrum of the preconditioned operator $A_h(0)\, M_h^{-1}(\beta)$ in one dimension and report the history of convergence of GMRES(5) using the two-grid preconditioner for different shift parameters. Considering a one-dimensional Helmholtz operator with PML ($1/h = 1024$, $kh = \pi/6$, $n_{PML} = 16$), Figure 3.7 shows histories of convergence of GMRES(5) using the two-grid preconditioner for different values of $\beta$, and Figure 3.8 shows the corresponding spectra of the preconditioned operator $A_h(0)\, M_h^{-1}(\beta)$ ($1/h = 1024$, $kh = \pi/6$).

In Figure 3.7, when a complex shift is used ($\beta \neq 0$), GMRES(5) stagnates, whereas when the preconditioner is built from the original Helmholtz operator ($\beta = 0$) the iterative procedure converges. This behavior is related to the eigenvalue distribution. Indeed, in Figure 3.8, the spectra have a similar shape for non-zero values of the shift parameter: the eigenvalues lie on an ellipse with few outliers, and several eigenvalues on the ellipse close to zero have a negative real part. These spectra are thus not favorable to the convergence of GMRES. When no shift is used ($\beta = 0$), the spectrum is clustered around one with few isolated eigenvalues in a half-plane of the complex plane. Therefore, when considering the Helmholtz equation with PML, the shift parameter cannot be used in the same way as with other boundary conditions (Dirichlet, Robin of first or second order type [40]), since convergence cannot be achieved. However, if the shift parameter is set to a negative value, GMRES preconditioned by a two-grid method applied to a shifted Helmholtz operator converges (see Figure 3.9); indeed, GMRES(5) converges for each such value of $\beta$. The spectra for $\beta \leq 0$ do not exhibit isolated eigenvalues and are enclosed in the unit circle centered at one (see Figure 3.10). This is due to the formulation of the shifted Helmholtz equation in the PML. Indeed, its one-dimensional formulation is (see Equation A.1):

$-\frac{1}{1 + i\gamma_x(x)} \frac{\partial}{\partial x}\left(\frac{1}{1 + i\gamma_x(x)} \frac{\partial u}{\partial x}(x)\right) - (1 - i\beta) k^2 u(x) = s, \quad \text{for } x \in (0, 1),$

where $\gamma_x$ denotes the one-dimensional PML function (see Equation A.2); it is zero outside the PML layer. If $\gamma_x$ is set to a fixed value in the PML, say $\gamma_x = 1$, we obtain the following operator in the PML:

$\frac{1}{2i}\left(-u''(x) - (2\beta + 2i) k^2 u(x)\right).$

Thus, if $\beta \geq 0.5$, this operator is indefinite in the PML layer at high wavenumbers and the preconditioner is expected to lose its efficiency. A negative shift handles this difficulty. Moreover, it can be observed that without shift the operator in the PML layer is a Laplace-type operator, which can be beneficial to the convergence of GMRES.

Remark 10. The results of the smoothing analysis in Section 3.3.2 also hold for the opposite shift parameter ($-\beta$). Indeed, we first recall the expression of the smoothing factor of a forward lexicographic Gauss-Seidel iteration (Example 3):

$\mu_{loc}(S_h^{(GS-forw)}) = \sup_{(\theta_1,\theta_2,\theta_3) \in \Theta_{high}} \left|\frac{e^{i\theta_1} + e^{i\theta_2} + e^{i\theta_3}}{6 - (1 - i\beta)(kh)^2 - e^{-i\theta_1} - e^{-i\theta_2} - e^{-i\theta_3}}\right|.$

Since $|z| = |\bar{z}|$ for all $z \in \mathbb{C}$, we have

$\mu_{loc}(S_h^{(GS-forw)}) = \sup_{(\theta_1,\theta_2,\theta_3) \in \Theta_{high}} \left|\frac{e^{-i\theta_1} + e^{-i\theta_2} + e^{-i\theta_3}}{6 - (1 + i\beta)(kh)^2 - e^{i\theta_1} - e^{i\theta_2} - e^{i\theta_3}}\right|.$

Since the set $\Theta_{high}$ is symmetric with respect to the origin (Definition 3), we can change $(\theta_1, \theta_2, \theta_3)$ into $(-\theta_1, -\theta_2, -\theta_3)$ in the supremum. It follows that

$\mu_{loc}(S_h^{(GS-forw)}) = \sup_{(\theta_1,\theta_2,\theta_3) \in \Theta_{high}} \left|\frac{e^{i\theta_1} + e^{i\theta_2} + e^{i\theta_3}}{6 - (1 + i\beta)(kh)^2 - e^{-i\theta_1} - e^{-i\theta_2} - e^{-i\theta_3}}\right|.$

Thus, the smoothing factor of forward lexicographic Gauss-Seidel is the same for a positive shift parameter and for its opposite. The same proof holds for a Jacobi iteration. This has to be kept in mind when choosing the right shift parameter in three dimensions.
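Remark 10 is easy to verify numerically. The following sketch (Python/numpy; an illustrative check on a uniform sample of $\Theta_{high}$, not part of the thesis code) evaluates the Gauss-Seidel-lex smoothing factor for $\beta$ and $-\beta$ and returns matching values up to floating-point error.

```python
import numpy as np

def mu_gs(beta, kh, n=48):
    """Sampled smoothing factor of GS-forw on Theta_high for shift beta."""
    t = np.linspace(-np.pi, np.pi, n, endpoint=False)
    t1, t2, t3 = np.meshgrid(t, t, t, indexing="ij")
    high = np.maximum(np.maximum(abs(t1), abs(t2)), abs(t3)) >= np.pi / 2
    s = (np.exp(1j*t1) + np.exp(1j*t2) + np.exp(1j*t3)) / \
        (6 - (1 - 1j*beta)*kh**2 - np.exp(-1j*t1) - np.exp(-1j*t2) - np.exp(-1j*t3))
    return np.abs(s[high]).max()

kh = np.pi / 6
print(mu_gs(0.6, kh), mu_gs(-0.6, kh))  # identical values, as Remark 10 predicts
```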

Figure 3.7: History of convergence of GMRES(5) preconditioned by a two-grid cycle using two Gauss-Seidel iterations as pre- and post-relaxations ($\nu_1 = \nu_2 = 2$) to solve a one-dimensional Helmholtz problem with PML ($1/h = 1024$, $kh = \pi/6$) for four values of $\beta$ (0, 0.5, 0.6, 0.7). Convergence is achieved only in the case $\beta = 0$ here.

Figure 3.8: Spectra of $A_h(0)\, M_h^{-1}(\beta)$ ($1/h = 1024$, $kh = \pi/6$) using two Gauss-Seidel iterations as pre- and post-relaxations ($\nu_1 = \nu_2 = 2$) for four values of $\beta$; from left to right and from top to bottom: $\beta = 0.5$, $\beta = 0.6$, $\beta = 0.7$ and $\beta = 0$. The unit circle centered at one (in blue) is used to scale the spectra.

Figure 3.9: History of convergence of GMRES(5) preconditioned by a two-grid cycle using two Gauss-Seidel iterations as pre- and post-relaxations ($\nu_1 = \nu_2 = 2$) to solve a one-dimensional Helmholtz problem with PML ($1/h = 1024$, $kh = \pi/6$) for four values of $\beta$ ($-0.7$, $-0.6$, $-0.5$, $0$).

Figure 3.10: Spectra of $A_h(0)\, M_h^{-1}(\beta)$ ($1/h = 1024$, $kh = \pi/6$) using two Gauss-Seidel iterations as pre- and post-relaxations ($\nu_1 = \nu_2 = 2$) for four values of $\beta$; from left to right and from top to bottom: $\beta = -0.5$, $\beta = -0.6$, $\beta = -0.7$ and $\beta = 0$. The unit circle centered at one (in blue) is used to scale the spectra.

Therefore, the influence of many parameters makes the choice of the shift $\beta$ delicate. Indeed, the formulation of the Helmholtz problem, the smoother properties on each grid and the efficiency of the multi-level preconditioner all have to be taken into account to choose the right shift parameter.

These dependencies on various parameters have led us not to consider a shifted Helmholtz operator. Considering Tables 3.2 and 3.4, we advocate the use of only two grids in the multigrid hierarchy with Gauss-Seidel type smoothers. In two dimensions, a two-level preconditioner acting on the original Helmholtz operator has proved efficient [31]. However, its efficiency relies on the use of a direct solver (MUMPS [2, 3]) on the coarse level. In three dimensions, even on parallel distributed memory computers, the use of a direct method on the coarse level is prohibitive in terms of computational resources. Indeed, at the beginning of this thesis, the largest three-dimensional case that we could solve in core with MUMPS [4] used 80 cores of an IBM JS21 machine (two GigaBytes per core). Even though this size is already large, it is still too small to solve the three-dimensional Helmholtz equation at large wavenumbers using a direct method on the coarse level of a two-level preconditioner. Thus, a three-dimensional two-level preconditioner necessarily implies the use of an iterative method on the coarse level. We call this scheme a perturbed two-level method. Consequently, a stopping criterion has to be chosen for the coarse iterative solver. In the next section, we show that a perturbed two-level method is an efficient preconditioner even when a large tolerance on the coarse linear system is chosen.

3.4 A perturbed two-level preconditioner

We focus on the design of a two-level preconditioner for Helmholtz problems with absorbing boundary conditions of PML type [12] at high wavenumbers. The formulation and discretization of this problem are discussed in detail in Appendix A. We first considered a three-level preconditioner. Numerical tests confirmed the results of the smoothing analysis of Section 3.3.2: three levels with geometric coarsening are found inefficient for three-dimensional problems (see Figure 3.11). We therefore consider a perturbed two-level cycle as a preconditioner, where an iterative method is used on the coarse level (Algorithm 15). This implies that the dominant component of the two-level method, in terms of computational work, is the solution method for the coarse problem. Nevertheless, a convergence criterion must be chosen to stop the iterative method. We therefore have to select a convergence threshold for the solution of the coarse level problem that minimizes the computational cost of the coarse solution phase without damaging the preconditioning properties of the two-grid cycle. In Section 3.4.1, we show that, when using a nonlinear coarse solver (for instance preconditioned GMRES), a large coarse tolerance can ensure a convergence factor close to the one obtained when the coarse problem is solved exactly. This is proved thanks to a Rigorous Fourier Analysis and corresponds to the main new result of this chapter. Notwithstanding, even if the coarse problems are solved within a large tolerance, the coarse solution remains the most expensive part of the perturbed two-grid method. It is then of great interest to select the other components of the two-level cycle so as to reduce the number of required iterations. A way to improve a two-grid cycle is to improve the smoothers; this often requires performing a few more smoothing iterations to really improve the cycle. In Section 3.4.2, we select the smoother according to numerical experiments and show its smoothing effect on the three-dimensional Helmholtz operator with PML. This selection is performed with numerical experiments because a traditional Local Fourier Analysis can only provide the smoothing factors reported in Table 3.4.
Indeed, boundary conditions are not taken into account in LFA, and non-standard smoothers, for instance Krylov methods, cannot be analyzed in this framework. Besides, we use the two-grid method as a preconditioner and not as a solver, which does not allow a Fourier analysis without including some random parameters [125]. The prolongation and restriction could also be selected to improve the two-level operator, choosing them depending on the matrix [127] or of higher order (cubic, quadratic) [61, Section 3.4.3]. Yet choosing matrix-dependent transfer operators implies a higher cost in memory and operations than trilinear interpolation and full-weighting restriction. Furthermore, the implementation of high-order transfer operators in a parallel environment is not straightforward: it requires neighboring points at a sometimes large distance. Thus, we only consider full-weighting restriction and its adjoint as an interpolation in this work; a minimal sketch of this restriction is given below.
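As a concrete illustration of the chosen transfer operator, the following sketch (Python/numpy; our own illustration for a vertex-centered grid with zero boundary values, not the thesis Fortran implementation) applies the 27-point full-weighting stencil, the tensor product of the one-dimensional weights (1/4, 1/2, 1/4), to a fine-grid residual. The prolongation used in this work is its adjoint, i.e. trilinear interpolation.

```python
import numpy as np

w = np.array([0.25, 0.5, 0.25])
W = w[:, None, None] * w[None, :, None] * w[None, None, :]  # 27-point stencil

def restrict_full_weighting(r):
    """Full-weighting restriction of a fine residual r on an (n+1)^3
    vertex-centered grid (n even, boundary values zero) to the (n/2+1)^3 grid."""
    n = r.shape[0] - 1
    rc = np.zeros((n // 2 + 1,) * 3, dtype=r.dtype)
    for i in range(1, n // 2):
        for j in range(1, n // 2):
            for k in range(1, n // 2):
                rc[i, j, k] = np.sum(W * r[2*i-1:2*i+2, 2*j-1:2*j+2, 2*k-1:2*k+2])
    return rc
```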

Figure 3.11: Histories of convergence of FGMRES(5) preconditioned by a three-grid V-cycle with two iterations of lexicographic forward Gauss-Seidel as pre- and post-smoother ($\nu_1 = \nu_2 = 2$), for wavenumbers $k$ such that $kh = \pi/6$.

3.4.1 Approximation of the convergence factor of a perturbed two-grid method

In this section, we first consider a two-grid cycle used as a solver on the three-dimensional Helmholtz operator with Dirichlet boundary conditions at small wavenumbers. The discretization of the Helmholtz operator is still handled with a second order finite difference scheme for a vertex-centered grid arrangement. We choose both these boundary conditions and these wavenumbers so as to be able to use the elements of RFA theory introduced in Section 3.3.1 and to obtain a general idea of the influence of the accuracy required for the coarse solution. Throughout this study, we consider the two-grid cycle described in Algorithm 15 (its main components have been chosen as in Algorithm 13). Algorithm 15 describes a classical two-grid cycle when $\varepsilon_{2h} = 0$ (i.e. when the coarse problem is solved exactly).

Algorithm 15 Perturbed two-grid cycle to solve $L_h u_h = b_h$
1: Presmoothing: $u_h := S(L_h, u_h, b_h, \nu_1)$
2: Compute the residual: $r_h = b_h - L_h u_h$
3: Restrict the residual: $b_{2h} = I_h^{2h} r_h$
4: Set $u_{2h} := 0$
5: Solve approximately $L_{2h} u_{2h} = b_{2h}$ on $\Omega_{2h}$ such that $\|b_{2h} - L_{2h} u_{2h}\|_2 / \|b_{2h}\|_2 \leq \varepsilon_{2h}$
6: Interpolate the coarse solution $u_{2h}$ to obtain a correction of the fine solution: $I_{2h}^h u_{2h}$
7: Add this correction to the solution: $u_h := u_h + I_{2h}^h u_{2h}$
8: Postsmoothing: $u_h := S(L_h, u_h, b_h, \nu_2)$
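A compact transcription of Algorithm 15 could read as follows (Python with scipy; a minimal sketch that assumes the operators and the smoother are supplied by the caller, not the thesis Fortran code). Note that the coarse tolerance `eps2h` is passed directly to GMRES as its relative residual tolerance, matching line 5 of the algorithm (the keyword is named `tol` in older SciPy releases).

```python
import numpy as np
from scipy.sparse.linalg import gmres

def perturbed_two_grid(Lh, L2h, R, P, smooth, b, u, eps2h, nu1=1, nu2=1):
    """One perturbed two-grid cycle for Lh u = b (Algorithm 15).

    Lh, L2h : fine and coarse operators (sparse matrices or LinearOperators)
    R, P    : restriction and prolongation operators
    smooth  : function (L, u, b, nu) -> u, e.g. nu Gauss-Seidel sweeps
    eps2h   : normalized residual tolerance of the approximate coarse solve
    """
    u = smooth(Lh, u, b, nu1)                        # 1: pre-smoothing
    r = b - Lh @ u                                   # 2: fine residual
    b2 = R @ r                                       # 3: restriction
    u2, _ = gmres(L2h, b2, rtol=eps2h, restart=10)   # 4-5: inexact coarse solve
    u = u + P @ u2                                   # 6-7: coarse-grid correction
    return smooth(Lh, u, b, nu2)                     # 8: post-smoothing
```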

We would like to determine an estimation of the convergence factor of a two-grid cycle with an approximate coarse grid solution, denoted later by $T_h$. In the SPD case, it is known that the coarse tolerance $\varepsilon_{2h}$ is not required to be very tight to obtain a good convergence factor for a two-grid cycle [115, p. 45]. We denote by $P_h$ the perturbed two-grid iteration matrix where the coarse problem is solved inexactly:

$P_h (u - u_h) = S_h^{\nu_2}\left(I - I_{2h}^h C_{2h} I_h^{2h} L_h\right) S_h^{\nu_1} (u - u_h),$

denoting by $C_{2h}$ the iteration matrix of the coarse solution method. According to [87, Relation 3.2] - if a symmetric multigrid scheme is considered [87, Section 1] - an upper bound of the spectral radius of $P_h$ can be found with respect to the spectral radii of both $M_h$ and $C_{2h}$:

$\rho(P_h) \leq 1 - (1 - \rho(M_h))(1 - \rho(C_{2h})), \quad \text{i.e.} \quad \rho(P_h) \leq \rho(M_h) + \rho(C_{2h})(1 - \rho(M_h)).$

Therefore, if for instance $\rho(C_{2h}) = 0.1$ and $\rho(M_h) = 0.5$, then $\rho(P_h)$ is bounded by 0.55, which remains attractive. The estimation of $\rho(P_h)$ can be made even more rigorous by considering its representation in the Fourier basis: if $C_{2h}$ can be written in the Fourier basis, $\rho(P_h)$ can be computed explicitly. However, this analysis does not cover the case of a Krylov method for the coarse problem, because of the nonlinearity of Krylov methods. Indeed, the error can be expressed as a polynomial of the matrix $A$ applied to the initial error $(u - u^{(0)})$:

$u - u^{(m)} = \sum_{k=0}^{m-1} \alpha_k A^k (u - u^{(0)}),$

where the coefficients $\alpha_k$ of the minimization polynomial depend nonlinearly on the operator $A$ and on the projected residual. Despite this nonlinearity, we propose a simplified analysis to obtain an estimation of the convergence factor of a perturbed two-level cycle depending on the coarse tolerance $\varepsilon_{2h}$. This approach consists in injecting into the Fourier representation of the coarse grid operator (see Theorem 1) a perturbation term corresponding to the approximate coarse problem solution, for each $(l_1, l_2, l_3)$ with $l_1, l_2, l_3 < n/2$. First, we consider the following obvious statement. Solving a linear system $Ax = b$ with an iterative method such that $\|b - A\tilde{x}\|_2 / \|b\|_2 \leq \varepsilon$ is equivalent to solving exactly a linear system with a perturbed right-hand side $b + \Delta b$ such that:

$A\tilde{x} = b + \Delta b \quad \text{with} \quad \frac{\|\Delta b\|_2}{\|b\|_2} = \frac{\|b - A\tilde{x}\|_2}{\|b\|_2} \leq \varepsilon. \qquad (3.10)$

In this case, $\Delta b$ is nothing else than the opposite of the residual: $\Delta b = -(b - A\tilde{x})$. Thus, we consider the effect of the inaccuracy of the coarse solution by using a perturbed right-hand side $[b_{2h} + \Delta b_{2h}]$ on the coarse level instead of the coarse right-hand side $b_{2h} = I_h^{2h} L_h S_h^{\nu_1}(u - u_h)$ only. With these notations, the perturbed two-grid operator $T_h$ implemented in Algorithm 15 can be written using Equation (3.2):

$T_h (u - u_h) = S_h^{\nu_2}\left(S_h^{\nu_1}(u - u_h) - I_{2h}^h L_{2h}^{-1} (b_{2h} + \Delta b_{2h})\right). \qquad (3.11)$

The following proposition enables us to block diagonalize this perturbed two-grid operator in the Fourier basis using some reasonable assumptions.

Proposition 9. With the same notations as in Algorithm 15 and Corollary 5, we consider one cycle of the perturbed two-grid operator $T_h$ (Equation (3.11)). This operator has the following representation in the Fourier basis:

$\hat{T}_h = \left[\hat{S}^{\nu_2}(l_1, l_2, l_3)\, \Upsilon_{2h}(l_1, l_2, l_3)\, \hat{S}^{\nu_1}(l_1, l_2, l_3)\right]_{l_1, l_2, l_3 = 1, \ldots, n/2},$

with

$\Upsilon_{2h}(l_1, l_2, l_3) = \begin{cases} I_8 - (1 + \varepsilon_{2h}^{l_1,l_2,l_3})\, [b_i c_j]_{8,8} / \Lambda & \text{if } l_1, l_2, l_3 < n/2 \\ I_4 & \text{if } l_1 = n/2 \text{ or } l_2 = n/2 \text{ or } l_3 = n/2 \\ I_2 & \text{if } l_1 = l_2 = n/2 \text{ or } l_1 = l_3 = n/2 \text{ or } l_2 = l_3 = n/2 \\ I_1 & \text{if } l_1 = l_2 = l_3 = n/2 \end{cases}$

$\hat{b}_{2h} = \hat{I}_h^{2h} \hat{L}_h \hat{S}^{\nu_1}(\hat{u} - \hat{u}_h) = \left[\alpha_{2h}^{l_1,l_2,l_3}\right]_{l_1,l_2,l_3 = 1,\ldots,n/2-1}, \qquad \widehat{\Delta b}_{2h} = \left[\varepsilon_{2h}^{l_1,l_2,l_3}\, \alpha_{2h}^{l_1,l_2,l_3}\right]_{l_1,l_2,l_3 = 1,\ldots,n/2-1},$

with $\varepsilon_{2h}^{l_1,l_2,l_3} \in \mathbb{R}$, $l_1, l_2, l_3 = 1, \ldots, n/2 - 1$.

Proof. As discussed in Section 3.3.1, on the coarse grid space $\Omega_{2h}$, the spaces of harmonics $E_{2h}^{l_1,l_2,l_3}$ reduce to the one-dimensional spaces $\text{span}[\varphi_{2h}^{l_1,l_2,l_3}]$. Therefore, in the coarse Fourier basis ($\varphi_{2h}^{l_1,l_2,l_3}$, $l_1, l_2, l_3 = 1, \ldots, n/2-1$), the components of the coarse right-hand side perturbation can be written as a collinear perturbation of the coarse right-hand side for each $(l_1, l_2, l_3)$, $l_1, l_2, l_3 = 1, \ldots, n/2-1$, i.e.:

$\hat{b}_{2h} = \left[\alpha_{2h}^{l_1,l_2,l_3}\right]_{l_1,l_2,l_3=1,\ldots,n/2-1}, \qquad \widehat{\Delta b}_{2h} = \left[\varepsilon_{2h}^{l_1,l_2,l_3}\, \alpha_{2h}^{l_1,l_2,l_3}\right]_{l_1,l_2,l_3=1,\ldots,n/2-1}.$

We now show how the expression of $\Upsilon_{2h}$ can be deduced from the expression of $T_h$:

$T_h(u - u_h) = S_h^{\nu_2}\left(S_h^{\nu_1}(u - u_h) - I_{2h}^h L_{2h}^{-1} [b_{2h} + \Delta b_{2h}]\right),$

$\hat{T}_h(\hat{u} - \hat{u}_h) = \hat{S}^{\nu_2}\left(\hat{S}^{\nu_1}(\hat{u} - \hat{u}_h) - \hat{I}_{2h}^h \hat{L}_{2h}^{-1}\left[(1 + \varepsilon_{2h}^{l_1,l_2,l_3})\, \alpha_{2h}^{l_1,l_2,l_3}\right]_{l_1,l_2,l_3=1,\ldots,n/2-1}\right).$

Since the coarse right-hand side has the expression $\hat{b}_{2h} = \hat{I}_h^{2h} \hat{L}_h \hat{S}^{\nu_1}(\hat{u} - \hat{u}_h)$ in the Fourier basis, we have:

$\hat{T}_h(\hat{u} - \hat{u}_h) = \hat{S}^{\nu_2}\left(\hat{S}^{\nu_1}(\hat{u} - \hat{u}_h) - \hat{I}_{2h}^h \hat{L}_{2h}^{-1}\, \mathrm{diag}\left(\left[1 + \varepsilon_{2h}^{l_1,l_2,l_3}\right]_{l_1,l_2,l_3=1,\ldots,n/2-1}\right) \hat{I}_h^{2h} \hat{L}_h \hat{S}^{\nu_1}(\hat{u} - \hat{u}_h)\right)$
$= \hat{S}^{\nu_2}\left(I - \hat{I}_{2h}^h \hat{L}_{2h}^{-1}\, \mathrm{diag}\left(\left[1 + \varepsilon_{2h}^{l_1,l_2,l_3}\right]_{l_1,l_2,l_3=1,\ldots,n/2-1}\right) \hat{I}_h^{2h} \hat{L}_h\right) \hat{S}^{\nu_1}(\hat{u} - \hat{u}_h).$

It follows that:

$\hat{T}_h = \begin{cases} \left[\hat{S}^{\nu_2}(l_1,l_2,l_3)\left(I_8 - (1 + \varepsilon_{2h}^{l_1,l_2,l_3})\, \Xi(l_1,l_2,l_3)\right) \hat{S}^{\nu_1}(l_1,l_2,l_3)\right]_{l_1,l_2,l_3=1,\ldots,n/2-1} \\ \hat{S}^{\nu_2+\nu_1}(l_1,l_2,l_3) \quad \text{if } l_1 = n/2 \text{ or } l_2 = n/2 \text{ or } l_3 = n/2, \end{cases}$

where $\Xi(l_1,l_2,l_3) = \hat{I}_{2h}^h(l_1,l_2,l_3)\, \hat{L}_{2h}^{-1}(l_1,l_2,l_3)\, \hat{I}_h^{2h}(l_1,l_2,l_3)\, \hat{L}_h(l_1,l_2,l_3)$ for $l_1, l_2, l_3 = 1, \ldots, n/2-1$. Therefore we have

$\Upsilon_{2h} = \begin{cases} \left[I_8 - (1 + \varepsilon_{2h}^{l_1,l_2,l_3})\, \hat{I}_{2h}^h(l_1,l_2,l_3)\, \hat{L}_{2h}^{-1}(l_1,l_2,l_3)\, \hat{I}_h^{2h}(l_1,l_2,l_3)\, \hat{L}_h(l_1,l_2,l_3)\right]_{l_1,l_2,l_3=1,\ldots,n/2-1} \\ I_4 \quad \text{if } l_1 = n/2 \text{ or } l_2 = n/2 \text{ or } l_3 = n/2 \\ I_2 \quad \text{if } l_1 = l_2 = n/2 \text{ or } l_1 = l_3 = n/2 \text{ or } l_2 = l_3 = n/2 \\ I_1 \quad \text{if } l_1 = l_2 = l_3 = n/2. \end{cases}$

The expression of the coarse grid correction operator $K_{2h}$ in Theorem 1 gives the final explicit expression of $\Upsilon_{2h}$.

We now focus on a specific perturbation $\Delta b_{2h}$ such that $\|\Delta b_{2h}\|_2 \leq \varepsilon_{2h} \|b_{2h}\|_2$, which means that the coarse problem is solved with a normalized error below $\varepsilon_{2h}$ (see Relation (3.10)). This hypothesis on $\Delta b_{2h}$ adds a constraint on its components in the Fourier basis. Using the notations of Proposition 9, the relation $\|\Delta b_{2h}\|_2^2 \leq \varepsilon_{2h}^2 \|b_{2h}\|_2^2$ becomes:

$\sum_{l_1,l_2,l_3=1}^{n/2-1} \left(\varepsilon_{2h}^{l_1,l_2,l_3}\, \alpha_{2h}^{l_1,l_2,l_3}\right)^2 \leq \varepsilon_{2h}^2 \sum_{l_1,l_2,l_3=1}^{n/2-1} \left(\alpha_{2h}^{l_1,l_2,l_3}\right)^2. \qquad (3.12)$

Therefore, to perform a rigorous Fourier analysis with a coarse perturbation satisfying $\|\Delta b_{2h}\|_2 \leq \varepsilon_{2h} \|b_{2h}\|_2$, we need to select its Fourier components such that relation (3.12) is satisfied. In practice, we cannot verify (3.12) on the $\varepsilon_{2h}^{l_1,l_2,l_3}$: it would involve the coarse right-hand side coefficients $\alpha_{2h}^{l_1,l_2,l_3}$. The Fourier analysis would then be possible only if these coefficients were accessed at each iteration of the two-grid cycle; such an analysis would be pointless. We therefore focus on a subset of the set spanned by the hypothesis $\|\Delta b_{2h}\|_2 \leq \varepsilon_{2h} \|b_{2h}\|_2$:

$S_{\varepsilon_{2h}}^{l_1,l_2,l_3} = \left\{\varepsilon_{2h}^{l_1,l_2,l_3} \in \mathbb{R} \;:\; |\varepsilon_{2h}^{l_1,l_2,l_3}| \leq \varepsilon_{2h}\right\}.$

Choosing this subset clearly implies a loss of generality in our study. However, if a relaxation method is used as a preconditioner in the coarse solver, it is reasonable to expect the coarse residual $\Delta b_{2h}$ to be smooth (see Section 3.4.2 for a graphical illustration). Furthermore, this subset is found to be relevant in practice (see Table 3.5) and describes the perturbed two-grid behavior well. Practically, we select a few values (10, say) for $\varepsilon_{2h}^{l_1,l_2,l_3}$ in $[-\varepsilon_{2h}, \varepsilon_{2h}]$ and compute the corresponding spectral radii of $\hat{T}_h(l_1, l_2, l_3, \varepsilon_{2h}^{l_1,l_2,l_3})$ for each triplet $(l_1, l_2, l_3)$ with $l_1, l_2, l_3 = 1, \ldots, n-1$. Finally, we obtain an estimation $\tilde{\rho}(T_h)$ of $\rho(T_h)$ as:

$\tilde{\rho}(T_h) = \max_{l_1,l_2,l_3 = 1,\ldots,n-1}\; \max_{\varepsilon_{2h}^{l_1,l_2,l_3} \in S_{\varepsilon_{2h}}^{l_1,l_2,l_3}} \rho\!\left(\hat{T}_h(l_1, l_2, l_3, \varepsilon_{2h}^{l_1,l_2,l_3})\right). \qquad (3.13)$
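Equation (3.13) amounts to a double maximization that is straightforward to script. In the sketch below (Python/numpy), `two_grid_symbol(l1, l2, l3, eps)` is a hypothetical helper assumed to return the small dense Fourier block of $\hat{T}_h$ from Proposition 9; only the sampling logic of the estimation is shown.

```python
import numpy as np

def rho_estimate(two_grid_symbol, n, eps2h, n_samples=10):
    """Estimate rho(T_h) as in Equation (3.13) by sampling the interval
    [-eps2h, eps2h] for each frequency triplet (l1, l2, l3)."""
    rho = 0.0
    for l1 in range(1, n):
        for l2 in range(1, n):
            for l3 in range(1, n):
                for eps in np.linspace(-eps2h, eps2h, n_samples):
                    T = two_grid_symbol(l1, l2, l3, eps)   # (up to) 8x8 block
                    rho = max(rho, abs(np.linalg.eigvals(T)).max())
    return rho
```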

In Table 3.5, we compare $\tilde{\rho}(T_h)$ with the experimental convergence factor $\rho_{Exp}(T_h)$ of Algorithm 15 with a preconditioned Krylov solver on the coarse level. The experimental convergence factor $\rho_{Exp}(T_h)$ is obtained by computing the ratio between the last two errors in the history of convergence. We perform one Jacobi iteration ($\nu_1 = \nu_2 = 1$) with a relaxation parameter $\omega = 6/7$. We select the largest wavenumbers $k$ on the original Helmholtz operator for which the classical two-grid method still has a good convergence factor, typically 0.5.

Table 3.5: Theoretical estimation $\tilde{\rho}(T_h)$ of the convergence factor and experimental convergence factors $\rho_{Exp}(T_h)$ for several coarse tolerances $\varepsilon_{2h}$ and several grids and wavenumbers.

Both the theoretical and the experimental convergence factors in Table 3.5 confirm that a really large tolerance on the coarse problem can lead to the same convergence factor as in the case of an (almost) exact coarse solution (consider the last two rows of Table 3.5). Indeed, it can first be noticed that when the coarse tolerance decreases, the convergence factor decreases as well, for both the theoretical and the experimental computations. Then, for all grid sizes and wavenumbers considered, these convergence factors ($\tilde{\rho}(T_h)$ and $\rho_{Exp}(T_h)$) are close to 0.5 when $\varepsilon_{2h}$ is less than or equal to 0.2. Table 3.5 also shows that the theoretical convergence factor $\tilde{\rho}(T_h)$ estimates $\rho(T_h)$ quite well. Thus, a large coarse tolerance $\varepsilon_{2h}$ can provide a two-grid cycle as efficient as a two-grid cycle with an exact coarse solution on this model problem; indeed, a really large coarse tolerance (about 0.1) can provide similar convergence factors. However, only small wavenumbers were considered in this simplified analysis. Therefore, since at high wavenumbers the two-grid method does not converge on the original Helmholtz operator, we propose to analyze the spectrum as in Section 3.3.1. We now consider the spectrum of the preconditioned operator $L_h(0)\, M_h^{-1}$ (see Equation 3.7) depending not on $\beta$ but on the coarse tolerance $\varepsilon_{2h}$, i.e. $M_h^{-1}(\varepsilon_{2h})$. Thus, we compute the spectrum of the following operator:

$L_h(0)\, M_h^{-1}(\varepsilon_{2h}) = L_h(0)\, (I - T_h(\varepsilon_{2h}))\, (L_h(0))^{-1}. \qquad (3.14)$

Since a representation of $T_h(\varepsilon_{2h})$ in the Fourier basis (Proposition 9) is available, we plot the spectrum of the operator

$\left[\hat{L}_h(0)(l_1,l_2,l_3)\left(\hat{I}(l_1,l_2,l_3) - \hat{T}_h(l_1,l_2,l_3,\tilde{\varepsilon}_{2h}^{l_1,l_2,l_3})\right)\left(\hat{L}_h(0)\right)^{-1}(l_1,l_2,l_3)\right]_{l_1,l_2,l_3=1,\ldots,n/2-1},$

where $\tilde{\varepsilon}_{2h}^{l_1,l_2,l_3} = \operatorname{argmax}\left\{\rho(\hat{T}_h(l_1,l_2,l_3,\varepsilon_{2h}^{l_1,l_2,l_3})) \;:\; \varepsilon_{2h}^{l_1,l_2,l_3} \in S_{\varepsilon_{2h}}^{l_1,l_2,l_3}\right\}$. Figure 3.12 shows the spectra of $L_h(0)\, M_h^{-1}(\varepsilon_{2h})$ for two values of $\varepsilon_{2h}$ ($\varepsilon_{2h} = 0$ and $\varepsilon_{2h} = 0.1$ respectively).

Figure 3.12: Spectra of $L_h(0)\, M_h^{-1}(\varepsilon_{2h})$ for two values of $\varepsilon_{2h}$ ($\varepsilon_{2h} = 0$, left, and $\varepsilon_{2h} = 0.1$, right), considering Helmholtz problems with Dirichlet boundary conditions on a $64^3$ grid for a wavenumber $k = \pi/(6h)$ and two Jacobi iterations in total, one as pre- and one as post-smoother ($\nu_1 = \nu_2 = 1$), with relaxation parameter $\omega_r = 0.4$.

We note that the left part of Figure 3.12 corresponds to the same spectrum (with a different scale) as the left part of Figure 3.6. The spectra shown in Figure 3.12 are very similar: the eigenvalues are clustered around one, while a few eigenvalues are isolated. Therefore, the perturbed two-level method may be a preconditioner as efficient as the exact two-level method for the original Helmholtz problem. Nevertheless, we shall later investigate whether this result holds with absorbing boundary conditions of PML type. We address this important topic in Section 3.5, where numerical examples show that this property still holds for the original Helmholtz operator with PML at large wavenumbers. Beforehand, we discuss how to choose the smoother in practice.

3.4.2 Smoother selection

We propose to select the smoother thanks to numerical experiments, considering an exact coarse solver. For this selection, we consider a classical two-grid method (Algorithm 13) as a preconditioner of FGMRES(5) (Algorithm 3). The choice of a flexible method is motivated by the possible use of a Krylov method as a smoother, as advised in [37]. This study is done for several grids (and their corresponding wavenumbers) in a parallel setting. We refer to [47] and [48] for the parallel implementation of GMRES and FGMRES respectively. For the implementation of the two-level preconditioner on structured grids, we refer to [115, Section 6]. We select the number of cores so that the local problems have the same size on each grid. We deal only with the same number of iterations for pre- and post-smoothing: $\nu_1 = \nu_2 = \nu$. Since a parallel environment is chosen, we use local relaxation methods. As noticed in Section 3.3.2, the smoothing analysis shows that lexicographic Gauss-Seidel type methods behave well for the Helmholtz problem on the fine level; we therefore focus on lexicographic Gauss-Seidel smoothers. We consider the local lexicographic forward Gauss-Seidel method [115, Remark 6.2.5] instead of the (global) lexicographic forward Gauss-Seidel method. We denote by GS_LEX the local lexicographic Gauss-Seidel, by GS_SYM the local symmetric lexicographic forward Gauss-Seidel, and by GMRES($\nu$)/GS_SYM(1) $\nu$ iterations of GMRES preconditioned by one iteration of GS_SYM. The coarse problem is solved with a restarted GMRES(10) preconditioned by one iteration of GS_SYM, so that the coarse normalized residual is below a very tight tolerance (the coarse problem can be considered solved exactly). The initial solution of FGMRES is set to zero and the convergence threshold of the method is set to $10^{-6}$. Numerical experiments are reported in Table 3.6.

We first remark in Table 3.6 that the number of iterations increases with the size of the problem. This is due to the fact that the wavenumber is coupled to the grid size (see Relation A.4 in Appendix A), implying that $k$ grows with the inverse of $h$ (the grid size) and thus that the problem becomes increasingly indefinite. Concerning the smoothers, reading the table from left to right, a standard GS_LEX is first used and improved by increasing the number of both pre- and post-smoothing iterations. This method can be improved once more by adding a backward Gauss-Seidel iteration as a second sweep (GS_SYM), but increasing the number of GS_SYM iterations has nearly no effect.

Table 3.6: Number of iterations needed to reach $10^{-6}$ for FGMRES(5) preconditioned by a two-grid cycle, considering several smoothers (GS_LEX($\nu$), GS_SYM($\nu$), GMRES($\nu$), GMRES($\nu$)/GS_SYM(1)) and several grids ($(1/h)^3$) at wavenumbers $k$ such that $kh = \pi/6$.

GMRES alone is not a good smoother, but using GS_SYM as a preconditioner for GMRES gives the best results in terms of number of iterations when two GMRES iterations are performed. Since the coarse problem is solved at each iteration and is the dominant component of the two-grid preconditioner in terms of computational resources, we select the smoother that minimizes the number of iterations. We will therefore use GMRES(2) preconditioned by one GS_SYM iteration as the smoother in our perturbed two-grid algorithm.

A graphical study in Matlab is then provided to further analyze the results of Table 3.6. By plotting a slice of a three-dimensional random error after smoothing, we want to point out how the smoothers handle the high frequency components of the error for the Helmholtz equation with PML. Therefore, we choose a random error vector with the Matlab random number generator rand('seed',0) and emulate parallelism for GS_LEX and GS_SYM. Smoothing is then performed on this error for each method of Table 3.6, and a slice of the smoothed error is shown. This slice is located in the vertical plane $(x, z)$ of the unit cube $\Omega = [0, 1]^3$ at $y = 0.5$. Since this is a Matlab program, we consider the smallest grid size of Table 3.6, $64^3$ ($kh = \pi/6$), and emulate parallelism on two processors. The smoothers are parallelized by partitioning the three-dimensional cubic physical domain into smaller parallelepipeds. In our case, since we have only two processors, the physical domain is divided into two parallelepipedic boxes along the $z$-direction. The initial error slice is plotted in Figure 3.13, and the error slices after application of the selected smoothers of Table 3.6 are plotted in Figures 3.14, 3.15, 3.16 and 3.17 respectively.

Figure 3.13: Slice of the initial error ($y = 0.5$) in the plane $(x, z)$ for the $64^3$ grid, built with the Matlab random number generator rand('seed',0).

Figure 3.14: Slices of the error ($y = 0.5$) in the plane $(x, z)$ after one iteration of Gauss-Seidel (GS_LEX(1), left) and two iterations of Gauss-Seidel (GS_LEX(2), right) for the $64^3$ grid ($k = 33.51$) on two processors.

Figure 3.15: Slices of the error ($y = 0.5$) in the plane $(x, z)$ after one iteration of symmetric Gauss-Seidel (GS_SYM(1), left) and two iterations of symmetric Gauss-Seidel (GS_SYM(2), right) for the $64^3$ grid ($k = 33.51$) on two processors.

Figure 3.16: Slices of the error ($y = 0.5$) in the plane $(x, z)$ after one iteration of GMRES (GMRES(1), left) and two iterations of GMRES (GMRES(2), right) for the $64^3$ grid ($k = 33.51$) on two processors.

Figure 3.17: Slices of the error ($y = 0.5$) in the plane $(x, z)$ after one iteration of GMRES preconditioned by one iteration of symmetric Gauss-Seidel (GMRES(1)/GS_SYM(1), left) and two iterations of GMRES preconditioned by one iteration of symmetric Gauss-Seidel (GMRES(2)/GS_SYM(1), right) for the $64^3$ grid ($k = 33.51$) on two processors.

We first notice that there is a relation between the shape of the error, considered as a surface, and the number of iterations in Table 3.6: the smoother the surface, the fewer preconditioning steps are needed. GMRES has nearly no smoothing effect (Figure 3.16), whereas using it in combination with GS_SYM gives the smoothest errors. However, the effect of parallelism on smoothing is remarkable on these plots, especially on the right plot of Figure 3.15: one can see the splitting of the physical domain along the $z$ axis in this figure, as smoothing is performed independently on each subdomain. This is a consequence of the definition of local Gauss-Seidel: it acts only locally. Yet it can be seen in Figure 3.17 that GMRES enables GS_SYM to smooth all the components of the error uniformly. Therefore, even if GMRES is a bad smoother by itself, it can be used efficiently with a standard local relaxation method as a preconditioner to balance the latter's lack of parallelization. In Section 3.5, we analyze the perturbed two-grid preconditioner according to the coarse tolerance with the spectrum analysis presented in Section 2.3.1.
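Since the two-grid preconditioner built from these nonlinear smoothers changes from one application to the next, the outer Krylov method has to be flexible. For reference in the next section, a minimal right-preconditioned FGMRES($m$) cycle is sketched below (Python/numpy; a didactic illustration, not the parallel Fortran implementation of [48]). The argument `apply_M` stands for one application of the possibly nonlinear preconditioner, e.g. one perturbed two-grid cycle.

```python
import numpy as np

def fgmres(A, b, apply_M, m=5, tol=1e-6, max_restarts=100):
    """Right-preconditioned flexible GMRES(m): solves A x = b where apply_M
    may change from one call to the next (nonlinear preconditioner)."""
    n = b.shape[0]
    x = np.zeros_like(b)
    norm_b = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * norm_b:
            break
        V = np.zeros((n, m + 1), dtype=b.dtype)   # Arnoldi basis
        Z = np.zeros((n, m), dtype=b.dtype)       # flexible basis: Z[:, j] = M_j v_j
        H = np.zeros((m + 1, m), dtype=b.dtype)
        V[:, 0] = r / beta
        for j in range(m):
            Z[:, j] = apply_M(V[:, j])
            w = A @ Z[:, j]
            for i in range(j + 1):                # modified Gram-Schmidt
                H[i, j] = np.vdot(V[:, i], w)
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(m + 1, dtype=b.dtype)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H, e1, rcond=None)
        x = x + Z @ y                             # flexible update uses Z, not V
    return x
```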

3.5 Spectrum analysis of the perturbed two-level method in the Flexible GMRES framework

Before considering the spectrum analysis, we first summarize the findings of Section 3.4 in Algorithm 16.

Algorithm of the perturbed two-level preconditioner for three-dimensional Helmholtz problems

Algorithm 16 Perturbed two-grid cycle to solve approximately $L_h z_h = v_h$
1: Pre-smoothing: $\nu_1$ iterations of preconditioned GMRES($m_s$): $z_h := K(L_h, v_h, z_h, m_s)$.
2: Restriction of the residual to obtain the coarse right-hand side: $v_{2h} = I_h^{2h}(v_h - L_h z_h)$
3: Solve the coarse problem $L_{2h} z_{2h} = v_{2h}$ only approximately, such that $\|v_{2h} - L_{2h} z_{2h}\|_2 / \|v_{2h}\|_2 \leq \varepsilon_{2h}$, thanks to a preconditioned GMRES($m_c$).
4: Interpolation of the coarse solution $z_{2h}$: $I_{2h}^h z_{2h}$.
5: Add this correction to $z_h$: $z_h := z_h + I_{2h}^h z_{2h}$
6: Post-smoothing: $\nu_2$ iterations of preconditioned GMRES($m_s$): $z_h := K(L_h, v_h, z_h, m_s)$.
All Krylov methods are preconditioned by one reverse symmetric Gauss-Seidel iteration. Coarse problem: take zero as the initial guess. Notations: $z_h$ and $z_{2h}$ denote the fine and coarse grid solutions, $v_h$ and $v_{2h}$ the fine and coarse grid right-hand sides, $\varepsilon_{2h}$ the coarse tolerance, $m_s$ the smoother restart size and $m_c$ the coarse solver restart size.

The main differences between Algorithm 16 and Algorithm 13 are, on the one hand, the use of a preconditioned Krylov method as a smoother and, on the other hand, the approximate solution of the coarse problem. This last point is the topic of this section: which stopping criterion should be used on the coarse level when the perturbed two-grid method is used as a preconditioner? Since we are using nonlinear methods as smoother and coarse solver in our two-grid preconditioner, a traditional Fourier analysis (either local or rigorous) cannot help us answer this question. However, the result obtained on an academic case in Section 3.4.1 suggests that a really large tolerance $\varepsilon_{2h}$ can provide an efficient preconditioner; indeed, we are using the same transfer operators and a discretization scheme of the same order as in the example of Section 3.4.1. Nevertheless, the boundary conditions (PML) were not taken into account there, so we evaluate the two-grid preconditioner numerically according to its coarse tolerance. Moreover, in order to better understand the results of the next section, we will perform a Hessenberg spectrum analysis in the FGMRES framework (see Section 2.3.1).

3.5.1 Influence of the approximate coarse solution on the convergence of the Krylov method

In this section, we focus on the behavior of FGMRES(5) preconditioned by our perturbed two-grid method (Algorithm 16) according to the coarse convergence threshold $\varepsilon_{2h}$. We have performed extensive numerical experiments for different coarse tolerances, using a restarted GMRES($m$) preconditioned by a local reverse symmetric Gauss-Seidel cycle both as a coarse solver ($m_c = 10$) and as a smoother ($m_s = 2$). The coarse solution is accepted as soon as $\|b_{2h} - L_{2h} x_{2h}\|_2 / \|b_{2h}\|_2 \leq \varepsilon_{2h}$ is satisfied (Algorithm 16, line 3). The total number of iterations (number of applications of the preconditioner) needed by FGMRES(5) to converge to $10^{-6}$ for four grid sizes is reported in Table 3.7. We first notice in Table 3.7 that the number of iterations increases with the problem size, as in Table 3.6. Then, it can be noticed that the number of iterations has a general trend to decrease when the coarse tolerance decreases. Finally, we remark that at a certain coarse tolerance the number of iterations stabilizes. Indeed, for the smallest grids ($64^3$ and $128^3$), the numbers of iterations are the same for $\varepsilon_{2h} = 0.1$ and $\varepsilon_{2h} = 10^{-12}$: 6 and 9 respectively.
For the case of the $256^3$ grid, one can see that the number of iterations stabilizes at 17 when the coarse tolerance is below 0.4 (which is really close to the 16 iterations obtained for $\varepsilon_{2h} = 10^{-12}$). For the largest grid ($512^3$), the required number of iterations no longer behaves in a monotonic way: it first decreases for $\varepsilon_{2h} \in [0.6, 1]$, then increases for $\varepsilon_{2h} \in [0.3, 0.5]$, and finally stabilizes around 40 iterations. Therefore, these examples show that a highly accurate coarse solution is not needed to reach convergence.

Table 3.7: Number of iterations (It) of FGMRES(5) with respect to the normalized coarse tolerance $\varepsilon_{2h}$, for wavenumbers $k$ such that $kh = \pi/6$ and four grid sizes.

Even worse, a more accurate coarse solution can deteriorate (hopefully not too much) the convergence of FGMRES when large grids are considered. We perform a spectrum analysis to better understand this behavior.

3.5.2 Spectrum analysis in the flexible GMRES framework for three-dimensional homogeneous Helmholtz problems

In this section, we perform a spectrum analysis as in Section 2.3.1 for the three-dimensional Helmholtz problem. The matrix $\bar{H}_{m+1}$ still denotes the augmented Hessenberg matrix:

$\bar{H}_{m+1} = \begin{bmatrix} \bar{H}_m & 0_{(m+1) \times (n-m-1)} \\ 0_{(n-m-1) \times (m+1)} & I_{n-m-1} \end{bmatrix}.$

We compute the eigenvalues of $\bar{H}_{m+1}$ at the end of each restart for the same coarse grid tolerances as presented in Table 3.7 and superpose them on the same plot. The parameter guiding the quality of the preconditioner is then the coarse grid tolerance $\varepsilon_{2h}$. We focus on a test case where the restart parameter $m$ is equal to 5 on a $512^3$ grid (which seems to be the most interesting test case according to Table 3.7). We then compute the eigenspectra of $\bar{H}_{m+1}$ at each FGMRES restart for several coarse grid tolerances. We denote by $\bar{H}_{m+1}^{(i)}$ the Hessenberg matrix corresponding to the $i$th restart and by $\lambda(\bar{H}_{m+1}^{(i)}(\varepsilon_{2h}))$ the eigenspectrum of $\bar{H}_{m+1}^{(i)}$ corresponding to the coarse tolerance $\varepsilon_{2h}$. Finally, we denote by $\Lambda(\bar{H}_{m+1}(\varepsilon_{2h}))$ the union of the $\lambda(\bar{H}_{m+1}^{(i)}(\varepsilon_{2h}))$ over the restart index $i$:

$\Lambda(\bar{H}_{m+1}(\varepsilon_{2h})) = \bigcup_i \lambda(\bar{H}_{m+1}^{(i)}(\varepsilon_{2h})).$

We plot $\Lambda(\bar{H}_{m+1}(\varepsilon_{2h}))$ for $m = 5$ in Figure 3.18. The eigenspectra are distributed with respect to the coarse grid tolerance in the first five plots, and in the bottom-right corner we plot three spectra for relevant values of $\varepsilon_{2h}$ (1, 0.6 and $10^{-12}$) to present an overview of the evolution of the spectrum. In the upper-left corner, it can be seen that $\Lambda(\bar{H}_{m+1}(\varepsilon_{2h}))$ is very similar for $\varepsilon_{2h} = 1$ and $\varepsilon_{2h} = 0.9$: several eigenvalues close to zero are enclosed in an ellipse lying in a half-plane of the complex plane. In the upper-right corner plot, $\Lambda(\bar{H}_{m+1}(0.8))$ is located approximately in the same ellipse as the previous eigenspectra, whereas $\Lambda(\bar{H}_{m+1}(0.7))$ lies farther from zero. Looking at Table 3.7, it appears that the number of iterations decreases greatly when $\varepsilon_{2h} \leq 0.7$. In fact, the number of iterations of FGMRES(5) seems to be related to the location of $\Lambda(\bar{H}_{m+1}(\varepsilon_{2h}))$ in the complex plane. Indeed, for $\varepsilon_{2h} = 0.6$ the real parts of $\Lambda(\bar{H}_{m+1}(\varepsilon_{2h}))$ move away from zero, and they get closer to zero again when $\varepsilon_{2h} \leq 0.5$. A related behavior is recorded in Table 3.7: the number of iterations varies with the number of eigenvalues close to zero. Thus, this spectrum study gives extra information about the behavior of the method. It also confirms that the efficiency of the flexible preconditioner does not depend monotonically on the convergence of its coarse grid problem. Nevertheless, this analysis only lets us deduce the optimal $\varepsilon_{2h}$ parameter for one grid size at a time; in Table 3.7, the best $\varepsilon_{2h}$ is 0.6 on the $512^3$ grid and 0.4 on the $256^3$ grid. Moreover, converging to a prescribed $\varepsilon_{2h}$ on the coarse level at each iteration of FGMRES can be very expensive in computational resources.

Figure 3.18: From right to left: $\Lambda(\bar{H}_{m+1})$ for different coarse tolerances $\varepsilon_{2h}$, $m = 5$, on a $512^3$ grid with $kh = \pi/6$ and PML.
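Once the small Hessenberg matrices of each restart are stored, the union of spectra plotted in Figure 3.18 is cheap to form. The sketch below (Python/numpy; our own illustration, where `Hbar_m` denotes the square $(m+1) \times (m+1)$ matrix $\bar{H}_m$ of one restart) exploits the block structure of $\bar{H}_{m+1}$: the trailing identity block contributes eigenvalues exactly equal to one.

```python
import numpy as np

def augmented_spectrum(Hbar_m, n):
    """Eigenvalues of H_bar_{m+1}: the (m+1)x(m+1) block H_bar_m embedded in an
    n x n identity, so the remaining n-m-1 eigenvalues are exactly one."""
    lam = np.linalg.eigvals(Hbar_m)
    return np.concatenate([lam, np.ones(n - Hbar_m.shape[0], dtype=complex)])

def union_spectrum(restart_matrices, n):
    """Lambda(H_bar_{m+1}) as the union over all FGMRES restarts."""
    return np.concatenate([augmented_spectrum(H, n) for H in restart_matrices])
```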

We then decide to fix the number of coarse iterations per preconditioning cycle in order to save some computational time in the solution of the linear system. Indeed, we plot in Figure 3.19 the number of coarse iterations needed to reach a normalized residual below 0.6, with respect to the FGMRES iterations. It can be seen that the number of coarse iterations oscillates between 50 and 300; on average, 95 coarse iterations are performed for each FGMRES iteration. We thus fix the number of coarse iterations to 100 per FGMRES iteration. It can be seen in Table 3.8 that the number of iterations needed by FGMRES to converge is in the range of the best ones of Table 3.7, while a constant number of iterations is performed on the coarse level. This is also confirmed when plotting the eigenvalues of $\bar{H}_{m+1}$: on the two grids of Table 3.8, the real parts of $\Lambda(\bar{H}_{m+1})$ are greater than 0.2 and 0.5 respectively. Therefore, we will follow this fixed coarse iteration strategy in order to obtain a cycle with fixed computational work per preconditioning step.

Figure 3.19: Number of iterations needed by GMRES(10) preconditioned by a reverse symmetric Gauss-Seidel cycle to converge to 0.6, with respect to the current FGMRES(5) iteration.

Table 3.8: Number of iterations of FGMRES(5) required to reach $10^{-6}$, performing 100 iterations of preconditioned GMRES(10) on the coarse level at each iteration of FGMRES(5), for two wavenumbers.

Figure 3.20: From right to left: $\Lambda(\bar{H}_{m+1})$ spectra using 100 coarse iterations of GMRES(10) preconditioned by a reverse symmetric Gauss-Seidel cycle, for the two grids of Table 3.8, converging to $10^{-6}$ with FGMRES(5).

Conclusions

In this chapter, we proposed a multilevel preconditioner for the solution of the three-dimensional Helmholtz equation with PML. Keeping in mind [42], we first tried to extend to three dimensions a multigrid preconditioner acting on a shifted Helmholtz operator. This preconditioner is expected to work provided that the right smoother is used in three dimensions. Yet the choice of the shift parameter is an open question that can only be solved, from our point of view, by a trial and error procedure. Since this choice strongly depends on the components of the multigrid operator, on the discretization of the Helmholtz operator itself and on its homogeneity/heterogeneity (constant/variable velocity in the physical domain), we decided to focus on a preconditioner acting directly on the original Helmholtz operator. This strategy implies the use of a restricted number of levels (two). We have thus designed a perturbed two-grid preconditioner for three-dimensional Helmholtz problems. Its principle relies on two remarkable phenomena:

- In a standard two-grid cycle (Algorithm 15), the coarse solution is not required to be exact to obtain a two-grid method as efficient as when the coarse solution is exact. Moreover, the two-grid method behaves well even if the convergence threshold is very large, say about 0.1.
- Gauss-Seidel type methods are efficient at smoothing the error on the fine grid for three-dimensional Helmholtz problems at high wavenumbers. Furthermore, in a parallel environment, local Gauss-Seidel methods can be further improved by a Krylov accelerator such as GMRES.

According to the spectrum analysis of Section 3.5, using the two-level method as a preconditioner for the original Helmholtz operator is relevant, and according to Table 3.7, the two-level preconditioner combined with FGMRES requires a reasonable number of iterations even at large wavenumbers. However, its numerical efficiency may not imply its computational efficiency. The next chapter is devoted to numerical experiments on parallel computers; the computational efficiency of our perturbed two-grid preconditioner will be shown both for homogeneous test cases (constant propagation velocity in the physical domain) and heterogeneous ones (variable propagation velocity).

Chapter 4

Numerical experiments - Applications to geophysics

4.1 Introduction

In this chapter we evaluate the efficiency of the perturbed two-level preconditioner proposed in Chapter 3 for solving three-dimensional Helmholtz problems occurring in geophysics. This evaluation is performed for both homogeneous and heterogeneous media, in single and multiple right-hand side situations. Since the linear systems arising from the discretization of this Helmholtz operator are very large (see Appendix A), the methods presented in Chapters 2 and 3 have to be implemented in a parallel distributed memory environment. We refer to [47] and [48] for the parallel implementation of GMRES and FGMRES respectively, and to Chapter 6 in [115] for the parallel implementation of the perturbed two-level preconditioner on structured grids. Denoting by $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{n \times p}$ and $X \in \mathbb{C}^{n \times p}$ the matrix of the linear system, the right-hand side and the solution respectively, we focus on preconditioned iterative methods for the solution of $AX = B$ with a zero initial iterate. The iterative procedures are stopped when the Euclidean norm of each column of the block residual, normalized by the Euclidean norm of the corresponding right-hand side, satisfies the following relation in the 2-norm:

$\frac{\|B(:, l) - A X(:, l)\|_2}{\|B(:, l)\|_2} \leq 10^{-5}, \quad l = 1, \ldots, p. \qquad (4.1)$

The tolerance is set to $10^{-5}$ so as to use the same stopping criterion in both single and double precision arithmetic. First, we present numerical experiments for the single right-hand side situation ($p = 1$). The FGMRES(5) method (Algorithm 3 of Chapter 2) preconditioned by one cycle of the perturbed two-grid method (Algorithm 16 of Chapter 3) is used on both homogeneous and heterogeneous problems. Several wavenumbers - from moderate to huge - are considered. When variable velocity fields are considered, the resulting pressure fields are plotted versus the considered frequencies. Secondly, we consider the case of multiple right-hand sides ($p > 1$). Once again, one cycle of the perturbed two-grid algorithm (Algorithm 16) is used as a preconditioner. The flexible block methods presented in Section 2.6 are evaluated only in the case of heterogeneous media. Several numerical tests on public domain model problems will help us to determine the best strategy for solving such linear systems.
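Relation (4.1) translates into a per-column test on the block residual. The helper below (Python/numpy; an illustrative sketch, not the thesis Fortran code) returns True only when every right-hand side has converged.

```python
import numpy as np

def converged(A, X, B, tol=1e-5):
    """Relation (4.1): per-column normalized residual in the 2-norm."""
    R = B - A @ X
    return bool(np.all(np.linalg.norm(R, axis=0) <= tol * np.linalg.norm(B, axis=0)))
```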

4.2 Three-dimensional homogeneous Helmholtz problems with a single right-hand side

In this section we present numerical experiments that were performed during the PRACE (Partnership for Advanced Computing in Europe) petascale summer school in Stockholm, Sweden, during the last week of August 2008. Two parallel computers were available: a Cray XT4 located in Espoo (Finland) and an IBM Blue Gene/P located in Jülich (Germany). We focus on two sets of experiments. The first set, called weak scalability experiments, consists in increasing the global size of the problem proportionally to the number of cores, keeping the size of the local problem on each core fixed. The second set, called strong scalability experiments, consists in increasing the number of cores while keeping the size of the global problem fixed. We are also interested in investigating the behavior of the algorithms in single and double precision arithmetic; indeed, geophysical computations are often performed in single precision [89]. For all these experiments, the algorithm used is FGMRES(5) preconditioned by a two-grid cycle (Algorithm 16), with GMRES(2) preconditioned by a local symmetric Gauss-Seidel iteration as a smoother and 100 iterations of GMRES(10) preconditioned by a local symmetric Gauss-Seidel cycle as an approximate coarse solver. The algorithm is stopped when Relation (4.1) is satisfied with $p = 1$. The right-hand side $b$, representing the wave source $S$, results from the discretization of a Kronecker delta function $\delta_{(x_\delta, y_\delta, z_\delta)}$, where the source position is located at:

$b = \delta_{(n_x/2,\, n_y/2,\, n_{PML}+1)},$

where $n_x$, $n_y$ are the numbers of points in the $x$ and $y$ directions respectively and $n_{PML}$ is the number of points in the absorbing layer ($n_{PML} = 16$, see Appendix A). The PML is located inside the physical grid, and the wavenumbers $k$ are selected such that they satisfy the stability condition (Relation A.4): $kh = \pi/6$. The actual memory $M$ allocated in our code is given by the following formula (in GigaBytes (GB)):

$M = \sum_{c=1}^{\#Cores} n_{loc}(c)\left[6 + (2 m_f + 1) + (m_s + 2) + \frac{m_g + 2}{8} + \frac{3}{8}\right] \vartheta \qquad (4.2)$

where $n_{loc}(c)$ is the local problem size on core $c$, $m_f$ the restart parameter of FGMRES, $m_s$ the restart parameter of the smoother, $m_g$ the restart parameter of the coarse solver and $\vartheta$ the memory required to store a number in the considered arithmetic precision. It has to be noticed that the numbers 6 and 3/8 in Equation (4.2) are related to the storage of the solution, right-hand side, work arrays and the diagonal of the Helmholtz matrix on the fine and coarse grids respectively. The other matrix diagonals do not need to be stored in our matrix-free implementation of the matrix-vector products and Gauss-Seidel procedures. First, we present numerical experiments performed on a Cray XT4, covering weak and strong scalability as well as a comparison between single and double precision algorithms. Then we present results on larger problems performed on the Blue Gene/P, focusing on single precision arithmetic only. In the following tables, $h$ denotes the mesh grid size, $k$ the wavenumber, Grid the number of points of the problem and their repartition per direction, # Cores the total number of cores, Partition the repartition of the cores per direction, T and It the total elapsed time and the number of iterations respectively, T/It the time per iteration and M the total memory cost of the algorithm in GB (see Relation (4.2)).
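Relation (4.2) can be turned into a small estimator, as sketched below (Python; an illustration under our reading of the formula, in which the coarse-solver workspace and the three coarse-grid arrays are counted at one eighth of the fine-grid size; `bytes_per_number` plays the role of ϑ, e.g. 8 bytes for a single precision complex number and 16 for double precision).

```python
def memory_gb(n_loc, m_f=5, m_s=2, m_g=10, bytes_per_number=8):
    """Estimate of Relation (4.2): n_loc is the list of local problem sizes
    per core; fine-grid vectors count fully, coarse-grid ones at 1/8."""
    vectors = 6 + (2 * m_f + 1) + (m_s + 2) + (m_g + 2) / 8 + 3 / 8
    return sum(n * vectors for n in n_loc) * bytes_per_number / 1024**3
```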
4.2.1 PRACE experiments: Cray XT4 at Espoo (Finland)

Cray XT4 Louhi

The Cray XT4 Louhi consists of 1012 quad-core AMD Opteron Barcelona processors with 1 GB of memory per core; the clock rate of these processors is 2.3 GHz. Each node (a single quad-core processor) has 4 GB of memory (1 GB/core). The Cray SeaStar interconnect system directly connects all nodes in a three-dimensional torus topology using HyperTransport links of the Opteron processors.

On this machine, our Fortran 90 code has been compiled with the Portland compiler suite with the "-O3 -fastsse" options and linked with the ACML library (AMD Core Math Library).

Weak scalability experiments in single and double precision arithmetic

Table 4.1: Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumbers $k$ such that $kh = \pi/6$ (Cray XT4 Louhi, weak scalability). The results are shown for both single precision (sp) and double precision (dp) arithmetic.

In Table 4.1, it is found that the number of iterations behaves linearly with the wavenumber $k$ in both single and double precision arithmetic; this is illustrated in Figure 4.1. This behavior has also been observed in the literature for other multilevel strategies [14, 96], although on smaller problem sizes.

Figure 4.1: Number of iterations (It) of Table 4.1 for both single and double precision arithmetic, with respect to the wavenumber $k$.

As expected, using single precision arithmetic leads to a reduction by a factor of two in the memory requirements, as shown in Table 4.1. The time per iteration is found to be almost constant, indicating a good load-balancing property.

Strong scalability experiments in single precision arithmetic

Table 4.2: Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumbers $k$ such that $kh = \pi/6$ (Cray XT4 Louhi, strong scalability in single precision). $\tau = (T_{ref}/T)/(P/P_{ref})$ is a scaled speed-up, where $T$, $P$ denote the elapsed time and the number of cores of a given experiment respectively.

Since it was remarked in the previous section that the codes in single and double precision arithmetic behave numerically similarly, we only focus on single precision arithmetic for these strong scalability experiments. In this table, the global problem size is kept constant, whereas the number of cores is multiplied by a factor of two from one row to the next. The number of required iterations is found to be fairly constant; the differences can be explained by the fact that the preconditioning methods used in both the smoother and the coarse solver are local and thus depend on the number of cores and their repartition. We note that the memory requirement also depends on the number of cores (indeed, when more cores are used, more local boundaries and overlapping zones have to be stored). The $\tau$ parameter is an indicator of the scalability of the algorithm: if $\tau = 1$, the code scales perfectly. Taking as $T_{ref}$ and $P_{ref}$ the time and the number of cores of the first experiment (2175 seconds and 256 cores respectively), it appears that the code scales quite well. The value $\tau = 1.23$ for the last experiment is probably due to cache effects and would deserve further investigation.

4.2.2 PRACE experiments: IBM Blue Gene/P at Jülich (Germany)

IBM Blue Gene/P Jugene

The IBM Blue Gene/P Jugene consists of 72 racks, each one containing 1024 nodes with 2 GB of memory per node. A node is made of 4 computing cores running at 850 MHz (32-bit PowerPC 450). The interconnect system directly connects all nodes in a three-dimensional torus topology. On this machine, following the manufacturer's recommendation, our Fortran 90 code has been compiled with the IBM native compilers with the "-O3 -g -qmaxmem=-1 -qarch=450 -qtune=450" options and linked with the ESSL library. The virtual node execution mode has been chosen (4 MPI processes per node with 512 MB as maximum memory per MPI process). The mapping used is of MESH type. Thanks to the availability of this machine, we were allowed to run large test cases, using up to the whole machine at that time (August 2008) to solve a linear system of size larger than 68 billion. All the numerical experiments have been performed in single precision arithmetic and are summarized in Tables 4.3 and 4.4.

Weak scalability experiments in single precision arithmetic

The behavior of the first two experiments (see the first two rows of Table 4.3) is similar to that of Table 4.1: the number of iterations doubles when the wavenumber is multiplied by a factor of two. However, for the largest wavenumber considered, the number of iterations is nearly three times that of the previous case. The problem size and the large value of the wavenumber could explain this behavior. Despite this increase in iterations, the algorithm still scales quite well: the time per iteration is found to be constant (about 29 seconds) when the ratio between the size of the global problem and the number of cores is kept constant.
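The scaled speed-up used in Tables 4.2 and 4.4 is a one-line computation, sketched below (Python; the 2175 s / 256 cores reference values are those of Table 4.2).

```python
def scaled_speedup(T_ref, P_ref, T, P):
    """tau = (T_ref / T) / (P / P_ref); tau = 1 means perfect strong scaling."""
    return (T_ref / T) / (P / P_ref)

print(scaled_speedup(2175.0, 256, 2175.0 / 2, 512))  # 1.0 for an ideal halving
```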

[Table 4.3 layout: Blue Gene/P Jugene, weak scalability experiments in single precision arithmetic; columns k, Grid, # Cores, Partition, T(s), It, T/It and M(GB).]

Table 4.3: Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumber k such that kh = π/6.

Strong scalability experiments in single precision arithmetic

[Table 4.4 layout: Blue Gene/P Jugene, strong scalability experiments in single precision arithmetic; columns k, Grid, # Cores, Partition, T(s), It, T/It, τ and M(GB).]

Table 4.4: Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the homogeneous model problem with wavenumber k such that kh = π/6. τ = (T_ref / T) / (P / P_ref) is a scaled speed-up, where T and P denote the elapsed time and the corresponding number of cores of a given experiment, respectively.

Most of the comments related to Table 4.2 apply to Table 4.4 as well. The number of iterations varies slightly because of the local nature of both the smoothers and the preconditioner used in the coarse solver. It can be noticed that the elapsed time is divided by more than a factor of two from one row to the next. Indeed the τ parameter is always larger than one and increases with the number of cores. Once again this seems to be due to cache effects.

4.3 Three-dimensional heterogeneous Helmholtz problems with a single right-hand side

In this section we present numerical experiments for two variable velocity fields that are publicly available: the SEG/EAGE Salt dome and the SEG/EAGE Overthrust models [5], defined in a domain of size L_x × L_y × L_z (m³). The solution method is the same as in the homogeneous case (Section 4.2). The source is also located at the center of the (x, y) plane below the PML layer. Contrary to the homogeneous case, we now fix the frequency f in Hz and deduce the mesh grid size h in m according to Relation A.3:

$$h = \frac{\min_{(x,y,z) \in \Omega} v(x,y,z)}{12 f}. \qquad (4.3)$$

Furthermore, the PML layers (see Appendix A) are added around the physical domain (n_PML = 16). This implies the following grid partition:

$$\left[ \frac{12 f L_x}{\min_{(x,y,z) \in \Omega} v(x,y,z)} + 32, \;\; \frac{12 f L_y}{\min_{(x,y,z) \in \Omega} v(x,y,z)} + 32, \;\; \frac{12 f L_z}{\min_{(x,y,z) \in \Omega} v(x,y,z)} + 32 \right].$$

Consequently, the ratio between the numbers of unknowns, indicated later by the "Grid ratio" value, will not be proportional to the ratio of the corresponding frequencies. Yet we still focus on strong and weak scalability experiments in single precision arithmetic, with a number of cores proportional to the frequency f; a sketch of this grid construction is given below.
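As an illustration of this grid construction, the following sketch (an illustrative Python helper, not part of the thesis's Fortran 90 code) evaluates Relation (4.3) and the padded grid dimensions. The frequency, minimum velocity and domain extents in the example call are illustrative values only, not those of an actual model:

```python
import math

def helmholtz_grid(f, v_min, extents, n_pml=16):
    """Mesh size h = v_min / (12 f), Relation (4.3), and grid dimensions
    with n_pml PML points added on each side of every direction."""
    h = v_min / (12.0 * f)
    dims = tuple(math.ceil(12.0 * f * L / v_min) + 2 * n_pml for L in extents)
    return h, dims

# Illustrative call: f = 5 Hz, v_min = 1500 m/s, a 10 x 10 x 4 km domain.
h, (nx, ny, nz) = helmholtz_grid(5.0, 1500.0, (10000.0, 10000.0, 4000.0))
print(h, nx, ny, nz)  # h = 25 m, grid 432 x 432 x 192
```

Doubling the frequency roughly doubles the point count per direction before the fixed PML padding of 32 points is added, which is why the "Grid ratio" stays below the ratio of the frequencies cubed.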

First we present experiments for the SEG/EAGE Salt dome velocity field. In the following tables, h is the mesh grid size in m, f the frequency in Hz, Grid the number of points and their repartition per direction (n_x × n_y × n_z), Grid ratio the ratio between the grid size in the current line and the grid size in the preceding line, # Cores the number of cores, Partition the repartition of the cores per direction, T and It the elapsed time and the number of iterations, T/It the time per iteration and M the total memory requested in GB.

IBM Blue Gene/P Babel

All the numerical experiments in this section have been performed on the IBM Blue Gene/P Babel at IDRIS in Orsay (France). The Babel machine is an IBM Blue Gene/P system. It consists of 10 racks, each one containing 1024 nodes with 2 GB of memory per node. A node has 4 computing cores running at 850 MHz (32-bit PowerPC 450). The interconnect system directly connects all nodes in a three-dimensional torus topology. On this machine, our Fortran 90 code has been compiled with the IBM native compiler with "-O3 -qhot -qarch=450 -qtune=450" options and linked with the ESSL library. The virtual node execution mode has been chosen (4 MPI processes per node with 512 MB as maximum memory per MPI process). The mapping used is of MESH type. The iterative procedure is stopped when Relation (4.1) is satisfied for p = ….

SEG/EAGE Salt dome model problem

The SEG/EAGE Salt dome model [5] is a velocity field containing a salt dome in a sedimentary embankment. It is a parallelepiped domain of size … km³. The minimum value of the velocity is 1500 m.s⁻¹ and its maximum value is 4481 m.s⁻¹. The whole velocity field has been considered here.

Weak scalability experiments in single precision arithmetic

[Table 4.5 layout: Salt dome, Blue Gene/P Babel, weak scalability experiments in single precision arithmetic; columns h (m), f (Hz), Grid, Grid ratio, # Cores, Partition, T(s), It, T/It and M(GB).]

Table 4.5: Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model with mesh grid size h such that h = min_{(x,y,z)∈Ω} v(x,y,z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the requested memory.

In Table 4.5 we remark that, as in the homogeneous case, when the frequency is multiplied by a factor of two from one line to the next in the first three rows, the number of FGMRES(5) iterations is also multiplied by a factor close to two. However, in the case of f = 20 Hz, the number of iterations greatly increases: it is about four times the number of iterations required to solve the Helmholtz problem at f = 10 Hz. We also remark that the time per iteration decreases when the frequency increases. Indeed the Grid ratio is always smaller than the ratio between the numbers of cores (8). The real part of the solutions and the velocity fields at these four frequencies are plotted in Figures 4.2, 4.3, 4.4 and 4.5, respectively. Two different plots are shown: first the contour of the real part of the solution is plotted next to the contour of the velocity field, then a section of the real part of the solution in the plane y = n_y/2 is plotted next to the corresponding section of the velocity field. In these figures we

observe the propagation of the wave and the position of the source. We also note that the variations in the pressure field due to the heterogeneity of the medium clearly appear.

Strong scalability experiments in single precision arithmetic

[Table 4.6 layout: Salt dome, Blue Gene/P Babel, strong scalability experiments in single precision arithmetic; columns h (m), f (Hz), Grid, # Cores, Partition, T(s), It, T/It, τ and M(GB).]

Table 4.6: Two-grid preconditioned Flexible GMRES(5) for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model with mesh grid size h such that h = min_{(x,y,z)∈Ω} v(x,y,z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the memory. τ = (T_ref / T) / (P / P_ref) is a scaled speed-up, where T and P denote the elapsed time and the corresponding number of cores of a given experiment, respectively.

First it can be noticed in Table 4.6 that the method does not scale as well as in the homogeneous case. In fact, from one row to the next, the number of iterations increases, especially when large numbers of cores (8192, 16384) are considered. This phenomenon is probably due to the nature of the preconditioner of both the coarse solution method and the smoothers: the heterogeneity of the medium badly influences its efficiency. Indeed, when more iterations are performed on the coarse level (see Table 4.7, where 200 coarse iterations are imposed instead of 100), the number of outer iterations decreases and the method scales up to 2048 cores. In Table 4.7 it has to be noticed that in the case of 4096 cores, the number of iterations is significantly reduced compared to the results shown in Table 4.6. However, it is still larger than the number of iterations on smaller numbers of cores (see the first rows of Table 4.7).

[Table 4.7 layout: Salt dome, Blue Gene/P Babel, strong scalability experiments in single precision arithmetic; columns h (m), f (Hz), Grid, # Cores, Partition, T(s), It, T/It, τ and M(GB).]

Table 4.7: Two-grid preconditioned Flexible GMRES(5) performing 200 coarse iterations per cycle for the solution of the Helmholtz equation for the SEG/EAGE Salt dome model with mesh grid size h such that h = min_{(x,y,z)∈Ω} v(x,y,z)/(12 f). The parameter T denotes the total computational time, It the number of preconditioner applications and M the memory. τ = (T_ref / T) / (P / P_ref) is a scaled speed-up, where T and P denote the elapsed time and the corresponding number of cores of a given experiment, respectively.
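To make the role of the coarse iteration count concrete, here is a deliberately simplified Python sketch of one application of the two-grid preconditioner, with Krylov iterations used both as smoother and as approximate coarse solver. It is only a stand-in for the actual method: it uses a one-dimensional Helmholtz operator with Dirichlet boundary conditions instead of the three-dimensional PML formulation, SciPy's GMRES instead of the parallel Fortran 90 solvers, a Galerkin coarse operator for convenience, and hypothetical helper names. The n_coarse argument plays the role of the 100 or 200 coarse iterations discussed above:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

def helmholtz_1d(n, k):
    """1D stand-in: -u'' - k^2 u on (0, 1) with Dirichlet boundaries."""
    h = 1.0 / (n + 1)
    lap = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
    return (lap - k**2 * sp.identity(n)).tocsr()

def two_grid_apply(A_h, A_2h, R, P, v, n_smooth=2, n_coarse=100):
    """One two-grid cycle z ~ A_h^{-1} v: GMRES as pre-/post-smoother and
    as approximate coarse solver (fixed iteration budget, no exact solve)."""
    z, _ = gmres(A_h, v, restart=n_smooth, maxiter=1)        # pre-smoothing
    r = v - A_h @ z                                          # fine residual
    e, _ = gmres(A_2h, R @ r, restart=10, maxiter=n_coarse // 10)
    z = z + P @ e                                            # coarse correction
    dz, _ = gmres(A_h, v - A_h @ z, restart=n_smooth, maxiter=1)
    return z + dz                                            # post-smoothing

n, k = 255, 40.0
A_h = helmholtz_1d(n, k)
nc = (n - 1) // 2
# Full-weighting restriction: coarse point i averages fine points 2i..2i+2
rows = np.repeat(np.arange(nc), 3)
cols = np.concatenate([[2 * i, 2 * i + 1, 2 * i + 2] for i in range(nc)])
vals = np.tile([0.25, 0.5, 0.25], nc)
R = sp.csr_matrix((vals, (rows, cols)), shape=(nc, n))
P = 2.0 * R.T                          # linear interpolation prolongation
A_2h = (R @ A_h @ P).tocsr()           # Galerkin coarse operator
v = np.random.default_rng(0).standard_normal(n)
z = two_grid_apply(A_h, A_2h, R, P, v)
```

In FGMRES the cycle above is applied as a flexible preconditioner, so it may change from one outer iteration to the next; raising n_coarse makes each application more expensive but, as Table 4.7 suggests, can reduce the number of outer iterations.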

Figure 4.2: Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 2.5 Hz (right) and the SEG/EAGE Salt dome velocity field (left).

Figure 4.3: Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 5 Hz (right) and the SEG/EAGE Salt dome velocity field (left).

Figure 4.4: Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 10 Hz (right) and the SEG/EAGE Salt dome velocity field (left).

Figure 4.5: Contours and sections of the solution of the three-dimensional Helmholtz problem at f = 20 Hz (right) and the SEG/EAGE Salt dome velocity field (left).
