A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems

A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems
Stefania Bellavia, Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze
Joint work with Jacek Gondzio and Benedetta Morini
Work carried out within the INdAM-GNCS 2012 Project "Metodi e software numerici per il precondizionamento di sistemi lineari nella risoluzione di PDE e di problemi di ottimizzazione"
Algebra Lineare Numerica e sue Applicazioni, Rome, 29-31 Jan 2013
Stefania Bellavia, Università di Firenze 1 / 28

Introduction: The Problem
Consider systems of the form Hx = b, with H ∈ R^{m×m} SPD.
Special interest in the case H = AΘA^T, with A ∈ R^{m×n} sparse and Θ ∈ R^{n×n} diagonal SPD.
Such systems arise in at least two prominent applications in the area of optimization: Newton-like methods for weighted least-squares problems, and interior point methods.

Introduction
We assume that H is too large and/or too difficult to be formed and solved directly.
We solve the system with an iterative Conjugate Gradient (CG)-like approach.
We are interested in preconditioning H with a reliable algorithm that does not require forming the whole matrix H at once (matrix-free).
We are also interested in solving sequences of linear systems arising in optimization methods.

Introduction: Preconditioning H
Incomplete Cholesky (IC) factorizations are matrix-free in the sense that the columns of H can be computed one at a time and then discarded. They are breakdown-free when H is an H-matrix, but IC factorizations relying on drop tolerances to reduce fill-in have unpredictable memory requirements.
Alternative approaches with predictable memory requirements depend on the entries of H [Jones, Plassmann, ACM Trans. Math. Software 1995], [Lin, Moré, SISC 1999]. E.g., let n_k = nnz(tril(H(:,k), -1)) and retain the n_k + p largest elements in the strict lower triangular part of the k-th column of the factor, for some fixed p > 0. This has high storage requirements if H is dense.

Introduction: Preconditioning H
Approximate inverse preconditioners form factorized sparse approximations of H^{-1}.
The Stabilized Approximate Inverse preconditioner (SAINV) of [Benzi, Cullum, Tuma, SISC 2000] is based on a modified Gram-Schmidt process. It is matrix-free, i.e., it employs H multiplicatively and may work entirely with A^T. It preserves sparsity in the factors by dropping small elements. In exact arithmetic it is applicable to any SPD matrix without breakdowns.
The underlying assumption is that most entries of H^{-1} are small in magnitude.

Introduction: Properties of our preconditioner
Limited memory: memory bounded by O(m) rather than O(nnz(H)).
Matrix-free: only the action of H on a vector is needed.
Only a small number k ≪ m of general matrix-vector products is required.
The diagonal of H, or an approximation of it, is needed: in many practical applications we expect to be able to compute or estimate the diagonal of H at low cost.
PARTIAL CHOLESKY + DEFLATED CG

LMP Preconditioner: The preconditioner
Partial Cholesky factorization limited to a small number k of columns of H, plus a diagonal approximation of the Schur complement [Gondzio, COAP 2011].
1. Choose k ≪ m and consider the formal partition of H:
H = [H_{11}, H_{21}^T; H_{21}, H_{22}],  H_{11} ∈ R^{k×k}, H_{21} ∈ R^{(m-k)×k}, H_{22} ∈ R^{(m-k)×(m-k)}.
2. Form the first k columns of H, i.e., H_{11} and H_{21}.

The preconditioner (cont'd)
3. Compute the Cholesky factor [L_{11}; L_{21}] of H limited to [H_{11}; H_{21}]:
compute the LDL^T factorization H_{11} = L_{11} Q_{11} L_{11}^T (discard H_{11});
solve L_{11} Q_{11} L_{21}^T = H_{21}^T for L_{21}, i.e., L_{21} = H_{21} L_{11}^{-T} Q_{11}^{-1} (discard H_{21}).
It follows that
H = [L_{11}, 0; L_{21}, I_{m-k}] [Q_{11}, 0; 0, S] [L_{11}^T, L_{21}^T; 0, I_{m-k}],
where S = H_{22} - H_{21} H_{11}^{-1} H_{21}^T is the Schur complement of H_{11} in H.

The preconditioner (cont'd)
4. Set Q_{22} = diag(S) = diag(H_{22}) - diag(L_{21} Q_{11} L_{21}^T) and
P = [L_{11}, 0; L_{21}, I_{m-k}] [Q_{11}, 0; 0, Q_{22}] [L_{11}^T, L_{21}^T; 0, I_{m-k}] = L Q L^T.
The algorithm for constructing P has some good properties: it cannot break down in exact arithmetic, and it has predictable memory requirements, nnz(L) = O(km).
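The construction above can be sketched in a few lines of dense NumPy (an illustration only, not the authors' Matlab implementation: a memory-bounded variant would keep L_21 sparse and discard each column of H after use; function and variable names are ours):

```python
import numpy as np
from scipy.linalg import solve_triangular

def build_lmp(matvec, diag_h, k):
    """Partial Cholesky preconditioner P = L Q L^T built from the first
    k columns of H plus the diagonal of the Schur complement."""
    m = diag_h.shape[0]
    e = np.eye(m)
    # step 2: first k columns of H, one matrix-vector product each
    Hk = np.column_stack([matvec(e[:, j]) for j in range(k)])
    H11, H21 = Hk[:k, :], Hk[k:, :]
    # step 3: LDL^T of H11 via Cholesky, H11 = C C^T = L11 Q11 L11^T
    C = np.linalg.cholesky(H11)
    d = np.diag(C)
    q11 = d ** 2                                   # Q11 = diag(d)^2
    L11 = C / d                                    # unit lower triangular
    # L21 = H21 L11^{-T} Q11^{-1}
    L21 = solve_triangular(L11, H21.T, lower=True).T / q11
    # step 4: Q22 = diag(H22) - diag(L21 Q11 L21^T)
    q22 = diag_h[k:] - np.sum(L21 ** 2 * q11, axis=1)
    return L11, L21, q11, q22
```

By construction, P agrees with H in its first k columns and on its whole diagonal.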

The preconditioner: Storage and computational cost
The complete diagonal of H is required. If it is not available and H = AΘA^T, then
(H)_{ii} = ||Θ^{1/2} A^T e_i||_2^2,  i = 1, ..., m.
Storage: one (sparse) vector A^T e_i at a time, plus a vector for the diagonal of H.
The first k columns of H are computed and stored: H e_i, i = 1, ..., k. The additional cost of this step is k products of H with a vector. The products H e_i are cheap if H (or A) is sparse, and they are expected to be cheaper than the products Hv required by PCG, where the vectors v involved are typically dense.
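Since (H)_ii is a weighted sum of the squared entries of the i-th row of A, the diagonal can be accumulated one sparse row at a time. A SciPy sketch (CSR storage assumed; the helper name is ours):

```python
import numpy as np
import scipy.sparse as sp

def diag_of_H(A, theta):
    """diag(A Theta A^T) without forming H: (H)_ii = sum_j theta_j * A_ij^2."""
    A = sp.csr_matrix(A)
    d = np.empty(A.shape[0])
    for i in range(A.shape[0]):
        s, e = A.indptr[i], A.indptr[i + 1]   # nonzeros of row i, i.e. of A^T e_i
        d[i] = np.dot(theta[A.indices[s:e]], A.data[s:e] ** 2)
    return d
```

Only one sparse row and the output vector are held in memory at any time.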

The preconditioner: Factorized form of P^{-1}
From P = [L_{11}, 0; L_{21}, I_{m-k}] [Q_{11}, 0; 0, Q_{22}] [L_{11}^T, L_{21}^T; 0, I_{m-k}] it follows that
P^{-1} = [L_{11}^{-T}, -L_{11}^{-T} L_{21}^T; 0, I_{m-k}] [Q_{11}^{-1}, 0; 0, Q_{22}^{-1}] [L_{11}^{-1}, 0; -L_{21} L_{11}^{-1}, I_{m-k}],
i.e., a factorized sparse approximation of H^{-1}.
Letting R^T = [L_{11}, 0; L_{21}, I_{m-k}] [Q_{11}^{1/2}, 0; 0, Q_{22}^{1/2}], we have P = R^T R, and P^{-1}H is similar to the block diagonal matrix [I_k, 0; 0, Q_{22}^{-1} S].
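In this factorized form, applying P^{-1} to a vector takes two backsolves with L_11, one product each with L_21 and L_21^T, and a diagonal scaling. A dense sketch (names are ours; the real code keeps L_11 and L_21 sparse):

```python
import numpy as np
from scipy.linalg import solve_triangular

def apply_pinv(L11, L21, q, r):
    """z = P^{-1} r for P = L Q L^T, L = [[L11, 0], [L21, I]], Q = diag(q):
    solve L y = r, scale by Q^{-1}, then solve L^T z = Q^{-1} y."""
    k = L11.shape[0]
    y1 = solve_triangular(L11, r[:k], lower=True)        # L11 y1 = r1
    y2 = r[k:] - L21 @ y1                                # identity block
    b1, b2 = y1 / q[:k], y2 / q[k:]                      # scale by Q^{-1}
    z2 = b2                                              # identity block of L^T
    z1 = solve_triangular(L11.T, b1 - L21.T @ z2, lower=False)
    return np.concatenate([z1, z2])
```

The cost per application is independent of how P was obtained, so the same routine serves every system in a sequence with a frozen preconditioner.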

The preconditioner: Spectral analysis of P^{-1}H
k eigenvalues of P^{-1}H are equal to 1. The other eigenvalues are eigenvalues of Q_{22}^{-1} S, and
λ(Q_{22}^{-1} S) ≥ λ_min(S)/λ_max(Q_{22}) ≥ λ_min(H)/λ_max(diag(S)),
λ(Q_{22}^{-1} S) ≤ λ_max(S)/λ_min(Q_{22}) ≤ λ_max(H_{22})/λ_min(diag(S)).

The preconditioner: Reordering of H
A greedy heuristic technique acts on the largest eigenvalues of H. Since H is SPD,
λ_max(H) ≤ tr(H) = tr(H_{11}) + tr(H_{22}).
If Q_{22} = I, then P^{-1}H is similar to [I_k, 0; 0, S], and
λ_max(P^{-1}H) ≤ tr([I_k, 0; 0, S]) = k + tr(S).
Permuting the rows and columns of H so that H_{11} contains the k largest elements of diag(H) would imply k + tr(S) ≪ tr(H), and hence a large reduction in λ_max(P^{-1}H) with respect to λ_max(H).
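The greedy permutation itself needs only diag(H) (a sketch; the helper name is ours):

```python
import numpy as np

def greedy_perm(diag_h, k):
    """Symmetric permutation moving the k largest entries of diag(H)
    into the H11 block; the remaining indices keep their natural order."""
    idx = np.argsort(-np.asarray(diag_h), kind="stable")
    return np.concatenate([idx[:k], np.sort(idx[k:])])

# usage: p = greedy_perm(np.diag(H), k); H_perm = H[np.ix_(p, p)]
```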

Deflated CG: Handling small eigenvalues
Applying the greedy technique requires no extra storage. In most cases, the greedy reordering takes care of the largest eigenvalues of H, and κ_2(R^{-T} H R^{-1}) is reduced considerably with respect to κ_2(H). On the other hand, the smallest eigenvalues of H are slightly modified or moved towards the origin.
When the convergence of the CG (or a CG-like) method is hampered by a small number of eigenvalues of P^{-1}H close to zero, the Preconditioned Deflated-CG (or CG-like) algorithm can be useful [Saad, Yeung, Erhel, Guyomarc'h, SISC 2000].

Deflated CG: Preconditioned Deflated-CG
Let the eigenvalues of P^{-1}H be labeled in increasing order: λ_1(P^{-1}H) ≤ ... ≤ λ_m(P^{-1}H).
Ideal case: inject l exact eigenvectors of P^{-1}H associated with λ_1(P^{-1}H), ..., λ_l(P^{-1}H) into the Krylov subspace. Then
||x - x_j||_H ≤ 2 ((√µ - 1)/(√µ + 1))^j ||x - x_0||_H,  µ = λ_m(P^{-1}H)/λ_{l+1}(P^{-1}H).
Therefore, the convergence of the CG method is improved if a few eigenvalues are close to the origin and well separated from the others. If the l eigenvectors of P^{-1}H are only numerically approximated, one can expect µ ≈ λ_m(P^{-1}H)/λ_{l+1}(P^{-1}H).

Deflated CG: Preconditioned Deflated-CG (cont'd)
Apply Deflated-CG to the split-preconditioned system
R^{-T} H R^{-1} y = R^{-T} b,  x = R^{-1} y,
using a few eigenvectors associated with the smallest eigenvalues of R^{-T} H R^{-1}. Symmetric Lanczos processes for sparse symmetric eigenvalue problems require products of R^{-T} H R^{-1} with a vector; each product has the cost of one preconditioned CG iteration.
To amortize the cost of approximating eigenvectors, Preconditioned Deflated-CG is suitable for solving systems with multiple right-hand sides and sequences of slowly varying linear systems.
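A minimal dense sketch of the deflation mechanics in the spirit of [Saad, Yeung, Erhel, Guyomarc'h, SISC 2000]: residuals are kept orthogonal to the deflation basis W and search directions A-orthogonal to it. This illustrates the idea under our own naming and omits the split preconditioning, so it is not the implementation used in the talk:

```python
import numpy as np

def deflated_cg(A, b, W, tol=1e-10, maxit=1000):
    """Deflated CG for SPD A: W^T r stays zero and every search
    direction p satisfies W^T A p = 0."""
    AW = A @ W
    WtAW = W.T @ AW
    def a_proj(v):                                # W (W^T A W)^{-1} W^T A v
        return W @ np.linalg.solve(WtAW, AW.T @ v)
    x = W @ np.linalg.solve(WtAW, W.T @ b)        # start with W^T r0 = 0
    r = b - A @ x
    p = r - a_proj(r)
    rho = r @ r
    for it in range(1, maxit + 1):
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rho_new = r @ r
        if np.sqrt(rho_new) <= tol * np.linalg.norm(b):
            return x, it
        p = r + (rho_new / rho) * p - a_proj(r)   # A-orthogonalize against W
        rho = rho_new
    return x, maxit
```

With W spanning eigenvectors of the smallest eigenvalues, the iteration behaves as if those eigenvalues were removed from the spectrum, which is exactly the effect captured by the bound with µ = λ_m/λ_{l+1}.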

Numerical results: Numerical experiments
We implemented the preconditioner in Matlab, ϵ_m ≈ 2·10^{-16}.
Initial guess for PCG: x_0 = (0, ..., 0)^T. Stopping criterion: ||H x_j - b||_2 ≤ 10^{-6} ||b||_2. A failure is declared after 1000 iterations.
H = AA^T, with 35 matrices A from the University of Florida Sparse Matrix Collection, groups LPnetlib and Meszaros (Linear Programming problems):
1090 ≤ m ≤ 105127,  2.20·10^{-5} ≤ dens(A) ≤ 6.50·10^{-3},  5.51·10^{-5} ≤ dens(H) ≤ 2.51·10^{-1}.

Numerical results: Numerical experiments (cont'd)
Experiments with the SAINV preconditioner: H^{-1} ≈ Z D^{-1} Z^T, where Z is unit upper triangular and D is diagonal. Code from the Sparselab package developed by M. Tuma.
First drop tolerance tested: 10^{-1}. In case of failure, the tolerance is progressively reduced by a factor of 10.

Numerical results: Cost comparison

Table: Cost of the construction and application of LMP and SAINV

LMP, construction:
  m sparse-to-sparse products Θ^{1/2}(A^T e_i)
  k sparse-to-sparse products AΘ(A^T e_i)
  m-k backsolves with L_{11}
  m-k scalar products in R^k
LMP, application:
  2 backsolves with L_{11}
  1 mat-vec product with Q^{-1}
  m-k scalar products in R^k
  k scalar products in R^{m-k}
SAINV, construction:
  m sparse-to-sparse products AΘ(A^T v)
SAINV, application:
  2 mat-vec products with Z
  1 mat-vec product with D^{-1}

Numerical results: Comparison between LMP(50) and LMP(100)
LMP(100) outperforms LMP(50) in terms of PCG iterations.
[Figure: performance profile π_s(τ) of the execution time for LMP(50) and LMP(100).]

Numerical results: Comparison between LMP(50) and SAINV
SAINV solved 21 systems. Performance profiles on the tests successfully solved by all preconditioners.
[Figure: performance profiles π_s(τ) of the CG iterations and of the execution time for LMP(50) and SAINV.]

Numerical results: Preconditioner density
[Figure: density of H and of the factors L and Z; density of the factors L and L^{-1}.]

Numerical results: Experiments with Preconditioned Deflated-CG
A few eigenvectors of R^{-T} H R^{-1} are computed by the Matlab package PROPACK [R.M. Larsen, 1998]. The symmetric Lanczos algorithm with partial reorthogonalization is applied. A loose accuracy for the convergence criterion, 10^{-1}, is fixed, along with a specified maximum dimension, DIM_L, of the Lanczos basis allowed. The number of matrix-vector products is at most DIM_L.
In the Preconditioned Deflated-CG we injected the estimated eigenvectors. If convergence was not achieved, the vectors associated with eigenvalues smaller than a prescribed tolerance are selected.

Numerical results: Solution of a single system

Test name    λ_max(H)  λ_min(H)  λ_max(P^{-1}H)  λ_min(P^{-1}H)  Prec Defl-CG IT_L  Prec CG IT_L
lp d2q06c    1.27e6    6.37e-4   6.48e0          3.39e-5         278                338
lp pilot     1.10e5    1.55e-2   1.22e1          2.58e-4         160                264
lp pilot87   1.01e6    1.52e-2   2.22e1          2.01e-4         250                294
lp stocfor2  1.60e6    1.98e-3   7.71e0          1.17e-6         97                 144
lpi bgindy   8.97e3    4.07e-2   5.55e0          8.29e-3         38                 53
ge           1.89e8    4.90e-5   1.21e1          8.78e-7         41                 58
nl           8.26e4    7.00e-3   7.30e0          1.61e-4         388                441
scrs8-2c     1.85e3    3.49e-5   5.39e1          8.32e-5         102                140

Preconditioner formed with k = 50. Number of small eigenvalues estimated: 5. Maximum dimension of the Lanczos basis: 50.

Numerical results: Sequences of normal equations from least-squares problems
Sequences of normal equations arise in the solution of constrained and unconstrained least-squares problems. If the coefficient matrices vary slowly, a preconditioner-freeze strategy for LMP coupled with Deflated-CGLS can be used.
We solved Nonnegative Linear Least-Squares problems
min_{x ≥ 0} (1/2) ||Bx - d||_2^2,  B full rank,
by the interior Newton-like method of [Bellavia, Macconi, Morini, NLAA 2006]. The trial step at the j-th nonlinear iteration solves
min_{p ∈ R^n} || [B S_j; W_j] p + [B x_j - d; 0] ||_2.

Numerical results: LMP in NNLS
The matrix of the normal equations is H_j = A_j A_j^T, with A_j = (S_j B^T  W_j), j = 0, 1, ..., where S_j and W_j are matrices with entries in (0, 1] and [0, 1], respectively.
We solve the sequence of linear systems with a frozen preconditioner. For a seed matrix, say H_0, we form the LMP preconditioner and compute l approximate eigenvectors associated with the smallest eigenvalues. We reuse the preconditioner and the eigenvectors throughout the nonlinear iterations until the preconditioner deteriorates, i.e., the limit of CGLS iterations is reached. Then the LMP preconditioner and the l eigenvectors are refreshed for the current matrix.

Numerical results: LMP(100), 5 small eigenvalues estimated, Lanczos basis dimension 50

             Prec Defl-CGLS      Prec CGLS           Savings in
Test         IT_NL(R)  IT_L      IT_NL(R)  IT_L      mat-vec products
lp pilot87   27(1)     3639      30(1)     6023      36%
lp ken 11    14        512       19        720       12%
lp ken 13    14        485       19        881       31%
lp ken 18    24        1937      18        2449      14%
lp pds 10    11        607       11        834       15%
lp pds 20    13        1629      13        1877      9%
lp truss     13        512       14        951       34%
deter3       23        1441      28        1910      16%
deter5       13        844       26        1939      51%
deter7       18        1242      21        2050      33%
fxm2-16      33(3)     8686      47(2)     10771     17%
ge           35(3)     8425      34(3)     10021     13%
nl           28(5)     7376      32(6)     10891     30%
scrs8-2c     17        163       *

IT_NL: nonlinear iterations (number of preconditioner refreshes in parentheses); IT_L: cumulative linear iterations.

Numerical results: Final comments
Work in progress: we are using the LMP preconditioner in the solution of linear systems arising in electrostatic and electromagnetic problems, in cooperation with A. Tamburrino and S. Ventre, University of Cassino. The matrix H is SPD and can be decomposed as H = H_far + H_near, where:
- H_near is available and includes the diagonal of H;
- H_far is not available, but the action of H_far on a vector can be computed (approximately).
S. Bellavia, J. Gondzio, B. Morini, A matrix-free preconditioner for sparse symmetric positive definite systems and least-squares problems, SISC, in press.
J. Gondzio, Interior point methods 25 years later, EJOR (2012).